Re: [GEOM] Disk IO error when resyncing gmirror - massive hang in D state

2015-04-16 Thread Dmitry Morozovsky
Walter,

thanks for your suggestions.

To answer quickly: I've already evacuated the data to the new drive (see the last 
paragraph of my original message). Luckily, no critical data were on the failed 
part of the disk, so rsync finished cleanly on the very first pass.

The only question still open for me is why the kernel got stuck in GEOM 
instead of returning read/write errors to the applications.

I'll try to put together a lab machine with this drive (which is still on my 
desk at work) and reproduce the error.



On Wed, 15 Apr 2015, Walter Cramer wrote:

 Here are a few ideas I had, if more capable people have not already sent you
 better ones:
 
 Copy as much important data as possible from the Toshiba drive, since it could
 degrade further or die at any time.
 
 Check whether a 'dd' command can quickly reproduce the error, so you can try
 things faster.
 
 If the failing drive is not fairly cold, try chilling it with a strong fan.
 
 Briefly put the drive in another system, to see if using a different power
 supply, controller, data cable, etc. would help.  Changing the orientation
 (direction of gravity on the drive) might also be good.
 
 If nothing else helps, a tiny C program might use open(), read(),
 lseek(), write(), etc. to copy all readable sectors to your replacement disk
 (using zeros for the unreadable bad sectors).
 
 -Walter
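
The copy loop suggested above could look roughly like the following. This is only a minimal sketch, not a tested recovery tool: the names rescue_copy(), rescue_paths() and write_all() are made up for illustration, a 512-byte sector is assumed, pread()/pwrite() stand in for the lseek()+read()/write() pairs mentioned above, and the caller is assumed to know the size of the source in bytes.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR_SIZE   512   /* assumption: logical sector size of the source */
#define CHUNK_SECTORS 256   /* 128 KiB per fast-path read */

/* Write n bytes at offset off, retrying after short writes; 0 on success. */
static int write_all(int fd, const char *p, size_t n, off_t off)
{
    while (n > 0) {
        ssize_t w = pwrite(fd, p, n, off);
        if (w < 0 && errno == EINTR)
            continue;
        if (w <= 0)
            return -1;
        p += w;
        n -= (size_t)w;
        off += w;
    }
    return 0;
}

/*
 * Copy total_bytes from src to dst. Each chunk is first read in one go;
 * if that fails (or comes back short), the chunk is retried one sector at
 * a time and every sector that still cannot be read is replaced by zeros.
 * Returns the number of zero-filled sectors, or -1 if writing fails.
 */
static long rescue_copy(int src, int dst, off_t total_bytes)
{
    static char buf[SECTOR_SIZE * CHUNK_SECTORS];
    long bad = 0;

    assert(total_bytes >= 0);
    for (off_t off = 0; off < total_bytes; ) {
        size_t want = sizeof(buf);
        if ((off_t)want > total_bytes - off)
            want = (size_t)(total_bytes - off);

        if (pread(src, buf, want, off) != (ssize_t)want) {
            /* Slow path: sector by sector, zero-filling what fails. */
            for (size_t done = 0; done < want; done += SECTOR_SIZE) {
                size_t n = want - done < SECTOR_SIZE ? want - done : SECTOR_SIZE;
                if (pread(src, buf + done, n, off + (off_t)done) != (ssize_t)n) {
                    memset(buf + done, 0, n);
                    bad++;
                }
            }
        }
        if (write_all(dst, buf, want, off) != 0)
            return -1;
        off += (off_t)want;
    }
    return bad;
}

/* Convenience wrapper: open both paths and copy total_bytes between them. */
static long rescue_paths(const char *src_path, const char *dst_path,
                         off_t total_bytes)
{
    int src = open(src_path, O_RDONLY);
    int dst = open(dst_path, O_WRONLY | O_CREAT, 0600);
    long bad = (src < 0 || dst < 0) ? -1 : rescue_copy(src, dst, total_bytes);

    if (src >= 0)
        close(src);
    if (dst >= 0)
        close(dst);
    return bad;
}
```

A caller would pass the two device paths and the media size to rescue_paths() from a small main(). Reading in large chunks and dropping to single sectors only after a failure keeps the copy fast on the healthy parts of the disk while limiting the zero-filled area to the sectors that really cannot be read.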
 
 
 On Tue, 14 Apr 2015, Dmitry Morozovsky wrote:
 
  Dear colleagues,
  
  unfortunately, the machine in question is in production, so I have no clean
  way to reproduce this. I do have console logs, however.
  
  prerequisites:
  - rather fresh stable/10, amd64, SuperMicro MicroCloud 1150, X10SLD-F/HF
  - su+j ufs2 on top of gmirror of two SATA Toshiba drives
  - one disk died some time ago, so gmirror works in degraded state
  
  trouble:
  - inserted new drive, labelled, started gmirror resync
  - apparently remaining drive also has read issues:
  (ada0:ahcich1:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 10 b2 c3 40 01 00 00 01 00 00
  (ada0:ahcich1:0:0:0): CAM status: ATA Status Error
  (ada0:ahcich1:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
  (ada0:ahcich1:0:0:0): RES: 41 40 04 b3 c3 40 01 00 00 00 01
  (ada0:ahcich1:0:0:0): Error 5, Retries exhausted
  GEOM_MIRROR: Request failed (error=5). ada0a[READ(offset=6566445056, length=131072)]
  GEOM_MIRROR: Synchronization request failed (error=5). mirror/m0a[READ(offset=6566445056, length=131072)]
  
  At this point all disk I/O requests stall, and with them all cron jobs,
  syslogd, dhcpd, etc.
  
  The situation reproduced itself at least twice; then, as an emergency
  measure, the new drive was labelled independently and the data rsynced over.
  
  Any thoughts?
  
  Thanks in advance!
  
  
  -- 
  Sincerely,
  D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
  [ FreeBSD committer: ma...@freebsd.org ]
  
  *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***
  
  ___
  freebsd-stable@freebsd.org mailing list
  http://lists.freebsd.org/mailman/listinfo/freebsd-stable
  To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
 

-- 
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***



[GEOM] Disk IO error when resyncing gmirror - massive hang in D state

2015-04-13 Thread Dmitry Morozovsky
Dear colleagues,

unfortunately, the machine in question is in production, so I have no clean 
way to reproduce this. I do have console logs, however.

prerequisites:
- rather fresh stable/10, amd64, SuperMicro MicroCloud 1150, X10SLD-F/HF
- su+j ufs2 on top of gmirror of two SATA Toshiba drives
- one disk died some time ago, so gmirror works in degraded state

trouble:
- inserted new drive, labelled, started gmirror resync
- apparently remaining drive also has read issues:
(ada0:ahcich1:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 10 b2 c3 40 01 00 00 01 00 00
(ada0:ahcich1:0:0:0): CAM status: ATA Status Error
(ada0:ahcich1:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
(ada0:ahcich1:0:0:0): RES: 41 40 04 b3 c3 40 01 00 00 00 01
(ada0:ahcich1:0:0:0): Error 5, Retries exhausted
GEOM_MIRROR: Request failed (error=5). ada0a[READ(offset=6566445056, length=131072)]
GEOM_MIRROR: Synchronization request failed (error=5). mirror/m0a[READ(offset=6566445056, length=131072)]
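
For reference, the ACB and RES bytes in the log above can be decoded to locate the exact failing sector. The following is only a sketch under a stated assumption: that the bytes are printed in the order command/status, features/error, LBA bits 0-23, device, LBA bits 24-47, and then the remaining count/features bytes, and that for READ FPDMA QUEUED the transfer length is carried in the features fields. The helper names are made up for illustration, and a 512-byte sector is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes copied from the console log above. */
static const uint8_t acb[12] = { 0x60, 0x00, 0x10, 0xb2, 0xc3, 0x40,
                                 0x01, 0x00, 0x00, 0x01, 0x00, 0x00 };
static const uint8_t res[11] = { 0x41, 0x40, 0x04, 0xb3, 0xc3, 0x40,
                                 0x01, 0x00, 0x00, 0x00, 0x01 };

/* Assumed layout: bytes 2..4 hold LBA bits 0..23, bytes 6..8 bits 24..47. */
static uint64_t ata_lba48(const uint8_t *b)
{
    return (uint64_t)b[2] | (uint64_t)b[3] << 8 | (uint64_t)b[4] << 16 |
           (uint64_t)b[6] << 24 | (uint64_t)b[7] << 32 | (uint64_t)b[8] << 40;
}

/* Assumed: an NCQ read carries its sector count in features (bytes 1, 9). */
static uint32_t ncq_sector_count(const uint8_t *cmd)
{
    return (uint32_t)cmd[1] | (uint32_t)cmd[9] << 8;
}

/*
 * Byte offset of the failing sector inside the provider that the GEOM
 * request addressed, given the offset GEOM reported for that request.
 */
static uint64_t failing_byte_offset(const uint8_t *cmd, const uint8_t *result,
                                    uint64_t geom_offset)
{
    uint64_t first = ata_lba48(cmd);
    uint64_t bad = ata_lba48(result);

    /* The reported sector must lie inside the requested range. */
    assert(bad >= first && bad - first < ncq_sector_count(cmd));
    return geom_offset + (bad - first) * 512;
}
```

Under these assumptions the request starts at LBA 29602320 and covers 256 sectors, which matches length=131072 in the GEOM_MIRROR lines; the drive reports LBA 29602564, i.e. sector 244 of the request, and failing_byte_offset(acb, res, 6566445056) gives byte offset 6566569984 within ada0a.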

At this point all disk I/O requests stall, and with them all cron jobs, 
syslogd, dhcpd, etc.

The situation reproduced itself at least twice; then, as an emergency 
measure, the new drive was labelled independently and the data rsynced over.

Any thoughts?

Thanks in advance!


-- 
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***
