Re: deadlock or bad disk ? RELENG_8

2011-03-25 Thread Steven Hartland
- Original Message - From: Jeremy Chadwick free...@jdc.parodius.com Was there any conclusion from this, guys? Was there a bad disk causing the issue? Regards Steve

Re: deadlock or bad disk ? RELENG_8

2011-03-25 Thread Mike Tancsa
On 3/25/2011 6:29 AM, Steven Hartland wrote: - Original Message - From: Jeremy Chadwick free...@jdc.parodius.com Was there any conclusion from this, guys? Was there a bad disk causing the issue? You mean this old thread?

Re: deadlock or bad disk ? RELENG_8

2011-03-25 Thread Steven Hartland
- Original Message - From: Mike Tancsa m...@sentex.net I would say probably the disk mostly. Perhaps a driver or firmware bug on the Areca. Hard to say. The drive totally failed a month or so later. Also, moved to a later firmware on the Areca controller after that and all has

Re: deadlock or bad disk ? RELENG_8

2011-03-25 Thread Jeremy Chadwick
On Fri, Mar 25, 2011 at 01:59:01PM -, Steven Hartland wrote: - Original Message - From: Mike Tancsa m...@sentex.net I would say probably the disk mostly. Perhaps a driver or firmware bug on the Areca. Hard to say. The drive totally failed a month or so later. Also, moved to a

Re: deadlock or bad disk ? RELENG_8

2011-03-25 Thread Steven Hartland
- Original Message - From: Jeremy Chadwick free...@jdc.parodius.com I apologise in advance if I have already reviewed your situation, but if you could please provide full smartctl -a output for the disk, I can review the data to see if anything looks out of place. An example: on some

Re: deadlock or bad disk ? RELENG_8

2011-03-25 Thread Mike Tancsa
On 3/25/2011 11:37 AM, Steven Hartland wrote: I've raised this with their support as an issue with their areca-cli utility so hopefully they will fix it. Alternatively maybe smartctl will add support for the Areca under FreeBSD in the future. Hopefully smartmontools will eventually work. It can

Re: deadlock or bad disk ? RELENG_8

2011-03-25 Thread Jeremy Chadwick
On Fri, Mar 25, 2011 at 03:37:55PM -, Steven Hartland wrote: - Original Message - From: Jeremy Chadwick free...@jdc.parodius.com I apologise in advance if I have already reviewed your situation, but if you could please provide full smartctl -a output for the disk, I can review

Re: deadlock or bad disk ? RELENG_8

2011-03-25 Thread Steven Hartland
- Original Message - From: Jeremy Chadwick free...@jdc.parodius.com Bummer. Competitor's drivers make use of pass(4) and/or xpt(4), the result being that you can see (and talk to directly) all the disks which are on the RAID card. No need for a CLI utility getting in the way, etc.

Re: deadlock or bad disk ? RELENG_8

2011-03-25 Thread Mike Tancsa
On 3/25/2011 9:28 PM, Steven Hartland wrote: Interesting, camcontrol reports the following, so does that indicate it may be possible to do this: camcontrol devlist Areca ARC-1220-VOL#00 R001 at scbus0 target 0 lun 0 (da0,pass0) Areca RAID controller R001 at scbus0 target 16 lun

Re: deadlock or bad disk ? RELENG_8

2011-03-25 Thread Jeremy Chadwick
On Fri, Mar 25, 2011 at 09:55:23PM -0400, Mike Tancsa wrote: On 3/25/2011 9:28 PM, Steven Hartland wrote: Interesting, camcontrol reports the following, so does that indicate it may be possible to do this: camcontrol devlist Areca ARC-1220-VOL#00 R001 at scbus0 target 0 lun 0

Re: deadlock or bad disk ? RELENG_8

2010-07-19 Thread Mike Tancsa
At 11:34 PM 7/18/2010, Jeremy Chadwick wrote: yes, da0 is a RAID volume with 4 disks behind the scenes. Okay, so can you get full SMART statistics for all 4 of those disks? The adjusted/calculated values for SMART thresholds won't be helpful here, one will need the actual raw SMART data. I

Re: deadlock or bad disk ? RELENG_8

2010-07-19 Thread Mike Tancsa
At 11:58 PM 7/18/2010, Jeremy Chadwick wrote: So I believe this indicates the message only gets printed during swapin, not swapout. Meaning it's happening during an I/O read from da0. Yes, and from my existing ssh sessions, it would _seem_ no disk IO was completing, i.e., I tried a killall -9

Re: deadlock or bad disk ? RELENG_8

2010-07-19 Thread Mike Tancsa
At 12:11 AM 7/19/2010, Jeremy Chadwick wrote: On Sun, Jul 18, 2010 at 08:58:44PM -0700, Jeremy Chadwick wrote: I took a look at the RELENG_8 code responsible for printing this message: src/sys/vm/swap_pager.c [...] 1086 static int 1087 swap_pager_getpages(vm_object_t object, vm_page_t *m,

Re: deadlock or bad disk ? RELENG_8

2010-07-19 Thread Sascha Holzleiter
just hangs, I guess because it's having trouble reading from the disk. If I hit CTRL+t, I see load: 0.00 cmd: csh 73167 [vnread] 22.32r 0.00u 0.00s 0% 3232k load: 0.00 cmd: csh 73167 [vnread] 22.65r 0.00u 0.00s 0% 3232k load: 0.00 cmd: csh 73167 [vnread] 22.96r 0.00u 0.00s 0% 3232k
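
The CTRL+t status lines quoted above show the same wait channel (vnread) with real time advancing while user and system time stay at zero, which is what a process blocked on I/O tends to look like. A minimal Python sketch of that check follows; the field layout is inferred only from the three lines quoted in this message, and the names STATUS_RE, parse_status and looks_blocked are hypothetical helpers, not part of any FreeBSD tool.

```python
import re

# Field layout assumed from the quoted lines:
# load: <avg> cmd: <name> <pid> [<wchan>] <real>r <user>u <sys>s <cpu>% <rss>k
STATUS_RE = re.compile(
    r"load: (?P<load>[\d.]+) cmd: (?P<cmd>\S+) (?P<pid>\d+) "
    r"\[(?P<wchan>[^\]]+)\] (?P<real>[\d.]+)r (?P<user>[\d.]+)u "
    r"(?P<sys>[\d.]+)s (?P<cpu>\d+)% (?P<rss>\d+)k"
)

def parse_status(text):
    """Return one dict per status line found in text."""
    return [
        {
            "cmd": m.group("cmd"),
            "pid": int(m.group("pid")),
            "wchan": m.group("wchan"),
            "real": float(m.group("real")),
            "user": float(m.group("user")),
            "sys": float(m.group("sys")),
        }
        for m in STATUS_RE.finditer(text)
    ]

def looks_blocked(samples):
    """True if all samples share one wait channel and CPU time never
    advances while real (wall-clock) time does."""
    if len(samples) < 2:
        return False
    first, last = samples[0], samples[-1]
    same_wchan = len({s["wchan"] for s in samples}) == 1
    no_cpu = all(
        s["user"] == first["user"] and s["sys"] == first["sys"]
        for s in samples
    )
    return same_wchan and no_cpu and last["real"] > first["real"]

quoted = (
    "load: 0.00 cmd: csh 73167 [vnread] 22.32r 0.00u 0.00s 0% 3232k "
    "load: 0.00 cmd: csh 73167 [vnread] 22.65r 0.00u 0.00s 0% 3232k "
    "load: 0.00 cmd: csh 73167 [vnread] 22.96r 0.00u 0.00s 0% 3232k"
)
samples = parse_status(quoted)
blocked = looks_blocked(samples)
```

This only flags the pattern; it cannot tell a failing disk from a controller or driver problem.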

Re: deadlock or bad disk ? RELENG_8

2010-07-19 Thread Jeremy Chadwick
On Mon, Jul 19, 2010 at 08:41:40AM -0400, Mike Tancsa wrote: At 11:58 PM 7/18/2010, Jeremy Chadwick wrote: So I believe this indicates the message only gets printed during swapin, not swapout. Meaning it's happening during an I/O read from da0. Yes, and from my existing ssh sessions, it

Re: deadlock or bad disk ? RELENG_8

2010-07-19 Thread Jeremy Chadwick
On Mon, Jul 19, 2010 at 08:37:50AM -0400, Mike Tancsa wrote: At 11:34 PM 7/18/2010, Jeremy Chadwick wrote: yes, da0 is a RAID volume with 4 disks behind the scenes. Okay, so can you get full SMART statistics for all 4 of those disks? The adjusted/calculated values for SMART thresholds

deadlock or bad disk ? RELENG_8

2010-07-18 Thread Mike Tancsa
On the serial console I see swap_pager: indefinite wait buffer: bufobj: 0, blkno: 74, size: 4096 swap_pager: indefinite wait buffer: bufobj: 0, blkno: 128, size: 20480 swap_pager: indefinite wait buffer: bufobj: 0, blkno: 69, size: 4096 swap_pager: indefinite wait buffer: bufobj: 0, blkno: 6,
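
For anyone collecting these console messages, the following is a minimal Python sketch that extracts the bufobj, blkno and size fields from lines of the form quoted above. The pattern is based solely on the text in this message; SWAP_WAIT_RE and parse_swap_waits are hypothetical names chosen for illustration.

```python
import re

# Matches the console message as quoted in this thread.
SWAP_WAIT_RE = re.compile(
    r"swap_pager: indefinite wait buffer: "
    r"bufobj: (?P<bufobj>\d+), blkno: (?P<blkno>-?\d+), size: (?P<size>\d+)"
)

def parse_swap_waits(text):
    """Return a list of (bufobj, blkno, size) tuples found in text."""
    return [
        (int(m.group("bufobj")), int(m.group("blkno")), int(m.group("size")))
        for m in SWAP_WAIT_RE.finditer(text)
    ]

# The three complete messages quoted above.
sample = (
    "swap_pager: indefinite wait buffer: bufobj: 0, blkno: 74, size: 4096\n"
    "swap_pager: indefinite wait buffer: bufobj: 0, blkno: 128, size: 20480\n"
    "swap_pager: indefinite wait buffer: bufobj: 0, blkno: 69, size: 4096\n"
)
waits = parse_swap_waits(sample)
total_bytes = sum(size for _, _, size in waits)
```

Grouping the results by blkno over time may help show whether the stalls cluster on particular swap blocks, though that is only a diagnostic hint.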

Re: deadlock or bad disk ? RELENG_8

2010-07-18 Thread Jeremy Chadwick
On Sun, Jul 18, 2010 at 05:08:09PM -0400, Mike Tancsa wrote: On the serial console I see swap_pager: indefinite wait buffer: bufobj: 0, blkno: 74, size: 4096 swap_pager: indefinite wait buffer: bufobj: 0, blkno: 128, size: 20480 swap_pager: indefinite wait buffer: bufobj: 0, blkno: 69,

Re: deadlock or bad disk ? RELENG_8

2010-07-18 Thread Mike Tancsa
At 05:14 PM 7/18/2010, Jeremy Chadwick wrote: Where exactly is your swap partition? On one of the areca raidsets. # swapctl -l Device: 1024-blocks Used: /dev/da0s1b 10485760 108 If you Google for swap_pager: indefinite wait buffer: bufobj you'll find this is a pretty

Re: deadlock or bad disk ? RELENG_8

2010-07-18 Thread Jeremy Chadwick
On Sun, Jul 18, 2010 at 05:42:14PM -0400, Mike Tancsa wrote: At 05:14 PM 7/18/2010, Jeremy Chadwick wrote: Where exactly is your swap partition? On one of the areca raidsets. # swapctl -l Device: 1024-blocks Used: /dev/da0s1b 10485760 108 So is da0 actually a RAID

Re: deadlock or bad disk ? RELENG_8

2010-07-18 Thread Jeremy Chadwick
On Sun, Jul 18, 2010 at 07:34:19PM -0700, Jeremy Chadwick wrote: Now I'm confused -- this indicates twa(4) is involved, not arcmsr(4). Can you please provide a verbose explanation of the configuration of the disks and controllers in this machine, including device and disk names and what

Re: deadlock or bad disk ? RELENG_8

2010-07-18 Thread Mike Tancsa
At 10:34 PM 7/18/2010, Jeremy Chadwick wrote: On Sun, Jul 18, 2010 at 05:42:14PM -0400, Mike Tancsa wrote: At 05:14 PM 7/18/2010, Jeremy Chadwick wrote: Where exactly is your swap partition? On one of the areca raidsets. # swapctl -l Device: 1024-blocks Used: /dev/da0s1b

Re: deadlock or bad disk ? RELENG_8

2010-07-18 Thread Mike Tancsa
At 10:58 PM 7/18/2010, Jeremy Chadwick wrote: I re-worked this out myself based on the OP's dmesg. It's confusing because there's literally 6 different storage controllers on a single machine: It's a big storage server. Some files don't require fast or frequent access, others do. The disks

Re: deadlock or bad disk ? RELENG_8

2010-07-18 Thread Jeremy Chadwick
On Sun, Jul 18, 2010 at 11:01:03PM -0400, Mike Tancsa wrote: At 10:34 PM 7/18/2010, Jeremy Chadwick wrote: On Sun, Jul 18, 2010 at 05:42:14PM -0400, Mike Tancsa wrote: At 05:14 PM 7/18/2010, Jeremy Chadwick wrote: Where exactly is your swap partition? On one of the areca raidsets.

Re: deadlock or bad disk ? RELENG_8

2010-07-18 Thread Jeremy Chadwick
On Sun, Jul 18, 2010 at 08:34:24PM -0700, Jeremy Chadwick wrote: On Sun, Jul 18, 2010 at 11:01:03PM -0400, Mike Tancsa wrote: I do track some basic mem stats via rrd. Looking at the graphs up to that period, nothing unusual was happening sysctl vm.stats.vm | grep swap Here's another

Re: deadlock or bad disk ? RELENG_8

2010-07-18 Thread Jeremy Chadwick
On Sun, Jul 18, 2010 at 08:58:44PM -0700, Jeremy Chadwick wrote: I took a look at the RELENG_8 code responsible for printing this message: src/sys/vm/swap_pager.c [...] 1086 static int 1087 swap_pager_getpages(vm_object_t object, vm_page_t *m, int count, int reqpage) 1088 { [...] There