RAID5 lockup with AMCC440 and async-tx

2007-10-01 Thread Dale Dunlea
Hi, I have a board with an AMCC440 processor, running RAID5 using the async-tx interface. In general, it works well, but I have found a test case that consistently causes a hard lockup of the entire system. What makes this case odd is that I have only been able to generate it when accessing

Re: RAID5 lockup with AMCC440 and async-tx

2007-10-01 Thread Justin Piszcz
On Mon, 1 Oct 2007, Dale Dunlea wrote: Hi, I have a board with an AMCC440 processor, running RAID5 using the async-tx interface. In general, it works well, but I have found a test case that consistently causes a hard lockup of the entire system. What makes this case odd is that I have only

Re: RAID5 lockup with AMCC440 and async-tx

2007-10-01 Thread Dale Dunlea
On 01/10/2007, Wolfgang Denk [EMAIL PROTECTED] wrote: Dear Dale, in message [EMAIL PROTECTED] you wrote: I have a board with an AMCC440 processor, running RAID5 using the async-tx interface. In general, it works well, but I have found a test case that consistently causes a hard lockup

Re: RAID5 lockup with AMCC440 and async-tx

2007-10-01 Thread Wolfgang Denk
Dear Dale, in message [EMAIL PROTECTED] you wrote: Latest code from Dan or latest code from denx.de? I grabbed the latest From linux-2.6-denx code from Dan, but I'm having trouble cloning denx.de: remote: error: object directory /home/git/linux-2.6/.git/objects does not exist; check

Re: problem killing raid 5

2007-10-01 Thread Daniel Santos
I retried rebuilding the array once again from scratch, and this time checked the syslog messages. The reconstruction process is getting stuck at a disk block that it can't read. I double-checked the block number by repeating the array creation, and did a bad block scan. No bad blocks were

Re: problem killing raid 5

2007-10-01 Thread Michael Tokarev
Daniel Santos wrote: I retried rebuilding the array once again from scratch, and this time checked the syslog messages. The reconstruction process is getting stuck at a disk block that it can't read. I double-checked the block number by repeating the array creation, and did a bad block scan.

Re: problem killing raid 5

2007-10-01 Thread Patrik Jonsson
Michael Tokarev wrote: Daniel Santos wrote: I retried rebuilding the array once again from scratch, and this time checked the syslog messages. The reconstruction process is getting stuck at a disk block that it can't read. I double-checked the block number by repeating the array creation,

Re: RAID5 lockup with AMCC440 and async-tx

2007-10-01 Thread Dale Dunlea
On 01/10/2007, Wolfgang Denk [EMAIL PROTECTED] wrote: Latest code from Dan or latest code from denx.de? I grabbed the latest From linux-2.6-denx I grabbed the latest from denx.de, but unfortunately, to no avail. The dd test still hangs pretty much immediately. Thanks nonetheless. Regards,

Optimization report for Justin .

2007-10-01 Thread Mr. James W. Laferriere
Hello Justin , Three separate single runs of bonnie(*) . Please note , the linux-2.6.23-rc6 , Concerns your email of this weekend about Subject: Bonnie++ with 1024k stripe SW/RAID5 causes kernel to goto D-state . No lockups or hangs were noticed .

Re: Optimization report for Justin .

2007-10-01 Thread Justin Piszcz
So you got 2x with those optimizations I mentioned? Nice, did you previously get that speed, or? On Mon, 1 Oct 2007, Mr. James W. Laferriere wrote: Hello Justin , Three separate single runs of bonnie(*) . Please note , the linux-2.6.23-rc6 , Concerns your email of

Re: problem killing raid 5

2007-10-01 Thread Michael Tokarev
Patrik Jonsson wrote: Michael Tokarev wrote: [] But in any case, md should not stall - be it during reconstruction or not. For this, I can't comment - to me it smells like a bug somewhere (md layer? error handling in driver? something else?) which should be found and fixed. And for this,

Re: Optimization report for Justin .

2007-10-01 Thread Mr. James W. Laferriere
Hello Justin , On Mon, 1 Oct 2007, Justin Piszcz wrote: So you got 2x with those optimizations I mentioned? Nice, did you previously get that speed, or? Yes , the opt.s did get me a ~ 2.5x speed up . This array (afaicr) has never had that high a throughput . But I am

Re: problem killing raid 5

2007-10-01 Thread Daniel Santos
It stopped the reconstruction process and the output of /proc/mdstat was:
oraculo:/home/dlsa# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid1] [raid0] [linear]
md0 : active raid5 sdc1[3](S) sdb1[4](F) sdd1[0]
      781417472 blocks level 5, 256k chunk, algorithm 2 [3/1] [U__]
I
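
For anyone reading the mdstat output quoted above: `[3/1]` means the array is configured for three member devices and only one is active, `[U__]` marks each slot as up (`U`) or missing (`_`), and the `(S)` and `(F)` suffixes flag a spare and a faulty member. Below is a minimal Python sketch that pulls those fields out of the md0 line. The function name `parse_md_status` and the regular expressions are illustrative assumptions, not part of mdadm or the kernel, and the sketch has only been matched against the single line from this thread.

```python
import re

def parse_md_status(text):
    """Extract member flags and the [total/active] [U_] summary from mdstat-style text.

    Hypothetical helper for illustration; it is not an mdadm or kernel API.
    """
    members = {}
    # Members look like sdb1[4](F): device name, bracketed number, optional flag.
    for name, number, flag in re.findall(r"(\w+)\[(\d+)\](?:\((\w)\))?", text):
        members[name] = {"number": int(number), "flag": flag or None}

    # Summary looks like [3/1] [U__]: configured devices / active devices, then slot map.
    summary = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", text)
    if summary is None:
        raise ValueError("no [total/active] [U_] summary found")

    return {
        "members": members,
        "total": int(summary.group(1)),
        "active": int(summary.group(2)),
        "slots": summary.group(3),
    }

# The md0 line from the message above, joined into one string.
line = ("md0 : active raid5 sdc1[3](S) sdb1[4](F) sdd1[0] "
        "781417472 blocks level 5, 256k chunk, algorithm 2 [3/1] [U__]")
status = parse_md_status(line)
print(status["total"], status["active"], status["slots"])
print(status["members"])
```

Read this way, the array in the message has one working member (sdd1), one member marked faulty (sdb1), and one device held as a spare (sdc1), which is fewer active devices than a three-disk RAID5 needs to run.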

Re: problem killing raid 5

2007-10-01 Thread Justin Piszcz
On Mon, 1 Oct 2007, Daniel Santos wrote: It stopped the reconstruction process and the output of /proc/mdstat was:
oraculo:/home/dlsa# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid1] [raid0] [linear]
md0 : active raid5 sdc1[3](S) sdb1[4](F) sdd1[0]
      781417472 blocks