Raid1, mdadm and nfs that remains in D state

2008-01-22 Thread BERTRAND Joël
Hello, I have installed a lot of T1000 with debian/testing and the official 2.6.23.9 linux kernel. All packages but iscsi come from Debian repositories; iscsi was built from the SVN tree. md7 is a raid1 volume over iscsi and I can access this device. This morning, one of my T1000 has

Re: HELP! New disks being dropped from RAID 6 array on every reboot

2007-11-23 Thread BERTRAND Joël
Joshua Johnson wrote: Greetings, long time listener, first time caller. I recently replaced a disk in my existing 8 disk RAID 6 array. Previously, all disks were PATA drives connected to the motherboard IDE and 3 promise Ultra 100/133 controllers. I replaced one of the Promise controllers with

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-08 Thread BERTRAND Joël
BERTRAND Joël wrote: Chuck Ebbert wrote: On 11/05/2007 03:36 AM, BERTRAND Joël wrote: Neil Brown wrote: On Sunday November 4, [EMAIL PROTECTED] wrote: # ps auxww | grep D USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 273 0.0 0.0 0 0

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-07 Thread BERTRAND Joël
Dan Williams wrote: On Tue, 2007-11-06 at 03:19 -0700, BERTRAND Joël wrote: Done. Here is the obtained output: Much appreciated. [ 1260.969314] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0 [ 1260.980606] check 5: state 0x6 toread read

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-07 Thread BERTRAND Joël
Chuck Ebbert wrote: On 11/05/2007 03:36 AM, BERTRAND Joël wrote: Neil Brown wrote: On Sunday November 4, [EMAIL PROTECTED] wrote: # ps auxww | grep D USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 273 0.0 0.0 0 0 ? D Oct21 14:40

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-06 Thread BERTRAND Joël
Done. Here is the obtained output: [ 1260.967796] for sector 7629696, rmw=0 rcw=0 [ 1260.969314] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0 [ 1260.980606] check 5: state 0x6 toread read write f800ffcffcc0 written [

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-06 Thread BERTRAND Joël
Justin Piszcz wrote: On Tue, 6 Nov 2007, BERTRAND Joël wrote: Done. Here is the obtained output: [ 1265.899068] check 4: state 0x6 toread read write f800fdd4e360 written [ 1265.941328] check 3: state 0x1 toread read

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-06 Thread BERTRAND Joël
Justin Piszcz wrote: On Tue, 6 Nov 2007, BERTRAND Joël wrote: Justin Piszcz wrote: On Tue, 6 Nov 2007, BERTRAND Joël wrote: Done. Here is the obtained output: [ 1265.899068] check 4: state 0x6 toread read write f800fdd4e360 written

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-05 Thread BERTRAND Joël
Neil Brown wrote: On Sunday November 4, [EMAIL PROTECTED] wrote: # ps auxww | grep D USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 273 0.0 0.0 0 0 ? D Oct21 14:40 [pdflush] root 274 0.0 0.0 0 0 ? D Oct21

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-04 Thread BERTRAND Joël
Justin Piszcz wrote: # ps auxww | grep D USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 273 0.0 0.0 0 0 ? D Oct21 14:40 [pdflush] root 274 0.0 0.0 0 0 ? D Oct21 13:00 [pdflush] After several days/weeks, this
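The `ps auxww | grep D` pipeline quoted throughout these threads also matches any line that merely contains the letter D (usernames, command names). A slightly stricter sketch, filtering on the STAT column itself; the awk filter is my own suggestion, not taken from the thread:

```shell
# Find processes in uninterruptible sleep (D state), as the
# `ps auxww | grep D` runs in this thread try to do. Matching the
# STAT column explicitly avoids false hits on usernames or command
# names that merely contain a "D".
ps axo pid,stat,comm | awk '$2 ~ /^D/ {print $1, $3}'

# The same filter demonstrated on captured ps columns:
printf '273 D pdflush\n274 Ds pdflush\n1 Ss init\n' |
    awk '$2 ~ /^D/ {print $1, $3}'
# prints: 273 pdflush
#         274 pdflush
```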

Re: Strange CPU occupation... and system hangs

2007-11-01 Thread BERTRAND Joël
BERTRAND Joël wrote: snip and some processes are in D state: Root gershwin:[/etc] ps auwx | grep D USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 270 0.0 0.0 0 0 ? D Oct27 1:17 [pdflush] root 3676 0.9 0.0 0 0

Strange CPU occupation...

2007-10-31 Thread BERTRAND Joël
Hello, I'm looking for a bug in iSCSI target code, but this morning I found a new bug that is certainly related to mine... Please consider these raid volumes: Root gershwin:[/etc] cat /proc/mdstat Personalities : [raid1] [raid6] [raid5] [raid4] md7 : active raid1 sdi1[2](F)

Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-29 Thread BERTRAND Joël
Ming Zhang wrote: off topic, could you resubmit the alignment issue patch to the list and see if tomof accepts it. he needs a patch inlined in email. it was found and fixed by you, so you had better post it (instead of me). thx. diff -u kernel.old/iscsi.c kernel/iscsi.c --- kernel.old/iscsi.c

Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-27 Thread BERTRAND Joël
Dan Williams wrote: On 10/24/07, BERTRAND Joël [EMAIL PROTECTED] wrote: Hello, Any news about this trouble? Any idea? I'm trying to fix it, but I don't see any specific interaction between raid5 and istd. Did anyone try to reproduce this bug on an arch other than sparc64? I

Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-27 Thread BERTRAND Joël
Dan Williams wrote: On 10/27/07, BERTRAND Joël [EMAIL PROTECTED] wrote: Dan Williams wrote: Can you collect some oprofile data, as Ming suggested, so we can maybe see what md_d0_raid5 and istd1 are fighting about? Hopefully it is as painless to run on sparc as it is on IA: opcontrol --start
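Dan's `opcontrol --start` request refers to the legacy oprofile workflow. A sketch of the full sampling session, for context; this requires root and the oprofile toolchain, so it is shown as a non-runnable transcript, and the vmlinux path is an assumption:

```shell
# Legacy oprofile session around the md_d0_raid5/istd1 CPU spin.
# Requires root; the /boot/vmlinux path is an assumption.
opcontrol --init                       # load the oprofile kernel module
opcontrol --vmlinux=/boot/vmlinux      # resolve kernel symbols
opcontrol --start                      # begin sampling
# ... reproduce the 100% CPU condition here ...
opcontrol --dump                       # flush samples to disk
opreport --symbols                     # per-symbol profile report
opcontrol --shutdown
```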

Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-24 Thread BERTRAND Joël
Hello, Any news about this trouble? Any idea? I'm trying to fix it, but I don't see any specific interaction between raid5 and istd. Did anyone try to reproduce this bug on an arch other than sparc64? I only use sparc32 and 64 servers and I cannot test on other archs. Of course, I

Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-20 Thread BERTRAND Joël
Bill Davidsen wrote: BERTRAND Joël wrote: Sorry for this last mail. I have found another mistake, but I don't know if this bug comes from iscsi-target or raid5 itself. iSCSI target is disconnected because the istd1 and md_d0_raid5 kernel threads each use 100% of CPU! Tasks: 235 total

Re: [Iscsitarget-devel] Abort Task ?

2007-10-19 Thread BERTRAND Joël
Ming Zhang wrote: as Ross pointed out, many io pattern only have 1 outstanding io at any time, so there is only one work thread actively to serve it. so it can not exploit the multiple core here. you see 100% at nullio or fileio? with disk, most time should spend on iowait and cpu utilization

Re: [Iscsitarget-devel] Abort Task ?

2007-10-19 Thread BERTRAND Joël
Ming Zhang wrote: On Fri, 2007-10-19 at 09:48 +0200, BERTRAND Joël wrote: Ross S. W. Walker wrote: BERTRAND Joël wrote: BERTRAND Joël wrote: I can format a 1.5 TB volume several times (mkfs.ext3) over iSCSI without any trouble. I can read and write on this virtual disk without any

Re: [BUG] Raid5 trouble

2007-10-19 Thread BERTRAND Joël
Bill Davidsen wrote: Dan Williams wrote: On Fri, 2007-10-19 at 01:04 -0700, BERTRAND Joël wrote: I run for 12 hours some dd's (read and write in nullio) between initiator and target without any disconnection. Thus iSCSI code seems to be robust. Both initiator and target are alone

Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-19 Thread BERTRAND Joël
BERTRAND Joël wrote: Bill Davidsen wrote: Dan Williams wrote: On Fri, 2007-10-19 at 01:04 -0700, BERTRAND Joël wrote: I run for 12 hours some dd's (read and write in nullio) between initiator and target without any disconnection. Thus iSCSI code seems to be robust. Both initiator

Re: [BUG] Raid5 trouble

2007-10-19 Thread BERTRAND Joël
Bill Davidsen wrote: Dan Williams wrote: I found a problem which may lead to the operations count dropping below zero. If ops_complete_biofill() gets preempted in between the following calls: raid5.c:554 clear_bit(STRIPE_OP_BIOFILL, sh->ops.ack); raid5.c:555 clear_bit(STRIPE_OP_BIOFILL,

Re: [Iscsitarget-devel] Abort Task ?

2007-10-19 Thread BERTRAND Joël
Ross S. W. Walker wrote: BERTRAND Joël wrote: BERTRAND Joël wrote: I can format a 1.5 TB volume several times (mkfs.ext3) over iSCSI without any trouble. I can read and write on this virtual disk without any trouble. Now, I have configured ietd with: Lun 0 Sectors=1464725758

Re: [BUG] Raid5 trouble

2007-10-18 Thread BERTRAND Joël
Dan, I'm testing your last patch (fix-biofill-clear2.patch). It seems to work: Every 1.0s: cat /proc/mdstat    Thu Oct 18 10:28:55 2007 Personalities : [raid1] [raid6] [raid5] [raid4] md7 : active raid1 sdi1[1] md_d0p1[0] 1464725632 blocks [2/2]

Re: [Iscsitarget-devel] Abort Task ?

2007-10-18 Thread BERTRAND Joël
Ming Zhang wrote: On Thu, 2007-10-18 at 11:33 -0400, Ross S. W. Walker wrote: BERTRAND Joël wrote: BERTRAND Joël wrote: BERTRAND Joël wrote: Hello, When I try to create a raid1 volume over iscsi, process aborts with : - on target side: iscsi_trgt: cmnd_abort(1156) 29 1 0 42

Re: [Iscsitarget-devel] Abort Task ?

2007-10-18 Thread BERTRAND Joël
BERTRAND Joël wrote: I can format a 1.5 TB volume several times (mkfs.ext3) over iSCSI without any trouble. I can read and write on this virtual disk without any trouble. Now, I have configured ietd with: Lun 0 Sectors=1464725758,Type=nullio and I run on the initiator side: Root
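The nullio setup being tested corresponds to an ietd.conf entry along these lines; only the Lun line is taken from the message, and the target IQN is a placeholder. Type=nullio serves fake sectors without touching any disk, which isolates the iSCSI transport from the storage stack:

```
# /etc/ietd.conf fragment -- nullio test LUN (the IQN is a placeholder)
Target iqn.2007-10.example:nullio-test
    Lun 0 Sectors=1464725758,Type=nullio
```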

Re: [BUG] Raid5 trouble

2007-10-17 Thread BERTRAND Joël
BERTRAND Joël wrote: Hello, I run 2.6.23 linux kernel on two T1000 (sparc64) servers. Each server has a partitionable raid5 array (/dev/md/d0) and I have to synchronize both raid5 volumes by raid1. Thus, I have tried to build a raid1 volume between /dev/md/d0p1 and /dev/sdi1

Re: [BUG] Raid5 trouble

2007-10-17 Thread BERTRAND Joël
Dan Williams wrote: On 10/17/07, BERTRAND Joël [EMAIL PROTECTED] wrote: BERTRAND Joël wrote: Hello, I run 2.6.23 linux kernel on two T1000 (sparc64) servers. Each server has a partitionable raid5 array (/dev/md/d0) and I have to synchronize both raid5 volumes by raid1. Thus, I have

Re: [BUG] Raid5 trouble

2007-10-17 Thread BERTRAND Joël
Dan Williams wrote: On 10/17/07, Dan Williams [EMAIL PROTECTED] wrote: On 10/17/07, BERTRAND Joël [EMAIL PROTECTED] wrote: BERTRAND Joël wrote: Hello, I run 2.6.23 linux kernel on two T1000 (sparc64) servers. Each server has a partitionable raid5 array (/dev/md/d0) and I have

Re: Partitionable raid array... How to create devices ?

2007-10-16 Thread BERTRAND Joël
Neil Brown wrote: On Tuesday October 16, [EMAIL PROTECTED] wrote: Hello, I have used software raid for a long time without any trouble. Today, I have to install a partitionable raid1 array over iSCSI. I have some questions because I don't understand how to make this kind of array. I have

[BUG] Raid5 trouble

2007-10-16 Thread BERTRAND Joël
Hello, I run 2.6.23 linux kernel on two T1000 (sparc64) servers. Each server has a partitionable raid5 array (/dev/md/d0) and I have to synchronize both raid5 volumes by raid1. Thus, I have tried to build a raid1 volume between /dev/md/d0p1 and /dev/sdi1 (exported by iscsi from the
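The layout described in this thread, a partitionable raid5 whose first partition is mirrored against an iSCSI-exported disk, would be created roughly as follows. This is a sketch only: it needs root and the real devices, and the raid5 member disks are assumptions of mine, so it is shown as a non-runnable transcript:

```shell
# Partitionable raid5 array (--auto=part creates /dev/md/d0 plus
# partition nodes d0p1, d0p2, ...); member disks are assumptions.
mdadm --create /dev/md/d0 --auto=part --level=5 --raid-devices=3 \
      /dev/sda1 /dev/sdb1 /dev/sdc1

# Mirror its first partition against the iSCSI-exported disk sdi1.
mdadm --create /dev/md7 --level=1 --raid-devices=2 \
      /dev/md/d0p1 /dev/sdi1

cat /proc/mdstat    # both arrays should appear, md7 resyncing
```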