Raid1, mdadm and nfsd remaining in D state
Hello,

I have installed a number of T1000s with Debian/testing and the official 2.6.23.9 Linux kernel. All packages but iSCSI come from the Debian repositories; iSCSI was built from the SVN tree. md7 is a raid1 volume over iSCSI and I can access this device. This morning, one of my T1000s crashed. The NFS daemons stay in D state:

Root gershwin:[~] ps auwx | grep NFS
root     17041  0.0  0.0   2064   744 ttyS0  S+  12:33  0:00 grep NFS
Root gershwin:[~] ps auwx | grep nfs
root     17043  0.0  0.0   2064   744 ttyS0  S+  12:33  0:00 grep nfs
root     18276  0.0  0.0      0     0 ?      D   2007  16:59 [nfsd]
root     18277  0.0  0.0      0     0 ?      D   2007  16:56 [nfsd]
root     18278  0.0  0.0      0     0 ?      D   2007  16:57 [nfsd]
root     18279  0.0  0.0      0     0 ?      D   2007  16:41 [nfsd]
root     18280  0.0  0.0      0     0 ?      D   2007  16:44 [nfsd]
root     18281  0.0  0.0      0     0 ?      D   2007  16:49 [nfsd]
root     18282  0.0  0.0      0     0 ?      D   2007  16:37 [nfsd]
root     18283  0.0  0.0      0     0 ?      D   2007  16:54 [nfsd]

Root gershwin:[~] dmesg
sp: f800f2bcf3b1 ret_pc: 005e6d54
RPC: <raid1d+0x35c/0x1020>
l0: f80060b8fa40 l1: 0050 l2: 0006 l3: 0001
l4: f800fde2c8a0 l5: f800fc74dc20 l6: 0007 l7:
i0: f800fb70c400 i1: f800fde2c8c8 i2: f8006297ee40 i3: f800
i4: 0010 i5: 007a2f00 i6: f800f2bcf4f1 i7: 005f2f50
I7: <md_thread+0x38/0x140>
BUG: soft lockup - CPU#6 stuck for 11s! [md7_raid1:5818]
TSTATE: 80001600 TPC: 0055bff0 TNPC: 0055bff4 Y: Not tainted
TPC: <loop+0x14/0x28>
g0: 0020 g1: dffd57408000 g2: 0002a8ba2e81 g3:
g4: f800fd52d960 g5: f800020bc000 g6: f800f2bcc000 g7:
o0: f8009d13d254 o1: f80071755254 o2: 0dac o3:
o4: 0018d1a6 o5: 00225c52 sp: f800f2bcf3b1 ret_pc: 005e6d54
RPC: <raid1d+0x35c/0x1020>
l0: f80077d36ce0 l1: 0050 l2: 0006 l3: 0001
l4: f800fde2c8a0 l5: f800f4372ea0 l6: 0007 l7:
i0: f800fb70c400 i1: f800fde2c8c8 i2: f80091038660 i3: f800
i4: 0010 i5: 007a2f00 i6: f800f2bcf4f1 i7: 005f2f50
I7: <md_thread+0x38/0x140>
BUG: soft lockup - CPU#6 stuck for 11s! [md7_raid1:5818]
TSTATE: 004480001607 TPC: 006803a0 TNPC: 006803a4 Y: Not tainted
TPC: <_spin_unlock_irqrestore+0x28/0x40>
g0: f800fed95000 g1: g2: c0002000 g3: d0002000
g4: f800fd52d960 g5: f800020bc000 g6: f800f2bcc000 g7: f800ffcb
o0: f800fee16000 o1: o2: o3: f800fee16000
o4: o5: 00784000 sp: f800f2bceda1 ret_pc: 005a4fb8
RPC: <tg3_poll+0x820/0xc40>
l0: 042a l1: 0001 l2: f800f79aba00 l3: 01d0
l4: f800fed95700 l5: f800f1091ec0 l6: 01d0 l7: 0001
i0: 01df i1: 0029 i2: 01df i3: 0029
i4: f800fed95794 i5: 94479812 i6: f800f2bcee81 i7: 00609780
I7: <net_rx_action+0x88/0x160>
BUG: soft lockup - CPU#6 stuck for 11s! [md7_raid1:5818]
TSTATE: 009980001602 TPC: 10170100 TNPC: 10170104 Y: Not tainted
TPC: <ipv4_get_l4proto+0x8/0xa0> [nf_conntrack_ipv4]
g0: 1002bb58 g1: 006c g2: f800eba32b0c g3: 10170100
g4: f800fd52d960 g5: f800020bc000 g6: f800f2bcc000 g7: 0003
o0: f800d69aae00 o1: o2: f800f2bced24 o3: f800f2bced2f
o4: f800fed95000 o5: f800f2bceec8 sp: f800f2bce411 ret_pc: 10019d7c
RPC: <nf_conntrack_in+0xa4/0x580> [nf_conntrack]
l0: 0002 l1: 10175590 l2: 8000 l3: 0002
l4: l5: 0cbcc8bb l6: 0002 l7: f80062b8f820
i0: 0002 i1: 0003 i2: f800f2bcf080 i3: f800fed95000
i4: 00630260 i5: 00630260 i6: f800f2bce541 i7: 0062517c
I7: <nf_iterate+0x84/0xe0>
BUG: soft lockup - CPU#6 stuck for 11s! [md7_raid1:5818]
TSTATE: 004480001605 TPC: 10161030 TNPC: 10161034 Y: Not tainted
TPC: <ipt_do_table+0xd8/0x5a0> [ip_tables]
g0: 0001 g1: g2: c0a80001 g3:
g4: f800fd52d960 g5: f800020bc000 g6: f800f2bcc000 g7: 0be0
o0: 10180b74 o1: f800f2bcf480 o2: o3: f800fed95000
o4: o5: f8005ef72be0 sp: f800f2bce821 ret_pc: 10160fac
RPC: <ipt_do_table+0x54/0x5a0> [ip_tables]
l0:
Re: HELP! New disks being dropped from RAID 6 array on every reboot
Joshua Johnson wrote:

Greetings, long time listener, first time caller. I recently replaced a disk in my existing 8-disk RAID 6 array. Previously, all disks were PATA drives connected to the motherboard IDE and three Promise Ultra 100/133 controllers. I replaced one of the Promise controllers with a VIA 64xx based controller, which has 2 SATA ports and one PATA port. I connected a new SATA drive to the new card, partitioned the drive and added it to the array. After 5 or 6 hours the resyncing process finished and the array showed up complete. Upon rebooting I discovered that the new drive had not been added to the array when it was assembled on boot. I resynced it and tried again -- still would not persist after a reboot. I moved one of the existing PATA drives to the new controller (so I could have the slot for network), rebooted and rebuilt the array. Now when I reboot BOTH disks are missing from the array (sda and sdb). Upon examining the disks it appears they think they are part of the array, but for some reason they are not being added when the array is being assembled. For example, this is a disk on the new controller which was not added to the array after rebooting:

What is your partition table type? When I tried to create a raid6 array over a SunOS partition type, I saw this bug; never on a PC partition table.

Regards,

JKB
Re: 2.6.23.1: mdadm/raid5 hung/d-state
BERTRAND Joël wrote:
Chuck Ebbert wrote:
On 11/05/2007 03:36 AM, BERTRAND Joël wrote:
Neil Brown wrote:
On Sunday November 4, [EMAIL PROTECTED] wrote:

# ps auxww | grep D
USER  PID  %CPU %MEM  VSZ  RSS  TTY  STAT  START  TIME  COMMAND
root  273  0.0  0.0     0    0  ?    D     Oct21  14:40 [pdflush]
root  274  0.0  0.0     0    0  ?    D     Oct21  13:00 [pdflush]

After several days/weeks, this is the second time this has happened: while doing regular file I/O (decompressing a file), everything on the device went into D-state.

At a guess (I haven't looked closely) I'd say it is the bug that was meant to be fixed by commit 4ae3f847e49e3787eca91bced31f8fd328d50496, except that patch applied badly and needed to be fixed with the following patch (not in git yet). These have been sent to stable@ and should be in the queue for 2.6.23.2.

My linux-2.6.23/drivers/md/raid5.c has contained your patch for a long time:

	...
	spin_lock(&sh->lock);
	clear_bit(STRIPE_HANDLE, &sh->state);
	clear_bit(STRIPE_DELAYED, &sh->state);

	s.syncing = test_bit(STRIPE_SYNCING, &sh->state);
	s.expanding = test_bit(STRIPE_EXPAND_SOURCE, &sh->state);
	s.expanded = test_bit(STRIPE_EXPAND_READY, &sh->state);
	/* Now to look around and see what can be done */

	/* clean-up completed biofill operations */
	if (test_bit(STRIPE_OP_BIOFILL, &sh->ops.complete)) {
		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.complete);
	}

	rcu_read_lock();
	for (i=disks; i--; ) {
		mdk_rdev_t *rdev;
		struct r5dev *dev = &sh->dev[i];
	...

but it doesn't fix this bug.

Did that chunk starting with "clean-up completed biofill operations" end up where it belongs? The patch with the big context moves it to a different place from where the original one puts it when applied to 2.6.23... Lately I've seen several problems where the context isn't enough to make a patch apply properly when some offsets have changed. In some cases a patch won't apply at all because two nearly-identical areas are being changed and the first chunk gets applied where the second one should, leaving nowhere for the second chunk to apply.

I always apply this kind of patch by hand, not with the patch command. The last patch sent here seems to fix this bug:

gershwin:[/usr/scripts] cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md7 : active raid1 sdi1[2] md_d0p1[0]
      1464725632 blocks [2/1] [U_]
      [=...] recovery = 27.1% (396992504/1464725632) finish=1040.3min speed=17104K/sec

Resync done. The patch fixes this bug.

Regards,

JKB
Re: 2.6.23.1: mdadm/raid5 hung/d-state
Dan Williams wrote:
On Tue, 2007-11-06 at 03:19 -0700, BERTRAND Joël wrote:

Done. Here is the obtained output:

Much appreciated.

[ 1260.969314] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0
[ 1260.980606] check 5: state 0x6 toread read write f800ffcffcc0 written
[ 1260.994808] check 4: state 0x6 toread read write f800fdd4e360 written
[ 1261.009325] check 3: state 0x1 toread read write written
[ 1261.244478] check 2: state 0x1 toread read write written
[ 1261.270821] check 1: state 0x6 toread read write f800ff517e40 written
[ 1261.312320] check 0: state 0x6 toread read write f800fd4cae60 written
[ 1261.361030] locked=4 uptodate=2 to_read=0 to_write=4 failed=0 failed_num=0
[ 1261.443120] for sector 7629696, rmw=0 rcw=0
[..]

This looks as if the blocks were prepared to be written out, but were never handled in ops_run_biodrain(), so they remain locked forever. The operations flags are all clear, which means handle_stripe thinks nothing else needs to be done. The following patch, also attached, cleans up cases where the code looks at sh->ops.pending when it should be looking at the consistent stack-based snapshot of the operations flags.

Thanks for this patch. I have been testing it for three hours, rebuilding a 1.5 TB raid1 array over iSCSI without any trouble.

gershwin:[/usr/scripts] cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md7 : active raid1 sdi1[2] md_d0p1[0]
      1464725632 blocks [2/1] [U_]
      [=...] recovery = 6.7% (99484736/1464725632) finish=1450.9min speed=15679K/sec

Without your patch, I never reached 1%... I hope this fixes the bug; I shall come back when my raid1 volume is resynchronized.

Regards,

JKB
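[Editor's note: a minimal sketch of the "consistent snapshot" idiom Dan's patch enforces, for readers following along. The names (struct stripe, OP_BIODRAIN, OP_BIOFILL) are illustrative, not the actual raid5.c identifiers; the point is only that every decision is made from a snapshot taken once under the lock, never from fields other CPUs keep mutating.]

#include <pthread.h>

struct stripe {
	pthread_mutex_t lock;
	unsigned long pending;   /* operations requested but not finished */
};

enum { OP_BIODRAIN = 1UL << 0, OP_BIOFILL = 1UL << 1 };

static void handle_stripe(struct stripe *sh)
{
	unsigned long snapshot;

	pthread_mutex_lock(&sh->lock);
	snapshot = sh->pending;          /* one consistent view... */
	pthread_mutex_unlock(&sh->lock);

	/* ...used for every subsequent test. Re-reading sh->pending here
	 * could observe a state where a completion path already cleared a
	 * bit, so a write prepared earlier would never be drained. */
	if (snapshot & OP_BIODRAIN) {
		/* issue the queued writes */
	}
}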
Re: 2.6.23.1: mdadm/raid5 hung/d-state
Chuck Ebbert wrote:
On 11/05/2007 03:36 AM, BERTRAND Joël wrote:
Neil Brown wrote:
On Sunday November 4, [EMAIL PROTECTED] wrote:

# ps auxww | grep D
USER  PID  %CPU %MEM  VSZ  RSS  TTY  STAT  START  TIME  COMMAND
root  273  0.0  0.0     0    0  ?    D     Oct21  14:40 [pdflush]
root  274  0.0  0.0     0    0  ?    D     Oct21  13:00 [pdflush]

After several days/weeks, this is the second time this has happened: while doing regular file I/O (decompressing a file), everything on the device went into D-state.

At a guess (I haven't looked closely) I'd say it is the bug that was meant to be fixed by commit 4ae3f847e49e3787eca91bced31f8fd328d50496, except that patch applied badly and needed to be fixed with the following patch (not in git yet). These have been sent to stable@ and should be in the queue for 2.6.23.2.

My linux-2.6.23/drivers/md/raid5.c has contained your patch for a long time:

	...
	spin_lock(&sh->lock);
	clear_bit(STRIPE_HANDLE, &sh->state);
	clear_bit(STRIPE_DELAYED, &sh->state);

	s.syncing = test_bit(STRIPE_SYNCING, &sh->state);
	s.expanding = test_bit(STRIPE_EXPAND_SOURCE, &sh->state);
	s.expanded = test_bit(STRIPE_EXPAND_READY, &sh->state);
	/* Now to look around and see what can be done */

	/* clean-up completed biofill operations */
	if (test_bit(STRIPE_OP_BIOFILL, &sh->ops.complete)) {
		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.complete);
	}

	rcu_read_lock();
	for (i=disks; i--; ) {
		mdk_rdev_t *rdev;
		struct r5dev *dev = &sh->dev[i];
	...

but it doesn't fix this bug.

Did that chunk starting with "clean-up completed biofill operations" end up where it belongs? The patch with the big context moves it to a different place from where the original one puts it when applied to 2.6.23... Lately I've seen several problems where the context isn't enough to make a patch apply properly when some offsets have changed. In some cases a patch won't apply at all because two nearly-identical areas are being changed and the first chunk gets applied where the second one should, leaving nowhere for the second chunk to apply.

I always apply this kind of patch by hand, not with the patch command. The last patch sent here seems to fix this bug:

gershwin:[/usr/scripts] cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md7 : active raid1 sdi1[2] md_d0p1[0]
      1464725632 blocks [2/1] [U_]
      [=...] recovery = 27.1% (396992504/1464725632) finish=1040.3min speed=17104K/sec

Regards,

JKB
Re: 2.6.23.1: mdadm/raid5 hung/d-state
Done. Here is the obtained output:

[ 1260.967796] for sector 7629696, rmw=0 rcw=0
[ 1260.969314] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0
[ 1260.980606] check 5: state 0x6 toread read write f800ffcffcc0 written
[ 1260.994808] check 4: state 0x6 toread read write f800fdd4e360 written
[ 1261.009325] check 3: state 0x1 toread read write written
[ 1261.244478] check 2: state 0x1 toread read write written
[ 1261.270821] check 1: state 0x6 toread read write f800ff517e40 written
[ 1261.312320] check 0: state 0x6 toread read write f800fd4cae60 written
[ 1261.361030] locked=4 uptodate=2 to_read=0 to_write=4 failed=0 failed_num=0
[ 1261.443120] for sector 7629696, rmw=0 rcw=0
[ 1261.453348] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0
[ 1261.491538] check 5: state 0x6 toread read write f800ffcffcc0 written
[ 1261.529120] check 4: state 0x6 toread read write f800fdd4e360 written
[ 1261.560151] check 3: state 0x1 toread read write written
[ 1261.599180] check 2: state 0x1 toread read write written
[ 1261.637138] check 1: state 0x6 toread read write f800ff517e40 written
[ 1261.674502] check 0: state 0x6 toread read write f800fd4cae60 written
[ 1261.712589] locked=4 uptodate=2 to_read=0 to_write=4 failed=0 failed_num=0
[ 1261.864338] for sector 7629696, rmw=0 rcw=0
[ 1261.873475] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0
[ 1261.907840] check 5: state 0x6 toread read write f800ffcffcc0 written
[ 1261.950770] check 4: state 0x6 toread read write f800fdd4e360 written
[ 1261.989003] check 3: state 0x1 toread read write written
[ 1262.019621] check 2: state 0x1 toread read write written
[ 1262.068705] check 1: state 0x6 toread read write f800ff517e40 written
[ 1262.113265] check 0: state 0x6 toread read write f800fd4cae60 written
[ 1262.150511] locked=4 uptodate=2 to_read=0 to_write=4 failed=0 failed_num=0
[ 1262.171143] for sector 7629696, rmw=0 rcw=0
[ 1262.179142] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0
[ 1262.201905] check 5: state 0x6 toread read write f800ffcffcc0 written
[ 1262.252750] check 4: state 0x6 toread read write f800fdd4e360 written
[ 1262.289631] check 3: state 0x1 toread read write written
[ 1262.344709] check 2: state 0x1 toread read write written
[ 1262.400411] check 1: state 0x6 toread read write f800ff517e40 written
[ 1262.437353] check 0: state 0x6 toread read write f800fd4cae60 written
[ 1262.492561] locked=4 uptodate=2 to_read=0 to_write=4 failed=0 failed_num=0
[ 1262.524993] for sector 7629696, rmw=0 rcw=0
[ 1262.533314] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0
[ 1262.561900] check 5: state 0x6 toread read write f800ffcffcc0 written
[ 1262.588986] check 4: state 0x6 toread read write f800fdd4e360 written
[ 1262.619455] check 3: state 0x1 toread read write written
[ 1262.671006] check 2: state 0x1 toread read write written
[ 1262.709065] check 1: state 0x6 toread read write f800ff517e40 written
[ 1262.746904] check 0: state 0x6 toread read write f800fd4cae60 written
[ 1262.780203] locked=4 uptodate=2 to_read=0 to_write=4 failed=0 failed_num=0
[ 1262.805941] for sector 7629696, rmw=0 rcw=0
[ 1262.815759]
Re: 2.6.23.1: mdadm/raid5 hung/d-state
Justin Piszcz wrote:
On Tue, 6 Nov 2007, BERTRAND Joël wrote:

Done. Here is the obtained output:

[ 1265.899068] check 4: state 0x6 toread read write f800fdd4e360 written
[ 1265.941328] check 3: state 0x1 toread read write written
[ 1265.972129] check 2: state 0x1 toread read write written

For information, after the crash, I have:

Root poulenc:[/sys/block] cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md_d0 : active raid5 sdc1[0] sdh1[5] sdg1[4] sdf1[3] sde1[2] sdd1[1]
      1464725760 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]

Regards,

JKB

After the crash it is not 'resyncing'?

No, it isn't...

JKB
Re: 2.6.23.1: mdadm/raid5 hung/d-state
Justin Piszcz wrote:
On Tue, 6 Nov 2007, BERTRAND Joël wrote:
Justin Piszcz wrote:
On Tue, 6 Nov 2007, BERTRAND Joël wrote:

Done. Here is the obtained output:

[ 1265.899068] check 4: state 0x6 toread read write f800fdd4e360 written
[ 1265.941328] check 3: state 0x1 toread read write written
[ 1265.972129] check 2: state 0x1 toread read write written

For information, after the crash, I have:

Root poulenc:[/sys/block] cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md_d0 : active raid5 sdc1[0] sdh1[5] sdg1[4] sdf1[3] sde1[2] sdd1[1]
      1464725760 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]

Regards,

JKB

After the crash it is not 'resyncing'?

No, it isn't...

JKB

After any crash/unclean shutdown the RAID should resync; if it doesn't, that's not good, and I'd suggest running a raid check. The 'repair' is supposed to clean it; in some cases (md0=swap) it gets dirty again.

Tue May 8 09:19:54 EDT 2007: Executing RAID health check for /dev/md0...
Tue May 8 09:19:55 EDT 2007: Executing RAID health check for /dev/md1...
Tue May 8 09:19:56 EDT 2007: Executing RAID health check for /dev/md2...
Tue May 8 09:19:57 EDT 2007: Executing RAID health check for /dev/md3...
Tue May 8 10:09:58 EDT 2007: cat /sys/block/md0/md/mismatch_cnt
Tue May 8 10:09:58 EDT 2007: 2176
Tue May 8 10:09:58 EDT 2007: cat /sys/block/md1/md/mismatch_cnt
Tue May 8 10:09:58 EDT 2007: 0
Tue May 8 10:09:58 EDT 2007: cat /sys/block/md2/md/mismatch_cnt
Tue May 8 10:09:58 EDT 2007: 0
Tue May 8 10:09:58 EDT 2007: cat /sys/block/md3/md/mismatch_cnt
Tue May 8 10:09:58 EDT 2007: 0
Tue May 8 10:09:58 EDT 2007: The meta-device /dev/md0 has 2176 mismatched sectors.
Tue May 8 10:09:58 EDT 2007: Executing repair on /dev/md0
Tue May 8 10:09:59 EDT 2007: The meta-device /dev/md1 has no mismatched sectors.
Tue May 8 10:10:00 EDT 2007: The meta-device /dev/md2 has no mismatched sectors.
Tue May 8 10:10:01 EDT 2007: The meta-device /dev/md3 has no mismatched sectors.
Tue May 8 10:20:02 EDT 2007: All devices are clean...
Tue May 8 10:20:02 EDT 2007: cat /sys/block/md0/md/mismatch_cnt
Tue May 8 10:20:02 EDT 2007: 2176
Tue May 8 10:20:02 EDT 2007: cat /sys/block/md1/md/mismatch_cnt
Tue May 8 10:20:02 EDT 2007: 0
Tue May 8 10:20:02 EDT 2007: cat /sys/block/md2/md/mismatch_cnt
Tue May 8 10:20:02 EDT 2007: 0
Tue May 8 10:20:02 EDT 2007: cat /sys/block/md3/md/mismatch_cnt
Tue May 8 10:20:02 EDT 2007: 0

I cannot repair this raid volume. I cannot reboot the server without sending Stop+A. init 6 stops at INIT:. After reboot, md0 is resynchronized.

Regards,

JKB
Re: 2.6.23.1: mdadm/raid5 hung/d-state
Neil Brown wrote:
On Sunday November 4, [EMAIL PROTECTED] wrote:

# ps auxww | grep D
USER  PID  %CPU %MEM  VSZ  RSS  TTY  STAT  START  TIME  COMMAND
root  273  0.0  0.0     0    0  ?    D     Oct21  14:40 [pdflush]
root  274  0.0  0.0     0    0  ?    D     Oct21  13:00 [pdflush]

After several days/weeks, this is the second time this has happened: while doing regular file I/O (decompressing a file), everything on the device went into D-state.

At a guess (I haven't looked closely) I'd say it is the bug that was meant to be fixed by commit 4ae3f847e49e3787eca91bced31f8fd328d50496, except that patch applied badly and needed to be fixed with the following patch (not in git yet). These have been sent to stable@ and should be in the queue for 2.6.23.2.

My linux-2.6.23/drivers/md/raid5.c has contained your patch for a long time:

	...
	spin_lock(&sh->lock);
	clear_bit(STRIPE_HANDLE, &sh->state);
	clear_bit(STRIPE_DELAYED, &sh->state);

	s.syncing = test_bit(STRIPE_SYNCING, &sh->state);
	s.expanding = test_bit(STRIPE_EXPAND_SOURCE, &sh->state);
	s.expanded = test_bit(STRIPE_EXPAND_READY, &sh->state);
	/* Now to look around and see what can be done */

	/* clean-up completed biofill operations */
	if (test_bit(STRIPE_OP_BIOFILL, &sh->ops.complete)) {
		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.complete);
	}

	rcu_read_lock();
	for (i=disks; i--; ) {
		mdk_rdev_t *rdev;
		struct r5dev *dev = &sh->dev[i];
	...

but it doesn't fix this bug.

Regards,

JKB
Re: 2.6.23.1: mdadm/raid5 hung/d-state
Justin Piszcz wrote:

# ps auxww | grep D
USER  PID  %CPU %MEM  VSZ  RSS  TTY  STAT  START  TIME  COMMAND
root  273  0.0  0.0     0    0  ?    D     Oct21  14:40 [pdflush]
root  274  0.0  0.0     0    0  ?    D     Oct21  13:00 [pdflush]

After several days/weeks, this is the second time this has happened: while doing regular file I/O (decompressing a file), everything on the device went into D-state.

Same observation here (kernel 2.6.23). I can see this bug when I try to synchronize a raid1 volume over iSCSI (each element is a raid5 volume), or sometimes only with a 1.5 TB raid5 volume. When this bug occurs, the md subsystem eats 100% of one CPU and pdflush remains in D state too. What is your architecture? I use two 32-thread T1000s (sparc64), and I'm trying to determine if this bug is arch specific.

Regards,

JKB
Re: Strange CPU occupation... and system hangs
BERTRAND Joël wrote:
<snip>

and some processes are in D state:

Root gershwin:[/etc] ps auwx | grep D
USER  PID    %CPU %MEM  VSZ    RSS   TTY    STAT  START  TIME   COMMAND
root  270    0.0  0.0   0      0     ?      D     Oct27  1:17   [pdflush]
root  3676   0.9  0.0   0      0     ?      D     Oct27  56:03  [nfsd]
root  5435   0.0  0.0   0      0     ?      D     Oct27  3:16   [md7_raid1]
root  5438   0.0  0.0   0      0     ?      D     Oct27  1:01   [kjournald]
root  5440   0.0  0.0   0      0     ?      D     Oct27  0:33   [loop0]
root  5441   0.0  0.0   0      0     ?      D     Oct27  0:05   [kjournald]
root  16442  0.0  0.0   20032  1208  pts/2  D+    13:23  0:00   iftop -i eth2

Why is md7_raid1 in D state? Same question about iftop?

Some bad news... After ten or eleven hours, the kernel crashed on this server. The last top screen is:

top - 04:59:46 up 4 days, 16:24, 3 users, load average: 19.72, 19.22, 19.05
Tasks: 285 total, 5 running, 279 sleeping, 0 stopped, 1 zombie
Cpu(s): 0.0%us, 4.2%sy, 0.0%ni, 68.5%id, 27.3%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4139024k total, 4130800k used, 8224k free, 38984k buffers
Swap: 7815536k total, 304k used, 7815232k free, 79056k cached

PID    USER  PR  NI  VIRT  RES   SHR   S  %CPU  %MEM  TIME+      COMMAND
5426   root  15  -5  0     0     0     R  100   0.0   970:17.21  md_d0_raid5
26923  root  20   0  3120  1568  1112  R  2     0.0   13:32.24   top
...

I have rebooted. I don't have any message in the log files; I don't have any screen, but I haven't seen anything on the serial console either. In kern.log, I have:

Oct 31 15:36:15 gershwin kernel: swapper: page allocation failure. order:2, mode:0x4020
Oct 31 15:36:15 gershwin kernel: Call Trace:
Oct 31 15:36:15 gershwin kernel: [004b6568] __slab_alloc+0x1b0/0x720
Oct 31 15:36:15 gershwin kernel: [004b87a8] __kmalloc_track_caller+0xb0/0xe0
Oct 31 15:36:15 gershwin kernel: [00601d68] __alloc_skb+0x50/0x120
Oct 31 15:36:15 gershwin kernel: [00642ee0] tcp_collapse+0x1e8/0x440
Oct 31 15:36:15 gershwin kernel: [00643298] tcp_prune_queue+0x160/0x3a0
Oct 31 15:36:15 gershwin kernel: [00643d08] tcp_data_queue+0x830/0xde0
Oct 31 15:36:15 gershwin kernel: [00645d74] tcp_rcv_established+0x35c/0x840
Oct 31 15:36:15 gershwin kernel: [0064cf7c] tcp_v4_do_rcv+0xe4/0x4a0
Oct 31 15:36:15 gershwin kernel: [0064fdd8] tcp_v4_rcv+0xb00/0xb20
Oct 31 15:36:15 gershwin kernel: [0062e2ac] ip_local_deliver+0x194/0x3a0
Oct 31 15:36:15 gershwin kernel: [0062dd98] ip_rcv+0x360/0x6e0
Oct 31 15:36:15 gershwin kernel: [00607f64] netif_receive_skb+0x1ec/0x480
Oct 31 15:36:15 gershwin kernel: [005a5fe0] tg3_poll+0x6c8/0xc40
Oct 31 15:36:15 gershwin kernel: [0060a940] net_rx_action+0x88/0x160
Oct 31 15:36:15 gershwin kernel: [00468078] __do_softirq+0x80/0x100
Oct 31 15:36:15 gershwin kernel: [0046815c] do_softirq+0x64/0x80
Oct 31 15:36:15 gershwin kernel: Mem-info:
Oct 31 15:36:15 gershwin kernel: Normal per-cpu:
Oct 31 15:36:15 gershwin kernel: CPU  0: Hot: hi: 90, btch: 15 usd: 15  Cold: hi: 30, btch: 7 usd: 5
Oct 31 15:36:15 gershwin kernel: CPU  1: Hot: hi: 90, btch: 15 usd: 31  Cold: hi: 30, btch: 7 usd: 4
Oct 31 15:36:15 gershwin kernel: CPU  2: Hot: hi: 90, btch: 15 usd: 4   Cold: hi: 30, btch: 7 usd: 3
Oct 31 15:36:15 gershwin kernel: CPU  3: Hot: hi: 90, btch: 15 usd: 82  Cold: hi: 30, btch: 7 usd: 2
Oct 31 15:36:15 gershwin kernel: CPU  4: Hot: hi: 90, btch: 15 usd: 84  Cold: hi: 30, btch: 7 usd: 0
Oct 31 15:36:15 gershwin kernel: CPU  5: Hot: hi: 90, btch: 15 usd: 65  Cold: hi: 30, btch: 7 usd: 4
Oct 31 15:36:15 gershwin kernel: CPU  6: Hot: hi: 90, btch: 15 usd: 85  Cold: hi: 30, btch: 7 usd: 6
Oct 31 15:36:15 gershwin kernel: CPU  7: Hot: hi: 90, btch: 15 usd: 69  Cold: hi: 30, btch: 7 usd: 4
Oct 31 15:36:15 gershwin kernel: CPU  8: Hot: hi: 90, btch: 15 usd: 11  Cold: hi: 30, btch: 7 usd: 5
Oct 31 15:36:15 gershwin kernel: CPU  9: Hot: hi: 90, btch: 15 usd: 75  Cold: hi: 30, btch: 7 usd: 1
Oct 31 15:36:15 gershwin kernel: CPU 10: Hot: hi: 90, btch: 15 usd: 84  Cold: hi: 30, btch: 7 usd: 2
Oct 31 15:36:15 gershwin kernel: CPU 11: Hot: hi: 90, btch: 15 usd: 13  Cold: hi: 30, btch: 7 usd: 1
Oct 31 15:36:15 gershwin kernel: CPU 12: Hot: hi: 90, btch: 15 usd: 17  Cold: hi: 30, btch: 7 usd: 23
Oct 31 15:36:15 gershwin kernel: CPU 13: Hot: hi: 90, btch: 15 usd: 7   Cold: hi: 30, btch: 7 usd: 25
Oct 31 15:36:15 gershwin kernel: CPU 14: Hot: hi: 90, btch: 15 usd: 64  Cold: hi: 30, btch: 7 usd: 27
Oct 31 15:36:15 gershwin kernel: CPU 15: Hot: hi: 90, btch: 15 usd: 12  Cold: hi: 30, btch: 7 usd: 6
Oct 31 15:36:15 gershwin kernel: CPU 16
Strange CPU occupation...
Hello,

I'm looking for a bug in the iSCSI target code, but this morning I found a new bug that is certainly related to mine... Please consider these raid volumes:

Root gershwin:[/etc] cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md7 : active raid1 sdi1[2](F) md_d0p1[0]
      1464725632 blocks [2/1] [U_]
md_d0 : active raid5 sdc1[0] sdh1[5] sdg1[4] sdf1[3] sde1[2] sdd1[1]
      1464725760 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
md6 : active raid1 sda1[0] sdb1[1]
      7815552 blocks [2/2] [UU]
md5 : active raid1 sda8[0] sdb8[1]
      14538752 blocks [2/2] [UU]
md4 : active raid1 sda7[0] sdb7[1]
      4883648 blocks [2/2] [UU]
md3 : active raid1 sda6[0] sdb6[1]
      9767424 blocks [2/2] [UU]
md2 : active raid1 sda5[0] sdb5[1]
      29294400 blocks [2/2] [UU]
md1 : active raid1 sda2[0] sdb2[1]
      489856 blocks [2/2] [UU]
md0 : active raid1 sdb4[1] sda4[0]
      4883648 blocks [2/2] [UU]
unused devices: <none>
Root gershwin:[/etc]

md7 only has one disk because I cannot synchronize it over iSCSI. Without any message, the load average of this server (a 24-thread T1000) increases to more than 9. top returns:

top - 13:36:08 up 4 days, 1:00, 3 users, load average: 9.23, 8.46, 6.26
Tasks: 252 total, 5 running, 246 sleeping, 0 stopped, 1 zombie
Cpu(s): 0.0%us, 4.2%sy, 0.0%ni, 87.4%id, 8.4%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4139024k total, 4115920k used, 23104k free, 743976k buffers
Swap: 7815536k total, 304k used, 7815232k free, 2188048k cached

PID    USER  PR  NI  VIRT   RES   SHR   S  %CPU  %MEM  TIME+     COMMAND
5426   root  15  -5  0      0     0     R  100   0.0   46:32.54  md_d0_raid5
17215  root  20   0  3120   1552  1112  R  1     0.0   0:01.38   top
1      root  20   0  2576   960   816   S  0     0.0   0:09.74   init
2      root  15  -5  0      0     0     S  0     0.0   0:00.00   kthreadd
3      root  RT  -5  0      0     0     S  0     0.0   0:00.18   migration/0
4      root  15  -5  0      0     0     S  0     0.0   0:00.18   ksoftirqd/0

and some processes are in D state:

Root gershwin:[/etc] ps auwx | grep D
USER  PID    %CPU %MEM  VSZ    RSS   TTY    STAT  START  TIME   COMMAND
root  270    0.0  0.0   0      0     ?      D     Oct27  1:17   [pdflush]
root  3676   0.9  0.0   0      0     ?      D     Oct27  56:03  [nfsd]
root  5435   0.0  0.0   0      0     ?      D     Oct27  3:16   [md7_raid1]
root  5438   0.0  0.0   0      0     ?      D     Oct27  1:01   [kjournald]
root  5440   0.0  0.0   0      0     ?      D     Oct27  0:33   [loop0]
root  5441   0.0  0.0   0      0     ?      D     Oct27  0:05   [kjournald]
root  16442  0.0  0.0   20032  1208  pts/2  D+    13:23  0:00   iftop -i eth2

Why is md7_raid1 in D state? Same question about iftop?

Regards,

JKB
Re: [BUG] Raid1/5 over iSCSI trouble
Ming Zhang wrote:

off topic, but could you resubmit the alignment issue patch to the list and see if tomof accepts it? He needs the patch inlined in the email. It was found and fixed by you, so you had better post it (instead of me). thx.

diff -u kernel.old/iscsi.c kernel/iscsi.c
--- kernel.old/iscsi.c	2007-10-29 09:49:16.0 +0100
+++ kernel/iscsi.c	2007-10-17 11:19:14.0 +0200
@@ -726,13 +726,26 @@
 	case READ_10:
 	case WRITE_10:
 	case WRITE_VERIFY:
-		*off = be32_to_cpu(*(u32 *)&cmd[2]);
+		*off = (((u32) cmd[2]) << 24) |
+		       (((u32) cmd[3]) << 16) |
+		       (((u32) cmd[4]) << 8) |
+		       cmd[5];
 		*len = (cmd[7] << 8) + cmd[8];
 		break;
 	case READ_16:
 	case WRITE_16:
-		*off = be64_to_cpu(*(u64 *)&cmd[2]);
-		*len = be32_to_cpu(*(u32 *)&cmd[10]);
+		*off = (((u64) cmd[2]) << 56) |
+		       (((u64) cmd[3]) << 48) |
+		       (((u64) cmd[4]) << 40) |
+		       (((u64) cmd[5]) << 32) |
+		       (((u64) cmd[6]) << 24) |
+		       (((u64) cmd[7]) << 16) |
+		       (((u64) cmd[8]) << 8) |
+		       cmd[9];
+		*len = (((u32) cmd[10]) << 24) |
+		       (((u32) cmd[11]) << 16) |
+		       (((u32) cmd[12]) << 8) |
+		       cmd[13];
 		break;
 	default:
 		BUG();
diff -u kernel.old/target_disk.c kernel/target_disk.c
--- kernel.old/target_disk.c	2007-10-29 09:49:16.0 +0100
+++ kernel/target_disk.c	2007-10-17 16:04:06.0 +0200
@@ -66,13 +66,15 @@
 	unsigned char geo_m_pg[] = {0x04, 0x16, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00,
 				    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
 				    0x00, 0x00, 0x00, 0x00, 0x3a, 0x98, 0x00, 0x00};
-	u32 ncyl, *p;
+	u32 ncyl;
+	u32 n;
 
 	/* assume 0xff heads, 15krpm. */
 	memcpy(ptr, geo_m_pg, sizeof(geo_m_pg));
 	ncyl = sec >> 14; /* 256 * 64 */
-	p = (u32 *)(ptr + 1);
-	*p = *p | cpu_to_be32(ncyl);
+	memcpy(&n, ptr + 1, sizeof(u32));
+	n = n | cpu_to_be32(ncyl);
+	memcpy(ptr + 1, &n, sizeof(u32));
 
 	return sizeof(geo_m_pg);
 }
@@ -249,7 +251,10 @@
 	struct iet_volume *lun;
 	int rest, idx = 0;
 
-	size = be32_to_cpu(*(u32 *)&req->scb[6]);
+	size = (((u32) req->scb[6]) << 24) |
+	       (((u32) req->scb[7]) << 16) |
+	       (((u32) req->scb[8]) << 8) |
+	       req->scb[9];
 	if (size < 16)
 		return -1;

Regards,

JKB
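[Editor's note: for readers wondering why the casts above were a problem: a SCSI CDB is a plain byte array, so a *(u32 *)&cmd[2] load is misaligned and traps on sparc64. The sketch below shows the byte-wise big-endian load idea the patch switches to; load_be32, load_be64 and parse_read10 are illustrative helpers written for this note, not IET code. In-kernel code can also use the get_unaligned() helpers from asm/unaligned.h for the same purpose.]

#include <stdint.h>

/* Assemble a 32-bit big-endian field one byte at a time: every access is
 * a single-byte load, so alignment never matters. */
static inline uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static inline uint64_t load_be64(const uint8_t *p)
{
	return ((uint64_t)load_be32(p) << 32) | load_be32(p + 4);
}

/* READ_10 example: the LBA lives at CDB bytes 2..5 (offset 2 is not
 * 4-byte aligned), the transfer length at bytes 7..8. */
static void parse_read10(const uint8_t *cdb, uint64_t *lba, uint32_t *len)
{
	*lba = load_be32(&cdb[2]);
	*len = ((uint32_t)cdb[7] << 8) | cdb[8];
}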
Re: [BUG] Raid1/5 over iSCSI trouble
Dan Williams wrote:
On 10/24/07, BERTRAND Joël [EMAIL PROTECTED] wrote:

Hello,

Any news about this trouble? Any idea? I'm trying to fix it, but I don't see any specific interaction between raid5 and istd. Has anyone tried to reproduce this bug on another arch than sparc64? I only use sparc32 and sparc64 servers and I cannot test on other archs. Of course, I have a laptop, but I cannot create a raid5 array on its internal HD to test this configuration ;-)

Can you collect some oprofile data, as Ming suggested, so we can maybe see what md_d0_raid5 and istd1 are fighting about? Hopefully it is as painless to run on sparc as it is on IA:

opcontrol --start --vmlinux=/path/to/vmlinux
<wait>
opcontrol --stop
opreport --image-path=/lib/modules/`uname -r` -l

Done. Profiling through timer interrupt:

samples   %        image name         app name           symbol name
20028038  92.9510  vmlinux-2.6.23     vmlinux-2.6.23     cpu_idle
1198566    5.5626  vmlinux-2.6.23     vmlinux-2.6.23     schedule
41558      0.1929  vmlinux-2.6.23     vmlinux-2.6.23     yield
34791      0.1615  vmlinux-2.6.23     vmlinux-2.6.23     NGmemcpy
18417      0.0855  vmlinux-2.6.23     vmlinux-2.6.23     xor_niagara_5
17430      0.0809  raid456            raid456            (no symbols)
15837      0.0735  vmlinux-2.6.23     vmlinux-2.6.23     sys_sched_yield
14860      0.0690  iscsi_trgt.ko      iscsi_trgt         istd
12705      0.0590  nf_conntrack       nf_conntrack       (no symbols)
9236       0.0429  libc-2.6.1.so      libc-2.6.1.so      (no symbols)
9034       0.0419  vmlinux-2.6.23     vmlinux-2.6.23     xor_niagara_2
6534       0.0303  oprofiled          oprofiled          (no symbols)
6149       0.0285  vmlinux-2.6.23     vmlinux-2.6.23     scsi_request_fn
5947       0.0276  ip_tables          ip_tables          (no symbols)
4510       0.0209  vmlinux-2.6.23     vmlinux-2.6.23     dma_4v_map_single
3823       0.0177  vmlinux-2.6.23     vmlinux-2.6.23     __make_request
3326       0.0154  vmlinux-2.6.23     vmlinux-2.6.23     tg3_poll
3162       0.0147  iscsi_trgt.ko      iscsi_trgt         scsi_cmnd_exec
3091       0.0143  vmlinux-2.6.23     vmlinux-2.6.23     scsi_dispatch_cmd
2849       0.0132  vmlinux-2.6.23     vmlinux-2.6.23     tcp_v4_rcv
2811       0.0130  vmlinux-2.6.23     vmlinux-2.6.23     nf_iterate
2729       0.0127  vmlinux-2.6.23     vmlinux-2.6.23     _spin_lock_bh
2551       0.0118  vmlinux-2.6.23     vmlinux-2.6.23     kfree
2467       0.0114  vmlinux-2.6.23     vmlinux-2.6.23     kmem_cache_free
2314       0.0107  vmlinux-2.6.23     vmlinux-2.6.23     atomic_add
2065       0.0096  vmlinux-2.6.23     vmlinux-2.6.23     NGbzero_loop
1826       0.0085  vmlinux-2.6.23     vmlinux-2.6.23     ip_rcv
1823       0.0085  nf_conntrack_ipv4  nf_conntrack_ipv4  (no symbols)
1822       0.0085  vmlinux-2.6.23     vmlinux-2.6.23     clear_bit
1767       0.0082  python2.4          python2.4          (no symbols)
1734       0.0080  vmlinux-2.6.23     vmlinux-2.6.23     atomic_sub_ret
1694       0.0079  vmlinux-2.6.23     vmlinux-2.6.23     tcp_rcv_established
1673       0.0078  vmlinux-2.6.23     vmlinux-2.6.23     tcp_recvmsg
1670       0.0078  vmlinux-2.6.23     vmlinux-2.6.23     netif_receive_skb
1668       0.0077  vmlinux-2.6.23     vmlinux-2.6.23     set_bit
1545       0.0072  vmlinux-2.6.23     vmlinux-2.6.23     __kmalloc_track_caller
1526       0.0071  iptable_nat        iptable_nat        (no symbols)
1526       0.0071  vmlinux-2.6.23     vmlinux-2.6.23     kmem_cache_alloc
1373       0.0064  vmlinux-2.6.23     vmlinux-2.6.23     generic_unplug_device
...

Is it enough?

Regards,

JKB
Re: [BUG] Raid1/5 over iSCSI trouble
Dan Williams wrote:
On 10/27/07, BERTRAND Joël [EMAIL PROTECTED] wrote:
Dan Williams wrote:

Can you collect some oprofile data, as Ming suggested, so we can maybe see what md_d0_raid5 and istd1 are fighting about? Hopefully it is as painless to run on sparc as it is on IA:

opcontrol --start --vmlinux=/path/to/vmlinux
<wait>
opcontrol --stop
opreport --image-path=/lib/modules/`uname -r` -l

Done.
[..]
Is it enough?

I would expect md_d0_raid5 and istd1 to show up pretty high in the list if they are constantly pegged at 100% CPU utilization like you showed in the failure case. Maybe this was captured after the target had disconnected?

No, I launched opcontrol before starting the raid1 creation and stopped it after the disconnection. Don't forget that this server has 32 CPUs.

Regards,

JKB
Re: [BUG] Raid1/5 over iSCSI trouble
Hello,

Any news about this trouble? Any idea? I'm trying to fix it, but I don't see any specific interaction between raid5 and istd. Has anyone tried to reproduce this bug on another arch than sparc64? I only use sparc32 and sparc64 servers and I cannot test on other archs. Of course, I have a laptop, but I cannot create a raid5 array on its internal HD to test this configuration ;-) Please note that I won't read my mail until next Saturday morning (CEST).

After disconnection of the iSCSI target:

Tasks: 232 total, 7 running, 224 sleeping, 0 stopped, 1 zombie
Cpu(s): 0.0%us, 15.2%sy, 0.0%ni, 84.3%id, 0.0%wa, 0.1%hi, 0.3%si, 0.0%st
Mem: 4139032k total, 4127584k used, 11448k free, 95752k buffers
Swap: 7815536k total, 0k used, 7815536k free, 3758792k cached

PID   USER  PR  NI  VIRT   RES   SHR   S  %CPU  %MEM  TIME+     COMMAND
9738  root  15  -5  0      0     0     R  100   0.0   4:56.82   md_d0_raid5
9774  root  15  -5  0      0     0     R  100   0.0   5:52.41   istd1
9739  root  15  -5  0      0     0     R  14    0.0   0:28.90   md_d0_resync
9916  root  20   0  3248   1544  1120  R  2     0.0   0:00.56   top
4129  root  20   0  41648  5024  2432  S  0     0.1   2:56.17   fail2ban-server
1     root  20   0  2576   960   816   S  0     0.0   0:01.58   init
2     root  15  -5  0      0     0     S  0     0.0   0:00.00   kthreadd
3     root  RT  -5  0      0     0     S  0     0.0   0:00.00   migration/0
4     root  15  -5  0      0     0     S  0     0.0   0:00.02   ksoftirqd/0
5     root  RT  -5  0      0     0     S  0     0.0   0:00.00   migration/1
6     root  15  -5  0      0     0     S  0     0.0   0:00.00   ksoftirqd/1

Regards,

JKB
Re: [BUG] Raid1/5 over iSCSI trouble
Bill Davidsen wrote:
BERTRAND Joël wrote:

Sorry for this last mail. I have found another mistake, but I don't know if this bug comes from iscsi-target or raid5 itself. The iSCSI target is disconnected because the istd1 and md_d0_raid5 kernel threads each use 100% of a CPU!

Tasks: 235 total, 6 running, 227 sleeping, 0 stopped, 2 zombie
Cpu(s): 0.1%us, 12.5%sy, 0.0%ni, 87.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4139032k total, 218424k used, 3920608k free, 10136k buffers
Swap: 7815536k total, 0k used, 7815536k free, 64808k cached

PID   USER  PR  NI  VIRT  RES  SHR  S  %CPU  %MEM  TIME+     COMMAND
5824  root  15  -5  0     0    0    R  100   0.0   10:34.25  istd1
5599  root  15  -5  0     0    0    R  100   0.0   7:25.43   md_d0_raid5

Given that the summary shows 87.4% idle, something is not right. You might try another tool, like vmstat, to at least verify the way the CPU is being used. When you can't trust what your tools tell you it gets really hard to make decisions based on the data.

Don't forget this box is a 32-CPU server: two threads pegged at 100% of one CPU each still leave the machine-wide summary mostly idle.

JKB
Re: [Iscsitarget-devel] Abort Task ?
Ming Zhang wrote:

As Ross pointed out, many io patterns only have 1 outstanding io at any time, so there is only one work thread actively serving it; it cannot exploit the multiple cores here. Do you see 100% at nullio or fileio? With disk, most time should be spent in iowait and cpu utilization should not be high at all.

With both nullio and fileio...
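[Editor's note: to illustrate Ming's point about queue depth, here is a hedged user-space sketch that keeps several writes in flight at once with libaio, so more than one service thread on the target has work to do. The device path /dev/sdj, the queue depth and the block size are illustrative; build with gcc -o qd qd.c -laio.]

#define _GNU_SOURCE
#include <fcntl.h>
#include <libaio.h>
#include <stdlib.h>
#include <string.h>

#define QDEPTH 16
#define BSIZE  8192

int main(void)
{
	io_context_t ctx = 0;
	struct iocb iocbs[QDEPTH], *iops[QDEPTH];
	struct io_event events[QDEPTH];
	void *buf;
	int fd = open("/dev/sdj", O_WRONLY | O_DIRECT);

	if (fd < 0 || io_setup(QDEPTH, &ctx) < 0)
		return 1;
	if (posix_memalign(&buf, 4096, BSIZE))  /* O_DIRECT needs alignment */
		return 1;
	memset(buf, 0, BSIZE);

	for (int i = 0; i < QDEPTH; i++) {
		io_prep_pwrite(&iocbs[i], fd, buf, BSIZE, (long long)i * BSIZE);
		iops[i] = &iocbs[i];
	}
	io_submit(ctx, QDEPTH, iops);            /* QDEPTH writes in flight */
	io_getevents(ctx, QDEPTH, QDEPTH, events, NULL);
	io_destroy(ctx);
	return 0;
}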
Re: [Iscsitarget-devel] Abort Task ?
Ming Zhang wrote:
On Fri, 2007-10-19 at 09:48 +0200, BERTRAND Joël wrote:
Ross S. W. Walker wrote:
BERTRAND Joël wrote:
BERTRAND Joël wrote:

I can format (mkfs.ext3) a 1.5 TB volume over iSCSI several times without any trouble. I can read and write on this virtual disk without any trouble. Now, I have configured ietd with:

Lun 0 Sectors=1464725758,Type=nullio

and I run on the initiator side:

Root gershwin:[/dev] dd if=/dev/zero of=/dev/sdj bs=8192
479482+0 records in
479482+0 records out
3927916544 bytes (3.9 GB) copied, 153.222 seconds, 25.6 MB/s
Root gershwin:[/dev] dd if=/dev/zero of=/dev/sdj bs=8192

I'm waiting for a crash; none as I write these lines. I suspect an interaction between raid and iscsi. I simultaneously ran:

Root gershwin:[/dev] dd if=/dev/zero of=/dev/sdj bs=8192
8397210+0 records in
8397210+0 records out
68789944320 bytes (69 GB) copied, 2732.55 seconds, 25.2 MB/s

and

Root gershwin:[~] dd if=/dev/sdj of=/dev/null bs=8192
739200+0 records in
739199+0 records out
6055518208 bytes (6.1 GB) copied, 447.178 seconds, 13.5 MB/s

without any trouble.

The speed can definitely be improved. Look at your network setup and use ping to try and get the network latency to a minimum.

# ping -A -s 8192 172.16.24.140
--- 172.16.24.140 ping statistics ---
14058 packets transmitted, 14057 received, 0% packet loss, time 9988ms
rtt min/avg/max/mdev = 0.234/0.268/2.084/0.041 ms, ipg/ewma 0.710/0.260 ms

gershwin:[~] ping -A -s 8192 192.168.0.2
PING 192.168.0.2 (192.168.0.2) 8192(8220) bytes of data.
8200 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=0.693 ms
8200 bytes from 192.168.0.2: icmp_seq=2 ttl=64 time=0.595 ms
8200 bytes from 192.168.0.2: icmp_seq=3 ttl=64 time=0.583 ms
8200 bytes from 192.168.0.2: icmp_seq=4 ttl=64 time=0.589 ms
8200 bytes from 192.168.0.2: icmp_seq=5 ttl=64 time=0.580 ms
8200 bytes from 192.168.0.2: icmp_seq=6 ttl=64 time=0.594 ms
8200 bytes from 192.168.0.2: icmp_seq=7 ttl=64 time=0.580 ms
8200 bytes from 192.168.0.2: icmp_seq=8 ttl=64 time=0.592 ms
8200 bytes from 192.168.0.2: icmp_seq=9 ttl=64 time=0.589 ms
8200 bytes from 192.168.0.2: icmp_seq=10 ttl=64 time=0.571 ms
8200 bytes from 192.168.0.2: icmp_seq=11 ttl=64 time=0.588 ms
8200 bytes from 192.168.0.2: icmp_seq=12 ttl=64 time=0.580 ms
8200 bytes from 192.168.0.2: icmp_seq=13 ttl=64 time=0.587 ms
--- 192.168.0.2 ping statistics ---
13 packets transmitted, 13 received, 0% packet loss, time 2400ms
rtt min/avg/max/mdev = 0.571/0.593/0.693/0.044 ms, ipg/ewma 200.022/0.607 ms
gershwin:[~]

Both initiator and target are alone on a gigabit NIC (Tigon3). On the target server, istd1 takes 100% of a CPU (and only one CPU, even though my T1000 can run 32 threads simultaneously). I think the limitation comes from istd1.

usually istdx will not take 100% cpu with a 1G network, especially when using disk as back storage; some kind of profiling work might be helpful to tell what happened... forgot to ask: your sparc64 platform cpu spec?

Root gershwin:[/mnt/solaris] cat /proc/cpuinfo
cpu             : UltraSparc T1 (Niagara)
fpu             : UltraSparc T1 integrated FPU
prom            : OBP 4.23.4 2006/08/04 20:45
type            : sun4v
ncpus probed    : 24
ncpus active    : 24
D$ parity tl1   : 0
I$ parity tl1   : 0

Both servers are built with 1 GHz T1 processors (6 cores, 24 threads).

Regards,

JKB
Re: [BUG] Raid5 trouble
Bill Davidsen wrote:
Dan Williams wrote:
On Fri, 2007-10-19 at 01:04 -0700, BERTRAND Joël wrote:

I ran some dd's (read and write in nullio) between initiator and target for 12 hours without any disconnection, so the iSCSI code seems to be robust. Both initiator and target are alone on a single gigabit ethernet link (without any switch). I'm investigating...

Can you reproduce on 2.6.22? Also, I do not think this is the cause of your failure, but you have CONFIG_DMA_ENGINE=y in your config. Setting this to 'n' will compile out the unneeded checks for offload engines in async_memcpy and async_xor.

Given that offload engines are far less tested code, I think this is a very good thing to try!

I'm trying without CONFIG_DMA_ENGINE=y. istd1 only uses 40% of one CPU when I rebuild my raid1 array. 1% of this array has now been resynchronized without any hang.

Root gershwin:[/usr/scripts] cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md7 : active raid1 sdi1[2] md_d0p1[0]
      1464725632 blocks [2/1] [U_]
      [] recovery = 1.0% (15705536/1464725632) finish=1103.9min speed=21875K/sec

Regards,

JKB
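[Editor's note: for context on what CONFIG_DMA_ENGINE=n removes, here is an illustrative sketch of the probe-for-offload-engine-with-synchronous-fallback pattern. The names (find_dma_channel, async_copy, struct dma_chan as used here) are hypothetical, written for this note; this is not the kernel's actual async_tx API.]

#include <string.h>
#include <stddef.h>

struct dma_chan;                      /* opaque offload engine handle */

static struct dma_chan *find_dma_channel(void)
{
#ifdef CONFIG_DMA_ENGINE
	/* probe for a hardware copy engine here; none found in this sketch */
	return NULL;
#else
	return NULL;                  /* compiled out: no probe at all */
#endif
}

static void async_copy(void *dest, const void *src, size_t len)
{
	struct dma_chan *chan = find_dma_channel();

	if (chan) {
		/* queue the copy on the engine and return a completion token */
	} else {
		memcpy(dest, src, len);  /* synchronous fallback path */
	}
}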
Re: [BUG] Raid1/5 over iSCSI trouble
BERTRAND Joël wrote:
Bill Davidsen wrote:
Dan Williams wrote:
On Fri, 2007-10-19 at 01:04 -0700, BERTRAND Joël wrote:

I ran some dd's (read and write in nullio) between initiator and target for 12 hours without any disconnection, so the iSCSI code seems to be robust. Both initiator and target are alone on a single gigabit ethernet link (without any switch). I'm investigating...

Can you reproduce on 2.6.22? Also, I do not think this is the cause of your failure, but you have CONFIG_DMA_ENGINE=y in your config. Setting this to 'n' will compile out the unneeded checks for offload engines in async_memcpy and async_xor.

Given that offload engines are far less tested code, I think this is a very good thing to try!

I'm trying without CONFIG_DMA_ENGINE=y. istd1 only uses 40% of one CPU when I rebuild my raid1 array. 1% of this array has now been resynchronized without any hang.

Root gershwin:[/usr/scripts] cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md7 : active raid1 sdi1[2] md_d0p1[0]
      1464725632 blocks [2/1] [U_]
      [] recovery = 1.0% (15705536/1464725632) finish=1103.9min speed=21875K/sec

Same result...

connection2:0: iscsi: detected conn error (1011)
session2: iscsi: session recovery timed out after 120 secs
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery

Regards,

JKB
Re: [BUG] Raid5 trouble
Bill Davidsen wrote:
Dan Williams wrote:

I found a problem which may lead to the operations count dropping below zero. If ops_complete_biofill() gets preempted in between the following calls:

raid5.c:554 clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
raid5.c:555 clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);

...then get_stripe_work() can recount/re-acknowledge STRIPE_OP_BIOFILL, causing the assertion. In fact, the 'pending' bit should always be cleared first. The other cases are protected by spin_lock(&sh->lock). Patch attached.

Once this patch has been vetted, can it be offered to -stable for 2.6.23? Or, to be pedantic, it *can*; will you make that happen?

I never see any oops with this patch. But I cannot create a RAID1 array with a local RAID5 volume and a foreign RAID5 array exported by iSCSI. iSCSI seems to work fine, but RAID1 creation randomly aborts due to an unknown SCSI task on the target side. I have stressed the iSCSI target with some simultaneous I/O without any trouble (nullio, fileio and blockio), thus I suspect another bug in the raid code (or an arch-specific bug). Over the last two days, I have made some tests to isolate and reproduce this bug (see the sketch after this list):

1/ iSCSI target and initiator seem to work when I export a raid5 array with iSCSI;
2/ raid1 and raid5 seem to work with local disks;
3/ the iSCSI target is disconnected only when I create a raid1 volume over iSCSI (blockio _and_ fileio), with the following message:

Oct 18 10:43:52 poulenc kernel: iscsi_trgt: cmnd_abort(1156) 29 1 0 42 57344 0 0
Oct 18 10:43:52 poulenc kernel: iscsi_trgt: Abort Task (01) issued on tid:1 lun:0 by sid:630024457682948 (Unknown Task)

I ran some dd's (read and write in nullio) between initiator and target for 12 hours without any disconnection, so the iSCSI code seems to be robust. Both initiator and target are alone on a single gigabit ethernet link (without any switch). I'm investigating...

Regards,

JKB
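[Editor's note: a small user-space model of the race Dan describes; this is a sketch written for this note, not the raid5.c code. get_stripe_work() treats "pending and not acked" as new work and acknowledges it. If the completion path clears ack before pending and is preempted in between, the same work is counted a second time and the operations count later underflows; clearing pending first closes the window.]

#include <stdatomic.h>

atomic_int pending, ack, count;

void get_stripe_work(void)             /* runs under sh->lock */
{
	if (atomic_load(&pending) && !atomic_load(&ack)) {
		atomic_store(&ack, 1);
		atomic_fetch_add(&count, 1);   /* work acknowledged once... */
	}
}

void ops_complete_biofill_buggy(void)  /* completion, NOT under the lock */
{
	atomic_store(&ack, 0);             /* preempted here: pending=1 and */
	atomic_store(&pending, 0);         /* ack=0 looks like new work     */
}

void ops_complete_biofill_fixed(void)
{
	atomic_store(&pending, 0);         /* clear pending first: the pair */
	atomic_store(&ack, 0);             /* never looks like unacked work */
}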
Re: [Iscsitarget-devel] Abort Task ?
Ross S. W. Walker wrote:
BERTRAND Joël wrote:
BERTRAND Joël wrote:

I can format (mkfs.ext3) a 1.5 TB volume over iSCSI several times without any trouble. I can read and write on this virtual disk without any trouble. Now, I have configured ietd with:

Lun 0 Sectors=1464725758,Type=nullio

and I run on the initiator side:

Root gershwin:[/dev] dd if=/dev/zero of=/dev/sdj bs=8192
479482+0 records in
479482+0 records out
3927916544 bytes (3.9 GB) copied, 153.222 seconds, 25.6 MB/s
Root gershwin:[/dev] dd if=/dev/zero of=/dev/sdj bs=8192

I'm waiting for a crash; none as I write these lines. I suspect an interaction between raid and iscsi. I simultaneously ran:

Root gershwin:[/dev] dd if=/dev/zero of=/dev/sdj bs=8192
8397210+0 records in
8397210+0 records out
68789944320 bytes (69 GB) copied, 2732.55 seconds, 25.2 MB/s

and

Root gershwin:[~] dd if=/dev/sdj of=/dev/null bs=8192
739200+0 records in
739199+0 records out
6055518208 bytes (6.1 GB) copied, 447.178 seconds, 13.5 MB/s

without any trouble.

The speed can definitely be improved. Look at your network setup and use ping to try and get the network latency to a minimum.

# ping -A -s 8192 172.16.24.140
--- 172.16.24.140 ping statistics ---
14058 packets transmitted, 14057 received, 0% packet loss, time 9988ms
rtt min/avg/max/mdev = 0.234/0.268/2.084/0.041 ms, ipg/ewma 0.710/0.260 ms

gershwin:[~] ping -A -s 8192 192.168.0.2
PING 192.168.0.2 (192.168.0.2) 8192(8220) bytes of data.
8200 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=0.693 ms
8200 bytes from 192.168.0.2: icmp_seq=2 ttl=64 time=0.595 ms
8200 bytes from 192.168.0.2: icmp_seq=3 ttl=64 time=0.583 ms
8200 bytes from 192.168.0.2: icmp_seq=4 ttl=64 time=0.589 ms
8200 bytes from 192.168.0.2: icmp_seq=5 ttl=64 time=0.580 ms
8200 bytes from 192.168.0.2: icmp_seq=6 ttl=64 time=0.594 ms
8200 bytes from 192.168.0.2: icmp_seq=7 ttl=64 time=0.580 ms
8200 bytes from 192.168.0.2: icmp_seq=8 ttl=64 time=0.592 ms
8200 bytes from 192.168.0.2: icmp_seq=9 ttl=64 time=0.589 ms
8200 bytes from 192.168.0.2: icmp_seq=10 ttl=64 time=0.571 ms
8200 bytes from 192.168.0.2: icmp_seq=11 ttl=64 time=0.588 ms
8200 bytes from 192.168.0.2: icmp_seq=12 ttl=64 time=0.580 ms
8200 bytes from 192.168.0.2: icmp_seq=13 ttl=64 time=0.587 ms
--- 192.168.0.2 ping statistics ---
13 packets transmitted, 13 received, 0% packet loss, time 2400ms
rtt min/avg/max/mdev = 0.571/0.593/0.693/0.044 ms, ipg/ewma 200.022/0.607 ms
gershwin:[~]

Both initiator and target are alone on a gigabit NIC (Tigon3). On the target server, istd1 takes 100% of a CPU (and only one CPU, even though my T1000 can run 32 threads simultaneously). I think the limitation comes from istd1.

You want your avg ping time for 8192-byte payloads to be 300 us or less.

1000 / 0.268 ms = 3731 IOPS @ 8k = 30 MB/s

If you use apps that do overlapping asynchronous IO you can see better numbers.

Regards,

JKB
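[Editor's note: Ross's rule of thumb is easy to reproduce. The toy program below plugs in his 0.268 ms average and JKB's measured 0.593 ms; with one synchronous 8 KiB request outstanding at a time, throughput is bounded by the round-trip latency, which matches the 13.5 MB/s dd read above fairly well.]

#include <stdio.h>

int main(void)
{
	double rtt_ms[] = { 0.268, 0.593 };   /* Ross's link, JKB's link */
	double block = 8192.0;                /* bytes per request */

	for (int i = 0; i < 2; i++) {
		double iops = 1000.0 / rtt_ms[i];
		printf("rtt %.3f ms -> %.0f IOPS -> %.1f MB/s\n",
		       rtt_ms[i], iops, iops * block / 1e6);
	}
	return 0;
}
/* Output:
 * rtt 0.268 ms -> 3731 IOPS -> 30.6 MB/s
 * rtt 0.593 ms -> 1686 IOPS -> 13.8 MB/s
 */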
Re: [BUG] Raid5 trouble
Dan,

I'm testing your last patch (fix-biofill-clear2.patch). It seems to work:

Every 1.0s: cat /proc/mdstat                    Thu Oct 18 10:28:55 2007

Personalities : [raid1] [raid6] [raid5] [raid4]
md7 : active raid1 sdi1[1] md_d0p1[0]
      1464725632 blocks [2/2] [UU]
      [] resync = 0.4% (6442248/1464725632) finish=1216.6min speed=19974K/sec
md_d0 : active raid5 sdc1[0] sdh1[5] sdg1[4] sdf1[3] sde1[2] sdd1[1]
      1464725760 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]

I hope it fixes the bug I have seen. I shall come back, I think tomorrow, to say whether it works fine; my raid volume requires more than 20 hours to be created.

Regards,

JKB
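[Editor's note: as a sanity check on the "more than 20 hours" figure, mdstat's block counts are 1 KiB units, so remaining blocks divided by the reported speed reproduces the finish estimate shown above.]

#include <stdio.h>

int main(void)
{
	unsigned long long total = 1464725632ULL;   /* 1 KiB blocks */
	unsigned long long done  = 6442248ULL;
	double speed_kps = 19974.0;                 /* KiB/s from mdstat */

	double minutes = (double)(total - done) / speed_kps / 60.0;
	printf("finish = %.1f min (%.1f hours)\n", minutes, minutes / 60.0);
	return 0;
}
/* Output: finish = 1216.8 min (20.3 hours) */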
Re: [Iscsitarget-devel] Abort Task ?
Ming Zhang wrote:
On Thu, 2007-10-18 at 11:33 -0400, Ross S. W. Walker wrote:
BERTRAND Joël wrote:
BERTRAND Joël wrote:
BERTRAND Joël wrote:

Hello,

When I try to create a raid1 volume over iscsi, the process aborts with:

- on target side:

iscsi_trgt: cmnd_abort(1156) 29 1 0 42 57344 0 0
iscsi_trgt: Abort Task (01) issued on tid:1 lun:0 by sid:630024457682948 (Unknown Task)

Next run:

iscsi_trgt: cmnd_abort(1156) 13 1 0 42 57344 0 0
iscsi_trgt: Abort Task (01) issued on tid:1 lun:0 by sid:630058817421315 (Unknown Task)

You can see that both lines are very similar. I shall try to use blockio instead of fileio. With blockio, I got the following message...

iscsi_trgt: cmnd_abort(1156) c 1 0 42 8192 0 0
iscsi_trgt: Abort Task (01) issued on tid:1 lun:0 by sid:630024457682946 (Unknown Task)

The command is the same. What is the significance of 1156?

Both outputs are from the same Abort Task management function; the 1156 refers to the line in iscsi.c where the debug printf was issued. The other is the more verbose informative message that says an Abort Task command was issued, but the task was not found.

pure guess: this might be because of the sparc64 you are using. could you export a NULLIO target and do some intensive io tests? sort out these platform issues first...

I can format (mkfs.ext3) a 1.5 TB volume over iSCSI several times without any trouble. I can read and write on this virtual disk without any trouble. Now, I have configured ietd with:

Lun 0 Sectors=1464725758,Type=nullio

and I run on the initiator side:

Root gershwin:[/dev] dd if=/dev/zero of=/dev/sdj bs=8192
479482+0 records in
479482+0 records out
3927916544 bytes (3.9 GB) copied, 153.222 seconds, 25.6 MB/s
Root gershwin:[/dev] dd if=/dev/zero of=/dev/sdj bs=8192

I'm waiting for a crash; none as I write these lines. I suspect an interaction between raid and iscsi.

Regards,

JKB
Re: [Iscsitarget-devel] Abort Task ?
BERTRAND Joël wrote:

I can format (mkfs.ext3) a 1.5 TB volume over iSCSI several times without any trouble. I can read and write on this virtual disk without any trouble. Now, I have configured ietd with:

Lun 0 Sectors=1464725758,Type=nullio

and I run on the initiator side:

Root gershwin:[/dev] dd if=/dev/zero of=/dev/sdj bs=8192
479482+0 records in
479482+0 records out
3927916544 bytes (3.9 GB) copied, 153.222 seconds, 25.6 MB/s
Root gershwin:[/dev] dd if=/dev/zero of=/dev/sdj bs=8192

I'm waiting for a crash; none as I write these lines. I suspect an interaction between raid and iscsi. I simultaneously ran:

Root gershwin:[/dev] dd if=/dev/zero of=/dev/sdj bs=8192
8397210+0 records in
8397210+0 records out
68789944320 bytes (69 GB) copied, 2732.55 seconds, 25.2 MB/s

and

Root gershwin:[~] dd if=/dev/sdj of=/dev/null bs=8192
739200+0 records in
739199+0 records out
6055518208 bytes (6.1 GB) copied, 447.178 seconds, 13.5 MB/s

without any trouble.

Regards,

JKB
Re: [BUG] Raid5 trouble
BERTRAND Joël wrote:

Hello,

I run the 2.6.23 Linux kernel on two T1000 (sparc64) servers. Each server has a partitionable raid5 array (/dev/md/d0) and I have to synchronize both raid5 volumes with raid1. Thus, I have tried to build a raid1 volume between /dev/md/d0p1 and /dev/sdi1 (exported by iscsi from the second server) and I obtain a BUG:

Root gershwin:[/usr/scripts] mdadm -C /dev/md7 -l1 -n2 /dev/md/d0p1 /dev/sdi1
...

Hello,

I have fixed iscsi-target, and I have tested it. It now works without any trouble; patches were posted on the iscsi-target mailing list. When I use iSCSI to access the foreign raid5 volume, it works fine. I can format the foreign volume, copy large files on it... But when I try to create a new raid1 volume with a local raid5 volume and a foreign raid5 volume, I receive my well-known Oops. You can find my dmesg after the Oops:

md: md_d0 stopped.
md: bind<sdd1>
md: bind<sde1>
md: bind<sdf1>
md: bind<sdg1>
md: bind<sdh1>
md: bind<sdc1>
raid5: device sdc1 operational as raid disk 0
raid5: device sdh1 operational as raid disk 5
raid5: device sdg1 operational as raid disk 4
raid5: device sdf1 operational as raid disk 3
raid5: device sde1 operational as raid disk 2
raid5: device sdd1 operational as raid disk 1
raid5: allocated 12518kB for md_d0
raid5: raid level 5 set md_d0 active with 6 out of 6 devices, algorithm 2
RAID5 conf printout:
--- rd:6 wd:6
disk 0, o:1, dev:sdc1
disk 1, o:1, dev:sdd1
disk 2, o:1, dev:sde1
disk 3, o:1, dev:sdf1
disk 4, o:1, dev:sdg1
disk 5, o:1, dev:sdh1
md_d0: p1
scsi3 : iSCSI Initiator over TCP/IP
scsi 3:0:0:0: Direct-Access IET VIRTUAL-DISK 0 PQ: 0 ANSI: 4
sd 3:0:0:0: [sdi] 2929451520 512-byte hardware sectors (1499879 MB)
sd 3:0:0:0: [sdi] Write Protect is off
sd 3:0:0:0: [sdi] Mode Sense: 77 00 00 08
sd 3:0:0:0: [sdi] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
sd 3:0:0:0: [sdi] 2929451520 512-byte hardware sectors (1499879 MB)
sd 3:0:0:0: [sdi] Write Protect is off
sd 3:0:0:0: [sdi] Mode Sense: 77 00 00 08
sd 3:0:0:0: [sdi] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
sdi: sdi1
sd 3:0:0:0: [sdi] Attached SCSI disk
md: bind<md_d0p1>
md: bind<sdi1>
md: md7: raid array is not clean -- starting background reconstruction
raid1: raid set md7 active with 2 out of 2 mirrors
md: resync of RAID array md7
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
md: using 256k window, over a total of 1464725632 blocks.
kernel BUG at drivers/md/raid5.c:380!
              \|/ ____ \|/
              "@'/ .. \`@"
              /_| \__/ |_\
                 \__U_/
md7_resync(4929): Kernel bad sw trap 5 [#1]
TSTATE: 80001606 TPC: 005ed50c TNPC: 005ed510 Y: Not tainted
TPC: <get_stripe_work+0x1f4/0x200>
g0: 0005 g1: 007c0400 g2: 0001 g3: 00748400
g4: f800feeb6880 g5: f8000208 g6: f800e7598000 g7: 00748528
o0: 0029 o1: 00715798 o2: 017c o3: 0005
o4: 0006 o5: f800e8f0a060 sp: f800e759ad81 ret_pc: 005ed504
RPC: <get_stripe_work+0x1ec/0x200>
l0: 0002 l1: l2: f800e8f0a0a0 l3: f800e8f09fe8
l4: f800e8f0a088 l5: fff8 l6: 0005 l7: f800e8374000
i0: f800e8f0a028 i1: i2: 0004 i3: f800e759b720
i4: 0080 i5: 0080 i6: f800e759ae51 i7: 005f0274
I7: <handle_stripe5+0x4fc/0x1340>
Caller[005f0274]: handle_stripe5+0x4fc/0x1340
Caller[005f211c]: handle_stripe+0x24/0x13e0
Caller[005f4450]: make_request+0x358/0x600
Caller[00542890]: generic_make_request+0x198/0x220
Caller[005eb240]: sync_request+0x608/0x640
Caller[005fef7c]: md_do_sync+0x384/0x920
Caller[005ff8f0]: md_thread+0x38/0x140
Caller[00478b40]: kthread+0x48/0x80
Caller[004273d0]: kernel_thread+0x38/0x60
Caller[00478de0]: kthreadd+0x148/0x1c0
Instruction DUMP: 9210217c 7ff8f57f 90122398 91d02005 30680004 0100 0100 0100 9de3bf00

I suspect a major bug in the raid5 code but I don't know how to debug it... md7 was created by mdadm -C /dev/md7 -l1 -n2 /dev/md/d0 /dev/sdi1. /dev/md/d0 is a raid5 volume, and sdi an iSCSI disk.

Regards,

JKB
Re: [BUG] Raid5 trouble
Dan Williams wrote: On 10/17/07, BERTRAND Joël [EMAIL PROTECTED] wrote: BERTRAND Joël wrote: Hello, I run a 2.6.23 linux kernel on two T1000 (sparc64) servers. Each server has a partitionable raid5 array (/dev/md/d0), and I have to synchronize both raid5 volumes by raid1. Thus, I have tried to build a raid1 volume between /dev/md/d0p1 and /dev/sdi1 (exported by iscsi from the second server) and I obtain a BUG:

Root gershwin:[/usr/scripts] mdadm -C /dev/md7 -l1 -n2 /dev/md/d0p1 /dev/sdi1
...

Hello, I have fixed iscsi-target, and I have tested it. It now works without any trouble; the patches were posted on the iscsi-target mailing list. When I use iSCSI to access the foreign raid5 volume, it works fine: I can format the foreign volume, copy large files onto it... But when I try to create a new raid1 volume from a local raid5 volume and a foreign raid5 volume, I receive my well-known Oops. You can find my dmesg after the Oops. Your patch does not work for me. It was applied, a new kernel was built, and I obtain the same Oops.

Can you send your .config and your bootup dmesg?

Yes, of course ;-) Both files are attached. My new Oops is:

kernel BUG at drivers/md/raid5.c:380!
              \|/ ____ \|/
              "@'/ .. \`@"
              /_| \__/ |_\
                 \__U_/
md7_resync(4258): Kernel bad sw trap 5 [#1]
TSTATE: 80001606 TPC: 005ed50c TNPC: 005ed510 Y: Not tainted
TPC: get_stripe_work+0x1f4/0x200

(exactly the same as the old one ;-) ). I have patched iscsi-target to avoid an alignment bug on sparc64. Do you think a bug in ietd could produce this kind of Oops? The patch I have written for iscsi-target (against SVN) is attached too.

Regards,

JKB

PROMLIB: Sun IEEE Boot Prom 'OBP 4.23.4 2006/08/04 20:45'
PROMLIB: Root node compatible: sun4v
Linux version 2.6.23 ([EMAIL PROTECTED]) (gcc version 4.1.3 20070831 (prerelease) (Debian 4.1.2-16)) #7 SMP Wed Oct 17 17:52:22 CEST 2007
ARCH: SUN4V
Ethernet address: 00:14:4f:6f:59:fe
OF stdout device is: /[EMAIL PROTECTED]/[EMAIL PROTECTED]
PROM: Built device tree with 74930 bytes of memory.
MDESC: Size is 32560 bytes.
PLATFORM: banner-name [Sun Fire(TM) T1000]
PLATFORM: name [SUNW,Sun-Fire-T1000]
PLATFORM: hostid [846f59fe]
PLATFORM: serial# [00ab4130]
PLATFORM: stick-frequency [3b9aca00]
PLATFORM: mac-address [144f6f59fe]
PLATFORM: watchdog-resolution [1000 ms]
PLATFORM: watchdog-max-timeout [3153600 ms]
On node 0 totalpages: 522246
  Normal zone: 3583 pages used for memmap
  Normal zone: 0 pages reserved
  Normal zone: 518663 pages, LIFO batch:15
  Movable zone: 0 pages used for memmap
Built 1 zonelists in Zone order.  Total pages: 518663
Kernel command line: root=/dev/md0 ro md=0,/dev/sda4,/dev/sdb4 raid=noautodetect
md: Will configure md0 (super-block) from /dev/sda4,/dev/sdb4, below.
PID hash table entries: 4096 (order: 12, 32768 bytes)
clocksource: mult[1] shift[16]
clockevent: mult[8000] shift[31]
Console: colour dummy device 80x25
console [tty0] enabled
Dentry cache hash table entries: 524288 (order: 9, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 8, 2097152 bytes)
Memory: 4138072k available (2608k kernel code, 960k data, 144k init) [f800,fffc8000]
SLUB: Genslabs=23, HWalign=32, Order=0-2, MinObjects=8, CPUs=32, Nodes=1
Calibrating delay using timer specific routine.. 1995.16 BogoMIPS (lpj=3990330)
Mount-cache hash table entries: 512
Brought up 24 CPUs
xor: automatically using best checksumming function: Niagara
   Niagara   : 240.000 MB/sec
xor: using function: Niagara (240.000 MB/sec)
NET: Registered protocol family 16
PCI: Probing for controllers.
SUN4V_PCI: Registered hvapi major[1] minor[0]
/[EMAIL PROTECTED]: SUN4V PCI Bus Module
/[EMAIL PROTECTED]: PCI IO[e81000] MEM[ea]
/[EMAIL PROTECTED]: SUN4V PCI Bus Module
/[EMAIL PROTECTED]: PCI IO[f01000] MEM[f2]
PCI: Scanning PBM /[EMAIL PROTECTED]
PCI: Scanning PBM /[EMAIL PROTECTED]
ebus: No EBus's found.
SCSI subsystem initialized
NET: Registered protocol family 2
Time: stick clocksource has been installed.
Switched to high resolution mode on CPU 0
Switched to high resolution mode on CPU 20
Switched to high resolution mode on CPU 8
Switched to high resolution mode on CPU 21
Switched to high resolution mode on CPU 9
Switched to high resolution mode on CPU 22
Switched to high resolution mode on CPU 10
Switched to high resolution mode on CPU 23
Switched to high resolution mode on CPU 11
Switched to high resolution mode on CPU 12
Switched to high resolution mode on CPU 13
Switched to high resolution mode on CPU 1
Switched to high resolution mode on CPU 14
Switched to high resolution mode on CPU 2
Switched to high resolution mode on CPU 15
Switched to high resolution mode on CPU 3
Switched to high resolution mode on CPU 16
Switched to high resolution mode on CPU 4
Switched to high resolution mode on CPU 17
Switched to high resolution mode
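[Digest note] A word on the alignment point raised in the message above: sparc64 is a strict-alignment architecture, so a misaligned load that passes silently on amd64 traps on a T1000. The usual kernel-side pattern is get_unaligned() from <asm/unaligned.h>. The sketch below is a generic illustration of that pattern, not the attached iscsi-target patch; the helper name and the idea of reading a field out of a PDU buffer are hypothetical:

#include <asm/unaligned.h>
#include <linux/types.h>

/* Hypothetical helper: read a 32-bit field from a byte buffer at an
 * arbitrary offset without assuming the pointer is 4-byte aligned. */
static u32 read_pdu_u32(const u8 *pdu, size_t off)
{
        /* get_unaligned() expands to byte-wise loads on
         * strict-alignment CPUs such as sparc64, and to a plain
         * load on x86/amd64 -- which is why alignment bugs often
         * surface only on the former. */
        return get_unaligned((const u32 *)(pdu + off));
}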
Re: [BUG] Raid5 trouble
Dan Williams wrote: On 10/17/07, Dan Williams [EMAIL PROTECTED] wrote: On 10/17/07, BERTRAND Joël [EMAIL PROTECTED] wrote: BERTRAND Joël wrote: Hello, I run a 2.6.23 linux kernel on two T1000 (sparc64) servers. Each server has a partitionable raid5 array (/dev/md/d0), and I have to synchronize both raid5 volumes by raid1. Thus, I have tried to build a raid1 volume between /dev/md/d0p1 and /dev/sdi1 (exported by iscsi from the second server) and I obtain a BUG:

Root gershwin:[/usr/scripts] mdadm -C /dev/md7 -l1 -n2 /dev/md/d0p1 /dev/sdi1
...

Hello, I have fixed iscsi-target, and I have tested it. It now works without any trouble; the patches were posted on the iscsi-target mailing list. When I use iSCSI to access the foreign raid5 volume, it works fine: I can format the foreign volume, copy large files onto it... But when I try to create a new raid1 volume from a local raid5 volume and a foreign raid5 volume, I receive my well-known Oops. You can find my dmesg after the Oops.

Can you send your .config and your bootup dmesg?

I found a problem which may lead to the operations count dropping below zero. If ops_complete_biofill() gets preempted in between the following calls:

raid5.c:554  clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
raid5.c:555  clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);

...then get_stripe_work() can recount/re-acknowledge STRIPE_OP_BIOFILL, causing the assertion. In fact, the 'pending' bit should always be cleared first; the other cases are protected by spin_lock(&sh->lock). Patch attached.

Dan, I have modified get_stripe_work like this:

static unsigned long get_stripe_work(struct stripe_head *sh)
{
        unsigned long pending;
        int ack = 0;
        int a, b, c, d, e, f, g;

        pending = sh->ops.pending;

        test_and_ack_op(STRIPE_OP_BIOFILL, pending);
        a = ack;
        test_and_ack_op(STRIPE_OP_COMPUTE_BLK, pending);
        b = ack;
        test_and_ack_op(STRIPE_OP_PREXOR, pending);
        c = ack;
        test_and_ack_op(STRIPE_OP_BIODRAIN, pending);
        d = ack;
        test_and_ack_op(STRIPE_OP_POSTXOR, pending);
        e = ack;
        test_and_ack_op(STRIPE_OP_CHECK, pending);
        f = ack;
        if (test_and_clear_bit(STRIPE_OP_IO, &sh->ops.pending))
                ack++;
        g = ack;

        sh->ops.count -= ack;
        if (sh->ops.count < 0)
                printk("%d %d %d %d %d %d %d\n", a, b, c, d, e, f, g);
        BUG_ON(sh->ops.count < 0);

        return pending;
}

and I obtain on the console:

1 1 1 1 1 2
kernel BUG at drivers/md/raid5.c:390!
              \|/ ____ \|/
              "@'/ .. \`@"
              /_| \__/ |_\
                 \__U_/
md7_resync(5409): Kernel bad sw trap 5 [#1]

If that can help you...

JKB
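[Digest note] Dan's diagnosis, in code form: a minimal sketch of the reordering he describes, written for this digest rather than taken from the attached patch. The real ops_complete_biofill() also performs the biofill completion work, which is elided here, and the function name is hypothetical:

/* Clearing 'pending' before 'ack' closes the window Dan describes:
 * while 'pending' was still set and 'ack' already clear, a
 * concurrent get_stripe_work() would see the operation as pending
 * but unacknowledged, count it a second time, and drive
 * sh->ops.count negative. */
static void ops_complete_biofill_sketch(struct stripe_head *sh)
{
        /* ... biofill completion work elided ... */
        clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
        clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
}

With this ordering, being preempted between the two clear_bit() calls is harmless: once 'pending' is clear, get_stripe_work() skips the operation entirely, so clearing 'ack' afterwards can no longer be re-counted.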
Re: Partitionable raid array... How to create devices ?
Neil Brown wrote: On Tuesday October 16, [EMAIL PROTECTED] wrote:

Hello, I have used software raid for a long time without any trouble. Today, I have to install a partitionable raid1 array over iSCSI. I have some questions, because I don't understand how to make this kind of array. I have a sparc64 (T1000) with a JBOD (U320 SCSI) that runs a 2.6.23 linux kernel and the debian testing distribution.

/dev/sda : internal SAS drive - OS
/dev/sdb : internal SAS drive - OS

On /dev/sda and /dev/sdb I have made seven raid1 volumes (non-partitionable arrays).

/dev/sd[c-h] : external U320 drives. Each 300 GB drive contains only one type-fd partition.

I have tried to create a partitionable array with:

Root gershwin:[/usr/src/linux-2.6.23] mdadm -C /dev/mdp0 -l5 --auto=mdp4 -n6 /dev/sd[c-h]1

Try /dev/md/d0 or /dev/md_d0 as suggested in the DEVICE NAMES section of the man page. However, what you used should work; I'll get that fixed for the next release.

Thanks, it works now. I had seen this note, so I renamed my array to mdp0, but it was not enough ;-)

Regards,

JKB
[BUG] Raid5 trouble
Hello, I run a 2.6.23 linux kernel on two T1000 (sparc64) servers. Each server has a partitionable raid5 array (/dev/md/d0), and I have to synchronize both raid5 volumes by raid1. Thus, I have tried to build a raid1 volume between /dev/md/d0p1 and /dev/sdi1 (exported by iscsi from the second server) and I obtain a BUG:

Root gershwin:[/usr/scripts] mdadm -C /dev/md7 -l1 -n2 /dev/md/d0p1 /dev/sdi1
...
kernel BUG at drivers/md/raid5.c:380!
              \|/ ____ \|/
              "@'/ .. \`@"
              /_| \__/ |_\
                 \__U_/
md7_resync(4476): Kernel bad sw trap 5 [#1]
TSTATE: 80001606 TPC: 005ed50c TNPC: 005ed510 Y: Not tainted
TPC: get_stripe_work+0x1f4/0x200
g0: 0005 g1: 007c0400 g2: 0001 g3: 00748400
g4: f800ebdb2400 g5: f8000208 g6: f800e82fc000 g7: 00748528
o0: 0029 o1: 00715798 o2: 017c o3: 0005
o4: 0006 o5: f800e9bb6e28 sp: f800e82fed81 ret_pc: 005ed504
RPC: get_stripe_work+0x1ec/0x200
l0: 0002 l1: l2: f800e9bb6e68 l3: f800e9bb6db0
l4: f800e9bb6e50 l5: fff8 l6: 0005 l7: f800fcbd6000
i0: f800e9bb6df0 i1: i2: 0004 i3: f800e82ff720
i4: 0080 i5: 0080 i6: f800e82fee51 i7: 005f0274
I7: handle_stripe5+0x4fc/0x1340
Caller[005f0274]: handle_stripe5+0x4fc/0x1340
Caller[005f211c]: handle_stripe+0x24/0x13e0
Caller[005f4450]: make_request+0x358/0x600
Caller[00542890]: generic_make_request+0x198/0x220
Caller[005eb240]: sync_request+0x608/0x640
Caller[005fef7c]: md_do_sync+0x384/0x920
Caller[005ff8f0]: md_thread+0x38/0x140
Caller[00478b40]: kthread+0x48/0x80
Caller[004273d0]: kernel_thread+0x38/0x60
Caller[00478de0]: kthreadd+0x148/0x1c0
Instruction DUMP: 9210217c 7ff8f57f 90122398 <91d02005> 30680004 0100 0100 0100 9de3bf00

Root gershwin:[/usr/scripts] cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md7 : active raid1 sdi1[1] md_d0p1[0]
      1464725632 blocks [2/2] [UU]
      [>....................]  resync =  0.0% (132600/1464725632) finish=141823.7min speed=171K/sec
md_d0 : active raid5 sdc1[0] sdh1[5] sdg1[4] sdf1[3] sde1[2] sdd1[1]
      1464725760 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
...
Root gershwin:[/usr/scripts] fdisk -l /dev/md/d0

Disk /dev/md/d0: 1499.8 GB, 1499879178240 bytes
2 heads, 4 sectors/track, 366181440 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0xa4a52979

      Device Boot      Start         End      Blocks  Id  System
/dev/md/d0p1                1   366181440  1464725758  fd  Linux raid autodetect

Root gershwin:[/usr/scripts] fdisk -l /dev/sdi

Disk /dev/sdi: 1499.8 GB, 1499879178240 bytes
2 heads, 4 sectors/track, 366181440 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0xf6cdb2a3

      Device Boot      Start         End      Blocks  Id  System
/dev/sdi1                   1   366181440  1464725758  fd  Linux raid autodetect

Root gershwin:[/usr/scripts] cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: FUJITSU  Model: MAY2073RCSUN72G  Rev: 0501
  Type:   Direct-Access                    ANSI SCSI revision: 04
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: FUJITSU  Model: MAY2073RCSUN72G  Rev: 0501
  Type:   Direct-Access                    ANSI SCSI revision: 04
Host: scsi2 Channel: 00 Id: 08 Lun: 00
  Vendor: FUJITSU  Model: MAW3300NC        Rev: 0104
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 09 Lun: 00
  Vendor: FUJITSU  Model: MAW3300NC        Rev: 0104
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 10 Lun: 00
  Vendor: FUJITSU  Model: MAW3300NC        Rev: 0104
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 11 Lun: 00
  Vendor: FUJITSU  Model: MAW3300NC        Rev: 0104
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 12 Lun: 00
  Vendor: FUJITSU  Model: MAW3300NC        Rev: 0104
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 13 Lun: 00
  Vendor: FUJITSU  Model: MAW3300NC        Rev: 0104
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi3 Channel: 00 Id: 00 Lun: 00
  Vendor: IET      Model: VIRTUAL-DISK     Rev: 0
  Type:   Direct-Access                    ANSI SCSI revision: 04
Root gershwin:[/usr/scripts]

I don't know if this bug is arch-specific, but I have never seen it on amd64...

Regards,

JKB