Raid1, mdadm and nfs that remains in D state

2008-01-22 Thread BERTRAND Joël

Hello,

	I have installed a number of T1000 servers with Debian/testing and the 
official 2.6.23.9 Linux kernel. All packages but iSCSI come from the Debian 
repositories; iSCSI was built from the SVN tree. md7 is a raid1 volume over 
iSCSI and I can access this device. This morning, one of my T1000s 
crashed. The NFS daemons stay in D state:


Root gershwin:[~]  ps auwx | grep NFS
root 17041  0.0  0.0   2064   744 ttyS0    S+   12:33   0:00 grep NFS
Root gershwin:[~]  ps auwx | grep nfs
root 17043  0.0  0.0   2064   744 ttyS0    S+   12:33   0:00 grep nfs
root 18276  0.0  0.0      0     0 ?        D    2007  16:59 [nfsd]
root 18277  0.0  0.0      0     0 ?        D    2007  16:56 [nfsd]
root 18278  0.0  0.0      0     0 ?        D    2007  16:57 [nfsd]
root 18279  0.0  0.0      0     0 ?        D    2007  16:41 [nfsd]
root 18280  0.0  0.0      0     0 ?        D    2007  16:44 [nfsd]
root 18281  0.0  0.0      0     0 ?        D    2007  16:49 [nfsd]
root 18282  0.0  0.0      0     0 ?        D    2007  16:37 [nfsd]
root 18283  0.0  0.0      0     0 ?        D    2007  16:54 [nfsd]
Root gershwin:[~]  dmesg
sp: f800f2bcf3b1 ret_pc: 005e6d54
RPC: raid1d+0x35c/0x1020
l0: f80060b8fa40 l1: 0050 l2: 0006 l3: 0001
l4: f800fde2c8a0 l5: f800fc74dc20 l6: 0007 l7: 
i0: f800fb70c400 i1: f800fde2c8c8 i2: f8006297ee40 i3: f800
i4: 0010 i5: 007a2f00 i6: f800f2bcf4f1 i7: 005f2f50
I7: md_thread+0x38/0x140
BUG: soft lockup - CPU#6 stuck for 11s! [md7_raid1:5818]
TSTATE: 80001600 TPC: 0055bff0 TNPC: 0055bff4 Y:  Not tainted
TPC: loop+0x14/0x28
g0: 0020 g1: dffd57408000 g2: 0002a8ba2e81 g3: 
g4: f800fd52d960 g5: f800020bc000 g6: f800f2bcc000 g7: 
o0: f8009d13d254 o1: f80071755254 o2: 0dac o3: 
o4: 0018d1a6 o5: 00225c52 sp: f800f2bcf3b1 ret_pc: 005e6d54
RPC: raid1d+0x35c/0x1020
l0: f80077d36ce0 l1: 0050 l2: 0006 l3: 0001
l4: f800fde2c8a0 l5: f800f4372ea0 l6: 0007 l7: 
i0: f800fb70c400 i1: f800fde2c8c8 i2: f80091038660 i3: f800
i4: 0010 i5: 007a2f00 i6: f800f2bcf4f1 i7: 005f2f50
I7: md_thread+0x38/0x140
BUG: soft lockup - CPU#6 stuck for 11s! [md7_raid1:5818]
TSTATE: 004480001607 TPC: 006803a0 TNPC: 006803a4 Y:  Not tainted
TPC: _spin_unlock_irqrestore+0x28/0x40
g0: f800fed95000 g1:  g2: c0002000 g3: d0002000
g4: f800fd52d960 g5: f800020bc000 g6: f800f2bcc000 g7: f800ffcb
o0: f800fee16000 o1:  o2:  o3: f800fee16000
o4:  o5: 00784000 sp: f800f2bceda1 ret_pc: 005a4fb8
RPC: tg3_poll+0x820/0xc40
l0: 042a l1: 0001 l2: f800f79aba00 l3: 01d0
l4: f800fed95700 l5: f800f1091ec0 l6: 01d0 l7: 0001
i0: 01df i1: 0029 i2: 01df i3: 0029
i4: f800fed95794 i5: 94479812 i6: f800f2bcee81 i7: 00609780
I7: net_rx_action+0x88/0x160
BUG: soft lockup - CPU#6 stuck for 11s! [md7_raid1:5818]
TSTATE: 009980001602 TPC: 10170100 TNPC: 10170104 Y:  Not tainted
TPC: ipv4_get_l4proto+0x8/0xa0 [nf_conntrack_ipv4]
g0: 1002bb58 g1: 006c g2: f800eba32b0c g3: 10170100
g4: f800fd52d960 g5: f800020bc000 g6: f800f2bcc000 g7: 0003
o0: f800d69aae00 o1:  o2: f800f2bced24 o3: f800f2bced2f
o4: f800fed95000 o5: f800f2bceec8 sp: f800f2bce411 ret_pc: 10019d7c
RPC: nf_conntrack_in+0xa4/0x580 [nf_conntrack]
l0: 0002 l1: 10175590 l2: 8000 l3: 0002
l4:  l5: 0cbcc8bb l6: 0002 l7: f80062b8f820
i0: 0002 i1: 0003 i2: f800f2bcf080 i3: f800fed95000
i4: 00630260 i5: 00630260 i6: f800f2bce541 i7: 0062517c
I7: nf_iterate+0x84/0xe0
BUG: soft lockup - CPU#6 stuck for 11s! [md7_raid1:5818]
TSTATE: 004480001605 TPC: 10161030 TNPC: 10161034 Y:  Not tainted
TPC: ipt_do_table+0xd8/0x5a0 [ip_tables]
g0: 0001 g1:  g2: c0a80001 g3: 
g4: f800fd52d960 g5: f800020bc000 g6: f800f2bcc000 g7: 0be0
o0: 10180b74 o1: f800f2bcf480 o2:  o3: f800fed95000
o4:  o5: f8005ef72be0 sp: f800f2bce821 ret_pc: 10160fac
RPC: ipt_do_table+0x54/0x5a0 [ip_tables]
l0: 

Re: HELP! New disks being dropped from RAID 6 array on every reboot

2007-11-23 Thread BERTRAND Joël

Joshua Johnson wrote:

Greetings, long time listener, first time caller.

I recently replaced a disk in my existing 8 disk RAID 6 array.
Previously, all disks were PATA drives connected to the motherboard
IDE and 3 promise Ultra 100/133 controllers.  I replaced one of the
Promise controllers with a Via 64xx based controller, which has 2 SATA
ports and one PATA port.  I connected a new SATA drive to the new
card, partitioned the drive and added it to the array.  After 5 or 6
hours the resyncing process finished and the array showed up complete.
 Upon rebooting I discovered that the new drive had not been added to
the array when it was assembled on boot.   I resynced it and tried
again -- still would not persist after a reboot.  I moved one of the
existing PATA drives to the new controller (so I could have the slot
for network), rebooted and rebuilt the array.  Now when I reboot BOTH
disks are missing from the array (sda and sdb).  Upon examining the
disks it appears they think they are part of the array, but for some
reason they are not being added when the array is being assembled.
For example, this is a disk on the new controller which was not added
to the array after rebooting:


	What is your partition scheme? When I tried to create a raid6 
array over a SunOS partition type, I saw this bug, never on a PC system.


Regards,

JKB


Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-08 Thread BERTRAND Joël

BERTRAND Joël wrote:

Chuck Ebbert wrote:

On 11/05/2007 03:36 AM, BERTRAND Joël wrote:

Neil Brown wrote:

On Sunday November 4, [EMAIL PROTECTED] wrote:

# ps auxww | grep D
USER   PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root   273  0.0  0.0      0     0 ?        D    Oct21  14:40 [pdflush]
root   274  0.0  0.0      0     0 ?        D    Oct21  13:00 [pdflush]

After several days/weeks, this is the second time this has happened,
while doing regular file I/O (decompressing a file), everything on
the device went into D-state.

At a guess (I haven't looked closely) I'd say it is the bug that was
meant to be fixed by

commit 4ae3f847e49e3787eca91bced31f8fd328d50496

except that patch applied badly and needed to be fixed with
the following patch (not in git yet).
These have been sent to stable@ and should be in the queue for 2.6.23.2

My linux-2.6.23/drivers/md/raid5.c has contained your patch for a long
time:

...
spin_lock(&sh->lock);
clear_bit(STRIPE_HANDLE, &sh->state);
clear_bit(STRIPE_DELAYED, &sh->state);

s.syncing = test_bit(STRIPE_SYNCING, &sh->state);
s.expanding = test_bit(STRIPE_EXPAND_SOURCE, &sh->state);
s.expanded = test_bit(STRIPE_EXPAND_READY, &sh->state);
/* Now to look around and see what can be done */

/* clean-up completed biofill operations */
if (test_bit(STRIPE_OP_BIOFILL, &sh->ops.complete)) {
	clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
	clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
	clear_bit(STRIPE_OP_BIOFILL, &sh->ops.complete);
}

rcu_read_lock();
for (i=disks; i--; ) {
	mdk_rdev_t *rdev;
	struct r5dev *dev = &sh->dev[i];
...

but it doesn't fix this bug.



Did that chunk starting with "clean-up completed biofill operations" end
up where it belongs? The patch with the big context moves it to a different
place from where the original one puts it when applied to 2.6.23...

Lately I've seen several problems where the context isn't enough to make
a patch apply properly when some offsets have changed. In some cases a
patch won't apply at all because two nearly-identical areas are being
changed and the first chunk gets applied where the second one should,
leaving nowhere for the second chunk to apply.


	I always apply this kind of patch by hand, not with the patch 
command. The last patch sent here seems to fix this bug:


gershwin:[/usr/scripts]  cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md7 : active raid1 sdi1[2] md_d0p1[0]
  1464725632 blocks [2/1] [U_]
  [=...]  recovery = 27.1% (396992504/1464725632) 
finish=1040.3min speed=17104K/sec


Resync done. The patch fixes this bug.

Regards,

JKB


Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-07 Thread BERTRAND Joël

Dan Williams wrote:

On Tue, 2007-11-06 at 03:19 -0700, BERTRAND Joël wrote:

Done. Here is the obtained output:


Much appreciated.

[ 1260.969314] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0
[ 1260.980606] check 5: state 0x6 toread  read  write f800ffcffcc0 written 
[ 1260.994808] check 4: state 0x6 toread  read  write f800fdd4e360 written 
[ 1261.009325] check 3: state 0x1 toread  read  write  written 
[ 1261.244478] check 2: state 0x1 toread  read  write  written 
[ 1261.270821] check 1: state 0x6 toread  read  write f800ff517e40 written 
[ 1261.312320] check 0: state 0x6 toread  read  write f800fd4cae60 written 
[ 1261.361030] locked=4 uptodate=2 to_read=0 to_write=4 failed=0 failed_num=0
[ 1261.443120] for sector 7629696, rmw=0 rcw=0

[..]

This looks as if the blocks were prepared to be written out, but were
never handled in ops_run_biodrain(), so they remain locked forever.  The
operations flags are all clear which means handle_stripe thinks nothing
else needs to be done.

The following patch, also attached, cleans up cases where the code looks
at sh->ops.pending when it should be looking at the consistent
stack-based snapshot of the operations flags.
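
As an illustration of the "consistent stack-based snapshot" idea described above, here is a minimal user-space C sketch (the type and field names are illustrative, not the raid5.c code): take one copy of the flags under the lock and make every later decision from that copy, so all checks in a single handling pass see the same view.

/* Sketch: decide from a snapshot taken under the lock, not from the
 * live, concurrently-modified field. */
#include <pthread.h>
#include <stdio.h>

struct stripe {
	pthread_mutex_t lock;
	unsigned long ops_pending;      /* live, mutable flags */
};

struct stripe_state {
	unsigned long ops_pending;      /* immutable copy for this pass */
};

static void handle_stripe_sketch(struct stripe *sh)
{
	struct stripe_state s;

	pthread_mutex_lock(&sh->lock);
	s.ops_pending = sh->ops_pending;        /* snapshot once */
	pthread_mutex_unlock(&sh->lock);

	/* From here on, test s.ops_pending, never sh->ops_pending. */
	if (s.ops_pending & 1UL)
		printf("would run the drain step for this stripe\n");
}

int main(void)
{
	struct stripe sh = { PTHREAD_MUTEX_INITIALIZER, 1UL };

	handle_stripe_sketch(&sh);
	return 0;
}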


	Thanks for this patch. I have been testing it for three hours, rebuilding 
a 1.5 TB raid1 array over iSCSI without any trouble.


gershwin:[/usr/scripts]  cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md7 : active raid1 sdi1[2] md_d0p1[0]
  1464725632 blocks [2/1] [U_]
  [=...]  recovery =  6.7% (99484736/1464725632) 
finish=1450.9min speed=15679K/sec


	Without your patch, I never reached 1%... I hope it fixes this bug; I 
shall report back when my raid1 volume has been resynchronized.


Regards,

JKB


Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-07 Thread BERTRAND Joël

Chuck Ebbert wrote:

On 11/05/2007 03:36 AM, BERTRAND Joël wrote:

Neil Brown wrote:

On Sunday November 4, [EMAIL PROTECTED] wrote:

# ps auxww | grep D
USER   PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root   273  0.0  0.0      0     0 ?        D    Oct21  14:40 [pdflush]
root   274  0.0  0.0      0     0 ?        D    Oct21  13:00 [pdflush]

After several days/weeks, this is the second time this has happened,
while doing regular file I/O (decompressing a file), everything on
the device went into D-state.

At a guess (I haven't looked closely) I'd say it is the bug that was
meant to be fixed by

commit 4ae3f847e49e3787eca91bced31f8fd328d50496

except that patch applied badly and needed to be fixed with
the following patch (not in git yet).
These have been sent to stable@ and should be in the queue for 2.6.23.2

My linux-2.6.23/drivers/md/raid5.c has contained your patch for a long
time:

...
spin_lock(&sh->lock);
clear_bit(STRIPE_HANDLE, &sh->state);
clear_bit(STRIPE_DELAYED, &sh->state);

s.syncing = test_bit(STRIPE_SYNCING, &sh->state);
s.expanding = test_bit(STRIPE_EXPAND_SOURCE, &sh->state);
s.expanded = test_bit(STRIPE_EXPAND_READY, &sh->state);
/* Now to look around and see what can be done */

/* clean-up completed biofill operations */
if (test_bit(STRIPE_OP_BIOFILL, &sh->ops.complete)) {
	clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
	clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
	clear_bit(STRIPE_OP_BIOFILL, &sh->ops.complete);
}

rcu_read_lock();
for (i=disks; i--; ) {
	mdk_rdev_t *rdev;
	struct r5dev *dev = &sh->dev[i];
...

but it doesn't fix this bug.



Did that chunk starting with "clean-up completed biofill operations" end
up where it belongs? The patch with the big context moves it to a different
place from where the original one puts it when applied to 2.6.23...

Lately I've seen several problems where the context isn't enough to make
a patch apply properly when some offsets have changed. In some cases a
patch won't apply at all because two nearly-identical areas are being
changed and the first chunk gets applied where the second one should,
leaving nowhere for the second chunk to apply.


	I always apply this kind of patch by hand, not with the patch command. 
The last patch sent here seems to fix this bug:


gershwin:[/usr/scripts]  cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md7 : active raid1 sdi1[2] md_d0p1[0]
  1464725632 blocks [2/1] [U_]
  [=...]  recovery = 27.1% (396992504/1464725632) 
finish=1040.3min speed=17104K/sec


Regards,

JKB


Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-06 Thread BERTRAND Joël

Done. Here is the obtained output:

[ 1260.967796] for sector 7629696, rmw=0 rcw=0
[ 1260.969314] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0
[ 1260.980606] check 5: state 0x6 toread  read  write f800ffcffcc0 written 
[ 1260.994808] check 4: state 0x6 toread  read  write f800fdd4e360 written 
[ 1261.009325] check 3: state 0x1 toread  read  write  written 
[ 1261.244478] check 2: state 0x1 toread  read  write  written 
[ 1261.270821] check 1: state 0x6 toread  read  write f800ff517e40 written 
[ 1261.312320] check 0: state 0x6 toread  read  write f800fd4cae60 written 
[ 1261.361030] locked=4 uptodate=2 to_read=0 to_write=4 failed=0 failed_num=0

[ 1261.443120] for sector 7629696, rmw=0 rcw=0
[ 1261.453348] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0
[ 1261.491538] check 5: state 0x6 toread  read  write f800ffcffcc0 written 
[ 1261.529120] check 4: state 0x6 toread  read  write f800fdd4e360 written 
[ 1261.560151] check 3: state 0x1 toread  read  write  written 
[ 1261.599180] check 2: state 0x1 toread  read  write  written 
[ 1261.637138] check 1: state 0x6 toread  read  write f800ff517e40 written 
[ 1261.674502] check 0: state 0x6 toread  read  write f800fd4cae60 written 
[ 1261.712589] locked=4 uptodate=2 to_read=0 to_write=4 failed=0 failed_num=0

[ 1261.864338] for sector 7629696, rmw=0 rcw=0
[ 1261.873475] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0
[ 1261.907840] check 5: state 0x6 toread  read  write f800ffcffcc0 written 
[ 1261.950770] check 4: state 0x6 toread  read  write f800fdd4e360 written 
[ 1261.989003] check 3: state 0x1 toread  read  write  written 
[ 1262.019621] check 2: state 0x1 toread  read  write  written 
[ 1262.068705] check 1: state 0x6 toread  read  write f800ff517e40 written 
[ 1262.113265] check 0: state 0x6 toread  read  write f800fd4cae60 written 
[ 1262.150511] locked=4 uptodate=2 to_read=0 to_write=4 failed=0 failed_num=0

[ 1262.171143] for sector 7629696, rmw=0 rcw=0
[ 1262.179142] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0
[ 1262.201905] check 5: state 0x6 toread  read  write f800ffcffcc0 written 
[ 1262.252750] check 4: state 0x6 toread  read  write f800fdd4e360 written 
[ 1262.289631] check 3: state 0x1 toread  read  write  written 
[ 1262.344709] check 2: state 0x1 toread  read  write  written 
[ 1262.400411] check 1: state 0x6 toread  read  write f800ff517e40 written 
[ 1262.437353] check 0: state 0x6 toread  read  write f800fd4cae60 written 
[ 1262.492561] locked=4 uptodate=2 to_read=0 to_write=4 failed=0 failed_num=0

[ 1262.524993] for sector 7629696, rmw=0 rcw=0
[ 1262.533314] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0
[ 1262.561900] check 5: state 0x6 toread  read  write f800ffcffcc0 written 
[ 1262.588986] check 4: state 0x6 toread  read  write f800fdd4e360 written 
[ 1262.619455] check 3: state 0x1 toread  read  write  written 
[ 1262.671006] check 2: state 0x1 toread  read  write  written 
[ 1262.709065] check 1: state 0x6 toread  read  write f800ff517e40 written 
[ 1262.746904] check 0: state 0x6 toread  read  write f800fd4cae60 written 
[ 1262.780203] locked=4 uptodate=2 to_read=0 to_write=4 failed=0 failed_num=0

[ 1262.805941] for sector 7629696, rmw=0 rcw=0
[ 1262.815759] 

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-06 Thread BERTRAND Joël

Justin Piszcz wrote:



On Tue, 6 Nov 2007, BERTRAND Joël wrote:


Done. Here is the obtained output:

[ 1265.899068] check 4: state 0x6 toread  read  write f800fdd4e360 written 
[ 1265.941328] check 3: state 0x1 toread  read  write  written 
[ 1265.972129] check 2: state 0x1 toread  read  write  written 



For information, after the crash, I have:

Root poulenc:[/sys/block]  cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md_d0 : active raid5 sdc1[0] sdh1[5] sdg1[4] sdf1[3] sde1[2] sdd1[1]
 1464725760 blocks level 5, 64k chunk, algorithm 2 [6/6] [UU]

Regards,

JKB


After the crash it is not 'resyncing' ?


No, it isn't...

JKB


Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-06 Thread BERTRAND Joël

Justin Piszcz wrote:



On Tue, 6 Nov 2007, BERTRAND Joël wrote:


Justin Piszcz wrote:



On Tue, 6 Nov 2007, BERTRAND Joël wrote:


Done. Here is the obtained output:

[ 1265.899068] check 4: state 0x6 toread  read  write f800fdd4e360 written 
[ 1265.941328] check 3: state 0x1 toread  read  write  written 
[ 1265.972129] check 2: state 0x1 toread  read  write  written 



For information, after the crash, I have:

Root poulenc:[/sys/block]  cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md_d0 : active raid5 sdc1[0] sdh1[5] sdg1[4] sdf1[3] sde1[2] sdd1[1]
 1464725760 blocks level 5, 64k chunk, algorithm 2 [6/6] [UU]

Regards,

JKB


After the crash it is not 'resyncing' ?


No, it isn't...

JKB



After any crash/unclean shutdown the RAID should resync; if it doesn't, 
that's not good. I'd suggest running a raid check.


The 'repair' is supposed to clean it; in some cases (md0=swap) it gets 
dirty again.


Tue May  8 09:19:54 EDT 2007: Executing RAID health check for /dev/md0...
Tue May  8 09:19:55 EDT 2007: Executing RAID health check for /dev/md1...
Tue May  8 09:19:56 EDT 2007: Executing RAID health check for /dev/md2...
Tue May  8 09:19:57 EDT 2007: Executing RAID health check for /dev/md3...
Tue May  8 10:09:58 EDT 2007: cat /sys/block/md0/md/mismatch_cnt
Tue May  8 10:09:58 EDT 2007: 2176
Tue May  8 10:09:58 EDT 2007: cat /sys/block/md1/md/mismatch_cnt
Tue May  8 10:09:58 EDT 2007: 0
Tue May  8 10:09:58 EDT 2007: cat /sys/block/md2/md/mismatch_cnt
Tue May  8 10:09:58 EDT 2007: 0
Tue May  8 10:09:58 EDT 2007: cat /sys/block/md3/md/mismatch_cnt
Tue May  8 10:09:58 EDT 2007: 0
Tue May  8 10:09:58 EDT 2007: The meta-device /dev/md0 has 2176 
mismatched sectors.

Tue May  8 10:09:58 EDT 2007: Executing repair on /dev/md0
Tue May  8 10:09:59 EDT 2007: The meta-device /dev/md1 has no mismatched 
sectors.
Tue May  8 10:10:00 EDT 2007: The meta-device /dev/md2 has no mismatched 
sectors.
Tue May  8 10:10:01 EDT 2007: The meta-device /dev/md3 has no mismatched 
sectors.

Tue May  8 10:20:02 EDT 2007: All devices are clean...
Tue May  8 10:20:02 EDT 2007: cat /sys/block/md0/md/mismatch_cnt
Tue May  8 10:20:02 EDT 2007: 2176
Tue May  8 10:20:02 EDT 2007: cat /sys/block/md1/md/mismatch_cnt
Tue May  8 10:20:02 EDT 2007: 0
Tue May  8 10:20:02 EDT 2007: cat /sys/block/md2/md/mismatch_cnt
Tue May  8 10:20:02 EDT 2007: 0
Tue May  8 10:20:02 EDT 2007: cat /sys/block/md3/md/mismatch_cnt
Tue May  8 10:20:02 EDT 2007: 0


	I cannot repair this raid volume. I cannot reboot the server without 
sending Stop+A. init 6 stops at INIT:. After reboot, md0 is 
resynchronized.


Regards,

JKB


Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-05 Thread BERTRAND Joël

Neil Brown wrote:

On Sunday November 4, [EMAIL PROTECTED] wrote:

# ps auxww | grep D
USER   PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root   273  0.0  0.0      0     0 ?        D    Oct21  14:40 [pdflush]
root   274  0.0  0.0      0     0 ?        D    Oct21  13:00 [pdflush]

After several days/weeks, this is the second time this has happened, while 
doing regular file I/O (decompressing a file), everything on the device 
went into D-state.


At a guess (I haven't looked closely) I'd say it is the bug that was
meant to be fixed by

commit 4ae3f847e49e3787eca91bced31f8fd328d50496

except that patch applied badly and needed to be fixed with
the following patch (not in git yet).
These have been sent to stable@ and should be in the queue for 2.6.23.2


My linux-2.6.23/drivers/md/raid5.c has contained your patch for a long time:

...
spin_lock(&sh->lock);
clear_bit(STRIPE_HANDLE, &sh->state);
clear_bit(STRIPE_DELAYED, &sh->state);

s.syncing = test_bit(STRIPE_SYNCING, &sh->state);
s.expanding = test_bit(STRIPE_EXPAND_SOURCE, &sh->state);
s.expanded = test_bit(STRIPE_EXPAND_READY, &sh->state);
/* Now to look around and see what can be done */

/* clean-up completed biofill operations */
if (test_bit(STRIPE_OP_BIOFILL, &sh->ops.complete)) {
	clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
	clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
	clear_bit(STRIPE_OP_BIOFILL, &sh->ops.complete);
}

rcu_read_lock();
for (i=disks; i--; ) {
	mdk_rdev_t *rdev;
	struct r5dev *dev = &sh->dev[i];
...

but it doesn't fix this bug.

Regards,

JKB


Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-04 Thread BERTRAND Joël

Justin Piszcz wrote:

# ps auxww | grep D
USER   PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root   273  0.0  0.0      0     0 ?        D    Oct21  14:40 [pdflush]
root   274  0.0  0.0      0     0 ?        D    Oct21  13:00 [pdflush]

After several days/weeks, this is the second time this has happened, 
while doing regular file I/O (decompressing a file), everything on the 
device went into D-state.


	Same observation here (kernel 2.6.23). I can see this bug when I try to 
synchronize a raid1 volume over iSCSI (each element is a raid5 volume), 
or sometimes only with a 1.5 TB raid5 volume. When this bug occurs, the md 
subsystem eats 100% of one CPU and pdflush remains in D state too. What 
is your architecture? I use two 32-thread T1000s (sparc64), and I'm 
trying to determine if this bug is arch-specific.


Regards,

JKB


Re: Strange CPU occupation... and system hangs

2007-11-01 Thread BERTRAND Joël

BERTRAND Joël wrote:

snip


and some process are in D state :
Root gershwin:[/etc]  ps auwx | grep D
USER   PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root   270  0.0  0.0      0     0 ?        D    Oct27   1:17 [pdflush]
root  3676  0.9  0.0      0     0 ?        D    Oct27  56:03 [nfsd]
root  5435  0.0  0.0      0     0 ?        D    Oct27   3:16 [md7_raid1]
root  5438  0.0  0.0      0     0 ?        D    Oct27   1:01 [kjournald]
root  5440  0.0  0.0      0     0 ?        D    Oct27   0:33 [loop0]
root  5441  0.0  0.0      0     0 ?        D    Oct27   0:05 [kjournald]
root 16442  0.0  0.0  20032  1208 pts/2    D+   13:23   0:00 iftop -i eth2


Why is md7_raid1 in D state? Same question for iftop.


	Some bad news... After ten or eleven hours, the kernel crashed on this 
server. The last top screen was:


top - 04:59:46 up 4 days, 16:24,  3 users,  load average: 19.72, 19.22, 19.05
Tasks: 285 total,   5 running, 279 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.0%us,  4.2%sy,  0.0%ni, 68.5%id, 27.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4139024k total,  4130800k used,     8224k free,    38984k buffers
Swap:  7815536k total,      304k used,  7815232k free,    79056k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5426 root      15  -5     0    0    0 R  100  0.0 970:17.21 md_d0_raid5
26923 root      20   0  3120 1568 1112 R    2  0.0  13:32.24 top


...

	I have rebooted. I don't have any messages in the log files. I don't have 
a screen, but I haven't seen anything on the serial console. In kern.log, I 
have:
Oct 31 15:36:15 gershwin kernel: swapper: page allocation failure. order:2, mode:0x4020
Oct 31 15:36:15 gershwin kernel: Call Trace:
Oct 31 15:36:15 gershwin kernel:  [004b6568] __slab_alloc+0x1b0/0x720
Oct 31 15:36:15 gershwin kernel:  [004b87a8] __kmalloc_track_caller+0xb0/0xe0
Oct 31 15:36:15 gershwin kernel:  [00601d68] __alloc_skb+0x50/0x120
Oct 31 15:36:15 gershwin kernel:  [00642ee0] tcp_collapse+0x1e8/0x440
Oct 31 15:36:15 gershwin kernel:  [00643298] tcp_prune_queue+0x160/0x3a0
Oct 31 15:36:15 gershwin kernel:  [00643d08] tcp_data_queue+0x830/0xde0
Oct 31 15:36:15 gershwin kernel:  [00645d74] tcp_rcv_established+0x35c/0x840
Oct 31 15:36:15 gershwin kernel:  [0064cf7c] tcp_v4_do_rcv+0xe4/0x4a0
Oct 31 15:36:15 gershwin kernel:  [0064fdd8] tcp_v4_rcv+0xb00/0xb20
Oct 31 15:36:15 gershwin kernel:  [0062e2ac] ip_local_deliver+0x194/0x3a0
Oct 31 15:36:15 gershwin kernel:  [0062dd98] ip_rcv+0x360/0x6e0
Oct 31 15:36:15 gershwin kernel:  [00607f64] netif_receive_skb+0x1ec/0x480
Oct 31 15:36:15 gershwin kernel:  [005a5fe0] tg3_poll+0x6c8/0xc40
Oct 31 15:36:15 gershwin kernel:  [0060a940] net_rx_action+0x88/0x160
Oct 31 15:36:15 gershwin kernel:  [00468078] __do_softirq+0x80/0x100
Oct 31 15:36:15 gershwin kernel:  [0046815c] do_softirq+0x64/0x80
Oct 31 15:36:15 gershwin kernel: Mem-info:
Oct 31 15:36:15 gershwin kernel: Normal per-cpu:
Oct 31 15:36:15 gershwin kernel: CPU0: Hot: hi:   90, btch:  15 usd:  15   Cold: hi:   30, btch:   7 usd:   5
Oct 31 15:36:15 gershwin kernel: CPU1: Hot: hi:   90, btch:  15 usd:  31   Cold: hi:   30, btch:   7 usd:   4
Oct 31 15:36:15 gershwin kernel: CPU2: Hot: hi:   90, btch:  15 usd:   4   Cold: hi:   30, btch:   7 usd:   3
Oct 31 15:36:15 gershwin kernel: CPU3: Hot: hi:   90, btch:  15 usd:  82   Cold: hi:   30, btch:   7 usd:   2
Oct 31 15:36:15 gershwin kernel: CPU4: Hot: hi:   90, btch:  15 usd:  84   Cold: hi:   30, btch:   7 usd:   0
Oct 31 15:36:15 gershwin kernel: CPU5: Hot: hi:   90, btch:  15 usd:  65   Cold: hi:   30, btch:   7 usd:   4
Oct 31 15:36:15 gershwin kernel: CPU6: Hot: hi:   90, btch:  15 usd:  85   Cold: hi:   30, btch:   7 usd:   6
Oct 31 15:36:15 gershwin kernel: CPU7: Hot: hi:   90, btch:  15 usd:  69   Cold: hi:   30, btch:   7 usd:   4
Oct 31 15:36:15 gershwin kernel: CPU8: Hot: hi:   90, btch:  15 usd:  11   Cold: hi:   30, btch:   7 usd:   5
Oct 31 15:36:15 gershwin kernel: CPU9: Hot: hi:   90, btch:  15 usd:  75   Cold: hi:   30, btch:   7 usd:   1
Oct 31 15:36:15 gershwin kernel: CPU   10: Hot: hi:   90, btch:  15 usd:  84   Cold: hi:   30, btch:   7 usd:   2
Oct 31 15:36:15 gershwin kernel: CPU   11: Hot: hi:   90, btch:  15 usd:  13   Cold: hi:   30, btch:   7 usd:   1
Oct 31 15:36:15 gershwin kernel: CPU   12: Hot: hi:   90, btch:  15 usd:  17   Cold: hi:   30, btch:   7 usd:  23
Oct 31 15:36:15 gershwin kernel: CPU   13: Hot: hi:   90, btch:  15 usd:   7   Cold: hi:   30, btch:   7 usd:  25
Oct 31 15:36:15 gershwin kernel: CPU   14: Hot: hi:   90, btch:  15 usd:  64   Cold: hi:   30, btch:   7 usd:  27
Oct 31 15:36:15 gershwin kernel: CPU   15: Hot: hi:   90, btch:  15 usd:  12   Cold: hi:   30, btch:   7 usd:   6
Oct 31 15:36:15 gershwin kernel: CPU   16

Strange CPU occupation...

2007-10-31 Thread BERTRAND Joël

Hello,

	I'm looking for a bug in the iSCSI target code, but this morning I found 
a new bug that is certainly related to mine...


Please consider these raid volumes:
Root gershwin:[/etc]  cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md7 : active raid1 sdi1[2](F) md_d0p1[0]
  1464725632 blocks [2/1] [U_]

md_d0 : active raid5 sdc1[0] sdh1[5] sdg1[4] sdf1[3] sde1[2] sdd1[1]
  1464725760 blocks level 5, 64k chunk, algorithm 2 [6/6] [UU]

md6 : active raid1 sda1[0] sdb1[1]
  7815552 blocks [2/2] [UU]

md5 : active raid1 sda8[0] sdb8[1]
  14538752 blocks [2/2] [UU]

md4 : active raid1 sda7[0] sdb7[1]
  4883648 blocks [2/2] [UU]

md3 : active raid1 sda6[0] sdb6[1]
  9767424 blocks [2/2] [UU]

md2 : active raid1 sda5[0] sdb5[1]
  29294400 blocks [2/2] [UU]

md1 : active raid1 sda2[0] sdb2[1]
  489856 blocks [2/2] [UU]

md0 : active raid1 sdb4[1] sda4[0]
  4883648 blocks [2/2] [UU]

unused devices: <none>
Root gershwin:[/etc] 

md7 only has one disk because I cannot synchronize it over iSCSI. But, 
without any message, the load average of this server (a 24-thread T1000) 
increases to more than 9. top returns:

top - 13:36:08 up 4 days,  1:00,  3 users,  load average: 9.23, 8.46, 6.26
Tasks: 252 total,   5 running, 246 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.0%us,  4.2%sy,  0.0%ni, 87.4%id,  8.4%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4139024k total,  4115920k used,    23104k free,   743976k buffers
Swap:  7815536k total,      304k used,  7815232k free,  2188048k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5426 root      15  -5     0    0    0 R  100  0.0  46:32.54 md_d0_raid5
17215 root      20   0  3120 1552 1112 R    1  0.0   0:01.38 top
    1 root      20   0  2576  960  816 S    0  0.0   0:09.74 init
    2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root      RT  -5     0    0    0 S    0  0.0   0:00.18 migration/0
    4 root      15  -5     0    0    0 S    0  0.0   0:00.18 ksoftirqd/0


and some process are in D state :
Root gershwin:[/etc]  ps auwx | grep D
USER   PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root   270  0.0  0.0      0     0 ?        D    Oct27   1:17 [pdflush]
root  3676  0.9  0.0      0     0 ?        D    Oct27  56:03 [nfsd]
root  5435  0.0  0.0      0     0 ?        D    Oct27   3:16 [md7_raid1]
root  5438  0.0  0.0      0     0 ?        D    Oct27   1:01 [kjournald]
root  5440  0.0  0.0      0     0 ?        D    Oct27   0:33 [loop0]
root  5441  0.0  0.0      0     0 ?        D    Oct27   0:05 [kjournald]
root 16442  0.0  0.0  20032  1208 pts/2    D+   13:23   0:00 iftop -i eth2


Why is md7_raid1 in D state? Same question for iftop.

Regards,

JKB


Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-29 Thread BERTRAND Joël

Ming Zhang wrote:

Off topic: could you resubmit the alignment-issue patch to the list and see
if tomof accepts it? He needs a patch inlined in the email. It was found and
fixed by you, so it is better that you post it (instead of me). Thanks.


diff -u kernel.old/iscsi.c kernel/iscsi.c
--- kernel.old/iscsi.c  2007-10-29 09:49:16.0 +0100
+++ kernel/iscsi.c  2007-10-17 11:19:14.0 +0200
@@ -726,13 +726,26 @@
 	case READ_10:
 	case WRITE_10:
 	case WRITE_VERIFY:
-		*off = be32_to_cpu(*(u32 *)&cmd[2]);
+		*off = be32_to_cpu((((u32) cmd[2]) << 24) |
+				   (((u32) cmd[3]) << 16) |
+				   (((u32) cmd[4]) << 8) |
+				   cmd[5]);
 		*len = (cmd[7] << 8) + cmd[8];
 		break;
 	case READ_16:
 	case WRITE_16:
-		*off = be64_to_cpu(*(u64 *)&cmd[2]);
-		*len = be32_to_cpu(*(u32 *)&cmd[10]);
+		*off = be32_to_cpu((((u64) cmd[2]) << 56) |
+				   (((u64) cmd[3]) << 48) |
+				   (((u64) cmd[4]) << 40) |
+				   (((u64) cmd[5]) << 32) |
+				   (((u64) cmd[6]) << 24) |
+				   (((u64) cmd[7]) << 16) |
+				   (((u64) cmd[8]) << 8) |
+				   cmd[9]);
+		*len = be32_to_cpu((((u32) cmd[10]) << 24) |
+				   (((u32) cmd[11]) << 16) |
+				   (((u32) cmd[12]) << 8) |
+				   cmd[13]);
 		break;
 	default:
 		BUG();
diff -u kernel.old/target_disk.c kernel/target_disk.c
--- kernel.old/target_disk.c    2007-10-29 09:49:16.0 +0100
+++ kernel/target_disk.c    2007-10-17 16:04:06.0 +0200
@@ -66,13 +66,15 @@
 	unsigned char geo_m_pg[] = {0x04, 0x16, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00,
 				    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
 				    0x00, 0x00, 0x00, 0x00, 0x3a, 0x98, 0x00, 0x00};
-	u32 ncyl, *p;
+	u32 ncyl;
+	u32 n;
 
 	/* assume 0xff heads, 15krpm. */
 	memcpy(ptr, geo_m_pg, sizeof(geo_m_pg));
 	ncyl = sec >> 14; /* 256 * 64 */
-	p = (u32 *)(ptr + 1);
-	*p = *p | cpu_to_be32(ncyl);
+	memcpy(&n, ptr + 1, sizeof(u32));
+	n = n | cpu_to_be32(ncyl);
+	memcpy(ptr + 1, &n, sizeof(u32));
 	return sizeof(geo_m_pg);
 }
 
@@ -249,7 +251,10 @@
 	struct iet_volume *lun;
 	int rest, idx = 0;
 
-	size = be32_to_cpu(*(u32 *)&req->scb[6]);
+	size = be32_to_cpu((((u32) req->scb[6]) << 24) |
+			   (((u32) req->scb[7]) << 16) |
+			   (((u32) req->scb[8]) << 8) |
+			   req->scb[9]);
 	if (size < 16)
 		return -1;
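
For anyone hitting the same unaligned-access problem, here is a small, self-contained sketch of the byte-by-byte technique the patch above uses; the helper name and the sample CDB are made up for the example and are not IET code. Assembling the value from individual bytes avoids dereferencing a misaligned u32/u64 pointer (which traps on sparc64) and is endian-independent.

#include <stdint.h>
#include <stdio.h>

/* Read a big-endian 32-bit value from an arbitrarily aligned buffer. */
static uint32_t get_be32_sketch(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) |
	       ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |
	        (uint32_t)p[3];
}

int main(void)
{
	/* Example READ_10 CDB: the LBA starts at byte 2, which is not
	 * 4-byte aligned relative to the start of the command block. */
	uint8_t cdb[10] = { 0x28, 0, 0x00, 0x00, 0x00, 0x10, 0, 0x00, 0x08, 0 };
	uint32_t lba = get_be32_sketch(&cdb[2]);
	uint32_t len = ((uint32_t)cdb[7] << 8) | cdb[8];

	printf("lba=%u len=%u\n", (unsigned)lba, (unsigned)len);
	return 0;
}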

Regards,

JKB


Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-27 Thread BERTRAND Joël

Dan Williams wrote:

On 10/24/07, BERTRAND Joël [EMAIL PROTECTED] wrote:

Hello,

Any news about this trouble? Any idea? I'm trying to fix it, but I
don't see any specific interaction between raid5 and istd. Has anyone
tried to reproduce this bug on an arch other than sparc64? I only use
sparc32 and 64 servers and I cannot test on other archs. Of course, I
have a laptop, but I cannot create a raid5 array on its internal HD to
test this configuration ;-)



Can you collect some oprofile data, as Ming suggested, so we can maybe
see what md_d0_raid5 and istd1 are fighting about?  Hopefully it is as
painless to run on sparc as it is on IA:

opcontrol --start --vmlinux=/path/to/vmlinux
wait
opcontrol --stop
opreport --image-path=/lib/modules/`uname -r` -l


Done.

Profiling through timer interrupt
samples  %        image name        app name          symbol name
20028038 92.9510  vmlinux-2.6.23    vmlinux-2.6.23    cpu_idle
1198566   5.5626  vmlinux-2.6.23    vmlinux-2.6.23    schedule
41558     0.1929  vmlinux-2.6.23    vmlinux-2.6.23    yield
34791     0.1615  vmlinux-2.6.23    vmlinux-2.6.23    NGmemcpy
18417     0.0855  vmlinux-2.6.23    vmlinux-2.6.23    xor_niagara_5
17430     0.0809  raid456           raid456           (no symbols)
15837     0.0735  vmlinux-2.6.23    vmlinux-2.6.23    sys_sched_yield
14860     0.0690  iscsi_trgt.ko     iscsi_trgt        istd
12705     0.0590  nf_conntrack      nf_conntrack      (no symbols)
9236      0.0429  libc-2.6.1.so     libc-2.6.1.so     (no symbols)
9034      0.0419  vmlinux-2.6.23    vmlinux-2.6.23    xor_niagara_2
6534      0.0303  oprofiled         oprofiled         (no symbols)
6149      0.0285  vmlinux-2.6.23    vmlinux-2.6.23    scsi_request_fn
5947      0.0276  ip_tables         ip_tables         (no symbols)
4510      0.0209  vmlinux-2.6.23    vmlinux-2.6.23    dma_4v_map_single
3823      0.0177  vmlinux-2.6.23    vmlinux-2.6.23    __make_request
3326      0.0154  vmlinux-2.6.23    vmlinux-2.6.23    tg3_poll
3162      0.0147  iscsi_trgt.ko     iscsi_trgt        scsi_cmnd_exec
3091      0.0143  vmlinux-2.6.23    vmlinux-2.6.23    scsi_dispatch_cmd
2849      0.0132  vmlinux-2.6.23    vmlinux-2.6.23    tcp_v4_rcv
2811      0.0130  vmlinux-2.6.23    vmlinux-2.6.23    nf_iterate
2729      0.0127  vmlinux-2.6.23    vmlinux-2.6.23    _spin_lock_bh
2551      0.0118  vmlinux-2.6.23    vmlinux-2.6.23    kfree
2467      0.0114  vmlinux-2.6.23    vmlinux-2.6.23    kmem_cache_free
2314      0.0107  vmlinux-2.6.23    vmlinux-2.6.23    atomic_add
2065      0.0096  vmlinux-2.6.23    vmlinux-2.6.23    NGbzero_loop
1826      0.0085  vmlinux-2.6.23    vmlinux-2.6.23    ip_rcv
1823      0.0085  nf_conntrack_ipv4 nf_conntrack_ipv4 (no symbols)
1822      0.0085  vmlinux-2.6.23    vmlinux-2.6.23    clear_bit
1767      0.0082  python2.4         python2.4         (no symbols)
1734      0.0080  vmlinux-2.6.23    vmlinux-2.6.23    atomic_sub_ret
1694      0.0079  vmlinux-2.6.23    vmlinux-2.6.23    tcp_rcv_established
1673      0.0078  vmlinux-2.6.23    vmlinux-2.6.23    tcp_recvmsg
1670      0.0078  vmlinux-2.6.23    vmlinux-2.6.23    netif_receive_skb
1668      0.0077  vmlinux-2.6.23    vmlinux-2.6.23    set_bit
1545      0.0072  vmlinux-2.6.23    vmlinux-2.6.23    __kmalloc_track_caller
1526      0.0071  iptable_nat       iptable_nat       (no symbols)
1526      0.0071  vmlinux-2.6.23    vmlinux-2.6.23    kmem_cache_alloc
1373      0.0064  vmlinux-2.6.23    vmlinux-2.6.23    generic_unplug_device

...

Is it enough ?

Regards,

JKB


Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-27 Thread BERTRAND Joël

Dan Williams wrote:

On 10/27/07, BERTRAND Joël [EMAIL PROTECTED] wrote:

Dan Williams wrote:

Can you collect some oprofile data, as Ming suggested, so we can maybe
see what md_d0_raid5 and istd1 are fighting about?  Hopefully it is as
painless to run on sparc as it is on IA:

opcontrol --start --vmlinux=/path/to/vmlinux
wait
opcontrol --stop
opreport --image-path=/lib/modules/`uname -r` -l

Done.



[..]


Is it enough ?


I would expect md_d0_raid5 and istd1 to show up pretty high in the
list if they are constantly pegged at a 100% CPU utilization like you
showed in the failure case.  Maybe this was captured after the target
has disconnected?


	No, I launched opcontrol before starting the raid1 creation and stopped 
it after the disconnection. Don't forget that this server has 32 CPUs.


Regards,

JKB


Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-24 Thread BERTRAND Joël

Hello,

	Any news about this trouble? Any idea? I'm trying to fix it, but I 
don't see any specific interaction between raid5 and istd. Has anyone 
tried to reproduce this bug on an arch other than sparc64? I only use 
sparc32 and 64 servers and I cannot test on other archs. Of course, I 
have a laptop, but I cannot create a raid5 array on its internal HD to 
test this configuration ;-)


	Please note that I won't read my mail until next Saturday morning 
(CEST).


After disconnection of the iSCSI target:

Tasks: 232 total,   7 running, 224 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.0%us, 15.2%sy,  0.0%ni, 84.3%id,  0.0%wa,  0.1%hi,  0.3%si,  0.0%st
Mem:   4139032k total,  4127584k used,    11448k free,    95752k buffers
Swap:  7815536k total,        0k used,  7815536k free,  3758792k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9738 root      15  -5     0    0    0 R  100  0.0   4:56.82 md_d0_raid5
 9774 root      15  -5     0    0    0 R  100  0.0   5:52.41 istd1
 9739 root      15  -5     0    0    0 R   14  0.0   0:28.90 md_d0_resync
 9916 root      20   0  3248 1544 1120 R    2  0.0   0:00.56 top
 4129 root      20   0 41648 5024 2432 S    0  0.1   2:56.17 fail2ban-server
    1 root      20   0  2576  960  816 S    0  0.0   0:01.58 init
    2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/0
    4 root      15  -5     0    0    0 S    0  0.0   0:00.02 ksoftirqd/0
    5 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/1
    6 root      15  -5     0    0    0 S    0  0.0   0:00.00 ksoftirqd/1


Regards,

JKB


Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-20 Thread BERTRAND Joël

Bill Davidsen wrote:

BERTRAND Joël wrote:


	Sorry for this last mail. I have found another problem, but I 
don't know if this bug comes from iscsi-target or raid5 itself. The iSCSI 
target is disconnected because the istd1 and md_d0_raid5 kernel threads 
each use 100% of a CPU!


Tasks: 235 total,   6 running, 227 sleeping,   0 stopped,   2 zombie
Cpu(s):  0.1%us, 12.5%sy,  0.0%ni, 87.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4139032k total,   218424k used,  3920608k free,    10136k buffers
Swap:  7815536k total,        0k used,  7815536k free,    64808k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5824 root      15  -5     0    0    0 R  100  0.0  10:34.25 istd1
 5599 root      15  -5     0    0    0 R  100  0.0   7:25.43 md_d0_raid5


Given that the summary shows 87.4% idle, something is not right. You 
might try another tool, like vmstat, to at least verify the way the CPU 
is being used. When you can't trust what your tools tell you it gets 
really hard to make decisions based on the data.


Don't forget this box is a 32-CPU server.

JKB


Re: [Iscsitarget-devel] Abort Task ?

2007-10-19 Thread BERTRAND Joël

Ming Zhang wrote:


As Ross pointed out, many I/O patterns only have one outstanding I/O at any
time, so there is only one work thread actively serving it, and it cannot
exploit the multiple cores here.


Do you see 100% with nullio or fileio? With disk, most of the time should be
spent in iowait and CPU utilization should not be high at all.
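
To make the "one outstanding I/O" point concrete, here is a tiny C sketch of the pattern (it writes to /dev/null purely to stay runnable; the sizes are arbitrary): each 8 KiB write must complete before the next one is issued, so at any instant only one target worker thread has anything to do, however many cores the box has.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[8192] = { 0 };
	int fd = open("/dev/null", O_WRONLY);

	if (fd < 0)
		return 1;
	for (int i = 0; i < 1000; i++) {
		/* queue depth 1: issue, wait for completion, repeat */
		if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf)
			break;
	}
	close(fd);
	printf("issued 1000 sequential 8 KiB writes, one at a time\n");
	return 0;
}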


With both nullio and fileio...


Re: [Iscsitarget-devel] Abort Task ?

2007-10-19 Thread BERTRAND Joël

Ming Zhang wrote:

On Fri, 2007-10-19 at 09:48 +0200, BERTRAND Joël wrote:

Ross S. W. Walker wrote:

BERTRAND Joël wrote:

BERTRAND Joël wrote:
I can format (mkfs.ext3) a 1.5 TB volume over iSCSI several times
without any trouble. I can read and write on this virtual disk without
any trouble.

Now, I have configured ietd with :

Lun 0 Sectors=1464725758,Type=nullio

and I run on initiator side :

Root gershwin:[/dev]  dd if=/dev/zero of=/dev/sdj bs=8192
479482+0 records in
479482+0 records out
3927916544 bytes (3.9 GB) copied, 153.222 seconds, 25.6 MB/s

Root gershwin:[/dev]  dd if=/dev/zero of=/dev/sdj bs=8192

I'm waiting for a crash. None so far as I write these lines. I suspect
an interaction between raid and iscsi.

I simultaneously run:

Root gershwin:[/dev]  dd if=/dev/zero of=/dev/sdj bs=8192
8397210+0 records in
8397210+0 records out
68789944320 bytes (69 GB) copied, 2732.55 seconds, 25.2 MB/s

and

Root gershwin:[~]  dd if=/dev/sdj of=/dev/null bs=8192
739200+0 records in
739199+0 records out
6055518208 bytes (6.1 GB) copied, 447.178 seconds, 13.5 MB/s

without any trouble.

The speed can definitely be improved. Look at your network setup
and use ping to try and get the network latency to a minimum.

# ping -A -s 8192 172.16.24.140

--- 172.16.24.140 ping statistics ---
14058 packets transmitted, 14057 received, 0% packet loss, time 9988ms
rtt min/avg/max/mdev = 0.234/0.268/2.084/0.041 ms, ipg/ewma 0.710/0.260 ms

gershwin:[~]  ping -A -s 8192 192.168.0.2
PING 192.168.0.2 (192.168.0.2) 8192(8220) bytes of data.
8200 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=0.693 ms
8200 bytes from 192.168.0.2: icmp_seq=2 ttl=64 time=0.595 ms
8200 bytes from 192.168.0.2: icmp_seq=3 ttl=64 time=0.583 ms
8200 bytes from 192.168.0.2: icmp_seq=4 ttl=64 time=0.589 ms
8200 bytes from 192.168.0.2: icmp_seq=5 ttl=64 time=0.580 ms
8200 bytes from 192.168.0.2: icmp_seq=6 ttl=64 time=0.594 ms
8200 bytes from 192.168.0.2: icmp_seq=7 ttl=64 time=0.580 ms
8200 bytes from 192.168.0.2: icmp_seq=8 ttl=64 time=0.592 ms
8200 bytes from 192.168.0.2: icmp_seq=9 ttl=64 time=0.589 ms
8200 bytes from 192.168.0.2: icmp_seq=10 ttl=64 time=0.571 ms
8200 bytes from 192.168.0.2: icmp_seq=11 ttl=64 time=0.588 ms
8200 bytes from 192.168.0.2: icmp_seq=12 ttl=64 time=0.580 ms
8200 bytes from 192.168.0.2: icmp_seq=13 ttl=64 time=0.587 ms

--- 192.168.0.2 ping statistics ---
13 packets transmitted, 13 received, 0% packet loss, time 2400ms
rtt min/avg/max/mdev = 0.571/0.593/0.693/0.044 ms, ipg/ewma 200.022/0.607 ms
gershwin:[~] 

	Both initiator and target are alone on a gigabit NIC (Tigon3). On the 
target server, istd1 takes 100% of a CPU (and only one CPU, even though my 
T1000 can run 32 threads simultaneously). I think the limitation comes 
from istd1.


Usually istdx will not take 100% CPU with a 1G network, especially when
using disk as backing storage; some kind of profiling work might be helpful
to tell what happened...

Forgot to ask: what is your sparc64 platform's CPU spec?


Root gershwin:[/mnt/solaris]  cat /proc/cpuinfo
cpu : UltraSparc T1 (Niagara)
fpu : UltraSparc T1 integrated FPU
prom: OBP 4.23.4 2006/08/04 20:45
type: sun4v
ncpus probed: 24
ncpus active: 24
D$ parity tl1   : 0
I$ parity tl1   : 0

Both servers are built with 1 GHz T1 processors (6 cores, 24 threads).

Regards,

JKB


Re: [BUG] Raid5 trouble

2007-10-19 Thread BERTRAND Joël

Bill Davidsen wrote:

Dan Williams wrote:

On Fri, 2007-10-19 at 01:04 -0700, BERTRAND Joël wrote:
  

I run for 12 hours some dd's (read and write in nullio)
between
initiator and target without any disconnection. Thus iSCSI code seems
to
be robust. Both initiator and target are alone on a single gigabit
ethernet link (without any switch). I'm investigating...



Can you reproduce on 2.6.22?

Also, I do not think this is the cause of your failure, but you have
CONFIG_DMA_ENGINE=y in your config.  Setting this to 'n' will compile
out the unneeded checks for offload engines in async_memcpy and
async_xor.


Given that offload engines are far less tested code, I think this is a 
very good thing to try!


	I'm trying without CONFIG_DMA_ENGINE=y. istd1 only uses 40% of one CPU 
while I rebuild my raid1 array. 1% of the array has now been resynchronized 
without any hang.


Root gershwin:[/usr/scripts]  cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md7 : active raid1 sdi1[2] md_d0p1[0]
  1464725632 blocks [2/1] [U_]
  []  recovery =  1.0% (15705536/1464725632) 
finish=1103.9min speed=21875K/sec


Regards,

JKB


Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-19 Thread BERTRAND Joël

BERTRAND Joël wrote:

Bill Davidsen wrote:

Dan Williams wrote:

On Fri, 2007-10-19 at 01:04 -0700, BERTRAND Joël wrote:
 

I run for 12 hours some dd's (read and write in nullio)
between
initiator and target without any disconnection. Thus iSCSI code seems
to
be robust. Both initiator and target are alone on a single gigabit
ethernet link (without any switch). I'm investigating...



Can you reproduce on 2.6.22?

Also, I do not think this is the cause of your failure, but you have
CONFIG_DMA_ENGINE=y in your config.  Setting this to 'n' will compile
out the unneeded checks for offload engines in async_memcpy and
async_xor.


Given that offload engines are far less tested code, I think this is a 
very good thing to try!


I'm trying without CONFIG_DMA_ENGINE=y. istd1 only uses 40% of one 
CPU while I rebuild my raid1 array. 1% of the array has now been 
resynchronized without any hang.


Root gershwin:[/usr/scripts]  cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md7 : active raid1 sdi1[2] md_d0p1[0]
  1464725632 blocks [2/1] [U_]
  []  recovery =  1.0% (15705536/1464725632) 
finish=1103.9min speed=21875K/sec


Same result...

connection2:0: iscsi: detected conn error (1011)

 session2: iscsi: session recovery timed out after 120 secs
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery

Regards,

JKB


Re: [BUG] Raid5 trouble

2007-10-19 Thread BERTRAND Joël

Bill Davidsen wrote:

Dan Williams wrote:

I found a problem which may lead to the operations count dropping
below zero.  If ops_complete_biofill() gets preempted in between the
following calls:

raid5.c:554 clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
raid5.c:555 clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);

...then get_stripe_work() can recount/re-acknowledge STRIPE_OP_BIOFILL
causing the assertion.  In fact, the 'pending' bit should always be
cleared first, but the other cases are protected by
spin_lock(&sh->lock).  Patch attached.
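
A user-space sketch of the window Dan describes (this is not the kernel fix itself; the structure and field names are only illustrative): if the ack bit is cleared before the pending bit and the work-collection step runs in between, the already-completed operation is counted again.

#include <stdio.h>

#define STRIPE_OP_BIOFILL (1u << 0)

struct ops_state {
	unsigned pending;       /* operations requested on the stripe */
	unsigned ack;           /* operations already handed to the worker */
};

/* Roughly what the work-collection step does: count ops that are
 * pending but not yet acknowledged, and acknowledge them. */
static int take_work(struct ops_state *s)
{
	int work = 0;

	if ((s->pending & STRIPE_OP_BIOFILL) && !(s->ack & STRIPE_OP_BIOFILL)) {
		s->ack |= STRIPE_OP_BIOFILL;
		work++;
	}
	return work;
}

int main(void)
{
	struct ops_state s = { STRIPE_OP_BIOFILL, STRIPE_OP_BIOFILL };

	/* Buggy completion order: ack cleared first... */
	s.ack &= ~STRIPE_OP_BIOFILL;
	/* ...if the collection step runs here, the finished operation is
	 * re-acknowledged and the count goes wrong. */
	int recounted = take_work(&s);
	s.pending &= ~STRIPE_OP_BIOFILL;

	printf("op re-acknowledged after completion: %d\n", recounted);
	return 0;
}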
  


Once this patch has been vetted, can it be offered to -stable for 
2.6.23? Or to be pedantic, it *can*, will you make that happen?


	I never see any oops with this patch. But I cannot create a RAID1 array 
with a local RAID5 volume and a foreign RAID5 volume exported by iSCSI. 
iSCSI seems to work fine, but RAID1 creation randomly aborts due to an 
unknown SCSI task on the target side.


	I have stressed the iSCSI target with some simultaneous I/O without any 
trouble (nullio, fileio and blockio), thus I suspect another bug in the raid 
code (or an arch-specific bug). Over the last two days, I have made some 
tests to isolate and reproduce this bug:

1/ the iSCSI target and initiator seem to work when I export a raid5 array 
with iSCSI;
2/ raid1 and raid5 seem to work with local disks;
3/ the iSCSI target is disconnected only when I create a raid1 volume over 
iSCSI (blockio _and_ fileio), with the following message:

Oct 18 10:43:52 poulenc kernel: iscsi_trgt: cmnd_abort(1156) 29 1 0 42 57344 0 0
Oct 18 10:43:52 poulenc kernel: iscsi_trgt: Abort Task (01) issued on tid:1 lun:0 by sid:630024457682948 (Unknown Task)

	I ran some dd's for 12 hours (read and write in nullio) between the 
initiator and the target without any disconnection. Thus the iSCSI code seems 
to be robust. Both initiator and target are alone on a single gigabit 
ethernet link (without any switch). I'm investigating...


Regards,

JKB


Re: [Iscsitarget-devel] Abort Task ?

2007-10-19 Thread BERTRAND Joël

Ross S. W. Walker wrote:

BERTRAND Joël wrote:

BERTRAND Joël wrote:
I can format (mkfs.ext3) a 1.5 TB volume over iSCSI several times
without any trouble. I can read and write on this virtual disk without
any trouble.

Now, I have configured ietd with :

Lun 0 Sectors=1464725758,Type=nullio

and I run on initiator side :

Root gershwin:[/dev]  dd if=/dev/zero of=/dev/sdj bs=8192
479482+0 records in
479482+0 records out
3927916544 bytes (3.9 GB) copied, 153.222 seconds, 25.6 MB/s

Root gershwin:[/dev]  dd if=/dev/zero of=/dev/sdj bs=8192

I'm waiting for a crash. None so far as I write these lines. I suspect
an interaction between raid and iscsi.

I simultaneously run:

Root gershwin:[/dev]  dd if=/dev/zero of=/dev/sdj bs=8192
8397210+0 records in
8397210+0 records out
68789944320 bytes (69 GB) copied, 2732.55 seconds, 25.2 MB/s

and

Root gershwin:[~]  dd if=/dev/sdj of=/dev/null bs=8192
739200+0 records in
739199+0 records out
6055518208 bytes (6.1 GB) copied, 447.178 seconds, 13.5 MB/s

without any trouble.


The speed can definitely be improved. Look at your network setup
and use ping to try and get the network latency to a minimum.

# ping -A -s 8192 172.16.24.140

--- 172.16.24.140 ping statistics ---
14058 packets transmitted, 14057 received, 0% packet loss, time 9988ms
rtt min/avg/max/mdev = 0.234/0.268/2.084/0.041 ms, ipg/ewma 0.710/0.260 ms


gershwin:[~]  ping -A -s 8192 192.168.0.2
PING 192.168.0.2 (192.168.0.2) 8192(8220) bytes of data.
8200 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=0.693 ms
8200 bytes from 192.168.0.2: icmp_seq=2 ttl=64 time=0.595 ms
8200 bytes from 192.168.0.2: icmp_seq=3 ttl=64 time=0.583 ms
8200 bytes from 192.168.0.2: icmp_seq=4 ttl=64 time=0.589 ms
8200 bytes from 192.168.0.2: icmp_seq=5 ttl=64 time=0.580 ms
8200 bytes from 192.168.0.2: icmp_seq=6 ttl=64 time=0.594 ms
8200 bytes from 192.168.0.2: icmp_seq=7 ttl=64 time=0.580 ms
8200 bytes from 192.168.0.2: icmp_seq=8 ttl=64 time=0.592 ms
8200 bytes from 192.168.0.2: icmp_seq=9 ttl=64 time=0.589 ms
8200 bytes from 192.168.0.2: icmp_seq=10 ttl=64 time=0.571 ms
8200 bytes from 192.168.0.2: icmp_seq=11 ttl=64 time=0.588 ms
8200 bytes from 192.168.0.2: icmp_seq=12 ttl=64 time=0.580 ms
8200 bytes from 192.168.0.2: icmp_seq=13 ttl=64 time=0.587 ms

--- 192.168.0.2 ping statistics ---
13 packets transmitted, 13 received, 0% packet loss, time 2400ms
rtt min/avg/max/mdev = 0.571/0.593/0.693/0.044 ms, ipg/ewma 200.022/0.607 ms
gershwin:[~] 

	Both initiator and target are alone on a gigabit NIC (Tigon3). On the 
target server, istd1 takes 100% of a CPU (and only one CPU, even though my 
T1000 can run 32 threads simultaneously). I think the limitation comes 
from istd1.



You want your avg ping time for 8192 byte payloads to be 300us or less.

1000/.268 = 3731 IOPS @ 8k = 30 MB/s

If you use apps that do overlapping asynchronous IO you can see better
numbers.
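
A quick back-of-the-envelope check of the arithmetic above, assuming one synchronous 8 KiB transfer per network round trip:

#include <stdio.h>

int main(void)
{
	double rtt_ms = 0.268;               /* measured average ping RTT */
	double iops   = 1000.0 / rtt_ms;     /* ~3731 synchronous IOPS */
	double mbytes = iops * 8192.0 / 1e6; /* ~30 MB/s at 8 KiB per I/O */

	printf("%.0f IOPS, %.1f MB/s\n", iops, mbytes);
	return 0;
}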


Regards,

JKB


Re: [BUG] Raid5 trouble

2007-10-18 Thread BERTRAND Joël

Dan,

I'm testing your last patch (fix-biofill-clear2.patch). It seems to 
work:

Every 1.0s: cat /proc/mdstat                              Thu Oct 18 10:28:55 2007


Personalities : [raid1] [raid6] [raid5] [raid4]
md7 : active raid1 sdi1[1] md_d0p1[0]
  1464725632 blocks [2/2] [UU]
  []  resync =  0.4% (6442248/1464725632) finish=1216.6min speed=19974K/sec

md_d0 : active raid5 sdc1[0] sdh1[5] sdg1[4] sdf1[3] sde1[2] sdd1[1]
  1464725760 blocks level 5, 64k chunk, algorithm 2 [6/6] [UU]

	I hope it fixes the bug I have seen. I shall come back - I think 
tomorrow, as my raid volume requires more than 20 hours to be created - to 
say whether it works fine.


Regards,

JKB


Re: [Iscsitarget-devel] Abort Task ?

2007-10-18 Thread BERTRAND Joël

Ming Zhang wrote:

On Thu, 2007-10-18 at 11:33 -0400, Ross S. W. Walker wrote:

BERTRAND Joël wrote:

BERTRAND Joël wrote:

BERTRAND Joël wrote:

Hello,

	When I try to create a raid1 volume over iscsi, the process aborts with:

- on target side:

iscsi_trgt: cmnd_abort(1156) 29 1 0 42 57344 0 0
iscsi_trgt: Abort Task (01) issued on tid:1 lun:0 by sid:630024457682948 (Unknown Task)

Next run:
iscsi_trgt: cmnd_abort(1156) 13 1 0 42 57344 0 0
iscsi_trgt: Abort Task (01) issued on tid:1 lun:0 by sid:630058817421315 (Unknown Task)

	You can see that both lines are very similar. I shall try to use 
blockio instead of fileio.

With blockio, I got the following message...

iscsi_trgt: cmnd_abort(1156) c 1 0 42 8192 0 0
iscsi_trgt: Abort Task (01) issued on tid:1 lun:0 by sid:630024457682946 (Unknown Task)

The command is the same. What is the significance of 1156?

Both outputs are from the same Abort Task management function; the 1156
refers to the line in iscsi.c where the debug printf was issued.

The other is the more verbose informative message that says an Abort Task
command was issued, but the task was not found.
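
In other words, the number in "cmnd_abort(1156)" is just the source line of the debug printout. A minimal illustration of how such a message is typically produced (an assumed macro, not the actual IET code):

#include <stdio.h>

#define dbg(fmt, ...) \
	printf("%s(%d) " fmt "\n", __func__, __LINE__, ##__VA_ARGS__)

static void cmnd_abort(unsigned itt, int tid, int lun)
{
	/* prints e.g. "cmnd_abort(<line>) <itt> <tid> <lun>" */
	dbg("%u %d %d", itt, tid, lun);
}

int main(void)
{
	cmnd_abort(29, 1, 0);
	return 0;
}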


Pure guess: this might be because of the sparc64 platform you are using.

Could you export a NULLIO target and do some intensive I/O tests? Sort
out these platform issues first...


	I can format (mkfs.ext3) a 1.5 TB volume several times over iSCSI 
without any trouble. I can read and write on this virtual disk without 
any trouble.


Now, I have configured ietd with :

Lun 0 Sectors=1464725758,Type=nullio

and I run on initiator side :

Root gershwin:[/dev]  dd if=/dev/zero of=/dev/sdj bs=8192
479482+0 records in
479482+0 records out
3927916544 bytes (3.9 GB) copied, 153.222 seconds, 25.6 MB/s

Root gershwin:[/dev]  dd if=/dev/zero of=/dev/sdj bs=8192

	I'm waiting for a crash. None so far as I write these lines. I suspect an 
interaction between raid and iscsi.


Regards,

JKB


Re: [Iscsitarget-devel] Abort Task ?

2007-10-18 Thread BERTRAND Joël

BERTRAND Joël wrote:
I can format (mkfs.ext3) a 1.5 TB volume several times over iSCSI 
without any trouble. I can read and write on this virtual disk without 
any trouble.


Now, I have configured ietd with :

Lun 0 Sectors=1464725758,Type=nullio

and I run on initiator side :

Root gershwin:[/dev]  dd if=/dev/zero of=/dev/sdj bs=8192
479482+0 records in
479482+0 records out
3927916544 bytes (3.9 GB) copied, 153.222 seconds, 25.6 MB/s

Root gershwin:[/dev]  dd if=/dev/zero of=/dev/sdj bs=8192

I'm waiting for a crash. None so far as I write these lines. I suspect 
an interaction between raid and iscsi.


I simultaneously run:

Root gershwin:[/dev]  dd if=/dev/zero of=/dev/sdj bs=8192
8397210+0 records in
8397210+0 records out
68789944320 bytes (69 GB) copied, 2732.55 seconds, 25.2 MB/s

and

Root gershwin:[~]  dd if=/dev/sdj of=/dev/null bs=8192
739200+0 records in
739199+0 records out
6055518208 bytes (6.1 GB) copied, 447.178 seconds, 13.5 MB/s

without any trouble.
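
As a sanity check on the throughput dd reports above, the short C program 
below recomputes the per-stream rates and adds them up. The byte counts 
and times are copied from the transcript; the aggregate figure is plain 
arithmetic, not something dd prints:

#include <stdio.h>

/* Recompute the dd figures quoted above (dd counts MB as 10^6 bytes). */
int main(void)
{
	double write_bytes = 68789944320.0, write_secs = 2732.55;
	double read_bytes  = 6055518208.0,  read_secs  = 447.178;

	double w = write_bytes / write_secs / 1e6;	/* ~25.2 MB/s */
	double r = read_bytes / read_secs / 1e6;	/* ~13.5 MB/s */

	printf("write %.1f MB/s, read %.1f MB/s, aggregate %.1f MB/s\n",
	       w, r, w + r);
	return 0;
}

Together the two concurrent streams move roughly 38.7 MB/s over the 
iSCSI link without the target showing any problem.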

Regards,

JKB


Re: [BUG] Raid5 trouble

2007-10-17 Thread BERTRAND Joël

BERTRAND Joël wrote:

Hello,

I run the 2.6.23 linux kernel on two T1000 (sparc64) servers. Each 
server has a partitionable raid5 array (/dev/md/d0) and I have to 
synchronize both raid5 volumes by raid1. Thus, I have tried to build a 
raid1 volume between /dev/md/d0p1 and /dev/sdi1 (exported by iscsi from 
the second server) and I obtain a BUG :


Root gershwin:[/usr/scripts]  mdadm -C /dev/md7 -l1 -n2 /dev/md/d0p1 
/dev/sdi1

...


Hello,

	I have fixed iscsi-target, and I have tested it. It now works without 
any trouble. Patches were posted on the iscsi-target mailing list. When 
I use iSCSI to access the foreign raid5 volume, it works fine. I can 
format the foreign volume, copy large files onto it... But when I try 
to create a new raid1 volume with a local raid5 volume and a foreign 
raid5 volume, I receive my well-known Oops. You can find my dmesg after 
the Oops :


md: md_d0 stopped.
md: bind<sdd1>
md: bind<sde1>
md: bind<sdf1>
md: bind<sdg1>
md: bind<sdh1>

md: bind<sdc1>
raid5: device sdc1 operational as raid disk 0
raid5: device sdh1 operational as raid disk 5
raid5: device sdg1 operational as raid disk 4
raid5: device sdf1 operational as raid disk 3
raid5: device sde1 operational as raid disk 2
raid5: device sdd1 operational as raid disk 1
raid5: allocated 12518kB for md_d0
raid5: raid level 5 set md_d0 active with 6 out of 6 devices, algorithm 2
RAID5 conf printout:
 --- rd:6 wd:6
 disk 0, o:1, dev:sdc1
 disk 1, o:1, dev:sdd1
 disk 2, o:1, dev:sde1
 disk 3, o:1, dev:sdf1
 disk 4, o:1, dev:sdg1
 disk 5, o:1, dev:sdh1
 md_d0: p1
scsi3 : iSCSI Initiator over TCP/IP
scsi 3:0:0:0: Direct-Access IET  VIRTUAL-DISK 0PQ: 0 ANSI: 4
sd 3:0:0:0: [sdi] 2929451520 512-byte hardware sectors (1499879 MB)
sd 3:0:0:0: [sdi] Write Protect is off
sd 3:0:0:0: [sdi] Mode Sense: 77 00 00 08
sd 3:0:0:0: [sdi] Write cache: disabled, read cache: enabled, doesn't 
support DPO or FUA

sd 3:0:0:0: [sdi] 2929451520 512-byte hardware sectors (1499879 MB)
sd 3:0:0:0: [sdi] Write Protect is off
sd 3:0:0:0: [sdi] Mode Sense: 77 00 00 08
sd 3:0:0:0: [sdi] Write cache: disabled, read cache: enabled, doesn't 
support DPO or FUA

 sdi: sdi1
sd 3:0:0:0: [sdi] Attached SCSI disk
md: bind<md_d0p1>
md: bind<sdi1>
md: md7: raid array is not clean -- starting background reconstruction
raid1: raid set md7 active with 2 out of 2 mirrors
md: resync of RAID array md7
md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 20 
KB/sec) for resync.

md: using 256k window, over a total of 1464725632 blocks.
kernel BUG at drivers/md/raid5.c:380!
              \|/ ____ \|/
              "@'/ .. \`@"
              /_| \__/ |_\
                 \__U_/
md7_resync(4929): Kernel bad sw trap 5 [#1]
TSTATE: 80001606 TPC: 005ed50c TNPC: 005ed510 Y: 
Not tainted

TPC: get_stripe_work+0x1f4/0x200
g0: 0005 g1: 007c0400 g2: 0001 g3: 
00748400
g4: f800feeb6880 g5: f8000208 g6: f800e7598000 g7: 
00748528
o0: 0029 o1: 00715798 o2: 017c o3: 
0005
o4: 0006 o5: f800e8f0a060 sp: f800e759ad81 ret_pc: 
005ed504

RPC: get_stripe_work+0x1ec/0x200
l0: 0002 l1:  l2: f800e8f0a0a0 l3: 
f800e8f09fe8
l4: f800e8f0a088 l5: fff8 l6: 0005 l7: 
f800e8374000
i0: f800e8f0a028 i1:  i2: 0004 i3: 
f800e759b720
i4: 0080 i5: 0080 i6: f800e759ae51 i7: 
005f0274

I7: handle_stripe5+0x4fc/0x1340
Caller[005f0274]: handle_stripe5+0x4fc/0x1340
Caller[005f211c]: handle_stripe+0x24/0x13e0
Caller[005f4450]: make_request+0x358/0x600
Caller[00542890]: generic_make_request+0x198/0x220
Caller[005eb240]: sync_request+0x608/0x640
Caller[005fef7c]: md_do_sync+0x384/0x920
Caller[005ff8f0]: md_thread+0x38/0x140
Caller[00478b40]: kthread+0x48/0x80
Caller[004273d0]: kernel_thread+0x38/0x60
Caller[00478de0]: kthreadd+0x148/0x1c0
Instruction DUMP: 9210217c  7ff8f57f  90122398 91d02005 30680004 
0100  0100  0100  9de3bf00


I suspect a major bug in the raid5 code, but I don't know how to debug it...

	md7 was created by mdadm -C /dev/md7 -l1 -n2 /dev/md/d0 /dev/sdi1. 
/dev/md/d0 is a raid5 volume, and sdi is an iSCSI disk.


Regards,

JKB


Re: [BUG] Raid5 trouble

2007-10-17 Thread BERTRAND Joël

Dan Williams wrote:

On 10/17/07, BERTRAND Joël [EMAIL PROTECTED] wrote:

BERTRAND Joël wrote:

Hello,

I run the 2.6.23 linux kernel on two T1000 (sparc64) servers. Each
server has a partitionable raid5 array (/dev/md/d0) and I have to
synchronize both raid5 volumes by raid1. Thus, I have tried to build a
raid1 volume between /dev/md/d0p1 and /dev/sdi1 (exported by iscsi from
the second server) and I obtain a BUG :

Root gershwin:[/usr/scripts]  mdadm -C /dev/md7 -l1 -n2 /dev/md/d0p1
/dev/sdi1
...

Hello,

I have fixed iscsi-target, and I have tested it. It now works without
any trouble. Patches were posted on the iscsi-target mailing list. When I
use iSCSI to access the foreign raid5 volume, it works fine. I can format
the foreign volume, copy large files onto it... But when I try to create a
new raid1 volume with a local raid5 volume and a foreign raid5 volume, I
receive my well-known Oops. You can find my dmesg after the Oops :



	Your patch does not work for me. It was applied, a new kernel was 
built, and I obtain the same Oops.



Can you send your .config and your bootup dmesg?


Yes, of course ;-) Both files are attached. My new Oops is :

kernel BUG at drivers/md/raid5.c:380!
              \|/ ____ \|/
              "@'/ .. \`@"
              /_| \__/ |_\
                 \__U_/
md7_resync(4258): Kernel bad sw trap 5 [#1]
TSTATE: 80001606 TPC: 005ed50c TNPC: 005ed510 Y: 
Not tainted

TPC: get_stripe_work+0x1f4/0x200

(exactly the same as the old one ;-) ). I have patched iscsi-target to 
avoid an alignment bug on sparc64. Do you think a bug in ietd can produce 
this kind of bug ? The patch I have written for iscsi-target (against 
SVN) is attached too.
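
For readers who have not met this class of problem: sparc64 delivers a 
bus error on a misaligned load where amd64 silently tolerates it, so a 
field sitting at an odd offset in a received buffer has to be copied out 
rather than cast and dereferenced. The sketch below is only an 
illustration of that safe pattern, with invented names; it is not the 
attached ietd patch:

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Illustration only, not the ietd patch.  Casting buf + 1 to uint32_t *
 * and dereferencing it would work on amd64 but trap (SIGBUS) on sparc64;
 * copying the bytes out is safe on every architecture. */
static uint32_t read_be32(const unsigned char *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return ntohl(v);	/* iSCSI PDU fields are big endian */
}

int main(void)
{
	unsigned char pdu[8] = { 0x00, 0x00, 0x00, 0x00, 0x2a, 0x00, 0x00, 0x00 };

	/* The field starts at offset 1, which is misaligned for a uint32_t. */
	printf("field = %u\n", (unsigned)read_be32(pdu + 1));	/* prints 42 */
	return 0;
}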


Regards,

JKB
PROMLIB: Sun IEEE Boot Prom 'OBP 4.23.4 2006/08/04 20:45'
PROMLIB: Root node compatible: sun4v
Linux version 2.6.23 ([EMAIL PROTECTED]) (gcc version 4.1.3 20070831 
(prerelease) (Debian 4.1.2-16)) #7 SMP Wed Oct 17 17:52:22 CEST 2007
ARCH: SUN4V
Ethernet address: 00:14:4f:6f:59:fe
OF stdout device is: /[EMAIL PROTECTED]/[EMAIL PROTECTED]
PROM: Built device tree with 74930 bytes of memory.
MDESC: Size is 32560 bytes.
PLATFORM: banner-name [Sun Fire(TM) T1000]
PLATFORM: name [SUNW,Sun-Fire-T1000]
PLATFORM: hostid [846f59fe]
PLATFORM: serial# [00ab4130]
PLATFORM: stick-frequency [3b9aca00]
PLATFORM: mac-address [144f6f59fe]
PLATFORM: watchdog-resolution [1000 ms]
PLATFORM: watchdog-max-timeout [3153600 ms]
On node 0 totalpages: 522246
  Normal zone: 3583 pages used for memmap
  Normal zone: 0 pages reserved
  Normal zone: 518663 pages, LIFO batch:15
  Movable zone: 0 pages used for memmap
Built 1 zonelists in Zone order.  Total pages: 518663
Kernel command line: root=/dev/md0 ro md=0,/dev/sda4,/dev/sdb4 raid=noautodetect
md: Will configure md0 (super-block) from /dev/sda4,/dev/sdb4, below.
PID hash table entries: 4096 (order: 12, 32768 bytes)
clocksource: mult[1] shift[16]
clockevent: mult[8000] shift[31]
Console: colour dummy device 80x25
console [tty0] enabled
Dentry cache hash table entries: 524288 (order: 9, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 8, 2097152 bytes)
Memory: 4138072k available (2608k kernel code, 960k data, 144k init) 
[f800,fffc8000]
SLUB: Genslabs=23, HWalign=32, Order=0-2, MinObjects=8, CPUs=32, Nodes=1
Calibrating delay using timer specific routine.. 1995.16 BogoMIPS (lpj=3990330)
Mount-cache hash table entries: 512
Brought up 24 CPUs
xor: automatically using best checksumming function: Niagara
   Niagara   :   240.000 MB/sec
xor: using function: Niagara (240.000 MB/sec)
NET: Registered protocol family 16
PCI: Probing for controllers.
SUN4V_PCI: Registered hvapi major[1] minor[0]
/[EMAIL PROTECTED]: SUN4V PCI Bus Module
/[EMAIL PROTECTED]: PCI IO[e81000] MEM[ea]
/[EMAIL PROTECTED]: SUN4V PCI Bus Module
/[EMAIL PROTECTED]: PCI IO[f01000] MEM[f2]
PCI: Scanning PBM /[EMAIL PROTECTED]
PCI: Scanning PBM /[EMAIL PROTECTED]
ebus: No EBus's found.
SCSI subsystem initialized
NET: Registered protocol family 2
Time: stick clocksource has been installed.
Switched to high resolution mode on CPU 0
Switched to high resolution mode on CPU 20
Switched to high resolution mode on CPU 8
Switched to high resolution mode on CPU 21
Switched to high resolution mode on CPU 9
Switched to high resolution mode on CPU 22
Switched to high resolution mode on CPU 10
Switched to high resolution mode on CPU 23
Switched to high resolution mode on CPU 11
Switched to high resolution mode on CPU 12
Switched to high resolution mode on CPU 13
Switched to high resolution mode on CPU 1
Switched to high resolution mode on CPU 14
Switched to high resolution mode on CPU 2
Switched to high resolution mode on CPU 15
Switched to high resolution mode on CPU 3
Switched to high resolution mode on CPU 16
Switched to high resolution mode on CPU 4
Switched to high resolution mode on CPU 17
Switched to high resolution mode

Re: [BUG] Raid5 trouble

2007-10-17 Thread BERTRAND Joël

Dan Williams wrote:

On 10/17/07, Dan Williams [EMAIL PROTECTED] wrote:

On 10/17/07, BERTRAND Joël [EMAIL PROTECTED] wrote:

BERTRAND Joël wrote:

Hello,

I run the 2.6.23 linux kernel on two T1000 (sparc64) servers. Each
server has a partitionable raid5 array (/dev/md/d0) and I have to
synchronize both raid5 volumes by raid1. Thus, I have tried to build a
raid1 volume between /dev/md/d0p1 and /dev/sdi1 (exported by iscsi from
the second server) and I obtain a BUG :

Root gershwin:[/usr/scripts]  mdadm -C /dev/md7 -l1 -n2 /dev/md/d0p1
/dev/sdi1
...

Hello,

I have fixed iscsi-target, and I have tested it. It now works without
any trouble. Patches were posted on the iscsi-target mailing list. When I
use iSCSI to access the foreign raid5 volume, it works fine. I can format
the foreign volume, copy large files onto it... But when I try to create a
new raid1 volume with a local raid5 volume and a foreign raid5 volume, I
receive my well-known Oops. You can find my dmesg after the Oops :


Can you send your .config and your bootup dmesg?



I found a problem which may lead to the operations count dropping
below zero.  If ops_complete_biofill() gets preempted in between the
following calls:

raid5.c:554 clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
raid5.c:555 clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);

...then get_stripe_work() can recount/re-acknowledge STRIPE_OP_BIOFILL
causing the assertion.  In fact, the 'pending' bit should always be
cleared first, but the other cases are protected by
spin_lock(&sh->lock).  Patch attached.
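
The sequence described above can be modelled outside the kernel. The 
program below is not Dan's patch; it uses simplified stand-ins for the 
stripe_head bookkeeping purely to show how re-acknowledging a 
still-pending STRIPE_OP_BIOFILL drives the count negative and trips the 
BUG_ON at raid5.c:380:

#include <stdio.h>

#define STRIPE_OP_BIOFILL 0

/* Simplified stand-in for the sh->ops bookkeeping. */
struct fake_ops { unsigned long pending, ack; int count; };

/* Like get_stripe_work(): acknowledge any op that is pending but not
 * yet acked, and charge it against ops.count. */
static void fake_get_stripe_work(struct fake_ops *ops)
{
	int ack = 0;

	if ((ops->pending & (1UL << STRIPE_OP_BIOFILL)) &&
	    !(ops->ack & (1UL << STRIPE_OP_BIOFILL))) {
		ops->ack |= 1UL << STRIPE_OP_BIOFILL;
		ack++;
	}
	ops->count -= ack;
}

int main(void)
{
	struct fake_ops ops = { 0, 0, 0 };

	/* A biofill is requested: pending bit set, count incremented. */
	ops.pending |= 1UL << STRIPE_OP_BIOFILL;
	ops.count++;

	fake_get_stripe_work(&ops);	/* first acknowledgement, count back to 0 */

	/* ops_complete_biofill() clears 'ack' first ... */
	ops.ack &= ~(1UL << STRIPE_OP_BIOFILL);
	/* ... and is preempted here, before it clears 'pending', so ... */
	fake_get_stripe_work(&ops);	/* ... the same op is acknowledged again */
	ops.pending &= ~(1UL << STRIPE_OP_BIOFILL);

	printf("ops.count = %d (negative, so BUG_ON(sh->ops.count < 0) would fire)\n",
	       ops.count);
	return 0;
}

Clearing the 'pending' bit before the 'ack' bit, as the explanation says, 
removes the window in which the second call can see pending set with ack 
already clear.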


Dan,

I have modified get_stripe_work like this :

static unsigned long get_stripe_work(struct stripe_head *sh)
{
	unsigned long pending;
	int ack = 0;
	int a, b, c, d, e, f, g;

	pending = sh->ops.pending;

	test_and_ack_op(STRIPE_OP_BIOFILL, pending);
	a = ack;
	test_and_ack_op(STRIPE_OP_COMPUTE_BLK, pending);
	b = ack;
	test_and_ack_op(STRIPE_OP_PREXOR, pending);
	c = ack;
	test_and_ack_op(STRIPE_OP_BIODRAIN, pending);
	d = ack;
	test_and_ack_op(STRIPE_OP_POSTXOR, pending);
	e = ack;
	test_and_ack_op(STRIPE_OP_CHECK, pending);
	f = ack;
	if (test_and_clear_bit(STRIPE_OP_IO, &sh->ops.pending))
		ack++;
	g = ack;

	sh->ops.count -= ack;

	if (sh->ops.count < 0)
		printk("%d %d %d %d %d %d %d\n", a, b, c, d, e, f, g);

	BUG_ON(sh->ops.count < 0);

	return pending;
}

and I obtain on console :

 1 1 1 1 1 2
kernel BUG at drivers/md/raid5.c:390!
              \|/ ____ \|/
              "@'/ .. \`@"
              /_| \__/ |_\
                 \__U_/
md7_resync(5409): Kernel bad sw trap 5 [#1]

If that can help you...

JKB


Re: Partitionable raid array... How to create devices ?

2007-10-16 Thread BERTRAND Joël

Neil Brown wrote:

On Tuesday October 16, [EMAIL PROTECTED] wrote:

Hello,

	I have used software raid for a long time without any trouble. Today, 
I have to install a partitionable raid1 array over iSCSI. I have some 
questions because I don't understand how to make this kind of array.


	I have a sparc64 (T1000) with a JBOD (U320 SCSI) that runs a 2.6.23 
linux kernel and debian testing distribution.


/dev/sda : internal SAS drive - OS
/dev/sdb : internal SAS drive - OS

I have made seven raid1 volumes on /dev/sda and /dev/sdb 
(non-partitionable arrays).


/dev/sd[c-h] : external U320 drives. Each 300 GB drive only contains one 
type fd partition.


I have tried to create a partitionable array with :

Root gershwin:[/usr/src/linux-2.6.23]  mdadm -C /dev/mdp0 -l5 
--auto=mdp4 -n6 /dev/sd[c-h]1


Try
  /dev/md/d0
or
  /dev/md_d0

as suggested in the DEVICE NAMES section of the man page.
However what you used should work.  I'll get that fixed for the next
release.


	Thanks, it works now. I had seen this note, thus I had named my 
array mdp0, but it was not enough ;-)


Regards,

JKB


[BUG] Raid5 trouble

2007-10-16 Thread BERTRAND Joël

Hello,

	I run the 2.6.23 linux kernel on two T1000 (sparc64) servers. Each 
server has a partitionable raid5 array (/dev/md/d0) and I have to 
synchronize both raid5 volumes by raid1. Thus, I have tried to build a 
raid1 volume between /dev/md/d0p1 and /dev/sdi1 (exported by iscsi from 
the second server) and I obtain a BUG :


Root gershwin:[/usr/scripts]  mdadm -C /dev/md7 -l1 -n2 /dev/md/d0p1 
/dev/sdi1

...
kernel BUG at drivers/md/raid5.c:380!
              \|/ ____ \|/
              "@'/ .. \`@"
              /_| \__/ |_\
                 \__U_/
md7_resync(4476): Kernel bad sw trap 5 [#1]
TSTATE: 80001606 TPC: 005ed50c TNPC: 005ed510 Y: 
Not tainted

TPC: get_stripe_work+0x1f4/0x200
g0: 0005 g1: 007c0400 g2: 0001 g3: 
00748400
g4: f800ebdb2400 g5: f8000208 g6: f800e82fc000 g7: 
00748528
o0: 0029 o1: 00715798 o2: 017c o3: 
0005
o4: 0006 o5: f800e9bb6e28 sp: f800e82fed81 ret_pc: 
005ed504

RPC: get_stripe_work+0x1ec/0x200
l0: 0002 l1:  l2: f800e9bb6e68 l3: 
f800e9bb6db0
l4: f800e9bb6e50 l5: fff8 l6: 0005 l7: 
f800fcbd6000
i0: f800e9bb6df0 i1:  i2: 0004 i3: 
f800e82ff720
i4: 0080 i5: 0080 i6: f800e82fee51 i7: 
005f0274

I7: handle_stripe5+0x4fc/0x1340
Caller[005f0274]: handle_stripe5+0x4fc/0x1340
Caller[005f211c]: handle_stripe+0x24/0x13e0
Caller[005f4450]: make_request+0x358/0x600
Caller[00542890]: generic_make_request+0x198/0x220
Caller[005eb240]: sync_request+0x608/0x640
Caller[005fef7c]: md_do_sync+0x384/0x920
Caller[005ff8f0]: md_thread+0x38/0x140
Caller[00478b40]: kthread+0x48/0x80
Caller[004273d0]: kernel_thread+0x38/0x60
Caller[00478de0]: kthreadd+0x148/0x1c0
Instruction DUMP: 9210217c  7ff8f57f  90122398 91d02005 30680004 
0100  0100  0100  9de3bf00


Root gershwin:[/usr/scripts]  cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md7 : active raid1 sdi1[1] md_d0p1[0]
  1464725632 blocks [2/2] [UU]
  [>....................]  resync =  0.0% (132600/1464725632) finish=141823.7min speed=171K/sec


md_d0 : active raid5 sdc1[0] sdh1[5] sdg1[4] sdf1[3] sde1[2] sdd1[1]
  1464725760 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
...
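
As an aside, the finish estimate in that resync line is essentially the 
remaining blocks divided by the current rate; the small check below 
(numbers copied from the mdstat output above) lands within about 1% of 
the value md prints, the leftover coming from rounding and the sliding 
window md uses for its rate estimate:

#include <stdio.h>

/* finish ~ (total - done) / speed, with mdstat blocks of 1K. */
int main(void)
{
	double total_kb = 1464725632.0;	/* md7 size in 1K blocks */
	double done_kb  = 132600.0;	/* resynced so far */
	double speed    = 171.0;	/* KB/sec reported by md */

	printf("estimated finish: %.1f min\n",
	       (total_kb - done_kb) / speed / 60.0);	/* ~142748 min */
	return 0;
}
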
Root gershwin:[/usr/scripts]  fdisk -l /dev/md/d0

Disk /dev/md/d0: 1499.8 GB, 1499879178240 bytes
2 heads, 4 sectors/track, 366181440 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0xa4a52979

  Device Boot  Start End  Blocks   Id  System
/dev/md/d0p1   1   366181440  1464725758   fd  Linux raid 
autodetect

Root gershwin:[/usr/scripts]  fdisk -l /dev/sdi

Disk /dev/sdi: 1499.8 GB, 1499879178240 bytes
2 heads, 4 sectors/track, 366181440 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0xf6cdb2a3

   Device Boot  Start End  Blocks   Id  System
/dev/sdi1   1   366181440  1464725758   fd  Linux raid 
autodetect

Root gershwin:[/usr/scripts]  cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: FUJITSU  Model: MAY2073RCSUN72G  Rev: 0501
  Type:   Direct-AccessANSI  SCSI revision: 04
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: FUJITSU  Model: MAY2073RCSUN72G  Rev: 0501
  Type:   Direct-AccessANSI  SCSI revision: 04
Host: scsi2 Channel: 00 Id: 08 Lun: 00
  Vendor: FUJITSU  Model: MAW3300NCRev: 0104
  Type:   Direct-AccessANSI  SCSI revision: 03
Host: scsi2 Channel: 00 Id: 09 Lun: 00
  Vendor: FUJITSU  Model: MAW3300NCRev: 0104
  Type:   Direct-AccessANSI  SCSI revision: 03
Host: scsi2 Channel: 00 Id: 10 Lun: 00
  Vendor: FUJITSU  Model: MAW3300NCRev: 0104
  Type:   Direct-AccessANSI  SCSI revision: 03
Host: scsi2 Channel: 00 Id: 11 Lun: 00
  Vendor: FUJITSU  Model: MAW3300NCRev: 0104
  Type:   Direct-AccessANSI  SCSI revision: 03
Host: scsi2 Channel: 00 Id: 12 Lun: 00
  Vendor: FUJITSU  Model: MAW3300NCRev: 0104
  Type:   Direct-AccessANSI  SCSI revision: 03
Host: scsi2 Channel: 00 Id: 13 Lun: 00
  Vendor: FUJITSU  Model: MAW3300NCRev: 0104
  Type:   Direct-AccessANSI  SCSI revision: 03
Host: scsi3 Channel: 00 Id: 00 Lun: 00
  Vendor: IET  Model: VIRTUAL-DISK Rev: 0
  Type:   Direct-AccessANSI  SCSI revision: 04
Root gershwin:[/usr/scripts] 

I don't know if this bug is arch-specific, but I have never seen it on 
amd64...

Regards,

JKB