Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-29 Thread BERTRAND Joël

Ming Zhang wrote:

Off topic: could you resubmit the alignment-issue patch to the list and see whether tomof accepts it? He needs the patch inlined in the email. It was found and fixed by you, so it is better that you post it (instead of me). Thanks.


diff -u kernel.old/iscsi.c kernel/iscsi.c
--- kernel.old/iscsi.c  2007-10-29 09:49:16.0 +0100
+++ kernel/iscsi.c  2007-10-17 11:19:14.0 +0200
@@ -726,13 +726,26 @@
        case READ_10:
        case WRITE_10:
        case WRITE_VERIFY:
-               *off = be32_to_cpu(*(u32 *)&cmd[2]);
+               *off = be32_to_cpu((((u32) cmd[2]) << 24) |
+                               (((u32) cmd[3]) << 16) |
+                               (((u32) cmd[4]) << 8) |
+                               cmd[5]);
                *len = (cmd[7] << 8) + cmd[8];
                break;
        case READ_16:
        case WRITE_16:
-               *off = be64_to_cpu(*(u64 *)&cmd[2]);
-               *len = be32_to_cpu(*(u32 *)&cmd[10]);
+               *off = be32_to_cpu((((u64) cmd[2]) << 56) |
+                               (((u64) cmd[3]) << 48) |
+                               (((u64) cmd[4]) << 40) |
+                               (((u64) cmd[5]) << 32) |
+                               (((u64) cmd[6]) << 24) |
+                               (((u64) cmd[7]) << 16) |
+                               (((u64) cmd[8]) << 8) |
+                               cmd[9]);
+               *len = be32_to_cpu((((u32) cmd[10]) << 24) |
+                               (((u32) cmd[11]) << 16) |
+                               (((u32) cmd[12]) << 8) |
+                               cmd[13]);
                break;
        default:
                BUG();
diff -u kernel.old/target_disk.c kernel/target_disk.c
--- kernel.old/target_disk.c   2007-10-29 09:49:16.0 +0100
+++ kernel/target_disk.c        2007-10-17 16:04:06.0 +0200
@@ -66,13 +66,15 @@
        unsigned char geo_m_pg[] = {0x04, 0x16, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00,
                                    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
                                    0x00, 0x00, 0x00, 0x00, 0x3a, 0x98, 0x00, 0x00};
-       u32 ncyl, *p;
+       u32 ncyl;
+       u32 n;
 
        /* assume 0xff heads, 15krpm. */
        memcpy(ptr, geo_m_pg, sizeof(geo_m_pg));
        ncyl = sec >> 14; /* 256 * 64 */
-       p = (u32 *)(ptr + 1);
-       *p = *p | cpu_to_be32(ncyl);
+       memcpy(&n, ptr + 1, sizeof(u32));
+       n = n | cpu_to_be32(ncyl);
+       memcpy(ptr + 1, &n, sizeof(u32));
        return sizeof(geo_m_pg);
 }

@@ -249,7 +251,10 @@
        struct iet_volume *lun;
        int rest, idx = 0;
 
-       size = be32_to_cpu(*(u32 *)&req->scb[6]);
+       size = be32_to_cpu((((u32) req->scb[6]) << 24) |
+                       (((u32) req->scb[7]) << 16) |
+                       (((u32) req->scb[8]) << 8) |
+                       req->scb[9]);
        if (size < 16)
                return -1;
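
For reference, on sparc64 the unaligned accesses this patch removes usually show up in the kernel log, so a quick way to check whether a running iscsi_trgt still triggers them is something like the following (the exact message text varies by kernel version):

   dmesg | grep -i unaligned
   # look for entries like "Kernel unaligned access at TPC[...]" pointing into iscsi_trgt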

Regards,

JKB


Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-27 Thread BERTRAND Joël

Dan Williams wrote:

On 10/24/07, BERTRAND Joël [EMAIL PROTECTED] wrote:

Hello,

Any news about this trouble ? Any idea ? I'm trying to fix it, but I
don't see any specific interaction between raid5 and istd. Does anyone
try to reproduce this bug on another arch than sparc64 ? I only use
sparc32 and 64 servers and I cannot test on other archs. Of course, I
have a laptop, but I cannot create a raid5 array on its internal HD to
test this configuration ;-)



Can you collect some oprofile data, as Ming suggested, so we can maybe
see what md_d0_raid5 and istd1 are fighting about?  Hopefully it is as
painless to run on sparc as it is on IA:

opcontrol --start --vmlinux=/path/to/vmlinux
wait
opcontrol --stop
opreport --image-path=/lib/modules/`uname -r` -l


Done.

Profiling through timer interrupt
samples   %        image name          app name            symbol name
20028038  92.9510  vmlinux-2.6.23      vmlinux-2.6.23      cpu_idle
1198566    5.5626  vmlinux-2.6.23      vmlinux-2.6.23      schedule
41558      0.1929  vmlinux-2.6.23      vmlinux-2.6.23      yield
34791      0.1615  vmlinux-2.6.23      vmlinux-2.6.23      NGmemcpy
18417      0.0855  vmlinux-2.6.23      vmlinux-2.6.23      xor_niagara_5
17430      0.0809  raid456             raid456             (no symbols)
15837      0.0735  vmlinux-2.6.23      vmlinux-2.6.23      sys_sched_yield
14860      0.0690  iscsi_trgt.ko       iscsi_trgt          istd
12705      0.0590  nf_conntrack        nf_conntrack        (no symbols)
9236       0.0429  libc-2.6.1.so       libc-2.6.1.so       (no symbols)
9034       0.0419  vmlinux-2.6.23      vmlinux-2.6.23      xor_niagara_2
6534       0.0303  oprofiled           oprofiled           (no symbols)
6149       0.0285  vmlinux-2.6.23      vmlinux-2.6.23      scsi_request_fn
5947       0.0276  ip_tables           ip_tables           (no symbols)
4510       0.0209  vmlinux-2.6.23      vmlinux-2.6.23      dma_4v_map_single
3823       0.0177  vmlinux-2.6.23      vmlinux-2.6.23      __make_request
3326       0.0154  vmlinux-2.6.23      vmlinux-2.6.23      tg3_poll
3162       0.0147  iscsi_trgt.ko       iscsi_trgt          scsi_cmnd_exec
3091       0.0143  vmlinux-2.6.23      vmlinux-2.6.23      scsi_dispatch_cmd
2849       0.0132  vmlinux-2.6.23      vmlinux-2.6.23      tcp_v4_rcv
2811       0.0130  vmlinux-2.6.23      vmlinux-2.6.23      nf_iterate
2729       0.0127  vmlinux-2.6.23      vmlinux-2.6.23      _spin_lock_bh
2551       0.0118  vmlinux-2.6.23      vmlinux-2.6.23      kfree
2467       0.0114  vmlinux-2.6.23      vmlinux-2.6.23      kmem_cache_free
2314       0.0107  vmlinux-2.6.23      vmlinux-2.6.23      atomic_add
2065       0.0096  vmlinux-2.6.23      vmlinux-2.6.23      NGbzero_loop
1826       0.0085  vmlinux-2.6.23      vmlinux-2.6.23      ip_rcv
1823       0.0085  nf_conntrack_ipv4   nf_conntrack_ipv4   (no symbols)
1822       0.0085  vmlinux-2.6.23      vmlinux-2.6.23      clear_bit
1767       0.0082  python2.4           python2.4           (no symbols)
1734       0.0080  vmlinux-2.6.23      vmlinux-2.6.23      atomic_sub_ret
1694       0.0079  vmlinux-2.6.23      vmlinux-2.6.23      tcp_rcv_established
1673       0.0078  vmlinux-2.6.23      vmlinux-2.6.23      tcp_recvmsg
1670       0.0078  vmlinux-2.6.23      vmlinux-2.6.23      netif_receive_skb
1668       0.0077  vmlinux-2.6.23      vmlinux-2.6.23      set_bit
1545       0.0072  vmlinux-2.6.23      vmlinux-2.6.23      __kmalloc_track_caller
1526       0.0071  iptable_nat         iptable_nat         (no symbols)
1526       0.0071  vmlinux-2.6.23      vmlinux-2.6.23      kmem_cache_alloc
1373       0.0064  vmlinux-2.6.23      vmlinux-2.6.23      generic_unplug_device
...

Is it enough ?

Regards,

JKB


Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-27 Thread Dan Williams
On 10/27/07, BERTRAND Joël [EMAIL PROTECTED] wrote:
 Dan Williams wrote:
  Can you collect some oprofile data, as Ming suggested, so we can maybe
  see what md_d0_raid5 and istd1 are fighting about?  Hopefully it is as
  painless to run on sparc as it is on IA:
 
  opcontrol --start --vmlinux=/path/to/vmlinux
  wait
  opcontrol --stop
  opreport --image-path=/lib/modules/`uname -r` -l

 Done.


[..]


 Is it enough ?

I would expect md_d0_raid5 and istd1 to show up pretty high in the
list if they are constantly pegged at a 100% CPU utilization like you
showed in the failure case.  Maybe this was captured after the target
has disconnected?


Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-27 Thread BERTRAND Joël

Dan Williams wrote:

On 10/27/07, BERTRAND Joël [EMAIL PROTECTED] wrote:

Dan Williams wrote:

Can you collect some oprofile data, as Ming suggested, so we can maybe
see what md_d0_raid5 and istd1 are fighting about?  Hopefully it is as
painless to run on sparc as it is on IA:

opcontrol --start --vmlinux=/path/to/vmlinux
wait
opcontrol --stop
opreport --image-path=/lib/modules/`uname -r` -l

Done.



[..]


Is it enough ?


I would expect md_d0_raid5 and istd1 to show up pretty high in the
list if they are constantly pegged at a 100% CPU utilization like you
showed in the failure case.  Maybe this was captured after the target
has disconnected?


	No, I launched opcontrol before starting the raid1 creation and stopped it after the disconnection. Don't forget that this server has 32 CPUs.


Regards,

JKB


Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-27 Thread Ming Zhang
Off topic: could you resubmit the alignment-issue patch to the list and see whether tomof accepts it? He needs the patch inlined in the email. It was found and fixed by you, so it is better that you post it (instead of me). Thanks.


On Sat, 2007-10-27 at 15:29 +0200, BERTRAND Joël wrote:
 Dan Williams wrote:
  On 10/24/07, BERTRAND Joël [EMAIL PROTECTED] wrote:
  Hello,
 
  Any news about this trouble ? Any idea ? I'm trying to fix it, but 
  I
  don't see any specific interaction between raid5 and istd. Does anyone
  try to reproduce this bug on another arch than sparc64 ? I only use
  sparc32 and 64 servers and I cannot test on other archs. Of course, I
  have a laptop, but I cannot create a raid5 array on its internal HD to
  test this configuration ;-)
 
  
  Can you collect some oprofile data, as Ming suggested, so we can maybe
  see what md_d0_raid5 and istd1 are fighting about?  Hopefully it is as
  painless to run on sparc as it is on IA:
  
  opcontrol --start --vmlinux=/path/to/vmlinux
  wait
  opcontrol --stop
  opreport --image-path=/lib/modules/`uname -r` -l
 
   Done.
 
 Profiling through timer interrupt
 samples   %        image name          app name            symbol name
 20028038  92.9510  vmlinux-2.6.23      vmlinux-2.6.23      cpu_idle
 1198566    5.5626  vmlinux-2.6.23      vmlinux-2.6.23      schedule
 41558      0.1929  vmlinux-2.6.23      vmlinux-2.6.23      yield
 34791      0.1615  vmlinux-2.6.23      vmlinux-2.6.23      NGmemcpy
 18417      0.0855  vmlinux-2.6.23      vmlinux-2.6.23      xor_niagara_5

raid5 uses these two. I forgot to ask whether you hit any memory pressure here.




 17430      0.0809  raid456             raid456             (no symbols)
 15837      0.0735  vmlinux-2.6.23      vmlinux-2.6.23      sys_sched_yield
 14860      0.0690  iscsi_trgt.ko       iscsi_trgt          istd

Could you get a call graph from oprofile? yield is called quite frequently, and IET has a few places that call it when no memory is available; I'm not sure whether that is the case here.
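
For reference, one way to collect such a call graph with the oprofile tools of that era (a sketch; --callgraph support depends on the oprofile version and on the architecture):

   opcontrol --shutdown
   opcontrol --vmlinux=/path/to/vmlinux --callgraph=16
   opcontrol --start
   # ... reproduce the istd1 / md_d0_raid5 spin ...
   opcontrol --stop
   opreport --image-path=/lib/modules/`uname -r` -l -c > callgraph.txt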

I remember a post (maybe on lwn.net?) about issues between the tickless kernel and yield() leading to 100% CPU utilization, but I can't recall where; does anybody have a clue?

Or does sparc64 not have a tickless kernel yet? I haven't followed this closely lately.


 12705      0.0590  nf_conntrack        nf_conntrack        (no symbols)
 9236       0.0429  libc-2.6.1.so       libc-2.6.1.so       (no symbols)
 9034       0.0419  vmlinux-2.6.23      vmlinux-2.6.23      xor_niagara_2
 6534       0.0303  oprofiled           oprofiled           (no symbols)
 6149       0.0285  vmlinux-2.6.23      vmlinux-2.6.23      scsi_request_fn
 5947       0.0276  ip_tables           ip_tables           (no symbols)
 4510       0.0209  vmlinux-2.6.23      vmlinux-2.6.23      dma_4v_map_single
 3823       0.0177  vmlinux-2.6.23      vmlinux-2.6.23      __make_request
 3326       0.0154  vmlinux-2.6.23      vmlinux-2.6.23      tg3_poll
 3162       0.0147  iscsi_trgt.ko       iscsi_trgt          scsi_cmnd_exec
 3091       0.0143  vmlinux-2.6.23      vmlinux-2.6.23      scsi_dispatch_cmd
 2849       0.0132  vmlinux-2.6.23      vmlinux-2.6.23      tcp_v4_rcv
 2811       0.0130  vmlinux-2.6.23      vmlinux-2.6.23      nf_iterate
 2729       0.0127  vmlinux-2.6.23      vmlinux-2.6.23      _spin_lock_bh
 2551       0.0118  vmlinux-2.6.23      vmlinux-2.6.23      kfree
 2467       0.0114  vmlinux-2.6.23      vmlinux-2.6.23      kmem_cache_free
 2314       0.0107  vmlinux-2.6.23      vmlinux-2.6.23      atomic_add
 2065       0.0096  vmlinux-2.6.23      vmlinux-2.6.23      NGbzero_loop
 1826       0.0085  vmlinux-2.6.23      vmlinux-2.6.23      ip_rcv
 1823       0.0085  nf_conntrack_ipv4   nf_conntrack_ipv4   (no symbols)
 1822       0.0085  vmlinux-2.6.23      vmlinux-2.6.23      clear_bit
 1767       0.0082  python2.4           python2.4           (no symbols)
 1734       0.0080  vmlinux-2.6.23      vmlinux-2.6.23      atomic_sub_ret
 1694       0.0079  vmlinux-2.6.23      vmlinux-2.6.23      tcp_rcv_established
 1673       0.0078  vmlinux-2.6.23      vmlinux-2.6.23      tcp_recvmsg
 1670       0.0078  vmlinux-2.6.23      vmlinux-2.6.23      netif_receive_skb
 1668       0.0077  vmlinux-2.6.23      vmlinux-2.6.23      set_bit
 1545       0.0072  vmlinux-2.6.23      vmlinux-2.6.23      __kmalloc_track_caller
 1526       0.0071  iptable_nat         iptable_nat         (no symbols)
 1526       0.0071  vmlinux-2.6.23      vmlinux-2.6.23      kmem_cache_alloc
 1373       0.0064  vmlinux-2.6.23      vmlinux-2.6.23      generic_unplug_device
 ...
 
   Is it enough ?
 
   Regards,
 
   JKB
-- 
Ming Zhang


@#$%^ purging memory... (*!%
http://blackmagic02881.wordpress.com/
http://www.linkedin.com/in/blackmagic02881



Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-24 Thread BERTRAND Joël

Hello,

	Any news about this trouble? Any ideas? I'm trying to fix it, but I don't see any specific interaction between raid5 and istd. Has anyone tried to reproduce this bug on an arch other than sparc64? I only use sparc32 and sparc64 servers, so I cannot test on other archs. Of course, I have a laptop, but I cannot create a raid5 array on its internal HD to test this configuration ;-)


Please note that I won't read my mail until next Saturday morning (CEST).


After disconnection of the iSCSI target:

Tasks: 232 total,   7 running, 224 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.0%us, 15.2%sy,  0.0%ni, 84.3%id,  0.0%wa,  0.1%hi,  0.3%si,  0.0%st

Mem:   4139032k total,  4127584k used,11448k free,95752k buffers
Swap:  7815536k total,0k used,  7815536k free,  3758792k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9738 root      15  -5     0    0    0 R  100  0.0   4:56.82 md_d0_raid5
 9774 root      15  -5     0    0    0 R  100  0.0   5:52.41 istd1
 9739 root      15  -5     0    0    0 R   14  0.0   0:28.90 md_d0_resync
 9916 root      20   0  3248 1544 1120 R    2  0.0   0:00.56 top
 4129 root      20   0 41648 5024 2432 S    0  0.1   2:56.17 fail2ban-server
    1 root      20   0  2576  960  816 S    0  0.0   0:01.58 init
    2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/0
    4 root      15  -5     0    0    0 S    0  0.0   0:00.02 ksoftirqd/0
    5 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/1
    6 root      15  -5     0    0    0 S    0  0.0   0:00.00 ksoftirqd/1


Regards,

JKB


Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-24 Thread Bill Davidsen

BERTRAND Joël wrote:

Hello,

Any news about this trouble ? Any idea ? I'm trying to fix it, but 
I don't see any specific interaction between raid5 and istd. Does 
anyone try to reproduce this bug on another arch than sparc64 ? I only 
use sparc32 and 64 servers and I cannot test on other archs. Of 
course, I have a laptop, but I cannot create a raid5 array on its 
internal HD to test this configuration ;-)


Sure you can, a few loopback devices and a few iSCSI, and you're in 
business. I think the ongoing discussion of timeouts and whatnot may 
bear some fruit eventually, perhaps not as fast as you would like. By 
Saturday a solution may emerge.
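
For reference, a minimal sketch of the loopback-based setup Bill describes (file names, sizes and device numbers are arbitrary examples):

   # create three small backing files and attach them to loop devices
   for i in 0 1 2; do
       dd if=/dev/zero of=/tmp/disk$i.img bs=1M count=256
       losetup /dev/loop$i /tmp/disk$i.img
   done
   # build a raid5 array on top of them
   mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/loop0 /dev/loop1 /dev/loop2
   cat /proc/mdstat
   # export /dev/md0 as an iSCSI LUN, then build the raid1 on the initiator side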


Please note that I won't read my mails until next saturday morning 
(CEST). 



--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979




Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-24 Thread Dan Williams
On 10/24/07, BERTRAND Joël [EMAIL PROTECTED] wrote:
 Hello,

 Any news about this trouble ? Any idea ? I'm trying to fix it, but I
 don't see any specific interaction between raid5 and istd. Does anyone
 try to reproduce this bug on another arch than sparc64 ? I only use
 sparc32 and 64 servers and I cannot test on other archs. Of course, I
 have a laptop, but I cannot create a raid5 array on its internal HD to
 test this configuration ;-)


Can you collect some oprofile data, as Ming suggested, so we can maybe
see what md_d0_raid5 and istd1 are fighting about?  Hopefully it is as
painless to run on sparc as it is on IA:

opcontrol --start --vmlinux=/path/to/vmlinux
wait
opcontrol --stop
opreport --image-path=/lib/modules/`uname -r` -l

--
Dan


Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-24 Thread David Miller
From: Dan Williams [EMAIL PROTECTED]
Date: Wed, 24 Oct 2007 16:49:28 -0700

 Hopefully it is as painless to run on sparc as it is on IA:
 
 opcontrol --start --vmlinux=/path/to/vmlinux
 wait
 opcontrol --stop
 opreport --image-path=/lib/modules/`uname -r` -l

It is painless, I use it all the time.

The only caveat is to make sure /path/to/vmlinux points at the unstripped kernel image (the one that still has its symbols).  The images installed under /boot/ are usually stripped and thus not suitable for profiling.
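
For reference, a quick way to check that the image handed to opcontrol still has its symbols (paths are examples):

   file /path/to/vmlinux                    # should report "not stripped"
   nm /path/to/vmlinux | grep -w cpu_idle   # symbol lookups should succeed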


Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-20 Thread BERTRAND Joël

Bill Davidsen wrote:

BERTRAND Joël wrote:


Sorry for this last mail. I have found another mistake, but I 
don't know if this bug comes from iscsi-target or raid5 itself. iSCSI 
target is disconnected because istd1 and md_d0_raid5 kernel threads 
use 100% of CPU each !


Tasks: 235 total,   6 running, 227 sleeping,   0 stopped,   2 zombie
Cpu(s):  0.1%us, 12.5%sy,  0.0%ni, 87.4%id,  0.0%wa,  0.0%hi,  0.0%si, 
0.0%st

Mem:   4139032k total,   218424k used,  3920608k free,10136k buffers
Swap:  7815536k total,0k used,  7815536k free,64808k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
 5824 root  15  -5 000 R  100  0.0  10:34.25 istd1
 5599 root  15  -5 000 R  100  0.0   7:25.43 md_d0_raid5


Given that the summary shows 87.4% idle, something is not right. You 
might try another tool, like vmstat, to at least verify the way the CPU 
is being used. When you can't trust what your tools tell you it gets 
really hard to make decisions based on the data.


Don't forget this box is a 32-CPU server.

JKB


Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-19 Thread BERTRAND Joël

BERTRAND Joël wrote:

Bill Davidsen wrote:

Dan Williams wrote:

On Fri, 2007-10-19 at 01:04 -0700, BERTRAND Joël wrote:
 

I ran some dd's (read and write in nullio) for 12 hours between initiator and target without any disconnection, so the iSCSI code seems to be robust. Both initiator and target are alone on a single gigabit ethernet link (without any switch). I'm investigating...



Can you reproduce on 2.6.22?

Also, I do not think this is the cause of your failure, but you have
CONFIG_DMA_ENGINE=y in your config.  Setting this to 'n' will compile
out the unneeded checks for offload engines in async_memcpy and
async_xor.


Given that offload engines are far less tested code, I think this is a 
very good thing to try!


I'm trying without CONFIG_DMA_ENGINE=y. istd1 only uses 40% of one CPU when I rebuild my raid1 array. 1% of the array has now been resynchronized without any hang.
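
For reference, one way to confirm what the running kernel was actually built with (a sketch; assumes the build config was kept in /boot or exposed via CONFIG_IKCONFIG_PROC):

   grep DMA_ENGINE /boot/config-`uname -r`
   # or, if /proc/config.gz is available:
   zcat /proc/config.gz | grep DMA_ENGINE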


Root gershwin:[/usr/scripts]  cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md7 : active raid1 sdi1[2] md_d0p1[0]
  1464725632 blocks [2/1] [U_]
  []  recovery =  1.0% (15705536/1464725632) 
finish=1103.9min speed=21875K/sec


Same result...

connection2:0: iscsi: detected conn error (1011)

 session2: iscsi: session recovery timed out after 120 secs
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery

Regards,

JKB


Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-19 Thread Dan Williams
On Fri, 2007-10-19 at 14:04 -0700, BERTRAND Joël wrote:
 
 Sorry for this last mail. I have found another mistake, but I
 don't
 know if this bug comes from iscsi-target or raid5 itself. iSCSI target
 is disconnected because istd1 and md_d0_raid5 kernel threads use 100%
 of
 CPU each !
 
 Tasks: 235 total,   6 running, 227 sleeping,   0 stopped,   2 zombie
 Cpu(s):  0.1%us, 12.5%sy,  0.0%ni, 87.4%id,  0.0%wa,  0.0%hi,  0.0%si,
 0.0%st
 Mem:   4139032k total,   218424k used,  3920608k free,10136k
 buffers
 Swap:  7815536k total,0k used,  7815536k free,64808k
 cached
 
PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
 
   5824 root  15  -5 000 R  100  0.0  10:34.25 istd1
 
   5599 root  15  -5 000 R  100  0.0   7:25.43
 md_d0_raid5

What is the output of:
cat /proc/5824/wchan
cat /proc/5599/wchan

Thanks,
Dan
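
For reference, two related ways to see where the spinning threads sit in the kernel (a sketch; the PIDs are the ones from the top output quoted above, and sysrq must be enabled):

   cat /proc/5824/wchan /proc/5599/wchan; echo
   echo t > /proc/sysrq-trigger   # dump every task's state and kernel stack to the log
   dmesg | tail -200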


Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-19 Thread Bill Davidsen

BERTRAND Joël wrote:


Sorry for this last mail. I have found another mistake, but I 
don't know if this bug comes from iscsi-target or raid5 itself. iSCSI 
target is disconnected because istd1 and md_d0_raid5 kernel threads 
use 100% of CPU each !


Tasks: 235 total,   6 running, 227 sleeping,   0 stopped,   2 zombie
Cpu(s):  0.1%us, 12.5%sy,  0.0%ni, 87.4%id,  0.0%wa,  0.0%hi,  0.0%si, 
0.0%st

Mem:   4139032k total,   218424k used,  3920608k free,10136k buffers
Swap:  7815536k total,0k used,  7815536k free,64808k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
 5824 root  15  -5 000 R  100  0.0  10:34.25 istd1
 5599 root  15  -5 000 R  100  0.0   7:25.43 md_d0_raid5


Given that the summary shows 87.4% idle, something is not right. You 
might try another tool, like vmstat, to at least verify the way the CPU 
is being used. When you can't trust what your tools tell you it gets 
really hard to make decisions based on the data.
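
For reference, a couple of cross-checks along the lines Bill suggests (a sketch; mpstat comes from the sysstat package and may not be installed):

   vmstat 1 10          # run queue, context switches, overall CPU split
   mpstat -P ALL 1 5    # per-CPU utilisation; on a 32-CPU box two pegged
                        # threads barely dent the idle average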


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979




Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-19 Thread Bill Davidsen

Bill Davidsen wrote:

BERTRAND Joël wrote:


Sorry for this last mail. I have found another mistake, but I 
don't know if this bug comes from iscsi-target or raid5 itself. iSCSI 
target is disconnected because istd1 and md_d0_raid5 kernel threads 
use 100% of CPU each !


Tasks: 235 total,   6 running, 227 sleeping,   0 stopped,   2 zombie
Cpu(s):  0.1%us, 12.5%sy,  0.0%ni, 87.4%id,  0.0%wa,  0.0%hi,  
0.0%si, 0.0%st

Mem:   4139032k total,   218424k used,  3920608k free,10136k buffers
Swap:  7815536k total,0k used,  7815536k free,64808k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
 5824 root  15  -5 000 R  100  0.0  10:34.25 istd1
 5599 root  15  -5 000 R  100  0.0   7:25.43 md_d0_raid5


Given that the summary shows 87.4% idle, something is not right. You 
might try another tool, like vmstat, to at least verify the way the 
CPU is being used. When you can't trust what your tools tell you it 
gets really hard to make decisions based on the data.


ALSO: you have zombie processes. Looking at machines up for 45, 54, and 
470 days, zombies are *not* something you just have to expect. Do you 
get these just about the same time things go to hell? Better you than 
me, I suspect there are still many ways to have a learning experience 
with iSCSI.


Hope that and the summary confusion result in some useful data.

--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979

