Re: [BUG] Raid1/5 over iSCSI trouble
Ming Zhang wrote: off topic, could you resubmit the alignment issue patch to the list and see if tomof accepts it. he needs a patch inlined in email. it was found and fixed by you, so you had better post it (instead of me). thx.

(The patch below was mangled in transit: the `<`, `>`, and `&` characters were eaten by an HTML-to-text conversion. Reconstructed, it replaces the misaligned `*(u32 *)` / `*(u64 *)` loads with explicit byte-by-byte assembly.)

diff -u kernel.old/iscsi.c kernel/iscsi.c
--- kernel.old/iscsi.c	2007-10-29 09:49:16.0 +0100
+++ kernel/iscsi.c	2007-10-17 11:19:14.0 +0200
@@ -726,13 +726,26 @@
 	case READ_10:
 	case WRITE_10:
 	case WRITE_VERIFY:
-		*off = be32_to_cpu(*(u32 *)&cmd[2]);
+		*off = (u64)((((u32) cmd[2]) << 24) |
+			     (((u32) cmd[3]) << 16) |
+			     (((u32) cmd[4]) << 8) |
+			     cmd[5]);
 		*len = (cmd[7] << 8) + cmd[8];
 		break;
 	case READ_16:
 	case WRITE_16:
-		*off = be64_to_cpu(*(u64 *)&cmd[2]);
-		*len = be32_to_cpu(*(u32 *)&cmd[10]);
+		*off = (((u64) cmd[2]) << 56) |
+		       (((u64) cmd[3]) << 48) |
+		       (((u64) cmd[4]) << 40) |
+		       (((u64) cmd[5]) << 32) |
+		       (((u64) cmd[6]) << 24) |
+		       (((u64) cmd[7]) << 16) |
+		       (((u64) cmd[8]) << 8) |
+		       cmd[9];
+		*len = (((u32) cmd[10]) << 24) |
+		       (((u32) cmd[11]) << 16) |
+		       (((u32) cmd[12]) << 8) |
+		       cmd[13];
 		break;
 	default:
 		BUG();

diff -u kernel.old/target_disk.c kernel/target_disk.c
--- kernel.old/target_disk.c	2007-10-29 09:49:16.0 +0100
+++ kernel/target_disk.c	2007-10-17 16:04:06.0 +0200
@@ -66,13 +66,15 @@
 	unsigned char geo_m_pg[] = {0x04, 0x16, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00,
 				    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
 				    0x00, 0x00, 0x00, 0x00, 0x3a, 0x98, 0x00, 0x00};
-	u32 ncyl, *p;
+	u32 ncyl;
+	u32 n;

 	/* assume 0xff heads, 15krpm. */
 	memcpy(ptr, geo_m_pg, sizeof(geo_m_pg));
 	ncyl = sec >> 14; /* 256 * 64 */
-	p = (u32 *)(ptr + 1);
-	*p = *p | cpu_to_be32(ncyl);
+	memcpy(&n, ptr + 1, sizeof(u32));
+	n = n | cpu_to_be32(ncyl);
+	memcpy(ptr + 1, &n, sizeof(u32));

 	return sizeof(geo_m_pg);
 }
@@ -249,7 +251,10 @@
 	struct iet_volume *lun;
 	int rest, idx = 0;

-	size = be32_to_cpu(*(u32 *)&req->scb[6]);
+	size = (((u32) req->scb[6]) << 24) |
+	       (((u32) req->scb[7]) << 16) |
+	       (((u32) req->scb[8]) << 8) |
+	       req->scb[9];

 	if (size < 16)
 		return -1;

Regards,

JKB

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] Raid1/5 over iSCSI trouble
Dan Williams wrote:

  On 10/24/07, BERTRAND Joël [EMAIL PROTECTED] wrote: Hello, Any news about this trouble? Any idea? I'm trying to fix it, but I don't see any specific interaction between raid5 and istd. [..]

  Can you collect some oprofile data, as Ming suggested, so we can maybe see what md_d0_raid5 and istd1 are fighting about? Hopefully it is as painless to run on sparc as it is on IA:

  opcontrol --start --vmlinux=/path/to/vmlinux
  wait
  opcontrol --stop
  opreport --image-path=/lib/modules/`uname -r` -l

Done.

Profiling through timer interrupt
samples   %        image name         app name           symbol name
20028038  92.9510  vmlinux-2.6.23     vmlinux-2.6.23     cpu_idle
1198566    5.5626  vmlinux-2.6.23     vmlinux-2.6.23     schedule
41558      0.1929  vmlinux-2.6.23     vmlinux-2.6.23     yield
34791      0.1615  vmlinux-2.6.23     vmlinux-2.6.23     NGmemcpy
18417      0.0855  vmlinux-2.6.23     vmlinux-2.6.23     xor_niagara_5
17430      0.0809  raid456            raid456            (no symbols)
15837      0.0735  vmlinux-2.6.23     vmlinux-2.6.23     sys_sched_yield
14860      0.0690  iscsi_trgt.ko      iscsi_trgt         istd
12705      0.0590  nf_conntrack       nf_conntrack       (no symbols)
9236       0.0429  libc-2.6.1.so      libc-2.6.1.so      (no symbols)
9034       0.0419  vmlinux-2.6.23     vmlinux-2.6.23     xor_niagara_2
6534       0.0303  oprofiled          oprofiled          (no symbols)
6149       0.0285  vmlinux-2.6.23     vmlinux-2.6.23     scsi_request_fn
5947       0.0276  ip_tables          ip_tables          (no symbols)
4510       0.0209  vmlinux-2.6.23     vmlinux-2.6.23     dma_4v_map_single
3823       0.0177  vmlinux-2.6.23     vmlinux-2.6.23     __make_request
3326       0.0154  vmlinux-2.6.23     vmlinux-2.6.23     tg3_poll
3162       0.0147  iscsi_trgt.ko      iscsi_trgt         scsi_cmnd_exec
3091       0.0143  vmlinux-2.6.23     vmlinux-2.6.23     scsi_dispatch_cmd
2849       0.0132  vmlinux-2.6.23     vmlinux-2.6.23     tcp_v4_rcv
2811       0.0130  vmlinux-2.6.23     vmlinux-2.6.23     nf_iterate
2729       0.0127  vmlinux-2.6.23     vmlinux-2.6.23     _spin_lock_bh
2551       0.0118  vmlinux-2.6.23     vmlinux-2.6.23     kfree
2467       0.0114  vmlinux-2.6.23     vmlinux-2.6.23     kmem_cache_free
2314       0.0107  vmlinux-2.6.23     vmlinux-2.6.23     atomic_add
2065       0.0096  vmlinux-2.6.23     vmlinux-2.6.23     NGbzero_loop
1826       0.0085  vmlinux-2.6.23     vmlinux-2.6.23     ip_rcv
1823       0.0085  nf_conntrack_ipv4  nf_conntrack_ipv4  (no symbols)
1822       0.0085  vmlinux-2.6.23     vmlinux-2.6.23     clear_bit
1767       0.0082  python2.4          python2.4          (no symbols)
1734       0.0080  vmlinux-2.6.23     vmlinux-2.6.23     atomic_sub_ret
1694       0.0079  vmlinux-2.6.23     vmlinux-2.6.23     tcp_rcv_established
1673       0.0078  vmlinux-2.6.23     vmlinux-2.6.23     tcp_recvmsg
1670       0.0078  vmlinux-2.6.23     vmlinux-2.6.23     netif_receive_skb
1668       0.0077  vmlinux-2.6.23     vmlinux-2.6.23     set_bit
1545       0.0072  vmlinux-2.6.23     vmlinux-2.6.23     __kmalloc_track_caller
1526       0.0071  iptable_nat        iptable_nat        (no symbols)
1526       0.0071  vmlinux-2.6.23     vmlinux-2.6.23     kmem_cache_alloc
1373       0.0064  vmlinux-2.6.23     vmlinux-2.6.23     generic_unplug_device
...

Is it enough?

Regards,

JKB
Re: [BUG] Raid1/5 over iSCSI trouble
On 10/27/07, BERTRAND Joël [EMAIL PROTECTED] wrote:

  Dan Williams wrote: Can you collect some oprofile data, as Ming suggested, so we can maybe see what md_d0_raid5 and istd1 are fighting about? [..]

  Done. [..] Is it enough?

I would expect md_d0_raid5 and istd1 to show up pretty high in the list if they are constantly pegged at 100% CPU utilization like you showed in the failure case. Maybe this was captured after the target had disconnected?
Re: [BUG] Raid1/5 over iSCSI trouble
Dan Williams wrote:

  On 10/27/07, BERTRAND Joël [EMAIL PROTECTED] wrote: [..] Done. [..] Is it enough?

  I would expect md_d0_raid5 and istd1 to show up pretty high in the list if they are constantly pegged at 100% CPU utilization like you showed in the failure case. Maybe this was captured after the target has disconnected?

No, I launched opcontrol before starting the raid1 creation and stopped it after the disconnection. Don't forget that this server has 32 CPUs.

Regards,

JKB
Re: [BUG] Raid1/5 over iSCSI trouble
off topic, could you resubmit the alignment issue patch to the list and see if tomof accepts it. he needs a patch inlined in email. it was found and fixed by you, so you had better post it (instead of me). thx.

On Sat, 2007-10-27 at 15:29 +0200, BERTRAND Joël wrote:

  Dan Williams wrote: Can you collect some oprofile data, as Ming suggested, so we can maybe see what md_d0_raid5 and istd1 are fighting about? [..]

  Done.

  Profiling through timer interrupt
  samples   %        image name      app name        symbol name
  20028038  92.9510  vmlinux-2.6.23  vmlinux-2.6.23  cpu_idle
  1198566    5.5626  vmlinux-2.6.23  vmlinux-2.6.23  schedule
  41558      0.1929  vmlinux-2.6.23  vmlinux-2.6.23  yield
  34791      0.1615  vmlinux-2.6.23  vmlinux-2.6.23  NGmemcpy
  18417      0.0855  vmlinux-2.6.23  vmlinux-2.6.23  xor_niagara_5

raid5 uses these two. forgot to ask whether you hit any memory pressure here.

  17430      0.0809  raid456         raid456         (no symbols)
  15837      0.0735  vmlinux-2.6.23  vmlinux-2.6.23  sys_sched_yield
  14860      0.0690  iscsi_trgt.ko   iscsi_trgt      istd

could you get a call graph from oprofile? yield is called quite frequently. iet has some places that call it when no memory is available; not sure if this is the case. i remember there was a post (maybe on lwn.net?) about issues between the tickless kernel and yield() leading to 100% cpu utilization; i just cannot recall where. anybody have a clue? or does sparc64 not have a tickless kernel yet? did not follow these carefully these days.

  [.. rest of the profile output trimmed ..]

  Is it enough?

  Regards,

  JKB

--
Ming Zhang
@#$%^ purging memory... (*!%
http://blackmagic02881.wordpress.com/
http://www.linkedin.com/in/blackmagic02881
Re: [BUG] Raid1/5 over iSCSI trouble
Hello,

Any news about this trouble? Any idea? I'm trying to fix it, but I don't see any specific interaction between raid5 and istd. Has anyone tried to reproduce this bug on an arch other than sparc64? I only use sparc32 and sparc64 servers and I cannot test on other archs. Of course, I have a laptop, but I cannot create a raid5 array on its internal HD to test this configuration ;-)

Please note that I won't read my mails until next Saturday morning (CEST).

After disconnection of the iSCSI target:

Tasks: 232 total, 7 running, 224 sleeping, 0 stopped, 1 zombie
Cpu(s): 0.0%us, 15.2%sy, 0.0%ni, 84.3%id, 0.0%wa, 0.1%hi, 0.3%si, 0.0%st
Mem:   4139032k total, 4127584k used,   11448k free,   95752k buffers
Swap:  7815536k total,       0k used, 7815536k free, 3758792k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
 9738 root  15  -5     0    0    0 R  100  0.0  4:56.82 md_d0_raid5
 9774 root  15  -5     0    0    0 R  100  0.0  5:52.41 istd1
 9739 root  15  -5     0    0    0 R   14  0.0  0:28.90 md_d0_resync
 9916 root  20   0  3248 1544 1120 R    2  0.0  0:00.56 top
 4129 root  20   0 41648 5024 2432 S    0  0.1  2:56.17 fail2ban-server
    1 root  20   0  2576  960  816 S    0  0.0  0:01.58 init
    2 root  15  -5     0    0    0 S    0  0.0  0:00.00 kthreadd
    3 root  RT  -5     0    0    0 S    0  0.0  0:00.00 migration/0
    4 root  15  -5     0    0    0 S    0  0.0  0:00.02 ksoftirqd/0
    5 root  RT  -5     0    0    0 S    0  0.0  0:00.00 migration/1
    6 root  15  -5     0    0    0 S    0  0.0  0:00.00 ksoftirqd/1

Regards,

JKB
Re: [BUG] Raid1/5 over iSCSI trouble
BERTRAND Joël wrote:

  Hello, Any news about this trouble? Any idea? [..] Of course, I have a laptop, but I cannot create a raid5 array on its internal HD to test this configuration ;-)

Sure you can: a few loopback devices and a few iSCSI targets, and you're in business.

I think the ongoing discussion of timeouts and whatnot may bear some fruit eventually, perhaps not as fast as you would like. By Saturday a solution may emerge.

  Please note that I won't read my mails until next Saturday morning (CEST).

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
Re: [BUG] Raid1/5 over iSCSI trouble
On 10/24/07, BERTRAND Joël [EMAIL PROTECTED] wrote:

  Hello, Any news about this trouble? Any idea? I'm trying to fix it, but I don't see any specific interaction between raid5 and istd. [..]

Can you collect some oprofile data, as Ming suggested, so we can maybe see what md_d0_raid5 and istd1 are fighting about? Hopefully it is as painless to run on sparc as it is on IA:

opcontrol --start --vmlinux=/path/to/vmlinux
wait
opcontrol --stop
opreport --image-path=/lib/modules/`uname -r` -l

--
Dan
Re: [BUG] Raid1/5 over iSCSI trouble
From: Dan Williams [EMAIL PROTECTED]
Date: Wed, 24 Oct 2007 16:49:28 -0700

  Hopefully it is as painless to run on sparc as it is on IA:

  opcontrol --start --vmlinux=/path/to/vmlinux
  wait
  opcontrol --stop
  opreport --image-path=/lib/modules/`uname -r` -l

It is painless, I use it all the time. The only caveat is to make sure that /path/to/vmlinux is the pre-stripped kernel image. The images installed under /boot/ are usually stripped and thus not suitable for profiling.
Re: [BUG] Raid1/5 over iSCSI trouble
Bill Davidsen wrote:

  BERTRAND Joël wrote: Sorry for this last mail. I have found another mistake, but I don't know if this bug comes from iscsi-target or raid5 itself. The iSCSI target is disconnected because the istd1 and md_d0_raid5 kernel threads each use 100% of a CPU!

  Tasks: 235 total, 6 running, 227 sleeping, 0 stopped, 2 zombie
  Cpu(s): 0.1%us, 12.5%sy, 0.0%ni, 87.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
  Mem:   4139032k total,  218424k used, 3920608k free,  10136k buffers
  Swap:  7815536k total,       0k used, 7815536k free,  64808k cached

    PID USER  PR  NI VIRT RES SHR S %CPU %MEM   TIME+  COMMAND
   5824 root  15  -5    0   0   0 R  100  0.0 10:34.25 istd1
   5599 root  15  -5    0   0   0 R  100  0.0  7:25.43 md_d0_raid5

  Given that the summary shows 87.4% idle, something is not right. You might try another tool, like vmstat, to at least verify the way the CPU is being used. When you can't trust what your tools tell you it gets really hard to make decisions based on the data.

Don't forget this box is a 32-CPU server.

JKB
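[Editor's note] The two readings are in fact consistent. A back-of-envelope check (plain arithmetic, assuming top's Cpu(s) header averages over all 32 CPUs): a handful of threads each saturating one CPU barely moves the machine-wide figure.

```python
# On an N-CPU machine, a thread pegging one CPU at 100% contributes only
# 100/N percent to the machine-wide utilization that top's header reports.
NCPUS = 32

def global_busy_pct(pegged_threads):
    """Percent of total CPU consumed by threads each saturating one CPU."""
    return 100.0 * pegged_threads / NCPUS

print(global_busy_pct(2))  # istd1 + md_d0_raid5 -> 6.25
print(global_busy_pct(4))  # four busy CPUs -> 12.5, close to the ~12.5%sy shown
```

So "87.4% idle" and two threads pegged at 100% are not contradictory on this box.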
Re: [BUG] Raid1/5 over iSCSI trouble
BERTRAND Joël wrote:

  Bill Davidsen wrote: Dan Williams wrote: On Fri, 2007-10-19 at 01:04 -0700, BERTRAND Joël wrote: I ran some dd's for 12 hours (read and write in nullio) between initiator and target without any disconnection. Thus the iSCSI code seems to be robust. Both initiator and target are alone on a single gigabit ethernet link (without any switch). I'm investigating...

  Can you reproduce on 2.6.22? Also, I do not think this is the cause of your failure, but you have CONFIG_DMA_ENGINE=y in your config. Setting this to 'n' will compile out the unneeded checks for offload engines in async_memcpy and async_xor.

  Given that offload engines are far less tested code, I think this is a very good thing to try!

  I'm trying without CONFIG_DMA_ENGINE=y. istd1 only uses 40% of one CPU when I rebuild my raid1 array. 1% of this array was resynchronized without any hang.

  Root gershwin:[/usr/scripts] cat /proc/mdstat
  Personalities : [raid1] [raid6] [raid5] [raid4]
  md7 : active raid1 sdi1[2] md_d0p1[0]
        1464725632 blocks [2/1] [U_]
        [>....................]  recovery =  1.0% (15705536/1464725632) finish=1103.9min speed=21875K/sec

Same result...

connection2:0: iscsi: detected conn error (1011)
session2: iscsi: session recovery timed out after 120 secs
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery

Regards,

JKB
Re: [BUG] Raid1/5 over iSCSI trouble
On Fri, 2007-10-19 at 14:04 -0700, BERTRAND Joël wrote:

  Sorry for this last mail. I have found another mistake, but I don't know if this bug comes from iscsi-target or raid5 itself. The iSCSI target is disconnected because the istd1 and md_d0_raid5 kernel threads each use 100% of a CPU!

    PID USER  PR  NI VIRT RES SHR S %CPU %MEM   TIME+  COMMAND
   5824 root  15  -5    0   0   0 R  100  0.0 10:34.25 istd1
   5599 root  15  -5    0   0   0 R  100  0.0  7:25.43 md_d0_raid5

What is the output of:

cat /proc/5824/wchan
cat /proc/5599/wchan

Thanks,
Dan
Re: [BUG] Raid1/5 over iSCSI trouble
BERTRAND Joël wrote:

  Sorry for this last mail. I have found another mistake, but I don't know if this bug comes from iscsi-target or raid5 itself. The iSCSI target is disconnected because the istd1 and md_d0_raid5 kernel threads each use 100% of a CPU!

  Cpu(s): 0.1%us, 12.5%sy, 0.0%ni, 87.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

    PID USER  PR  NI VIRT RES SHR S %CPU %MEM   TIME+  COMMAND
   5824 root  15  -5    0   0   0 R  100  0.0 10:34.25 istd1
   5599 root  15  -5    0   0   0 R  100  0.0  7:25.43 md_d0_raid5

Given that the summary shows 87.4% idle, something is not right. You might try another tool, like vmstat, to at least verify the way the CPU is being used. When you can't trust what your tools tell you it gets really hard to make decisions based on the data.

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
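[Editor's note] One way to cross-check top with raw counters, as suggested above, is to read per-CPU lines from /proc/stat directly. A rough sketch (assumes the standard /proc/stat field order: user nice system idle iowait irq softirq ...); a few pegged CPUs stand out per-CPU even when the machine-wide average looks idle.

```python
def per_cpu_busy(stat_text):
    """Map 'cpuN' -> busy fraction since boot, parsed from /proc/stat content."""
    busy = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line; keep only per-CPU "cpu0", "cpu1", ...
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        ticks = [int(x) for x in fields[1:]]
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
        busy[fields[0]] = 1.0 - idle / sum(ticks)
    return busy

if __name__ == "__main__":
    try:
        with open("/proc/stat") as f:   # real counters on Linux
            text = f.read()
    except OSError:                     # canned sample elsewhere
        text = "cpu  4 0 4 92 0 0 0\ncpu0 100 0 0 0 0 0 0\ncpu1 0 0 0 100 0 0 0\n"
    for cpu, frac in sorted(per_cpu_busy(text).items()):
        print(f"{cpu}: {frac:.1%} busy")
```

Two CPUs near 100% with thirty others idle reproduces exactly the picture top showed: a low machine-wide busy figure alongside two pegged kernel threads.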
Re: [BUG] Raid1/5 over iSCSI trouble
Bill Davidsen wrote:

  BERTRAND Joël wrote: Sorry for this last mail. I have found another mistake, but I don't know if this bug comes from iscsi-target or raid5 itself. The iSCSI target is disconnected because the istd1 and md_d0_raid5 kernel threads each use 100% of a CPU!

  Tasks: 235 total, 6 running, 227 sleeping, 0 stopped, 2 zombie
  [..]

  Given that the summary shows 87.4% idle, something is not right. You might try another tool, like vmstat, to at least verify the way the CPU is being used. When you can't trust what your tools tell you it gets really hard to make decisions based on the data.

ALSO: you have zombie processes. Looking at machines up for 45, 54, and 470 days, zombies are *not* something you just have to expect. Do you get these at about the same time things go to hell?

Better you than me; I suspect there are still many ways to have a learning experience with iSCSI. Hope that and the summary confusion result in some useful data.

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979