Re: [Iscsitarget-devel] [BUG] Raid1/5 over iSCSI trouble
On Fri, 2007-10-19 at 23:04 +0200, BERTRAND Joël wrote:

> Bill Davidsen wrote:
>> Dan Williams wrote:
>>> On Fri, 2007-10-19 at 01:04 -0700, BERTRAND Joël wrote:
>>>> I ran some dd's for 12 hours (reads and writes in nullio) between
>>>> initiator and target without any disconnection, so the iSCSI code
>>>> seems to be robust. Both initiator and target are alone on a single
>>>> gigabit ethernet link (without any switch). I'm investigating...
>>>
>>> Can you reproduce on 2.6.22? Also, I do not think this is the cause
>>> of your failure, but you have CONFIG_DMA_ENGINE=y in your config.
>>> Setting this to 'n' will compile out the unneeded checks for offload
>>> engines in async_memcpy and async_xor.
>>
>> Given that offload engines are far less tested code, I think this is
>> a very good thing to try!
>
> I'm trying without CONFIG_DMA_ENGINE=y. istd1 only uses 40% of one CPU
> while I rebuild my raid1 array, and 1% of the array has now been
> resynchronized without any hang.
>
> Root gershwin:[/usr/scripts] cat /proc/mdstat
> Personalities : [raid1] [raid6] [raid5] [raid4]
> md7 : active raid1 sdi1[2] md_d0p1[0]
>       1464725632 blocks [2/1] [U_]
>       [>....................]  recovery =  1.0% (15705536/1464725632)
>       finish=1103.9min speed=21875K/sec
>
> Same result...
>
> connection2:0: iscsi: detected conn error (1011)
> session2: iscsi: session recovery timed out after 120 secs
> sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
> (last line repeated seven times)
>
> Sorry for that last mail. I have found another problem, but I don't
> know whether this bug comes from iscsi-target or from raid5 itself.
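For watching a resync this long, the recovery line can be pulled out of /proc/mdstat programmatically. A minimal sketch (a hypothetical helper, not part of mdadm or the kernel), using the output quoted above as sample input:

```python
import re

def parse_recovery(mdstat_text):
    """Extract (percent, finish_minutes, speed_kps) from a /proc/mdstat
    recovery line, or None if no resync is in progress."""
    m = re.search(
        r"recovery\s*=\s*([\d.]+)%.*?finish=([\d.]+)min\s+speed=(\d+)K/sec",
        mdstat_text,
        re.DOTALL)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2)), int(m.group(3))

sample = ("md7 : active raid1 sdi1[2] md_d0p1[0]\n"
          "      1464725632 blocks [2/1] [U_]\n"
          "      [>....................]  recovery =  1.0% "
          "(15705536/1464725632) finish=1103.9min speed=21875K/sec\n")

print(parse_recovery(sample))  # (1.0, 1103.9, 21875)
```

In a real script the text would come from `open("/proc/mdstat").read()` polled in a loop; the regex tolerates the variable spacing md uses in that line.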
> The iSCSI target is disconnected because the istd1 and md_d0_raid5
> kernel threads each use 100% of a CPU!
>
> Tasks: 235 total,   6 running, 227 sleeping,   0 stopped,   2 zombie
> Cpu(s):  0.1%us, 12.5%sy,  0.0%ni, 87.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:   4139032k total,   218424k used,  3920608k free,    10136k buffers
> Swap:  7815536k total,        0k used,  7815536k free,    64808k cached
>
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  5824 root  15  -5     0    0    0 R  100  0.0  10:34.25 istd1
>  5599 root  15  -5     0    0    0 R  100  0.0   7:25.43 md_d0_raid5
>
> Regards,
>
> JKB

I would rather use oprofile to check where the CPU cycles are going.

--
Ming Zhang
http://blackmagic02881.wordpress.com/
http://www.linkedin.com/in/blackmagic02881

___
Iscsitarget-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/iscsitarget-devel

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
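An oprofile session for this kind of kernel-thread spin could look roughly like the following sketch. It requires root and a vmlinux built with symbols; the vmlinux path is an example, and the exact flags may differ between oprofile versions:

```shell
# Point oprofile at the running kernel's symbols (example path)
opcontrol --init
opcontrol --vmlinux=/usr/src/linux/vmlinux

# Start sampling, then reproduce the resync / 100% CPU condition
opcontrol --start
# ... let the raid1 rebuild run until istd1 and md_d0_raid5 spin ...
opcontrol --stop

# Show which kernel symbols accumulated the samples
opreport --symbols --threshold=1

opcontrol --shutdown
```

The `opreport --symbols` output would show whether the cycles land in the md raid5 code, the iSCSI target, or somewhere like the xor/memcpy paths that the CONFIG_DMA_ENGINE discussion above touched on.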
RE: [Iscsitarget-devel] [BUG] Raid1/5 over iSCSI trouble
BERTRAND Joël wrote:
> Ross S. W. Walker wrote:
>> [snip]
>>
>> I am unsure why you would want to set up an iSCSI RAID1, but before
>> doing so I would try to verify that each independent iSCSI session is
>> bulletproof.
>
> I use one and only one iSCSI session.
> The raid1 array is built between a local volume and an iSCSI volume.

Oh, in that case you will be much better served by DRBD, which would
provide what you want without creating a Frankenstein setup...

-Ross

__
This e-mail, and any attachments thereto, is intended only for use by
the addressee(s) named herein and may contain legally privileged and/or
confidential information. If you are not the intended recipient of this
e-mail, you are hereby notified that any dissemination, distribution or
copying of this e-mail, and any attachments thereto, is strictly
prohibited. If you have received this e-mail in error, please
immediately notify the sender and permanently delete the original and
any copy or printout thereof.
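For the DRBD alternative suggested above, replication between the local disk and the remote disk is declared in a single resource. A minimal drbd.conf sketch in DRBD 8.x syntax; the hostnames, backing devices, and addresses are placeholders, not taken from the thread:

```
resource r0 {
  protocol C;                  # synchronous: write completes on both nodes

  on gershwin {
    device    /dev/drbd0;
    disk      /dev/sda1;       # local backing device (example)
    address   192.168.0.1:7788;
    meta-disk internal;
  }

  on peer-host {
    device    /dev/drbd0;
    disk      /dev/sdb1;       # remote backing device (example)
    address   192.168.0.2:7788;
    meta-disk internal;
  }
}
```

DRBD then handles reconnection and resynchronization itself, instead of md treating every network hiccup as a failed disk.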
Re: [Iscsitarget-devel] [BUG] Raid1/5 over iSCSI trouble
[snip]
>> I am unsure why you would want to set up an iSCSI RAID1, but before
>> doing so I would try to verify that each independent iSCSI session is
>> bulletproof.
>
> I use one and only one iSCSI session. The raid1 array is built between
> a local volume and an iSCSI volume.

So the problem doesn't happen when you do I/O over the iSCSI session alone? Wouldn't it be better to do the RAID1 on the target machine? Then you don't need to mess around with the weird timing behavior of remote versus local writes. If you want the disks on two different machines and mirrored, DRBD is the way to go.

@Ross: He is trying to mirror his local drive with an iSCSI LUN.

--
Scott Kaelin
Sitrof Technologies
[EMAIL PROTECTED]
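If the RAID1 were assembled on the target machine as suggested, the iSCSI Enterprise Target could then export the finished md device as a single LUN. A sketch of the relevant ietd.conf entry; the IQN and device name are examples:

```
Target iqn.2007-10.org.example:storage.md0
    # Export the assembled md array as one block-backed LUN
    Lun 0 Path=/dev/md0,Type=blockio
```

The initiator then sees one plain disk, and all mirroring timing stays local to the target.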
RE: [Iscsitarget-devel] [BUG] Raid1/5 over iSCSI trouble
BERTRAND Joël wrote:
> [snip]
>
> Sorry for that last mail. I have found another problem, but I don't
> know whether this bug comes from iscsi-target or from raid5 itself.
> The iSCSI target is disconnected because the istd1 and md_d0_raid5
> kernel threads each use 100% of a CPU!
> Tasks: 235 total,   6 running, 227 sleeping,   0 stopped,   2 zombie
> Cpu(s):  0.1%us, 12.5%sy,  0.0%ni, 87.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:   4139032k total,   218424k used,  3920608k free,    10136k buffers
> Swap:  7815536k total,        0k used,  7815536k free,    64808k cached
>
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  5824 root  15  -5     0    0    0 R  100  0.0  10:34.25 istd1
>  5599 root  15  -5     0    0    0 R  100  0.0   7:25.43 md_d0_raid5
>
> Regards,
>
> JKB

If you have two iSCSI sessions mirrored, then any failure along either path will hose the setup. Plus, having iSCSI and MD RAID fight over the same resources in the kernel is a recipe for a race condition. How about exploring MPIO and DRBD?

-Ross
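On the MPIO side of that suggestion, dm-multipath can merge two iSCSI sessions to the same LUN into one failover device, so a path failure is retried on the other session instead of failing the mirror. A minimal /etc/multipath.conf sketch; the WWID and alias are placeholders:

```
defaults {
    polling_interval 5
}

multipaths {
    multipath {
        wwid   360000000000000000e00000000010001   # example LUN WWID
        alias  iscsi-lun0
        path_grouping_policy failover              # one path active at a time
    }
}
```

md (or DRBD) would then sit on top of /dev/mapper/iscsi-lun0 rather than on a single raw session.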