Re: [Iscsitarget-devel] [BUG] Raid1/5 over iSCSI trouble

2007-10-19 Thread Ming Zhang
On Fri, 2007-10-19 at 23:04 +0200, BERTRAND Joël wrote:
 BERTRAND Joël wrote:
  BERTRAND Joël wrote:
  Bill Davidsen wrote:
  Dan Williams wrote:
  On Fri, 2007-10-19 at 01:04 -0700, BERTRAND Joël wrote:
   
  I ran some dd's for 12 hours (read and write in nullio) between
  initiator and target without any disconnection, so the iSCSI code seems
  to be robust. Both initiator and target are alone on a single gigabit
  ethernet link (without any switch). I'm investigating...
  
 
  Can you reproduce on 2.6.22?
 
  Also, I do not think this is the cause of your failure, but you have
  CONFIG_DMA_ENGINE=y in your config.  Setting this to 'n' will compile
  out the unneeded checks for offload engines in async_memcpy and
  async_xor.
 
  Given that offload engines are far less tested code, I think this is 
  a very good thing to try!
 
  I'm trying without CONFIG_DMA_ENGINE=y. istd1 only uses 40% of one
  CPU when I rebuild my raid1 array. 1% of this array has now been
  resynchronized without any hang.
 
  Root gershwin:[/usr/scripts]  cat /proc/mdstat
  Personalities : [raid1] [raid6] [raid5] [raid4]
  md7 : active raid1 sdi1[2] md_d0p1[0]
 1464725632 blocks [2/1] [U_]
 [>....................]  recovery =  1.0% (15705536/1464725632) finish=1103.9min speed=21875K/sec
  
  Same result...
  
  connection2:0: iscsi: detected conn error (1011)
  
   session2: iscsi: session recovery timed out after 120 secs
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
 
   Sorry for this last mail. I have found another problem, but I don't
 know whether this bug comes from iscsi-target or from raid5 itself. The
 iSCSI target is disconnected because the istd1 and md_d0_raid5 kernel
 threads each use 100% of a CPU!
 
 Tasks: 235 total,   6 running, 227 sleeping,   0 stopped,   2 zombie
 Cpu(s):  0.1%us, 12.5%sy,  0.0%ni, 87.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
 Mem:   4139032k total,   218424k used,  3920608k free,    10136k buffers
 Swap:  7815536k total,        0k used,  7815536k free,    64808k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  5824 root      15  -5    0    0    0 R  100  0.0  10:34.25 istd1
  5599 root      15  -5    0    0    0 R  100  0.0   7:25.43 md_d0_raid5
 

I would rather use oprofile to check where the CPU cycles are going.
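
For example, something along these lines (the vmlinux path is only an
illustration, adjust it to wherever your uncompressed kernel image lives):

  opcontrol --vmlinux=/boot/vmlinux-2.6.23   # point oprofile at the running kernel image (path is an assumption)
  opcontrol --start                          # start sampling
  # ... reproduce the resync / 100% CPU condition for a minute or two ...
  opcontrol --stop                           # stop sampling
  opreport --symbols | head -30              # top kernel symbols by sample count

That should show whether the cycles are burned in the raid5 xor/copy path,
in the iSCSI target threads, or somewhere else entirely.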


   Regards,
 
   JKB
 
-- 
Ming Zhang


@#$%^ purging memory... (*!%
http://blackmagic02881.wordpress.com/
http://www.linkedin.com/in/blackmagic02881




RE: [Iscsitarget-devel] [BUG] Raid1/5 over iSCSI trouble

2007-10-19 Thread Ross S. W. Walker
BERTRAND Joël wrote:
 
 Ross S. W. Walker wrote:
  BERTRAND Joël wrote:
  BERTRAND Joël wrote:
  Bill Davidsen wrote:
  Dan Williams wrote:
  On Fri, 2007-10-19 at 01:04 -0700, BERTRAND Joël wrote:
   
  I ran some dd's for 12 hours (read and write in nullio) between
  initiator and target without any disconnection, so the iSCSI code
  seems to be robust. Both initiator and target are alone on a single
  gigabit ethernet link (without any switch). I'm investigating...
  
  Can you reproduce on 2.6.22?
 
  Also, I do not think this is the cause of your failure, 
  but you have
  CONFIG_DMA_ENGINE=y in your config.  Setting this to 'n' 
  will compile
  out the unneeded checks for offload engines in async_memcpy and
  async_xor.
  Given that offload engines are far less tested code, I 
  think this is a 
  very good thing to try!
  I'm trying without CONFIG_DMA_ENGINE=y. istd1 only uses 40% of one
  CPU when I rebuild my raid1 array. 1% of this array has now been
  resynchronized without any hang.
 
  Root gershwin:[/usr/scripts]  cat /proc/mdstat
  Personalities : [raid1] [raid6] [raid5] [raid4]
  md7 : active raid1 sdi1[2] md_d0p1[0]
 1464725632 blocks [2/1] [U_]
 [>....................]  recovery =  1.0% (15705536/1464725632) finish=1103.9min speed=21875K/sec
 Same result...
 
  connection2:0: iscsi: detected conn error (1011)
   
session2: iscsi: session recovery timed out 
 after 120 secs
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  
  I am unsure why you would want to set up an iSCSI RAID1, but before
  doing so I would try to verify that each independent iSCSI session
  is bulletproof.
 
   I use one and only one iSCSI session. The RAID1 array is built
 between a local volume and an iSCSI volume.

Oh, in that case you will be much better served with DRBD, which
would provide you with what you want without creating a Frankenstein
setup...
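
Roughly, the DRBD side would be a minimal resource definition like the
sketch below (the hostnames, backing devices and addresses are made up,
not taken from your setup):

  resource r0 {
      protocol C;                     # synchronous replication, behaves like a local mirror
      on gershwin {                   # local node
          device    /dev/drbd0;
          disk      /dev/md_d0p1;     # example local backing volume
          address   192.168.0.1:7788; # made-up replication address
          meta-disk internal;
      }
      on remote-target {              # hypothetical remote node
          device    /dev/drbd0;
          disk      /dev/sdb1;        # made-up remote backing device
          address   192.168.0.2:7788;
          meta-disk internal;
      }
  }

You would then put your filesystem on /dev/drbd0 and let DRBD handle the
replication, instead of layering md over an iSCSI session.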

-Ross




Re: [Iscsitarget-devel] [BUG] Raid1/5 over iSCSI trouble

2007-10-19 Thread Scott Kaelin
[snip]
 
  I am unsure why you would want to set up an iSCSI RAID1, but before
  doing so I would try to verify that each independent iSCSI session
  is bulletproof.

 I use one and only one iSCSI session. The RAID1 array is built between a
 local volume and an iSCSI volume.

So you only get this problem with the RAID, and it doesn't happen when
doing I/O over the iSCSI session alone?

Wouldn't it be better to do the RAID1 on the target machine? Then you
don't need to mess around with the weird timing behavior of remote vs.
local writes.
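
For instance (the device names and IQN below are made up, just to sketch
the idea), the mirror could be assembled on the target and the finished
md device exported as a single LUN:

  # on the target machine: build the RAID1 from two local disks
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1

  # /etc/ietd.conf: export the mirror over iSCSI as one LUN
  Target iqn.2007-10.net.example:storage.md0
      Lun 0 Path=/dev/md0,Type=blockio

That way the initiator only ever sees one already-redundant block device.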

If you want to have the disks on two different machines and have them
mirrored, DRBD is the way to go.

@Ross: He is trying to mirror his local drive with an iSCSI LUN.


 JKB






-- 
Scott Kaelin
Sitrof Technologies
[EMAIL PROTECTED]


RE: [Iscsitarget-devel] [BUG] Raid1/5 over iSCSI trouble

2007-10-19 Thread Ross S. W. Walker
BERTRAND Joël wrote:
 
 BERTRAND Joël wrote:
  BERTRAND Joël wrote:
  Bill Davidsen wrote:
  Dan Williams wrote:
  On Fri, 2007-10-19 at 01:04 -0700, BERTRAND Joël wrote:
   
  I ran some dd's for 12 hours (read and write in nullio) between
  initiator and target without any disconnection, so the iSCSI code
  seems to be robust. Both initiator and target are alone on a single
  gigabit ethernet link (without any switch). I'm investigating...
  
 
  Can you reproduce on 2.6.22?
 
  Also, I do not think this is the cause of your failure, 
 but you have
  CONFIG_DMA_ENGINE=y in your config.  Setting this to 'n' 
 will compile
  out the unneeded checks for offload engines in async_memcpy and
  async_xor.
 
  Given that offload engines are far less tested code, I 
 think this is 
  a very good thing to try!
 
  I'm trying without CONFIG_DMA_ENGINE=y. istd1 only uses 40% of one
  CPU when I rebuild my raid1 array. 1% of this array has now been
  resynchronized without any hang.
 
  Root gershwin:[/usr/scripts]  cat /proc/mdstat
  Personalities : [raid1] [raid6] [raid5] [raid4]
  md7 : active raid1 sdi1[2] md_d0p1[0]
 1464725632 blocks [2/1] [U_]
 [>....................]  recovery =  1.0% (15705536/1464725632) finish=1103.9min speed=21875K/sec
  
  Same result...
  
  connection2:0: iscsi: detected conn error (1011)
  
   session2: iscsi: session recovery timed out after 120 secs
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
 
   Sorry for this last mail. I have found another problem, but I don't
 know whether this bug comes from iscsi-target or from raid5 itself. The
 iSCSI target is disconnected because the istd1 and md_d0_raid5 kernel
 threads each use 100% of a CPU!
 
 Tasks: 235 total,   6 running, 227 sleeping,   0 stopped,   2 zombie
 Cpu(s):  0.1%us, 12.5%sy,  0.0%ni, 87.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
 Mem:   4139032k total,   218424k used,  3920608k free,    10136k buffers
 Swap:  7815536k total,        0k used,  7815536k free,    64808k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  5824 root      15  -5    0    0    0 R  100  0.0  10:34.25 istd1
  5599 root      15  -5    0    0    0 R  100  0.0   7:25.43 md_d0_raid5
 
   Regards,
 
   JKB

If you have two iSCSI sessions mirrored, then any failure along either
path will hose the setup. Plus, having iSCSI and MD RAID fight over the
same resources in the kernel is a recipe for a race condition.

How about exploring MPIO and DRBD?
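
For the MPIO side, dm-multipath can group two sessions to the same LUN
under one device and fail over between them. A rough sketch (the WWID and
alias below are made up):

  # /etc/multipath.conf (minimal example)
  defaults {
      user_friendly_names yes
  }
  multipaths {
      multipath {
          wwid  360000000000000000e00000000010001   # made-up WWID of the iSCSI LUN
          alias iscsi-mirror
      }
  }

  # then check the grouped paths:
  multipath -ll

That gives you path redundancy without stacking md on top of a single
session, and DRBD covers the actual mirroring between the two machines.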

-Ross

