Re: Possible bug in open-iSCSI

2009-03-14 Thread Donald Williams
Another test might be to take the filesystem out of the equation. Use 'dd'
or 'dt' to write out past the 2GB mark and see what error results.
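
For anyone who would rather script it than use dd, here is a rough sketch of the same test in Python. The function name, the fill pattern and the default offset are my own choices for illustration, not anything from dd or dt. Note that it overwrites data on whatever path you give it, so only point it at a scratch LUN.

```python
import errno
import os

TWO_GIB = 2 * 1024**3  # real size of the LUN in the scenario being discussed


def write_past_mark(path, offset=TWO_GIB, length=4096):
    """Write `length` bytes at `offset`; return (bytes_written, errno_name).

    errno_name is None on success, otherwise a symbolic name such as 'EIO'.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        written = os.pwrite(fd, b"\xa5" * length, offset)
        # Flush to the device so that a write error shows up here and not
        # later, when the kernel gets around to writing back dirty pages.
        os.fsync(fd)
        return written, None
    except OSError as exc:
        return 0, errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        os.close(fd)
```

Calling write_past_mark("/dev/sdX") on the raw device (the device name is a placeholder) bypasses the filesystem entirely, so whatever comes back is what the block and SCSI layers made of the target's response.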
Don

On Wed, Mar 11, 2009 at 4:42 AM, sushrut shirole
shirole.sush...@gmail.com wrote:

 Thanks a lot. I'll let you know about this.

 2009/3/10 Konrad Rzeszutek kon...@virtualiron.com


 On Tue, Mar 10, 2009 at 12:34:55PM +0530, sushrut shirole wrote:
 
  Hi All,

 Hey Sushrut,

 I am also cross-posting my response to the linux-scsi mailing list
 in case they have insight into this problem.

  I am currently guiding a few students who are working on the unh-iSCSI
  target. We are simulating faults on the target side: we are adding an
  error injection module to unh-iSCSI so that one can test how an
  initiator behaves on a particular error.
  As part of this we inject a fault into the reported LUN size, so that
  the target reports a wrong size (for example, a 2 GB LUN is reported
  as 4 GB). We are using the Microsoft and open-iSCSI initiators. When
  we format this LUN from the open-iSCSI initiator, the format succeeds.
  In fact we get no error when we try to read or lseek 4 GB of data.
  With the Microsoft initiator, however, we get an error when we try to
  format this LUN. So is this a bug in open-iSCSI, or a bug in
  read/lseek?

 Open-iSCSI does not inspect any SCSI commands (except the AEN, which
 gets its own special iSCSI PDU header).

 What you are looking at is the SCSI middle layer, the block-device
 layer, or the target not reporting an error, as being potentially
 faulty. What the Linux kernel does when you lseek to a location past
 2GB and do a read is to translate the request into a SCSI READ command.
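
To make that translation concrete, here is a small sketch of what such a command could look like on the wire. It assumes a 512-byte logical block size and the 10-byte READ(10) form of the command; the kernel may pick a different READ variant depending on the device, and the function name is made up for illustration.

```python
import struct

SECTOR_SIZE = 512  # assumed logical block size of the LUN
READ_10 = 0x28     # SCSI READ(10) operation code


def build_read10_cdb(byte_offset, byte_count, sector_size=SECTOR_SIZE):
    """Return the 10-byte READ(10) CDB covering a sector-aligned byte range."""
    if byte_offset % sector_size or byte_count % sector_size:
        raise ValueError("offset and count must be sector aligned")
    lba = byte_offset // sector_size
    blocks = byte_count // sector_size
    if lba > 0xFFFFFFFF or blocks > 0xFFFF:
        raise ValueError("range does not fit in a READ(10) CDB")
    # Layout: opcode, flags, 4-byte big-endian LBA, group number,
    # 2-byte big-endian transfer length (in blocks), control byte.
    return struct.pack(">BBIBHB", READ_10, 0, lba, 0, blocks, 0)
```

For a 4096-byte read starting exactly at the 2 GiB mark, build_read10_cdb(2 * 1024**3, 4096) asks for 8 blocks starting at LBA 4194304 (0x400000), which is the first sector that does not exist on a real 2 GB LUN.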

 That SCSI READ command (you can see what the fields look like when you
 capture it under ethereal) specifies what sector it wants. Open-iSCSI
 wraps that SCSI command in its own header and puts it in a TCP packet
 destined for the target. The target should then report a failure
 (sending a SCSI SENSE value reporting a problem). Now it might be that
 the SCSI middle layer doesn't understand that error condition and
 passes it on as OK. Or it might be that the target doesn't report a
 failure and returns garbage/null data.
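
When going through a capture it helps to know what a failure would look like. The sketch below decodes fixed-format sense data and checks for one plausible response to an out-of-range read: sense key ILLEGAL REQUEST (0x5) with additional sense code 0x21/0x00 (LOGICAL BLOCK ADDRESS OUT OF RANGE). That the target under test would send exactly this is an assumption on my part; it may use descriptor-format sense data or a different code, and the function names are made up for illustration.

```python
ILLEGAL_REQUEST = 0x05
ASC_LBA_OUT_OF_RANGE = 0x21


def parse_fixed_sense(sense):
    """Decode the sense key, ASC and ASCQ from fixed-format sense data."""
    if len(sense) < 14:
        raise ValueError("sense buffer too short for ASC/ASCQ")
    response_code = sense[0] & 0x7F
    if response_code not in (0x70, 0x71):  # current / deferred, fixed format
        raise ValueError("not fixed-format sense data")
    return {
        "sense_key": sense[2] & 0x0F,  # low nibble of byte 2
        "asc": sense[12],              # additional sense code
        "ascq": sense[13],             # additional sense code qualifier
    }


def is_lba_out_of_range(sense):
    """True if the sense data reports LOGICAL BLOCK ADDRESS OUT OF RANGE."""
    fields = parse_fixed_sense(sense)
    return (fields["sense_key"] == ILLEGAL_REQUEST
            and fields["asc"] == ASC_LBA_OUT_OF_RANGE
            and fields["ascq"] == 0x00)
```

If the capture shows sense data like this coming back and the read still succeeds on the initiator, the problem is on the Linux side. If no such response shows up at all, the target is the one not reporting the error.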

 What I would suggest is to do a comparison. Create a test setup where you
 have a real 4GB LUN, do an lseek/read above 2GB and capture all of that
 traffic using wireshark/ethereal. Then do the same test but with a 2GB LUN
 that looks like a 4GB one and see what the traffic looks like.
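
The lseek/read part of that test can be as small as the sketch below, run once against each LUN while the capture is going. The function name and the default read length are arbitrary choices for illustration.

```python
import os


def read_at(path, offset, length=512):
    """Seek to `offset` and read up to `length` bytes.

    Returns the bytes read. An empty result means the kernel considers the
    offset to be at or past the end of the device; an OSError (for example
    EIO) means the error made it all the way up to user space.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        return os.read(fd, length)
    finally:
        os.close(fd)
```

Something like read_at("/dev/sdX", 2 * 1024**3 + 512) (the device name is a placeholder) lands just past the real end of the 2GB LUN, so the two captures should differ from that request on if the target reports the error.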

 If it looks the same then somehow the target isn't reporting the right
 error. That would imply that when Microsoft formats a disk it verifies
 it, by rereading the data it wrote and failing if the data doesn't
 match. That might not be what mkfs.ext3 does under Linux - look in the
 man page to find out. But if you use lseek/read (or just do a dd with
 the skip argument - look in the man page for more details) a couple of
 times on the same sector, you should see different data as well.
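
A sketch of that repeated-read check follows. It hashes the same sector a few times and reports whether the contents stayed the same. One caveat: the page cache can serve the repeats without going back to the target, so the sketch asks the kernel to drop cached pages first where os.posix_fadvise is available. That call is only advisory, and using it here is my own addition, as are the function names.

```python
import hashlib
import os


def sector_digests(path, offset, length=512, repeats=3):
    """Read the same byte range `repeats` times; return one SHA-256 per read."""
    digests = []
    for _ in range(repeats):
        fd = os.open(path, os.O_RDONLY)
        try:
            if hasattr(os, "posix_fadvise"):
                # Advisory only: ask the kernel to forget cached pages so
                # the next read is more likely to go out to the device.
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
            data = os.pread(fd, length, offset)
        finally:
            os.close(fd)
        digests.append(hashlib.sha256(data).hexdigest())
    return digests


def reads_are_stable(path, offset, length=512, repeats=3):
    """True if every read of the range returned identical data."""
    return len(set(sector_digests(path, offset, length, repeats))) == 1
```

On a healthy LUN reads_are_stable() should come back True. A False result for an offset past the real end of the LUN would suggest the target is handing back garbage instead of an error.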

 If the TCP dump looks different, and the target reports an error and
 the Linux kernel doesn't do anything, then it is time to dig through
 the code (scsi_error.c) to find out why Linux doesn't see it as an
 error. Make sure you use the latest kernel though - which as of today
 is 2.6.29-rc7-git3. And if you do find the problem, post a patch on
 the linux-scsi mailing list.

 
  --
  Thanks,

 Hope this lengthy explanation helps in your endeavor.




 


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
open-iscsi group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Possible bug in open-iSCSI

2009-03-11 Thread sushrut shirole
Thanks a lot. I'll let you know about this.
