Re: DataDigest CRC mismatches

2009-07-15 Thread Mark van Walraven

On Wed, Jul 15, 2009 at 07:14:40PM -0700, mala...@us.ibm.com wrote:
> Seems to be a well known problem with iSCSI data digests and mirrored devices.
> 
> See this (iscsi issue):
> http://thread.gmane.org/gmane.linux.iscsi.open-iscsi/2670/focus=48961
> or  (dm-raid1 issue)  
> 
> http://thread.gmane.org/gmane.linux.kernel.device-mapper.devel/5392

That clearly looks like it.

Ouch.  Looks very difficult to fix without copying the page data before
transmission/checksumming. :-(  I'll do more reading.
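
To make the copy-before-checksum idea concrete, here is a minimal user-space sketch. It is not kernel or open-iscsi code: `send_with_copy` and `dirty` are hypothetical names, and `zlib.crc32` is only a stand-in for the CRC32C that iSCSI digests use. It assumes the page owner may modify the page at any time after submission.

```python
import zlib

def send_with_copy(page, modify):
    """Hypothetical send path: snapshot the page, then digest and send the snapshot."""
    snapshot = bytes(page)           # private copy; this is the extra cost
    digest = zlib.crc32(snapshot)    # digest covers the copy, not the live page
    modify(page)                     # later writes only touch the original page
    wire = snapshot                  # bytes that actually go out
    return wire, digest

def dirty(page):
    """Hypothetical page owner that flips one bit while the write is in flight."""
    page[0] ^= 0x01

page = bytearray(b"ext3 journal block " * 8)
wire, digest = send_with_copy(page, dirty)

digest_ok = zlib.crc32(wire) == digest   # what the target would check
page_moved_on = bytes(page) != wire      # the live page no longer matches the wire
print("digest ok:", digest_ok, "| page changed since snapshot:", page_moved_on)
```

The trade-off is one memory copy per page, and the target receives the snapshot rather than the newest contents, so the owner of the page still has to write it out again later.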

Thanks again,

Mark.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: DataDigest CRC mismatches

2009-07-15 Thread malahal

Mark van Walraven [ma...@netvalue.net.nz] wrote:
> 
> Do you mean a well known problem with zero-copy block devices or a well
> known problem with iscsi with data digests?

Seems to be a well known problem with iSCSI data digests and mirrored devices.

See this (iscsi issue):
http://thread.gmane.org/gmane.linux.iscsi.open-iscsi/2670/focus=48961

or  (dm-raid1 issue)  

http://thread.gmane.org/gmane.linux.kernel.device-mapper.devel/5392

Thanks, Malahal.




Re: DataDigest CRC mismatches

2009-07-15 Thread Mark van Walraven

Hi Malahal,

Thanks for your response.

On Wed, Jul 15, 2009 at 09:41:58AM -0700, mala...@us.ibm.com wrote:
> > I've been playing with kvm (virtio_blk, writeback) -> dm_multipath 
> > (failover, queue_if_no_path) -> open-iscsi -> gigE -> IET on a new 
> > server, winding up the queue and segment lengths and I'm getting 
> > frequent disconnects during heavy writes from the KVM guest.  Wireshark 
> > shows a PDU with an incorrect DataDigest (sample at 
> > http://www.interspeed.co.nz/crcerr.pcap for a little while) just before 
> > IET resets the connection (reasonably, if it gets the same CRC mismatch).
> 
> What kind of application are you using to generate the write I/O? It is

The application is KVM (qemu-kvm-0.10.5), running a single Debian Lenny
instance with the iscsi device visible to the guest as a virtio disk.
I've found running this on the guest is a pretty reliable way to produce
the problem:

find / > /dev/null ; sync

The guest filesystems are ext3, so presumably journal flushes are the
trigger ...

> possible that a file system (or some other application) can modify the
> write buffer once it is submitted to the block layer. Any modification
> done after generating the CRC is going to cause a CRC mismatch. This is a
> well known problem!

Do you mean a well known problem with zero-copy block devices or a well
known problem with iscsi with data digests?

I've been trawling through the code and if I understand correctly,
iscsi_sw_tcp_xmit_segment() uses iscsi_tcp_segment_done() to calculate
the digest after each sendpage or sendmsg.  Do you think the segment data
might be getting modified in between sendpage/sendmsg and packet assembly?

(sendpage looks to be sock_no_sendpage if data digests are enabled.)

If access to the segment data isn't exclusive during the execution of
iscsi_sw_tcp_xmit_segment(), then I suppose there is also the chance the
data might be altered between sendpage/sendmsg and crypto_hash_final()
completing.
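
A rough model of the window described above, written as a user-space sketch rather than the actual kernel path: `xmit`, `after_send` and `scribble` are hypothetical names, and `zlib.crc32` stands in for CRC32C. It assumes, as in my reading of the code, that each segment is sent first and folded into the running digest afterwards.

```python
import zlib

def xmit(segments, after_send=None):
    """Send each segment, then fold it into the running digest.

    after_send is a hypothetical hook that runs between the two steps,
    standing in for another context writing to the same pages.
    """
    wire = bytearray()
    digest = 0
    for index, segment in enumerate(segments):
        wire += segment                        # "send": these bytes leave now
        if after_send is not None:
            after_send(index, segment)         # window in which the data can change
        digest = zlib.crc32(segment, digest)   # digest sees the segment as it is now
    return bytes(wire), digest

def scribble(index, segment):
    """Flip one bit in the second segment after it has been sent."""
    if index == 1:
        segment[0] ^= 0x01

# No concurrent writer: the digest matches what went out.
quiet = [bytearray(b"A" * 16), bytearray(b"B" * 16)]
wire, digest = xmit(quiet)
quiet_ok = zlib.crc32(wire) == digest

# Writer hits the window: the wire carries old bytes, the digest covers new ones.
racy = [bytearray(b"A" * 16), bytearray(b"B" * 16)]
wire, digest = xmit(racy, after_send=scribble)
racy_ok = zlib.crc32(wire) == digest

print("quiet:", quiet_ok, "| racy:", racy_ok)
```

If the real ordering is digest first and send second, the same mismatch appears with the roles swapped, so exclusive access to the segment data is needed across both steps.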

(FWIW, the same thing happens with 871, built from source.)

Thanks,

Mark.




Re: DataDigest CRC mismatches

2009-07-15 Thread malahal

Mark van Walraven [ma...@netvalue.net.nz] wrote:
> 
> Hi All,
> 
> I've been playing with kvm (virtio_blk, writeback) -> dm_multipath 
> (failover, queue_if_no_path) -> open-iscsi -> gigE -> IET on a new 
> server, winding up the queue and segment lengths and I'm getting 
> frequent disconnects during heavy writes from the KVM guest.  Wireshark 
> shows a PDU with an incorrect DataDigest (sample at 
> http://www.interspeed.co.nz/crcerr.pcap for a little while) just before 
> IET resets the connection (reasonably, if it gets the same CRC mismatch).

What kind of application are you using to generate the write I/O? It is
possible that a file system (or some other application) can modify the
write buffer once it is submitted to the block layer. Any modification
done after generating the CRC is going to cause a CRC mismatch. This is a
well known problem!
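
As a small illustration of the point above, the sketch below computes a CRC32C (the checksum iSCSI uses for its digests) over a buffer, changes the buffer afterwards, and then checks it the way a target would. The bitwise `crc32c` helper and the buffer contents are illustrative assumptions, not code from open-iscsi or IET.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF

# Hypothetical write buffer handed to the block layer.
buf = bytearray(b"journal block contents " * 4)

# The initiator computes the digest over the buffer as it looks now.
digest = crc32c(bytes(buf))

# The buffer's owner (for example a file system) modifies it while
# the write is still in flight.
buf[0] ^= 0xFF

# The target recomputes the CRC over the bytes that actually arrived.
digest_matches = crc32c(bytes(buf)) == digest
print("digest matches:", digest_matches)
```

Nothing on the wire was corrupted here; the data and the digest simply describe two different moments in the life of the same buffer.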

-Malahal.
