On 01/29/2016 09:07 PM, Mike Christie wrote:
> On 01/27/2016 05:17 PM, Christian Seiler wrote:
>> The test setup is quite simple: LIO target running on the host with
>> fileio backend (I tried both Debian Jessie host with 3.16 kernel and
>> Ubuntu 15.04 host with 3.19 kernel, no difference), initiator
>> running inside a libvirt/KVM instance (Debian sid); nothing special
>> about the setup otherwise. (The target itself shows no problems.)
> 
> So there are no other log messages on the LIO box? Something about max
> sectors being violated or a memory allocation failure?

I didn't notice that earlier, but I get:

kernel: fd_do_rw() write returned -22

(-22 is -EINVAL)
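
For anyone who wants to double-check the errno mapping, here is a minimal sketch in Python. The helper name describe_errno is made up for illustration; it only looks up the number in the standard errno table and is not part of LIO or open-iscsi.

```python
import errno
import os


def describe_errno(ret):
    """Map a negative kernel-style return code to (symbolic name, message).

    describe_errno is a hypothetical helper for illustration only.
    """
    code = -ret
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# The value logged by fd_do_rw() on the target
name, message = describe_errno(-22)
print(name, "-", message)
```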

Weird. I could have sworn that when I first encountered the problem
I didn't see anything in the target's logs... Obviously I'm mistaken
and I'm very sorry that I didn't notice that earlier.

> What kernel version is LIO running on in these tests?

LIO targets I've tried:
 - 3.16.7-ckt20 (Debian Jessie)
 - 3.19.0-43-generic (Ubuntu 15.04)

I'll try a newer version tomorrow and see if the problem
persists.

> I think you are hitting a bug where the block layer is now sending
> really large IOs that LIO cannot handle or does not want to.
> 
> There was a change in LIO to better tell the initiator what size to
> use, so get the LIO kernel version so we can check that.
> 
> You can still hit memory allocation failures in LIO and hit a similar
> issue, but with just a dd of 4MB IOs you should not hit the problem.

I just checked: 8388608 bytes (8 MiB) in a single dd are fine,
8388609 (1 byte more) consistently reproduces the error.
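
To make the boundary explicit, a small arithmetic sketch in Python. The 512-byte sector size is an assumption on my part (the usual logical sector size, not something verified on this setup), and describe_size is just an illustrative helper.

```python
MIB = 1024 * 1024
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def describe_size(nbytes, sector_size=SECTOR_SIZE):
    """Break a byte count into MiB, whole sectors and leftover bytes."""
    full_sectors, remainder = divmod(nbytes, sector_size)
    return {
        "bytes": nbytes,
        "mib": nbytes / MIB,
        "full_sectors": full_sectors,
        "remainder_bytes": remainder,
    }


# The last working size and the first failing size from the dd tests
for size in (8388608, 8388609):
    print(describe_size(size))
```

Under that assumption, the working size is exactly 8 MiB with no leftover bytes, and the failing size is the first one that goes past it.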

Ok, so this is then not actually an initiator problem but a LIO
target problem... Ok, thanks, then I'll investigate in that
direction (and ask the LIO people for help if I run into
problems).

But I'm curious - why haven't I seen any problems with older
kernel versions for the initiator? I've been using the same LIO
target for a long time (3.16 was released 1.5 years ago) and
I've never had any problems with it, with multiple different
kernel versions for the initiator (going back as far as 3.2).

So something must have also changed with the initiator some
time after 3.16 that it now triggers the bug...

Btw. is there any setting (sysctl or iscsid configuration
option) to adjust the IO size to work around the target
problem?
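
For context on this question: the Linux block layer exposes per-device request size limits in sysfs (queue/max_sectors_kb and queue/max_hw_sectors_kb). Below is a minimal sketch for reading them on the initiator. The device name sdb is a placeholder and read_queue_limits is a hypothetical helper; whether lowering max_sectors_kb avoids the target error is not established in this thread.

```python
from pathlib import Path


def read_queue_limits(device, sysfs_root="/sys/block"):
    """Read the block queue size limits (in KiB) for one block device.

    Returns None for an attribute that is missing. read_queue_limits is
    a hypothetical helper; only the sysfs attribute names are real.
    """
    queue_dir = Path(sysfs_root) / device / "queue"
    limits = {}
    for name in ("max_sectors_kb", "max_hw_sectors_kb"):
        attr = queue_dir / name
        limits[name] = int(attr.read_text().strip()) if attr.exists() else None
    return limits


# "sdb" is a placeholder for whatever device the iSCSI LUN shows up as
print(read_queue_limits("sdb"))
```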

Thanks a lot for your help!

Regards,
Christian
