Bug#805252: linux-image-4.2.0-1-amd64: I/O errors when writing to iSCSI volumes
Control: tags -1 + fixed-upstream patch jessie
Control: found -1 3.16.7-ckt20-1+deb8u3
Control: fixed -1 4.1.1-1~exp1

Hi,

so I've found out that this is actually a bug in LIO (the target), not the initiator.

The problem is: LIO in Linux up to 4.0 uses vfs_writev to write out blocks when using the fileio backend, so it is subject to a limit of 1024 iovecs (as per UIO_MAXIOV in include/uapi/linux/uio.h), each at most a page in size. That gives a maximum of 4 MiB of data that can be processed per request (at 4 kiB page size).

Older versions of the Linux software initiator had a hard limit of 512 kiB per request, which means that at most 128 iovec entries were used, which fits comfortably within that limit. Newer versions of the Linux iSCSI initiator don't have this hard-coded limit but instead use the value supplied by the target. (This is correct behavior by the initiator, so there's no bug there, contrary to what I initially assumed.)

The problem now is that LIO with the fileio backend advertises that 8 MiB may be transferred at the same time, because (according to comments in drivers/target/target_core_file.h) it assumes, for some reason unbeknownst to me, that the maximum number of iovecs is 2048. Note that this also means that any non-Linux initiator that properly follows the target-supplied value for the maximum I/O size will run into the same problem and see I/O errors, even though it behaves correctly.

This problem no longer affects upstream, because LIO has been rewritten to use vfs_iter_write, which doesn't have such limitations, but that was only introduced after 3.16. Backporting it would be too invasive, and probably ABI-incompatible. Fortunately, there's a much easier way to fix this: just lower the limit in drivers/target/target_core_file.h to 4 MiB. I've tested that: the limit is properly advertised by LIO, and newer initiators won't choke on it, so that fixes the bug. See the attached patch.
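To make the arithmetic concrete, here is a quick stand-alone sketch (plain Python, not kernel code; the constants simply mirror UIO_MAXIOV and the assumed 4 kiB page size described above):

```python
# Illustration of the limits discussed above; the values mirror the
# kernel constants, they are not read from kernel headers.
UIO_MAXIOV = 1024       # max iovecs per vfs_writev/vfs_readv call
PAGE_SIZE = 4096        # 4 kiB pages assumed

# Each iovec carries at most one page, so the largest request the
# fileio backend can actually service is:
max_bytes = UIO_MAXIOV * PAGE_SIZE
print(max_bytes, max_bytes // 2**20)    # 4194304 bytes = 4 MiB

# LIO advertised 2048 iovecs worth of data, i.e. twice what
# vfs_writev accepts, which is what conforming initiators then sent:
advertised = 2048 * PAGE_SIZE
print(advertised // 2**20)              # 8 MiB
```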
CAVEAT: there is a slight problem with this change, and I don't know what the best solution is here: on people's existing setups, the optimal_sectors setting for fileio disks is likely to be 16384, because that corresponds to 8 MiB (the previous maximum, which is the default for optimal_sectors if no other value is set) - but with the fix the kernel will refuse that setting, because it's now larger than the maximum allowed value.

If you use targetcli 3 (not part of Jessie, but you can install e.g. the version from sid), it will fail to set up the target properly, because it aborts as soon as it notices that it can't apply the setting. (Leftover targetcli 2 from Wheezy upgrades should not be affected as badly, as far as I can tell, because the startup scripts seem to ignore errors. But I haven't tested that.)

So the situation is: without this fix, 3.16 kernels produce I/O errors when used with initiators that respect the kernel's setting, but with the fix the target configuration needs to be updated. (Of course, one could also patch the kernel to ignore the specific value of 16384 for optimal_sectors when fileio is used as a backend and print a warning.) I don't know what you'd prefer here.

Also note that this likely affects the kernel in Wheezy as well, although I haven't done any tests in that direction.

Regards,
Christian

PS: for reference, the upstream discussion on the initiator mailing list that led me to find out that the target, not the initiator, was the problem: https://groups.google.com/forum/#!topic/open-iscsi/UE2JJfDmQ7w

From: Christian Seiler
Date: Sat, 30 Jan 2016 13:48:54 CET
Subject: LIO: assume a maximum of 1024 iovecs

Previously the code assumed that vfs_[read|write]v supported 2048 iovecs, which is incorrect, as UIO_MAXIOV is 1024 instead.
This caused the target to advertise a maximum I/O size that was too large, which in turn caused conforming initiators (most notably recent Linux kernels, which started to respect the maximum I/O size of the target instead of the hard-coded 512 kiB limit of previous kernel versions) to send write requests that were too large, resulting in LIO rejecting them (kernel: fd_do_rw() write returned -22) and ultimately in data loss.

This patch adjusts the limit to 1024 iovecs, and also uses the PAGE_SIZE macro instead of just assuming 4 kiB pages.

Signed-off-by: Christian Seiler
---
 drivers/target/target_core_file.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/drivers/target/target_core_file.h
+++ b/drivers/target/target_core_file.h
@@ -1,6 +1,8 @@
 #ifndef TARGET_CORE_FILE_H
 #define TARGET_CORE_FILE_H

+#include
+
 #define FD_VERSION "4.0"

 #define FD_MAX_DEV_NAME 256
@@ -9,9 +11,9 @@
 #define FD_MAX_DEVICE_QUEUE_DEPTH 128
 #define FD_BLOCKSIZE 512
 /*
- * Limited by the number of iovecs (2048) per vfs_[writev,readv] call
+ * Limited by the number of iovecs (1024) per vfs_[writev,readv] call
  */
-#define FD_MAX_BYTES 8388608
+#define FD_MAX_BYTES (1024*PAGE_SIZE)
 #define
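As an aside on the optimal_sectors caveat mentioned earlier, the sector arithmetic works out as follows (a quick sketch; sectors here are 512-byte units, matching FD_BLOCKSIZE):

```python
SECTOR_SIZE = 512             # FD_BLOCKSIZE: LIO fileio uses 512-byte sectors

# Existing setups likely carry optimal_sectors = 16384, which
# corresponds to the old 8 MiB maximum:
old_optimal_sectors = 16384
print(old_optimal_sectors * SECTOR_SIZE // 2**20)    # 8 (MiB)

# With FD_MAX_BYTES lowered to 4 MiB (1024 iovecs * 4 kiB pages),
# the largest optimal_sectors value the kernel still accepts is:
new_max_bytes = 1024 * 4096
print(new_max_bytes // SECTOR_SIZE)                  # 8192
```

This is why an existing configuration carrying optimal_sectors = 16384 is rejected once the patched kernel caps the maximum at 8192 sectors.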
Processed: Re: Bug#805252: linux-image-4.2.0-1-amd64: I/O errors when writing to iSCSI volumes
Processing control commands:

> tags -1 + fixed-upstream patch jessie
Bug #805252 [src:linux] linux-image-4.2.0-1-amd64: I/O errors when writing to iSCSI volumes
Added tag(s) jessie, fixed-upstream, and patch.

> found -1 3.16.7-ckt20-1+deb8u3
Bug #805252 [src:linux] linux-image-4.2.0-1-amd64: I/O errors when writing to iSCSI volumes
Marked as found in versions linux/3.16.7-ckt20-1+deb8u3.

> fixed -1 4.1.1-1~exp1
Bug #805252 [src:linux] linux-image-4.2.0-1-amd64: I/O errors when writing to iSCSI volumes
Marked as fixed in versions linux/4.1.1-1~exp1.

--
805252: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=805252
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Bug#805252: linux-image-4.2.0-1-amd64: I/O errors when writing to iSCSI volumes
Control: tags -1 - fixed-upstream patch
Control: notfixed -1 4.4~rc5-1~exp1

Hi,

I'm sorry, but the commit in question doesn't help. I just got around to testing this with 4.4-1~exp1, and I verified that the sources do indeed contain the aforementioned patch, yet the problem persists. I also tried this with the most recent upstream kernel git tree [1], and the problem persists there as well. Using dd if=/dev/zero of=test.dat, it will reproducibly cause lots of I/O errors *before* the disk runs full.

As already said in the original report, the most recent 3.16 kernel that comes with Jessie does not show this problem, even when used with exactly the same userland (up-to-date sid); there the dd command will just create a large file until the disk is full (as expected).

I'll ask open-iscsi upstream for some help with this, but wanted to make sure this is properly tracked in Debian's bug tracker.

Regards,
Christian

[1] Latest commit at time of testing: 03c21cb775a313f1ff19be59c5d02df3e3526471. Built the custom kernel via make-kpkg, using 4.4-1~exp1's config as a basis and then running make oldconfig.
Bug#805252: linux-image-4.2.0-1-amd64: I/O errors when writing to iSCSI volumes
Package: src:linux
Version: 4.2.6-1
Severity: important
Tags: upstream

Dear Maintainer,

with this kernel version, writing to iSCSI volumes consistently produces I/O errors. It appears as though this happens when a lot of writes are performed at once (but I'm not sure of that). I've seen this with the version in unstable (i.e. the version I'm reporting this against), but also with 4.3-1~exp1, and also with vanilla upstream git from just now.

Steps to reproduce:
 - have an iSCSI target ready (hardware or e.g. targetcli)
 - log in to the iSCSI target in a VM or on a separate computer
 - dd if=/dev/zero of=/dev/iscsidevice bs=4M
   -> will consistently produce I/O errors after a short amount of time

Reading from the same device is not a problem; I can dd the whole device to /dev/null without any errors.

For example, the latest testing netinst installer is not able to install Debian on an iSCSI rootfs because of this: when using ext4 as the root filesystem, mkfs.ext4 consistently fails due to I/O errors; when using btrfs, the filesystem is created successfully, but the package installation fails with I/O errors, presumably due to a different block access strategy.

Logs I gathered with the latest upstream git kernel (when doing dd if=/dev/zero of=/mnt/foo bs=4M until disk full on a mounted iSCSI drive):

---
[ 21.473911] sd 2:0:0:0: [sda] tag#3 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 21.473914] sd 2:0:0:0: [sda] tag#3 Sense Key : Not Ready [current]
[ 21.473916] sd 2:0:0:0: [sda] tag#3 Add. Sense: Logical unit communication failure
[ 21.473917] sd 2:0:0:0: [sda] tag#3 CDB: Write(10) 2a 00 00 38 15 d8 00 20 28 00
[ 21.473918] blk_update_request: I/O error, dev sda, sector 3675608
[ 21.473939] EXT4-fs warning (device sda1): ext4_end_bio:329: I/O error -5 writing to inode 46 (offset 50331648 size 4214784 starting block 459707)
[ 21.473941] Buffer I/O error on device sda1, logical block 457403
[ 21.473954] Buffer I/O error on device sda1, logical block 457404
[ 21.473966] Buffer I/O error on device sda1, logical block 457405
[ 21.473978] Buffer I/O error on device sda1, logical block 457406
[ 21.473990] Buffer I/O error on device sda1, logical block 457407
[ 21.474003] Buffer I/O error on device sda1, logical block 457408
[ 21.474015] Buffer I/O error on device sda1, logical block 457409
[ 21.474027] Buffer I/O error on device sda1, logical block 457410
[ 21.474039] Buffer I/O error on device sda1, logical block 457411
[ 21.474051] Buffer I/O error on device sda1, logical block 457412
[ 21.474096] EXT4-fs warning (device sda1): ext4_end_bio:329: I/O error -5 writing to inode 46 (offset 50331648 size 4214784 starting block 459963)
[ 21.474128] EXT4-fs warning (device sda1): ext4_end_bio:329: I/O error -5 writing to inode 46 (offset 50331648 size 4214784 starting block 460219)
[ 21.474162] EXT4-fs warning (device sda1): ext4_end_bio:329: I/O error -5 writing to inode 46 (offset 50331648 size 4214784 starting block 460475)
[ 21.474191] EXT4-fs warning (device sda1): ext4_end_bio:329: I/O error -5 writing to inode 46 (offset 50331648 size 4214784 starting block 460480)
[ 21.478983] sd 2:0:0:0: [sda] tag#12 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 21.478986] sd 2:0:0:0: [sda] tag#12 Sense Key : Not Ready [current]
[ 21.478987] sd 2:0:0:0: [sda] tag#12 Add. Sense: Logical unit communication failure
[ 21.478989] sd 2:0:0:0: [sda] tag#12 CDB: Write(10) 2a 00 00 3c 1f 20 00 20 e0 00
[ 21.478990] blk_update_request: I/O error, dev sda, sector 3940128
[ 21.479010] EXT4-fs warning (device sda1): ext4_end_bio:329: I/O error -5 writing to inode 46 (offset 16777216 size 4308992 starting block 492772)
[ 21.479049] EXT4-fs warning (device sda1): ext4_end_bio:329: I/O error -5 writing to inode 46 (offset 16777216 size 4308992 starting block 493028)
[ 21.479079] EXT4-fs warning (device sda1): ext4_end_bio:329: I/O error -5 writing to inode 46 (offset 16777216 size 4308992 starting block 493284)
[ 21.479108] EXT4-fs warning (device sda1): ext4_end_bio:329: I/O error -5 writing to inode 46 (offset 16777216 size 4308992 starting block 493540)
[ 21.479136] EXT4-fs warning (device sda1): ext4_end_bio:329: I/O error -5 writing to inode 46 (offset 16777216 size 4308992 starting block 493568)
[ 21.483612] sd 2:0:0:0: [sda] tag#15 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 21.483615] sd 2:0:0:0: [sda] tag#15 Sense Key : Not Ready [current]
[ 21.483617] sd 2:0:0:0: [sda] tag#15 Add. Sense: Logical unit communication failure
[ 21.483618] sd 2:0:0:0: [sda] tag#15 CDB: Write(10) 2a 00 00 33 14 00 00 2c 00 00
[ 21.483620] blk_update_request: I/O error, dev sda, sector 3347456
[ 21.490616] sd 2:0:0:0: [sda] tag#20 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 21.490619] sd 2:0:0:0:
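Incidentally, the size of the failing request can be read straight off the logged CDB. A quick decode of the first failing Write(10) above (per the SCSI Block Commands layout, bytes 2-5 of the CDB are the logical block address and bytes 7-8 the transfer length in 512-byte sectors):

```python
# Decode the first failing Write(10) CDB from the log above:
#   2a 00 00 38 15 d8 00 20 28 00
cdb = bytes.fromhex("2a 00 00 38 15 d8 00 20 28 00".replace(" ", ""))

lba = int.from_bytes(cdb[2:6], "big")      # logical block address
nblocks = int.from_bytes(cdb[7:9], "big")  # transfer length in sectors

print(lba)             # 3675608 -- matches "sector 3675608" in the log
print(nblocks * 512)   # 4214784 bytes -- matches "size 4214784" in the
                       # ext4 warnings, just over the 4 MiB vfs_writev limit
```

So the very first rejected write is 4214784 bytes, about 20 kiB beyond the 4 MiB (4194304-byte) maximum that vfs_writev can actually service, which is consistent with the iovec-limit analysis later in this thread.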