On Friday, March 15, 2013 6:02:05 AM UTC-5, [email protected] wrote:
>
> Hi,
>
> we are using KVMs with root backed by iSCSI LUNs mapped from Netapp.
> Occasionally, the device gets write errors and is remounted read-only:
>
> Mar 15 10:16:59 rb-vertica-hds2-devel dhclient[5053]: DHCPACK from 172.30.40.175 (xid=0x47a97e90)
> Mar 15 10:17:00 rb-vertica-hds2-devel dhclient[5053]: bound to 172.30.40.92 -- renewal in 47 seconds.
> Mar 15 10:17:03 rb-vertica-hds2-devel kernel: Buffer I/O error on device vda1, logical block 708624
> Mar 15 10:17:03 rb-vertica-hds2-devel kernel: lost page write due to I/O error on vda1
> ..
> Mar 15 10:17:32 rb-vertica-hds2-devel kernel: Buffer I/O error on device vda1, logical block 903881
> Mar 15 10:17:32 rb-vertica-hds2-devel kernel: lost page write due to I/O error on vda1
> Mar 15 10:17:32 rb-vertica-hds2-devel kernel: Buffer I/O error on device vda1, logical block 1705084
> Mar 15 10:17:32 rb-vertica-hds2-devel kernel: lost page write due to I/O error on vda1
> Mar 15 10:17:32 rb-vertica-hds2-devel kernel: JBD2: Detected IO errors while flushing file data on vda1-8
>
> When the problem happens there are NO errors in the logs on the compute
> node. I'm running 'iscsiadm -m session -P3' every 5s. It shows no change
> in session or LUN state. I also ran 'iscsid' with -d8, which shows
> nothing around that time either.
>
> How do I identify where these write errors are coming from?
> * problem on virtio-blk? Not likely.
> * iscsi client problem connecting to target
> * actual write problem on target
>
> Example KVM device definition
>
> <disk type='block' device='disk'>
>   <driver name='qemu' type='raw' cache='none'/>
>   <source dev='/dev/disk/by-path/ip-172.30.128.3:3260-iscsi-iqn.1992-08.com.netapp:node.netapp02-lun-17'/>
>   <target dev='vda' bus='virtio'/>
>   <alias name='virtio-disk0'/>
>   <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
> </disk>
>
> The iSCSI session has the default configuration
>
> iscsiadm -m node -T iqn.1992-08.com.netapp:node.netapp02
> ...
> node.session.cmds_max = 128
> node.session.queue_depth = 32
> node.session.timeo.replacement_timeout = 120
> node.session.err_timeo.abort_timeout = 15
> node.session.err_timeo.lu_reset_timeout = 30
> node.session.err_timeo.tgt_reset_timeout = 30
> node.session.err_timeo.host_reset_timeout = 60
> node.conn[0].timeo.logout_timeout = 15
> node.conn[0].timeo.login_timeout = 15
> node.conn[0].timeo.auth_timeout = 45
> node.conn[0].timeo.noop_out_interval = 5
> node.conn[0].timeo.noop_out_timeout = 5
>
> iscsiadm -m session -P3
> ...
> Recovery Timeout: 120
> Target Reset Timeout: 30
> LUN Reset Timeout: 30
> Abort Timeout: 15
>
> This is the device which had IO errors a few hours ago:
>
> grep . /sys/block/sdk/device/*
> grep: /sys/block/sdk/device/delete: Permission denied
> /sys/block/sdk/device/device_blocked:0
> /sys/block/sdk/device/dh_state:detached
> /sys/block/sdk/device/evt_media_change:0
> /sys/block/sdk/device/iocounterbits:32
> /sys/block/sdk/device/iodone_cnt:0x29a
> /sys/block/sdk/device/ioerr_cnt:0x0           <-- error count ?
> /sys/block/sdk/device/iorequest_cnt:0x29a
> /sys/block/sdk/device/modalias:scsi:t-0x00
> /sys/block/sdk/device/model:LUN
> /sys/block/sdk/device/queue_depth:32
> /sys/block/sdk/device/queue_ramp_up_period:120000
> /sys/block/sdk/device/queue_type:none
> grep: /sys/block/sdk/device/rescan: Permission denied
> /sys/block/sdk/device/rev:7360
> /sys/block/sdk/device/scsi_level:5
> /sys/block/sdk/device/state:running
> /sys/block/sdk/device/timeout:30
> /sys/block/sdk/device/type:0
> /sys/block/sdk/device/uevent:DEVTYPE=scsi_device
> /sys/block/sdk/device/uevent:DRIVER=sd
> /sys/block/sdk/device/uevent:MODALIAS=scsi:t-0x00
> /sys/block/sdk/device/vendor:NETAPP
>
> I assume that I'm not hitting any of those timeouts, otherwise I should
> see something in the debug output. Do any of those values affect the
> kernel part of the iSCSI client?
>
> Do the counts (iorequest_cnt, iodone_cnt, ioerr_cnt) mean that every
> request sent out was successfully completed (iorequest_cnt == iodone_cnt)?
> I would like to know whether the response to the write is an actual error
> response sent by the target or a (network) problem on the client side.
>
> Thanks in advance for any tips. I'm desperate enough to start tcpdump-ing
> the whole thing.. ;)
>
> Regards,
>
> Brano Zarnovican
>
> Host/Guest OS: Scientific Linux release 6.3
> Host kernel: 2.6.32-358.0.1.el6.x86_64
> Guest kernel: 2.6.32-279.9.1.el6.x86_64
> iSCSI client: iscsi-initiator-utils-6.2.0.872-41.el6.x86_64
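On the counter question quoted above: the three sysfs values are printed in hexadecimal, so 0x29a is 666 requests. A minimal Python sketch for reading and comparing them follows. It assumes the usual reading of these attributes (iorequest_cnt = commands issued, iodone_cnt = commands completed, ioerr_cnt = commands completed with an error status), which should be checked against the kernel in use. The helper names parse_counter, read_counters and summarize are hypothetical, not part of any existing tool.

```python
from pathlib import Path

# sysfs attributes shown in the quoted 'grep . /sys/block/sdk/device/*' output
COUNTERS = ("iorequest_cnt", "iodone_cnt", "ioerr_cnt")


def parse_counter(raw):
    """Convert a sysfs counter string such as '0x29a' to an integer."""
    return int(raw.strip(), 16)


def read_counters(device_dir):
    """Read the three counters from a directory like /sys/block/sdk/device."""
    base = Path(device_dir)
    return {name: parse_counter((base / name).read_text()) for name in COUNTERS}


def summarize(counters):
    """Derive in-flight and error counts (interpretation is an assumption)."""
    return {
        "requests": counters["iorequest_cnt"],
        "in_flight": counters["iorequest_cnt"] - counters["iodone_cnt"],
        "errors": counters["ioerr_cnt"],
    }


if __name__ == "__main__":
    # Values copied from the quoted post; on a live host one would call
    # read_counters("/sys/block/sdk/device") instead.
    sample = {"iorequest_cnt": "0x29a", "iodone_cnt": "0x29a", "ioerr_cnt": "0x0"}
    print(summarize({name: parse_counter(value) for name, value in sample.items()}))
```

If that interpretation holds, equal request and done counts with ioerr_cnt at 0 would suggest the host SCSI layer recorded no failed commands on sdk since the device was created. Taking a snapshot every few seconds, next to the existing iscsiadm polling, and comparing it with the guest's error timestamps would show whether ioerr_cnt ever moves when vda1 reports an I/O error.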
