iSCSI - connection error after moving VM to a different Hyper-V cluster
Environment:
    NetApp SAN storage
    Hyper-V cluster
    OS: CloudLinux 6.3

I'm in the process of moving individual VMs to a separate Hyper-V cluster because we are having some stability issues. The VM connects to iSCSI fine, but once I have copied the VHD over to the new cluster and powered it on, I can no longer reconnect to the iSCSI targets. Even when I attempt to connect to a new LUN with a different initiator name, the issue persists.

I've attempted to stop iscsi, log out of the session, delete the session, and remove everything under /var/lib/iscsi.

When attempting to reconnect, I can see the session and even the /dev/sdc and /dev/sdd drives that the session provisions:

    iscsiadm --mode session
    tcp: [1] x.x.x.x:3260,7 iqn.1992-08.com.netapp:xxx
    root@lnxwebr02 [~]# iscsiadm --mode session -P 3
    ..
    ..
    Attached SCSI devices:
    Host Number: 4  State: running
    scsi4 Channel 00 Id 0 Lun: 0
    Attached scsi disk sdc  State: running
    scsi4 Channel 00 Id 0 Lun: 1
    Attached scsi disk sdd  State: running

However, the logs show buffer I/O errors:

    Dec 3 09:23:21 lnxwebr02 kernel: [ 1094.018336] Buffer I/O error on device sdc, logical block 0
    Dec 3 09:24:18 lnxwebr02 kernel: [ 1151.062556] Buffer I/O error on device sdd, logical block 0

and connection errors:

    Dec 3 09:12:05 lnxwebr02 kernel: [ 418.382452] scsi4 : iSCSI Initiator over TCP/IP
    Dec 3 09:12:06 lnxwebr02 kernel: [ 418.698600] scsi 4:0:0:0: Direct-Access NETAPP LUN 8020 PQ: 0 ANSI: 5
    Dec 3 09:12:06 lnxwebr02 kernel: [ 418.700035] sd 4:0:0:0: Attached scsi generic sg2 type 0
    Dec 3 09:12:06 lnxwebr02 kernel: [ 418.706815] scsi 4:0:0:1: Direct-Access NETAPP LUN 8020 PQ: 0 ANSI: 5
    Dec 3 09:12:06 lnxwebr02 kernel: [ 418.706842] sd 4:0:0:0: [sdc] 1572990976 512-byte logical blocks: (805 GB/750 GiB)
    Dec 3 09:12:06 lnxwebr02 kernel: [ 418.708974] sd 4:0:0:1: Attached scsi generic sg3 type 0
    Dec 3 09:12:06 lnxwebr02 kernel: [ 418.712961] sd 4:0:0:1: [sdd] 419430400 512-byte logical blocks: (214 GB/200 GiB)
    Dec 3 09:12:06 lnxwebr02 kernel: [ 418.713334] sd 4:0:0:0: [sdc] Write Protect is off
    Dec 3 09:12:06 lnxwebr02 kernel: [ 418.714268] sd 4:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
    Dec 3 09:12:06 lnxwebr02 kernel: [ 418.715730] sd 4:0:0:1: [sdd] Write Protect is off
    Dec 3 09:12:06 lnxwebr02 kernel: [ 418.716605] sd 4:0:0:1: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
    Dec 3 09:12:06 lnxwebr02 iscsid: Connection1:0 to [target: iqn.1992-08.com.netapp:wnlsfas3240b, portal: 10.11.52.12,3260] through [iface: default] is operational now
    Dec 3 09:12:29 lnxwebr02 PAM-hulk[2881]: failed to connect stream socket
    Dec 3 09:12:52 lnxwebr02 kernel: [ 418.719682] sdc:
    Dec 3 09:12:52 lnxwebr02 kernel: [ 464.704180] connection1:0: detected conn error (1021)
    Dec 3 09:12:52 lnxwebr02 iscsid: Kernel reported iSCSI connection 1:0 error (1021 - ISCSI_ERR_SCSI_EH_SESSION_RST: Session was dropped as a result of SCSI error recovery) state (3)
    Dec 3 09:13:00 lnxwebr02 iscsid: connection1:0 is operational after recovery (1 attempts)
    Dec 3 09:13:45 lnxwebr02 kernel: [ 517.704102] connection1:0: detected conn error (1021)
    Dec 3 09:13:45 lnxwebr02 iscsid: Kernel reported iSCSI connection 1:0 error (1021 - ISCSI_ERR_SCSI_EH_SESSION_RST: Session was dropped as a result of SCSI error recovery) state (3)
    Dec 3 09:13:51 lnxwebr02 iscsid: connection1:0 is operational after recovery (1 attempts)

I'm also seeing the following kernel message:

    Dec 3 09:14:36 lnxwebr02 kernel: [ 568.704067] connection1:0: detected conn error (1021)
    Dec 3 09:14:36 lnxwebr02 iscsid: Kernel reported iSCSI connection 1:0 error (1021 - ISCSI_ERR_SCSI_EH_SESSION_RST: Session was dropped as a result of SCSI error recovery) state (3)
    Dec 3 09:14:42 lnxwebr02 iscsid: connection1:0 is operational after recovery (1 attempts)
    Dec 3 09:15:08 lnxwebr02 kernel: [ 600.729107] INFO: task async/0:2830 blocked for more than 120 seconds.
    Dec 3 09:15:08 lnxwebr02 kernel: [ 600.733430] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Dec 3 09:15:08 lnxwebr02 kernel: [ 600.737702] async/0 D 88020556c440 0 2830 20 0x0080
    Dec 3 09:15:08 lnxwebr02 kernel: [ 600.737718] 880205595950 0046 8802055959b8
    Dec 3 09:15:08 lnxwebr02 kernel: [ 600.737729] 8802023eb938 8802023eb848 880205590f28 880205590f28
    Dec 3 09:15:08 lnxwebr02 kernel: [ 600.737738] 880028036fe8 88020556c9f8 880205595fd8 880205595fd8
    Dec 3 09:15:08 lnxwebr02 kernel: [ 600.737748] Call Trace:
    Dec 3 09:15:08 lnxwebr02 kernel: [ 600.737761] [811231b0] ? sync_page+0x0/0x50
    Dec 3 09:15:08 lnxwebr02
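
For anyone triaging a log like the one in this post, a small script can tally how often each connection reports each error code. This is only a sketch under stated assumptions: the function name `tally_conn_errors` and the name table are illustrative and not part of open-iscsi. Only the 1021 name is confirmed by the iscsid output quoted in the post; the 1011 and 1020 names are my reading of the kernel's iSCSI error enum and should be verified against your kernel source.

```python
import re
from collections import Counter

# Symbolic names for the codes that appear in these threads.
# 1021 is spelled out by iscsid in the log; 1011 and 1020 are an
# ASSUMPTION based on the kernel's iscsi_if.h enum, so double-check them.
ISCSI_ERR_NAMES = {
    1011: "ISCSI_ERR_CONN_FAILED",
    1020: "ISCSI_ERR_TCP_CONN_CLOSE",
    1021: "ISCSI_ERR_SCSI_EH_SESSION_RST",
}

# Matches both "connection1:0: detected conn error (1021)" and the older
# "connection1:0: iscsi: detected conn error (1011)" wording.
CONN_ERR_RE = re.compile(
    r"(connection\d+:\d+): (?:iscsi: )?detected conn error \((\d+)\)"
)


def tally_conn_errors(lines):
    """Count (connection, error code) pairs found in syslog lines."""
    counts = Counter()
    for line in lines:
        match = CONN_ERR_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# Three kernel lines copied from the post.
SAMPLE_LOG = [
    "Dec 3 09:12:52 lnxwebr02 kernel: [ 464.704180] connection1:0: detected conn error (1021)",
    "Dec 3 09:13:45 lnxwebr02 kernel: [ 517.704102] connection1:0: detected conn error (1021)",
    "Dec 3 09:14:36 lnxwebr02 kernel: [ 568.704067] connection1:0: detected conn error (1021)",
]

if __name__ == "__main__":
    for (conn, code), count in tally_conn_errors(SAMPLE_LOG).items():
        name = ISCSI_ERR_NAMES.get(code, "unknown")
        print(f"{conn}: error {code} ({name}) seen {count} time(s)")
```

In practice the input would be the lines of /var/log/messages rather than the hard-coded sample. A connection that repeats the same code at a steady interval, as here, points at a recurring recovery cycle rather than a one-off glitch.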
iscsi connection error 1020
I have several other clients connecting to this EqualLogic SAN, but this one will not, and I could use some advice on where to proceed.

    [root@holly ~]# uname -r
    2.6.32-279.11.1.el6.x86_64
    [root@holly ~]# iscsiadm --version
    iscsiadm version 2.0-872.41.el6
    [root@holly ~]# iscsiadm --mode discovery --portal 172.16.50.1 --type sendtargets
    172.16.50.1:3260,1 iqn.2001-05.com.equallogic:0-8a0906-fa55b5707-ac6000425bb4d7e6-cronos-misc
    172.16.50.1:3260,1 iqn.2001-05.com.equallogic:0-8a0906-8725b5707-8c7000425be4d7fd-pmf-data
    172.16.50.1:3260,1 iqn.2001-05.com.equallogic:0-8a0906-fa45b5707-21a000425c24d99d-calypso-ftp
    172.16.50.1:3260,1 iqn.2001-05.com.equallogic:0-8a0906-40a5b5707-b4c000427454dff9-cronos-pcbi-data
    172.16.50.1:3260,1 iqn.2001-05.com.equallogic:0-8a0906-0a311c608-a68ae6e4ebd6-cronos-amanda-holdingdisk
    172.16.50.1:3260,1 iqn.2001-05.com.equallogic:0-8a0906-23611c608-37d000e7aed4fd63-xena-files
    172.16.50.1:3260,1 iqn.2001-05.com.equallogic:0-8a0906-d51fb8606-96b00154be08-zoe1-oracle
    172.16.50.1:3260,1 iqn.2001-05.com.equallogic:0-8a0906-42afb8606-72b00224bf1b-zoe2-oracle
    172.16.50.1:3260,1 iqn.2001-05.com.equallogic:0-8a0906-591db8506-c5e7a084bf94-calypso-mail
    172.16.50.1:3260,1 iqn.2001-05.com.equallogic:0-8a0906-bafdb8506-1df4f6c4c616-cronos-amanda-diskbuffer
    172.16.50.1:3260,1 iqn.2001-05.com.equallogic:0-8a0906-c76228806-52b00754d08e-zoe1-zoe2-oracle-backups
    172.16.50.1:3260,1 iqn.2001-05.com.equallogic:0-8a0906-172228806-45a001b4e1a50980-workflow-software-6
    [root@holly ~]# iscsiadm -m node -p 172.16.50.1 -T iqn.2001-05.com.equallogic:0-8a0906-172228806-45a001b4e1a50980-workflow-software-6 --login
    Logging in to [iface: eth1, target: iqn.2001-05.com.equallogic:0-8a0906-172228806-45a001b4e1a50980-workflow-software-6, portal: 172.16.50.1,3260] (multiple)
    iscsiadm: Could not login to [iface: eth1, target: iqn.2001-05.com.equallogic:0-8a0906-172228806-45a001b4e1a50980-workflow-software-6, portal: 172.16.50.1,3260].
    iscsiadm: initiator reported error (8 - connection timed out)
    iscsiadm: Could not log into all portals
    [root@holly ~]# tail /var/log/messages
    Nov 6 10:21:22 holly kernel: session55: session recovery timed out after 120 secs
    Nov 6 10:21:26 holly iscsid: semop down failed 22
    Nov 6 10:25:05 holly kernel: scsi64 : iSCSI Initiator over TCP/IP
    Nov 6 10:25:14 holly kernel: connection56:0: detected conn error (1020)
    Nov 6 10:25:42 holly kernel: connection56:0: detected conn error (1020)
    Nov 6 10:26:09 holly kernel: connection56:0: detected conn error (1020)
    Nov 6 10:26:36 holly kernel: connection56:0: detected conn error (1020)
    Nov 6 10:27:03 holly kernel: connection56:0: detected conn error (1020)
    Nov 6 10:27:15 holly kernel: session56: session recovery timed out after 120 secs
    Nov 6 10:27:19 holly iscsid: semop down failed 22
    [root@holly ~]# grep -v \# /etc/iscsi/iscsid.conf | grep [a-z]
    iscsid.startup = /etc/rc.d/init.d/iscsid force-start
    node.startup = automatic
    node.leading_login = No
    node.session.timeo.replacement_timeout = 120
    node.conn[0].timeo.login_timeout = 15
    node.conn[0].timeo.logout_timeout = 15
    node.conn[0].timeo.noop_out_interval = 5
    node.conn[0].timeo.noop_out_timeout = 5
    node.session.err_timeo.abort_timeout = 15
    node.session.err_timeo.lu_reset_timeout = 30
    node.session.err_timeo.tgt_reset_timeout = 30
    node.session.initial_login_retry_max = 8
    node.session.cmds_max = 128
    node.session.queue_depth = 32
    node.session.xmit_thread_priority = -20
    node.session.iscsi.InitialR2T = No
    node.session.iscsi.ImmediateData = Yes
    node.session.iscsi.FirstBurstLength = 262144
    node.session.iscsi.MaxBurstLength = 16776192
    node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
    node.conn[0].iscsi.MaxXmitDataSegmentLength = 0
    discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 32768
    node.conn[0].iscsi.HeaderDigest = None
    node.session.nr_sessions = 1
    node.session.iscsi.FastAbort = Yes
    [root@holly ~]# cat /var/lib/iscsi/ifaces/eth1
    # BEGIN RECORD 2.0-872.41.el6
    iface.iscsi_ifacename = eth1
    iface.net_ifacename = eth1
    iface.transport_name = tcp
    iface.vlan_id = 0
    iface.vlan_priority = 0
    iface.iface_num = 0
    iface.mtu = 0
    iface.port = 0
    # END RECORD
    [root@holly ~]# cat /var/lib/iscsi/nodes/iqn.2001-05.com.equallogic\:0-8a0906-172228806-45a001b4e1a50980-workflow-software-6/172.16.50.1\,3260\,1/eth1
    # BEGIN RECORD 2.0-872.41.el6
    node.name = iqn.2001-05.com.equallogic:0-8a0906-172228806-45a001b4e1a50980-workflow-software-6
    node.tpgt = 1
    node.startup = automatic
    node.leading_login = No
    iface.iscsi_ifacename = eth1
    iface.net_ifacename = eth1
    iface.transport_name = tcp
    iface.vlan_id = 0
    iface.vlan_priority = 0
    iface.iface_num = 0
    iface.mtu = 0
    iface.port = 0
    node.discovery_address = 172.16.50.1
    node.discovery_port = 3260
    node.discovery_type = send_targets
    node.session.initial_cmdsn = 0
    node.session.initial_login_retry_max = 8
    node.session.xmit_thread_priority = -20
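
The settings dumped in this post line up with the timings in the log: `node.session.timeo.replacement_timeout = 120` matches "session recovery timed out after 120 secs". As a rough sketch, the snippet below parses `key = value` records like the ones shown and derives the initial login retry window. The helper names are hypothetical, and multiplying `initial_login_retry_max` by `login_timeout` is my understanding of the comments in the stock iscsid.conf, not something stated in this thread.

```python
def parse_iscsi_record(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '=' at all
            settings[key.strip()] = value.strip()
    return settings


def login_retry_window_secs(settings):
    """Approximate upper bound on the initial login attempt, in seconds.

    ASSUMPTION: retries * per-attempt login timeout, per the iscsid.conf
    comments. Returns None if either setting is missing or not a number.
    """
    try:
        retries = int(settings["node.session.initial_login_retry_max"])
        timeout = int(settings["node.conn[0].timeo.login_timeout"])
    except (KeyError, ValueError):
        return None
    return retries * timeout


# A few values copied from the iscsid.conf output in the post.
SAMPLE_CONF = """\
# BEGIN RECORD 2.0-872.41.el6
node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.session.initial_login_retry_max = 8
# END RECORD
"""

if __name__ == "__main__":
    conf = parse_iscsi_record(SAMPLE_CONF)
    print("replacement_timeout:", conf["node.session.timeo.replacement_timeout"])
    print("initial login window (secs):", login_retry_window_secs(conf))
```

With the posted values the login window works out to 8 x 15 = 120 seconds, the same order as the gaps between the failed attempts in the log.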
Re: iscsi connection error 1020
On Nov 6, 2012, at 9:54 AM, guidry.m...@gmail.com wrote:

> [root@holly ~]# iscsiadm -m node -p 172.16.50.1 -T iqn.2001-05.com.equallogic:0-8a0906-172228806-45a001b4e1a50980-workflow-software-6 --login
> Logging in to [iface: eth1, target: iqn.2001-05.com.equallogic:0-8a0906-172228806-45a001b4e1a50980-workflow-software-6, portal: 172.16.50.1,3260] (multiple)
> iscsiadm: Could not login to [iface: eth1, target: iqn.2001-05.com.equallogic:0-8a0906-172228806-45a001b4e1a50980-workflow-software-6, portal: 172.16.50.1,3260].
> iscsiadm: initiator reported error (8 - connection timed out)
> iscsiadm: Could not log into all portals
> [root@holly ~]# tail /var/log/messages
> Nov 6 10:21:22 holly kernel: session55: session recovery timed out after 120 secs
> Nov 6 10:21:26 holly iscsid: semop down failed 22
> Nov 6 10:25:05 holly kernel: scsi64 : iSCSI Initiator over TCP/IP
> Nov 6 10:25:14 holly kernel: connection56:0: detected conn error (1020)
> Nov 6 10:25:42 holly kernel: connection56:0: detected conn error (1020)
> Nov 6 10:26:09 holly kernel: connection56:0: detected conn error (1020)
> Nov 6 10:26:36 holly kernel: connection56:0: detected conn error (1020)
> Nov 6 10:27:03 holly kernel: connection56:0: detected conn error (1020)
> Nov 6 10:27:15 holly kernel: session56: session recovery timed out after 120 secs
> Nov 6 10:27:19 holly iscsid: semop down failed 22

So are some sessions getting set up OK and some are not? Could you send all of /var/log/messages? When you see the 1020 errors above, what is in the target's log?

Did you try changing the rp_filter value? See the iscsi README and your kernel documentation.

--
You received this message because you are subscribed to the Google Groups open-iscsi group. To post to this group, send email to open-iscsi@googlegroups.com. To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=en.
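
To follow up on the rp_filter question, the current values can be read from the kernel's per-interface sysctl files. The sketch below assumes the standard Linux layout `/proc/sys/net/ipv4/conf/<iface>/rp_filter`, where 0 disables reverse-path filtering, 1 is strict mode, and 2 is loose mode, and where the kernel applies the larger of the `all` value and the per-interface value. The function names are hypothetical. My understanding is that strict mode matters most when sessions are bound to a specific iface (such as eth1 here) and several NICs share a subnet, but check the README for your version.

```python
from pathlib import Path


def read_rp_filter(conf_root="/proc/sys/net/ipv4/conf"):
    """Return {interface_name: rp_filter_value} for every entry found."""
    values = {}
    for path in sorted(Path(conf_root).glob("*/rp_filter")):
        try:
            values[path.parent.name] = int(path.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or malformed entry: skip it
    return values


def effective_rp_filter(values, iface):
    """The kernel uses max(conf/all, conf/<iface>) for source validation."""
    return max(values.get("all", 0), values[iface])


def strict_interfaces(values):
    """Interfaces whose effective rp_filter mode is 1 (strict)."""
    return [
        name
        for name in values
        if name not in ("all", "default")
        and effective_rp_filter(values, name) == 1
    ]


if __name__ == "__main__":
    current = read_rp_filter()
    for name, value in current.items():
        print(f"{name}: rp_filter={value}")
    print("strict mode in effect on:", strict_interfaces(current))
```

The script only reports values; changing them is done with sysctl and is left out on purpose, since the right setting depends on the network layout.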
Re: iscsi connection error
Hi Mike,

First, my apologies for the late reply. I have resolved this issue: I changed the install locations and it worked. Earlier I had tried a shared-home install, that is, a shared Oracle home with the Oracle software (binaries) shared by both nodes. When I later installed the Oracle software separately on each node, everything worked fine.

I appreciate your sincere efforts to help noobs like me.

On Thu, Jun 11, 2009 at 1:02 PM, Mike Christie <micha...@cs.wisc.edu> wrote:

> On 06/10/2009 10:49 AM, sundar mahadevan wrote:
>> Hi Members,
>> First of all, I'm not too sure if this question is supposed to be raised here. Sorry if this is not the right place. Appreciate if you could direct me to the right place. Thanks.
>>
>> OS: Oracle Enterprise Linux
>> rpm -qa | grep -i scsi
>> scsi-target-utils-0.0-5.20080917snap.el5
>> iscsi-initiator-utils-6.2.0.868-0.18.el5
>
> What kernel are you using (do uname -a)?
>
>> I'm trying to install Oracle 9i RAC. Halfway through the installation, I receive the following error messages. I encountered this problem twice. In fact, on the second attempt, the connection got dropped at 10:54 and then somehow reconnected itself in a few seconds. But later the connection dropped and did not come back again. Please help. Noob. Thanks in advance.
>>
>> Jun 10 10:54:37 sunny1pub kernel: connection1:0: iscsi: detected conn error (1011)
>> . . .
>> Jun 10 10:54:40 sunny1pub iscsid: connection1:0 is operational after recovery (1 attempts)
>>
>> Jun 10 10:27:14 sunny1pub kernel: o2net: accepted connection from node sunny2pub.ezhome.com (num 1) at 10.1.1.2:
>> Jun 10 10:27:18 sunny1pub kernel: ocfs2_dlm: Node 1 joins domain 1B9768E4C4FC4165A22E5E95E6A93F80
>> Jun 10 10:27:18 sunny1pub kernel: ocfs2_dlm: Nodes in domain (1B9768E4C4FC4165A22E5E95E6A93F80): 0 1
>> Jun 10 10:54:37 sunny1pub kernel: connection1:0: iscsi: detected conn error (1011)
>> Jun 10 10:54:37 sunny1pub iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
>> Jun 10 10:54:37 sunny1pub tgtd: abort_task_set(979) found a01 0
>> Jun 10 10:54:37 sunny1pub tgtd: conn_close(88) connection closed 0x9e370c4 2
>> Jun 10 10:54:39 sunny1pub kernel: iscsi: host reset succeeded
>
> Is the initiator connected to the target running on the same box?
>
> It looks like a command took too long. If a command takes longer than the SCSI command timeout (/sys/block/sdX/device/timeout), then the SCSI layer will try to abort it (if that fails, reset the LUN, and if that fails, reset the host).
>
>> Jun 10 10:54:40 sunny1pub iscsid: received iferror -38
>> Jun 10 10:54:40 sunny1pub last message repeated 2 times
>> Jun 10 10:54:40 sunny1pub iscsid: connection1:0 is operational after recovery (1 attempts)
>> Jun 10 10:57:57 sunny1pub kernel: ping timeout of 5 secs expired, last rx 1740985, last ping 1745985, now 1750985
>
> The initiator sends an iSCSI nop as a ping every x seconds. If we do not get a response, we drop the session, try to relogin, and retry the IO.
>
>> Jun 10 10:57:57 sunny1pub kernel: connection1:0: iscsi: detected conn error (1011)
>> Jun 10 10:57:59 sunny1pub iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
>> Jun 10 10:59:58 sunny1pub kernel: session1: iscsi: session recovery timed out after 120 secs
>
> It looks like something happened to the target or connection. We were not able to log back in after trying for 2 minutes (node.session.timeo.replacement_timeout).
>
>> Jun 10 10:59:58 sunny1pub kernel: iscsi: cmd 0x2a is not queued (8)
>> Jun 10 11:06:10 sunny1pub syslogd 1.4.1: restart.
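
The ping-timeout line quoted in this thread carries enough numbers to check Mike's explanation of the nop ping by hand. The sketch below pulls the fields out of that line and computes the gaps between them. It assumes the three large numbers are kernel jiffies, which is my interpretation and not stated in the thread; the function name is hypothetical.

```python
import re

PING_RE = re.compile(
    r"ping timeout of (\d+) secs expired, "
    r"last rx (\d+), last ping (\d+), now (\d+)"
)


def explain_ping_timeout(line):
    """Break a 'ping timeout' kernel line into its intervals.

    ASSUMPTION: 'last rx', 'last ping' and 'now' are jiffies counters.
    Returns None if the line does not match.
    """
    match = PING_RE.search(line)
    if match is None:
        return None
    timeout_secs, last_rx, last_ping, now = map(int, match.groups())
    waited = now - last_ping  # how long the ping went unanswered
    return {
        "timeout_secs": timeout_secs,
        "idle_before_ping": last_ping - last_rx,
        "waited_for_reply": waited,
        # Ticks per second implied by the numbers, if computable.
        "implied_hz": waited / timeout_secs if timeout_secs else None,
    }


# Kernel line copied from the quoted log.
SAMPLE_LINE = (
    "Jun 10 10:57:57 sunny1pub kernel: ping timeout of 5 secs expired, "
    "last rx 1740985, last ping 1745985, now 1750985"
)

if __name__ == "__main__":
    print(explain_ping_timeout(SAMPLE_LINE))
```

For the quoted line, both gaps come out to 5000 ticks. If the jiffies assumption holds, that implies 1000 ticks per second: the connection was idle for about 5 seconds before the ping was sent, and the ping then went unanswered for the full 5-second timeout, which fits the described behaviour of dropping the session and attempting a relogin.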