iSCSI connection error after moving a VM to a different Hyper-V cluster

2012-12-03 Thread Nico Visser
 

Environment:
NetApp SAN storage
Hyper-V cluster
OS: CloudLinux 6.3

I'm in the process of moving individual VMs to a separate Hyper-V
cluster because we are having some stability issues.
The VM connects to iSCSI fine, but after I copy the VHD over to the new
cluster and power the VM on, I can't reconnect to the iSCSI targets anymore.

The issue persists even when I try to connect to a new LUN with a
different initiator name.

I've tried to:

stop iscsi
log out of the session
delete the session
remove everything under /var/lib/iscsi
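For reference, the cleanup steps above could be scripted roughly as follows. This is a minimal sketch, not a verified fix: it assumes a RHEL 6-style init system (`service iscsi` / `service iscsid`, as on CloudLinux 6), and the target IQN and portal are placeholders you pass in. It only prints the commands unless `DRY_RUN=0` is set, and unlike "remove everything under /var/lib/iscsi" it clears only the `nodes` and `send_targets` record directories, leaving any `ifaces` bindings in place.

```sh
#!/bin/sh
# Sketch: clear the initiator's state for one target before logging in again.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them.
# Assumes RHEL 6-style service names; adjust for your distribution.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reset_iscsi_target() {
    target="$1"   # target IQN, e.g. as shown by "iscsiadm -m session"
    portal="$2"   # portal as ip:port, e.g. "10.11.52.12:3260"

    # 1. Log out of the session to this target.
    run iscsiadm -m node -T "$target" -p "$portal" --logout
    # 2. Delete the node record so stale settings are not reused.
    run iscsiadm -m node -T "$target" -p "$portal" -o delete
    # 3. Stop the initiator services.
    run service iscsi stop
    run service iscsid stop
    # 4. Destructive: wipe all persistent node and discovery records.
    run rm -rf /var/lib/iscsi/nodes /var/lib/iscsi/send_targets
}
```

For example, `DRY_RUN=0 reset_iscsi_target <target IQN> <portal IP>:3260` would run the steps for real; after that the target has to be rediscovered before logging in again.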

When I try to reconnect, I can see the session and even the /dev/sdc
and /dev/sdd drives that the session provides:


iscsiadm --mode session
tcp: [1] x.x.x.x:3260,7 iqn.1992-08.com.netapp:xxx


root@lnxwebr02 [~]# iscsiadm --mode session -P 3
..
..


Attached SCSI devices:

Host Number: 4  State: running
scsi4 Channel 00 Id 0 Lun: 0
Attached scsi disk sdc    State: running
scsi4 Channel 00 Id 0 Lun: 1
Attached scsi disk sdd    State: running

However, the logs show buffer I/O errors:

Dec  3 09:23:21 lnxwebr02 kernel: [ 1094.018336] Buffer I/O error on device sdc, logical block 0
Dec  3 09:24:18 lnxwebr02 kernel: [ 1151.062556] Buffer I/O error on device sdd, logical block 0

and connection errors:

Dec  3 09:12:05 lnxwebr02 kernel: [  418.382452] scsi4 : iSCSI Initiator over TCP/IP
Dec  3 09:12:06 lnxwebr02 kernel: [  418.698600] scsi 4:0:0:0: Direct-Access NETAPP   LUN  8020 PQ: 0 ANSI: 5
Dec  3 09:12:06 lnxwebr02 kernel: [  418.700035] sd 4:0:0:0: Attached scsi generic sg2 type 0
Dec  3 09:12:06 lnxwebr02 kernel: [  418.706815] scsi 4:0:0:1: Direct-Access NETAPP   LUN  8020 PQ: 0 ANSI: 5
Dec  3 09:12:06 lnxwebr02 kernel: [  418.706842] sd 4:0:0:0: [sdc] 1572990976 512-byte logical blocks: (805 GB/750 GiB)
Dec  3 09:12:06 lnxwebr02 kernel: [  418.708974] sd 4:0:0:1: Attached scsi generic sg3 type 0
Dec  3 09:12:06 lnxwebr02 kernel: [  418.712961] sd 4:0:0:1: [sdd] 419430400 512-byte logical blocks: (214 GB/200 GiB)
Dec  3 09:12:06 lnxwebr02 kernel: [  418.713334] sd 4:0:0:0: [sdc] Write Protect is off
Dec  3 09:12:06 lnxwebr02 kernel: [  418.714268] sd 4:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Dec  3 09:12:06 lnxwebr02 kernel: [  418.715730] sd 4:0:0:1: [sdd] Write Protect is off
Dec  3 09:12:06 lnxwebr02 kernel: [  418.716605] sd 4:0:0:1: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Dec  3 09:12:06 lnxwebr02 iscsid: Connection1:0 to [target: iqn.1992-08.com.netapp:wnlsfas3240b, portal: 10.11.52.12,3260] through [iface: default] is operational now
Dec  3 09:12:29 lnxwebr02 PAM-hulk[2881]: failed to connect stream socket
Dec  3 09:12:52 lnxwebr02 kernel: [  418.719682]  sdc:
Dec  3 09:12:52 lnxwebr02 kernel: [  464.704180]  connection1:0: detected conn error (1021)
Dec  3 09:12:52 lnxwebr02 iscsid: Kernel reported iSCSI connection 1:0 error (1021 - ISCSI_ERR_SCSI_EH_SESSION_RST: Session was dropped as a result of SCSI error recovery) state (3)
Dec  3 09:13:00 lnxwebr02 iscsid: connection1:0 is operational after recovery (1 attempts)
Dec  3 09:13:45 lnxwebr02 kernel: [  517.704102]  connection1:0: detected conn error (1021)
Dec  3 09:13:45 lnxwebr02 iscsid: Kernel reported iSCSI connection 1:0 error (1021 - ISCSI_ERR_SCSI_EH_SESSION_RST: Session was dropped as a result of SCSI error recovery) state (3)
Dec  3 09:13:51 lnxwebr02 iscsid: connection1:0 is operational after recovery (1 attempts)
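When a log like this gets long, it can help to count which connection error codes are showing up. The sketch below does that for a syslog file (or stdin). The code-to-name mapping is an assumption taken from open-iscsi's `iscsi_err` error enum to the best of my knowledge; only 1021 is confirmed by iscsid's own message above, and 1011 and 1020 are the other codes that appear in these threads. It counts only the kernel's "detected conn error" lines, so iscsid's matching "Kernel reported ..." lines are not double-counted, and it expects one message per line as in a real syslog (not the wrapped lines of an email).

```sh
#!/bin/sh
# Count "detected conn error (NNNN)" kernel messages and label each code.
# Code names are assumed from open-iscsi's iscsi_err enum; extend as needed.

conn_error_name() {
    case "$1" in
        1011) echo ISCSI_ERR_CONN_FAILED ;;
        1020) echo ISCSI_ERR_TCP_CONN_CLOSE ;;
        1021) echo ISCSI_ERR_SCSI_EH_SESSION_RST ;;
        *)    echo UNKNOWN ;;
    esac
}

# Usage: summarize_conn_errors [logfile]   (reads stdin when no file is given)
# Prints one line per code: "<count> <code> <name>".
summarize_conn_errors() {
    grep -o 'detected conn error ([0-9]*)' "${1:--}" |
        tr -dc '0-9\n' |
        sort | uniq -c |
        while read -r count code; do
            echo "$count $code $(conn_error_name "$code")"
        done
}
```

For example, `summarize_conn_errors /var/log/messages` gives a quick per-code tally before digging into individual timestamps.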


I'm also seeing the following kernel messages:

Dec  3 09:14:36 lnxwebr02 kernel: [  568.704067]  connection1:0: detected conn error (1021)
Dec  3 09:14:36 lnxwebr02 iscsid: Kernel reported iSCSI connection 1:0 error (1021 - ISCSI_ERR_SCSI_EH_SESSION_RST: Session was dropped as a result of SCSI error recovery) state (3)
Dec  3 09:14:42 lnxwebr02 iscsid: connection1:0 is operational after recovery (1 attempts)
Dec  3 09:15:08 lnxwebr02 kernel: [  600.729107] INFO: task async/0:2830 blocked for more than 120 seconds.
Dec  3 09:15:08 lnxwebr02 kernel: [  600.733430] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec  3 09:15:08 lnxwebr02 kernel: [  600.737702] async/0   D 88020556c440 0  2830  20 0x0080
Dec  3 09:15:08 lnxwebr02 kernel: [  600.737718]  880205595950 0046  8802055959b8
Dec  3 09:15:08 lnxwebr02 kernel: [  600.737729]  8802023eb938 8802023eb848 880205590f28 880205590f28
Dec  3 09:15:08 lnxwebr02 kernel: [  600.737738]  880028036fe8 88020556c9f8 880205595fd8 880205595fd8
Dec  3 09:15:08 lnxwebr02 kernel: [  600.737748] Call Trace:
Dec  3 09:15:08 lnxwebr02 kernel: [  600.737761]  [811231b0] ? sync_page+0x0/0x50
Dec  3 09:15:08 lnxwebr02

iscsi connection error 1020

2012-11-07 Thread guidry . matt


I have several other clients connecting to this EqualLogic SAN, but this one
will not connect, and I could use some advice on where to go from here.


[root@holly ~]# uname -r
2.6.32-279.11.1.el6.x86_64

[root@holly ~]# iscsiadm --version
iscsiadm version 2.0-872.41.el6

[root@holly ~]# iscsiadm --mode discovery --portal 172.16.50.1 --type sendtargets
172.16.50.1:3260,1 iqn.2001-05.com.equallogic:0-8a0906-fa55b5707-ac6000425bb4d7e6-cronos-misc
172.16.50.1:3260,1 iqn.2001-05.com.equallogic:0-8a0906-8725b5707-8c7000425be4d7fd-pmf-data
172.16.50.1:3260,1 iqn.2001-05.com.equallogic:0-8a0906-fa45b5707-21a000425c24d99d-calypso-ftp
172.16.50.1:3260,1 iqn.2001-05.com.equallogic:0-8a0906-40a5b5707-b4c000427454dff9-cronos-pcbi-data
172.16.50.1:3260,1 iqn.2001-05.com.equallogic:0-8a0906-0a311c608-a68ae6e4ebd6-cronos-amanda-holdingdisk
172.16.50.1:3260,1 iqn.2001-05.com.equallogic:0-8a0906-23611c608-37d000e7aed4fd63-xena-files
172.16.50.1:3260,1 iqn.2001-05.com.equallogic:0-8a0906-d51fb8606-96b00154be08-zoe1-oracle
172.16.50.1:3260,1 iqn.2001-05.com.equallogic:0-8a0906-42afb8606-72b00224bf1b-zoe2-oracle
172.16.50.1:3260,1 iqn.2001-05.com.equallogic:0-8a0906-591db8506-c5e7a084bf94-calypso-mail
172.16.50.1:3260,1 iqn.2001-05.com.equallogic:0-8a0906-bafdb8506-1df4f6c4c616-cronos-amanda-diskbuffer
172.16.50.1:3260,1 iqn.2001-05.com.equallogic:0-8a0906-c76228806-52b00754d08e-zoe1-zoe2-oracle-backups
172.16.50.1:3260,1 iqn.2001-05.com.equallogic:0-8a0906-172228806-45a001b4e1a50980-workflow-software-6

[root@holly ~]# iscsiadm -m node -p 172.16.50.1 -T iqn.2001-05.com.equallogic:0-8a0906-172228806-45a001b4e1a50980-workflow-software-6 --login
Logging in to [iface: eth1, target: iqn.2001-05.com.equallogic:0-8a0906-172228806-45a001b4e1a50980-workflow-software-6, portal: 172.16.50.1,3260] (multiple)
iscsiadm: Could not login to [iface: eth1, target: iqn.2001-05.com.equallogic:0-8a0906-172228806-45a001b4e1a50980-workflow-software-6, portal: 172.16.50.1,3260].
iscsiadm: initiator reported error (8 - connection timed out)
iscsiadm: Could not log into all portals

[root@holly ~]# tail /var/log/messages
Nov  6 10:21:22 holly kernel: session55: session recovery timed out after 120 secs
Nov  6 10:21:26 holly iscsid: semop down failed 22
Nov  6 10:25:05 holly kernel: scsi64 : iSCSI Initiator over TCP/IP
Nov  6 10:25:14 holly kernel: connection56:0: detected conn error (1020)
Nov  6 10:25:42 holly kernel: connection56:0: detected conn error (1020)
Nov  6 10:26:09 holly kernel: connection56:0: detected conn error (1020)
Nov  6 10:26:36 holly kernel: connection56:0: detected conn error (1020)
Nov  6 10:27:03 holly kernel: connection56:0: detected conn error (1020)
Nov  6 10:27:15 holly kernel: session56: session recovery timed out after 120 secs
Nov  6 10:27:19 holly iscsid: semop down failed 22

[root@holly ~]# grep -v \# /etc/iscsi/iscsid.conf | grep [a-z]
iscsid.startup = /etc/rc.d/init.d/iscsid force-start
node.startup = automatic
node.leading_login = No
node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 30
node.session.err_timeo.tgt_reset_timeout = 30
node.session.initial_login_retry_max = 8
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.xmit_thread_priority = -20
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
node.conn[0].iscsi.MaxXmitDataSegmentLength = 0
discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 32768
node.conn[0].iscsi.HeaderDigest = None
node.session.nr_sessions = 1
node.session.iscsi.FastAbort = Yes
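A side note on the command used to produce that listing: `grep [a-z]` leaves the bracket expression unquoted, so the shell could glob-expand it if a one-letter file name happens to exist in the current directory. A slightly more defensive way to list only the active settings, as a sketch (the function name is just for illustration):

```sh
#!/bin/sh
# List only the active settings in an iscsid.conf-style file:
# drop blank lines and lines whose first non-blank character is '#'.
# The pattern is quoted, so the shell cannot glob-expand it.
active_settings() {
    grep -Ev '^[[:space:]]*(#|$)' "${1:-/etc/iscsi/iscsid.conf}"
}
```

Unlike `grep -v \#`, this keeps a setting line that merely contains a `#` somewhere after the value; whether that matters depends on the file.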

[root@holly ~]# cat /var/lib/iscsi/ifaces/eth1 
# BEGIN RECORD 2.0-872.41.el6
iface.iscsi_ifacename = eth1
iface.net_ifacename = eth1
iface.transport_name = tcp
iface.vlan_id = 0
iface.vlan_priority = 0
iface.iface_num = 0
iface.mtu = 0
iface.port = 0
# END RECORD

[root@holly ~]# cat /var/lib/iscsi/nodes/iqn.2001-05.com.equallogic\:0-8a0906-172228806-45a001b4e1a50980-workflow-software-6/172.16.50.1\,3260\,1/eth1
# BEGIN RECORD 2.0-872.41.el6
node.name = iqn.2001-05.com.equallogic:0-8a0906-172228806-45a001b4e1a50980-workflow-software-6
node.tpgt = 1
node.startup = automatic
node.leading_login = No
iface.iscsi_ifacename = eth1
iface.net_ifacename = eth1
iface.transport_name = tcp
iface.vlan_id = 0
iface.vlan_priority = 0
iface.iface_num = 0
iface.mtu = 0
iface.port = 0
node.discovery_address = 172.16.50.1
node.discovery_port = 3260
node.discovery_type = send_targets
node.session.initial_cmdsn = 0
node.session.initial_login_retry_max = 8
node.session.xmit_thread_priority = -20

Re: iscsi connection error 1020

2012-11-07 Thread Michael Christie

On Nov 6, 2012, at 9:54 AM, guidry.m...@gmail.com wrote:
 [root@holly ~]# iscsiadm -m node -p 172.16.50.1 -T iqn.2001-05.com.equallogic:0-8a0906-172228806-45a001b4e1a50980-workflow-software-6 --login
 Logging in to [iface: eth1, target: iqn.2001-05.com.equallogic:0-8a0906-172228806-45a001b4e1a50980-workflow-software-6, portal: 172.16.50.1,3260] (multiple)
 iscsiadm: Could not login to [iface: eth1, target: iqn.2001-05.com.equallogic:0-8a0906-172228806-45a001b4e1a50980-workflow-software-6, portal: 172.16.50.1,3260].
 iscsiadm: initiator reported error (8 - connection timed out)
 iscsiadm: Could not log into all portals

 [root@holly ~]# tail /var/log/messages
 Nov  6 10:21:22 holly kernel: session55: session recovery timed out after 120 secs
 Nov  6 10:21:26 holly iscsid: semop down failed 22
 Nov  6 10:25:05 holly kernel: scsi64 : iSCSI Initiator over TCP/IP
 Nov  6 10:25:14 holly kernel: connection56:0: detected conn error (1020)
 Nov  6 10:25:42 holly kernel: connection56:0: detected conn error (1020)
 Nov  6 10:26:09 holly kernel: connection56:0: detected conn error (1020)
 Nov  6 10:26:36 holly kernel: connection56:0: detected conn error (1020)
 Nov  6 10:27:03 holly kernel: connection56:0: detected conn error (1020)
 Nov  6 10:27:15 holly kernel: session56: session recovery timed out after 120 secs
 Nov  6 10:27:19 holly iscsid: semop down failed 22

So are some sessions getting set up OK and some are not?

Could you send all of the /var/log/messages?

When you see the 1020 errors above, what is in the target's log?

Did you try changing the rp_filter value? See the iscsi README and your kernel 
documentation.
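A quick way to see the current reverse-path filter settings is to read them from /proc. A sketch follows; `PROC_NET_CONF` is a hypothetical override variable that exists only so the function can be tried against a fake directory tree. Per the kernel's ip-sysctl documentation, 0 means no source validation, 1 strict mode and 2 loose mode, and the higher of `conf/all/rp_filter` and `conf/<iface>/rp_filter` is what applies to an interface, so lowering only eth1 has no effect while `all` is still 1. Which value suits a multi-NIC iSCSI host is what the README and kernel documentation cover, so treat any change as an experiment.

```sh
#!/bin/sh
# Print "<interface> <rp_filter value>" for "all", "default" and every interface.
# PROC_NET_CONF is a hypothetical override used only for testing.
PROC_NET_CONF="${PROC_NET_CONF:-/proc/sys/net/ipv4/conf}"

show_rp_filter() {
    for f in "$PROC_NET_CONF"/*/rp_filter; do
        [ -r "$f" ] || continue        # skips the literal pattern if nothing matched
        iface=$(basename "$(dirname "$f")")
        echo "$iface $(cat "$f")"
    done
}
```

A change for one interface could then be tried with `sysctl -w net.ipv4.conf.eth1.rp_filter=2`, which does not persist across a reboot unless it is also added to /etc/sysctl.conf.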

-- 
You received this message because you are subscribed to the Google Groups 
open-iscsi group.
To post to this group, send email to open-iscsi@googlegroups.com.
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=en.



Re: iscsi connection error

2009-06-15 Thread sundar mahadevan

Hi Mike,
First, my apologies for the late reply. I have resolved this issue: I
changed the install locations and it worked. Earlier I tried a shared
home install; that is, I had a shared Oracle home and tried to install
the Oracle software (binaries) once, to be shared by both nodes. Later
I installed the Oracle software separately on each node, and that
worked fine. I appreciate your sincere efforts to help noobs like me.




Re: iscsi connection error

2009-06-11 Thread Mike Christie

On 06/10/2009 10:49 AM, sundar mahadevan wrote:
 Hi Members,
 First of all, I'm not sure this is the right place to raise this
 question. Sorry if it isn't; I'd appreciate it if you could direct me
 to the right place. Thanks.

 OS: Oracle Enterprise Linux
 rpm -qa | grep -i scsi

 scsi-target-utils-0.0-5.20080917snap.el5
 iscsi-initiator-utils-6.2.0.868-0.18.el5


What kernel are you using (run uname -a)?

 I'm trying to install Oracle 9i RAC. Halfway through the installation,
 I receive the following error messages. I have hit this problem twice.
 On the second attempt, the connection dropped at 10:54 and then somehow
 reconnected itself within a few seconds, but later it dropped again and
 did not come back. Please help. Noob. Thanks in advance.

 Jun 10 10:54:37 sunny1pub kernel:  connection1:0: iscsi: detected conn error (1011)
 .
 .
 .
 Jun 10 10:54:40 sunny1pub iscsid: connection1:0 is operational after recovery (1 attempts)


 Jun 10 10:27:14 sunny1pub kernel: o2net: accepted connection from node sunny2pub.ezhome.com (num 1) at 10.1.1.2:
 Jun 10 10:27:18 sunny1pub kernel: ocfs2_dlm: Node 1 joins domain 1B9768E4C4FC4165A22E5E95E6A93F80
 Jun 10 10:27:18 sunny1pub kernel: ocfs2_dlm: Nodes in domain (1B9768E4C4FC4165A22E5E95E6A93F80): 0 1
 Jun 10 10:54:37 sunny1pub kernel:  connection1:0: iscsi: detected conn error (1011)
 Jun 10 10:54:37 sunny1pub iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
 Jun 10 10:54:37 sunny1pub tgtd: abort_task_set(979) found a01 0
 Jun 10 10:54:37 sunny1pub tgtd: conn_close(88) connection closed 0x9e370c4 2
 Jun 10 10:54:39 sunny1pub kernel: iscsi: host reset succeeded


Is the initiator connected to the target running on the same box?


It looks like a command took too long. If a command takes longer than
the SCSI command timeout (/sys/block/sdX/device/timeout), the SCSI
layer tries to abort it; if that fails, it resets the LUN, and if that
fails, it resets the host.
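To see what that timeout currently is for each SCSI disk, a sketch like this reads the sysfs file mentioned above (the value is in seconds). `SYSFS_BLOCK` is a hypothetical override used only so the function can be exercised against a fake directory tree.

```sh
#!/bin/sh
# Print "<disk> <timeout in seconds>" for each sd* disk.
# SYSFS_BLOCK is a hypothetical override used only for testing.
SYSFS_BLOCK="${SYSFS_BLOCK:-/sys/block}"

show_scsi_timeouts() {
    for f in "$SYSFS_BLOCK"/sd*/device/timeout; do
        [ -r "$f" ] || continue
        # .../sdc/device/timeout -> sdc
        disk=$(basename "$(dirname "$(dirname "$f")")")
        echo "$disk $(cat "$f")"
    done
}
```

A longer timeout for one disk could be tried with `echo 60 > /sys/block/sdc/device/timeout` as root; that setting does not survive a reboot, and a udev rule is the usual way to make it stick.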


 Jun 10 10:54:40 sunny1pub iscsid: received iferror -38
 Jun 10 10:54:40 sunny1pub last message repeated 2 times
 Jun 10 10:54:40 sunny1pub iscsid: connection1:0 is operational after recovery (1 attempts)
 Jun 10 10:57:57 sunny1pub kernel: ping timeout of 5 secs expired, last rx 1740985, last ping 1745985, now 1750985

The initiator sends an iSCSI NOP as a ping every x seconds
(node.conn[0].timeo.noop_out_interval). If we do not get a response,
we drop the session, try to log back in, and retry the I/O.
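The numbers in the "ping timeout" message above can be read with a bit of arithmetic. The sketch below extracts them and prints the two gaps. The values are kernel clock ticks (jiffies), so converting them to seconds assumes HZ=1000 for this kernel, which is not stated in the thread. Under that assumption both gaps are 5 seconds: the NOP ping went out 5 s after anything was last received, and then 5 more seconds passed with no reply, which matches the "ping timeout of 5 secs" in the message.

```sh
#!/bin/sh
# Read "... last rx A, last ping B, now C" from a ping-timeout message on stdin
# and print the two gaps, in the same units as the log (jiffies).
ping_timeout_gaps() {
    sed -n 's/.*last rx \([0-9]*\), last ping \([0-9]*\), now \([0-9]*\).*/\1 \2 \3/p' |
        while read -r rx ping now; do
            echo "rx_to_ping=$((ping - rx)) ping_to_now=$((now - ping))"
        done
}
```

Fed the 10:57:57 line above, it reports `rx_to_ping=5000 ping_to_now=5000`.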

 Jun 10 10:57:57 sunny1pub kernel:  connection1:0: iscsi: detected conn error (1011)
 Jun 10 10:57:59 sunny1pub iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
 Jun 10 10:59:58 sunny1pub kernel:  session1: iscsi: session recovery timed out after 120 secs

It looks like something happened to the target or the connection. We
were not able to log back in after trying for 2 minutes
(node.session.timeo.replacement_timeout).
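The 2 minutes can be checked against the timestamps above: the connection error was logged at 10:57:57 and the recovery timeout at 10:59:58. A small sketch for that arithmetic, assuming both times fall on the same day (awk is used so that fields such as "08" are read as decimal, where shell arithmetic may treat them as octal):

```sh
#!/bin/sh
# Seconds between two HH:MM:SS timestamps on the same day.
elapsed_secs() {
    echo "$1 $2" | awk -F'[: ]' '{
        print ($4 * 3600 + $5 * 60 + $6) - ($1 * 3600 + $2 * 60 + $3)
    }'
}
```

`elapsed_secs 10:57:57 10:59:58` prints 121, i.e. the 120-second replacement timeout give or take the one-second resolution of syslog timestamps. The timeout itself lives in the full key `node.session.timeo.replacement_timeout` (as in the iscsid.conf listing earlier in this archive); for an existing node record it can be changed with `iscsiadm -m node -T <target> -p <portal> -o update -n node.session.timeo.replacement_timeout -v <seconds>`, which should take effect on the next login.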

 Jun 10 10:59:58 sunny1pub kernel: iscsi: cmd 0x2a is not queued (8)
 Jun 10 11:06:10 sunny1pub syslogd 1.4.1: restart.

 

