Re: [PATCH V1] IB/iser: Add Discovery support

2013-08-03 Thread Mike Christie
On 08/01/2013 05:58 AM, Or Gerlitz wrote:
 On 31/07/2013 19:54, Mike Christie wrote:
 Just send to linux-scsi. When it goes there I will respond to the mail
 with a Reviewed-by email.
 
 Thanks, will do that.
 
 So now what about the user space part... as minimum we need to apply the
 relevant parts from
 patch 1/4 below, that is add new params to iscsi_if.h and actually what
 else besides calling down to set param of ISCSI_PARAM_DISCOVERY_SESS
 from within the
 discovery code.
 
 Could you coach me if/how to use the kernel patch or whatever the
 fastest way to come up
 with the user space patch?
 

I am not completely sure what info you need so let me know if you need more.

- Take the needed include pieces and submit a patch against the
open-iscsi/include code. Do not worry about touching the
open-iscsi/kernel code. It is only for old kernels/distros and is not
updated.
- Modify the open-iscsi/usr initiator_common.c (I think that was the
file or was it the discovery.c one) related code as needed.
- Run against a kernel with your patches and Qlogic's patches for testing.

Does that answer your question?

-- 
You received this message because you are subscribed to the Google Groups 
open-iscsi group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to open-iscsi+unsubscr...@googlegroups.com.
To post to this group, send email to open-iscsi@googlegroups.com.
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Rescan LUNs hangs

2013-08-03 Thread Mike Christie
On 08/01/2013 01:43 PM, Tracy Reed wrote:
 On Wed, Jul 24, 2013 at 05:15:16PM PDT, Mike Christie spake thusly:
 Did you bring the target back up and if so did you do it with the same
 target name?
 
 Sorry for the delay in getting back, been travelling on business. But thanks
 very much for the reply!
 
 Yes, I did bring the target back up and with the same name. Although some of
 the LUNs have moved around as I rebuilt the machine to match its partner which
 the VMs RAID 1 it against.
 
 What is your replacement/recovery timeout setting in /etc/iscsi/iscsid.conf?
 
 Looks like 120 but just in case, here's the entire contents:
 
 iscsid.startup = /etc/rc.d/init.d/iscsid force-start
 node.startup = automatic
 node.leading_login = No
 node.session.timeo.replacement_timeout = 120
 node.conn[0].timeo.login_timeout = 15
 node.conn[0].timeo.logout_timeout = 15
 node.conn[0].timeo.noop_out_interval = 5
 node.conn[0].timeo.noop_out_timeout = 5
 node.session.err_timeo.abort_timeout = 15
 node.session.err_timeo.lu_reset_timeout = 30
 node.session.err_timeo.tgt_reset_timeout = 30
 node.session.initial_login_retry_max = 8
 node.session.cmds_max = 128
 node.session.queue_depth = 32
 node.session.xmit_thread_priority = -20
 node.session.iscsi.InitialR2T = No
 node.session.iscsi.ImmediateData = Yes
 node.session.iscsi.FirstBurstLength = 262144
 node.session.iscsi.MaxBurstLength = 16776192
 node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
 node.conn[0].iscsi.MaxXmitDataSegmentLength = 0
 discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 32768
 node.conn[0].iscsi.HeaderDigest = None
 node.session.nr_sessions = 1
 node.session.iscsi.FastAbort = Yes
 
 See anything amiss? I now have around 8 processes stuck on this system. I'm
 going to have to reboot it this weekend to clear up the issue but I would
 really like to find out what is really going on and how to avoid it before
 taking such measures.
 
 It sounds like the scsi scan IO is stuck on a target that disappeared
 and never came back, or it is a Centos scsi layer bug. Could you send
 the /var/log/messages.
 
 The entire file is rather large but here are some of the messages relevant to
 iscsi:
 
 Jul  4 15:18:44 cpu03 kernel: connection8:0: detected conn error (1020)
 Jul  4 15:18:45 cpu03 iscsid: Kernel reported iSCSI connection 8:0 error 
 (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
 Jul  4 15:18:46 cpu03 kernel: connection6:0: detected conn error (1020)
 Jul  4 15:18:46 cpu03 kernel: connection7:0: detected conn error (1020)
 Jul  4 15:18:47 cpu03 iscsid: Kernel reported iSCSI connection 6:0 error 
 (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
 Jul  4 15:18:47 cpu03 iscsid: Kernel reported iSCSI connection 7:0 error 
 (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
 Jul  4 15:18:47 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection 
 refused)
 Jul  4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection 
 refused)
 Jul  4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection 
 refused)
 Jul  4 15:18:51 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection 
 refused)
 Jul  4 15:19:27 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to 
 host)
 Jul  4 15:19:30 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to 
 host)
 Jul  4 15:19:30 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to 
 host)
 Jul  4 15:19:33 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to 
 host)
 Jul  4 15:19:36 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to 
 host)
 skip many of these no route to host messages, happened while I was 
 rebuilding the target with ip 10.0.1.11
 Jul  4 15:18:47 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection 
 refused)
 Jul  4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection 
 refused)
 Jul  4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection 
 refused)
 Jul  4 15:18:51 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection 
 refused)
 Jul  4 15:20:45 cpu03 kernel: session8: session recovery timed out after 120 
 secs
 Jul  4 15:20:45 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to 
 host)
 Jul  4 15:20:47 cpu03 kernel: session6: session recovery timed out after 120 
 secs
 Jul  4 15:20:47 cpu03 kernel: session7: session recovery timed out after 120 
 secs
 Jul  8 20:37:04 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection 
 refused)
 Jul  8 20:37:04 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection 
 refused)
 Jul  8 20:37:07 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection 
 refused)
 (skip lots of these connection refused messages)
 Jul 12 14:33:08 cpu03 kernel: connection8:0: detected conn error (1020)
 Jul 12 14:33:08 cpu03 kernel: connection6:0: detected conn error (1020)
 Jul 12 14:33:08 cpu03 kernel: connection7:0: detected conn error (1020)
 Jul 12 14:33:09 cpu03 iscsid: conn 0 login rejected: initiator error - 

Re: BUG: iscsid: Can not allocate memory for receive context.

2013-08-03 Thread Mike Christie
On 08/02/2013 02:59 PM, Julian Freed wrote:
 
 Thanks for the patch.
 After few tests I believe it solves the problem, after restoring  
 #define CONTEXT_POOL_MAX 32
 

Thanks for testing.

 Two more questions.
 
 I need to increase the lun_limit, currently it is limited to 511
 
 I thought I found the limit   in  kernel/iscsi_tcp.c  static
 unsigned int iscsi_max_lun = 512
 But changing this, that did not help.

Did you change the code, or did you use the modparam that the code is
there for? Which kernel is this with? There is a weird kink with older
kernels and that modparam: you had to set the param and then do the
login. You also cannot write to the sysfs file; you had to set it at
modprobe time.
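
To confirm which value the loaded module is actually using, the parameter
can be read back from sysfs. This is only a rough sketch: it assumes the
parameter is exposed under the name max_lun in
/sys/module/iscsi_tcp/parameters/, which may differ on your kernel
(`modinfo iscsi_tcp` lists the real names). It only reads the file, since
the value has to be set when the module is loaded.

```python
from pathlib import Path

# Assumed location. Module parameters normally appear under
# /sys/module/<module>/parameters/, but the name "max_lun" is an
# assumption; check `modinfo iscsi_tcp` on the kernel in question.
DEFAULT_PARAM = Path("/sys/module/iscsi_tcp/parameters/max_lun")


def read_module_param(path=DEFAULT_PARAM):
    """Return the parameter as an int, or None if the file does not exist."""
    try:
        return int(Path(path).read_text().strip())
    except FileNotFoundError:
        # Module not loaded, or the parameter is not exported on this kernel.
        return None


if __name__ == "__main__":
    value = read_module_param()
    if value is None:
        print("parameter not found; is iscsi_tcp loaded?")
    else:
        print(f"max_lun currently in effect: {value}")
```

If the number printed is still the old default after a login, the setting
most likely did not reach the module at load time.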

 The file in the kernel 3.8.4 seems to be unlimited static
 unsigned int iscsi_max_lun = ~0;
 By the way the file in the kernel is very different from open-iscsi-2.0-873
 

Yeah, the 512 limit in the old code was just an artificial limit. We used
to hit bugs with older targets that would not respond to inquiries and
report luns the way the scsi layer wanted, so the scsi layer would
drop down to sequential scans that took a long time. Those targets are
no longer really used, so the limit was removed in newer kernels.
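
As an aside on the `~0` initializer quoted above: in C, assigning `~0` to
an unsigned int sets every bit, so the variable holds the largest value
the type can represent, which in practice means no limit. A quick way to
see the number, sketched in Python with ctypes (the result depends on the
platform's unsigned int width, usually 32 bits):

```python
import ctypes


def all_ones_unsigned_int():
    """Value a C `unsigned int` holds after being assigned ~0."""
    # In Python ~0 is -1; ctypes does no overflow checking, so the value
    # wraps to "all bits set", the same way the C assignment does.
    return ctypes.c_uint(~0).value


if __name__ == "__main__":
    print(all_ones_unsigned_int())
```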



 
 Another question:
 Why the iscsi login is so slow.   It takes 40 seconds for 900 iscsi luns
 in 3 targets.
 

Is the iscsi login slow or the device setup and discovery? When you run
the iscsiadm login we do both. The iscsi login should be fast. Send the
/var/log/messages so I can check.

The device setup and discovery for 900 luns is probably slow. The scsi
layer sends lots of commands to find luns and set them up. In newer
kernels we do some of the setup in parallel so that should help if your
kernel has it on. It is still slow for 900 luns though.

It seems you are used to looking at the source so if interested see
sd_probe_async. The iscsi layer already does scsi scanning for each
target async so ignore the scsi async scanning stuff.
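
One rough way to see where the 40 seconds go is to compare syslog
timestamps: the time up to the end of the iscsi login versus the time
until the last disk shows up. The sketch below only does the timestamp
arithmetic. It assumes the classic "Mon DD HH:MM:SS" prefix in
/var/log/messages and an English locale for the month name. The two
sample lines are placeholders, not real iscsid or kernel messages, and the
year has to be supplied because syslog lines in this format do not carry
one.

```python
from datetime import datetime


def syslog_time(line, year):
    """Parse the 'Mon DD HH:MM:SS' prefix of a classic syslog line."""
    stamp = line.strip()[:15]
    # Prepend the year so the parsed date is unambiguous.
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def seconds_between(first_line, last_line, year=2013):
    """Elapsed seconds between two syslog lines from the same year."""
    delta = syslog_time(last_line, year) - syslog_time(first_line, year)
    return delta.total_seconds()


if __name__ == "__main__":
    # Placeholder lines: substitute the real first and last lines of
    # interest copied from /var/log/messages.
    start = "Aug  2 10:00:05 host iscsid: <placeholder: login finished>"
    end = "Aug  2 10:00:45 host kernel: <placeholder: last disk attached>"
    print(seconds_between(start, end))
```

If most of the gap falls after the login has completed, the time is being
spent in scsi device setup rather than in the iscsi login itself.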
