Re: Recovering initiator connection when target (IET) is restarted - root over iscsi

2017-08-16 Thread The Lee-Man
On Monday, August 7, 2017 at 2:35:26 PM UTC-7, Bob Marris wrote:
>
> I'm running my initiator's root partition via open-iscsi to an IET target.
>
> I'm testing failure scenarios, and in the event of a target reboot (or 
> daemon restart) my initiator seems to hang.  I was hoping it would be able 
> to recover and just carry on running once the target came back. Since this 
> is my root partition I can't see any logs once it hangs
>
> I've increased node.session.timeo.replacement_timeout.  
>
> I've also added "discovery.sendtargets.use_discoveryd = Yes"  in  
> /etc/iscsi/send_targets/192.168.1.5,3260/st_config, thinking that it 
> would allow the initiator to rediscover/login to the target when it comes 
> back up.
>
I do not believe having your target rediscovered is the right approach, and 
it is likely just mucking things up. I suggest you reset this back to the 
default until you figure out the issue.
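
For example, the discovery record could be put back with something like 
the sketch below. It assumes the portal from your post (192.168.1.5:3260) 
and that "No" is the default for use_discoveryd on your version, so check 
that first. The run helper is made up for this example: it only prints the 
command unless RUN=1 is set.

```shell
# Assumption: portal address and port as given in the original post.
PORTAL="192.168.1.5:3260"

# Hypothetical helper: print the command by default, execute it when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Set use_discoveryd back to "No" in the sendtargets discovery record.
reset_discoveryd() {
    run iscsiadm -m discoverydb -t sendtargets -p "$PORTAL" \
        -o update -n discovery.sendtargets.use_discoveryd -v No
}

reset_discoveryd
```

Run it once as-is to see the command, then again as root with RUN=1 if it 
looks right for your setup.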
 

> Is it my initiator configuration or somehow a failing of IET at the target 
> side?
>
>
> My settings at the initiator end are:
>
> pi@homeauto:/etc/iscsi $ sudo iscsiadm -m node -T 
> iqn.2017-07.eu.bobta:domoticz -p 192.168.1.5
> # BEGIN RECORD 2.0-873
> node.name = iqn.2017-07.eu.bobta:domoticz
> node.tpgt = 1
> node.startup = automatic
> node.leading_login = No
> iface.hwaddress = 
> iface.ipaddress = 
> iface.iscsi_ifacename = default
> iface.net_ifacename = 
> iface.transport_name = tcp
> iface.initiatorname = 
> iface.bootproto = 
> iface.subnet_mask = 
> iface.gateway = 
> iface.ipv6_autocfg = 
> iface.linklocal_autocfg = 
> iface.router_autocfg = 
> iface.ipv6_linklocal = 
> iface.ipv6_router = 
> iface.state = 
> iface.vlan_id = 0
> iface.vlan_priority = 0
> iface.vlan_state = 
> iface.iface_num = 0
> iface.mtu = 0
> iface.port = 0
> node.discovery_address = 192.168.1.5
> node.discovery_port = 3260
> node.discovery_type = send_targets
> node.session.initial_cmdsn = 0
> node.session.initial_login_retry_max = 8
> node.session.xmit_thread_priority = -20
> node.session.cmds_max = 128
> node.session.queue_depth = 32
> node.session.nr_sessions = 1
> node.session.auth.authmethod = CHAP
> node.session.auth.username = bob
> node.session.auth.password = 
> node.session.auth.username_in = 
> node.session.auth.password_in = 
> node.session.timeo.replacement_timeout = 1200
> node.session.err_timeo.abort_timeout = 15
> node.session.err_timeo.lu_reset_timeout = 30
> node.session.err_timeo.tgt_reset_timeout = 30
> node.session.err_timeo.host_reset_timeout = 60
> node.session.iscsi.FastAbort = Yes
> node.session.iscsi.InitialR2T = No
> node.session.iscsi.ImmediateData = Yes
> node.session.iscsi.FirstBurstLength = 262144
> node.session.iscsi.MaxBurstLength = 16776192
> node.session.iscsi.DefaultTime2Retain = 0
> node.session.iscsi.DefaultTime2Wait = 2
> node.session.iscsi.MaxConnections = 1
> node.session.iscsi.MaxOutstandingR2T = 1
> node.session.iscsi.ERL = 0
> node.conn[0].address = 192.168.1.5
> node.conn[0].port = 3260
> node.conn[0].startup = automatic
> node.conn[0].tcp.window_size = 524288
> node.conn[0].tcp.type_of_service = 0
> node.conn[0].timeo.logout_timeout = 15
> node.conn[0].timeo.login_timeout = 15
> node.conn[0].timeo.auth_timeout = 45
> node.conn[0].timeo.noop_out_interval = 0
> node.conn[0].timeo.noop_out_timeout = 0
> node.conn[0].iscsi.MaxXmitDataSegmentLength = 0
> node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
> node.conn[0].iscsi.HeaderDigest = None
> node.conn[0].iscsi.DataDigest = None
> node.conn[0].iscsi.IFMarker = No
> node.conn[0].iscsi.OFMarker = No
> # END RECORD
>
> Any help would be very much appreciated!
>

It looks like you are using CHAP? That shouldn't be an issue, but to 
simplify things you might want to remove that.

I tested using IET as a target on SLE 12 SP1 and the open-iscsi initiator on 
SLE 12 SP2. I ran a "dd" to read all the blocks on the iSCSI target from 
the initiator, as well as running a "mkfs.ext3". Then I restarted the IET 
daemon. Both I/O streams paused, then continued. In other words, I cannot 
reproduce your issue.
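
Roughly, the read half of that test can be sketched as below. The function 
name is made up for this example and /dev/sdX is a placeholder for whatever 
disk the iSCSI session created on the initiator.

```shell
# Read every block of a device (or file) and throw the data away.
# The block size is given in bytes (1 MiB) so it works across dd variants.
read_all_blocks() {
    dd if="$1" of=/dev/null bs=1048576
}

# Placeholder device name; substitute the real iSCSI disk before running.
# read_all_blocks /dev/sdX
```

Start the read, restart the target daemon from another shell, and watch 
whether dd stalls and then runs to completion.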

Can you do some testing? When the system hangs, is it the initiator or the 
target? You should be able to log into the target from another (different) 
initiator to see if the target's still working.
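
From the second initiator, the check might look like the sketch below. It 
assumes the portal and target name from your post; the run helper is made 
up for this example and only prints the commands unless RUN=1 is set. If 
CHAP is still enabled, the second initiator needs the same credentials, and 
the disk should not be mounted there while the first initiator is using it.

```shell
# Assumptions: portal and IQN copied from the original post.
PORTAL="192.168.1.5"
TARGET="iqn.2017-07.eu.bobta:domoticz"

# Hypothetical helper: print the command by default, execute it when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Discover the target, log in, then list the resulting session.
check_target() {
    run iscsiadm -m discovery -t sendtargets -p "$PORTAL" &&
    run iscsiadm -m node -T "$TARGET" -p "$PORTAL" --login &&
    run iscsiadm -m session -P 1
}

check_target
```

If discovery and login both succeed while the first initiator is hung, the 
target is most likely fine and the problem is on the initiator side.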

You can also enable debugging on the target and the initiator to see what's 
happening when your hang occurs.
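
On the initiator side, one way to do this is to run iscsid in the 
foreground with debugging turned up, as sketched below. This assumes the 
regular iscsid service has been stopped first, which is only sensible when 
root is not on iSCSI. The run helper is made up for this example and only 
prints the command unless RUN=1 is set. The target daemon has its own debug 
options, which are not shown here.

```shell
# Hypothetical helper: print the command by default, execute it when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# -f keeps iscsid in the foreground; -d 8 is its most verbose debug level.
debug_initiator() {
    run iscsid -f -d 8
}

debug_initiator
```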

I would suggest testing with your iSCSI target as a *non-root* disk. If 
it's your root disk and you have issues, you can't debug them very easily.
-- 
Lee Duncan 

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to open-iscsi+unsubscr...@googlegroups.com.
To post to this group, send email to open-iscsi@googlegroups.com.
Visit this group at https://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.


Re: [RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-16 Thread Christoph Hellwig
On Mon, Aug 14, 2017 at 06:32:17PM +0200, Benjamin Block wrote:
> > -   blk_end_request_all(rq, BLK_STS_OK);
> > -
> > put_device(job->dev);   /* release reference for the request */
> >
> > kfree(job->request_payload.sg_list);
> > kfree(job->reply_payload.sg_list);
> > -   kfree(job);
> > +   blk_end_request_all(rq, BLK_STS_OK);
> 
> What is the reason for moving that last line? Just wondering whether
> that might change the behavior somehow, although it doesn't look like it
> from the code.

The job is now allocated as part of the request, so we must free
it last.  The only change in behavior is that the reference gets dropped
a bit earlier, but once ownership is handed to the block layer
it's no longer needed, and neither are the memory allocations for the
S/G lists.

> > +{
> > +   struct bsg_job *job = blk_mq_rq_to_pdu(req);
> > +
> > +   memset(job, 0, sizeof(*job));
> > +   job->req = req;
> > +   job->request = job->sreq.cmd;
> 
> That doesn't work with bsg.c if the request submitted by the user is
> bigger than BLK_MAX_CDB. There is code in blk_fill_sgv4_hdr_rq() that
> will reassign the req->cmd point in that case to something else.
> 
> This is maybe wrong in the same vein as my Patch 1 is. If not for the
> legacy code in bsg.c, setting this here, will miss changes to that
> pointer between request-allocation and job-submission.
> 
> So maybe just move this to bsg_create_job().

Yes, this should be in  indeed.

> 
> > +   job->dd_data = job + 1;
> > +   job->reply = job->sreq.sense = kzalloc(job->reply_len, gfp);
> 
> job->reply_len will be 0 here, won't it? You probably have to pull the
> "job->reply_len = SCSI_SENSE_BUFFERSIZE" here from bsg_create_job().

True.
