On 12/12/2014 04:11 PM, Lee Duncan wrote:
> On Dec 12, 2014, at 12:08 PM, Mike Christie <[email protected]> wrote:
>>
>> In that other thread, I said I wanted to find out why we are
>> adding/removing sessions. That should not be happening and if you
>> figured that out, I was saying that I thought you would not hit the
>> sysfs scanning issue. Also I said in that other thread I found that bug
>> with the discoveryd daemon caching when it should not so that also
>> causes a bug that might be affecting your testing.
>>
>
> You have valid points, of course, but let's discuss that issue on that thread.
>
>>
>>>>
>>>> Please just fix this properly. It is really silly that the user would
>>>> have to run this command twice to actually do discovery.
>>>>
>>>
>>> Ouch. Okay, interesting approach. I guess I stand rebuffed.
>>>
>>> "Fixing this" would really mean getting rid of iscsiuio and having BRCM
>>> devices use the network stack, wouldn't it? I'm not sure where you draw the
>>> line between cosmetic hacks and fixing it "properly".
>>
>> Upstream already rejected the net stack approach so that is not an option.
>
> I understood as much from Eddie. I shouldn't have suggested it, as I know
> it's not an option. I was trying (poorly) to make a point about how slippery
> the "right thing" slope can be. I withdraw the point.
>
>>
>>>
>>> Why do you think the user has to "run this command twice"? That is
>>> incorrect. The iscsiadm command retries the communication 10 times, I
>>> believe, or until iscsiuio replies. Do you consider that silly? Do you mean
>>> that the iscsiadm code should not retry?
>>>
>>
>> If the user/caller has retries off then they would have to retry. In the
>> command output you put in the mail, it was not clear if you allowed
>> retries or not.
>>
>> iscsid and/or iscsiadm should not have to retry. It seems like a bogus
>> design that it does not work the first time. That is what should be fixed.
>
> I am surprised to hear you say that. That seems a little like saying device
> drivers shouldn't have to retry I/O commands: they should just work the first
> time.
>
That is not what I am saying at all. Above, I am saying that if some
code knows how to execute the operation and can, then I think it should
and should not rely on every caller to know what is right and drive retries.
We have retry code in iscsid. This is for cases like where the
driver/net init might race with iscsi init or for cases where the
network is down. To balance the needs of different callers, we want to
be able to allow iscsid to retry but we do not want to indefinitely
block iscsiadm calls that might be probing or during init time.
> The code in open-iscsi that does this retry loop is in uip_broadcast() in
> iscsid_req.c:
>
> ...
> +#define MAX_UIP_BROADCAST_READ_TRIES 3
> + for (count = 0; count < MAX_UIP_BROADCAST_READ_TRIES; count++) {
> + /* Wait for the response */
> + err = read(fd, &rsp, sizeof(rsp));
> + if (err == sizeof(rsp)) {
> + log_debug(3, "Broadcasted to uIP with length: %ld "
> + "cmd: 0x%x rsp: 0x%x\n", buf_len,
> + rsp.command, rsp.err);
> + err = 0;
> + break;
> + } else if ((err == -1) && (errno == EAGAIN)) {
> + usleep(250000);
> + continue;
> + } else {
> + log_error("Could not read response (%d/%d), daemon "
> + "died?", err, errno);
> + err = ISCSI_ERR;
> + break;
> + }
> + }
> ...
>
> This seems to have been added last year when Eddie sent you a patch (see
> commit d571cdfc0b3c539310f43dc11d54d9308299efdc). That commit was quite
> extensive, introducing the EAGAIN error to iscsi processing.
>
This is for when iscsiuio and iscsid init race. To try and detect that
race vs cases where iscsiio never comes we have limited retries there,
so we retry but do not block forever.
I think the code you are interested in is a little lower in that code
(the ISCSID_UIP_MGMT_IPC_DEVICE_UP chunk) in session_conn_uio_poll ->
iscsi_set_net_config. Are you saying below that the retries you have
been talking about are the same ones that for iscsid
session_conn_uio_poll handles? If so I misunderstood you the first time.
Your original program description made me think you were talking about a
different issue.
> And it looks like there are only 3 retries in this case, not 10 as I guessed
> before. And only in the case of EAGAIN.
>
> If you look at iscsiuio (as I have now for a few days), it's clear that it
> was designed as a minimalist state machine. The choice of uIP for the network
> stack reenforces that impression. The design is all about assuming nothing
> comes from the operating system, and is often used in firmware.
>
> Because of this, I hesitate to modify or replace this state machine, no
> matter how much fun or how pure such an exercise might be.
>
> Instead, I believe the current design works just fine, given all of the
> design considerations (of which there are too many to list here). I therefore
> still think a simple tweak to this current design is all that is needed. The
> current design either works well enough or isn't used, since few bug reports
> seem to have surfaced.
>
> I believe the best approach here is one of two:
>
> - simplest: just allow the "EAGAIN" to percolate up so the error message is
> more correct, or
> - better: change the code to not print an error message when EAGAIN is
> received and we still have retries left
>
> I know you think there is a 3rd approach, but I see problems with "fixing"
> this, and here's why ...
> The problem: iscsiuio needs to initialize an ethernet NIC/adapter before it
> can use it for bnx2i traffic (e.g. set up a VLAN, etc), but it does not know
> which adapter might be used until it gets a message from iscsid saying it has
> a new host connection. So there is no way for iscsiuio to prepare an adapter
> until it gets a request. And preparing an adapter takes a non-trivial amount
> of time. This is the same problems device drivers have with disc drives: the
> response will take too long to wait for.
>
> One possible "fix" would be to have iscsiuio actually wait to respond to its
> first request until the adapter is prepared and UP, but that would be no
> better than having the caller retry: it still blocks success and makes the
> end user wait until it is complete.
>
> Perhaps a cleaner fix would be to have iscsiuio accept the request to handle
> a new host by detaching and then replying later, asynchronously, when the NIC
> is set up and the host is registered. Even though this seems to me to be a
> cleaner design, the problem is that the poor end user that has typed
> "iscsiadm -m discovery -t st -I <bnx2i-interface> -t <some-target>" still has
> to wait! The are still going to have to wait, at least the first time, for
> the NIC to be set up. And discovery kind of has to be synchronous based on
> the current design.
>
> The only way I can see that might fix this iscsiuio delay-on-first-use
> problem is to have iscsiuio aggressively configure any and all bnx2i
> interfaces it sees when it first starts up. Or perhaps it should have a
> configuration file that says "I'm going to use eth0, so get it ready"? I'm
> not sure this approach would be considered cleaner, though.
--
You received this message because you are subscribed to the Google Groups
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.