Anurag S. Maskey wrote:
>
>> Stepping back a bit, I think our big concern
>> is ending up in the wrong state - we're processing
>> event A, and as a consequence move into
>> state X, but meanwhile sitting in the event
>> queue is a state change event moving us
>> into state Y. Have I got this right?
> Yep, that's exactly right. Furthermore, state X may not be reachable
> from state Y, which I think is the main problem.
>
>> Could someone provide an example of the sort
>> of problem that can occur - I'm a bit confused
>> I'm afraid, and a concrete example might help.
>> Thanks!
> 11103 is sort of an example of this. The ncu is already online. When
> dhcp times out, we originally moved to offline*. My fix checks to
> make sure the ncu is not already online. If it is online, don't
> create the state change event, otherwise create the event that changes
> to offline*.
>
> This fix is still not complete. The ncu could be in offline* state
> and the online state event could be in the queue. In this case, the
> timed out event is still enqueued. The NCU changes to online and then
> offline* state.
>

Okay, got it, thanks! So in this case, a solution might be (and I think this is what you were suggesting as a more long-term solution) to validate the state change when we dequeue and consume the state change event (rather than use the current state to determine whether we enqueue the state change event). So in this case specifically, we'd enqueue the "offline*/dhcp timed out" state change unconditionally, but reject it in nwamd_ncu_handle_state_event() if we are already online as an invalid state transition.
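
To make the dequeue-time validation idea concrete, here is a rough sketch. It is not the nwamd code: the state and reason enums, state_event_t, transition_is_valid() and handle_state_event() are all hypothetical names made up for illustration, and the only rule encoded is the 11103 case discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified states; not nwamd's real definitions. */
typedef enum {
	NCU_STATE_OFFLINE,
	NCU_STATE_OFFLINE_TRANSIENT,	/* stands in for "offline*" */
	NCU_STATE_ONLINE
} ncu_state_t;

/* Hypothetical reasons attached to a state change event. */
typedef enum {
	REASON_DHCP_TIMED_OUT,
	REASON_IF_UP
} ncu_reason_t;

typedef struct {
	ncu_state_t	target;	/* state the event asks us to move to */
	ncu_reason_t	reason;	/* why the event was enqueued */
} state_event_t;

/*
 * Decide at consume time, against the state we are in right now,
 * whether the requested transition still makes sense.
 */
static bool
transition_is_valid(ncu_state_t current, const state_event_t *ev)
{
	assert(ev != NULL);

	/* A DHCP timeout must not pull an online NCU back to offline*. */
	if (current == NCU_STATE_ONLINE &&
	    ev->target == NCU_STATE_OFFLINE_TRANSIENT &&
	    ev->reason == REASON_DHCP_TIMED_OUT)
		return (false);

	return (true);
}

/* Apply the event if valid; returns true if the state was changed. */
static bool
handle_state_event(ncu_state_t *current, const state_event_t *ev)
{
	if (!transition_is_valid(*current, ev))
		return (false);	/* stale event: drop it */

	*current = ev->target;
	return (true);
}
```

With this shape, the enqueue side stays unconditional, and the "online, then DHCP timed out" ordering leaves the NCU online because the second event is dropped when it is consumed.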
I think that would work in this case - what worries me though are situations where we enqueue state change X if in state A, or state change Y if in state B. In other words, the desired state in the state change event we enqueue depends on the state we are in at enqueue time. I think that type of scenario is more complex, and there are probably a few examples of this sort of thing scattered around the nwamd code.

Michael's suggestion to make state changes immediate might help localize state change processing in time a bit more closely. In practical terms, this could be implemented by designating state change events as prioritized - they are still queued in order relative to one another, but ahead of other types of event.

I can see arguments for doing both, and I don't think the two approaches are mutually exclusive. To help decide which to use where, it would be good to come up with a rough rule of thumb for identifying potentially problematic state changes like the one above. I've tried to look at all the cases where we call nwamd_object_set_state() (there are 50 or so) but I keep getting bogged down in the details.

What do you think?

Alan
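
For reference, the prioritized state change events idea above could look roughly like the following. This is only a sketch under my own assumptions: event_t, event_queue_t, queue_enqueue() and queue_dequeue() are hypothetical and are not the nwamd event queue; locking and memory management are left out.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { EVENT_OTHER, EVENT_STATE_CHANGE } event_type_t;

typedef struct event {
	event_type_t	type;
	int		id;	/* illustrative payload only */
	struct event	*next;
} event_t;

typedef struct {
	event_t	*head;
	event_t	*last_state;	/* last queued state change event, or NULL */
	event_t	*tail;
} event_queue_t;

/*
 * State change events go after any already-queued state change
 * events but ahead of everything else; other events go to the tail.
 */
static void
queue_enqueue(event_queue_t *q, event_t *ev)
{
	assert(q != NULL && ev != NULL);
	ev->next = NULL;

	if (ev->type == EVENT_STATE_CHANGE) {
		if (q->last_state == NULL) {
			/* First state change event: goes to the front. */
			ev->next = q->head;
			q->head = ev;
		} else {
			/* Keep FIFO order among state change events. */
			ev->next = q->last_state->next;
			q->last_state->next = ev;
		}
		q->last_state = ev;
		if (ev->next == NULL)
			q->tail = ev;
	} else {
		if (q->tail == NULL)
			q->head = ev;
		else
			q->tail->next = ev;
		q->tail = ev;
	}
}

/* Remove and return the event at the head, or NULL if empty. */
static event_t *
queue_dequeue(event_queue_t *q)
{
	event_t *ev = q->head;

	if (ev == NULL)
		return (NULL);
	q->head = ev->next;
	if (q->last_state == ev)
		q->last_state = NULL;
	if (q->head == NULL)
		q->tail = NULL;
	ev->next = NULL;
	return (ev);
}
```

Note that this only narrows the window between enqueueing and consuming a state change; it does not by itself validate the transition, which is why the two approaches could be combined.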
