Michael Hunter wrote:
> On Thu, 24 Sep 2009 15:41:25 -0400
> "Anurag S. Maskey" <Anurag.Maskey at Sun.COM> wrote:
>
>   
>> Michael Hunter wrote:
>>     
>>>>> That still doesn't answer how you know what is in the event stream that
>>>>> hasn't been processed that doesn't take the ncu down.  You are
>>>>> comparing information from the state now to determine how to process an
>>>>> event that will be operating on a possibly different state.
>>>>>   
>>>>>       
>>>>>           
>>>> Isn't this true for all calls to nwamd_object_set_state() and 
>>>> nwamd_object_set_state_timed()? None of these calls care about 
>>>> what's already in the event queue. Some of the abnormalities that we 
>>>> have seen in the past are related to this. We transition to a state 
>>>> without actually caring what state we are currently in.
>>>>     
>>>>         
>>> s/in/in or will be in/
>>>
>>> Correct.  That is bad.  By mixing things we do immediately with things
>>> that we queue we act on things without having processed all the
>>> information.
>>>
>>> The problem here is that we could have an ncu down in the event queue
>>> to be processed which would mean that we really want to transition to
>>> off*/TOd.  Your change will break in that case.
>>>   
>>>       
>> I am not introducing the brokenness. It already existed. In fact, my 
>> change slightly reduces it, because if the ncu is online and there is an 
>> offline*/TOd in the event queue, the state is not changed (before my 
>> fix, the state was being changed anyway).
>>     
>
> Whether you slightly fix it or just move it around is debatable.  But
> neither is acceptable.
>
>   
>> You are leading me to think that all our state machines are broken 
>> because we never care about what is in the queue. Every 
>> nwamd_object_set_state() is questionable because by the time that 
>> particular state event is processed, the world may be different than 
>> when the state event was created.
>>     
>
> I was thinking about this some more.  nwamd_object_set_state()
> shouldn't be an event.  It should act directly on the object.  The
> things that are events should be external state changes we receive and
> keep ordered.  Decisions we make based on those events should take
> effect immediately.
>
>   
>>> Well, in the end it is all an FSM ;)  Choosing to be more formal about
>>> our core logic because it is complex in a structured kind of way would
>>> help.
>>>
>>> FWIW I don't know how to fix the bug in your code off the top of my
>>> head.  You might be able to argue it will never happen or you can
>>> detect when it does/will happen.  Or you might be able to stash some
>>> state away that you check in other places.  The first seems unlikely.
>>> The second seems hard to get right and ultimately leads us to belts and
>>> braces types of complexity.
>>>   
>>>       
>> "It will never happen" is not the case, because no one knows what will 
>> happen to the links. Stashing state may work, but it will require every 
>> state transition to check the stashed state, in which case we can write 
>> what I call the "correct" solution.
>>     
>
> We need to make the correct decision based on the information we've
> received so far and effect changes based on that information.
>
>   
>> I think the solution here is for the state event handler (i.e., all 
>> nwamd_*_handle_state_event() functions) to make sure that the new state 
>> and aux state are reachable from the current state and aux state 
>> according to the state transitions that are possible (we'll have to 
>> create a complete state diagram to achieve this). The combination of 
>> state and aux state increases the number of checks, but there's no way 
>> around it. I don't know how feasible this change is at this point.
>>     
>
> I'm not sure I quite follow.
>
>   
Stepping back a bit, I think our big concern
is ending up in the wrong state: we're processing
event A, and as a consequence move into
state X, but meanwhile sitting in the event
queue is a state change event moving us
into state Y. Have I got this right?
Could someone provide an example of the sort
of problem that can occur? I'm a bit confused,
I'm afraid, and a concrete example might help.
Thanks!

Alan
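
For illustration only, here is a minimal sketch of the kind of race being
discussed. It is a toy Python model, not nwamd code: the event names
("addr_acquired", "link_down", "set_state"), the simplified state labels and
the ALLOWED transition table are all assumptions made up for this example.
A handler decides on a state change using the state as it is now, the change
is queued behind a link-down event that was already waiting, and it is then
applied to a state it was never meant for. The optional check is one possible
reading of the "reachable from the current state" idea above.

```python
from collections import deque

# Hypothetical transition table: current state -> states reachable from it.
# The real nwamd state/aux-state diagram is not reproduced here.
ALLOWED = {
    "offline*": {"online", "offline"},
    "online": {"offline"},
    "offline": {"offline*"},
}


def simulate(check_reachable):
    """Run a made-up event sequence and return the final object state."""
    state = "offline*"
    # "addr_acquired" is at the head; "link_down" is already queued behind it.
    queue = deque([("addr_acquired", None), ("link_down", None)])

    while queue:
        kind, target = queue.popleft()
        if kind == "addr_acquired":
            # Decision made against the state as it is right now ...
            if state == "offline*":
                # ... but the state change is queued, behind link_down.
                queue.append(("set_state", "online"))
        elif kind == "link_down":
            # External event: the link went away.
            state = "offline"
        elif kind == "set_state":
            # Optional guard: drop the change if the target is no longer
            # reachable from whatever state we are in by now.
            if check_reachable and target not in ALLOWED[state]:
                continue
            state = target
    return state


print(simulate(check_reachable=False))  # stale state change is applied
print(simulate(check_reachable=True))   # stale state change is dropped
```

Without the check, the object ends up "online" even though the last external
event said the link went down; with it, the stale change is discarded. Making
the state change take effect immediately, rather than queueing it, would be
another way to avoid the stale decision in this toy model.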
