Seems reasonable. BTW, we've talked about this already, but since the
msgpairarray state machine is the current topic, I'll reiterate some of
my ideas. It's written in such a way that at present it can't be used by
sys-io.sm. The problem is that it blocks (doesn't complete) waiting
for a response from all of the servers in the array. In the case of
sys-io, we want to fire off the flows to the servers once we get
responses from them. With the new concurrent state machine code in
place, I could imagine the msgpairarray being a set of concurrent
nested state machines, that in the normal case all just wait for
completion before returning to the parent. In the case of sys-io
though, it seems like we could leverage that by allowing groups of
concurrent machines to be chained together, and instead of waiting for
all of the concurrent machines to finish (each msgpairarray), allow the
completion of one to be the beginning of another nested machine in a
different grouping (in the io case, start_flow). It's a bit hand-wavy
on the details, but the idea is that forked concurrent state machines
could complete without joining; instead they just go off to the next
state. The join would happen explicitly with some kind of syntax in
the state machine definition (join?).
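
To make the fork-without-join idea concrete, here is a rough C sketch. It is
not PVFS2 code and not the .sm syntax: the struct, the stage names, and the
tick-based driver are all made up for illustration. Each forked machine moves
from waiting on its response straight into its flow without waiting for its
siblings, and the only join is the explicit one in run_until_join().

```c
#include <assert.h>

/* Hypothetical stages for one forked machine (not real PVFS2 states). */
enum stage { WAIT_RESPONSE, IN_FLOW, FINISHED };

struct forked_sm {
    enum stage stage;
    int response_delay;   /* made-up: ticks until the server's response arrives */
    int flow_length;      /* made-up: ticks the flow takes once started */
    int flow_started_at;  /* tick at which the flow began, -1 if not started */
};

/* Advance one machine by one tick. Returns 1 on the tick it finishes.
 * Note that nothing here looks at the sibling machines. */
static int forked_tick(struct forked_sm *m, int now)
{
    switch (m->stage) {
    case WAIT_RESPONSE:
        if (--m->response_delay <= 0) {
            m->stage = IN_FLOW;          /* chain directly into the next machine */
            m->flow_started_at = now;
        }
        return 0;
    case IN_FLOW:
        if (--m->flow_length <= 0) {
            m->stage = FINISHED;
            return 1;
        }
        return 0;
    default:
        return 0;
    }
}

/* The explicit join: loop until every forked machine has finished.
 * Returns the tick on which the last one completed. */
static int run_until_join(struct forked_sm *m, int n)
{
    int pending = n;
    int now = 0;

    assert(n >= 0);
    while (pending > 0) {
        now++;
        for (int i = 0; i < n; i++)
            pending -= forked_tick(&m[i], now);
    }
    return now;
}
```

With three machines whose responses arrive at different times, the fast one
can finish its whole flow before the slow one has even heard back, which is
the behavior sys-io wants.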
I like this general idea, but for my 2c I would probably structure
things a little differently. I don't think the msgpairarray should be
augmented much further, since it is already getting a little complicated.
As it stands it handles all of the other types of operations
(everything other than sys-io) pretty well. If anything it would be
nice to find ways to make it simpler in the long run.
As far as sys-io goes, I could see it consisting mainly of a state
machine that handles I/O to a single server with these basic steps (plus
whatever helper logic states are needed):
1) setup
2) post ack
3) post req
4) post flow
5) write ack
6) retry if needed from any of the above
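
As a rough illustration of the steps above, here is a minimal C sketch of a
single-server machine. Everything in it is hypothetical: the enum names, the
struct, and the do_step() stub are invented for the example and are not
PVFS2 APIs. It also assumes that a retry restarts from setup, which is only
one of the possible policies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states mirroring the listed steps, in order. */
enum io_state {
    IO_SETUP, IO_POST_ACK, IO_POST_REQ, IO_POST_FLOW, IO_WRITE_ACK,
    IO_DONE, IO_FAILED
};

struct server_io {
    enum io_state state;
    int retries_left;       /* retry budget for this one server */
    int fail_budget;        /* test hook: number of failures left to inject */
    enum io_state fail_at;  /* test hook: step at which failures are injected */
};

/* Stand-in for the real work of a step. Returns 0 on success, -1 on error. */
static int do_step(struct server_io *io)
{
    if (io->state == io->fail_at && io->fail_budget > 0) {
        io->fail_budget--;
        return -1;
    }
    return 0;
}

/* Run the current step and pick the next state. A failure in any step
 * goes back to setup while retries remain (an assumed policy). */
static void io_step(struct server_io *io)
{
    if (io->state == IO_DONE || io->state == IO_FAILED)
        return;
    if (do_step(io) != 0) {
        if (io->retries_left > 0) {
            io->retries_left--;
            io->state = IO_SETUP;
        } else {
            io->state = IO_FAILED;
        }
        return;
    }
    io->state = (enum io_state)(io->state + 1);  /* next step in the list */
}

/* Drive one server's machine to completion. */
static enum io_state io_run(struct server_io *io)
{
    assert(io != NULL);
    while (io->state != IO_DONE && io->state != IO_FAILED)
        io_step(io);
    return io->state;
}
```

Note there is exactly one server, one error code, and one retry counter in
the struct; none of the states has to reason about arrays.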
You could then use the concurrent sm infrastructure to start N of these,
one for each server. I'm not sure exactly what Walt's model is, but I
imagine this would mean having a parent state machine with 2 states: one
to decide which servers to use and launch the N child sms, and one to
collect the results. The N copies would not coordinate with each other
at all until they complete.
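
Here is a small C sketch of what I mean by the two-state parent. I don't know
what the real concurrent sm interface looks like, so the launch/collect
functions, the round-robin driver, and the dummy child (which just counts
down and reports a canned result) are all assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SERVERS 8

/* Hypothetical child: finishes after a number of ticks with a fixed result. */
struct child_sm {
    int steps_left;
    int result;   /* 0 on success, negative error code otherwise */
    int done;
};

enum parent_state { PARENT_LAUNCH, PARENT_COLLECT, PARENT_DONE };

struct parent_sm {
    enum parent_state state;
    struct child_sm children[MAX_SERVERS];
    int nchildren;
    int first_error;
};

static void child_tick(struct child_sm *c)
{
    if (!c->done && --c->steps_left <= 0)
        c->done = 1;
}

/* Parent state 1: decide which servers to use and launch one child each. */
static void parent_launch(struct parent_sm *p, const int *steps,
                          const int *results, int n)
{
    assert(p != NULL && n >= 0 && n <= MAX_SERVERS);
    p->nchildren = n;
    p->first_error = 0;
    for (int i = 0; i < n; i++) {
        p->children[i].steps_left = steps[i];
        p->children[i].result = results[i];
        p->children[i].done = 0;
    }
    p->state = PARENT_COLLECT;
}

/* Stand-in for the concurrent infrastructure: tick every child round-robin
 * until all are done. The children never look at each other. */
static void run_children(struct parent_sm *p)
{
    int remaining = p->nchildren;
    while (remaining > 0) {
        remaining = 0;
        for (int i = 0; i < p->nchildren; i++) {
            child_tick(&p->children[i]);
            if (!p->children[i].done)
                remaining++;
        }
    }
}

/* Parent state 2: collect results; report the first error by server index. */
static void parent_collect(struct parent_sm *p)
{
    for (int i = 0; i < p->nchildren; i++) {
        if (p->children[i].result != 0 && p->first_error == 0)
            p->first_error = p->children[i].result;
    }
    p->state = PARENT_DONE;
}

static int parent_run(struct parent_sm *p, const int *steps,
                      const int *results, int n)
{
    p->state = PARENT_LAUNCH;
    parent_launch(p, steps, results, n);
    run_children(p);
    parent_collect(p);
    return p->first_error;
}
```

The only place that knows there are N servers is the parent, and all it does
with that knowledge is launch and collect.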
These child state machines would be much easier to debug than our
current scheme because they wouldn't need any logic in them to deal with
multiple servers, arrays of error codes, some servers going faster than
others, etc.
I know this duplicates a little of what msgpairarray does, but I think
it is sufficiently different to warrant just doing something custom for
sys-io so the msgpairarray doesn't get too elaborate.
-Phil