Seems reasonable. BTW, we've talked about this already, but since the msgpairarray state machine is the current topic, I'll reiterate some of my ideas. It's written in such a way that at present it can't be used by sys-io.sm. The problem is that it blocks (doesn't complete) waiting for a response from all of the servers in the array. In the case of sys-io, we want to fire off the flow to each server as soon as we get the response from that server. With the new concurrent state machine code in place, I could imagine the msgpairarray being a set of concurrent nested state machines that, in the normal case, all just wait for completion before returning to the parent. In the case of sys-io, though, it seems like we could leverage that by allowing groups of concurrent machines to be chained together: instead of waiting for all of the concurrent machines (each msgpair) to finish, allow the completion of one to be the beginning of another nested machine in a different grouping (in the io case, start_flow). It's a bit hand-wavy on the details, but the idea is that forked concurrent state machines could complete without joining and instead just go on to the next state. The join would happen explicitly with some kind of syntax in the state machine definition (join?).
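To make the "complete without joining" idea a bit more concrete, here is a rough sketch in Python's asyncio rather than in the state machine language. Everything in it is an assumption for illustration: msgpair, start_flow, per_server, sys_io and the delay values are made-up stand-ins, not PVFS2 functions. Each server's flow starts as soon as that server's response arrives, and the only join is the explicit gather at the end.

```python
import asyncio

async def msgpair(server, delay, log):
    # Stand-in for the request/response exchange with one server.
    await asyncio.sleep(delay)
    log.append(("response", server))

async def start_flow(server, log):
    # Stand-in for running the flow to one server.
    await asyncio.sleep(0)
    log.append(("flow done", server))

async def per_server(server, delay, log):
    # Chained machines: the flow begins as soon as THIS server responds,
    # without waiting for any other server.
    await msgpair(server, delay, log)
    await start_flow(server, log)

async def sys_io(delays):
    # delays maps a (hypothetical) server name to its response delay.
    log = []
    # The explicit join: wait here until every chain has finished.
    await asyncio.gather(*(per_server(s, d, log) for s, d in delays.items()))
    return log

if __name__ == "__main__":
    for event in asyncio.run(sys_io({"slow": 0.05, "fast": 0.0})):
        print(event)
```

With one slow and one fast server, the fast server's flow finishes before the slow server has even responded, which is the behavior the blocking msgpairarray cannot give us today.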

I like this general idea, but for my 2c I would probably structure things a little differently. I don't think the msgpairarray should be augmented much further, since it is already getting a little complicated. As it stands, it does a good job of handling all of the other types of operations (other than sys-io). If anything, it would be nice to find ways to make it simpler in the long run.

As far as sys-io goes, I could see it consisting mainly of a state machine that handles I/O to a single server with these basic steps (plus whatever helper logic states are needed):

1) setup
2) post ack
3) post req
4) post flow
5) write ack
6) retry if needed from any of the above
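
The steps above could be sketched roughly as follows. This is Python rather than the state machine language, and the names (State, NEXT, run_server_io, the step callback) are hypothetical. It also assumes that a retry restarts from setup, which is only one possible reading of the retry step.

```python
from enum import Enum, auto

class State(Enum):
    SETUP = auto()
    POST_ACK = auto()
    POST_REQ = auto()
    POST_FLOW = auto()
    WRITE_ACK = auto()
    DONE = auto()
    FAILED = auto()

# Normal-case successor of each step.
NEXT = {
    State.SETUP: State.POST_ACK,
    State.POST_ACK: State.POST_REQ,
    State.POST_REQ: State.POST_FLOW,
    State.POST_FLOW: State.WRITE_ACK,
    State.WRITE_ACK: State.DONE,
}

def run_server_io(step, max_retries=3):
    """Drive I/O to ONE server.

    step(state) is a hypothetical callback that performs the work for
    that state and returns True on success, False on failure.
    """
    state = State.SETUP
    retries = 0
    trace = []
    while state not in (State.DONE, State.FAILED):
        trace.append(state)
        if step(state):
            state = NEXT[state]
        elif retries < max_retries:
            retries += 1
            state = State.SETUP  # assumption: a retry starts over from setup
        else:
            state = State.FAILED
    return state, trace

if __name__ == "__main__":
    final, visited = run_server_io(lambda s: True)
    print(final, [s.name for s in visited])
```

Note that nothing here knows about other servers: there is one error path and one retry counter, not arrays of them.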

You could then use the concurrent sm infrastructure to start N of these, one for each server. I'm not sure exactly what Walt's model is, but I imagine this would mean having a parent state machine with 2 states: one to decide which servers to use and launch the N child sms, and one to collect the results. The N copies would not coordinate with each other at all until they complete.
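
As a rough illustration of that parent/child split (again in Python, and not a claim about how Walt's concurrent sm code actually works), the sketch below uses generators as stand-ins for child state machines. child_io, parent_io and the wanted set are hypothetical names. The parent has two phases: pick the servers and launch one child each, then step the children until all have finished and collect their results.

```python
def child_io(server):
    """Hypothetical per-server child: yields after each step, then returns a result."""
    for step in ("setup", "post ack", "post req", "post flow", "write ack"):
        yield (server, step)
    return (server, "ok")

def parent_io(all_servers, wanted):
    # State 1: decide which servers to use and launch one child per server.
    servers = [s for s in all_servers if s in wanted]
    running = {s: child_io(s) for s in servers}

    # State 2: collect the results. Children are stepped round-robin and
    # never look at each other's state.
    results = {}
    while running:
        for server in list(running):
            try:
                next(running[server])
            except StopIteration as done:
                results[server] = done.value[1]
                del running[server]
    return results

if __name__ == "__main__":
    print(parent_io(["a", "b", "c"], {"a", "c"}))
```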

These child state machines would be much easier to debug than our current scheme because they wouldn't need any logic in them to deal with multiple servers, arrays of error codes, some servers going faster than others, etc.

I know this duplicates a little of what msgpairarray does, but I think it is sufficiently different to warrant just doing something custom for sys-io so the msgpairarray doesn't get too elaborate.

-Phil
