On Jul 26, 2006, at 5:17 PM, Phil Carns wrote:
Sam Lang wrote:
On Jul 26, 2006, at 3:41 PM, Walter B. Ligon III wrote:
Yeah, the idea is that the SM code would call the job function.
Depending on the state actions to do it seems like asking for
trouble, given all the details that have to be kept up with.
Actually, there are already job structs used by the SM code, now
I've had to add a context id to the smcb and there will be job
calls. I think you are right though, the amount of dependency
is pretty small.
As for the job funcs I think I'd need one new one to post the
parent job, establishing a counter. The child job would look up
the counter, decrement, and if zero, call job_null to relaunch
the parent, or just replicate what job_null does, whichever
seems easiest.
I would rather see the parent get relaunched by the normal job
test code by putting itself in the job completion queue once it's
finished. This could happen in a job_sm_test call like I
suggested in my previous email. Also, instead of a counter that
a test function would check, and the child state machines would
have to decrement, I'd prefer the parent job keep an array of
child state machines (it does this anyway, no?) and check each
element in the array for completion of the state machine. That
way the children aren't competing to lock the same state to
notify of completion, the parent just checks each one.
There doesn't need to be any locking: the main server thread only
executes one state function or one transition at a time. The
counter also doesn't need to be visible: it could be hidden inside
the job call, which could lock or not lock as it sees fit.
The parent also couldn't be the one checking the elements in an
array like that - it would have to be done from within the job code
somewhere (which I think you described in your previous email).
That means that somewhere in the job code (or request scheduler,
etc.) something will have to do the following on every
job_testcontext() call:
    for each active sm
Only jobs that got posted as parent states would need to be checked.
        for each child within that sm
            check state
Which could get expensive depending on how extensively we use the
child/parallel sm model.
It seems unlikely to me that this would cost much overall. If we're
going to use this child/parallel system to send out 1000s of messages
at once, well, then perhaps we'd be better suited using something
like mpi? :-)
-sam
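A minimal sketch of the polling pass discussed above, with Sam's refinement that only SMs posted as parent states get checked. All type and function names here (child_sm, active_sm, poll_waiting_parents) are hypothetical stand-ins for illustration, not the PVFS2 structures, and the real check would live somewhere in the job code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the PVFS2 structures. */
struct child_sm {
    int done;                    /* nonzero once the child SM has terminated */
};

struct active_sm {
    int waiting_on_children;     /* nonzero only if posted as a parent state */
    struct child_sm *children;   /* array of child state machines */
    size_t child_count;
};

/* Returns 1 when every child in the parent's array has completed. */
static int all_children_done(const struct active_sm *sm)
{
    for (size_t i = 0; i < sm->child_count; i++) {
        if (!sm->children[i].done)
            return 0;
    }
    return 1;
}

/*
 * One polling pass, as might run on each test call. Skips SMs that were
 * not posted as waiting parents and writes the indices of parents that
 * can be relaunched into ready[]. Returns how many were written.
 */
static size_t poll_waiting_parents(const struct active_sm *sms, size_t sm_count,
                                   size_t *ready, size_t ready_cap)
{
    size_t n_ready = 0;

    assert(sms != NULL || sm_count == 0);
    for (size_t i = 0; i < sm_count && n_ready < ready_cap; i++) {
        if (!sms[i].waiting_on_children)
            continue;            /* ordinary SM: nothing to check */
        if (all_children_done(&sms[i]))
            ready[n_ready++] = i;
    }
    return n_ready;
}
```

The cost per pass is proportional to the number of waiting parents times their child counts, which is the expense being weighed in the exchange above.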
The implicit call is the child's call when it terminates. The
parent's call could be implicit too, or done by the state action.
Doesn't this require child state machines to only function in the
child state machine context? I'd prefer to just have generic
state machines that can be used as a child state machine or as a
top-level state machine.
I would prefer that too :) Is this going to work, Walt? It would
be nice if the state machine processing code handled transparently
triggering different termination functions depending on whether it
was a top level sm or not without the state functions themselves
knowing any better.
As of this moment we really haven't taken any pains to keep the
SM independent from the job system, in fact you have to have the
job system to drive things, so in some sense it's not really an
issue.
I vote for making the interfaces as separate as possible. If
someone else wants to use the state machine code somewhere else,
it would be nice to allow them to take it as-is (mpich2 guys were
talking about using it, but I think they ended up doing something
else). Also, independent layers make testing and debugging
easier in my view.
In the current code, the sm_p is passed through to the job
descriptor as a void*, and we just cast back to a sm_p in the
while loop that does the job_testcontext and then drives the
state machines again. The use of job_status does bring in the
job code into the state machine code, but it seems like mostly
only the error_code field is used within the state actions, and
the rest of that structure could be independent of the state
machine code.
-sam
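A rough sketch of the pass-through Sam describes: the SM pointer rides along in the job descriptor as a void*, and the test loop casts it back before driving the state machine again, handing over only the error code. The struct layouts and the names job_result, sm_advance and drive_completed are assumptions made for illustration; only the sm_p naming comes from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state machine control block; not the real smcb. */
struct sm {
    int steps_run;     /* how many times the SM has been driven */
    int last_error;    /* error code handed over from the finished job */
};

/* Hypothetical completed-job record carrying an opaque user pointer. */
struct job_result {
    void *user_ptr;    /* the sm_p, stored as void* when the job was posted */
    int error_code;    /* the only job status field the SM side reads here */
};

/* Stand-in for running the next state action of a state machine. */
static void sm_advance(struct sm *sm_p, int error_code)
{
    sm_p->last_error = error_code;
    sm_p->steps_run++;
}

/*
 * Walk a batch of completed jobs, recover each sm_p from the void*,
 * and drive that state machine. Returns the number of SMs driven.
 */
static size_t drive_completed(const struct job_result *results, size_t count)
{
    size_t driven = 0;

    assert(results != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        struct sm *sm_p = results[i].user_ptr;   /* cast back from void* */
        if (sm_p == NULL)
            continue;                            /* job not tied to an SM */
        sm_advance(sm_p, results[i].error_code);
        driven++;
    }
    return driven;
}
```

Because the SM side only sees an int error code in this sketch, the state machine code would not need the rest of the job status structure.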
Any more comments? (Sam, I hope this addresses some of yours.)
Walt
Phil Carns wrote:
Walter B. Ligon III wrote:
OK, guys, I have another issue I want input on. When child
SMs terminate they have to notify their parent. The parent
has to wait for all the children to terminate. So I've been
thinking to use the job subsystem for this: the parent would
post a job to wait for N children, and each child would post a
job, the last one releasing the parent.
Now I see two ways to implement this - one is to implement
this directly in the state machine code. The parent simply
stops running (because it does not schedule a job yet returns
DEFERRED). Each child decrements a counter, and when it hits
0 the parent is restarted. This is a little ugly because the
waiting parent is not being held on any list or queue (up to
now all waiting SMs are in the job subsystem), also the last
terminating child becomes the parent as it starts executing
the parent code. Things can get weird when one SM starts
children that start children, and so on.
Now the other way to implement this is with the job subsystem
as I suggested above. Much cleaner except for one thing: up
to now the state machine subsystem has had no dependency at
all on the job subsystem. If we do it this way, this function
only works with the job system intact. I'd prefer not to do
this, but it does seem the cleanest, most logical means.
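The counter scheme described above (the parent posts a job that waits for N children, each terminating child decrements, and the last one releases the parent) could be sketched roughly as follows. The names wait_job, post_parent_wait and child_terminated are hypothetical; this is not an existing job function, and marking the job completed stands in for placing it on the job completion queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wait job; the real job code would own this state. */
struct wait_job {
    int remaining;   /* children that have not yet terminated */
    int completed;   /* nonzero once the parent may be relaunched */
};

/* Parent side: post a job that waits for n_children terminations. */
static void post_parent_wait(struct wait_job *job, int n_children)
{
    assert(job != NULL && n_children >= 0);
    job->remaining = n_children;
    job->completed = (n_children == 0);   /* nothing to wait for */
}

/*
 * Child side: called once when a child SM terminates. Returns 1 if this
 * child was the last one and therefore released the parent, else 0.
 */
static int child_terminated(struct wait_job *job)
{
    assert(job != NULL && job->remaining > 0);
    job->remaining--;
    if (job->remaining == 0) {
        job->completed = 1;
        return 1;
    }
    return 0;
}
```

The sketch takes no lock, on the assumption stated elsewhere in the thread that the main server thread runs only one state function or transition at a time.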
I like the job approach. I guess this is an extra dependency
because the sms would be calling these particular job functions
implicitly, rather than relying on the state functions to
handle those posts and releases? We definitely haven't done
that before, but at least in this case the job function that
the sm infrastructure would be depending on is the simplest one
in the arsenal :) It shouldn't be hard for someone to
reimplement that particular functionality if they wanted to use
the state machine mechanism in another project.
If you weren't planning on these job calls to be implicit, then
I'm not sure where the extra dependency is: we already use jobs
to trigger all of the other "normal" transitions.
This reminded me of a question, though: is there going to be a
standard mechanism for the children to report each of their
independent error codes to the parent sm? Or do the children
need to just keep a reference to the parent sm structure and
manually fill in an array or something?
I guess I have a broader question of how data that the children
generate (like a handle value or an attr structure) gets
transferred to the parent. Does the parent copy this stuff
from the child after the child finishes, or does the child copy
it to the parent before it exits? I think we talked about
this before at some point but I forgot what the plan is. It
would be nice if we made the developer define macros or
something to dictate which input parameters need to be
filled in when invoking a child and which output parameters can
be retrieved when it finishes. Otherwise it starts getting
tricky to remember what fields need to be set in the sm
structure before kicking something off.
-Phil
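One way to picture the input/output question above is a per-child frame with explicit in and out parts, where the parent copies results out after the children finish. This is only a sketch of one of the two options raised (parent copies from child), and it uses plain structs rather than the macros suggested; child_in, child_out, child_frame and collect_child_results are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input parameters the parent fills in before the launch. */
struct child_in {
    unsigned long parent_handle;
};

/* Hypothetical output parameters the child fills in before it exits. */
struct child_out {
    int error_code;              /* 0 on success, nonzero on failure */
    unsigned long new_handle;    /* e.g. a handle value the child produced */
};

struct child_frame {
    struct child_in in;
    struct child_out out;
    int done;                    /* nonzero once the child has terminated */
};

/*
 * Parent side: after all children have finished, copy each child's
 * error code and handle into parent-owned arrays. Returns the number
 * of children that reported a nonzero error code.
 */
static size_t collect_child_results(const struct child_frame *kids, size_t n,
                                    int *errors, unsigned long *handles)
{
    size_t failures = 0;

    for (size_t i = 0; i < n; i++) {
        assert(kids[i].done);    /* only valid once the child is finished */
        errors[i] = kids[i].out.error_code;
        handles[i] = kids[i].out.new_handle;
        if (errors[i] != 0)
            failures++;
    }
    return failures;
}
```

Keeping the in and out fields in separate structs makes it visible which fields must be set before kicking a child off and which can be read afterward.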
-Phil
--
Dr. Walter B. Ligon III
Associate Professor
ECE Department
Clemson University
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers