On Oct 24, 2006, at 1:53 PM, Walter B. Ligon III wrote:
Good. I'm making progress tracking down the problems in the code -
somehow a bunch of edits got lost. I'm fixing them. It involves
changes in all of the client state machines.
BTW, there is one I'm confused about: src/client/sysint/sys-getattr.sm.
The last state action, "getattr_set_sys_response", returns from
several places. It is not clear whether ALL of them intend to terminate,
since they don't all set the op_completed flag, but the only option
in the SM is to terminate, so I'm assuming they all want to
terminate. If you know anything about that one I'd appreciate it
if you'd take a look.
I agree that you don't want SM_ACTION_DEFERRED for any of those. It
looks like you just went through and replaced all the return 0; lines
in state actions with SM_ACTION_DEFERRED, even if the error_code is
set to a negative value (we used to ignore the return value if the
error value was negative?). If it was just a search and replace,
there are probably a bunch of other places like this as well.
BTW, when is SM_ACTION_COMPLETE supposed to be used (returned by a
state action)? For nested machines? We could really use some
documentation for what is supposed to be returned by state actions
and when. It didn't exist before, and it took me a while to figure
out how return 0; and return 1; behaved, and now all that is changing
again. It's certainly for the better, but it will help me to have the
rules documented explicitly.
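As a toy sketch of the convention under discussion (the enum values, struct, and
check below are invented purely for illustration and are not PVFS2's real
definitions), the rule seems to be: a state action should return
SM_ACTION_DEFERRED only when it actually posted a job, and return immediately
with a completed action when the error code is already negative:

```c
#include <assert.h>

/* Toy model only: these enum values and this struct are invented for
 * illustration; the real PVFS2 state-machine definitions differ. */
enum sm_action {
    SM_ACTION_COMPLETE,   /* action finished; advance to the next state now */
    SM_ACTION_DEFERRED    /* a job was posted; advance when it completes */
};

struct toy_sm {
    int error_code;       /* negative on failure, as in the thread */
    int job_outstanding;  /* nonzero if an async "job" was posted */
};

/* A state action that posts an asynchronous "job" only on success. */
static enum sm_action post_io(struct toy_sm *sm)
{
    if (sm->error_code < 0)
        return SM_ACTION_COMPLETE;  /* failed up front; no job was posted */
    sm->job_outstanding = 1;        /* pretend we posted a job here */
    return SM_ACTION_DEFERRED;      /* progress resumes on job completion */
}
```

Under this reading, the bug pattern described above is returning
SM_ACTION_DEFERRED on the error path, where nothing was posted and nothing will
ever wake the machine up again.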
Also, what exactly are the semantics of state machines and jobs? What
jobs are currently associated with a given state machine pointer
(PINT_client_sm or PINT_server_op)? How do I stop/cancel a state
machine? This is especially pertinent for our state machines that
essentially loop forever, such as the job-timer SM. We don't
currently clean up those state machines ourselves; it would be good
if we did. That means figuring out which jobs (if any) are
currently posted by the machine, and cancelling or waiting for
completion on those jobs.
-sam
Walt
Sam Lang wrote:
I'm working with your branch Walt. Most of the code that does
allocation of the client state machines is the same.
-sam
On Oct 24, 2006, at 9:10 AM, Walter B. Ligon III wrote:
We should be careful here, since all of the code dealing with
PINT_client_sm's has been rewritten for the new SM code, and
Murali's suggestions (for example) may not work so well.
Walt
Murali Vilayannur wrote:
Hey Sam,
I ran pvfs2-client-core in valgrind, and then ran Bonnie++ a
few times (10) on the mounted pvfs volume, and noticed the
following when I stopped the client process:
==20132== malloc/free: 1,298,824 allocs, 1,297,888 frees,
3,462,517,583 bytes allocated.
Allocating and freeing 3.5GB seemed extreme, so I went
exploring. It turns out that every time we allocate a
PINT_client_sm, we're allocating about 35KB:
(gdb) p sizeof(struct PINT_client_sm)
$4 = 37764
Oh boy.. that is definitely large..
Most of that is a static array of 8 PINT_client_lookup_sm_ctx, which
itself has a static array of 40 PINT_client_lookup_sm_segment, which
are each about 112 bytes. Anyway, it ends up accumulating.
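The arithmetic bears that out. A mock layout (the field contents are assumed;
only the counts and sizes quoted above matter) shows where roughly 35KB of the
37,764 bytes comes from:

```c
#include <assert.h>

/* Mock structs mirroring only the *sizes* quoted in the thread; the real
 * PINT_client_lookup_sm_segment/ctx fields are not reproduced here. */
struct mock_segment   { char payload[112]; };            /* ~112 bytes each   */
struct mock_lookup_ctx { struct mock_segment seg[40]; }; /* 40 * 112 = 4,480  */
struct mock_client_sm { struct mock_lookup_ctx ctx[8]; };/* 8 * 4,480 = 35,840 */
```

8 × 40 × 112 = 35,840 bytes, so the static lookup arrays alone account for
nearly all of sizeof(struct PINT_client_sm).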
So I'm convinced at this point that this is beyond the noise
range, plus it's just cruft that we don't need. I'd like to
swap out those static arrays for dynamic allocation when we
get to the start of the lookup state machine. Any thoughts or
suggestions?
I agree. It definitely does not look like noise anymore.
How about we keep a pool of PINT_client_sm's around in client-
core and allocate from that instead of dynamically allocating
one every time?
My 2 cents :)
thanks,
Murali
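For what it's worth, a minimal free-list pool along the lines Murali suggests
could look like the sketch below (the pool size, payload size, and names are
illustrative, not the real PINT_client_sm layout or client-core code):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal fixed-size free-list pool sketch; illustrative names/sizes only. */
#define POOL_SIZE 16

struct pool_item {
    struct pool_item *next;   /* free-list link, valid only while pooled */
    char payload[256];        /* stand-in for the state-machine struct */
};

static struct pool_item pool[POOL_SIZE];
static struct pool_item *free_list;

static void pool_init(void)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        pool[i].next = free_list;   /* push every slot onto the free list */
        free_list = &pool[i];
    }
}

static struct pool_item *pool_alloc(void)
{
    if (!free_list)
        return NULL;                /* exhausted; could fall back to malloc */
    struct pool_item *p = free_list;
    free_list = p->next;
    return p;
}

static void pool_free(struct pool_item *p)
{
    p->next = free_list;            /* LIFO: freed item is reused first */
    free_list = p;
}
```

Alloc and free are both O(1) with no heap traffic, which is the point: the
valgrind numbers above come from churn, not from a leak.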
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
--
Dr. Walter B. Ligon III
Associate Professor
ECE Department
Clemson University