Hi,
> Do you mean Pete's IB 4x numbers or my MX-10G numbers? Or both?
Both are good; sorry, I was mainly interested in the throughput of the new
MX-10G implementation.
> I am not sure that I follow you here. Ideally, I only want to measure
> network activity and PVFS2 overhead. I would prefer to avoid
> measuring disk activity but these old nodes do not have enough memory
> to use ramfs well.
> My MX-10G results are from newer nodes that have enough memory to use
> ramfs effectively. I can't keep them tied up for bmi_mx development
> as long as I can these older nodes. :-)
TAS simply discards data and handles metadata efficiently in memory. It also 
differs by completing I/O jobs immediately, so the internal handling in pvfs2 
is a bit different. In the past this different view sometimes helped the 
evaluation. 

> > On the other hand you could use the pvfs2-hint-branch which
> > provides you with
> > better MPE logging on the server side, we have some tools to
> > convert and
> > merge client and pvfs2-server logs and show the results. I could
> > upgrade the
> > current pvfs2-hint-branch with your patches (which is currently
> > somewhere at
> > release 2.5). The reason we need parts of the advanced logging is
> > that logs
> > have a problem on the server side if multiple start events occur
> > before the
> > end events happen, for example if you use multiple flow streams.
> >
> > @Scott
> > I have seen you solved it by using events and wonder which tool you
> > have used
> > to create states out of the events.
> > You said you have problems with the MPE log on the server, maybe we
> > could help
> > you if you give details ?
> I am using MPE. Since I am not using MPI, I compile mpich2 with
> CFLAGS="-DCLOG_NOMPI". I then add "-lmpe_nompi" to my LIBS and the
> path to mpich2 to my LDFLAGS.
>
> In my initialization function, I have:
>
> #if BMX_LOGGING
>          MPE_Init_log();
>          send_start              = MPE_Log_get_event_number();
>          send_finish             = MPE_Log_get_event_number();
>          recv_start              = MPE_Log_get_event_number();
>          recv_finish             = MPE_Log_get_event_number();
>          sendunex_start          = MPE_Log_get_event_number();
>          sendunex_finish         = MPE_Log_get_event_number();
>          recvunex_start          = MPE_Log_get_event_number();
>          recvunex_finish         = MPE_Log_get_event_number();
>          MPE_Describe_state(send_start, send_finish, "Send", "red");
>          MPE_Describe_state(recv_start, recv_finish, "Recv", "blue");
>          MPE_Describe_state(sendunex_start, sendunex_finish,
> "SendUnex", "orange");
>          MPE_Describe_state(recvunex_start, recvunex_finish,
> "RecvUnex", "green");
> #endif
Ah, I see: you use states rather than plain events. That was a problem in 
earlier versions of MPE (and I think it still is).
Assume you have one client and, for example, the category BMI_Send with its 
start and stop events. Now suppose that on one client or server you actually 
see the sequence (introduced by multiple parallel flows): 
start, start, stop, stop

MPE should create two overlapping states, but it doesn't: it creates only ONE 
state for both pairs of events, so the resulting logs are wrong! This only 
happens when one machine (i.e. one timeline) gets two overlapping states.

Thus, we try to use the functions MPE_Log_get_solo_eventID and 
MPE_Describe_info_event to create single events and distribute them across 
multiple timelines. We have a suite of slog2 transformation programs which 
allow merging client and server logs and splitting overlapping actions onto 
multiple timelines. 

> I can now get server logs. My SERVER_LDFLAGS were wrong. Also, on the
> server, I had to specify an absolute path (I did not on the client).
> I would be interested in merging the logs if you can provide some
> tools or insight.
The question that arises is whether we could synchronize the 
pvfs2-hint-branch with HEAD so that it has everything you need. 
If that is the case, I could do so and you could use our branch (which also 
provides a patched pvfs2-cp that supports the hints :). 
For MPI it especially allows showing JOB and TROVE operations; BMI 
operations can be logged too, though without a "Request ID". That is not 
important for the pvfs2-cp utility, and it could be integrated into the log as well.

You can find an excerpt of an mpi-io-test run with collective and contiguous I/O 
regions and a jumpshot log here:
http://www.rzuser.uni-heidelberg.de/~jkunkel2/4S4C-level-2.jpg
and http://www.rzuser.uni-heidelberg.de/~jkunkel2/4S4C-level-1.jpg. (Note that 
in these diagrams the separation of parallel states onto different timelines is 
not expanded.)
Clients are timelines 0-3, 4 is the metadata server and 6-8 are 
data servers.
The first operation creates the file; at the far right you can see some TROVE 
write operations. If you move the mouse over a job, you can see the pvfs2 job 
(e.g. CREATE) in the third bar from the top. You can also see the types of all 
unexpected messages (request decode) and the request type they belong to, such 
as PVFS_SERV_CREATE. 

It should be possible, though, to merge your client log with the server log 
from our environment (of the same run, of course). I think this will at least 
let you spot idle times efficiently and might help you find their source.

Our working group can provide you with a package containing the tracing tools 
and some instructions on how to use them, if you like, but we will need a few 
days.

However, I would then have to upgrade the pvfs2-hint-branch and patch it with 
everything required to run MX. If you have patches for MX against HEAD, I 
could upgrade the hint-branch to HEAD. Just tell me what you need for MX.

Best regards,
Julian
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers