Hi,

> Do you mean Pete's IB 4x numbers or my MX-10G numbers? Or both?

Both are good; sorry, I meant especially the throughput of the new
MX-10G implementation.

> I am not sure that I follow you here. Ideally, I only want to measure
> network activity and PVFS2 overhead. I would prefer to avoid
> measuring disk activity but these old nodes do not have enough memory
> to use ramfs well.
> My MX-10G results are from newer nodes that have enough memory to use
> ramfs effectively. I can't keep them tied up for bmi_mx development
> as long as I can these older nodes. :-)

TAS simply discards data and handles metadata efficiently in memory. It
also differs in that it completes I/O jobs immediately, so the internal
handling in pvfs2 is a bit different. In the past this different view
has sometimes helped the evaluation.
> > On the other hand you could use the pvfs2-hint-branch, which
> > provides you with better MPE logging on the server side; we have
> > some tools to convert and merge client and pvfs2-server logs and
> > show the results. I could upgrade the current pvfs2-hint-branch
> > with your patches (it is currently somewhere at release 2.5). The
> > reason we need parts of the advanced logging is that logs have a
> > problem on the server side if multiple start events occur before
> > the end events happen, for example if you use multiple flow
> > streams.
> >
> > @Scott
> > I have seen you solved it by using events and wonder which tool
> > you have used to create states out of the events.
> > You said you have problems with the MPE log on the server; maybe
> > we could help you if you give details?
>
> I am using MPE. Since I am not using MPI, I compile mpich2 with
> CFLAGS="-DCLOG_NOMPI". I then add "-lmpe_nompi" to my LIBS and the
> path to mpich2 to my LDFLAGS.
>
> In my initialization function, I have:
>
> #if BMX_LOGGING
>         MPE_Init_log();
>         send_start = MPE_Log_get_event_number();
>         send_finish = MPE_Log_get_event_number();
>         recv_start = MPE_Log_get_event_number();
>         recv_finish = MPE_Log_get_event_number();
>         sendunex_start = MPE_Log_get_event_number();
>         sendunex_finish = MPE_Log_get_event_number();
>         recvunex_start = MPE_Log_get_event_number();
>         recvunex_finish = MPE_Log_get_event_number();
>         MPE_Describe_state(send_start, send_finish, "Send", "red");
>         MPE_Describe_state(recv_start, recv_finish, "Recv", "blue");
>         MPE_Describe_state(sendunex_start, sendunex_finish,
>                            "SendUnex", "orange");
>         MPE_Describe_state(recvunex_start, recvunex_finish,
>                            "RecvUnex", "green");
> #endif

Ah, I see: so you use states and not events. That was actually a
problem in earlier versions of MPE (and I think it still is).
Assume you have one client and, for example, the category BMI_Send
with start and stop events. Now assume that on one client or server
you actually see the following sequence (introduced by multiple
parallel flows):

    start, start, stop, stop

MPE should create two overlapping states, but it does not: it creates
only ONE state for both pairs of events, so the resulting logs are
wrong. This only happens when one machine (i.e. one timeline) gets two
overlapping states.

Thus we use the functions MPE_Log_get_solo_eventID and
MPE_Describe_info_event to create single events and distribute them
over multiple timelines. We have a suite of slog2 transformation
programs which allow us to merge client and server logs and to split
overlapping actions onto multiple timelines.

> I can now get server logs. My SERVER_LDFLAGS were wrong. Also, on the
> server, I had to specify an absolute path (I did not on the client).
> I would be interested in merging the logs if you can provide some
> tools or insight.

The question which arises is whether we could synchronize the
pvfs2-hint-branch with HEAD so that it has all the stuff you need. If
that is the case, I could do so and you could use our branch (which
also provides a patched pvfs2-cp to support the hints :). It
especially allows MPI to show JOB and TROVE operations; BMI operations
can be logged too, but not with a "Request ID". This is not important
for the pvfs2-cp utility, though, and could be integrated into the log
as well.

You can find an excerpt of an mpi-io-test run with collective and
contiguous I/O regions and a jumpshot log here:
http://www.rzuser.uni-heidelberg.de/~jkunkel2/4S4C-level-2.jpg and
http://www.rzuser.uni-heidelberg.de/~jkunkel2/4S4C-level-1.jpg.
(Note that in these diagrams the separation of parallel states onto
different timelines is not expanded.) Clients are timelines 0-3, 4 is
the metadata server, and 6-8 are data servers.
The first operation creates the file; at the far right you can see
some TROVE write operations. If you move the mouse over a job, you can
see the pvfs2 job in the third bar from the top, e.g. CREATE; you can
also see the types of all unexpected messages (request decode) and the
request type they belong to, like PVFS_SERV_CREATE.

It should be possible to merge your client log with the server log of
our environment (of the same run, of course). I think this will at
least allow you to spot idle times easily and might help to find their
source.

Our working group will provide a package with the tracing tools and
some instructions on how to use them, if you like, but we will need a
few days. Then, however, I would have to upgrade the pvfs2-hint-branch
and patch it with everything required to run MX; if you have patches
for MX against HEAD, I could upgrade the hint-branch to HEAD. Just
tell me what you need for MX.

Best regards,
Julian

_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
