I went back and added some much more specific debugging messages, put special prefixes on the flow, BMI, and request processor messages so I could group them more easily, and got rid of the extra mutexes.

After running a few more tests and double checking the logs, this is what I am seeing in every failure case:

- it is a write flow
- it gets broken into two buffers within the flow protocol
  - the first is 256K
  - the second is 64K
- both sides _do_ get the request processing correct
- both sides post the correct BMI operations
  - the server posts 2 receives
  - the client posts 2 sends
- the second data message never shows up at the server

I must have missed part of the flow/request processing on the client side when I looked before. Sorry about the wild goose chase with the request processing question. The request processing definitely looks correct in the logs that I am seeing now (or at least the client and server agree on it).

BMI has logging that shows when incoming messages appear on a socket, and it includes the tag number. The 64K message never shows up on a socket at the server side. The BMI operation therefore never completes, and the callback never gets triggered to complete the flow.

I don't know if this is a bmi_tcp problem, or if there is some sort of general network wackiness in my environment. I'll try to revert to some older versions of bmi_tcp tomorrow and see if the problem persists. I already tried just backing out changes to the poll notification in socket-collection.[ch] and (assuming I got that right) it didn't make any difference.

-Phil

Phil Carns wrote:
Hi Sam,

I may be wrong, but so far it looks like the problem is a little different this time. In the scenario that I am seeing now it doesn't look like the flow has finished everything and then just failed to mark completion; it looks like one side actually posted more BMI messages than the other. The server posted two receives of 256K, while the client only posted one send of 256K. I don't see any zero byte results from the request processor or any cases where the flow should have marked completion.

I'm still trying to get a better log. After adding mutexes in PINT_process_request() on the server side, the problem has gotten much harder to reproduce. I don't know if there is any correlation or if I am just perturbing the timing enough to hide the problem. So far out of 9 runs I have had only 1 failure, and the logs from that failure were inconclusive.

I definitely agree with both of your points about reworking the flow code; it is difficult to debug right now.

-Phil

Sam Lang wrote:


On Jun 12, 2006, at 8:34 AM, Phil Carns wrote:

Hi all,

I am looking at an I/O problem that I don't completely understand. The setup is that there are 15 servers and 20 clients (all RHEL3 SMP). The clients are running a proprietary application. At the end of the run they each write their share of a data set into a 36 GB file. So each is doing a contiguous write of 36/20 GB of data.

Roughly 80-90% of the time, one or more of the clients and servers hang, though it isn't predictable which one will do it. After quite a bit of digging through logs, it turns out that there is a flow that never completes on the server side. There are two problems contributing here:

1) The flow timeout mechanism doesn't seem to be working right for me (the timer for the flow pops and checks progress one time, then never gets triggered again). This isn't the core problem, but it is causing the flow to get stuck forever and hold up the request scheduler rather than giving up eventually.

2) For this particular flow, the client and server appear to disagree on how much data to exchange. The client sends one 256K buffer and completes the flow. The server receives that 256K buffer but still expects at least one more to arrive (which never does).


Hi Phil, these problems sound very similar to the bug I was seeing about a month ago with the flow code for certain request types. The particular case I was looking at was with reads, so given that there's a lot of code duplication in flow I could imagine the same problem existing (but not getting fixed) for writes. It may not be the same as the problem you're seeing, but I'll walk through the bug and how I fixed it, in case it helps.

Basically, the request was ending right at the boundary of a flow buffer (so bytes == bytemax). This caused the request processing code to finish and bmi_send_callback_fn to return, but it would get called again by the post code for the next buffer; this time the request processing would return 0 bytes, and that queue item would get marked as the last one. That last queue item was the problem: since trove_bstream_read_list was never getting posted, the queue item whose last field was set was never making it into trove_read_callback_fn, so the dest_last_posted field for the flow data wasn't getting set either.

The fix was to add some extra checking in bmi_send_callback_fn. In the case that the request processing returns 0 bytes processed, I can set the dest_last_posted field to 1 immediately. The only caveat to this is in the case that other queue items in the source list haven't made it to the dest list yet, so we need to check for that case. The diff of the changes I made might help:

http://www.pvfs.org/cgi-bin/pvfs2/viewcvs/viewcvs.cgi/pvfs2/src/io/flow/flowproto-bmi-trove/flowproto-multiqueue.c.diff?r1=1.105&r2=1.106

In any case, I wasn't able to make use of the debug messages for the request processing code; the output was too verbose to make any sense of. I think I ended up just adding debug statements for the entry and exit points of the flow callbacks, along with all the different side-state fields as they were changed. I'm not sure those debug statements made it into the trunk though. :-(

Since we're here, I think the flow code could benefit from some reworking in two ways:

1. Fewer side-state variables. Each queue item has fields that change the behavior of when things should complete, and it's hard to keep track of all of them.

2. More code abstraction. We end up doing very similar things between reads and writes on the server (and on the client too), especially with respect to the request processing. It might make sense to abstract out some of that code duplication into separate functions.

-sam

I am pretty sure I can get to the bottom of the first problem. The second one is trickier.

I attached a few log file excerpts:

client-1st.txt: This is the request processing log output from the client side when it is working on the flow in question

server-1st.txt: request processing log output from the first iteration of the server side flow

server-2nd.txt: request processing log output from the second iteration of the server side flow

Anyone have any suggestions, or see any indication from those log portions about why the server and client aren't on the same page?

Some of this log output (the server-2nd.txt in particular?) looks a little jumbled, like maybe there are two flows doing request processing at once, but I don't really know how to interpret it.

I may try adding a mutex around PINT_process_request() next, just to see if I can make the logs clearer by making sure only one invocation is going at once.

-Phil

_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
