On Oct 18, 2006, at 2:18 PM, Sam Lang wrote:


On Oct 16, 2006, at 5:40 PM, Brett Bode wrote:

Hello,
We have modified an existing application to directly call libpvfs2. Our pvfs2 setup has 6 servers and is set up to run pvfs2 over OpenIB verbs. We borrowed the code more or less from pvfs2-cp. This seems to work, and we have had several successful runs. However, we have also had a couple of hangs on one node. The traceback for the hang is:

#0  0x00002ab9874a34bf in poll () from /lib/libc.so.6
#1  0x0000000001cbea67 in BMI_ib_testcontext ()
#2  0x0000000001c8feb4 in BMI_testcontext ()
#3  0x0000000001c99624 in PINT_thread_mgr_bmi_push ()
#4  0x0000000001c950d3 in do_one_work_cycle_all ()
#5  0x0000000001c95883 in job_testcontext ()
#6  0x0000000001ca37e4 in PINT_client_state_machine_test ()
#7  0x0000000001ca3c00 in PINT_client_wait_internal ()
#8  0x0000000001c7df71 in PVFS_sys_io ()
#9  0x0000000001c6e253 in flushBuffer ()
    at /afs/.scl.ameslab.gov/project/nodeimg/amd64.test/usr/src/gamess-pvfs/bypassIO-pvfs.c:355
#10 0x0000000005eb27b0 in userFilePos ()

Eventually we time out and die. So the first question is: do you have any suggestions as to where to look for the cause of the hang? That traceback is from a write, but I have now seen it fail during a read as well (it died on the 12th pass after reading the complete file 11 times).

We also have several usage and/or tuning related questions. First off, when the file is created there are options for the "dfile_count" and the "strip_size". Thus far I have left them at their defaults. Can you comment on what sort of values would be optimal for sequentially accessed large files? Would matching the IO buffer size the application passes to the strip size be useful?

You're already seeing that matching the strip size and request size gives you far fewer cache misses, which is new info we can add to the tuning guide, or maybe Pete can come up with some optimizations around that.

Rob pointed out that it's not matching the two that you really want. The ideal strip size needs to be large enough to prevent a single request from being many multiples of the strip size, but small enough that a request still spans all the servers. So for your specific case, ideally you would have:

strip_size = request_size / number_of_servers

-sam

Usually the strip size is used to control the behavior of disk IO, as it means the trove layer is able to do reads and writes in larger chunks. I think we've generally found that for larger workloads the default strip size is well matched to the size of requests. Just increasing the strip size shouldn't necessarily help for sequential accesses.

Up to this point, the dfile_count has only been used to improve performance of IO on smaller files, by setting the value to 1, so that small requests are not broken down even further. In your case it probably makes sense to leave it at its default value.

What 'tuning guide', you say? It's currently a work in progress :-). If anyone is interested in helping out, especially with the IB sections, we could really use it.


We also have a problem when running on our IBM eHCAs with too many memory registrations. The odd part is that I am using the same 1MB buffer all the time, so I don't see why it seems to be re-registered on each write. My write code looks like this:

    file_req = PVFS_BYTE;
    ret = PVFS_Request_contiguous(ioSize, PVFS_BYTE, &mem_req);
    if (ret < 0) {
        PVFS_perror("PVFS_Request_contiguous", ret);
        return;
    }
    ret = PVFS_sys_write(target_object.ref, file_req,
                         bufferedFilePos, myBuffer, mem_req,
                         &credentials, &resp_io);
    if (ret == 0) {
        PVFS_Request_free(&mem_req);
        /* return(resp_io.total_completed); */
    } else {
        PVFS_perror("PVFS_sys_write", ret);
    }

One question is what does PVFS_Request_contiguous actually do?

It creates a request structure that essentially contains the size and offset into the memory buffer.

Since I am using the same buffer all the time, would it be OK to set up the request once and then reuse it, so long as the IO size is the same?

Yes. The request structure doesn't get modified by the IO call. You (correctly) use PVFS_BYTE for the file request. The reason you can't just use PVFS_BYTE for the memory request is that the size has to be encapsulated in the request as well (while the file request gets tiled based on the actual file size).
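Putting Sam's answer together with the earlier write code, the reuse pattern might look like the sketch below. This is only an illustration, not tested code: `cached_req`, `cached_size`, and `write_once_setup` are hypothetical names, and error handling is abbreviated; only the `PVFS_Request_*` and `PVFS_BYTE` identifiers come from the thread.

    /* Hedged sketch: build the memory request once and reuse it across
     * writes while ioSize stays the same, rebuilding only when it changes.
     * cached_req and cached_size are hypothetical caller-side names. */
    static PVFS_Request cached_req;
    static int32_t cached_size = -1;   /* -1: no request built yet */

    static int write_once_setup(int32_t ioSize)
    {
        int ret;
        if (ioSize != cached_size) {
            if (cached_size >= 0)
                PVFS_Request_free(&cached_req);   /* drop the stale request */
            ret = PVFS_Request_contiguous(ioSize, PVFS_BYTE, &cached_req);
            if (ret < 0) {
                PVFS_perror("PVFS_Request_contiguous", ret);
                return ret;
            }
            cached_size = ioSize;
        }
        /* cached_req can now be passed to every PVFS_sys_write of this size */
        return 0;
    }

Beyond avoiding the per-write request setup, keeping one long-lived request (and one long-lived buffer) is also the shape you'd want if the memory-registration growth above turns out to be tied to per-request state.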

-sam


Thanks for any help you can provide,

Brett


____________________________________________
Dr. Brett Bode
329 Wilhelm Hall
Ames Laboratory
Iowa State University
Ames, IA 50011              (515) 294-9192
[EMAIL PROTECTED]  FAX: (515) 294-4491
____________________________________________



_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers

