On Oct 18, 2006, at 2:18 PM, Sam Lang wrote:


On Oct 16, 2006, at 5:40 PM, Brett Bode wrote:

Hello,
We have modified an existing application to directly call libpvfs2. Our pvfs2 setup has 6 servers and is set up to run pvfs2 over OpenIB verbs. We borrowed the code more or less from pvfs2-cp. This seems to work, and we have had several successful runs. However, we have also had a couple of hangs on one node. The traceback for the hang is:

#0  0x00002ab9874a34bf in poll () from /lib/libc.so.6
#1  0x0000000001cbea67 in BMI_ib_testcontext ()
#2  0x0000000001c8feb4 in BMI_testcontext ()
#3  0x0000000001c99624 in PINT_thread_mgr_bmi_push ()
#4  0x0000000001c950d3 in do_one_work_cycle_all ()
#5  0x0000000001c95883 in job_testcontext ()
#6  0x0000000001ca37e4 in PINT_client_state_machine_test ()
#7  0x0000000001ca3c00 in PINT_client_wait_internal ()
#8  0x0000000001c7df71 in PVFS_sys_io ()
#9  0x0000000001c6e253 in flushBuffer ()
    at /afs/.scl.ameslab.gov/project/nodeimg/amd64.test/usr/src/gamess-pvfs/bypassIO-pvfs.c:355
#10 0x0000000005eb27b0 in userFilePos ()

Eventually we time out and die. So the first question is: do you have any suggestions as to where to look for the cause of the hang? That traceback is from a write, but I have now seen it fail during a read as well (it died on the 12th pass after reading the complete file 11 times).

We also have several usage and/or tuning related questions. First off, when the file is created there are options for the "dfile_count" and the "strip_size". Thus far I have left them at their defaults. Can you comment on what sort of values would be optimal for sequentially accessed large files? Would matching the IO buffer size the application passes to the strip size be useful?

You're already seeing that matching the strip size and request size gives you far fewer cache misses, which is new info we can add to the tuning guide, or maybe Pete can come up with some optimizations around that.

Rob pointed out that it's not matching the two that you really want. The ideal strip size needs to be large enough to prevent a single request from being many multiples of the strip size, but small enough that a request still spans all the servers. So for your specific case, ideally you would have:

strip_size = request_size / number_of_servers

-sam

Usually the strip size is used to control the behavior of disk IO, as it means the trove layer is able to do reads and writes in larger chunks. I think we've generally found that for larger workloads the default strip size is well matched to the size of requests. Just increasing the strip size shouldn't necessarily help for sequential accesses.

Up to this point, the dfile_count has only been used to improve performance of IO on smaller files, by setting the value to 1, so that small requests are not broken down even further. In your case it probably makes sense to leave it at its default value.

What 'tuning guide', you say? It's currently a work in progress :-). If anyone is interested in helping out, especially with the IB sections, we could really use it.


We also have a problem when running on our IBM eHCAs with too many memory registrations. The odd part is that I am using the same 1MB buffer all the time, so I don't see why it seems to be re-registered on each write. My write code looks like this:

    file_req = PVFS_BYTE;
    ret = PVFS_Request_contiguous(ioSize, PVFS_BYTE, &mem_req);
    if (ret < 0) {
        PVFS_perror("PVFS_Request_contiguous", ret);
        return;
    }
    ret = PVFS_sys_write(target_object.ref, file_req,
                         bufferedFilePos, myBuffer, mem_req,
                         &credentials, &resp_io);
    if (ret == 0) {
        PVFS_Request_free(&mem_req);
        /* return(resp_io.total_completed); */
    } else {
        PVFS_perror("PVFS_sys_write", ret);
    }

One question is what does PVFS_Request_contiguous actually do?

It creates a request structure that essentially contains the size and offset into the memory buffer.

Since I am using the same buffer all the time, would it be OK to set up the request once and then reuse it, so long as the IO size is the same?

Yes. The request structure doesn't get modified by the IO call. You (correctly) use PVFS_BYTE for the file request. The reason you can't just use PVFS_BYTE for the memory request is that the size has to be encapsulated in the request as well (while the file request gets tiled based on the actual file size).
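Putting Sam's answer together with the earlier write code, the reuse pattern might look like the sketch below. This is only an illustration, not tested code: `cached_req`, `cached_size`, and `write_once_setup` are hypothetical names, and error handling is abbreviated; only the `PVFS_Request_*` and `PVFS_BYTE` identifiers come from the thread.

    /* Hedged sketch: build the memory request once and reuse it across
     * writes while ioSize stays the same, rebuilding only when it changes.
     * cached_req and cached_size are hypothetical caller-side names. */
    static PVFS_Request cached_req;
    static int32_t cached_size = -1;   /* -1: no request built yet */

    static int write_once_setup(int32_t ioSize)
    {
        int ret;
        if (ioSize != cached_size) {
            if (cached_size >= 0)
                PVFS_Request_free(&cached_req);   /* drop the stale request */
            ret = PVFS_Request_contiguous(ioSize, PVFS_BYTE, &cached_req);
            if (ret < 0) {
                PVFS_perror("PVFS_Request_contiguous", ret);
                return ret;
            }
            cached_size = ioSize;
        }
        /* cached_req can now be passed to every PVFS_sys_write of this size */
        return 0;
    }

Beyond avoiding the per-write request setup, keeping one long-lived request (and one long-lived buffer) is also the shape you'd want if the memory-registration growth above turns out to be tied to per-request state.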

-sam


Thanks for any help you can provide,

Brett


____________________________________________
Dr. Brett Bode
329 Wilhelm Hall
Ames Laboratory
Iowa State University
Ames, IA 50011              (515) 294-9192
[EMAIL PROTECTED]  FAX: (515) 294-4491
____________________________________________



_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers

