On Oct 18, 2006, at 2:18 PM, Sam Lang wrote:
On Oct 16, 2006, at 5:40 PM, Brett Bode wrote:
Hello,
We have modified an existing application to call libpvfs2
directly. Our pvfs2 setup has 6 servers and is set up to run pvfs2
over OpenIB verbs. We borrowed the code more or less from pvfs2-cp.
This seems to work, and we have had several successful runs.
However, we have also had a couple of hangs on one node. The
traceback for the hang is:
#0 0x00002ab9874a34bf in poll () from /lib/libc.so.6
#1 0x0000000001cbea67 in BMI_ib_testcontext ()
#2 0x0000000001c8feb4 in BMI_testcontext ()
#3 0x0000000001c99624 in PINT_thread_mgr_bmi_push ()
#4 0x0000000001c950d3 in do_one_work_cycle_all ()
#5 0x0000000001c95883 in job_testcontext ()
#6 0x0000000001ca37e4 in PINT_client_state_machine_test ()
#7 0x0000000001ca3c00 in PINT_client_wait_internal ()
#8 0x0000000001c7df71 in PVFS_sys_io ()
#9 0x0000000001c6e253 in flushBuffer ()
at /afs/.scl.ameslab.gov/project/nodeimg/amd64.test/usr/src/gamess-pvfs/bypassIO-pvfs.c:355
#10 0x0000000005eb27b0 in userFilePos ()
Eventually we time out and die. So the first question is: do you
have any suggestions as to where to look for the cause of the
hang? That traceback is from a write, but I have now seen it fail
during a read as well (it died on the 12th pass after reading the
complete file 11 times).
We also have several usage and/or tuning related questions. First
off, when the file is created there are options for the
"dfile_count" and the "strip_size". Thus far I have left them at
their defaults. Can you comment on what sort of values would be
optimal for sequentially accessed large files? Would tuning the IO
buffer size the application passes to match the strip size be useful?
You're already seeing that matching the stripe size and request
size gives you far fewer cache misses, which is new information we
can add to the tuning guide, or maybe Pete can come up with some
optimizations around that.
Rob pointed out that it's not matching the two that you really want.
The ideal strip size needs to be large enough to prevent a single
request from being many multiples of the stripe size, but small
enough that a request still spans all servers. So for your specific
case, ideally you would have:
strip_size = request_size / number_of_servers
-sam
Usually the strip size is used to control the behavior of disk
IO, since it means the trove layer is able to do reads and writes
in larger chunks. I think we've generally found that for larger
workloads the default strip size is well matched to the size of
requests, so just increasing the strip size shouldn't necessarily
help for sequential accesses.
Up to this point, the dfile_count has only been used to improve
performance of IO on smaller files, by setting the value to 1, so
that small requests are not broken down even further. In your case
it probably makes sense to leave it at its default value.
What 'tuning guide', you say? It's currently a work in
progress :-). If anyone is interested in helping out, especially
with the IB sections, we could really use it.
We also have a problem when running on our IBM EHCAs with
too many memory registrations. The odd part is that I am using the
same 1MB buffer all the time, so I don't see why it seems to be
re-registered on each write. My write code looks like this:
file_req = PVFS_BYTE;
ret = PVFS_Request_contiguous(ioSize, PVFS_BYTE, &mem_req);
if (ret < 0) {
    PVFS_perror("PVFS_Request_contiguous", ret);
    return;
}
ret = PVFS_sys_write(target_object.ref, file_req, bufferedFilePos,
                     myBuffer, mem_req, &credentials, &resp_io);
if (ret == 0) {
    PVFS_Request_free(&mem_req);
    /* return(resp_io.total_completed); */
} else
    PVFS_perror("PVFS_sys_write", ret);
One question is what does PVFS_Request_contiguous actually do?
It creates a request structure that essentially contains the size
and offset into the memory buffer.
Since I am using the same buffer all the time would it be ok to
setup the request once and then reuse it so long as the io size is
the same?
Yes. The request structure doesn't get modified by the IO call.
You (correctly) use PVFS_BYTE for the file request. The reason you
can't just use PVFS_BYTE for the memory request is that the size
has to be encapsulated in the request as well (while the file
request gets tiled based on the actual file size).
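Given that answer, the write path could build the memory request once and reuse it for every write of the same size. The sketch below rearranges the snippet from earlier in this thread; it uses the same variable names (ioSize, mem_req, target_object, etc.), needs libpvfs2 to build, and the loop condition is pseudocode. Note that reusing the request is what the answer above confirms is safe; whether it also cures the EHCA re-registration problem is a separate question about the BMI layer.

    /* One-time setup: build the memory request for the fixed ioSize. */
    ret = PVFS_Request_contiguous(ioSize, PVFS_BYTE, &mem_req);
    if (ret < 0) {
        PVFS_perror("PVFS_Request_contiguous", ret);
        return;
    }

    /* Reused on every flush: the request isn't modified by the IO call. */
    while (/* more buffers to flush */) {
        ret = PVFS_sys_write(target_object.ref, PVFS_BYTE, bufferedFilePos,
                             myBuffer, mem_req, &credentials, &resp_io);
        if (ret != 0)
            PVFS_perror("PVFS_sys_write", ret);
    }

    /* Free once, at teardown, instead of after each write. */
    PVFS_Request_free(&mem_req);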
-sam
Thanks for any help you can provide,
Brett
____________________________________________
Dr. Brett Bode
329 Wilhelm Hall
Ames Laboratory
Iowa State University
Ames, IA 50011 (515) 294-9192
[EMAIL PROTECTED] FAX: (515) 294-4491
____________________________________________
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers