On Jun 14, 2006, at 5:05 PM, Avery Ching wrote:
Certainly. I think I was able to identify at least one bug. The small I/O path is selected based on whether the amount of data going to the I/O servers is below max_unexp_payload. However, when the request gets to the server, small-io.sm calls PINT_Process_request() once, then calls job_trove_bstream_write_list() once, and returns. If the number of stream offset-length pairs generated is greater than SMALL_IO_MAX_REGIONS, the operation won't finish. This won't show up in the list I/O path, since we break requests up at 64 ol-pairs, but it does show up in the datatype I/O path, which doesn't get broken up. You could probably trigger it in list I/O by just making SMALL_IO_MAX_REGIONS smaller.

Suggestions for fixing:

1) (Preferred) Loop around the job_trove_bstream_write_list() and job_trove_bstream_read_list() calls to keep moving data until the entire datatype has been satisfied.

2) (Alternative) Make the offset-length pair limit part of the eligibility check for small I/O.
Thanks for debugging this, Avery. For now I went with option #2 since it's easier :-). If you find that small I/O is a big improvement for list I/O, then we can change it to option #1. Can you let me know if this patch fixes the problem for you?
Thanks, -sam
smallio.patch
Description: Binary data
Avery

On Tue, 2006-06-13 at 17:44 -0500, Avery Ching wrote:
I was able to repeat the bug on the 4-server, 20-client setup you had. I also made it happen with 1 client and 2 servers. It seems to work fine with 1 server and 1 client, or 1 server and 20 clients; therefore, this is probably a multi-server issue. I'll investigate further and let you know the progress. I hope it's not another one of those PINT_Process_req() or flow-type problems!

Avery

On Mon, 12 Jun 2006, Avery Ching wrote:
Yeah, I have. I am not sure exactly what the problem is, to be honest. Basically, that error message is just reporting what it got from the PVFS_sys_write() call, so it could be a lot of things. The odd thing, though, is that it seems to happen at random places. The test works fine for other sizes and just fails on certain ones. I'm wondering whether it's related to the flow or PINT_process_request() problems we've been seeing on the listserv. Oddly enough, when I did my IPDPS testing, I never ran into that issue for writes, only for reads (and only sometimes; hence I had no read results =) ). Unfortunately, debugging the flow and PINT_process_req() areas is quite difficult. I'll try to look into it a bit, though, and at least see if I can repeat the bug. I suspect that the write call is not returning the correct amount of data processed.

Avery

On Mon, 12 Jun 2006, Robert Latham wrote:
Hi Avery. I've got another hpio bug: with 4 servers and 20 clients, hpio ran for a long, long time and then died like this:

write | region_count | c-nc | datatype
----------------time (seconds)--------------|-bandwidth (MB/s)|---test type---
 open  |   io   |  sync  | close  | total  |   IO   | IOsyn  | region_count
 0.062 |  8.160 |  0.208 |  0.000 |  8.429 |  0.031 |  0.030 |    2048

ADIOI_PVFS2_StridedDtypeIO: Warning - PVFS_sys_read/write returned -1610612737 and completed -4611717612071138032 bytes.
ADIOI_PVFS2_StridedDtypeIO: Warning - PVFS_sys_read/write returned -1610612737 and completed -4611717612071081488 bytes.

Seen anything like this before?
==rob
--
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA                B29D F333 664A 4280 315B
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
