Hi Julian,
I have a few ideas for you to try to help narrow down these bugs. I'm not sure how well the small-io stuff will work with non-contig. It was never rigorously tested. Can you recompile with -DPVFS2_SMALL_IO_OFF and run your tests again?
I've attached a patch that fixes the last valgrind error in your list (in PINT_distribute). Can you try it and let me know if that fixes it?
Thanks, -sam
memset-fdata-smallio.patch
Description: Binary data
On Mar 8, 2007, at 10:50 AM, Julian Martin Kunkel wrote:
Hi,

We see a rather strange and wrong behavior with PVFS2 using a file view with MPI-IO using different levels :)

mpiexec -np 2 ./MPI-IO -i 4 -f pvfs2://pvfs2/test -s 10 level0
0000000 0000 0000 0000 0000 0000 0101 0101 0101
0000010 0101 0101 0101 0101 0101 0101 0101 0101
*
0000030 0101 0000 0000 0000 0000 0000 0000 0000
0000040 0000 0000 0000
0000046

mpiexec -np 2 ./MPI-IO -i 4 -f pvfs2://pvfs2/test -s 10 level2
0000000 0000 0000 0000 0000 0000 0101 0101 0101
0000010 0101 0101 0000 0000 0000 0000 0000 0101
0000020 0101 0101 0101 0101 0000 0000 0000 0000
0000030 0000 0101 0101 0101 0101 0101 0101 0101
0000040 0101 0101 0101
0000046

With this level, in addition, the number of bytes transferred between client and servers does not match the amount of data it should be...

With level3 (non-contig, coll) and level1 (coll, contig) it looks correct, like:
0000000 0000 0000 0000 0000 0000 0101 0101 0101
0000010 0101 0101 0000 0000 0000 0000 0000 0101
0000020 0101 0101 0101 0101 0000 0000 0000 0000
0000030 0000 0101 0101 0101 0101 0101 0000 0000
0000040 0000 0000 0000 0101 0101 0101 0101 0101
0000050

The minimum setup where this error occurred was with 3 data servers. However, sometimes, for example with 4 data servers, the bug may disappear. Using 5 data servers and a bigger file (500K) (mpiexec -np 4 ./MPI-IO -i 10 -f pvfs2://pvfs2/test -s 50K level2) shows that the content of the file is different for different runs.
The md5sum might be for example:
c809928d82ca72e00469283f2450c5f0
7d215f060b113f81c2210ac6e8e4c6d9
b4ca34c8a8a7b06a9b6d29e4b78964c3

Software: PVFS2 03/08/07 CVS and the new tiled-types-for-mkuhn.diff patch with the current mpich2-1.0.5-p3...

I did some runs for the levels with valgrind; this showed (among other reported issues) the following in level0 and level2:
==18294== Invalid read of size 4
==18294==    at 0x80EF461: ADIOI_PVFS2_WriteStrided (ad_pvfs2_write.c:392)
==18294==    by 0x80AA299: MPIOI_File_write (write.c:156)
==18294==    by 0x80A9C80: PMPI_File_write (write.c:52)
==18294==    by 0x8056706: ??? (log_mpi_io.c:871)
==18294==    by 0x804ACDA: Test_level0 (MPI-IO.c:75)
==18294==    by 0x804B699: main (MPI-IO.c:309)
==18294==  Address 0x4771460 is 0 bytes after a block of size 8 alloc'd
==18294==    at 0x401B867: malloc (vg_replace_malloc.c:149)
==18294==    by 0x80B505C: ADIOI_Malloc_fn (malloc.c:50)
==18294==    by 0x80B4D66: ADIOI_Optimize_flattened (flatten.c:759)
==18294==    by 0x80B3036: ADIOI_Flatten_datatype (flatten.c:79)
==18294==    by 0x80BF8C8: ADIO_Set_view (ad_set_view.c:52)
==18294==    by 0x80AA85A: PMPI_File_set_view (set_view.c:138)
==18294==    by 0x8055CDE: MPI_File_set_view (log_mpi_io.c:611)
==18294==    by 0x804AC80: Test_level0 (MPI-IO.c:70)
==18294==    by 0x804B699: main (MPI-IO.c:309)

Similar for reads in ReadStrided...

These issues are not reported for the other levels and look rather suspicious to me...
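Going back to the hexdumps earlier in this message: a minimal sketch for checking a dump against the layout that the correct (level1/level3) output seems to imply. It assumes, based only on that dump, that in each iteration every rank fills one block of -s bytes with its own rank number, in rank order. The helper names expected_image and first_mismatch are made up for illustration and are not part of the MPI-IO test program or of PVFS2.

```python
def expected_image(nprocs, iterations, block):
    """Build the assumed file image: per iteration, rank r fills one block with byte r."""
    image = bytearray()
    for _ in range(iterations):
        for rank in range(nprocs):
            image += bytes([rank]) * block
    return bytes(image)


def first_mismatch(actual, expected):
    """Return the offset of the first differing byte, or None if both are identical.

    A length difference counts as a mismatch at the end of the shorter input.
    """
    for offset, (a, e) in enumerate(zip(actual, expected)):
        if a != e:
            return offset
    if len(actual) != len(expected):
        return min(len(actual), len(expected))
    return None


if __name__ == "__main__":
    # Assumed layout for: mpiexec -np 2 ./MPI-IO -i 4 ... -s 10
    want = expected_image(nprocs=2, iterations=4, block=10)
    # level2 dump from this message, transcribed by hand (70 bytes).
    level2 = (bytes(10) + b"\x01" * 10 + bytes(10) + b"\x01" * 10
              + bytes(10) + b"\x01" * 20)
    print(len(want), first_mismatch(level2, want))
```

In practice the bytes would be read from the file after copying it out of PVFS2 rather than transcribed by hand; the same comparison could then be repeated across several runs to see where the differing md5sums come from.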
The following issue is common to all levels:
==18315== Conditional jump or move depends on uninitialised value(s)
==18315==    at 0x8121869: PINT_distribute (pint-request.c:740)
==18315==    by 0x811FB0B: PINT_process_request (pint-request.c:322)
==18315==    by 0x8139641: small_io_completion_fn (sys-small-io.sm:257)
==18315==    by 0x8180DD9: msgpairarray_completion_fn (msgpairarray.sm:547)
==18315==    by 0x812A648: PINT_state_machine_next (state-machine-fns.h:158)
==18315==    by 0x8129D3D: PINT_client_state_machine_test (client-state-machine.c:559)
==18315==    by 0x812A1C3: PINT_client_wait_internal (client-state-machine.c:733)
==18315==    by 0x812A3C5: PVFS_sys_wait (client-state-machine.c:861)
==18315==    by 0x813300A: PVFS_sys_io (sys-io.sm:351)
==18315==    by 0x80ECCCD: ADIOI_PVFS2_ReadStrided (ad_pvfs2_read.c:500)
==18315==    by 0x80A9571: MPIOI_File_read (read.c:151)
==18315==    by 0x80A8F58: PMPI_File_read (read.c:52)

Thanks,
Julian
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
