Hi,

We are seeing rather strange, incorrect behavior with PVFS2 when using an MPI-IO file view at different access levels :)
mpiexec -np 2 ./MPI-IO -i 4 -f pvfs2://pvfs2/test -s 10 level0

0000000 0000 0000 0000 0000 0000 0101 0101 0101
0000010 0101 0101 0101 0101 0101 0101 0101 0101
*
0000030 0101 0000 0000 0000 0000 0000 0000 0000
0000040 0000 0000 0000
0000046

mpiexec -np 2 ./MPI-IO -i 4 -f pvfs2://pvfs2/test -s 10 level2

0000000 0000 0000 0000 0000 0000 0101 0101 0101
0000010 0101 0101 0000 0000 0000 0000 0000 0101
0000020 0101 0101 0101 0101 0000 0000 0000 0000
0000030 0000 0101 0101 0101 0101 0101 0101 0101
0000040 0101 0101 0101
0000046

With these levels, in addition, the number of bytes transferred between client and servers does not match the amount of data that should be moved.

With level3 (non-contig, coll) and level1 (coll, contig) the output looks correct:

0000000 0000 0000 0000 0000 0000 0101 0101 0101
0000010 0101 0101 0000 0000 0000 0000 0000 0101
0000020 0101 0101 0101 0101 0000 0000 0000 0000
0000030 0000 0101 0101 0101 0101 0101 0000 0000
0000040 0000 0000 0000 0101 0101 0101 0101 0101
0000050

The minimum setup where this error occurred was with 3 data servers. However, sometimes, for example with 4 data servers, the bug may disappear. Using 5 data servers and a bigger file (500K):

mpiexec -np 4 ./MPI-IO -i 10 -f pvfs2://pvfs2/test -s 50K level2

shows that the content of the file differs from run to run. The md5sums might be, for example:

c809928d82ca72e00469283f2450c5f0
7d215f060b113f81c2210ac6e8e4c6d9
b4ca34c8a8a7b06a9b6d29e4b78964c3

Software: PVFS2 CVS as of 03/08/07 plus the new tiled-types-for-mkuhn.diff patch, with the current mpich2-1.0.5-p3.

I did some runs for the levels under valgrind. Among other reported issues, it showed the following for level0 and level2:

==18294== Invalid read of size 4
==18294==    at 0x80EF461: ADIOI_PVFS2_WriteStrided (ad_pvfs2_write.c:392)
==18294==    by 0x80AA299: MPIOI_File_write (write.c:156)
==18294==    by 0x80A9C80: PMPI_File_write (write.c:52)
==18294==    by 0x8056706: ??? (log_mpi_io.c:871)
==18294==    by 0x804ACDA: Test_level0 (MPI-IO.c:75)
==18294==    by 0x804B699: main (MPI-IO.c:309)
==18294==  Address 0x4771460 is 0 bytes after a block of size 8 alloc'd
==18294==    at 0x401B867: malloc (vg_replace_malloc.c:149)
==18294==    by 0x80B505C: ADIOI_Malloc_fn (malloc.c:50)
==18294==    by 0x80B4D66: ADIOI_Optimize_flattened (flatten.c:759)
==18294==    by 0x80B3036: ADIOI_Flatten_datatype (flatten.c:79)
==18294==    by 0x80BF8C8: ADIO_Set_view (ad_set_view.c:52)
==18294==    by 0x80AA85A: PMPI_File_set_view (set_view.c:138)
==18294==    by 0x8055CDE: MPI_File_set_view (log_mpi_io.c:611)
==18294==    by 0x804AC80: Test_level0 (MPI-IO.c:70)
==18294==    by 0x804B699: main (MPI-IO.c:309)

A similar report appears for reads in ReadStrided. These issues are not reported for the other levels and look rather suspicious to me.

The following issue is common to all levels:

==18315== Conditional jump or move depends on uninitialised value(s)
==18315==    at 0x8121869: PINT_distribute (pint-request.c:740)
==18315==    by 0x811FB0B: PINT_process_request (pint-request.c:322)
==18315==    by 0x8139641: small_io_completion_fn (sys-small-io.sm:257)
==18315==    by 0x8180DD9: msgpairarray_completion_fn (msgpairarray.sm:547)
==18315==    by 0x812A648: PINT_state_machine_next (state-machine-fns.h:158)
==18315==    by 0x8129D3D: PINT_client_state_machine_test (client-state-machine.c:559)
==18315==    by 0x812A1C3: PINT_client_wait_internal (client-state-machine.c:733)
==18315==    by 0x812A3C5: PVFS_sys_wait (client-state-machine.c:861)
==18315==    by 0x813300A: PVFS_sys_io (sys-io.sm:351)
==18315==    by 0x80ECCCD: ADIOI_PVFS2_ReadStrided (ad_pvfs2_read.c:500)
==18315==    by 0x80A9571: MPIOI_File_read (read.c:151)
==18315==    by 0x80A8F58: PMPI_File_read (read.c:52)

Thanks,
Julian
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
