We have found that when using PVFS with ROMIO under Open MPI, we get errors when the task count exceeds 128 with 1 MB messages. Smaller message sizes and larger task counts also produce the same error, just not as consistently or as quickly. The errors we see look like:
[E 15:05:50.012128] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 34.
[E 15:05:50.012380] msgpair failed, will retry: Operation cancelled (possibly due to timeout)

Writing to an NFS-mounted file system instead of PVFS works fine, even with 256 tasks. Our version of PVFS is 2.6.2. Both Open MPI 1.1.x and 1.2 produce the same errors.

Are there any known limitations with ROMIO and PVFS? We can supply test code if you are interested in reproducing the problem. The code should compile with MPICH as well as Open MPI.

Regards,
Jan Lindheim

_______________________________________________
Pvfs2-users mailing list
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
