Hi, guys. We're doing research on enabling HPC applications on cloud filesystems. For comparison purposes, we're using a PVFS2 filesystem as an example of a mature HPC filesystem. However, we're getting unusually bad performance from our cluster on the MPI applications we're testing, which write to PVFS2 through mpich2, and we want to know if there are any configuration parameters we might be forgetting to tweak.
We've tried two applications: LANL's mpi_io_test (http://institutes.lanl.gov/data/software/src/mpi-io/README_20.pdf) and the IOR benchmark. Both run through the mpich2 library, compiled with PVFS2 support; in other words, we're connecting to PVFS2 through MPI-IO rather than through the kernel module.

We're using PVFS2 version 2.8.1, configured with a stripe size of 1 MB, and the mpi_io_test program writes 1 MB objects. The cluster has a 1 GigE network. We vary the number of servers/clients from 1 to 40. Every server handles both metadata and data, and clients and servers share the same nodes (this makes things more comparable to the typical cloud computing situation).

The performance we're seeing is on the order of 15 to 20 MB/s per node. This is much lower than what we get from network tools such as crisscross or iperf, which report 45-50 MB/s between most pairs of nodes (still too low, but more than 20 MB/s). It doesn't seem to be a disk speed issue: we've tested with TroveMethod set to null-aio and metadata syncing turned off and still get the same results. So we believe it might be a networking issue with the way we're sending MPI messages. Perhaps they're too small?

Any advice on parameters we should look into tweaking for MPI to get performance closer to that of the raw network? I've pasted the relevant fs.conf excerpt and a stripped-down sketch of the write loop below our signature, in case that helps. Any help is appreciated.

Thanks.
~Milo and Esteban
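
P.S. For reference, the relevant parts of the filesystem section of our fs.conf look roughly like this (typed from memory, with aliases and handle ranges omitted, so take the exact spelling with a grain of salt). The null-aio / no-sync settings are from the disk-elimination test described above:

    <Distribution>
        Name simple_stripe
        Param strip_size
        Value 1048576
    </Distribution>

    <StorageHints>
        TroveMethod null-aio
        TroveSyncMeta no
    </StorageHints>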
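
And here is a boiled-down sketch of the kind of write loop we're timing, to show what we mean by 1 MB objects. This is not the actual mpi_io_test code; the file name, object count, and the striping_unit hint are just placeholders for this message:

/* Sketch of the access pattern: each rank writes contiguous 1 MB
 * blocks at rank-strided offsets through MPI-IO (ROMIO's PVFS2
 * driver).  Not the actual mpi_io_test source. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

#define OBJ_SIZE (1 << 20)   /* 1 MB, matching the PVFS2 strip size */
#define NUM_OBJS 100         /* made-up count for this sketch       */

int main(int argc, char **argv)
{
    int rank, nprocs, i;
    MPI_File fh;
    MPI_Info info;
    char *buf;
    double start, elapsed;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    buf = malloc(OBJ_SIZE);
    memset(buf, rank, OBJ_SIZE);

    /* striping_unit is a standard MPI-IO hint; whether ROMIO's PVFS2
     * driver honors it here is part of what we're unsure about. */
    MPI_Info_create(&info);
    MPI_Info_set(info, "striping_unit", "1048576");

    /* "pvfs2:" prefix selects ROMIO's PVFS2 driver; path is a placeholder. */
    MPI_File_open(MPI_COMM_WORLD, "pvfs2:/mnt/pvfs2/testfile",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    MPI_Barrier(MPI_COMM_WORLD);
    start = MPI_Wtime();

    for (i = 0; i < NUM_OBJS; i++) {
        MPI_Offset off = ((MPI_Offset)i * nprocs + rank) * OBJ_SIZE;
        MPI_File_write_at(fh, off, buf, OBJ_SIZE, MPI_BYTE,
                          MPI_STATUS_IGNORE);
    }

    MPI_File_close(&fh);
    elapsed = MPI_Wtime() - start;

    /* Per-rank bandwidth; with one rank per node this is MB/s per node. */
    if (rank == 0)
        printf("%.2f MB/s per rank\n",
               (double)NUM_OBJS * OBJ_SIZE / (1 << 20) / elapsed);

    free(buf);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}

The writes in this sketch are independent (MPI_File_write_at) rather than collective, so suggestions about collective I/O or ROMIO hints are equally welcome.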
