Heh, the job works whenever I do that: http://www.parl.clemson.edu/~bradles/downloads/anl-io-bm-mx-16-2.o168456
However, this run had a really slow write in the second instance: http://www.parl.clemson.edu/~bradles/downloads/anl-io-bm-mx-16-2.o168495

Both include debug from two procs (on separate nodes). Hope that is okay.

Cheers,
Brad

On Thu, Mar 5, 2009 at 1:23 PM, Scott Atchley <[email protected]> wrote:
> Brad,
>
> Can you rerun the job with PVFS2_DEBUGMASK=network exported?
>
> Scott
>
> On Mar 5, 2009, at 12:39 PM, Bradley Settlemyer wrote:
>
>> Scott and Rob,
>>
>> PAV is the PVFS auto-volume service; it allows me to start PVFS for a
>> job on the compute nodes I've scheduled. Effectively, it's a remote
>> configuration tool that takes a config file and configures and starts
>> the PVFS servers on a subset of my job's nodes.
>>
>> Additional requested info . . .
>>
>> MX version:
>>
>> [brad...@node0394:bradles-pav:1009]$ mx_info
>> MX Version: 1.2.7
>> MX Build: w...@node0002:/home/wolf/rpm/BUILD/mx-1.2.7 Wed Dec 3 09:21:26 EST 2008
>> 1 Myrinet board installed.
>> The MX driver is configured to support a maximum of:
>>   16 endpoints per NIC, 1024 NICs on the network, 32 NICs per host
>> ===================================================================
>> Instance #0: 313.6 MHz LANai, PCI-E x8, 2 MB SRAM, on NUMA node 0
>>   Status: Running, P0: Link Up
>>   Network: Myrinet 10G
>>
>>   MAC Address: 00:60:dd:47:23:4e
>>   Product code: 10G-PCIE-8A-C
>>   Part number: 09-03327
>>   Serial number: 338892
>>   Mapper: 00:60:dd:47:21:dd, version = 0x00000063, configured
>>   Mapped hosts: 772
>>
>> PVFS2 is version 2.7.1, built with MX turned on and TCP turned off. I
>> can copy files out of the file system, but writing to the file system
>> is precarious. Data gets written in, but then the client seems to
>> hang. Here is my job output using mpi-io-test:
>>
>> time -p mpiexec -n 2 -npernode 1 /home/bradles/software/anl-io-test/bin/anl-io-test-mx -f pvfs2:/tmp/bradles-pav/mount/anl-io-data
>> # Using mpi-io calls.
>> [E 12:21:32.047891] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 3.
>> [E 12:21:32.058035] msgpair failed, will retry: Operation cancelled (possibly due to timeout)
>> [E 12:26:32.217723] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 56.
>> [E 12:26:32.227774] msgpair failed, will retry: Operation cancelled (possibly due to timeout)
>> =>> PBS: job killed: walltime 610 exceeded limit 600
>>
>> This is writing 32 MB into a file. The data seems to all be there (the
>> file size is 33554432 bytes, exactly 32 x 1024 x 1024), but I guess
>> the writes never return. I don't know how to diagnose what is the
>> matter. Any help is much appreciated.
>>
>> Thanks,
>> Brad
>>
>> On Thu, Mar 5, 2009 at 9:41 AM, Scott Atchley <[email protected]> wrote:
>>>
>>> On Mar 5, 2009, at 8:46 AM, Robert Latham wrote:
>>>
>>>> On Wed, Mar 04, 2009 at 07:15:24PM -0500, Bradley Settlemyer wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I am trying to use PAV to run PVFS with the MX protocol. I've
>>>>> updated PAV so that the servers start and ping correctly. But when
>>>>> I try to run an MPI code, I get client timeouts, as if the client
>>>>> cannot contact the servers:
>>>>>
>>>>> Lots of this stuff:
>>>>>
>>>>> [E 19:11:02.573509] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 3.
>>>>> [E 19:11:02.583659] msgpair failed, will retry: Operation cancelled (possibly due to timeout)
>>>
>>> Brad, which version of MX and PVFS2?
>>>
>>>> OK, so the PVFS utilities are all hunky-dory? Not just pvfs2-ping,
>>>> but pvfs2-cp and pvfs2-ls too?
>>>>
>>>> On Jazz, I usually configure MPICH2 to communicate over TCP and have
>>>> the PVFS system interface communicate over MX. This keeps the
>>>> situation fairly simple, but of course you get awful MPI performance.
>>>>
>>>> Does MX still have the "ports" restriction that GM has? I wonder if
>>>> MPI communication is getting in the way of PVFS communication...
>>>>
>>>> In short, I don't exactly know what's wrong myself. Just tossing out
>>>> some theories.
>>>>
>>>> ==rob
>>>
>>> Rob, MX is limited to 8 endpoints per NIC. One can use mx_info to get
>>> the number:
>>>
>>>   8 endpoints per NIC, 1024 NICs on the network, 32 NICs per host
>>>
>>> This can be increased to 16 with a module parameter.
>>>
>>> Generally, you want no more than one endpoint per process and one
>>> process per core for MPI. When you want to use MPI-IO over PVFS2, each
>>> process will need two endpoints (one for MPI and one for PVFS2), so if
>>> you have eight cores, you should increase the max endpoints to 16.
>>>
>>> Generally, I would not want to limit my MPI to TCP and my IO to MX,
>>> especially if my TCP is over gigabit Ethernet. Unless your IO can
>>> exceed the link rate, there will be plenty of bandwidth left over for
>>> MPI, and your latency will stay much lower than with TCP.
>>>
>>> What is PAV?
>>>
>>> Scott
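A note on Scott's endpoint arithmetic: with MPI-IO over PVFS2, each MPI process opens one MX endpoint for MPI itself and a second for the PVFS2 client, so eight processes on an eight-core node need 8 x 2 = 16 endpoints, double MX's default per-NIC limit of 8. The limit in effect can be read from mx_info; the one-liner below is a sketch that assumes the output format quoted in Brad's mail:

    mx_info | grep 'endpoints per NIC'
    # On Brad's nodes this prints:
    #   16 endpoints per NIC, 1024 NICs on the network, 32 NICs per host

Brad's driver is already configured for 16 endpoints, and his test runs only one process per node (-npernode 1), so the endpoint limit is unlikely to be the cause of his timeouts.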
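For readers reproducing the thread, the debug rerun Scott requested would look roughly like the sketch below. The mpiexec invocation is copied from Brad's mail; the log filename is illustrative, and depending on the MPI launcher the variable may need to be forwarded to remote ranks explicitly rather than merely exported:

    # Enable PVFS2 client network-layer debug output, then rerun the test job.
    export PVFS2_DEBUGMASK=network
    time -p mpiexec -n 2 -npernode 1 \
        /home/bradles/software/anl-io-test/bin/anl-io-test-mx \
        -f pvfs2:/tmp/bradles-pav/mount/anl-io-data 2>&1 | tee debug-run.log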
