Heh, the job works whenever I do that: http://www.parl.clemson.edu/~bradles/downloads/anl-io-bm-mx-16-2.o168456
However, this run had a really slow write in the second instance: http://www.parl.clemson.edu/~bradles/downloads/anl-io-bm-mx-16-2.o168495

Both include debug from two procs (on separate nodes). Hope that is okay.

Cheers,
Brad

On Thu, Mar 5, 2009 at 1:23 PM, Scott Atchley <[email protected]> wrote:
> Brad,
>
> Can you rerun the job with PVFS2_DEBUGMASK=network exported?
>
> Scott
>
> On Mar 5, 2009, at 12:39 PM, Bradley Settlemyer wrote:
>
>> Scott and Rob,
>>
>> PAV is the PVFS auto-volume service; it allows me to start PVFS for a
>> job on the compute nodes I've scheduled. Effectively, it's a remote
>> configuration tool that takes a config file and configures and starts
>> the PVFS servers on a subset of my job's nodes.
>>
>> Additional requested info . . .
>>
>> MX version:
>>
>> [brad...@node0394:bradles-pav:1009]$ mx_info
>> MX Version: 1.2.7
>> MX Build: w...@node0002:/home/wolf/rpm/BUILD/mx-1.2.7 Wed Dec 3 09:21:26 EST 2008
>> 1 Myrinet board installed.
>> The MX driver is configured to support a maximum of:
>>   16 endpoints per NIC, 1024 NICs on the network, 32 NICs per host
>> ===================================================================
>> Instance #0: 313.6 MHz LANai, PCI-E x8, 2 MB SRAM, on NUMA node 0
>>   Status: Running, P0: Link Up
>>   Network: Myrinet 10G
>>
>>   MAC Address: 00:60:dd:47:23:4e
>>   Product code: 10G-PCIE-8A-C
>>   Part number: 09-03327
>>   Serial number: 338892
>>   Mapper: 00:60:dd:47:21:dd, version = 0x00000063, configured
>>   Mapped hosts: 772
>>
>> PVFS2 is version 2.7.1, built with MX turned on and TCP turned off. I
>> can copy files out of the file system, but writing to the file system
>> is precarious. Data gets written in, but then the client seems to
>> hang. Here is my job output using mpi-io-test:
>>
>> time -p mpiexec -n 2 -npernode 1 /home/bradles/software/anl-io-test/bin/anl-io-test-mx -f pvfs2:/tmp/bradles-pav/mount/anl-io-data
>> # Using mpi-io calls.
>> [E 12:21:32.047891] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 3.
>> [E 12:21:32.058035] msgpair failed, will retry: Operation cancelled (possibly due to timeout)
>> [E 12:26:32.217723] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 56.
>> [E 12:26:32.227774] msgpair failed, will retry: Operation cancelled (possibly due to timeout)
>> =>> PBS: job killed: walltime 610 exceeded limit 600
>>
>> This is writing 32 MB into a file. The data seems to all be there (the
>> file size is 33554432 bytes, exactly 32 x 1024 x 1024), but I guess
>> the writes never return. I don't know how to diagnose what is the
>> matter. Any help is much appreciated.
>>
>> Thanks,
>> Brad
>>
>> On Thu, Mar 5, 2009 at 9:41 AM, Scott Atchley <[email protected]> wrote:
>>>
>>> On Mar 5, 2009, at 8:46 AM, Robert Latham wrote:
>>>
>>>> On Wed, Mar 04, 2009 at 07:15:24PM -0500, Bradley Settlemyer wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I am trying to use PAV to run PVFS with the MX protocol. I've
>>>>> updated PAV so that the servers start and ping correctly. But when
>>>>> I try to run an MPI code, I get client timeouts, as if the client
>>>>> cannot contact the servers:
>>>>>
>>>>> Lots of this stuff:
>>>>>
>>>>> [E 19:11:02.573509] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 3.
>>>>> [E 19:11:02.583659] msgpair failed, will retry: Operation cancelled (possibly due to timeout)
>>>
>>> Brad, which version of MX and PVFS2?
>>>
>>>> OK, so the PVFS utilities are all hunky-dory? Not just pvfs2-ping,
>>>> but pvfs2-cp and pvfs2-ls too?
>>>>
>>>> On Jazz, I usually configure MPICH2 to communicate over TCP and have
>>>> the PVFS system interface communicate over MX. This keeps the
>>>> situation fairly simple, but of course you get awful MPI performance.
>>>>
>>>> Does MX still have the "ports" restriction that GM has? I wonder if
>>>> MPI communication is getting in the way of PVFS communication...
>>>>
>>>> In short, I don't exactly know what's wrong myself. Just tossing out
>>>> some theories.
>>>>
>>>> ==rob
>>>
>>> Rob, MX is limited to 8 endpoints per NIC. One can use mx_info to get
>>> the number:
>>>
>>>   8 endpoints per NIC, 1024 NICs on the network, 32 NICs per host
>>>
>>> This can be increased to 16 with a module parameter.
>>>
>>> Generally, you want no more than one endpoint per process and one
>>> process per core for MPI. When you want to use MPI-IO over PVFS2, each
>>> process will need two endpoints (one for MPI and one for PVFS2), so if
>>> you have eight cores, you should increase the max endpoints to 16.
>>>
>>> Generally, I would not want to limit my MPI to TCP and my IO to MX,
>>> especially if my TCP is over gigabit Ethernet. Unless your IO can
>>> exceed the link rate, there will be plenty of bandwidth left over for
>>> MPI, and your latency will stay much lower than with TCP.
>>>
>>> What is PAV?
>>>
>>> Scott
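A note on Scott's endpoint arithmetic: with MPI-IO over PVFS2, each MPI process opens one MX endpoint for MPI itself and a second for the PVFS2 client, so eight processes on an eight-core node need 8 x 2 = 16 endpoints, double MX's default per-NIC limit of 8. The limit in effect can be read from mx_info; the one-liner below is a sketch that assumes the output format quoted in Brad's mail:

    mx_info | grep 'endpoints per NIC'
    # On Brad's nodes this prints:
    #   16 endpoints per NIC, 1024 NICs on the network, 32 NICs per host

Brad's driver is already configured for 16 endpoints, and his test runs only one process per node (-npernode 1), so the endpoint limit is unlikely to be the cause of his timeouts.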
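For readers reproducing the thread, the debug rerun Scott requested would look roughly like the sketch below. The mpiexec invocation is copied from Brad's mail; the log filename is illustrative, and depending on the MPI launcher the variable may need to be forwarded to remote ranks explicitly rather than merely exported:

    # Enable PVFS2 client network-layer debug output, then rerun the test job.
    export PVFS2_DEBUGMASK=network
    time -p mpiexec -n 2 -npernode 1 \
        /home/bradles/software/anl-io-test/bin/anl-io-test-mx \
        -f pvfs2:/tmp/bradles-pav/mount/anl-io-data 2>&1 | tee debug-run.log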
