It must be something with my MPICH build or the MPI-IO code linked
against it, because all of the pvfs2 tools work, including pvfs2-cp in
both directions.
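
To rule out the test program itself, here is the sort of stripped-down
MPI-IO reproducer I'm planning to try (only a sketch, not the actual
mpi-io-test source; the 16 MB-per-rank size and the pvfs2: mount path are
taken from the job output below).  If this minimal open/write/close hangs
the same way, the problem is in the linked MPI-IO code rather than in the
test program:

/*
 * Minimal MPI-IO reproducer (a sketch only): open a pvfs2:-prefixed
 * file, write one 16 MB block per rank, close.  The error handler is
 * set to MPI_ERRORS_RETURN so a problem in the linked MPI-IO layer
 * shows up as an error string instead of an abort.  Build with the
 * same mpicc as the failing code and run, for example:
 *   mpiexec -n 2 -npernode 1 ./repro pvfs2:/tmp/bradles-pav/mount/repro-data
 */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define BLOCK (16 * 1024 * 1024)    /* 16 MB per rank; 2 ranks = 32 MB */

static void check(int rc, const char *where)
{
    if (rc != MPI_SUCCESS) {
        char msg[MPI_MAX_ERROR_STRING];
        int len;
        MPI_Error_string(rc, msg, &len);
        fprintf(stderr, "%s failed: %s\n", where, msg);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
}

int main(int argc, char **argv)
{
    static char buf[BLOCK];         /* static so it is not on the stack */
    MPI_File fh;
    int rank, rc;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (argc < 2) {
        if (rank == 0)
            fprintf(stderr, "usage: %s pvfs2:<path>\n", argv[0]);
        MPI_Finalize();
        return 1;
    }

    /* Report errors instead of aborting inside the MPI-IO calls. */
    MPI_File_set_errhandler(MPI_FILE_NULL, MPI_ERRORS_RETURN);
    memset(buf, rank, sizeof(buf));

    rc = MPI_File_open(MPI_COMM_WORLD, argv[1],
                       MPI_MODE_CREATE | MPI_MODE_WRONLY,
                       MPI_INFO_NULL, &fh);
    check(rc, "MPI_File_open");

    rc = MPI_File_write_at(fh, (MPI_Offset)rank * BLOCK, buf, BLOCK,
                           MPI_BYTE, MPI_STATUS_IGNORE);
    check(rc, "MPI_File_write_at");

    rc = MPI_File_close(&fh);
    check(rc, "MPI_File_close");

    MPI_Finalize();
    return 0;
}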

Cheers,
brad


On Thu, Mar 5, 2009 at 12:39 PM, Bradley Settlemyer
<[email protected]> wrote:
> Scott and Rob,
>
>  PAV is the PVFS auto-volume service; it lets me start PVFS for a
> job on the compute nodes I've been allocated.  Effectively, it's a
> remote configuration tool that takes a config file, then configures and
> starts the PVFS servers on a subset of my job's nodes.
>
> Additional requested info . . .
> MX version:
> [brad...@node0394:bradles-pav:1009]$ mx_info
> MX Version: 1.2.7
> MX Build: w...@node0002:/home/wolf/rpm/BUILD/mx-1.2.7 Wed Dec  3
> 09:21:26 EST 2008
> 1 Myrinet board installed.
> The MX driver is configured to support a maximum of:
>        16 endpoints per NIC, 1024 NICs on the network, 32 NICs per host
> ===================================================================
> Instance #0:  313.6 MHz LANai, PCI-E x8, 2 MB SRAM, on NUMA node 0
>        Status:         Running, P0: Link Up
>        Network:        Myrinet 10G
>
>        MAC Address:    00:60:dd:47:23:4e
>        Product code:   10G-PCIE-8A-C
>        Part number:    09-03327
>        Serial number:  338892
>        Mapper:         00:60:dd:47:21:dd, version = 0x00000063, configured
>        Mapped hosts:   772
>
>
> PVFS2 is version 2.7.1, built with MX enabled and TCP disabled.  I can
> copy files out of the file system, but writing to the file system is
> precarious: the data appears to get written, but then the job seems to
> hang.  Here is my job output using mpi-io-test:
>
> time -p mpiexec -n 2 -npernode 1
> /home/bradles/software/anl-io-test/bin/anl-io-test-mx -f
> pvfs2:/tmp/bradles-pav/mount/anl-io-data
> # Using mpi-io calls.
> [E 12:21:32.047891] job_time_mgr_expire: job time out: cancelling bmi
> operation, job_id: 3.
> [E 12:21:32.058035] msgpair failed, will retry: Operation cancelled
> (possibly due to timeout)
> [E 12:26:32.217723] job_time_mgr_expire: job time out: cancelling bmi
> operation, job_id: 56.
> [E 12:26:32.227774] msgpair failed, will retry: Operation cancelled
> (possibly due to timeout)
> =>> PBS: job killed: walltime 610 exceeded limit 600
>
> This run writes 32 MB into a file.  The data all seems to be there (the
> file size is 33554432 bytes), but the write calls apparently never
> return.  I don't know how to diagnose what is going wrong.  Any help is
> much appreciated.
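>
> As a next debugging step I'm thinking of something like the sketch
> below (untested, and the MPI_File_sync call is just my guess at where
> to look): print a wall-clock stamp after each MPI-IO call so the PBS
> log shows exactly which call never returns.
>
> /*
>  * Per-phase timing sketch: stamp open, write, sync, and close
>  * separately.  File name comes from argv[1]; 16 MB per rank as in
>  * the job above.
>  */
> #include <mpi.h>
> #include <stdio.h>
>
> #define BLOCK (16 * 1024 * 1024)
>
> static void stamp(int rank, const char *phase, double t0)
> {
>     /* Flush immediately so the last completed phase is in the log
>      * even if the job is killed by the scheduler. */
>     printf("rank %d: %-18s done at t=%.3f s\n", rank, phase,
>            MPI_Wtime() - t0);
>     fflush(stdout);
> }
>
> int main(int argc, char **argv)
> {
>     static char buf[BLOCK];   /* zero-filled payload, off the stack */
>     MPI_File fh;
>     double t0;
>     int rank;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     t0 = MPI_Wtime();
>
>     MPI_File_open(MPI_COMM_WORLD, argv[1],
>                   MPI_MODE_CREATE | MPI_MODE_WRONLY,
>                   MPI_INFO_NULL, &fh);
>     stamp(rank, "MPI_File_open", t0);
>
>     MPI_File_write_at(fh, (MPI_Offset)rank * BLOCK, buf, BLOCK,
>                       MPI_BYTE, MPI_STATUS_IGNORE);
>     stamp(rank, "MPI_File_write_at", t0);
>
>     MPI_File_sync(fh);
>     stamp(rank, "MPI_File_sync", t0);
>
>     MPI_File_close(&fh);
>     stamp(rank, "MPI_File_close", t0);
>
>     MPI_Finalize();
>     return 0;
> }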
>
> Thanks
> Brad
>
>
>
> On Thu, Mar 5, 2009 at 9:41 AM, Scott Atchley <[email protected]> wrote:
>> On Mar 5, 2009, at 8:46 AM, Robert Latham wrote:
>>
>>> On Wed, Mar 04, 2009 at 07:15:24PM -0500, Bradley Settlemyer wrote:
>>>>
>>>> Hello
>>>>
>>>>  I am trying to use PAV to run pvfs with the MX protocol.  I've
>>>> updated pav so that servers start and ping correctly.  But when I try
>>>> and run an mpi code, I'm getting client timeouts like the client
>>>> cannot contact the servers:
>>>>
>>>> Lots of this stuff:
>>>>
>>>> [E 19:11:02.573509] job_time_mgr_expire: job time out: cancelling bmi
>>>> operation, job_id: 3.
>>>> [E 19:11:02.583659] msgpair failed, will retry: Operation cancelled
>>>> (possibly due to timeout)
>>
>> Brad, which version of MX and PVFS2?
>>
>>> OK, so the pvfs2 utilities are all hunky-dory?  Not just pvfs2-ping,
>>> but pvfs2-cp and pvfs2-ls as well?
>>>
>>> On Jazz, I usually configure MPICH2 to communicate over TCP and have
>>> the PVFS system interface communicate over MX.  This keeps the
>>> situation fairly simple, but of course you get awful MPI performance.
>>>
>>> Does MX still have the "ports" restriction that GM has?  I wonder if
>>> MPI communication is getting in the way of PVFS communication...
>>>
>>> In short, I don't exactly know what's wrong myself.  I'm just tossing
>>> out some theories.
>>>
>>> ==rob
>>
>> Rob, MX is limited to 8 endpoints per NIC by default.  You can use
>> mx_info to check the current limit:
>>
>> 8 endpoints per NIC, 1024 NICs on the network, 32 NICs per host
>>
>> This can be increased to 16 with a module parameter.
>>
>> Generally, you want no more than one endpoint per process and one
>> process per core for MPI.  When you use MPI-IO over PVFS2, each process
>> will need two endpoints (one for MPI and one for PVFS2), so on an
>> eight-core node you should increase the maximum endpoints to 16.
>>
>> Generally, I would not want to limit my MPI to TCP and my I/O to MX,
>> especially if the TCP is over gigabit Ethernet.  Unless your I/O can
>> exceed the Myrinet link rate, there will be plenty of bandwidth left
>> over for MPI, and your latency will stay much lower than it would over
>> TCP.
>>
>> What is PAV?
>>
>> Scott
>>
>

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
