Matthew, Sam-

FYI, using jumbo frames is not necessarily that simple. The IBM file servers that came with our BG/L don't support jumbo frames on the internal NICs. Ours are IBM x346 type 8840, and the built-in Broadcom NICs couldn't handle jumbo frames; I imagine other xSeries boxes with integrated Broadcom NICs may have similar issues. We ended up buying PCI network cards in order to implement jumbo frames in our environment. Also, you'll need to make sure your network switch can handle jumbo frames (ours is a Force10; I don't know the exact model off the top of my head, but it supports jumbo frames).
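
If you're not sure whether a particular NIC and driver will accept jumbo frames, a crude but quick test (assuming a Linux host and an eth0 interface; adjust the device name for your setup) is to just try setting the larger MTU:

ifconfig eth0 mtu 8000

If the driver can't handle frames that large, you should get an error back (something like "SIOCSIFMTU: Invalid argument") rather than a silent failure.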

The other thing you need to be aware of is that switching to jumbo frames is an all-or-nothing proposition; if you do it, you'll have to do it for *all* of the hardware on the involved network segment. You can't just change a couple of servers.
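
Once everything on the segment is supposedly configured, it's worth confirming that a large frame actually makes it across without being fragmented. On Linux, one way to do that (the hostname below is just a placeholder, and the size assumes an 8000-byte MTU) is:

ping -M do -s 7972 your-file-server

The "-M do" flag sets the don't-fragment bit, and 7972 bytes of ICMP payload plus 28 bytes of ICMP/IP headers works out to an 8000-byte packet, so if anything in the path has a smaller MTU the ping will fail rather than quietly fragmenting.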

I'm Cc:ing a couple of folks at Argonne who worked on getting jumbo frames working for our environment; they might be able to warn you of any other gotchas. We're using 8000-byte frames, but if I were starting from scratch I'd try something closer to 8300 so that an entire 8192-byte NFS packet can fit in a single frame (avoiding fragmentation if you're using an 8192-byte NFS rsize/wsize). Note: 8300 is just a ballpark guess that I haven't been able to confirm.
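
For what it's worth, the rough arithmetic behind that guess (the overhead numbers are approximate and vary with NFS version, authentication, and TCP options): 8192 bytes of NFS read/write data, plus somewhere around 50-100 bytes of RPC/NFS headers, plus 20 bytes of TCP header and 20 bytes of IP header, comes out somewhere in the neighborhood of 8300 bytes total.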

Be warned -- in our environment, we started to have problems when we got close to 9000 byte frames, so don't go too high.

-Andrew Cherry
 BG/L Support
 Argonne National Laboratory

On Apr 20, 2007, at 5:02 PM, Sam Lang wrote:


Hi Matthew,

I think the version of PVFS in the Zepto release is pvfs2-1.5.1. Besides some performance improvements in the latest release (pvfs-2.6.3), there was a specific bugfix made in PVFS for largish MPI-IO jobs. If you could try the latest (available at http://www.pvfs.org/), it would help us verify that you're not running into the same problem.

Regarding config options for PVFS on BGL, make sure you have jumbo frames enabled, i.e.

ifconfig eth0 mtu 8000 up
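
It's worth sanity-checking that the interface actually took the larger MTU (the exact output format varies by tool and distribution), e.g.:

ifconfig eth0 | grep MTU

Also note that an MTU set this way won't survive a reboot, so once you're happy with it you'll want to add it to whatever persistent network configuration your distribution uses.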

Also, you should probably set the tcp buffer sizes explicitly in the pvfs config file, fs.conf:

<Defaults>
        ...
        TCPBufferSend 524288
        TCPBufferReceive 1048576
</Defaults>
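
One caveat (this is general Linux socket behavior rather than anything PVFS-specific): buffer sizes requested by an application can be silently capped by the kernel's net.core.rmem_max and net.core.wmem_max limits, so if those are lower than the values above you may want to raise them as well, e.g.:

sysctl -w net.core.rmem_max=1048576
sysctl -w net.core.wmem_max=524288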

You might also see better performance with an alternative trove method for doing disk io:

<StorageHints>
        ...
        TroveMethod alt-aio
</StorageHints>


Thanks,

-sam

On Apr 20, 2007, at 4:25 PM, Matthew Woitaszek wrote:


Good afternoon,

Michael Oberg and I are attempting to get PVFS2 working on NCAR's 1-rack BlueGene/L system using ZeptoOS. We ran into a snag when going beyond 8 BG/L I/O nodes (>256 compute nodes).

We've been using the mpi-io-test program shipped with PVFS2 to test the system. For cases up to and including 8 I/O nodes (256 coprocessor-mode or 512 virtual-node-mode tasks), everything works fine. Larger jobs fail with "file not found" error messages, such as:

   MPI_File_open: File does not exist, error stack:
ADIOI_BGL_OPEN(54): File /pvfs2/mattheww/_file_0512_co does not exist

The file is created on the PVFS2 filesystem and has a zero-byte size. We've run the tests with 512 tasks on 256 nodes, and the test successfully created an 8589934592-byte file. Going to 257 nodes fails.

Has anyone seen this behavior before? Are there any PVFS2 server or client configuration options that you would recommend for a BG/L installation like
this?

Thanks for your time,

Matthew



_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users


