Thanks for the reply. Apparently the driver that ships with this card is buggy and only 
transmits UDP packets of certain lengths, which is a problem for LAM and anything else 
that uses UDP heavily.
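
One rough way to check a pair of nodes for this symptom is to send UDP datagrams of many different lengths to an echo service on the other node and note which lengths never come back. The following is only a minimal sketch under my own assumptions, not anything from the LAM thread: the function names, the port, and the idea of running a small echo server on the far node are all hypothetical, and a missing reply could also be ordinary packet loss rather than the driver.

```python
import socket
import threading


def run_echo_server(sock, stop_event):
    """Echo every datagram back to its sender until stop_event is set.

    `sock` is an already-bound UDP socket (hypothetical helper, run on the far node).
    """
    sock.settimeout(0.2)  # wake up regularly so stop_event is noticed
    while not stop_event.is_set():
        try:
            data, addr = sock.recvfrom(65535)
        except socket.timeout:
            continue
        sock.sendto(data, addr)


def probe_udp_lengths(host, port, lengths, timeout=1.0):
    """Return the payload lengths for which no matching echo came back."""
    failed = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(timeout)
        for n in lengths:
            # Payload content depends on n so a stale reply is not mistaken for a match.
            payload = bytes([n % 256]) * n
            client.sendto(payload, (host, port))
            try:
                reply, _ = client.recvfrom(65535)
            except socket.timeout:
                failed.append(n)
                continue
            if reply != payload:
                failed.append(n)
    return failed
```

With the echo server running on one node, calling probe_udp_lengths from another node with something like range(1, 1473) (payloads that fit in a single 1500-byte Ethernet frame) should return an empty list on a healthy link. A consistent set of failing lengths would point toward the driver problem described above.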

Supposedly, downloading and installing the newest version of the SysKonnect driver that 
the 3com driver is based on (http://www.syskonnect.com/syskonnect/support/driver/) 
solves this problem.

I have only just started on the driver installation, but here is the original 
message from the lam-mpi group.  I would cut and paste, but it's a long thread and 
there are some terminated sub-threads that contain interesting information.

http://www.lam-mpi.org/MailArchives/lam/msg08448.php

I thought this might be of interest since several folks have mentioned using 3com 
cards.

Original Message -----------------------
(apologies if someone already answered this; I'm composing this while
offline and I don't know when my laptop will be in range of the net to
actually send :-)

On Aug 16, 2004, at 6:30 PM, Michael Edwards wrote:

> I just got my cluster "working" with OSCAR 3.0 and RH 9.0 using 3com
> 3c2000T cards and a Cisco GigE switch.  When I installed OSCAR I had
> forgotten to take the machine name out of the alias list for 127.0.0.1
> and put it on its own line where it belongs.  I fixed the hosts files
> on the head and all the nodes (not sure how to fix on the image).
> Pushing files around works fine.
>
> I am now having problems with running programs with lam-mpi on all my
> processors.  I ran NetPipe between several nodes without problem (it is
> simple, only ever has two processes), but running the hpl benchmark on
> more than the head node and maybe one or two other processors locks up
> right away.

IIRC, NetPIPE only runs between two processes.  So you may not be
testing what you think you are testing.  You'll need to check the
NetPIPE docs / source code to be sure -- I don't remember offhand.

That being said, LAM opens TCP sockets between all of its processes
during MPI_INIT.  So if you run a NetPIPE across all your nodes, even
if it's only communicating between two of the nodes, LAM has
successfully managed to open TCP sockets between all of them -- so your
network connectivity is good.

So if NetPIPE runs but HPL dies/hangs somewhere in the middle, you may
actually be having subtle network issues (packet loss or other
badness).  In my experience, this is usually due to bad hardware or
(more frequently) bad TCP drivers in Linux.  I'm not a hardware/driver
expert, though -- others on this list can speak about this more
intelligently than I.

> My question is mainly what effect having the hosts file slightly wrong
> would have on OSCAR's install process (it seemed to work ok, so it
> clearly wasn't fatal).  I changed a number of things at the same time
> so I am not sure which one of them is causing the lockup, so I thought
> I would check and see if any of you folks could think of obvious
> configuration problems it would cause.
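
The hosts-file mistake described in the quoted message (the machine name sitting in the alias list for 127.0.0.1) can be checked for mechanically. This is a minimal sketch, assuming the usual /etc/hosts layout of an address followed by names. The function names and the example host "head" and address 192.168.0.1 are hypothetical and not taken from the original setup.

```python
def names_on_loopback(hosts_text):
    """Return every host name or alias listed on a 127.0.0.1 line."""
    names = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if fields[0] == "127.0.0.1":
            names.extend(fields[1:])
    return names


def hostname_is_on_loopback(hosts_text, hostname):
    """True if the machine's own name appears as a name or alias of 127.0.0.1."""
    allowed = {"localhost", "localhost.localdomain"}
    short = hostname.split(".")[0]
    return any(
        name not in allowed and name.split(".")[0] == short
        for name in names_on_loopback(hosts_text)
    )


# Hypothetical examples: the first layout is the mistake, the second the fix.
BAD = "127.0.0.1 head.cluster head localhost.localdomain localhost\n"
GOOD = (
    "127.0.0.1 localhost.localdomain localhost\n"
    "192.168.0.1 head.cluster head\n"
)
```

On a real node one could read /etc/hosts and pass in socket.gethostname(). A True result means the node's own name resolves to the loopback address, so other machines that are handed that name may end up with an address they cannot reach.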

--
{+} Jeff Squyres
{+} [EMAIL PROTECTED]
{+} http://www.lam-mpi.org/



-------------------------------------------------------
SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media
100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33
Save 50% off Retail on Ink & Toner - Free Shipping and Free Gift.
http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285
_______________________________________________
Oscar-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/oscar-users



