Thank you for your reply, Reese!

> What version of GM are you running?
# rpm -qa |egrep "^gm-[0-9]+|^gm-devel"
Is this too old?

> And are you sure that gm_board_info shows all the nodes that are
> listed in your machine file?
Yes, that was the issue: a bad cable connection to my compute node
prevented it from being seen on the fabric :( Thanks for pointing this
out to me.

> Could you send a copy of your gm_board_info output, please?
# ./gm_board_info
GM build ID is "2.0.24_Linux_rc20051223164441PST Tue Jan 30 23:07:45 EST 2007."

Board number 0:
 lanai_cpu_version = 0x0a00 (LANai10.0)
 lanai_sram_size   = 0x001fe000 (2040K bytes)
ROM settings:
LANai time is 0x209b211b12 ticks, or about 1043 minutes since reset.
Mapper is 00:60:dd:49:99:96.
Map version is 1965903.
2 hosts.
Network is fully configured.
This node is ""
Board has room for 16 ports,  1559 nodes/routes,  16384 cache entries
         Port token cnt: send=61, recv=253
Port: Status  PID
  0:   BUSY  7489  (this process [gm_board_info])
  1:   BUSY 25113
Route table for this node follows:
gmID MAC Address                                 gmName Route
---- ----------------- -------------------------------- ---------------------
  1 00:60:dd:49:1e:bf   (this node)
  2 00:60:dd:49:99:96   81 (mapper)

> A mismatch between the list of nodes actually configured onto the
> Myrinet fabric and the machine file is a common source of errors like
> this.  The mismatch could be caused by cable failure or other mapping
> issues.
Could you elaborate on the mapping issues you mentioned? What are they?
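
In case it is useful to anyone else hitting this, the check described
above (fabric hosts vs. machine file) can be scripted. Below is a
minimal Python sketch, not an official GM or Open MPI tool. It assumes
the route table layout shown in my gm_board_info output above and a
machine file with one host per line (first token is the host name).
The function names are my own. Since gmName is empty in my output, it
only compares host counts, not names.

```python
import re

# Matches route-table rows such as "  2 00:60:dd:49:99:96   81 (mapper)".
# The row layout is an assumption based on the sample output in this thread.
ROUTE_ROW = re.compile(r"^\s*(\d+)\s+((?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2})")


def parse_route_table(board_info_text):
    """Return {gmID: MAC address} for each row after the 'gmID' header."""
    nodes = {}
    in_table = False
    for line in board_info_text.splitlines():
        if line.startswith("gmID"):
            in_table = True
            continue
        if not in_table:
            continue
        match = ROUTE_ROW.match(line)
        if match:
            nodes[int(match.group(1))] = match.group(2).lower()
    return nodes


def read_machine_file(machine_file_text):
    """Return host names, skipping blank lines and '#' comments."""
    hosts = []
    for line in machine_file_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line.split()[0])
    return hosts


def fabric_covers_machine_file(board_info_text, machine_file_text):
    """True if the fabric maps at least as many hosts as the file lists."""
    mapped = len(parse_route_table(board_info_text))
    wanted = len(set(read_machine_file(machine_file_text)))
    return mapped >= wanted
```

Feeding it the text of gm_board_info and the machine file would flag
the case I hit: a node listed in the machine file but missing from the
route table because of the bad cable.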

> Why GM instead of MX, by the way?
We have a few MX cards in-house, but no MX switch because of its
current market price, so we can only test MX over direct-connection
cables, which is not very exciting :) By contrast, we already have GM
boards and a switch, and have found them sufficient for Open MPI
testing. It would be great to upgrade to MX in the near future.

Thank you very much for your help.

