[EMAIL PROTECTED] wrote on Fri, 16 Nov 2007 16:30 -0600:
> I am coming back to a problem I still have with PVFS 2.6.3 over IB.
> 
> I run it on Lonestar, a 64-bit dual-core Intel Xeon cluster at TACC:
> http://www.tacc.utexas.edu/services/userguides/lonestar/
> 
> I remind you that PVFS-IB works on the front end, but fails when I try
> to start it on the compute nodes.
> 
> As Pete suggested I had set the debug level to network.
> 
> I found out that on each run one of two types of errors shows up:
> 
> 1) this is from the previous message I sent to the list
> > > [E 10:04:01.781047] Error: openib_mem_register: ibv_register_mr.
> 
> 2) this I just got (the full messages are at the end of this mail):
> [E 12:05:07.676399] Error: openib_ib_initialize: ibv_create_cq failed.

This comes before the register_mr so let's tackle it first.

> As Pete suggested I looked in /etc/security/limits.conf: soft and hard
> memlock are set to unlimited.

Nice to know, but just to be sure, log in to the machine where you are
getting the error message, and in bash do "ulimit -a" and tell us
what "max locked memory" says (bash reports it in kbytes).  I bet it
is 32.  That would explain why the CQ fails:  it tries to pin 1k
elements of 32 bytes each, which is 32 kB all by itself.
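
If you would rather check this from a script than by eye, here is a
rough sketch in Python.  It is not anything from PVFS; the function
names are made up for illustration, and the CQ size is just the "1k
elements of 32 bytes" figure from above.  It also assumes the limit has
to be strictly larger than the CQ, since the CQ is not the only thing
that gets pinned.

```python
import resource

# Figures taken from the message above, not read out of PVFS or libibverbs.
CQ_ENTRIES = 1024     # "1k elements"
CQ_ENTRY_BYTES = 32   # "32 bytes each"


def describe_limit(limit_bytes):
    """Format a memlock limit the way "ulimit -l" does, in kbytes."""
    if limit_bytes == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit_bytes // 1024} kbytes"


def cq_fits(limit_bytes, need_bytes=CQ_ENTRIES * CQ_ENTRY_BYTES):
    """Return True if the limit leaves room beyond the CQ itself.

    Assumption: other pinned buffers also count against the limit, so a
    limit exactly equal to the CQ size is treated as too small.
    """
    if limit_bytes == resource.RLIM_INFINITY:
        return True
    return limit_bytes > need_bytes


if __name__ == "__main__":
    # getrlimit reports bytes; the shell's "ulimit -l" reports kbytes.
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    print("soft max locked memory:", describe_limit(soft))
    print("hard max locked memory:", describe_limit(hard))
    print("room for the CQ:", cq_fits(soft))
```

Run it the same way you start the PVFS processes (inside the batch job,
not on the front end), since the limit is inherited from whatever
launched your shell.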

> I do not have control over the nodes, I can not install things, I am
> just a user :)

If this is true, complain to your admin.  He probably forgot to do
"ulimit -l unlimited" in the PBS mom startup script, if you are
landing on the nodes thanks to "qsub -I".  I wonder how anybody has
been able to run any MPI/IB codes.  If you are getting there via
rsh or ssh, limits.conf should be doing the trick, but maybe there
is some hokeyness in /etc/profile.d/* or similar.  You will have to
nose around.

> Pete, how can I find out what type of Infiniband fabric is installed?

lspci | grep Infi
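
If you want the same check inside a job script, something like the
following Python sketch would do.  The helper names are hypothetical,
and it matches case-insensitively, which is a bit looser than the grep
above.  It returns nothing if lspci is not on the PATH.

```python
import shutil
import subprocess


def infiniband_lines(lspci_output):
    """Keep only the lines that mention "Infi" in any letter case."""
    return [line for line in lspci_output.splitlines()
            if "infi" in line.lower()]


def list_infiniband_devices():
    """Run lspci and return the matching lines, or [] if lspci is missing."""
    if shutil.which("lspci") is None:
        return []
    result = subprocess.run(["lspci"], capture_output=True, text=True,
                            check=False)
    return infiniband_lines(result.stdout)


if __name__ == "__main__":
    for line in list_infiniband_devices():
        print(line)
```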

                -- Pete

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users