[EMAIL PROTECTED] wrote on Fri, 16 Nov 2007 16:30 -0600:
> I am coming back to a problem I still have with PVFS 2.6.3 over IB.
>
> I run it on Lonestar, a 64-bit Intel Xeon dual-core cluster at TACC:
> http://www.tacc.utexas.edu/services/userguides/lonestar/
>
> As a reminder, PVFS-IB works on the front end but fails when I try
> to start it on the compute nodes.
>
> As Pete suggested, I set the debug level to network.
>
> I found that on each run, one of two types of errors shows up:
>
> 1) this is from the previous message I sent to the list
> > > [E 10:04:01.781047] Error: openib_mem_register: ibv_register_mr.
>
> 2) this I just got (the full messages are at the end of this mail):
> [E 12:05:07.676399] Error: openib_ib_initialize: ibv_create_cq failed.
This comes before the register_mr, so let's tackle it first.
> As Pete suggested, I looked in /etc/security/limits.conf: soft and hard
> memlock are set to unlimited.
Nice to know, but just to be sure, log in to the machine where you are
getting the error message, run "ulimit -a" in bash, and tell us
what "max locked memory" says. I bet it is 32 (kbytes). That would explain
why the CQ fails: it tries to pin 1k elements of 32 bytes each, which is
32 kbytes on its own.
> I do not have control over the nodes; I cannot install things, I am
> just a user :)
If that is the case, complain to your admin. He probably forgot to do
"ulimit -l unlimited" in the PBS mom startup script, if you are
landing on the nodes via "qsub -I". I wonder how anybody has
been able to run any MPI/IB codes. If you are getting there via
rsh or ssh, limits.conf should be doing the trick, but maybe there
is some hokeyness in /etc/profile.d/* or similar. You will have to
nose around.
> Pete, how can I find out what type of Infiniband fabric is installed?
lspci | grep Infi
-- Pete
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users