[EMAIL PROTECTED] wrote on Mon, 21 Aug 2006 13:34 -0500:
> FYI I was finally cleared to upgrade my cluster to RHEL4 (2.6 kernel).
> Unfortunately this doesn't look like it fixed my problem. Any
> operation on a pvfs2 filesystem over native infiniband (i.e. not tcp or
> IPoIB) is extremely slow. Just a simple "ls" on a pvfs2 filesystem
> with a handful of files and directories takes 5-10 seconds and the
> pvfs2-server process takes up 98% of the CPU.
>
> Because of the operational demands of the users on this cluster I can't
> change the filesystems back from tcp to ib and get you some debug info
> right this moment. I'm hoping I can set up some playspace where I can
> give you some more details later this week.
That's sad to hear. There's code now in both the VAPI and OpenIB
versions of pvfs/IB that spins for the first 10 ms waiting for a
message, then blocks in poll() until the NIC raises an interrupt
signaling that a message has arrived.
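For anyone following along, here is a minimal C sketch of the
spin-then-block idea described above. It is not the pvfs2 source: a
pipe stands in for the NIC's event file descriptor, and the names
wait_for_message, message_ready and deliver_after_ms are hypothetical.
Only the 10 ms spin window comes from the description above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds elapsed since *start, measured on the monotonic clock. */
static double elapsed_ms(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (double)(now.tv_sec - start->tv_sec) * 1e3 +
           (double)(now.tv_nsec - start->tv_nsec) / 1e6;
}

/* Non-blocking check (timeout 0): is there something to read on fd now? */
static int message_ready(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    return poll(&p, 1, 0) > 0 && (p.revents & POLLIN);
}

/*
 * Hypothetical spin-then-block wait, not the pvfs2 implementation.
 * Phase 1: busy-poll fd for up to spin_ms (uses CPU, reacts quickly).
 * Phase 2: sleep in poll() with no timeout until the kernel reports fd
 *          readable (for IB, after the NIC raises an interrupt).
 * Returns 1 if the message arrived while spinning, 2 if it arrived after
 * blocking, -1 on a poll() error.
 */
static int wait_for_message(int fd, double spin_ms)
{
    struct timespec start;
    assert(spin_ms >= 0.0);
    clock_gettime(CLOCK_MONOTONIC, &start);

    while (elapsed_ms(&start) < spin_ms)
        if (message_ready(fd))
            return 1;

    struct pollfd p = { .fd = fd, .events = POLLIN };
    for (;;) {
        int n = poll(&p, 1, -1);   /* -1: block until fd is ready */
        if (n > 0)
            return 2;
        if (n < 0 && errno != EINTR)
            return -1;
    }
}

/* Demo stand-in for the NIC: fork a child that writes one byte to wfd
 * after ms milliseconds. Returns the child's pid, or -1 if fork fails. */
static pid_t deliver_after_ms(int wfd, long ms)
{
    pid_t pid = fork();
    if (pid == 0) {
        struct timespec d = { ms / 1000, (ms % 1000) * 1000000L };
        nanosleep(&d, NULL);
        ssize_t w = write(wfd, "m", 1);
        _exit(w == 1 ? 0 : 1);
    }
    return pid;
}
```

If a message is already pending, wait_for_message returns during the
spin phase; if it shows up after the 10 ms window (e.g. from
deliver_after_ms with a 200 ms delay), the process sleeps in poll()
instead of burning CPU.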
I'll be happy to look at debug traces again if you get a chance.
The sched_yield() that seemed to help your machine with the 2.4.21
kernel is commented out in src/io/bmi/bmi_ib/ib.c, because it is the
wrong thing to do, but we can use that to diagnose again.
-- Pete
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers