Thanks, Rob!

Andrew:
Can you run the parallel make under strace, so I can see which system calls are made? That will help narrow down where the problem is.

Please clarify: AFTER the make process has completed, you can access the files using pvfs2-cp but not through the kernel module? If this is true, can you send me the output of "ls -al" and "pvfs2-ls -al"?

Thanks,
Becky

On Mon, May 14, 2012 at 3:18 PM, Rob Latham <[email protected]> wrote:

> On Mon, May 14, 2012 at 12:44:48PM -0400, Becky Ligon wrote:
> > Andrew:
> >
> > You have given us a lot to chew on, but it sounds like the kernel module is
> > having problems. I'm not familiar with parallel make. Does it use MPI?
>
> No MPI. He's referring to the 'make' feature where you can spawn many
> processes to work on the dependency tree. So 'make -j10' says
> "whenever make finds independent targets, spawn up to 10 instances to
> work on them."
>
> ==rob
>
> > Becky
> >
> > On Sat, May 12, 2012 at 9:17 PM, Andrew Savchenko <[email protected]> wrote:
> >
> > > Hello,
> > >
> > > During some testing I found that orangefs behaves badly when multiple
> > > intense parallel i/o is used on the same directory. For testing I
> > > used parallel make: just untar some relatively large tarball and run
> > > make -j10
> > > I used torque-3.0.5, but this should not matter.
> > >
> > > My current setup is: orangefs-2.8.5, 15 servers serving both data and
> > > metadata, 16 clients, 15 of them are on the same nodes as servers;
> > > this testing was conducted on a separate node with no servers on it.
> > > Kernel is linux-3.2.14, ACL support is disabled due to previously
> > > found bugs:
> > >
> > > http://www.beowulf-underground.org/pipermail/pvfs2-developers/2012-April/004974.html
> > > I use TroveSync disabled.
> > >
> > > During parallel make random files (rarely directories) become
> > > inaccessible; any attempt to use them results in EIO (system error 5,
> > > input/output error).
> > > However, these files can be accessed normally
> > > from other nodes or even from the same node using pvfs2-cp, which
> > > doesn't use the kernel VFS to my knowledge.
> > >
> > > I made a series of tests to find what may affect this behaviour and
> > > found that:
> > >
> > > 1) Error rate depends on parallelism level: make -j2 is often fine,
> > > -j5 produces more problems, -j10 tends to "generate" broken files
> > > very often and so on.
> > >
> > > 2) With client-side caching disabled (defaults are -a5 -n5):
> > > pvfs2-client -a 0 -n 0 ...
> > > things became worse: the frequency of errors rose
> > > significantly. A somewhat larger cache (-a10 -n10) seems to work better,
> > > but doesn't eliminate the problem completely.
> > >
> > > 3) During such tests I found that the kernel sometimes produces backtraces
> > > and complains about a NULL pointer dereference. See attached kernel.log
> > > for details. pvfs2-client complains a lot in its log with the same
> > > message:
> > > [E 09:49:22.580278] Completed upcall of unknown type ff00000d!
> > > Though, it is not strictly in sync with kernel backtraces.
> > >
> > > 4) When I tried to increase the client cache significantly (-a500 -b500)
> > > and ran make -j10, I got a kernel crash: the whole disk subsystem (not only
> > > pvfs2) became unresponsive and only the hardware watchdog saved the
> > > situation. This was a general protection fault. I managed to save the
> > > kernel trace, see kernel.crash.log.
> > >
> > > 5) There are no errors logged on the pvfs2 servers.
> > >
> > > 6) TroveSyncMeta yes has no noticeable effect on this issue.
> > >
> > > 7) TroveSyncData yes makes it somewhat better in some cases and worse
> > > in others.
> > >
> > > 8) I tried to increase AttrCacheSize and AttrCacheMaxNumElems values,
> > > though with no effect. Nevertheless I plan to keep larger values,
> > > they shouldn't hurt and we have plenty of RAM available.
> > >
> > > My current pvfs config is attached for reference.
> > >
> > > For now I can somewhat mitigate this issue by using a cron
> > > script with either mount -o remount on nodes with problems (though
> > > remount with live applications may produce problems itself) or by
> > > using the following sequence:
> > > pvfs2-cp badfile tempfile
> > > pvfs2-rm badfile
> > > cp tempfile badfile
> > >
> > > But anyway this will not help with already confused applications...
> > >
> > > I'm aware that support for 3.1 and 3.2 kernels is still experimental,
> > > but I can't downgrade this system because other applications require
> > > some new kernel features.
> > >
> > > Also I found some interesting options: DBCacheSizeBytes and
> > > DBCacheType, though as far as I understand they have an effect only on
> > > TroveMethod dbpf and are useless for alt-aio used in my setup.
> > > Please correct me if I'm wrong.
> > >
> > > Best regards,
> > > Andrew Savchenko
> > >
> > > _______________________________________________
> > > Pvfs2-users mailing list
> > > [email protected]
> > > http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA

--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
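
Editorial sketch, not part of the original exchange: the thread describes files that return EIO when accessed through the kernel module while remaining readable via pvfs2-cp. One way such files might be listed from a client node is to walk the build directory through the mounted file system and record every file whose open or read fails with errno 5. The function name `find_eio_files` and its `opener` parameter are hypothetical names chosen for illustration; nothing here is an OrangeFS API, and the sketch checks regular files only, not the (rarer) broken directories mentioned above.

```python
import errno
import os


def find_eio_files(root, opener=open):
    """Return a sorted list of files under root whose open/read fails with EIO.

    Assumption: root is a path on the kernel-mounted file system, so every
    access here goes through the VFS. `opener` exists only so the function
    can be exercised without a faulty mount.
    """
    broken = []
    # os.walk silently skips directories it cannot list (onerror defaults
    # to None), so inaccessible directories are not reported by this sketch.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with opener(path, "rb") as handle:
                    handle.read(1)  # force at least one read through the VFS
            except OSError as exc:
                # Record only input/output errors (errno 5); other failures
                # such as permission problems are ignored.
                if exc.errno == errno.EIO:
                    broken.append(path)
    return sorted(broken)
```

Calling `find_eio_files("/path/to/build/tree")` after the parallel make finishes would give a list of candidate files to compare against the "ls -al" and "pvfs2-ls -al" output requested above; the path is a placeholder.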
