Thanks, Rob!

Andrew:

Can you run the parallel make using strace, so I can see which system calls
are made?  That will help narrow down where the problem exists.
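Something like the following should capture everything (the output path and -j level below are just examples; adjust for your build):

```shell
# Wrap the build so every process spawned by the parallel make is traced.
trace_build() {
    out=$1; shift
    # -f follows forked children, so each of the -jN jobs is captured;
    # -o keeps the (very large) trace out of the build output.
    strace -f -o "$out" "$@"
}
# Example: trace_build /tmp/make.trace make -j10
# Afterwards, grep the trace for syscalls failing with EIO:
#   grep EIO /tmp/make.trace
```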

Please clarify:
AFTER the make process has completed, you can access the files using
pvfs2-cp but not through the kernel module?  If this is true, can you send
me the output of an "ls -al" and "pvfs2-ls -al"?

Thanks,
Becky

On Mon, May 14, 2012 at 3:18 PM, Rob Latham <[email protected]> wrote:

> On Mon, May 14, 2012 at 12:44:48PM -0400, Becky Ligon wrote:
> > Andrew:
> >
> > You have given us a lot to chew on, but it sounds like the kernel module
> is
> > having problems.  I'm not familiar with parallel make.  Does it use MPI?
>
> No MPI.  He's referring to the 'make' feature where you can spawn many
> processes to work on the dependency tree.  So 'make -j10' says
> "whenever make finds independent targets, spawn up to 10 instances to
> work on them."
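A tiny self-contained demonstration of that (the directory and target names are made up): two targets with no dependency between them, so make is free to build both at once:

```shell
# Two independent targets; 'make -j2' may run both recipes concurrently.
demo=/tmp/jdemo            # throwaway directory for the demo
mkdir -p "$demo"
printf 'all: a.stamp b.stamp\n\na.stamp:\n\ttouch a.stamp\n\nb.stamp:\n\ttouch b.stamp\n' \
    > "$demo/Makefile"
make -C "$demo" -j2        # builds a.stamp and b.stamp in parallel
```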
>
> ==rob
>
> > Becky
> >
> > On Sat, May 12, 2012 at 9:17 PM, Andrew Savchenko <[email protected]>
> wrote:
> >
> > > Hello,
> > >
> > > During some testing I found that orangefs behaves badly when many
> > > processes perform intense parallel I/O on the same directory. For
> > > testing I used parallel make: just untar some relatively large
> > > tarball and run "make -j10". I used torque-3.0.5, but this should
> > > not matter.
> > >
> > > My current setup is: orangefs-2.8.5, 15 servers serving both data
> > > and metadata, and 16 clients, 15 of which are on the same nodes as
> > > the servers; this testing was conducted on a separate node with no
> > > servers on it. The kernel is linux-3.2.14, and ACL support is
> > > disabled due to previously found bugs:
> > >
> > >
> http://www.beowulf-underground.org/pipermail/pvfs2-developers/2012-April/004974.html
> > > TroveSync is disabled.
> > >
> > > During parallel make, random files (rarely directories) become
> > > inaccessible: any attempt to use them results in EIO (errno 5,
> > > input/output error). However, these files can be accessed normally
> > > from other nodes, or even from the same node using pvfs2-cp, which
> > > to my knowledge doesn't go through the kernel VFS.
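In case it helps others reproduce this, a quick way to enumerate the broken files through the VFS is just to try reading one byte of each (the mount point in the example is a placeholder):

```shell
# List files under $1 that cannot be read through the kernel VFS.
# Reading a single byte is enough to trigger the EIO error path.
scan_unreadable() {
    find "$1" -type f | while read -r f; do
        head -c1 "$f" >/dev/null 2>&1 || printf 'unreadable: %s\n' "$f"
    done
}
# Example: scan_unreadable /mnt/pvfs2/build
```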
> > >
> > > I made a series of tests to find what may affect this behaviour and
> > > found that:
> > >
> > > 1) The error rate depends on the parallelism level: make -j2 is
> > > often fine, -j5 produces more problems, -j10 tends to "generate"
> > > broken files very often, and so on.
> > >
> > > 2) With client-side caching disabled (defaults are -a5 -n5):
> > > pvfs2-client -a 0 -n 0 ...
> > > things became worse: the frequency of errors rose significantly. A
> > > somewhat larger cache (-a10 -n10) seems to work better, but doesn't
> > > eliminate the problem completely.
> > >
> > > 3) During these tests I found that the kernel sometimes produces
> > > backtraces and complains about a NULL pointer dereference. See the
> > > attached kernel.log for details. pvfs2-client also complains a lot
> > > in its log with the same message:
> > > [E 09:49:22.580278] Completed upcall of unknown type ff00000d!
> > > though it is not strictly in sync with the kernel backtraces.
> > >
> > > 4) When I tried to increase the client cache significantly (-a500
> > > -b500) and ran make -j10, I got a kernel crash: the whole disk
> > > subsystem (not only pvfs2) became unresponsive and only the
> > > hardware watchdog saved the situation. This was a general
> > > protection fault. I managed to save the kernel trace; see
> > > kernel.crash.log.
> > >
> > > 5) There are no errors logged on the pvfs2 servers.
> > >
> > > 6) Setting TroveSyncMeta to yes has no noticeable effect on this
> > > issue.
> > >
> > > 7) Setting TroveSyncData to yes makes things somewhat better in
> > > some cases and worse in others.
> > >
> > > 8) I tried increasing the AttrCacheSize and AttrCacheMaxNumElems
> > > values, though with no effect. Nevertheless I plan to keep the
> > > larger values; they shouldn't hurt and we have plenty of RAM
> > > available.
> > >
> > > My current pvfs config is attached for reference.
> > >
> > > For now I can somewhat mitigate this issue with a cron script that
> > > either runs mount -o remount on the affected nodes (though
> > > remounting with live applications may cause problems itself) or
> > > uses the following sequence:
> > > pvfs2-cp badfile tempfile
> > > pvfs2-rm badfile
> > > cp tempfile badfile
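That sequence can be wrapped up as a small helper (a sketch; it assumes the pvfs2 tools behave as described above, i.e. they bypass the kernel VFS):

```shell
# Rewrite a broken file so the kernel module sees a fresh object:
# copy it out and back in via the direct (non-VFS) pvfs2 tools,
# then recreate it through the normal VFS path with plain cp.
reset_file() {
    tmp=$(mktemp) || return 1
    pvfs2-cp "$1" "$tmp" &&
        pvfs2-rm "$1" &&
        cp "$tmp" "$1"
    rm -f "$tmp"
}
# Example: reset_file /mnt/pvfs2/build/badfile
```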
> > >
> > > But this will not help applications that have already been
> > > confused by the errors...
> > >
> > > I'm aware that support for 3.1 and 3.2 kernels is still experimental,
> > > but I can't downgrade this system because other applications require
> > > some new kernel features.
> > >
> > > I also found some interesting options, DBCacheSizeBytes and
> > > DBCacheType, though as far as I understand they only have an effect
> > > with TroveMethod dbpf and are useless for the alt-aio used in my
> > > setup. Please correct me if I'm wrong.
> > >
> > > Best regards,
> > > Andrew Savchenko
> > >
> > > _______________________________________________
> > > Pvfs2-users mailing list
> > > [email protected]
> > > http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
> > >
> > >
> >
> >
>
>
>
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA
>



-- 
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
