Hi Mark,

It looks like you've got a number of configuration problems on your nodes that need to be fixed before PVFS can run. On the frontend node, there are a bunch of "Network is unreachable" errors in the client log, so the network on that node doesn't appear to be set up properly. On the compute-0-0 node, there are "Host name lookup failure" errors where it's trying to resolve 'frontend' and failing. You need to set up host name resolution (DNS or /etc/hosts) for that to work.
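If you aren't running DNS inside the cluster, the simplest fix is usually to list every node in /etc/hosts on every machine. As a rough sketch (the addresses below are just placeholders, substitute whatever your cluster actually uses):

   10.1.1.1     frontend
   10.1.1.254   compute-0-0
   10.1.1.253   compute-0-1

If I remember right, Rocks manages /etc/hosts for the nodes itself, so if those entries are missing it's worth checking that the compute nodes were added through the Rocks tools.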

Probably a good first step would be to verify that you can ping one node from another. From compute-0-0:

>  ping frontend

When that succeeds, and pinging the other nodes in your cluster succeeds as well, then you can proceed with the PVFS setup.

It doesn't look like you've set up any PVFS servers. PVFS is an asymmetric network file system: the servers usually run on backend nodes separate from the clients on the compute nodes. There's more information about the configurations that PVFS works well for in the user's guide on the website.
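Very roughly, the server setup looks something like the following once the network is sorted out (the exact config file names and options vary between PVFS2 releases, so treat this as an outline and follow the user's guide for your version):

   # generate a config file describing which nodes run servers
   pvfs2-genconfig /etc/pvfs2/fs.conf

   # on each server node: create the storage space, then start the daemon
   pvfs2-server /etc/pvfs2/fs.conf -f
   pvfs2-server /etc/pvfs2/fs.conf

   # from a client, check that all the servers respond
   pvfs2-ping -m /mnt/pvfs2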

-sam

On Feb 15, 2007, at 9:53 PM, Mark Van De Vyver wrote:

Hi Sam,
Thank you for the prompt response.
I attach the /tmp/pvfs2-client.log files I found on each machine.  I
didn't see any /tmp/pvfs2-server.log files.  Are they one and the same?

I may have been a little hasty earlier in claiming that cp and rm work
fine... I think I am seeing some errors when I cp from the tmpfs area
to the PVFS2 area.
I'm still trying to work out what is happening, where, and when.

Hope this helps.
Regards
Mark


On 2/16/07, Sam Lang <[EMAIL PROTECTED]> wrote:

On Feb 15, 2007, at 7:47 PM, Mark Van De Vyver wrote:

> Hi,
> Thank you for all the effort put into making PVFS2 available.
> I'm relatively new to Linux (from WinXP), and have built a 3 node
> cluster using the Rocks Cluster software v4.2.1. I've installed the
> PVFS2 roll and by following the PVFS2 roll guide all has proceeded
> very smoothly - really, thanks - I'd expected a few days/weeks to get
> to this point.
>
> At the end of this email I pose some questions that the following
> behavior has raised.
>
> About my set-up:
> A single user.  I made no changes to the PVFS configuration
> established by the PVFS2 roll, and have one head node and two
> compute-I/O nodes.
> PVFS version 1.5.1
>
> The unexpected behavior:
> Using pvfs2-cp I have copied approx 900GB of files from several DVDs
> using dd (I dd to a tmpfs area then pvfs2-cp this 'image' to
> /mnt/pvfs2/some/path).
> I have noticed that this runs fine so long as it is the first time the
> file is copied. If I use pvfs2-rm to delete a file, not necessarily
> from the same node used to make the copy, the following occurs (all
> nodes seem to be up and working fine):
> - I can see the file is removed using the gnome file browser.
> - The pvfs2-rm seems to hang, and the following message is displayed:
>
> [E 15:10:02.584608] Job time out: cancelling bmi operation, job_id:
> 21.
> [E 15:10:02.584769] msgpair failed, will retry: Operation cancelled
> (possibly due to timeout)
>
Hi Mark,

It looks like the first failure with pvfs2-rm caused one of the
servers to crash, giving the appearance that pvfs2-rm was hanging.
It probably timed out after about 5 minutes or so?  The error message
you're seeing is that timeout.

> If I try to re-copy the file (using pvfs2-cp), again, not necessarily
> from the same node it was first copied on, then the copy fails and I
> see:
>
> [E 15:26:53.690560] Job time out: cancelling bmi operation, job_id:
> 25.
> [E 15:26:53.690710] msgpair failed, will retry: Operation cancelled
> (possibly due to timeout)
> [E 15:26:53.690733] *** msgpairarray_completion_fn: msgpair to server
> tcp://pvfs2-compute-0-1:3334 failed: Operation cancelled (possibly due
> to timeout)

The failure here with pvfs2-cp at this point is also because the
server crashed in the previous pvfs2-rm.

> [E 15:26:53.690743] *** No retries requested.
> pvfs2-cp: src/client/sysint/sys-getattr.sm:331: getattr_acache_lookup:
> Assertion `object_ref.handle != ((PVFS_handle)0)' failed.
>

This is a bug: when pvfs2-cp fails due to a timeout, it shouldn't hit
an assertion failure.  I will look into this, although it may have
already been fixed since 1.5.1.

> On rebooting one of the nodes I was forced to run fsck; after this the
> cluster seems to have returned to 'normal'.

You can probably just restart the servers to get things back.
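Something along these lines on each node that runs a server should do it (I'm assuming here that the Rocks roll installed the usual init script; if it didn't, just kill the pvfs2-server process and start it again by hand with your config file):

   /etc/init.d/pvfs2-server restart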

>
> The good news is that the standard Linux commands (cp and rm) don't
> seem to have any trouble, so I am using those at the moment... I
> couldn't find any advice on whether cp, etc, is preferred to pvfs2-cp,
> or vice versa.

I think in general a lot more effort goes into getting the kernel
module working properly than the client tools (pvfs2-*).  That being
said, we don't discourage the use of the client tools; they just don't
get as much pounding, and they aren't written to match the
functionality that the VFS provides.

>
> 1) Is this a known issue that is fixed in PVFS 2.6?

The real issue, I think, is why pvfs2-rm causes the server(s) to
crash.  If possible, could you send us the server logs?  They should
be in /tmp/pvfs2-server.log.

> 2) Is it fine to continue to use v1.5.1 so long as I don't use the
> PVFS-* commands?

Yes.  There are known bugs in the 1.5.1 release, but they aren't
likely to cause any problems for what you're doing.

> 3) Is upgrading to v2.6 on a rocks cluster 'straight forward', or is
> it likely to involve some 'debugging' and a few days work - bear in
> mind my relative inexperience with Linux.

I've never installed Rocks, so I'm going to have to let someone else
answer that.  We pride ourselves on making PVFS easy to install and
deploy, and that hasn't changed in the newer releases.

-sam

>
> Regards
> Mark

<pvfs2-client.log.frontend>
<pvfs2-client.log.compute-0-0>
<pvfs2-client.log.compute-0-1>

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
