Hello Mr. Lang,
well, the mystery is not solved yet. I unexpectedly got hold of the
previous admin, and indeed it seems there is something wrong with the
machine. There is a secondary hard disk that he had put under lvm that
is not responding anymore: fdisk reports a problem with the partition
table, and no logical volumes are found by the lvm tools.
Is this the problem? I do not know. It never occurred to me that there
could be another hd, since there was no mention of it in the fstab. So
maybe an ad hoc ("flying") mount... Strange. But stranger still is the
timestamp of the directory in /pvfs2-storage-space on the master.
As I said, I had a joint screen session with the previous admin
yesterday, but the problem, namely recovering this lvm setup, was not
resolved. I hope to be able to have him help me again.
I will keep you informed on the progress.
Thanks again.
Raimondo
Sam Lang wrote:
On Oct 10, 2007, at 3:51 AM, Raimondo Giammanco wrote:
Hello Mr. Lang,
As far as I understand, on the master /pvfs2-storage-space is
not a mount point. /etc/fstab has no mention of it,
and the directory it contains (744468fe) has a timestamp
from the day we had to shut down the master, so
I cannot think that there was something mounted there.
So, I am fairly certain /pvfs2-storage-space on the master was
related to the metadata, but it is empty.
Hi Raimondo,
Somehow then a number of files in your storage space on the master
have gone missing. Without them, you won't be able to start that
server, and you will have to recreate the storage space (destroying
the files that were there). I'm a little skeptical that the files
just vanished (it looks like they were deleted somehow), which is why
I suggested the storage space might not be mounted properly. Do you
get anything interesting when you run fsck on /dev/sda1? What
raid-level was used for the raid device?
I guess the big question is whether you can just ask the person you
inherited administration from what the setup was before. That would
save us a lot of trouble trying to figure it out post-mortem.
-sam
If I were to initialize it with the -f option, would it afterwards
reconstruct the data from the IO nodes, where all seems correct and
the pvfs2-server process started correctly?
This seems rather risky to me.
Thanks for your help.
Raimondo
Sam Lang wrote:
On Oct 9, 2007, at 12:57 PM, Raimondo Giammanco wrote:
Hello Mr. Lang,
the master is a different type of unit from the nodes, which are
blades in a rack-mounted cluster.
On the master, the mount command gives:
##################
/dev/sda1 on / type ext3 (rw)
none on /proc type proc (rw)
none on /sys type sysfs (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
usbfs on /proc/bus/usb type usbfs (rw)
none on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)
##################
while on a node it gives:
##################
/dev/ram0 on / type ext2 (rw)
none on /proc type proc (rw)
none on /sys type sysfs (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
usbfs on /proc/bus/usb type usbfs (rw)
none on /dev/shm type tmpfs (rw)
/dev/md0 on /tmp type ext3 (rw)
/dev/md1 on /pvfs2-storage-space type ext3 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
10.0.0.254:/home on /home type nfs (rw,addr=10.0.0.254)
10.0.0.254:/usr on /usr type nfs (rw,addr=10.0.0.254)
10.0.0.254:/opt on /opt type nfs (rw,addr=10.0.0.254)
nfsd on /proc/fs/nfsd type nfsd (rw)
#####################
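The two listings above can also be compared mechanically. The following is a minimal Python sketch, not part of PVFS2, that parses `mount`-style lines and reports which device, if any, is mounted exactly at a given directory. The function name `mounted_device` is a hypothetical helper, and the sample strings are abbreviated copies of the listings above.

```python
import re

# Matches lines of the form printed by mount on Linux:
#   <device> on <mountpoint> type <fstype> (<options>)
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (\S+) \((.*)\)$")


def mounted_device(mount_output, directory):
    """Return the device mounted exactly at `directory`, or None."""
    device = None
    for line in mount_output.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if match and match.group(2) == directory:
            device = match.group(1)  # a later mount shadows an earlier one
    return device


# Abbreviated samples taken from the listings in this message.
MASTER = """\
/dev/sda1 on / type ext3 (rw)
none on /proc type proc (rw)
"""

NODE = """\
/dev/ram0 on / type ext2 (rw)
/dev/md0 on /tmp type ext3 (rw)
/dev/md1 on /pvfs2-storage-space type ext3 (rw)
"""

print(mounted_device(MASTER, "/pvfs2-storage-space"))  # None
print(mounted_device(NODE, "/pvfs2-storage-space"))    # /dev/md1
```

On the master nothing is mounted at /pvfs2-storage-space, so the directory lives on /dev/sda1; on the node the same path is backed by /dev/md1.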
The difference is, I believe, that the master has a hardware raid,
Is the hardware raid /dev/sda1 mounted to / ? If not, maybe the
hardware raid on the master needs to be mounted to
/pvfs2-storage-space?
while the nodes have 2 small hd in software raid for the system and
temporary data, and 2 big ones, still in software raid, for pvfs.
Ok that explains the lost+found. FYI, while the
/pvfs2-storage-space may exist as a directory in /, it can also be a
mountpoint for something else, so its contents may not be visible
(at least the contents you would expect) if you haven't mounted
everything properly.
-sam
Regards,
Raimondo
On Oct 9, 2007, at 9:40 AM, Giammanco Raimondo wrote:
Hello Mr. Ross,
thanks for your prompt reply.
I believe the config file you mention is (for my case)
/etc/pvfs2-server.conf-master-pvfs.
Its contents are:
############################
StorageSpace /pvfs2-storage-space
HostID "tcp://master-pvfs:3334"
LogFile /tmp/pvfs2-server.log
############################
The config file for a node, /etc/pvfs2-server.conf-node1-pvfs for
example, is the following:
############################
StorageSpace /pvfs2-storage-space
HostID "tcp://node1-pvfs:3334"
LogFile /tmp/pvfs2-server.log
############################
Now, this /pvfs2-storage-space is unfortunately directly on /,
so the mount-timing theory unfortunately has to be discarded.
In the directory listing you gave us for node1 /pvfs2-storage-space,
there's a lost+found directory. That only appears if you've mounted
another volume into that directory. My guess is that for the master
node, you've managed to somehow create part of the storage space
before mounting something to /pvfs2-storage-space, and the rest was
created after. You're only seeing what was created before the
mount. That's just a guess though. Can you send us the output of
'mount' on node1 and master?
-sam
On the nodes, instead, /pvfs2-storage-space is on a mounted
filesystem, /dev/md1, but there everything apparently works, so it
seems to me that the problem really is with the master node, the
metadata server.
The suggestion in the pvfs2-server log to use the -f option looks
very dangerous to me. Or is it safe in the case of the metadata
server, in the sense that it will reconstruct the data from the IO
nodes? I cannot understand why the different storage spaces have the
directory "744468fe" in common, while the master has nothing else
besides this empty directory.
Even if the pvfs2-server process had been killed uncleanly on the
master (the metadata server), it would not have been able (I assume)
to delete data in the storage directory...
So this absence of data in /pvfs2-storage-space for the metadata
server is both disconcerting and confusing...
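A note on the shared directory name: 744468fe is the hexadecimal form of the filesystem ID 1950640382 from the /etc/pvfs2-fs.conf quoted further down in this thread. The short Python check below only confirms that arithmetic. That the server derives the directory name from the ID is an inference from this match, not something stated in this thread.

```python
FS_ID = 1950640382     # "ID" in the <Filesystem> section of pvfs2-fs.conf
DIR_NAME = "744468fe"  # directory present in every /pvfs2-storage-space

# Render the decimal ID in lowercase hexadecimal and compare.
print(format(FS_ID, "x"))          # 744468fe
print(int(DIR_NAME, 16) == FS_ID)  # True
```

If the inference holds, every server belonging to the same filesystem would be expected to carry a directory of this name, which would explain why it appears on the master and on the nodes alike.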
Hope this mail will help us to proceed further.
Best Regards
Raimondo
Rob Ross wrote:
Hi Raimondo,
Two things. One, there is a second config file around that
specifies the storage directory etc. You should be able to find it
in /etc/ also. Please send that to us.
An idea is that perhaps /pvfs2-storage-space is a mounted file
system, and that somehow it is getting mounted *after* the server
is started? Just a blind guess. If you try to start the service
after the system has finished booting, does it do the same thing?
Thanks,
Rob
Raimondo Giammanco wrote:
Hello there.
I am coming here seeking words of wisdom. I have searched the
interweb and this list but cannot seem to find useful information,
so I am posting here.
I apologize if the answer to this question has already been
provided and I could not find it.
I have a problem with a pvfs2 installation that was set up by a
third party. The cluster was shut down cleanly for scheduled
maintenance on the power lines, and I cannot bring pvfs2 up again.
Here is the description.
There is a cluster using a frontend and 9 nodes.
As far as I understand, the frontend is the metadata server and the
nodes are IO servers, according to the /etc/pvfs2-fs.conf file shown
below:
####################
<Defaults>
UnexpectedRequests 50
EventLogging none
LogStamp datetime
BMIModules bmi_tcp
FlowModules flowproto_multiqueue
PerfUpdateInterval 1000
ServerJobBMITimeoutSecs 30
ServerJobFlowTimeoutSecs 30
ClientJobBMITimeoutSecs 300
ClientJobFlowTimeoutSecs 300
ClientRetryLimit 5
ClientRetryDelayMilliSecs 2000
</Defaults>
<Aliases>
Alias master-pvfs tcp://master-pvfs:3334
Alias node1-pvfs tcp://node1-pvfs:3334
Alias node2-pvfs tcp://node2-pvfs:3334
Alias node3-pvfs tcp://node3-pvfs:3334
Alias node4-pvfs tcp://node4-pvfs:3334
Alias node5-pvfs tcp://node5-pvfs:3334
Alias node6-pvfs tcp://node6-pvfs:3334
Alias node7-pvfs tcp://node7-pvfs:3334
Alias node8-pvfs tcp://node8-pvfs:3334
Alias node9-pvfs tcp://node9-pvfs:3334
</Aliases>
<Filesystem>
Name pvfs2-fs
ID 1950640382
RootHandle 1048576
<MetaHandleRanges>
Range master-pvfs 4-429496732
</MetaHandleRanges>
<DataHandleRanges>
Range node1-pvfs 429496733-858993461
Range node2-pvfs 858993462-1288490190
Range node3-pvfs 1288490191-1717986919
Range node4-pvfs 1717986920-2147483648
Range node5-pvfs 2147483649-2576980377
Range node6-pvfs 2576980378-3006477106
Range node7-pvfs 3006477107-3435973835
Range node8-pvfs 3435973836-3865470564
Range node9-pvfs 3865470565-4294967293
</DataHandleRanges>
<StorageHints>
TroveSyncMeta yes
TroveSyncData no
</StorageHints>
</Filesystem>
####################
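As a sanity check on the handle ranges above, here is a small Python sketch (not a PVFS2 tool) that parses the `Range` lines, verifies that they are contiguous and equally sized, and looks up which server owns the RootHandle. The helper names `parse_ranges`, `is_contiguous` and `owner_of` are hypothetical; the data is copied from the configuration above.

```python
import re

CONFIG_RANGES = """\
Range master-pvfs 4-429496732
Range node1-pvfs 429496733-858993461
Range node2-pvfs 858993462-1288490190
Range node3-pvfs 1288490191-1717986919
Range node4-pvfs 1717986920-2147483648
Range node5-pvfs 2147483649-2576980377
Range node6-pvfs 2576980378-3006477106
Range node7-pvfs 3006477107-3435973835
Range node8-pvfs 3435973836-3865470564
Range node9-pvfs 3865470565-4294967293
"""

RANGE_LINE = re.compile(r"Range\s+(\S+)\s+(\d+)-(\d+)")


def parse_ranges(text):
    """Return a list of (host, low, high) tuples sorted by `low`."""
    found = [(h, int(lo), int(hi)) for h, lo, hi in RANGE_LINE.findall(text)]
    return sorted(found, key=lambda item: item[1])


def is_contiguous(ranges):
    """True if each range starts right after the previous one ends."""
    return all(nxt[1] == cur[2] + 1 for cur, nxt in zip(ranges, ranges[1:]))


def owner_of(handle, ranges):
    """Return the host whose range contains `handle`, or None."""
    for host, lo, hi in ranges:
        if lo <= handle <= hi:
            return host
    return None


ranges = parse_ranges(CONFIG_RANGES)
sizes = {hi - lo + 1 for _, lo, hi in ranges}

print(is_contiguous(ranges))      # True
print(sizes)                      # {429496729}
print(owner_of(1048576, ranges))  # master-pvfs
```

The ranges are consistent, and the RootHandle 1048576 falls inside the single metadata range assigned to master-pvfs.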
The nodes are apparently working correctly: at boot the
/etc/init.d/pvfs2 script worked, and the log file
(/tmp/pvfs2-server.log) on a node shows:
####################
[D 10/08 14:39] PVFS2 Server version 2.6.2 starting.
####################
On the master, instead, it gives:
####################
[D 10/09 11:09] PVFS2 Server version 2.6.2 starting.
[E 10/09 11:09] Error: trove_initialize: No such file or directory
[E 10/09 11:09]
***********************************************
[E 10/09 11:09] Invalid Storage Space: /pvfs2-storage-space
[E 10/09 11:09] Storage initialization failed. The most common
reason
for this is that the storage space has not yet been
created or is located on a partition that has not yet
been mounted. If you'd like to create the storage space,
re-run this program with a -f option.
[E 10/09 11:09]
***********************************************
[E 10/09 11:09] Error: Could not initialize server interfaces;
aborting.
[E 10/09 11:09] Error: Could not initialize server; aborting.
####################
Now, the storage space on the nodes is populated:
####################
[EMAIL PROTECTED] ~]# ls /pvfs2-storage-space/
744468fe collections.db lost+found storage_attributes.db
####################
On the master (frontend) it is not:
####################
[EMAIL PROTECTED] ~]# ls /pvfs2-storage-space/
744468fe
####################
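The difference between the two listings can be stated precisely with a small Python sketch. It only compares top-level names; the helper `missing_entries` is hypothetical, and using a node's listing as the reference for the master is an assumption, since a metadata server's storage space need not look exactly like an IO server's. lost+found is ignored because it belongs to the filesystem mounted on the node, not to pvfs2.

```python
NODE_LISTING = "744468fe collections.db lost+found storage_attributes.db"
MASTER_LISTING = "744468fe"


def missing_entries(reference, candidate, ignore=("lost+found",)):
    """Names present in `reference` but absent from `candidate`."""
    expected = set(reference.split()) - set(ignore)
    return sorted(expected - set(candidate.split()))


print(missing_entries(NODE_LISTING, MASTER_LISTING))
# ['collections.db', 'storage_attributes.db']
```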
Can anyone point me in the right direction?
Thanks Again
Raimondo
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users