Hi Everyone,

 

I am having a problem with one of my OpenAFS file servers. About half of the
volumes are “Off-line” and I am unable to bring them online. First some
system info, then the problem details and what I have tried.

 

The system is running Scientific Linux 5.5/x86_64 (basically CentOS 5.5
64-bit). The OpenAFS RPMs are:

 

[atums2:~]# rpm -qa | grep openafs
openafs-kpasswd-1.4.12-6.cern
openafs-client-1.4.12-6.cern
kernel-module-openafs-2.6.18-194.3.1.el5-1.4.12-5.cern
openafs-1.4.12-6.cern
kernel-module-openafs-2.6.18-194.8.1.el5-1.4.12-5.cern
openafs-krb5-1.4.12-6.cern
kernel-module-openafs-2.6.18-238.1.1.el5-1.4.12-6.cern
openafs-server-1.4.12-6.cern

 

The version of ‘e2fsprogs’ is 1.39

 

The system has a 1 TB ext3 partition for AFS:

 

[atums2:~]# df /vicepb
Filesystem            1K-blocks      Used Available Use% Mounted on
/dev/sda1            1007931664 635382472 321349196  67% /vicepb

 

The system has 931 volumes and only 470 are On-line while 461 are Off-line:

 

[atums2:~]# vos listvol atums2
Total number of volumes on server atums2 partition /vicepb: 931
chamber.OLD_eml4a07               536872814 RW    8634169 K Off-line
chamber.OLD_eml4a07.readonly      536872815 RO    8634169 K On-line
chamber.OLD_eml4a09               536872817 RW     702642 K Off-line
chamber.OLD_eml4a09.readonly      536872818 RO     702642 K On-line
…

Total volumes onLine 470 ; Total volumes offLine 461 ; Total busy 0
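
The totals above do not say how the Off-line volumes split by type. A small awk filter over the `vos listvol` output could give that breakdown. This is only a minimal sketch: the function name `tally_volumes` is made up for illustration, and it assumes every volume line has the six-column layout shown in this post (name, ID, type, size, `K`, status).

```shell
# Tally volumes by type and status from `vos listvol` output read on stdin.
# Assumption: volume lines look like "name id TYPE size K STATUS";
# header and summary lines do not have "K" in column 5 and are skipped.
tally_volumes() {
    awk '$5 == "K" && ($3 == "RW" || $3 == "RO" || $3 == "BK") {
             count[$3 " " $6]++
         }
         END {
             for (key in count) print key, count[key]
         }' | sort
}

# Possible usage, with the server and partition from this post:
#   vos listvol atums2 /vicepb | tally_volumes
```

Each output line is a type, a status, and a count, so a partition where only RW volumes are affected shows up as an `RW Off-line` line with no matching `RO Off-line` line.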

 

I have run ‘bos salvage’ on the partition multiple times, restarted the
system, and forced an fsck.ext3 check of the underlying partition (no
problems found). Only RW volumes are Off-line; all RO volumes are On-line.
A few RW volumes are On-line (8 out of 469), but the rest won’t come
On-line.

 

Here is a particular volume which is Off-line:

 

[atums2:~]# vos examine chdata.sn
chdata.sn                         536871656 RW        598 K  Off-line
    atums2.cern.ch /vicepb
    RWrite  536871656 ROnly          0 Backup          0
    MaxQuota   10000000 K
    Creation    Fri May 26 04:02:49 2006
    Copy        Wed Oct 11 12:35:42 2006
    Backup      Sun Jun 11 00:30:10 2006
    Last Access Fri Jan  7 16:38:32 2011
    Last Update Wed Apr  4 15:29:42 2007
    0 accesses in the past day (i.e., vnode references)

    RWrite: 536871656     ROnly: 536871657     RClone: 536871657
    number of sites -> 3
       server atums1.cern.ch partition /vicepi RO Site  -- Old release
       server atums2.cern.ch partition /vicepb RW Site  -- New release
       server atums2.cern.ch partition /vicepb RO Site  -- New release

 

Try to bring online:

 

[atums2:~]# vos online -server atums2 -partition /vicepb -id chdata.sn

 

The FileLog shows:

 

Sun Jan 23 22:57:03 2011 GetBitmap: addled vnode index in volume chdata.sn; volume needs salvage
Sun Jan 23 22:57:03 2011 VAttachVolume: error getting bitmap for volume (/vicepb//V0536871656.vol)
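
To see which volumes the fileserver has flagged this way, the names can be pulled out of the FileLog. The following is a rough sketch rather than anything from the original session: the function name `addled_volumes` is invented, it assumes each log entry sits on a single line in the actual file (the entries above were wrapped by the mail client), and it assumes the FileLog lives next to the SalvageLog in /usr/afs/logs.

```shell
# Print the unique names of volumes that GetBitmap reported as having an
# addled vnode index. Reads FileLog text on stdin.
# Assumption: one log entry per line, in the format quoted in this post.
addled_volumes() {
    sed -n 's/.*GetBitmap: addled vnode index in volume \([^;]*\);.*/\1/p' |
        sort -u
}

# Possible usage (log path is an assumption):
#   addled_volumes < /usr/afs/logs/FileLog
```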

 

Try to Salvage:

 

[atums2:~]# bos salvage atums2 /vicepb chdata.sn
Starting salvage.
bos: salvage completed

 

The SalvageLog shows:

 

[atums2:~]# tail /usr/afs/logs/SalvageLog
@(#) OpenAFS 1.4.12 built  2010-12-13 1928681 19919656
01/23/2011 22:58:19 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager /vicepb 536871656)
01/23/2011 22:58:19 2 nVolumesInInodeFile 64
01/23/2011 22:58:19 CHECKING CLONED VOLUME 536871657.
01/23/2011 22:58:19 chdata.sn.readonly (536871657) updated 04/04/2007 15:29
01/23/2011 22:58:19 Partially allocated vnode 2 deleted.

 

Try again:

 

[atums2:~]# vos online -server atums2 -partition /vicepb -id chdata.sn


FileLog has the same message:

 

Sun Jan 23 22:59:05 2011 GetBitmap: addled vnode index in volume chdata.sn; volume needs salvage
Sun Jan 23 22:59:05 2011 VAttachVolume: error getting bitmap for volume (/vicepb//V0536871656.vol)

 

Salvage attempt again:

 

[atums2:~]# bos salvage atums2 /vicepb chdata.sn
Starting salvage.
bos: salvage completed

 

[atums2:~]# tail /usr/afs/logs/SalvageLog
@(#) OpenAFS 1.4.12 built  2010-12-13 1928681 19919656
01/23/2011 23:00:07 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager /vicepb 536871656)
01/23/2011 23:00:07 2 nVolumesInInodeFile 64
01/23/2011 23:00:07 CHECKING CLONED VOLUME 536871657.
01/23/2011 23:00:07 chdata.sn.readonly (536871657) updated 04/04/2007 15:29
01/23/2011 23:00:07 Partially allocated vnode 2 deleted.

 

The result is the same, as if the prior salvage didn’t do anything. This is
exactly what happens on the other volumes I have tried to bring online.
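
To check whether every Off-line RW volume behaves this way, the same salvage/online pair could be generated for each of them from the `vos listvol` output. The sketch below only prints the commands and does not run them. The function name `print_recovery_commands` is hypothetical, and it assumes the six-column volume lines shown earlier in this post.

```shell
# Print (do not execute) a "bos salvage" / "vos online" pair for every RW
# volume that `vos listvol` reports as Off-line. Reads listvol output on
# stdin; the server and partition are passed as arguments.
# Assumption: volume lines look like "name id TYPE size K STATUS".
print_recovery_commands() {
    server=$1
    partition=$2
    awk -v s="$server" -v p="$partition" '
        $3 == "RW" && $5 == "K" && $6 == "Off-line" {
            printf "bos salvage %s %s %s\n", s, p, $1
            printf "vos online -server %s -partition %s -id %s\n", s, p, $1
        }'
}

# Possible usage, reviewing the output before running any of it:
#   vos listvol atums2 /vicepb | print_recovery_commands atums2 /vicepb
```

Printing rather than executing keeps the step reviewable, which seems prudent while the cause of the failed attaches is still unknown.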

 

So how can I fix this? Any suggestions for getting the rest of these
volumes On-line?

 

Let me know if you need further details.  Thanks,

 

Shawn

 

 
