Thanks for the information, Hartmut. I tried setting ulimit to 1000000 blocks and rerunning the salvage. I still got no core file (the salvager "seemed" to complete):
[atums2:~]# ulimit -a
core file size          (blocks, -c) 1000000
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 49152
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 49152
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

[atums2:~]# bos salvage atums2 /vicepb chdata.sn
Starting salvage.
bos: salvage completed

The SalvageLog file shows the same thing as before. Then I tried running 'gdb' and got:

[atums2:~]# gdb /usr/afs/bin/salvager
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-32.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/afs/bin/salvager...(no debugging symbols found)...done.
(gdb) run /vicepb 536871656 -debug
Starting program: /usr/afs/bin/salvager /vicepb 536871656 -debug
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x2aaaaaaab000
Mon Jan 24 15:16:47 2011 Assertion failed! file vol-salvage.c, line 2859.

Program received signal SIGABRT, Aborted.
0x0000003408c30265 in raise () from /lib64/libc.so.6

The log file then showed:

[atums2:~]# tail /usr/afs/logs/SalvageLog
@(#) OpenAFS 1.4.12 built 2010-12-13 1928681 19919656
01/24/2011 15:16:47 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager /vicepb 536871656 -debug)
01/24/2011 15:16:47 2 nVolumesInInodeFile 64
01/24/2011 15:16:47 CHECKING CLONED VOLUME 536871657.
01/24/2011 15:16:47 chdata.sn.readonly (536871657) updated 04/04/2007 15:29
01/24/2011 15:16:47 Partially allocated vnode 2 deleted.

So I assume that I need to dig into vol-salvage.c around line 2859 to figure out why the assertion failed?

I should also note that the rest of the AFS cell is running the "SL" version of OpenAFS, rather than "SLC" like this node. One possibility is that I could switch to those RPMs, since I have hit issues in the past with CERN customizations for OpenAFS.

Thanks,

Shawn

-----Original Message-----
From: Hartmut Reuter [mailto:[email protected]]
Sent: Monday, January 24, 2011 9:04 AM
To: McKee, Shawn
Cc: [email protected]
Subject: Re: [OpenAFS] Problem with Off-line volumes...unable to bring On-line

Looks like a crash of the salvager. The SalvageLog should end differently, with the summary line for the RW volume. Are there any core files in /usr/afs/logs? If not, make sure ulimit for core file size isn't set to 0, and retry.

You also could run the salvager by hand under gdb to see why it crashes. You then need to add the -debug flag to prevent it from forking. E.g.

gdb /usr/afs/bin/salvager
...
(gdb) run /vicepb 536871656 -debug

Good luck,
Hartmut

McKee, Shawn wrote:
> Hi Everyone,
>
> I am having a problem with one of my OpenAFS file servers. About ½ of
> the volumes are Off-line and I am unable to bring them online. First
> some system info, and then I will list the problem details and what I have tried.
>
> The system is running Scientific Linux 5.5/x86_64 (basically CentOS 5.5
> 64-bit).
> The openafs rpms are:
>
> [atums2:~]# rpm -qa | grep openafs
> openafs-kpasswd-1.4.12-6.cern
> openafs-client-1.4.12-6.cern
> kernel-module-openafs-2.6.18-194.3.1.el5-1.4.12-5.cern
> openafs-1.4.12-6.cern
> kernel-module-openafs-2.6.18-194.8.1.el5-1.4.12-5.cern
> openafs-krb5-1.4.12-6.cern
> kernel-module-openafs-2.6.18-238.1.1.el5-1.4.12-6.cern
> openafs-server-1.4.12-6.cern
>
> The version of e2fsprogs is 1.39.
>
> The system has an ext3 1TB partition for AFS:
>
> [atums2:~]# df /vicepb
> Filesystem           1K-blocks      Used Available Use% Mounted on
> /dev/sda1            1007931664 635382472 321349196  67% /vicepb
>
> The system has 931 volumes and only 470 are On-line while 461 are Off-line:
>
> [atums2:~]# vos listvol atums2
> Total number of volumes on server atums2 partition /vicepb: 931
> chamber.OLD_eml4a07            536872814 RW    8634169 K Off-line
> chamber.OLD_eml4a07.readonly   536872815 RO    8634169 K On-line
> chamber.OLD_eml4a09            536872817 RW     702642 K Off-line
> chamber.OLD_eml4a09.readonly   536872818 RO     702642 K On-line
>
>
> Total volumes onLine 470 ; Total volumes offLine 461 ; Total busy 0
>
> I have run bos salvage on the partition multiple times. I have
> restarted the system. I have run a forced fsck.ext3 check on the
> underlying partition (no problems found). Only RW volumes are Off-line.
> All RO volumes are On-line. There are a few RW volumes On-line (8 out of
> 469) but the rest won't come On-line.
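[Editorial aside, not part of the original message: the vos listvol excerpt above can be filtered down to just the stuck volumes. This is a sketch that assumes the six-column layout shown in the excerpt (name, ID, type, size, "K", status); it is worth checking the columns against your own vos output before relying on it.]

```shell
# Print the ID and name of each Off-line RW volume, given "vos listvol"
# output shaped like the excerpt above. Column 3 is the volume type,
# column 6 the status; adjust if your vos version formats differently.
vos listvol atums2 /vicepb | awk '$3 == "RW" && $6 == "Off-line" { print $2, $1 }'
```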
>
> Here is a particular volume which is Off-line:
>
> [atums2:~]# vos examine chdata.sn
> chdata.sn                         536871656 RW        598 K Off-line
>     atums2.cern.ch /vicepb
>     RWrite  536871656 ROnly          0 Backup          0
>     MaxQuota   10000000 K
>     Creation    Fri May 26 04:02:49 2006
>     Copy        Wed Oct 11 12:35:42 2006
>     Backup      Sun Jun 11 00:30:10 2006
>     Last Access Fri Jan  7 16:38:32 2011
>     Last Update Wed Apr  4 15:29:42 2007
>     0 accesses in the past day (i.e., vnode references)
>
>     RWrite: 536871656    ROnly: 536871657    RClone: 536871657
>     number of sites -> 3
>        server atums1.cern.ch partition /vicepi RO Site  -- Old release
>        server atums2.cern.ch partition /vicepb RW Site  -- New release
>        server atums2.cern.ch partition /vicepb RO Site  -- New release
>
> Try to bring online:
>
> [atums2:~]# vos online -server atums2 -partition /vicepb -id chdata.sn
>
> The FileLog shows:
>
> Sun Jan 23 22:57:03 2011 GetBitmap: addled vnode index in volume
> chdata.sn; volume needs salvage
> Sun Jan 23 22:57:03 2011 VAttachVolume: error getting bitmap for volume
> (/vicepb//V0536871656.vol)
>
> Try to salvage:
>
> [atums2:~]# bos salvage atums2 /vicepb chdata.sn
> Starting salvage.
> bos: salvage completed
>
> The SalvageLog shows:
>
> [atums2:~]# tail /usr/afs/logs/SalvageLog
> @(#) OpenAFS 1.4.12 built 2010-12-13 1928681 19919656
> 01/23/2011 22:58:19 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager
> /vicepb 536871656)
> 01/23/2011 22:58:19 2 nVolumesInInodeFile 64
> 01/23/2011 22:58:19 CHECKING CLONED VOLUME 536871657.
> 01/23/2011 22:58:19 chdata.sn.readonly (536871657) updated 04/04/2007 15:29
> 01/23/2011 22:58:19 Partially allocated vnode 2 deleted.
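[Editorial aside, not part of the original message: "bos: salvage completed" only means bosserver finished driving the salvage; it does not prove the salvager binary exited cleanly, which is exactly the ambiguity in the truncated log above. Running the salvager by hand with the -debug flag Hartmut recommends above (so it does not fork) lets the shell report the real exit status; a sketch:]

```shell
# Run the salvager in the foreground and check how it actually exited.
# An exit status of 128+N means death by signal N:
#   134 = 128 + 6  (SIGABRT, e.g. a failed assertion)
#   139 = 128 + 11 (SIGSEGV)
/usr/afs/bin/salvager /vicepb 536871656 -debug
echo "salvager exit status: $?"
```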
>
> Try again:
>
> [atums2:~]# vos online -server atums2 -partition /vicepb -id chdata.sn
>
> The FileLog has the same message:
>
> Sun Jan 23 22:59:05 2011 GetBitmap: addled vnode index in volume
> chdata.sn; volume needs salvage
> Sun Jan 23 22:59:05 2011 VAttachVolume: error getting bitmap for volume
> (/vicepb//V0536871656.vol)
>
> Salvage attempt again:
>
> [atums2:~]# bos salvage atums2 /vicepb chdata.sn
> Starting salvage.
> bos: salvage completed
>
> [atums2:~]# tail /usr/afs/logs/SalvageLog
> @(#) OpenAFS 1.4.12 built 2010-12-13 1928681 19919656
> 01/23/2011 23:00:07 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager
> /vicepb 536871656)
> 01/23/2011 23:00:07 2 nVolumesInInodeFile 64
> 01/23/2011 23:00:07 CHECKING CLONED VOLUME 536871657.
> 01/23/2011 23:00:07 chdata.sn.readonly (536871657) updated 04/04/2007 15:29
> 01/23/2011 23:00:07 Partially allocated vnode 2 deleted.
>
> Same result, as if the prior salvage didn't do anything. This is exactly
> what happens on other volumes I have tried to bring online.
>
> So how would I fix this? Any suggestions for how to get the rest of
> these volumes On-line?
>
> Let me know if you need further details. Thanks,
>
> Shawn

-- 
-----------------------------------------------------------------
Hartmut Reuter                   e-mail  [email protected]
                                 phone   +49-89-3299-1328
                                 fax     +49-89-3299-1301
RZG (Rechenzentrum Garching)     web     http://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-----------------------------------------------------------------
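[Editorial postscript, not part of the thread: for anyone who, like Shawn above, raises ulimit -c and still gets no core file, there are two common culprits on Linux. The shell's limit is not the daemon's limit, and the kernel may be writing cores somewhere unexpected. A quick checklist, assuming a standard Linux /proc:]

```shell
# 1. The kernel names and places core files according to core_pattern;
#    it may be a relative name written into the crashing process's cwd,
#    or a pipe to a handler, rather than a file where you are looking.
cat /proc/sys/kernel/core_pattern
# 2. The core-size limit must be nonzero in the *salvager's* environment.
#    A salvage started via bosserver inherits bosserver's limits, not
#    your interactive shell's; running the salvager by hand (as in the
#    gdb session earlier in the thread) sidesteps that inheritance.
ulimit -c
```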
