On Thu, 24 May 2018 20:40:02 +1000 Ian Wienand <iwien...@redhat.com> wrote:

> Hello,
>
> We lost the backing storage on our R/O server /vicepa sometime
> yesterday (it's cloud block storage out of our control, so it
> disappeared in an unknown manner).  Once things came back, we had
> volumes in a range of mostly locked states from updates and "vos
> release"s triggered by update cron jobs.
>
> Quite a few I could manually unlock and re-release, and things went
> OK.  Others have proven more of a problem.
>
> To cut things short, there was a lot of debugging, and we ended up
> with stuck transactions between the R/W and R/O server and
> un-unlockable volumes.  Eventually we rebooted both to clear out
> everything.
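For the stuck transactions, a full reboot should not be needed.  Stale
volserver transactions can be listed and ended by hand; a sketch (the
transaction id to end comes from the "vos status" output):

  # list outstanding volserver transactions on each server
  vos status -server afs01.dfw.openstack.org
  vos status -server afs02.dfw.openstack.org

  # end a stuck transaction by id, then unlock the volume
  vos endtrans -server afs02.dfw.openstack.org -transaction <id>
  vos unlock -id <volume>

"vos endtrans" should be available in your 1.6.7 vos; failing that,
restarting just the volserver on the affected server drops its
transactions with less disruption than rebooting the whole machine.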
> In an attempt to just clear the R/O mirrors and start again, I did
> the following for each problem volume:
>
>   vos unlock $MIRROR
>   vos remove -server afs02.dfw.openstack.org -partition a -id $MIRROR.readonly
>   vos release -v $MIRROR
>   vos addsite -server afs02.dfw.openstack.org -partition a -id $MIRROR
>
> My theory was that this would completely remove the R/O mirror volume
> and start fresh.  I then proceeded to do a "vos release" on each
> volume in sequence (more details in [1]).
>
> However, the release onto the new R/O volume has not worked.  Here is
> the output from the release of one of the volumes:
>
> ---
> Thu May 24 09:49:54 UTC 2018
> Kerberos initialization for service/afsad...@openstack.org
>
> mirror.ubuntu-ports
>     RWrite: 536871041    ROnly: 536871042
>     number of sites -> 3
>        server afs01.dfw.openstack.org partition /vicepa RW Site
>        server afs01.dfw.openstack.org partition /vicepa RO Site
>        server afs02.dfw.openstack.org partition /vicepa RO Site  -- Not released
> This is a complete release of volume 536871041
> There are new RO sites; we will try to only release to new sites
> Querying old RO sites for update times... done
> RW vol has not changed; only releasing to new RO sites
> Starting transaction on cloned volume 536871042... done
> Creating new volume 536871042 on replication site afs02.dfw.openstack.org: done
> This will be a full dump: read-only volume needs be created for new site
> Starting ForwardMulti from 536871042 to 536871042 on afs02.dfw.openstack.org (entire volume).
> Release failed: VOLSER: Problems encountered in doing the dump !
> The volume 536871041 could not be released to the following 1 sites:
>     afs02.dfw.openstack.org /vicepa
> VOLSER: release could not be completed
> Error in vos release command.
> VOLSER: release could not be completed
> Thu May 24 09:51:49 UTC 2018
> ---
>
> This triggers a salvage of the volume, which I presume was only
> partially cloned; the salvager logs:
>
> ---
> 05/24/2018 09:51:49 dispatching child to salvage volume 536871041...
> 05/24/2018 09:51:49 namei_ListAFSSubDirs: warning: VG 536871042 does not have a link table; salvager will recreate it.
> 05/24/2018 09:51:49 fileserver requested salvage of clone 536871042; scheduling salvage of volume group 536871041...
> 05/24/2018 09:51:49 VReadVolumeDiskHeader: Couldn't open header for volume 536871041 (errno 2).
> 05/24/2018 09:51:49 2 nVolumesInInodeFile 64
> 05/24/2018 09:51:49 CHECKING CLONED VOLUME 536871042.
> 05/24/2018 09:51:49 mirror.ubuntu-ports.readonly (536871042) updated 05/24/2018 06:08
> 05/24/2018 09:51:49 totalInodes 32896
> ---
>
> On the R/O server side (afs02) we have:
>
> ---
> Thu May 24 09:49:55 2018 VReadVolumeDiskHeader: Couldn't open header for volume 536871042 (errno 2).
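A quick way to confirm that before salvaging (a sketch; it assumes the
orphans landed under afs02's /vicepa, per the path above):

  # any files still under the volume group's namei directory?
  ls -l /vicepa/AFSIDat/=0/=0++U
  find /vicepa/AFSIDat/=0/=0++U -type f | wc -l

  # does the volserver still know about the clone?
  vos listvol -server afs02.dfw.openstack.org -partition a | grep 536871042

If that directory still holds files but the clone is gone or stuck in
an error state, those files are what makes IH_CREATE fail with "File
exists" when the restore tries to recreate the clone's data files.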
> Thu May 24 09:49:55 2018 attach2: forcing vol 536871042 to error state (state 0 flags 0x0 ec 103)
> Thu May 24 09:49:55 2018 1 Volser: CreateVolume: volume 536871042 (mirror.ubuntu-ports.readonly) created
> Thu May 24 09:51:49 2018 1 Volser: ReadVnodes: IH_CREATE: File exists - restore aborted
> Thu May 24 09:51:49 2018 Scheduling salvage for volume 536871042 on part /vicepa over FSSYNC
> ---
>
> I do not see anything on the R/W server side (afs01).
>
> I have fsck'd the /vicepa partition on the R/O server (afs02) and it
> is OK.
>
> I cannot find much info on "IH_CREATE: File exists", which I assume
> is the problem here.

Yes, there seem to be files left over.  For that parent volume number
(536871041), the leftover files would be under the path:

  /vicep*/AFSIDat/=0/=0++U

> I would welcome any suggestions!  Clearly my theory of "vos remove"
> and "vos add" of the mirror hasn't cleared out enough state to
> recover things?

A full partition salvage on the R/O server should remove the orphaned
files:

  bos salvage -server afs02 -partition a -showlog -orphans attach -forceDAFS

Once the orphans are gone, retrying the failed "vos release" should be
able to recreate the read-only clone on afs02.

> All servers are Xenial-based with its current 1.6.7-1ubuntu1.1
> openafs packages.
>
> Thanks,
>
> -i
>
> [1] http://lists.openstack.org/pipermail/openstack-infra/2018-May/005949.html

-- 
Michael Meffie <mmef...@sinenomine.net>

_______________________________________________
OpenAFS-devel mailing list
OpenAFS-devel@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-devel