It was determined off-list that the particular error I'm seeing in
this case is because I was adding an RO volume on the same server as
the RW volume but on a different partition.  While I know it's terrible
practice, it did work in previous versions and I was using it for
testing purposes.  Apparently this is no longer allowed, and it produces
the errors that I'm seeing.
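
For anyone who wants to spot this layout before running a release, here
is a rough sketch in Python.  It only parses the site lines in the form
that "vos rel -v" printed in my earlier message below; the function
names and the SAMPLE text are my own for illustration, and the "same
server, different partition" rule it checks is just what was worked out
in this thread, not something taken from the OpenAFS documentation.

```python
import re

# Matches site lines in the form seen in the "vos rel -v" output:
#   server <host> partition /vicepX <RW|RO> Site
# (The format is taken from this thread; other vos versions may differ.)
SITE_RE = re.compile(r"^\s*server\s+(\S+)\s+partition\s+(\S+)\s+(RW|RO)\s+Site")


def parse_sites(listing):
    """Return a list of (server, partition, kind) tuples from a site listing."""
    sites = []
    for line in listing.splitlines():
        m = SITE_RE.match(line)
        if m:
            sites.append(m.groups())
    return sites


def misplaced_ro_sites(listing):
    """Return RO sites that share a server with the RW site but not its partition."""
    sites = parse_sites(listing)
    rw_sites = [(server, part) for server, part, kind in sites if kind == "RW"]
    bad = []
    for server, part, kind in sites:
        if kind != "RO":
            continue
        for rw_server, rw_part in rw_sites:
            if server == rw_server and part != rw_part:
                bad.append((server, part))
    return bad


# Hypothetical input: the site listing from the thread, with quoting removed.
SAMPLE = """\
foo.bar
    RWrite: 536885610
    number of sites -> 2
       server beth.cae.wisc.edu partition /vicepa RW Site
       server beth.cae.wisc.edu partition /vicepb RO Site  -- Not released
"""

if __name__ == "__main__":
    for server, part in misplaced_ro_sites(SAMPLE):
        print("RO site on RW server but different partition:", server, part)
```

RO sites on other servers are deliberately not flagged, since the
problem here was only with a second partition on the RW server.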

So, that mystery is solved.  However, why I got stuck in salvage loops
with 1.4.10 is not, and as I don't have logs and am wary of bringing
production machines back to 1.4.10, it'll remain a mystery for the
foreseeable future, I imagine.

Thanks for the help!

-stefan

On Thu, Aug 20, 2009 at 09:19:01AM -0500, Stefan Strandberg wrote:
> Hi,
> 
> 1.4.11 isn't really doable until it's at least in lenny-backports as we
> don't want to roll our own versions of this.
> 
> As for a stale replica existing, I may be misunderstanding.  If you're
> saying that it's a replica of that volume, I don't see how that's the
> case.
> 
> Here's the creation and subsequent release attempt for a brand new
> volume:
> 
> ste...@cog ~ $ vos create beth a foo.bar
> Volume 536885610 created on partition /vicepa of beth
> ste...@cog ~ $ vos addsite beth b foo.bar
> Added replication site beth /vicepb for volume foo.bar
> ste...@cog ~ $ vos rel -v foo.bar
> 
> foo.bar 
>     RWrite: 536885610 
>     number of sites -> 2
>        server beth.cae.wisc.edu partition /vicepa RW Site 
>        server beth.cae.wisc.edu partition /vicepb RO Site  -- Not released
> This is a complete release of volume 536885610
> Cloning RW volume 536885610 to temporary RO... done
> Getting status of RW volume 536885610... done
> Ending cloning transaction on RW volume 536885610... done
> Starting transaction on cloned volume 536885611... done
> Creating new volume 536885611 on replication site beth.cae.wisc.edu: Failed to create the ro volume: : Input/output error
> The volume 536885610 could not be released to the following 1 sites:
>                           beth.cae.wisc.edu /vicepb
> VOLSER: release could not be completed
> Error in vos release command.
> VOLSER: release could not be completed
> 
> And here's the VolserLog output:
> 
> Thu Aug 20 09:15:14 2009 1 Volser: CreateVolume: volume 536885610 (foo.bar) created
> Thu Aug 20 09:15:24 2009 1 Volser: Clone: Cloning volume 536885610 to new volume 536885611
> Thu Aug 20 09:15:24 2009 VAttachVolume: Failed to open /vicepb/V0536885611.vol (errno 2)
> Thu Aug 20 09:15:24 2009 1 Volser: CreateVolume: Unable to create the volume; aborted, error code 18
> Thu Aug 20 09:15:24 2009 : Invalid cross-device link
> 
> Turning up debugging doesn't really show anything extra.
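
A note on the two numbers in the log above: assuming they are plain
errno values (the "Invalid cross-device link" text on the last log line
suggests they are), they can be decoded with a few lines of Python.
This is only a sketch; errno_name is a helper name I made up.  On Linux,
2 is ENOENT and 18 is EXDEV, the error that link() and rename() return
when asked to work across filesystems, which fits the RW and RO being
on different partitions.

```python
import errno
import os


def errno_name(code):
    """Return the symbolic errno name for a numeric code, or "unknown"."""
    return errno.errorcode.get(code, "unknown")


if __name__ == "__main__":
    # 2 and 18 are the values that appear in the VolserLog excerpt.
    for code in (2, 18):
        print(code, errno_name(code), os.strerror(code))
```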
> 
> Thanks again,
> 
> -stefan
> 
> On Thu, Aug 20, 2009 at 09:42:46AM -0400, Derrick Brashear wrote:
> > On Thu, Aug 20, 2009 at 9:39 AM, Jeffrey Altman<[email protected]> wrote:
> > > Stefan Strandberg wrote:
> > >> Anyone have any ideas?  I would really like to get everything on 1.4.10
> > >> for the performance increases.
> > >
> > > The current version of OpenAFS is 1.4.11 which addresses:
> > >
> > > - Fix race in background sync code which could cause volumes to go
> > >  offline. (124359)
> > >
> > > This is not the issue you are describing.  However, please test with
> > > 1.4.11 and see if the problem is still present.   If so, send logs and
> > > report to [email protected].
> > 
> > 
> > it will still be present. the real problem is you have a stale copy of
> > the replica elsewhere on the disk. there should be exactly one copy of
> > 536885604, and it should be on the same partition as 536885602, both
> > according to the vldb and in vos listvol output. arrange to make that
> > true, and your issue will go away.
> > _______________________________________________
> > OpenAFS-info mailing list
> > [email protected]
> > https://lists.openafs.org/mailman/listinfo/openafs-info
> > 
> 
> -- 
> Stefan Strandberg
> UNIX group
> Computer Aided Engineering - UW Madison
> [email protected]

-- 
Stefan Strandberg
UNIX group
Computer Aided Engineering - UW Madison
[email protected]

