Re: [OpenAFS] Re: State of the Michigan shadow system (long)

Steve Simmons Mon, 20 Dec 2010 16:01:08 -0800

On Dec 20, 2010, at 3:29 PM, Andrew Deason wrote:

> On Mon, 20 Dec 2010 14:46:38 -0500
> Steve Simmons <s...@umich.edu> wrote:
> 
>> A shadow volume is a read-only remote clone of a primary volume. We
>> had to create some terminology here, and 'primary' is what we called
>> the real-time, in-use, r/w production volume. A remote clone closely
>> resembles a read-only replica of a volume, but differs in several
>> important respects.
> 
> By 'read-only' do you just mean in intended usage? I may be way off, but
> my memory of shadow volumes (as implemented in openafs.org code) is that
> they are virtually identical to the primary, and are not marked as
> RO volumes or anything like that in the underlying namei metadata. So, a
> fileserver could theoretically attach it and modify it, though it was
> intended that the lack of an entry in the vldb would prevent clients
> from accessing it.


Yes, 'read-only' is sloppy terminology on my part. 'Enforcement' of the 
read-only nature was done by virtue of the shadow being invisible to most 
things that access volumes.

> 
>> First and foremost, it does not appear in the vldb. Thus there is no
>> possibility of the read-only copy coming into production.
> 
> I understand this was probably the best way to do this at the time, but
> this alone does not prevent the volume from getting used. Since vldb
> results are cached by clients and an administrator could screw up vldb
> data somehow, it's possible for someone to access the wrong volume.

Correct.

>> Shadow volumes could be detected only on the server on which they
>> reside. Modifications were made to vos listvol for that purpose. A bit
>> in the volume header was selected for distinguishing a shadow from a
>> primary volume; I believe that was the only modification made to the
>> volume header file. This work is also done.
> 
> By "done" does this mean you just implemented it at umich, or it's in
> the openafs.org tree? Is the volume header bit you're referring to
> inService (or another existing flag), or did you use a separate field
> specifically for shadows?

Yes, we implemented it at umich. I don't believe the source is in the public 
openafs.org source tree anywhere, tho I think Dan Hyde has it incorporated as a 
branch in his git archive. I'd ask him, but he's on vacation this week.

I don't know off the top of my head which bit he used. In our discussions at 
the time we chose one of the reserved bits, but with full knowledge that this 
might have to change when/if the time came to make the implementation more public.
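To make the reserved-bit idea concrete, here is a minimal sketch in C. The struct, its `reserved` field and `VOL_SHADOW_FLAG` are hypothetical stand-ins, not the real OpenAFS volume header layout, and the bit actually used at umich is unknown. The point is only that a single flag bit is enough for a shadow-aware listvol to tell a shadow from a primary on the same server.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified volume header: NOT the real OpenAFS
 * on-disk layout, just enough fields for the sketch. */
struct vol_header {
    uint32_t id;          /* volume ID */
    char     name[32];    /* volume name */
    uint32_t reserved;    /* stand-in for a spare/reserved field */
};

/* Hypothetical choice of bit; the real implementation's bit is unknown. */
#define VOL_SHADOW_FLAG (1u << 0)

/* Mark a volume as a shadow by setting the flag bit. */
static void mark_shadow(struct vol_header *h)
{
    assert(h != NULL);
    h->reserved |= VOL_SHADOW_FLAG;
}

/* Nonzero if the header carries the shadow bit. */
static int is_shadow(const struct vol_header *h)
{
    assert(h != NULL);
    return (h->reserved & VOL_SHADOW_FLAG) != 0;
}

/* Label a shadow-aware listvol might print for each volume. */
static const char *volume_kind(const struct vol_header *h)
{
    return is_shadow(h) ? "shadow" : "primary";
}
```

Since nothing else on the server looks at the bit, only tools that were taught about it (like the modified listvol) would ever report the difference.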

>> I think we were sliding towards a transparent upward-compatible
>> replacement of the vldb as well. Based purely on how I imagine the
>> vldb to work :-), it should be possible to add shadow data to it and
>> define some additional rpcs. Users of the old rpcs would only get the
>> data that was in the 'legacy' vldb, users of the new rpcs would get
>> shadow data as well. That's a door folks may not want opened yet, but
>> it seems a better choice than bolting a separate shadow-oriented vldb
>> to the side.
> 
> I thought the bigger problem is not the compatibility of the
> client<->vlserver interface, but rather the vlserver<->vlserver
> interface; that is, the structure of the VL entries in ubik, since those
> structures don't have any spare fields (although LockAfsId is not
> used). You can probably play some games to keep enough compatibility
> with older vlservers, but it requires some thought.

Again, loose terminology on my part, 'cause I didn't want to drown folks in 
detail. But since you were kind enough to ask:

Yeah, it's hard. Part of what makes it hard is that the vldb format is 
fixed and there's little or no space to wedge new stuff into it. Another 
complicating factor is this whole idea of volume families and deciding if, 
when and how we want to track the inter-volume relationships and 
dependencies. As a particular example, in our existing implementation it's 
perfectly possible for shadow A' (A-prime) of volume A to be overwritten as 
a shadow of volume B. Sometimes you want that: B could be a shadow of A, and 
we're reducing overhead on A by refreshing B from A'. In a sense, you might 
think of B as more properly A''. But how should such relationships be detected, 
and what limitations, if any, should be imposed on such refreshes? Lacking a good 
taxonomy of what a shadow volume is and how it relates to the primary, we can't 
come up with a good database definition to encode that. Lacking that 
definition, we can't come up with a proposal that would allow shadow data to be 
placed in the vldb in any upward-compatible way.
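As a rough illustration of what "tracking volume families" could mean, here is a C sketch in which every copy records the volume it was last refreshed from, and a refresh is allowed only if source and destination trace back to the same primary. The `fam_entry` table and the same-root policy are hypothetical assumptions for illustration, not part of the umich implementation or of OpenAFS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical family record: each volume notes the volume it was
 * last refreshed from (0 for a primary). Not an OpenAFS structure. */
struct fam_entry {
    uint32_t id;
    uint32_t source;   /* 0 => primary, else the volume this copy came from */
};

/* Linear lookup of a volume's family record. */
static const struct fam_entry *find(const struct fam_entry *t, size_t n,
                                    uint32_t id)
{
    for (size_t i = 0; i < n; i++)
        if (t[i].id == id)
            return &t[i];
    return NULL;
}

/* Follow source links back to the primary. Returns 0 if the chain is
 * broken or takes more than n hops (which implies a cycle). */
static uint32_t family_root(const struct fam_entry *t, size_t n, uint32_t id)
{
    for (size_t hops = 0; hops <= n; hops++) {
        const struct fam_entry *e = find(t, n, id);
        if (e == NULL)
            return 0;
        if (e->source == 0)
            return e->id;
        id = e->source;
    }
    return 0;
}

/* One possible policy: allow refreshing shadow `dst` from `src` only
 * if both belong to the same family (same root primary). */
static int refresh_allowed(const struct fam_entry *t, size_t n,
                           uint32_t dst, uint32_t src)
{
    uint32_t a = family_root(t, n, dst);
    uint32_t b = family_root(t, n, src);
    return a != 0 && a == b;
}
```

With A = 1, A' = 2 (from A) and B = 3 (from A'), refreshing B from A' passes, while refreshing A' from an unrelated primary would be refused. Whether that is the right rule is exactly the taxonomy question that is still open.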

The decision to leave shadows outside of the vldb ultimately raises the question 
of how to manage shadows and volume families, and IMHO is acceptable only as a 
short-term solution.

Coming to the more specific vlserver-to-vlserver ubik questions: yeah, that's 
hard. If all we're thinking of is simple records that could (please, god, 
please!) be shoehorned into the dbs, those are relatively simple issues. I 
dunno if that's possible, tho. In addition, it ignores any issues that 
may arise when the db is in a transitional state, i.e., an incomplete subset of 
the volume family data has been distributed and somebody makes a query about 
it. As far as I know, ubik doesn't have any concept of atomic commits across 
multiple entries. That makes processing volume families in anything but the 
simplest ways very hard.
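To show what "transitional state" means for a reader, here is a C sketch of one way a query could at least detect that it is looking at a partially distributed family: a head record carries a generation number and a member count, and the family counts as complete only if exactly that many members with the same generation are present. The record layouts and the generation scheme are hypothetical assumptions; they are not ubik or vldb formats, and detection is not the same as the atomic commit the paragraph is asking for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical head record for a volume family. */
struct family_head {
    uint32_t root_id;
    uint32_t generation;   /* bumped on every family update */
    uint32_t n_members;    /* how many member records that update wrote */
};

/* Hypothetical member record, stamped with the writing update's generation. */
struct family_member {
    uint32_t root_id;
    uint32_t vol_id;
    uint32_t generation;
};

/* Returns 1 only if every member written by the head's generation is
 * present; 0 tells the reader it is seeing a partial update. */
static int family_complete(const struct family_head *h,
                           const struct family_member *m, size_t n)
{
    uint32_t seen = 0;
    assert(h != NULL);
    for (size_t i = 0; i < n; i++) {
        if (m[i].root_id != h->root_id)
            continue;          /* belongs to some other family */
        if (m[i].generation != h->generation)
            return 0;          /* stale member from an older update */
        seen++;
    }
    return seen == h->n_members;
}
```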

If a more complex implementation is required, well, maybe huge violence 
has to be done to the vldb format, the servers, and ubik. Maybe we need to move 
to some other replicated system entirely. Maybe this is a good argument for 
keeping the shadow data in a separate db, not unlike the kind of system Russ 
built for extended data at Stanford (I believe they use a MySQL db to track 
creation and manipulation of mount points, etc.). Maybe not. No matter what, 
it's not an implementation I'd want to proceed with in the absence of a 
community decision that This Is The Right Direction. So for now, shadows stay 
outside the vldb and non-vldb processes are going to have to handle them.

Assuming there is some way to get shadow data into the vldb:

My seat-of-the-pants feeling is that an upward-compatible db is doable. Older 
clients and servers that don't understand shadows should work perfectly fine. 
They will use the older RPCs to communicate, and as such will never be 
presented with data about shadows. The older RPCs for vos move, copy, 
delete, etc., don't require any knowledge of shadow entries in the vldb. When 
those actions are requested using the existing RPCs, shadow-enabled vldb 
manipulation code needs to handle the relationships in whatever default way 
we define. As a bit of precedent, removing a replication site from the list 
currently doesn't cause the replica to be deleted. I can see (or rather, 
could live with) similar behavior for shadows.
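A minimal C sketch of the "old RPCs only see legacy data" idea: the shadow-aware server stores an extended entry, but the legacy lookup path copies out only the fields old clients already know about. Both structs and `legacy_lookup` are hypothetical; they are not the real vldb entry or VL RPC definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SHADOWS 4

/* Hypothetical shape of what the old RPCs return. */
struct legacy_entry {
    uint32_t rw_id;
    char     name[32];
};

/* Hypothetical shape of what a shadow-aware vlserver stores. */
struct ext_entry {
    struct legacy_entry base;          /* unchanged legacy fields */
    uint32_t n_shadows;
    uint32_t shadow_ids[MAX_SHADOWS];  /* shadow-only data */
};

/* Old-style lookup: return only the legacy fields, so clients that
 * predate shadows never see shadow data. */
static struct legacy_entry legacy_lookup(const struct ext_entry *e)
{
    assert(e != NULL);
    return e->base;
}
```

Keeping the legacy fields as an embedded, unchanged prefix is one way to make the "old callers get the old view" rule trivial to enforce; the hard part, as noted above, is the on-disk ubik format rather than this in-memory split.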

Newer clients and vldb replication should use the newer RPCs. For things like 
rename or delete, the newer client commands and RPCs should allow us to 
instruct the appropriate entity whether removal of a volume should or should not 
cause its shadow(s) to be removed, whether shadows are renamed as part of volume 
renames, etc.
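Here is a small C sketch of how such per-call instructions might look: a newer delete/rename RPC carries option bits, while the legacy RPC behaves as if none were set and therefore gets the default of leaving shadows alone, mirroring the replication-site precedent. The option names and `shadow_policy` are hypothetical, not existing OpenAFS flags or RPCs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical option bits a newer delete/rename RPC might carry. */
#define SHADOW_OPT_CASCADE_DELETE  (1u << 0)
#define SHADOW_OPT_CASCADE_RENAME  (1u << 1)

enum shadow_action { SHADOW_KEEP, SHADOW_REMOVE, SHADOW_RENAME };

/* Decide what happens to a volume's shadows when the primary is
 * deleted (is_delete != 0) or renamed (is_delete == 0). */
static enum shadow_action shadow_policy(int is_delete, uint32_t opts)
{
    if (is_delete)
        return (opts & SHADOW_OPT_CASCADE_DELETE) ? SHADOW_REMOVE : SHADOW_KEEP;
    return (opts & SHADOW_OPT_CASCADE_RENAME) ? SHADOW_RENAME : SHADOW_KEEP;
}

/* Legacy RPCs carry no options, so they always get the default. */
static enum shadow_action legacy_policy(int is_delete)
{
    return shadow_policy(is_delete, 0);
}
```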

But all that is *way* ahead of the game. For now, we've gone with the initial 
implementers' decision of keeping shadows out of the vldb.

Steve

_______________________________________________
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info
