> On Oct 19, 2014, at 2:22 PM, Jan Lehnardt <[email protected]> wrote:
>
>> On 19 Oct 2014, at 20:15 , Brian Mitchell <[email protected]> wrote:
>>
>>> On Oct 19, 2014, at 1:49 PM, Jan Lehnardt <[email protected]> wrote:
>>>
>>>> On 18 Oct 2014, at 01:17 , Jens Alfke <[email protected]> wrote:
>>>>
>>>>> On Oct 17, 2014, at 2:22 PM, Brian Mitchell <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Giving revs meaning outside of this scope is likely to bring up more meta
>>>>> discussion about the CouchDB data model and a long history of
>>>>> undocumented choices which only manifest in the particular
>>>>> implementation we have today.
>>>>
>>>> That does appear to be a danger. I'm not interested in bike-shedding; if
>>>> the Apache CouchDB community can't make progress on this issue then we can
>>>> discuss it elsewhere to come up with solutions. I can't speak for Chris,
>>>> but I'm here as a courtesy and because I believe interoperability is
>>>> important. But I believe making progress is more important.
>>>
>>> +1000. I think so far we’ve had a brief chatter about this and we are ready
>>> to move on.
>>>
>>> How does moving this to a strawperson proposal sound? E.g. have a ticket,
>>> or pad, or gist somewhere where we can hammer out the details of this and
>>> what the various trade-offs of open decisions are?
>>>
>>> JIRA obviously preferred, but happy to start this elsewhere if it provides
>>> less friction.
>>
>> My primary point is that interoperation does *not* require the rev hashes be
>> done the same. Clustering does but I can’t see why we’d encourage people to
>> write the same thing to two slightly different systems simultaneously. Doing
>> that, I can guarantee that rev problems will not be the only thing to fix.
>>
>> If we want to define rev interoperation in terms of the minimal and the
>> stronger case, that might work just fine but defining interoperation as the
>> latter is excludes a variety of strategies that implementations can have and
>> will likely mean different versions of CouchDB don’t “interoperate” under
>> this very definition, which is simply not a useful way to describe the
>> situation.
>
> I can’t parse this, can you rephrase? :)

I’m basically saying that they don’t need to be generated the same way to be defined as interoperable. There are a few invariants required and a specific digest algorithm isn’t one of them. Creating a bogus rev 1-abcfoobaz using new_edits=false shows exactly how this works. The foundation for interoperation should only assume some definition of “match”, by which I mean, intuitively, that 1-abcfoobaz = 1-abcfoobaz, 2-abcfoobaz /= 1-abcfoobaz, and 1-xyz /= 1-abc.

The need for a stronger set of rules is specific to how the implementation is *intended* to be used. In an eventually consistent cluster, it’s quite useful to have idempotent writes, whether to repair via replication or to duplicate writes to redundant nodes which replicate between one another. I don’t see a problem with defining rules to make this work well but it’s a very specific and demanding kind of interoperability.

Of course, matching revs are not going to solve cluster coherence between implementations on their own. For example, the abstraction still leaks in the multi-node replication case if there is replication lag (quite easily achieved, at least with how things work now). One can’t simply write to two places and hope that the “idempotent operation” works. It’s a huge assumption about what was written prior to that and it relies on minimal knowledge being replication. It’s just a bad practice to assume that two distributed systems will always have the same view of things in relation to a third client. Clustering modes go through quite a bit of work to make it usable but it’s certainly far from automatic and not something that I’d put on the table for the definition of general interoperation. [1]

Thus a middle ground might be allowing two levels of interoperation to be defined. I still don’t see the value in focusing on this specific case. It’s my opinion that if there is something that breaks between vendors because of this, there are likely other assumptions to visit far before this one.
I could be wrong as I don’t know what others are planning on doing.

>> Finally, if we really want to define a stable digest, I’d suggest that a
>> reference implementation be created and proposed rather than forced upon the
>> implementations before it materializes. This could possibly be made an
>> option in the CouchDB configuration or build allowing it to be an
>> experimental feature.
>
> Hence my strawperson proposal that we can work on. I envision all
> implementors getting a say in what works for them and what doesn’t and that
> we find a consensus and a solution that we can roll this out harmlessly.

I agree, but there seems to be a dismissal of the idea that we don’t need this, rather than it really being a matter of just finding the right implementation that fits every use case. [2]

Brian.

[1]: I also alluded to the 409 issue in another email which shows the growing problem of how the old revision system isn’t well designed for anything but single node systems. I’d vote to remove this in 3.x since conflicts on write mean nothing in an eventually consistent system and the 409 actually makes it harder to test code in this case. It’s just trivial to poke holes in the setup and I don’t see how revs can possibly be the wall people actually hit.

[2]: I think there is a greater need for revision control that applications can leverage more significantly. There’s a long history of, rightly, discouraging people from using the MVCC implementation for application concerns, but that’s a limitation of the API, not of the idea. I could easily see revs being a richer entity in some systems, which makes this whole digest thing seem so specific and low level that we’re really just locking ourselves in rather than opening the protocol up. It depends on where one might want to go, I guess.
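
P.S. A minimal sketch of the “match” invariant described in the reply above, in Python. It assumes a rev is a string of the form N-token and treats the token as fully opaque, so nothing here depends on how the token was generated. The names Rev, parse_rev and revs_match are hypothetical and only for illustration; they are not part of CouchDB or any client library.

```python
from typing import NamedTuple


class Rev(NamedTuple):
    generation: int  # the numeric prefix before the first "-"
    token: str       # opaque suffix: a digest, a UUID, or anything else


def parse_rev(rev: str) -> Rev:
    """Split "N-token" into its parts without interpreting the token."""
    generation, sep, token = rev.partition("-")
    if not sep or not generation.isdigit() or not token:
        raise ValueError(f"malformed rev: {rev!r}")
    return Rev(int(generation), token)


def revs_match(a: str, b: str) -> bool:
    """Two revs match only if both generation and token are equal."""
    return parse_rev(a) == parse_rev(b)


# The three cases from the email, checked under the assumptions above.
assert revs_match("1-abcfoobaz", "1-abcfoobaz")
assert not revs_match("2-abcfoobaz", "1-abcfoobaz")
assert not revs_match("1-xyz", "1-abc")
```

Under this weaker definition, two implementations interoperate as long as they agree on this comparison. The stronger, clustering-oriented case would additionally require that both produce the same token for the same edit, which this sketch deliberately leaves out.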
