I said a bunch of this on IRC also, but Adam has it: further operations within the 'expired' txn just fail. We recognise that and start a new one. In the _changes case, we'd send a last_seq row and end the request, but this isn't going to be a great answer (at least, not a backward-compatible answer) for _view, _all_docs and _find.
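The expire-and-resume pattern described above (reads in the old transaction fail, we start a new one and carry on from the last sequence already emitted) can be sketched with a toy model. Everything here is an illustrative stand-in, not the real fdb client API: the read budget plays the role of the 5-second window, and `TransactionTooOld` stands in for FDB's `transaction_too_old` error.

```python
TXN_LIMIT = 3  # toy stand-in for FoundationDB's 5-second transaction window


class TransactionTooOld(Exception):
    """Stand-in for FDB's transaction_too_old error."""


class ToyTransaction:
    """Each transaction serves at most TXN_LIMIT reads, then 'expires'."""

    def __init__(self, snapshot):
        self.snapshot = snapshot      # sorted list of (seq, doc_id)
        self.reads_left = TXN_LIMIT

    def read_next(self, after_seq):
        if self.reads_left == 0:
            raise TransactionTooOld()
        self.reads_left -= 1
        for seq, doc_id in self.snapshot:
            if seq > after_seq:
                return seq, doc_id
        return None                   # no rows past after_seq


def stream_changes(db_snapshot):
    """Stream the whole index, transparently starting a fresh transaction
    whenever the current one expires and resuming from the last sequence
    already emitted."""
    emitted = []
    last_seq = 0
    txn = ToyTransaction(db_snapshot)
    while True:
        try:
            row = txn.read_next(last_seq)
        except TransactionTooOld:
            txn = ToyTransaction(db_snapshot)  # new txn, new read version
            continue
        if row is None:
            break
        last_seq, doc_id = row
        emitted.append((last_seq, doc_id))
    return emitted


index = [(1, "foo"), (2, "bar"), (3, "baz"), (4, "bif"), (5, "bar")]
print(stream_changes(index))
# all five rows are emitted even though no single transaction survives
```

The caller never sees the expiry; it only matters that we track the last sequence emitted so the next transaction knows where to resume, which is exactly what a last_seq-style bookmark gives us.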
-- 
Robert Samuel Newson
rnew...@apache.org

On Thu, 7 Mar 2019, at 12:37, Adam Kocoloski wrote:
> Bah, our “cue”, not our “queue” ;)
> 
> Adam
> 
> > On Mar 7, 2019, at 7:35 AM, Adam Kocoloski <kocol...@apache.org> wrote:
> > 
> > Hi Garren,
> > 
> > In general we wouldn’t know ahead of time whether we can complete in five
> > seconds. I believe the way it works is that we start a transaction, issue a
> > bunch of reads, and after 5 seconds any additional reads will start to fail
> > with something like “read version too old”. That’s our queue to start a new
> > transaction. All the reads that completed successfully are fine, and the
> > CouchDB API layer can certainly choose to start streaming as soon as the
> > first read completes (~2ms after the beginning of the transaction).
> > 
> > Agree with Bob that steering towards a larger number of short-lived
> > operations is the way to go in general. But I also want to balance that
> > with backwards-compatibility where it makes sense.
> > 
> > Adam
> > 
> >> On Mar 7, 2019, at 7:22 AM, Garren Smith <gar...@apache.org> wrote:
> >> 
> >> I agree that option A seems the most sensible. I just want to understand
> >> this comment:
> >> 
> >>>> A _changes request that cannot be satisfied within the 5 second limit
> >>>> will be implemented as multiple FoundationDB transactions under the covers
> >> 
> >> How will we know if a changes request cannot be completed in 5 seconds?
> >> Can we tell that beforehand? Or would we try to complete a changes
> >> request, have the transaction fail after 5 seconds, and then do multiple
> >> transactions to get the full changes? If that is the case, the response
> >> from CouchDB to the user will be really slow, as they have already waited
> >> 5 seconds and have still not received anything. Or, if we start streaming
> >> a result back to the user in the first transaction (is this even
> >> possible?), then we would somehow need to know how to continue the
> >> changes feed after the transaction has failed.
> >> 
> >> Then Bob, from your comment:
> >> 
> >>>> Forcing clients to do short (<5s) requests feels like a general good, as
> >>>> long as meaningful things can be done in that time-frame, which I strongly
> >>>> believe from what we've said elsewhere that they can.
> >> 
> >> That makes sense, but how would we do that? How do you help a user to make
> >> sure their request is under 5 seconds?
> >> 
> >> Cheers
> >> Garren
> >> 
> >> On Thu, Mar 7, 2019 at 11:15 AM Robert Newson <rnew...@apache.org> wrote:
> >> 
> >>> Hi,
> >>> 
> >>> Given that option A is the behaviour of feed=continuous today (barring the
> >>> initial whole-snapshot phase to catch up to "now") I think that's the right
> >>> move. I confess to not reading your option B too deeply but I was there on
> >>> IRC when the first spark was lit. We can build some sort of temporary
> >>> multi-index on FDB today, that's clear, but it's equally clear that we
> >>> should avoid doing so if at all possible.
> >>> 
> >>> Perhaps the future Redwood storage engine for FDB will, as you say,
> >>> significantly improve on this, but, even if it does, I'm not 100% convinced
> >>> we should expose it. Forcing clients to do short (<5s) requests feels like
> >>> a general good, as long as meaningful things can be done in that
> >>> time-frame, which I strongly believe from what we've said elsewhere that
> >>> they can.
> >>> 
> >>> CouchDB's API, as we both know from rich (heh, and sometimes poor)
> >>> experience in production, has a lot of endpoints of wildly varying
> >>> performance characteristics. It's right that we evolve away from that where
> >>> possible, and this seems a great candidate given the replicator in ~all
> >>> versions of CouchDB will handle the change without blinking.
> >>> 
> >>> We have the same issue for _all_docs and _view and _find, in that the user
> >>> might ask for more data back than can be sent within a single FDB
> >>> transaction.
> >>> I suggest that's a new thread, though.
> >>> 
> >>> -- 
> >>> Robert Samuel Newson
> >>> rnew...@apache.org
> >>> 
> >>> On Thu, 7 Mar 2019, at 01:24, Adam Kocoloski wrote:
> >>>> Hi all, as the project devs are working through the design for the
> >>>> _changes feed in FoundationDB we’ve come across a limitation that is
> >>>> worth discussing with the broader user community. FoundationDB
> >>>> currently imposes a 5 second limit on all transactions, and read
> >>>> versions from old transactions are inaccessible after that window. This
> >>>> means that, unlike a single CouchDB storage shard, it is not possible
> >>>> to grab a long-lived snapshot of the entire database.
> >>>> 
> >>>> In extant versions of CouchDB we rely on this long-lived snapshot
> >>>> behavior for a number of operations, some of which are user-facing. For
> >>>> example, it is possible to make a request to the _changes feed for a
> >>>> database of an arbitrary size and, if you’ve got the storage space and
> >>>> time to spare, you can pull down a snapshot of the entire database in a
> >>>> single request. That snapshot will contain exactly one entry for each
> >>>> document in the database. In CouchDB 1.x the documents appear in the
> >>>> order in which they were most recently updated. In CouchDB 2.x there is
> >>>> no guaranteed ordering, although in practice the documents are roughly
> >>>> ordered by most recent edit. Note that you really do have to complete
> >>>> the operation in a single HTTP request; if you chunk up the requests or
> >>>> have to retry because the connection was severed, then the exactly-once
> >>>> guarantees disappear.
> >>>> 
> >>>> We have a couple of different options for how we can implement _changes
> >>>> with FoundationDB as a backing store. I’ll describe them below and
> >>>> discuss the tradeoffs.
> >>>> 
> >>>> ## Option A: Single Version Index, long-running operations as multiple
> >>>> transactions
> >>>> 
> >>>> In this option the internal index has exactly one entry for each
> >>>> document at all times. A _changes request that cannot be satisfied
> >>>> within the 5 second limit will be implemented as multiple FoundationDB
> >>>> transactions under the covers. These transactions will have different
> >>>> read versions, and a document that gets updated in between those read
> >>>> versions will show up *multiple times* in the response body. The entire
> >>>> feed will be totally ordered, and later occurrences of a particular
> >>>> document are guaranteed to represent more recent edits than the
> >>>> earlier occurrences. In effect, it’s rather like the semantics of a
> >>>> feed=continuous request today, but with much better ordering and zero
> >>>> possibility of “rewinds”, where large portions of the ID space get
> >>>> replayed because of issues in the cluster.
> >>>> 
> >>>> This option is very efficient internally and does not require any
> >>>> background maintenance. A future enhancement in FoundationDB’s storage
> >>>> engine is designed to enable longer-running read-only transactions, so
> >>>> we will likely be able to improve the semantics of this option
> >>>> over time.
> >>>> 
> >>>> ## Option B: Multi-Version Index
> >>>> 
> >>>> In this design the internal index can contain multiple entries for a
> >>>> given document. Each entry includes the sequence at which the document
> >>>> edit was made, and may also include a sequence at which it was
> >>>> overwritten by a more recent edit.
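The write path for such a multi-version index can be modeled with a toy in-memory sketch (the `update` helper and dict-based entries are illustrative, not CouchDB or FDB code): each edit tombstones the document's previous entry with the superseding sequence, then appends a fresh entry. Replaying six edits reproduces the six-entry index Adam's example below shows.

```python
def update(index, by_id, doc_id, new_seq):
    """Record an edit in a multi-version index: mark the document's
    previous entry (if any) as overwritten at new_seq, then append
    a fresh entry for the new edit."""
    prev = by_id.get(doc_id)
    if prev is not None:
        prev["tombstone"] = new_seq   # old entry superseded at new_seq
    entry = {"seq": new_seq, "id": doc_id}
    index.append(entry)
    by_id[doc_id] = entry             # track latest entry per doc id


index, by_id = [], {}
edits = [(1, "foo"), (2, "bar"), (3, "baz"),
         (4, "bif"), (5, "bar"), (6, "bif")]
for seq, doc_id in edits:
    update(index, by_id, doc_id, seq)

for entry in index:
    print(entry)
# "bar" and "bif" each appear twice; their older entries carry the
# tombstone sequence at which they were overwritten
```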
> >>>> 
> >>>> The implementation of a _changes request would start by getting the
> >>>> current version of the datastore (call this the read version), and
> >>>> then, as it examines entries in the index, it would skip over any
> >>>> entries with a “tombstone” sequence less than or equal to the read
> >>>> version. Crucially, if the request needs to be implemented across
> >>>> multiple transactions, each transaction would use the same read version
> >>>> when deciding whether to include entries in the index in the _changes
> >>>> response. The readers would know to stop when and if they encounter an
> >>>> entry whose created version is greater than the read version.
> >>>> Perhaps a diagram helps to clarify; a simplified version of the
> >>>> internal index might look like
> >>>> 
> >>>> {"seq": 1, "id": "foo"}
> >>>> {"seq": 2, "id": "bar", "tombstone": 5}
> >>>> {"seq": 3, "id": "baz"}
> >>>> {"seq": 4, "id": "bif", "tombstone": 6}
> >>>> {"seq": 5, "id": "bar"}
> >>>> {"seq": 6, "id": "bif"}
> >>>> 
> >>>> A _changes request which happens to commence when the database is at
> >>>> sequence 5 would return (ignoring the format of "seq" for simplicity)
> >>>> 
> >>>> {"seq": 1, "id": "foo"}
> >>>> {"seq": 3, "id": "baz"}
> >>>> {"seq": 4, "id": "bif"}
> >>>> {"seq": 5, "id": "bar"}
> >>>> 
> >>>> i.e., the first instance of “bar” would be skipped over because a more
> >>>> recent version exists within the time horizon, but the first instance
> >>>> of “bif” would be included because "seq": 6 is outside our horizon.
> >>>> 
> >>>> The downside of this approach is that someone has to go in and clean up
> >>>> tombstoned index entries eventually (or else provision lots and lots of
> >>>> storage space).
> >>>> One way we could do this (inside CouchDB) would be to
> >>>> have each _changes session record its read version somewhere, and then
> >>>> have a background process go in and remove tombstoned entries where the
> >>>> tombstone is less than the earliest read version of any active request.
> >>>> It’s doable, but definitely more load on the server.
> >>>> 
> >>>> Also, note this approach does not guarantee that the older versions of
> >>>> the documents referenced in those tombstoned entries are actually
> >>>> accessible. Much like today, the changes feed would include a revision
> >>>> identifier which, upon closer inspection, has been superseded by a more
> >>>> recent version of the document. Unlike today, that older version would
> >>>> be expunged from the database immediately if a descendant revision
> >>>> exists.
> >>>> 
> >>>> —
> >>>> 
> >>>> OK, so those are the two basic options. I’d particularly like to hear
> >>>> whether the behavior described in Option A would prove problematic for
> >>>> certain use cases, as it’s the simpler and more efficient of the two
> >>>> options. Thanks!
> >>>> 
> >>>> Adam
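For reference, the Option B read path from Adam's worked example can be sketched as a toy in-memory filter (illustrative code, not the actual implementation; note that an entry whose tombstone equals the read version must also be skipped for the example output to come out right). Filtering the six-entry index at read version 5 reproduces exactly the four rows shown in the mail.

```python
def changes(index, read_version):
    """Return the _changes rows visible at a fixed read version:
    skip entries superseded at or before the read version, and stop
    at the first entry created after it."""
    rows = []
    for entry in index:               # index is ordered by "seq"
        if entry["seq"] > read_version:
            break                     # created after our horizon: stop
        tombstone = entry.get("tombstone")
        if tombstone is not None and tombstone <= read_version:
            continue                  # superseded within our horizon: skip
        rows.append(entry)
    return rows


index = [
    {"seq": 1, "id": "foo"},
    {"seq": 2, "id": "bar", "tombstone": 5},
    {"seq": 3, "id": "baz"},
    {"seq": 4, "id": "bif", "tombstone": 6},
    {"seq": 5, "id": "bar"},
    {"seq": 6, "id": "bif"},
]
for row in changes(index, read_version=5):
    print(row)
# prints the entries for seq 1 (foo), 3 (baz), 4 (bif) and 5 (bar)
```

Because the filter depends only on the fixed read version, each follow-up transaction in a multi-transaction request makes identical include/skip decisions, which is what gives Option B its snapshot-like semantics.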