Re: [DISCUSS] : things we need to solve/decide : changes feed

2019-03-21 Thread Adam Kocoloski
FYI I moved this to an RFC at 
https://github.com/apache/couchdb-documentation/pull/401

Adam

> On Mar 18, 2019, at 10:47 PM, Adam Kocoloski  wrote:
> 
> 
>> On Mar 18, 2019, at 9:03 PM, Alex Miller  
>> wrote:
>> 
>> 
>>> On Mar 5, 2019, at 4:04 PM, Adam Kocoloski  wrote:
>>> With the incarnation and branch count in place we’d be looking at a design 
>>> where the KV pairs have the structure
>>> 
>>> (“changes”, Incarnation, Versionstamp) = (ValFormat, DocID, RevFormat, 
>>> RevPosition, RevHash, BranchCount)
>>> 
>>> where ValFormat is an enumeration enabling schema evolution of the value 
>>> format in the future, and RevFormat, RevPosition, RevHash are associated 
>>> with the winning edit branch for the document (not necessarily the edit 
>>> that occurred at this version, matching current CouchDB behavior) and carry 
>>> the meanings defined in the revision storage RFC[2].
>> 
>> 
>> 
>> Do note that with versionstamped keys, and atomic operations in general, 
>> it’s important to keep in mind that committing a transaction might return 
>> `commit_unknown_result`.  Transaction loops will retry a 
>> `commit_unknown_result` error by default (or will, if your Erlang/Elixir 
>> bindings copy the behavior of the rest of the bindings).  So you’ll need 
>> some way of making an insert into `changes` an idempotent operation.
>> 
>> 
>> I’ll volunteer three possible options:
>> 
>> 1. The easiest case is if you happen to be inserting a known, fixed key (and 
>> preferably one that contains a versionstamped value) in the same transaction 
>> as a versionstamped key, as then you have a key to check in your database to 
>> tell if your commit happened or not.
>> 
>> 2. If you’re doing an insert of just this key in a transaction, and your key 
>> space has relatively infrequent writes, then you might be able to get away 
>> with remembering the initial read version of your transaction, and issue a 
>> range scan from (“changes”, Incarnation, InitialReadVersion) -> (“changes”, 
>> infinity, infinity), and filter through looking for a value equal to what 
>> you tried to write.
>> 
>> 3. Accept that you might write duplicate values at different versionstamped 
>> keys, and write your client code such that it will skip repeated values that 
>> it has already seen.
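>> 
>> As a rough sketch of option 1 using the Python bindings (the
>> `changes_by_doc` fixed-key subspace and the value layouts below are
>> illustrative placeholders, not the actual CouchDB design):
>> 
>> import fdb
>> fdb.api_version(620)
>> db = fdb.open()
>> 
>> @fdb.transactional
>> def record_change(tr, incarnation, doc_id, value):
>>     # value: pre-packed bytes for the change row.
>>     # ("changes", Incarnation, Versionstamp): the versionstamp is filled in
>>     # at commit time, so every retry of this transaction writes a different
>>     # key -- which is exactly why a bare retry after commit_unknown_result
>>     # cannot tell whether the first attempt landed.
>>     tr.set_versionstamped_key(
>>         fdb.tuple.pack_with_versionstamp(
>>             ('changes', incarnation, fdb.tuple.Versionstamp())),
>>         value)
>>     # Option 1: also write a fixed, known key whose value carries the same
>>     # versionstamp.  The retry path (elided here) can read this key first
>>     # to learn whether the earlier attempt already committed.
>>     tr.set_versionstamped_value(
>>         fdb.tuple.pack(('changes_by_doc', doc_id)),
>>         fdb.tuple.pack_with_versionstamp((fdb.tuple.Versionstamp(),)))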
>> 
>> I had filed an internal bug long ago to complain about this before, which 
>> I’ve now copied over to GitHub[1].  So if this becomes absurdly difficult to 
>> work around, feel free to show up there to complain.
>> 
>> [1]: https://github.com/apple/foundationdb/issues/1321 
>> 
> 
> Hi Alex, thanks for that comment and for taking a close read. Option 1 could 
> almost work here; we will be inserting up to two keys in a “revisions” 
> subspace as part of the same transaction that we could read and that would 
> include both the RevHash and the Versionstamp. The latest design for that 
> subspace is here:
> 
> https://github.com/apache/couchdb-documentation/blob/5197cdffe1e2c08a7640dd646dd02909c0cf51ef/rfcs/001-fdb-revision-metadata-model.md
> 
> If I understand correctly, I think the edge case regarding 
> `commit_unknown_result` that we’re not adequately guarding against is the 
> following series of events:
> 
> 1) Txn A tries to commit an edit and gets `commit_unknown_result`; in 
> reality, the transaction failed
> 2) Txn B tries to commit an *identical* edit (save for the versionstamp) and 
> succeeds
> 3) Txn A retries and finds that the entry in “revisions” for this `RevHash` 
> exists and that the `Versionstamp` in “changes” for this DocID is higher than 
> the one initially attempted
> 
> In this scenario we should report an edit conflict failure back to the client 
> for Txn A, but the end result is indistinguishable from the case where 
> 
> 1) Txn A tries to commit an edit and gets `commit_unknown_result`; in 
> reality, the transaction *succeeds*
> 2) Txn B tries to edit a *different* branch of the document and succeeds 
> (thereby replacing Txn A’s entry in “changes”)
> 
> which is a scenario where we need to report success for both Txn A and Txn B.
> 
> We could close this loophole by storing the Versionstamp alongside the 
> RevHash for every edit in the “revisions” subspace, rather than only storing 
> the Versionstamp of the latest edit to the document. Not cheap though. Will 
> give it some thought. Thanks!
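> 
> For concreteness, a minimal sketch of that idea in the Python bindings (the
> key ordering and value layout here are only illustrative; the real formats
> are the ones defined in the revision metadata RFC linked above):
> 
> import fdb
> fdb.api_version(620)
> db = fdb.open()
> 
> @fdb.transactional
> def store_revision(tr, doc_id, rev_pos, rev_hash, meta):
>     # meta: any tuple-encodable value(s) for the rest of the revision entry.
>     # ("revisions", DocID, RevPosition, RevHash) -> (Versionstamp, meta)
>     # Recording the commit versionstamp on *every* edit's entry, not just
>     # the latest one, is the extra information the retry path would consult.
>     tr.set_versionstamped_value(
>         fdb.tuple.pack(('revisions', doc_id, rev_pos, rev_hash)),
>         fdb.tuple.pack_with_versionstamp((fdb.tuple.Versionstamp(), meta)))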
> 
> Adam
> 



Re: [DISCUSS] Implementing _all_docs on FoundationDB

2019-03-21 Thread Robert Newson
Hi,

Thanks for pushing forward, and I owe feedback on other threads you've started.

Rather feebly, I'm just agreeing with you. Option 3 for include_docs=false and 
Option 1 for include_docs=true sounds ideal. Both flavours are very common, so 
it makes sense to build a solution for each. At a pinch we can just do Option 3 
+ async doc lookups in a first release and then circle back, but the RFC should 
propose Options 1 and 3 as our design intention.

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Thu, 21 Mar 2019, at 19:50, Adam Kocoloski wrote:
> Hi all, me again. This one will be shorter :) As I see it we have three 
> different options for serving the _all_docs endpoint from FDB: 
> 
> ## Option 1: Read the document data, discard the bodies
> 
> We likely will have the documents stored in docid order already; we 
> could do range reads and discard everything but the ID and _rev by 
> default. This can be a very efficient implementation of 
> include_docs=true (though one needs to be careful about skipping the 
> conflict bodies), but pretty wasteful otherwise.
> 
> ## Option 2: Read the “revisions” subspace
> 
> We also have an entry for every document in ID order in the “revisions” 
> subspace. The disadvantage of this approach is that every deleted edit 
> branch shows up there, too, and some databases will have lots of 
> deleted documents. We may need to build skiplists to know how to scan 
> efficiently. This subspace is also doing a lot of heavy lifting for us 
> already, and if we wanted to toy with alternative revision history 
> representations in the future it could get complicated.
> 
> ## Option 3: Add specific entries to support _all_docs
> 
> We can also write an extra KV containing the ID and winning _rev in a 
> special subspace just to support this endpoint. It would be a blind 
> write because we’re already coordinating concurrent transactions 
> through reads on the “revisions” subspace. This would be conceptually 
> quite clean and simple, and the fastest implementation for constructing 
> the default response.
> 
> ===
> 
> My sense is Option 2 is a non-starter but I include it for completeness 
> in case anyone else thought of the same. I think Option 3 is a 
> reasonable space / efficiency / simplicity tradeoff, and it might also 
> be worth testing out Option 1 as an optimized implementation for 
> include_docs=true.
> 
> Thoughts? I imagine we can move quickly to an RFC for at least having 
> the extra KVs for Option 3, and in that design also acknowledge the 
> option for scanning the docs space directly to support include_docs.
> 
> Adam


[DISCUSS] Implementing _all_docs on FoundationDB

2019-03-21 Thread Adam Kocoloski
Hi all, me again. This one will be shorter :) As I see it we have three 
different options for serving the _all_docs endpoint from FDB: 

## Option 1: Read the document data, discard the bodies

We likely will have the documents stored in docid order already; we could do 
range reads and discard everything but the ID and _rev by default. This can be 
a very efficient implementation of include_docs=true (though one needs to be 
careful about skipping the conflict bodies), but pretty wasteful otherwise.
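
A rough sketch of that shape using the Python bindings (the ("docs", DocID,
FieldPath) layout is purely a placeholder, since document storage is defined
in its own RFC):

import fdb
fdb.api_version(620)
db = fdb.open()

@fdb.transactional
def all_docs_via_docs_subspace(tr):
    # Placeholder layout: ("docs", DocID, FieldPath) -> value, in DocID order.
    rows = []
    r = fdb.tuple.range(('docs',))
    for kv in tr.get_range(r.start, r.stop,
                           streaming_mode=fdb.StreamingMode.want_all):
        _, doc_id, field = fdb.tuple.unpack(kv.key)
        # Keep only the id and winning _rev; every other KV of the document
        # (body fields, conflict revisions, ...) is read and then discarded.
        if field == '_rev':
            rows.append({'id': doc_id, 'rev': kv.value.decode()})
    return rows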

## Option 2: Read the “revisions” subspace

We also have an entry for every document in ID order in the “revisions” 
subspace. The disadvantage of this approach is that every deleted edit branch 
shows up there, too, and some databases will have lots of deleted documents. We 
may need to build skiplists to know how to scan efficiently. This subspace is 
also doing a lot of heavy lifting for us already, and if we wanted to toy with 
alternative revision history representations in the future it could get 
complicated.

## Option 3: Add specific entries to support _all_docs

We can also write an extra KV containing the ID and winning _rev in a special 
subspace just to support this endpoint. It would be a blind write because we’re 
already coordinating concurrent transactions through reads on the “revisions” 
subspace. This would be conceptually quite clean and simple, and the fastest 
implementation for constructing the default response.
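
A minimal sketch of what those extra KVs might look like in the Python
bindings (the "all_docs" subspace name and the value format are placeholders):

import fdb
fdb.api_version(620)
db = fdb.open()

@fdb.transactional
def set_winner(tr, doc_id, winning_rev):
    # Blind write: no read of this subspace is needed, because concurrent
    # updates to the same document already conflict on their "revisions"
    # reads.  A document deletion would clear this key instead.
    tr[fdb.tuple.pack(('all_docs', doc_id))] = winning_rev.encode()

@fdb.transactional
def all_docs(tr):
    # The default response is a single range read in DocID order.
    r = fdb.tuple.range(('all_docs',))
    return [(fdb.tuple.unpack(kv.key)[1], kv.value.decode())
            for kv in tr.get_range(r.start, r.stop)]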

===

My sense is Option 2 is a non-starter but I include it for completeness in case 
anyone else thought of the same. I think Option 3 is a reasonable space / 
efficiency / simplicity tradeoff, and it might also be worth testing out Option 
1 as an optimized implementation for include_docs=true.

Thoughts? I imagine we can move quickly to an RFC for at least having the extra 
KVs for Option 3, and in that design also acknowledge the option for scanning 
the docs space directly to support include_docs.

Adam

Re: Shard Splitting API Proposal

2019-03-21 Thread Jan Lehnardt
Hi Nick,

At first glance, this all looks great and like an exemplary PR that is easy to 
follow. And bonus props for the nice docs. I'll have more time for a thorough 
review over the weekend.

Cheers
Jan
—

> On 18. Mar 2019, at 19:42, Nick Vatamaniuc  wrote:
> 
> Hello everyone,
> 
> Thank you all (Joan, Jan, Mike, Ilya, Adam) who contributed to the API
> discussion. There is now a PR open
> https://github.com/apache/couchdb/pull/1972 . If you get a chance, I would
> appreciate any reviews, feedback or comments.
> 
> The PR message explains how the commits are organized and references the
> RFC. Basically it starts with preparatory work, ensuring all the existing
> components know how to deal with split shards. Then, some lower level bits
> are implemented, like bulk copy, internal replicator updates, etc.,
> followed by the individual job implementation and the job manager which
> stitches everything together. At the end comes the HTTP API implementation,
> along with a suite of unit and Elixir integration tests.
> 
> There is also a README_reshard.md file in src/mem3 that tries to provide a
> more in-depth technical description of how everything fits together.
> https://github.com/apache/couchdb/pull/1972/files#diff-5ac7b51ec4e03e068bf271f34ecf88df
> (note that this URL might change after a rebase).
> 
> Also special thanks to Paul (job module implementation, get_ring function,
> a lot of architectural and implementation advice), Eric (finding many bugs,
> fixes for the bugs, and writing bulk copy and change feed tests), and Jay
> (testing and a thorough code review).
> 
> Cheers,
> -Nick
> 
>> On Sun, Feb 17, 2019 at 2:32 AM Jan Lehnardt  wrote:
>> 
>> Heya Nick,
>> 
>> Nicely done. I think even though the majority of the discussion had
>> already happened here, the RFC nicely pulled together the various
>> discussion threads into a coherent whole.
>> 
>> I would imagine the discussion on GH would be similarly fruitful.
>> 
>> I gave it my +1, and as I said at the outset: I'm very excited about this
>> feature!
>> 
>> Best
>> Jan
>> —
>> 
>>> On 15. Feb 2019, at 23:45, Nick Vatamaniuc  wrote:
>>> 
>>> Decided to kick the tires on the new RFC proposal issue type and created
>>> one for shard splitting:
>>> 
>>> https://github.com/apache/couchdb/issues/1920
>>> 
>>> Let's see how it goes. Since it's the first one, let me know if I missed
>>> anything obvious.
>>> 
>>> Also I'd like to thank everyone who contributed to the discussion. The API
>>> is looking more solid and is much improved from where it started.
>>> 
>>> Cheers,
>>> -Nick
>>> 
>>> 
>>> 
 On Wed, Feb 13, 2019 at 12:03 PM Nick Vatamaniuc  wrote:
 
 
 
> On Wed, Feb 13, 2019 at 11:52 AM Jan Lehnardt  wrote:
> 
> 
> 
>> On 13. Feb 2019, at 17:12, Nick Vatamaniuc 
>> wrote:
>> 
>> Hi Jan,
>> 
>> Thanks for taking a look!
>> 
>>> On Wed, Feb 13, 2019 at 6:28 AM Jan Lehnardt  wrote:
>>> 
>>> Nick, this is great, I have a few tiny nits left, apologies I only now got
>>> to it.
>>> 
 On 12. Feb 2019, at 18:08, Nick Vatamaniuc  wrote:
 
 Shard Splitting API Proposal
 
 I'd like to thank everyone who contributed to the API discussion. As a result
 we have a much better and more consistent API than what we started with.
 
 Before continuing I wanted to summarize to see what we ended up with. The
 main changes since the initial proposal were switching to using /_reshard
 as the main endpoint and having a detailed state transition history for
 jobs.
 
 * GET /_reshard
 
 Top level summary. Besides the new _reshard endpoint, there is a `reason`
 field and the stats are more detailed.
 
 Returns
 
 {
 "completed": 3,
 "failed": 4,
 "running": 0,
 "state": "stopped",
 "state_reason": "Manual rebalancing",
 "stopped": 0,
 "total": 7
 }
 
 * PUT /_reshard/state
 
 Start or stop global rebalancing.
 
 Body
 
 {
 "state": "stopped",
 "reason": "Manual rebalancing"
 }
 
 Returns
 
 {
 "ok": true
 }
 
 * GET /_reshard/state
 
 Return global resharding state and reason.
 
 {
 "reason": "Manual rebalancing",
 "state": “stopped”
 }
>>> 
>>> More a note than a change request, but `state` is a very generic term that
>>> often confuses folks when they are new to something. If the set of possible
>>> states is `started` and `stopped`, how about making this endpoint a
>>> boolean?
>>> 
>>> /_reshard/enabled
>>> 
>>> {
>>> "enabled": true|false,
>>> "reason": "Manual rebalancing"
>>>