Re: [DISCUSS] _db_updates feed in FoundationDB

2019-03-27 Thread Ilya Khlopotov



> I don’t understand why you want to atomically append to an array here instead 
> of using a separate 
> (DbName, Versionstamp) KV each time. What’s the advantage? Both structures 
> require periodic 
> cleanup. I also don’t understand why you need this DbName -> Versionstamp 
> mapping at all. Is there a reason to do some per-database cleanup on the 
> contents of this global feed?
The idea is to amortize the cleanup/de-duplication cost. We can trigger cleanup 
from write transactions chosen by random sampling. However, we want to 
constrain cleanup to the events of a single database to avoid coordination 
between multiple de-duplication processes, so we need to maintain a history of 
updates per database (since we use the versionstamp as a key). In this context 
(DbName, Versionstamp) is a very good idea, because it would allow us to use 
standard range operations instead of messing with IBLTs.

# Summary

- Every update transaction would write the following:
   Sequence = (DbName, EventType)
   (DbName, Sequence) = True
- When the update transaction finishes, we would generate a random number to 
decide whether we need to trigger de-duplication
- If we need to trigger de-duplication, we would spawn a new process and pass 
the name of the database to it
- In that process we would do the following:
   maxSequence = 0
   for _, Sequence in range(DbName, *):
     - remove older entries in the "Sequence -> (DbName, EventType)" mapping
     - materialize the results in a more consumable form
     - maxSequence = max(maxSequence, Sequence)
   - issue a clear-range request for range((DbName, *), last_less_than((DbName, 
maxSequence)))
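The steps above can be sketched in pure Python. The dict stands in for an FDB subspace and the integer counter for the versionstamp; in real FDB the writes would use versionstamped keys and the cleanup a `clear_range`, all inside transactions. All names here are illustrative, not from any actual API.

```python
import random

store = {}    # stand-in for the FDB subspace; keys are tuples
next_seq = 0  # stand-in for the versionstamp

def update_db(db_name, event_type, dedup_probability=0.01):
    """Record one update event; randomly trigger per-database de-duplication."""
    global next_seq
    next_seq += 1
    store[("seq", next_seq)] = (db_name, event_type)  # Sequence = (DbName, EventType)
    store[("db", db_name, next_seq)] = True           # (DbName, Sequence) = True
    if random.random() < dedup_probability:
        deduplicate(db_name)

def deduplicate(db_name):
    """Drop every event for db_name except the newest one, in both mappings."""
    seqs = sorted(k[2] for k in list(store) if k[:2] == ("db", db_name))
    for s in seqs[:-1]:  # keep only maxSequence
        del store[("db", db_name, s)]
        del store[("seq", s)]
```

Because each de-duplication pass touches only one database's keys, concurrent passes for different databases never conflict.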

Because the de-duplication operation is scoped to a single database and 
triggered by random sampling, we would be able to clean up frequently updated 
databases at a different rate than less frequently updated ones. 

I hope this makes sense.

On 2019/03/27 20:33:16, Adam Kocoloski  wrote: 
>  Hi Ilya,
> 
> I agree it would be quite nice if there was a way to implement this feature 
> without a background worker — while also avoiding write contention for 
> transactions that would otherwise not conflict with one another. I’m not sure 
> it’s possible.
> 
> I have a few comments:
> 
> > We could maintain database level Sequence number and store global changes 
> > feed in the following form:
> >   UpdateSequence = (DbName, EventType, PreviousUpdateSequence)
> 
> Tracking a database-wide “latest Sequence” in a single KV would mean we can’t 
> execute any transactions on that database in parallel, so that’s yet another 
> reason why that strawman approach cannot work.
> 
> > In this case we could store the data we need as follows (under separate 
> > subspace TBD).
> > VersionStamp = (DbName, EventType)
> > DbName = [versionstamps]
> 
> I don’t understand why you want to atomically append to an array here instead 
> of using a separate (DbName, Versionstamp) KV each time. What’s the 
> advantage? Both structures require periodic cleanup. I also don’t understand 
> why you need this DbName -> Versionstamp mapping at all. Is there a reason to 
> do some per-database cleanup on the contents of this global feed?
> 
> Cheers, Adam
> 
> 
> > On Mar 27, 2019, at 2:07 PM, Ilya Khlopotov  wrote:
> > 
> > Hi, 
> > 
> > Both proposals are fine but need a consumer process, which is a tricky 
> > requirement because it will lead to problems when the queue grows 
> > faster than we can consume it. This realization got me thinking about 
> > finding possible ways to eliminate the need for a consumer.
> > 
> > I wouldn't spell out the final solution right away since I want to 
> > demonstrate the thinking process so others could build better proposals on 
> > top of it. 
> > 
> > Essentially, we need to de-duplicate events. In order to do that we need to 
> > know when a given database was last updated. We could maintain a database 
> > level Sequence number and store the global changes feed in the following form:
> >   UpdateSequence = (DbName, EventType, PreviousUpdateSequence)
> > 
> > Then every 10th (or 100th or 1000th) transaction can trigger a compaction 
> > process for the updated database. It would use PreviousUpdateSequence to get 
> > a pointer to its parent, read the pointer to the grandparent, clean up the 
> > parent, and so forth until there is nothing left to clean up.
> > 
> > This is a terrible idea for the following reasons:
> > - Including UpdateSequence is expensive since we would need to add one more 
> > read to every update transaction
> > - recursion to do cleanup is expensive and most likely would need to be 
> > done in multiple transactions
> > 
> > What if FDB supported a list type for a value and had an atomic 
> > operation to add a value to the list if it is missing? In this case we 
> > could store the data we need as follows (under a separate subspace TBD).
> > VersionStamp = (DbName, EventType)
> > DbName = [versionstamps]
> > 
> > In this case in order to de-duplicate 

Re: [DISCUSS] _db_updates feed in FoundationDB

2019-03-27 Thread Adam Kocoloski
 Hi Ilya,

I agree it would be quite nice if there was a way to implement this feature 
without a background worker — while also avoiding write contention for 
transactions that would otherwise not conflict with one another. I’m not sure 
it’s possible.

I have a few comments:

> We could maintain database level Sequence number and store global changes 
> feed in the following form:
>   UpdateSequence = (DbName, EventType, PreviousUpdateSequence)

Tracking a database-wide “latest Sequence” in a single KV would mean we can’t 
execute any transactions on that database in parallel, so that’s yet another 
reason why that strawman approach cannot work.

> In this case we could store the data we need as follows (under separate 
> subspace TBD).
> VersionStamp = (DbName, EventType)
> DbName = [versionstamps]

I don’t understand why you want to atomically append to an array here instead 
of using a separate (DbName, Versionstamp) KV each time. What’s the advantage? 
Both structures require periodic cleanup. I also don’t understand why you need 
this DbName -> Versionstamp mapping at all. Is there a reason to do some 
per-database cleanup on the contents of this global feed?

Cheers, Adam


> On Mar 27, 2019, at 2:07 PM, Ilya Khlopotov  wrote:
> 
> Hi, 
> 
> Both proposals are fine but need a consumer process, which is a tricky 
> requirement because it will lead to problems when the queue grows faster 
> than we can consume it. This realization got me thinking about finding 
> possible ways to eliminate the need for a consumer.
> 
> I wouldn't spell out the final solution right away since I want to 
> demonstrate the thinking process so others could build better proposals on 
> top of it. 
> 
> Essentially, we need to de-duplicate events. In order to do that we need to 
> know when a given database was last updated. We could maintain a database 
> level Sequence number and store the global changes feed in the following form:
>   UpdateSequence = (DbName, EventType, PreviousUpdateSequence)
> 
> Then every 10th (or 100th or 1000th) transaction can trigger a compaction 
> process for the updated database. It would use PreviousUpdateSequence to get 
> a pointer to its parent, read the pointer to the grandparent, clean up the 
> parent, and so forth until there is nothing left to clean up.
> 
> This is a terrible idea for the following reasons:
> - Including UpdateSequence is expensive since we would need to add one more 
> read to every update transaction
> - recursion to do cleanup is expensive and most likely would need to be done 
> in multiple transactions
> 
> What if FDB supported a list type for a value and had an atomic 
> operation to add a value to the list if it is missing? In this case we 
> could store the data we need as follows (under a separate subspace TBD).
> VersionStamp = (DbName, EventType)
> DbName = [versionstamps]
> 
> In this case, in order to de-duplicate events, we would do the following:
> - every once in a while (on every 10th, 100th, or 1000th update transaction 
> to a specific database, chosen via a PRNG) we would execute the compaction 
> algorithm 
> - read the list of versionstamps for older updates and issue remove 
> operations for every versionstamp except the biggest one 
> - update the history value to include only the biggest versionstamp
> 
> The question is how we would implement atomic addition of a value to a list. 
> There is an IBLT data structure (https://arxiv.org/pdf/1101.2245.pdf) which 
> can help us achieve that. An IBLT consists of multiple cells, where every 
> cell has the following fields:
> - count
> - keySum
> - valueSum
> - hashkeySum
> 
> The beauty of this structure is that all fields are updated using blind 
> addition operations (which FDB provides as atomic addition), while still 
> supporting enumeration of all key-values stored in the structure (with 
> configurable probability).
> 
> For our specific case it doesn't look like we need valueSum (because we only 
> need keys) and hashkeySum (because we wouldn't have duplicates), so we can 
> simplify the structure.
> 
> Best regards,
> iilyak
> 
> 
> On 2019/03/20 22:47:42, Adam Kocoloski  wrote: 
>> Hi all,
>> 
>> Most of the discussions so far have focused on the core features that are 
>> fundamental to CouchDB: JSON documents, revision tracking, _changes. I 
>> thought I’d start a thread on something a bit different: the _db_updates 
>> feed.
>> 
>> The _db_updates feed is an API that enables users to discover database 
>> lifecycle events across an entire CouchDB instance. It’s primarily useful in 
>> deployments that have lots and lots of databases, where it’s impractical to 
>> keep connections open for every database, and where database creations and 
>> deletions may be an automated aspect of the application’s use of CouchDB.
>> 
>> There are really two topics for discussion here. The first is: do we need to 
>> keep it? The primary driver of applications creating lots of DBs is the 
>> per-DB granularity 

Re: [DISCUSS] _db_updates feed in FoundationDB

2019-03-27 Thread Ilya Khlopotov
Hi, 

Both proposals are fine but need a consumer process, which is a tricky 
requirement because it will lead to problems when the queue grows faster than 
we can consume it. This realization got me thinking about finding possible 
ways to eliminate the need for a consumer.

I wouldn't spell out the final solution right away since I want to demonstrate 
the thinking process so others could build better proposals on top of it. 

Essentially, we need to de-duplicate events. In order to do that we need to 
know when a given database was last updated. We could maintain a database 
level Sequence number and store the global changes feed in the following form:
   UpdateSequence = (DbName, EventType, PreviousUpdateSequence)

Then every 10th (or 100th or 1000th) transaction can trigger a compaction 
process for the updated database. It would use PreviousUpdateSequence to get a 
pointer to its parent, read the pointer to the grandparent, clean up the 
parent, and so forth until there is nothing left to clean up.

This is a terrible idea for the following reasons:
- Including UpdateSequence is expensive since we would need to add one more 
read to every update transaction
- recursion to do cleanup is expensive and most likely would need to be done in 
multiple transactions

What if FDB supported a list type for a value and had an atomic operation to 
add a value to the list if it is missing? In this case we could store the data 
we need as follows (under a separate subspace TBD).
VersionStamp = (DbName, EventType)
DbName = [versionstamps]

In this case, in order to de-duplicate events, we would do the following:
- every once in a while (on every 10th, 100th, or 1000th update transaction to 
a specific database, chosen via a PRNG) we would execute the compaction 
algorithm 
- read the list of versionstamps for older updates and issue remove operations 
for every versionstamp except the biggest one 
- update the history value to include only the biggest versionstamp
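The compaction steps above can be sketched in pure Python. Here `feed` stands in for the `VersionStamp = (DbName, EventType)` subspace and `history` for the hypothetical `DbName -> [versionstamps]` value, with integers standing in for versionstamps; all names are illustrative.

```python
feed = {}     # versionstamp -> (db_name, event_type)
history = {}  # db_name -> list of versionstamps seen so far

def record(versionstamp, db_name, event_type):
    """Write one event to the global feed and append to the db's history."""
    feed[versionstamp] = (db_name, event_type)
    history.setdefault(db_name, []).append(versionstamp)

def compact(db_name):
    """Remove every feed entry for db_name except the one with the biggest
    versionstamp, then shrink the history to that single versionstamp."""
    stamps = history.get(db_name, [])
    if not stamps:
        return
    newest = max(stamps)
    for vs in stamps:
        if vs != newest:
            del feed[vs]
    history[db_name] = [newest]
```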

The question is how we would implement atomic addition of a value to a list. 
There is an IBLT data structure (https://arxiv.org/pdf/1101.2245.pdf) which can 
help us achieve that. An IBLT consists of multiple cells, where every cell 
has the following fields:
- count
- keySum
- valueSum
- hashkeySum

The beauty of this structure is that all fields are updated using blind 
addition operations (which FDB provides as atomic addition), while still 
supporting enumeration of all key-values stored in the structure (with 
configurable probability).

For our specific case it doesn't look like we need valueSum (because we only 
need keys) and hashkeySum (because we wouldn't have duplicates), so we can 
simplify the structure.
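A pure-Python sketch of that simplified structure, with each cell keeping only a count and a keySum, both maintained by blind additions (in FDB these could be atomic ADD mutations). Listing keys peels cells whose count is 1; with insert-only workloads and no duplicate keys such cells are pure, so peeling succeeds with high probability. This is an illustrative sketch of the idea, not the paper's full construction; integer keys are assumed.

```python
import hashlib

class SimpleIBLT:
    def __init__(self, m=64, k=3):
        self.m, self.k = m, k
        self.count = [0] * m
        self.key_sum = [0] * m

    def _cells(self, key):
        # k hash functions derived from sha256; deterministic per key
        return [int(hashlib.sha256(b"%d:%d" % (i, key)).hexdigest(), 16) % self.m
                for i in range(self.k)]

    def insert(self, key):
        for c in self._cells(key):
            self.count[c] += 1
            self.key_sum[c] += key

    def remove(self, key):
        for c in self._cells(key):
            self.count[c] -= 1
            self.key_sum[c] -= key

    def list_keys(self):
        """Enumerate keys by repeatedly peeling pure (count == 1) cells."""
        count, key_sum = self.count[:], self.key_sum[:]
        found, progress = [], True
        while progress:
            progress = False
            for c in range(self.m):
                if count[c] == 1:  # a pure cell holds exactly one key
                    key = key_sum[c]
                    found.append(key)
                    for cell in self._cells(key):
                        count[cell] -= 1
                        key_sum[cell] -= key
                    progress = True
        return sorted(found)
```

Note that both `insert` and `remove` only ever add signed values to cells, which is what makes them candidates for conflict-free atomic mutations.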

Best regards,
iilyak
   

On 2019/03/20 22:47:42, Adam Kocoloski  wrote: 
> Hi all,
> 
> Most of the discussions so far have focused on the core features that are 
> fundamental to CouchDB: JSON documents, revision tracking, _changes. I 
> thought I’d start a thread on something a bit different: the _db_updates feed.
> 
> The _db_updates feed is an API that enables users to discover database 
> lifecycle events across an entire CouchDB instance. It’s primarily useful in 
> deployments that have lots and lots of databases, where it’s impractical to 
> keep connections open for every database, and where database creations and 
> deletions may be an automated aspect of the application’s use of CouchDB.
> 
> There are really two topics for discussion here. The first is: do we need to 
> keep it? The primary driver of applications creating lots of DBs is the 
> per-DB granularity of access controls; if we go down the route of 
> implementing the document-level _access proposal perhaps users naturally 
> migrate away from this DB-per-user data model. I’d be curious to hear points 
> of view there.
> 
> I’ll assume for now that we do want to keep it, and offer some thoughts on 
> how to implement it. The main challenge with _db_updates is managing the 
> write contention; in write-heavy databases you have a lot of producers trying 
> to tag that particular database as “updated", but all the consumer really 
> cares about is getting a single “dbname”:”updated” event as needed. In the 
> current architecture we try to dedupe a lot of the events in-memory before 
> updating a regular CouchDB database with this information, but this leaves us 
> exposed to possibly dropping events within a few second window.
> 
> ## Option 1: Queue + Compaction
> 
> One way to tackle this in FoundationDB is to have an intermediate subspace 
> reserved as a queue. Each transaction that modifies a database would insert a 
> versionstamped KV into the queue like
> 
> Versionstamp = (DbName, EventType)
> 
> Versionstamps are monotonically increasing and inserting versionstamped keys 
> is a conflict-free operation. We’d have a consumer of this queue which is 
> responsible for “log compaction”; i.e., the consumer would do range reads on 
> the queue subspace, toss out duplicate 
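Although the quoted message is truncated here, the log-compaction consumer it describes can be sketched in pure Python. The list of pairs stands in for a versionstamp-ordered range read of the queue subspace; names are illustrative only.

```python
def compact_queue(queue):
    """Keep only the newest event per database from a versionstamp-ordered
    range read, here a list of (versionstamp, (db_name, event_type)) pairs."""
    latest = {}
    for versionstamp, (db_name, event_type) in queue:
        latest[db_name] = (versionstamp, event_type)  # later entries win
    return latest
```

A real consumer would then clear the processed key range and materialize `latest` into the _db_updates subspace.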

Re: Prototype CouchDB Layer for FoundationDB

2019-03-27 Thread Nick Vatamaniuc
Looking over the code, it seems very simple and clean. Without knowing much
of the internals or following the discussion too closely I think I was able
to read and understand most of it.

I like the split between the db and fdb layers. Hopefully it means that if we
start from this we can do parallel work implementing on top of the db layer and
below it at the same time.

The use of maps is nice and seems to simplify things quite a bit.

Don't have much to add about metadata and other issues, will let others who
know more chime in. It seems a bit similar to how we had the
instance_start_time at one point or how we add the suffix to db shards.

Great work!
-Nick

On Wed, Mar 27, 2019 at 12:53 PM Paul Davis 
wrote:

> Hey everyone!
>
> I've gotten enough of a FoundationDB layer prototype implemented [1]
> to start sharing publicly. This is emphatically nowhere near useful
> to non-CouchDB-developers. The motivation for this work was to try and
> get enough of a basic prototype written so that we can all start
> fleshing out the various RFCs with actual implementations to compare
> and contrast and so on.
>
> To be clear, I've made a lot of intentionally "bad" choices while
> writing this to both limit the scope of what I was trying to
> accomplish and also to make super clear that I don't expect any of
> this code to be "final" in any way whatsoever. This work is purely so
> that everyone has an initial code base that can be "turned on" so to
> speak. To that end, here's a non-exhaustive list of some of the
> silliness I've done:
>
>   1. All document bodies must fit inside a single value
>   2. All requests must fit within the single fdb transaction limits
>   3. I'm using binary_to_term for things like the revision tree
>   4. The revision tree has to fit in a single value
>   5. There's basically 0 supported query string parameters at this point
>   6. Nothing outside super basic db/doc ops is implemented (i.e., no views)
>
> However, what it does do is start! And it talks to FoundationDB! So at
> least that bit seems to be reliable (only tested on OS X via
> FoundationDB binary installers so super duper caveat on that
> "reliable").
>
> There's a small test script [2] that shows what it's currently capable
> of. A quick glance at that should give a pretty good idea of how
> little is actually implemented in this prototype. There's also a list
> of notes I've been keeping as I've been hacking on this that also
> tries to gather a bunch of questions that'll need to be answered [3]
> as we continue to work on this.
>
> To that end, I have learned a couple lessons from working with
> FoundationDB from this work that I'd like to share. First is that
> while we can cache a bunch of stuff, we have to be able to ensure that
> the cache is invalidated properly when various bits of metadata
> change. There's a feature on FoundationDB master [4] for this specific
> issue. I've faked the same behavior using an arbitrary key, but I think the
> `fabric2_fdb:is_current/1` function is a good implementation
> of this done correctly.
>
> Secondly, I spent a lot of time trying to figure out how to use
> FoundationDB's Directory and Subspace layers inside the CouchDB layer.
> After barking up that tree for a long time I've basically decided that
> the best answer is probably "don't". I do open a single directory at
> the root, but that's merely in order to play nice with any other
> layers that use the directory layer. Inside the "CouchDB directory"
> it's all strictly direct Tuple Layer code.
>
> The Subspace Layer seems to be basically useless in Erlang. First, it's
> a very thin wrapper over the Tuple Layer that basically just holds
> onto a prefix that's prepended onto the tuple layer operations. In
> other languages the Subspace Layer has a lot of syntactic sugar that
> makes it useful. Erlang doesn't support any of that, so it ends up
> being more of a burden to use it than to just use the Tuple
> Layer directly. Dropping the use of directories and subspaces has
> greatly simplified the implementation thus far.
>
> In terms of code layout, nearly all of the new implementation is in
> `src/fabric/src/fabric2*` modules. There's also a few changes to
> chttpd obviously to call the new code as well as commenting out parts
> of features so I didn't have to follow all the various call stacks
> updating huge swathes of semi-unrelated code.
>
> I'd be super interested to hear feedback and see people start running
> with this in whatever direction catches their fancy. Hopefully this
> proves useful for people to start writing implementations of the
> various RFCs so we can make progress on those fronts.
>
> [1] https://github.com/apache/couchdb/compare/prototype/fdb-layer
> [2] https://github.com/apache/couchdb/blob/prototype/fdb-layer/fdb-test.py
> [3]
> https://github.com/apache/couchdb/blob/prototype/fdb-layer/FDB_NOTES.md
> [4]
> https://forums.foundationdb.org/t/a-new-tool-for-managing-layer-metadata/1191
>


Prototype CouchDB Layer for FoundationDB

2019-03-27 Thread Paul Davis
Hey everyone!

I've gotten enough of a FoundationDB layer prototype implemented [1]
to start sharing publicly. This is emphatically nowhere near useful
to non-CouchDB-developers. The motivation for this work was to try and
get enough of a basic prototype written so that we can all start
fleshing out the various RFCs with actual implementations to compare
and contrast and so on.

To be clear, I've made a lot of intentionally "bad" choices while
writing this to both limit the scope of what I was trying to
accomplish and also to make super clear that I don't expect any of
this code to be "final" in any way whatsoever. This work is purely so
that everyone has an initial code base that can be "turned on" so to
speak. To that end, here's a non-exhaustive list of some of the
silliness I've done:

  1. All document bodies must fit inside a single value
  2. All requests must fit within the single fdb transaction limits
  3. I'm using binary_to_term for things like the revision tree
  4. The revision tree has to fit in a single value
  5. There's basically 0 supported query string parameters at this point
  6. Nothing outside super basic db/doc ops is implemented (i.e., no views)

However, what it does do is start! And it talks to FoundationDB! So at
least that bit seems to be reliable (only tested on OS X via
FoundationDB binary installers so super duper caveat on that
"reliable").

There's a small test script [2] that shows what it's currently capable
of. A quick glance at that should give a pretty good idea of how
little is actually implemented in this prototype. There's also a list
of notes I've been keeping as I've been hacking on this that also
tries to gather a bunch of questions that'll need to be answered [3]
as we continue to work on this.

I have learned a couple of lessons from working with
FoundationDB on this that I'd like to share. First is that
while we can cache a bunch of stuff, we have to be able to ensure that
the cache is invalidated properly when various bits of metadata
change. There's a feature on FoundationDB master [4] for this specific
issue. I've faked the same behavior using an arbitrary key, but I think the
`fabric2_fdb:is_current/1` function is a good implementation
of this done correctly.

Secondly, I spent a lot of time trying to figure out how to use
FoundationDB's Directory and Subspace layers inside the CouchDB layer.
After barking up that tree for a long time I've basically decided that
the best answer is probably "don't". I do open a single directory at
the root, but that's merely in order to play nice with any other
layers that use the directory layer. Inside the "CouchDB directory"
it's all strictly direct Tuple Layer code.

The Subspace Layer seems to be basically useless in Erlang. First, it's
a very thin wrapper over the Tuple Layer that basically just holds
onto a prefix that's prepended onto the tuple layer operations. In
other languages the Subspace Layer has a lot of syntactic sugar that
makes it useful. Erlang doesn't support any of that, so it ends up
being more of a burden to use it than to just use the Tuple
Layer directly. Dropping the use of directories and subspaces has
greatly simplified the implementation thus far.
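The "prefix holder" point above can be illustrated with a toy sketch (in Python rather than Erlang, and with keys left as tuples rather than packed to bytes, to keep the idea visible; this `Subspace` is not the real fdb class):

```python
class Subspace:
    """Toy model of a Subspace: an object holding a tuple prefix that it
    prepends when building keys. The real FDB Subspace additionally packs
    keys to byte strings via the Tuple Layer."""

    def __init__(self, prefix):
        self.prefix = tuple(prefix)

    def key(self, *parts):
        # All the "layer" does is concatenate the prefix with the key parts.
        return self.prefix + parts
```

Since this is all the abstraction buys without syntactic sugar, prepending the tuple elements directly in Tuple Layer calls is just as clear.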

In terms of code layout, nearly all of the new implementation is in
`src/fabric/src/fabric2*` modules. There's also a few changes to
chttpd obviously to call the new code as well as commenting out parts
of features so I didn't have to follow all the various call stacks
updating huge swathes of semi-unrelated code.

I'd be super interested to hear feedback and see people start running
with this in whatever direction catches their fancy. Hopefully this
proves useful for people to start writing implementations of the
various RFCs so we can make progress on those fronts.

[1] https://github.com/apache/couchdb/compare/prototype/fdb-layer
[2] https://github.com/apache/couchdb/blob/prototype/fdb-layer/fdb-test.py
[3] https://github.com/apache/couchdb/blob/prototype/fdb-layer/FDB_NOTES.md
[4] 
https://forums.foundationdb.org/t/a-new-tool-for-managing-layer-metadata/1191


Re: [GitHub] [couchdb-docker] bgehman commented on issue #71: Change default user in the Docker image from root to couchdb

2019-03-27 Thread Joan Touzet
A tested pull request that does the right thing and keeps the same uid/gid we 
currently use would be welcomed and considered seriously.