Re: [DISCUSSION] Clean up non-functioning applications from main

2021-04-13 Thread Paul Davis
Another +1 to removing as much as possible and building back anything
that we feel is appropriate for main.

On Mon, Apr 12, 2021 at 1:31 PM Robert Newson  wrote:
>
> +1 to all the proposed cuts.
>
> I’m keen to see couch_server.erl itself go, so its remaining uses need new 
> homes (couch_passwords being an obvious choice for the hashing referred to, etc.).
>
> I’m inferring that neither purge nor global_changes works on main anyway, but 
> they can still be called and will route to 3.x code. Agree that it’s better 
> to stub those out (send a 503 I guess?) in the short term and either 
> re-implement on FDB or (as Joan said) vote on their permanent removal. 
> (Noting that a much better implementation of purge and global_changes seems 
> possible with FDB though less clear if the effort is justified).
>
> So, in brief, remove absolutely all the obsoleted, unreachable code as soon 
> as possible, then once the dust has settled we can see if there are obvious 
> gaps we should fill in before the 4.0 release.
>
> B.
>
> > On 12 Apr 2021, at 18:51, Nick Vatamaniuc  wrote:
> >
> > The current versions of those apps rely on mem3, clustering, adding
> > nodes, etc., and they will trail behind the 3.x versions because
> > developers wouldn't think to port those updates to main, where they are
> > simply non-functional. Most of those apps would have to be re-written
> > from scratch, and it would be better to start from the recent working
> > versions on 3.x. The tests for those apps don't really fail, since we get
> > green builds on PR branches to main; we simply don't run them at all
> > and only run a subset of applications (fabric, couch_jobs, couch_views
> > and a few others).
> >
> > I don't think this is about a 4.x release per se. This is mainly about
> > cleaning up and reducing the cognitive load on anyone jumping in to
> > work on main and seeing applications and endpoints calling into
> > non-existent applications.
> >
> > -Nick
> >
> >
> > On Mon, Apr 12, 2021 at 1:13 PM Joan Touzet  wrote:
> >>
> >> Generally +1 with one major reservation:
> >>
> >> On 12/04/2021 12:25, Nick Vatamaniuc wrote:
> >>> * Some applications we want to have in main, but the way they are
> >>> currently implemented relies completely or mostly on 3.x code: purge
> >>> logic, couch_peruser, global_changes, setup. I am thinking it may be
> >>> better to remove them from main, since we'll have them on the 3.x branch,
> >>> where they'll be recent (working). When we're ready to fix them up, we
> >>> can copy that code from there to the main branch.
> >>
> >> If the intent is to release 4.0 with them, then I would suggest keeping
> >> them there and allowing their tests to fail so we know that a "failing
> >> main" means that the product isn't ready to release yet.
> >>
> >> If we are pushing these out past a 4.0 release, then that decision needs
> >> to be made formally.
> >>
> >> Parenthetically, we try to avoid "code owners" here, but usually fixes
> >> to couch_peruser and setup fall to Jan, while fixes to purge and
> >> global_changes have, I *believe*, generally been made by IBM/Cloudant.
> >>
> >> -Joan "not sure main is ready to be called 4.0 yet anyway" Touzet
>
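
A minimal sketch of the stubbing Newson suggests above, assuming a
hypothetical chttpd-style handler (the module name, function name and the
exact send_json signature are illustrative assumptions, not the actual
CouchDB code):

  -module(stub_handlers_sketch).
  -export([handle_global_changes_req/1]).

  %% Short-circuit the endpoint with a 503 until it is re-implemented on FDB,
  %% instead of letting it route into unreachable 3.x code.
  handle_global_changes_req(Req) ->
      chttpd:send_json(Req, 503, {[
          {error, <<"feature_disabled">>},
          {reason, <<"_db_updates is not yet implemented on FoundationDB">>}
      ]}).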


Re: [VOTE] Set a finite default for max_attachment_size

2021-02-01 Thread Paul Davis
+1

Default unlimited seems like an oversight regardless of what we change it to.

On Mon, Feb 1, 2021 at 11:59 AM Eric Avdey  wrote:
>
> Maybe I didn't express myself clearly enough. Setting some finite default is 
> not the purpose; it's what you are doing, and I'm asking what the reason for 
> this change is. In other words, I'm not asking what you are doing, I'm asking 
> why you are doing it.
>
> Introducing a new limit will be a breaking change for anyone who uploads 
> attachments larger than that limit, obviously, so "assumed 1G is large 
> enough" sounds really arbitrary to me without any factual support for that 
> assumption.
>
>
> Eric
>
>
> > On Feb 1, 2021, at 13:15, Bessenyei Balázs Donát  wrote:
> >
> > The purpose of this vote / PR is to set _some_ finite default. I went
> > with 1G as I assumed that would not break anyone's production system.
> > I'd support decreasing that limit over time.
> >
> > The vote has been open for 72 hours now, but I believe it still needs
> > two more +1s to pass.
> >
> >
> > Donat
> >
> > On Thu, Jan 28, 2021 at 10:44 PM Eric Avdey  wrote:
> >>
> >> This got me curious, so I tried to upload an Ubuntu image as an attachment. 
> >> Interestingly, CouchDB 3.x accepted the first 1.4G of the 2.8G file and then 
> >> returned a proper 201 response with a new doc revision, which I certainly 
> >> didn't expect. I should say that 1.4G seems suspiciously similar to the 
> >> normal memory limit for a 32-bit process.
> >>
> >> Putting this aside, I agree that uploading large attachments is an 
> >> anti-pattern and 1G seems excessive, hence my question. I'd expect this 
> >> number to be based on something, and correlating it with a technical limit 
> >> in 4.x makes a lot of sense to me.
> >>
> >>
> >> Eric
> >>
> >>
> >>> On Jan 28, 2021, at 16:02, Robert Newson  wrote:
> >>>
> >>> Hi,
> >>>
> >>> I think a gigabyte is _very_ generous given our experience of this 
> >>> feature in practice.
> >>>
> >>> In 4.x attachment size will necessarily be much more restrictive, so it 
> >>> seems prudent to move toward that limit.
> >>>
> >>> I don’t think many folks (hopefully no one!) are routinely inserting 
> >>> attachments over 1 GiB today; I’d be fairly surprised if it even works.
> >>>
> >>> B.
> >>>
>  On 28 Jan 2021, at 19:42, Eric Avdey  wrote:
> 
>  There is no justification either here or on the PR for this change, 
>  i.e. why this is being done. The original infinity default was set to preserve 
>  previous behaviour; this change will inadvertently break the workflow of users 
>  who upload large attachments and haven't set an explicit limit, so 
>  why is it fine to do now? There might be some discussion around this 
>  somewhere, but it'd be nice to include it here for the sake of people like 
>  me who are out of the loop.
> 
>  Also, the 1G limit seems arbitrary - how was it chosen?
> 
> 
>  Thanks,
>  Eric
> 
> 
> 
> > On Jan 28, 2021, at 01:46, Bessenyei Balázs Donát  
> > wrote:
> >
> > Hi All,
> >
> > In https://github.com/apache/couchdb/pull/3347 I'm proposing to set a
> > finite default for max_attachment_size.
> > The PR is approved, but as per Ilya's request, I'd like to call for a
> > lazy majority vote here.
> > The vote will remain open for at least 72 hours from now.
> >
> > Please let me know if you have any questions, comments or concerns.
> >
> >
> > Donat
> 
> >>>
> >>
>
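
For reference, a hedged sketch of what the proposed default would look like
as explicit configuration, assuming the setting keeps living in the [couchdb]
section of local.ini and is expressed in bytes (both assumptions; check the
docs for your version):

  [couchdb]
  ; cap attachment uploads at 1 GiB instead of the old `infinity` default;
  ; larger uploads should be rejected rather than silently accepted
  max_attachment_size = 1073741824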


Re: [VOTE] couchdb 4.0 transaction semantics

2021-01-08 Thread Paul Davis
+1

On Thu, Jan 7, 2021 at 5:03 AM Robert Newson  wrote:
>
> +1
>
> > On 7 Jan 2021, at 11:00, Robert Newson  wrote:
> >
> > Hi,
> >
> > Following on from the discussion at 
> > https://lists.apache.org/thread.html/rac6c90c4ae03dc055c7e8be6eca1c1e173cf2f98d2afe6d018e62d29%40%3Cdev.couchdb.apache.org%3E
> >  
> > <https://lists.apache.org/thread.html/rac6c90c4ae03dc055c7e8be6eca1c1e173cf2f98d2afe6d018e62d29@%3Cdev.couchdb.apache.org%3E>
> >
> > The proposal is;
> >
> > "With the exception of the changes endpoint when in feed=continuous mode, 
> > that all data-bearing responses from CouchDB are constructed from a single, 
> > immutable snapshot of the database at the time of the request.”
> >
> > Paul Davis summarised the discussion in four bullet points, reiterated here 
> > for context:
> >
> > 1. A single CouchDB API call should map to a single FDB transaction
> > 2. We absolutely do not want to return a valid JSON response to any
> > streaming API that hit a transaction boundary (because data
> > loss/corruption)
> > 3. We're willing to change the API requirements so that 2 is not an issue.
> > 4. None of this applies to continuous changes since that API call was
> > never a single snapshot.
> >
> >
> > Please vote accordingly, we’ll run this as lazy consensus per the bylaws 
> > (https://couchdb.apache.org/bylaws.html#lazy 
> > <https://couchdb.apache.org/bylaws.html#lazy>)
> >
> > B.
> >
>
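
A minimal sketch of what the agreed semantics mean in practice, using the
erlfdb binding's transactional wrapper (illustrative only; the real code
paths live in fabric2 and are considerably more involved):

  -module(snapshot_sketch).
  -export([all_rows/3]).

  %% One incoming HTTP request maps to exactly one FDB transaction, so every
  %% row in the response is read from the same immutable snapshot.
  all_rows(Db, StartKey, EndKey) ->
      erlfdb:transactional(Db, fun(Tx) ->
          erlfdb:get_range(Tx, StartKey, EndKey)
      end).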


Re: [DISCUSS] Deprecate custom reduce functions

2020-10-19 Thread Paul Davis
> >>>>> ...and it is a feature I use, so I 
> >>>>> would not be in favor of deprecating it entirely, without a clear 
> >>>>> proposal/documentation for an alternative/work-around.
> >>>>>
> >>>>> Based on the explanation below, it doesn't sound like there's a 
> >>>>> technical reason to deprecate it, but rather a user-experience reason. 
> >>>>> Is this correct?
> >>>>>
> >>>>> If my understanding is correct, I'm not excited about the proposal, but 
> >>>>> before I dive further into my thoughts, I'd like confirmation that I 
> >>>>> actually understand the proposal, and am not worried about something 
> >>>>> else ;)
> >>>>>
> >>>>> Jonathan
> >>>>>
> >>>>>
> >>>>> On 10/13/20 5:48 PM, Robert Samuel Newson wrote:
> >>>>>> Hi All,
> >>>>>>
> >>>>>> As part of CouchDB 4.0, which moves the storage tier of CouchDB into 
> >>>>>> FoundationDB, we have struggled to reproduce the full map/reduce 
> >>>>>> functionality. Happily this has now happened, and that work is now 
> >>>>>> merged to the couchdb main branch.
> >>>>>>
> >>>>>> This functionality includes the use of custom (javascript) reduce 
> >>>>>> functions. It is my experience that these are very often problematic, 
> >>>>>> in that much more often than not the functions do not significantly 
> >>>>>> reduce their inputs to a smaller result (indeed, sometimes 
> >>>>>> the output is the same size as, or larger than, the input).
> >>>>>>
> >>>>>> To that end, I'm asking if we should deprecate the feature entirely.
> >>>>>>
> >>>>>> In scope for this thread is the middle ground proposal that Paul Davis 
> >>>>>> has written up here;
> >>>>>>
> >>>>>> https://github.com/apache/couchdb/pull/3214
> >>>>>>
> >>>>>> Where custom reduces are not allowed by default but can be enabled.
> >>>>>>
> >>>>>> The core _ability_ to do custom reduces will always be maintained; 
> >>>>>> this is intrinsic to the design of ebtree, the structure we use on top 
> >>>>>> of FoundationDB to hold and maintain intermediate reduce values.
> >>>>>>
> >>>>>> My view is that we should merge #3214 and disable custom reduces by 
> >>>>>> default.
> >>>>>>
> >>>>>> B.
> >>>>>>
> >
>
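
An illustrative example (not taken from the thread) of the pathology Newson
describes, a custom reduce whose output is as large as its input, next to the
builtin alternative that the proposal leaves untouched:

  // A problematic custom reduce: it never shrinks its input.
  function (keys, values, rereduce) {
    return values;
  }

  // Usually what is actually wanted, and always allowed:
  //   "reduce": "_sum"   (or _count / _stats) in the view definition.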


Re: [PROPOSAL] Archiving git branches

2020-10-08 Thread Paul Davis
+1

On Wed, Oct 7, 2020 at 5:11 PM Joan Touzet  wrote:
>
> Hi there,
>
> I'd like to clean up our branches in git on the main couchdb repo. This
> would involve deleting some of our obsolete branches, after tagging the
> final revision on each branch. This way, we retain the history but the
> branch no longer appears in the dropdown on GitHub, or in git branch
> listings at the CLI.
>
> Example process:
>
> git tag archive/1.3.x 1.3.x
> git branch -d 1.3.x
> git push origin :1.3.x
> git push --tags
>
> If we ever needed the branch back, we just:
>
> git checkout -b 1.3.x archive/1.3.x
>
> I would propose to do this for all branches except:
>
> main
> master (for now)
> 2.3.x
> 3.x
> prototype/fdb-layer
>
> ...plus any branches that have been touched in the past 90 days, that
> still have open PRs, or that someone specifically asks me to retain in
> this thread.
>
> I'd also like to do this on couchdb-documentation and couchdb-fauxton.
>
> I would propose to do this about 1 week from now, let's say on October 15th.
>
> Thoughts?
>
> -Joan "fall cleaning" Touzet
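
A sketch of Joan's process applied to a batch of branches; the branch list
here is hypothetical, substitute whatever survives the 90-day/open-PR review:

  for b in 1.2.x 1.3.x 1.4.x; do
      git tag "archive/$b" "origin/$b"     # keep the final revision reachable
      git push origin "archive/$b"         # publish the tag
      git push origin ":refs/heads/$b"     # delete the remote branch
  done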


Re: [DISCUSS] Rename default branch to `main`

2020-09-16 Thread Paul Davis
I'll create the branches on all of the appropriate repositories today
and start looking at Jenkins requirements. I'll hold off on filing the
infra ticket until tomorrow so that folks have time to double check my
sanity.

On Wed, Sep 16, 2020 at 10:50 AM Paul Davis  wrote:
>
> > Also, I added an easier-to-digest list of repositories with their
> > planned actions to the gist [1]. I'm repasting it here for historical
> > tracing.
>
> Create `main` branch, update default branch and branch protections
> ===
>
> * couchdb
> * couchdb-admin
> * couchdb-b64url
> * couchdb-bear
> * couchdb-ci
> * couchdb-config
> * couchdb-docker
> * couchdb-documentation
> * couchdb-erlfdb
> * couchdb-escodegen
> * couchdb-esprima
> * couchdb-ets-lru
> * couchdb-fauxton
> * couchdb-folsom
> * couchdb-glazier
> * couchdb-helm
> * couchdb-hqueue
> * couchdb-hyper
> * couchdb-ibrowse
> * couchdb-ioq
> * couchdb-jaeger-passage
> * couchdb-jiffy
> * couchdb-khash
> * couchdb-local
> * couchdb-meck
> * couchdb-mochiweb
> * couchdb-nano
> * couchdb-passage
> * couchdb-pkg
> * couchdb-rebar
> * couchdb-recon
> * couchdb-snappy
> * couchdb-thrift-protocol
>
> Have infra set `asf-site` as default branch
> ===
>
> * couchdb-www
>
> Empty repositories to delete
> ===
>
> * couchdb-fauxton-server
> * couchdb-javascript-tests
> * couchdb-query-server-spidermonkey
>
> Repositories to archive
> ===
>
> * couchdb-cassim
> * couchdb-chttpd
> * couchdb-couch
> * couchdb-couch-collate
> * couchdb-couch-dbupdates
> * couchdb-couch-epi
> * couchdb-couch-event
> * couchdb-couch-httpd
> * couchdb-couch-index
> * couchdb-couch-log
> * couchdb-couch-log-lager
> * couchdb-couch-mrview
> * couchdb-couch-plugins
> * couchdb-couch-replicator
> * couchdb-couch-stats
> * couchdb-ddoc-cache
> * couchdb-erlang-bcrypt
> * couchdb-erlang-tests
> * couchdb-examples
> * couchdb-fabric
> * couchdb-futon
> * couchdb-global-changes
> * couchdb-goldrush
> * couchdb-jquery-couch
> * couchdb-lager
> * couchdb-mango
> * couchdb-mem3
> * couchdb-nmo
> * couchdb-oauth
> * couchdb-peruser
> * couchdb-query-server-node
> * couchdb-rexi
> * couchdb-setup
> * couchdb-smoosh
> * couchdb-triq
> * couchdb-twig
>
>
> [1] https://gist.github.com/davisp/9de8fa167812f80356d4990e390c9351
>
> On Wed, Sep 16, 2020 at 10:49 AM Paul Davis  
> wrote:
> >
> > Right. I figure that's basically an ASF version of the `gh-pages` branch?
> >
> > On Wed, Sep 16, 2020 at 10:39 AM Joan Touzet  wrote:
> > >
> > >
> > > On 16/09/2020 11:39, Paul Davis wrote:
> > > > On Wed, Sep 16, 2020 at 10:32 AM Joan Touzet  wrote:
> > > >>
> > > >>
> > > >>
> > > >> On 16/09/2020 10:57, Paul Davis wrote:
> > > >>> Hey all,
> > > >>>
> > > >>> Here's a list of all CouchDB related repositories with a few quick
> > > >>> stats and my read on their status and requirements. Can I get some
> > > >>> eyeballs on this to double check before I submit a ticket to infra
> > > >>> for doing our branch renaming updates?
> > > >>>
> > > >>> https://gist.github.com/davisp/9de8fa167812f80356d4990e390c9351
> > > >>>
> > > >>> There are a few repos with comments I had when I wasn't 100% sure on
> > > >>> the status. For ease those are:
> > > >>>
> > > >>> couchdb-couch-collate - I couldn't easily tell if this was still
> > > >>> used for Windows builds
> > > >>
> > > >> Nope
> > > >>
> > > >>> couchdb-fauxton-server - This is an empty repo, should we have it
> > > >>> deleted?
> > > >>
> > > >> Sure
> > > >>
> > > >>> couchdb-jquery-couch - Should this be archived? Has PouchDB/nano
> > > >>> replaced it?
> > > >>
> > > >> If I recall correctly this was part of Futon and 1.x releases?
> > > >>
> > > >>> couchdb-nmo - Should this be archived?
> > > >>
> > > >> Very old code from 2015 from Robert Kowalski to help set up

Re: [DISCUSS] Rename default branch to `main`

2020-09-16 Thread Paul Davis
Also, I added an easier-to-digest list of repositories with their
planned actions to the gist [1]. I'm repasting it here for historical
tracing.

Create `main` branch, update default branch and branch protections
===

* couchdb
* couchdb-admin
* couchdb-b64url
* couchdb-bear
* couchdb-ci
* couchdb-config
* couchdb-docker
* couchdb-documentation
* couchdb-erlfdb
* couchdb-escodegen
* couchdb-esprima
* couchdb-ets-lru
* couchdb-fauxton
* couchdb-folsom
* couchdb-glazier
* couchdb-helm
* couchdb-hqueue
* couchdb-hyper
* couchdb-ibrowse
* couchdb-ioq
* couchdb-jaeger-passage
* couchdb-jiffy
* couchdb-khash
* couchdb-local
* couchdb-meck
* couchdb-mochiweb
* couchdb-nano
* couchdb-passage
* couchdb-pkg
* couchdb-rebar
* couchdb-recon
* couchdb-snappy
* couchdb-thrift-protocol

Have infra set `asf-site` as default branch
===

* couchdb-www

Empty repositories to delete
===

* couchdb-fauxton-server
* couchdb-javascript-tests
* couchdb-query-server-spidermonkey

Repositories to archive
===

* couchdb-cassim
* couchdb-chttpd
* couchdb-couch
* couchdb-couch-collate
* couchdb-couch-dbupdates
* couchdb-couch-epi
* couchdb-couch-event
* couchdb-couch-httpd
* couchdb-couch-index
* couchdb-couch-log
* couchdb-couch-log-lager
* couchdb-couch-mrview
* couchdb-couch-plugins
* couchdb-couch-replicator
* couchdb-couch-stats
* couchdb-ddoc-cache
* couchdb-erlang-bcrypt
* couchdb-erlang-tests
* couchdb-examples
* couchdb-fabric
* couchdb-futon
* couchdb-global-changes
* couchdb-goldrush
* couchdb-jquery-couch
* couchdb-lager
* couchdb-mango
* couchdb-mem3
* couchdb-nmo
* couchdb-oauth
* couchdb-peruser
* couchdb-query-server-node
* couchdb-rexi
* couchdb-setup
* couchdb-smoosh
* couchdb-triq
* couchdb-twig


[1] https://gist.github.com/davisp/9de8fa167812f80356d4990e390c9351

On Wed, Sep 16, 2020 at 10:49 AM Paul Davis  wrote:
>
> Right. I figure that's basically an ASF version of the `gh-pages` branch?
>
> On Wed, Sep 16, 2020 at 10:39 AM Joan Touzet  wrote:
> >
> >
> > On 16/09/2020 11:39, Paul Davis wrote:
> > > On Wed, Sep 16, 2020 at 10:32 AM Joan Touzet  wrote:
> > >>
> > >>
> > >>
> > >> On 16/09/2020 10:57, Paul Davis wrote:
> > >>> Hey all,
> > >>>
> > >>> Here's a list of all CouchDB related repositories with a few quick
> > >>> stats and my read on their status and requirements. Can I get some
> > >>> eyeballs on this to double check before I submit a ticket to infra
> > >>> for doing our branch renaming updates?
> > >>>
> > >>> https://gist.github.com/davisp/9de8fa167812f80356d4990e390c9351
> > >>>
> > >>> There are a few repos with comments I had when I wasn't 100% sure on
> > >>> the status. For ease those are:
> > >>>
> > >>> couchdb-couch-collate - I couldn't easily tell if this was still
> > >>> used for Windows builds
> > >>
> > >> Nope
> > >>
> > >>> couchdb-fauxton-server - This is an empty repo, should we have it
> > >>> deleted?
> > >>
> > >> Sure
> > >>
> > >>> couchdb-jquery-couch - Should this be archived? Has PouchDB/nano
> > >>> replaced it?
> > >>
> > >> If I recall correctly this was part of Futon and 1.x releases?
> > >>
> > >>> couchdb-nmo - Should this be archived?
> > >>
> > >> Very old code from 2015 from Robert Kowalski to help set up
> > >> clusters/etc. I don't know anything about it, and it appears
> > >> unmaintained. +1 to archive
> > >>
> > >>> couchdb-oauth - I couldn't find this used anywhere, should we archive
> > >>
> > >> I remember using this extensively! 1.x asset. As we no longer officially
> > >> support it (or CouchDB "plugins" in this form), +1 to archive
> > >>
> > >>> couchdb-www - Should this be archived or included in the rename?
> > >>
> > >> We already have to use asf-site branch on this, and the 'master' branch
> > >> already says "you're on the wrong branch." Just have Infra change the
> > >> default branch to asf-site, no need to master -> main here IMO.
> > >>
> > >
> > > Want me to have infra change the default branch to `asf-site` 

Re: [DISCUSS] Rename default branch to `main`

2020-09-16 Thread Paul Davis
Right. I figure that's basically an ASF version of the `gh-pages` branch?

On Wed, Sep 16, 2020 at 10:39 AM Joan Touzet  wrote:
>
>
> On 16/09/2020 11:39, Paul Davis wrote:
> > On Wed, Sep 16, 2020 at 10:32 AM Joan Touzet  wrote:
> >>
> >>
> >>
> >> On 16/09/2020 10:57, Paul Davis wrote:
> >>> Hey all,
> >>>
> >>> Here's a list of all CouchDB related repositories with a few quick
> >>> stats and my read on their status and requirements. Can I get some
> >>> eyeballs on this to double check before I submit a ticket to infra
> >>> for doing our branch renaming updates?
> >>>
> >>> https://gist.github.com/davisp/9de8fa167812f80356d4990e390c9351
> >>>
> >>> There are a few repos with comments I had when I wasn't 100% sure on
> >>> the status. For ease those are:
> >>>
> >>> couchdb-couch-collate - I couldn't easily tell if this was still
> >>> used for Windows builds
> >>
> >> Nope
> >>
> >>> couchdb-fauxton-server - This is an empty repo, should we have it
> >>> deleted?
> >>
> >> Sure
> >>
> >>> couchdb-jquery-couch - Should this be archived? Has PouchDB/nano
> >>> replaced it?
> >>
> >> If I recall correctly this was part of Futon and 1.x releases?
> >>
> >>> couchdb-nmo - Should this be archived?
> >>
> >> Very old code from 2015 from Robert Kowalski to help set up
> >> clusters/etc. I don't know anything about it, and it appears
> >> unmaintained. +1 to archive
> >>
> >>> couchdb-oauth - I couldn't find this used anywhere, should we archive
> >>
> >> I remember using this extensively! 1.x asset. As we no longer officially
> >> support it (or CouchDB "plugins" in this form), +1 to archive
> >>
> >>> couchdb-www - Should this be archived or included in the rename?
> >>
> >> We already have to use asf-site branch on this, and the 'master' branch
> >> already says "you're on the wrong branch." Just have Infra change the
> >> default branch to asf-site, no need to master -> main here IMO.
> >>
> >
> > Want me to have infra change the default branch to `asf-site` on this repo?
>
> Yes please! No need to change to main here.
>
> -Joan
>
> >
> > Everything else sounds good.
> >
> >>>
> >>> Paul
> >>>
> >>> On Fri, Sep 11, 2020 at 6:28 AM Glynn Bird 
> >>> wrote:
> >>>>
> >>>> +1
> >>>>
> >>>> Happy to help reconfigure apache/couchdb-nano if necessary after
> >>>> the switch to main
> >>>>
> >>>> On Thu, 10 Sep 2020 at 10:40, Andy Wenk 
> >>>> wrote:
> >>>>
> >>>>> strong +1
> >>>>>
> >>>>> here at sum.cumo we also change the “master” branches to main
> >>>>>
> >>>>> Best
> >>>>>
> >>>>> Andy -- Andy Wenk Hamburg
> >>>>>
> >>>>> GPG fingerprint C32E 275F BCF3 9DF6 4E55  21BD 45D3 5653 77F9
> >>>>> 3D29
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On 9. Sep 2020, at 20:09, Joan Touzet 
> >>>>>> wrote:
> >>>>>>
> >>>>>> +1. Thanks for starting this, Paul. I was actually going to try
> >>>>>> and
> >>>>> drive this a month or two ago, but things got busy for me.
> >>>>>>
> >>>>>> I'd also support renaming it to 'trunk' but really don't care
> >>>>>> what we
> >>>>> pick.
> >>>>>>
> >>>>>> The first commercial version control system I used to use,
> >>>>>> called that
> >>>>> branch "main":
> >>>>>>
> >>>>>> https://i.ibb.co/7bMDt3c/cc-ver-tree2.gif
> >>>>>>
> >>>>>> -Joan "yes, that's motif" Touzet
> >>>>>>
> >>>>>>
> >>>>>> On 2020-09-09 11:40 a.m., Paul Davis wrote:
> >>>>>>> Howdy Folks! Words matter. I've just started a thread on
> >>>>>>> merging all of the FoundationDB work into mainline
> >>>>>>> development and thought this would be a good time to bring up
> >>>>>>> a separate discussion on renaming our default branch.
> >>>>>>> Personally, I've got a few projects where I used `main` for
> >>>>>>> the mainline development branch. I find it to be a fairly
> >>>>>>> natural shift because I tab-complete everything on the
> >>>>>>> command line. I'd be open to other suggestions but I'm also
> >>>>>>> hoping this doesn't devolve into a bikeshed on what we end up
> >>>>>>> picking. For mechanics, what I'm thinking is that when we
> >>>>>>> finish up the last rebase of the FoundationDB work that
> >>>>>>> instead of actually pushing the merge/rebase button we just
> >>>>>>> rename the branch and then change the default branch on
> >>>>>>> GitHub and close the PR. Thoughts? Paul
> >>>>>
> >>>>>


Re: [DISCUSS] Rename default branch to `main`

2020-09-16 Thread Paul Davis
On Wed, Sep 16, 2020 at 10:32 AM Joan Touzet  wrote:
>
>
>
> On 16/09/2020 10:57, Paul Davis wrote:
> > Hey all,
> >
> > Here's a list of all CouchDB related repositories with a few quick
> > stats and my read on their status and requirements. Can I get some
> > eyeballs on this to double check before I submit a ticket to infra
> > for doing our branch renaming updates?
> >
> > https://gist.github.com/davisp/9de8fa167812f80356d4990e390c9351
> >
> > There are a few repos with comments I had when I wasn't 100% sure on
> > the status. For ease those are:
> >
> > couchdb-couch-collate - I couldn't easily tell if this was still
> > used for Windows builds
>
> Nope
>
> > couchdb-fauxton-server - This is an empty repo, should we have it
> > deleted?
>
> Sure
>
> > couchdb-jquery-couch - Should this be archived? Has PouchDB/nano
> > replaced it?
>
> If I recall correctly this was part of Futon and 1.x releases?
>
> > couchdb-nmo - Should this be archived?
>
> Very old code from 2015 from Robert Kowalski to help set up
> clusters/etc. I don't know anything about it, and it appears
> unmaintained. +1 to archive
>
> > couchdb-oauth - I couldn't find this used anywhere, should we archive
>
> I remember using this extensively! 1.x asset. As we no longer officially
> support it (or CouchDB "plugins" in this form), +1 to archive
>
> > couchdb-www - Should this be archived or included in the rename?
>
> We already have to use asf-site branch on this, and the 'master' branch
> already says "you're on the wrong branch." Just have Infra change the
> default branch to asf-site, no need to master -> main here IMO.
>

Want me to have infra change the default branch to `asf-site` on this repo?

Everything else sounds good.

> >
> > Paul
> >
> > On Fri, Sep 11, 2020 at 6:28 AM Glynn Bird 
> > wrote:
> >>
> >> +1
> >>
> >> Happy to help reconfigure apache/couchdb-nano if necessary after
> >> the switch to main
> >>
> >> On Thu, 10 Sep 2020 at 10:40, Andy Wenk 
> >> wrote:
> >>
> >>> strong +1
> >>>
> >>> here at sum.cumo we also change the “master” branches to main
> >>>
> >>> Best
> >>>
> >>> Andy -- Andy Wenk Hamburg
> >>>
> >>> GPG fingerprint C32E 275F BCF3 9DF6 4E55  21BD 45D3 5653 77F9
> >>> 3D29
> >>>
> >>>
> >>>
> >>>> On 9. Sep 2020, at 20:09, Joan Touzet 
> >>>> wrote:
> >>>>
> >>>> +1. Thanks for starting this, Paul. I was actually going to try
> >>>> and
> >>> drive this a month or two ago, but things got busy for me.
> >>>>
> >>>> I'd also support renaming it to 'trunk' but really don't care
> >>>> what we
> >>> pick.
> >>>>
> >>>> The first commercial version control system I used to use,
> >>>> called that
> >>> branch "main":
> >>>>
> >>>> https://i.ibb.co/7bMDt3c/cc-ver-tree2.gif
> >>>>
> >>>> -Joan "yes, that's motif" Touzet
> >>>>
> >>>>
> >>>> On 2020-09-09 11:40 a.m., Paul Davis wrote:
> >>>>> Howdy Folks! Words matter. I've just started a thread on
> >>>>> merging all of the FoundationDB work into mainline
> >>>>> development and thought this would be a good time to bring up
> >>>>> a separate discussion on renaming our default branch.
> >>>>> Personally, I've got a few projects where I used `main` for
> >>>>> the mainline development branch. I find it to be a fairly
> >>>>> natural shift because I tab-complete everything on the
> >>>>> command line. I'd be open to other suggestions but I'm also
> >>>>> hoping this doesn't devolve into a bikeshed on what we end up
> >>>>> picking. For mechanics, what I'm thinking is that when we
> >>>>> finish up the last rebase of the FoundationDB work that
> >>>>> instead of actually pushing the merge/rebase button we just
> >>>>> rename the branch and then change the default branch on
> >>>>> GitHub and close the PR. Thoughts? Paul
> >>>
> >>>


Re: [DISCUSS] Rename default branch to `main`

2020-09-16 Thread Paul Davis
Hey all,

Here's a list of all CouchDB related repositories with a few quick
stats and my read on their status and requirements. Can I get some
eyeballs on this to double check before I submit a ticket to infra for
doing our branch renaming updates?

https://gist.github.com/davisp/9de8fa167812f80356d4990e390c9351

There are a few repos where I left comments because I wasn't 100% sure of
the status. For ease, those are:

couchdb-couch-collate - I couldn't easily tell if this was still used
for Windows builds
couchdb-fauxton-server - This is an empty repo, should we have it deleted?
couchdb-jquery-couch - Should this be archived? Has PouchDB/nano replaced it?
couchdb-nmo - Should this be archived?
couchdb-oauth - I couldn't find this used anywhere, should we archive it?
couchdb-www - Should this be archived or included in the rename?

Paul

On Fri, Sep 11, 2020 at 6:28 AM Glynn Bird  wrote:
>
> +1
>
> Happy to help reconfigure apache/couchdb-nano if necessary after the switch
> to main
>
> On Thu, 10 Sep 2020 at 10:40, Andy Wenk  wrote:
>
> > strong +1
> >
> > here at sum.cumo we also change the “master” branches to main
> >
> > Best
> >
> > Andy
> > --
> > Andy Wenk
> > Hamburg
> >
> > GPG fingerprint C32E 275F BCF3 9DF6 4E55  21BD 45D3 5653 77F9 3D29
> >
> >
> >
> > > On 9. Sep 2020, at 20:09, Joan Touzet  wrote:
> > >
> > > +1. Thanks for starting this, Paul. I was actually going to try and
> > drive this a month or two ago, but things got busy for me.
> > >
> > > I'd also support renaming it to 'trunk' but really don't care what we
> > pick.
> > >
> > > The first commercial version control system I used to use, called that
> > branch "main":
> > >
> > >  https://i.ibb.co/7bMDt3c/cc-ver-tree2.gif
> > >
> > > -Joan "yes, that's motif" Touzet
> > >
> > >
> > > On 2020-09-09 11:40 a.m., Paul Davis wrote:
> > >> Howdy Folks!
> > >> Words matter. I've just started a thread on merging all of the
> > >> FoundationDB work into mainline development and thought this would be
> > >> a good time to bring up a separate discussion on renaming our default
> > >> branch.
> > >> Personally, I've got a few projects where I used `main` for the
> > >> mainline development branch. I find it to be a fairly natural shift
> > >> because I tab-complete everything on the command line. I'd be open to
> > >> other suggestions but I'm also hoping this doesn't devolve into a
> > >> bikeshed on what we end up picking.
> > >> For mechanics, what I'm thinking is that when we finish up the last
> > >> rebase of the FoundationDB work that instead of actually pushing the
> > >> merge/rebase button we just rename the branch and then change the
> > >> default branch on GitHub and close the PR.
> > >> Thoughts?
> > >> Paul
> >
> >


Re: Is it time to merge prototype/fdb-layer to master?

2020-09-10 Thread Paul Davis
I should have noted that for each of the `apache/couchdb-$repo`
repositories my plan is to do a straight-up copy of master -> main
with zero other changes. Once that's done we'll need to update
rebar.config.script, but that should be all we need there.
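
A sketch of that per-repo copy, assuming direct push access and that nothing
else has to change (this is just what "straight-up copy" amounts to):

  git fetch origin
  git push origin origin/master:refs/heads/main   # create main from master, no other changes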


On Thu, Sep 10, 2020 at 3:11 PM Paul Davis  wrote:
>
> So I've gotten `make check` passing against a merge of master into the
> `prototype/fdb-layer` branch. I ended up finding a flaky test and a
> bug in a recent commit to master. I've just merged a fix for the flaky
> test and Bob is working on a patch for the buffered_response feature.
>
> Once those are both merged I'll re-run the merge and name that branch `main`.
>
> Once that happens we'll need to work through a to-do list. Things I
> know that are on that list:
>
> 1. File infra ticket to have them change our GitHub setting for the
> default branch to `main`.
> 2. Copy branch protection rules from `master` to `main`
> 3. Steps 1 and 2 for all our `apache/couchdb-$repo` repositories
> 4. Update Jenkins config
> 5. Figure out FreeBSD builder situation
> 6. Probably other stuff
> 7. Eventually rename current `master` to something else so as to avoid 
> confusion
>
> Assuming no one objects beforehand, I'll start the ball rolling with
> Infra on Monday.
>
> Paul
>
> On Wed, Sep 9, 2020 at 1:11 PM Joan Touzet  wrote:
> >
> > Have been asking for it for a while ;) obviously +1.
> >
> > Be aware that Jenkinsfile.full post-merge will probably fail because, at
> > the very least, the FreeBSD hosts won't have fdb and can't run docker to
> > containerise it. This will need some exploration to resolve but
> > shouldn't be a blocker.
> >
> > The Jenkins setup will also need slight changes when we rename branches.
> > Also keep in mind other repos need the branch renaming, too. ASF Infra
> > can do the GitHub dance to change the name of the main branch.
> >
> > -Joan "about time" Touzet
> >
> > On 2020-09-09 2:05 p.m., Robert Samuel Newson wrote:
> > > Agree that it's time to get the fdb-layer work into master; that's where 
> > > couchdb 4.0 should be created.
> > >
> > > thanks for preserving the imported ebtree history.
> > >
> > >> On 9 Sep 2020, at 17:28, Paul Davis  wrote:
> > >>
> > >> The merge on this turned out to be a lot more straightforward so I
> > >> think its probably the way to go. I've got a failing test in
> > >> couch_views_active_tasks_test but it appears to be flaky rather than a
> > >> merge error. I'll work though getting `make check` to complete and
> > >> then send another update.
> > >>
> > >> https://github.com/apache/couchdb/tree/prototype/fdb-layer-final-merge
> > >> https://github.com/apache/couchdb/commit/873ccb4882f2e984c25f59ad0fd0a0677b9d4477
> > >>
> > >> On Wed, Sep 9, 2020 at 10:29 AM Paul Davis  
> > >> wrote:
> > >>>
> > >>> Howdy folks!
> > >>>
> > >>> I've just gone through a rebase of `prototype/fdb-layer` against
> > >>> master. Its not quite finished because the ebtree import went wrong
> > >>> during rebase due to a weirdness of the history.
> > >>>
> > >>> I have a PR up for the rebase into master for people to look at [1].
> > >>> Although the more important comparison is likely with the current
> > >>> `prototype/fdb-layer` that can be found at [2].
> > >>>
> > >>> Given the ebtree aspect, as well as the fact that I get labeled as the
> > >>> committer for all commits when doing a rebase I'm also wondering if we
> > >>> shouldn't turn this into a merge in this instance. I'll work up a
> > >>> second branch that shows that diff as well that we could then rebase
> > >>> onto master.
> > >>>
> > >>> Regardless, I'd appreciate if we could get some eyeballs on the diff
> > >>> and then finally merge this work to the default branch so its the main
> > >>> line development going forward.
> > >>>
> > >>> Paul
> > >>>
> > >>> [1] https://github.com/apache/couchdb/pull/3137
> > >>> [2] 
> > >>> https://github.com/apache/couchdb/compare/prototype/fdb-layer...prototype/fdb-layer-final-rebase
> > >


Re: Is it time to merge prototype/fdb-layer to master?

2020-09-10 Thread Paul Davis
So I've gotten `make check` passing against a merge of master into the
`prototype/fdb-layer` branch. I ended up finding a flaky test and a
bug in a recent commit to master. I've just merged a fix for the flaky
test and Bob is working on a patch for the buffered_response feature.

Once those are both merged I'll re-run the merge and name that branch `main`.

Once that happens we'll need to work through a to-do list. Things I
know that are on that list:

1. File infra ticket to have them change our GitHub setting for the
default branch to `main`.
2. Copy branch protection rules from `master` to `main`
3. Steps 1 and 2 for all our `apache/couchdb-$repo` repositories
4. Update Jenkins config
5. Figure out FreeBSD builder situation
6. Probably other stuff
7. Eventually rename current `master` to something else so as to avoid confusion

Assuming no one objects beforehand, I'll start the ball rolling with
Infra on Monday.

Paul

On Wed, Sep 9, 2020 at 1:11 PM Joan Touzet  wrote:
>
> Have been asking for it for a while ;) obviously +1.
>
> Be aware that Jenkinsfile.full post-merge will probably fail because, at
> the very least, the FreeBSD hosts won't have fdb and can't run docker to
> containerise it. This will need some exploration to resolve but
> shouldn't be a blocker.
>
> The Jenkins setup will also need slight changes when we rename branches.
> Also keep in mind other repos need the branch renaming, too. ASF Infra
> can do the GitHub dance to change the name of the main branch.
>
> -Joan "about time" Touzet
>
> On 2020-09-09 2:05 p.m., Robert Samuel Newson wrote:
> > Agree that it's time to get the fdb-layer work into master; that's where 
> > couchdb 4.0 should be created.
> >
> > thanks for preserving the imported ebtree history.
> >
> >> On 9 Sep 2020, at 17:28, Paul Davis  wrote:
> >>
> >> The merge on this turned out to be a lot more straightforward so I
> >> think its probably the way to go. I've got a failing test in
> >> couch_views_active_tasks_test but it appears to be flaky rather than a
> >> merge error. I'll work though getting `make check` to complete and
> >> then send another update.
> >>
> >> https://github.com/apache/couchdb/tree/prototype/fdb-layer-final-merge
> >> https://github.com/apache/couchdb/commit/873ccb4882f2e984c25f59ad0fd0a0677b9d4477
> >>
> >> On Wed, Sep 9, 2020 at 10:29 AM Paul Davis  
> >> wrote:
> >>>
> >>> Howdy folks!
> >>>
> >>> I've just gone through a rebase of `prototype/fdb-layer` against
> >>> master. Its not quite finished because the ebtree import went wrong
> >>> during rebase due to a weirdness of the history.
> >>>
> >>> I have a PR up for the rebase into master for people to look at [1].
> >>> Although the more important comparison is likely with the current
> >>> `prototype/fdb-layer` that can be found at [2].
> >>>
> >>> Given the ebtree aspect, as well as the fact that I get labeled as the
> >>> committer for all commits when doing a rebase I'm also wondering if we
> >>> shouldn't turn this into a merge in this instance. I'll work up a
> >>> second branch that shows that diff as well that we could then rebase
> >>> onto master.
> >>>
> >>> Regardless, I'd appreciate if we could get some eyeballs on the diff
> >>> and then finally merge this work to the default branch so its the main
> >>> line development going forward.
> >>>
> >>> Paul
> >>>
> >>> [1] https://github.com/apache/couchdb/pull/3137
> >>> [2] 
> >>> https://github.com/apache/couchdb/compare/prototype/fdb-layer...prototype/fdb-layer-final-rebase
> >


Re: [DISCUSS] Rename default branch to `main`

2020-09-09 Thread Paul Davis
Alexander,

Any PRs that aren't trivially rebased against the current
prototype/fdb-layer will by definition break regardless of what we
call our default branch moving forward.

Paul


On Wed, Sep 9, 2020 at 11:11 AM Jan Lehnardt  wrote:
>
> I’m in favour and I think the FDB merge is a nice opportunity to take
> the plunge.
>
> Best
> Jan
> —
> > On 9. Sep 2020, at 17:40, Paul Davis  wrote:
> >
> > Howdy Folks!
> >
> > Words matter. I've just started a thread on merging all of the
> > FoundationDB work into mainline development and thought this would be
> > a good time to bring up a separate discussion on renaming our default
> > branch.
> >
> > Personally, I've got a few projects where I used `main` for the
> > mainline development branch. I find it to be a fairly natural shift
> > because I tab-complete everything on the command line. I'd be open to
> > other suggestions but I'm also hoping this doesn't devolve into a
> > bikeshed on what we end up picking.
> >
> > For mechanics, what I'm thinking is that when we finish up the last
> > rebase of the FoundationDB work that instead of actually pushing the
> > merge/rebase button we just rename the branch and then change the
> > default branch on GitHub and close the PR.
> >
> > Thoughts?
> >
> > Paul
>


Re: Is it time to merge prototype/fdb-layer to master?

2020-09-09 Thread Paul Davis
The merge on this turned out to be a lot more straightforward, so I
think it's probably the way to go. I've got a failing test in
couch_views_active_tasks_test but it appears to be flaky rather than a
merge error. I'll work through getting `make check` to complete and
then send another update.

https://github.com/apache/couchdb/tree/prototype/fdb-layer-final-merge
https://github.com/apache/couchdb/commit/873ccb4882f2e984c25f59ad0fd0a0677b9d4477

On Wed, Sep 9, 2020 at 10:29 AM Paul Davis  wrote:
>
> Howdy folks!
>
> I've just gone through a rebase of `prototype/fdb-layer` against
> master. Its not quite finished because the ebtree import went wrong
> during rebase due to a weirdness of the history.
>
> I have a PR up for the rebase into master for people to look at [1].
> Although the more important comparison is likely with the current
> `prototype/fdb-layer` that can be found at [2].
>
> Given the ebtree aspect, as well as the fact that I get labeled as the
> committer for all commits when doing a rebase I'm also wondering if we
> shouldn't turn this into a merge in this instance. I'll work up a
> second branch that shows that diff as well that we could then rebase
> onto master.
>
> Regardless, I'd appreciate if we could get some eyeballs on the diff
> and then finally merge this work to the default branch so its the main
> line development going forward.
>
> Paul
>
> [1] https://github.com/apache/couchdb/pull/3137
> [2] 
> https://github.com/apache/couchdb/compare/prototype/fdb-layer...prototype/fdb-layer-final-rebase


[DISCUSS] Rename default branch to `main`

2020-09-09 Thread Paul Davis
Howdy Folks!

Words matter. I've just started a thread on merging all of the
FoundationDB work into mainline development and thought this would be
a good time to bring up a separate discussion on renaming our default
branch.

Personally, I've got a few projects where I used `main` for the
mainline development branch. I find it to be a fairly natural shift
because I tab-complete everything on the command line. I'd be open to
other suggestions but I'm also hoping this doesn't devolve into a
bikeshed on what we end up picking.

For mechanics, what I'm thinking is that when we finish up the last
rebase of the FoundationDB work that instead of actually pushing the
merge/rebase button we just rename the branch and then change the
default branch on GitHub and close the PR.

Thoughts?

Paul


Is it time to merge prototype/fdb-layer to master?

2020-09-09 Thread Paul Davis
Howdy folks!

I've just gone through a rebase of `prototype/fdb-layer` against
master. It's not quite finished because the ebtree import went wrong
during the rebase due to a weirdness of the history.

I have a PR up for the rebase into master for people to look at [1].
Although the more important comparison is likely with the current
`prototype/fdb-layer` that can be found at [2].

Given the ebtree aspect, as well as the fact that I get labeled as the
committer for all commits when doing a rebase, I'm also wondering if we
shouldn't turn this into a merge in this instance. I'll work up a
second branch that shows that diff as well, which we could then rebase
onto master.

Regardless, I'd appreciate it if we could get some eyeballs on the diff
and then finally merge this work to the default branch so it's the
mainline development going forward.

Paul

[1] https://github.com/apache/couchdb/pull/3137
[2] 
https://github.com/apache/couchdb/compare/prototype/fdb-layer...prototype/fdb-layer-final-rebase


Re: [DISCUSS] Creating new deleted documents in CouchDB 4

2020-09-01 Thread Paul Davis
Replication of deletions isn't affected due to the new_edits=false
flag like you guessed. This is purely "interactively creating a new
document that is deleted". Its a fairly minor edge case in that the
document must not exist. Any other attempt to "revive" a deleted doc
into a deleted state will fail with a conflict on 3.x.

I'm +0 for compatibility. It's not a significant amount of work to
implement the behavior and we can always deprecate it in the future to
remove the weird edge case of "document that has never existed can be
created in a deleted state".

On Tue, Sep 1, 2020 at 3:26 PM Jonathan Hall  wrote:
>
> Isn't compatibility required to support replication of deleted documents? Or 
> does creation of a deleted document work with new_edits=false?
>
>
>
> On Sep 1, 2020, 10:16 PM, at 10:16 PM, Nick Vatamaniuc  
> wrote:
> >Hi everyone,
> >
> >While running PouchDB replication unit tests against the CouchDB 4
> >replicator PR branch (thanks to Garren Smith, who helped set up the
> >tests), we had noticed a doc update API incompatibility between
> >CouchDB 3.x/PouchDB and the prototype/fdb-layer branch: CouchDB
> >3.x/PouchDB allow creating new deleted documents and
> >prototype/fdb-layer branch doesn't.
> >
> >For example:
> >
> >$ http put $DB1/mydb/doc1 _deleted:='true' a=b
> >HTTP/1.1 200 OK
> >
> >{
> >"id": "doc1",
> >"ok": true,
> >"rev": "1-ad7eb689fcae75e7a7edb57dc1f30939"
> >}
> >
> >$ http $DB1/mydb/doc1?deleted=true
> >HTTP/1.1 200 OK
> >
> >{
> >"_deleted": true,
> >"_id": "doc1",
> >"_rev": "1-ad7eb689fcae75e7a7edb57dc1f30939",
> >"a": "b"
> >}
> >
> >On prototype/fdb-layer it returns a 409 conflict error
> >
> >I opened a PR to make the prototype/fdb-layer branch behave the same
> >and keep the API compatibility, but also wanted to see what the
> >community thinks.
> >
> >https://github.com/apache/couchdb/pull/3123
> >
> >Would we want to keep compatibility with CouchDB 3.x/PouchDB or,
> >return a conflict (409), like the prototype/fdb-layer branch does?
> >
> >My vote is for compatibility.
> >
> >Thanks,
> >-Nick
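
A hedged illustration of the replication-style path Paul notes is unaffected:
with new_edits=false the client supplies the revision itself (the rev below is
just a placeholder), so deleted revisions can be written regardless of which
interactive behaviour is chosen:

  $ http put "$DB1/mydb/doc2?new_edits=false" \
      _deleted:='true' _rev='1-00000000000000000000000000000000'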


Re: object keys in couchdb 4

2020-07-28 Thread Paul Davis
San,

I don't remember that being a performance issue under consideration
for the "exploded document" design that we had contemplated in
particular, but I could see there being some concerns around it.
However, we have not implemented that idea and instead just store
documents in as few consecutive keys as possible, so in terms of fdb
the doc bodies are just a string of bytes, and it's definitely a
non-issue with the current design.

Paul

On Tue, Jul 28, 2020 at 4:15 PM San Sato  wrote:
>
> During the design process for the FDB integration, there was talk about issues
> with nested keys having user-provided values. The gist, as I recall, was that
> having many documents *{ _id: "xyz", o: ...}* with *o*'s
> value being a nested object with runtime-generated, non-predictable keys *{x1,
> x2, y1, y2, y3, m1...mX, n, ...}* harms FDB performance; that the FDB keyspace
> prefers, operationally, to be relatively static (maybe this had something
> to do with joined paths-to-value, and limiting the number of distinct keys in
> the flattened namespace?).
>
> Is that still the case?
>
> Thanks,


Re: [DISCUSS] Reduce on FDB take 3

2020-07-27 Thread Paul Davis
> > ...value are the emitted key and value from the map
> > function. ebtree:open is handed a reduce_fun which handles the reduce and
> > rereduce for the user's reduction and, like couchdb 1/2/3, we also mix in a
> > system reduction that calculates the view size (Paul's diff above appears
> > to do just this portion and looks broadly right).
> > > >
> > > > This avoids the duplication that garren's idea of storing the reduces
> > in ebtree was trying to avoid, but does so in all cases. This approach
> > allows us to compare the performance of map queries against map-only and
> > map-reduce indexes, it allows users to opt in or out of encrypting their
> > emitted view keys, and it doesn't prevent anyone from choosing to do both
> > (You can define a view with map function A and another view with map
> > function A and reduce function B, the former will be the couch_views
> > vertical index, the latter an ebtree "horizontal" index).
> > > >
> > > > As we learn more over time we could enhance one type of index or the
> > other and reach a point where we eliminate one, or the two could coexist
> > indefinitely.
> > > >
> > > >> On 24 Jul 2020, at 20:00, Paul Davis 
> > wrote:
> > > >>
> > > >> FWIW, a first pass at views entirely on ebtree turned out to be fairly
> > > >> straightforward. Almost surprisingly simple in some cases.
> > > >>
> > > >>
> > https://github.com/apache/couchdb/compare/prototype/fdb-layer...prototype/fdb-layer-ebtree-views
> > > >>
> > > >> Its currently passing all tests in `couch_views_map_test.erl` but is
> > > >> failing on other suites. I only did a quick skim on the failures but
> > > >> they all look superficial around some APIs I changed.
> > > >>
> > > >> I haven't added the APIs to query reduce functions via HTTP but the
> > > >> reduce functions are being executed to calculate row counts and KV
> > > >> sizes. Adding the builtin reduce functions and extending those to user
> > > >> defined reduce functions should be straightforward.
> > > >>
> > > >> On Fri, Jul 24, 2020 at 9:39 AM Robert Newson 
> > wrote:
> > > >>>
> > > >>>
> > > >>> I’m happy to restrict my PR comments to the actual diff, yes. So I’m
> > not +1 yet.
> > > >>>
> > > >>> I fixed the spurious conflicts at
> > https://github.com/apache/couchdb/pull/3033.
> > > >>>
> > > >>> --
> > > >>> Robert Samuel Newson
> > > >>> rnew...@apache.org
> > > >>>
> > > >>> On Fri, 24 Jul 2020, at 14:59, Garren Smith wrote:
> > > >>>> Ok so just to confirm, we keep my PR as-is with ebtree only for
> > reduce. We
> > > >>>> can get that ready to merge into fdb master. We can then use that
> > to battle
> > > >>>> test ebtree and then look at using it for the map side as well. At
> > that
> > > >>>> point we would combine the reduce and map index into a ebtree
> > index. Are
> > > >>>> you happy with that?
> > > >>>>
> > > >>>> Cheers
> > > >>>> Garren
> > > >>>>
> > > >>>> On Fri, Jul 24, 2020 at 3:48 PM Robert Newson 
> > wrote:
> > > >>>>
> > > >>>>> Hi,
> > > >>>>>
> > > >>>>> It’s not as unknown as you think but certainly we need empirical
> > data to
> > > >>>>> guide us on the reduce side. I’m also fine with continuing with the
> > > >>>>> map-only code as it stands today until such time as we demonstrate
> > ebtree
> > > >>>>> meets or exceeds our needs (and I freely accept the possibility
> > that it
> > > >>>>> might not).
> > > >>>>>
> > > >>>>> I think the principal enhancement to ebtree that would address most
> > > >>>>> concerns is if it could store the leaf entries vertically (as they
> > are
> > > >>>>> currently). I have some thoughts which I’ll try to realise as
> > working code.
> > > >>>>>
> > > >>>>> I’ve confirmed that I do create spurious conflicts and will have a
> > PR up
> > > >>>
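
A hedged sketch of the reduce/rereduce contract discussed above, here a
"system" reduction tracking row count and encoded KV size, written as a
standalone module rather than ebtree's actual callback API:

  -module(size_reduce_sketch).
  -export([reduce/2]).

  %% Leaf level: KVs are the {Key, Value} pairs emitted by the map function.
  reduce(KVs, false) ->
      Size = lists:sum([byte_size(term_to_binary(KV)) || KV <- KVs]),
      {length(KVs), Size};
  %% Inner level: combine the reductions already computed for child nodes.
  reduce(Reductions, true) ->
      {Counts, Sizes} = lists:unzip(Reductions),
      {lists:sum(Counts), lists:sum(Sizes)}.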

Re: [DISCUSS] Reduce on FDB take 3

2020-07-24 Thread Paul Davis
> > > > > ...On Fri, Jul 24, 2020 at 2:35 AM Kyle Snavely 
> > > > > wrote:
> > > > > >
> > > > > >> When in doubt, throw us a build at Cloudant with ebtree maps and
> > > we'll
> > > > > see
> > > > > >> if it comes close to the crazy fast KV map queries.
> > > > > >>
> > > > > >> Kyle
> > > > > >>
> > > > > >> On Thu, Jul 23, 2020, 2:17 PM Robert Samuel Newson <
> > > rnew...@apache.org>
> > > > > >> wrote:
> > > > > >>
> > > > > >>> I (perhaps obviously) don't agree that I'm tying myself to old
> > > CouchDB
> > > > > or
> > > > > >>> failing to embrace FDB. FDB is a toolkit and does not, to my mind,
> > > > > force
> > > > > >> us
> > > > > >>> down any particular path.
> > > > > >>>
> > > > > >>> I haven't sat down to modify couch_views in the manner I've
> > > suggested
> > > > > >>> (where ebtree is used as designed; being fed the output of the
> > > emit()
> > > > > >> calls
> > > > > >>> and calculating reductions as it does so) but I think it's a
> > > worthwhile
> > > > > >>> exercise. I'd be surprised if performance of map-only traversals
> > > would
> > > > > be
> > > > > >>> disappointing but who knows? I also expect it would allow for
> > > > > significant
> > > > > >>> simplification of the code, which is one of the higher virtues.
> > > > > >>>
> > > > > >>> Adam, can you describe in a little more detail how you picture
> > > "b+tree
> > > > > is
> > > > > >>> only used for incremental aggregations,"? It's implied in your
> > > reply
> > > > > that
> > > > > >>> it would preserve the "interesting property" of keeping user data
> > > out
> > > > > of
> > > > > >>> FDB Keys (for casual readers: the new native database encryption,
> > > > > called
> > > > > >>> "aegis", only encrypts the FDB value. It can't encrypt the key as
> > > this
> > > > > >>> would change the order of keys, which the current code depends 
> > > > > >>> on).
> > > > > Did I
> > > > > >>> misread you?
> > > > > >>>
> > > > > >>> B.
> > > > > >>>
> > > > > >>>> On 23 Jul 2020, at 20:11, Adam Kocoloski 
> > > wrote:
> > > > > >>>>
> > > > > >>>> OK thanks for the clarification. As I said I wasn’t all that
> > > confident
> > > > > >> I
> > > > > >>> understood the design :)
> > > > > >>>>
> > > > > >>>> I like the idea that the b+tree is only used for incremental
> > > > > >>> aggregations, rather than storing the entire materialized view,
> > > for the
> > > > > >>> same reasons that Garren stated.
> > > > > >>>>
> > > > > >>>> An index maintained entirely in ebtree has the interesting
> > > property
> > > > > >> that
> > > > > >>> it does not leak any user data into FDB Keys, which could be
> > > attractive
> > > > > >> for
> > > > > >>> security reasons.
> > > > > >>>>
> > > > > >>>> Adam
> > > > > >>>>
> > > > > >>>>> On Jul 23, 2020, at 1:54 PM, Garren Smith 
> > > wrote:
> > > > > >>>>>
> > > > > >>>>> On Thu, Jul 23, 2020 at 6:55 PM Paul Davis <
> > > > > >> paul.joseph.da...@gmail.com
> > > > > >>>>
> > > > > >>>>> wrote:
> > > > > >>>>>
> > > > > >>>>>>> I would like to keep ebtree to use just for the reduce index.
> > > > > >>>>>>
> > > > > >>>>>> Could you expand on your reasoning here, Garren? I haven't done
> > > any
> > > > > >>>>>> experiments on my own to understand if I'm missing something
> > > > > >>>>>> important. My initial reaction is to not diverge too far from
> > > the
> > > > > >>>>>> previous shape of the implementation since we have a decent
> > > idea of
> > > > > >>>>>> how that behaves already but perhaps you've seen or measured
> > > > > >> something
> > > > > >>>>>> I'm not thinking of?
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>> I think this must have been a misunderstanding on my part. I
> > > always
> > > > > >>> thought
> > > > > >>>>> of using ebtree to solve reduce and wasn't planning to use it
> > > for the
> > > > > >>> map
> > > > > >>>>> index.
> > > > > >>>>> I don't like the idea that we have ordered distributed key/value
> > > > > store
> > > > > >>> and
> > > > > >>>>> then implementing a b-tree on top of that for indexing.
> > > Especially
> > > > > >>> since we
> > > > > >>>>> know that the current map index is fast,
> > > > > >>>>> easy to use, and follows recommended practices in the FDB
> > > community
> > > > > on
> > > > > >>> how
> > > > > >>>>> to do indexing. ebtree makes sense for reduce and is a means to
> > > an
> > > > > end
> > > > > >>> to
> > > > > >>>>> give us CouchDB's reduce api, which is heavily reliant on a
> > > b-tree,
> > > > > >> with
> > > > > >>>>> CouchDB on FDB. This feels like a step backward and I worry we
> > > are
> > > > > >> tying
> > > > > >>>>> ourselves heavily to old CouchDB instead of using the fact that
> > > we
> > > > > >>> moving
> > > > > >>>>> to FDB which then allows us to design new api's and
> > > functionality.
> > > > > >>>>
> > > > > >>>
> > > > > >>>
> > > > > >>
> > > > >
> > > > >
> > > >
> > >
> >


Re: [DISCUSS] Reduce on FDB take 3

2020-07-23 Thread Paul Davis
> I would like to keep ebtree to use just for the reduce index.

Could you expand on your reasoning here, Garren? I haven't done any
experiments on my own to understand if I'm missing something
important. My initial reaction is to not diverge too far from the
previous shape of the implementation since we have a decent idea of
how that behaves already but perhaps you've seen or measured something
I'm not thinking of?


Re: [DISCUSS] Reduce on FDB take 3

2020-07-23 Thread Paul Davis
Mostly chiming in just to say that I agree with Bob here. I haven't
been paying the closest attention due to other things taking my time,
but when I've heard the discussion my assumption was that we'd be
using a single ebtree instance per map/reduce index, similar to how it
works in CouchDB classic. As evidenced by the multiple long
discussions about implementing CouchDB's reduce feature, there's really
not a good way to do it without a custom B+Tree implementation that
allows us to run calculations as part of the implementation.

Personally, I'd think the first thing to try would be to avoid being
clever here and just implement the straightforward single ebtree per
map/reduce view, and then measure its performance. Given that we're
moving towards the "single HTTP request is a single DB snapshot"
approach, I *think* the ebtree will perform well enough, given that
it has linked-list pointers between leaf nodes for map-only requests.
But we should also measure it to make sure.

In terms of value sizes and limits, there are plenty of approaches
between capping sizes more aggressively and chunking ebtree nodes
among multiple FDB KVs, so I'm not overly concerned at this stage.
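
A hedged sketch of Newson's option 3 below, splitting one encoded node across
multiple adjacent KVs the way large document bodies are chunked today (the
helper itself is illustrative; erlfdb_tuple:pack/2 and erlfdb:set/3 come from
the erlfdb binding):

  -module(node_chunks_sketch).
  -export([write_node/4]).

  -define(MAX_CHUNK, 64000).  % stay well under FDB's value size limit

  write_node(Tx, TreePrefix, NodeId, NodeBin) ->
      Chunks = chunk(NodeBin),
      lists:foldl(fun(Chunk, Idx) ->
          Key = erlfdb_tuple:pack({NodeId, Idx}, TreePrefix),
          erlfdb:set(Tx, Key, Chunk),
          Idx + 1
      end, 0, Chunks).

  chunk(<<Part:?MAX_CHUNK/binary, Rest/binary>>) -> [Part | chunk(Rest)];
  chunk(<<>>) -> [];
  chunk(Bin) -> [Bin].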

On Thu, Jul 23, 2020 at 4:18 AM Robert Samuel Newson  wrote:
>
> Hi Adam,
>
> You are right on those 3 bullet points, that is what ebtree is/does.
>
> You are not right on your second paragraph, at least as it pertains to ebtree 
> itself. ebtree stores the inserted keys, values, links between btree nodes 
> and intermediate reduction values in the fdb value (the fdb key is always 
> constructed as {IndexPrefix, NodeId} where NodeId is currently an 
> auto-incrementing integer). Some of those items are unused (stored as a 
> single-byte NIL) if there no reduce_fun is specified. In garren's work 
> (https://github.com/apache/couchdb/pull/3018) that integrates ebtree into the 
> couch_views application you are right, in that ebtree is only used for the 
> reduce storage. I've commented on this topic that PR as I think that's a 
> missed opportunity.
>
> Through the ebtree development I had considered whether the tree should be 
> stored separately from the user-defined (and therefore potentially large and 
> non-uniform in size) values or not. As you've noted the "order" in ebtree is 
> implemented correctly (on cardinality not size) and this is key to 
> predictable performance (as is the rebalancing when deleting, which 
> couch_btree has never attempted).
>
> Without the user-defined values, we could choose a high value for order (say, 
> 100 or 200, perhaps even higher) and have confidence that they will fit 
> within fdb value size limits. I deliberately went simple here and deferred 
> the debate till I had something worth discussing. And so I would love to 
> continue that discussion here; this thread seems the appropriate place.
>
> My thoughts so far are;
>
> 1) we could introduce new limits on what we allow users to reduce on, with a 
> view to capture 95%+ of the intended use case. e.g, restricting the emitted 
> values to scalars, which we know will reduce without growing excessively 
> (and, yes, I would include lists or objects of scalars). The only thing I 
> think we'd want to preclude is what reduce_limit also, and naively, tried to 
> prevent; an ever-growing reduction value.
>
> 2) As you've suggested independently, the emitted key/values, and the 
> intermediate reductions, are stored in their own k-v entries, leaving the 
> k-v's of the btree structure itself within predictable size bounds.
>
> 3) Splitting an encoded node over multiple, adjacent k-v's exactly like we do 
> with a large document body today.
>
> 4) A hybrid approach of the current ebtree code and 2 above, externalising 
> key/value/reductions if warranted on a per node basis. Well-behaved reduce 
> functions over appropriate emitted data would not need to do so, but we'd not 
> be surprised in production deployments if large values or large reductions 
> happened on occasion.
>
> I do think it's a good time to define what an appropriate use of reduce in 
> CouchDB is, whatever the mechanism for calculating and storing it. I don't 
> think we should support egregious cases like "function(ks, vs, rr) { return 
> vs}", for example.
>
> Finally, I note that I have a local branch called "immutable" which changes 
> node ids whenever the "members" attribute of an ebtree node changes. It 
> changes node ids to uuids and adds a cache callback. The intention is to 
> eliminate the vast majority of inner node lookups _outside_ of any fdb 
> transaction (it is always necessary to read the root node and we can't cache 
> leaf nodes as they are linked for efficient forward and reverse ordered 
> traversal). This code works as expected but I have been unable to prove its 
> benefit as ebtree performs very well under the test scenarios I've been able 
> to bring to bear so far. I will post that work as a branch to couchdb later 
> today.
>
> B.
>
>
> > On 23 Jul 

Re: [DISCUSS] couchdb 4.0 transactional semantics

2020-07-16 Thread Paul Davis
From what I'm reading it sounds like we have general consensus on a few things:

1. A single CouchDB API call should map to a single FDB transaction
2. We absolutely do not want to return a valid JSON response to any
streaming API that hit a transaction boundary (because data
loss/corruption)
3. We're willing to change the API requirements so that 2 is not an issue.
4. None of this applies to continuous changes since that API call was
never a single snapshot.

If everyone generally agrees with that summarization, my suggestion
would be that we just revisit the new pagination APIs and make them
the only behavior rather than having them be opt-in. I believe those
APIs already address all the concerns in this thread and the only
reason we kept the older versions with `restart_tx` was to maintain
API backwards compatibility at the expense of a slight change to
semantics of snapshots. However, if there's a consensus that the
semantics are more important than allowing a blanket `GET
/db/_all_docs` I think it'd make the most sense to just embrace the
pagination APIs that already exist and were written to cover these
issues.
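
As a rough illustration of point 1 from the summary above, a streaming handler under that rule would look something like the sketch below. The `fabric2_fdb:transactional/2` and `fabric2_db:fold_docs/4` names are assumptions about the shape of the fdb-branch API, not exact references.

```
-module(single_tx_sketch).
-export([handle_all_docs/4]).

%% Sketch only: fabric2_fdb:transactional/2 and fabric2_db:fold_docs/4
%% are assumptions about the fdb-branch API shape.
handle_all_docs(Db, UserFun, UserAcc, Opts) ->
    fabric2_fdb:transactional(Db, fun(TxDb) ->
        %% everything streamed for this request comes from this single
        %% snapshot; if the 5 second limit is hit the request fails
        %% rather than silently stitching two snapshots together
        fabric2_db:fold_docs(TxDb, UserFun, UserAcc, Opts)
    end).
```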

The only thing I'm not 100% on is how to deal with non-continuous
replications. I.e., the older single shot replication. Do we go back
with patches to older replicators to allow 4.0 compatibility? Just
declare that you have to mediate a replication on the newer of the two
CouchDB deployments? Sniff the replicator's UserAgent and behave
differently on 4.x for just that special case?

Paul

On Wed, Jul 15, 2020 at 7:25 PM Adam Kocoloski  wrote:
>
> Sorry, I also missed that you quoted this specific bit about eagerly 
> requesting a new snapshot. Currently the code will just react to the 
> transaction expiring, then wait till it acquires a new snapshot if 
> “restart_tx” is set (which can take a couple of milliseconds on a 
> FoundationDB cluster that is deployed across multiple AZs in a cloud Region) 
> and then proceed.
>
> Adam
>
> > On Jul 15, 2020, at 6:54 PM, Adam Kocoloski  wrote:
> >
> > Right now the code has an internal “restart_tx” flag that is used to 
> > automatically request a new snapshot if the original one expires and 
> > continue streaming the response. It can be used for all manner of multi-row 
> > responses, not just _changes.
> >
> > As this is a pretty big change to the isolation guarantees provided by the 
> > database Bob volunteered to elevate the issue to the mailing list for a 
> > deeper discussion.
> >
> > Cheers, Adam
> >
> >> On Jul 15, 2020, at 11:38 AM, Joan Touzet  wrote:
> >>
> >> I'm having trouble following the thread...
> >>
> >> On 14/07/2020 14:56, Adam Kocoloski wrote:
> >>> For cases where you’re not concerned about the snapshot isolation (e.g. 
> >>> streaming an entire _changes feed), there is a small performance benefit 
> >>> to requesting a new FDB transaction asynchronously before the old one 
> >>> actually times out and swapping over to it. That’s a pattern I’ve seen in 
> >>> other FDB layers but I’m not sure we’ve used it anywhere in CouchDB yet.
> >>
> >> How does _changes work right now in the proposed 4.0 code?
> >>
> >> -Joan
> >
>


Re: Is everything ok on our Jenkins cluster?

2020-06-18 Thread Paul Davis
Yep, just mentioning that restarting all Jenkins agents solved the issue.

On Thu, Jun 18, 2020 at 2:40 PM Alessio 'Blaster' Biancalana
 wrote:
>
> Hi,
> Looks like the issue is gone now. Sorry for bothering! Shall I close the
> Jira infra ticket?
>
> Alessio
>
> Il gio 18 giu 2020, 20:23 Joan Touzet  ha scritto:
>
> > On 18/06/2020 13:19, Alessio 'Blaster' Biancalana wrote:
> > > Hi Joan,
> > > Hi opened the infra bug, could you please check I've done everything
> > > correctly?
> > >
> > > https://issues.apache.org/jira/browse/INFRA-20441
> >
> > Looks fine.
> >
> > FYI Paul Davis just updated and restarted the agents on all of the
> > machines. Nick just got his PR through. Can you try restarting your
> > build now?
> >
> > -Joan
> >
> > >
> > > I also shared the link to the thread. Strange situation :/
> > >
> > > @Paul could you check again? Definitely the issue wasn't fixed all day
> > > long, and it looks like it's not a networking blurb.
> > >
> > > Alessio
> > >
> > > On Thu, Jun 18, 2020 at 12:48 AM Joan Touzet  wrote:
> > >
> > >> Can you try opening an Infra ticket on this?
> > >>
> > >> https://issues.apache.org/jira
> > >>
> > >> Open it against the Infra project and share with them the link(s). You
> > >> can also link to this mailing list discussion via
> > >> https://lists.apache.org/ .
> > >>
> > >> -Joan
> > >>
> > >> On 2020-06-17 5:16 p.m., Alessio 'Blaster' Biancalana wrote:
> > >>> Thanks for the response Paul!
> > >>> Same stuff, I tried rerunning the job but it looks stuck with that
> > error
> > >>>
> > >>>
> > >>
> > https://ci-couchdb.apache.org/blue/organizations/jenkins/jenkins-cm1%2FPullRequests/detail/PR-2893/6/pipeline
> > >>>
> > >>> I don't know why that happens, if you have any clue...
> > >>>
> > >>> Thanks,
> > >>> Alessio
> > >>>
> > >>> Il mer 17 giu 2020, 23:01 Paul Davis  ha
> > >>> scritto:
> > >>>
> > >>>> I looked at Jenkins and saw them all as connected and in sync. Is
> > >>>> there more to the report or was this some sort of networking blurb?
> > >>>>
> > >>>> On Wed, Jun 17, 2020 at 2:20 PM Joan Touzet 
> > wrote:
> > >>>>>
> > >>>>> IBM maintains these workers for us - will have to ask Paul Davis to
> > >> take
> > >>>>> a look.
> > >>>>>
> > >>>>> -Joan
> > >>>>>
> > >>>>> On 17/06/2020 05:36, Alessio 'Blaster' Biancalana wrote:
> > >>>>>> Hey folks,
> > >>>>>> I have a job on Jenkins that is repeatedly giving me this error:
> > >>>>>>
> > >>>>>> <
> > >>>>
> > >>
> > https://ci-couchdb.apache.org/blue/organizations/jenkins/jenkins-cm1%2FPullRequests/detail/PR-2893/4/pipeline/#step-117-log-1
> > >>>>> [2020-06-17T09:09:49.584Z]
> > >>>>>> Recording test results
> > >>>>>> <
> > >>>>
> > >>
> > https://ci-couchdb.apache.org/blue/organizations/jenkins/jenkins-cm1%2FPullRequests/detail/PR-2893/4/pipeline/#step-117-log-2
> > >>>>> [2020-06-17T09:09:49.718Z]
> > >>>>>> Remote call on JNLP4-connect connection from
> > >>>>>> 76.9a.30a9.ip4.static.sl-reverse.com/169.48.154.118:7778 failed
> > >>>>>> <
> > >>>>
> > >>
> > https://ci-couchdb.apache.org/blue/organizations/jenkins/jenkins-cm1%2FPullRequests/detail/PR-2893/4/pipeline/#step-117-log-3
> > >>>>> Remote
> > >>>>>> call on JNLP4-connect connection from
> > >>>>>> 76.9a.30a9.ip4.static.sl-reverse.com/169.48.154.118:7778 failed
> > >>>>>>
> > >>>>>> I was wondering, is everything ok on our Jenkins cluster? Maybe
> > >> there's
> > >>>>>> some maintenance I'm not aware of?
> > >>>>>>
> > >>>>>> Cheers,
> > >>>>>> Alessio
> > >>>>>>
> > >>>>
> > >>>
> > >>
> > >
> >


Re: Is everything ok on our Jenkins cluster?

2020-06-17 Thread Paul Davis
I looked at Jenkins and saw them all as connected and in sync. Is
there more to the report or was this some sort of networking blurb?

On Wed, Jun 17, 2020 at 2:20 PM Joan Touzet  wrote:
>
> IBM maintains these workers for us - will have to ask Paul Davis to take
> a look.
>
> -Joan
>
> On 17/06/2020 05:36, Alessio 'Blaster' Biancalana wrote:
> > Hey folks,
> > I have a job on Jenkins that is repeatedly giving me this error:
> >
> >   
> > <https://ci-couchdb.apache.org/blue/organizations/jenkins/jenkins-cm1%2FPullRequests/detail/PR-2893/4/pipeline/#step-117-log-1>[2020-06-17T09:09:49.584Z]
> > Recording test results
> >   
> > <https://ci-couchdb.apache.org/blue/organizations/jenkins/jenkins-cm1%2FPullRequests/detail/PR-2893/4/pipeline/#step-117-log-2>[2020-06-17T09:09:49.718Z]
> > Remote call on JNLP4-connect connection from
> > 76.9a.30a9.ip4.static.sl-reverse.com/169.48.154.118:7778 failed
> >   
> > <https://ci-couchdb.apache.org/blue/organizations/jenkins/jenkins-cm1%2FPullRequests/detail/PR-2893/4/pipeline/#step-117-log-3>Remote
> > call on JNLP4-connect connection from
> > 76.9a.30a9.ip4.static.sl-reverse.com/169.48.154.118:7778 failed
> >
> > I was wondering, is everything ok on our Jenkins cluster? Maybe there's
> > some maintenance I'm not aware of?
> >
> > Cheers,
> > Alessio
> >


Re: [DISCUSSION] Emit an instance ID value in DB info API response in CouchDB 4.0

2020-05-26 Thread Paul Davis
We already have the uuid generated. I'd suggest just adding a `uuid`
field that exposes it.
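
For what it's worth, a minimal sketch of what that could look like, assuming a fabric2_db-style accessor for the stored uuid (the function name here is an assumption, not a reference to existing code):

```
-module(db_info_sketch).
-export([db_info_extra/1]).

%% Sketch only: fabric2_db:get_uuid/1 is an assumed accessor for the
%% instance uuid that is already stored alongside the database.
db_info_extra(Db) ->
    [
        %% expose the existing per-instance identifier in GET /{db}
        {uuid, fabric2_db:get_uuid(Db)}
    ].
```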

On Tue, May 26, 2020 at 1:27 PM Nick Vatamaniuc  wrote:
>
> Hi everyone,
>
> I was wondering if we could expose an "instance_id" field in the top
> level `/` (db_info) result. The value would be a uuid which is
> unique for every database instance. That is, if a database is deleted
> and re-created with the same name, it would have a different
> "instance_id". [*]
>
> We'd get at least 2 benefits from it:
>
> 1) Replicator could eventually could be updated to checkpoint on the
> target only, and thus have a read-only access to the source. Currently
> we need to checkpoint on the source to account for the case when the
> source db has been recreated, so we maintain the checkpoint history on
> the source and the target.
>
> 2) User's code might want to know if it the database has been
> recreated, mostly to avoid mishaps when they continue performing
> requests against the db with the same, which now may have completely
> different data in it.
>
> What do we think, do we like this idea?
>
> Cheers,
> -Nick
>
> [*] Back in 1.x we had the "instance_start_time" field which does
> mostly same thing, but is based on time. In 2.x and 3.x we still emit
> that field and for compatibility and hard code it to "0". We could
> re-use that field, but I think since the idea is to make it a uuid and
> not a timestamp so it's name doesn't quite match and it would have a
> different format (64bits vs 128bits).


Re: Partitions list

2020-05-24 Thread Paul Davis
Currently no, and comp-sci-theoretically there’s nothing that’ll be any more
efficient internally than the obvious map/reduce view.

On Sun, May 24, 2020 at 12:48 PM ermouth  wrote:

> Hi devs,
>
> is there a way to get the list of DB partitions, except dedicated
> map/reduce? Trying to add partitioned queries UI into Photon and bit stuck
> how to implement it.
>
> ermouth
>


Re: Should we continue with FDB RFC's

2020-05-19 Thread Paul Davis
Can +1, but it's gonna feel really silly when I think about how the code
is already merged...

On Tue, May 19, 2020 at 12:28 PM Joan Touzet  wrote:
>
> Looks like the Mango one has the required +1 already.
>
> There's reviews of the map index one by Adam, Paul, and Mike (Rhodes)
> but neither have explicitly +1'ed. Can any of you get to this?
>
> I'd rather not be the deciding +1 right now, too much else on my plate
> to give this the attention it deserves for that - but I have skimmed it.
>
> -Joan
>
> On 2020-05-18 7:49, Garren Smith wrote:
> > Great thanks for the feedback. Its good to know that they are still
> > considered useful. I've updated my mango and map index RFC's to match the
> > current implementations.
> > I would like to merge them in.
> >
> > Cheers
> > Garren
> >
> >
> > On Thu, May 14, 2020 at 11:14 PM Joan Touzet  wrote:
> >
> >> The intent of the RFCs was to give people a place to look at what's
> >> being done, comment on the implementation decisions, and to form the
> >> basis for eventual documentation.
> >>
> >> I think they've been relatively successful on the first two pieces, but
> >> it sounds like they've fallen behind, especially because we have quite a
> >> few languishing PRs over in the couchdb-documentation repo.
> >>
> >> My hope had been that those PRs would land much faster - even if they
> >> were WIPs - and would get updated regularly with new PRs.
> >>
> >> Is that too onerous of a request?
> >>
> >> I agree with Adam that the level of detail doesn't have to be there in
> >> great detail when it comes to implementation decisions. It only really
> >> needs to be there in detail for API changes, so we have good source
> >> material for the eventual documentation side of things. Since 4.0 is
> >> meant to be largely API compatible with 3.0, I hope this is also in-line
> >> with expectations.
> >>
> >> -Joan "engineering, more than anything, means writing it down" Touzet
> >>
> >> On 2020-05-13 8:53 a.m., Adam Kocoloski wrote:
> >>> I do find them useful and would be glad to see us maintain some sort of
> >> “system architecture guide” as a living document. I understand that can be
> >> a challenge when things are evolving quickly, though I also think that if
> >> there’s a substantial change to the design from the RFC it could be worth a
> >> note to dev@ to call that out.
> >>>
> >>> I imagine we can omit some level of detail from these documents to still
> >> capture the main points of the data model and data flows without needing to
> >> update them e.g. every time a new field is added to a packed value.
> >>>
> >>> Cheers, Adam
> >>>
>  On May 13, 2020, at 5:29 AM, Garren Smith  wrote:
> 
>  Hi All,
> 
>  The majority of RFC's for CouchDB 4.x have gone stale and I want to know
>  what everyone thinks we should do about it? Do you find the RFC's
> >> useful?
> 
>  So far I've found maintaining the RFC's really difficult. Often we
> >> write an
>  RFC, then write the code. The code often ends up quite different from
> >> how
>  we thought it would when writing the RFC. Following that smaller code
>  changes and improvements to a section moves the codebase even further
> >> from
>  the RFC design. Do we keep updating the RFC for every change or should
> >> we
>  leave it at a certain point?
> 
>  I've found the discussion emails to be really useful way to explore the
>  high-level design of each new feature. I would probably prefer that we
>  continue the discussion emails but don't do the RFC unless its a feature
>  that a lot of people want to be involved in the design.
> 
>  Cheers
>  Garren
> >>>
> >>
> >


Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-28 Thread Paul Davis
Seems reasonable to me. I'd agree that setting query string parameters
with a bookmark should be rejected. I was also going to suggest
eliding the href member. In the examples I've seen those are usually
structured as something like:

"links": {
"previous": "/path/and/qs=foo",
"next": "/path/and/qs=bar"
}

On Tue, Apr 28, 2020 at 6:56 AM Ilya Khlopotov  wrote:
>
> Hello,
>
> I would like to introduce second proposal.
>
> 1) Add new optional query field called `bookmark` (or `token`) to following 
> endpoints
>   - {db}/_all_docs
>   - {db}/_all_docs/queries
>   - _dbs_info
>   - {db}/_design/{ddoc}/_view/{view}
>   - {db}/_design/{ddoc}/_view/{view}/queries
> 2) Add following additional fields into response:
>```
> "first": {
> "href": 
> "https://myserver.com/myddb/_all_docs?limit=50=true;
> },
> "previous": {
>  "href": "https://myserver.com/myddb/_all_docs?bookmark=983uiwfjkdsdf;
> },
> "next": {
> "href": "https://myserver.com/myddb/_all_docs?bookmark=12343tyekf3;
>  },
>  ```
> 3) Implement per-endpoint configurable max limits
>```
>[request_limits]
>   _all_docs = 5000
>   _all_docs/queries = 5000
>   _all_dbs = 5000
>   _dbs_info = 5000
>   _view = 2500
>   _view/queries = 2500
>   _find = 2500
>   ```
> 4) Implement following semantics:
>- The bookmark would be opaque token and would include information needed 
> to ensure proper pagination without the need to repeat initial parameters of 
> the request. In fact we might prohibit setting additional parameters when 
> bookmark query field is specified.
>- don't use delayed responses when `bookmark` field is provided
>- don't use delayed responses when `limit` query key is specified and when 
> it is below the max limit
>- return 400 when limit query key is specified and it is greater than the 
> max limit
>- return 400 when we stream rows (in case when `limit` query key wasn't 
> specified) and reach max limit
>- the `previous`/`next`/`first` keys are optional and we omit them for the 
> cases they don't make sense
>
> Latter on we would introduce API versioning and deal with `{db}/_changes` and 
> `_all_docs` endpoints.
>
> Questions:
> - `bookmark` vs `token`?
> - should we prohibit setting other fields when bookmark is set?
> - `previous`/`next`/`first` as href vs token value itself (i.e. `{"previous": 
> "983uiwfjkdsdf", "next": "12343tyekf3", "first": "iekjhfwo034"}`)
>
> Best regards,
> iilyak
>
> On 2020/04/22 20:18:57, Ilya Khlopotov  wrote:
> > Hello everyone,
> >
> > Based on the discussions on the thread I would like to propose a number of 
> > first steps:
> > 1) introduce new endpoints
> >   - {db}/_all_docs/page
> >   - {db}/_all_docs/queries/page
> >   - _all_dbs/page
> >   - _dbs_info/page
> >   - {db}/_design/{ddoc}/_view/{view}/page
> >   - {db}/_design/{ddoc}/_view/{view}/queries/page
> >   - {db}/_find/page
> >
> > These new endpoints would act as follows:
> > - don't use delayed responses
> > - return object with following structure
> >   ```
> >   {
> >  "total": Total,
> >  "bookmark": base64 encoded opaque value,
> >  "completed": true | false,
> >  "update_seq": when available,
> >  "page": current page number,
> >  "items": [
> >  ]
> >   }
> >   ```
> > - the bookmark would include following data (base64 or protobuff???):
> >   - direction
> >   - page
> >   - descending
> >   - endkey
> >   - endkey_docid
> >   - inclusive_end
> >   - startkey
> >   - startkey_docid
> >   - last_key
> >   - update_seq
> >   - timestamp
> >   ```
> >
> > 2) Implement per-endpoint configurable max limits
> > ```
> > _all_docs = 5000
> > _all_docs/queries = 5000
> > _all_dbs = 5000
> > _dbs_info = 5000
> > _view = 2500
> > _view/queries = 2500
> > _find = 2500
> > ```
> >
> > Latter (after few years) CouchDB would deprecate and remove old endpoints.
> >
> > Best regards,
> > iilyak
> >
> > On 2020/02/19 22:39:45, Nick Vatamaniuc  wrote:
> > > Hello everyone,
> > >
> > > I'd like to discuss the shape and behavior of streaming APIs for CouchDB 
> > > 4.x
> > >
> > > By "streaming APIs" I mean APIs which stream data in row as it gets
> > > read from the database. These are the endpoints I was thinking of:
> > >
> > >  _all_docs, _all_dbs, _dbs_info  and query results
> > >
> > > I want to focus on what happens when FoundationDB transactions
> > > time-out after 5 seconds. Currently, all those APIs except _changes[1]
> > > feeds, will crash or freeze. The reason is because the
> > > transaction_too_old error at the end of 5 seconds is retry-able by
> > > default, so the request handlers run again and end up shoving the
> > > whole request down the socket again, headers and all, which is
> > > obviously broken and not what we want.
> > >
> > > There are few alternatives discussed in couchdb-dev channel. I'll
> > > present some behaviors but feel free to add more. Some ideas might
> > > have been discounted on the IRC 

Re: API versioning

2020-04-27 Thread Paul Davis
I didn’t read that hard. If memory serves it’s supposed to be something like
application/json;couchdb._v2, which shouldn’t require registration.
Regardless, being wrong three ways instead of four is fine by me (HIBP
wrong, to be clear).

On Mon, Apr 27, 2020 at 5:44 PM Robert Samuel Newson 
wrote:

> As a general direction, I like it, but the specific example of
> accept/content doesn't sit right with me.
>
> application/couchdb needs registering, which is a faff, and isn't
> appropriate imo as we have many different formats of request and response,
> which a content-type ought to capture. From the post I linked, it's the
> ".v2+json" bit that matters. We could sink a lot of time into declaring
> content types for the all_docs response, the changes response, etc, or do
> something like like Accept: application/json; couch_version=2, or simply
> omit this option and just have the path option, the query parameter option
> and the custom header option.
>
> B.
>
> > On 27 Apr 2020, at 22:34, Paul Davis 
> wrote:
> >
> > Overall this looks quite good to me. The only thing I'd say is that we
> > should set our version much earlier so we can eventually rely on this
> > for selecting an entirely independent implementation. Though that's
> > not very pressing as once we have the concept embedded we can extend
> > it as needed.
> >
> > For this approach the only thing that concerns me is the way
> > versioning is applied to individual URL handlers. I'd rather see
> > something where we can say "replace these things with newer versions,
> > fall back to v1 for the defaults". Though I couldn't figure out a very
> > clean way to do that. The only thing I came up with was to have a
> > chttpd_handlers_v2.erl service that's called and then
> > chttpd_httpd_handlers_v2.erl that instead of defaulting to `no_match`
> > would just forward to `chttpd_httpd_handlers:url_handler(Req)` or w/e
> > it would be. But to be honest, I'm not super fond of that approach.
> >
> > On Mon, Apr 27, 2020 at 2:41 PM Ilya Khlopotov 
> wrote:
> >>
> >> I've implemented a PoC for versioned API
> https://github.com/apache/couchdb/pull/2832. The code is very ugly but it
> demonstrates how it could work.
> >>
> >> Best regards,
> >> iilyak
> >>
> >> On 2020/04/27 14:55:10, Ilya Khlopotov  wrote:
> >>> Hello,
> >>>
> >>> The topic of API versioning was brought in the [Streaming API in
> CouchDB 4.0](
> https://lists.apache.org/thread.html/ra8d16937cca332207d772844d2789f932fbc4572443a354391663b9c%40%3Cdev.couchdb.apache.org%3E)
> thread. The tread proposes to add new API endpoints to introduce a response
> structure change. The alternative approach could be to implement proper
> support for different API versions.
> >>>
> >>> It would benefit CouchDB project if we would have support for API
> versioning. Adding new endpoint is easy but it is very hard to deprecate or
> change the old ones. With proper API versioning we can avoid the need to
> rewrite all client applications at the same time.
> >>>
> >>> rnewson mentioned a good blog post about API versioning (
> https://www.troyhunt.com/your-api-versioning-is-wrong-which-is/). The
> main idea of the blog post is. There is no perfect solution it would be the
> best to support all options so the user can choose which one to use.
> >>>
> >>> In that spirit I propose to implement four different ways of
> specifying the API version (per endpoint):
> >>>
> >>> - Path based -  `/_v{version_number}/{db}/_all_docs`
> >>> - Query parameter based  - `/{db}/_all_docs?_v={version_number}`
> >>> - Accept / Content-Type headers in the form of `application/couchdb;
> _v={version_number},application/json`
> >>> - Custom header - X-Couch-API: v2
> >>>
> >>> The server would include response version in two places:
> >>> - Custom header - `X-Couch-API: v2`
> >>> - `Content-type: application/couchdb;
> _v={version_number},application/json`
> >>>
> >>> Implementation wise it would go as follows:
> >>> 1) we teach chttpd how to extract version (we set version to `1` if it
> is not specified)
> >>> 2) we change arity of chttpd_handlers:url_handler/2 to pass API version
> >>> 3) we would update functions in chttpd_httpd_handlers.erl to match on
> API version
> >>>  ```
> >>>  url_handler(<<"_all_dbs">>, 1)-> fun
> chttpd_misc:handle_all_dbs_req/1;
> >>>  url_handler(<&l

Re: API versioning

2020-04-27 Thread Paul Davis
Overall this looks quite good to me. The only thing I'd say is that we
should set our version much earlier so we can eventually rely on this
for selecting an entirely independent implementation. Though that's
not very pressing as once we have the concept embedded we can extend
it as needed.

For this approach the only thing that concerns me is the way
versioning is applied to individual URL handlers. I'd rather see
something where we can say "replace these things with newer versions,
fall back to v1 for the defaults". Though I couldn't figure out a very
clean way to do that. The only thing I came up with was to have a
chttpd_handlers_v2.erl service that's called and then
chttpd_httpd_handlers_v2.erl that instead of defaulting to `no_match`
would just forward to `chttpd_httpd_handlers:url_handler(Req)` or w/e
it would be. But to be honest, I'm not super fond of that approach.
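
To make the forwarding idea a bit more concrete, the sketch below shows the shape I had in mind; the module name and the clauses are illustrative only, mirroring the names mentioned above rather than existing code.

```
%% Sketch of the "override a few handlers, fall back to v1" idea; the
%% module name and clauses are illustrative only.
-module(chttpd_httpd_handlers_v2).
-export([url_handler/1]).

url_handler(<<"_all_dbs">>) ->
    fun chttpd_misc_v2:handle_all_dbs_req/1;
url_handler(PathSegment) ->
    %% anything not explicitly overridden for v2 forwards to the
    %% existing v1 handler table instead of returning no_match
    chttpd_httpd_handlers:url_handler(PathSegment).
```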

On Mon, Apr 27, 2020 at 2:41 PM Ilya Khlopotov  wrote:
>
> I've implemented a PoC for versioned API 
> https://github.com/apache/couchdb/pull/2832. The code is very ugly but it 
> demonstrates how it could work.
>
> Best regards,
> iilyak
>
> On 2020/04/27 14:55:10, Ilya Khlopotov  wrote:
> > Hello,
> >
> > The topic of API versioning was brought in the [Streaming API in CouchDB 
> > 4.0](https://lists.apache.org/thread.html/ra8d16937cca332207d772844d2789f932fbc4572443a354391663b9c%40%3Cdev.couchdb.apache.org%3E)
> >  thread. The tread proposes to add new API endpoints to introduce a 
> > response structure change. The alternative approach could be to implement 
> > proper support for different API versions.
> >
> > It would benefit CouchDB project if we would have support for API 
> > versioning. Adding new endpoint is easy but it is very hard to deprecate or 
> > change the old ones. With proper API versioning we can avoid the need to 
> > rewrite all client applications at the same time.
> >
> > rnewson mentioned a good blog post about API versioning 
> > (https://www.troyhunt.com/your-api-versioning-is-wrong-which-is/). The main 
> > idea of the blog post is. There is no perfect solution it would be the best 
> > to support all options so the user can choose which one to use.
> >
> > In that spirit I propose to implement four different ways of specifying the 
> > API version (per endpoint):
> >
> > - Path based -  `/_v{version_number}/{db}/_all_docs`
> > - Query parameter based  - `/{db}/_all_docs?_v={version_number}`
> > - Accept / Content-Type headers in the form of `application/couchdb; 
> > _v={version_number},application/json`
> > - Custom header - X-Couch-API: v2
> >
> > The server would include response version in two places:
> > - Custom header - `X-Couch-API: v2`
> > - `Content-type: application/couchdb; _v={version_number},application/json`
> >
> > Implementation wise it would go as follows:
> > 1) we teach chttpd how to extract version (we set version to `1` if it is 
> > not specified)
> > 2) we change arity of chttpd_handlers:url_handler/2 to pass API version
> > 3) we would update functions in chttpd_httpd_handlers.erl to match on API 
> > version
> >   ```
> >   url_handler(<<"_all_dbs">>, 1)-> fun 
> > chttpd_misc:handle_all_dbs_req/1;
> >   url_handler(<<"_all_dbs">>, 2)-> fun 
> > chttpd_misc_v2:handle_all_dbs_req/1;
> >   ...
> >   db_handler(<<"_design">>, 1)   -> fun chttpd_db:handle_design_req/2;
> >   db_handler(<<"_design">>, 2)   -> fun 
> > chttpd_db_v2:handle_design_req/2;
> >   ...
> >   design_handler(<<"_view">>, 1)-> fun chttpd_view:handle_view_req/3;
> >   design_handler(<<"_view">>, 2)-> fun chttpd_view_v2:handle_view_req/3;
> >   ```
> > 4) Modify chttpd:send_response to set response version (pass additional 
> > argument)
> >
> > I don't expect the implementation to exceed 20 lines of code (not counting 
> > changes in arity of functions in chttpd_httpd_handlers).
> >
> > Best regards,
> > iilyak
> >


Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-23 Thread Paul Davis
I'd agree that my initial reaction to "cursor" was that it's not a great
fit, but it does seem to be the established name in the greater
REST world for this sort of pagination, so I'm not concerned about
using that terminology.

I'm generally on board with allowing and setting some default sane
limits on pages. We probably should have done that quite a while ago
after moving to native clustering and now that we have FDB limits I
think it makes even more sense to have an API that does not lend
itself to crazy errors when people are just trying to poke at an API.

I think we're all on board that one of the goals is to make sure that
clients don't accidentally misinterpret a response. That is, we're
trying to be quite diligent that a user doesn't get 1000 rows and not
realize there's another 10 that were beyond the limit. The bookmark
approach with hard caps seems like a generally fine approach to me.
The current approach uses extra URL path segments to try and avoid
this confusion. I wonder if we should consider starting to properly
version our API using one of the many schemes that are used. Having
read through a few articles I don't have a very clear favorite though.

As to this particular proposal I do see a couple issues:

`total` - We can do this in most cases fairly easily. Though it's a
bit odd for continuous changes.

`complete` - I'm not sure whether this is entirely possible given the
API that FDB presents us. Specifically, when we set a range and we get
back exactly $num_rows in the response, if the data set ended at
exactly that page I don't think the `more` flag from fdb would tell us
that. So we'd have a clunky UX there where we say not complete but the
next page is empty. That's also not to mention that, depending on
whether we're looking at snapshots and so on, there's no way for
us to know between stateless requests whether there were more rows
added to the end.

`page` - This one is just hard/impossible to calculate. FDB doesn't
provide us with offsets or even an efficient "about how many rows in
this range?" type queries so providing that would be both inaccurate
and fairly difficult/expensive to calculate. In some cases I think we
could have something maybe close that didn't suck too badly, but it'd
also fall down for changes due to the way that updates reorder
them.

`update_seq` - I'm just not sure on when this would be useful or what
it would refer to. Maybe a version stamp of the last change for that
request? If we had a future API that asked for a snapshot access then
maybe? But if we did do something there with versionstamps or read
versions I'd expect that to come with the rest of the API.

For the bookmark fields:

`direction` vs `descending` seems like a field duplication to me.

`page` - This would seem to suggest we could skip to a certain
location in the results numerically which we are not able to do with
the FDB API.

`last_key` vs `start_key` seems like a field duplication. We don't
need to know where things started I don't think. Just where to start
from and where to end.

`update_seq` - is same as earlier. Not entirely sure on the intent there.

`timestamp` - Expiring bookmarks based on time does not seem like a
good idea. Both for clock skew and why bother when this would
functionally just be a convenience API that users could already
implement for themselves.

Another thing might also be to provide our bookmark as a full link,
which seems to be fairly standard REST practice these days. Something
that clients don't have to do any logic with so that we're free to
change the implementation.
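
A minimal sketch of the opaque-bookmark part, assuming we simply serialize the continuation state server-side; the module and function names are placeholders, and couch_util's base64url helpers are used on the assumption that they're available:

```
-module(bookmark_sketch).
-export([encode_bookmark/1, decode_bookmark/1, next_link/2]).

%% Sketch only: the bookmark is an opaque, server-defined blob so the
%% encoding can change later without breaking clients.
encode_bookmark(State) when is_map(State) ->
    couch_util:encodeBase64Url(term_to_binary(State)).

decode_bookmark(Bookmark) when is_binary(Bookmark) ->
    binary_to_term(couch_util:decodeBase64Url(Bookmark), [safe]).

next_link(Path, State) when is_binary(Path) ->
    %% e.g. {"links": {"next": "/db/_all_docs?bookmark=..."}}
    <<Path/binary, "?bookmark=", (encode_bookmark(State))/binary>>.
```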

And lastly, I don't think we should be neglecting the _changes API as
part of this discussion. I realize that we'll need to support the
older streaming semantics if we want to maintain replication
compatibility (which I think we'll all agree is a Good Thing) but it
also feels a bit wrong to ignore it as part of this work if we're
going to be modernizing our APIs. Though if we do pick up a good
versioning scheme then we could theoretically make those changes
easily enough. Plus, who doesn't want to rewrite chttpd to be a whole
lot less... chttpd-y?


On Thu, Apr 23, 2020 at 1:43 PM Robert Samuel Newson  wrote:
>
>
> I think it's a key difference from "cursor" as I've seen them elsewhere, that 
> ours will point at an ever changing database, you couldn't seamlessly cursor 
> through a large data set, one "page" at a time.
>
> Bookmarks began in search (raises guilty hand) in order to address a 
> Lucene-specific issue (that high values of "skip" are incredibly inefficient, 
> using lots of RAM). That is not true for CouchDB's own indexes, which can be 
> navigated perfectly with startkey/endkey/startkey_docid/endkey_docid, etc.
>
> I guess I'm not helping much with these observations but I wouldn't like to 
> see CouchDB gain an additional and ugly method of doing something already 
> possible.
>
> B.
>
> > On 23 Apr 2020, at 19:02, Joan Touzet  wrote:
> >
> > I realise this 

Re: Cloudbees Operations Center and Client Masters upgrade this weekend.

2020-04-05 Thread Paul Davis
Howdy Gavin,

I've updated everything besides the two FreeBSD nodes that Joan
manages. Is there a trick to having Jenkins realize a node has
upgraded its slave.jar? Took me an extra twenty minutes fumbling
around before realizing that slave.jar is now actually agent.jar and
that even though Jenkins realized the jar had actually been updated. I
ended up realizing that I had to go through each node in the web UI to
re-enable each worker.

Joan, for the FreeBSD ones I'm not sure how you set up the run
scripts. For me it was just a `sudo sv restart jenkins` on each node
because I download the agent.jar as part of `/etc/sv/jenkins/run`.
Then it was just random Googling till I happened to notice that that was
enough to upgrade the jar, and that I then needed to click the button.

Paul

On Sun, Apr 5, 2020 at 11:32 AM Gavin McDonald  wrote:
>
> Hi All,
>
> This is now completed.
>
> Overall time spent was around 3 hours on the Operations Center and 5
> masters.
>
> When the OC was being upgraded, there was no downtime for the masters.
>
> Each master had downtime of around 20 minutes to 30 minutes, including
> multiple restarts
> , backups and plugin upgrades.
>
> CouchDB folks - most of your nodes are connected via 'node connects to the
> master' meaning that you'll
> have to update each slave.jar manually. The preferred method is ssh i/o
> non-blocking, all nodes connected
> in this fashion all came back on their own without intervention.
>
> Thanks all
>
> Gavin McDonald (ASF Infra)
>
>
> On Sat, Apr 4, 2020 at 3:20 PM Gavin McDonald  wrote:
>
> > Hi All,
> >
> > I'll be upgrading the Cloudbees Core Operations Center and all attached
> > Masters on Sunday 5th April at around 10am UTC until completed.
> >
> > I'll turn off new builds around 7am UTC.
> >
> > I'm 'expecting' a couple of hours downtime assuming all goes well,
> > including upgrading all
> > associated plugins, performing backups etc. But, this is the first time
> > I'm upgrading this setup so there may be the odd gotcha that surprises me
> > who knows! - I'm as prepared as possible and I am not expecting any
> > problems.
> >
> > Thanks
> >
> > Gavin McDonald (ASF Infra)
> >
> >


Re: Getting FDB work onto master

2020-03-31 Thread Paul Davis
There are a few other bits to `make check` that aren't included in
`make check-fdb`. Updating `make check` should just be a matter of
taking our test subset and applying it to `make check`.

On Tue, Mar 31, 2020 at 11:04 AM Garren Smith  wrote:
>
> On the fdb branch we have a make check-fdb which is a subset of all the
> tests that should pass. I think we should use that instead of make check
>
> On Tue, Mar 31, 2020 at 5:34 PM Joan Touzet  wrote:
>
> > Took a bit longer than expected, but it's been tested & validated, and
> > then re-pushed. I'll push up the other platforms as well (except for
> > CentOS 8, which still has a broken Python dependency).
> >
> > PR for the changes necessary was also merged to couchdb-ci.
> >
> > FDB should feel free to merge to master once `make check` is working.
> >
> > We can hammer out the wide matrix issues on a slower timeframe.
> >
> > -Joan
> >
> > On 2020-03-30 19:42, Joan Touzet wrote:
> > > OOPS, looks like I pushed the wrong image.
> > >
> > > I'll build the kerl-based version of the image and re-push. This will
> > > take a couple of hours, since I have to build 3x Erlangs from source.
> > >
> > > Good to know step 1 works! Next step for y'all: fix `make check`.
> > >
> > > -Joan
> > >
> > > On 2020-03-30 6:05 p.m., Robert Samuel Newson wrote:
> > >> noting that
> > >>
> > https://ci-couchdb.apache.org/blue/organizations/jenkins/jenkins-cm1%2FPullRequests/detail/PR-2732/7/pipeline/
> > >> now fails because kerl isn't there. related?
> > >>
> > >> B.
> > >>
> > >>> On 30 Mar 2020, at 22:11, Robert Samuel Newson 
> > >>> wrote:
> > >>>
> > >>> Nice, make check-fdb passes for me on that branch (the 2nd time, the
> > >>> 1st time was a spurious failure, timing related).
> > >>>
> > >>> B.
> > >>>
> >  On 30 Mar 2020, at 22:04, Robert Samuel Newson 
> >  wrote:
> > 
> >  Hi,
> > 
> >  Great timing, I merged something to prototype/fdb-layer without the
> >  check passing, I'm trying this now.
> > 
> >  First note, the kerl line doesn't work but it seems there's a system
> >  wide erlang 20 install instead.
> > 
> >  B.
> > 
> > > On 30 Mar 2020, at 19:50, Joan Touzet  wrote:
> > >
> > > Hi everyone, hope you're all staying at home[1].
> > >
> > > I've just pushed out a new version of our
> > > couchdbdev/debian-buster-erlang-all Docker image. This now includes
> > > the fdb binaries, as well as client libraries and headers. This is
> > > a necessary (but not sufficient) step to getting the fdb prototype
> > > merged to master.
> > >
> > > Can someone please test if this works correctly for them to build
> > > and test CouchDB (with fdb)?
> > >
> > > Here's instructions:
> > >
> > > docker pull couchdbdev/debian-buster-erlang-all
> > > docker run -it couchdbdev/debian-buster-erlang-all
> > > # then, inside the image:
> > > cd
> > > git clone https://github.com/apache/couchdb
> > > cd couchdb && git checkout 
> > > . /usr/local/kerl/20.3.8.25/activate
> > > # you still need to fix make check, but Paul says this should work:
> > > make check-fdb
> > >
> > > The next step would be to fix `make check`. Then, you can merge the
> > > fdb branch to master.
> > >
> > > CI on master will be broken after fdb merge until we get answers to
> > > these questions: [2].
> > >
> > > **REMEMBER**: Any 3.x fixes should land on the 3.x branch at this
> > > point. If they're backend specific, there's no need for them to
> > > land on master anymore.
> > >
> > > **QUESTION**: Now that we have a new feature (JWT), it's likely the
> > > next CouchDB release would be 3.1.0 - so, probably no need to land
> > > more fixes on 3.0.x at this point. Does everyone agree?
> > >
> > > -Joan "I miss restaurants" Touzet
> > >
> > > [1]: https://www.youtube.com/watch?v=rORMGH0jE2I
> > > [2]:
> > https://forums.foundationdb.org/t/package-download-questions/2037
> > 
> > >>>
> > >>
> >


Re: [DISCUSS] Mango indexes on FDB

2020-03-27 Thread Paul Davis
Thanks! For some reason your step 4 was elided in the GMail UI but not
when Garren responded, which is why I was confused.

On Fri, Mar 27, 2020 at 9:11 AM Glynn Bird  wrote:
>
> > The quoting here is weird. Are you saying to skip _all_docs in your
> proposal, Glynn?
>
> I'm saying eliminate (3) from your list of things.
>
> 1. If user specifies an index, use it even if we have to wait
> 2. If an index is built that can be used, use it
> 3. n/a
> 4. As a last resort use _all_docs
>
>
> On Thu, 26 Mar 2020 at 16:59, Paul Davis 
> wrote:
>
> > On Thu, Mar 26, 2020 at 5:33 AM Will Holley  wrote:
> > >
> > > Ah - in that case I think we should remove step 3, as it leads to a
> > > confusing mental model. It's much simpler to explain that Mango will only
> > > use fresh indexes and any new indexes will build in the background.
> > >
> >
> > Simpler in some respect. The trade off being that we then have to
> > teach users how to know that an index is built and also that they then
> > need to be aware that different index types will have different ideas
> > of what "built" means.
> >
> > > On Thu, 26 Mar 2020 at 10:15, Garren Smith  wrote:
> > >
> > > > On Thu, Mar 26, 2020 at 11:04 AM Will Holley 
> > wrote:
> > > >
> > > > > Broadly, I think it's a big step forward if we can prevent Mango from
> > > > > automatically selecting extremely stale indexes.
> > > > >
> > > > > I've been going back and forth on whether step 3 could lead to some
> > > > > difficult-to-predict behaviour. If we assume that requests have a
> > short
> > > > > timeout - e.g. we can't return any result if it doesn't complete
> > within
> > > > the
> > > > > FDB transaction timeout - then I think it's fine: queries that use
> > > > > _all_docs and a large database will be timing out anyway.
> > > > >
> > > > > If we were to allow long-running queries then it seems a bit
> > sketchier
> > > > > because adding an index to a large database could cause queries that
> > > > > previously completed to start timing out whilst they block on the
> > index
> > > > > build. This is basically how Mango in CouchDB 2/3 behaves and has
> > been a
> > > > > big pain point for customers I've worked with, to the point where you
> > > > > basically need to explicitly specify which index Mango uses in all
> > cases
> > > > if
> > > > > you're to avoid surprise timeouts when somebody adds a new index.
> > > > >
> > > > > As I understand it, we're not allowing queries to span FDB
> > transactions
> > > > so
> > > > > this latter case is not something to worry about?
> > > >
> > > >
> > > > We are going to allow queries to span transactions. This is already
> > > > implemented for views and will be for mango
> > > >
> > > >
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Will
> > > > >
> > > > > On Wed, 25 Mar 2020 at 19:43, Garren Smith 
> > wrote:
> > > > >
> > > > > > On Wed, Mar 25, 2020 at 8:35 PM Paul Davis <
> > > > paul.joseph.da...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > > It was therefore felt that having an immediate "Not ready"
> > signal
> > > > for
> > > > > > > just _some_ calls to _find, based on the type of backing index,
> > was a
> > > > > bad
> > > > > > > and confusing api.
> > > > > > > >
> > > > > > > > We also discussed _find calls where the user does not specify
> > an
> > > > > index,
> > > > > > > and concluded that we would be free to choose between using the
> > > > > _all_docs
> > > > > > > index (which is always up to date but rarely the best index for a
> > > > given
> > > > > > > selector) or blocking to update a better but stale index.
> > > > > > > >
> > > > > > > > Summary-ing my summarisation;
> > > > > > > >
> > > > > > > > 1) if you specify an index, we'll use it even if we have to
> > update
> > > > > it,
> > > > > > >

Re: [DISCUSS] Mango indexes on FDB

2020-03-26 Thread Paul Davis
On Thu, Mar 26, 2020 at 5:33 AM Will Holley  wrote:
>
> Ah - in that case I think we should remove step 3, as it leads to a
> confusing mental model. It's much simpler to explain that Mango will only
> use fresh indexes and any new indexes will build in the background.
>

Simpler in some respect. The trade off being that we then have to
teach users how to know that an index is built and also that they then
need to be aware that different index types will have different ideas
of what "built" means.

> On Thu, 26 Mar 2020 at 10:15, Garren Smith  wrote:
>
> > On Thu, Mar 26, 2020 at 11:04 AM Will Holley  wrote:
> >
> > > Broadly, I think it's a big step forward if we can prevent Mango from
> > > automatically selecting extremely stale indexes.
> > >
> > > I've been going back and forth on whether step 3 could lead to some
> > > difficult-to-predict behaviour. If we assume that requests have a short
> > > timeout - e.g. we can't return any result if it doesn't complete within
> > the
> > > FDB transaction timeout - then I think it's fine: queries that use
> > > _all_docs and a large database will be timing out anyway.
> > >
> > > If we were to allow long-running queries then it seems a bit sketchier
> > > because adding an index to a large database could cause queries that
> > > previously completed to start timing out whilst they block on the index
> > > build. This is basically how Mango in CouchDB 2/3 behaves and has been a
> > > big pain point for customers I've worked with, to the point where you
> > > basically need to explicitly specify which index Mango uses in all cases
> > if
> > > you're to avoid surprise timeouts when somebody adds a new index.
> > >
> > > As I understand it, we're not allowing queries to span FDB transactions
> > so
> > > this latter case is not something to worry about?
> >
> >
> > We are going to allow queries to span transactions. This is already
> > implemented for views and will be for mango
> >
> >
> > >
> > > Cheers,
> > >
> > > Will
> > >
> > > On Wed, 25 Mar 2020 at 19:43, Garren Smith  wrote:
> > >
> > > > On Wed, Mar 25, 2020 at 8:35 PM Paul Davis <
> > paul.joseph.da...@gmail.com>
> > > > wrote:
> > > >
> > > > > > It was therefore felt that having an immediate "Not ready" signal
> > for
> > > > > just _some_ calls to _find, based on the type of backing index, was a
> > > bad
> > > > > and confusing api.
> > > > > >
> > > > > > We also discussed _find calls where the user does not specify an
> > > index,
> > > > > and concluded that we would be free to choose between using the
> > > _all_docs
> > > > > index (which is always up to date but rarely the best index for a
> > given
> > > > > selector) or blocking to update a better but stale index.
> > > > > >
> > > > > > Summary-ing my summarisation;
> > > > > >
> > > > > > 1) if you specify an index, we'll use it even if we have to update
> > > it,
> > > > > no matter how long that takes.
> > > > > > 2) if you don't specify an index, it's the dealers choice. The
> > > details
> > > > > here may change in point releases.
> > > > > >
> > > > >
> > > > > So it seems there's still a bit of confusion on what the consensus is
> > > > > here. The way that I had thought this would work is that we'd do
> > > > > something like such:
> > > > >
> > > > > 1. If user specifies and index, use it even if we have to wait
> > > > > 2. If an index is built that can be used, use it
> > > > > 3. If an index is building that can be used, wait for it
> > > > > 4. As a last resort use _all_docs
> > > > >
> > > > > Discussing with Garren on the PR he's of the opinion that we should
> > > > > skip step 3 and just go directly to using _all_docs if nothing is
> > > > > built.
> > > > >
> > > >
> > > > I just want to clarify step 3. I'm ok with using an index that still
> > > needs
> > > > to be built as long as there is no other built index
> > > > that can service the request.
> > > >
> > > > So the big thing for me is to always prefer a built index over a
&g

Re: [DISCUSS] Mango indexes on FDB

2020-03-26 Thread Paul Davis
The quoting here is weird. Are you saying to skip _all_docs in your
proposal, Glynn?

On Thu, Mar 26, 2020 at 5:46 AM Glynn Bird  wrote:
>
> +1 on removing step 3 - my reservation on falling back on all_docs is that
> users have no insight into how expensive a query is, other than measuring
> latencies (which might depend on other factors).  I would hope that folks
> would use option 1 anyway.
>
> So Paul's list becomes:
>
> 1. If user specifies an index, use it even if we have to wait
> 2. If an index is built that can be used, use it
> 3. n/a
> 4. As a last resort use _all_docs
>
>
> On Thu, 26 Mar 2020 at 10:33, Will Holley  wrote:
>
> > Ah - in that case I think we should remove step 3, as it leads to a
> > confusing mental model. It's much simpler to explain that Mango will only
> > use fresh indexes and any new indexes will build in the background.
> >
> > On Thu, 26 Mar 2020 at 10:15, Garren Smith  wrote:
> >
> > > On Thu, Mar 26, 2020 at 11:04 AM Will Holley 
> > wrote:
> > >
> > > > Broadly, I think it's a big step forward if we can prevent Mango from
> > > > automatically selecting extremely stale indexes.
> > > >
> > > > I've been going back and forth on whether step 3 could lead to some
> > > > difficult-to-predict behaviour. If we assume that requests have a short
> > > > timeout - e.g. we can't return any result if it doesn't complete within
> > > the
> > > > FDB transaction timeout - then I think it's fine: queries that use
> > > > _all_docs and a large database will be timing out anyway.
> > > >
> > > > If we were to allow long-running queries then it seems a bit sketchier
> > > > because adding an index to a large database could cause queries that
> > > > previously completed to start timing out whilst they block on the index
> > > > build. This is basically how Mango in CouchDB 2/3 behaves and has been
> > a
> > > > big pain point for customers I've worked with, to the point where you
> > > > basically need to explicitly specify which index Mango uses in all
> > cases
> > > if
> > > > you're to avoid surprise timeouts when somebody adds a new index.
> > > >
> > > > As I understand it, we're not allowing queries to span FDB transactions
> > > so
> > > > this latter case is not something to worry about?
> > >
> > >
> > > We are going to allow queries to span transactions. This is already
> > > implemented for views and will be for mango
> > >
> > >
> > > >
> > > > Cheers,
> > > >
> > > > Will
> > > >
> > > > On Wed, 25 Mar 2020 at 19:43, Garren Smith  wrote:
> > > >
> > > > > On Wed, Mar 25, 2020 at 8:35 PM Paul Davis <
> > > paul.joseph.da...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > > It was therefore felt that having an immediate "Not ready" signal
> > > for
> > > > > > just _some_ calls to _find, based on the type of backing index,
> > was a
> > > > bad
> > > > > > and confusing api.
> > > > > > >
> > > > > > > We also discussed _find calls where the user does not specify an
> > > > index,
> > > > > > and concluded that we would be free to choose between using the
> > > > _all_docs
> > > > > > index (which is always up to date but rarely the best index for a
> > > given
> > > > > > selector) or blocking to update a better but stale index.
> > > > > > >
> > > > > > > Summary-ing my summarisation;
> > > > > > >
> > > > > > > 1) if you specify an index, we'll use it even if we have to
> > update
> > > > it,
> > > > > > no matter how long that takes.
> > > > > > > 2) if you don't specify an index, it's the dealers choice. The
> > > > details
> > > > > > here may change in point releases.
> > > > > > >
> > > > > >
> > > > > > So it seems there's still a bit of confusion on what the consensus
> > is
> > > > > > here. The way that I had thought this would work is that we'd do
> > > > > > something like such:
> > > > > >
> > > > > > 1. If user specifies and index, use it even if we have to 

Re: [DISCUSS] Mango indexes on FDB

2020-03-25 Thread Paul Davis
> It was therefore felt that having an immediate "Not ready" signal for just 
> _some_ calls to _find, based on the type of backing index, was a bad and 
> confusing api.
>
> We also discussed _find calls where the user does not specify an index, and 
> concluded that we would be free to choose between using the _all_docs index 
> (which is always up to date but rarely the best index for a given selector) 
> or blocking to update a better but stale index.
>
> Summary-ing my summarisation;
>
> 1) if you specify an index, we'll use it even if we have to update it, no 
> matter how long that takes.
> 2) if you don't specify an index, it's the dealers choice. The details here 
> may change in point releases.
>

So it seems there's still a bit of confusion on what the consensus is
here. The way that I had thought this would work is that we'd do
something like such:

1. If the user specifies an index, use it even if we have to wait
2. If an index is built that can be used, use it
3. If an index is building that can be used, wait for it
4. As a last resort use _all_docs
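
In code, that fallback order is roughly the following; the function names are placeholders to illustrate the flow, not the real mango internals:

```
-module(mango_select_sketch).
-export([choose_index/3]).

%% Placeholder names throughout; this only illustrates the proposed
%% selection order, it is not the real mango_idx code.
choose_index(Db, Selector, undefined) ->
    case find_built_usable(Db, Selector) of
        {ok, Idx} ->
            Idx;                                      % step 2
        not_found ->
            case find_building_usable(Db, Selector) of
                {ok, Idx} -> wait_for_build(Db, Idx); % step 3
                not_found -> all_docs_index(Db)       % step 4
            end
    end;
choose_index(Db, _Selector, UserSpecifiedIdx) ->
    wait_for_build(Db, UserSpecifiedIdx).             % step 1

%% Stubs so the sketch compiles; the real checks would consult the
%% index metadata and build status.
find_built_usable(_Db, _Selector) -> not_found.
find_building_usable(_Db, _Selector) -> not_found.
wait_for_build(_Db, Idx) -> Idx.
all_docs_index(_Db) -> all_docs.
```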

Discussing with Garren on the PR he's of the opinion that we should
skip step 3 and just go directly to using _all_docs if nothing is
built.

My main assumption is that most cases where a user is creating an
index and then wanting to run a query with it are in the
design/exploration phase of learning the feature or designing an index
to use. In that scenario if we skip waiting it seems likely that a
user could easily be led to believe that an index creation "worked"
for their selector when in reality it was just backed by _all_docs.

The other reason for preferring to wait for an index to finish
building is that the UI for the normal case of creating indexes is a
bit awkward. Having to run a polling loop around checking the index
status seems suboptimal in most cases.

Am I missing other cases that would benefit from not waiting and just
using _all_docs?

Paul


Re: [DISCUSS] soft-deletion

2020-03-18 Thread Paul Davis
Alex,

The first con I see for that approach is that it's not soft-deletion.
It's actual deletion with an API for restoration. Which, fair enough,
is probably a feature we should consider supporting for CouchDB
installations that are based on FoundationDB.

The second major con is that it relies on CouchDB being based on
FoundationDB. Part of CouchDB's design philosophy is that the internet
may or may not exist, and if it does exist that it may or may not be
reliable. There are lots of deployments of CouchDB that are part of a
desktop application or POS installation that may see internet only
periodically if at all so an S3 backup solution is out. There also may
come a time that there's a flavor of CouchDB that uses LevelDB or
SQLite or FDBLite (I just made that up, any idea how hard it'd be?)
for these sorts of embedded deployments where fdbrestore/fdbbackup
wouldn't be feasible.

Then the last major con I see is the time-to-restore disparity. With
soft-deletion restoration is a few milliseconds. Streaming from S3
will obviously depend on the size of the database and obviously be
orders of magnitude longer.

On the pro side for soft-delete on FoundationDB, the first
draft of the RFC is 108 lines [1]. We obviously can't say for sure how
big or involved the fdbrestore approach would be but I think we'd all
agree it'd be bigger.

Paul

[1] https://github.com/apache/couchdb/pull/2666


On Wed, Mar 18, 2020 at 2:31 PM Alex Miller
 wrote:
>
> Let me perhaps paint an alternative RFC:
>
> 1) `DELETE /{db}`
>
> If soft-deletion is enabled, delete the database subspace, and also record 
> into ?DELETED_DBS the timestamp of the commit and the database subspace prefix
>
> 2) `GET /{db}/_deleted_dbs_info`
>
> Return the timestamp (and whatever other info one should record) of deleted 
> databases.
>
> 3) `PUT /{db}/_restore/{deletedTS}`
>
> Invoke `fdbrestore -k` to do a key range restricted restore into the current 
> cluster of the deleted subspace prefix at versionstamp-1.  Wait for it to 
> complete, and return 200 when completed.
>
> And this would all rely on having a continuous backup configured and running 
> that would hold a minimum of 48 hours of changes.
>
>
> Now, I don’t actually deal with backups often so my memory on current caveats 
> is a bit fuzzy.  I think there might be a couple complications here that I’ve 
> missed, like…
> * There not being key range restricted locking of the database
> * A key range restore is currently suboptimal in that it doesn’t do obvious 
> filtering that it could to cut down on the amount of data it reads
>
> But, neither of these seem heavily blocking, as they could be tackled 
> quickly, particularly if you leverage some upstream relationships ;).  Backup 
> and restore has been the general answer to accidental data deletion (or 
> corruption) on FDB, and I could paint some attractive looking pros of this 
> approach: backup files are more disk space efficient, soft deleted data could 
> be offloaded to an S3-compatible store, it would be free if FDB is already 
> configured to take backups.  I was just curious to hear a bit more detail on 
> your/Peng’s side of the reasons for preferring to build soft deletion on top 
> of FDB (and thus have also intentionally withheld more of the cons of this 
> approach, or the pros of yours).
>
> > On Mar 18, 2020, at 11:59, Paul Davis  wrote:
> >
> > Alex,
> >
> > All joking aside, soft-deletion's target use case is accidental
> > deletions. This isn't a replacement for backup/restore which will
> > still happen for all the usual reasons.
> >
> > Paul
> >
> > On Wed, Mar 18, 2020 at 1:42 PM Paul Davis  
> > wrote:
> >>
> >> On Wed, Mar 18, 2020 at 1:29 PM Alex Miller
> >>  wrote:
> >>>
> >>>
> >>>> On Mar 18, 2020, at 05:04, jiangph  wrote:
> >>>>
> >>>> Instead of automatically and immediately removing data and index in 
> >>>> database after a delete operation, soft-deletion allows to restore the 
> >>>> deleted data back to original state due to a “fat finger”or undesired 
> >>>> delete operation, up to defined periods, such as 48 hours.
> >>>>
> >>>> In CouchDB 3.0, soft-deletion of database is implemented in [1]. The 
> >>>> .couch file is renamed with the ..deleted.couch file after 
> >>>> soft-deletion is enabled, and such file can be changed back to .couch 
> >>>> for the purpose of restore. If restore is not needed and some specified 
> >>>> period passed, the ..deleted.couch file can be deleted to 
> >>>> achieve deletion of database permanently.
> >

Re: [DISCUSS] soft-deletion

2020-03-18 Thread Paul Davis
Alex,

All joking aside, soft-deletion's target use case is accidental
deletions. This isn't a replacement for backup/restore which will
still happen for all the usual reasons.

Paul

On Wed, Mar 18, 2020 at 1:42 PM Paul Davis  wrote:
>
> On Wed, Mar 18, 2020 at 1:29 PM Alex Miller
>  wrote:
> >
> >
> > > On Mar 18, 2020, at 05:04, jiangph  wrote:
> > >
> > > Instead of automatically and immediately removing data and index in 
> > > database after a delete operation, soft-deletion allows to restore the 
> > > deleted data back to original state due to a “fat finger”or undesired 
> > > delete operation, up to defined periods, such as 48 hours.
> > >
> > > In CouchDB 3.0, soft-deletion of database is implemented in [1]. The 
> > > .couch file is renamed with the ..deleted.couch file after 
> > > soft-deletion is enabled, and such file can be changed back to .couch for 
> > > the purpose of restore. If restore is not needed and some specified 
> > > period passed, the ..deleted.couch file can be deleted to 
> > > achieve deletion of database permanently.
> > >
> > > In CouchDB 4.0, with the introduction of FoundationDB, the data model and 
> > > storage is changed. In order to support soft-deletion, we propose below 
> > > solution and then implement them.
> >
> >
> >
> > I’ve sort of hand waved some answers to this in my head, but would you mind 
> > expanding a bit on the advantages of keeping soft-deleted data in 
> > FoundationDB as opposed to actually deleting it and relying on 
> > FoundationDB’s backup and restore to recover it if needed?
>
> From: Panicked User
> To: Customer Support
> Subject: URGENT! EMERGENCY DATABASE RESTORE!
>
> Dear,
>
> I have accidentally deleted my Very Important Database and need to
> have it restored ASAP! Without this mission critical database my
> company is completely offline which is costing $1B an hour!
>
> Please respond ASAP!
>
> Sincerely,
> Panicky McPanics


Re: [DISCUSS] soft-deletion

2020-03-18 Thread Paul Davis
On Wed, Mar 18, 2020 at 1:29 PM Alex Miller
 wrote:
>
>
> > On Mar 18, 2020, at 05:04, jiangph  wrote:
> >
> > Instead of automatically and immediately removing data and index in 
> > database after a delete operation, soft-deletion allows to restore the 
> > deleted data back to original state due to a “fat finger”or undesired 
> > delete operation, up to defined periods, such as 48 hours.
> >
> > In CouchDB 3.0, soft-deletion of database is implemented in [1]. The .couch 
> > file is renamed with the ..deleted.couch file after 
> > soft-deletion is enabled, and such file can be changed back to .couch for 
> > the purpose of restore. If restore is not needed and some specified period 
> > passed, the ..deleted.couch file can be deleted to achieve 
> > deletion of database permanently.
> >
> > In CouchDB 4.0, with the introduction of FoundationDB, the data model and 
> > storage is changed. In order to support soft-deletion, we propose below 
> > solution and then implement them.
>
>
>
> I’ve sort of hand waved some answers to this in my head, but would you mind 
> expanding a bit on the advantages of keeping soft-deleted data in 
> FoundationDB as opposed to actually deleting it and relying on FoundationDB’s 
> backup and restore to recover it if needed?

From: Panicked User
To: Customer Support
Subject: URGENT! EMERGENCY DATABASE RESTORE!

Dear,

I have accidentally deleted my Very Important Database and need to
have it restored ASAP! Without this mission critical database my
company is completely offline which is costing $1B an hour!

Please respond ASAP!

Sincerely,
Panicky McPanics


Re: [DISCUSS] moving email lists to Discourse

2020-03-12 Thread Paul Davis
Ah, fair point!

On Thu, Mar 12, 2020 at 10:25 AM Jan Lehnardt  wrote:
>
>
>
> > On 12. Mar 2020, at 16:21, Paul Davis  wrote:
> >
> > I'm not against anything of that nature, but if memory serves the
> > email lists are dictated by ASF policy.
>
> If you remember when we did the GitHub transition, as long as we can make
> sure messages end up on a mailing list, we should be fine wrt the 
> requirements.
>
> Best
> Jan
> —
> >
> > On Thu, Mar 12, 2020 at 9:32 AM Garren Smith  wrote:
> >>
> >> Hi All,
> >>
> >> The CouchDB slack channel has been a real success with lots of people
> >> asking for help and getting involved. The main issue is that it is not
> >> searchable so we often get people asking the same questions over and over.
> >> The user mailing list is great in that sense that if you have subscribed to
> >> it you have a searchable list of questions and answers. However, it's
> >> really not user-friendly and judging by the fact that it has very low user
> >> participation I'm guessing most people prefer to use slack to ask 
> >> questions.
> >>
> >> I've been really impressed with how the FoundationDB forum[1] and the rust
> >> internal forum work [2]. I find them easy to use and really encourage
> >> participation. I would like to propose that we move our user and dev
> >> discussion to Discourse or a forum that works as well as Discourse. I think
> >> that would make it really easy for users of CouchDB to look up answers to
> >> questions and get involved in the development discussion.
> >>
> >> I haven't checked yet, but I'm sure we could get all discourse threads to
> >> automatically email back to the user and dev mailing list so that we still
> >> fulfill our Apache requirements.
> >>
> >> I know its a big step away from what we're used to with our mailing lists,
> >> but I think it would definitely open up our community.
> >>
> >> Cheers
> >> Garren
> >>
> >>
> >> [1] https://forums.foundationdb.org/
> >> [2] https://internals.rust-lang.org/
>


Re: [DISCUSS] moving email lists to Discourse

2020-03-12 Thread Paul Davis
I'm not against anything of that nature, but if memory serves the
email lists are dictated by ASF policy.

On Thu, Mar 12, 2020 at 9:32 AM Garren Smith  wrote:
>
> Hi All,
>
> The CouchDB slack channel has been a real success with lots of people
> asking for help and getting involved. The main issue is that it is not
> searchable so we often get people asking the same questions over and over.
> The user mailing list is great in that sense that if you have subscribed to
> it you have a searchable list of questions and answers. However, it's
> really not user-friendly and judging by the fact that it has very low user
> participation I'm guessing most people prefer to use slack to ask questions.
>
> I've been really impressed with how the FoundationDB forum[1] and the rust
> internal forum work [2]. I find them easy to use and really encourage
> participation. I would like to propose that we move our user and dev
> discussion to Discourse or a forum that works as well as Discourse. I think
> that would make it really easy for users of CouchDB to look up answers to
> questions and get involved in the development discussion.
>
> I haven't checked yet, but I'm sure we could get all discourse threads to
> automatically email back to the user and dev mailing list so that we still
> fulfill our Apache requirements.
>
> I know its a big step away from what we're used to with our mailing lists,
> but I think it would definitely open up our community.
>
> Cheers
> Garren
>
>
> [1] https://forums.foundationdb.org/
> [2] https://internals.rust-lang.org/


Re: CI for prototype/fdb-layer branch

2020-02-19 Thread Paul Davis
On Wed, Feb 19, 2020 at 4:51 AM Ilya Khlopotov  wrote:
>
> Sounds harder than I hoped. :-(
>

I was a bit out of it yesterday so dunno if my email went through
properly. What I was trying to say is that the crazy docker bits should not
be an issue. dev/run and eunit already have the smarts to start fdb if
the fdbserver binary is available. We should be able to just add
multi-stage docker builds that create the fdbserver binary and copy
into our existing builds without much change to the existing stuff.

foundationdb does take a while to build though, so finding binaries
might short circuit everything to be even a single apt-get line or
w/e.

Though that papers over CentOS support and the like. Dunno what that
story is like.

> On 2020/02/18 20:06:48, Joan Touzet  wrote:
> > What Adam said.
> >
> > Reminder that to get CI builds working someone's going to need to do the
> > legwork to add an FDB instance to the container-based build. I'm happy
> > to grant docker hub credentials to couchdbdev for whomever takes this
> > work on.
> >
> > -Joan
> >
> > On 2020-02-18 14:15, Adam Kocoloski wrote:
> > > There’s nothing blocking us from using master for FDB development now 
> > > that we’ve got a 3.x branch — wouldn’t it be better to just make that 
> > > switch?
> > >
> > > Adam
> > >
> > >> On Feb 18, 2020, at 2:13 PM, Ilya Khlopotov  wrote:
> > >>
> > >> Currently we only trigger CI on attempts to merge to master branch. With 
> > >> ongoing effort to rebase CouchDB on top of FoundationDB it seems like we 
> > >> would be running two projects in parallel for quite some time. The lack 
> > >> of CI on merge to prototype/fdb-layer causes merges of broken code.
> > >>
> > >> I am curious how hard it would be to enable CI for prototype/fdb-layer 
> > >> branch in addition to master branch.
> > >>
> > >> Best regards,
> > >> iilyak
> > >
> >


Re: CI for prototype/fdb-layer branch

2020-02-18 Thread Paul Davis
If memory serves, fdbserver is statically linked by default which should
save some work.

On Tue, Feb 18, 2020 at 8:13 PM Paul Davis 
wrote:

> We probably don’t need fancy  Docker here. Eunit and dev/run already setup
> fdbserver automatically if the binary is found. A two stage build that
> copies fdbserver into the existing image is probably all that needs to
> happen.
>
> I’m limited to typing one handed for the forseeable future so I won’t get
> to it myself. But i’m more than happy to chat on email or irc if someone
> has questions.
>
> Here’s build notes for macOS:
>
> https://gist.github.com/davisp/aa6f526b8fd0441f2035c93774828798
>
>
> On Tue, Feb 18, 2020 at 5:08 PM Joan Touzet  wrote:
>
>> Thanks Alex!
>>
>> The Jenkins setup currently runs each CI job in a container per host,
>> and Docker-in-Docker can be a bit challenging. Hopefully the alternative
>> "expose your parent Docker socket inside of Docker with -v" approach, as
>> described at
>> https://jpetazzo.github.io/2015/09/03/do-not-use-docker-in-docker-for-ci/
>> will be sufficient. We'd also need the docker cli inside the CI image,
>> which shouldn't be too hard to add.
>>
>> For builds on master itself (post-merge, to check a wider set of
>> platforms), we currently have FreeBSD builds where there is no Docker.
>> We'll need to sort out a solution for that as well. I can preinstall FDB
>> on those hosts but the Jenkinsfile will have to take care of starting
>> and stopping FDB as pre/post actions.
>>
>> -Joan
>>
>> On 2020-02-18 17:22, Alex Miller wrote:
>> > And for whoever does pick it up, there's some examples of
>> docker-ify-ing FDB and running it via docker-compose in
>> apple/foundationdb's packaging/docker/<
>> https://github.com/apple/foundationdb/tree/master/packaging/docker>  ,
>> and the Dockerfile therein is built and pushed as foundationdb/foundationdb<
>> https://hub.docker.com/r/foundationdb/foundationdb/tags>  on docker hub.
>>
>


Re: CI for prototype/fdb-layer branch

2020-02-18 Thread Paul Davis
We probably don’t need fancy Docker here. Eunit and dev/run already set up
fdbserver automatically if the binary is found. A two-stage build that
copies fdbserver into the existing image is probably all that needs to
happen.

I’m limited to typing one-handed for the foreseeable future so I won’t get
to it myself. But I’m more than happy to chat on email or IRC if someone
has questions.

Here’s build notes for macOS:

https://gist.github.com/davisp/aa6f526b8fd0441f2035c93774828798


On Tue, Feb 18, 2020 at 5:08 PM Joan Touzet  wrote:

> Thanks Alex!
>
> The Jenkins setup currently runs each CI job in a container per host,
> and Docker-in-Docker can be a bit challenging. Hopefully the alternative
> "expose your parent Docker socket inside of Docker with -v" approach, as
> described at
> https://jpetazzo.github.io/2015/09/03/do-not-use-docker-in-docker-for-ci/
> will be sufficient. We'd also need the docker cli inside the CI image,
> which shouldn't be too hard to add.
>
> For builds on master itself (post-merge, to check a wider set of
> platforms), we currently have FreeBSD builds where there is no Docker.
> We'll need to sort out a solution for that as well. I can preinstall FDB
> on those hosts but the Jenkinsfile will have to take care of starting
> and stopping FDB as pre/post actions.
>
> -Joan
>
> On 2020-02-18 17:22, Alex Miller wrote:
> > And for whoever does pick it up, there's some examples of docker-ify-ing
> FDB and running it via docker-compose in apple/foundationdb's
> packaging/docker/<
> https://github.com/apple/foundationdb/tree/master/packaging/docker>  ,
> and the Dockerfile therein is built and pushed as foundationdb/foundationdb<
> https://hub.docker.com/r/foundationdb/foundationdb/tags>  on docker hub.
>


Re: [ANNOUNCE] Juan José Rodriguez elected as CouchDB committer

2020-02-07 Thread Paul Davis
Welcome to the party!

On Fri, Feb 7, 2020 at 1:41 PM Jay Doane  wrote:
>
> Congrats Juanjo! Thanks for all your hard work, and welcome aboard!
>
> On Fri, Feb 7, 2020 at 11:29 AM Joan Touzet  wrote:
>
> > Dear community,
> >
> > I am pleased to announce that the CouchDB Project Management Committee
> > has elected Juan José Rodriguez as a CouchDB committer.
> >
> >  Apache ID: juanjo
> >
> >  Github ID: jjrodrig
> >
> > Committers are given a binding vote in certain project decisions, as
> > well as write access to public project infrastructure.
> >
> > This election was made in recognition of Juanjo's commitment to the
> > project. We mean this in the sense of being loyal to the project and its
> > interests.
> >
> > Please join me in extending a warm welcome to Juanjo!
> >
> > On behalf of the CouchDB PMC,
> > Joan Touzet
> >


Re: FDB: Map index key/value limits

2020-01-16 Thread Paul Davis
For option A you also want to consider docs with multiple emitted K/Vs, and
whether we index some of them or none. I'd assume none, as that would match
the existing behavior when a doc throws an exception during indexing.
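
To illustrate the all-or-nothing behavior, here's a hypothetical sketch.
The size check via erlang:external_size/1 is just a stand-in; the real
implementation would measure the encoded FDB key and value sizes:

-module(kv_limits).
-export([filter_doc_rows/3]).

%% Rows are the [{Key, Value}] pairs emitted by a map function for one doc.
%% If any row exceeds a limit, index none of the doc's rows (and log it),
%% mirroring how a doc whose map function throws is skipped entirely.
filter_doc_rows(Rows, MaxKeySize, MaxValSize) ->
    TooBig = fun({K, V}) ->
        erlang:external_size(K) > MaxKeySize
            orelse erlang:external_size(V) > MaxValSize
    end,
    case lists:any(TooBig, Rows) of
        true -> skip_doc;
        false -> {ok, Rows}
    end.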

On Thu, Jan 16, 2020 at 8:45 AM Garren Smith  wrote:
>
> Hi Everyone,
>
> We want to impose limits on the size of keys and values for map indexes.
> See the RFC for full details -
> https://github.com/apache/couchdb-documentation/pull/410
>
> The question I have is what is the best user experience if the user does
> exceed the key or value limit?
>
> Option A - Do not index the key/value and log the error
>
> Option B - Throw an error and don't build the index
>
> Option C - Any other ideas?
>
> Cheers
> Garren


Re: [DISCUSS] Minor replicator error reporting API change

2019-12-20 Thread Paul Davis
+1

On Fri, Dec 20, 2019 at 12:33 PM Nick Vatamaniuc  wrote:
>
> Hi everyone,
>
> Before 3.0 goes out, I wanted to propose a minor replicator
> _scheduler/* API change.
>
> Currently when a replication job is crashing it reports the error as a
> string in the "info" field. So that that "info" field can be null, a
> string, or an object with various replication stats.
>
> The proposal is to turn the string into an object as well with an
> "error' field. So instead of
>
> {
>   ...
>"info": "some error message"
> }
>
> It will look like
>
> {
>...
>"info": {"error": "some error message"}
> }
>
> A few reasons for this change that it's a bit more consistent, and it
> should help with the some clients in static type language which have
> harder time handling an object being either a string, or an object, as
> opposed to a nullable fields and be different objects.
>
> What does everything think?
>
> Cheers,
> -Nick


Re: Request: Committers, delete your old branches on apache/couchdb!

2019-12-18 Thread Paul Davis
I noticed that there are a lot of branches pointing at commits that
have been merged to master. I'll hack up a quick script today that
will go through all branches and delete anything that's been merged.
I'll write out the specific branch/sha combinations and report them
here in case anyone really needs a branch I end up deleting.
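
For the curious, something along these lines (a dry-run sketch only -- it
shells out to git and prints the branch/sha pairs it would delete, without
actually deleting anything):

#!/usr/bin/env escript
%% Dry run: list remote branches whose commits are already merged into
%% origin/master, along with the sha each branch points at.
main(_) ->
    Out = os:cmd("git branch -r --merged origin/master"),
    Lines = [string:trim(L) || L <- string:split(Out, "\n", all)],
    Merged = [L || L <- Lines, L =/= "", string:find(L, "origin/master") =:= nomatch],
    lists:foreach(fun(Branch) ->
        Sha = string:trim(os:cmd("git rev-parse " ++ Branch)),
        io:format("would delete ~s (~s)~n", [Branch, Sha])
    end, Merged),
    ok.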

On Tue, Dec 17, 2019 at 6:35 PM Joan Touzet  wrote:
>
> Bump. Please do this.
>
> On 2019-12-13 12:15 p.m., Joan Touzet wrote:
> > Hi again committers,
> >
> > Our proto-Jenkins setup is currently using my API token to scan GitHub
> > for branches. GitHub has notoriously low values for how many API calls
> > you can make per hour.
> >
> > Unfortunately, because multibranch pipeline jobs scan *all* branches in
> > a repo for changes, this means that the total number of jobs we can run
> > in an hour is pretty small.
> >
> > I'm asking everyone to please go and delete any obsolete or unused
> > branches on the apache/couchdb repo that you don't need. I'm too nervous
> > to mass delete other people's branches, but if I see anything that got
> > merged and is more than a few years old, it'll probably get wiped.
> >
> > Thanks,
> > Joan "needs sleep" Touzet
> >


Re: [PROPOSAL] Drop Erlang 19 support in CouchDB 3.0

2019-12-13 Thread Paul Davis
I was thinking something along these lines:

https://github.com/apache/couchdb/pull/2358

The two big changes are: for 19, saying "We don't support this, but we
don't have any reason to believe it won't work."; and for the blacklisted
20.x+ versions, changing the wording to "Absolutely do not use this
version." I think that should make things a bit clearer to users.

I haven't tested this or updated docs to match. If everyone thinks
this is worth the extra effort I'll run it against 17-22 and update
the documented dependency versions and so on.
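
Roughly, the classification I have in mind looks like this (illustrative
only -- the version boundaries below are made up for the example, and the
real blacklist is per patch release rather than per major):

-module(otp_support).
-export([classify/1]).

%% Classify an OTP major release as allow | warn | block.
classify(Major) when Major < 19 -> block;    % known not to work
classify(19)                    -> warn;     % unsupported/untested, but no known breakage
classify(Major) when Major >= 20 -> allow.   % supported and tested in CI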

On Thu, Dec 12, 2019 at 3:27 PM Joan Touzet  wrote:
>
> Sure...do you mean just in the release notes, or some tangible change to
> rebar.config.script / configure?
>
> -Joan
>
> On 2019-12-12 4:06 p.m., Paul Davis wrote:
> > +1
> >
> > The only thought that comes to mind is that it might be useful to
> > differ in some of our error messages between versions. AFAIK, there's
> > nothing from 19.x that currently would prevent someone from using it.
> > We just don't have the resources to cover all of the tests and
> > packaging for it. Which is different than some than the black-listed
> > 20.x versions which have known bugs that break things.
> >
> > So, basically having something like Allow/Warn/Block classifications
> > rather than an opaque "This version is not supported" error message?
> >
> > On Thu, Dec 12, 2019 at 2:34 PM Dave Cottlehuber  wrote:
> >>
> >> On Thu, 12 Dec 2019, at 01:35, Joan Touzet wrote:
> >>> Hello everyone,
> >>>
> >>> I'm working this week with Paul Davis on our new Jenkins CI
> >>> infrastructure, which is coming along nicely. One of the changes I'm
> >>> planning to make is that our PR tests will run against only 3 versions
> >>> of Erlang:
> >>>
> >>> 1. The oldest we support (right now, 19.3.6.latest)
> >>> 2. The version we currently ship with our binary distros & Docker
> >>> (right now, 20.3.8.latest)
> >>> 3. The very latest version we support (right now, 22.2)
> >>>
> >>> In preparing the containers for CI testing, it's turning out to be very
> >>> difficult to build Erlang 19.* anymore on modern Linuxes. This is
> >>> because they ship with OpenSSL 1.1+, and 19.* cannot build against
> >>> anything newer than OpenSSL 1.0.
> >>>
> >>> I can jump through a huge number of hoops for this...or we can just drop
> >>> Erlang 19 support for CouchDB 3.0 and require Erlang 20. (Note we
> >>> blacklist a number of versions of Erlang 20.) I would then replace
> >>> 19.3.6.latest with 20.3.8.11 [1].
> >>
> >> +1
> >>
> >> FWIW RabbitMQ has done the same (21 & 22 only soon), and in FreeBSD
> >> we drop <19 from January, and I expect to be more aggressive on that
> >> in future, in line with what the OTP team have stated.
> >>
> >> I would consider going with 21+ only and be done with this.
> >>
> >> A+
> >> Dave


Re: Script for displaying test errors and timing information

2019-12-13 Thread Paul Davis
Cool beans. I'll take a crack at removing some of the ick around
displaying along with adding some command line arguments for
controlling output.

For long term storage I think it'd be better to store the raw XML
files rather than the formatted output which should be an easy
addition.

On Thu, Dec 12, 2019 at 3:28 PM Joan Touzet  wrote:
>
> +1. I think this is useful information to collect on every CI run and
> store alongside the run, though keeping in mind we only keep the last
> 7-10 runs in Jenkins nowadays.
>
> If we need it longer, we should stash the output with the log stasher.
> This could allow graphing if someone wants to build a neato D3 thing
> that hits the couchdb-vm2 instance for stats.
>
> On 2019-12-12 3:51 p.m., Paul Davis wrote:
> > Hey all,
> >
> > I was poking around at Jenkins the other day trying to get a good idea
> > of how much time we're spending in various parts of the build. It
> > occurred to me that one good way to at least investigate our eunit
> > test suite is to parse all of the generated surefire reports. I spent
> > an hour or so yesterday throwing a python script together to see what
> > that might look like.
> >
> > I've posted an example output and the script itself here:
> >
> > https://gist.github.com/davisp/7064c1ef0dc94a99c739729b97fef10e
> >
> > So far it shows some aggregate timing information along with the top
> > few results for total suite time, total setup/teardown time, and
> > longest individual test times. Not included in the output, but any
> > test failures or skipped tests are also included in the output which
> > has the nice benefit of showing exactly which test or tests failed
> > during a run.
> >
> > So far a few interesting things jump out. First, we spend more time in
> > setup/teardown fixtures than we do running actual tests which I found
> > to be rather surprising. Not so surprising is that our property tests
> > are taking the longest for each run.
> >
> > I thought it might be interesting to include a version of this script
> > (with perhaps a somewhat more succinct output) at the end of every
> > eunit run (or perhaps failed eunit run) to list out the errors for
> > easier debugging of failed CI runs. And then beyond that, having it in
> > the source tree would allow devs to poke around and trying to speed up
> > some of the slower tests.
> >
> > Another thought was to take a look at moving the property based tests
> > to a nightly build and then actually increasing their run times to
> > cover more of the state space for each of those tests. That way we'd
> > be doing a bit more of the normal property based testing where its
> > more about long soaks to find edge cases rather than acceptance
> > testing for a particular change.
> >
> > So far I've just thrown this together. If there's enough of a
> > consensus that we'd like to see something of this nature I can work a
> > bit more on improving the script to have something that's useful both
> > locally for reducing the test suite times and also for spotting errors
> > that failed a CI run.
> >
> > Paul
> >


Re: [PROPOSAL] Drop Erlang 19 support in CouchDB 3.0

2019-12-12 Thread Paul Davis
+1

The only thought that comes to mind is that it might be useful to
differentiate some of our error messages between versions. AFAIK, there's
nothing in 19.x that currently would prevent someone from using it.
We just don't have the resources to cover all of the tests and
packaging for it. That's different from the black-listed
20.x versions, which have known bugs that break things.

So, basically having something like Allow/Warn/Block classifications
rather than an opaque "This version is not supported" error message?

On Thu, Dec 12, 2019 at 2:34 PM Dave Cottlehuber  wrote:
>
> On Thu, 12 Dec 2019, at 01:35, Joan Touzet wrote:
> > Hello everyone,
> >
> > I'm working this week with Paul Davis on our new Jenkins CI
> > infrastructure, which is coming along nicely. One of the changes I'm
> > planning to make is that our PR tests will run against only 3 versions
> > of Erlang:
> >
> > 1. The oldest we support (right now, 19.3.6.latest)
> > 2. The version we currently ship with our binary distros & Docker
> >(right now, 20.3.8.latest)
> > 3. The very latest version we support (right now, 22.2)
> >
> > In preparing the containers for CI testing, it's turning out to be very
> > difficult to build Erlang 19.* anymore on modern Linuxes. This is
> > because they ship with OpenSSL 1.1+, and 19.* cannot build against
> > anything newer than OpenSSL 1.0.
> >
> > I can jump through a huge number of hoops for this...or we can just drop
> > Erlang 19 support for CouchDB 3.0 and require Erlang 20. (Note we
> > blacklist a number of versions of Erlang 20.) I would then replace
> > 19.3.6.latest with 20.3.8.11 [1].
>
> +1
>
> FWIW RabbitMQ has done the same (21 & 22 only soon), and in FreeBSD
> we drop <19 from January, and I expect to be more aggressive on that
> in future, in line with what the OTP team have stated.
>
> I would consider going with 21+ only and be done with this.
>
> A+
> Dave


Script for displaying test errors and timing information

2019-12-12 Thread Paul Davis
Hey all,

I was poking around at Jenkins the other day trying to get a good idea
of how much time we're spending in various parts of the build. It
occurred to me that one good way to at least investigate our eunit
test suite is to parse all of the generated surefire reports. I spent
an hour or so yesterday throwing a python script together to see what
that might look like.

I've posted an example output and the script itself here:

https://gist.github.com/davisp/7064c1ef0dc94a99c739729b97fef10e

So far it shows some aggregate timing information along with the top
few results for total suite time, total setup/teardown time, and
longest individual test times. Not shown in the example output above, any
test failures or skipped tests are also reported by the script, which
has the nice benefit of showing exactly which test or tests failed
during a run.
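
For anyone who'd rather poke at the reports from Erlang than Python, the
same parsing fits in a few lines with the bundled xmerl (a sketch of the
idea only, not the script in the gist):

-module(surefire_times).
-include_lib("xmerl/include/xmerl.hrl").
-export([slowest/1]).

%% Returns [{TestName, Seconds}] for one surefire report, slowest first.
slowest(ReportPath) ->
    {Doc, _Rest} = xmerl_scan:file(ReportPath),
    Cases = xmerl_xpath:string("//testcase", Doc),
    Times = [{attr(name, TC), to_seconds(attr(time, TC))} || TC <- Cases],
    lists:reverse(lists:keysort(2, Times)).

attr(Name, #xmlElement{attributes = Attrs}) ->
    case lists:keyfind(Name, #xmlAttribute.name, Attrs) of
        #xmlAttribute{value = Value} -> Value;
        false -> "0"
    end.

to_seconds(Str) ->
    try list_to_float(Str)
    catch error:badarg -> float(list_to_integer(Str))
    end.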

So far a few interesting things jump out. First, we spend more time in
setup/teardown fixtures than we do running actual tests, which I found
to be rather surprising. Not so surprising is that our property tests
are taking the longest for each run.

I thought it might be interesting to include a version of this script
(with perhaps a somewhat more succinct output) at the end of every
eunit run (or perhaps failed eunit run) to list out the errors for
easier debugging of failed CI runs. And then beyond that, having it in
the source tree would allow devs to poke around and try to speed up
some of the slower tests.

Another thought was to take a look at moving the property based tests
to a nightly build and then actually increasing their run times to
cover more of the state space for each of those tests. That way we'd
be doing a bit more of the normal property-based testing, where it's
more about long soaks to find edge cases rather than acceptance
testing for a particular change.

So far I've just thrown this together. If there's enough of a
consensus that we'd like to see something of this nature I can work a
bit more on improving the script to have something that's useful both
locally for reducing the test suite times and also for spotting errors
that failed a CI run.

Paul


Re: [DISCUSS] Node types in CouchDB 4.x

2019-11-19 Thread Paul Davis
Sounds reasonable assuming you made a typo here:

> By default, with any extra configuration, the behavior would stay as is 
> today...

I assume that should have been "without any extra configuration"?

On Tue, Nov 19, 2019 at 11:11 AM Nick Vatamaniuc  wrote:
>
> Hi everyone,
>
> I'd like to discuss the ability to have heterogeneous nodes types in CouchDB 
> 4.
>
> In CouchDB 2 and 3 the nodes in the cluster are usually similar, and
> functionality is uniformly distributed amongst the nodes. That is all
> nodes can accept HTTP requests, run replication jobs, build indices
> etc. They are typically deployed such that they similar hardware
> requirements.
>
> In an FDB-based CouchDB 4, CRUD operations, on the Erlang nodes,
> wouldn't require as many resources, so it would be possible to have a
> set of nodes, performing just CRUD operations that are much smaller
> than the equivalent CouchDB 2 and 3 nodes. However, indexing and
> replication might still require heavy resource usage.
>
> So the proposal is to add configuration to CouchDB 4 to allow some
> nodes to perform only  a subset of their current functionality. For
> example, it would be possible to have 6 1-CPU nodes with 512MB
> accepting API requests, and, 2 4-CPU node with 4GB of memory each
> running replication and indexing jobs only, or any other such
> combinations. By default, with any extra configuration, the behavior
> would stay as is today -- all nodes will run all the functionality.
>
> I created an RFC exploring how it might look like:
> https://github.com/apache/couchdb-documentation/pull/457
>
> There is a comment there how it could be implemented. So far it looks
> like it could be fairly trivial  since it would build on the
> couch_jobs work already in place.
>
> What does everyone think?
>
> Cheers,
> -Nick


Re: Introduction of open tracing (otter)

2019-09-16 Thread Paul Davis
Aha! Nice!

On Thu, Sep 12, 2019 at 12:19 PM Ilya Khlopotov  wrote:
>
> > Ilya, you mentioned hopping from the coordinator to RPC workers which
> > is definitely an open problem. I only skimmed the docs months ago but
> > one of the things I came across was trying to figure out how to
> > represent parallel traces. Given we have a coordinator that has N>1
> > RPC workers running in parallel I wasn't sure how that'd work. Granted
> > that was on the shallowest of shallow dives skimming their docs when
> > someone mentioned the tracing thing somewhere.
>
> I couldn't find a way to attach image so I created a gist here 
> https://gist.github.com/iilyak/db7678f3c368466efc5abb1f70f2c088 to show how 
> nested spans are rendered. The otter library should produce similar result if 
> we would link new span with its parent. Parent can be specified if we use  
> start_with_tags(Name, InitialTags, TraceId, ParentId).
>
> On 2019/09/10 21:43:02, Paul Davis  wrote:
> > Looks pretty awesome. I've got basically the same questions as Koco on
> > performance. There are also games like the lager transforms that
> > conditionally enable/disable log levels at runtime. If memory serves,
> > it ended up being a single function call overhead to check for
> > disabled based on some dynamically compiled module or ets lookup I
> > think.
> >
> > Koco, are client inherited spans an opentracing concept? At first I
> > read it as "let a user specify points in CouchDB to insert trace
> > markers at runtime" and it sounded kinda crazy. But if you mean
> > somehow connecting the CouchDB generated span with some other span in
> > a different application that sounds like something reasonable to
> > support.
> >
> > Ilya, you mentioned hopping from the coordinator to RPC workers which
> > is definitely an open problem. I only skimmed the docs months ago but
> > one of the things I came across was trying to figure out how to
> > represent parallel traces. Given we have a coordinator that has N>1
> > RPC workers running in parallel I wasn't sure how that'd work. Granted
> > that was on the shallowest of shallow dives skimming their docs when
> > someone mentioned the tracing thing somewhere.
> >
> > On Tue, Sep 10, 2019 at 3:46 PM Adam Kocoloski  wrote:
> > >
> > > I think this is a great idea overall, particularly given the number of 
> > > significant changes that are happening in the codebase between 3.0 and 
> > > 4.0.
> > >
> > > For me the main question is how much overhead is associated with tracing. 
> > > Can an admin safely configure it to run in production? Is it possible to 
> > > sample just a small percentage of events? Does the overhead change if no 
> > > OpenTracing tracer is configured?
> > >
> > > I also think a full picture here might include the ability to inherit 
> > > client-provided spans, so an app developer could drill down from her own 
> > > code into the database internals and figure out why some DB request was 
> > > unexpectedly slow.
> > >
> > > Thanks for starting this discussion. Cheers,
> > >
> > > Adam
> > >
> > > > On Sep 10, 2019, at 2:32 PM, Ilya Khlopotov  wrote:
> > > >
> > > > Hi,
> > > >
> > > > I wanted to run this idea by the ML to see if there is any interest 
> > > > before investing time into preparing formal RFC.
> > > >
> > > > # Problem statement
> > > >
> > > > Collecting profiling data is very tricky at the moment. Developers have 
> > > > to run generic profiling tools which are not aware of CouchDB specifics.
> > > > This makes it hard to do the performance optimization work. We need a 
> > > > tool which would allow us to get profiling data from specific points in 
> > > > the codebase. This means code instrumentation.
> > > >
> > > > # Proposed solution
> > > >
> > > > There is an https://opentracing.io/ project, which is a vendor-neutral 
> > > > APIs and instrumentation for distributed tracing. In Erlang it is 
> > > > implemented by https://github.com/Bluehouse-Technology/otter library. 
> > > > The library provides a nice abstraction to start/finish tracing spans 
> > > > as well as adding tags and log entries to a given span. In the context 
> > > > of CouchDB this means that we can do something like the following:
> > > > - start tracing span on every HTTP request
> > > > - add tags to capture a

Re: Introduction of open tracing (otter)

2019-09-10 Thread Paul Davis
+1

On Tue, Sep 10, 2019 at 4:51 PM Adam Kocoloski  wrote:
>
> Yeah, I meant the latter — joining CouchDB’s span information to spans in an 
> app built against CouchDB so a developer can see the end-to-end story. Wasn’t 
> proposing user-customized spans inside the DB :)
>
> Adam
>
> > On Sep 10, 2019, at 5:43 PM, Paul Davis  wrote:
> >
> > Looks pretty awesome. I've got basically the same questions as Koco on
> > performance. There are also games like the lager transforms that
> > conditionally enable/disable log levels at runtime. If memory serves,
> > it ended up being a single function call overhead to check for
> > disabled based on some dynamically compiled module or ets lookup I
> > think.
> >
> > Koco, are client inherited spans an opentracing concept? At first I
> > read it as "let a user specify points in CouchDB to insert trace
> > markers at runtime" and it sounded kinda crazy. But if you mean
> > somehow connecting the CouchDB generated span with some other span in
> > a different application that sounds like something reasonable to
> > support.
> >
> > Ilya, you mentioned hopping from the coordinator to RPC workers which
> > is definitely an open problem. I only skimmed the docs months ago but
> > one of the things I came across was trying to figure out how to
> > represent parallel traces. Given we have a coordinator that has N>1
> > RPC workers running in parallel I wasn't sure how that'd work. Granted
> > that was on the shallowest of shallow dives skimming their docs when
> > someone mentioned the tracing thing somewhere.
> >
> > On Tue, Sep 10, 2019 at 3:46 PM Adam Kocoloski  wrote:
> >>
> >> I think this is a great idea overall, particularly given the number of 
> >> significant changes that are happening in the codebase between 3.0 and 4.0.
> >>
> >> For me the main question is how much overhead is associated with tracing. 
> >> Can an admin safely configure it to run in production? Is it possible to 
> >> sample just a small percentage of events? Does the overhead change if no 
> >> OpenTracing tracer is configured?
> >>
> >> I also think a full picture here might include the ability to inherit 
> >> client-provided spans, so an app developer could drill down from her own 
> >> code into the database internals and figure out why some DB request was 
> >> unexpectedly slow.
> >>
> >> Thanks for starting this discussion. Cheers,
> >>
> >> Adam
> >>
> >>> On Sep 10, 2019, at 2:32 PM, Ilya Khlopotov  wrote:
> >>>
> >>> Hi,
> >>>
> >>> I wanted to run this idea by the ML to see if there is any interest 
> >>> before investing time into preparing formal RFC.
> >>>
> >>> # Problem statement
> >>>
> >>> Collecting profiling data is very tricky at the moment. Developers have 
> >>> to run generic profiling tools which are not aware of CouchDB specifics.
> >>> This makes it hard to do the performance optimization work. We need a 
> >>> tool which would allow us to get profiling data from specific points in 
> >>> the codebase. This means code instrumentation.
> >>>
> >>> # Proposed solution
> >>>
> >>> There is an https://opentracing.io/ project, which is a vendor-neutral 
> >>> APIs and instrumentation for distributed tracing. In Erlang it is 
> >>> implemented by https://github.com/Bluehouse-Technology/otter library. The 
> >>> library provides a nice abstraction to start/finish tracing spans as well 
> >>> as adding tags and log entries to a given span. In the context of CouchDB 
> >>> this means that we can do something like the following:
> >>> - start tracing span on every HTTP request
> >>> - add tags to capture additional information such as "database 
> >>> name"/"name of endpoint"/"nonce"
> >>> - add otter logs in critical parts of the codebase to get profiling data 
> >>> for these points.
> >>>
> >>> The otter is the most useful in combination with 
> >>> [zipkin](https://zipkin.io/) compatible server such as 
> >>> [jaeger](https://github.com/jaegertracing/jaeger). However it can be used 
> >>> even without zipkin. It has a configurable set of counters, which makes 
> >>> it possible to get answers on questions like:
> >>> - what kind of requests are slow
> >>> - if we get a slow requ

Re: Compaction daemon changes

2019-09-06 Thread Paul Davis
Oh duh. I was only thinking about the scheduling aspect but I'm
guessing you were also thinking about the fragmentation settings and
so on that don't have a direct equivalent. Seems fair to me.

To be fair, that was before I was properly caffeinated.
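
For reference, the simple version of the from/to window check Adam
mentions below is only a few lines (hypothetical sketch; times as
{Hour, Minute} tuples, with wrap-around past midnight handled):

-module(compact_window).
-export([in_window/2]).

%% in_window({From, To}, erlang:time()) -> boolean()
%% e.g. in_window({{23,0}, {4,0}}, erlang:time()) allows 23:00 through 03:59.
in_window({From, To}, {H, M, _S}) ->
    Now = {H, M},
    if
        From < To -> Now >= From andalso Now < To;
        From > To -> Now >= From orelse  Now < To;   % window wraps past midnight
        true      -> true                            % From == To: always allowed
    end.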

On Fri, Sep 6, 2019 at 9:56 AM Adam Kocoloski  wrote:
>
> I was just going to port over the `check_period` function and add support for 
> “from” and “to” as per-channel config parameters, so I don’t think it will 
> meaningfully help with the rationalization of the config systems.
>
> Adam
>
> > On Sep 6, 2019, at 10:15 AM, Paul Davis  wrote:
> >
> > Seems mostly reasonable. The only thing I'd add is that if we're
> > looking to implement #1 I'd assume we'd reuse or at least rework the
> > old compaction daemon code which makes me think that #3 would be
> > trivial to support?
> >
> > On Fri, Sep 6, 2019 at 8:25 AM Adam Kocoloski  wrote:
> >>
> >> Hi all,
> >>
> >> CouchDB 3.0 will feature a new, smarter auto-compaction daemon with the 
> >> following key features:
> >>
> >> - continuous re-prioritization of compaction queues based on estimated 
> >> space savings
> >> - fine-grained control over compaction “channels” to independently 
> >> prioritize different types of jobs (large databases, small views, etc.)
> >> - QoS capabilities: compaction I/O is executed at low priority by default, 
> >> but admins can can reprioritize as needed
> >>
> >> However, there are a few gaps compared to the daemon in 2.x:
> >>
> >> 1. no ability to configure compaction to only run during specific time 
> >> intervals
> >> 2. no ability to specify compaction thresholds for specific databases
> >> 3. incompatible configuration system; users who have customized their 
> >> auto-compaction configuration in 2.x will need to redo their configuration 
> >> in 3.0
> >>
> >> I have those gaps ordered in what I’d consider to be the priority. I think 
> >> we should try to address #1 before 3.0 as I’m sure many DBAs have grown 
> >> accustomed to compacting during quiet hours and could be forgiven if they 
> >> don’t trust our fancy QoS to keep things healthy on Day 1. I can see where 
> >> #2 could be a nice enhancement but I’m OK to wait for user feedback on 
> >> that one. #3 I’m content to solve with a migration guide in the 
> >> documentation.
> >>
> >> Does that plan make sense to everyone?
> >>
> >> Adam
>


Re: Compaction daemon changes

2019-09-06 Thread Paul Davis
Seems mostly reasonable. The only thing I'd add is that if we're
looking to implement #1, I'd assume we'd reuse or at least rework the
old compaction daemon code, which makes me think that #3 would be
trivial to support?

On Fri, Sep 6, 2019 at 8:25 AM Adam Kocoloski  wrote:
>
> Hi all,
>
> CouchDB 3.0 will feature a new, smarter auto-compaction daemon with the 
> following key features:
>
> - continuous re-prioritization of compaction queues based on estimated space 
> savings
> - fine-grained control over compaction “channels” to independently prioritize 
> different types of jobs (large databases, small views, etc.)
> - QoS capabilities: compaction I/O is executed at low priority by default, 
> but admins can can reprioritize as needed
>
> However, there are a few gaps compared to the daemon in 2.x:
>
> 1. no ability to configure compaction to only run during specific time 
> intervals
> 2. no ability to specify compaction thresholds for specific databases
> 3. incompatible configuration system; users who have customized their 
> auto-compaction configuration in 2.x will need to redo their configuration in 
> 3.0
>
> I have those gaps ordered in what I’d consider to be the priority. I think we 
> should try to address #1 before 3.0 as I’m sure many DBAs have grown 
> accustomed to compacting during quiet hours and could be forgiven if they 
> don’t trust our fancy QoS to keep things healthy on Day 1. I can see where #2 
> could be a nice enhancement but I’m OK to wait for user feedback on that one. 
> #3 I’m content to solve with a migration guide in the documentation.
>
> Does that plan make sense to everyone?
>
> Adam


Re: [DISCUSS] [PROPOSAL] Accept donation of the IBM Cloudant Weather Report diagnostic tool?

2019-08-14 Thread Paul Davis
+1

On Wed, Aug 14, 2019 at 9:07 AM Jan Lehnardt  wrote:
>
> Agreed, and recon’s BSD 3-clause is totally fine as per 
> https://apache.org/legal/resolved.html#category-a
>
> +1
>
> Best
> Jan
> —
>
> > On 14. Aug 2019, at 15:36, Robert Newson  wrote:
> >
> > I’m for the proposal and am confident IBM will apply release custodian 
> > under the ASLv2 if the community is in favour of the the proposal.
> >
> > B.
> >
> >> On 14 Aug 2019, at 07:19, Jay Doane  wrote:
> >>
> >> In the interest of making CouchDB 3.0 "the best CouchDB Classic possible",
> >> I'd like to discuss whether to accept a donation from Cloudant of the
> >> "Weather Report" diagnostic tool. This tool (and dependencies) are OTP
> >> applications, and it is typically run from an escript which connects to a
> >> running cluster, gathers numerous diagnostics, and emits various warning
> >> and errors when it finds something to complain about. It was originally
> >> ported from a fork of Riaknostic (the Automated diagnostic tools for Riak)
> >> [1] by Mike Wallace.
> >>
> >> The checks it makes are represented by the following modules:
> >>
> >> weatherreport_check_custodian.erl
> >> weatherreport_check_disk.erl
> >> weatherreport_check_internal_replication.erl
> >> weatherreport_check_ioq.erl
> >> weatherreport_check_mem3_sync.erl
> >> weatherreport_check_membership.erl
> >> weatherreport_check_memory_use.erl
> >> weatherreport_check_message_queues.erl
> >> weatherreport_check_node_stats.erl
> >> weatherreport_check_nodes_connected.erl
> >> weatherreport_check_process_calls.erl
> >> weatherreport_check_process_memory.erl
> >> weatherreport_check_safe_to_rebuild.erl
> >> weatherreport_check_search.erl
> >> weatherreport_check_tcp_queues.erl
> >>
> >> While some of these checks are self-contained, check_node_stats,
> >> check_process_calls, check_process_memory, and check_message_queues all use
> >> recon [2] under the hood. Similarly, check_custodian
> >> and check_safe_to_rebuild use another Cloudant OTP application called
> >> Custodian, which periodically scans the "dbs" database to track the
> >> location of every shard of every database and can integrate with sensu [3]
> >> to ensure that operators are aware of any shard that is under-replicated.
> >>
> >> I have created a POC branch [4] that adds Weather Report, Custodian, and
> >> Recon to CouchDB, and when I ran it in my dev environment (without search
> >> running), got the following diagnostic output:
> >>
> >> $ ./weatherreport --etc ~/proj/couchdb/dev/lib/node1/etc/ -a
> >> ['node1@127.0.0.1'] [error] Local search node at 'clouseau@127.0.0.1' not
> >> responding: pang
> >> ['node2@127.0.0.1'] [error] Local search node at 'clouseau@127.0.0.1' not
> >> responding: pang
> >> ['node3@127.0.0.1'] [error] Local search node at 'clouseau@127.0.0.1' not
> >> responding: pang
> >> ['node1@127.0.0.1'] [notice] Data directory
> >> /Users/jay/proj/couchdb/dev/lib/node1/data is not mounted with 'noatime'.
> >> Please remount its disk with the 'noatime' flag to improve performance.
> >> ['node2@127.0.0.1'] [notice] Data directory
> >> /Users/jay/proj/couchdb/dev/lib/node2/data is not mounted with 'noatime'.
> >> Please remount its disk with the 'noatime' flag to improve performance.
> >> ['node3@127.0.0.1'] [notice] Data directory
> >> /Users/jay/proj/couchdb/dev/lib/node3/data is not mounted with 'noatime'.
> >> Please remount its disk with the 'noatime' flag to improve performance.
> >> returned 1
> >>
> >> There is still a little cleanup to be done before these tools would be
> >> ready to donate, but it seems that overall they already integrate tolerably
> >> well with CouchDB.
> >>
> >> As far as licenses go, Riaknostic is Apache 2.0. Recon is not [5], but it
> >> seems like it should be ok to include in CouchDB based on my possibly naive
> >> reading. Currently Custodian has no license (just Copyright 2013 Cloudant),
> >> but I assume it would get an Apache license, just like all other donated
> >> code.
> >>
> >> Would this be a welcome addition to CouchDB? Please let me know what you
> >> think.
> >>
> >> Thanks,
> >> Jay
> >>
> >> [1] https://github.com/basho/riaknostic
> >> [2] http://ferd.github.io/recon/
> >> [3] https://sensu.io
> >> [4]
> >> https://github.com/apache/couchdb/compare/master...cloudant:weatherreport?expand=1
> >> [5] https://github.com/ferd/recon/blob/master/LICENSE
> >
>
> --
> Professional Support for Apache CouchDB:
> https://neighbourhood.ie/couchdb-support/
>


Re: [PROPOSAL] Deprecate several mailing lists

2019-08-01 Thread Paul Davis
+1

On Thu, Aug 1, 2019 at 1:31 PM Nick Vatamaniuc  wrote:
>
> +1
>
> On Thu, Aug 1, 2019 at 1:08 PM Garren Smith  wrote:
>
> > +1
> >
> > On Thu, Aug 1, 2019 at 6:52 PM Jan Lehnardt  wrote:
> >
> > > +1
> > >
> > > Cheers
> > > Jan
> > > —
> > >
> > > > On 1. Aug 2019, at 18:33, Joan Touzet  wrote:
> > > >
> > > > Right now, we have a massive number of mailing lists, most of which we
> > > > never use.
> > > >
> > > > I propose keeping these mailing lists only:
> > > >
> > > >  - announce@
> > > >  - user@
> > > >  - dev@
> > > >  - notifications@
> > > >  - commits@
> > > >  - private@
> > > >
> > > > and we drop all the rest:
> > > >
> > > >  - l10n@
> > > >  - replication@
> > > >  - marketing@
> > > >  - www@
> > > >  - design@
> > > >  - couchapp@
> > > >  - erlang@ (already archived)
> > > >
> > > > Archives will remain of lists we drop for future reference.
> > > >
> > > > Given the extremely low traffic on these other lists, I'm doing this as
> > > > lazy consensus and will act Monday Aug 5. If anyone has a solid
> > argument
> > > > why traffic on those other lists can't live on user@ or dev@ instead,
> > > > please speak now.
> > > >
> > > > -Joan
> > > >
> > >
> > >
> >


Re: [VOTE] Adopt FoundationDB

2019-07-30 Thread Paul Davis
+1

On Tue, Jul 30, 2019 at 10:06 AM Nick Vatamaniuc  wrote:
>
> +1
>
> On Tue, Jul 30, 2019 at 10:25 AM Wendall Cada  wrote:
>
> > +1
> >
> > On Tue, Jul 30, 2019 at 7:08 AM Adam Kocoloski 
> > wrote:
> >
> > > +1
> > >
> > > > On Jul 30, 2019, at 4:27 AM, Jan Lehnardt  wrote:
> > > >
> > > > Dear CouchDB developers,
> > > >
> > > > This vote decides whether the CouchDB project accepts the proposal[1]
> > > > to switch our underlying storage and distributed systems technology out
> > > > for FoundationDB[2].
> > > >
> > > > At the outset, we said that we wanted to cover these topic areas before
> > > > making a vote:
> > > >
> > > > - Bylaw changes
> > > >- RFC process: done, passed
> > > >- Add qualified vote option: done, changes proposed were not
> > > >  ratified
> > > >
> > > > - Roadmap: proposal done, detailed discussions TBD, includes
> > > >  deprecations
> > > >
> > > > - Onboarding: ASF onboarding links shared, CouchDB specific onboarding
> > > >  TBD.
> > > >
> > > > - (Re-)Branding: tentatively: 3.0 is the last release before FDB
> > > >  CouchDB and 4.0 is the FDB CouchDB. If we need nicknames, we can
> > > >  decide on those later.
> > > >
> > > > - FoundationDB Governance: FoundationDB is currently loosely organised
> > > >  between Apple and a few key stakeholder companies invested in the
> > > >  technology. Apple contributions are trending downwards relatively,
> > > >  approaching 50%, which means in the future, more non-Apple than Apple
> > > >  contributions are likely.
> > > >
> > > >  In addition, the CouchDB PMC has requested addition to the current
> > > >  organisational FDB weekly meeting, which is where any more formal
> > > >  governance decisions are going to be made and the CouchDB PMC can be
> > > >  a part of the surrounding discussions.
> > > >
> > > > - FoundationDB Operations knowledge: IBM has intends to share this
> > > >  knowledge as they acquire it in conjunction with Apache CouchDB in
> > > >  terms of general ops knowledge, best practices and tooling.
> > > >
> > > > - Proj. Mgmt.: RFC process + outline list of TBD RFCs allow for enough
> > > >  visibility and collaboration opportunities, everyone on dev@ list is
> > > >  encouraged to participate.
> > > >
> > > > - Tech deep dives: DISCUSS threads and RFCs are covering this, current
> > > >  list of TBD DISCUSS/RFCs, for the proposal. Most of which were
> > > >  already discussed on dev@ or RFC’d in our documentation repo:
> > > >
> > > >* JSON doc storage and storage of edit conflicts
> > > >* revision management
> > > >* _changes feed
> > > >* _db_updates
> > > >* _all_docs
> > > >* database creation and deletion
> > > >* attachments
> > > >* mango indexes (including collation)
> > > >* map-only views / search / geo
> > > >* reduces
> > > >* aggregate metrics (data_size, etc.)
> > > >* release engineering
> > > >* local/desktop/dev install security
> > > >
> > > > * * *
> > > >
> > > > As shown above, all topics we wanted to have clarity on have been
> > > > advanced to a point where we are now ready to make a decision:
> > > >
> > > >  Should Apache CouchDB adopt FoundationDB?
> > > >
> > > > Since this is a big decision, I suggest we make this a Lazy 2/3
> > > > Majority Vote with PMC Binding Votes, and a 7 day duration (as per our
> > > > bylaws[3]).
> > > >
> > > > You can cast your votes now.
> > > >
> > > > Best
> > > > Jan
> > > > —
> > > > [1]:
> > >
> > https://lists.apache.org/thread.html/04e7889354c077a6beb91fd1292b6d38b7a3f2c6a5dc7d20f5b87c44@%3Cdev.couchdb.apache.org%3E
> > > > [2]: https://www.foundationdb.org
> > > > [3]: https://couchdb.apache.org/bylaws.html
> > > >
> > > >
> > >
> > >
> >


Re: Fsyncgate: errors on fsync are unrecovarable

2019-07-21 Thread Paul Davis
I’m browsing on my phone but I’m pretty sure we should add an `ok =` to
this line so that we force a bad match:

https://github.com/apache/couchdb/blob/9d098787a71d1c7f7f6adea05da15b0da3ecc7ef/src/couch/src/couch_file.erl#L223

Unless I’m missing somewhere else that we’re making that assertion.
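
In miniature, the difference is (illustrative only, not the actual
couch_file.erl code):

-module(sync_assert_demo).
-export([sync_without_assert/1, sync_with_assert/1]).

sync_without_assert(Fd) ->
    file:sync(Fd).        % an {error, _} result is silently handed back to the caller

sync_with_assert(Fd) ->
    ok = file:sync(Fd),   % any non-ok result crashes right here with a badmatch
    ok.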

On Sun, Jul 21, 2019 at 4:21 AM Jan Lehnardt  wrote:

> Hey all,
>
>
> Joan sent this along in IRC and it reads bad enough[tm] that we should at
> least have a few pairs of eyes looking at if we have to do anything:
>
> http://danluu.com/fsyncgate/
>
> (It is long and dense, you’ll need to read 10-15% of that page to get the
> main picture.
>
> tl;dr: running fsync() after an fsync() that reported EIO clears that
> error state with no way of recovery on Linux.
>
> There are two ways of handling this correctly:
>
> 1. whatever you wrote() between the last successful fsync() and the
> fsync() that raised the error, keep around until after the second fsync(),
> so you can write() it again.
>
> 2. if any one fsync() returns EIO, report this back up immediately, so
> whoever calls you can retry.
>
> * * *
>
> We seem to be doing 2. as per my reading.
>
> Erlang looks like it correctly just raises whatever error fsync() might
> return:
>
> 1.
> https://github.com/erlang/otp/blob/maint-r14/erts/emulator/drivers/unix/unix_efile.c#L792-L809
> 2.
> https://github.com/erlang/otp/blob/maint-r14/erts/emulator/drivers/unix/unix_efile.c#L151-L163
>
> couch_file too:
>
> 1.
> https://github.com/apache/couchdb/blob/master/src/couch/src/couch_file.erl#L215-L223
>
> I glanced at a few paths going up this chain and couldn’t spot a catch
> where we’d hide that error, but it’d be great to get some confirmation on
> this.
>
> * * *
>
> Please double-check my understanding of the issue, the correct ways
> forward and the findings in Erlang and CouchDB.
>
> Best
> Jan
> --
> Professional Support for Apache CouchDB:
> https://neighbourhood.ie/couchdb-support/
>
>


Re: [VOTE] amend the bylaws

2019-07-01 Thread Paul Davis
+1

Also for others fighting GitHub's side scrolly thing, this article
made review significantly easier:

https://www.viget.com/articles/dress-up-your-git-diffs-with-word-level-highlights/

On Mon, Jul 1, 2019 at 9:55 AM Naomi S  wrote:
>
> hello,
>
> I have prepared a pull request that amends the bylaws:
>
> https://github.com/apache/couchdb-www/pull/31
>
> this PR makes two changes:
>
> - Add explicit  elements to fix display issue
>
> - Remove redundant and potentially confusing paragraph about the chair
>
> The first sentence of this paragraph repeats some of the text in the previous 
> paragraph. The second sentence covers powers and responsibilities shared with 
> the PMC, and stating this in this section runs the risk of implying that the 
> chair has special powers, which is not the case. (See § 3.8. Decision Types)
>
> - General copyedit for style, grammar, spelling, etc.
>
> - Detab the file and fix some minor HTML formatting issues
>
> because this is a change to the bylaws, the PR requires voting and a lazy 2/3 
> majority (binding votes: PMC members) to pass
>
> for convenience, I have attached the commits as patches to this email. I hope 
> they get through the list filters
>
> please cast your vote now
>
> in three or more days, I will tally the votes and announce the result
>
> thank you!
>
> P.S. I made the copyedit change because, as it turns out, my writing has 
> improved a lot since 2015. I have, to the best of my ability, preserved the 
> meaning and intent of the text
>


Re: Use ExUnit to write unit tests.

2019-05-23 Thread Paul Davis
On Thu, May 23, 2019 at 11:04 AM Joan Touzet  wrote:
>
> On 2019-05-23 11:15, Paul Davis wrote:
> > I'm pretty happy with the ExUnit we've got going for the HTTP
> > interface and would be an enthusiastic +1 on starting to use it for
> > internals as well.
>
> Wait, where's the full ExUnit implementation for the HTTP interface? Is
> that Ilya's PR, or something that Cloudant runs internally?
>
> If you mean the slow conversion of the JS tests over to Elixir, I wasn't
> aware that these were implemented in ExUnit already. Learn something new
> every day!
>

Just the slow conversion is all I meant. There's no magical HTTP test
suite hiding anywhere. :P

> > The only thing I'd say is that the adapter concept while interesting
> > doesn't feel like it would be that interesting for our particular
> > situation. I could see it being useful for the 5984/5986 distinction
> > since its the same code underneath and we'd only be munging a few
> > differences for testing. However, as Garren points out 5986 is going
> > to disappear one way or another so long term not a huge deal.
>
> +1, the intent was to deprecate 5986 for CouchDB 3.0, and obviously it's
> gone for 4.0.
>
> -Joan
>


Re: Use ExUnit to write unit tests.

2019-05-23 Thread Paul Davis
I'm pretty happy with the ExUnit we've got going for the HTTP
interface and would be an enthusiastic +1 on starting to use it for
internals as well.

The only thing I'd say is that the adapter concept while interesting
doesn't feel like it would be that interesting for our particular
situation. I could see it being useful for the 5984/5986 distinction
since its the same code underneath and we'd only be munging a few
differences for testing. However, as Garren points out 5986 is going
to disappear one way or another so long term not a huge deal.

For testing HTTP vs Fabric layer I'd wager that it'll turn out to be
not very useful. There's a *lot* of code in chttpd that mutates
results returned from fabric functions. Attempting to do meaningful
tests using logic based on adapters seems to me like it means that
you'd have to either reimplement chttpd bug-complete in the fabric
adapter, or do some sort of weird inverted-chttpd in the HTTP adapter
such that your tests can make the same assertions. And neither of
those ideas sounds like a good idea to me.

Currently my ideal approach would be to start by implementing a test
suite that covers fabric thoroughly (based on at least code coverage
and possibly unnamed other tooling), and then move to chttpd and do
the same. That way each test suite is focused on a particular layer of
concern rather than trying to formulate a single suite that tests two
different layers.

On Thu, May 23, 2019 at 6:21 AM Ilya Khlopotov  wrote:
>
> Hi Joan,
>
> My answers inline
>
> On 2019/05/22 20:16:18, Joan Touzet  wrote:
> > Hi Ilya, thanks for starting this thread. Comments inline.
> >
> > On 2019-05-22 14:42, Ilya Khlopotov wrote:
> > > The eunit testing framework is very hard to maintain. In particular, it 
> > > has the following problems:
> > > - the process structure is designed in such a way that failure in setup 
> > > or teardown of one test affects the execution environment of subsequent 
> > > tests. Which makes it really hard to locate the place where the problem 
> > > is coming from.
> >
> > I've personally experienced this a lot when reviewing failed logfiles,
> > trying to find the *first* failure where things go wrong. It's a huge
> > problem.
> >
> > > - inline test in the same module as the functions it tests might be 
> > > skipped
> > > - incorrect usage of ?assert vs ?_assert is not detectable since it makes 
> > > tests pass
> > > - there is a weird (and hard to debug) interaction when used in 
> > > combination with meck
> > >- https://github.com/eproxus/meck/issues/133#issuecomment-113189678
> > >- https://github.com/eproxus/meck/issues/61
> > >- meck:unload() must be used instead of meck:unload(Module)
> >
> > Eep! I wasn't aware of this one. That's ugly.
> >
> > > - teardown is not always run, which affects all subsequent tests
> >
> > Have first-hand experienced this one too.
> >
> > > - grouping of tests is tricky
> > > - it is hard to group tests so individual tests have meaningful 
> > > descriptions
> > >
> > > We believe that with ExUnit we wouldn't have these problems:
> >
> > Who's "we"?
> Wrong pronoun read it as I.
>
> >
> > > - on_exit function is reliable in ExUnit
> > > - it is easy to group tests using `describe` directive
> > > - code-generation is trivial, which makes it is possible to generate 
> > > tests from formal spec (if/when we have one)
> >
> > Can you address the timeout question w.r.t. EUnit that I raised
> > elsewhere for cross-platform compatibility testing? I know that
> > Peng ran into the same issues I did here and was looking into extending
> > timeouts.
> >
> > Many of our tests suffer from failures where CI resources are slow and
> > simply fail due to taking longer than expected. Does ExUnit have any
> > additional support here?
> >
> > A suggestion was made (by Jay Doane, I believe, on IRC) that perhaps we
> > simply remove all timeout==failure logic (somehow?) and consider a
> > timeout a hung test run, which would eventually fail the entire suite.
> > This would ultimately lead to better deterministic testing, but we'd
> > probably uncover quite a few bugs in the process (esp. against CouchDB
> > <= 4.0).
>
> There is one easy workaround. We could set trace: true in the config
> because one of the side effects of it is timeout = infinity (see here 
> https://github.com/elixir-lang/elixir/blob/master/lib/ex_unit/lib/ex_unit/runner.ex#L410).
>  However this approach has an important caveat:
> - all tests would be run sequentially which means that we wouldn't be able to 
> parallelize them later.
>
> > >
> > > Here are a few examples:
> > >
> > > # Test adapters to test different interfaces using same test suite
> >
> > This is neat. I'd like someone else to comment whether this the approach
> > you define will handle the polymorphic interfaces gracefully, or if the
> > effort to parametrise/DRY out the tests will be more difficulty than
> > simply maintaining 4 sets of tests.
> >
> >
> > > # Using same 

Re: Numbers in JavaScript, Lucene, and FoundationDB

2019-05-16 Thread Paul Davis
It's late so just a few quick notes here:

Jiffy decodes numbers based on their encoding. I.e., any number that
includes a decimal point or exponent is decoded as a double, while any
integer is decoded as an integer or bignum depending on size. When
encoding, jiffy will likewise encode 1.0 as "1.0" and 1 as "1". Generally
speaking this seems to be the least surprising behavior for users.
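
To make that concrete, here's roughly the behavior I mean (a sketch of
expected results, not something pasted from a shell; assumes jiffy's
decode/1 and encode/1):

```
check_number_handling() ->
    %% No decimal point or exponent -> integer (bignum if it's large)
    1 = jiffy:decode(<<"1">>),
    true = is_integer(jiffy:decode(<<"123456789012345678901234567890">>)),
    %% Decimal point -> double
    1.0 = jiffy:decode(<<"1.0">>),
    %% Encoding keeps the distinction in both directions
    <<"1">> = jiffy:encode(1),
    <<"1.0">> = jiffy:encode(1.0),
    ok.
```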

That said, one particular aspect of JSON and numbers in particular has
always been around money math. Things like "$1 / 3" follow a different
set of rules than arbitrary floating point arithmetic. CouchDB has a
long history of telling users that numbers mostly behave like doubles
given our JavaScript default. Given that, I would expect anyone that
needs a JSON oriented database that has fancy numerical needs to
already be paying special attention to their numeric data.

The FoundationDB collation does definitely present new questions given
that we're forced to implement a strict byte ordering. On the face of
it I'm more than fine forcing everything to doubles and providing the
mentioned warning label. I do know that FoundationDB's tuple layer has
some ¯\_(ツ)_/¯ semantics for "invalid" doubles (-Nan, Nan, -0, other
oddities I'd never heard of). So there may be caveats to mention there
as well. However, for the most part I'd say our standard reply of "if you
care about your numbers to the actual bit representation level, use a
string representation" is, while maybe not officially official, still
the best advice given JSON.

That of course ignores the fact that `emit(1, 2)` returns a view row
of `("1.0", "2.0")` which Adam noted as another whole big thing. On
that I don't have any amazing thoughts this late at night.

On Thu, May 16, 2019 at 9:39 PM Adam Kocoloski  wrote:
>
> Hi all, CouchDB has always had a somewhat complicated relationship with 
> numbers. I’d like to dig into that a little bit and see if any changes are 
> warranted, or if we can at least be really clear about exactly how they’re 
> handled going forward.
>
> Most of you are likely aware that JS represents *all* numbers as IEEE 754 
> double precision floats. This means that any number in a JSON document with 
> more than 15 significant digits is at risk of being corrupted when it passes 
> through the JS engine during a view build, for example. Our current behavior 
> is to let that silent corruption occur and put whatever number comes out of 
> the JS engine into the view, formatting as a double, int64, or bignum based 
> on jiffy’s decoding of the JSON output from the JS code.
>
> On the other hand, FoundationDB’s tuple layer encoding is quite a bit more 
> specific. It has a whole bunch of typecodes for integers of practically 
> arbitrary size (up to 255 bytes), along with codes for 32 bit and 64 bit 
> floating point numbers. The typecodes control the sorting; i.e., integers 
> sort separately from floats.
>
> We also have the ever-popular Lucene indexes for folks who build CouchDB with 
> the search extension. I don’t have all the details for the number handling in 
> that one handy, but it is another one to keep in mind.
>
> One question that comes up fairly quickly — when a user emits a number as a 
> key in a view, what do we store in FoundationDB? In order to respect 
> CouchDB’s existing collation rules we need to use the same typecode for all 
> numbers. Do we simply treat every number as a double, since they were all 
> coerced into that representation anyway in JS?
>
> But now let’s consider Mango indexes, which don’t suffer from any of 
> JavaScript’s sloppiness around number handling. If we’re to respect CouchDB’s 
> current collation rules we still need a common typecode and sortable binary 
> representation across integers and floats. Do we end up using the IEEE 754 
> float representation of each number as a “sort key” and storing the original 
> number alongside it?
>
> I feel like this ends up being a rabbit hole, but one where we owe it to our 
> users to thoroughly explore and produce a definitive guide :)
>
> Cheers, Adam
>
>
>
>
>
>
>
>
>
>
>
>


Re: [DISCUSS] couch_event and FDB

2019-04-29 Thread Paul Davis
On Mon, Apr 29, 2019 at 1:29 PM Nick Vatamaniuc  wrote:
>
> After discussing how replicator might be implemented with fdb
> https://lists.apache.org/thread.html/9338bd50f39d7fdec68d7ab2441c055c166041bd84b403644f662735@%3Cdev.couchdb.apache.org%3E,
> and thinking about a global jobs queue for replicator (and indexing as
> well, as Garren had mentioned), I had noticed that we might want to have
> something like a couch_event subsystem. We'd need it that to drive creation
> of replication jobs or pending indexing requests. So that's basically what
> this discussion thread is about.
>
> couch_event, for those who don't know, is a node-local event bus for
> database-level events. When dbs are created, deleted, or updated it
> publishes a corresponding event. Any interested listeners subscribe to
> those events. Currently some of the listeners are:
>
>  * ddoc cache : for ddoc updates, and db creates and deletes
>  * ken : for ddoc updates to know when to start building indexes
>  * _replicator db updates : to monitor changes to replication docs
>  * _users db auth : to drive auth cache updates
>  * _db_updates handler : drive db updates
>  * smoosh : drive compaction triggers
>
> This all happens in memory on each node. We use the nifty and fast khash
> library to distribute events. If listeners register and then die, we
> automatically clean that up.
>
> Now with fdb, some of those users of couch_event go away (smoosh) but most
> remain, and we need something to drive them.
>
> There was a separate discussion for how to implement _db_updates handler,
> which is a starting point for this one, but I had made a new thread as it
> involves not just the _db_updates but those other things:
>
> https://lists.apache.org/thread.html/a7bf140aea864286817bbd4f16f5a2f0a295f4d046950729400e0e2a@%3Cdev.couchdb.apache.org%3E
>

Just to make sure, this basically reads like _db_updates but with the
addition that we're tracking updates to design docs and maybe local
docs, except the only thing using local doc events is going away.

> From the _db_updates discussion I liked the proposal to use atomic ops to
> accumulate and deduplicate events and consumers periodically reading and
> resetting the stats.
>
> The list of events currently being published is the following:
>
>  * DbName, created
>  * DbName, deleted
>  * DbName, updated
>  * DbName, local_updated (not needed anymore, used by smoosh only)
>  * DbName, ddoc_updated
>  * DbName, {ddoc_updated, DDocId}
>
> (The {ddoc_updated, DDocId} makes it slightly more difficult as we'd need
> to track specific DDocIDs. Maybe we could forgo such detailed updates and
> let consumers keep track design documents on their own?)
>
> But the idea is to have consumer worker processes and queues for each
> consumer of the API. We could share them per-node. So, if on the same node
> replicator and indexer want to find out about db update, we'd just add
> extra callbacks for one, but they'd share the same consumer worker. Each
> consumer would receive updates in their queue from the producers. Producers
> would mostly end up doing atomic ops to update counts and periodically
> monitor if consumer are still alive. Consumers would poll changes (or use
> watches) to their update queues and notify their callbacks (start
> replication jobs, update _db_updates DB, start indexing jobs, etc.).
>
> Because consumers can die at any time, we don't want to have a growing
> event queue, so each consumer will periodically report its health in the
> db. The producer will monitor periodically all the consumers health
> (asynchronously, using snapshot reads) and if consumers stop updating their
> health their queues will be cleared. If they are alive next time they go to
> read they'd have to re-register themselves in the consumers list. (This is
> the same pattern from the replication proposal to replace process
> linking/monitoring and cleanup that we have now).
>

I'm super confused about this discussion of queues and producers and
consumers. Comparing with the _db_updates discussion it sounds like
"producers" would be anything that modifies a database or otherwise
generates an "event". And the queue is maybe the counters subspace?
Though it's not really a queue, right? It's just a counter table? In the
_db_updates discussion we'd have a process that would reference the
counters table and then periodically update the _db_updates subspace
with whatever changed, which sounds like something you'd expect on
every node?

> The data model might look something like:
>
>  ("couch_event", "consumers") = [C1, C2,...]
>  ("couch_event", Ci, "heath") = (MaxTimeout, Timestamp)
>  ("couch_event", Ci, "events", DbName, DbVer) = CreateDeleteUpdates
>  ("couch_event", Ci, "events", DbName, DbVer, "ddoc_updates", DDocId) =
> Updates
>
> CreateDeleteUpdates is an integer that will encode create, delete, and
> updates in one value using atomic ops:
>
> * The value is initialized to 2, this is equivalent to "unknown" or "idle"
> state.
> * 

Re: [DISCUSS] Statistics maintenance in FoundationDB

2019-04-09 Thread Paul Davis
I've only got two notes for color.

I'm pretty sure that keeping the update_seq as a key could be fine
since it's an atomic op underneath and shouldn't conflict. However,
given that we're looking to store an Incarnation and Batch Id with
every versionstamp, I still think it makes better sense to read from
the end of the changes feed, as that means we're only doing the update
logic in a single place.

For the offset calculation I'll just mention that it's the same
scenario as custom JS reduces, in that we need to be able to calculate
some value over an arbitrary range of keys. For custom JS reduces I
could see having the complexity (if we go that route), however for
offset it's not very useful. Especially given fdb transaction
limitations, which mean it's not necessarily valid any time we have to
use multiple transactions to satisfy a read from the index.
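
For the update_seq read, what I have in mind is just a reverse, limit=1
range read over the changes subspace, something like this (sketch only;
erlfdb call and option names from memory, and ChangesPrefix stands in for
whatever the ?CHANGES subspace prefix ends up being):

```
%% Sketch: derive update_seq from the last KV in the changes subspace
%% rather than maintaining a dedicated key.
current_seq(Tx, ChangesPrefix) ->
    Opts = [{limit, 1}, {reverse, true}],
    case erlfdb:get_range_startswith(Tx, ChangesPrefix, Opts) of
        [] ->
            0;
        [{Key, _Val}] ->
            %% The element after the prefix is the versionstamped sequence
            {Seq} = erlfdb_tuple:unpack(Key, ChangesPrefix),
            Seq
    end.
```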

On Tue, Apr 9, 2019 at 3:12 AM Robert Newson  wrote:
>
> Hi,
>
> I agree with all of this.
>
> On "sizes", we should clean up the various places that the different sizes 
> are reported. I suggest we stick with just the "sizes" object, which will 
> have two items, 'external' which will be jiffy's estimate of the body as json 
> plus the length of all attachments (only if held within fdb) and 'file' which 
> will be the sum of the lengths of the keys and values in fdb for the 
> Directory (excluding the sum key/value itself). (the long way of saying I 
> agree with what you already said).
>
> On "offset", I agree we should remove it. It's of questionable value today, 
> so let's call it out as an API change in the appropriate RFC section. The fdb 
> release (ostensibly "4.0") is an opportunity to clean up some API cruft. 
> Given we know about this one early, we should also remove it in 3.0.
>
> --
>   Robert Samuel Newson
>   rnew...@apache.org
>
> On Mon, 8 Apr 2019, at 23:33, Adam Kocoloski wrote:
> > Hi all, a recent comment from Paul on the revision model RFC reminded
> > me that we should have a discussion on how we maintain aggregate
> > statistics about databases stored in FoundationDB. I’ll ignore the
> > statistics associated with secondary indexes for the moment, assuming
> > that the design we put in place for document data can serve as the
> > basis for an extension there.
> >
> > The first class of statistics are the ones we report in GET /,
> > which are documented here:
> >
> > http://docs.couchdb.org/en/stable/api/database/common.html#get--db
> >
> > These fall into a few different classes:
> >
> > doc_count, doc_del_count: these should be maintained using
> > FoundationDB’s atomic operations. The revision model RFC enumerated all
> > the possible update paths and showed that we always have enough
> > information to know whether to increment or decrement each of these
> > counters; i.e., we always know when we’re removing the last
> > deleted=false branch, adding a new branch to a previously-deleted
> > document, etc.
> >
> > update_seq: this must _not_ be maintained as its own key; attempting to
> > do so would cause every write to the database to conflict with every
> > other write and kill throughput. Rather, we can do a limit=1 range read
> > on the end of the ?CHANGES space to retrieve the current sequence of
> > the database.
> >
> > sizes.*: things get a little weird here. Historically we relied on the
> > relationship between sizes.active and sizes.file to know when to
> > trigger a database compaction, but we don’t yet have a need for
> > compaction in the FDB-based data model and it’s not clear how we should
> > define these two quantities. The sizes.external field has also been a
> > little fuzzy. Ignoring the various definitions of “size” for the
> > moment, let’s agree that we’ll want to be tracking some set of byte
> > counts for each database. I think the way we should do this is by
> > extending the information stored in each edit branch in ?REVISIONS to
> > included the size(s) of the current revision. When we update a document
> > we need to compare the size(s) of the new revision with the size(s) of
> > the parent, and update the database level atomic counter(s)
> > appropriately. This requires an enhancement to RFC 001.
> >
> > I’d like to further propose that we track byte counts not just at a
> > database level but also across the entire Directory associated with a
> > single CouchDB deployment, so that FoundationDB administrators managing
> > multiple applications for a single cluster can have a better view of
> > per-Directory resource utilization without walking every single
> > database stored inside.
> >
> > Looking past the DB info endpoint, one other statistic worth discussing
> > is the “offset” field included with every response to an _all_docs
> > request. This is not something that we get for free in FoundationDB,
> > and I have to confess it seems to be of limited utility. We could
> > support this by implementing a tree structure by adding additional
> > aggregation keys on top of the keys stored in the 

Re: Prototype CouchDB Layer for FoundationDB

2019-03-28 Thread Paul Davis
On Thu, Mar 28, 2019 at 10:58 AM Adam Kocoloski  wrote:
>
> Hi Paul, good stuff.
>
> I agree with you about the FDB Subspaces feature. I’ve been thinking that our 
> layer code should maintain its own enumeration of the various “subspaces” to 
> single-byte prefixes within the directory. I haven’t yet captured that in the 
> RFCs, but e.g. we should be using ?REVISIONS instead of <<“revisions”>> as 
> the prefix for the KVs representing the revision tree.
>

I agree with you so hard that I already did exactly that:

https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/fabric/src/fabric2_fdb.erl#L67-L77

> I also agree with the use of a top-level Directory to enable multiple 
> applications to use the same FoundationDB cluster.
>
> Adam
>
> > On Mar 27, 2019, at 1:46 PM, Nick Vatamaniuc  wrote:
> >
> > Looking over the code, it seems very simple and clean. Without knowing much
> > of the internals or following the discussion too closely I think I was able
> > to read and understand most of it.
> >
> > I like split between db and fdb layers. Hopefully it means if we start from
> > this we can do some parallel work implementing on top of db layer and below
> > it at the same time.
> >
> > The use of maps is nice and seems to simply things quite a bit.
> >
> > Don't have much to add about metadata and other issues, will let others who
> > know more chime in. It seems a bit similar to how we had the
> > instance_start_time at one point or how we add the suffix to db shards.
> >
> > Great work!
> > -Nick
> >
> > On Wed, Mar 27, 2019 at 12:53 PM Paul Davis 
> > wrote:
> >
> >> Hey everyone!
> >>
> >> I've gotten enough of a FoundationDB layer prototype implemented [1]
> >> to start sharing publicly. This is emphatically no where near useful
> >> to non-CouchDB-developers. The motivation for this work was to try and
> >> get enough of a basic prototype written so that we can all start
> >> fleshing out the various RFCs with actual implementations to compare
> >> and contrast and so on.
> >>
> >> To be clear, I've made a lot of intentionally "bad" choices while
> >> writing this to both limit the scope of what I was trying to
> >> accomplish and also to make super clear that I don't expect any of
> >> this code to be "final" in any way whatsoever. This work is purely so
> >> that everyone has an initial code base that can be "turned on" so to
> >> speak. To that end, here's a non-exhaustive list of some of the
> >> silliness I've done:
> >>
> >>  1. All document bodies must fit inside a single value
> >>  2. All requests must fit within the single fdb transaction limits
> >>  3. I'm using binary_to_term for things like the revision tree
> >>  4. The revision tree has to fit in a single value
> >>  5. There's basically 0 supported query string parameters at this point
> >>  6. Nothing outside super basic db/doc ops is implemented (i.e., no views)
> >>
> >> However, what it does do is start! And it talks to FoundationDB! So at
> >> least that bit seems to be reliable (only tested on OS X via
> >> FoundationDB binary installers so super duper caveat on that
> >> "reliable").
> >>
> >> There's a small test script [2] that shows what it's currently capable
> >> of. A quick glance at that should give a pretty good idea of how
> >> little is actually implemented in this prototype. There's also a list
> >> of notes I've been keeping as I've been hacking on this that also
> >> tries to gather a bunch of questions that'll need to be answered [3]
> >> as we continue to work on this.
> >>
> >> To that end, I have learned a couple lessons from working with
> >> FoundationDB from this work that I'd like to share. First is that
> >> while we can cache a bunch of stuff, we have to be able to ensure that
> >> the cache is invalidated properly when various bits of metadata
> >> change. There's a feature on FoundationDB master [1] for this specific
> >> issue. I've faked the same behavior using an arbitrary key but the
> >> `fabric2_fdb:is_current/1` function I think is a good implementation
> >> of this done correctly.
> >>
> >> Secondly, I spent a lot of time trying to figure out how to use
> >> FoundationDB's Directory and Subspace layers inside the CouchDB layer.
> >> After barking up that tree for a long time I've basically decided that
> >> the best answer is pro

Prototype CouchDB Layer for FoundationDB

2019-03-27 Thread Paul Davis
Hey everyone!

I've gotten enough of a FoundationDB layer prototype implemented [1]
to start sharing publicly. This is emphatically nowhere near useful
to non-CouchDB-developers. The motivation for this work was to try and
get enough of a basic prototype written so that we can all start
fleshing out the various RFCs with actual implementations to compare
and contrast and so on.

To be clear, I've made a lot of intentionally "bad" choices while
writing this to both limit the scope of what I was trying to
accomplish and also to make super clear that I don't expect any of
this code to be "final" in any way whatsoever. This work is purely so
that everyone has an initial code base that can be "turned on" so to
speak. To that end, here's a non-exhaustive list of some of the
silliness I've done:

  1. All document bodies must fit inside a single value
  2. All requests must fit within the single fdb transaction limits
  3. I'm using binary_to_term for things like the revision tree
  4. The revision tree has to fit in a single value
  5. There's basically 0 supported query string parameters at this point
  6. Nothing outside super basic db/doc ops is implemented (i.e., no views)

However, what it does do is start! And it talks to FoundationDB! So at
least that bit seems to be reliable (only tested on OS X via
FoundationDB binary installers so super duper caveat on that
"reliable").

There's a small test script [2] that shows what it's currently capable
of. A quick glance at that should give a pretty good idea of how
little is actually implemented in this prototype. There's also a list
of notes I've been keeping as I've been hacking on this that also
tries to gather a bunch of questions that'll need to be answered [3]
as we continue to work on this.

To that end, I have learned a couple of lessons from working with
FoundationDB on this that I'd like to share. First is that
while we can cache a bunch of stuff, we have to be able to ensure that
the cache is invalidated properly when various bits of metadata
change. There's a feature on FoundationDB master [4] for this specific
issue. I've faked the same behavior using an arbitrary key, but I think
the `fabric2_fdb:is_current/1` function is a good implementation
of this done correctly.
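
The shape of that check is basically the following (simplified sketch, not
the actual fabric2_fdb code; MetaKey is the arbitrary key I mentioned and
CachedVersion is whatever value the cached db handle was built against):

```
%% Sketch: re-read the metadata key at the start of a transaction and
%% compare it with the value the cached db handle was built against.
is_current(Tx, MetaKey, CachedVersion) ->
    case erlfdb:wait(erlfdb:get(Tx, MetaKey)) of
        CachedVersion -> true;
        _Changed -> false
    end.
```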

Secondly, I spent a lot of time trying to figure out how to use
FoundationDB's Directory and Subspace layers inside the CouchDB layer.
After barking up that tree for a long time I've basically decided that
the best answer is probably "don't". I do open a single directory at
the root, but that's merely in order to play nice with any other
layers that use the directory layer. Inside the "CouchDB directory"
its all strictly Tuple Layer direct code.

The Subspace Layer seems to be basically useless in Erlang. First, it's
a very thin wrapper over the Tuple Layer that basically just holds
onto a prefix that's prepended onto the tuple layer operations. In
other languages the Subspace Layer has a lot of syntactical sugar that
makes them useful. Erlang doesn't support any of that so it ends up
being more of a burden to use that rather than just using the Tuple
Layer directly. Dropping the use of directories and subspaces has
greatly simplified the implementation thus far.
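
For a flavor of what "Tuple Layer direct" looks like from Erlang
(illustrative only: the subspace tag value and helper names are made up,
and I'm going from memory on the erlfdb_tuple calls):

```
%% Assumed subspace tag for the doc store, purely for illustration.
-define(DOCS, 2).

%% DbPrefix is the per-database prefix fetched once via the root directory.
doc_key(DbPrefix, DocId) ->
    erlfdb_tuple:pack({?DOCS, DocId}, DbPrefix).

%% Start/end keys covering every document in the database.
doc_range(DbPrefix) ->
    erlfdb_tuple:range({?DOCS}, DbPrefix).
```

DbPrefix being the one thing we still derive from that single root
directory, fetched once and cached.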

In terms of code layout, nearly all of the new implementation is in
`src/fabric/src/fabric2*` modules. There's also a few changes to
chttpd obviously to call the new code as well as commenting out parts
of features so I didn't have to follow all the various call stacks
updating huge swathes of semi-unrelated code.

I'd be super interested to hear feedback and see people start running
with this in whatever direction catches their fancy. Hopefully this
proves useful for people to start writing implementations of the
various RFCs so we can make progress on those fronts.

[1] https://github.com/apache/couchdb/compare/prototype/fdb-layer
[2] https://github.com/apache/couchdb/blob/prototype/fdb-layer/fdb-test.py
[3] https://github.com/apache/couchdb/blob/prototype/fdb-layer/FDB_NOTES.md
[4] 
https://forums.foundationdb.org/t/a-new-tool-for-managing-layer-metadata/1191


Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-20 Thread Paul Davis
Strongly agree that we very much don't want to have Erlang-isms being
pushed into fdb. Regardless of what we end up with I'd like to see a
very strong (de)?serialization layer with some significant test
coverage.
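
Something along these lines is what I'm picturing (names invented, and
jiffy below is just a stand-in for whichever KV encoding wins out, not a
proposal to store JSON blobs): a single module that owns the encoding and
tags it with a storage version.

```
-define(CURR_DOC_VSN, 1).

%% Placeholder encoding; the point is the version tag and the single
%% chokepoint, not this particular byte format.
encode_doc_body(Body) ->
    {?CURR_DOC_VSN, jiffy:encode(Body)}.

decode_doc_body({?CURR_DOC_VSN, Bin}) ->
    jiffy:decode(Bin);
decode_doc_body({Vsn, _Bin}) ->
    erlang:error({unsupported_doc_storage_version, Vsn}).
```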

On Tue, Feb 19, 2019 at 6:54 PM Adam Kocoloski  wrote:
>
> Yes, that sort of versioning has been omitted from the various concrete 
> proposals but we definitely want to have it. We’ve seen the alternative in 
> some of the Erlang records that we serialize to disk today and it ain’t 
> pretty.
>
> I can imagine that we’ll want to have the codebase laid out in a way that 
> allows us to upgrade to a smarter KV encoding over time without major 
> surgery, which I think is a good “layer of abstraction”. I would be nervous 
> if we started having abstract containers of data structures pushed down into 
> FDB itself :)
>
> Adam
>
> > On Feb 19, 2019, at 5:41 PM, Paul Davis  wrote:
> >
> > A simple doc storage version number would likely be enough for future us to
> > do fancier things.
> >
> > On Tue, Feb 19, 2019 at 4:16 PM Benjamin Anderson 
> > wrote:
> >
> >>> I don’t think adding a layer of abstraction is the right move just yet,
> >> I think we should continue to find consensus on one answer to this question
> >>
> >> Agree that the theorycrafting stage is not optimal for making
> >> abstraction decisions, but I suspect it would be worthwhile somewhere
> >> between prototyping and releasing. Adam's proposal does seem to me the
> >> most appealing approach on the surface, and I don't see anyone signing
> >> up to do the work to deliver an alternative concurrently.
> >>
> >> --
> >> ba
> >>
> >> On Tue, Feb 19, 2019 at 1:43 PM Robert Samuel Newson 
> >> wrote:
> >>>
> >>> Addendum: By “directory aliasing” I meant within a document (either the
> >> actual Directory thing or something equivalent of our own making). The
> >> directory aliasing for each database is a good way to reduce key size
> >> without a significant cost. Though if Redwood lands in time, even this
> >> would become an inutile obfuscation].
> >>>
> >>>> On 19 Feb 2019, at 21:39, Robert Samuel Newson 
> >> wrote:
> >>>>
> >>>> Interesting suggestion, obviously the details might get the wrong kind
> >> of fun.
> >>>>
> >>>> Somewhere above I suggested this would be something we could change
> >> over time and even use different approaches for different documents within
> >> the same database. This is the long way of saying there are multiple ways
> >> to do this each with advantages and none without disadvantages.
> >>>>
> >>>> I don’t think adding a layer of abstraction is the right move just
> >> yet, I think we should continue to find consensus on one answer to this
> >> question (and the related ones in other threads) for the first release.
> >> It’s easy to say “we can change it later”, of course. We can, though it
> >> would be a chunk of work in the context of something that already works,
> >> I’ve rarely seen anyone sign up for that.
> >>>>
> >>>> I’m fine with the first proposal from Adam, where the keys are tuples
> >> of key parts pointing at terminal values. To make it easier for the first
> >> version, I would exclude optimisations like deduplication or the Directory
> >> aliasing or the schema thing that I suggested and that Ilya incorporated a
> >> variant of in a follow-up post. We’d accept that there are limits on the
> >> sizes of documents, including the awkward-to-express one about property
> >> depth.
> >>>>
> >>>> Stepping back, I’m not seeing any essential improvement over Adam’s
> >> original proposal besides the few corrections and clarifications made by
> >> various authors. Could we start an RFC based on Adam’s original proposal on
> >> document body, revision tree and index storage? We could then have PR’s
> >> against that for each additional optimisation (one person’s optimisation is
> >> another person’s needless complication)?
> >>>>
> >>>> If I’ve missed some genuine advance on the original proposal in this
> >> long thread, please call it out for me.
> >>>>
> >>>> B.
> >>>>
> >>>>> On 19 Feb 2019, at 21:15, Benjamin Anderson 
> >> wrote:
> >>>>>
> >>>>> As is evident by the length of this thread, there's a pr

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Paul Davis
Assuming we have multiple competing implementations, performance
characterization of each is going to have to be a requirement, right?

Theoretically this would all be at the HTTP layer such that we can swap
things in and out and easily check total system performance and not
duplicate effort and get distracted by pineapple vs recliner comparisons.

The theory being that we’d focus on observable differences weighed against
implementation complexity against feature “ability” in terms of what
various designs might offer.

I’ve been working on Erlang bindings for FoundationDB for a bit and would
really like to bring their approach to testing up through the rest of
CouchDB. It's very reminiscent of property-based testing, though slightly
less formal. But it certainly finds bugs. Anything we do, regardless of
performance, I think should be accompanied by a similarly thorough test suite.


On Tue, Feb 19, 2019 at 5:45 PM Joan Touzet  wrote:

> Would it be too much work to prototype both and check CRUD timings for
> each across a small variety of documents?
>
> -Joan
>
> - Original Message -----
> > From: "Paul Davis" 
> > To: dev@couchdb.apache.org
> > Sent: Tuesday, February 19, 2019 5:41:23 PM
> > Subject: Re: [DISCUSS] : things we need to solve/decide : storing JSON
> documents
> >
> > A simple doc storage version number would likely be enough for future
> > us to
> > do fancier things.
> >
> > On Tue, Feb 19, 2019 at 4:16 PM Benjamin Anderson
> > 
> > wrote:
> >
> > > > I don’t think adding a layer of abstraction is the right move
> > > > just yet,
> > > I think we should continue to find consensus on one answer to this
> > > question
> > >
> > > Agree that the theorycrafting stage is not optimal for making
> > > abstraction decisions, but I suspect it would be worthwhile
> > > somewhere
> > > between prototyping and releasing. Adam's proposal does seem to me
> > > the
> > > most appealing approach on the surface, and I don't see anyone
> > > signing
> > > up to do the work to deliver an alternative concurrently.
> > >
> > > --
> > > ba
> > >
> > > On Tue, Feb 19, 2019 at 1:43 PM Robert Samuel Newson
> > > 
> > > wrote:
> > > >
> > > > Addendum: By “directory aliasing” I meant within a document
> > > > (either the
> > > actual Directory thing or something equivalent of our own making).
> > > The
> > > directory aliasing for each database is a good way to reduce key
> > > size
> > > without a significant cost. Though if Redwood lands in time, even
> > > this
> > > would become an inutile obfuscation].
> > > >
> > > > > On 19 Feb 2019, at 21:39, Robert Samuel Newson
> > > > > 
> > > wrote:
> > > > >
> > > > > Interesting suggestion, obviously the details might get the
> > > > > wrong kind
> > > of fun.
> > > > >
> > > > > Somewhere above I suggested this would be something we could
> > > > > change
> > > over time and even use different approaches for different documents
> > > within
> > > the same database. This is the long way of saying there are
> > > multiple ways
> > > to do this each with advantages and none without disadvantages.
> > > > >
> > > > > I don’t think adding a layer of abstraction is the right move
> > > > > just
> > > yet, I think we should continue to find consensus on one answer to
> > > this
> > > question (and the related ones in other threads) for the first
> > > release.
> > > It’s easy to say “we can change it later”, of course. We can,
> > > though it
> > > would be a chunk of work in the context of something that already
> > > works,
> > > I’ve rarely seen anyone sign up for that.
> > > > >
> > > > > I’m fine with the first proposal from Adam, where the keys are
> > > > > tuples
> > > of key parts pointing at terminal values. To make it easier for the
> > > first
> > > version, I would exclude optimisations like deduplication or the
> > > Directory
> > > aliasing or the schema thing that I suggested and that Ilya
> > > incorporated a
> > > variant of in a follow-up post. We’d accept that there are limits
> > > on the
> > > sizes of documents, including the awkward-to-express one about
> > > property
> > > depth.
> > > > >

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Paul Davis
A simple doc storage version number would likely be enough for future us to
do fancier things.

On Tue, Feb 19, 2019 at 4:16 PM Benjamin Anderson 
wrote:

> > I don’t think adding a layer of abstraction is the right move just yet,
> I think we should continue to find consensus on one answer to this question
>
> Agree that the theorycrafting stage is not optimal for making
> abstraction decisions, but I suspect it would be worthwhile somewhere
> between prototyping and releasing. Adam's proposal does seem to me the
> most appealing approach on the surface, and I don't see anyone signing
> up to do the work to deliver an alternative concurrently.
>
> --
> ba
>
> On Tue, Feb 19, 2019 at 1:43 PM Robert Samuel Newson 
> wrote:
> >
> > Addendum: By “directory aliasing” I meant within a document (either the
> actual Directory thing or something equivalent of our own making). The
> directory aliasing for each database is a good way to reduce key size
> without a significant cost. Though if Redwood lands in time, even this
> would become an inutile obfuscation].
> >
> > > On 19 Feb 2019, at 21:39, Robert Samuel Newson 
> wrote:
> > >
> > > Interesting suggestion, obviously the details might get the wrong kind
> of fun.
> > >
> > > Somewhere above I suggested this would be something we could change
> over time and even use different approaches for different documents within
> the same database. This is the long way of saying there are multiple ways
> to do this each with advantages and none without disadvantages.
> > >
> > > I don’t think adding a layer of abstraction is the right move just
> yet, I think we should continue to find consensus on one answer to this
> question (and the related ones in other threads) for the first release.
> It’s easy to say “we can change it later”, of course. We can, though it
> would be a chunk of work in the context of something that already works,
> I’ve rarely seen anyone sign up for that.
> > >
> > > I’m fine with the first proposal from Adam, where the keys are tuples
> of key parts pointing at terminal values. To make it easier for the first
> version, I would exclude optimisations like deduplication or the Directory
> aliasing or the schema thing that I suggested and that Ilya incorporated a
> variant of in a follow-up post. We’d accept that there are limits on the
> sizes of documents, including the awkward-to-express one about property
> depth.
> > >
> > > Stepping back, I’m not seeing any essential improvement over Adam’s
> original proposal besides the few corrections and clarifications made by
> various authors. Could we start an RFC based on Adam’s original proposal on
> document body, revision tree and index storage? We could then have PR’s
> against that for each additional optimisation (one person’s optimisation is
> another person’s needless complication)?
> > >
> > > If I’ve missed some genuine advance on the original proposal in this
> long thread, please call it out for me.
> > >
> > > B.
> > >
> > >> On 19 Feb 2019, at 21:15, Benjamin Anderson 
> wrote:
> > >>
> > >> As is evident by the length of this thread, there's a pretty big
> > >> design space to cover here, and it seems unlikely we'll have arrived
> > >> at a "correct" solution even by the time this thing ships. Perhaps it
> > >> would be worthwhile to treat the in-FDB representation of data as a
> > >> first-class abstraction and support multiple representations
> > >> simultaneously?
> > >>
> > >> Obviously there's no such thing as a zero-cost abstraction - and I've
> > >> not thought very hard about how far up the stack the document
> > >> representation would need to leak - but supporting different layouts
> > >> (primarily, as Adam points out, on the document body itself) might
> > >> prove interesting and useful. I'm sure there are folks interested in a
> > >> column-shaped CouchDB, for example.
> > >>
> > >> --
> > >> b
> > >>
> > >> On Tue, Feb 19, 2019 at 11:39 AM Robert Newson 
> wrote:
> > >>>
> > >>> Good points on revtree, I agree with you we should store that
> intelligently to gain the benefits you mentioned.
> > >>>
> > >>> --
> > >>> Robert Samuel Newson
> > >>> rnew...@apache.org
> > >>>
> > >>> On Tue, 19 Feb 2019, at 18:41, Adam Kocoloski wrote:
> >  I do not think we should store the revtree as a blob. The design
> where
> >  each edit branch is its own KV should save on network IO and CPU
> cycles
> >  for normal updates. We’ve performed too many heroics to keep
> >  couch_key_tree from stalling entire databases when trying to update
> a
> >  single document with a wide revision tree, I would much prefer to
> ignore
> >  other edit branches entirely when all we’re doing is extending one
> of
> >  them.
> > 
> >  I also do not think we should store JSON documents as blobs, but
> it’s a
> >  closer call. Some of my reasoning for preferring the exploded path
> >  design:
> > 
> >  - it lends itself nicely to sub-document operations, 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Paul Davis
> I'm very interested in knowing if anyone else is interested in going this 
> simple, or considers it a wasted opportunity relative to the 'exploded' path.
>

Very interested because this is how the Record Layer stores their
protobuf messages.


Re: # [DISCUSS] : things we need to solve/decide : storage of edit conflicts

2019-02-11 Thread Paul Davis
Oh, for sure we can do that. I was just trying to think of a clever
way that replicated edits could also find their edit branch with a
single read instead of having to pull out the entire tree.

On Mon, Feb 11, 2019 at 9:04 AM Adam Kocoloski  wrote:
>
> Not sure I follow. If a transaction needs the full revision tree for a single 
> document it can retrieve that with a single range read for the (“_meta”, 
> DocID) prefix.
>
> Adam
>
> > On Feb 8, 2019, at 6:35 PM, Paul Davis  wrote:
> >
> > Ah, that all sounds good. The only thing I'm not initially seeing as
> > obvious is how we lookup a revision path to extend during replication
> > when the previous revision may be anywhere in the list of $revs_limit
> > revisions. Feels like there might be some sort of trickery to do that
> > efficiently. Although it may also be good enough to also just issue
> > $revs_limit lookups in parallel given that we're maxed out on either
> > $revs_limit or 2*$revs_limit if we have to check for both deleted and
> > not.
> >
> > On Fri, Feb 8, 2019 at 10:22 AM Adam Kocoloski  wrote:
> >>
> >> Totally unacceptable! ;)  In fact some key bits of that model got 
> >> dispersed into at least two separate emails so you’re likely not the only 
> >> one. I’ll restate here:
> >>
> >> The size limits in FoundationDB preclude us from storing the entire key 
> >> tree as a single value; in pathological situations the tree could exceed 
> >> 100KB. Rather, I think it would make sense to store each edit *branch* as 
> >> a separate KV. We stem the branch long before it hits the value size 
> >> limit, and in the happy case of no edit conflicts this means we store the 
> >> edit history metadata in a single KV. It also means that we can apply an 
> >> interactive edit without retrieving the entire conflicted revision tree; 
> >> we need only retrieve and modify the single branch against which the edit 
> >> is being applied. The downside is that we may duplicate historical 
> >> revision identifiers shared by multiple edit branches, but I think this is 
> >> a worthwhile tradeoff.
> >>
> >> I’d also ensure that a document update only needs to read the edit branch 
> >> KV against which the update is being applied, and it can read that branch 
> >> immediately knowing only the content of the edit that is being attempted 
> >> (i.e., it does not need to read the current version of the document 
> >> itself).
> >>
> >> I think we achieve these goals with a separate subspace (maybe “_meta”?) 
> >> for the revision trees, with keys and values that look like
> >>
> >> (“_meta”, DocID, NotDeleted, RevPosition, RevHash) = 
> >> (VersionstampForCurrentRev, [ParentRev, GrandparentRev, …])
> >>
> >> Some notes:
> >>
> >> - including IsDeleted ensures that we can efficiently accept the case 
> >> where we upload a new document with the same ID where all previous edit 
> >> branches have been deleted; i.e. we can construct a key selector which 
> >> automatically tells us there are no deleted=false edit branches
> >> - access to VersionstampForCurrentRev ensures we can clear the old entry 
> >> in the by_seq space during the update
> >> - i need to remind myself how we handle an edit attempt which supplies a 
> >> _rev representing a deleted leaf. Do we fail that as a conflict? That 
> >> would be the natural thing to do here, otherwise we’re forced to check 
> >> both deleted=false and deleted=true keys
> >> - the keys can be made to naturally sort so that the winning revision 
> >> sorts last, but I don’t believe that’s a requirement here like it is for 
> >> the actual document data space
> >>
> >> Cheers, Adam
> >>
> >>> On Feb 8, 2019, at 8:59 AM, Paul Davis  
> >>> wrote:
> >>>
> >>>> I’m relatively happy with the revision history data model at this point.
> >>>
> >>> I forgot to make a note, but which of the various models are you
> >>> referring to by "revision history data model". There's been so many
> >>> without firm names that my brain is having a hard time parsing that
> >>> one.
> >>>
> >>> On Thu, Feb 7, 2019 at 9:35 PM Adam Kocoloski  wrote:
> >>>>
> >>>> Bob, Garren, Jan - heard you loud and clear, K.I.S.S. I do think it’s a 
> >>>> bit “simplistic" to exclusively choose simplicity over performance and 
> >>>> s

Re: # [DISCUSS] : things we need to solve/decide : storage of edit conflicts

2019-02-09 Thread Paul Davis
The _by_field indexes are a bit worrisome there. We've had users toss
UUIDs and email addresses into keys, which then blows the size of those
indexes through the roof and causes all sorts of badness.

On Fri, Feb 8, 2019 at 6:48 PM Ilya Khlopotov  wrote:
>
> # Data model without support for per key revisions
>
> In this model "per key revisions" support was sacrificed so we can avoid 
> doing read of previous revision of the document when we write new version of 
> it.
>
> # Ranges used in the model
>
> - `{NS} / _mapping / _last_field_id
> - `{NS} / _mapping / _by_field / {field_name} = field_id` # we would cache it 
> in Layer's memory
> - `{NS} / _mapping / _by_field_id / {field_id} = field_name` # we would cache 
> it in Layer's memory
> - `{NS} / {docid} / _info` = '{"scheme": {scheme_name} / {scheme_revision}, 
> "revision": {revision}}'
> - `{NS} / {docid} / _data / {compressed_json_path} = latest_value | part`
> - `{NS} / {docid} / {revision} / _info` = '{"scheme": {scheme_name} / 
> {scheme_revision}}'
> - `{NS} / {docid} / {revision} / _data / {compressed_json_path} = value | 
> part`
> - `{NS} / {docid} / _index / _revs / {is_deleted} / {rev_pos} / {revision} = 
> {parent_revision}`
> - `{NS} / _index / _by_seq / {seq}` = "{docid} / {revision}" # seq is a FDB 
> versionstamp
>
> We would have few special documents:
> - "_schema / {schema_name}" - this doc would contain validation rules for 
> schema (not used in MVP).
> - when we start using schema we would be able to populate `{NS} / _mapping / 
> xxx` range when we write schema document
> - the schema document MUST fit into 100K (we don't use flatten JSON model for 
> it)
>
> # JSON path compression
>
> - Assign integer field_id to every unique field_name of a JSON document 
> starting from 10.
> - We would use first 10 integers to encode type of the value:
>   - 0 - the value is an array
>   - 1 - the value is a big scalar value broken down into multiple parts
>   - 2..10 -- reserved for future use
> - Replace field names in JSON path with field IDs
>
> ## Example of compressed JSON
> ```
> {
> foo: {
> bar: {
>   baz: [1, 2, 3]
> },
> langs: {
>"en_US": "English",
>"en_UK": "English (UK)"
>"en_CA": "English (Canada)",
>"zh_CN": "Chinese (China)"
> },
> translations: {
>"en_US": {
>"license": "200 Kb of text"
>}
> }
> }
> }
> ```
> this document would be compressed into
> ```
> # written in separate transaction and cached in the Layer
> {NS} / _mapping / _by_field / foo = 10
> {NS} / _mapping / _by_field / bar = 12
> {NS} / _mapping / _by_field / baz = 11
> {NS} / _mapping / _by_field / langs = 18
> {NS} / _mapping / _by_field / en_US = 13
> {NS} / _mapping / _by_field / en_UK = 14
> {NS} / _mapping / _by_field / en_CA = 15
> {NS} / _mapping / _by_field / zh_CN = 16
> {NS} / _mapping / _by_field / translations = 17
> {NS} / _mapping / _by_field / license = 19
> {NS} / _mapping / _by_field_id / 10 = foo
> {NS} / _mapping / _by_field_id / 12 = bar
> {NS} / _mapping / _by_field_id / 11 = baz
> {NS} / _mapping / _by_field_id  / 18 = langs
> {NS} / _mapping / _by_field_id  / 13 = en_US
> {NS} / _mapping / _by_field_id  / 14 = en_UK
> {NS} / _mapping / _by_field_id  / 15 = en_CA
> {NS} / _mapping / _by_field_id  / 16 = zh_CN
> {NS} / _mapping / _by_field_id  / 17 = translations
> {NS} / _mapping / _by_field_id  / 19 = license
>
> # written on document write
> {NS} / {docid} / _data / 10 /12 / 11 / 0 / 0 = 1
> {NS} / {docid} / _data / 10 /12 / 11 / 0 / 1 = 2
> {NS} / {docid} / _data / 10 /12 / 11 / 0 / 2 = 3
> {NS} / {docid} / _data / 10 / 18 / 13 = English
> {NS} / {docid} / _data / 10 / 18 / 14 = English (UK)
> {NS} / {docid} / _data / 10 / 18 / 15 = English (Canada)
> {NS} / {docid} / _data / 10 / 18 / 16 = Chinese (China)
> {NS} / {docid} / _data / 10 / 17 / 13 / 19 / 1 / 0 = first 100K of license
> {NS} / {docid} / _data / 10 / 17 / 13 / 19 / 1 / 1 = second 100K of license
> ```
>
> # Operations
>
>
> ## Read latest revision
>
> - We do range read "{NS} / {docid}" and assemble documents using results of 
> the query.
> - If we cannot find field_id in Layer's cache we would read "{NS} / _mapping 
> / _by_field_id " range and cache the result.
>
> ## Read specified revision
>
> - Do a range read "`{NS} / {docid} / {revision} /" and assemble document 
> using result of the query
> - If we cannot find field_id in Layer's cache we would read "{NS} / _mapping 
> / _by_field_id " range and cache the result.
>
> ## Write
>
> - flatten JSON
> - check if we there are missing fields in field cache of the Layer
> - if the keys are missing start key allocation transaction
>   - read "{NS} / _mapping / _by_field / {field_name}"
> - if it doesn't exists add key to the write conflict range (the FDB would 
> do it by default)
>   - `field_idx = txn["{NS} / _mapping / _last_field_id"] + 1` and add it to 
> 

Re: # [DISCUSS] : things we need to solve/decide : storage of edit conflicts

2019-02-08 Thread Paul Davis
Ah, that all sounds good. The only thing I'm not initially seeing as
obvious is how we look up a revision path to extend during replication
when the previous revision may be anywhere in the list of $revs_limit
revisions. Feels like there might be some sort of trickery to do that
efficiently. Although it may also be good enough to just issue
$revs_limit lookups in parallel, given that we're maxed out on either
$revs_limit or 2*$revs_limit if we have to check for both deleted and
not.
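
Something like this is what I mean by parallel lookups (rough sketch with
erlfdb call names from memory; the key layout follows the ("_meta", DocId,
NotDeleted, RevPos, RevHash) shape you described, everything else is
invented for illustration):

```
%% Fire off all candidate branch reads at once, then wait on the futures.
lookup_branches(Tx, MetaPrefix, DocId, RevKeys) ->
    Futures = [begin
        FdbKey = erlfdb_tuple:pack(
            {DocId, NotDeleted, RevPos, RevHash}, MetaPrefix),
        {RevKey, erlfdb:get(Tx, FdbKey)}
    end || {NotDeleted, RevPos, RevHash} = RevKey <- RevKeys],
    [{RevKey, erlfdb:wait(Future)} || {RevKey, Future} <- Futures].
```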

On Fri, Feb 8, 2019 at 10:22 AM Adam Kocoloski  wrote:
>
> Totally unacceptable! ;)  In fact some key bits of that model got dispersed 
> into at least two separate emails so you’re likely not the only one. I’ll 
> restate here:
>
> The size limits in FoundationDB preclude us from storing the entire key tree 
> as a single value; in pathological situations the tree could exceed 100KB. 
> Rather, I think it would make sense to store each edit *branch* as a separate 
> KV. We stem the branch long before it hits the value size limit, and in the 
> happy case of no edit conflicts this means we store the edit history metadata 
> in a single KV. It also means that we can apply an interactive edit without 
> retrieving the entire conflicted revision tree; we need only retrieve and 
> modify the single branch against which the edit is being applied. The 
> downside is that we may duplicate historical revision identifiers shared by 
> multiple edit branches, but I think this is a worthwhile tradeoff.
>
> I’d also ensure that a document update only needs to read the edit branch KV 
> against which the update is being applied, and it can read that branch 
> immediately knowing only the content of the edit that is being attempted 
> (i.e., it does not need to read the current version of the document itself).
>
> I think we achieve these goals with a separate subspace (maybe “_meta”?) for 
> the revision trees, with keys and values that look like
>
> (“_meta”, DocID, NotDeleted, RevPosition, RevHash) = 
> (VersionstampForCurrentRev, [ParentRev, GrandparentRev, …])
>
> Some notes:
>
> - including IsDeleted ensures that we can efficiently accept the case where 
> we upload a new document with the same ID where all previous edit branches 
> have been deleted; i.e. we can construct a key selector which automatically 
> tells us there are no deleted=false edit branches
> - access to VersionstampForCurrentRev ensures we can clear the old entry in 
> the by_seq space during the update
> - I need to remind myself how we handle an edit attempt which supplies a _rev 
> representing a deleted leaf. Do we fail that as a conflict? That would be the 
> natural thing to do here, otherwise we’re forced to check both deleted=false 
> and deleted=true keys
> - the keys can be made to naturally sort so that the winning revision sorts 
> last, but I don’t believe that’s a requirement here like it is for the actual 
> document data space
>
> Cheers, Adam
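A rough sketch of the branch-per-KV layout described above, using plain Python tuples in place of packed FDB tuple-layer keys; the subspace label, the example revisions and the versionstamp bytes are placeholders, not the real encoding:

```
def branch_key(ns, doc_id, deleted, rev_pos, rev_hash):
    # NotDeleted keeps deleted and live edit branches in separate runs of
    # the keyspace, so a key selector can check for live branches without
    # scanning the deleted ones.
    return (ns, "_meta", doc_id, 0 if deleted else 1, rev_pos, rev_hash)

def branch_value(versionstamp, rev_history):
    # rev_history is [ParentRev, GrandparentRev, ...], stemmed at
    # _revs_limit so the value stays well under FDB's 100 kB limit.
    return (versionstamp, tuple(rev_history))

key = branch_key("db0", "doc1", deleted=False, rev_pos=4, rev_hash="c0ffee")
val = branch_value(b"\x00" * 10, ["9f2aab", "77b0d3", "120ac4"])
```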
>
> > On Feb 8, 2019, at 8:59 AM, Paul Davis  wrote:
> >
> >> I’m relatively happy with the revision history data model at this point.
> >
> > I forgot to make a note, but which of the various models are you
> > referring to by "revision history data model". There's been so many
> > without firm names that my brain is having a hard time parsing that
> > one.
> >
> > On Thu, Feb 7, 2019 at 9:35 PM Adam Kocoloski  wrote:
> >>
> >> Bob, Garren, Jan - heard you loud and clear, K.I.S.S. I do think it’s a 
> >> bit “simplistic" to exclusively choose simplicity over performance and 
> >> storage density. We’re (re)building a database here, one that has some 
> >> users with pretty demanding performance and scalability requirements. And 
> >> yes, we should certainly be testing and measuring. Kyle and team are 
> >> setting up infrastructure in IBM land to help with that now, but I also 
> >> believe we can design the data model and architecture with a basic 
> >> performance model of FoundationDB in mind:
> >>
> >> - reads cost 1ms
> >> - short range reads are the same cost as a single lookup
> >> - reads of independent parts of the keyspace can be parallelized for cheap
> >> - writes are zero-cost until commit time
> >>
> >> We ought to be able to use these assumptions to drive some decisions about 
> >> data models ahead of any end-to-end performance test.
> >>
> >> If there are specific elements of the edit conflicts management where you 
> >> think greater simplicity is warranted, let’s get those called out. Ilya 
> >> noted (correctly, in my opinion) that the term sharing stuff is one of 
> >> those 

Re: # [DISCUSS] : things we need to solve/decide : storage of edit conflicts

2019-02-08 Thread Paul Davis
On Thu, Feb 7, 2019 at 9:35 PM Adam Kocoloski  wrote:
>
> Bob, Garren, Jan - heard you loud and clear, K.I.S.S. I do think it’s a bit 
> “simplistic" to exclusively choose simplicity over performance and storage 
> density. We’re (re)building a database here, one that has some users with 
> pretty demanding performance and scalability requirements. And yes, we should 
> certainly be testing and measuring. Kyle and team are setting up 
> infrastructure in IBM land to help with that now, but I also believe we can 
> design the data model and architecture with a basic performance model of 
> FoundationDB in mind:
>
> - reads cost 1ms
> - short range reads are the same cost as a single lookup
> - reads of independent parts of the keyspace can be parallelized for cheap
> - writes are zero-cost until commit time
>
> We ought to be able to use these assumptions to drive some decisions about 
> data models ahead of any end-to-end performance test.
>
> If there are specific elements of the edit conflicts management where you 
> think greater simplicity is warranted, let’s get those called out. Ilya noted 
> (correctly, in my opinion) that the term sharing stuff is one of those items. 
> It’s relatively complex, potentially a performance hit, and only saves on 
> storage density in the corner case of lots of edit conflicts. That’s a good 
> one to drop.
>
> I’m relatively happy with the revision history data model at this point. 
> Hopefully folks find it easy to grok, and it’s efficient for both reads and 
> writes. It costs some extra storage for conflict revisions compared to the 
> current tree representation (up to 16K per edit branch, with default 
> _revs_limit) but knowing what we know about the performance death spiral for 
> wide revision trees today I’ll happily make a storage vs. performance 
> tradeoff here :)
>
> Setting the shared term approach aside, I’ve still been mulling over the key 
> structure for the actual document data:
>
> -  I thought about trying to construct a special _conflicts subspace, but I 
> don’t like that approach because the choice of a “winning" revision can flip 
> back and forth very quickly with concurrent writers to different edit 
> branches. I think we really want to have a way for revisions to naturally 
> sort themselves so the winner is the first or last revision in a list.
>
> - Assuming we’re using key paths of the form (docid, revision-ish, path, to, 
> field), the goal here is to find an efficient way to get the last key with 
> prefix “docid” (assuming winner sorts last), and then all the keys that share 
> the same (docid, revision-ish) prefix as that one. I see two possible 
> approaches so far, neither perfect:
>
> Option 1: Execute a get_key() operation with a key selector that asks for the 
> last key less than “docid\xFF” (again assuming winner sorts last), and then 
> do a get_range_startswith() request setting the streaming mode to “want_all” 
> and the prefix to the docid plus whatever revision-ish we found from the 
> get_key() request. This is two roundtrips instead of one, but it always 
> retrieves exactly the right set of keys, and the second step is executed as 
> fast as possible.
>
> Option 2: Jump straight to get_range_startswith() request using only “docid” 
> as the prefix, then cancel the iteration once we reach a revision not equal 
> to the first one we see. We might transfer too much data, or we might end up 
> doing multiple roundtrips if the default “iterator” streaming mode sends too 
> little data to start (I haven’t checked what the default iteration block is 
> there), but in the typical case of zero edit conflicts we have a good chance 
> of retrieving the full document in one roundtrip.
>

I'm working through getting fdb bindings written and fully tested so
have a couple notes for the range streaming bits.

The first somewhat surprising bit was that the want_all streaming mode
doesn't actually return all rows in a single call. If the range is
large enough then a client has to invoke the get_range a number of
times tweaking parameters to know when to stop. I just ran into that
last night but discovered the logic isn't terribly complicated. The
Python implementation is at [1] for reference.

> I don’t have a good sense of which option wins out here from a performance 
> perspective, but they’re both operating on the same data model so easy enough 
> to test the alternatives. The important bit is getting the revision-ish 
> things to sort correctly. I think we can do that by generating something like
>

There are a number of different streaming modes and my wager is
that we'll have to play with them a bit to get an idea of which is
best, and likely it'll be a question of which is best in which scenario. For
instance, when we have a defined range, want_all likely wins, but in
the "iterate until next revision found" situation I'd wager the
`iterator` mode would likely be best as it appears to adaptively size
row batches.
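As a hedged illustration of what the Option 2 shape might look like with the Python bindings: the function, the prefix handling and the reverse scan are assumptions, and the streaming-mode choice is exactly the knob that needs measuring.

```
import fdb
fdb.api_version(600)

@fdb.transactional
def read_winner(tr, doc_prefix):
    # Option 2, sketched: assuming the winning revision sorts last under the
    # docid prefix, scan in reverse in `iterator` mode and stop as soon as a
    # second revision shows up. The revision-ish is assumed to be the first
    # tuple element after the prefix.
    winner = None
    rows = []
    kvs = tr.get_range_startswith(doc_prefix, reverse=True,
                                  streaming_mode=fdb.StreamingMode.iterator)
    for kv in kvs:
        rev = fdb.tuple.unpack(kv.key[len(doc_prefix):])[0]
        if winner is None:
            winner = rev
        elif rev != winner:
            break
        rows.append((kv.key, kv.value))
    return winner, list(reversed(rows))
```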

> revision-ish = 

Re: # [DISCUSS] : things we need to solve/decide : storage of edit conflicts

2019-02-08 Thread Paul Davis
> I’m relatively happy with the revision history data model at this point.

I forgot to make a note, but which of the various models are you
referring to by "revision history data model". There's been so many
without firm names that my brain is having a hard time parsing that
one.

On Thu, Feb 7, 2019 at 9:35 PM Adam Kocoloski  wrote:
>
> Bob, Garren, Jan - heard you loud and clear, K.I.S.S. I do think it’s a bit 
> “simplistic" to exclusively choose simplicity over performance and storage 
> density. We’re (re)building a database here, one that has some users with 
> pretty demanding performance and scalability requirements. And yes, we should 
> certainly be testing and measuring. Kyle and team are setting up 
> infrastructure in IBM land to help with that now, but I also believe we can 
> design the data model and architecture with a basic performance model of 
> FoundationDB in mind:
>
> - reads cost 1ms
> - short range reads are the same cost as a single lookup
> - reads of independent parts of the keyspace can be parallelized for cheap
> - writes are zero-cost until commit time
>
> We ought to be able to use these assumptions to drive some decisions about 
> data models ahead of any end-to-end performance test.
>
> If there are specific elements of the edit conflicts management where you 
> think greater simplicity is warranted, let’s get those called out. Ilya noted 
> (correctly, in my opinion) that the term sharing stuff is one of those items. 
> It’s relatively complex, potentially a performance hit, and only saves on 
> storage density in the corner case of lots of edit conflicts. That’s a good 
> one to drop.
>
> I’m relatively happy with the revision history data model at this point. 
> Hopefully folks find it easy to grok, and it’s efficient for both reads and 
> writes. It costs some extra storage for conflict revisions compared to the 
> current tree representation (up to 16K per edit branch, with default 
> _revs_limit) but knowing what we know about the performance death spiral for 
> wide revision trees today I’ll happily make a storage vs. performance 
> tradeoff here :)
>
> Setting the shared term approach aside, I’ve still been mulling over the key 
> structure for the actual document data:
>
> -  I thought about trying to construct a special _conflicts subspace, but I 
> don’t like that approach because the choice of a “winning" revision can flip 
> back and forth very quickly with concurrent writers to different edit 
> branches. I think we really want to have a way for revisions to naturally 
> sort themselves so the winner is the first or last revision in a list.
>
> - Assuming we’re using key paths of the form (docid, revision-ish, path, to, 
> field), the goal here is to find an efficient way to get the last key with 
> prefix “docid” (assuming winner sorts last), and then all the keys that share 
> the same (docid, revision-ish) prefix as that one. I see two possible 
> approaches so far, neither perfect:
>
> Option 1: Execute a get_key() operation with a key selector that asks for the 
> last key less than “docid\xFF” (again assuming winner sorts last), and then 
> do a get_range_startswith() request setting the streaming mode to “want_all” 
> and the prefix to the docid plus whatever revision-ish we found from the 
> get_key() request. This is two roundtrips instead of one, but it always 
> retrieves exactly the right set of keys, and the second step is executed as 
> fast as possible.
>
> Option 2: Jump straight to get_range_startswith() request using only “docid” 
> as the prefix, then cancel the iteration once we reach a revision not equal 
> to the first one we see. We might transfer too much data, or we might end up 
> doing multiple roundtrips if the default “iterator” streaming mode sends too 
> little data to start (I haven’t checked what the default iteration block is 
> there), but in the typical case of zero edit conflicts we have a good chance 
> of retrieving the full document in one roundtrip.
>
> I don’t have a good sense of which option wins out here from a performance 
> perspective, but they’re both operating on the same data model so easy enough 
> to test the alternatives. The important bit is getting the revision-ish 
> things to sort correctly. I think we can do that by generating something like
>
> revision-ish = NotDeleted/1bit : RevPos : RevHash
>
> with some suitable order-preserving encoding on the RevPos integer.
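A tiny sketch of how that ordering falls out if the revision-ish is a (NotDeleted, RevPos, RevHash) tuple. Plain Python tuples are used here, but the FDB tuple layer orders integers and byte strings the same way, so sorted() mirrors the key order we'd get back from a range read:

```
revisions = [
    (1, 3, bytes.fromhex("aa10")),  # live branch at rev_pos 3
    (0, 5, bytes.fromhex("01ff")),  # deleted branch sorts before live ones
    (1, 4, bytes.fromhex("1b0e")),  # conflicting live branch at rev_pos 4
    (1, 4, bytes.fromhex("9c2b")),  # live, higher hash -> expected winner
]
print(sorted(revisions)[-1])        # (1, 4, b'\x9c+') sorts last, i.e. wins
```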
>
> Apologies for the long email. Happy for any comments, either here or over on 
> IRC. Cheers,
>
> Adam
>
> > On Feb 7, 2019, at 4:52 PM, Robert Newson  wrote:
> >
> > I think we should choose simple. We can then see if performance is too low 
> > or storage overhead too high and then see what we can do about it.
> >
> > B.
> >
> > --
> >  Robert Samuel Newson
> >  rnew...@apache.org
> >
> > On Thu, 7 Feb 2019, at 20:36, Ilya Khlopotov wrote:
> >> We cannot do simple thing if we want to support sharing of JSON terms. I
> >> think if we 

Re: # [DISCUSS] : things we need to solve/decide : storage of edit conflicts

2019-02-08 Thread Paul Davis
Cheers to that Garren!

Whatever we decide on for the data model I'd like to see a fairly
extensive property based test suite around it. I almost said for
anything above chunked based storage but even for that I'd think that
I'd still want property testing around various keys and tree
mutations.

On Fri, Feb 8, 2019 at 12:45 AM Garren Smith  wrote:
>
> Hi Adam,
>
> Thanks for the detailed email. In terms of the data model, that makes a lot
> of sense.
>
> I’m still playing a bit of catchup on understanding how fdb works, so I
> can’t comment on the best way to retrieve a document.
>
> From my side, I would like to see our decisions also driven by testing and
> validating that our data model works. I find the way that fdb was tested
> and built really impressive. I would love to see us apply some of that to
> the way we build our CouchDB layer.
>
> Cheers
> Garren
>
> On Fri, Feb 8, 2019 at 5:35 AM Adam Kocoloski  wrote:
>
> > Bob, Garren, Jan - heard you loud and clear, K.I.S.S. I do think it’s a
> > bit “simplistic" to exclusively choose simplicity over performance and
> > storage density. We’re (re)building a database here, one that has some
> > users with pretty demanding performance and scalability requirements. And
> > yes, we should certainly be testing and measuring. Kyle and team are
> > setting up infrastructure in IBM land to help with that now, but I also
> > believe we can design the data model and architecture with a basic
> > performance model of FoundationDB in mind:
> >
> > - reads cost 1ms
> > - short range reads are the same cost as a single lookup
> > - reads of independent parts of the keyspace can be parallelized for cheap
> > - writes are zero-cost until commit time
> >
> > We ought to be able to use these assumptions to drive some decisions about
> > data models ahead of any end-to-end performance test.
> >
> > If there are specific elements of the edit conflicts management where you
> > think greater simplicity is warranted, let’s get those called out. Ilya
> > noted (correctly, in my opinion) that the term sharing stuff is one of
> > those items. It’s relatively complex, potentially a performance hit, and
> > only saves on storage density in the corner case of lots of edit conflicts.
> > That’s a good one to drop.
> >
> > I’m relatively happy with the revision history data model at this point.
> > Hopefully folks find it easy to grok, and it’s efficient for both reads and
> > writes. It costs some extra storage for conflict revisions compared to the
> > current tree representation (up to 16K per edit branch, with default
> > _revs_limit) but knowing what we know about the performance death spiral
> > for wide revision trees today I’ll happily make a storage vs. performance
> > tradeoff here :)
> >
> > Setting the shared term approach aside, I’ve still been mulling over the
> > key structure for the actual document data:
> >
> > -  I thought about trying to construct a special _conflicts subspace, but
> > I don’t like that approach because the choice of a “winning" revision can
> > flip back and forth very quickly with concurrent writers to different edit
> > branches. I think we really want to have a way for revisions to naturally
> > sort themselves so the winner is the first or last revision in a list.
> >
> > - Assuming we’re using key paths of the form (docid, revision-ish, path,
> > to, field), the goal here is to find an efficient way to get the last key
> > with prefix “docid” (assuming winner sorts last), and then all the keys
> > that share the same (docid, revision-ish) prefix as that one. I see two
> > possible approaches so far, neither perfect:
> >
> > Option 1: Execute a get_key() operation with a key selector that asks for
> > the last key less than “docid\xFF” (again assuming winner sorts last), and
> > then do a get_range_startswith() request setting the streaming mode to
> > “want_all” and the prefix to the docid plus whatever revision-ish we found
> > from the get_key() request. This is two roundtrips instead of one, but it
> > always retrieves exactly the right set of keys, and the second step is
> > executed as fast as possible.
> >
> > Option 2: Jump straight to get_range_startswith() request using only
> > “docid” as the prefix, then cancel the iteration once we reach a revision
> > not equal to the first one we see. We might transfer too much data, or we
> > might end up doing multiple roundtrips if the default “iterator” streaming
> > mode sends too little data to start (I haven’t checked what the default
> > iteration block is there), but in the typical case of zero edit conflicts
> > we have a good chance of retrieving the full document in one roundtrip.
> >
> > I don’t have a good sense of which option wins out here from a performance
> > perspective, but they’re both operating on the same data model so easy
> > enough to test the alternatives. The important bit is getting the
> > revision-ish things to sort correctly. I think we can do that by 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Paul Davis
Jiffy preserves duplicate keys if it's not decoding into a map (in
which case the last value for a duplicate key wins). It's a significant
corner case, and almost no other JSON library supports it, so changing
that shouldn't be considered a breaking change in my opinion.
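For comparison, most decoders quietly keep the last value for the example Mike gives below; here is Python's json module doing the same thing the map path in Jiffy does:

```
import json

# Duplicate keys are silently collapsed, keeping the last value.
print(json.loads('{ "foo": 1, "foo": 2 }'))   # {'foo': 2}
```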

On Wed, Jan 30, 2019 at 8:21 AM Mike Rhodes  wrote:
>
> From what I recall Jiffy is able to cope with the valid-but-kinda-silly[1] 
> thing where you have multiple JSON keys with the same name, i.e., { "foo": 1, 
> "foo": 2 }.
>
> Are the proposals on the table able to continue this support (or am I wrong 
> about Jiffy)?
>
> [1] https://tools.ietf.org/html/rfc8259#section-4, "The names within an 
> object SHOULD be unique.", though 
> https://tools.ietf.org/html/rfc7493#section-2.3 does sensibly close that down.
>
> --
> Mike.
>
> On Wed, 30 Jan 2019, at 13:33, Jan Lehnardt wrote:
> >
> >
> > > On 30. Jan 2019, at 14:22, Jan Lehnardt  wrote:
> > >
> > > Thanks Ilya for getting this started!
> > >
> > > Two quick notes on this one:
> > >
> > > 1. note that JSON does not guarantee object key order and that CouchDB 
> > > has never guaranteed it either, and with say emit(doc.foo, doc.bar), if 
> > > either emit() parameter was an object, the undefined-sort-order of 
> > > SpiderMonkey would mix things up. While worth bringing up, this is not a 
> > > BC break.
> > >
> > > 2. This would have the fun property of being able to rename a key inside 
> > > all docs that have that key.
> >
> > …in one short operation.
> >
> > Best
> > Jan
> > —
> > >
> > > Best
> > > Jan
> > > —
> > >
> > >> On 30. Jan 2019, at 14:05, Ilya Khlopotov  wrote:
> > >>
> > >> # First proposal
> > >>
> > >> In order to overcome FoundationDB limitations on key size (10 kB) and 
> > >> value size (100 kB) we could use the following approach.
> > >>
> > >> Below, the paths use slashes for illustration purposes only. We can 
> > >> use nested subspaces, tuples, directories or something else.
> > >>
> > >> - Store documents in a subspace or directory  (to keep prefix for a key 
> > >> short)
> > >> - When we store the document we would enumerate all field names (0 and 1 
> > >> are reserved) and store the mapping table in a key which looks like:
> > >> ```
> > >> {DB_DOCS_NS} / {DOC_KEY} / 0
> > >> ```
> > >> - Flatten the JSON document (convert it into key value pairs where the 
> > >> key is `JSON_PATH` and value is `SCALAR_VALUE`)
> > >> - Replace elements of JSON_PATH with integers from mapping table we 
> > >> constructed earlier
> > >> - When we have array use `1 / {array_idx}`
> > >> - Store scalar values in the keys which look like the following (we use 
> > >> `JSON_PATH` with integers).
> > >> ```
> > >> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}
> > >> ```
> > >> - If the scalar value exceeds 100kB we would split it and store every 
> > >> part under a key constructed as:
> > >> ```
> > >> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}
> > >> ```
> > >>
> > >> Since all parts of the documents are stored under a common `{DB_DOCS_NS} 
> > >> / {DOC_KEY}` they will be stored on the same server most of the time. 
> > >> The document can be retrieved by using range query 
> > >> (`txn.get_range("{DB_DOCS_NS} / {DOC_KEY} / 0", "{DB_DOCS_NS} / 
> > >> {DOC_KEY} / 0xFF")`). We can reconstruct the document since the mapping 
> > >> is returned as well.
> > >>
> > >> The downside of this approach is we wouldn't be able to ensure the same 
> > >> order of keys in the JSON object. Currently the `jiffy` JSON encoder 
> > >> respects order of keys.
> > >> ```
> > >> 4> jiffy:encode({[{bbb, 1}, {aaa, 12}]}).
> > >> <<"{\"bbb\":1,\"aaa\":12}">>
> > >> 5> jiffy:encode({[{aaa, 12}, {bbb, 1}]}).
> > >> <<"{\"aaa\":12,\"bbb\":1}">>
> > >> ```
> > >>
> > >> Best regards,
> > >> iilyak
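A short sketch of the part-splitting rule for oversized scalars from the proposal above; the chunk size is an assumption, and a real layer would only need to stay under FDB's 100 kB value limit:

```
# Parts land under {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}.
CHUNK = 90_000  # illustrative; anything below the 100 kB value limit works

def split_value(blob):
    return [(idx, blob[off:off + CHUNK])
            for idx, off in enumerate(range(0, len(blob), CHUNK))]

parts = split_value(b"x" * 250_000)
print([(idx, len(part)) for idx, part in parts])
# [(0, 90000), (1, 90000), (2, 70000)]
```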
> > >>
> > >> On 2019/01/30 13:02:57, Ilya Khlopotov  wrote:
> > >>> As you might already know, FoundationDB has a number of limitations 
> > >>> which influence the way we might store JSON documents. The limitations 
> > >>> are:
> > >>>
> > >>> | limitation             | recommended value | recommended max | absolute max |
> > >>> |------------------------|------------------:|----------------:|-------------:|
> > >>> | transaction duration   |                   |                 | 5 sec        |
> > >>> | transaction data size  |                   |                 | 10 Mb        |
> > >>> | key size               | 32 bytes          | 1 kB            | 10 kB        |
> > >>> | value size             |                   | 10 kB           | 100 kB       |
> > >>>
> > >>> In order to fit the JSON document into 100kB we would have to partition 
> > >>> it in some way. There are three ways of partitioning the document
> > >>> 1. store multiple binary blobs (parts) in different keys
> > >>> 2. flatten JSON structure and store every path leading to a 

Re: [couchdb] uuid cluster set up

2019-01-16 Thread Paul Davis
And for background, that value is used for replication checkpoints.
The consequence of changing it between upgrades is that you would be
resetting any replications into and out of the cluster. It wouldn't have
end-of-the-world consequences if it changed, but it would cause
downtime/replication delay. Depending on your particular
reliance on replications, that may or may not be an issue.

On Wed, Jan 16, 2019 at 1:13 PM Nick Vatamaniuc  wrote:
>
> Hi Ricardo,
>
> Make sure the value of the uuid is the same across all the nodes in a
> cluster and that it is preserved across upgrades.
>
> It turns out the uuid value doesn't have to be an actual uuid; it could be,
> for example, the hostname of the cluster, as long as it is expected to stay
> stable and not change.
>
> Cheers,
> -Nick
>
>
> On Wed, Jan 16, 2019 at 11:03 AM Ricardo 
> wrote:
>
> > Hi everyone!
> > I have two questions setting up a cluster of three nodes.
> >
> > 1.In a cluster with three nodes, in local.ini:
> >
> > [couchdb]
> > uuid = VALUE
> >
> > VALUE has to be different for each node or the same for all?
> >
> > 2. It is important to keep this uuid value between releases upgrades?
> >
> > I have this doubt because at
> > https://docs.couchdb.org/en/latest/setup/cluster.html
> > it says "*Use the first UUID as the cluster UUID.*". I interpret it as a
> > single UUID for the entire cluster.
> > And at https://docs.couchdb.org/en/latest/config/couchdb.html#couchdb/uuid
> > says
> > "*Unique identifier for this CouchDB server instance.*". I interpret it as
> > a UUID for each node; I think this is the correct one.
> >
> > Thanks!
> >


Re: [PROPOSAL] Change the minimum supported Erlang version to OTP 19

2018-12-20 Thread Paul Davis
+1
On Thu, Dec 20, 2018 at 8:15 AM Eiri  wrote:
>
> +1
>
>
> > On Dec 20, 2018, at 04:55, Jay Doane  wrote:
> >
> > Currently, CouchDB requires at least OTP 17 or later to build and run
> > [1][2]. However, recent work undertaken to eliminate compiler warnings
> > [3][4] has highlighted the additional effort needed to continue to support
> > older Erlang versions. Some of the issues that have come up are:
> > 1. erlang:now/0 deprecated in OTP 18 [5]
> > 2. crypto:rand_uniform/2 deprecated in OTP 20 [6], but no rand module
> > pre-OTP 18
> > which both require using rebar platform defines [7] and ifdefs [8] to work
> > around compiler warnings.
> >
> > Joan raised the idea that maybe it's time to move to a more recent minimum
> > version to simplify the code, and also because there a many compelling new
> > features in later versions that we currently cannot use:
> > 1. maps introduced in OTP 17, but only became performant for large number
> > of entries in OTP 18 [9]
> > 2. off heap messages introduced in OTP 19 [10]
> >
> > Since CouchDB now ships with its own OTP 19.6.3 Erlang binaries [9], it's
> > not clear whether we need to continue supporting OTP 17 and 18. As a bonus,
> > removing those versions will also speed up travis builds.
> >
> > Any thoughts either for or against this proposal?
> >
> > Best regards,
> > Jay
> >
> > [1] https://github.com/apache/couchdb/blob/master/rebar.config.script#L94
> > [2] https://github.com/apache/couchdb/blob/master/.travis.yml#L10
> > [3] https://github.com/apache/couchdb-ets-lru/pull/7
> > [4] https://github.com/apache/couchdb/pull/1798
> > [5] http://erlang.org/doc/apps/erts/time_correction.html
> > [6] http://erlang.org/pipermail/erlang-questions/2017-May/092435.html
> > [7]
> > https://github.com/apache/couchdb/blob/master/src/couch/rebar.config.script#L148-L154
> > [8]
> > https://github.com/apache/couchdb/blob/master/src/couch/src/couch_rand.erl#L22-L57
> > [9] http://erlang.org/download/otp_src_18.0.readme
> > [10]
> > https://www.erlang-solutions.com/blog/erlang-19-0-garbage-collector.html
> > [9] https://github.com/apache/couchdb-ci/blob/master/README.md
>


Re: [NOTICE] Mandatory relocation of Apache git repositories on git-wip-us.apache.org

2018-12-07 Thread Paul Davis
+1
On Fri, Dec 7, 2018 at 10:58 AM Joan Touzet  wrote:
>
> Requesting lazy consensus - does anyone have a problem with them
> starting the process to mass-migrate all of the remaining repos to
> gitbox?
>
> This means integrated access and easy PRs on repos like couchdb-admin,
> couchdb-ets-lru, etc.
>
> I can't imagine anyone will say no, but we need "documented support
> for the decision" from a mailing list post, so, here it is.
>
> -Joan
> - Forwarded Message -
> From: "Daniel Gruno" 
> To: us...@infra.apache.org
> Sent: Friday, December 7, 2018 11:52:36 AM
> Subject: [NOTICE] Mandatory relocation of Apache git repositories on 
> git-wip-us.apache.org
>
> [IF YOUR PROJECT DOES NOT HAVE GIT REPOSITORIES ON GIT-WIP-US PLEASE
>   DISREGARD THIS EMAIL; IT WAS MASS-MAILED TO ALL APACHE PROJECTS]
>
> Hello Apache projects,
>
> I am writing to you because you may have git repositories on the
> git-wip-us server, which is slated to be decommissioned in the coming
> months. All repositories will be moved to the new gitbox service which
> includes direct write access on github as well as the standard ASF
> commit access via gitbox.apache.org.
>
> ## Why this move? ##
> The move comes as a result of retiring the git-wip service, as the
> hardware it runs on is longing for retirement. In lieu of this, we
> have decided to consolidate the two services (git-wip and gitbox), to
> ease the management of our repository systems and future-proof the
> underlying hardware. The move is fully automated, and ideally, nothing
> will change in your workflow other than added features and access to
> GitHub.
>
> ## Timeframe for relocation ##
> Initially, we are asking that projects voluntarily request to move
> their repositories to gitbox, hence this email. The voluntary
> timeframe is between now and January 9th 2019, during which projects
> are free to either move over to gitbox or stay put on git-wip. After
> this phase, we will be requiring the remaining projects to move within
> one month, after which we will move the remaining projects over.
>
> To have your project moved in this initial phase, you will need:
>
> - Consensus in the project (documented via the mailing list)
> - File a JIRA ticket with INFRA to voluntarily move your project repos
>over to gitbox (as stated, this is highly automated and will take
>between a minute and an hour, depending on the size and number of
>your repositories)
>
> To sum up the preliminary timeline;
>
> - December 9th 2018 -> January 9th 2019: Voluntary (coordinated)
>relocation
> - January 9th -> February 6th: Mandated (coordinated) relocation
> - February 7th: All remaining repositories are mass migrated.
>
> This timeline may change to accommodate various scenarios.
>
> ## Using GitHub with ASF repositories ##
> When your project has moved, you are free to use either the ASF
> repository system (gitbox.apache.org) OR GitHub for your development
> and code pushes. To be able to use GitHub, please follow the primer
> at: https://reference.apache.org/committer/github
>
>
> We appreciate your understanding of this issue, and hope that your
> project can coordinate voluntarily moving your repositories in a
> timely manner.
>
> All settings, such as commit mail targets, issue linking, PR
> notification schemes etc will automatically be migrated to gitbox as
> well.
>
> With regards, Daniel on behalf of ASF Infra.
>
> PS:For inquiries, please reply to us...@infra.apache.org, not your
> project's dev list :-).
>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@community.apache.org
> For additional commands, e-mail: dev-h...@community.apache.org


Re: Elixir suite has landed on master!

2018-11-08 Thread Paul Davis
Yay!
On Thu, Nov 8, 2018 at 12:36 PM Adam Kocoloski  wrote:
>
> Oh, very cool. Glad to see that get merged!
>
> Adam
>
> > On Nov 8, 2018, at 11:23 AM, Joan Touzet  wrote:
> >
> > Resending since my first email didn't get through...
> >
> > In case you're not on notifications@, you may have missed that the
> > Elixir test suite has just landed on master.
> >
> > This is the heroic effort of many people to modernize our aging
> > JavaScript test suite by converting it to Elixir. The work is
> > ongoing, and if you know Elixir (or want to learn it), we'd love
> > your help!
> >
> > Replacing the JS test suite with an Elixir one is a key step in
> > our long term strategy to mitigate against our dependency on
> > SpiderMonkey 1.8.5. It also gives us a much richer environment in
> > which we can build and run tests.
> >
> > At the moment, this test suite is not run by the `make check`
> > command. You must run `make elixir` to run the Elixir test suite.
> > It's our intent to move `make check` over to the elixir tests
> > as soon as there is feature parity between the JS and Elixir
> > suites.
> >
> > If you do not have the dependencies installed that you need to
> > run Elixir, you can use the scripts in our apache/couchdb-ci
> > repository to install them for you. (Thanks, Ilya!) Clone
> > the repo, and run the `bin/install-elixir.sh` script
> > with root permissions. You must have `git`, `unzip` and `wget`
> > installed for this script to work.
> >
> > Thanks again to everyone who helped us get this far!
> > -Joan
>


Re: Exact definition of a database "active size"

2018-10-23 Thread Paul Davis
+1 to after compaction as well.
On Tue, Oct 23, 2018 at 4:04 AM Jan Lehnardt  wrote:
>
> +1 as well. This should be useful for folks who are in a tight spot with an 
> uncompacted database to know, roughly, how much free space they need for a 
> successful compaction.
>
> Thanks for working on clarifying this!
>
> Best
> Jan
> —
>
> > On 22. Oct 2018, at 23:47, Joan Touzet  wrote:
> >
> > +1 to Adam's definition, which I think is closest to the "former" 
> > definition in Eric's first post.
> >
> > -Joan
> >
> > - Original Message -
> > From: "Adam Kocoloski" 
> > To: "dev@couchdb.apache.org Developers" 
> > Sent: Monday, October 22, 2018 5:13:05 PM
> > Subject: Re: Exact definition of a database "active size"
> >
> > I think sizes.active should be a close approximation of the size of the 
> > database after compaction; i.e. it should be possible to use (sizes.file - 
> > sizes.active) as a way to estimate the number of bytes that can be 
> > reclaimed by compacting that database shard.
> >
> > Adam
> >
> >> On Oct 22, 2018, at 4:32 PM, Eiri  wrote:
> >>
> >> Dear all,
> >>
> >> I’d like to hear your opinion on how we should interpret a database 
> >> attribute “active size”.
> >>
> >> As you surely know we are using three different size attributes in a 
> >> database info: file - the size of the database file on disk; external - 
> >> the uncompressed size of database contents and active, defined as “the 
> >> size of live data inside the database” or “active byte in the current MVCC 
> >> snapshot”.
> >>
> >> Some time ago I had a discussion with Paul Davis and he pointed out the 
> >> ambiguity of that definition, namely: is it live data before a compaction 
> >> or after a compaction? To put it another way: should we treat as 
> >> “active” only the documents and attachments on btree leaves, or also 
> >> include the previous document revisions while they can still be 
> >> accessed? Codewise it is the latter, both in the current version of CouchDB 
> >> and in the 1.x version where active size was named data_size, but intuitively 
> >> it feels that it should be the former.
> >>
> >> Although it sounds academic, this is a practical question: the difference in 
> >> active size before and after compaction can be rather noticeable, and 
> >> since it is used as a trigger by the compaction daemon it could skew disk 
> >> usage patterns.
> >>
> >> Please share your thoughts. If we conclude that we want to change how 
> >> active size is calculated, I’m willing to take on the implementation, as I 
> >> have a recent PR around the same area of code.
> >>
> >>
> >> Regards,
> >> Eric
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >
>
> --
> Professional Support for Apache CouchDB:
> https://neighbourhood.ie/couchdb-support/
>


Re: [PROPOSAL] Officially deprecate CouchDB 1.x.

2018-07-05 Thread Paul Davis
+1
On Thu, Jul 5, 2018 at 5:18 AM Andy Wenk  wrote:
>
> +1
>
> --
> Andy Wenk
> Hamburg - Germany
> RockIt!
>
> GPG public key:
> http://pgp.mit.edu/pks/lookup?op=get=0x45D3565377F93D29
>
>
>
> > On 5. Jul 2018, at 11:21, Garren Smith  wrote:
> >
> > +1 to this as well. We just don't have enough dev's to do this, and I would
> > rather see us focusing on CouchDB 2.x
> >
> > On Thu, Jul 5, 2018 at 10:26 AM, Robert Samuel Newson 
> > wrote:
> >
> >> +1
> >>
> >> I am also for the proposal to officially deprecate couchdb 1.x.
> >>
> >> I would not want to see back-porting or new feature work in 1.x even if a
> >> theoretical 3 person team were to materialise. Of course any such team that
> >> does appear could fork couchdb 1.x and work on it independently (the only
> >> caveat being that it could not be called "couchdb").
> >>
> >> I agree with Joan and Jan that we are simply recognising reality here.
> >> There is no one developing on the 1.x branch and hasn't been for ages. It
> >> is time for us to officially let it go.
> >>
> >> B.
> >>
> >>> On 5 Jul 2018, at 09:18, Jan Lehnardt  wrote:
> >>>
> >>>
> >>>
>  On 4. Jul 2018, at 22:51, Joan Touzet  wrote:
> 
>  DISCLAIMER: This is a non-technical proposal to make a project decision.
>  Per our Bylaws[1], this means that it should "normally be made with lazy
>  consensus, or by the entire community using discussion-led
>  consensus-building, and not through formal voting." However, since the
>  intent is to make a significant policy change, this concrete proposal
>  should be considered as a *lazy consensus* decision with a *7 day*
 timeframe (expiring on or about 2018-07-11, 23:59 UTC). Please give this
>  thread your ample consideration.
> 
> 
>  I would like to table[2] a proposal to terminate official Apache support
>  for CouchDB 1.x. This means that:
> 
>  1. The Apache CouchDB project will no longer make new 1.x releases.
>  2. All remaining 1.x issues in JIRA and GH Issues will be closed.
>  3. Everyone can continue to use 1.x as long as they want.
>  4. People are welcome to continue discussing 1.x on the users@ list.
> 
> 
>  The reason is simple: no one is maintaining the 1.x.x branches of
>  CouchDB anymore. Issues stack up on the tracker[3] with no response.
>  Original grand plans of back-porting 2.x features such as Mango to 1.x
>  have not materialised. And when important security issues surface[4],
>  having to patch 1.x as well as 2.x slows down the security team's
>  ability to push releases quickly out the door.
> 
>  By focusing on what we do best - supporting 2.x and beyond with bug
>  fixes, new features, and ever-improving documentation and web UI - we
>  can improve our release cadence and avoid misleading our user base
>  with false promises.
> 
> 
>  THAT SAID: There are two important footnotes to the proposal.
> 
>  FIRST: If a group of interested maintainers start making active efforts
>  to improve 1.x branch upkeep, I can speak with the full authority of the
>  PMC to say that we'll endorse those efforts. But to un-mothball
>  1.x officially should require more than 1-2 people doing occasional
>  bugfixing work. I'd personally want to see at least a 3-person team
>  making sustained contributions to 1.x before re-activating official
>  releases. Also, that work would need to be in-line with work currently
>  happening on master; I wouldn't want to see new 1.x features materialise
>  that don't have parallel commits to master. (Much preferred would be to
>  see people fixing the things in 2.x that prevent people migrating off
>  of 1.x instead.)
> 
>  SECOND: Let a thousand forks bloom. If you're looking to build a CouchDB
>  1.x fork that has baked in geo/full text search, Mango, Fauxton, and
>  can run on VMS, OS/2 Warp 4, NeXTStep 3.x, and Palm, have at it. I'll
>  even write a blog post about your project. (Sounds interesting!)
> 
> 
>  Again, this proposal defaults to lazy consensus with a 7-day expiry
>  period. CouchDB committers have binding "votes" on this proposal.
> >>>
> >>> +1
> >>>
> >>> Best
> >>> Jan
> >>> —
> 
>  Thanks for your consideration,
>  Joan "to infinity, and beyond" Touzet
> 
> 
>  [1] http://couchdb.apache.org/bylaws.html#decisions
>  [2] In the non-U.S. sense of the word, i.e., meaning to begin
>   consideration of a proposal.
>  [3] https://s.apache.org/couchdb-1.x-issues
>  [4] https://s.apache.org/wdnW
> >>>
> >>> --
> >>> Professional Support for Apache CouchDB:
> >>> https://neighbourhood.ie/couchdb-support/
> >>
> >>
>


Re: Proposal: removing view changes code from mrview

2018-04-03 Thread Paul Davis
+1

On Tue, Apr 3, 2018 at 9:23 AM, Joan Touzet  wrote:

> +1.
>
> 1. No one has worked on a fix since its contribution prior to 2.0.
> 2. The code will always be in git in an older revision if someone is
> looking for it.
> 3. We have #592 which describes the fundamental problem that needs to be
> resolved. (By the way, with my PMC hat on, you should unassign this issue
> from yourself unless you're actively working on it *right now*.)
>
> - Original Message -
> From: "Eiri" 
> To: dev@couchdb.apache.org
> Sent: Tuesday, April 3, 2018 8:15:21 AM
> Subject: Proposal: removing view changes code from mrview
>
> Hi all,
>
> It is my understanding that the current implementation of view changes in
> mrview is conceptually broken. I heard from Robert Newson that he and
> Benjamin Bastian found this some time ago while testing deletion and
> recreation of docs emitting the same keys in the views.
>
> I propose removing the view changes code from mrview, and its mention from
> the documentation, as it seems people keep trying to use it for filtered
> replication or get a false impression that it's a simple fix in fabric.
> Not to mention that the current implementation considerably complicates the
> mrview code and takes up space in view files by building unneeded seq and
> kseq btrees.
>
> We can re-implement this feature later in a more robust way, as there is
> clearly demand for it. Please share your opinion.
>
>
> Regards,
> Eric
>


Re: [ANNOUNCE] Peng Hui Jiang elected as CouchDB committer

2018-03-03 Thread Paul Davis
Congrats!

On Sat, Mar 3, 2018 at 2:24 PM, Andy Wenk  wrote:

> Welcome Peng ;-)
>
> --
> Andy Wenk
> Hamburg - Germany
> RockIt!
>
> GPG public key:
> http://pgp.mit.edu/pks/lookup?op=get=0x45D3565377F93D29
>
>
>
> > On 3. Mar 2018, at 19:04, Nick Vatamaniuc  wrote:
> >
> > Congrats, Peng Hui!
> >
> > On 03/03/2018 12:55 PM, Michelle Phung wrote:
> >> Welcome!!
> >> - Michelle
> >>> On Mar 3, 2018, at 2:45 AM, Jan Lehnardt  wrote:
> >>>
> >>> Welcome to the team :)
> >>>
> >>> Cheers
> >>> Jan
> >>> —
> >>>
>  On 3. Mar 2018, at 05:16, Joan Touzet  wrote:
> 
>  Dear community,
> 
>  I am pleased to announce that the CouchDB Project Management Committee
>  has elected Peng Hui Jiang as a CouchDB committer.
> 
>    Apache ID: jiangphcn
> 
>    IRC nick: jiangph
> 
>  Committers are given a binding vote in certain project decisions, as
>  well as write access to public project infrastructure.
> 
>  This election was made in recognition of Peng Hui's commitment to the
>  project. We mean this in the sense of being loyal to the project and
>  its interests.
> 
>  Please join me in extending a warm welcome to Peng Hui!
> 
>  On behalf of the CouchDB PMC,
>  Joan
> >>>
> >
>
>


Re: [AMEND OFFICIAL DOCUMENTS] Bylaws revision: HTTP API change notifications

2018-02-12 Thread Paul Davis
Echoing my PR +1 here.

On Mon, Feb 12, 2018 at 10:39 AM, Joan Touzet <woh...@apache.org> wrote:

> Hi everyone,
>
> I have incorporated minor feedback from Robert Newson and Paul
> Davis.
>
> https://github.com/apache/couchdb-www/pull/27
>
> The changes reflect that we do not promise backwards compatibility
> for bugs in the published HTTP API, nor do we promise any sort
> of compatibility for undocumented behaviour (such as the change
> to database sequence values between 1.x and 2.x.)
>
> It also clarifies that the compatibility promise is only for
> _released_ versions of CouchDB.
>
> The vote passes, but because of the minor clarifications to the
> PR, I'll leave the PR for another 24 hours before merging.
>
> -Joan
>
> - Original Message -
> From: "Joan Touzet" <woh...@apache.org>
> To: priv...@couchdb.apache.org
> Cc: dev@couchdb.apache.org
> Sent: Thursday, 8 February, 2018 4:13:29 PM
> Subject: Re: [AMEND OFFICIAL DOCUMENTS] Bylaws revision: HTTP API change
> notifications
>
> All, I have word from Paul Davis that he Has Thoughts  on this,
> but is also not physically well, so he needs a bit more time to
> respond.
>
> If no one objects, I'll keep this vote open until he's able to reply.
> We hope that's by Monday.
>
> -Joan
>
> - Original Message -
> From: "Joan Touzet" <woh...@apache.org>
> To: dev@couchdb.apache.org
> Cc: "CouchDB PMC" <priv...@couchdb.apache.org>
> Sent: Thursday, 8 February, 2018 10:31:43 AM
> Subject: Re: [AMEND OFFICIAL DOCUMENTS] Bylaws revision: HTTP API change
> notifications
>
> This is my own +1 vote.
>
> With Alex and Jan's votes, this may now pass (in ~60 hours); other PMC
> members should speak up now if they have concerns. I would like to
> pass this key change with unanimous consent.
>
> -Joan
>
> - Original Message -
> From: "Joan Touzet" <woh...@apache.org>
> To: "CouchDB Developers" <dev@couchdb.apache.org>
> Cc: "CouchDB PMC" <priv...@couchdb.apache.org>
> Sent: Thursday, 8 February, 2018 12:25:13 AM
> Subject: [AMEND OFFICIAL DOCUMENTS] Bylaws revision: HTTP API change
> notifications
>
> Hello PMC Members and developers,
>
> Subsequent to the recent confusion over the proposed change to an
> as-yet-unreleased API, and follow-on concerns that existing
> HTTP API endpoints not be deprecated without sufficient warning, I am
> proposing a change to the CouchDB Bylaws.
>
> https://github.com/apache/couchdb-www/pull/27
>
> This change proposes a mandatory developer mailing list notification and
> at least a Lazy Consensus decision whenever a backwards-incompatible
> change is made to a release branch, or to master. It encourages the same
> for any non-breaking change, but stops short of requiring it.
>
> It also takes the step that changes to master constitute a technical
> decision of the project, since our intention is that master is always
> releasable, and quite often new minor releases of CouchDB are forked
> from master, not the previous major or minor release branch.
>
> As this is a "Create or amend any document marked as official"
> decision, it is being announced to the main development list. A lazy
> 2/3rds majority is required to pass the change, meaning to pass it must
> garner three or more binding +1 votes, and twice as many binding +1
> votes as binding -1 votes. Only PMC Members may cast binding votes. No
> vetoes are allowed.
>
> CouchDB PMC, please vote now.
>
> -Joan
>


Re: [RFC] On the Testing of CouchDB

2017-12-16 Thread Paul Davis
> The one thing that would be nice here is if it were easy to disable certain
> tests or suites that make no sense in the pouchdb-server environment, so
> they can easily integrate it in their CI.

The cool thing is that Elixir supports this natively: you can
add tags to tests to selectively enable/disable test classes, so this
will just be a matter of letting the pouchdb team disable anything
that doesn't make sense for their implementation.

> It would be great if we could use this opportunity to apply this across
> all JS test files when we port them to Elixir. It means a little bit
> more work per test file, but I hope with a few more contributors and
> guidelines, this is an easily paralleliseable task, so individual burden
> can be minimised.

My current approach so far is to try and first port the test directly
and then afterwards go back through and refactor things to be more
traditional. My thinking here was that the initial port could be
reviewed alongside the existing JS test to double check that we're not
dropping any important tests or assertions along the way before we
start moving a lot of code around. That said, Elixir still allows us
to break things up. For instance, the replication.js tests I've broken
up into a number of functions that still follow the same order as the
original suite, but once the initial port is done it'll be trivial to
split that out into a base class and then have each test extend from
there. It's also possible to generate tests, so the replication
tests that check for all combinations of local/remote source/target
pairs end up as separate tests.

> I noticed that one of the reduce tests took 30+ seconds to run on my
> machine and I experimented with different cluster configuration values
> and to nobodys surprise, the default of q=8 is the main factor in view
> test execution speed. q=4 takes ~20s, q=2 ~10s and q=1 ~5s. I’m not
> suggesting we set q=1 for all tests since q>1 is a behaviour we would
> want to test as well, but maybe we can set q=2 when running the test
> suite(s) for the time being. Shaving 25s off of a single test will get
> us a long way with all tests ported. What do others think?

I've noticed some pretty terrible slowness on OS X (which I'm assuming
you're running on) and chatting with Russel it appears that on Linux
there's a massive speed difference when running tests. I'd very much
prefer to keep our tests against a Q=3 cluster. I'd like to try and
dig in a bit to see if we can't figure out where we're having such a
dramatic time difference between the two. Hopefully some quick
measuring will point us to a knob to adjust to speed things up without
sacrificing cluster nodes during the tests.


Re: [RFC] On the Testing of CouchDB

2017-12-15 Thread Paul Davis
I went ahead and added a `make elixir` command to the elixir-suite branch.

Of note, my earlier instructions that referenced the elixir_suite
directory are now slightly different as I moved things to test/elixir
cause Russel was being slow.

Current rundown is now:

$ # Get Elixir installed as per previous
$ # Build CouchDB as per previous
$ make elixir

The `make elixir` target is currently not integrated with dependencies
(i.e., you need to run `make` on your own) and also doesn't have the
fancy things for running a single test or anything. And I haven't put
it as part of `make check` itself.

On Fri, Dec 15, 2017 at 11:45 AM, Paul Davis
<paul.joseph.da...@gmail.com> wrote:
> For `make check` it should be fairly straightforward to map the
> current approach to it. I could probably knock that out fairly quickly
> if you want me to give it a whirl.
>
>> On Fri, Dec 15, 2017 at 11:42 AM, Russell Branca <chewbra...@apache.org> 
> wrote:
>> Yeah just to reiterate what Paul said, the Elixir dev experience is really
>> nice and easy to get rolling with. I had no prior actual experience with
>> Elixir and I was able to get things rolling in a few hours.
>>
>> RE Ben's question about diving in: please do! Just grab one of the unported
>> js suites and go to town. I've just been cherry-pick'ing things out of
>> Paul's branch and we can continue to do the same until we get this more
>> locked down. My goal with the porting is to keep chugging along and just
>> get it knocked out, as I really don't think it will be overly onerous to do
>> so. And if anyone else wants to jump in, there's still a fair number of
>> tests to port, just take your pick.
>>
>> One other thing that needs work is figuring out how to hook all this into
>> "make check" and what not. I've mostly ignored that as this just points at
>> a CouchDB instance and can be run directly, but we'll need to sort that out
>> at some point.
>>
>>
>> -Russell
>>
>> On Fri, Dec 15, 2017 at 9:03 AM Paul Davis <paul.joseph.da...@gmail.com>
>> wrote:
>>
>>> Hello everybody!
>>>
>>> I figured I should probably go ahead and chime in seeing as I've also
>>> been playing around porting some of the tests in my free time between
>>> ops shifts the last couple weeks.
>>>
>>> My first impression was that it was ridiculously easy to get involved.
>>> On OS X at least, `brew install elixir` was enough to get a working
>>> elixir installed (however, if you use kerl or erln8 you'll want have
>>> to build an Erlang 20.x VM to use the brew package). I went from not
>>> having Elixir installed to a full port of uuids.js with the config tag
>>> logic written in about two hours one night. So far the Elixir docs
>>> seem very well written and put together. I'd say the worst part of
>>> Elixir so far is that knowing Erlang I find myself searching for "How
>>> do I do this Erlang thing in Elixir?" Which isn't as bad as it sounds.
>>> The Elixir libraries have certainly had a considerable amount of
>>> thought put into them to make them easy to use and remember. I find it
>>> to be a lot like my experience when learning Python in that I may have
>>> to Google once and then its muscle memory. As opposed to Erlang's
>>> library where I'm constantly reading the lists manpage to remember
>>> argument orderings and whether I want search or find versions etc.
>>>
>>> Which I guess is a long way of saying I'm rather liking the Elixir
>>> development experience so far.
>>>
>>> That said, I'm currently about half way through porting replication.js
>>> tests to Elixir. For the most part its fairly straightforward. My
>>> current approach as we've done for the other modules is to do a direct
>>> port. Once that's finished we'll want to break up that huge module
>>> into a series of modules that share a lot of the utility functions.
>>> One of the nice things about moving to Elixir is that it's got a full-on
>>> development story rather than our current couchjs approach that
>>> prevents sharing code easily between subsets of tests.
>>>
>>> For Ben's question on diving in, I'd do just that. I'd say leave a
>>> note here about which module(s)? you're going to port so that we're
>>> not duplicating efforts and then its basically just a matter of
>>> getting Elixir installed. For that, here's a quick rundown on how I
>>> got that working:
>>>
>>> $ brew update
>>> $ brew install elixir
>>> $ # wait for all the things...
>

Re: [ANNOUNCE] Nick Vatamaniuc joins the PMC

2017-11-11 Thread Paul Davis
Hooray and welcome, Nick!


On Sat, Nov 11, 2017 at 2:45 PM Joan Touzet  wrote:

> Congratulations! Welcome, Nick!
>
> - Original Message -
> From: "Alexander Shorin" 
> To: priv...@couchdb.apache.org
> Sent: Saturday, 11 November, 2017 1:31:36 PM
> Subject: Re: [ANNOUNCE] Nick Vatamaniuc joins the PMC
>
> Hooray to Nick! And welcome!
> --
> ,,,^..^,,,
>
>
> On Sat, Nov 11, 2017 at 4:40 PM, Jan Lehnardt  wrote:
> > Forgot to CC private@
> >
> >> Begin forwarded message:
> >>
> >> From: Jan Lehnardt 
> >> Subject: [ANNOUNCE] Nick Vatamaniuc joins the PMC
> >> Date: 11. November 2017 at 17:38:39 GMT+1
> >> To: dev 
> >> Reply-To: dev@couchdb.apache.org
> >>
> >> Dear community,
> >>
> >> I am delighted to announce that Nick Vatamaniuc joins the Apache
> CouchDB Project Management Committee today.
> >>
> >> Nick has made outstanding, sustained contributions to the project. This
> appointment is an official acknowledgement of their position within the
> community, and our trust in their ability to provide oversight for the
> project.
> >>
> >> Everybody, please join me in congratulating Nick!
> >>
> >> On behalf of the CouchDB PMC,
> >> Jan
> >> —
> >>
> >
>


Re: [VOTE] Release Apache CouchDB 1.7.1-RC1

2017-11-10 Thread Paul Davis
make check all green on OS X 10.12

+1

On Fri, Nov 10, 2017 at 12:52 PM, Adam Kocoloski  wrote:
> +1
>
> Adam
>
>> On Nov 10, 2017, at 9:20 AM, Jan Lehnardt  wrote:
>>
>> Dear community,
>>
>> I would like to release Apache CouchDB CouchDB 1.7.1-RC1.
>>
>> Changes since 1.7.0:
>>
>>- Fix authorisation issue with /db/_all_docs:
>>  https://github.com/apache/couchdb/issues/974 
>> 
>>  https://github.com/apache/couchdb/pull/975 
>> 
>>
>> We encourage the whole community to download and test these release 
>> artefacts so that any critical issues can be resolved before the release is 
>> made. Everyone is free to vote on this release, so dig right in!
>>
>> The release artefacts we are voting on are available here:
>>
>>   wget 
>> https://dist.apache.org/repos/dist/dev/couchdb/source/1.7.1/rc.1/apache-couchdb-1.7.1.tar.gz
>>  
>> 
>>   wget 
>> https://dist.apache.org/repos/dist/dev/couchdb/source/1.7.1/rc.1/apache-couchdb-1.7.1.tar.gz.asc
>>  
>> 
>>   wget 
>> https://dist.apache.org/repos/dist/dev/couchdb/source/1.7.1/rc.1/apache-couchdb-1.7.1.tar.gz.sha256
>>  
>> 
>>   wget 
>> https://dist.apache.org/repos/dist/dev/couchdb/source/1.7.1/rc.1/apache-couchdb-1.7.1.tar.gz.sha512
>>  
>> 
>>
>> Please follow the test procedure here:
>>
>>   
>> https://cwiki.apache.org/confluence/display/COUCHDB/Testing+a+Source+Release 
>> 
>>
>> Please remember that "RC1" is an annotation. If the vote passes, these 
>> artefacts will be released as Apache CouchDB 1.7.1.
>>
>> Special thanks to Russell Branca for the quick turnaround on this issue 
>> blocking people from upgrading to 1.7.0.
>>
>> Please cast your votes now.
>>
>> Thanks,
>> Jan
>> —
>>
>


Beginning merge of Pluggable Storage Engines

2017-09-13 Thread Paul Davis
Hi everyone!

I've been working the last few days on getting the PSE PRs rebased on
master to get that feature merged in. There are two PRs [1,2]
associated with this work. Given that this is a fairly decent change
I'm running the bases on making sure everyone has buy in to it before
just pulling the trigger.

My plan now is to merge [1] which is mostly just a lot of code
curation to remove the publicly accessible `#db{}` record along with a
few extras to allow for rolling reboots of a cluster. After merging
this PR we'll wait a few weeks to let it soak and then merge [2] which
introduces the actual PSE APIs/feature. Also of note is that even [2]
isn't really *that* scary as its the same implementation as we
currently have for all database storage, its merely adding the various
hooks and configuration to allow for new storage implementations.

Normally our window for this sort of thing would be 72 hours for lazy
consensus, but I'll not do anything till Monday, which gives everyone
the weekend, given that this is a decent-sized feature. And as a heads
up, I'm only considering this email in regard to [1]. I'll send
another email with a similar heads-up period before merging [2].

Thanks,
Paul J. Davis

[1] https://github.com/apache/couchdb/pull/495
[2] https://github.com/apache/couchdb/pull/496


Re: [DISCUSSION] Disallow all merges of PRs to master that cause tests to fail

2017-08-18 Thread Paul Davis
Yeah, +1 to +1'ing your own PR when it's trivial. Minor annoyance, but
it's a paper trail of sorts anyway.

On Fri, Aug 18, 2017 at 10:55 AM, Joan Touzet  wrote:
> I didn't realize you could review your own PR. That gives us the "escape
> hatch" that we need.
>
> -Joan
>
> - Original Message -
> From: "Nick North" 
> To: dev@couchdb.apache.org, "Joan Touzet" 
> Sent: Friday, 18 August, 2017 9:48:40 AM
> Subject: Re: [DISCUSSION] Disallow all merges of PRs to master that cause 
> tests to fail
>
>
> This is pretty much the set of restrictions we have on the master branch in 
> my organisation, and it works well. We also require PR reviews before 
> merging, but anyone in the team can do the review, including the PR author. 
> This means the author has to make a conscious decision on whether the changes 
> are trivial enough to sign off themselves, or whether someone else should 
> review them, and there's an audit trail of that decision being made.
>
>
> Nick
>
>
>
> On Wed, 16 Aug 2017 at 23:49 Joan Touzet < woh...@apache.org > wrote:
>
>
> Seems there is general consensus.
>
> Now, how do people feel about me asking Infra to make this change to the
> main repos (couchdb, couchdb-fauxton, etc.):
>
> https://help.github.com/assets/images/help/repository/protecting-branch-loose-status.png
>
> Specifically:
>
> Protect master branch (and any release branches like 2.1.x)
> Require status checks to pass before merging
> Require branches be up to date before merging
>
> We can have an optional secondary discussion around enforcing:
>
> Require pull request reviews before merging
>
> This would enforce our RTC model, but we *need* more active devs if this
> is going to pass. I've had to beg multiple times for many of my PRs in
> the 2.1.0 release cycle to be approved...even trivial documentation
> changes. It was very frustrating.
>
> -Joan
>
> - Original Message -
> From: "Nick Vatamaniuc" < vatam...@gmail.com >
> To: dev@couchdb.apache.org
> Sent: Wednesday, 16 August, 2017 6:01:34 PM
> Subject: Re: [DISCUSSION] Disallow all merges of PRs to master that cause 
> tests to fail
>
> +1
>
> On Aug 16, 2017 15:50, "Alexander Shorin" < kxe...@gmail.com > wrote:
>
>> It would be strange to say anything other than +1, or to question the
>> topic in any way.
>>
>> Good call, Joan!
>> --
>> ,,,^..^,,,
>>
>>
>> On Wed, Aug 16, 2017 at 6:46 AM, Joan Touzet < woh...@apache.org > wrote:
>> > Hi committers,
>> >
>> > I'd like to propose a change to our policy on version control, namely
>> > that no check-ins be allowed on the master branch unless CI test runs
>> > against that PR are clean.
>> >
>> > We've worked hard as a group to get runs clean. We need to protect
>> > that achievement and investment in our test suite. That means not
>> > letting rogue check-ins slip by because we are ignoring a red X in
>> > GitHub (GH) from the Travis run.
>> >
>> > Things I see as exceptions:
>> > * Changes to things clearly not related to the test suite, i.e.
>> > documentation, support scripts, rel/overlay/etc/ files, etc.
>> > * Changes already agreed upon in a previous PR/discussion for
>> > administrative tasks
>> >
>> > Interesting situation right now for a discussion: Garren has a PR up[1]
>> > that enables the mango tests to be part of the standard Travis/Jenkins
>> > runs. Unfortunately, it doesn't pass on one of our platforms right now
>> > and that needs investigation. Should we allow the PR to land and fix
>> > the problems in master, or should the PR hold-up until it can land along
>> > with the fixes for the failing mango tests? I can see both sides of this
>> > argument.
>> >
>> > It may or may not be possible for our GH setup to actually prevent such
>> > checkins (the Apache GH setup is somewhat restricted, and various things
>> > like commit hooks and webhooks have to be configured by INFRA, not us).
>> >
>> > I'd like to further discuss whether people feel such a hook would be
>> > acceptable, onerous or otherwise. Personally, I worry that such a setup
>> > might prevent us from checking in some of the exceptions above, but if
>> > there is a way around it, we could proceed down that path.
>> >
>> > What do you think, sirs?[2]
>> > Joan
>> >
>> >
>> > [1]: https://github.com/apache/couchdb/pull/753
>> > [2]: It's a Mystery Science Theatre 3000 Joel reference. :)
>>


Re: [DISCUSSION] Moving to a stricter quarterly release cycle?

2017-08-16 Thread Paul Davis
I can see all of 3-, 4-, and 6-month release cycles. Before committing
to this I'd like to see what the current process is like and how much
"work" is actually involved. Theoretically, if this were a "bump
version number, write email, push button" sort of situation, then I'd
be quite happy going this route. However, if there are hours of manual
work, then it gets to be a little more of a PITA. Also, automating most
of it would hopefully prevent different committers from doing releases
slightly differently.

So, +1 to the general idea with the caveat that I'd like to look more
at the tooling before making it a project commitment.
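
For a sense of what the mechanical half might look like if scripted, here is
a hypothetical sketch; the `make dist` target, version numbers, and svn
layout are assumptions rather than our documented procedure:

```
# Hypothetical release-staging script. The non-mechanical parts -- the
# [VOTE] email, tallying, promotion to dist/release, tagging -- still
# need a human.
VERSION=2.2.0
RC=1

make dist                                   # assumed to produce apache-couchdb-$VERSION.tar.gz
gpg --armor --detach-sign apache-couchdb-$VERSION.tar.gz
shasum -a 256 apache-couchdb-$VERSION.tar.gz > apache-couchdb-$VERSION.tar.gz.sha256
shasum -a 512 apache-couchdb-$VERSION.tar.gz > apache-couchdb-$VERSION.tar.gz.sha512

# Stage the candidate where voters expect to find it (dist.apache.org is svn).
svn co https://dist.apache.org/repos/dist/dev/couchdb dist-dev
mkdir -p dist-dev/source/$VERSION/rc.$RC
cp apache-couchdb-$VERSION.tar.gz* dist-dev/source/$VERSION/rc.$RC/
svn add dist-dev/source/$VERSION
svn commit dist-dev -m "Apache CouchDB $VERSION RC$RC artefacts"
```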

On Wed, Aug 16, 2017 at 1:55 AM, Joan Touzet  wrote:
> Hi everyone,
>
> I'd like to consider moving us to a regular every-3-months release
> cycle. Under this plan the next CouchDB release would be on or
> around 1 November 2017. The next few releases would be around 1
> February 2018, 1 May 2018, and 1 Aug 2018 again.
>
> The recently achieved "clean test suite" milestone will allow us
> to achieve a better release quality, and should enable this faster
> pace of releases. It should be a lot easier to cut a release,
> knowing it is clean - I am hoping that others can help here.
> Perhaps we can set up a 4-way rota for releases; if I could get 3
> more committers to volunteer, we'd each only have to run 1 release
> a year!
>
> We would still accelerate any point releases required for security
> over and above this, and we would skip any releases where we
> simply have nothing to commit (in which case I'd be nervous about
> the future of the project!)
>
> This isn't a vote, but feel free to +1/0/-1 with comments if it
> makes it easier for you to respond.
>
> -Joan


Re: [DISCUSSION] Disallow all merges of PRs to master that cause tests to fail

2017-08-16 Thread Paul Davis
On Wed, Aug 16, 2017 at 1:46 AM, Joan Touzet  wrote:
> Hi committers,
>
> I'd like to propose a change to our policy on version control, namely
> that no check-ins be allowed on the master branch unless CI test runs
> against that PR are clean.
>
> We've worked hard as a group to get runs clean. We need to protect
> that achievement and investment in our test suite. That means not
> letting rogue check-ins slip by because we are ignoring a red X in
> GitHub (GH) from the Travis run.
>
> Things I see as exceptions:
> * Changes to things clearly not related to the test suite, i.e.
>   documentation, support scripts, rel/overlay/etc/ files, etc.
> * Changes already agreed upon in a previous PR/discussion for
>   administrative tasks
>

I'd be +1 for requiring a clean Travis-CI run before something can be
merged. I'm pretty sure we can configure that easily enough (via INFRA,
as you mention). However, I'd allow absolutely no exceptions to that. I'd
also disable commits directly to master that don't come through a PR, so
that it can't be bypassed. This is especially important when we merge
PRs to dependencies: we'd need to require the rebar.config.script update
to be PR'ed so it goes through CI.
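
For concreteness, the settings being asked for map onto GitHub's branch
protection API roughly as below. This is only a sketch: for apache/* repos
INFRA has to apply the settings, and both $GITHUB_TOKEN and the Travis
status-check context name are assumptions.

```
# Require a passing (and up-to-date) Travis status before merging to master,
# with no exemption for admins.
curl -X PUT \
  -H "Authorization: token $GITHUB_TOKEN" \
  -H "Accept: application/vnd.github.v3+json" \
  -d '{
    "required_status_checks": {"strict": true,
                               "contexts": ["continuous-integration/travis-ci"]},
    "enforce_admins": true,
    "required_pull_request_reviews": null,
    "restrictions": null
  }' \
  https://api.github.com/repos/apache/couchdb/branches/master/protection
```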

> Interesting situation right now for a discussion: Garren has a PR up[1]
> that enables the mango tests to be part of the standard Travis/Jenkins
> runs. Unfortunately, it doesn't pass on one of our platforms right now
> and that needs investigation. Should we allow the PR to land and fix
> the problems in master, or should the PR hold-up until it can land along
> with the fixes for the failing mango tests? I can see both sides of this
> argument.
>
> It may or may not be possible for our GH setup to actually prevent such
> checkins (the Apache GH setup is somewhat restricted, and various things
> like commit hooks and webhooks have to be configured by INFRA, not us).
>
> I'd like to further discuss whether people feel such a hook would be
> acceptable, onerous or otherwise. Personally, I worry that such a setup
> might prevent us from checking in some of the exceptions above, but if
> there is a way around it, we could proceed down that path.
>

I don't think we should entertain any gray area, as it muddies the
waters. I'd say we either make it 100% or leave it in the current
"best effort" state, which is what I believe your current example
falls into; that's the status quo.

> What do you think, sirs?[2]
> Joan
>
>
> [1]: https://github.com/apache/couchdb/pull/753
> [2]: It's a Mystery Science Theatre 3000 Joel reference. :)


Re: [VOTE] Release Apache CouchDB 2.1.0-RC1

2017-08-02 Thread Paul Davis
Agreed.

On Wed, Aug 2, 2017 at 1:59 PM Jan Lehnardt <m...@jan.io> wrote:

> Yeah, that's the idea. I'll look into the build script. This is not a
> fault in CouchDB, but in my Mac packaging.
>
> Cheers
> Jan
> --
>
> > On 2. Aug 2017, at 20:13, Paul Davis <paul.joseph.da...@gmail.com> wrote:
> >
> > This appears to be the issue on William's use of the Mac binary:
> >
> > CRASH REPORT Process  (<0.207.0>) with 0 neighbors exited with reason:
> > "dlopen(/Applications/Apache
> >
> CouchDB.app/Contents/Resources/couchdbx-core/bin/../lib/couch-2.1.0-RC1/priv/couch_icu_driver.so,
> > 2): Library not loaded: /usr/local/opt/icu4c/lib/libicuuc.58.dylib\n
> > Referenced from: /Applications/Apache
> >
> CouchDB.app/Contents/Resources/couchdbx-core/lib/couch-2.1.0-RC1/priv/couch_icu_driver.so\n
> >
> > Looks like it's linked against a Homebrew install of icu4c. I'm not
> > sure what the proper solution is for this sort of thing. If memory
> > serves, I think you can move things into the App Bundle and do some
> > magical things, but I've not got any idea how that sort of thing works.
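> >
> > Roughly, that would look something like this -- an untested sketch, with
> > the bundle-relative paths, the @loader_path layout, and the set of ICU
> > dylibs to copy all being guesses based on the log above:
> >
> > ```
> > # Untested sketch: ship the ICU dylibs inside the bundle and rewrite the
> > # driver's load commands so dyld stops looking under /usr/local.
> > CORE="/Applications/Apache CouchDB.app/Contents/Resources/couchdbx-core"
> > DRIVER="$CORE/lib/couch-2.1.0-RC1/priv/couch_icu_driver.so"
> >
> > otool -L "$DRIVER"    # shows the /usr/local/opt/icu4c/... references
> >
> > cp /usr/local/opt/icu4c/lib/libicu{uc,data,i18n}.58.dylib "$CORE/lib/"
> >
> > install_name_tool -change /usr/local/opt/icu4c/lib/libicuuc.58.dylib \
> >   "@loader_path/../../libicuuc.58.dylib" "$DRIVER"
> > # ...and the same -change for any other ICU references otool shows, plus
> > # fixing the copied dylibs' own install names and inter-dependencies.
> > ```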
> >
> > On Wed, Aug 2, 2017 at 11:18 AM, William Edney
> > <bed...@technicalpursuit.com> wrote:
> >> Jan -
> >>
> >> I tried the Mac binary 'app bundle' 2.1 build on a machine with no prior
> >> CouchDB installation (no prior 'app bundle' builds or other CouchDBs
> >> installed via something like 'brew') and it doesn't work for me. The
> >> process seems hung.
> >>
> >> OS: macOS Sierra, v10.12.6
> >> Hardware: MBP, 2.9Ghz i7
> >>
> >> Here's a snippet of logs from startup:
> >>
> >> ```
> >> [info] 2017-08-02T16:05:51.380668Z couchdb@localhost <0.7.0> 
> >> Application couch_log started on node couchdb@localhost
> >> [info] 2017-08-02T16:05:51.383552Z couchdb@localhost <0.7.0> 
> >> Application folsom started on node couchdb@localhost
> >> [info] 2017-08-02T16:05:51.407480Z couchdb@localhost <0.7.0> 
> >> Application couch_stats started on node couchdb@localhost
> >> [info] 2017-08-02T16:05:51.407577Z couchdb@localhost <0.7.0> 
> >> Application khash started on node couchdb@localhost
> >> [info] 2017-08-02T16:05:51.413021Z couchdb@localhost <0.7.0> 
> >> Application couch_event started on node couchdb@localhost
> >> [info] 2017-08-02T16:05:51.416350Z couchdb@localhost <0.7.0> 
> >> Application ibrowse started on node couchdb@localhost
> >> [info] 2017-08-02T16:05:51.419350Z couchdb@localhost <0.7.0> 
> >> Application ioq started on node couchdb@localhost
> >> [info] 2017-08-02T16:05:51.419445Z couchdb@localhost <0.7.0> 
> >> Application mochiweb started on node couchdb@localhost
> >> [info] 2017-08-02T16:05:51.424261Z couchdb@localhost <0.203.0> 
> >> Apache CouchDB 2.1.0 is starting.
> >>
> >> [info] 2017-08-02T16:05:51.424317Z couchdb@localhost <0.204.0> 
> >> Starting couch_sup
> >> [error] 2017-08-02T16:05:51.429024Z couchdb@localhost <0.207.0>
> 
> >> CRASH REPORT Process  (<0.207.0>) with 0 neighbors exited with reason:
> >> "dlopen(/Applications/Apache
> >>
> CouchDB.app/Contents/Resources/couchdbx-core/bin/../lib/couch-2.1.0-RC1/priv/couch_icu_driver.so,
> >> 2): Library not loaded: /usr/local/opt/icu4c/lib/libicuuc.58.dylib\n
> >> Referenced from: /Applications/Apache
> >>
> CouchDB.app/Contents/Resources/couchdbx-core/lib/couch-2.1.0-RC1/priv/couch_icu_driver.so\n
> >> Reason: image not found" at gen_server:init_it/6(line:344) <=
> >> proc_lib:init_p_do_apply/3(line:240); initial_call:
> >> {couch_drv,init,['Argument__1']}, ancestors:
> >> [couch_primary_services,couch_sup,<0.203.0>], messages: [], links:
> >> [<0.206.0>], dictionary: [], trap_exit: false, status: running,
> heap_size:
> >> 1598, stack_size: 27, reductions: 206
> >> [error] 2017-08-02T16:05:51.429240Z couchdb@localhost <0.203.0>
> 
> >> Error starting Apache CouchDB:
> >>
> >>
> >>
> {error,{shutdown,{failed_to_start_child,couch_primary_services,{shutdown,{failed_to_start_child,collation_driver,"dlopen(/Applications/Apache
> >>
> CouchDB.app/Contents/Resources/couchdbx-core/bin/../lib/couch-2.1.0-RC1/priv/couch_icu_driver.so,
> >> 2): Library not loaded: /usr/local/opt/icu4c/lib/libicuuc.58.dylib\n
