Re: [DISCUSS] couchdb 4.0 transactional semantics

2020-07-15 Thread Adam Kocoloski
Sorry, I also missed that you quoted this specific bit about eagerly requesting 
a new snapshot. Currently the code will just react to the transaction expiring, 
then wait till it acquires a new snapshot if “restart_tx” is set (which can 
take a couple of milliseconds on a FoundationDB cluster that is deployed across 
multiple AZs in a cloud Region) and then proceed.
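
For comparison, the eager variant (requesting the next transaction before the current one expires and swapping over once it is ready) could look roughly like the Python sketch below. It is not CouchDB code: ToyStore, Snapshot, stream_rows_eager and refresh_at are names invented for illustration, and a per-snapshot read budget stands in for FoundationDB's transaction time limit.

```python
from concurrent.futures import ThreadPoolExecutor


class Snapshot:
    """Toy snapshot that serves a fixed number of reads before expiring."""

    def __init__(self, rows, budget):
        self.rows = rows
        self.budget = budget
        self.used = 0

    def read_after(self, last_key):
        """Return the next (key, value) after last_key, or None at the end."""
        if self.used >= self.budget:
            raise RuntimeError("snapshot expired")
        self.used += 1
        keys = sorted(k for k in self.rows if last_key is None or k > last_key)
        return (keys[0], self.rows[keys[0]]) if keys else None


class ToyStore:
    """In-memory stand-in for the database (hypothetical, for illustration)."""

    def __init__(self, rows, budget):
        self.rows = dict(rows)
        self.budget = budget
        self.snapshots_taken = 0

    def snapshot(self):
        self.snapshots_taken += 1
        return Snapshot(dict(self.rows), self.budget)


def stream_rows_eager(store, refresh_at=0.5):
    """Yield all rows, swapping to a pre-requested snapshot before expiry."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        snap = store.snapshot()
        pending = None
        last_key = None
        while True:
            # Ask for the next snapshot in the background once the current
            # one has used up the configured fraction of its budget.
            if pending is None and snap.used >= snap.budget * refresh_at:
                pending = pool.submit(store.snapshot)
            # Swap as soon as the new snapshot is ready, or block for it
            # if the old one has nothing left.
            if pending is not None and (pending.done() or snap.used >= snap.budget):
                snap, pending = pending.result(), None
            row = snap.read_after(last_key)
            if row is None:
                return
            last_key = row[0]
            yield row
```

The rows come back in key order however the swaps happen to be timed, but they are no longer drawn from one snapshot, which is the isolation trade-off under discussion.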

Adam

> On Jul 15, 2020, at 6:54 PM, Adam Kocoloski  wrote:
> 
> Right now the code has an internal “restart_tx” flag that is used to 
> automatically request a new snapshot if the original one expires and continue 
> streaming the response. It can be used for all manner of multi-row responses, 
> not just _changes.
> 
> As this is a pretty big change to the isolation guarantees provided by the 
> database, Bob volunteered to elevate the issue to the mailing list for a 
> deeper discussion.
> 
> Cheers, Adam
> 
>> On Jul 15, 2020, at 11:38 AM, Joan Touzet  wrote:
>> 
>> I'm having trouble following the thread...
>> 
>> On 14/07/2020 14:56, Adam Kocoloski wrote:
>>> For cases where you’re not concerned about the snapshot isolation (e.g. 
>>> streaming an entire _changes feed), there is a small performance benefit to 
>>> requesting a new FDB transaction asynchronously before the old one actually 
>>> times out and swapping over to it. That’s a pattern I’ve seen in other FDB 
>>> layers but I’m not sure we’ve used it anywhere in CouchDB yet.
>> 
>> How does _changes work right now in the proposed 4.0 code?
>> 
>> -Joan
> 



Re: [DISCUSS] couchdb 4.0 transactional semantics

2020-07-15 Thread Adam Kocoloski
Right now the code has an internal “restart_tx” flag that is used to 
automatically request a new snapshot if the original one expires and continue 
streaming the response. It can be used for all manner of multi-row responses, 
not just _changes.
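
To make the behaviour concrete, here is a minimal Python sketch of the idea, not the actual CouchDB implementation. ToyStore, Snapshot, TransactionExpired and stream_rows are hypothetical names, and a per-snapshot read budget stands in for the FoundationDB transaction limit.

```python
class TransactionExpired(Exception):
    """Raised by the toy store when a snapshot is read past its lifetime."""


class ToyStore:
    """In-memory stand-in for a key-value store with short-lived snapshots."""

    def __init__(self, rows, reads_per_snapshot):
        self.rows = dict(rows)
        self.reads_per_snapshot = reads_per_snapshot
        self.snapshots_taken = 0

    def snapshot(self):
        self.snapshots_taken += 1
        return Snapshot(dict(self.rows), self.reads_per_snapshot)


class Snapshot:
    """Frozen copy of the rows that allows only a limited number of reads."""

    def __init__(self, rows, budget):
        self.rows = rows
        self.budget = budget

    def read_after(self, last_key):
        """Return the next (key, value) after last_key, or None at the end."""
        if self.budget == 0:
            raise TransactionExpired()
        self.budget -= 1
        keys = sorted(k for k in self.rows if last_key is None or k > last_key)
        return (keys[0], self.rows[keys[0]]) if keys else None


def stream_rows(store, restart_tx=True):
    """Yield rows in key order, restarting on expiry when restart_tx is set."""
    snap = store.snapshot()
    last_key = None
    while True:
        try:
            row = snap.read_after(last_key)
        except TransactionExpired:
            if not restart_tx:
                raise
            # The new snapshot may include writes made since the response
            # started, so the response as a whole is no longer isolated.
            snap = store.snapshot()
            continue
        if row is None:
            return
        last_key = row[0]
        yield row
```

With restart_tx=False the same loop fails partway through a long response instead of continuing on a newer snapshot.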

As this is a pretty big change to the isolation guarantees provided by the 
database, Bob volunteered to elevate the issue to the mailing list for a deeper 
discussion.

Cheers, Adam

> On Jul 15, 2020, at 11:38 AM, Joan Touzet  wrote:
> 
> I'm having trouble following the thread...
> 
> On 14/07/2020 14:56, Adam Kocoloski wrote:
>> For cases where you’re not concerned about the snapshot isolation (e.g. 
>> streaming an entire _changes feed), there is a small performance benefit to 
>> requesting a new FDB transaction asynchronously before the old one actually 
>> times out and swapping over to it. That’s a pattern I’ve seen in other FDB 
>> layers but I’m not sure we’ve used it anywhere in CouchDB yet.
> 
> How does _changes work right now in the proposed 4.0 code?
> 
> -Joan



Re: [DISCUSS] couchdb 4.0 transactional semantics

2020-07-15 Thread Joan Touzet

I'm having trouble following the thread...

On 14/07/2020 14:56, Adam Kocoloski wrote:

> For cases where you’re not concerned about the snapshot isolation (e.g. 
> streaming an entire _changes feed), there is a small performance benefit to 
> requesting a new FDB transaction asynchronously before the old one actually 
> times out and swapping over to it. That’s a pattern I’ve seen in other FDB 
> layers but I’m not sure we’ve used it anywhere in CouchDB yet.


How does _changes work right now in the proposed 4.0 code?

-Joan


Re: [DISCUSS] couchdb 4.0 transactional semantics

2020-07-15 Thread Jan Lehnardt



> On 15. Jul 2020, at 16:12, Robert Newson  wrote:
> 
> 
> Thanks Jan
> 
> I would prefer not to have the configuration switch, and instead remove what we 
> don’t want. As you said there’ll be a 3 / 4 split for a while (and not just 
> for this reason). 

I’d support an effort for folks to ease into 4.x, as long as it is not the 
default behaviour. I haven’t thought about this enough to have a definite 
opinion about what that looks like.

Best
Jan
—
> -- 
>  Robert Samuel Newson
>  rnew...@apache.org
> 
> On Wed, 15 Jul 2020, at 14:46, Jan Lehnardt wrote:
>> 
>>> On 14. Jul 2020, at 18:00, Adam Kocoloski  wrote:
>>> 
>>> I think there’s tremendous value in being able to tell our users that each 
>>> response served by CouchDB is constructed from a single isolated snapshot 
>>> of the underlying database. I’d advocate for this being the default 
>>> behavior of 4.0.
>> 
>> I too am in favour of this. I apologise for not speaking up in the 
>> earlier thread, which I followed closely, but never found the time to 
>> respond to.
>> 
>> From rnewson’s options, I’d suggest option 3, the mandatory limit parameter. 
>> While this does indeed mean a BC break, it teaches the right semantics 
>> for folks on 4.0 and onwards. For client libraries like our own nano, 
>> we can easily wrap this behaviour, so the resulting API is mostly 
>> compatible still (at least when used in streaming mode, less so when 
>> buffering a big _all_docs response).
>> 
>>> If folks wanted to add an opt-in compatibility mode to support longer 
>>> responses, I suppose that could be OK. I think we should discourage that 
>>> access pattern in general, though, as it’s somewhat less friendly to 
>>> various other parts of the stack than a pattern of shorter responses and a 
>>> smart pagination API like the one we’re introducing. To wit, I don’t think 
>>> we’d want to support that compatibility mode in IBM Cloud.
>> 
>> Like Adam, I do not mind a compat mode, either through a different API 
>> endpoint, or even a config option. I think we will be fine in getting 
>> people on this path when we document this in our update guide for the 
>> 4.0 release. I don’t think this will lead to a Python 2/3 situation 
>> overall, because the 4.0+ features are compelling enough for the relatively 
>> small changes required, and CouchDB 3.x in its then latest form will 
>> continue to be a fine database for years to come, for folks who can’t 
>> upgrade as easily. So yes, I anticipate we’ll live in a two-versions 
>> world a little longer than we did during 1.x to 2.x, but the reasons to 
>> leave 1.x behind were a little more severe than the improvements of 4.x 
>> over 3.x (while still significant, of course).
>> 
>> Best
>> Jan
>> —
>> 
>>> 
>>> Adam
>>> 
>>>> On Jul 14, 2020, at 10:18 AM, Robert Samuel Newson  
>>>> wrote:
>>>> 
>>>> Thanks Nick, very helpful, and it vindicates me opening this thread.
>>>> 
>>>> I don't accept Mike Rhodes's argument at all but I should explain why I 
>>>> don't:
>>>> 
>>>> In CouchDB 1.x, a response was generated from a single .couch file. There 
>>>> was always a window between the start of the request as the client sees it 
>>>> and CouchDB acquiring a snapshot of the relevant database. I don't think 
>>>> that gap is meaningful, and it does not refute our statements of the time that 
>>>> CouchDB responses are from a snapshot (specifically, that no change to the 
>>>> database made _during_ the response will be visible in _this_ response). 
>>>> In CouchDB 2.x (and continuing in 3.x), a CouchDB database typically 
>>>> consists of multiple shards, each of which, once opened, remains 
>>>> snapshotted for the duration of that response. The difference between 1.x 
>>>> and 2.x/3.x is that the window is potentially larger (though the requests 
>>>> are issued in parallel). The response, however much it returned, was 
>>>> impervious to changes in other requests once it had begun.
>>>> 
>>>> I don't think _all_docs, _view or a non-continuous _changes response 
>>>> should allow changes made in other requests to appear midway through them 
>>>> and I want to hear the opinions of folks that have watched over CouchDB 
>>>> from its earliest days on this specific point (If I must name names, at 
>>>> least Adam K, Paul D, Jan L, Joan T). If there's a majority for deviating 
>>>> from this semantic, I will go with the majority.
>>>> 
>>>> If we were to agree to preserve the 'single snapshot' behaviour, what 
>>>> would the behaviour be if we can't honour it because of the FoundationDB 
>>>> transaction limits?
>>>> 
>>>> I see a few options.
>>>> 
>>>> 1) We could end the response uncleanly, mid-response. CouchDB does this 
>>>> when it has no alternative, and it is ugly, but it is usually handled well 
>>>> by clients. They are at least not usually convinced they got a complete 
>>>> response if they are using a competent HTTP client.
>>>> 
>>>> 2) We could disavow the streaming API, as you've suggested, attempt to 
>>>> gather the full 

Re: [DISCUSS] couchdb 4.0 transactional semantics

2020-07-15 Thread Robert Newson


Thanks Jan

I would prefer not to have the configuration switch, and instead remove what we 
don’t want. As you said there’ll be a 3 / 4 split for a while (and not just for 
this reason). 
-- 
  Robert Samuel Newson
  rnew...@apache.org

On Wed, 15 Jul 2020, at 14:46, Jan Lehnardt wrote:
> 
> > On 14. Jul 2020, at 18:00, Adam Kocoloski  wrote:
> > 
> > I think there’s tremendous value in being able to tell our users that each 
> > response served by CouchDB is constructed from a single isolated snapshot 
> > of the underlying database. I’d advocate for this being the default 
> > behavior of 4.0.
> 
> I too am in favour of this. I apologise for not speaking up in the 
> earlier thread, which I followed closely, but never found the time to 
> respond to.
> 
> From rnewson’s options, I’d suggest option 3, the mandatory limit parameter. 
> While this does indeed mean a BC break, it teaches the right semantics 
> for folks on 4.0 and onwards. For client libraries like our own nano, 
> we can easily wrap this behaviour, so the resulting API is mostly 
> compatible still (at least when used in streaming mode, less so when 
> buffering a big _all_docs response).
> 
> > If folks wanted to add an opt-in compatibility mode to support longer 
> > responses, I suppose that could be OK. I think we should discourage that 
> > access pattern in general, though, as it’s somewhat less friendly to 
> > various other parts of the stack than a pattern of shorter responses and a 
> > smart pagination API like the one we’re introducing. To wit, I don’t think 
> > we’d want to support that compatibility mode in IBM Cloud.
> 
> Like Adam, I do not mind a compat mode, either through a different API 
> endpoint, or even a config option. I think we will be fine in getting 
> people on this path when we document this in our update guide for the 
> 4.0 release. I don’t think this will lead to a Python 2/3 situation 
> overall, because the 4.0+ features are compelling enough for the relatively 
> small changes required, and CouchDB 3.x in its then latest form will 
> continue to be a fine database for years to come, for folks who can’t 
> upgrade as easily. So yes, I anticipate we’ll live in a two-versions 
> world a little longer than we did during 1.x to 2.x, but the reasons to 
> leave 1.x behind were a little more severe than the improvements of 4.x 
> over 3.x (while still significant, of course).
> 
> Best
> Jan
> —
> 
> > 
> > Adam
> > 
> >> On Jul 14, 2020, at 10:18 AM, Robert Samuel Newson  
> >> wrote:
> >> 
> >> Thanks Nick, very helpful, and it vindicates me opening this thread.
> >> 
> >> I don't accept Mike Rhodes's argument at all but I should explain why I 
> >> don't:
> >> 
> >> In CouchDB 1.x, a response was generated from a single .couch file. There 
> >> was always a window between the start of the request as the client sees it 
> >> and CouchDB acquiring a snapshot of the relevant database. I don't think 
> >> that gap is meaningful, and it does not refute our statements of the time that 
> >> CouchDB responses are from a snapshot (specifically, that no change to the 
> >> database made _during_ the response will be visible in _this_ response). 
> >> In CouchDB 2.x (and continuing in 3.x), a CouchDB database typically 
> >> consists of multiple shards, each of which, once opened, remains 
> >> snapshotted for the duration of that response. The difference between 1.x 
> >> and 2.x/3.x is that the window is potentially larger (though the requests 
> >> are issued in parallel). The response, however much it returned, was 
> >> impervious to changes in other requests once it had begun.
> >> 
> >> I don't think _all_docs, _view or a non-continuous _changes response 
> >> should allow changes made in other requests to appear midway through them 
> >> and I want to hear the opinions of folks that have watched over CouchDB 
> >> from its earliest days on this specific point (If I must name names, at 
> >> least Adam K, Paul D, Jan L, Joan T). If there's a majority for deviating 
> >> from this semantic, I will go with the majority.
> >> 
> >> If we were to agree to preserve the 'single snapshot' behaviour, what 
> >> would the behaviour be if we can't honour it because of the FoundationDB 
> >> transaction limits?
> >> 
> >> I see a few options.
> >> 
> >> 1) We could end the response uncleanly, mid-response. CouchDB does this 
> >> when it has no alternative, and it is ugly, but it is usually handled well 
> >> by clients. They are at least not usually convinced they got a complete 
> >> response if they are using a competent HTTP client.
> >> 
> >> 2) We could disavow the streaming API, as you've suggested, attempt to 
> >> gather the full response. If we do this within the FDB bounds, return a 
> >> 200 code and the response body. A 400 and an error body if we don't.
> >> 
> >> 3) We could make the "limit" parameter mandatory and with an upper bound, 
> >> in combination with 1 or 2, such that a valid request is very likely to be 
> >> 

Re: [DISCUSS] couchdb 4.0 transactional semantics

2020-07-15 Thread Jan Lehnardt


> On 14. Jul 2020, at 18:00, Adam Kocoloski  wrote:
> 
> I think there’s tremendous value in being able to tell our users that each 
> response served by CouchDB is constructed from a single isolated snapshot of 
> the underlying database. I’d advocate for this being the default behavior of 
> 4.0.

I too am in favour of this. I apologise for not speaking up in the earlier 
thread, which I followed closely, but never found the time to respond to.

From rnewson’s options, I’d suggest option 3, the mandatory limit parameter. While 
this does indeed mean a BC break, it teaches the right semantics for folks on 
4.0 and onwards. For client libraries like our own nano, we can easily wrap 
this behaviour, so the resulting API is mostly compatible still (at least when 
used in streaming mode, less so when buffering a big _all_docs response).
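
As a rough illustration of such a wrapper (sketched in Python rather than nano's JavaScript, and not tied to the real 4.0 pagination API, which this thread does not spell out), a client library could chain bounded requests behind a single iterator. fetch_page, start_after and make_fake_endpoint are hypothetical names, and the toy endpoint only mimics a server that rejects requests without a bounded limit.

```python
def iter_all_rows(fetch_page, limit=100):
    """Yield every row by chaining bounded requests of at most `limit` rows."""
    start_after = None
    while True:
        page = fetch_page(start_after=start_after, limit=limit)
        yield from page
        # A short page means the server has no more rows to return.
        if len(page) < limit:
            return
        start_after = page[-1]["key"]


def make_fake_endpoint(docs, max_limit):
    """Toy server that rejects unbounded or oversized requests."""
    keys = sorted(docs)

    def fetch_page(start_after=None, limit=None):
        if limit is None or not 0 < limit <= max_limit:
            raise ValueError("limit is mandatory and must be <= %d" % max_limit)
        rest = [k for k in keys if start_after is None or k > start_after]
        return [{"key": k, "value": docs[k]} for k in rest[:limit]]

    return fetch_page
```

A caller that iterates over iter_all_rows sees one continuous stream, but each page is a separate request, so the stream as a whole is not read from a single snapshot. A caller that buffers the entire result gets the same rows at the cost of many round trips.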

> If folks wanted to add an opt-in compatibility mode to support longer 
> responses, I suppose that could be OK. I think we should discourage that 
> access pattern in general, though, as it’s somewhat less friendly to various 
> other parts of the stack than a pattern of shorter responses and a smart 
> pagination API like the one we’re introducing. To wit, I don’t think we’d 
> want to support that compatibility mode in IBM Cloud.

Like Adam, I do not mind a compat mode, either through a different API 
endpoint, or even a config option. I think we will be fine in getting people on 
this path when we document this in our update guide for the 4.0 release. I 
don’t think this will lead to a Python 2/3 situation overall, because the 4.0+ 
features are compelling enough for the relatively small changes required, and 
CouchDB 3.x in its then latest form will continue to be a fine database for 
years to come, for folks who can’t upgrade as easily. So yes, I anticipate 
we’ll live in a two-versions world a little longer than we did during 1.x to 
2.x, but the reasons to leave 1.x behind were a little more severe than the 
improvements of 4.x over 3.x (while still significant, of course).

Best
Jan
—

> 
> Adam
> 
>> On Jul 14, 2020, at 10:18 AM, Robert Samuel Newson  
>> wrote:
>> 
>> Thanks Nick, very helpful, and it vindicates me opening this thread.
>> 
>> I don't accept Mike Rhodes's argument at all but I should explain why I don't:
>> 
>> In CouchDB 1.x, a response was generated from a single .couch file. There 
>> was always a window between the start of the request as the client sees it 
>> and CouchDB acquiring a snapshot of the relevant database. I don't think 
>> that gap is meaningful, and it does not refute our statements of the time that 
>> CouchDB responses are from a snapshot (specifically, that no change to the 
>> database made _during_ the response will be visible in _this_ response). In 
>> CouchDB 2.x (and continuing in 3.x), a CouchDB database typically consists 
>> of multiple shards, each of which, once opened, remains snapshotted for the 
>> duration of that response. The difference between 1.x and 2.x/3.x is that 
>> the window is potentially larger (though the requests are issued in 
>> parallel). The response, however much it returned, was impervious to changes 
>> in other requests once it had begun.
>> 
>> I don't think _all_docs, _view or a non-continuous _changes response should 
>> allow changes made in other requests to appear midway through them and I 
>> want to hear the opinions of folks that have watched over CouchDB from its 
>> earliest days on this specific point (If I must name names, at least Adam K, 
>> Paul D, Jan L, Joan T). If there's a majority for deviating from this 
>> semantic, I will go with the majority.
>> 
>> If we were to agree to preserve the 'single snapshot' behaviour, what would 
>> the behaviour be if we can't honour it because of the FoundationDB 
>> transaction limits?
>> 
>> I see a few options.
>> 
>> 1) We could end the response uncleanly, mid-response. CouchDB does this when 
>> it has no alternative, and it is ugly, but it is usually handled well by 
>> clients. They are at least not usually convinced they got a complete 
>> response if they are using a competent HTTP client.
>> 
>> 2) We could disavow the streaming API, as you've suggested, attempt to 
>> gather the full response. If we do this within the FDB bounds, return a 200 
>> code and the response body. A 400 and an error body if we don't.
>> 
>> 3) We could make the "limit" parameter mandatory and with an upper bound, in 
>> combination with 1 or 2, such that a valid request is very likely to be 
>> served within the limits.
>> 
>> I'd like to hear more voices on which way we want to break the unachievable 
>> semantic of old where you could read _all_docs on a billion document 
>> database over, uptime gods willing, a snapshot of the database.
>> 
>> B.
>> 
>>> On 13 Jul 2020, at 21:15, Nick Vatamaniuc  wrote:
>>> 
>>> Thanks for bringing the topic up for the discussion!
>>> 
>>> For background, this topic was discussed on the mailing list starting
>>> in February,