Re: [DISCUSS] Mango indexes on FDB

2020-03-27 Thread Paul Davis
Thanks! For some reason your step 4 was elided in the Gmail UI, but not
when Garren responded, so I was confused.

On Fri, Mar 27, 2020 at 9:11 AM Glynn Bird  wrote:
>
> > The quoting here is weird. Are you saying to skip _all_docs in your
> > proposal, Glynn?
>
> I'm saying eliminate (3) from your list of things.
>
> 1. If user specifies an index, use it even if we have to wait
> 2. If an index is built that can be used, use it
> 3. n/a
> 4. As a last resort use _all_docs

Re: [DISCUSS] Mango indexes on FDB

2020-03-27 Thread Glynn Bird
> The quoting here is weird. Are you saying to skip _all_docs in your
> proposal, Glynn?

I'm saying eliminate (3) from your list of things.

1. If user specifies an index, use it even if we have to wait
2. If an index is built that can be used, use it
3. n/a
4. As a last resort use _all_docs
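Glynn's revised order can be sketched as a small selection function. This is a minimal illustration under stated assumptions, not the CouchDB implementation: `Index`, its `built` flag and `choose_index` are hypothetical names, and `candidates` is assumed to already contain only indexes that can serve the selector.

```python
from dataclasses import dataclass
from typing import List, Optional

ALL_DOCS = "_all_docs"


@dataclass
class Index:
    name: str
    built: bool  # hypothetical: has this index caught up with the database?


def choose_index(requested: Optional[str], candidates: List[Index]) -> str:
    """Pick an index name; every entry in `candidates` can serve the selector."""
    # 1. A user-specified index is always used, even if the query must wait.
    if requested is not None:
        return requested
    # 2. Otherwise use any built index that can serve the selector.
    for idx in candidates:
        if idx.built:
            return idx.name
    # 3. n/a: an unrequested index that is still building is never waited on.
    # 4. Last resort: _all_docs, which is always up to date.
    return ALL_DOCS
```

With step 3 gone, an unrequested query never blocks on an index build; the cost is that it may silently fall back to a full scan.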



Re: [DISCUSS] Mango indexes on FDB

2020-03-26 Thread Paul Davis
On Thu, Mar 26, 2020 at 5:33 AM Will Holley  wrote:
>
> Ah - in that case I think we should remove step 3, as it leads to a
> confusing mental model. It's much simpler to explain that Mango will only
> use fresh indexes and any new indexes will build in the background.
>

Simpler in some respects. The trade-off is that we then have to teach
users how to tell when an index is built, and also that they need to be
aware that different index types will have different ideas of what
"built" means.
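If Mango never waits for an index that was not requested, a client that wants to know when its new index is usable ends up writing the kind of polling loop this thread mentions. A minimal sketch, assuming a hypothetical `get_status` callable that hides the per-index-type meaning of "built" (the thread does not specify a status endpoint or its values); `sleep` and `clock` are injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def wait_until_built(
    get_status: Callable[[], str],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until get_status() returns "built"; return False if the timeout passes first."""
    deadline = clock() + timeout_s
    while True:
        if get_status() == "built":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Every client has to get the interval, the timeout and the meaning of "built" right, which is the burden Paul is pointing at.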

> On Thu, 26 Mar 2020 at 10:15, Garren Smith  wrote:
>
> > On Thu, Mar 26, 2020 at 11:04 AM Will Holley  wrote:
> >
> > > Broadly, I think it's a big step forward if we can prevent Mango from
> > > automatically selecting extremely stale indexes.
> > >
> > > I've been going back and forth on whether step 3 could lead to some
> > > difficult-to-predict behaviour. If we assume that requests have a short
> > > timeout - i.e. we can't return any result if it doesn't complete within
> > > the FDB transaction timeout - then I think it's fine: queries that use
> > > _all_docs on a large database will be timing out anyway.
> > >
> > > If we were to allow long-running queries then it seems a bit sketchier
> > > because adding an index to a large database could cause queries that
> > > previously completed to start timing out whilst they block on the index
> > > build. This is basically how Mango in CouchDB 2/3 behaves and has been a
> > > big pain point for customers I've worked with, to the point where you
> > > basically need to explicitly specify which index Mango uses in all cases
> > > if you're to avoid surprise timeouts when somebody adds a new index.
> > >
> > > As I understand it, we're not allowing queries to span FDB transactions,
> > > so this latter case is not something to worry about?
> >
> >
> > We are going to allow queries to span transactions. This is already
> > implemented for views and will be for Mango.
> >
> >
> > >
> > > Cheers,
> > >
> > > Will
> > >
> > > On Wed, 25 Mar 2020 at 19:43, Garren Smith  wrote:
> > >
> > > > On Wed, Mar 25, 2020 at 8:35 PM Paul Davis <paul.joseph.da...@gmail.com>
> > > > wrote:
> > > >
> > > > > > It was therefore felt that having an immediate "Not ready" signal
> > > > > > for just _some_ calls to _find, based on the type of backing index,
> > > > > > was a bad and confusing API.
> > > > > >
> > > > > > We also discussed _find calls where the user does not specify an
> > > > > > index, and concluded that we would be free to choose between using
> > > > > > the _all_docs index (which is always up to date but rarely the best
> > > > > > index for a given selector) or blocking to update a better but stale
> > > > > > index.
> > > > > >
> > > > > > Summary-ing my summarisation:
> > > > > >
> > > > > > 1) If you specify an index, we'll use it even if we have to update
> > > > > >    it, no matter how long that takes.
> > > > > > 2) If you don't specify an index, it's the dealer's choice. The
> > > > > >    details here may change in point releases.
> > > > > >
> > > > >
> > > > > So it seems there's still a bit of confusion about what the consensus
> > > > > is here. The way I had thought this would work is something like
> > > > > this:
> > > > >
> > > > > 1. If user specifies an index, use it even if we have to wait
> > > > > 2. If an index is built that can be used, use it
> > > > > 3. If an index is building that can be used, wait for it
> > > > > 4. As a last resort use _all_docs
> > > > >
> > > > > Discussing with Garren on the PR, he's of the opinion that we should
> > > > > skip step 3 and just go directly to using _all_docs if nothing is
> > > > > built.
> > > > >
> > > >
> > > > I just want to clarify step 3. I'm OK with using an index that still
> > > > needs to be built as long as there is no other built index that can
> > > > service the request.
> > > >
> > > > So the big thing for me is to always prefer a built index over a
> > > > building index. In the situation where the only choice is between one
> > > > building index and _all_docs, I'm OK with using the building index.
> > > >
> > > >
> > > >
> > > >
> > > > > My main assumption is that most cases where a user is creating an
> > > > > index and then wanting to run a query with it are in the
> > > > > design/exploration phase of learning the feature or designing an
> > > > > index to use. In that scenario, if we skip waiting, it seems likely
> > > > > that a user could easily be led to believe that an index creation
> > > > > "worked" for their selector when in reality it was just backed by
> > > > > _all_docs.
> > > > >
> > > > > The other reason for preferring to wait for an index to finish
> > > > > building is that the UI for the normal case of creating indexes is a
> > > > > bit awkward. Having to run a polling loop around checking the index
> > > > > status seems suboptimal in most cases.
> > > > >
> > > > > Am I missing other cases that would benefit from not waiting 
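For comparison, Paul's original four-step order, combined with Garren's clarification quoted above (a built index always beats a building one, and a building index beats _all_docs), might look like the sketch below. As before, `Index` and `choose_index` are hypothetical names and `candidates` is assumed to be pre-filtered to indexes usable for the selector; the second return value says whether the query would block on an index build.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

ALL_DOCS = "_all_docs"


@dataclass
class Index:
    name: str
    built: bool  # hypothetical: has this index caught up with the database?


def choose_index(requested: Optional[Index], candidates: List[Index]) -> Tuple[str, bool]:
    """Return (index name, must_wait_for_build); `candidates` can all serve the selector."""
    # 1. A user-specified index is used even if we have to wait for it.
    if requested is not None:
        return requested.name, not requested.built
    # 2. A built index always wins over a building one.
    for idx in candidates:
        if idx.built:
            return idx.name, False
    # 3. No built index: wait for a building one rather than scan everything.
    if candidates:
        return candidates[0].name, True
    # 4. Last resort: _all_docs.
    return ALL_DOCS, False
```

The only behavioural difference from the three-step version is the third branch, which is exactly where Will's concern about surprise blocking applies.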

Re: [DISCUSS] Mango indexes on FDB

2020-03-26 Thread Paul Davis
The quoting here is weird. Are you saying to skip _all_docs in your
proposal, Glynn?

On Thu, Mar 26, 2020 at 5:46 AM Glynn Bird  wrote:
>
> +1 on removing step 3 - my reservation on falling back on _all_docs is that
> users have no insight into how expensive a query is, other than measuring
> latencies (which might depend on other factors). I would hope that folks
> would use option 1 anyway.
>
> So Paul's list becomes:
>
> 1. If user specifies an index, use it even if we have to wait
> 2. If an index is built that can be used, use it
> 3. n/a
> 4. As a last resort use _all_docs

Re: [DISCUSS] Mango indexes on FDB

2020-03-26 Thread Garren Smith
On Thu, Mar 26, 2020 at 12:46 PM Glynn Bird  wrote:

> +1 on removing step 3 - my reservation on falling back on _all_docs is that
> users have no insight into how expensive a query is, other than measuring
> latencies (which might depend on other factors). I would hope that folks
> would use option 1 anyway.
>
> So Paul's list becomes:
>
> 1. If user specifies an index, use it even if we have to wait
> 2. If an index is built that can be used, use it
> 3. n/a
> 4. As a last resort use _all_docs
>

+1



Re: [DISCUSS] Mango indexes on FDB

2020-03-26 Thread Glynn Bird
+1 on removing step 3 - my reservation on falling back on _all_docs is that
users have no insight into how expensive a query is, other than measuring
latencies (which might depend on other factors). I would hope that folks
would use option 1 anyway.

So Paul's list becomes:

1. If user specifies an index, use it even if we have to wait
2. If an index is built that can be used, use it
3. n/a
4. As a last resort use _all_docs
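One way a client can at least detect the fallback Glynn is worried about is to ask the query planner before running the query. This sketch assumes the FDB-based release keeps the `_explain` endpoint and response shape of CouchDB 2.x/3.x, where the chosen plan's `index.name` is `"_all_docs"` when no real index is used; the helper names are hypothetical, and authentication and error handling are omitted.

```python
import json
import urllib.request


def explain(base_url: str, db: str, selector: dict) -> dict:
    """POST a Mango selector to /{db}/_explain and return the decoded plan."""
    req = urllib.request.Request(
        f"{base_url}/{db}/_explain",
        data=json.dumps({"selector": selector}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def backed_by_all_docs(plan: dict) -> bool:
    """True if the planner chose the _all_docs fallback rather than a real index."""
    return plan.get("index", {}).get("name") == "_all_docs"


# Usage (needs a running server; credentials omitted):
# plan = explain("http://localhost:5984", "movies", {"year": {"$gt": 2000}})
# if backed_by_all_docs(plan):
#     print("No usable index yet; this query scans every document.")
```

This tells the user which index was picked, not how expensive the query will be, so it only partly addresses the concern.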



Re: [DISCUSS] Mango indexes on FDB

2020-03-26 Thread Will Holley
Ah - in that case I think we should remove step 3, as it leads to a
confusing mental model. It's much simpler to explain that Mango will only
use fresh indexes and any new indexes will build in the background.
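Garren's reply, quoted below, is that a query will not be limited to a single FDB transaction, which FoundationDB caps at roughly five seconds. A common way to do that is to read a bounded batch per transaction and resume from the last key returned. The sketch simulates this over a sorted in-memory list. `read_batch` and `scan_across_transactions` are hypothetical names, and the fixed batch size stands in for the real time limit.

```python
from bisect import bisect_right
from typing import Iterator, List, Optional, Tuple

Row = Tuple[str, str]  # (key, value); the store is kept sorted by key


def read_batch(store: List[Row], after: Optional[str], limit: int) -> List[Row]:
    """One simulated transaction: up to `limit` rows whose key is > `after`."""
    keys = [k for k, _ in store]
    start = 0 if after is None else bisect_right(keys, after)
    return store[start:start + limit]


def scan_across_transactions(store: List[Row], batch_size: int) -> Iterator[Row]:
    """Yield every row, starting a fresh 'transaction' for each batch."""
    after: Optional[str] = None
    while True:
        batch = read_batch(store, after, batch_size)
        if not batch:
            return
        yield from batch
        # Resume point for the next transaction: the last key already returned.
        after = batch[-1][0]
```

Because each batch is independent, the total duration of a query is no longer bounded by one transaction. That is why blocking on an index build can again turn into the long, surprising waits Will describes.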

On Thu, 26 Mar 2020 at 10:15, Garren Smith  wrote:

> On Thu, Mar 26, 2020 at 11:04 AM Will Holley  wrote:
>
> > Broadly, I think it's a big step forward if we can prevent Mango from
> > automatically selecting extremely stale indexes.
> >
> > I've been going back and forth on whether step 3 could lead to some
> > difficult-to-predict behaviour. If we assume that requests have a short
> > timeout - i.e. we can't return any result if it doesn't complete within
> > the FDB transaction timeout - then I think it's fine: queries that use
> > _all_docs on a large database will be timing out anyway.
> >
> > If we were to allow long-running queries then it seems a bit sketchier
> > because adding an index to a large database could cause queries that
> > previously completed to start timing out whilst they block on the index
> > build. This is basically how Mango in CouchDB 2/3 behaves and has been a
> > big pain point for customers I've worked with, to the point where you
> > basically need to explicitly specify which index Mango uses in all cases
> > if you're to avoid surprise timeouts when somebody adds a new index.
> >
> > As I understand it, we're not allowing queries to span FDB transactions,
> > so this latter case is not something to worry about?
>
>
> We are going to allow queries to span transactions. This is already
> implemented for views and will be for Mango.
>
>
> >
> > Cheers,
> >
> > Will
> >
> > On Wed, 25 Mar 2020 at 19:43, Garren Smith  wrote:
> >
> > > On Wed, Mar 25, 2020 at 8:35 PM Paul Davis <paul.joseph.da...@gmail.com>
> > > wrote:
> > >
> > > > > It was therefore felt that having an immediate "Not ready" signal for
> > > > just _some_ calls to _find, based on the type of backing index, was a bad
> > > > and confusing api.
> > > > >
> > > > > We also discussed _find calls where the user does not specify an index,
> > > > and concluded that we would be free to choose between using the _all_docs
> > > > index (which is always up to date but rarely the best index for a given
> > > > selector) or blocking to update a better but stale index.
> > > > >
> > > > > Summary-ing my summarisation;
> > > > >
> > > > > 1) if you specify an index, we'll use it even if we have to update it,
> > > > no matter how long that takes.
> > > > > 2) if you don't specify an index, it's the dealer's choice. The details
> > > > here may change in point releases.
> > > > >
> > > >
> > > > So it seems there's still a bit of confusion on what the consensus is
> > > > here. The way that I had thought this would work is that we'd do
> > > > something like such:
> > > >
> > > > 1. If user specifies an index, use it even if we have to wait
> > > > 2. If an index is built that can be used, use it
> > > > 3. If an index is building that can be used, wait for it
> > > > 4. As a last resort use _all_docs
> > > >
> > > > Discussing with Garren on the PR he's of the opinion that we should
> > > > skip step 3 and just go directly to using _all_docs if nothing is
> > > > built.
> > > >
> > >
> > > I just want to clarify step 3. I'm ok with using an index that still needs
> > > to be built as long as there is no other built index
> > > that can service the request.
> > >
> > > So the big thing for me is to always prefer a built index over a building
> > > index. In the situation where there is only 1 building index versus all
> > > docs I'm ok with using the building index.
> > >
> > >
> > >
> > >
> > > > My main assumption is that most cases where a user is creating an
> > > > index and then wanting to run a query with it are in the
> > > > design/exploration phase of learning the feature or designing an index
> > > > to use. In that scenario if we skip waiting it seems likely that a
> > > > user could easily be led to believe that an index creation "worked"
> > > > for their selector when in reality it was just backed by _all_docs.
> > > >
> > > > The other reason for preferring to wait for an index to finish
> > > > building is that the UI for the normal case of creating indexes is a
> > > > bit awkward. Having to run a polling loop around checking the index
> > > > status seems suboptimal in most cases.
> > > >
> > > > Am I missing other cases that would benefit from not waiting and just
> > > > using _all_docs?
> > > >
> > > > Paul
> > > >
> > >
> >
>


Re: [DISCUSS] Mango indexes on FDB

2020-03-26 Thread Garren Smith
On Thu, Mar 26, 2020 at 11:04 AM Will Holley  wrote:

> Broadly, I think it's a big step forward if we can prevent Mango from
> automatically selecting extremely stale indexes.
>
> I've been going back and forth on whether step 3 could lead to some
> difficult-to-predict behaviour. If we assume that requests have a short
> timeout - e.g. we can't return any result if it doesn't complete within the
> FDB transaction timeout - then I think it's fine: queries that use
> _all_docs and a large database will be timing out anyway.
>
> If we were to allow long-running queries then it seems a bit sketchier
> because adding an index to a large database could cause queries that
> previously completed to start timing out whilst they block on the index
> build. This is basically how Mango in CouchDB 2/3 behaves and has been a
> big pain point for customers I've worked with, to the point where you
> basically need to explicitly specify which index Mango uses in all cases if
> you're to avoid surprise timeouts when somebody adds a new index.
>
> As I understand it, we're not allowing queries to span FDB transactions so
> this latter case is not something to worry about?


We are going to allow queries to span transactions. This is already
implemented for views and will be for Mango.
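To make "queries span transactions" concrete, here is a minimal sketch, assuming a hypothetical `read_batch(start_after, limit)` callback that runs each batch in its own short FDB transaction (FDB limits a transaction to about five seconds). The scan resumes after the last key it saw, so the whole query can outlive any single transaction; the trade-off is that each batch reads a different snapshot. This is an illustration, not the actual couch_views or Mango code.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Row = Tuple[str, dict]


def scan_across_transactions(
    read_batch: Callable[[Optional[str], int], List[Row]],
    batch_size: int = 100,
) -> Iterator[Row]:
    """Yield rows from a key range that may be too large for one transaction.

    read_batch(start_after, limit) is assumed (hypothetically) to run in its own
    transaction and return up to `limit` rows, in key order, whose keys are
    strictly greater than start_after (or from the beginning when None).
    """
    start_after: Optional[str] = None
    while True:
        rows = read_batch(start_after, batch_size)
        yield from rows
        if len(rows) < batch_size:
            return  # a short batch means the range is exhausted
        start_after = rows[-1][0]  # resume after the last key seen
```

Because every batch is a fresh transaction, a document updated mid-scan may be seen in its old or new state depending on which batch reaches it.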


>
> Cheers,
>
> Will
>
> On Wed, 25 Mar 2020 at 19:43, Garren Smith  wrote:
>
> > On Wed, Mar 25, 2020 at 8:35 PM Paul Davis 
> > wrote:
> >
> > > > It was therefore felt that having an immediate "Not ready" signal for
> > > just _some_ calls to _find, based on the type of backing index, was a bad
> > > and confusing api.
> > > >
> > > > We also discussed _find calls where the user does not specify an index,
> > > and concluded that we would be free to choose between using the _all_docs
> > > index (which is always up to date but rarely the best index for a given
> > > selector) or blocking to update a better but stale index.
> > > >
> > > > Summary-ing my summarisation;
> > > >
> > > > 1) if you specify an index, we'll use it even if we have to update it,
> > > no matter how long that takes.
> > > > 2) if you don't specify an index, it's the dealer's choice. The details
> > > here may change in point releases.
> > > >
> > >
> > > So it seems there's still a bit of confusion on what the consensus is
> > > here. The way that I had thought this would work is that we'd do
> > > something like such:
> > >
> > > 1. If user specifies an index, use it even if we have to wait
> > > 2. If an index is built that can be used, use it
> > > 3. If an index is building that can be used, wait for it
> > > 4. As a last resort use _all_docs
> > >
> > > Discussing with Garren on the PR he's of the opinion that we should
> > > skip step 3 and just go directly to using _all_docs if nothing is
> > > built.
> > >
> >
> > I just want to clarify step 3. I'm ok with using an index that still needs
> > to be built as long as there is no other built index
> > that can service the request.
> >
> > So the big thing for me is to always prefer a built index over a building
> > index. In the situation where there is only 1 building index versus all
> > docs I'm ok with using the building index.
> >
> >
> >
> >
> > > My main assumption is that most cases where a user is creating an
> > > index and then wanting to run a query with it are in the
> > > design/exploration phase of learning the feature or designing an index
> > > to use. In that scenario if we skip waiting it seems likely that a
> > > user could easily be led to believe that an index creation "worked"
> > > for their selector when in reality it was just backed by _all_docs.
> > >
> > > The other reason for preferring to wait for an index to finish
> > > building is that the UI for the normal case of creating indexes is a
> > > bit awkward. Having to run a polling loop around checking the index
> > > status seems suboptimal in most cases.
> > >
> > > Am I missing other cases that would benefit from not waiting and just
> > > using _all_docs?
> > >
> > > Paul
> > >
> >
>


Re: [DISCUSS] Mango indexes on FDB

2020-03-26 Thread Glynn Bird
Agree with Will that falling back to _all_docs-powered queries is usually
undesirable in all but the smallest data sets. More folks than you'd think
end up going into production without the right index because the
_all_docs-powered query in development (with a small data set) seemed to be
fast enough.

I always advise people to use "use_index" so they get the predictability of
"this query uses that index". You're then left with the user wondering
whether index X is built yet and for that they have to navigate
_active_tasks or poll a query until it returns something, which is a little
primitive but probably beyond the scope of Garren's original post.
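For reference, the `use_index` pinning described here is a field in the `_find` request body, given either as a design document name or as a `[design_doc, index_name]` pair. Below is a minimal Python sketch that builds (but does not send) such a request with the standard library; the host, database, design document and index names are made up for illustration.

```python
import json
import urllib.request
from typing import List, Optional, Union


def build_find_request(
    base_url: str,
    db: str,
    selector: dict,
    use_index: Optional[Union[str, List[str]]] = None,
) -> urllib.request.Request:
    """Build a POST /{db}/_find request, optionally pinned to one index."""
    body = {"selector": selector}
    if use_index is not None:
        # "design_doc" or ["design_doc", "index_name"]
        body["use_index"] = use_index
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{db}/_find",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_find_request(
    "http://localhost:5984/", "orders", {"status": "shipped"},
    ["_design/by-status", "status-idx"],  # hypothetical names
)
```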

On Thu, 26 Mar 2020 at 09:04, Will Holley  wrote:

> Broadly, I think it's a big step forward if we can prevent Mango from
> automatically selecting extremely stale indexes.
>
> I've been going back and forth on whether step 3 could lead to some
> difficult-to-predict behaviour. If we assume that requests have a short
> timeout - e.g. we can't return any result if it doesn't complete within the
> FDB transaction timeout - then I think it's fine: queries that use
> _all_docs and a large database will be timing out anyway.
>
> If we were to allow long-running queries then it seems a bit sketchier
> because adding an index to a large database could cause queries that
> previously completed to start timing out whilst they block on the index
> build. This is basically how Mango in CouchDB 2/3 behaves and has been a
> big pain point for customers I've worked with, to the point where you
> basically need to explicitly specify which index Mango uses in all cases if
> you're to avoid surprise timeouts when somebody adds a new index.
>
> As I understand it, we're not allowing queries to span FDB transactions so
> this latter case is not something to worry about?
>
> Cheers,
>
> Will
>
> On Wed, 25 Mar 2020 at 19:43, Garren Smith  wrote:
>
> > On Wed, Mar 25, 2020 at 8:35 PM Paul Davis 
> > wrote:
> >
> > > > It was therefore felt that having an immediate "Not ready" signal for
> > > just _some_ calls to _find, based on the type of backing index, was a bad
> > > and confusing api.
> > > >
> > > > We also discussed _find calls where the user does not specify an index,
> > > and concluded that we would be free to choose between using the _all_docs
> > > index (which is always up to date but rarely the best index for a given
> > > selector) or blocking to update a better but stale index.
> > > >
> > > > Summary-ing my summarisation;
> > > >
> > > > 1) if you specify an index, we'll use it even if we have to update it,
> > > no matter how long that takes.
> > > > 2) if you don't specify an index, it's the dealer's choice. The details
> > > here may change in point releases.
> > > >
> > >
> > > So it seems there's still a bit of confusion on what the consensus is
> > > here. The way that I had thought this would work is that we'd do
> > > something like such:
> > >
> > > 1. If user specifies an index, use it even if we have to wait
> > > 2. If an index is built that can be used, use it
> > > 3. If an index is building that can be used, wait for it
> > > 4. As a last resort use _all_docs
> > >
> > > Discussing with Garren on the PR he's of the opinion that we should
> > > skip step 3 and just go directly to using _all_docs if nothing is
> > > built.
> > >
> >
> > I just want to clarify step 3. I'm ok with using an index that still needs
> > to be built as long as there is no other built index
> > that can service the request.
> >
> > So the big thing for me is to always prefer a built index over a building
> > index. In the situation where there is only 1 building index versus all
> > docs I'm ok with using the building index.
> >
> >
> >
> >
> > > My main assumption is that most cases where a user is creating an
> > > index and then wanting to run a query with it are in the
> > > design/exploration phase of learning the feature or designing an index
> > > to use. In that scenario if we skip waiting it seems likely that a
> > > user could easily be led to believe that an index creation "worked"
> > > for their selector when in reality it was just backed by _all_docs.
> > >
> > > The other reason for preferring to wait for an index to finish
> > > building is that the UI for the normal case of creating indexes is a
> > > bit awkward. Having to run a polling loop around checking the index
> > > status seems suboptimal in most cases.
> > >
> > > Am I missing other cases that would benefit from not waiting and just
> > > using _all_docs?
> > >
> > > Paul
> > >
> >
>


Re: [DISCUSS] Mango indexes on FDB

2020-03-26 Thread Will Holley
Broadly, I think it's a big step forward if we can prevent Mango from
automatically selecting extremely stale indexes.

I've been going back and forth on whether step 3 could lead to some
difficult-to-predict behaviour. If we assume that requests have a short
timeout - e.g. we can't return any result if it doesn't complete within the
FDB transaction timeout - then I think it's fine: queries that use
_all_docs and a large database will be timing out anyway.

If we were to allow long-running queries then it seems a bit sketchier
because adding an index to a large database could cause queries that
previously completed to start timing out whilst they block on the index
build. This is basically how Mango in CouchDB 2/3 behaves and has been a
big pain point for customers I've worked with, to the point where you
basically need to explicitly specify which index Mango uses in all cases if
you're to avoid surprise timeouts when somebody adds a new index.

As I understand it, we're not allowing queries to span FDB transactions so
this latter case is not something to worry about?

Cheers,

Will

On Wed, 25 Mar 2020 at 19:43, Garren Smith  wrote:

> On Wed, Mar 25, 2020 at 8:35 PM Paul Davis 
> wrote:
>
> > > It was therefore felt that having an immediate "Not ready" signal for
> > just _some_ calls to _find, based on the type of backing index, was a bad
> > and confusing api.
> > >
> > > We also discussed _find calls where the user does not specify an index,
> > and concluded that we would be free to choose between using the _all_docs
> > index (which is always up to date but rarely the best index for a given
> > selector) or blocking to update a better but stale index.
> > >
> > > Summary-ing my summarisation;
> > >
> > > 1) if you specify an index, we'll use it even if we have to update it,
> > no matter how long that takes.
> > > 2) if you don't specify an index, it's the dealer's choice. The details
> > here may change in point releases.
> > >
> >
> > So it seems there's still a bit of confusion on what the consensus is
> > here. The way that I had thought this would work is that we'd do
> > something like such:
> >
> > 1. If user specifies an index, use it even if we have to wait
> > 2. If an index is built that can be used, use it
> > 3. If an index is building that can be used, wait for it
> > 4. As a last resort use _all_docs
> >
> > Discussing with Garren on the PR he's of the opinion that we should
> > skip step 3 and just go directly to using _all_docs if nothing is
> > built.
> >
>
> I just want to clarify step 3. I'm ok with using an index that still needs
> to be built as long as there is no other built index
> that can service the request.
>
> So the big thing for me is to always prefer a built index over a building
> index. In the situation where there is only 1 building index versus all
> docs I'm ok with using the building index.
>
>
>
>
> > My main assumption is that most cases where a user is creating an
> > index and then wanting to run a query with it are in the
> > design/exploration phase of learning the feature or designing an index
> > to use. In that scenario if we skip waiting it seems likely that a
> > user could easily be led to believe that an index creation "worked"
> > for their selector when in reality it was just backed by _all_docs.
> >
> > The other reason for preferring to wait for an index to finish
> > building is that the UI for the normal case of creating indexes is a
> > bit awkward. Having to run a polling loop around checking the index
> > status seems suboptimal in most cases.
> >
> > Am I missing other cases that would benefit from not waiting and just
> > using _all_docs?
> >
> > Paul
> >
>


Re: [DISCUSS] Mango indexes on FDB

2020-03-25 Thread Garren Smith
On Wed, Mar 25, 2020 at 8:35 PM Paul Davis 
wrote:

> > It was therefore felt that having an immediate "Not ready" signal for
> just _some_ calls to _find, based on the type of backing index, was a bad
> and confusing api.
> >
> > We also discussed _find calls where the user does not specify an index,
> and concluded that we would be free to choose between using the _all_docs
> index (which is always up to date but rarely the best index for a given
> selector) or blocking to update a better but stale index.
> >
> > Summary-ing my summarisation;
> >
> > 1) if you specify an index, we'll use it even if we have to update it,
> no matter how long that takes.
> > 2) if you don't specify an index, it's the dealer's choice. The details
> here may change in point releases.
> >
>
> So it seems there's still a bit of confusion on what the consensus is
> here. The way that I had thought this would work is that we'd do
> something like such:
>
> 1. If user specifies an index, use it even if we have to wait
> 2. If an index is built that can be used, use it
> 3. If an index is building that can be used, wait for it
> 4. As a last resort use _all_docs
>
> Discussing with Garren on the PR he's of the opinion that we should
> skip step 3 and just go directly to using _all_docs if nothing is
> built.
>

I just want to clarify step 3. I'm ok with using an index that still needs
to be built as long as there is no other built index
that can service the request.

So the big thing for me is to always prefer a built index over a building
index. In the situation where there is only 1 building index versus all
docs I'm ok with using the building index.
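What "built" versus "building" means for a Mango index on FDB follows from Garren's original proposal: the background job must catch up to the sequence at which the index was created, while later document writes are indexed in their own transactions. A minimal sketch of that rule, assuming integer sequences (real FDB-backed sequences are versionstamps) and hypothetical field and class names:

```python
from dataclasses import dataclass


@dataclass
class MangoIndexBuild:
    """Hypothetical bookkeeping for one Mango index on FDB.

    creation_seq: database update sequence at the moment the index was created.
    build_seq: how far the background job has indexed pre-existing documents.
    """
    creation_seq: int
    build_seq: int = 0

    def is_built(self) -> bool:
        # Writes after creation_seq are indexed inside their own document
        # transaction, so only the backlog up to creation_seq matters.
        return self.build_seq >= self.creation_seq

    def advance(self, batch_end_seq: int) -> None:
        # The background job never needs to go past creation_seq.
        self.build_seq = min(max(self.build_seq, batch_end_seq), self.creation_seq)
```

An index on an empty database (`creation_seq=0`) is built immediately, which matches the intuition that there is no backlog to process.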




> My main assumption is that most cases where a user is creating an
> index and then wanting to run a query with it are in the
> design/exploration phase of learning the feature or designing an index
> to use. In that scenario if we skip waiting it seems likely that a
> user could easily be led to believe that an index creation "worked"
> for their selector when in reality it was just backed by _all_docs.
>
> The other reason for preferring to wait for an index to finish
> building is that the UI for the normal case of creating indexes is a
> bit awkward. Having to run a polling loop around checking the index
> status seems suboptimal in most cases.
>
> Am I missing other cases that would benefit from not waiting and just
> using _all_docs?
>
> Paul
>


Re: [DISCUSS] Mango indexes on FDB

2020-03-25 Thread Paul Davis
> It was therefore felt that having an immediate "Not ready" signal for just 
> _some_ calls to _find, based on the type of backing index, was a bad and 
> confusing api.
>
> We also discussed _find calls where the user does not specify an index, and 
> concluded that we would be free to choose between using the _all_docs index 
> (which is always up to date but rarely the best index for a given selector) 
> or blocking to update a better but stale index.
>
> Summary-ing my summarisation;
>
> 1) if you specify an index, we'll use it even if we have to update it, no 
> matter how long that takes.
> 2) if you don't specify an index, it's the dealer's choice. The details here
> may change in point releases.
>

So it seems there's still a bit of confusion on what the consensus is
here. The way that I had thought this would work is that we'd do
something like such:

1. If user specifies an index, use it even if we have to wait
2. If an index is built that can be used, use it
3. If an index is building that can be used, wait for it
4. As a last resort use _all_docs

Discussing with Garren on the PR he's of the opinion that we should
skip step 3 and just go directly to using _all_docs if nothing is
built.
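The four steps above can be sketched as a selection function. This is a minimal illustration, not the Mango planner: `Index`, `covers_selector` and `choose_index` are hypothetical names, "first usable candidate" stands in for Mango's real index ranking, and falling through to automatic selection when the requested index cannot service the query is an assumption. The `allow_building` flag toggles step 3, the point under debate.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class BuildState(Enum):
    BUILT = "built"
    BUILDING = "building"


@dataclass(frozen=True)
class Index:
    name: str
    state: BuildState
    covers_selector: bool  # hypothetical: can this index answer the selector?


# _all_docs is always up to date and can answer any selector (slowly).
ALL_DOCS = Index("_all_docs", BuildState.BUILT, True)


def choose_index(
    candidates: Sequence[Index],
    use_index: Optional[str] = None,
    allow_building: bool = True,
) -> Index:
    usable = [i for i in candidates if i.covers_selector]

    # Step 1: a usable index the user asked for wins, even if still building.
    if use_index is not None:
        for idx in usable:
            if idx.name == use_index:
                return idx
        # Assumption: otherwise fall through to automatic selection.

    # Step 2: prefer any fully built index.
    for idx in usable:
        if idx.state is BuildState.BUILT:
            return idx

    # Step 3 (optional): wait on a building index rather than scan everything.
    if allow_building:
        for idx in usable:
            if idx.state is BuildState.BUILDING:
                return idx

    # Step 4: last resort.
    return ALL_DOCS
```

With `allow_building=False` the function behaves as Will and Glynn suggest: an unbuilt index is only used when it is named explicitly.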

My main assumption is that most cases where a user is creating an
index and then wanting to run a query with it are in the
design/exploration phase of learning the feature or designing an index
to use. In that scenario if we skip waiting it seems likely that a
user could easily be led to believe that an index creation "worked"
for their selector when in reality it was just backed by _all_docs.

The other reason for preferring to wait for an index to finish
building is that the UI for the normal case of creating indexes is a
bit awkward. Having to run a polling loop around checking the index
status seems suboptimal in most cases.
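The polling loop in question looks roughly like this. The `is_built` callable is a hypothetical placeholder for whatever status check a client has (later in the thread Glynn mentions `_active_tasks` or re-running a query). The injectable `sleep` and `clock` parameters exist only so the loop can be tested without real waiting.

```python
import time
from typing import Callable


def wait_until_built(
    is_built: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_built() until it returns True or timeout_s elapses.

    Returns True if the index became ready, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if is_built():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```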

Am I missing other cases that would benefit from not waiting and just
using _all_docs?

Paul


Re: [DISCUSS] Mango indexes on FDB

2020-03-24 Thread Alex Miller

> On Mar 24, 2020, at 05:51, Garren Smith  wrote:
> On Tue, Mar 24, 2020 at 1:30 AM Joan Touzet wrote:
> 
>> Question: Imagine a node that's been offline for a bit and is just
>> coming back on. (I'm not 100% sure how this works in FDB land.) If
>> there's a (stale) index on disk, and the index is being updated, and the
>> index on disk is kind of stale...what happens?
>> 
> 
> With couchdb_layer this can't happen as each CouchDB node is stateless and
> doesn't actually keep any indexes. Everything would be in FoundationDB. So
> if the index is built then it is built and ready for all couch_layer nodes.
> 
> FoundationDB storage servers could fall behind the Tlogs. I'm not 100% sure
> what would happen in this case. But it would be consistent for all
> couch_layer nodes.

When a client gets a read version to begin a transaction in FDB, it is promised 
that this was the most recent version at some point in time between issuing the 
request and receiving the reply.  When it issues reads, those reads must 
include the version, and must get back the most recently written value for that 
key as of the included version.  FDB is not allowed to break this contract 
during faults.

The cluster will continue advancing in versions, as it does not throttle if 
only one server in a shard falls behind (or is offline).  When the server comes 
back online, it will pull the stream of mutations from the transaction logs to 
catch up.  In the meantime, it will continue to be unavailable for reads until 
it catches up, as clients send read requests for a specific (recent) version 
that the lagging storage server knows that it does not have.  After 1s, it will 
reply with a `future_version` error to tell the client it won’t be getting an 
answer soon.  The client will then make a decision based upon either the error 
or observed latency to re-issue the read to a different replica of that shard 
so that it may get an answer, and will continue doing so until it notices that 
the lagged storage server has caught up and is responding successfully.
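A toy model of the read path described above, assuming hypothetical `Replica` and `read_at_version` names. A storage server that has not caught up to the requested read version refuses with a `future_version`-style error instead of returning stale data, and the client moves on to another replica of the shard. The real client also weighs observed latency, and the server waits about 1s before replying; both are omitted here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


class FutureVersionError(Exception):
    """Stand-in for FDB's future_version error."""


@dataclass
class Replica:
    name: str
    latest_version: int  # newest version this storage server has applied
    # key -> [(version, value), ...] in ascending version order
    history: Dict[str, List[Tuple[int, str]]] = field(default_factory=dict)

    def read(self, key: str, read_version: int) -> Optional[str]:
        # A lagging server knows it lacks read_version and refuses to answer.
        if self.latest_version < read_version:
            raise FutureVersionError(self.name)
        value = None
        for version, v in self.history.get(key, []):
            if version <= read_version:
                value = v  # most recent write as of read_version
        return value


def read_at_version(replicas: List[Replica], key: str, read_version: int) -> Optional[str]:
    """Try each replica of the shard until one can serve read_version."""
    last_error: Optional[Exception] = None
    for replica in replicas:
        try:
            return replica.read(key, read_version)
        except FutureVersionError as err:
            last_error = err
    if last_error is not None:
        raise last_error
    raise LookupError("no replicas for this shard")
```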

If you’re interested in more details around the operational side of a storage 
server failure, I’d suggest reading the threads that Kyle Snavely started on 
the FDB Forums:
https://forums.foundationdb.org/t/quick-question-on-tlog-disk-space-for-large-clusters/1962
 

https://forums.foundationdb.org/t/questions-regarding-maintenance-for-multiple-storage-oriented-machines-in-a-data-hall/2010
 




Re: [DISCUSS] Mango indexes on FDB

2020-03-24 Thread Robert Samuel Newson
Hi,

We had a long discussion on the CouchDB Slack on this topic, which I will 
brutally summarise;

While we intend to update json indexes in the same transaction as the 
associated document update, there is a concern that this won't always be 
possible (large document and large number of indexes to update at once) and 
that it will not apply to javascript map indexes or search indexes.
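One way to picture the "won't always be possible" caveat is the following sketch. It assumes per-index write-size estimates are available, updates JSON indexes inline while they fit in the document's transaction, and defers the rest (plus all JavaScript or search indexes) to a background job. FDB's documented hard limit is 10,000,000 bytes written per transaction; the 1 MB `budget` default, the greedy ordering and all names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# FDB's documented hard limit on data written by a single transaction.
FDB_TXN_SIZE_LIMIT = 10_000_000


@dataclass
class IndexPlan:
    inline: List[str] = field(default_factory=list)    # updated with the doc
    deferred: List[str] = field(default_factory=list)  # left to a background job


def plan_index_updates(
    doc_bytes: int,
    indexes: List[Tuple[str, str, int]],
    budget: int = 1_000_000,
) -> IndexPlan:
    """indexes: (name, kind, estimated bytes of index rows to write)."""
    if budget > FDB_TXN_SIZE_LIMIT:
        raise ValueError("budget exceeds the FDB transaction size limit")
    plan = IndexPlan()
    used = doc_bytes
    for name, kind, cost in indexes:
        # Only json (Mango) indexes are candidates for same-transaction updates.
        if kind == "json" and used + cost <= budget:
            plan.inline.append(name)
            used += cost
        else:
            plan.deferred.append(name)
    return plan
```

Any deferred index is temporarily stale, which is why a blanket "not ready" signal would fire for some `_find` calls but not others.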

It was therefore felt that having an immediate "Not ready" signal for just 
_some_ calls to _find, based on the type of backing index, was a bad and 
confusing api.

We also discussed _find calls where the user does not specify an index, and 
concluded that we would be free to choose between using the _all_docs index 
(which is always up to date but rarely the best index for a given selector) or 
blocking to update a better but stale index.

Summary-ing my summarisation;

1) if you specify an index, we'll use it even if we have to update it, no 
matter how long that takes.
2) if you don't specify an index, it's the dealer's choice. The details here may
change in point releases.

No new status code is therefore needed.

B.

> On 24 Mar 2020, at 12:51, Garren Smith  wrote:
> 
> On Tue, Mar 24, 2020 at 1:30 AM Joan Touzet  wrote:
> 
>> 
>> 
>> On 2020-03-23 4:46 p.m., Mike Rhodes wrote:
>>> Garren,
>>> 
>>> Very much +1 on this suggestion, as it is, at least for me, what I'd
>> expect to happen if I were leaving the system to select an index -- as you
>> imply, the build process almost certainly takes longer than using the
>> _all_docs index. In addition, for the common case where there is a less
>> optimal but still useful index available, one might expect that index to be
>> used in preference to the "better" but unbuilt one.
>> 
>> I agree.
>> 
>>> But I do think this is important:
>>> 
>>>> We can amend the warning message
>>>> to let them know that they have an index that is building that could
>>>> service the index when it's ready.
>>> 
>>> Otherwise it's a bit too easy to get confused when trying to understand
>> the reason why an index you were _sure_ should've been used in fact was not.
>> 
>> Question: Imagine a node that's been offline for a bit and is just
>> coming back on. (I'm not 100% sure how this works in FDB land.) If
>> there's a (stale) index on disk, and the index is being updated, and the
>> index on disk is kind of stale...what happens?
>> 
> 
> With couchdb_layer this can't happen as each CouchDB node is stateless and
> doesn't actually keep any indexes. Everything would be in FoundationDB. So
> if the index is built then it is built and ready for all couch_layer nodes.
> 
> FoundationDB storage servers could fall behind the Tlogs. I'm not 100% sure
> what would happen in this case. But it would be consistent for all
> couch_layer nodes.
> 
> 
> 
>> 
>> -Joan



Re: [DISCUSS] Mango indexes on FDB

2020-03-24 Thread Garren Smith
On Tue, Mar 24, 2020 at 1:30 AM Joan Touzet  wrote:

>
>
> On 2020-03-23 4:46 p.m., Mike Rhodes wrote:
> > Garren,
> >
> > Very much +1 on this suggestion, as it is, at least for me, what I'd
> expect to happen if I were leaving the system to select an index -- as you
> imply, the build process almost certainly takes longer than using the
> _all_docs index. In addition, for the common case where there is a less
> optimal but still useful index available, one might expect that index to be
> used in preference to the "better" but unbuilt one.
>
> I agree.
>
> > But I do think this is important:
> >
> >> We can amend the warning message
> >> to let them know that they have an index that is building that could
> >> service the index when it's ready.
> >
> > Otherwise it's a bit too easy to get confused when trying to understand
> the reason why an index you were _sure_ should've been used in fact was not.
>
> Question: Imagine a node that's been offline for a bit and is just
> coming back on. (I'm not 100% sure how this works in FDB land.) If
> there's a (stale) index on disk, and the index is being updated, and the
> index on disk is kind of stale...what happens?
>

With couchdb_layer this can't happen as each CouchDB node is stateless and
doesn't actually keep any indexes. Everything would be in FoundationDB. So
if the index is built then it is built and ready for all couch_layer nodes.

FoundationDB storage servers could fall behind the Tlogs. I'm not 100% sure
what would happen in this case. But it would be consistent for all
couch_layer nodes.



>
> -Joan
>


Re: [DISCUSS] Mango indexes on FDB

2020-03-24 Thread Robert Newson
No, 425 is something specific

A 503 Service Unavailable seems the only suitable standard code. 

B. 
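For context on the two codes: 425 Too Early is defined by RFC 8470 for requests sent in TLS early data that might be replayed, while 503 Service Unavailable covers temporary conditions and can carry a `Retry-After` header. Here is a hypothetical sketch of a 503 reply for a still-building index, using CouchDB's usual `error`/`reason` JSON shape. The `index_not_ready` error name is invented, and the thread ultimately concluded that no new status code is needed.

```python
import json
from http import HTTPStatus
from typing import Dict, Tuple


def index_not_ready_response(index_name: str, retry_after_s: int = 5) -> Tuple[int, Dict[str, str], bytes]:
    """Hypothetical (status, headers, body) for a use_index query on a building index."""
    body = {
        "error": "index_not_ready",  # invented error name
        "reason": f"Index {index_name} is still building",
    }
    headers = {
        "Content-Type": "application/json",
        "Retry-After": str(retry_after_s),  # seconds, per the HTTP spec
    }
    return HTTPStatus.SERVICE_UNAVAILABLE.value, headers, json.dumps(body).encode("utf-8")
```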

> On 24 Mar 2020, at 08:48, Glynn Bird  wrote:
> 
> If a user didn't specify the index they wanted to use, leaving the choice
> of index up to CouchDB, I would expect Couch to ignore the partially
> built index and fall back on _all_docs, so +1 on this.
> 
> But we also need to consider the API response if a user *specifies* an index
> during a query (with use_index) when that index is not built yet. I think I
> would prefer an instant 4xx response indicating that the requested
> resource isn't ready yet, rather than performing a very slow,
> _all_docs-powered search. Is "425 Too Early" a suitable response?
> 
> 
> 
> 
>> On Mon, 23 Mar 2020 at 23:30, Joan Touzet  wrote:
>> 
>> 
>> 
>>> On 2020-03-23 4:46 p.m., Mike Rhodes wrote:
>>> Garren,
>>> 
>>> Very much +1 on this suggestion, as it is, at least for me, what I'd
>> expect to happen if I were leaving the system to select an index -- as you
>> imply, the build process almost certainly takes longer than using the
>> _all_docs index. In addition, for the common case where there is a less
>> optimal but still useful index available, one might expect that index to be
>> used in preference to the "better" but unbuilt one.
>> 
>> I agree.
>> 
>>> But I do think this is important:
>>> 
>>>> We can amend the warning message
>>>> to let them know that they have an index that is building that could
>>>> service the index when it's ready.
>>> 
>>> Otherwise it's a bit too easy to get confused when trying to understand
>> the reason why an index you were _sure_ should've been used in fact was not.
>> 
>> Question: Imagine a node that's been offline for a bit and is just
>> coming back on. (I'm not 100% sure how this works in FDB land.) If
>> there's a (stale) index on disk, and the index is being updated, and the
>> index on disk is kind of stale...what happens?
>> 
>> -Joan
>> 



Re: [DISCUSS] Mango indexes on FDB

2020-03-24 Thread Glynn Bird
If a user didn't specify the index they wanted to use, leaving the choice
of index up to CouchDB, I would expect Couch to ignore the partially
built index and fall back on _all_docs, so +1 on this.

But we also need to consider the API response if a user *specifies* an index
during a query (with use_index) when that index is not built yet. I think I
would prefer an instant 4xx response indicating that the requested
resource isn't ready yet, rather than performing a very slow,
_all_docs-powered search. Is "425 Too Early" a suitable response?




On Mon, 23 Mar 2020 at 23:30, Joan Touzet  wrote:

>
>
> On 2020-03-23 4:46 p.m., Mike Rhodes wrote:
> > Garren,
> >
> > Very much +1 on this suggestion, as it is, at least for me, what I'd
> expect to happen if I were leaving the system to select an index -- as you
> imply, the build process almost certainly takes longer than using the
> _all_docs index. In addition, for the common case where there is a less
> optimal but still useful index available, one might expect that index to be
> used in preference to the "better" but unbuilt one.
>
> I agree.
>
> > But I do think this is important:
> >
> >> We can amend the warning message
> >> to let them know that they have an index that is building that could
> >> service the index when it's ready.
> >
> > Otherwise it's a bit too easy to get confused when trying to understand
> the reason why an index you were _sure_ should've been used in fact was not.
>
> Question: Imagine a node that's been offline for a bit and is just
> coming back on. (I'm not 100% sure how this works in FDB land.) If
> there's a (stale) index on disk, and the index is being updated, and the
> index on disk is kind of stale...what happens?
>
> -Joan
>


Re: [DISCUSS] Mango indexes on FDB

2020-03-23 Thread Joan Touzet




On 2020-03-23 4:46 p.m., Mike Rhodes wrote:
> Garren,
>
> Very much +1 on this suggestion, as it is, at least for me, what I'd expect to happen if
> I were leaving the system to select an index -- as you imply, the build process almost
> certainly takes longer than using the _all_docs index. In addition, for the common case
> where there is a less optimal but still useful index available, one might expect that
> index to be used in preference to the "better" but unbuilt one.

I agree.

> But I do think this is important:
>
>> We can amend the warning message
>> to let them know that they have an index that is building that could
>> service the index when it's ready.
>
> Otherwise it's a bit too easy to get confused when trying to understand the
> reason why an index you were _sure_ should've been used in fact was not.

Question: Imagine a node that's been offline for a bit and is just
coming back on. (I'm not 100% sure how this works in FDB land.) If
there's a (stale) index on disk, and the index is being updated, and the
index on disk is kind of stale...what happens?

-Joan


Re: [DISCUSS] Mango indexes on FDB

2020-03-23 Thread Mike Rhodes
Garren,

Very much +1 on this suggestion, as it is, at least for me, what I'd expect to 
happen if I were leaving the system to select an index -- as you imply, the 
build process almost certainly takes longer than using the _all_docs index. In 
addition, for the common case where there is a less optimal but still useful 
index available, one might expect that index to be used in preference to the 
"better" but unbuilt one.

But I do think this is important:

> We can amend the warning message
> to let them know that they have an index that is building that could
> service the index when it's ready.

Otherwise it's a bit too easy to get confused when trying to understand the 
reason why an index you were _sure_ should've been used in fact was not.

-- 
Mike.

On Mon, 23 Mar 2020, at 17:27, Garren Smith wrote:
> Hi Everyone,
> 
> Currently with Mango, when selecting an index to service a query, Mango
> does not take into consideration whether the index is fully built or not.
> This can lead to unexpected response times when a new index is created. It
> also makes it really tricky to deploy a new index into a large
> production database because any new index could cause large delays to
> queries.
> 
> For Mango indexes on FoundationDB, I would like to change the behavior so
> that only indexes that are built can be used to service query requests. The
> reason we can make this change is that Mango indexes are built slightly
> differently on FDB. When an index is created, an initial background process
> will build the index up to the change seq at which the index was created. At the
> same time, any new doc updates are immediately indexed in the document
> transaction.
> So for Mango on FDB, what I want to do is not choose any index that still
> has the background process doing the initial build.
> 
> This is a bit of a user experience change. The one area where it would be a
> noticeable change is if a user creates their first index in a database, and
> then does a query. If the index is not built yet, Mango would use _all_docs
> to service the query and return a warning that they have no index. This
> could definitely be confusing for a user. We can amend the warning message
> to let them know that they have an index that is building that could
> service the index when it's ready.
> 
> I really like the idea of not using indexes that are still building. It
> allows the user not to have to worry and think about managing indexes. They
> can add an index, and keep querying. If the current query is a bit slow, it
> will start getting faster once the newer better index is ready. But it will
> never get slower because we selected an index that is going to take 30mins
> to build.
> 
> I would like to know if you would prefer that we remove indexes that are
> being built from selection for queries or would you like it to remain the
> same as in CouchDB 3.x
> 
> Cheers
> Garren
>


[DISCUSS] Mango indexes on FDB

2020-03-23 Thread Garren Smith
Hi Everyone,

Currently with Mango, when selecting an index to service a query, Mango
does not take into consideration whether the index is fully built or not.
This can lead to unexpected response times when a new index is created. It
also makes it really tricky to deploy a new index into a large
production database because any new index could cause large delays to
queries.

For Mango indexes on FoundationDB, I would like to change the behavior so
that only indexes that are built can be used to service query requests. The
reason we can make this change is that Mango indexes are built slightly
differently on FDB. When an index is created, an initial background process
will build the index up to the change seq at which the index was created. At the
same time, any new doc updates are immediately indexed in the document
transaction.
So for Mango on FDB, what I want to do is not choose any index that still
has the background process doing the initial build.
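To make the two build paths concrete, here is a minimal Python sketch under stated assumptions: the names `MangoIndex`, `Database` and `background_build_step` are hypothetical and are not CouchDB's actual API, and the changes feed is modelled as listing each document only at its latest seq. An index records the seq at which it was created; a background job indexes changes up to that seq, while writes made after creation are indexed in the same (simulated) document transaction. The index counts as built once the background job reaches the creation seq.

```python
from dataclasses import dataclass, field


@dataclass
class MangoIndex:
    # Hypothetical model of a Mango JSON index on FDB.
    name: str
    fields: tuple
    creation_seq: int   # database update seq when the index was created
    build_seq: int = 0  # progress of the background (initial) build
    entries: dict = field(default_factory=dict)  # doc_id -> index key

    def is_built(self) -> bool:
        # The initial build is complete once the background job has
        # caught up to the seq at which the index was created.
        return self.build_seq >= self.creation_seq

    def index_doc(self, doc_id: str, doc: dict) -> None:
        # Index the doc only if it has every indexed field.
        if all(f in doc for f in self.fields):
            self.entries[doc_id] = tuple(doc[f] for f in self.fields)
        else:
            self.entries.pop(doc_id, None)


class Database:
    def __init__(self):
        self.seq = 0
        self.docs = {}  # doc_id -> (latest seq, doc), like a changes feed
        self.indexes = []

    def write(self, doc_id: str, doc: dict) -> None:
        # One simulated document transaction: store the doc and update
        # every existing index, including ones still being built.
        self.seq += 1
        self.docs[doc_id] = (self.seq, doc)
        for idx in self.indexes:
            idx.index_doc(doc_id, doc)

    def create_index(self, name: str, fields: list) -> MangoIndex:
        idx = MangoIndex(name, tuple(fields), creation_seq=self.seq)
        self.indexes.append(idx)
        return idx

    def changes(self, since: int, until: int) -> list:
        # Changes with since < seq <= until, in seq order.
        return sorted(
            (seq, doc_id, doc)
            for doc_id, (seq, doc) in self.docs.items()
            if since < seq <= until
        )


def background_build_step(db: Database, idx: MangoIndex, batch_size: int) -> None:
    # Index one batch of changes that predate the index's creation.
    batch = db.changes(idx.build_seq, idx.creation_seq)[:batch_size]
    for seq, doc_id, doc in batch:
        idx.index_doc(doc_id, doc)
        idx.build_seq = seq
    if len(batch) < batch_size:
        # A short batch means nothing below creation_seq is left.
        idx.build_seq = idx.creation_seq
```

Because the modelled changes feed lists a document only at its latest seq, a document updated after the index was created is skipped by the background job, so the newer version already indexed by the write transaction is never overwritten with stale data.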

This is a bit of a user experience change. The one area where it would be a
noticeable change is if a user creates their first index in a database, and
then does a query. If the index is not built yet, Mango would use _all_docs
to service the query and return a warning that they have no index. This
could definitely be confusing for a user. We can amend the warning message
to let them know that they have an index that is building that could
service the index when it's ready.
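A rough sketch of the proposed selection rule, again in Python with hypothetical names (`IndexInfo`, `choose_index`): indexes whose initial build is still running are never chosen, and when no built index fits, the query falls back to `_all_docs` with a warning that names any index still building. The usability check and the tie-break (most matching selector fields) are simplifying assumptions, not Mango's real scoring, and the warning wording is illustrative only.

```python
from dataclasses import dataclass

ALL_DOCS = "_all_docs"


@dataclass
class IndexInfo:
    # Hypothetical summary of an index as seen by the query planner.
    name: str
    fields: tuple
    built: bool  # False while the initial background build is running


def usable(index: IndexInfo, selector_fields: set) -> bool:
    # Simplifying assumption: an index can serve the query if its
    # first indexed field appears in the selector.
    return bool(index.fields) and index.fields[0] in selector_fields


def choose_index(indexes: list, selector_fields: set) -> tuple:
    """Return (index name, warning or None)."""
    candidates = [i for i in indexes if usable(i, selector_fields)]
    built = [i for i in candidates if i.built]
    if built:
        # Assumed tie-break: prefer the index covering most selector fields.
        best = max(built, key=lambda i: sum(f in selector_fields for f in i.fields))
        return best.name, None
    # Nothing built fits: fall back to _all_docs and explain why.
    warning = "No matching index found, create an index to optimize query time."
    building = [i.name for i in candidates if not i.built]
    if building:
        warning += (" These indexes could serve this query once built: "
                    + ", ".join(building))
    return ALL_DOCS, warning
```

Keeping the "still building" list in the warning addresses the confusion raised later in the thread: the user sees why their new index was skipped, rather than a bare "no index" message.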

I really like the idea of not using indexes that are still building. It
allows the user not to have to worry and think about managing indexes. They
can add an index, and keep querying. If the current query is a bit slow, it
will start getting faster once the newer better index is ready. But it will
never get slower because we selected an index that is going to take 30mins
to build.

I would like to know if you would prefer that we remove indexes that are
being built from selection for queries or would you like it to remain the
same as in CouchDB 3.x

Cheers
Garren