Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-04-01 Thread Adam Kocoloski
I went and collected this discussion into an RFC proposing the “exploded KV” 
approach:

https://github.com/apache/couchdb-documentation/pull/403 


Cheers, Adam

> On Feb 20, 2019, at 10:47 AM, Paul Davis  wrote:
> 
> Strongly agree that we very much don't want to have Erlang-isms being
> pushed into fdb. Regardless of what we end up with I'd like to see a
> very strong (de)?serialization layer with some significant test
> coverage.
> 
> On Tue, Feb 19, 2019 at 6:54 PM Adam Kocoloski  wrote:
>> 
>> Yes, that sort of versioning has been omitted from the various concrete 
>> proposals but we definitely want to have it. We’ve seen the alternative in 
>> some of the Erlang records that we serialize to disk today and it ain’t 
>> pretty.
>> 
>> I can imagine that we’ll want to have the codebase laid out in a way that 
>> allows us to upgrade to a smarter KV encoding over time without major 
>> surgery, which I think is a good “layer of abstraction”. I would be nervous 
>> if we started having abstract containers of data structures pushed down into 
>> FDB itself :)
>> 
>> Adam
>> 
>>> On Feb 19, 2019, at 5:41 PM, Paul Davis  wrote:
>>> 
>>> A simple doc storage version number would likely be enough for future us to
>>> do fancier things.
>>> 
>>> On Tue, Feb 19, 2019 at 4:16 PM Benjamin Anderson 
>>> wrote:
>>> 
> I don’t think adding a layer of abstraction is the right move just yet,
 I think we should continue to find consensus on one answer to this question
 
 Agree that the theorycrafting stage is not optimal for making
 abstraction decisions, but I suspect it would be worthwhile somewhere
 between prototyping and releasing. Adam's proposal does seem to me the
 most appealing approach on the surface, and I don't see anyone signing
 up to do the work to deliver an alternative concurrently.
 
 --
 ba
 
 On Tue, Feb 19, 2019 at 1:43 PM Robert Samuel Newson 
 wrote:
> 
> Addendum: By “directory aliasing” I meant within a document (either the
 actual Directory thing or something equivalent of our own making). The
 directory aliasing for each database is a good way to reduce key size
 without a significant cost. Though if Redwood lands in time, even this
 would become an inutile obfuscation.
> 
>> On 19 Feb 2019, at 21:39, Robert Samuel Newson 
 wrote:
>> 
>> Interesting suggestion, obviously the details might get the wrong kind
 of fun.
>> 
>> Somewhere above I suggested this would be something we could change
 over time and even use different approaches for different documents within
 the same database. This is the long way of saying there are multiple ways
 to do this each with advantages and none without disadvantages.
>> 
>> I don’t think adding a layer of abstraction is the right move just
 yet, I think we should continue to find consensus on one answer to this
 question (and the related ones in other threads) for the first release.
 It’s easy to say “we can change it later”, of course. We can, though it
 would be a chunk of work in the context of something that already works,
 I’ve rarely seen anyone sign up for that.
>> 
>> I’m fine with the first proposal from Adam, where the keys are tuples
 of key parts pointing at terminal values. To make it easier for the first
 version, I would exclude optimisations like deduplication or the Directory
 aliasing or the schema thing that I suggested and that Ilya incorporated a
 variant of in a follow-up post. We’d accept that there are limits on the
 sizes of documents, including the awkward-to-express one about property
 depth.
>> 
>> Stepping back, I’m not seeing any essential improvement over Adam’s
 original proposal besides the few corrections and clarifications made by
 various authors. Could we start an RFC based on Adam’s original proposal on
 document body, revision tree and index storage? We could then have PR’s
 against that for each additional optimisation (one person’s optimisation is
 another person’s needless complication)?
>> 
>> If I’ve missed some genuine advance on the original proposal in this
 long thread, please call it out for me.
>> 
>> B.
>> 
>>> On 19 Feb 2019, at 21:15, Benjamin Anderson 
 wrote:
>>> 
>>> As is evident by the length of this thread, there's a pretty big
>>> design space to cover here, and it seems unlikely we'll have arrived
>>> at a "correct" solution even by the time this thing ships. Perhaps it
>>> would be worthwhile to treat the in-FDB representation of data as a
>>> first-class abstraction and support multiple representations
>>> simultaneously?
>>> 
>>> Obviously there's no such thing as a zero-cost abstraction - and I've
>>> not thought very hard about how far up the stack the 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-20 Thread Paul Davis
Strongly agree that we very much don't want to have Erlang-isms being
pushed into fdb. Regardless of what we end up with I'd like to see a
very strong (de)?serialization layer with some significant test
coverage.
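
To make the versioning point concrete, here is a minimal sketch of what such a
versioned (de)serialization layer could look like in Erlang. Everything here is
illustrative: the module name, the use of term_to_binary/1, and the single-byte
version tag are assumptions, not part of any proposal in the thread.

    -module(fdb_doc_codec).
    -export([encode/1, decode/1]).

    %% Version tag stored with every serialized document so a smarter
    %% encoding can be introduced later without breaking existing data.
    -define(CURRENT_VSN, 1).

    %% Encode a decoded JSON document (any Erlang term here) with a
    %% one-byte version prefix.
    encode(Doc) ->
        Body = term_to_binary(Doc),
        <<?CURRENT_VSN:8, Body/binary>>.

    %% Decoding dispatches on the stored version, so upgrading the
    %% on-disk encoding only requires adding a new clause.
    decode(<<1:8, Body/binary>>) ->
        binary_to_term(Body);
    decode(<<Vsn:8, _Rest/binary>>) ->
        erlang:error({unsupported_doc_storage_version, Vsn}).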

On Tue, Feb 19, 2019 at 6:54 PM Adam Kocoloski  wrote:
>
> Yes, that sort of versioning has been omitted from the various concrete 
> proposals but we definitely want to have it. We’ve seen the alternative in 
> some of the Erlang records that we serialize to disk today and it ain’t 
> pretty.
>
> I can imagine that we’ll want to have the codebase laid out in a way that 
> allows us to upgrade to a smarter KV encoding over time without major 
> surgery, which I think is a good “layer of abstraction”. I would be nervous 
> if we started having abstract containers of data structures pushed down into 
> FDB itself :)
>
> Adam
>
> > On Feb 19, 2019, at 5:41 PM, Paul Davis  wrote:
> >
> > A simple doc storage version number would likely be enough for future us to
> > do fancier things.
> >
> > On Tue, Feb 19, 2019 at 4:16 PM Benjamin Anderson 
> > wrote:
> >
> >>> I don’t think adding a layer of abstraction is the right move just yet,
> >> I think we should continue to find consensus on one answer to this question
> >>
> >> Agree that the theorycrafting stage is not optimal for making
> >> abstraction decisions, but I suspect it would be worthwhile somewhere
> >> between prototyping and releasing. Adam's proposal does seem to me the
> >> most appealing approach on the surface, and I don't see anyone signing
> >> up to do the work to deliver an alternative concurrently.
> >>
> >> --
> >> ba
> >>
> >> On Tue, Feb 19, 2019 at 1:43 PM Robert Samuel Newson 
> >> wrote:
> >>>
> >>> Addendum: By “directory aliasing” I meant within a document (either the
> >> actual Directory thing or something equivalent of our own making). The
> >> directory aliasing for each database is a good way to reduce key size
> >> without a significant cost. Though if Redwood lands in time, even this
> >> would become an inutile obfuscation.
> >>>
>  On 19 Feb 2019, at 21:39, Robert Samuel Newson 
> >> wrote:
> 
>  Interesting suggestion, obviously the details might get the wrong kind
> >> of fun.
> 
>  Somewhere above I suggested this would be something we could change
> >> over time and even use different approaches for different documents within
> >> the same database. This is the long way of saying there are multiple ways
> >> to do this each with advantages and none without disadvantages.
> 
>  I don’t think adding a layer of abstraction is the right move just
> >> yet, I think we should continue to find consensus on one answer to this
> >> question (and the related ones in other threads) for the first release.
> >> It’s easy to say “we can change it later”, of course. We can, though it
> >> would be a chunk of work in the context of something that already works,
> >> I’ve rarely seen anyone sign up for that.
> 
>  I’m fine with the first proposal from Adam, where the keys are tuples
> >> of key parts pointing at terminal values. To make it easier for the first
> >> version, I would exclude optimisations like deduplication or the Directory
> >> aliasing or the schema thing that I suggested and that Ilya incorporated a
> >> variant of in a follow-up post. We’d accept that there are limits on the
> >> sizes of documents, including the awkward-to-express one about property
> >> depth.
> 
>  Stepping back, I’m not seeing any essential improvement over Adam’s
> >> original proposal besides the few corrections and clarifications made by
> >> various authors. Could we start an RFC based on Adam’s original proposal on
> >> document body, revision tree and index storage? We could then have PR’s
> >> against that for each additional optimisation (one person’s optimisation is
> >> another person’s needless complication)?
> 
>  If I’ve missed some genuine advance on the original proposal in this
> >> long thread, please call it out for me.
> 
>  B.
> 
> > On 19 Feb 2019, at 21:15, Benjamin Anderson 
> >> wrote:
> >
> > As is evident by the length of this thread, there's a pretty big
> > design space to cover here, and it seems unlikely we'll have arrived
> > at a "correct" solution even by the time this thing ships. Perhaps it
> > would be worthwhile to treat the in-FDB representation of data as a
> > first-class abstraction and support multiple representations
> > simultaneously?
> >
> > Obviously there's no such thing as a zero-cost abstraction - and I've
> > not thought very hard about how far up the stack the document
> > representation would need to leak - but supporting different layouts
> > (primarily, as Adam points out, on the document body itself) might
> > prove interesting and useful. I'm sure there are folks interested in a
> > column-shaped CouchDB, for example.
> >
> > --
> > b
> >
> 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Adam Kocoloski
Yes, that sort of versioning has been omitted from the various concrete 
proposals but we definitely want to have it. We’ve seen the alternative in some 
of the Erlang records that we serialize to disk today and it ain’t pretty.

I can imagine that we’ll want to have the codebase laid out in a way that 
allows us to upgrade to a smarter KV encoding over time without major surgery, 
which I think is a good “layer of abstraction”. I would be nervous if we 
started having abstract containers of data structures pushed down into FDB 
itself :)

Adam

> On Feb 19, 2019, at 5:41 PM, Paul Davis  wrote:
> 
> A simple doc storage version number would likely be enough for future us to
> do fancier things.
> 
> On Tue, Feb 19, 2019 at 4:16 PM Benjamin Anderson 
> wrote:
> 
>>> I don’t think adding a layer of abstraction is the right move just yet,
>> I think we should continue to find consensus on one answer to this question
>> 
>> Agree that the theorycrafting stage is not optimal for making
>> abstraction decisions, but I suspect it would be worthwhile somewhere
>> between prototyping and releasing. Adam's proposal does seem to me the
>> most appealing approach on the surface, and I don't see anyone signing
>> up to do the work to deliver an alternative concurrently.
>> 
>> --
>> ba
>> 
>> On Tue, Feb 19, 2019 at 1:43 PM Robert Samuel Newson 
>> wrote:
>>> 
>>> Addendum: By “directory aliasing” I meant within a document (either the
>> actual Directory thing or something equivalent of our own making). The
>> directory aliasing for each database is a good way to reduce key size
>> without a significant cost. Though if Redwood lands in time, even this
>> would become an inutile obfuscation.
>>> 
 On 19 Feb 2019, at 21:39, Robert Samuel Newson 
>> wrote:
 
 Interesting suggestion, obviously the details might get the wrong kind
>> of fun.
 
 Somewhere above I suggested this would be something we could change
>> over time and even use different approaches for different documents within
>> the same database. This is the long way of saying there are multiple ways
>> to do this each with advantages and none without disadvantages.
 
 I don’t think adding a layer of abstraction is the right move just
>> yet, I think we should continue to find consensus on one answer to this
>> question (and the related ones in other threads) for the first release.
>> It’s easy to say “we can change it later”, of course. We can, though it
>> would be a chunk of work in the context of something that already works,
>> I’ve rarely seen anyone sign up for that.
 
 I’m fine with the first proposal from Adam, where the keys are tuples
>> of key parts pointing at terminal values. To make it easier for the first
>> version, I would exclude optimisations like deduplication or the Directory
>> aliasing or the schema thing that I suggested and that Ilya incorporated a
>> variant of in a follow-up post. We’d accept that there are limits on the
>> sizes of documents, including the awkward-to-express one about property
>> depth.
 
 Stepping back, I’m not seeing any essential improvement over Adam’s
>> original proposal besides the few corrections and clarifications made by
>> various authors. Could we start an RFC based on Adam’s original proposal on
>> document body, revision tree and index storage? We could then have PR’s
>> against that for each additional optimisation (one person’s optimisation is
>> another person’s needless complication)?
 
 If I’ve missed some genuine advance on the original proposal in this
>> long thread, please call it out for me.
 
 B.
 
> On 19 Feb 2019, at 21:15, Benjamin Anderson 
>> wrote:
> 
> As is evident by the length of this thread, there's a pretty big
> design space to cover here, and it seems unlikely we'll have arrived
> at a "correct" solution even by the time this thing ships. Perhaps it
> would be worthwhile to treat the in-FDB representation of data as a
> first-class abstraction and support multiple representations
> simultaneously?
> 
> Obviously there's no such thing as a zero-cost abstraction - and I've
> not thought very hard about how far up the stack the document
> representation would need to leak - but supporting different layouts
> (primarily, as Adam points out, on the document body itself) might
> prove interesting and useful. I'm sure there are folks interested in a
> column-shaped CouchDB, for example.
> 
> --
> b
> 
> On Tue, Feb 19, 2019 at 11:39 AM Robert Newson 
>> wrote:
>> 
>> Good points on revtree, I agree with you we should store that
>> intelligently to gain the benefits you mentioned.
>> 
>> --
>> Robert Samuel Newson
>> rnew...@apache.org
>> 
>> On Tue, 19 Feb 2019, at 18:41, Adam Kocoloski wrote:
>>> I do not think we should store the revtree as a blob. The design
>> where
>>> each edit branch is its own 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Paul Davis
Assuming we have multiple competing implementations performance
characterization of each is going to have to be a requirement, right?

Theoretically this would all be at the HTTP layer, such that we can swap
things in and out and easily check total system performance without
duplicating effort or getting distracted by pineapple vs recliner comparisons.

The theory being that we’d weigh observable differences against
implementation complexity and against feature “ability”, in terms of what the
various designs might offer.

I’ve been working on Erlang bindings for FoundationDB for a bit and would
really like to bring their approach to testing up through the rest of
CouchDB. It's very reminiscent of property-based testing, though slightly
less formal. But it certainly finds bugs. Anything we do, regardless of
performance, I think should be accompanied by a similarly thorough test suite.
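
As a sketch of the kind of property such a suite might check, assuming a
PropEr-style setup and the hypothetical fdb_doc_codec module sketched earlier
in this digest (the generator and module names are invented):

    -module(fdb_doc_codec_props).
    -include_lib("proper/include/proper.hrl").
    -export([prop_doc_roundtrip/0]).

    %% Whatever encoding is chosen, decode(encode(Doc)) must return the
    %% original document for arbitrary generated documents.
    prop_doc_roundtrip() ->
        ?FORALL(Doc, json_doc(),
                Doc =:= fdb_doc_codec:decode(fdb_doc_codec:encode(Doc))).

    %% Deliberately small JSON-ish generator; a real suite would cover
    %% deep nesting, unicode keys, large arrays, and so on.
    json_doc() ->
        ?SIZED(Size, json_value(Size)).

    json_value(0) ->
        oneof([null, boolean(), integer(), binary()]);
    json_value(Size) ->
        Smaller = json_value(Size div 2),
        oneof([json_value(0),
               list(Smaller),                %% JSON array
               list({binary(), Smaller})]).  %% JSON object as a proplist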


On Tue, Feb 19, 2019 at 5:45 PM Joan Touzet  wrote:

> Would it be too much work to prototype both and check CRUD timings for
> each across a small variety of documents?
>
> -Joan
>
> - Original Message -
> > From: "Paul Davis" 
> > To: dev@couchdb.apache.org
> > Sent: Tuesday, February 19, 2019 5:41:23 PM
> > Subject: Re: [DISCUSS] : things we need to solve/decide : storing JSON
> documents
> >
> > A simple doc storage version number would likely be enough for future
> > us to
> > do fancier things.
> >
> > On Tue, Feb 19, 2019 at 4:16 PM Benjamin Anderson
> > 
> > wrote:
> >
> > > > I don’t think adding a layer of abstraction is the right move
> > > > just yet,
> > > I think we should continue to find consensus on one answer to this
> > > question
> > >
> > > Agree that the theorycrafting stage is not optimal for making
> > > abstraction decisions, but I suspect it would be worthwhile
> > > somewhere
> > > between prototyping and releasing. Adam's proposal does seem to me
> > > the
> > > most appealing approach on the surface, and I don't see anyone
> > > signing
> > > up to do the work to deliver an alternative concurrently.
> > >
> > > --
> > > ba
> > >
> > > On Tue, Feb 19, 2019 at 1:43 PM Robert Samuel Newson
> > > 
> > > wrote:
> > > >
> > > > Addendum: By “directory aliasing” I meant within a document
> > > > (either the
> > > actual Directory thing or something equivalent of our own making).
> > > The
> > > directory aliasing for each database is a good way to reduce key
> > > size
> > > without a significant cost. Though if Redwood lands in time, even
> > > this
> > > would become an inutile obfuscation.
> > > >
> > > > > On 19 Feb 2019, at 21:39, Robert Samuel Newson
> > > > > 
> > > wrote:
> > > > >
> > > > > Interesting suggestion, obviously the details might get the
> > > > > wrong kind
> > > of fun.
> > > > >
> > > > > Somewhere above I suggested this would be something we could
> > > > > change
> > > over time and even use different approaches for different documents
> > > within
> > > the same database. This is the long way of saying there are
> > > multiple ways
> > > to do this each with advantages and none without disadvantages.
> > > > >
> > > > > I don’t think adding a layer of abstraction is the right move
> > > > > just
> > > yet, I think we should continue to find consensus on one answer to
> > > this
> > > question (and the related ones in other threads) for the first
> > > release.
> > > It’s easy to say “we can change it later”, of course. We can,
> > > though it
> > > would be a chunk of work in the context of something that already
> > > works,
> > > I’ve rarely seen anyone sign up for that.
> > > > >
> > > > > I’m fine with the first proposal from Adam, where the keys are
> > > > > tuples
> > > of key parts pointing at terminal values. To make it easier for the
> > > first
> > > version, I would exclude optimisations like deduplication or the
> > > Directory
> > > aliasing or the schema thing that I suggested and that Ilya
> > > incorporated a
> > > variant of in a follow-up post. We’d accept that there are limits
> > > on the
> > > sizes of documents, including the awkward-to-express one about
> > > property
> > > depth.
> > > > >

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Joan Touzet
Would it be too much work to prototype both and check CRUD timings for
each across a small variety of documents?

-Joan

- Original Message -
> From: "Paul Davis" 
> To: dev@couchdb.apache.org
> Sent: Tuesday, February 19, 2019 5:41:23 PM
> Subject: Re: [DISCUSS] : things we need to solve/decide : storing JSON 
> documents
> 
> A simple doc storage version number would likely be enough for future
> us to
> do fancier things.
> 
> On Tue, Feb 19, 2019 at 4:16 PM Benjamin Anderson
> 
> wrote:
> 
> > > I don’t think adding a layer of abstraction is the right move
> > > just yet,
> > I think we should continue to find consensus on one answer to this
> > question
> >
> > Agree that the theorycrafting stage is not optimal for making
> > abstraction decisions, but I suspect it would be worthwhile
> > somewhere
> > between prototyping and releasing. Adam's proposal does seem to me
> > the
> > most appealing approach on the surface, and I don't see anyone
> > signing
> > up to do the work to deliver an alternative concurrently.
> >
> > --
> > ba
> >
> > On Tue, Feb 19, 2019 at 1:43 PM Robert Samuel Newson
> > 
> > wrote:
> > >
> > > Addendum: By “directory aliasing” I meant within a document
> > > (either the
> > actual Directory thing or something equivalent of our own making).
> > The
> > directory aliasing for each database is a good way to reduce key
> > size
> > without a significant cost. Though if Redwood lands in time, even
> > this
> > would become an inutile obfuscation.
> > >
> > > > On 19 Feb 2019, at 21:39, Robert Samuel Newson
> > > > 
> > wrote:
> > > >
> > > > Interesting suggestion, obviously the details might get the
> > > > wrong kind
> > of fun.
> > > >
> > > > Somewhere above I suggested this would be something we could
> > > > change
> > over time and even use different approaches for different documents
> > within
> > the same database. This is the long way of saying there are
> > multiple ways
> > to do this each with advantages and none without disadvantages.
> > > >
> > > > I don’t think adding a layer of abstraction is the right move
> > > > just
> > yet, I think we should continue to find consensus on one answer to
> > this
> > question (and the related ones in other threads) for the first
> > release.
> > It’s easy to say “we can change it later”, of course. We can,
> > though it
> > would be a chunk of work in the context of something that already
> > works,
> > I’ve rarely seen anyone sign up for that.
> > > >
> > > > I’m fine with the first proposal from Adam, where the keys are
> > > > tuples
> > of key parts pointing at terminal values. To make it easier for the
> > first
> > version, I would exclude optimisations like deduplication or the
> > Directory
> > aliasing or the schema thing that I suggested and that Ilya
> > incorporated a
> > variant of in a follow-up post. We’d accept that there are limits
> > on the
> > sizes of documents, including the awkward-to-express one about
> > property
> > depth.
> > > >
> > > > Stepping back, I’m not seeing any essential improvement over
> > > > Adam’s
> > original proposal besides the few corrections and clarifications
> > made by
> > various authors. Could we start an RFC based on Adam’s original
> > proposal on
> > document body, revision tree and index storage? We could then have
> > PR’s
> > against that for each additional optimisation (one person’s
> > optimisation is
> > another person’s needless complication)?
> > > >
> > > > If I’ve missed some genuine advance on the original proposal in
> > > > this
> > long thread, please call it out for me.
> > > >
> > > > B.
> > > >
> > > >> On 19 Feb 2019, at 21:15, Benjamin Anderson
> > > >> 
> > wrote:
> > > >>
> > > >> As is evident by the length of this thread, there's a pretty
> > > >> big
> > > >> design space to cover here, and it seems unlikely we'll have
> > > >> arrived
> > > >> at a "correct" solution even by the time this thing ships.
> > > >> Perhaps it
> > > >> would be worthwhile to treat the in-FDB representation of data
> > > >> as a
> > 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Paul Davis
A simple doc storage version number would likely be enough for future us to
do fancier things.

On Tue, Feb 19, 2019 at 4:16 PM Benjamin Anderson 
wrote:

> > I don’t think adding a layer of abstraction is the right move just yet,
> I think we should continue to find consensus on one answer to this question
>
> Agree that the theorycrafting stage is not optimal for making
> abstraction decisions, but I suspect it would be worthwhile somewhere
> between prototyping and releasing. Adam's proposal does seem to me the
> most appealing approach on the surface, and I don't see anyone signing
> up to do the work to deliver an alternative concurrently.
>
> --
> ba
>
> On Tue, Feb 19, 2019 at 1:43 PM Robert Samuel Newson 
> wrote:
> >
> > Addendum: By “directory aliasing” I meant within a document (either the
> actual Directory thing or something equivalent of our own making). The
> directory aliasing for each database is a good way to reduce key size
> without a significant cost. Though if Redwood lands in time, even this
> would become an inutile obfuscation.
> >
> > > On 19 Feb 2019, at 21:39, Robert Samuel Newson 
> wrote:
> > >
> > > Interesting suggestion, obviously the details might get the wrong kind
> of fun.
> > >
> > > Somewhere above I suggested this would be something we could change
> over time and even use different approaches for different documents within
> the same database. This is the long way of saying there are multiple ways
> to do this each with advantages and none without disadvantages.
> > >
> > > I don’t think adding a layer of abstraction is the right move just
> yet, I think we should continue to find consensus on one answer to this
> question (and the related ones in other threads) for the first release.
> It’s easy to say “we can change it later”, of course. We can, though it
> would be a chunk of work in the context of something that already works,
> I’ve rarely seen anyone sign up for that.
> > >
> > > I’m fine with the first proposal from Adam, where the keys are tuples
> of key parts pointing at terminal values. To make it easier for the first
> version, I would exclude optimisations like deduplication or the Directory
> aliasing or the schema thing that I suggested and that Ilya incorporated a
> variant of in a follow-up post. We’d accept that there are limits on the
> sizes of documents, including the awkward-to-express one about property
> depth.
> > >
> > > Stepping back, I’m not seeing any essential improvement over Adam’s
> original proposal besides the few corrections and clarifications made by
> various authors. Could we start an RFC based on Adam’s original proposal on
> document body, revision tree and index storage? We could then have PR’s
> against that for each additional optimisation (one person’s optimisation is
> another person’s needless complication)?
> > >
> > > If I’ve missed some genuine advance on the original proposal in this
> long thread, please call it out for me.
> > >
> > > B.
> > >
> > >> On 19 Feb 2019, at 21:15, Benjamin Anderson 
> wrote:
> > >>
> > >> As is evident by the length of this thread, there's a pretty big
> > >> design space to cover here, and it seems unlikely we'll have arrived
> > >> at a "correct" solution even by the time this thing ships. Perhaps it
> > >> would be worthwhile to treat the in-FDB representation of data as a
> > >> first-class abstraction and support multiple representations
> > >> simultaneously?
> > >>
> > >> Obviously there's no such thing as a zero-cost abstraction - and I've
> > >> not thought very hard about how far up the stack the document
> > >> representation would need to leak - but supporting different layouts
> > >> (primarily, as Adam points out, on the document body itself) might
> > >> prove interesting and useful. I'm sure there are folks interested in a
> > >> column-shaped CouchDB, for example.
> > >>
> > >> --
> > >> b
> > >>
> > >> On Tue, Feb 19, 2019 at 11:39 AM Robert Newson 
> wrote:
> > >>>
> > >>> Good points on revtree, I agree with you we should store that
> intelligently to gain the benefits you mentioned.
> > >>>
> > >>> --
> > >>> Robert Samuel Newson
> > >>> rnew...@apache.org
> > >>>
> > >>> On Tue, 19 Feb 2019, at 18:41, Adam Kocoloski wrote:
> >  I do not think we should store the revtree as a blob. The design
> where
> >  each edit branch is its own KV should save on network IO and CPU
> cycles
> >  for normal updates. We’ve performed too many heroics to keep
> >  couch_key_tree from stalling entire databases when trying to update
> a
> >  single document with a wide revision tree, I would much prefer to
> ignore
> >  other edit branches entirely when all we’re doing is extending one
> of
> >  them.
> > 
> >  I also do not think we should store JSON documents as blobs, but
> it’s a
> >  closer call. Some of my reasoning for preferring the exploded path
> >  design:
> > 
> >  - it lends itself nicely to sub-document operations, 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Benjamin Anderson
> I don’t think adding a layer of abstraction is the right move just yet, I 
> think we should continue to find consensus on one answer to this question

Agree that the theorycrafting stage is not optimal for making
abstraction decisions, but I suspect it would be worthwhile somewhere
between prototyping and releasing. Adam's proposal does seem to me the
most appealing approach on the surface, and I don't see anyone signing
up to do the work to deliver an alternative concurrently.

--
ba

On Tue, Feb 19, 2019 at 1:43 PM Robert Samuel Newson  wrote:
>
> Addendum: By “directory aliasing” I meant within a document (either the 
> actual Directory thing or something equivalent of our own making). The 
> directory aliasing for each database is a good way to reduce key size without 
> a significant cost. Though if Redwood lands in time, even this would become 
> an inutile obfuscation.
>
> > On 19 Feb 2019, at 21:39, Robert Samuel Newson  wrote:
> >
> > Interesting suggestion, obviously the details might get the wrong kind of 
> > fun.
> >
> > Somewhere above I suggested this would be something we could change over 
> > time and even use different approaches for different documents within the 
> > same database. This is the long way of saying there are multiple ways to do 
> > this each with advantages and none without disadvantages.
> >
> > I don’t think adding a layer of abstraction is the right move just yet, I 
> > think we should continue to find consensus on one answer to this question 
> > (and the related ones in other threads) for the first release. It’s easy to 
> > say “we can change it later”, of course. We can, though it would be a chunk 
> > of work in the context of something that already works, I’ve rarely seen 
> > anyone sign up for that.
> >
> > I’m fine with the first proposal from Adam, where the keys are tuples of 
> > key parts pointing at terminal values. To make it easier for the first 
> > version, I would exclude optimisations like deduplication or the Directory 
> > aliasing or the schema thing that I suggested and that Ilya incorporated a 
> > variant of in a follow-up post. We’d accept that there are limits on the 
> > sizes of documents, including the awkward-to-express one about property 
> > depth.
> >
> > Stepping back, I’m not seeing any essential improvement over Adam’s 
> > original proposal besides the few corrections and clarifications made by 
> > various authors. Could we start an RFC based on Adam’s original proposal on 
> > document body, revision tree and index storage? We could then have PR’s 
> > against that for each additional optimisation (one person’s optimisation is 
> > another person’s needless complication)?
> >
> > If I’ve missed some genuine advance on the original proposal in this long 
> > thread, please call it out for me.
> >
> > B.
> >
> >> On 19 Feb 2019, at 21:15, Benjamin Anderson  wrote:
> >>
> >> As is evident by the length of this thread, there's a pretty big
> >> design space to cover here, and it seems unlikely we'll have arrived
> >> at a "correct" solution even by the time this thing ships. Perhaps it
> >> would be worthwhile to treat the in-FDB representation of data as a
> >> first-class abstraction and support multiple representations
> >> simultaneously?
> >>
> >> Obviously there's no such thing as a zero-cost abstraction - and I've
> >> not thought very hard about how far up the stack the document
> >> representation would need to leak - but supporting different layouts
> >> (primarily, as Adam points out, on the document body itself) might
> >> prove interesting and useful. I'm sure there are folks interested in a
> >> column-shaped CouchDB, for example.
> >>
> >> --
> >> b
> >>
> >> On Tue, Feb 19, 2019 at 11:39 AM Robert Newson  wrote:
> >>>
> >>> Good points on revtree, I agree with you we should store that 
> >>> intelligently to gain the benefits you mentioned.
> >>>
> >>> --
> >>> Robert Samuel Newson
> >>> rnew...@apache.org
> >>>
> >>> On Tue, 19 Feb 2019, at 18:41, Adam Kocoloski wrote:
>  I do not think we should store the revtree as a blob. The design where
>  each edit branch is its own KV should save on network IO and CPU cycles
>  for normal updates. We’ve performed too many heroics to keep
>  couch_key_tree from stalling entire databases when trying to update a
>  single document with a wide revision tree, I would much prefer to ignore
>  other edit branches entirely when all we’re doing is extending one of
>  them.
> 
>  I also do not think we should store JSON documents as blobs, but it’s a
>  closer call. Some of my reasoning for preferring the exploded path
>  design:
> 
>  - it lends itself nicely to sub-document operations, for which Jan
>  crafted an RFC last year: https://github.com/apache/couchdb/issues/1559
>  - it optimizes the creation of Mango indexes on existing databases since
>  we only need to retrieve the value(s) we want to index
>  - 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Robert Samuel Newson
Addendum: By “directory aliasing” I meant within a document (either the actual 
Directory thing or something equivalent of our own making). The directory 
aliasing for each database is a good way to reduce key size without a 
significant cost. Though if Redwood lands in time, even this would become an 
inutile obfuscation.
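
As a toy illustration of the sort of aliasing being described, a per-database
table that swaps frequently repeated field names for small integers before they
go into key tuples (entirely hypothetical; a real version would also need to
handle alias allocation and persistence):

    -module(field_alias).
    -export([to_alias/2, from_alias/2]).

    %% Aliases :: #{binary() => non_neg_integer()}, stored once per database.
    to_alias(Field, Aliases) ->
        maps:get(Field, Aliases, Field).   %% fall back to the literal name

    from_alias(Alias, Aliases) when is_integer(Alias) ->
        %% Reverse lookup; a real implementation would keep both directions.
        [Field] = [F || {F, A} <- maps:to_list(Aliases), A =:= Alias],
        Field;
    from_alias(Field, _Aliases) ->
        Field.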

> On 19 Feb 2019, at 21:39, Robert Samuel Newson  wrote:
> 
> Interesting suggestion, obviously the details might get the wrong kind of fun.
> 
> Somewhere above I suggested this would be something we could change over time 
> and even use different approaches for different documents within the same 
> database. This is the long way of saying there are multiple ways to do this 
> each with advantages and none without disadvantages.
> 
> I don’t think adding a layer of abstraction is the right move just yet, I 
> think we should continue to find consensus on one answer to this question 
> (and the related ones in other threads) for the first release. It’s easy to 
> say “we can change it later”, of course. We can, though it would be a chunk 
> of work in the context of something that already works, I’ve rarely seen 
> anyone sign up for that.
> 
> I’m fine with the first proposal from Adam, where the keys are tuples of key 
> parts pointing at terminal values. To make it easier for the first version, I 
> would exclude optimisations like deduplication or the Directory aliasing or 
> the schema thing that I suggested and that Ilya incorporated a variant of in 
> a follow-up post. We’d accept that there are limits on the sizes of 
> documents, including the awkward-to-express one about property depth.
> 
> Stepping back, I’m not seeing any essential improvement over Adam’s original 
> proposal besides the few corrections and clarifications made by various 
> authors. Could we start an RFC based on Adam’s original proposal on document 
> body, revision tree and index storage? We could then have PR’s against that 
> for each additional optimisation (one person’s optimisation is another 
> person’s needless complication)?
> 
> If I’ve missed some genuine advance on the original proposal in this long 
> thread, please call it out for me.
> 
> B.
> 
>> On 19 Feb 2019, at 21:15, Benjamin Anderson  wrote:
>> 
>> As is evident by the length of this thread, there's a pretty big
>> design space to cover here, and it seems unlikely we'll have arrived
>> at a "correct" solution even by the time this thing ships. Perhaps it
>> would be worthwhile to treat the in-FDB representation of data as a
>> first-class abstraction and support multiple representations
>> simultaneously?
>> 
>> Obviously there's no such thing as a zero-cost abstraction - and I've
>> not thought very hard about how far up the stack the document
>> representation would need to leak - but supporting different layouts
>> (primarily, as Adam points out, on the document body itself) might
>> prove interesting and useful. I'm sure there are folks interested in a
>> column-shaped CouchDB, for example.
>> 
>> --
>> b
>> 
>> On Tue, Feb 19, 2019 at 11:39 AM Robert Newson  wrote:
>>> 
>>> Good points on revtree, I agree with you we should store that intelligently 
>>> to gain the benefits you mentioned.
>>> 
>>> --
>>> Robert Samuel Newson
>>> rnew...@apache.org
>>> 
>>> On Tue, 19 Feb 2019, at 18:41, Adam Kocoloski wrote:
 I do not think we should store the revtree as a blob. The design where
 each edit branch is its own KV should save on network IO and CPU cycles
 for normal updates. We’ve performed too many heroics to keep
 couch_key_tree from stalling entire databases when trying to update a
 single document with a wide revision tree, I would much prefer to ignore
 other edit branches entirely when all we’re doing is extending one of
 them.
 
 I also do not think we should store JSON documents as blobs, but it’s a
 closer call. Some of my reasoning for preferring the exploded path
 design:
 
 - it lends itself nicely to sub-document operations, for which Jan
 crafted an RFC last year: https://github.com/apache/couchdb/issues/1559
 - it optimizes the creation of Mango indexes on existing databases since
 we only need to retrieve the value(s) we want to index
 - it optimizes Mango queries that use field selectors
 - anyone who wanted to try their hand at GraphQL will find it very
 handy: https://github.com/apache/couchdb/issues/1499
 - looking further ahead, it lets us play with smarter leaf value types
 like Counters (yes I’m still on the CRDT bandwagon, sorry)
 
 A few comments on the thread:
 
>>> * Most document bodies are probably going to be smaller than 100k. So
>>> in
>>> the majority of cases it would be one write / one read to update and
>>> fetch
>>> the document body.
 
 We should test, but I expect reading 50KB of data in a range query is
 almost as efficient as reading a single 50 KB value. 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Benjamin Anderson
As is evident by the length of this thread, there's a pretty big
design space to cover here, and it seems unlikely we'll have arrived
at a "correct" solution even by the time this thing ships. Perhaps it
would be worthwhile to treat the in-FDB representation of data as a
first-class abstraction and support multiple representations
simultaneously?

Obviously there's no such thing as a zero-cost abstraction - and I've
not thought very hard about how far up the stack the document
representation would need to leak - but supporting different layouts
(primarily, as Adam points out, on the document body itself) might
prove interesting and useful. I'm sure there are folks interested in a
column-shaped CouchDB, for example.

--
b
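
One way such a first-class abstraction could be framed on the Erlang side is a
behaviour that each representation implements, with a small id stored per
document to pick the right callback module on read (names invented purely for
illustration, not an actual proposal):

    -module(couch_doc_repr).

    %% Hypothetical behaviour for pluggable document representations:
    %% exploded paths, chunked binary, columnar, and so on would each
    %% implement how a document becomes KV pairs and back.
    -callback repr_id() -> non_neg_integer().
    -callback doc_to_kvs(DocId :: binary(), Doc :: term()) ->
        [{Key :: tuple(), Value :: binary()}].
    -callback kvs_to_doc(DocId :: binary(), KVs :: [{tuple(), binary()}]) ->
        term().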

On Tue, Feb 19, 2019 at 11:39 AM Robert Newson  wrote:
>
> Good points on revtree, I agree with you we should store that intelligently 
> to gain the benefits you mentioned.
>
> --
>   Robert Samuel Newson
>   rnew...@apache.org
>
> On Tue, 19 Feb 2019, at 18:41, Adam Kocoloski wrote:
> > I do not think we should store the revtree as a blob. The design where
> > each edit branch is its own KV should save on network IO and CPU cycles
> > for normal updates. We’ve performed too many heroics to keep
> > couch_key_tree from stalling entire databases when trying to update a
> > single document with a wide revision tree, I would much prefer to ignore
> > other edit branches entirely when all we’re doing is extending one of
> > them.
> >
> > I also do not think we should store JSON documents as blobs, but it’s a
> > closer call. Some of my reasoning for preferring the exploded path
> > design:
> >
> > - it lends itself nicely to sub-document operations, for which Jan
> > crafted an RFC last year: https://github.com/apache/couchdb/issues/1559
> > - it optimizes the creation of Mango indexes on existing databases since
> > we only need to retrieve the value(s) we want to index
> > - it optimizes Mango queries that use field selectors
> > - anyone who wanted to try their hand at GraphQL will find it very
> > handy: https://github.com/apache/couchdb/issues/1499
> > - looking further ahead, it lets us play with smarter leaf value types
> > like Counters (yes I’m still on the CRDT bandwagon, sorry)
> >
> > A few comments on the thread:
> >
> > >>> * Most document bodies are probably going to be smaller than 100k. So
> > >>> in
> > >>> the majority of cases it would be one write / one read to update and
> > >>> fetch
> > >>> the document body.
> >
> > We should test, but I expect reading 50KB of data in a range query is
> > almost as efficient as reading a single 50 KB value. Similarly, writes
> > to a contiguous set of keys should be quite efficient.
> >
> > I am concerned about the overhead of the repeated field paths in the
> > keys with the exploded path option in the absence of key prefix
> > compression. That would be my main reason to acquiesce and throw away
> > all the document structure.
> >
> > Adam
> >
> > > On Feb 19, 2019, at 12:04 PM, Robert Newson  wrote:
> > >
> > > I like the idea that we'd reuse the same pattern (but perhaps not the 
> > > same _code_) for doc bodies, revtree and attachments.
> > >
> > > I hope we still get to delete couch_key_tree.erl, though.
> > >
> > > --
> > >  Robert Samuel Newson
> > >  rnew...@apache.org
> > >
> > > On Tue, 19 Feb 2019, at 17:03, Jan Lehnardt wrote:
> > >> I like the idea from a “trying a simple thing first” perspective, but
> > >> Nick’s points below are especially convincing to go with this for now.
> > >>
> > >> Best
> > >> Jan
> > >> —
> > >>
> > >>> On 19. Feb 2019, at 17:53, Nick Vatamaniuc  wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> Sorry for jumping in so late, I was following from the sidelines 
> > >>> mostly. A
> > >>> lot of good discussion happening and am excited about the possibilities
> > >>> here.
> > >>>
> > >>> I do like the simpler "chunking" approach for a few reasons:
> > >>>
> > >>> * Most document bodies are probably going to be smaller than 100k. So
> > >>> in
> > >>> the majority of cases it would be one write / one read to update and
> > >>> fetch
> > >>> the document body.
> > >>>
> > >>> * We could reuse the chunking code for attachment handling and possibly
> > >>> revision key trees. So it's the general pattern of upload chunks to some
> > >>> prefix, and when finished flip an atomic toggle to make it current.
> > >>>
> > >>> * Do the same thing with revision trees and we could re-use the revision
> > >>> tree manipulation logic. That is, the key tree in most cases would be 
> > >>> small
> > >>> enough to fit in 100k but if they get huge, they'd get chunked. This 
> > >>> would
> > >>> allow us to reuse all the battle tested couch_key_tree code mostly as 
> > >>> is.
> > >>> We even have property tests for it
> > >>> https://github.com/apache/couchdb/blob/master/src/couch/test/couch_key_tree_prop_tests.erl
> > >>>
> > >>> * It removes the need to explain the max exploded path length 
> > >>> limitation to
> > 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Robert Newson
Good points on revtree, I agree with you we should store that intelligently to 
gain the benefits you mentioned.

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Tue, 19 Feb 2019, at 18:41, Adam Kocoloski wrote:
> I do not think we should store the revtree as a blob. The design where 
> each edit branch is its own KV should save on network IO and CPU cycles 
> for normal updates. We’ve performed too many heroics to keep 
> couch_key_tree from stalling entire databases when trying to update a 
> single document with a wide revision tree, I would much prefer to ignore 
> other edit branches entirely when all we’re doing is extending one of 
> them.
> 
> I also do not think we should store JSON documents as blobs, but it’s a 
> closer call. Some of my reasoning for preferring the exploded path 
> design:
> 
> - it lends itself nicely to sub-document operations, for which Jan 
> crafted an RFC last year: https://github.com/apache/couchdb/issues/1559
> - it optimizes the creation of Mango indexes on existing databases since 
> we only need to retrieve the value(s) we want to index
> - it optimizes Mango queries that use field selectors
> - anyone who wanted to try their hand at GraphQL will find it very 
> handy: https://github.com/apache/couchdb/issues/1499
> - looking further ahead, it lets us play with smarter leaf value types 
> like Counters (yes I’m still on the CRDT bandwagon, sorry)
> 
> A few comments on the thread:
> 
> > >>> * Most document bodies are probably going to be smaller than 100k. So in
> > >>> the majority of cases it would be one write / one read to update and fetch
> >>> the document body.
> 
> We should test, but I expect reading 50KB of data in a range query is 
> almost as efficient as reading a single 50 KB value. Similarly, writes 
> to a contiguous set of keys should be quite efficient.
> 
> I am concerned about the overhead of the repeated field paths in the 
> keys with the exploded path option in the absence of key prefix 
> compression. That would be my main reason to acquiesce and throw away 
> all the document structure.
> 
> Adam
> 
> > On Feb 19, 2019, at 12:04 PM, Robert Newson  wrote:
> > 
> > I like the idea that we'd reuse the same pattern (but perhaps not the same 
> > _code_) for doc bodies, revtree and attachments.
> > 
> > I hope we still get to delete couch_key_tree.erl, though.
> > 
> > -- 
> >  Robert Samuel Newson
> >  rnew...@apache.org
> > 
> > On Tue, 19 Feb 2019, at 17:03, Jan Lehnardt wrote:
> >> I like the idea from a “trying a simple thing first” perspective, but 
> >> Nick’s points below are especially convincing to go with this for now.
> >> 
> >> Best
> >> Jan
> >> —
> >> 
> >>> On 19. Feb 2019, at 17:53, Nick Vatamaniuc  wrote:
> >>> 
> >>> Hi,
> >>> 
> >>> Sorry for jumping in so late, I was following from the sidelines mostly. A
> >>> lot of good discussion happening and am excited about the possibilities
> >>> here.
> >>> 
> >>> I do like the simpler "chunking" approach for a few reasons:
> >>> 
> > >>> * Most document bodies are probably going to be smaller than 100k. So in
> > >>> the majority of cases it would be one write / one read to update and fetch
> >>> the document body.
> >>> 
> >>> * We could reuse the chunking code for attachment handling and possibly
> >>> revision key trees. So it's the general pattern of upload chunks to some
> >>> prefix, and when finished flip an atomic toggle to make it current.
> >>> 
> >>> * Do the same thing with revision trees and we could re-use the revision
> >>> tree manipulation logic. That is, the key tree in most cases would be 
> >>> small
> >>> enough to fit in 100k but if they get huge, they'd get chunked. This would
> >>> allow us to reuse all the battle tested couch_key_tree code mostly as is.
> >>> We even have property tests for it
> >>> https://github.com/apache/couchdb/blob/master/src/couch/test/couch_key_tree_prop_tests.erl
> >>> 
> >>> * It removes the need to explain the max exploded path length limitation 
> >>> to
> >>> customers.
> >>> 
> >>> Cheers,
> >>> -Nick
> >>> 
> >>> 
> >>> On Tue, Feb 19, 2019 at 11:18 AM Robert Newson  wrote:
> >>> 
>  Hi,
>  
>  An alternative storage model that we should seriously consider is to
>  follow our current approach in couch_file et al. Specifically, that the
>  document _body_ is stored as an uninterpreted binary value. This would be
>  much like the obvious plan for attachment storage; a key prefix that
 identifies the database and document, where the final item of that key 
>  tuple
>  is an incrementing integer. Each of those keys has a binary value of up 
>  to
>  100k. Fetching all values with that key prefix, in fdb's natural 
>  ordering,
>  will yield the full document body, which can be JSON decoded for further
>  processing.
>  
>  I like this idea, and I like Adam's original proposal to explode 
>  documents
>  into property paths. I have a slight preference for the 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Adam Kocoloski
I do not think we should store the revtree as a blob. The design where each 
edit branch is its own KV should save on network IO and CPU cycles for normal 
updates. We’ve performed too many heroics to keep couch_key_tree from stalling 
entire databases when trying to update a single document with a wide revision 
tree, I would much prefer to ignore other edit branches entirely when all we’re 
doing is extending one of them.
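
To make the contrast with a single revtree blob concrete, the per-branch layout
being argued for might look roughly like the sketch below. The key shape and
value contents are illustrative only, not the ones from the RFC:

    -module(revtree_branch_sketch).
    -export([extend_branch/4]).

    %% One KV per edit branch, keyed by its leaf revision, with the value
    %% holding only that branch's ancestry. Extending a branch then touches
    %% a single key instead of rewriting the whole revision tree.
    extend_branch(DocId, {OldPos, OldHash}, NewHash, AncestorHashes) ->
        OldKey = {<<"revs">>, DocId, OldPos, OldHash},
        NewKey = {<<"revs">>, DocId, OldPos + 1, NewHash},
        NewVal = [OldHash | AncestorHashes],
        %% A real implementation would issue an FDB clear + set inside a
        %% transaction; this sketch just returns the intended mutations.
        [{clear, OldKey}, {set, NewKey, NewVal}].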

I also do not think we should store JSON documents as blobs, but it’s a closer 
call. Some of my reasoning for preferring the exploded path design:

- it lends itself nicely to sub-document operations, for which Jan crafted an 
RFC last year: https://github.com/apache/couchdb/issues/1559
- it optimizes the creation of Mango indexes on existing databases since we 
only need to retrieve the value(s) we want to index
- it optimizes Mango queries that use field selectors
- anyone who wanted to try their hand at GraphQL will find it very handy: 
https://github.com/apache/couchdb/issues/1499
- looking further ahead, it lets us play with smarter leaf value types like 
Counters (yes I’m still on the CRDT bandwagon, sorry)

A few comments on the thread:

>>> * Most document bodies are probably going to be smaller than 100k. So in
>>> the majority of cases it would be one write / one read to update and fetch
>>> the document body.

We should test, but I expect reading 50KB of data in a range query is almost as 
efficient as reading a single 50 KB value. Similarly, writes to a contiguous 
set of keys should be quite efficient.

I am concerned about the overhead of the repeated field paths in the keys with 
the exploded path option in the absence of key prefix compression. That would 
be my main reason to acquiesce and throw away all the document structure.
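
As a rough sense of scale (numbers invented purely for illustration): a
document with 200 leaf values whose encoded key paths average 40 bytes would
carry about 8 KB of repeated path bytes per stored revision in the exploded
layout, a cost a single-blob layout pays only once; how much that matters
depends on typical value sizes and on whatever prefix compression Redwood ends
up providing.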

Adam

> On Feb 19, 2019, at 12:04 PM, Robert Newson  wrote:
> 
> I like the idea that we'd reuse the same pattern (but perhaps not the same 
> _code_) for doc bodies, revtree and attachments.
> 
> I hope we still get to delete couch_key_tree.erl, though.
> 
> -- 
>  Robert Samuel Newson
>  rnew...@apache.org
> 
> On Tue, 19 Feb 2019, at 17:03, Jan Lehnardt wrote:
>> I like the idea from a “trying a simple thing first” perspective, but 
>> Nick’s points below are especially convincing to go with this for now.
>> 
>> Best
>> Jan
>> —
>> 
>>> On 19. Feb 2019, at 17:53, Nick Vatamaniuc  wrote:
>>> 
>>> Hi,
>>> 
>>> Sorry for jumping in so late, I was following from the sidelines mostly. A
>>> lot of good discussion happening and am excited about the possibilities
>>> here.
>>> 
>>> I do like the simpler "chunking" approach for a few reasons:
>>> 
>>> * Most document bodies are probably going to be smaller than 100k. So in
>>> the majority of cases it would be one write / one read to update and fetch
>>> the document body.
>>> 
>>> * We could reuse the chunking code for attachment handling and possibly
>>> revision key trees. So it's the general pattern of upload chunks to some
>>> prefix, and when finished flip an atomic toggle to make it current.
>>> 
>>> * Do the same thing with revision trees and we could re-use the revision
>>> tree manipulation logic. That is, the key tree in most cases would be small
>>> enough to fit in 100k but if they get huge, they'd get chunked. This would
>>> allow us to reuse all the battle tested couch_key_tree code mostly as is.
>>> We even have property tests for it
>>> https://github.com/apache/couchdb/blob/master/src/couch/test/couch_key_tree_prop_tests.erl
>>> 
>>> * It removes the need to explain the max exploded path length limitation to
>>> customers.
>>> 
>>> Cheers,
>>> -Nick
>>> 
>>> 
>>> On Tue, Feb 19, 2019 at 11:18 AM Robert Newson  wrote:
>>> 
 Hi,
 
 An alternative storage model that we should seriously consider is to
 follow our current approach in couch_file et al. Specifically, that the
 document _body_ is stored as an uninterpreted binary value. This would be
 much like the obvious plan for attachment storage; a key prefix that
 identifies the database and document, where the final item of that key tuple
 is an incrementing integer. Each of those keys has a binary value of up to
 100k. Fetching all values with that key prefix, in fdb's natural ordering,
 will yield the full document body, which can be JSON decoded for further
 processing.
 
 I like this idea, and I like Adam's original proposal to explode documents
 into property paths. I have a slight preference for the simplicity of the
 idea in the previous paragraph, not least because it's close to what we do
 today. I also think it will be possible to migrate to alternative storage
 models in future, and foundationdb's transaction support means we can do
 this migration seamlessly should we come to it.
 
 I'm very interested in knowing if anyone else is interested in going this
 simple, or considers it a wasted opportunity relative to the 'exploded'
 path.

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Robert Newson
 I like the idea that we'd reuse the same pattern (but perhaps not the same 
_code_) for doc bodies, revtree and attachments.

I hope we still get to delete couch_key_tree.erl, though.

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Tue, 19 Feb 2019, at 17:03, Jan Lehnardt wrote:
> I like the idea from a “trying a simple thing first” perspective, but 
> Nick’s points below are especially convincing to go with this for now.
> 
> Best
> Jan
> —
> 
> > On 19. Feb 2019, at 17:53, Nick Vatamaniuc  wrote:
> > 
> > Hi,
> > 
> > Sorry for jumping in so late, I was following from the sidelines mostly. A
> > lot of good discussion happening and am excited about the possibilities
> > here.
> > 
> > I do like the simpler "chunking" approach for a few reasons:
> > 
> > * Most document bodies are probably going to be smaller than 100k. So in
> > the majority of cases it would be one write / one read to update and fetch
> > the document body.
> > 
> > * We could reuse the chunking code for attachment handling and possibly
> > revision key trees. So it's the general pattern of upload chunks to some
> > prefix, and when finished flip an atomic toggle to make it current.
> > 
> > * Do the same thing with revision trees and we could re-use the revision
> > tree manipulation logic. That is, the key tree in most cases would be small
> > enough to fit in 100k but if they get huge, they'd get chunked. This would
> > allow us to reuse all the battle tested couch_key_tree code mostly as is.
> > We even have property tests for it
> > https://github.com/apache/couchdb/blob/master/src/couch/test/couch_key_tree_prop_tests.erl
> > 
> > * It removes the need to explain the max exploded path length limitation to
> > customers.
> > 
> > Cheers,
> > -Nick
> > 
> > 
> > On Tue, Feb 19, 2019 at 11:18 AM Robert Newson  wrote:
> > 
> >> Hi,
> >> 
> >> An alternative storage model that we should seriously consider is to
> >> follow our current approach in couch_file et al. Specifically, that the
> >> document _body_ is stored as an uninterpreted binary value. This would be
> >> much like the obvious plan for attachment storage; a key prefix that
> >> identifies the database and document, where the final item of that key tuple
> >> is an incrementing integer. Each of those keys has a binary value of up to
> >> 100k. Fetching all values with that key prefix, in fdb's natural ordering,
> >> will yield the full document body, which can be JSON decoded for further
> >> processing.
> >> 
> >> I like this idea, and I like Adam's original proposal to explode documents
> >> into property paths. I have a slight preference for the simplicity of the
> >> idea in the previous paragraph, not least because it's close to what we do
> >> today. I also think it will be possible to migrate to alternative storage
> >> models in future, and foundationdb's transaction support means we can do
> >> this migration seamlessly should we come to it.
> >> 
> >> I'm very interested in knowing if anyone else is interested in going this
> >> simple, or considers it a wasted opportunity relative to the 'exploded'
> >> path.
> >> 
> >> B.
> >> 
> >> --
> >>  Robert Samuel Newson
> >>  rnew...@apache.org
> >> 
> >> On Mon, 4 Feb 2019, at 19:59, Robert Newson wrote:
> >>> I've been remiss here in not posting the data model ideas that IBM
> >>> worked up while we were thinking about using FoundationDB so I'm posting
> >>> it now. This is Adam' Kocoloski's original work, I am just transcribing
> >>> it, and this is the context that the folks from the IBM side came in
> >>> with, for full disclosure.
> >>> 
> >>> Basics
> >>> 
> >>> 1. All CouchDB databases are inside a Directory
> >>> 2. Each CouchDB database is a Directory within that Directory
> >>> 3. It's possible to list all subdirectories of a Directory, so
> >>> `_all_dbs` is the list of directories from 1.
> >>> 4. Each Directory representing a CouchdB database has several Subspaces;
> >>> 4a. by_id/ doc subspace: actual document contents
> >>> 4b. by_seq/versionstamp subspace: for the _changes feed
> >>> 4c. index_definitions, indexes, ...
> >>> 
> >>> JSON Mapping
> >>> 
> >>> A hierarchical JSON object naturally maps to multiple KV pairs in FDB:
> >>> 
> >>> {
> >>>“_id”: “foo”,
> >>>“owner”: “bob”,
> >>>“mylist”: [1,3,5],
> >>>“mymap”: {
> >>>“blue”: “#FF”,
> >>>“red”: “#FF”
> >>>}
> >>> }
> >>> 
> >>> maps to
> >>> 
> >>> (“foo”, “owner”) = “bob”
> >>> (“foo”, “mylist”, 0) = 1
> >>> (“foo”, “mylist”, 1) = 3
> >>> (“foo”, “mylist”, 2) = 5
> >>> (“foo”, “mymap”, “blue”) = “#FF”
> >>> (“foo”, “mymap”, “red”) = “#FF”
> >>> 
> >>> NB: this means that the 100KB limit applies to individual leafs in the
> >>> JSON object, not the entire doc
> >>> 
> >>> Edit Conflicts
> >>> 
> >>> We need to account for the presence of conflicts in various levels of
> >>> the doc due to replication.
> >>> 
> >>> Proposal is to create a special value indicating that the subtree below
> >>> 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Jan Lehnardt
I like the idea from a “trying a simple thing first” perspective, but Nick’s 
points below are especially convincing reasons to go with this for now.

Best
Jan
—

> On 19. Feb 2019, at 17:53, Nick Vatamaniuc  wrote:
> 
> Hi,
> 
> Sorry for jumping in so late, I was following from the sidelines mostly. A
> lot of good discussion happening and am excited about the possibilities
> here.
> 
> I do like the simpler "chunking" approach for a few reasons:
> 
> * Most documents bodies are probably going to be smaller than 100k. So in
> the majority of case it would be one write / one read to update and fetch
> the document body.
> 
> * We could reuse the chunking code for attachment handling and possibly
> revision key trees. So it's the general pattern of upload chunks to some
> prefix, and when finished flip an atomic toggle to make it current.
> 
> * Do the same thing with revision trees and we could re-use the revision
> tree manipulation logic. That is, the key tree in most cases would be small
> enough to fit in 100k but if they get huge, they'd get chunked. This would
> allow us to reuse all the battle tested couch_key_tree code mostly as is.
> We even have property tests for it
> https://github.com/apache/couchdb/blob/master/src/couch/test/couch_key_tree_prop_tests.erl
> 
> * It removes the need to explain the max exploded path length limitation to
> customers.
> 
> Cheers,
> -Nick
> 
> 
> On Tue, Feb 19, 2019 at 11:18 AM Robert Newson  wrote:
> 
>> Hi,
>> 
>> An alternative storage model that we should seriously consider is to
>> follow our current approach in couch_file et al. Specifically, that the
>> document _body_ is stored as an uninterpreted binary value. This would be
>> much like the obvious plan for attachment storage; a key prefix that
>> identifies the database and document, with the final item of that key tuple
>> is an incrementing integer. Each of those keys has a binary value of up to
>> 100k. Fetching all values with that key prefix, in fdb's natural ordering,
>> will yield the full document body, which can be JSON decoded for further
>> processing.
>> 
>> I like this idea, and I like Adam's original proposal to explode documents
>> into property paths. I have a slight preference for the simplicity of the
>> idea in the previous paragraph, not least because it's close to what we do
>> today. I also think it will be possible to migrate to alternative storage
>> models in future, and foundationdb's transaction supports means we can do
>> this migration seamlessly should we come to it.
>> 
>> I'm very interested in knowing if anyone else is interested in going this
>> simple, or considers it a wasted opportunity relative to the 'exploded'
>> path.
>> 
>> B.
>> 
>> --
>>  Robert Samuel Newson
>>  rnew...@apache.org
>> 
>> On Mon, 4 Feb 2019, at 19:59, Robert Newson wrote:
>>> I've been remiss here in not posting the data model ideas that IBM
>>> worked up while we were thinking about using FoundationDB so I'm posting
>>> it now. This is Adam' Kocoloski's original work, I am just transcribing
>>> it, and this is the context that the folks from the IBM side came in
>>> with, for full disclosure.
>>> 
>>> Basics
>>> 
>>> 1. All CouchDB databases are inside a Directory
>>> 2. Each CouchDB database is a Directory within that Directory
>>> 3. It's possible to list all subdirectories of a Directory, so
>>> `_all_dbs` is the list of directories from 1.
>>> 4. Each Directory representing a CouchdB database has several Subspaces;
>>> 4a. by_id/ doc subspace: actual document contents
>>> 4b. by_seq/versionstamp subspace: for the _changes feed
>>> 4c. index_definitions, indexes, ...
>>> 
>>> JSON Mapping
>>> 
>>> A hierarchical JSON object naturally maps to multiple KV pairs in FDB:
>>> 
>>> {
>>>“_id”: “foo”,
>>>“owner”: “bob”,
>>>“mylist”: [1,3,5],
>>>“mymap”: {
>>>“blue”: “#FF”,
>>>“red”: “#FF”
>>>}
>>> }
>>> 
>>> maps to
>>> 
>>> (“foo”, “owner”) = “bob”
>>> (“foo”, “mylist”, 0) = 1
>>> (“foo”, “mylist”, 1) = 3
>>> (“foo”, “mylist”, 2) = 5
>>> (“foo”, “mymap”, “blue”) = “#FF”
>>> (“foo”, “mymap”, “red”) = “#FF”
>>> 
>>> NB: this means that the 100KB limit applies to individual leafs in the
>>> JSON object, not the entire doc
>>> 
>>> Edit Conflicts
>>> 
>>> We need to account for the presence of conflicts in various levels of
>>> the doc due to replication.
>>> 
>>> Proposal is to create a special value indicating that the subtree below
>>> our current cursor position is in an unresolvable conflict. Then add
>>> additional KV pairs below to describe the conflicting entries.
>>> 
>>> KV data model allows us to store these efficiently and minimize
>>> duplication of data:
>>> 
>>> A document with these two conflicts:
>>> 
>>> {
>>>“_id”: “foo”,
>>>“_rev”: “1-abc”,
>>>“owner”: “alice”,
>>>“active”: true
>>> }
>>> {
>>>“_id”: “foo”,
>>>“_rev”: “1-def”,
>>>“owner”: “bob”,
>>>“active”: true
>>> }
>>> 
>>> could be stored thus:
>>> 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Nick Vatamaniuc
Hi,

Sorry for jumping in so late, I was following from the sidelines mostly. A
lot of good discussion happening and am excited about the possibilities
here.

I do like the simpler "chunking" approach for a few reasons:

* Most document bodies are probably going to be smaller than 100k. So in
the majority of cases it would be one write / one read to update and fetch
the document body.

* We could reuse the chunking code for attachment handling and possibly
revision key trees. It's the general pattern of uploading chunks to some
prefix and, when finished, flipping an atomic toggle to make them current.

* Do the same thing with revision trees and we could re-use the revision
tree manipulation logic. That is, the key tree in most cases would be small
enough to fit in 100k, but if it gets huge it would get chunked. This would
allow us to reuse all the battle-tested couch_key_tree code mostly as is.
We even have property tests for it
https://github.com/apache/couchdb/blob/master/src/couch/test/couch_key_tree_prop_tests.erl

* It removes the need to explain the max exploded path length limitation to
customers.

Cheers,
-Nick
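
A rough sketch of the chunk-then-flip pattern Nick describes, with a plain dict standing in for FDB and invented key names (`("chunks", …)`, `("current", …)`); in FDB a single transaction is already atomic, so the toggle mainly matters when an upload (e.g. a large attachment) spans several transactions:

```python
def publish(kv, doc_id, parts):
    """Upload pre-chunked parts under a new generation, then flip one key
    (the "atomic toggle") so readers switch to the complete chunk set at once."""
    gen = kv.get(("current", doc_id), (0, 0))[0] + 1
    for i, part in enumerate(parts):
        kv[("chunks", doc_id, gen, i)] = part
    kv[("current", doc_id)] = (gen, len(parts))   # single-key flip

def read(kv, doc_id):
    gen, n = kv[("current", doc_id)]
    return b"".join(kv[("chunks", doc_id, gen, i)] for i in range(n))

kv = {}
publish(kv, "doc1", [b"part-one|", b"part-two"])
assert read(kv, "doc1") == b"part-one|part-two"
```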


On Tue, Feb 19, 2019 at 11:18 AM Robert Newson  wrote:

> Hi,
>
> An alternative storage model that we should seriously consider is to
> follow our current approach in couch_file et al. Specifically, that the
> document _body_ is stored as an uninterpreted binary value. This would be
> much like the obvious plan for attachment storage; a key prefix that
> identifies the database and document, with the final item of that key tuple
> is an incrementing integer. Each of those keys has a binary value of up to
> 100k. Fetching all values with that key prefix, in fdb's natural ordering,
> will yield the full document body, which can be JSON decoded for further
> processing.
>
> I like this idea, and I like Adam's original proposal to explode documents
> into property paths. I have a slight preference for the simplicity of the
> idea in the previous paragraph, not least because it's close to what we do
> today. I also think it will be possible to migrate to alternative storage
> models in future, and foundationdb's transaction supports means we can do
> this migration seamlessly should we come to it.
>
> I'm very interested in knowing if anyone else is interested in going this
> simple, or considers it a wasted opportunity relative to the 'exploded'
> path.
>
> B.
>
> --
>   Robert Samuel Newson
>   rnew...@apache.org
>
> On Mon, 4 Feb 2019, at 19:59, Robert Newson wrote:
> > I've been remiss here in not posting the data model ideas that IBM
> > worked up while we were thinking about using FoundationDB so I'm posting
> > it now. This is Adam' Kocoloski's original work, I am just transcribing
> > it, and this is the context that the folks from the IBM side came in
> > with, for full disclosure.
> >
> > Basics
> >
> > 1. All CouchDB databases are inside a Directory
> > 2. Each CouchDB database is a Directory within that Directory
> > 3. It's possible to list all subdirectories of a Directory, so
> > `_all_dbs` is the list of directories from 1.
> > 4. Each Directory representing a CouchdB database has several Subspaces;
> > 4a. by_id/ doc subspace: actual document contents
> > 4b. by_seq/versionstamp subspace: for the _changes feed
> > 4c. index_definitions, indexes, ...
> >
> > JSON Mapping
> >
> > A hierarchical JSON object naturally maps to multiple KV pairs in FDB:
> >
> > {
> > “_id”: “foo”,
> > “owner”: “bob”,
> > “mylist”: [1,3,5],
> > “mymap”: {
> > “blue”: “#FF”,
> > “red”: “#FF”
> > }
> > }
> >
> > maps to
> >
> > (“foo”, “owner”) = “bob”
> > (“foo”, “mylist”, 0) = 1
> > (“foo”, “mylist”, 1) = 3
> > (“foo”, “mylist”, 2) = 5
> > (“foo”, “mymap”, “blue”) = “#FF”
> > (“foo”, “mymap”, “red”) = “#FF”
> >
> > NB: this means that the 100KB limit applies to individual leafs in the
> > JSON object, not the entire doc
> >
> > Edit Conflicts
> >
> > We need to account for the presence of conflicts in various levels of
> > the doc due to replication.
> >
> > Proposal is to create a special value indicating that the subtree below
> > our current cursor position is in an unresolvable conflict. Then add
> > additional KV pairs below to describe the conflicting entries.
> >
> > KV data model allows us to store these efficiently and minimize
> > duplication of data:
> >
> > A document with these two conflicts:
> >
> > {
> > “_id”: “foo”,
> > “_rev”: “1-abc”,
> > “owner”: “alice”,
> > “active”: true
> > }
> > {
> > “_id”: “foo”,
> > “_rev”: “1-def”,
> > “owner”: “bob”,
> > “active”: true
> > }
> >
> > could be stored thus:
> >
> > (“foo”, “active”) = true
> > (“foo”, “owner”) = kCONFLICT
> > (“foo”, “owner”, “1-abc”) = “alice”
> > (“foo”, “owner”, “1-def”) = “bob”
> >
> > So long as `kCONFLICT` is set at the top of the conflicting subtree this
> > representation can handle conflicts of different data types as well.
> >
> > 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Paul Davis
> I'm very interested in knowing if anyone else is interested in going this 
> simple, or considers it a wasted opportunity relative to the 'exploded' path.
>

Very interested because this is how the Record Layer stores their
protobuf messages.


Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Robert Newson
Hi,

An alternative storage model that we should seriously consider is to follow our 
current approach in couch_file et al. Specifically, that the document _body_ is 
stored as an uninterpreted binary value. This would be much like the obvious 
plan for attachment storage; a key prefix that identifies the database and 
document, with the final item of that key tuple being an incrementing integer. 
Each of those keys has a binary value of up to 100k. Fetching all values with 
that key prefix, in fdb's natural ordering, will yield the full document body, 
which can be JSON decoded for further processing.
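
A minimal sketch of that model, with a dict standing in for FDB and illustrative key names: split the body into values of at most 100k, read them back with a prefix range scan in key order, concatenate, and JSON-decode.

```python
import json

MAX_VALUE = 100_000   # FDB's value size limit is 100 kB

def store_doc(kv, db, doc_id, doc):
    """Store the JSON body as consecutive binary chunks under (db, doc_id, n)."""
    body = json.dumps(doc).encode("utf-8")
    for n, offset in enumerate(range(0, len(body), MAX_VALUE)):
        kv[(db, doc_id, n)] = body[offset:offset + MAX_VALUE]

def fetch_doc(kv, db, doc_id):
    """Stand-in for a prefix range read in FDB's natural key order."""
    parts = [v for k, v in sorted(kv.items()) if k[:2] == (db, doc_id)]
    return json.loads(b"".join(parts).decode("utf-8"))

kv = {}
store_doc(kv, "db1", "foo", {"owner": "bob", "mylist": [1, 3, 5]})
assert fetch_doc(kv, "db1", "foo") == {"owner": "bob", "mylist": [1, 3, 5]}
```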

I like this idea, and I like Adam's original proposal to explode documents into 
property paths. I have a slight preference for the simplicity of the idea in 
the previous paragraph, not least because it's close to what we do today. I 
also think it will be possible to migrate to alternative storage models in 
future, and foundationdb's transaction support means we can do this migration 
seamlessly should we come to it.

I'm very interested in knowing if anyone else is interested in going this 
simple, or considers it a wasted opportunity relative to the 'exploded' path.

B.

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Mon, 4 Feb 2019, at 19:59, Robert Newson wrote:
> I've been remiss here in not posting the data model ideas that IBM 
> worked up while we were thinking about using FoundationDB so I'm posting 
> it now. This is Adam' Kocoloski's original work, I am just transcribing 
> it, and this is the context that the folks from the IBM side came in 
> with, for full disclosure.
> 
> Basics
> 
> 1. All CouchDB databases are inside a Directory
> 2. Each CouchDB database is a Directory within that Directory
> 3. It's possible to list all subdirectories of a Directory, so 
> `_all_dbs` is the list of directories from 1.
> 4. Each Directory representing a CouchdB database has several Subspaces;
> 4a. by_id/ doc subspace: actual document contents 
> 4b. by_seq/versionstamp subspace: for the _changes feed 
> 4c. index_definitions, indexes, ...
> 
> JSON Mapping
> 
> A hierarchical JSON object naturally maps to multiple KV pairs in FDB:
> 
> { 
> “_id”: “foo”, 
> “owner”: “bob”, 
> “mylist”: [1,3,5], 
> “mymap”: { 
> “blue”: “#FF”, 
> “red”: “#FF” 
> } 
> }
> 
> maps to
> 
> (“foo”, “owner”) = “bob” 
> (“foo”, “mylist”, 0) = 1 
> (“foo”, “mylist”, 1) = 3 
> (“foo”, “mylist”, 2) = 5 
> (“foo”, “mymap”, “blue”) = “#FF” 
> (“foo”, “mymap”, “red”) = “#FF”
> 
> NB: this means that the 100KB limit applies to individual leafs in the 
> JSON object, not the entire doc
> 
> Edit Conflicts
> 
> We need to account for the presence of conflicts in various levels of 
> the doc due to replication.
> 
> Proposal is to create a special value indicating that the subtree below 
> our current cursor position is in an unresolvable conflict. Then add 
> additional KV pairs below to describe the conflicting entries.
> 
> KV data model allows us to store these efficiently and minimize 
> duplication of data:
> 
> A document with these two conflicts:
> 
> { 
> “_id”: “foo”, 
> “_rev”: “1-abc”, 
> “owner”: “alice”, 
> “active”: true 
> }
> { 
> “_id”: “foo”, 
> “_rev”: “1-def”, 
> “owner”: “bob”, 
> “active”: true 
> }
> 
> could be stored thus:
> 
> (“foo”, “active”) = true 
> (“foo”, “owner”) = kCONFLICT 
> (“foo”, “owner”, “1-abc”) = “alice” 
> (“foo”, “owner”, “1-def”) = “bob”
> 
> So long as `kCONFLICT` is set at the top of the conflicting subtree this 
> representation can handle conflicts of different data types as well.
> 
> Missing fields need to be handled explicitly:
> 
> { 
>   “_id”: “foo”, 
>   “_rev”: “1-abc”, 
>   “owner”: “alice”, 
>   “active”: true 
> }
> 
> { 
>   “_id”: “foo”, 
>   “_rev”: “1-def”, 
>   “owner”: { 
> “name”: “bob”, 
> “email”: “
> b...@example.com
> " 
>   } 
> }
> 
> could be stored thus:
> 
> (“foo”, “active”) = kCONFLICT 
> (“foo”, “active”, “1-abc”) = true 
> (“foo”, “active”, “1-def”) = kMISSING 
> (“foo”, “owner”) = kCONFLICT 
> (“foo”, “owner”, “1-abc”) = “alice” 
> (“foo”, “owner”, “1-def”, “name”) = “bob” 
> (“foo”, “owner”, “1-def”, “email”) = ...
> 
> Revision Metadata
> 
> * CouchDB uses a hash history for revisions 
> ** Each edit is identified by the hash of the content of the edit 
> including the base revision against which it was applied 
> ** Individual edit branches are bounded in length but the number of 
> branches is potentially unbounded 
> 
> * Size limits preclude us from storing the entire key tree as a single 
> value; in pathological situations 
> the tree could exceed 100KB (each entry is > 16 bytes) 
> 
> * Store each edit branch as a separate KV including deleted status in a 
> special subspace 
> 
> * Structure key representation so that “winning” revision can be 
> automatically retrieved in a limit=1 
> key range operation
> 
> (“foo”, “_meta”, “deleted=false”, 1, 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-04 Thread Robert Newson
I've been remiss here in not posting the data model ideas that IBM worked up 
while we were thinking about using FoundationDB, so I'm posting them now. This is 
Adam Kocoloski's original work; I am just transcribing it. This is the 
context that the folks from the IBM side came in with, for full disclosure.

Basics

1. All CouchDB databases are inside a Directory
2. Each CouchDB database is a Directory within that Directory
3. It's possible to list all subdirectories of a Directory, so `_all_dbs` is 
the list of directories from 1.
4. Each Directory representing a CouchDB database has several Subspaces:
4a. by_id/ doc subspace: actual document contents 
4b. by_seq/versionstamp subspace: for the _changes feed 
4c. index_definitions, indexes, ...

JSON Mapping

A hierarchical JSON object naturally maps to multiple KV pairs in FDB:

{ 
“_id”: “foo”, 
“owner”: “bob”, 
“mylist”: [1,3,5], 
“mymap”: { 
“blue”: “#FF”, 
“red”: “#FF” 
} 
}

maps to

(“foo”, “owner”) = “bob” 
(“foo”, “mylist”, 0) = 1 
(“foo”, “mylist”, 1) = 3 
(“foo”, “mylist”, 2) = 5 
(“foo”, “mymap”, “blue”) = “#FF” 
(“foo”, “mymap”, “red”) = “#FF”

NB: this means that the 100KB limit applies to individual leaves in the JSON 
object, not the entire doc
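
For concreteness, here is a minimal sketch (not part of the proposal itself) of the flattening step implied by the mapping above; `_id` handling, conflict markers, and the FDB tuple encoding are all left out, and the function name is invented.

```python
def explode(doc_id, node, path=()):
    """Flatten a JSON value into (key_tuple, scalar) pairs matching the mapping above."""
    if isinstance(node, dict):
        for k, v in node.items():
            yield from explode(doc_id, v, path + (k,))
    elif isinstance(node, list):
        for i, v in enumerate(node):
            yield from explode(doc_id, v, path + (i,))
    else:  # string, number, boolean, null
        yield (doc_id,) + path, node

doc = {"owner": "bob", "mylist": [1, 3, 5], "mymap": {"blue": "#FF", "red": "#FF"}}
for key, value in explode("foo", doc):
    print(key, "=", value)
# ('foo', 'owner') = bob
# ('foo', 'mylist', 0) = 1  ... and so on
```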

Edit Conflicts

We need to account for the presence of conflicts in various levels of the doc 
due to replication.

Proposal is to create a special value indicating that the subtree below our 
current cursor position is in an unresolvable conflict. Then add additional KV 
pairs below to describe the conflicting entries.

KV data model allows us to store these efficiently and minimize duplication of 
data:

A document with these two conflicts:

{ 
“_id”: “foo”, 
“_rev”: “1-abc”, 
“owner”: “alice”, 
“active”: true 
}
{ 
“_id”: “foo”, 
“_rev”: “1-def”, 
“owner”: “bob”, 
“active”: true 
}

could be stored thus:

(“foo”, “active”) = true 
(“foo”, “owner”) = kCONFLICT 
(“foo”, “owner”, “1-abc”) = “alice” 
(“foo”, “owner”, “1-def”) = “bob”

So long as `kCONFLICT` is set at the top of the conflicting subtree, this 
representation can handle conflicts of different data types as well.

Missing fields need to be handled explicitly:

{ 
  “_id”: “foo”, 
  “_rev”: “1-abc”, 
  “owner”: “alice”, 
  “active”: true 
}

{ 
  “_id”: “foo”, 
  “_rev”: “1-def”, 
  “owner”: { 
“name”: “bob”, 
“email”: “b...@example.com" 
  } 
}

could be stored thus:

(“foo”, “active”) = kCONFLICT 
(“foo”, “active”, “1-abc”) = true 
(“foo”, “active”, “1-def”) = kMISSING 
(“foo”, “owner”) = kCONFLICT 
(“foo”, “owner”, “1-abc”) = “alice” 
(“foo”, “owner”, “1-def”, “name”) = “bob” 
(“foo”, “owner”, “1-def”, “email”) = ...
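
As an illustration only, a sketch of how two revisions' bodies could be merged into the layout above. The `kCONFLICT`/`kMISSING` sentinels and helper names are invented here, arrays are ignored for brevity, and the real on-disk encoding is still to be decided.

```python
kCONFLICT, kMISSING = "kCONFLICT", "kMISSING"   # sentinel markers for this sketch only

def flatten(value, path=()):
    """Explode a conflict-free JSON object into path -> scalar pairs."""
    if isinstance(value, dict):
        out = {}
        for k, v in value.items():
            out.update(flatten(v, path + (k,)))
        return out
    return {path: value}

def merge_revs(revs):
    """revs: {rev_id: json value or kMISSING} -> {key_suffix: stored_value},
    with kCONFLICT placed at the top of each divergent subtree."""
    vals = list(revs.values())
    if all(v == vals[0] for v in vals) and not isinstance(vals[0], dict) and vals[0] != kMISSING:
        return {(): vals[0]}                        # every revision agrees on this scalar
    if all(isinstance(v, dict) for v in vals):      # all objects here: recurse field by field
        out, fields = {}, {f for v in vals for f in v}
        for f in fields:
            for k, sv in merge_revs({r: v.get(f, kMISSING) for r, v in revs.items()}).items():
                out[(f,) + k] = sv
        return out
    out = {(): kCONFLICT}                           # divergent subtree: expand per revision
    for rev, v in revs.items():
        for k, sv in flatten(v).items():
            out[(rev,) + k] = sv
    return out

revs = {
    "1-abc": {"owner": "alice", "active": True},
    "1-def": {"owner": {"name": "bob", "email": "bob@example.com"}},
}
for key, val in sorted(merge_revs(revs).items()):
    print(("foo",) + key, "=", val)
# matches the layout shown above: ("foo", "active") = kCONFLICT, etc.
```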

Revision Metadata

* CouchDB uses a hash history for revisions 
** Each edit is identified by the hash of the content of the edit including the 
base revision against which it was applied 
** Individual edit branches are bounded in length but the number of branches is 
potentially unbounded 

* Size limits preclude us from storing the entire key tree as a single value; 
in pathological situations 
the tree could exceed 100KB (each entry is > 16 bytes) 

* Store each edit branch as a separate KV including deleted status in a special 
subspace 

* Structure key representation so that “winning” revision can be automatically 
retrieved in a limit=1 
key range operation

(“foo”, “_meta”, “deleted=false”, 1, “def”) = [] 
(“foo”, “_meta”, “deleted=false”, 4, “bif”) = [“3-baz”,”2-bar”,”1-foo”]  <-- winner
(“foo”, “_meta”, “deleted=true”, 3, “abc”) = [“2-bar”, “1-foo”]
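
Assuming the key layout above, a small sketch of how the winner falls out of plain tuple ordering; here it is simulated by sorting Python tuples, whereas in FDB it would be a descending, limit=1 range read over the deleted=false sub-range, with CouchDB's usual pos/hash tie-break assumed.

```python
# One key per edit branch: (deleted, pos, rev_hash) -> path back towards the root.
branches = {
    (False, 1, "def"): [],
    (False, 4, "bif"): ["3-baz", "2-bar", "1-foo"],
    (True,  3, "abc"): ["2-bar", "1-foo"],
}

# The winner is the last key in the deleted=false sub-range (highest pos, then hash);
# only if every leaf is deleted does the deleted=true sub-range apply.
live = sorted(k for k in branches if not k[0])
winner = (live or sorted(branches))[-1]
print(winner, branches[winner])   # (False, 4, 'bif') ['3-baz', '2-bar', '1-foo']
```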

Changes Feed

* FDB supports a concept called a versionstamp — a 10 byte, unique, 
monotonically (but not sequentially) increasing value for each committed 
transaction. The first 8 bytes are the committed version of the database. The 
last 2 bytes are monotonic in the serialization order for transactions. 

* A transaction can specify a particular index into a key where the following 
10 bytes will be overwritten by the versionstamp at commit time 

* A subspace keyed on versionstamp naturally yields a _changes feed

by_seq subspace 
  (“versionstamp1”) = (“foo”, “1-abc”) 
  (“versionstamp4”) = (“bar”, “4-def”) 

by_id subspace 
  (“bar”, “_vsn”) = “versionstamp4” 
  ... 
  (“foo”, “_vsn”) = “versionstamp1”
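
A sketch of the bookkeeping this implies on each document update, with a plain counter standing in for FDB's versionstamp and dicts standing in for the two subspaces; all names are illustrative.

```python
import itertools

next_stamp = itertools.count(1)   # stand-in for the 10-byte commit versionstamp
by_seq, by_id = {}, {}

def record_update(doc_id, rev):
    old = by_id.get((doc_id, "_vsn"))
    if old is not None:
        del by_seq[old]                 # each doc appears once in the changes feed
    stamp = next(next_stamp)
    by_seq[stamp] = (doc_id, rev)
    by_id[(doc_id, "_vsn")] = stamp

record_update("foo", "1-abc")
record_update("bar", "4-def")
record_update("foo", "2-ghi")
print(sorted(by_seq.items()))           # feed order: ("bar", "4-def") then foo's latest
```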

JSON Indexes

* “Mango” JSON indexes are defined by
** a list of field names, each of which may be nested,  
** an optional partial_filter_selector which constrains the set of docs that 
contribute 
** an optional name defined by the ddoc field (the name is auto-generated if 
not supplied) 

* Store index definitions in a single subspace to aid query planning 
** ((person,name), title, email) = (“name-title-email”, “{“student”: true}”) 
** Store the values for each index in a dedicated subspace, adding the document 
ID as the last element in the tuple 
*** (“rosie revere”, “engineer”, “ro...@example.com", “foo”) = null
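
A sketch of how one index entry key could be derived from a document under this scheme; the dotted-path handling and example values are illustrative, and the partial_filter_selector check is omitted.

```python
def index_key(doc, doc_id, fields):
    """Build one index entry key: (field values in index order ..., doc_id)."""
    values = []
    for field in fields:                      # e.g. "person.name" for a nested field
        node = doc
        for part in field.split("."):
            node = node.get(part) if isinstance(node, dict) else None
        values.append(node)
    return tuple(values) + (doc_id,)

doc = {"person": {"name": "rosie revere"}, "title": "engineer", "email": "rosie@example.com"}
print(index_key(doc, "foo", ["person.name", "title", "email"]))
# ('rosie revere', 'engineer', 'rosie@example.com', 'foo')  -> stored with value null
```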

B.

-- 
  Robert Samuel Newson
  

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-04 Thread Ilya Khlopotov


I want to fix previous mistakes. I made two mistakes in the previous calculations:
- I used 1Kb as the base size for calculating the expansion factor (although we don't 
know the exact size of the original document)
- The expansion factor calculation included the number of revisions (it shouldn't have)

I'll focus on the flattened JSON docs model.

The following formula was used in the previous calculation:
storage_size_per_document = mapping_table_size*number_of_revisions + 
depth*number_of_paths*number_of_revisions + 
number_of_paths*value_size*number_of_revisions

To clarify things a little bit, I want to calculate the space requirement for a single 
revision this time.
mapping_table_size = number_of_field_names * (field_name_length + 4 (integer size)) = 
100 * (20 + 4) = 2400 bytes
storage_size_per_document_per_revision_per_replica = mapping_table_size + 
depth*number_of_paths + value_size*number_of_paths =
2400 bytes + 10*1000 + 100*1000 = 112400 bytes ~= 110 KB
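
The same arithmetic as a throwaway script, using the assumed inputs from the earlier example (100 field names of 20 bytes each, 1,000 paths, depth 10, 100-byte values):

```python
number_of_field_names, field_name_length = 100, 20
number_of_paths, depth, value_size = 1000, 10, 100

mapping_table_size = number_of_field_names * (field_name_length + 4)   # 2400 bytes
per_revision = mapping_table_size + depth * number_of_paths + value_size * number_of_paths
print(mapping_table_size, per_revision)   # 2400 112400  (~110 KB per revision per replica)
```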

We can definitely reduce the mapping table requirement by adopting rnewson's 
idea of a schema.

On 2019/02/04 11:08:16, Ilya Khlopotov  wrote: 
> Hi Michael,
> 
> > For example, hears a crazy thought:
> > Map every distinct occurence of a key/value instance through a crypto hash
> > function to get a set of hashes.
> >
> > These can be be precomputed by Couch without any lookups in FDB.  These
> > will be spread all over kingdom come in FDB and not lend themselves to
> > range search well.
> > 
> > So what you do is index them for frequency of occurring in the same set.
> > In essence, you 'bucket them' statistically, and that bucket id becomes a
> > key prefix. A crypto hash value can be copied into more than one bucket.
> > The {bucket_id}/{cryptohash} becomes a {val_id}
> 
> > When writing a document, Couch submits the list/array of cryptohash values
> > it computed to FDB and gets back the corresponding  {val_id} (the id with
> > the bucket prefixed).  This can get somewhat expensive if there's always a
> > lot of app local cache misses.
> >
> > A document's value is then a series of {val_id} arrays up to 100k per
> > segment.
> > 
> > When retrieving a document, you get the val_ids, find the distinct buckets
> > and min/max entries for this doc, and then parallel query each bucket while
> > reconstructing the document.
> 
> Interesting idea. Let's try to think it through to see if we can make it 
> viable. 
> Let's go through hypothetical example. Input data for the example:
> - 1M of documents
> - each document is around 10Kb
> - each document consists of 1K of unique JSON paths 
> - each document has 100 unique JSON field names
> - every scalar value is 100 bytes
> - 10% of unique JSON paths for every document already stored in database 
> under different doc or different revision of the current one
> - we assume 3 independent copies for every key-value pair in FDB
> - our hash key size is 32 bytes
> - let's assume we can determine if key is already on the storage without 
> doing query
> - 1% of paths is in cache (unrealistic value, in real live the percentage is 
> lower)
> - every JSON field name is 20 bytes
> - every JSON path is 10 levels deep
> - document key prefix length is 50
> - every document has 10 revisions
> Let's estimate the storage requirements and size of data we need to transmit. 
> The calculations are not exact.
> 1. storage_size_per_document (we cannot estimate exact numbers since we don't 
> know how FDB stores it)
>   - 10 * ((10Kb - (10Kb * 10%)) + (1K - (1K * 10%)) * 32 bytes) = 38Kb * 10 * 
> 3 = 1140 Kb (11x)
> 2. number of independent keys to retrieve on document read (non-range 
> queries) per document
>   - 1K - (1K * 1%) = 990
> 3. number of range queries: 0
> 4. data to transmit on read: (1K - (1K * 1%)) * (100 bytes + 32 bytes) = 102 
> Kb (10x) 
> 5. read latency (we use 2ms per read based on numbers from 
> https://apple.github.io/foundationdb/performance.html)
> - sequential: 990*2ms = 1980ms 
> - range: 0
> Let's compare these numbers with initial proposal (flattened JSON docs 
> without global schema and without cache)
> 1. storage_size_per_document
>   - mapping table size: 100 * (20 + 4(integer size)) = 2400 bytes
>   - key size: (10 * (4 + 1(delimiter))) + 50 = 100 bytes 
>   - storage_size_per_document: 2.4K*10 + 100*1K*10 + 1K*100*10 = 2024K = 1976 
> Kb * 3 = 5930 Kb (59.3x)
> 2. number of independent keys to retrieve: 0-2 (depending on index structure)
> 3. number of range queries: 1 (1001 of keys in result)
> 4. data to transmit on read: 24K + 1000*100 + 1000*100 = 23.6 Kb (2.4x)  
> 5. read latency (we use 2ms per read based on numbers from 
> https://apple.github.io/foundationdb/performance.html and estimate range read 
> performance based on numbers from 
> https://apple.github.io/foundationdb/benchmarking.html#single-core-read-test)
>   - range read performance: Given read performance is about 305,000 
> reads/second and range performance 3,600,000 keys/second we estimate range 
> performance to be 11.8x compared to read 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-04 Thread Robert Newson
I think we're deep in the weeds on this small aspect of the data model problem, 
and haven't touched other aspects yet. The numbers used in your example (1k of 
paths, 100 unique field names, 100 bytes for a value), where are they from? If 
they are not from some empirical data source, I don't see any reason to dwell 
on anything we might infer from them.

I think we should focus on the simplest model that also 'works' (i.e, delivers 
all essential properties) and then prototype so we can see how efficient it is.

I am happy to sacrifice some degree of efficiency for a comprehensible mapping 
of documents to key-value pairs and we have at least three techniques to 
address long keys so far. We also know there are other approaches to this 
problem if necessary that have a much smaller storage overhead (adjacent rows 
of 100k chunks of the couchdb document treated as a blob). 

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Mon, 4 Feb 2019, at 18:29, Ilya Khlopotov wrote:
> At some point I changed the number of unique JSON paths and probably 
> forgot to update other conditions.
> The ` - each document is around 10Kb` is not used in the calculations so 
> can be ignored.
> 
> On 2019/02/04 17:46:20, Adam Kocoloski  wrote: 
> > Ugh! We definitely cannot have a model where a 10K JSON document is 
> > exploded into 2MB worth of KV data. I’ve tried several times to follow the 
> > math here but I’m failing. I can’t even get past this first bit:
> > 
> > > - each document is around 10Kb
> > > - each document consists of 1K of unique JSON paths 
> > > - each document has 100 unique JSON field names
> > > - every scalar value is 100 bytes
> > 
> > If each document has 1000 paths, and each path (which leads to a unique 
> > scalar value, right?) has a value of 100 bytes associated with it … how is 
> > the document 10KB? Wouldn’t it need to be at least 100KB just by adding up 
> > all the scalar values?
> > 
> > Adam
> > 
> > > On Feb 4, 2019, at 6:08 AM, Ilya Khlopotov  wrote:
> > > 
> > > Hi Michael,
> > > 
> > >> For example, hears a crazy thought:
> > >> Map every distinct occurence of a key/value instance through a crypto 
> > >> hash
> > >> function to get a set of hashes.
> > >> 
> > >> These can be be precomputed by Couch without any lookups in FDB.  These
> > >> will be spread all over kingdom come in FDB and not lend themselves to
> > >> range search well.
> > >> 
> > >> So what you do is index them for frequency of occurring in the same set.
> > >> In essence, you 'bucket them' statistically, and that bucket id becomes a
> > >> key prefix. A crypto hash value can be copied into more than one bucket.
> > >> The {bucket_id}/{cryptohash} becomes a {val_id}
> > > 
> > >> When writing a document, Couch submits the list/array of cryptohash 
> > >> values
> > >> it computed to FDB and gets back the corresponding  {val_id} (the id with
> > >> the bucket prefixed).  This can get somewhat expensive if there's always 
> > >> a
> > >> lot of app local cache misses.
> > >> 
> > >> A document's value is then a series of {val_id} arrays up to 100k per
> > >> segment.
> > >> 
> > >> When retrieving a document, you get the val_ids, find the distinct 
> > >> buckets
> > >> and min/max entries for this doc, and then parallel query each bucket 
> > >> while
> > >> reconstructing the document.
> > > 
> > > Interesting idea. Let's try to think it through to see if we can make it 
> > > viable. 
> > > Let's go through hypothetical example. Input data for the example:
> > > - 1M of documents
> > > - each document is around 10Kb
> > > - each document consists of 1K of unique JSON paths 
> > > - each document has 100 unique JSON field names
> > > - every scalar value is 100 bytes
> > > - 10% of unique JSON paths for every document already stored in database 
> > > under different doc or different revision of the current one
> > > - we assume 3 independent copies for every key-value pair in FDB
> > > - our hash key size is 32 bytes
> > > - let's assume we can determine if key is already on the storage without 
> > > doing query
> > > - 1% of paths is in cache (unrealistic value, in real live the percentage 
> > > is lower)
> > > - every JSON field name is 20 bytes
> > > - every JSON path is 10 levels deep
> > > - document key prefix length is 50
> > > - every document has 10 revisions
> > > Let's estimate the storage requirements and size of data we need to 
> > > transmit. The calculations are not exact.
> > > 1. storage_size_per_document (we cannot estimate exact numbers since we 
> > > don't know how FDB stores it)
> > >  - 10 * ((10Kb - (10Kb * 10%)) + (1K - (1K * 10%)) * 32 bytes) = 38Kb * 
> > > 10 * 3 = 1140 Kb (11x)
> > > 2. number of independent keys to retrieve on document read (non-range 
> > > queries) per document
> > >  - 1K - (1K * 1%) = 990
> > > 3. number of range queries: 0
> > > 4. data to transmit on read: (1K - (1K * 1%)) * (100 bytes + 32 bytes) = 
> > > 102 Kb (10x) 
> > > 5. read latency (we use 2ms 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-04 Thread Ilya Khlopotov
At some point I changed the number of unique JSON paths and probably forgot to 
update other conditions.
The ` - each document is around 10Kb` is not used in the calculations, so it can be 
ignored.

On 2019/02/04 17:46:20, Adam Kocoloski  wrote: 
> Ugh! We definitely cannot have a model where a 10K JSON document is exploded 
> into 2MB worth of KV data. I’ve tried several times to follow the math here 
> but I’m failing. I can’t even get past this first bit:
> 
> > - each document is around 10Kb
> > - each document consists of 1K of unique JSON paths 
> > - each document has 100 unique JSON field names
> > - every scalar value is 100 bytes
> 
> If each document has 1000 paths, and each path (which leads to a unique 
> scalar value, right?) has a value of 100 bytes associated with it … how is 
> the document 10KB? Wouldn’t it need to be at least 100KB just by adding up 
> all the scalar values?
> 
> Adam
> 
> > On Feb 4, 2019, at 6:08 AM, Ilya Khlopotov  wrote:
> > 
> > Hi Michael,
> > 
> >> For example, hears a crazy thought:
> >> Map every distinct occurence of a key/value instance through a crypto hash
> >> function to get a set of hashes.
> >> 
> >> These can be be precomputed by Couch without any lookups in FDB.  These
> >> will be spread all over kingdom come in FDB and not lend themselves to
> >> range search well.
> >> 
> >> So what you do is index them for frequency of occurring in the same set.
> >> In essence, you 'bucket them' statistically, and that bucket id becomes a
> >> key prefix. A crypto hash value can be copied into more than one bucket.
> >> The {bucket_id}/{cryptohash} becomes a {val_id}
> > 
> >> When writing a document, Couch submits the list/array of cryptohash values
> >> it computed to FDB and gets back the corresponding  {val_id} (the id with
> >> the bucket prefixed).  This can get somewhat expensive if there's always a
> >> lot of app local cache misses.
> >> 
> >> A document's value is then a series of {val_id} arrays up to 100k per
> >> segment.
> >> 
> >> When retrieving a document, you get the val_ids, find the distinct buckets
> >> and min/max entries for this doc, and then parallel query each bucket while
> >> reconstructing the document.
> > 
> > Interesting idea. Let's try to think it through to see if we can make it 
> > viable. 
> > Let's go through hypothetical example. Input data for the example:
> > - 1M of documents
> > - each document is around 10Kb
> > - each document consists of 1K of unique JSON paths 
> > - each document has 100 unique JSON field names
> > - every scalar value is 100 bytes
> > - 10% of unique JSON paths for every document already stored in database 
> > under different doc or different revision of the current one
> > - we assume 3 independent copies for every key-value pair in FDB
> > - our hash key size is 32 bytes
> > - let's assume we can determine if key is already on the storage without 
> > doing query
> > - 1% of paths is in cache (unrealistic value, in real live the percentage 
> > is lower)
> > - every JSON field name is 20 bytes
> > - every JSON path is 10 levels deep
> > - document key prefix length is 50
> > - every document has 10 revisions
> > Let's estimate the storage requirements and size of data we need to 
> > transmit. The calculations are not exact.
> > 1. storage_size_per_document (we cannot estimate exact numbers since we 
> > don't know how FDB stores it)
> >  - 10 * ((10Kb - (10Kb * 10%)) + (1K - (1K * 10%)) * 32 bytes) = 38Kb * 10 
> > * 3 = 1140 Kb (11x)
> > 2. number of independent keys to retrieve on document read (non-range 
> > queries) per document
> >  - 1K - (1K * 1%) = 990
> > 3. number of range queries: 0
> > 4. data to transmit on read: (1K - (1K * 1%)) * (100 bytes + 32 bytes) = 
> > 102 Kb (10x) 
> > 5. read latency (we use 2ms per read based on numbers from 
> > https://apple.github.io/foundationdb/performance.html)
> >- sequential: 990*2ms = 1980ms 
> >- range: 0
> > Let's compare these numbers with initial proposal (flattened JSON docs 
> > without global schema and without cache)
> > 1. storage_size_per_document
> >  - mapping table size: 100 * (20 + 4(integer size)) = 2400 bytes
> >  - key size: (10 * (4 + 1(delimiter))) + 50 = 100 bytes 
> >  - storage_size_per_document: 2.4K*10 + 100*1K*10 + 1K*100*10 = 2024K = 
> > 1976 Kb * 3 = 5930 Kb (59.3x)
> > 2. number of independent keys to retrieve: 0-2 (depending on index 
> > structure)
> > 3. number of range queries: 1 (1001 of keys in result)
> > 4. data to transmit on read: 24K + 1000*100 + 1000*100 = 23.6 Kb (2.4x)  
> > 5. read latency (we use 2ms per read based on numbers from 
> > https://apple.github.io/foundationdb/performance.html and estimate range 
> > read performance based on numbers from 
> > https://apple.github.io/foundationdb/benchmarking.html#single-core-read-test)
> >  - range read performance: Given read performance is about 305,000 
> > reads/second and range performance 3,600,000 keys/second we estimate range 
> > 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-04 Thread Adam Kocoloski
Ugh! We definitely cannot have a model where a 10K JSON document is exploded 
into 2MB worth of KV data. I’ve tried several times to follow the math here but 
I’m failing. I can’t even get past this first bit:

> - each document is around 10Kb
> - each document consists of 1K of unique JSON paths 
> - each document has 100 unique JSON field names
> - every scalar value is 100 bytes

If each document has 1000 paths, and each path (which leads to a unique scalar 
value, right?) has a value of 100 bytes associated with it … how is the 
document 10KB? Wouldn’t it need to be at least 100KB just by adding up all the 
scalar values?
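
For reference, the arithmetic behind that objection, using the numbers from the quoted example:

```python
number_of_paths, value_size = 1000, 100   # from the quoted assumptions
print(number_of_paths * value_size)       # 100000 bytes: ~100 KB of scalar values alone
```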

Adam

> On Feb 4, 2019, at 6:08 AM, Ilya Khlopotov  wrote:
> 
> Hi Michael,
> 
>> For example, hears a crazy thought:
>> Map every distinct occurence of a key/value instance through a crypto hash
>> function to get a set of hashes.
>> 
>> These can be be precomputed by Couch without any lookups in FDB.  These
>> will be spread all over kingdom come in FDB and not lend themselves to
>> range search well.
>> 
>> So what you do is index them for frequency of occurring in the same set.
>> In essence, you 'bucket them' statistically, and that bucket id becomes a
>> key prefix. A crypto hash value can be copied into more than one bucket.
>> The {bucket_id}/{cryptohash} becomes a {val_id}
> 
>> When writing a document, Couch submits the list/array of cryptohash values
>> it computed to FDB and gets back the corresponding  {val_id} (the id with
>> the bucket prefixed).  This can get somewhat expensive if there's always a
>> lot of app local cache misses.
>> 
>> A document's value is then a series of {val_id} arrays up to 100k per
>> segment.
>> 
>> When retrieving a document, you get the val_ids, find the distinct buckets
>> and min/max entries for this doc, and then parallel query each bucket while
>> reconstructing the document.
> 
> Interesting idea. Let's try to think it through to see if we can make it 
> viable. 
> Let's go through hypothetical example. Input data for the example:
> - 1M of documents
> - each document is around 10Kb
> - each document consists of 1K of unique JSON paths 
> - each document has 100 unique JSON field names
> - every scalar value is 100 bytes
> - 10% of unique JSON paths for every document already stored in database 
> under different doc or different revision of the current one
> - we assume 3 independent copies for every key-value pair in FDB
> - our hash key size is 32 bytes
> - let's assume we can determine if key is already on the storage without 
> doing query
> - 1% of paths is in cache (unrealistic value, in real live the percentage is 
> lower)
> - every JSON field name is 20 bytes
> - every JSON path is 10 levels deep
> - document key prefix length is 50
> - every document has 10 revisions
> Let's estimate the storage requirements and size of data we need to transmit. 
> The calculations are not exact.
> 1. storage_size_per_document (we cannot estimate exact numbers since we don't 
> know how FDB stores it)
>  - 10 * ((10Kb - (10Kb * 10%)) + (1K - (1K * 10%)) * 32 bytes) = 38Kb * 10 * 
> 3 = 1140 Kb (11x)
> 2. number of independent keys to retrieve on document read (non-range 
> queries) per document
>  - 1K - (1K * 1%) = 990
> 3. number of range queries: 0
> 4. data to transmit on read: (1K - (1K * 1%)) * (100 bytes + 32 bytes) = 102 
> Kb (10x) 
> 5. read latency (we use 2ms per read based on numbers from 
> https://apple.github.io/foundationdb/performance.html)
>- sequential: 990*2ms = 1980ms 
>- range: 0
> Let's compare these numbers with initial proposal (flattened JSON docs 
> without global schema and without cache)
> 1. storage_size_per_document
>  - mapping table size: 100 * (20 + 4(integer size)) = 2400 bytes
>  - key size: (10 * (4 + 1(delimiter))) + 50 = 100 bytes 
>  - storage_size_per_document: 2.4K*10 + 100*1K*10 + 1K*100*10 = 2024K = 1976 
> Kb * 3 = 5930 Kb (59.3x)
> 2. number of independent keys to retrieve: 0-2 (depending on index structure)
> 3. number of range queries: 1 (1001 of keys in result)
> 4. data to transmit on read: 24K + 1000*100 + 1000*100 = 23.6 Kb (2.4x)  
> 5. read latency (we use 2ms per read based on numbers from 
> https://apple.github.io/foundationdb/performance.html and estimate range read 
> performance based on numbers from 
> https://apple.github.io/foundationdb/benchmarking.html#single-core-read-test)
>  - range read performance: Given read performance is about 305,000 
> reads/second and range performance 3,600,000 keys/second we estimate range 
> performance to be 11.8x compared to read performance. If read performance is 
> 2ms than range performance is 0.169ms (which is hard to believe).
>  - sequential: 2 * 2 = 4ms
>  - range: 0.169
> 
> It looks like we are dealing with a tradeoff:
> - Map every distinct occurrence of a key/value instance through a crypto hash:
>  - 5.39x more disk space efficient
>  - 474x slower
> - flattened JSON model
>  - 5.39x less efficient in disk space
>  - 474x 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-04 Thread Robert Newson
Hi,

The talk of crypto in the key space is extremely premature in my opinion. It 
is the database's job (foundationdb's in this case) to map meaningful names to 
whatever it takes to efficiently store, index, and retrieve them. Obscuring 
every key with an expensive cryptographic operation works against everything I 
think distinguishes good software.

Keep it simple. The overhead of using readable, meaningful keys can be 
mitigated to a degree with a) the Directory layer, which shortens prefixes at 
the cost of a network round trip, and b) prefix elision in the fdb storage system 
itself (Redwood, which may land before we've completed our work). 

Actual measurements take priority over the speculation in this thread so far, 
and a simple scheme with some overhead (defined as the actual storage of a document 
versus its theoretical minimum disk occupancy) is preferable to complicated, "clever", 
but brittle solutions.

I point to my earlier comment on optional document schemas which would reduce 
the length of keys to a scalar value anyway (the offset of the data item within 
the declared schema).

B.

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Mon, 4 Feb 2019, at 11:08, Ilya Khlopotov wrote:
> Hi Michael,
> 
> > For example, hears a crazy thought:
> > Map every distinct occurence of a key/value instance through a crypto hash
> > function to get a set of hashes.
> >
> > These can be be precomputed by Couch without any lookups in FDB.  These
> > will be spread all over kingdom come in FDB and not lend themselves to
> > range search well.
> > 
> > So what you do is index them for frequency of occurring in the same set.
> > In essence, you 'bucket them' statistically, and that bucket id becomes a
> > key prefix. A crypto hash value can be copied into more than one bucket.
> > The {bucket_id}/{cryptohash} becomes a {val_id}
> 
> > When writing a document, Couch submits the list/array of cryptohash values
> > it computed to FDB and gets back the corresponding  {val_id} (the id with
> > the bucket prefixed).  This can get somewhat expensive if there's always a
> > lot of app local cache misses.
> >
> > A document's value is then a series of {val_id} arrays up to 100k per
> > segment.
> > 
> > When retrieving a document, you get the val_ids, find the distinct buckets
> > and min/max entries for this doc, and then parallel query each bucket while
> > reconstructing the document.
> 
> Interesting idea. Let's try to think it through to see if we can make it 
> viable. 
> Let's go through hypothetical example. Input data for the example:
> - 1M of documents
> - each document is around 10Kb
> - each document consists of 1K of unique JSON paths 
> - each document has 100 unique JSON field names
> - every scalar value is 100 bytes
> - 10% of unique JSON paths for every document already stored in database 
> under different doc or different revision of the current one
> - we assume 3 independent copies for every key-value pair in FDB
> - our hash key size is 32 bytes
> - let's assume we can determine if key is already on the storage without 
> doing query
> - 1% of paths is in cache (unrealistic value, in real live the 
> percentage is lower)
> - every JSON field name is 20 bytes
> - every JSON path is 10 levels deep
> - document key prefix length is 50
> - every document has 10 revisions
> Let's estimate the storage requirements and size of data we need to 
> transmit. The calculations are not exact.
> 1. storage_size_per_document (we cannot estimate exact numbers since we 
> don't know how FDB stores it)
>   - 10 * ((10Kb - (10Kb * 10%)) + (1K - (1K * 10%)) * 32 bytes) = 38Kb * 
> 10 * 3 = 1140 Kb (11x)
> 2. number of independent keys to retrieve on document read (non-range 
> queries) per document
>   - 1K - (1K * 1%) = 990
> 3. number of range queries: 0
> 4. data to transmit on read: (1K - (1K * 1%)) * (100 bytes + 32 bytes) = 
> 102 Kb (10x) 
> 5. read latency (we use 2ms per read based on numbers from 
> https://apple.github.io/foundationdb/performance.html)
> - sequential: 990*2ms = 1980ms 
> - range: 0
> Let's compare these numbers with initial proposal (flattened JSON docs 
> without global schema and without cache)
> 1. storage_size_per_document
>   - mapping table size: 100 * (20 + 4(integer size)) = 2400 bytes
>   - key size: (10 * (4 + 1(delimiter))) + 50 = 100 bytes 
>   - storage_size_per_document: 2.4K*10 + 100*1K*10 + 1K*100*10 = 2024K = 
> 1976 Kb * 3 = 5930 Kb (59.3x)
> 2. number of independent keys to retrieve: 0-2 (depending on index 
> structure)
> 3. number of range queries: 1 (1001 of keys in result)
> 4. data to transmit on read: 24K + 1000*100 + 1000*100 = 23.6 Kb (2.4x)  
> 5. read latency (we use 2ms per read based on numbers from 
> https://apple.github.io/foundationdb/performance.html and estimate range 
> read performance based on numbers from 
> https://apple.github.io/foundationdb/benchmarking.html#single-core-read-test)
>   - range read performance: Given read performance is about 305,000 
> 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-04 Thread Ilya Khlopotov
Hi Michael,

> For example, hears a crazy thought:
> Map every distinct occurence of a key/value instance through a crypto hash
> function to get a set of hashes.
>
> These can be be precomputed by Couch without any lookups in FDB.  These
> will be spread all over kingdom come in FDB and not lend themselves to
> range search well.
> 
> So what you do is index them for frequency of occurring in the same set.
> In essence, you 'bucket them' statistically, and that bucket id becomes a
> key prefix. A crypto hash value can be copied into more than one bucket.
> The {bucket_id}/{cryptohash} becomes a {val_id}

> When writing a document, Couch submits the list/array of cryptohash values
> it computed to FDB and gets back the corresponding  {val_id} (the id with
> the bucket prefixed).  This can get somewhat expensive if there's always a
> lot of app local cache misses.
>
> A document's value is then a series of {val_id} arrays up to 100k per
> segment.
> 
> When retrieving a document, you get the val_ids, find the distinct buckets
> and min/max entries for this doc, and then parallel query each bucket while
> reconstructing the document.

Interesting idea. Let's try to think it through to see if we can make it 
viable. 
Let's go through a hypothetical example. Input data for the example:
- 1M of documents
- each document is around 10Kb
- each document consists of 1K of unique JSON paths 
- each document has 100 unique JSON field names
- every scalar value is 100 bytes
- 10% of the unique JSON paths for every document are already stored in the database 
under a different doc or a different revision of the current one
- we assume 3 independent copies for every key-value pair in FDB
- our hash key size is 32 bytes
- let's assume we can determine whether a key is already in storage without doing a 
query
- 1% of paths is in cache (an unrealistic value; in real life the percentage is 
lower)
- every JSON field name is 20 bytes
- every JSON path is 10 levels deep
- document key prefix length is 50
- every document has 10 revisions
Let's estimate the storage requirements and size of data we need to transmit. 
The calculations are not exact.
1. storage_size_per_document (we cannot estimate exact numbers since we don't 
know how FDB stores it)
  - 10 * ((10Kb - (10Kb * 10%)) + (1K - (1K * 10%)) * 32 bytes) = 38Kb * 10 * 3 
= 1140 Kb (11x)
2. number of independent keys to retrieve on document read (non-range queries) 
per document
  - 1K - (1K * 1%) = 990
3. number of range queries: 0
4. data to transmit on read: (1K - (1K * 1%)) * (100 bytes + 32 bytes) = 102 Kb 
(10x) 
5. read latency (we use 2ms per read based on numbers from 
https://apple.github.io/foundationdb/performance.html)
- sequential: 990*2ms = 1980ms 
- range: 0
Let's compare these numbers with initial proposal (flattened JSON docs without 
global schema and without cache)
1. storage_size_per_document
  - mapping table size: 100 * (20 + 4(integer size)) = 2400 bytes
  - key size: (10 * (4 + 1(delimiter))) + 50 = 100 bytes 
  - storage_size_per_document: 2.4K*10 + 100*1K*10 + 1K*100*10 = 2024K = 1976 
Kb * 3 = 5930 Kb (59.3x)
2. number of independent keys to retrieve: 0-2 (depending on index structure)
3. number of range queries: 1 (1001 of keys in result)
4. data to transmit on read: 24K + 1000*100 + 1000*100 = 23.6 Kb (2.4x)  
5. read latency (we use 2ms per read based on numbers from 
https://apple.github.io/foundationdb/performance.html and estimate range read 
performance based on numbers from 
https://apple.github.io/foundationdb/benchmarking.html#single-core-read-test)
  - range read performance: Given read performance is about 305,000 
reads/second and range performance 3,600,000 keys/second we estimate range 
performance to be 11.8x compared to read performance. If read performance is 
2ms then range performance is 0.169ms (which is hard to believe).
  - sequential: 2 * 2 = 4ms
  - range: 0.169

It looks like we are dealing with a tradeoff:
- Map every distinct occurrence of a key/value instance through a crypto hash:
  - 5.39x more disk space efficient
  - 474x slower
- flattened JSON model
  - 5.39x less efficient in disk space
  - 474x faster

In any case, this unscientific exercise was very helpful, since it uncovered the 
high cost in terms of disk space. 59.3x the original disk size is too much IMO. 

Are there any ways we can make Michael's model more performant?

Also, I don't quite understand a few aspects of the global hash table proposal:

1. > - Map every distinct occurence of a key/value instance through a crypto 
hash function to get a set of hashes.
I think we are talking only about scalar values here? I.e. `"#/foo.bar.baz": 
123`
Since I don't know how we can make it work for all possible JSON paths `{"foo": 
{"bar": {"size": 12, "baz": 123}}}":
- foo
- foo.bar
- foo.bar.baz

2. how to delete documents

Best regards,
ILYA


On 2019/01/30 23:33:22, Michael Fair  wrote: 
> On Wed, Jan 30, 2019, 12:57 PM Adam Kocoloski  
> > Hi Michael,
> >
> > > The trivial 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-03 Thread Robert Samuel Newson
Hi,


Yes. The value (in the foundationdb sense) will always be the terminal value 
(string, number, boolean, null) and so the key has to be the path to that, 
including special delimiters for object and array boundaries (what fdb calls 
’the tuple layer’).

I’d also like to see more effort put into simplifying the way we map json 
documents/mvcc into fdb as far as possible (but not further). We can add 
embellishments over time, but removing complexity is much harder.

B.

> On 3 Feb 2019, at 01:27, Joan Touzet  wrote:
> 
> Hi Ilya,
> 
> I'm not seeing it in your proposal explicitly, or maybe I'm not
> readling it very well, so: can you confirm that arrays of large
> objects would continue to be deconstructed down to the base JSON
> types of string, number, boolean, and null? Any intermediate 
> objects (or further nesting of arrays and objects beyond that) would
> continue to contribute to the namespace, is that right?
> 
> -Joan
> 
> - Original Message -
>> From: "Ilya Khlopotov" 
>> To: dev@couchdb.apache.org
>> Sent: Wednesday, 30 January, 2019 8:05:05 AM
>> Subject: Re: [DISCUSS] : things we need to solve/decide : storing JSON 
>> documents
>> 
>> # First proposal
>> 
>> In order to overcome FoudationDB limitations on key size (10 kB) and
>> value size (100 kB) we could use the following approach.
>> 
>> Bellow the paths are using slash for illustration purposes only. We
>> can use nested subspaces, tuples, directories or something else.
>> 
>> - Store documents in a subspace or directory  (to keep prefix for a
>> key short)
>> - When we store the document we would enumerate all field names (0
>> and 1 are reserved) and store the mapping table in the key which
>> look like:
>> ```
>> {DB_DOCS_NS} / {DOC_KEY} / 0
>> ```
>> - Flatten the JSON document (convert it into key value pairs where
>> the key is `JSON_PATH` and value is `SCALAR_VALUE`)
>> - Replace elements of JSON_PATH with integers from mapping table we
>> constructed earlier
>> - When we have array use `1 / {array_idx}`
>> - Store scalar values in the keys which look like the following (we
>> use `JSON_PATH` with integers).
>> ```
>> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}
>> ```
>> - If the scalar value exceeds 100kB we would split it and store every
>> part under key constructed as:
>> ```
>> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}
>> ```
>> 
>> Since all parts of the documents are stored under a common
>> `{DB_DOCS_NS} / {DOC_KEY}` they will be stored on the same server
>> most of the time. The document can be retrieved by using range query
>> (`txn.get_range("{DB_DOCS_NS} / {DOC_KEY} / 0", "{DB_DOCS_NS} /
>> {DOC_KEY} / 0xFF")`). We can reconstruct the document since the
>> mapping is returned as well.
>> 
>> The downside of this approach is we wouldn't be able to ensure the
>> same order of keys in the JSON object. Currently the `jiffy` JSON
>> encoder respects order of keys.
>> ```
>> 4> jiffy:encode({[{bbb, 1}, {aaa, 12}]}).
>> <<"{\"bbb\":1,\"aaa\":12}">>
>> 5> jiffy:encode({[{aaa, 12}, {bbb, 1}]}).
>> <<"{\"aaa\":12,\"bbb\":1}">>
>> ```
>> 
>> Best regards,
>> iilyak
>> 
>> On 2019/01/30 13:02:57, Ilya Khlopotov  wrote:
>>> As you might already know the FoundationDB has a number of
>>> limitations which influences the way we might store JSON
>>> documents. The limitations are:
>>> 
>>> | limitation            | recommended value | recommended max | absolute max |
>>> |-----------------------|------------------:|----------------:|-------------:|
>>> | transaction duration  |                   |                 |        5 sec |
>>> | transaction data size |                   |                 |        10 Mb |
>>> | key size              |          32 bytes |            1 kB |        10 kB |
>>> | value size            |                   |           10 kB |       100 kB |
>>> 
>>> In order to fit the JSON document into 100kB we would have to
>>> partition it in some way. There are three ways of partitioning the
>>> document
>>> 1. store multiple binary blobs (parts) in different keys
>>> 2. flatten JSON structure and store every path leading to a scalar
>>> value unde

RE: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-01 Thread Reddy B .
Thank you very much

From: Robert Newson 
Sent: Friday, 1 February 2019 10:29
To: dev@couchdb.apache.org
Subject: Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

Hi,

"rebasing is not just a politically correct way of saying that CouchDb is being 
retired"

Emphatically, no. We see this as an evolution of CouchDB, delivering CouchDB 
1.0 semantics around conflicts and changes feeds but in a way that scales 
better than CouchDB 2.0's approach.

We intend to preserve what makes CouchDB special, which includes being able to 
"drop" documents in without having to declare their format. In my post from 
yesterday I suggested _optional_ schema declarations to improve efficiency and 
to address some of the constraints on doc and field size that might arise based 
on how we plan to map documents into foundationdb key-value entries.

The notion of "schemaless" for CouchDB has never meant that users don't have to 
think about how they map their data into CouchDB documents; it just relieved 
them of the burden of teaching CouchDB about them. That notion will remain.

CouchDB has a long history and a fair few clever ideas at the start are looking 
less relevant today (as you mentioned, couchapps, the _show, _list, _update, 
_rewrite sorts of things), as the ecosystem in which CouchDB lives has been so 
hugely expanded in the last ten years. It is right for the CouchDB project to 
re-evaluate the feature set we present and remove things that are of little 
value or are better done with other technology. That is just basic project 
maintenance, though.

Thank you for raising this concern, you are certainly not adding toxicity. It 
would be toxic if there was no expression of concerns about this change. Please 
continue to follow and contribute to this discussion.

B.

--
  Robert Samuel Newson
  rnew...@apache.org

On Fri, 1 Feb 2019, at 09:11, Reddy B. wrote:
> By the way, if the FDB migration was to happen, will CouchDb continue to
> be a schema-less database where we can just drop our documents and map/
> reduce them without further ceremony?
>
> I mean for the long-term, is there a commitment to keeping this feature?
> This is a big deal, the basics of CouchDb. I think this is the first
> assumption you make when you use CouchDb as of today.
>
> I'm not trying to add toxicity to this very positive, constructive and
> high quality discussion, but just some humble feedback. As a user, when
> I see this being questioned, along with the other limitations introduced
> by FDB I am starting to wonder if rebasing is not just a politically
> correct way of saying that CouchDb is being retired. For many once core
> features now become optional extensions to be implemented.
>
> Which makes me wonder "what's the core" and question the benefit/cost
> analysis of the switch in light of the current vision of the project.
> For it's starting to look like FDB may not only be used as an
> implementation convenience but as a new vision for CouchDb (deprecating
> the former vision). In light of this the benefit-cost analysis would
> make sense but such a change in vision has not been publicly announced.
>
> And this would mean that today's core features are likely to go the way
> of Couchapps tomorrow if the vision has indeed changed. This is a very
> problematic uncertainty as an end-user thinking long-term support for
> new projects. I totally appreciate that this is dev mailing list where
> ideas are bounced and technical details worked out, but it's important
> for us as users to see commitments on vision, thus my question. I also
> took advantage of this opportunity to voice the more general concern
> aforementioned.
>
> But the specific question is: what's the vision for "schema-less" usage
> of CouchDb.
>
> Thanks
>
>
>
> ____________
> De : Ilya Khlopotov 
> Envoyé : mercredi 30 janvier 2019 22:08
> À : dev@couchdb.apache.org
> Objet : Re: [DISCUSS] : things we need to solve/decide : storing JSON 
> documents
>
> > I think I prefer the idea of indexing all document's keys using the same
> > identifier set.  In general I think applications have the behavior that
> > some keys are referenced far more than other keys and giving those keys in
> > each document the same value I think could eventually prove useful for
> > making many features faster and easier than expected.
>
> This approach would require an invention of schema evolution features
> similar to recently open sourced Record Layer
> https://www.foundationdb.org/files/record-layer-paper.pdf
> I am sure some CouchDB users do (because CouchDB is NoSQL i.e. schema-
> less database):
> - rename fields
> - reuse field names for something else when they update appli

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-01 Thread Robert Newson
Hi,

"rebasing is not just a politically correct way of saying that CouchDb is being 
retired"

Emphatically, no. We see this as an evolution of CouchDB, delivering CouchDB 
1.0 semantics around conflicts and changes feeds but in a way that scales 
better than CouchDB 2.0's approach.

We intend to preserve what makes CouchDB special, which includes being able to 
"drop" documents in without having to declare their format. In my post from 
yesterday I suggested _optional_ schema declarations to improve efficiency and 
to address some of the constraints on doc and field size that might arise based 
on how we plan to map documents into foundationdb key-value entries.

The notion of "schemaless" for CouchDB has never meant that users don't have to 
think about how they map their data into CouchDB documents; it just relieved 
them of the burden of teaching CouchDB about them. That notion will remain.

CouchDB has a long history and a fair few clever ideas at the start are looking 
less relevant today (as you mentioned, couchapps, the _show, _list, _update, 
_rewrite sorts of things), as the ecosystem in which CouchDB lives has been so 
hugely expanded in the last ten years. It is right for the CouchDB project to 
re-evaluate the feature set we present and remove things that are of little 
value or are better done with other technology. That is just basic project 
maintenance, though.

Thank you for raising this concern, you are certainly not adding toxicity. It 
would be toxic if there was no expression of concerns about this change. Please 
continue to follow and contribute to this discussion.

B.

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Fri, 1 Feb 2019, at 09:11, Reddy B. wrote:
> By the way, if the FDB migration was to happen, will CouchDb continue to 
> be a schema-less database where we can just drop our documents and map/
> reduce them without further ceremony?
> 
> I mean for the long-term, is there a commitment to keeping this feature? 
> This is a big deal, the basics of CouchDb. I think this is the first 
> assumption you make when you use CouchDb as of today.
> 
> I'm not trying to add toxicity to this very positive, constructive and 
> high quality discussion, but just some humble feedback. As a user, when 
> I see this being questioned, along with the other limitations introduced 
> by FDB I am starting to wonder if rebasing is not just a politically 
> correct way of saying that CouchDb is being retired. For many once core 
> features now become optional extensions to be implemented.
> 
> Which makes me wonder "what's the core" and question the benefit/cost 
> analysis of the switch in light of the current vision of the project. 
> For it's starting to look like FDB may not only be used as an 
> implementation convenience but as a new vision for CouchDb (deprecating 
> the former vision). In light of this the benefit-cost analysis would 
> make sense but such a change in vision has not been publicly announced.
> 
> And this would mean that today's core features are likely to go the way 
> of Couchapps tomorrow if the vision has indeed changed. This is a very 
> problematic uncertainty as an end-user thinking long-term support for 
> new projects. I totally appreciate that this is dev mailing list where 
> ideas are bounced and technical details worked out, but it's important 
> for us as users to see commitments on vision, thus my question. I also 
> took advantage of this opportunity to voice the more general concern 
> aforementioned.
> 
> But the specific question is: what's the vision for "schema-less" usage 
> of CouchDb.
> 
> Thanks
> 
> 
> 
> ____________
> De : Ilya Khlopotov 
> Envoyé : mercredi 30 janvier 2019 22:08
> À : dev@couchdb.apache.org
> Objet : Re: [DISCUSS] : things we need to solve/decide : storing JSON 
> documents
> 
> > I think I prefer the idea of indexing all document's keys using the same
> > identifier set.  In general I think applications have the behavior that
> > some keys are referenced far more than other keys and giving those keys in
> > each document the same value I think could eventually prove useful for
> > making many features faster and easier than expected.
> 
> This approach would require an invention of schema evolution features 
> similar to recently open sourced Record Layer 
> https://www.foundationdb.org/files/record-layer-paper.pdf
> I am sure some CouchDB users do (because CouchDB is NoSQL i.e. schema-
> less database):
> - rename fields
> - reuse field names for something else when they update application
> - remove fields
> - have documents of different structure in one database
> 
> > I think regardless of whether the mapping is document local or global, 
>

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-01 Thread Jan Lehnardt
Heya Reddy,

totally a valid question: JSON documents with revisions, secondary indexing,
multi-server sync, PouchDB support, the lot; none of this is under
discussion, even long term.

Any introduction of schema awareness on the storage level would be incidental 
to the schema-less nature of CouchDB documents.

For the short and long, we are committed to keeping existing apps compatible as 
much as possible, while enabling new use-cases going forward.

CouchDB as you know it is not being retired.

Best
Jan
—

> On 1. Feb 2019, at 10:11, Reddy B.  wrote:
> 
> By the way, if the FDB migration was to happen, will CouchDb continue to be a 
> schema-less database where we can just drop our documents and map/reduce them 
> without further ceremony?
> 
> I mean for the long-term, is there a commitment to keeping this feature? This 
> is a big deal, the basics of CouchDb. I think this is the first assumption 
> you make when you use CouchDb as of today.
> 
> I'm not trying to add toxicity to this very positive, constructive and high 
> quality discussion, but just some humble feedback. As a user, when I see this 
> being questioned, along with the other limitations introduced by FDB I am 
> starting to wonder if rebasing is not just a politically correct way of 
> saying that CouchDb is being retired. For many once core features now become 
> optional extensions to be implemented.
> 
> Which makes me wonder "what's the core" and question the benefit/cost 
> analysis of the switch in light of the current vision of the project. For 
> it's starting to look like FDB may not only be used as an implementation 
> convenience but as a new vision for CouchDb (deprecating the former vision). 
> In light of this the benefit-cost analysis would make sense but such a change 
> in vision has not been publicly announced.
> 
> And this would mean that today's core features are likely to go the way of 
> Couchapps tomorrow if the vision has indeed changed. This is a very 
> problematic uncertainty as an end-user thinking long-term support for new 
> projects. I totally appreciate that this is dev mailing list where ideas are 
> bounced and technical details worked out, but it's important for us as users 
> to see commitments on vision, thus my question. I also took advantage of this 
> opportunity to voice the more general concern aforementioned.
> 
> But the specific question is: what's the vision for "schema-less" usage of 
> CouchDb.
> 
> Thanks
> 
> 
> 
> ____________
> De : Ilya Khlopotov 
> Envoyé : mercredi 30 janvier 2019 22:08
> À : dev@couchdb.apache.org
> Objet : Re: [DISCUSS] : things we need to solve/decide : storing JSON 
> documents
> 
>> I think I prefer the idea of indexing all document's keys using the same
>> identifier set.  In general I think applications have the behavior that
>> some keys are referenced far more than other keys and giving those keys in
>> each document the same value I think could eventually prove useful for
>> making many features faster and easier than expected.
> 
> This approach would require an invention of schema evolution features similar 
> to recently open sourced Record Layer 
> https://www.foundationdb.org/files/record-layer-paper.pdf
> I am sure some CouchDB users do (because CouchDB is NoSQL i.e. schema-less 
> database):
> - rename fields
> - reuse field names for something else when they update application
> - remove fields
> - have documents of different structure in one database
> 
>> I think regardless of whether the mapping is document local or global, having
>> FDB return those individual values is faster/easier than having Couch Range
>> fetch the mapping and do the translation work itself.
> in case of global mapping we would do
> - get_schema from different subspace (i.e. contact different nodes)
> - extract all scalar values by issuing FDB's range query (most likely all 
> values are co-located)
> - stitch document together and return it to user
> 
> in case of local mapping we don't need to call get_schema. The schema would 
> be returned by range query.
> 
> We would have to stitch document in either case.
> 
> Can you elaborate if my understanding is not correct (I didn't quite 
> understand the "Couch Range fetch" part of your question)?
> 
> best regards,
> iilyak
> 
> On 2019/01/30 20:11:18, Michael Fair  wrote:
>> On Wed, Jan 30, 2019 at 9:53 AM Ilya Khlopotov  wrote:
>> 
>>> FoundationDB Records layer uses global schema for JSON documents. They
>>> also have a nice way of creating indexes and schema evolution support.
>>> However this support comes at a cost of extra lookups in different
>>

RE: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-01 Thread Reddy B .
By the way, if the FDB migration was to happen, will CouchDb continue to be a 
schema-less database where we can just drop our documents and map/reduce them 
without further ceremony?

I mean for the long-term, is there a commitment to keeping this feature? This 
is a big deal, the basics of CouchDb. I think this is the first assumption you 
make when you use CouchDb as of today.

I'm not trying to add toxicity to this very positive, constructive and high 
quality discussion, but just some humble feedback. As a user, when I see this 
being questioned, along with the other limitations introduced by FDB I am 
starting to wonder if rebasing is not just a politically correct way of saying 
that CouchDb is being retired. For many once core features now become optional 
extensions to be implemented.

Which makes me wonder "what's the core" and question the benefit/cost analysis 
of the switch in light of the current vision of the project. For it's starting 
to look like FDB may not only be used as an implementation convenience but as a 
new vision for CouchDb (deprecating the former vision). In light of this the 
benefit-cost analysis would make sense but such a change in vision has not been 
publicly announced.

And this would mean that today's core features are likely to go the way of 
Couchapps tomorrow if the vision has indeed changed. This is a very problematic 
uncertainty as an end-user thinking long-term support for new projects. I 
totally appreciate that this is dev mailing list where ideas are bounced and 
technical details worked out, but it's important for us as users to see 
commitments on vision, thus my question. I also took advantage of this 
opportunity to voice the more general concern aforementioned.

But the specific question is: what's the vision for "schema-less" usage of 
CouchDb.

Thanks




De : Ilya Khlopotov 
Envoyé : mercredi 30 janvier 2019 22:08
À : dev@couchdb.apache.org
Objet : Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

> I think I prefer the idea of indexing all document's keys using the same
> identifier set.  In general I think applications have the behavior that
> some keys are referenced far more than other keys and giving those keys in
> each document the same value I think could eventually prove useful for
> making many features faster and easier than expected.

This approach would require an invention of schema evolution features similar 
to recently open sourced Record Layer 
https://www.foundationdb.org/files/record-layer-paper.pdf
I am sure some CouchDB users do (because CouchDB is NoSQL i.e. schema-less 
database):
- rename fields
- reuse field names for something else when they update application
- remove fields
- have documents of different structure in one database

> I think regardless of whether the mapping is document local or global, having
> FDB return those individual values is faster/easier than having Couch Range
> fetch the mapping and do the translation work itself.
in case of global mapping we would do
- get_schema from different subspace (i.e. contact different nodes)
- extract all scalar values by issuing FDB's range query (most likely all 
values are co-located)
- stitch document together and return it to user

in case of local mapping we don't need to call get_schema. The schema would be 
returned by range query.

We would have to stitch document in either case.

Can you elaborate if my understanding is not correct (I didn't quite understand 
the "Couch Range fetch" part of your question)?

best regards,
iilyak

On 2019/01/30 20:11:18, Michael Fair  wrote:
> On Wed, Jan 30, 2019 at 9:53 AM Ilya Khlopotov  wrote:
>
> > FoundationDB Records layer uses global schema for JSON documents. They
> > also have a nice way of creating indexes and schema evolution support.
> > However this support comes at a cost of extra lookups in different
> > subspace. With local mapping table we almost (except a corner case) certain
> > that the schema and JSON fields would be collocated on a single node. Due
> > to common prefix.
> >
>
> In general I think I prefer the global, but separate, key mapping idea and
> use FDB's "cache the important, frequently accessed data, across
> distributed memory" features.
>
> I think I prefer the idea of indexing all document's keys using the same
> identifier set.  In general I think applications have the behavior that
> some keys are referenced far more than other keys and giving those keys in
> each document the same value I think could eventually prove useful for
> making many features faster and easier than expected.
>
> While I really like the independence and locality of a document local
> mapping, when I think about the process of transforming a document's keys
> into that mapping's values, I don't see a particu

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-31 Thread Robert Newson
Thanks! I stress it would be optional. We could add it in a release after the
main couchdb-on-fdb work, in response to pressure from users finding the 10 MB (etc.)
limits too restrictive, or we could do it as a neat enhancement in its own right
(the validation aspect) that just happens to allow us to optimise the lengths
of keys.

B.

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Thu, 31 Jan 2019, at 17:36, Adam Kocoloski wrote:
> I like the idea, both for the efficiencies it enables in the 
> FoundationDB data model and for the ability to cover a lot of validation 
> functionality without shelling out to JS.
> 
> It’s pretty obviously a big, meaty topic unto itself, one that needs 
> some careful thought and design. Also an awful lot of opportunity for 
> scope creep. But a good topic nonetheless.
> 
> Adam
> 
> > On Jan 31, 2019, at 12:05 PM, Robert Newson  wrote:
> > 
> > Hi,
> > 
> > An enhancement over the first idea (where we flatten JSON documents into 
> > keys and values where the key is the full path to every terminal value, 
> > regardless of depth in the JSON) is to allow users to register schemas.
> > 
> > For documents that match a registered schema (suggestion, a top level field 
> > called "_schema" is required to mark the documents) we can convert to 
> > key/value pairs much more compactly. The key name, barring whatever prefix 
> > identifies the database itself, is just the name of the schema and the 
> > ordinal of the schema item relative to the declaration.
> > 
> > These schema documents (living under /dbname/_schema/$schema_name akin to 
> > design documents) would list all required and optional fields, their types 
> > and perhaps other constraints (like valid ranges or relationships with 
> > other fields). We could get arbitrarily complex in schema definitions over 
> > time. Effectively, these are validate_doc_update functions without the 
> > Javascript evaluation pain.
> > 
> > We don't necessarily need this in the first version, but it feels like a 
> > better response to the worries over the restrictions that the flattening 
> > idea is causing than switching to an opaque series of 100k chunks.
> > 
> > thoughts?
> > B
> > 
> > -- 
> >  Robert Newson
> >  b...@rsn.io
> > 
> > On Thu, 31 Jan 2019, at 16:26, Adam Kocoloski wrote:
> >> 
> >>> On Jan 31, 2019, at 1:47 AM, ermouth  wrote:
> >>> 
>  As I don't see the 10k limitation as having significant merit
> >>> 
> >>> Not sure it’s relevant here, but Mango indexes put selected doc values 
> >>> into
> >>> keys.
> >>> 
> >>> ermouth
> >> 
> >> Totally relevant. Not just because of the possibility of putting a large 
> >> scalar value into the index, but because someone could create an index 
> >> on a field that is itself a container, and Mango could just dump the 
> >> entire container into the index as the key.
> >> 
> >> Presumably there’s a followup discussion dedicated to indexing where we 
> >> can suss out what to do in that scenario.
> >> 
> >> Adam
> 


Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-31 Thread Adam Kocoloski
I like the idea, both for the efficiencies it enables in the FoundationDB data 
model and for the ability to cover a lot of validation functionality without 
shelling out to JS.

It’s pretty obviously a big, meaty topic unto itself, one that needs some 
careful thought and design. Also an awful lot of opportunity for scope creep. 
But a good topic nonetheless.

Adam

> On Jan 31, 2019, at 12:05 PM, Robert Newson  wrote:
> 
> Hi,
> 
> An enhancement over the first idea (where we flatten JSON documents into keys 
> and values where the key is the full path to every terminal value, regardless 
> of depth in the JSON) is to allow users to register schemas.
> 
> For documents that match a registered schema (suggestion, a top level field 
> called "_schema" is required to mark the documents) we can convert to 
> key/value pairs much more compactly. The key name, barring whatever prefix 
> identifies the database itself, is just the name of the schema and the 
> ordinal of the schema item relative to the declaration.
> 
> These schema documents (living under /dbname/_schema/$schema_name akin to 
> design documents) would list all required and optional fields, their types 
> and perhaps other constraints (like valid ranges or relationships with other 
> fields). We could get arbitrarily complex in schema definitions over time. 
> Effectively, these are validate_doc_update functions without the Javascript 
> evaluation pain.
> 
> We don't necessarily need this in the first version, but it feels like a 
> better response to the worries over the restrictions that the flattening idea 
> is causing than switching to an opaque series of 100k chunks.
> 
> thoughts?
> B
> 
> -- 
>  Robert Newson
>  b...@rsn.io
> 
> On Thu, 31 Jan 2019, at 16:26, Adam Kocoloski wrote:
>> 
>>> On Jan 31, 2019, at 1:47 AM, ermouth  wrote:
>>> 
 As I don't see the 10k limitation as having significant merit
>>> 
>>> Not sure it’s relevant here, but Mango indexes put selected doc values into
>>> keys.
>>> 
>>> ermouth
>> 
>> Totally relevant. Not just because of the possibility of putting a large 
>> scalar value into the index, but because someone could create an index 
>> on a field that is itself a container, and Mango could just dump the 
>> entire container into the index as the key.
>> 
>> Presumably there’s a followup discussion dedicated to indexing where we 
>> can suss out what to do in that scenario.
>> 
>> Adam



Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-31 Thread Robert Newson
Hi,

An enhancement over the first idea (where we flatten JSON documents into keys 
and values where the key is the full path to every terminal value, regardless 
of depth in the JSON) is to allow users to register schemas.

For documents that match a registered schema (suggestion, a top level field 
called "_schema" is required to mark the documents) we can convert to key/value 
pairs much more compactly. The key name, barring whatever prefix identifies the 
database itself, is just the name of the schema and the ordinal of the schema 
item relative to the declaration.

These schema documents (living under /dbname/_schema/$schema_name akin to 
design documents) would list all required and optional fields, their types and 
perhaps other constraints (like valid ranges or relationships with other 
fields). We could get arbitrarily complex in schema definitions over time. 
Effectively, these are validate_doc_update functions without the Javascript 
evaluation pain.
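
A rough sketch of how a registered schema could shorten keys, in Python for
illustration (the schema format, the handling of "_schema", and the key shape
here are assumptions, not a settled design):

```
# Sketch: with a registered schema, the per-field key can be just
# (schema name, ordinal of the field in the declaration) instead of the
# full flattened path.

SCHEMAS = {
    # ordinal of each declared field, in declaration order
    "invoice": {"customer": 0, "total": 1, "currency": 2},
}

def encode_with_schema(doc):
    ordinals = SCHEMAS[doc["_schema"]]
    kvs = []
    for field, value in doc.items():
        if field == "_schema":
            continue
        # database prefix and doc id omitted for brevity
        kvs.append(((doc["_schema"], ordinals[field]), value))
    return kvs

print(encode_with_schema(
    {"_schema": "invoice", "customer": "acme", "total": 12.5, "currency": "EUR"}))
# [(('invoice', 0), 'acme'), (('invoice', 1), 12.5), (('invoice', 2), 'EUR')]
```

The same lookup could double as validation: a field with no declared ordinal,
or a value of the wrong declared type, would be a schema violation.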

We don't necessarily need this in the first version, but it feels like a better 
response to the worries over the restrictions that the flattening idea is 
causing than switching to an opaque series of 100k chunks.

thoughts?
B

-- 
  Robert Newson
  b...@rsn.io

On Thu, 31 Jan 2019, at 16:26, Adam Kocoloski wrote:
> 
> > On Jan 31, 2019, at 1:47 AM, ermouth  wrote:
> > 
> >> As I don't see the 10k limitation as having significant merit
> > 
> > Not sure it’s relevant here, but Mango indexes put selected doc values into
> > keys.
> > 
> > ermouth
> 
> Totally relevant. Not just because of the possibility of putting a large 
> scalar value into the index, but because someone could create an index 
> on a field that is itself a container, and Mango could just dump the 
> entire container into the index as the key.
> 
> Presumably there’s a followup discussion dedicated to indexing where we 
> can suss out what to do in that scenario.
> 
> Adam


Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-31 Thread Adam Kocoloski


> On Jan 31, 2019, at 1:47 AM, ermouth  wrote:
> 
>> As I don't see the 10k limitation as having significant merit
> 
> Not sure it’s relevant here, but Mango indexes put selected doc values into
> keys.
> 
> ermouth

Totally relevant. Not just because of the possibility of putting a large scalar 
value into the index, but because someone could create an index on a field that 
is itself a container, and Mango could just dump the entire container into the 
index as the key.

Presumably there’s a followup discussion dedicated to indexing where we can 
suss out what to do in that scenario.

Adam

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-31 Thread Michael Fair
On Wed, Jan 30, 2019 at 10:48 PM ermouth  wrote:

> > As I don't see the 10k limitation as having significant merit
>
> Not sure it’s relevant here, but Mango indexes put selected doc values into
> keys.
>

It kind of is; putting the values at the end of the keys is a viable
strategy.  It means those values, along with their path, must fit
into the 10k limit as well.

To clarify that "insignificant merit" comment:
(1) If docs are stored as arrays of blobs, then a 10k key to reference that
blob would be an inconceivable choice on our part.  The goal would be <= 32
bytes, where the FDB folks have said there's a clear performance gain.

(2) If the docs are stored as pieces to be assembled (whether that's in
individual kvps or document sections) with the nested key in the doc
directly used as part of the FDB key, then having a use case where a 10k
JSON Path is a requirement seems an "insignificant risk".  Even in those
cases where such a contrived document exists, it seems to me that requiring
the application to find a way to make the required path shorter is exceedingly
reasonable.

and
(3) If a more advanced/sophisticated approach is taken where Couch is
controlling the make up of the keys, then again, designing a system where
the system must support keys larger than 10k simply seems inconceivable.


I can't imagine any use cases where key lengths larger than 10k make up any
meaningful segment of applications for Couch.  At 10k in length, there is
bound to be some kind of path segmentation to reasonably reduce the key
length at the application layer instead of the Couch layer.

Mike


Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread ermouth
> As I don't see the 10k limitation as having significant merit

Not sure it’s relevant here, but Mango indexes put selected doc values into
keys.

ermouth


Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Michael Fair
On Wed, Jan 30, 2019, 12:57 PM Adam Kocoloski  Hi Michael,
>
> > The trivial fix is to use DOCID/REVISIONID as DOC_KEY.
>
> Yes that’s definitely one way to address storage of edit conflicts. I
> think there are other, more compact representations that we can explore if
> we have this “exploded” data model where each scalar value maps to an
> individual KV pair.


I agree. As I mentioned on the original thread, I see a scheme that
handles both conflicts and revisions, where you only have to store the most
recent change to a field.  Like you suggested, multiple revisions can share
a key.  Which, in my mind's eye, further invites the conflicts/revisions
discussion along with the working-within-the-limits discussion, because it
seems to me they are all intrinsically related as a "feature".

Saying 'We'll break documents up into roughly 80k segments', then trying to
overlay some kind of field sharing scheme for revisions/conflicts doesn't
seem like it will work.

I probably should have left out the trivial fix proposal as I don't think
it's a feasible solution to actually use.

The comment is more that I do not see how this thread can avoid
covering how to store/retrieve conflicts/revisions.

For instance, the 'doc as individual fields' proposal lends itself to value
sharing across multiple documents (and I don't just mean revisions of the
same doc, I mean the same key/value instance could be shared for every
document).
However that's not really relevant if we're not considering the amount of
shared information across documents in the storage scheme.

Simply storing documents in <100k segments (perhaps in some kind of
compressed binary representation) to deal with that FDB limit seems fine.
The only reason to consider doing something else is its impact on
indexing, searches, reduce functions, revisions, on-disk size,
etc.



> > I'm assuming the process will flatten the key paths of the document into
> an array and then request the value of each key as multiple parallel
> queries against FDB at once
>
> Ah, I think this is not one of Ilya’s assumptions. He’s trying to design a
> model which allows the retrieval of a document with a single range read,
> which is a good goal in my opinion.
>

I am not sure I agree.

Think of BitTorrent: a single range read should pull back the structure of
the document (the pieces to fetch), but not necessarily the whole document.

What if you already have a bunch of pieces in common with other documents
locally (a repeated header/footer/ or type for example); and you only need
to get a few pieces of data you don't already have?

The real goal I see for Couch is to treat your document set like the
collection of structured information that it is: in some respects like an
extension of your application's heap space for structured objects, with
efficient querying of that collection to get back subsets of the data.

Otherwise it seems more like a slightly upgraded file system plus a fancy
grep/find-like feature...

The best way I see to unlock more features/power is to a move towards a
more granular and efficient way to store and retrieve the scalar values...



For example, here's a crazy thought:
Map every distinct occurrence of a key/value instance through a crypto hash
function to get a set of hashes.

These can be precomputed by Couch without any lookups in FDB.  These
will be spread all over kingdom come in FDB and not lend themselves to
range search well.

So what you do is index them for frequency of occurring in the same set.
In essence, you 'bucket them' statistically, and that bucket id becomes a
key prefix. A crypto hash value can be copied into more than one bucket.
The {bucket_id}/{cryptohash} becomes a {val_id}

When writing a document, Couch submits the list/array of cryptohash values
it computed to FDB and gets back the corresponding {val_id}s (the ids with
the bucket prefixed).  This can get somewhat expensive if there are always a
lot of app-local cache misses.


A document's value is then a series of {val_id} arrays up to 100k per
segment.

When retrieving a document, you get the val_ids, find the distinct buckets
and min/max entries for this doc, and then query each bucket in parallel while
reconstructing the document.

The values returned from the buckets query are the key/value strings
required to reassemble this document.
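
A very rough sketch of the hashing step in Python, for illustration (the bucket
assignment below is a placeholder; the statistical co-occurrence bucketing
described above is not implemented here):

```
import hashlib
import json

NUM_BUCKETS = 256   # placeholder; real bucket ids would come from co-occurrence stats

def val_id(key, value):
    # content-address a single key/value instance
    digest = hashlib.sha256(
        json.dumps([key, value], sort_keys=True).encode()).digest()
    bucket = digest[0] % NUM_BUCKETS   # stand-in for the statistical bucket id
    return (bucket, digest)

# A document body then becomes a list of {val_id}s; the actual key/value
# strings live once under {bucket}/{hash} and can be shared across documents.
doc = {"type": "user", "name": "alice"}
print([val_id(k, v) for k, v in doc.items()])
```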


--
I put this forward primarily to highlight the idea that trying to match the
storage representation of documents to FDB keys in a straightforward way,
just to reduce query count, might not be the most performance-oriented approach.

I'd much prefer a storage approach that reduced data duplication and
enabled fast sub-document queries.


This clearly falls in the realm of what people want the 'use case' of Couch
to be/become.  By giving Couch more access to sub-document queries, I could
eventually see queries as complicated as GraphQL submitted to Couch and
pulling back ad-hoc aggregated data across multiple documents in a 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Michael Fair
>
> > Assuming it only does "prefix" and not "segment", then I don't think
> this will help because the DOCID for each key in JSON_PATH will be
> different, making the "prefix" to each path across different documents
> distinct.
>
> I’m not sure I follow you here, or we have different understandings of the
> proposal. When I’m reading a document in this model I’m retrieving a set of
> keys that all share the same {DOCID}. Moreover, if I’ve got e.g. an array
> sitting in some deeply nested part of the document, the entire path
> doc.foo.bar.baz.myarray is common to every element of the array, so it’s
> actually quite a nice case for elision.
>

Doh! I was focusing on somehow doing something about the same "foo/bar/baz"
in thousands of different documents, rather than nested maps/arrays in the
same document.

My biggest hurdle on the keys isn't the 10k limit, I think that's a
perfectly reasonable restriction.
It's all the storage duplication requirement for the repeated key "strings"
assuming document values are being stored independently in the DB (which
I'm inclined to think is overall more useful than a "whole document"
storage approach).
Especially for really long arrays.

If FDB can in fact eliminate most of this storage duplication cost, then
why bother with a key mapping translation at all?

As I don't see the 10k limitation as having significant merit, the only
purposes I can see are:
1) to get keys down under/equal to the 32 bytes optimization level.
Probably four or five, 6/8 byte segments.
2) reduce network transfer bandwidth requirements.

---
But regardless, if FDB can optimize out the duplicate storage cost for key
prefixes like you describe, then keep it simple and skip the key translator.

Cheers!
Mike


Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Ilya Khlopotov
> I think I prefer the idea of indexing all document's keys using the same
> identifier set.  In general I think applications have the behavior that
> some keys are referenced far more than other keys and giving those keys in
> each document the same value I think could eventually prove useful for
> making many features faster and easier than expected.

This approach would require an invention of schema evolution features similar 
to recently open sourced Record Layer 
https://www.foundationdb.org/files/record-layer-paper.pdf
I am sure some CouchDB users do (because CouchDB is NoSQL i.e. schema-less 
database):
- rename fields
- reuse field names for something else when they update application
- remove fields
- have documents of different structure in one database

> I think regardless of whether the mapping is document local or global, having
> FDB return those individual values is faster/easier than having Couch Range
> fetch the mapping and do the translation work itself.
in case of global mapping we would do
- get_schema from different subspace (i.e. contact different nodes)
- extract all scalar values by issuing FDB's range query (most likely all 
values are co-located)
- stitch document together and return it to user

in case of local mapping we don't need to call get_schema. The schema would be
returned by the range query.

We would have to stitch the document together in either case.
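
A minimal sketch of the "stitch" step in Python, for illustration (it assumes
the flattened, locally mapped layout sketched earlier in the thread; all names
are illustrative):

```
# Sketch: rebuild a document from the per-document mapping table plus the
# flattened (path, scalar) pairs that a single range read would return.
# Integer path elements: 1 = array marker (followed by the index), >= 2 = field id.

def stitch(mapping, pairs):
    names = {v: k for k, v in mapping.items()}          # field id -> field name

    def decode(path):
        # (3, 4, 1, 0) -> ['bar', 'baz', 0]
        out, i = [], 0
        while i < len(path):
            if path[i] == 1:                            # array marker
                out.append(path[i + 1])
                i += 2
            else:
                out.append(names[path[i]])
                i += 1
        return out

    root = {}
    for path, value in pairs:
        steps = decode(path)
        node = root
        for step, nxt in zip(steps, steps[1:]):
            default = [] if isinstance(nxt, int) else {}
            if isinstance(node, list):
                while len(node) <= step:
                    node.append(None)
                if node[step] is None:
                    node[step] = default
                node = node[step]
            else:
                node = node.setdefault(step, default)
        last = steps[-1]
        if isinstance(node, list):
            while len(node) <= last:
                node.append(None)
            node[last] = value
        else:
            node[last] = value
    return root

mapping = {"foo": 2, "bar": 3, "baz": 4}
pairs = [((2,), 5), ((3, 4, 1, 0), 1), ((3, 4, 1, 1), 2)]
print(stitch(mapping, pairs))   # {'foo': 5, 'bar': {'baz': [1, 2]}}
```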

Can you elaborate if my understanding is not correct (I didn't quite understand 
the "Couch Range fetch" part of your question)?

best regards,
iilyak

On 2019/01/30 20:11:18, Michael Fair  wrote: 
> On Wed, Jan 30, 2019 at 9:53 AM Ilya Khlopotov  wrote:
> 
> > FoundationDB Records layer uses global schema for JSON documents. They
> > also have a nice way of creating indexes and schema evolution support.
> > However this support comes at a cost of extra lookups in different
> > subspace. With local mapping table we almost (except a corner case) certain
> > that the schema and JSON fields would be collocated on a single node. Due
> > to common prefix.
> >
> 
> In general I think I prefer the global, but separate, key mapping idea and
> use FDB's "cache the important, frequently accessed data, across
> distributed memory" features.
> 
> I think I prefer the idea of indexing all document's keys using the same
> identifier set.  In general I think applications have the behavior that
> some keys are referenced far more than other keys and giving those keys in
> each document the same value I think could eventually prove useful for
> making many features faster and easier than expected.
> 
> While I really like the independence and locality of a document local
> mapping, when I think about the process of transforming a document's keys
> into that mapping's values, I don't see a particular advantage regarding
> where in the DB that key mapping came from.  I'm assuming the process will
> flatten the key paths of the document into an array and then request the
> value of each key as multiple parallel queries against FDB at once.  I
> think regardless of whether the mapping is document local or global, having
> FDB return those individual values is faster/easier than having Couch Range
> fetch the mapping and do the translation work itself.
> 
> I could even see some periodic "reorganizing" engine that could renumber
> frequently used keys to make the reverse transformation back into a value
> that much faster.
> 
> 
> > > Personally I wonder if the 10KB limit on field paths is anything more
> > than a theoretical concern. It’s hard for me to imagine a useful schema
> > that would get anywhere near that deep, but maybe I’m insufficiently
> > creative :)
> 
> 
> +1
> 
> 
> There’s certainly a storage overhead from repeating the upper portion of a
> > path over and over again, but that’s also something the storage engine can
> > optimize away through prefix elision. The current production storage engine
> > in FoundationDB does not do this elision, but the new one in development
> > does.
> >
> 
> Assuming it only does "prefix" and not "segment", then I don't think this
> will help because the DOCID for each key in JSON_PATH will be different,
> making the "prefix" to each path across different documents distinct.  The
> prefix matching engine will only be able to match up to the key element
> before the DOCID.
> 
> Does/Could/Would the engine allow an app to use FDB itself to create a
> mapping identifier for key "segments" or some other method to "skip past"
> the distinct parts of keys to in a sense "reroot" the search?
> 
> If FDB was to "bake in" this "key segment mapping" idea as something it
> exposed to the application layer; that'd be awesome!  Lots of applications
> could probably make use of that.
> 
> Mike
> 


Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Adam Kocoloski
Hi Michael,

> The trivial fix is to use DOCID/REVISIONID as DOC_KEY.

Yes that’s definitely one way to address storage of edit conflicts. I think 
there are other, more compact representations that we can explore if we have 
this “exploded” data model where each scalar value maps to an individual KV 
pair. E.g. if you have two large revisions of a document that differ in only 
one field it is possible to write down a model where both revisions share all 
the rest of the KV pairs, and there’s a special flag in the value of the 
conflicted path which indicates that an edit branch occurred here. I guess we'll 

> I'm assuming the process will flatten the key paths of the document into an 
> array and then request the value of each key as multiple parallel queries 
> against FDB at once

Ah, I think this is not one of Ilya’s assumptions. He’s trying to design a 
model which allows the retrieval of a document with a single range read, which 
is a good goal in my opinion.

I do think a small number of parallel reads can be OK, e.g. retrieving some 
database-level mapping information in parallel to the encoded document. We 
should try to avoid serializing reads, and I think issuing a separate read for 
every field of a document would be an unnecessarily heavy load.

> Assuming it only does "prefix" and not "segment", then I don't think this 
> will help because the DOCID for each key in JSON_PATH will be different, 
> making the "prefix" to each path across different documents distinct.

I’m not sure I follow you here, or we have different understandings of the 
proposal. When I’m reading a document in this model I’m retrieving a set of 
keys that all share the same {DOCID}. Moreover, if I’ve got e.g. an array 
sitting in some deeply nested part of the document, the entire path 
doc.foo.bar.baz.myarray is common to every element of the array, so it’s 
actually quite a nice case for elision.

> I think the answer is assuming every document modification can upload in 
> multiple txns.

I would like to avoid this if possible. It adds a lot of extra complexity (the 
subspace with atomic rename dance, for example), and I think CouchDB should be 
focused on use cases that do fit within the 10MB / 5 second limit.

Cheers, Adam

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Ilya Khlopotov
> The limitation I was calling out was you can't store two different values
> in the same Key.
> {CouchNS}/{DOCID}/{XXX}
Sorry I was not clear. I used DOC_KEY instead of DOCID in the proposal on 
purpose. The format of DOC_KEY is TBD. One option is to have DOC_KEY = 
DOCID/REVISIONID. 

In this case document might look like
```
{
  "foo": 5,
  "bar": {
    "baz": HUGE_BLOB
  }
}
```
```
{DB_DOCS_NS} / {DOCID} / {REVISIONID} / 0:
 3: foo
 4: bar
 5: baz
{DB_DOCS_NS} / {DOCID} / {REVISIONID} / 3:
 5
{DB_DOCS_NS} / {DOCID} / {REVISIONID} / 4 / 5 | 0: # `|` is used to solve the
problem Adam identified
 "first 100K"
{DB_DOCS_NS} / {DOCID} / {REVISIONID} / 4 / 5 | 1:  
 "second 100K"
``` 

Does it make sense?
Again, the format of DOC_KEY is TBD. We might need a level of indirection (which
is unlikely) if we had to split a CouchDB operation into multiple
FoundationDB transactions. We would need that if we wanted to allow documents
bigger than 10 MB, which we shouldn't, IMO.
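
A small sketch of splitting an oversized scalar into parts, in Python for
illustration (it simply appends a part index to the path; it does not implement
the `|` separator encoding above):

```
# Sketch: split a scalar value that exceeds the FDB value-size limit into
# numbered parts under the same path.

PART_CHUNK = 100_000   # bytes, FDB's absolute value-size limit

def split_scalar(path, value):
    if len(value) <= PART_CHUNK:
        return [(path, value)]
    return [(path + (i,), value[off:off + PART_CHUNK])
            for i, off in enumerate(range(0, len(value), PART_CHUNK))]

huge = b'z' * 250_000
for key, part in split_scalar((4, 5), huge):      # (4, 5) ~ bar/baz in the example above
    print(key, len(part))
# (4, 5, 0) 100000
# (4, 5, 1) 100000
# (4, 5, 2) 50000
```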

Best regards,
iilyak



On 2019/01/30 19:58:58, Michael Fair  wrote: 
> On Wed, Jan 30, 2019 at 11:54 AM Ilya Khlopotov  wrote:
> 
> > Hi Mike,
> >
> > > The trivial fix is to use DOCID/REVISIONID as DOC_KEY.
> > This doesn't solve the issue with scalar values being over the limits
> > FoundationDB can support.
> >
> >
> Right, that wasn't the limitation I was calling out.
> 
> The limitation I was calling out was you can't store two different values
> in the same Key.
> {CouchNS}/{DOCID}/{XXX}
> 
> Every revision has the same {CouchNS}/{DOCID}/{XXX} keys but different
> values.
> Making revisions and conflicts a seemingly unavoidable part of the
> discussion to me.
> 


Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Michael Fair
>
> | limitation             | recommended value | recommended max | absolute max |
> |-------------------------|------------------:|----------------:|-------------:|
> | transaction duration    |                   |                 | 5 sec        |
> | transaction data size   |                   |                 | 10 MB        |
> | key size                | 32 bytes          | 1 kB            | 10 kB        |
> | value size              |                   | 10 kB           | 100 kB       |
>
>
I've been thinking about the transaction duration and data size limitations
too.

I think the answer is to assume every document modification can be uploaded in
multiple txns.

Which more or less means building a "subspace" to work on the document,
then moving/renaming the space.
Or, similarly, uploading the document under some key, then linking that key into
the visible document space.
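
A hedged sketch of the staging-subspace idea, using the FoundationDB Python
bindings for illustration (the subspace layout, the "publish" pointer, and all
names here are assumptions, not an agreed design; it needs a running cluster to
execute):

```
import fdb

fdb.api_version(600)
db = fdb.open()

staging = fdb.Subspace(('staging',))
docs = fdb.Subspace(('docs',))

@fdb.transactional
def put_part(tr, doc_id, upload_id, idx, part):
    # each part is written in its own transaction, so the 5 s / 10 MB limits
    # apply per part rather than to the whole document
    tr[staging.pack((doc_id, upload_id, idx))] = part

@fdb.transactional
def publish(tr, doc_id, upload_id, num_parts):
    # final, small transaction: make the staged upload the visible document
    # by recording a pointer to it (a stand-in for the "rename/link" step)
    tr[docs.pack((doc_id,))] = fdb.tuple.pack((upload_id, num_parts))

parts = [b'x' * 90_000, b'y' * 90_000]   # a pretend ~180 kB document body
for i, part in enumerate(parts):
    put_part(db, b'doc-1', b'upload-42', i, part)
publish(db, b'doc-1', b'upload-42', len(parts))
```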

Otherwise the answer becomes adding "A single document can be no larger
than 10MB" as a Couch limitation.
(And technically it's smaller than that because that 10MB is a global limit
for the whole FDB transaction.)

Unless I'm missing something, a 10 MB txn can't upload more than a 10 MB doc
in a single txn?
The same goes for attachments.

Using multiple txns:
1) makes the 5 second limit simply go away by using smaller transactions
2) potentially enables the doc to upload in parallel across multiple txns
simultaneously

Mike


Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Michael Fair
On Wed, Jan 30, 2019 at 9:53 AM Ilya Khlopotov  wrote:

> FoundationDB Records layer uses global schema for JSON documents. They
> also have a nice way of creating indexes and schema evolution support.
> However this support comes at a cost of extra lookups in different
> subspace. With local mapping table we almost (except a corner case) certain
> that the schema and JSON fields would be collocated on a single node. Due
> to common prefix.
>

In general I think I prefer the global, but separate, key mapping idea and
use FDB's "cache the important, frequently accessed data, across
distributed memory" features.

I think I prefer the idea of indexing all document's keys using the same
identifier set.  In general I think applications have the behavior that
some keys are referenced far more than other keys and giving those keys in
each document the same value I think could eventually prove useful for
making many features faster and easier than expected.

While I really like the independence and locality of a document local
mapping, when I think about the process of transforming a document's keys
into that mapping's values, I don't see a particular advantage regarding
where in the DB that key mapping came from.  I'm assuming the process will
flatten the key paths of the document into an array and then request the
value of each key as multiple parallel queries against FDB at once.  I
think regardless of whether the mapping is document local or global, having
FDB return those individual values is faster/easier than having Couch Range
fetch the mapping and do the translation work itself.

I could even see some periodic "reorganizing" engine that could renumber
frequently used keys to make the reverse transformation back into a value
that much faster.


> > Personally I wonder if the 10KB limit on field paths is anything more
> than a theoretical concern. It’s hard for me to imagine a useful schema
> that would get anywhere near that deep, but maybe I’m insufficiently
> creative :)


+1


There’s certainly a storage overhead from repeating the upper portion of a
> path over and over again, but that’s also something the storage engine can
> optimize away through prefix elision. The current production storage engine
> in FoundationDB does not do this elision, but the new one in development
> does.
>

Assuming it only does "prefix" and not "segment", then I don't think this
will help because the DOCID for each key in JSON_PATH will be different,
making the "prefix" to each path across different documents distinct.  The
prefix matching engine will only be able to match up to the key element
before the DOCID.

Does/Could/Would the engine allow an app to use FDB itself to create a
mapping identifier for key "segments" or some other method to "skip past"
the distinct parts of keys to in a sense "reroot" the search?

If FDB was to "bake in" this "key segment mapping" idea as something it
exposed to the application layer; that'd be awesome!  Lots of applications
could probably make use of that.

Mike


Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Michael Fair
On Wed, Jan 30, 2019 at 11:54 AM Ilya Khlopotov  wrote:

> Hi Mike,
>
> > The trivial fix is to use DOCID/REVISIONID as DOC_KEY.
> This doesn't solve the issue with scalar values being over the limits
> FoundationDB can support.
>
>
Right, that wasn't the limitation I was calling out.

The limitation I was calling out was you can't store two different values
in the same Key.
{CouchNS}/{DOCID}/{XXX}

Every revision has the same {CouchNS}/{DOCID}/{XXX} keys but different
values.
Making revisions and conflicts a seemingly unavoidable part of the
discussion to me.


Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Ilya Khlopotov
Hi Mike,

> The trivial fix is to use DOCID/REVISIONID as DOC_KEY.
This doesn't solve the issue with scalar values being over the limits 
FoundationDB can support.

Best regards,
iilyak

On 2019/01/30 19:00:15, Michael Fair  wrote: 
> I know the claim was to avoid "revisions" and "conflicts" discussion in
> this thread but isn't that unavoidable.
> 
> In scheme #1 you have multiple keys with the same DOCID/PART_IDX but
> different data.
> In schemes #2 / #3 you have multiple copies of the JSON_PATH but different
> values.
> 
> The trivial fix is to use DOCID/REVISIONID as DOC_KEY.
> 
> Mike
> 
> On Wed, Jan 30, 2019 at 9:53 AM Ilya Khlopotov  wrote:
> 
> > FoundationDB Records layer uses global schema for JSON documents. They
> > also have a nice way of creating indexes and schema evolution support.
> > However this support comes at a cost of extra lookups in different
> > subspace. With local mapping table we almost (except a corner case) certain
> > that the schema and JSON fields would be collocated on a single node. Due
> > to common prefix.
> >
> > Best regards,
> > iilyak
> > On 2019/01/30 17:05:01, Jan Lehnardt  wrote:
> > > Ah sure, if we store the *cough* schema per doc, then it's not that
> > easy. An iteration of this proposal could store paths globally with ids
> > that the k/v store then uses for keys, which would enable what I described,
> > but happy to ignore this for the time being. :)
> > >
> > > Cheers
> > > Jan
> > > —
> > >
> > > > On 30. Jan 2019, at 17:58, Adam Kocoloski  wrote:
> > > >
> > > > Jan, I don’t think it does have that "fun property #2", as the mapping
> > is created separately for each document. In this proposal the field name
> > “foo” could map to 2 in one document and 42 in another.
> > > >
> > > > Thanks for the proposal Ilya. Personally I wonder if the 10KB limit on
> > field paths is anything more than a theoretical concern. It’s hard for me
> > to imagine a useful schema that would get anywhere near that deep, but
> > maybe I’m insufficiently creative :) There’s certainly a storage overhead
> > from repeating the upper portion of a path over and over again, but that’s
> > also something the storage engine can optimize away through prefix elision.
> > The current production storage engine in FoundationDB does not do this
> > elision, but the new one in development does.
> > > >
> > > > The value size limit is probably not so theoretical. I think as a
> > project we could choose to impose a 100KB size limit on scalar values - a
> > user who had a string longer than 100KB could chunk it up into an array of
> > strings pretty easily to work around that limit. But let’s say we don’t
> > want to impose that limit. In your design, how do I distinguish {PART_IDX}
> > from the elements of the {JSON_PATH}? I was kind of expecting to see some
> > magic value indicating that the subsequent set of keys with the same prefix
> > are all elements of a “multi-part object”:
> > > >
> > > > {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}  = kMULTIPART
> > > > {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}  = “First 100 KB …"
> > > > ...
> > > >
> > > > You might have figured out something more efficient that saves a KV
> > here but I can’t quite grok it.
> > > >
> > > > Cheers, Adam
> > > >
> > > >
> > > >> On Jan 30, 2019, at 8:24 AM, Jan Lehnardt  wrote:
> > > >>
> > > >>
> > > >>
> > > >>> On 30. Jan 2019, at 14:22, Jan Lehnardt  > j...@apache.org>> wrote:
> > > >>>
> > > >>> Thanks Ilya for getting this started!
> > > >>>
> > > >>> Two quick notes on this one:
> > > >>>
> > > >>> 1. note that JSON does not guarantee object key order and that
> > CouchDB has never guaranteed it either, and with say emit(doc.foo,
> > doc.bar), if either emit() parameter was an object, the
> > undefined-sort-order of SpiderMonkey would mix things up. While worth
> > bringing up, this is not a BC break.
> > > >>>
> > > >>> 2. This would have the fun property of being able to rename a key
> > inside all docs that have that key.
> > > >>
> > > >> …in one short operation.
> > > >>
> > > >> Best
> > > >> Jan
> > > >> —
> > > >>>
> > > >>> Best
> > > >>> Jan
> > > >>> —
> > > >>>
> > >  On 30. Jan 2019, at 14:05, Ilya Khlopotov 
> > wrote:
> > > 
> > >  # First proposal
> > > 
> >  In order to overcome FoundationDB limitations on key size (10 kB)
> > and value size (100 kB) we could use the following approach.
> > > 
> >  Below the paths are using slash for illustration purposes only. We
> > can use nested subspaces, tuples, directories or something else.
> > > 
> > >  - Store documents in a subspace or directory  (to keep prefix for a
> > key short)
> > >  - When we store the document we would enumerate all field names (0
> > and 1 are reserved) and store the mapping table in the key which look like:
> > >  ```
> > >  {DB_DOCS_NS} / {DOC_KEY} / 0
> > >  ```
> > >  - Flatten the JSON document (convert it into key value pairs where
> > the key is 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Michael Fair
I know the claim was to avoid "revisions" and "conflicts" discussion in
this thread but isn't that unavoidable.

In scheme #1 you have multiple keys with the same DOCID/PART_IDX but
different data.
In schemes #2 / #3 you have multiple copies of the JSON_PATH but different
values.

The trivial fix is to use DOCID/REVISIONID as DOC_KEY.

Mike

On Wed, Jan 30, 2019 at 9:53 AM Ilya Khlopotov  wrote:

> FoundationDB Records layer uses global schema for JSON documents. They
> also have a nice way of creating indexes and schema evolution support.
> However this support comes at a cost of extra lookups in different
> subspace. With local mapping table we almost (except a corner case) certain
> that the schema and JSON fields would be collocated on a single node. Due
> to common prefix.
>
> Best regards,
> iilyak
> On 2019/01/30 17:05:01, Jan Lehnardt  wrote:
> > Ah sure, if we store the *cough* schema per doc, then it's not that
> easy. An iteration of this proposal could store paths globally with ids
> that the k/v store then uses for keys, which would enable what I described,
> but happy to ignore this for the time being. :)
> >
> > Cheers
> > Jan
> > —
> >
> > > On 30. Jan 2019, at 17:58, Adam Kocoloski  wrote:
> > >
> > > Jan, I don’t think it does have that "fun property #2", as the mapping
> is created separately for each document. In this proposal the field name
> “foo” could map to 2 in one document and 42 in another.
> > >
> > > Thanks for the proposal Ilya. Personally I wonder if the 10KB limit on
> field paths is anything more than a theoretical concern. It’s hard for me
> to imagine a useful schema that would get anywhere near that deep, but
> maybe I’m insufficiently creative :) There’s certainly a storage overhead
> from repeating the upper portion of a path over and over again, but that’s
> also something the storage engine can optimize away through prefix elision.
> The current production storage engine in FoundationDB does not do this
> elision, but the new one in development does.
> > >
> > > The value size limit is probably not so theoretical. I think as a
> project we could choose to impose a 100KB size limit on scalar values - a
> user who had a string longer than 100KB could chunk it up into an array of
> strings pretty easily to work around that limit. But let’s say we don’t
> want to impose that limit. In your design, how do I distinguish {PART_IDX}
> from the elements of the {JSON_PATH}? I was kind of expecting to see some
> magic value indicating that the subsequent set of keys with the same prefix
> are all elements of a “multi-part object”:
> > >
> > > {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}  = kMULTIPART
> > > {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}  = “First 100 KB …"
> > > ...
> > >
> > > You might have figured out something more efficient that saves a KV
> here but I can’t quite grok it.
> > >
> > > Cheers, Adam
> > >
> > >
> > >> On Jan 30, 2019, at 8:24 AM, Jan Lehnardt  wrote:
> > >>
> > >>
> > >>
> > >>> On 30. Jan 2019, at 14:22, Jan Lehnardt  j...@apache.org>> wrote:
> > >>>
> > >>> Thanks Ilya for getting this started!
> > >>>
> > >>> Two quick notes on this one:
> > >>>
> > >>> 1. note that JSON does not guarantee object key order and that
> CouchDB has never guaranteed it either, and with say emit(doc.foo,
> doc.bar), if either emit() parameter was an object, the
> undefined-sort-order of SpiderMonkey would mix things up. While worth
> bringing up, this is not a BC break.
> > >>>
> > >>> 2. This would have the fun property of being able to rename a key
> inside all docs that have that key.
> > >>
> > >> …in one short operation.
> > >>
> > >> Best
> > >> Jan
> > >> —
> > >>>
> > >>> Best
> > >>> Jan
> > >>> —
> > >>>
> >  On 30. Jan 2019, at 14:05, Ilya Khlopotov 
> wrote:
> > 
> >  # First proposal
> > 
> >  In order to overcome FoudationDB limitations on key size (10 kB)
> and value size (100 kB) we could use the following approach.
> > 
> >  Bellow the paths are using slash for illustration purposes only. We
> can use nested subspaces, tuples, directories or something else.
> > 
> >  - Store documents in a subspace or directory  (to keep prefix for a
> key short)
> >  - When we store the document we would enumerate all field names (0
> and 1 are reserved) and store the mapping table in the key which look like:
> >  ```
> >  {DB_DOCS_NS} / {DOC_KEY} / 0
> >  ```
> >  - Flatten the JSON document (convert it into key value pairs where
> the key is `JSON_PATH` and value is `SCALAR_VALUE`)
> >  - Replace elements of JSON_PATH with integers from mapping table we
> constructed earlier
> >  - When we have array use `1 / {array_idx}`
> >  - Store scalar values in the keys which look like the following (we
> use `JSON_PATH` with integers).
> >  ```
> >  {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}
> >  ```
> >  - If the scalar value exceeds 100kB we would split it and store
> every part under 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Ilya Khlopotov
The FoundationDB Record Layer uses a global schema for JSON documents. It also 
has a nice way of creating indexes and supports schema evolution. However, this 
support comes at the cost of extra lookups in a different subspace. With a local 
mapping table we are almost certain (except for a corner case) that the schema 
and the JSON fields will be colocated on a single node, because they share a 
common prefix.

Best regards,
iilyak
On 2019/01/30 17:05:01, Jan Lehnardt  wrote: 
> Ah sure, if we store the *cough* schema per doc, then it's not that easy. An 
> iteration of this proposal could store paths globally with ids that the k/v 
> store then uses for keys, which would enable what I described, but happy to 
> ignore this for the time being. :)
> 
> Cheers
> Jan
> —
> 
> > On 30. Jan 2019, at 17:58, Adam Kocoloski  wrote:
> > 
> > Jan, I don’t think it does have that "fun property #2", as the mapping is 
> > created separately for each document. In this proposal the field name “foo” 
> > could map to 2 in one document and 42 in another.
> > 
> > Thanks for the proposal Ilya. Personally I wonder if the 10KB limit on 
> > field paths is anything more than a theoretical concern. It’s hard for me 
> > to imagine a useful schema that would get anywhere near that deep, but 
> > maybe I’m insufficiently creative :) There’s certainly a storage overhead 
> > from repeating the upper portion of a path over and over again, but that’s 
> > also something the storage engine can optimize away through prefix elision. 
> > The current production storage engine in FoundationDB does not do this 
> > elision, but the new one in development does.
> > 
> > The value size limit is probably not so theoretical. I think as a project 
> > we could choose to impose a 100KB size limit on scalar values - a user who 
> > had a string longer than 100KB could chunk it up into an array of strings 
> > pretty easily to work around that limit. But let’s say we don’t want to 
> > impose that limit. In your design, how do I distinguish {PART_IDX} from the 
> > elements of the {JSON_PATH}? I was kind of expecting to see some magic 
> > value indicating that the subsequent set of keys with the same prefix are 
> > all elements of a “multi-part object”:
> > 
> > {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}  = kMULTIPART
> > {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}  = “First 100 KB …"
> > ...
> > 
> > You might have figured out something more efficient that saves a KV here 
> > but I can’t quite grok it.
> > 
> > Cheers, Adam
> > 
> > 
> >> On Jan 30, 2019, at 8:24 AM, Jan Lehnardt  wrote:
> >> 
> >> 
> >> 
> >>> On 30. Jan 2019, at 14:22, Jan Lehnardt  >>> > wrote:
> >>> 
> >>> Thanks Ilya for getting this started!
> >>> 
> >>> Two quick notes on this one:
> >>> 
> >>> 1. note that JSON does not guarantee object key order and that CouchDB 
> >>> has never guaranteed it either, and with say emit(doc.foo, doc.bar), if 
> >>> either emit() parameter was an object, the undefined-sort-order of 
> >>> SpiderMonkey would mix things up. While worth bringing up, this is not a 
> >>> BC break.
> >>> 
> >>> 2. This would have the fun property of being able to rename a key inside 
> >>> all docs that have that key.
> >> 
> >> …in one short operation.
> >> 
> >> Best
> >> Jan
> >> —
> >>> 
> >>> Best
> >>> Jan
> >>> —
> >>> 
>  On 30. Jan 2019, at 14:05, Ilya Khlopotov  wrote:
>  
>  # First proposal
>  
>  In order to overcome FoudationDB limitations on key size (10 kB) and 
>  value size (100 kB) we could use the following approach.
>  
>  Bellow the paths are using slash for illustration purposes only. We can 
>  use nested subspaces, tuples, directories or something else. 
>  
>  - Store documents in a subspace or directory  (to keep prefix for a key 
>  short)
>  - When we store the document we would enumerate all field names (0 and 1 
>  are reserved) and store the mapping table in the key which look like:
>  ```
>  {DB_DOCS_NS} / {DOC_KEY} / 0
>  ```
>  - Flatten the JSON document (convert it into key value pairs where the 
>  key is `JSON_PATH` and value is `SCALAR_VALUE`)
>  - Replace elements of JSON_PATH with integers from mapping table we 
>  constructed earlier
>  - When we have array use `1 / {array_idx}`
>  - Store scalar values in the keys which look like the following (we use 
>  `JSON_PATH` with integers). 
>  ```
>  {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}
>  ```
>  - If the scalar value exceeds 100kB we would split it and store every 
>  part under key constructed as:
>  ```
>  {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}
>  ```
>  
>  Since all parts of the documents are stored under a common `{DB_DOCS_NS} 
>  / {DOC_KEY}` they will be stored on the same server most of the time. 
>  The document can be retrieved by using range query 
>  (`txn.get_range("{DB_DOCS_NS} / {DOC_KEY} / 0", 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Ilya Khlopotov
> I was kind of expecting to see some magic value indicating that the 
> subsequent set of keys with the same prefix are all elements of a “multi-part 
> object”
I missed this aspect. This is easy to solve (as you've mentioned) by using 
either a special character or a reserved value in the mapping table.
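
A minimal sketch of how such a sentinel could work, assuming a hypothetical reserved marker and chunk size (none of these names come from the proposal itself):

```
# Hypothetical sketch: write a sentinel at {JSON_PATH} to flag a multi-part
# value, then store the chunks under {JSON_PATH} / {PART_IDX}. A reader that
# sees the sentinel knows the following keys with the same prefix are chunks
# of one large scalar rather than independent fields.
K_MULTIPART = b"\x00multipart\x00"   # assumed reserved marker value
CHUNK = 100 * 1000                   # stay under the 100 kB value limit

def explode_large_scalar(json_path, value_bytes):
    yield (tuple(json_path), K_MULTIPART)
    for i in range(0, len(value_bytes), CHUNK):
        yield (tuple(json_path) + (i // CHUNK,), value_bytes[i:i + CHUNK])
```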

On 2019/01/30 16:58:29, Adam Kocoloski  wrote: 
> Jan, I don’t think it does have that "fun property #2", as the mapping is 
> created separately for each document. In this proposal the field name “foo” 
> could map to 2 in one document and 42 in another.
> 
> Thanks for the proposal Ilya. Personally I wonder if the 10KB limit on field 
> paths is anything more than a theoretical concern. It’s hard for me to 
> imagine a useful schema that would get anywhere near that deep, but maybe I’m 
> insufficiently creative :) There’s certainly a storage overhead from 
> repeating the upper portion of a path over and over again, but that’s also 
> something the storage engine can optimize away through prefix elision. The 
> current production storage engine in FoundationDB does not do this elision, 
> but the new one in development does.
> 
> The value size limit is probably not so theoretical. I think as a project we 
> could choose to impose a 100KB size limit on scalar values - a user who had a 
> string longer than 100KB could chunk it up into an array of strings pretty 
> easily to work around that limit. But let’s say we don’t want to impose that 
> limit. In your design, how do I distinguish {PART_IDX} from the elements of 
> the {JSON_PATH}? I was kind of expecting to see some magic value indicating 
> that the subsequent set of keys with the same prefix are all elements of a 
> “multi-part object”:
> 
> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}  = kMULTIPART
> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}  = “First 100 KB …"
> ...
> 
> You might have figured out something more efficient that saves a KV here but 
> I can’t quite grok it.
> 
> Cheers, Adam
> 
> 
> > On Jan 30, 2019, at 8:24 AM, Jan Lehnardt  wrote:
> > 
> > 
> > 
> >> On 30. Jan 2019, at 14:22, Jan Lehnardt  >> > wrote:
> >> 
> >> Thanks Ilya for getting this started!
> >> 
> >> Two quick notes on this one:
> >> 
> >> 1. note that JSON does not guarantee object key order and that CouchDB has 
> >> never guaranteed it either, and with say emit(doc.foo, doc.bar), if either 
> >> emit() parameter was an object, the undefined-sort-order of SpiderMonkey 
> >> would mix things up. While worth bringing up, this is not a BC break.
> >> 
> >> 2. This would have the fun property of being able to rename a key inside 
> >> all docs that have that key.
> > 
> > …in one short operation.
> > 
> > Best
> > Jan
> > —
> >> 
> >> Best
> >> Jan
> >> —
> >> 
> >>> On 30. Jan 2019, at 14:05, Ilya Khlopotov  wrote:
> >>> 
> >>> # First proposal
> >>> 
> >>> In order to overcome FoudationDB limitations on key size (10 kB) and 
> >>> value size (100 kB) we could use the following approach.
> >>> 
> >>> Bellow the paths are using slash for illustration purposes only. We can 
> >>> use nested subspaces, tuples, directories or something else. 
> >>> 
> >>> - Store documents in a subspace or directory  (to keep prefix for a key 
> >>> short)
> >>> - When we store the document we would enumerate all field names (0 and 1 
> >>> are reserved) and store the mapping table in the key which look like:
> >>> ```
> >>> {DB_DOCS_NS} / {DOC_KEY} / 0
> >>> ```
> >>> - Flatten the JSON document (convert it into key value pairs where the 
> >>> key is `JSON_PATH` and value is `SCALAR_VALUE`)
> >>> - Replace elements of JSON_PATH with integers from mapping table we 
> >>> constructed earlier
> >>> - When we have array use `1 / {array_idx}`
> >>> - Store scalar values in the keys which look like the following (we use 
> >>> `JSON_PATH` with integers). 
> >>> ```
> >>> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}
> >>> ```
> >>> - If the scalar value exceeds 100kB we would split it and store every 
> >>> part under key constructed as:
> >>> ```
> >>> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}
> >>> ```
> >>> 
> >>> Since all parts of the documents are stored under a common `{DB_DOCS_NS} 
> >>> / {DOC_KEY}` they will be stored on the same server most of the time. The 
> >>> document can be retrieved by using range query 
> >>> (`txn.get_range("{DB_DOCS_NS} / {DOC_KEY} / 0", "{DB_DOCS_NS} / {DOC_KEY} 
> >>> / 0xFF")`). We can reconstruct the document since the mapping is returned 
> >>> as well.
> >>> 
> >>> The downside of this approach is we wouldn't be able to ensure the same 
> >>> order of keys in the JSON object. Currently the `jiffy` JSON encoder 
> >>> respects order of keys.
> >>> ```
> >>> 4> jiffy:encode({[{bbb, 1}, {aaa, 12}]}).
> >>> <<"{\"bbb\":1,\"aaa\":12}">>
> >>> 5> jiffy:encode({[{aaa, 12}, {bbb, 1}]}).
> >>> <<"{\"aaa\":12,\"bbb\":1}">>
> >>> ```
> >>> 
> >>> Best regards,
> >>> iilyak
> >>> 
> >>> On 2019/01/30 13:02:57, Ilya Khlopotov  

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Jan Lehnardt
Ah sure, if we store the *cough* schema per doc, then it's not that easy. An 
iteration of this proposal could store paths globally with ids that the k/v 
store then uses for keys, which would enable what I described, but happy to 
ignore this for the time being. :)
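
A rough sketch of what such a global mapping could look like (purely illustrative; the thread later settles on per-document mapping tables instead):

```
# Hypothetical global field-name registry shared by a whole database: renaming
# a field for every document would only require updating this table, not
# rewriting each document's keys.
class GlobalFieldRegistry:
    def __init__(self):
        self.ids = {}     # field name -> integer id
        self.names = {}   # integer id -> field name

    def id_for(self, name):
        if name not in self.ids:
            new_id = len(self.ids) + 2   # 0 and 1 stay reserved
            self.ids[name] = new_id
            self.names[new_id] = name
        return self.ids[name]

    def rename(self, old, new):
        # One small metadata update instead of touching every document.
        field_id = self.ids.pop(old)
        self.ids[new] = field_id
        self.names[field_id] = new
```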

Cheers
Jan
—

> On 30. Jan 2019, at 17:58, Adam Kocoloski  wrote:
> 
> Jan, I don’t think it does have that "fun property #2", as the mapping is 
> created separately for each document. In this proposal the field name “foo” 
> could map to 2 in one document and 42 in another.
> 
> Thanks for the proposal Ilya. Personally I wonder if the 10KB limit on field 
> paths is anything more than a theoretical concern. It’s hard for me to 
> imagine a useful schema that would get anywhere near that deep, but maybe I’m 
> insufficiently creative :) There’s certainly a storage overhead from 
> repeating the upper portion of a path over and over again, but that’s also 
> something the storage engine can optimize away through prefix elision. The 
> current production storage engine in FoundationDB does not do this elision, 
> but the new one in development does.
> 
> The value size limit is probably not so theoretical. I think as a project we 
> could choose to impose a 100KB size limit on scalar values - a user who had a 
> string longer than 100KB could chunk it up into an array of strings pretty 
> easily to work around that limit. But let’s say we don’t want to impose that 
> limit. In your design, how do I distinguish {PART_IDX} from the elements of 
> the {JSON_PATH}? I was kind of expecting to see some magic value indicating 
> that the subsequent set of keys with the same prefix are all elements of a 
> “multi-part object”:
> 
> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}  = kMULTIPART
> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}  = “First 100 KB …"
> ...
> 
> You might have figured out something more efficient that saves a KV here but 
> I can’t quite grok it.
> 
> Cheers, Adam
> 
> 
>> On Jan 30, 2019, at 8:24 AM, Jan Lehnardt  wrote:
>> 
>> 
>> 
>>> On 30. Jan 2019, at 14:22, Jan Lehnardt >> > wrote:
>>> 
>>> Thanks Ilya for getting this started!
>>> 
>>> Two quick notes on this one:
>>> 
>>> 1. note that JSON does not guarantee object key order and that CouchDB has 
>>> never guaranteed it either, and with say emit(doc.foo, doc.bar), if either 
>>> emit() parameter was an object, the undefined-sort-order of SpiderMonkey 
>>> would mix things up. While worth bringing up, this is not a BC break.
>>> 
>>> 2. This would have the fun property of being able to rename a key inside 
>>> all docs that have that key.
>> 
>> …in one short operation.
>> 
>> Best
>> Jan
>> —
>>> 
>>> Best
>>> Jan
>>> —
>>> 
 On 30. Jan 2019, at 14:05, Ilya Khlopotov  wrote:
 
 # First proposal
 
 In order to overcome FoudationDB limitations on key size (10 kB) and value 
 size (100 kB) we could use the following approach.
 
 Bellow the paths are using slash for illustration purposes only. We can 
 use nested subspaces, tuples, directories or something else. 
 
 - Store documents in a subspace or directory  (to keep prefix for a key 
 short)
 - When we store the document we would enumerate all field names (0 and 1 
 are reserved) and store the mapping table in the key which look like:
 ```
 {DB_DOCS_NS} / {DOC_KEY} / 0
 ```
 - Flatten the JSON document (convert it into key value pairs where the key 
 is `JSON_PATH` and value is `SCALAR_VALUE`)
 - Replace elements of JSON_PATH with integers from mapping table we 
 constructed earlier
 - When we have array use `1 / {array_idx}`
 - Store scalar values in the keys which look like the following (we use 
 `JSON_PATH` with integers). 
 ```
 {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}
 ```
 - If the scalar value exceeds 100kB we would split it and store every part 
 under key constructed as:
 ```
 {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}
 ```
 
 Since all parts of the documents are stored under a common `{DB_DOCS_NS} / 
 {DOC_KEY}` they will be stored on the same server most of the time. The 
 document can be retrieved by using range query 
 (`txn.get_range("{DB_DOCS_NS} / {DOC_KEY} / 0", "{DB_DOCS_NS} / {DOC_KEY} 
 / 0xFF")`). We can reconstruct the document since the mapping is returned 
 as well.
 
 The downside of this approach is we wouldn't be able to ensure the same 
 order of keys in the JSON object. Currently the `jiffy` JSON encoder 
 respects order of keys.
 ```
 4> jiffy:encode({[{bbb, 1}, {aaa, 12}]}).
 <<"{\"bbb\":1,\"aaa\":12}">>
 5> jiffy:encode({[{aaa, 12}, {bbb, 1}]}).
 <<"{\"aaa\":12,\"bbb\":1}">>
 ```
 
 Best regards,
 iilyak
 
> On 2019/01/30 13:02:57, Ilya Khlopotov  wrote: 
> As you might already know the FoundationDB has a number of limitations 
> which 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Adam Kocoloski
Jan, I don’t think it does have that "fun property #2", as the mapping is 
created separately for each document. In this proposal the field name “foo” 
could map to 2 in one document and 42 in another.
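
As a rough illustration of that point (the numbering scheme below is invented for the example, not taken from the proposal):

```
# Hypothetical per-document mapping tables: the integer assigned to a field
# name depends on which names were seen first in *that* document, so the same
# name can map to different integers in different documents.
def build_mapping(doc, mapping=None):
    mapping = {} if mapping is None else mapping
    if isinstance(doc, dict):
        for name, value in doc.items():
            mapping.setdefault(name, len(mapping) + 2)  # 0 and 1 are reserved
            build_mapping(value, mapping)
    elif isinstance(doc, list):
        for value in doc:
            build_mapping(value, mapping)
    return mapping

build_mapping({"foo": 1})                  # {'foo': 2}
build_mapping({"a": 0, "b": 0, "foo": 1})  # here 'foo' maps to 4 instead
```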

Thanks for the proposal Ilya. Personally I wonder if the 10KB limit on field 
paths is anything more than a theoretical concern. It’s hard for me to imagine 
a useful schema that would get anywhere near that deep, but maybe I’m 
insufficiently creative :) There’s certainly a storage overhead from repeating 
the upper portion of a path over and over again, but that’s also something the 
storage engine can optimize away through prefix elision. The current production 
storage engine in FoundationDB does not do this elision, but the new one in 
development does.

The value size limit is probably not so theoretical. I think as a project we 
could choose to impose a 100KB size limit on scalar values - a user who had a 
string longer than 100KB could chunk it up into an array of strings pretty 
easily to work around that limit. But let’s say we don’t want to impose that 
limit. In your design, how do I distinguish {PART_IDX} from the elements of the 
{JSON_PATH}? I was kind of expecting to see some magic value indicating that 
the subsequent set of keys with the same prefix are all elements of a 
“multi-part object”:

{DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}  = kMULTIPART
{DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}  = “First 100 KB …"
...

You might have figured out something more efficient that saves a KV here but I 
can’t quite grok it.

Cheers, Adam


> On Jan 30, 2019, at 8:24 AM, Jan Lehnardt  wrote:
> 
> 
> 
>> On 30. Jan 2019, at 14:22, Jan Lehnardt > > wrote:
>> 
>> Thanks Ilya for getting this started!
>> 
>> Two quick notes on this one:
>> 
>> 1. note that JSON does not guarantee object key order and that CouchDB has 
>> never guaranteed it either, and with say emit(doc.foo, doc.bar), if either 
>> emit() parameter was an object, the undefined-sort-order of SpiderMonkey 
>> would mix things up. While worth bringing up, this is not a BC break.
>> 
>> 2. This would have the fun property of being able to rename a key inside all 
>> docs that have that key.
> 
> …in one short operation.
> 
> Best
> Jan
> —
>> 
>> Best
>> Jan
>> —
>> 
>>> On 30. Jan 2019, at 14:05, Ilya Khlopotov  wrote:
>>> 
>>> # First proposal
>>> 
>>> In order to overcome FoudationDB limitations on key size (10 kB) and value 
>>> size (100 kB) we could use the following approach.
>>> 
>>> Bellow the paths are using slash for illustration purposes only. We can use 
>>> nested subspaces, tuples, directories or something else. 
>>> 
>>> - Store documents in a subspace or directory  (to keep prefix for a key 
>>> short)
>>> - When we store the document we would enumerate all field names (0 and 1 
>>> are reserved) and store the mapping table in the key which look like:
>>> ```
>>> {DB_DOCS_NS} / {DOC_KEY} / 0
>>> ```
>>> - Flatten the JSON document (convert it into key value pairs where the key 
>>> is `JSON_PATH` and value is `SCALAR_VALUE`)
>>> - Replace elements of JSON_PATH with integers from mapping table we 
>>> constructed earlier
>>> - When we have array use `1 / {array_idx}`
>>> - Store scalar values in the keys which look like the following (we use 
>>> `JSON_PATH` with integers). 
>>> ```
>>> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}
>>> ```
>>> - If the scalar value exceeds 100kB we would split it and store every part 
>>> under key constructed as:
>>> ```
>>> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}
>>> ```
>>> 
>>> Since all parts of the documents are stored under a common `{DB_DOCS_NS} / 
>>> {DOC_KEY}` they will be stored on the same server most of the time. The 
>>> document can be retrieved by using range query 
>>> (`txn.get_range("{DB_DOCS_NS} / {DOC_KEY} / 0", "{DB_DOCS_NS} / {DOC_KEY} / 
>>> 0xFF")`). We can reconstruct the document since the mapping is returned as 
>>> well.
>>> 
>>> The downside of this approach is we wouldn't be able to ensure the same 
>>> order of keys in the JSON object. Currently the `jiffy` JSON encoder 
>>> respects order of keys.
>>> ```
>>> 4> jiffy:encode({[{bbb, 1}, {aaa, 12}]}).
>>> <<"{\"bbb\":1,\"aaa\":12}">>
>>> 5> jiffy:encode({[{aaa, 12}, {bbb, 1}]}).
>>> <<"{\"aaa\":12,\"bbb\":1}">>
>>> ```
>>> 
>>> Best regards,
>>> iilyak
>>> 
>>> On 2019/01/30 13:02:57, Ilya Khlopotov  wrote: 
 As you might already know the FoundationDB has a number of limitations 
 which influences the way we might store JSON documents. The limitations 
 are:
 
 |  limitation |recommended value|recommended max|absolute 
 max|
 |-|--:|:|--:|
 | transaction duration  |  |   
 |  5 sec  |
 | transaction data size |  |   
 |  10 Mb  

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Jan Lehnardt
+1

> On 30. Jan 2019, at 15:31, Paul Davis  wrote:
> 
> Jiffy preserves duplicate keys if its not decoding into a map (in
> which case last value for duplicate keys wins). Its significantly
> corner case and not at all supported by nearly any other JSON library
> so changing that shouldn't be considered a breaking change in my
> opinion.
> 
> On Wed, Jan 30, 2019 at 8:21 AM Mike Rhodes  wrote:
>> 
>> From what I recall Jiffy is able to cope with the valid-but-kinda-silly[1] 
>> thing where you have multiple JSON keys with the same name, i.e., { "foo": 
>> 1, "foo": 2 }.
>> 
>> Are the proposals on the table able to continue this support (or am I wrong 
>> about Jiffy)?
>> 
>> [1] https://tools.ietf.org/html/rfc8259#section-4, "The names within an 
>> object SHOULD be unique.", though 
>> https://tools.ietf.org/html/rfc7493#section-2.3 does sensibly close that 
>> down.
>> 
>> --
>> Mike.
>> 
>> On Wed, 30 Jan 2019, at 13:33, Jan Lehnardt wrote:
>>> 
>>> 
 On 30. Jan 2019, at 14:22, Jan Lehnardt  wrote:
 
 Thanks Ilya for getting this started!
 
 Two quick notes on this one:
 
 1. note that JSON does not guarantee object key order and that CouchDB has 
 never guaranteed it either, and with say emit(doc.foo, doc.bar), if either 
 emit() parameter was an object, the undefined-sort-order of SpiderMonkey 
 would mix things up. While worth bringing up, this is not a BC break.
 
 2. This would have the fun property of being able to rename a key inside 
 all docs that have that key.
>>> 
>>> …in one short operation.
>>> 
>>> Best
>>> Jan
>>> —
 
 Best
 Jan
 —
 
> On 30. Jan 2019, at 14:05, Ilya Khlopotov  wrote:
> 
> # First proposal
> 
> In order to overcome FoudationDB limitations on key size (10 kB) and 
> value size (100 kB) we could use the following approach.
> 
> Bellow the paths are using slash for illustration purposes only. We can 
> use nested subspaces, tuples, directories or something else.
> 
> - Store documents in a subspace or directory  (to keep prefix for a key 
> short)
> - When we store the document we would enumerate all field names (0 and 1 
> are reserved) and store the mapping table in the key which look like:
> ```
> {DB_DOCS_NS} / {DOC_KEY} / 0
> ```
> - Flatten the JSON document (convert it into key value pairs where the 
> key is `JSON_PATH` and value is `SCALAR_VALUE`)
> - Replace elements of JSON_PATH with integers from mapping table we 
> constructed earlier
> - When we have array use `1 / {array_idx}`
> - Store scalar values in the keys which look like the following (we use 
> `JSON_PATH` with integers).
> ```
> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}
> ```
> - If the scalar value exceeds 100kB we would split it and store every 
> part under key constructed as:
> ```
> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}
> ```
> 
> Since all parts of the documents are stored under a common `{DB_DOCS_NS} 
> / {DOC_KEY}` they will be stored on the same server most of the time. The 
> document can be retrieved by using range query 
> (`txn.get_range("{DB_DOCS_NS} / {DOC_KEY} / 0", "{DB_DOCS_NS} / {DOC_KEY} 
> / 0xFF")`). We can reconstruct the document since the mapping is returned 
> as well.
> 
> The downside of this approach is we wouldn't be able to ensure the same 
> order of keys in the JSON object. Currently the `jiffy` JSON encoder 
> respects order of keys.
> ```
> 4> jiffy:encode({[{bbb, 1}, {aaa, 12}]}).
> <<"{\"bbb\":1,\"aaa\":12}">>
> 5> jiffy:encode({[{aaa, 12}, {bbb, 1}]}).
> <<"{\"aaa\":12,\"bbb\":1}">>
> ```
> 
> Best regards,
> iilyak
> 
> On 2019/01/30 13:02:57, Ilya Khlopotov  wrote:
>> As you might already know the FoundationDB has a number of limitations 
>> which influences the way we might store JSON documents. The limitations 
>> are:
>> 
>> |  limitation |recommended value|recommended 
>> max|absolute max|
>> |-|--:|:|--:|
>> | transaction duration  |  | 
>>   |  5 sec  |
>> | transaction data size |  | 
>>   |  10 Mb |
>> | key size   | 32 bytes |
>>1 kB  | 10 kB  |
>> | value size|   |
>>   10 kB |100 kB |
>> 
>> In order to fit the JSON document into 100kB we would have to partition 
>> it in some way. There are three ways of partitioning the document
>> 1. store multiple binary blobs (parts) in different keys
>> 2. flatten JSON structure and store every path 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Paul Davis
Jiffy preserves duplicate keys if it's not decoding into a map (in
which case the last value for a duplicate key wins). It's very much a
corner case and not at all supported by nearly any other JSON library,
so changing that shouldn't be considered a breaking change in my
opinion.

On Wed, Jan 30, 2019 at 8:21 AM Mike Rhodes  wrote:
>
> From what I recall Jiffy is able to cope with the valid-but-kinda-silly[1] 
> thing where you have multiple JSON keys with the same name, i.e., { "foo": 1, 
> "foo": 2 }.
>
> Are the proposals on the table able to continue this support (or am I wrong 
> about Jiffy)?
>
> [1] https://tools.ietf.org/html/rfc8259#section-4, "The names within an 
> object SHOULD be unique.", though 
> https://tools.ietf.org/html/rfc7493#section-2.3 does sensibly close that down.
>
> --
> Mike.
>
> On Wed, 30 Jan 2019, at 13:33, Jan Lehnardt wrote:
> >
> >
> > > On 30. Jan 2019, at 14:22, Jan Lehnardt  wrote:
> > >
> > > Thanks Ilya for getting this started!
> > >
> > > Two quick notes on this one:
> > >
> > > 1. note that JSON does not guarantee object key order and that CouchDB 
> > > has never guaranteed it either, and with say emit(doc.foo, doc.bar), if 
> > > either emit() parameter was an object, the undefined-sort-order of 
> > > SpiderMonkey would mix things up. While worth bringing up, this is not a 
> > > BC break.
> > >
> > > 2. This would have the fun property of being able to rename a key inside 
> > > all docs that have that key.
> >
> > …in one short operation.
> >
> > Best
> > Jan
> > —
> > >
> > > Best
> > > Jan
> > > —
> > >
> > >> On 30. Jan 2019, at 14:05, Ilya Khlopotov  wrote:
> > >>
> > >> # First proposal
> > >>
> > >> In order to overcome FoudationDB limitations on key size (10 kB) and 
> > >> value size (100 kB) we could use the following approach.
> > >>
> > >> Bellow the paths are using slash for illustration purposes only. We can 
> > >> use nested subspaces, tuples, directories or something else.
> > >>
> > >> - Store documents in a subspace or directory  (to keep prefix for a key 
> > >> short)
> > >> - When we store the document we would enumerate all field names (0 and 1 
> > >> are reserved) and store the mapping table in the key which look like:
> > >> ```
> > >> {DB_DOCS_NS} / {DOC_KEY} / 0
> > >> ```
> > >> - Flatten the JSON document (convert it into key value pairs where the 
> > >> key is `JSON_PATH` and value is `SCALAR_VALUE`)
> > >> - Replace elements of JSON_PATH with integers from mapping table we 
> > >> constructed earlier
> > >> - When we have array use `1 / {array_idx}`
> > >> - Store scalar values in the keys which look like the following (we use 
> > >> `JSON_PATH` with integers).
> > >> ```
> > >> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}
> > >> ```
> > >> - If the scalar value exceeds 100kB we would split it and store every 
> > >> part under key constructed as:
> > >> ```
> > >> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}
> > >> ```
> > >>
> > >> Since all parts of the documents are stored under a common `{DB_DOCS_NS} 
> > >> / {DOC_KEY}` they will be stored on the same server most of the time. 
> > >> The document can be retrieved by using range query 
> > >> (`txn.get_range("{DB_DOCS_NS} / {DOC_KEY} / 0", "{DB_DOCS_NS} / 
> > >> {DOC_KEY} / 0xFF")`). We can reconstruct the document since the mapping 
> > >> is returned as well.
> > >>
> > >> The downside of this approach is we wouldn't be able to ensure the same 
> > >> order of keys in the JSON object. Currently the `jiffy` JSON encoder 
> > >> respects order of keys.
> > >> ```
> > >> 4> jiffy:encode({[{bbb, 1}, {aaa, 12}]}).
> > >> <<"{\"bbb\":1,\"aaa\":12}">>
> > >> 5> jiffy:encode({[{aaa, 12}, {bbb, 1}]}).
> > >> <<"{\"aaa\":12,\"bbb\":1}">>
> > >> ```
> > >>
> > >> Best regards,
> > >> iilyak
> > >>
> > >> On 2019/01/30 13:02:57, Ilya Khlopotov  wrote:
> > >>> As you might already know the FoundationDB has a number of limitations 
> > >>> which influences the way we might store JSON documents. The limitations 
> > >>> are:
> > >>>
> > >>> |  limitation |recommended value|recommended 
> > >>> max|absolute max|
> > >>> |-|--:|:|--:|
> > >>> | transaction duration  |  |
> > >>>|  5 sec  |
> > >>> | transaction data size |  |
> > >>>|  10 Mb |
> > >>> | key size   | 32 bytes |   
> > >>> 1 kB  | 10 kB  |
> > >>> | value size|   |   
> > >>>10 kB |100 kB |
> > >>>
> > >>> In order to fit the JSON document into 100kB we would have to partition 
> > >>> it in some way. There are three ways of partitioning the document
> > >>> 1. store multiple binary blobs (parts) in different keys
> > >>> 2. flatten JSON structure and store every path leading to a 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Mike Rhodes
From what I recall Jiffy is able to cope with the valid-but-kinda-silly[1] 
thing where you have multiple JSON keys with the same name, i.e., { "foo": 1, 
"foo": 2 }.

Are the proposals on the table able to continue this support (or am I wrong 
about Jiffy)?

[1] https://tools.ietf.org/html/rfc8259#section-4, "The names within an object 
SHOULD be unique.", though https://tools.ietf.org/html/rfc7493#section-2.3 does 
sensibly close that down.

-- 
Mike.

On Wed, 30 Jan 2019, at 13:33, Jan Lehnardt wrote:
> 
> 
> > On 30. Jan 2019, at 14:22, Jan Lehnardt  wrote:
> > 
> > Thanks Ilya for getting this started!
> > 
> > Two quick notes on this one:
> > 
> > 1. note that JSON does not guarantee object key order and that CouchDB has 
> > never guaranteed it either, and with say emit(doc.foo, doc.bar), if either 
> > emit() parameter was an object, the undefined-sort-order of SpiderMonkey 
> > would mix things up. While worth bringing up, this is not a BC break.
> > 
> > 2. This would have the fun property of being able to rename a key inside 
> > all docs that have that key.
> 
> …in one short operation.
> 
> Best
> Jan
> —
> > 
> > Best
> > Jan
> > —
> > 
> >> On 30. Jan 2019, at 14:05, Ilya Khlopotov  wrote:
> >> 
> >> # First proposal
> >> 
> >> In order to overcome FoudationDB limitations on key size (10 kB) and value 
> >> size (100 kB) we could use the following approach.
> >> 
> >> Bellow the paths are using slash for illustration purposes only. We can 
> >> use nested subspaces, tuples, directories or something else. 
> >> 
> >> - Store documents in a subspace or directory  (to keep prefix for a key 
> >> short)
> >> - When we store the document we would enumerate all field names (0 and 1 
> >> are reserved) and store the mapping table in the key which look like:
> >> ```
> >> {DB_DOCS_NS} / {DOC_KEY} / 0
> >> ```
> >> - Flatten the JSON document (convert it into key value pairs where the key 
> >> is `JSON_PATH` and value is `SCALAR_VALUE`)
> >> - Replace elements of JSON_PATH with integers from mapping table we 
> >> constructed earlier
> >> - When we have array use `1 / {array_idx}`
> >> - Store scalar values in the keys which look like the following (we use 
> >> `JSON_PATH` with integers). 
> >> ```
> >> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}
> >> ```
> >> - If the scalar value exceeds 100kB we would split it and store every part 
> >> under key constructed as:
> >> ```
> >> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}
> >> ```
> >> 
> >> Since all parts of the documents are stored under a common `{DB_DOCS_NS} / 
> >> {DOC_KEY}` they will be stored on the same server most of the time. The 
> >> document can be retrieved by using range query 
> >> (`txn.get_range("{DB_DOCS_NS} / {DOC_KEY} / 0", "{DB_DOCS_NS} / {DOC_KEY} 
> >> / 0xFF")`). We can reconstruct the document since the mapping is returned 
> >> as well.
> >> 
> >> The downside of this approach is we wouldn't be able to ensure the same 
> >> order of keys in the JSON object. Currently the `jiffy` JSON encoder 
> >> respects order of keys.
> >> ```
> >> 4> jiffy:encode({[{bbb, 1}, {aaa, 12}]}).
> >> <<"{\"bbb\":1,\"aaa\":12}">>
> >> 5> jiffy:encode({[{aaa, 12}, {bbb, 1}]}).
> >> <<"{\"aaa\":12,\"bbb\":1}">>
> >> ```
> >> 
> >> Best regards,
> >> iilyak
> >> 
> >> On 2019/01/30 13:02:57, Ilya Khlopotov  wrote: 
> >>> As you might already know the FoundationDB has a number of limitations 
> >>> which influences the way we might store JSON documents. The limitations 
> >>> are:
> >>> 
> >>> |  limitation |recommended value|recommended max|absolute 
> >>> max|
> >>> |-|--:|:|--:|
> >>> | transaction duration  |  |  
> >>>  |  5 sec  |
> >>> | transaction data size |  |  
> >>>  |  10 Mb |
> >>> | key size   | 32 bytes | 
> >>>   1 kB  | 10 kB  |
> >>> | value size|   | 
> >>>  10 kB |100 kB |
> >>> 
> >>> In order to fit the JSON document into 100kB we would have to partition 
> >>> it in some way. There are three ways of partitioning the document
> >>> 1. store multiple binary blobs (parts) in different keys
> >>> 2. flatten JSON structure and store every path leading to a scalar value 
> >>> under own key
> >>> 3. measure the size of different branches of a tree representing the JSON 
> >>> document (while we parse) and use another key for the branch when we 
> >>> about to exceed the limit
> >>> 
> >>> - The first approach is the simplest but it wouldn't allow us to access 
> >>> parts of the document.
> >>> - The downsides of a second approach are:
> >>> - flattened JSON structure would have long paths which means longer keys
> >>> - the scalar value cannot be more than 100kb (unless we split it as 

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Jan Lehnardt


> On 30. Jan 2019, at 14:22, Jan Lehnardt  wrote:
> 
> Thanks Ilya for getting this started!
> 
> Two quick notes on this one:
> 
> 1. note that JSON does not guarantee object key order and that CouchDB has 
> never guaranteed it either, and with say emit(doc.foo, doc.bar), if either 
> emit() parameter was an object, the undefined-sort-order of SpiderMonkey 
> would mix things up. While worth bringing up, this is not a BC break.
> 
> 2. This would have the fun property of being able to rename a key inside all 
> docs that have that key.

…in one short operation.

Best
Jan
—
> 
> Best
> Jan
> —
> 
>> On 30. Jan 2019, at 14:05, Ilya Khlopotov  wrote:
>> 
>> # First proposal
>> 
>> In order to overcome FoudationDB limitations on key size (10 kB) and value 
>> size (100 kB) we could use the following approach.
>> 
>> Bellow the paths are using slash for illustration purposes only. We can use 
>> nested subspaces, tuples, directories or something else. 
>> 
>> - Store documents in a subspace or directory  (to keep prefix for a key 
>> short)
>> - When we store the document we would enumerate all field names (0 and 1 are 
>> reserved) and store the mapping table in the key which look like:
>> ```
>> {DB_DOCS_NS} / {DOC_KEY} / 0
>> ```
>> - Flatten the JSON document (convert it into key value pairs where the key 
>> is `JSON_PATH` and value is `SCALAR_VALUE`)
>> - Replace elements of JSON_PATH with integers from mapping table we 
>> constructed earlier
>> - When we have array use `1 / {array_idx}`
>> - Store scalar values in the keys which look like the following (we use 
>> `JSON_PATH` with integers). 
>> ```
>> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}
>> ```
>> - If the scalar value exceeds 100kB we would split it and store every part 
>> under key constructed as:
>> ```
>> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}
>> ```
>> 
>> Since all parts of the documents are stored under a common `{DB_DOCS_NS} / 
>> {DOC_KEY}` they will be stored on the same server most of the time. The 
>> document can be retrieved by using range query (`txn.get_range("{DB_DOCS_NS} 
>> / {DOC_KEY} / 0", "{DB_DOCS_NS} / {DOC_KEY} / 0xFF")`). We can reconstruct 
>> the document since the mapping is returned as well.
>> 
>> The downside of this approach is we wouldn't be able to ensure the same 
>> order of keys in the JSON object. Currently the `jiffy` JSON encoder 
>> respects order of keys.
>> ```
>> 4> jiffy:encode({[{bbb, 1}, {aaa, 12}]}).
>> <<"{\"bbb\":1,\"aaa\":12}">>
>> 5> jiffy:encode({[{aaa, 12}, {bbb, 1}]}).
>> <<"{\"aaa\":12,\"bbb\":1}">>
>> ```
>> 
>> Best regards,
>> iilyak
>> 
>> On 2019/01/30 13:02:57, Ilya Khlopotov  wrote: 
>>> As you might already know the FoundationDB has a number of limitations 
>>> which influences the way we might store JSON documents. The limitations are:
>>> 
>>> |  limitation |recommended value|recommended max|absolute 
>>> max|
>>> |-|--:|:|--:|
>>> | transaction duration  |  |
>>>|  5 sec  |
>>> | transaction data size |  |
>>>|  10 Mb |
>>> | key size   | 32 bytes |   
>>> 1 kB  | 10 kB  |
>>> | value size|   |   
>>>10 kB |100 kB |
>>> 
>>> In order to fit the JSON document into 100kB we would have to partition it 
>>> in some way. There are three ways of partitioning the document
>>> 1. store multiple binary blobs (parts) in different keys
>>> 2. flatten JSON structure and store every path leading to a scalar value 
>>> under own key
>>> 3. measure the size of different branches of a tree representing the JSON 
>>> document (while we parse) and use another key for the branch when we about 
>>> to exceed the limit
>>> 
>>> - The first approach is the simplest but it wouldn't allow us to access 
>>> parts of the document.
>>> - The downsides of a second approach are:
>>> - flattened JSON structure would have long paths which means longer keys
>>> - the scalar value cannot be more than 100kb (unless we split it as well)
>>> - Third approach falls short in cases when the structure of the document 
>>> doesn't allow a clean cut off branches:
>>> - complex rules to handle all corner cases
>>> 
>>> The goals of this thread are:
>>> - to collect ideas on how to encode and store the JSON document
>>> - to comment on the collected ideas
>>> 
>>> Non goals:
>>> - the storage of metadata for the document would be discussed elsewhere
>>> - thumb stones
>>> - edit conflicts
>>> - revisions 
>>> 
>>> Best regards,
>>> iilyak
>>> 
> 
> -- 
> Professional Support for Apache CouchDB:
> https://neighbourhood.ie/couchdb-support/
> 

-- 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/



Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Jan Lehnardt
Thanks Ilya for getting this started!

Two quick notes on this one:

1. Note that JSON does not guarantee object key order and that CouchDB has 
never guaranteed it either; with, say, emit(doc.foo, doc.bar), if either 
emit() parameter was an object, the undefined sort order of SpiderMonkey would 
mix things up. While worth bringing up, this is not a BC break.

2. This would have the fun property of being able to rename a key inside all 
docs that have that key.

Best
Jan
—

> On 30. Jan 2019, at 14:05, Ilya Khlopotov  wrote:
> 
> # First proposal
> 
> In order to overcome FoudationDB limitations on key size (10 kB) and value 
> size (100 kB) we could use the following approach.
> 
> Bellow the paths are using slash for illustration purposes only. We can use 
> nested subspaces, tuples, directories or something else. 
> 
> - Store documents in a subspace or directory  (to keep prefix for a key short)
> - When we store the document we would enumerate all field names (0 and 1 are 
> reserved) and store the mapping table in the key which look like:
> ```
> {DB_DOCS_NS} / {DOC_KEY} / 0
> ```
> - Flatten the JSON document (convert it into key value pairs where the key is 
> `JSON_PATH` and value is `SCALAR_VALUE`)
> - Replace elements of JSON_PATH with integers from mapping table we 
> constructed earlier
> - When we have array use `1 / {array_idx}`
> - Store scalar values in the keys which look like the following (we use 
> `JSON_PATH` with integers). 
> ```
> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}
> ```
> - If the scalar value exceeds 100kB we would split it and store every part 
> under key constructed as:
> ```
> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}
> ```
> 
> Since all parts of the documents are stored under a common `{DB_DOCS_NS} / 
> {DOC_KEY}` they will be stored on the same server most of the time. The 
> document can be retrieved by using range query (`txn.get_range("{DB_DOCS_NS} 
> / {DOC_KEY} / 0", "{DB_DOCS_NS} / {DOC_KEY} / 0xFF")`). We can reconstruct 
> the document since the mapping is returned as well.
> 
> The downside of this approach is we wouldn't be able to ensure the same order 
> of keys in the JSON object. Currently the `jiffy` JSON encoder respects order 
> of keys.
> ```
> 4> jiffy:encode({[{bbb, 1}, {aaa, 12}]}).
> <<"{\"bbb\":1,\"aaa\":12}">>
> 5> jiffy:encode({[{aaa, 12}, {bbb, 1}]}).
> <<"{\"aaa\":12,\"bbb\":1}">>
> ```
> 
> Best regards,
> iilyak
> 
> On 2019/01/30 13:02:57, Ilya Khlopotov  wrote: 
>> As you might already know the FoundationDB has a number of limitations which 
>> influences the way we might store JSON documents. The limitations are:
>> 
>> |  limitation |recommended value|recommended max|absolute 
>> max|
>> |-|--:|:|--:|
>> | transaction duration  |  | 
>>   |  5 sec  |
>> | transaction data size |  | 
>>   |  10 Mb |
>> | key size   | 32 bytes |   
>> 1 kB  | 10 kB  |
>> | value size|   |
>>   10 kB |100 kB |
>> 
>> In order to fit the JSON document into 100kB we would have to partition it 
>> in some way. There are three ways of partitioning the document
>> 1. store multiple binary blobs (parts) in different keys
>> 2. flatten JSON structure and store every path leading to a scalar value 
>> under own key
>> 3. measure the size of different branches of a tree representing the JSON 
>> document (while we parse) and use another key for the branch when we about 
>> to exceed the limit
>> 
>> - The first approach is the simplest but it wouldn't allow us to access 
>> parts of the document.
>> - The downsides of a second approach are:
>>  - flattened JSON structure would have long paths which means longer keys
>>  - the scalar value cannot be more than 100kb (unless we split it as well)
>> - Third approach falls short in cases when the structure of the document 
>> doesn't allow a clean cut off branches:
>>  - complex rules to handle all corner cases
>> 
>> The goals of this thread are:
>> - to collect ideas on how to encode and store the JSON document
>> - to comment on the collected ideas
>> 
>> Non goals:
>> - the storage of metadata for the document would be discussed elsewhere
>>  - thumb stones
>>  - edit conflicts
>>  - revisions 
>> 
>> Best regards,
>> iilyak
>> 

-- 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/



Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Ilya Khlopotov
# First proposal

In order to overcome the FoundationDB limitations on key size (10 kB) and value 
size (100 kB) we could use the following approach.

Below, the paths use slashes for illustration purposes only. We can use 
nested subspaces, tuples, directories or something else. 

- Store documents in a subspace or directory (to keep the prefix of a key short)
- When we store the document, we enumerate all field names (0 and 1 are 
reserved) and store the mapping table under a key which looks like:
```
{DB_DOCS_NS} / {DOC_KEY} / 0
```
- Flatten the JSON document (convert it into key/value pairs where the key is 
`JSON_PATH` and the value is `SCALAR_VALUE`)
- Replace the elements of `JSON_PATH` with integers from the mapping table we 
constructed earlier
- When we have an array, use `1 / {array_idx}`
- Store scalar values under keys which look like the following (we use the 
`JSON_PATH` with integers):
```
{DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}
```
- If a scalar value exceeds 100 kB we split it and store every part 
under a key constructed as:
```
{DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}
```

Since all parts of a document are stored under the common `{DB_DOCS_NS} / 
{DOC_KEY}` prefix, they will be stored on the same server most of the time. The 
document can be retrieved with a single range query 
(`txn.get_range("{DB_DOCS_NS} / {DOC_KEY} / 0", "{DB_DOCS_NS} / {DOC_KEY} / 
0xFF")`). We can reconstruct the document because the mapping table is returned 
as part of that range as well.
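
A minimal, self-contained sketch of the flattening step under these assumptions (tuple paths instead of slash-joined strings, invented helper names, pure Python rather than the actual layer code):

```
# Hypothetical sketch of the encoding described above: build the per-document
# mapping table, flatten the JSON value into (path, scalar) pairs, and
# substitute integers for field names. 0 is the mapping-table key and 1 marks
# array indices, as in the proposal.
def explode(doc):
    mapping = {}   # field name -> integer (2, 3, ...)
    pairs = []     # (json_path_tuple, scalar_value)

    def walk(value, path):
        if isinstance(value, dict):
            for name, child in value.items():
                idx = mapping.setdefault(name, len(mapping) + 2)
                walk(child, path + (idx,))
        elif isinstance(value, list):
            for i, child in enumerate(value):
                walk(child, path + (1, i))
        else:
            pairs.append((path, value))

    walk(doc, ())
    return mapping, pairs

mapping, pairs = explode({"owner": {"name": "bob"}, "tags": ["a", "b"]})
# mapping -> {'owner': 2, 'name': 3, 'tags': 4}
# pairs   -> [((2, 3), 'bob'), ((4, 1, 0), 'a'), ((4, 1, 1), 'b')]
```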

The downside of this approach is that we wouldn't be able to preserve the order 
of keys in a JSON object. Currently the `jiffy` JSON encoder respects key 
order:
```
4> jiffy:encode({[{bbb, 1}, {aaa, 12}]}).
<<"{\"bbb\":1,\"aaa\":12}">>
5> jiffy:encode({[{aaa, 12}, {bbb, 1}]}).
<<"{\"aaa\":12,\"bbb\":1}">>
```

Best regards,
iilyak

On 2019/01/30 13:02:57, Ilya Khlopotov  wrote: 
> As you might already know the FoundationDB has a number of limitations which 
> influences the way we might store JSON documents. The limitations are:
> 
> | limitation            | recommended value | recommended max | absolute max |
> |-----------------------|------------------:|----------------:|-------------:|
> | transaction duration  |                   |                 | 5 sec        |
> | transaction data size |                   |                 | 10 Mb        |
> | key size              | 32 bytes          | 1 kB            | 10 kB        |
> | value size            |                   | 10 kB           | 100 kB       |
> 
> In order to fit the JSON document into 100kB we would have to partition it in 
> some way. There are three ways of partitioning the document
> 1. store multiple binary blobs (parts) in different keys
> 2. flatten JSON structure and store every path leading to a scalar value 
> under own key
> 3. measure the size of different branches of a tree representing the JSON 
> document (while we parse) and use another key for the branch when we about to 
> exceed the limit
> 
> - The first approach is the simplest but it wouldn't allow us to access parts 
> of the document.
> - The downsides of a second approach are:
>   - flattened JSON structure would have long paths which means longer keys
>   - the scalar value cannot be more than 100kb (unless we split it as well)
> - Third approach falls short in cases when the structure of the document 
> doesn't allow a clean cut off branches:
>   - complex rules to handle all corner cases
> 
> The goals of this thread are:
> - to collect ideas on how to encode and store the JSON document
> - to comment on the collected ideas
> 
> Non goals:
> - the storage of metadata for the document would be discussed elsewhere
>   - thumb stones
>   - edit conflicts
>   - revisions 
> 
> Best regards,
> iilyak
> 


[DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Ilya Khlopotov
As you might already know, FoundationDB has a number of limitations which 
influence the way we might store JSON documents. The limitations are:

| limitation            | recommended value | recommended max | absolute max |
|-----------------------|------------------:|----------------:|-------------:|
| transaction duration  |                   |                 | 5 sec        |
| transaction data size |                   |                 | 10 Mb        |
| key size              | 32 bytes          | 1 kB            | 10 kB        |
| value size            |                   | 10 kB           | 100 kB       |

In order to fit a JSON document into 100 kB we would have to partition it in 
some way. There are three ways of partitioning the document:
1. store multiple binary blobs (parts) in different keys
2. flatten the JSON structure and store every path leading to a scalar value 
under its own key
3. measure the size of different branches of the tree representing the JSON 
document (while we parse) and use another key for a branch when we are about to 
exceed the limit

- The first approach is the simplest, but it wouldn't allow us to access parts 
of the document.
- The downsides of the second approach are:
  - the flattened JSON structure would have long paths, which means longer keys
  - a scalar value cannot be more than 100 kB (unless we split it as well)
- The third approach falls short when the structure of the document doesn't 
allow a clean cut-off of branches (a rough sketch of the basic size check 
follows this list):
  - complex rules would be needed to handle all the corner cases
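
A minimal sketch of that size check, under stated assumptions (top-level branches only, JSON text length as the measure, an invented `VALUE_LIMIT` constant); real documents would need recursive rules:

```
import json

# Hypothetical illustration of approach 3: measure the serialized size of each
# top-level branch and give any branch that would exceed the value limit its
# own key instead of inlining it with the rest of the document.
VALUE_LIMIT = 100 * 1000  # FoundationDB's 100 kB value limit

def split_branches(doc):
    inline, overflow = {}, {}
    for name, branch in doc.items():
        if len(json.dumps(branch).encode("utf-8")) > VALUE_LIMIT:
            overflow[name] = branch   # would get its own key (or keys)
        else:
            inline[name] = branch     # small enough to stay in the main value
    return inline, overflow
```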

The goals of this thread are:
- to collect ideas on how to encode and store the JSON document
- to comment on the collected ideas

Non-goals:
- the storage of metadata for the document will be discussed elsewhere:
  - tombstones
  - edit conflicts
  - revisions 

Best regards,
iilyak