Re: normalising the rdb database schema

Vikas Saurabh Tue, 16 Aug 2016 05:39:50 -0700

Hi Tomek,

At first glance I like the idea of normalizing the schema, but
there are potential practical issues with the approach:
* It'd incur a very heavy migration impact on upgrade of RDB setups -
that, most probably, would translate to us having to support both
schemas. I don't feel that it'd be easy to flip the switch for existing
setups.
* The DocumentNodeStore implementation very freely touches prop:rev=value
for a given id, i.e. it assumes there's no (or at least minimal) cost
involved in persisting those. A commit would tentatively set the value
and flip it to null if the commit fails (which in the new schema would
be the same as deleting the row). I think this would get expensive for
index (_id+propName+rev) maintenance. Note that in the current scheme of
things a document gets deleted only on revision gc, so the index cost is
really very minimal (well, apart from the _modified one).
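
To make the second point concrete, here is a minimal sketch of what a
failed commit would look like against a normalised table. It uses an
in-memory SQLite database purely for illustration; the table name
"nodes", the helper tentative_write(), and the revision id
"r13f3875b5d2-0-1" are all made up for this example and are not Oak
code. The point is only that a set-then-null cycle turns into an INSERT
plus a DELETE, each of which has to touch the primary-key index.

```python
import sqlite3

# Hypothetical normalised table, loosely following the proposed columns.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE nodes (
        id       TEXT NOT NULL,
        key      TEXT NOT NULL,
        revision TEXT NOT NULL,
        value    TEXT,
        modcount INTEGER NOT NULL,
        PRIMARY KEY (id, key, revision)
    )
""")


def tentative_write(conn, doc_id, key, revision, value, commit_ok):
    """Set prop:rev=value, then remove the row again if the commit failed."""
    with conn:  # first transaction: the tentative set
        conn.execute(
            "INSERT INTO nodes VALUES (?, ?, ?, ?, 1)",
            (doc_id, key, revision, value),
        )
    if not commit_ok:
        with conn:  # second transaction: "flip to null" becomes a DELETE
            conn.execute(
                "DELETE FROM nodes WHERE id = ? AND key = ? AND revision = ?",
                (doc_id, key, revision),
            )


before = conn.total_changes
tentative_write(conn, "1:/node", "prop", "r13f3875b5d2-0-1", "value",
                commit_ok=False)
row_changes = conn.total_changes - before  # rows inserted + rows deleted
remaining = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
```

Here row_changes counts both the insert and the delete, and no row is
left behind afterwards. How expensive that really is on a production
database would need measuring; the sketch only shows the shape of the
work.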


Btw, I like the basic idea (and the advantages that you mentioned)...
just that I think we probably need to be careful if we go ahead with
this.

Thanks,
Vikas

On Tue, Aug 16, 2016 at 12:25 PM, Tomek Rekawek <[email protected]> wrote:
> Hello,
>
> I was wondering whether it’d make sense to normalise the RDB Document Store 
> schema - get rid of the JSON/JSOP concatenated strings and store each 
> key/value in a separate database row. Something like this:
>
> id          STRING
> key         STRING
> revision    STRING (nullable)
> value       (LONG) STRING
> modcount    INTEGER
>
> The id+key+revision would make a unique primary key.
>
> So, an example node from the DocumentMK documentation [1]:
>
> {
>     "_id" : "1:/node",
>     "_deleted" : {
>         "r13f3875b5d1-0-1" : "false"
>     },
>     "_lastRev" : {
>         "r0-0-1" : "r13f3875b5d1-0-1"
>     },
>     "_modified" : NumberLong(274208361),
>     "_modCount" : NumberLong(1),
>     "_children" : Boolean(true),
>     "_revisions" : {
>         "r13f3875b5d1-0-1" : "c"
>     }
> }
>
> Would transform to the following database rows:
>
> (id, key, revision, value, modcount)
> ("1:/node", "_deleted", "r13f3875b5d1-0-1", "false", 1)
> ("1:/node", "_lastRev", "r0-0-1", "r13f3875b5d1-0-1", 1)
> ("1:/node", "_modified", null, "274208361", 1)
> ("1:/node", "_children", null, "true", 1)
> ("1:/node", "_revisions", "r13f3875b5d1-0-1", "c", 1)
>
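
A minimal Python sketch of the document-to-rows mapping described above,
just to pin down the rules I am reading into it: map-valued properties
become one row per revision, scalar properties get a null revision, and
_id and _modCount move into their own columns. The names flatten() and
_to_text() are made up for this illustration, and rendering booleans as
"true"/"false" is my assumption based on the example rows.

```python
def _to_text(value):
    """Render a scalar as the string stored in the value column."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    return str(value)


def flatten(doc):
    """Turn one document into (id, key, revision, value, modcount) rows."""
    doc_id = doc["_id"]
    modcount = doc.get("_modCount", 0)
    rows = []
    for key, value in doc.items():
        if key in ("_id", "_modCount"):
            continue  # carried by the id and modcount columns instead
        if isinstance(value, dict):
            # revisioned property: one row per revision entry
            for revision, rev_value in value.items():
                rows.append((doc_id, key, revision, _to_text(rev_value), modcount))
        else:
            # plain property: no revision
            rows.append((doc_id, key, None, _to_text(value), modcount))
    return rows


# The example node from the quoted message, as a Python dict.
doc = {
    "_id": "1:/node",
    "_deleted": {"r13f3875b5d1-0-1": "false"},
    "_lastRev": {"r0-0-1": "r13f3875b5d1-0-1"},
    "_modified": 274208361,
    "_modCount": 1,
    "_children": True,
    "_revisions": {"r13f3875b5d1-0-1": "c"},
}
rows = flatten(doc)
```

For the example node this yields five rows, one per property/revision
pair, each carrying modcount 1.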
> Creating a new document would require batching a few INSERTs. Updating a 
> document will combine INSERTs (for the new properties) and UPDATEs (for the 
> modified ones). Each update would end with a modcount increment for all rows 
> related to the given document. Fetching a document will require reading all 
> rows for given id. I think all of these reads and writes can be done in 
> batches, so we’ll end up with a single database call anyway.
>
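
A rough sketch of the create/update/fetch cycle described above, using
an in-memory SQLite database only as a stand-in for a real RDB. The
table name "nodes" and the three helper functions are hypothetical. One
deliberate deviation: primary-key columns generally cannot be NULL, so
the sketch stores an empty string for "no revision" instead of the
nullable column from the draft schema. That substitution is mine, not
part of the proposal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE nodes (
        id       TEXT NOT NULL,
        key      TEXT NOT NULL,
        revision TEXT NOT NULL DEFAULT '',  -- '' stands in for "no revision"
        value    TEXT,
        modcount INTEGER NOT NULL,
        PRIMARY KEY (id, key, revision)
    )
""")


def create_document(conn, rows):
    """Insert all rows of a new document in one transaction."""
    with conn:
        conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?, ?)", rows)


def update_document(conn, doc_id, changes):
    """Apply (key, revision, value) changes, then bump modcount on every row."""
    with conn:
        (current,) = conn.execute(
            "SELECT COALESCE(MAX(modcount), 0) FROM nodes WHERE id = ?",
            (doc_id,),
        ).fetchone()
        new_modcount = current + 1
        # New properties are inserted, existing ones are overwritten.
        conn.executemany(
            "INSERT OR REPLACE INTO nodes VALUES (?, ?, ?, ?, ?)",
            [(doc_id, k, r, v, new_modcount) for k, r, v in changes],
        )
        # Every row of the document ends up with the same modcount.
        conn.execute(
            "UPDATE nodes SET modcount = ? WHERE id = ?",
            (new_modcount, doc_id),
        )


def fetch_document(conn, doc_id):
    """Read all rows that make up one document."""
    return conn.execute(
        "SELECT key, revision, value, modcount FROM nodes "
        "WHERE id = ? ORDER BY key, revision",
        (doc_id,),
    ).fetchall()


create_document(conn, [
    ("1:/node", "_deleted", "r13f3875b5d1-0-1", "false", 1),
    ("1:/node", "_lastRev", "r0-0-1", "r13f3875b5d1-0-1", 1),
    ("1:/node", "_modified", "", "274208361", 1),
    ("1:/node", "_children", "", "true", 1),
    ("1:/node", "_revisions", "r13f3875b5d1-0-1", "c", 1),
])
# The second revision id and the new _modified value are made-up test data.
update_document(conn, "1:/node", [
    ("_modified", "", "274208362"),
    ("_revisions", "r13f3875b5d2-0-1", "c"),
])
result = fetch_document(conn, "1:/node")
```

Note that the sketch issues several statements inside one transaction.
Whether they collapse into a single round trip depends on the driver's
batching support, which this example does not demonstrate.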
> Advantages I can see here are:
>
> * no need to parse/serialize JSONs and JSOPs (less load on the Oak instance),
> * no need to periodically compact the JSOPs,
> * more granular updates are possible, we can properly implement all the 
> UpdateOp cases,
> * we can make better use of the database features, as now the DBE is aware of the 
> document's internal structure (it's not a blob anymore). E.g. we can fetch only 
> a few properties.
>
> To me such a design looks more natural and RDB-native. The schema is just a 
> draft and I'm probably missing something, but I wanted to ask for general 
> feedback on this approach. WDYT?
>
> Regards,
> Tomek
>
> --
> Tomek Rękawek | Adobe Research | www.adobe.com
> [email protected]
>
