Hi Tomek,

I like the idea of revisiting our current schema based on usage so
far. However, a couple of points around potential issues with such a
normalized approach:

- This approach would lead to a thin and very long table. As noted in
[1], in a small repo with ~14M nodes we have ~26M properties. With
multiple revisions (GC takes some time) this can go higher. This would
then increase the memory requirement for the id index, and memory
consumption increases further with an id+key+revision index. For any
DB to perform optimally the index should fit in RAM, so such a design
would possibly reduce the maximum repository size that can be
supported (compared to the current one) for a given amount of memory.
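To make the sizing concern concrete, here is a rough back-of-envelope sketch in Python. The node/property counts come from [1]; the average revision multiplier and per-entry index cost are assumptions I picked for illustration, not measurements.

```python
# Rough sizing sketch for the normalized (thin/long) table.
# 14M nodes / 26M properties are the figures from OAK-4471 [1];
# revisions_per_property and index_entry_bytes are ASSUMED values.
nodes = 14_000_000
properties = 26_000_000        # one row per property in the normalized schema
revisions_per_property = 3     # assumed average while GC lags behind

rows = properties * revisions_per_property      # total table rows
index_entry_bytes = 64                          # assumed (id, key, rev) entry cost
index_ram_gb = rows * index_entry_bytes / 1024**3

print(f"rows: {rows:,}, index RAM: ~{index_ram_gb:.1f} GB")
```

Even with these conservative assumptions the id+key+revision index alone runs to several GB for a "small" repo, which is the headroom that would otherwise hold a larger repository's index.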

- The read for a specific id can still be done in one remote call, but
that call would select across multiple rows, which might increase the
time taken: for a node with 'n' properties it would involve 'm' index
lookups and then 'm' reads of row data (m > n, assuming multiple
revisions per property are present).
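The "one call, many rows" read path above can be sketched with SQLite standing in for the RDB; the table and column names here are purely illustrative, not Oak's actual schema.

```python
import sqlite3

# Illustrative normalized schema: one row per (id, key, rev).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE props (id TEXT, key TEXT, rev TEXT, value TEXT)")
db.executemany(
    "INSERT INTO props VALUES (?, ?, ?, ?)",
    [
        ("/content", "jcr:primaryType", "r1", "nt:unstructured"),
        ("/content", "title", "r1", "Hello"),
        ("/content", "title", "r2", "Hello v2"),  # older revision not yet GC'd
    ],
)
db.execute("CREATE INDEX props_idx ON props (id, key, rev)")

# One statement, but m rows come back (m >= n logical properties),
# i.e. m index lookups plus m row reads on the server side.
rows = db.execute(
    "SELECT key, rev, value FROM props WHERE id = ?", ("/content",)
).fetchall()
print(len(rows))  # 3 rows for 2 logical properties
```

So the remote round trip stays at one, but the server-side work grows with the revision backlog rather than with the property count.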

Maybe we should explore the JSON support being introduced in multiple
DBs: DB2 [2], SQL Server [3], Oracle [4], Postgres [5], MySQL [6].
The problem here is that we would need DB-specific implementations,
which also increases the testing effort!
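To show the flavour of that DB-native JSON access, here is a sketch using SQLite's JSON1 functions as a stand-in (assuming a Python build with JSON1 compiled in). The point of the example is exactly the portability problem: Postgres would use data->>'title' [5], and Oracle, SQL Server, DB2 and MySQL each have their own syntax again.

```python
import json
import sqlite3

# Current-style schema: the whole document as one JSON blob per row.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE nodes (id TEXT PRIMARY KEY, data TEXT)")
db.execute(
    "INSERT INTO nodes VALUES (?, ?)",
    ("/content",
     json.dumps({"title": "Hello", "jcr:primaryType": "nt:unstructured"})),
)

# Fetch a single property out of the document, server-side,
# using SQLite's json_extract (syntax is engine-specific).
(title,) = db.execute(
    "SELECT json_extract(data, '$.title') FROM nodes WHERE id = ?",
    ("/content",),
).fetchone()
print(title)
```

Each supported engine would need its own variant of that query (and its own tests), which is the implementation/testing cost mentioned above.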

> we can better use the database features, as now the DBE is aware about the 
> document internal structure (it’s not a blob anymore). Eg. we can fetch only 
> a few properties.

In most cases the kind of properties stored in the blob part of a DB
row are read as a whole anyway.

Chetan Mehrotra
[1] https://issues.apache.org/jira/browse/OAK-4471
[2] 
http://www.ibm.com/developerworks/data/library/techarticle/dm-1306nosqlforjson1/
[3] https://msdn.microsoft.com/en-in/library/dn921897.aspx
[4] https://docs.oracle.com/database/121/ADXDB/json.htm
[5] https://www.postgresql.org/docs/9.3/static/functions-json.html
[6] https://dev.mysql.com/doc/refman/5.7/en/json.html


On Wed, Aug 17, 2016 at 7:19 AM, Michael Marth <mma...@adobe.com> wrote:
> Hi Tomek,
>
> I like the idea (agree with Vikas’ comments / cautions as well).
>
> You are hinting at expected performance differences (maybe faster or slower 
> than the current approach). That would probably be worthwhile to investigate 
> in order to assess your idea.
>
> One more (hypothetical at this point) advantage of your approach: we could 
> utilise DB-native indexes as a replacement for property indexes.
>
> Cheers
> Michael
>
>
>
> On 16/08/16 07:42, "Tomek Rekawek" <reka...@adobe.com> wrote:
>
>>Hi Vikas,
>>
>>thanks for the reply.
>>
>>> On 16 Aug 2016, at 14:38, Vikas Saurabh <vikas.saur...@gmail.com> wrote:
>>
>>> * It'd incur a very heavy migration impact on upgrade or RDB setups -
>>> that, most probably, would translate to us having to support both
>>> schemas. I don't feel that it'd easy to flip the switch for existing
>>> setups.
>>
>>That’s true. I think we should take a similar approach here as with the 
>>segment / segment-tar implementations (and we can use oak-upgrade to convert 
>>between them). At least for now.
>>
>>> * DocumentNodeStore implementation very freely touches prop:rev=value
>>> for a given id… […] I think this would get
>>> expensive for index (_id+propName+rev) maintenance.
>>
>>Indeed, probably we’ll have to analyse the indexing capabilities offered by 
>>different database engines more closely, choosing the one that offers good 
>>writing speed.
>>
>>Best regards,
>>Tomek
>>
>>--
>>Tomek Rękawek | Adobe Research | www.adobe.com
>>reka...@adobe.com
