Hi Tomek,

I like the idea of revisiting our current schema based on usage so far. However, there are a couple of points around potential issues with such a normalized approach:

- This approach would lead to a thin and very long table. As noted in [1], a
  small repo with ~14 M nodes has ~26 M properties. With multiple revisions
  (GC takes some time) this can go higher. This would increase the memory
  requirement for the id index, and memory consumption increases further with
  an id+key+revision index. For any DB to perform optimally the index should
  fit in RAM, so such a design could reduce the maximum repository size that
  can be supported for a given amount of memory (compared to the current
  schema).

- Reading a specific id can still be done in one remote call, but it would
  involve a select across multiple rows, which might increase the time taken:
  for any node with 'n' properties it would involve 'm' index lookups and then
  'm' reads of row data (m > n, assuming multiple revisions are present for
  some properties).

Maybe we should explore the JSON support being introduced in multiple DBs:
DB2 [2], SQL Server [3], Oracle [4], Postgres [5], MySQL [6]. The problem here
is that we would need a DB-specific implementation for each, which also
increases the testing effort!

> we can better use the database features, as now the DBE is aware about the
> document internal structure (it’s not a blob anymore). Eg. we can fetch only
> a few properties.

In most cases, the properties stored in the blob part of the DB row are read
as a whole anyway.

Chetan Mehrotra
[1] https://issues.apache.org/jira/browse/OAK-4471
[2] http://www.ibm.com/developerworks/data/library/techarticle/dm-1306nosqlforjson1/
[3] https://msdn.microsoft.com/en-in/library/dn921897.aspx
[4] https://docs.oracle.com/database/121/ADXDB/json.htm
[5] https://www.postgresql.org/docs/9.3/static/functions-json.html
[6] https://dev.mysql.com/doc/refman/5.7/en/json.html

On Wed, Aug 17, 2016 at 7:19 AM, Michael Marth <mma...@adobe.com> wrote:

> Hi Tomek,
>
> I like the idea (agree with Vikas’ comments / cautions as well).
>
> You are hinting at expected performance differences (maybe faster or slower
> than the current approach).
> That would probably be worthwhile to investigate
> in order to assess your idea.
>
> One more (hypothetical at this point) advantage of your approach: we could
> utilise DB-native indexes as a replacement for property indexes.
>
> Cheers
> Michael
>
> On 16/08/16 07:42, "Tomek Rekawek" <reka...@adobe.com> wrote:
>
>>Hi Vikas,
>>
>>thanks for the reply.
>>
>>> On 16 Aug 2016, at 14:38, Vikas Saurabh <vikas.saur...@gmail.com> wrote:
>>
>>> * It'd incur a very heavy migration impact on upgrade or RDB setups -
>>> that, most probably, would translate to us having to support both
>>> schemas. I don't feel that it'd be easy to flip the switch for existing
>>> setups.
>>
>>That’s true. I think we should take a similar approach here as with the
>>segment / segment-tar implementations (and we can use oak-upgrade to convert
>>between them). At least for now.
>>
>>> * DocumentNodeStore implementation very freely touches prop:rev=value
>>> for a given id… […] I think this would get
>>> expensive for index (_id+propName+rev) maintenance.
>>
>>Indeed, probably we’ll have to analyse the indexing capabilities offered by
>>different database engines more closely, choosing the one that offers good
>>writing speed.
>>
>>Best regards,
>>Tomek
>>
>>--
>>Tomek Rękawek | Adobe Research | www.adobe.com
>>reka...@adobe.com
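
This is a rough sketch of the row-count argument in the second point of the reply above, where a node with 'n' properties needs 'm' rows. It uses Python's built-in sqlite3 module as a stand-in for an RDB. The two tables (`nodes_blob` and `node_props`), their columns and the sample node are hypothetical illustrations only, not Oak's actual RDB schema. The sketch counts rows. It does not measure timing or index memory.

```python
import json
import sqlite3

# Hypothetical schemas, for illustration only (not Oak's actual RDB schema):
#  - nodes_blob: one row per node, all properties serialized into one column
#  - node_props: normalized layout, one row per (id, prop, rev); the composite
#    primary key gives an (id, prop, rev) index with one entry per row
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes_blob (id TEXT PRIMARY KEY, data TEXT);
    CREATE TABLE node_props (
        id TEXT, prop TEXT, rev TEXT, value TEXT,
        PRIMARY KEY (id, prop, rev)
    );
""")

# One node with n = 3 properties; 'title' and 'status' each still have two
# revisions that have not been garbage collected yet.
props = {
    "title": {"r1": "Draft", "r2": "Final"},
    "status": {"r1": "new", "r3": "published"},
    "author": {"r1": "tomek"},
}

conn.execute("INSERT INTO nodes_blob VALUES (?, ?)",
             ("/content/a", json.dumps(props)))
conn.executemany(
    "INSERT INTO node_props VALUES (?, ?, ?, ?)",
    [("/content/a", p, r, v) for p, revs in props.items() for r, v in revs.items()],
)

def rows_read(table: str, node_id: str) -> int:
    """Number of rows a 'read node by id' query returns from the given table."""
    query = f"SELECT * FROM {table} WHERE id = ?"  # table name is a fixed literal here
    return len(conn.execute(query, (node_id,)).fetchall())

n = len(props)
m = rows_read("node_props", "/content/a")
print(f"n={n} properties, blob rows={rows_read('nodes_blob', '/content/a')}, normalized rows m={m}")
# prints: n=3 properties, blob rows=1, normalized rows m=5
```

With the blob layout the node is a single row with a single primary-key index entry. With the normalized layout it becomes m = 5 rows here, and each row has its own (id, prop, rev) index entry. Those extra entries are what the memory concern in the first point is about. The actual per-read cost of the extra rows would still need to be measured on the real database engines.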