Re: [orientdb] Schema driven serialization #1890

MrFT Mon, 26 May 2014 05:51:07 -0700


Maybe there are some useful ideas to be found in ArangoDB. They claim: "*Unlike 
other nosql databases ArangoDB does not store the attributes names and 
types over and over again (which needs lots of space). Instead it 
recognizes the “implicit schema” and stores it space-efficient separately 
from the data – yet every record can look differently.*"


https://www.arangodb.org/high-performance

https://github.com/triAGENS/ArangoDB




On Sunday, 25 May 2014 14:03:41 UTC+2, Steve Coughlan wrote:
>
>  Hi Luca,
>
> Can you email me contact details for Emanuale so I can get in touch with 
> him?  It would be useful if we could start talking at this point.
>
>  
>  While the serialization seems to be at a good point, I have a few questions:
> - did you keep the layer between serialization and ODocument? I'd also like 
> to explore the approach where the ODocument doesn't contain the Map of 
> fields but rather works directly against the byte[]
>  
>
> I fiddled with this for a while and spent a day or two on a custom 
> implementation of the Map interface as a drop-in replacement.  It used a 
> backing array for values and the keyset from the schema, so we could avoid 
> Map.Entry for storing schema-defined fields, and it fell back to a HashMap 
> for non-schema fields.  That seemed the least intrusive way to do it, but I 
> ended up discarding it: the various maps used in ODocument are exposed 
> elsewhere, and iterating over the entrySet is very common.  In the end I 
> think the overhead of that Map implementation was a tradeoff against GC of 
> many Entry objects, and I don't think there was much advantage.
>
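A rough sketch of the kind of Map described above, for readers who want something concrete. This is not the discarded implementation; the class name `SchemaBackedMap` and the per-class `slots` index are assumptions made for illustration. Schema-declared fields live in a plain array, and anything else falls back to a HashMap.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the implementation discussed in the thread.
class SchemaBackedMap extends AbstractMap<String, Object> {
    private final Map<String, Integer> slots;   // schema field name -> array index, shared per class
    private final Object[] values;              // one slot per schema-declared field
    private final boolean[] present;            // distinguishes "unset" from "set to null"
    private final Map<String, Object> extras = new HashMap<>(); // non-schema fields

    SchemaBackedMap(Map<String, Integer> slots) {
        this.slots = slots;
        this.values = new Object[slots.size()];
        this.present = new boolean[slots.size()];
    }

    @Override
    public Object get(Object key) {
        Integer slot = slots.get(key);
        if (slot == null) return extras.get(key);
        return present[slot] ? values[slot] : null;
    }

    @Override
    public Object put(String key, Object value) {
        Integer slot = slots.get(key);
        if (slot == null) return extras.put(key, value);
        Object old = present[slot] ? values[slot] : null;
        values[slot] = value;
        present[slot] = true;
        return old;
    }

    @Override
    public Object remove(Object key) {
        Integer slot = slots.get(key);
        if (slot == null) return extras.remove(key);
        Object old = present[slot] ? values[slot] : null;
        values[slot] = null;
        present[slot] = false;
        return old;
    }

    @Override
    public boolean containsKey(Object key) {
        Integer slot = slots.get(key);
        return slot == null ? extras.containsKey(key) : present[slot];
    }

    // Returns a snapshot; changes to the returned set do not write through.
    // Iteration still allocates one Entry per field, which is the cost
    // mentioned above for code that walks the entrySet.
    @Override
    public Set<Map.Entry<String, Object>> entrySet() {
        Set<Map.Entry<String, Object>> out = new LinkedHashSet<>();
        for (Map.Entry<String, Integer> e : slots.entrySet()) {
            if (present[e.getValue()]) {
                out.add(new SimpleImmutableEntry<String, Object>(e.getKey(), values[e.getValue()]));
            }
        }
        for (Map.Entry<String, Object> e : extras.entrySet()) {
            out.add(new SimpleImmutableEntry<String, Object>(e.getKey(), e.getValue()));
        }
        return out;
    }
}
```

Point lookups avoid Entry objects for schema fields, but any caller that iterates pays for them again, which matches the tradeoff described in the message.
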
> I'm still not convinced that it is wise for field("fieldName") to simply 
> deserialize on the fly.  The issue remains that two different invocations 
> of doc.field("field1") will return two different instances where f1a != f1b 
> and f1a.equals(f1b) only if equals() has been implemented.  
>
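The identity concern above can be shown with a small stand-alone example. `LazyDoc` is a hypothetical stand-in for a document that keeps only its serialized bytes, not ODocument itself. A byte[] value is used because Java arrays do not override equals(), so both checks fail even though the content is identical.

```java
import java.util.Arrays;

// Hypothetical stand-in for a document that keeps only serialized bytes.
class LazyDoc {
    private final byte[] raw;

    LazyDoc(byte[] raw) {
        this.raw = raw;
    }

    // Builds a fresh object on every call, as on-the-fly deserialization would.
    byte[] field() {
        return Arrays.copyOf(raw, raw.length);
    }
}

class IdentityDemo {
    // Reference comparison of two separate invocations.
    static boolean sameInstance(LazyDoc d) {
        return d.field() == d.field();
    }

    // equals() on arrays is Object.equals(), i.e. still a reference comparison.
    static boolean equalsCall(LazyDoc d) {
        return d.field().equals(d.field());
    }

    // Content comparison needs an explicit helper.
    static boolean sameContent(LazyDoc d) {
        return Arrays.equals(d.field(), d.field());
    }
}
```

For a `LazyDoc` wrapping any byte array, `sameInstance` and `equalsCall` return false while `sameContent` returns true. Caching the deserialized value per field would restore identity, at the cost of holding both forms in memory.
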
> There is another way to approach it with backing arrays, though the only 
> real advantage is reducing Map.Entry object creation.  It would be far more 
> invasive to the ODocument class, which is why I tried the above approach 
> first.  If there is a demonstrable benefit to reducing creation of 
> Map.Entry objects then it may be worth pursuing.
>
>  - have you already written serializers for all the supported types?
>  
>
> All except collection types and CUSTOM.  I have LINK and EMBEDDED working, 
> just not the matching list/set/map types.  Wanted to wait until I'd 
> discussed with someone more familiar with existing serialization before 
> deciding on the serialization format for those.
>
>  - how many test cases are broken?
>  
>
> I have not tried to run the test cases yet.  Currently I just have a simple 
> class with a main() method that creates a schema and saves a document, then 
> reloads the database and loads the document.  It's very rudimentary.
>
>  - do you already have some numbers about first benchmarks?
>  
>
> I will need some guidance toward any benchmarking code.  I haven't 
> attempted this yet, though the only area where I can possibly see 
> performance being penalised is in altering the schema (which should be a 
> rare event).
>
>  
>  Lvc@
>
>  
>
> On 24 May 2014 02:45, Steve <[email protected]> wrote:
>
>>  Dmitriy,
>>
>> Currently the name catalog is per class.  SCHEMALESS is a special class 
>> so there is a global name catalog for schemaless documents.
>>
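One way to picture a per-class name catalog with a shared catalog for schemaless documents is sketched below. The class and method names (`NameCatalog`, `CatalogRegistry`, `idFor`, `forClass`) are invented for illustration and are not taken from the POC. Ids start at 1 here, an arbitrary choice in this sketch that leaves 0 free as a marker value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical catalog: maps field names to small integer ids and back.
class NameCatalog {
    private final Map<String, Integer> ids = new HashMap<>();
    private final List<String> names = new ArrayList<>();

    NameCatalog() {
        names.add(null); // slot 0 is left unused so that real ids start at 1
    }

    // Returns the existing id for a name, or assigns the next free one.
    int idFor(String name) {
        Integer id = ids.get(name);
        if (id == null) {
            id = names.size();
            names.add(name);
            ids.put(name, id);
        }
        return id;
    }

    String nameFor(int id) {
        return names.get(id);
    }
}

// One catalog per class; documents without a class share the SCHEMALESS one.
class CatalogRegistry {
    static final String SCHEMALESS = "SCHEMALESS";
    private final Map<String, NameCatalog> perClass = new HashMap<>();

    NameCatalog forClass(String className) {
        String key = (className == null) ? SCHEMALESS : className;
        return perClass.computeIfAbsent(key, k -> new NameCatalog());
    }
}
```

With this layout the same field name can have different ids in different classes, and only classless documents contend for the shared catalog.
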
>> I've just pushed an update that enables setting the option to embed field 
>> names in the record header for fields that are not schema declared.  This 
>> is a per class option although the header format allows this to be done on 
>> a per field basis if we implemented a couple of variations of 
>> ODocument.field().  In theory you could set it globally by setting the 
>> option on the SCHEMALESS class.
>>
>> The variation of the header format is simply that nameId == 0 is reserved 
>> as a marker to indicate an embedded field name.  If so, the name is stored 
>> immediately after nameId as varint(string length) + stringBytes.
>>
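The header variation described above might look roughly like this in code. Several details are assumptions, since the message does not state them: that nameId itself is written as a varint, that the varint is the common unsigned base-128 form (7 bits per byte, high bit as continuation flag), and that names are UTF-8. The `catalog` array is a hypothetical stand-in for the name catalog; see the linked commit for the actual format.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the "nameId == 0 means embedded name" header rule.
class FieldHeaderSketch {

    // Unsigned base-128 varint: low 7 bits first, high bit set on all but the last byte.
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static int readVarint(ByteBuffer in) {
        int result = 0;
        int shift = 0;
        byte b;
        do {
            b = in.get();
            result |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }

    // nameId > 0 refers to the catalog; nameId == 0 means the name follows inline.
    static void writeFieldName(ByteArrayOutputStream out, int nameId, String name) {
        writeVarint(out, nameId);
        if (nameId == 0) {
            byte[] bytes = name.getBytes(StandardCharsets.UTF_8);
            writeVarint(out, bytes.length);       // varint(string length)
            out.write(bytes, 0, bytes.length);    // stringBytes
        }
    }

    // Resolves the field name either from the inline bytes or from the catalog.
    static String readFieldName(ByteBuffer in, String[] catalog) {
        int nameId = readVarint(in);
        if (nameId != 0) {
            return catalog[nameId];
        }
        byte[] bytes = new byte[readVarint(in)];
        in.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Under these assumptions an embedded name costs one marker byte plus the length prefix and the name bytes in every record, while a catalog reference costs only the id.
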
>> commit is here:
>>
>> https://github.com/shadders/orientdb/commit/079a87decd051290addbb294930ddce63054a19d
>>  
>>  
>>
>> On 24/05/14 01:30, Luca Garulli wrote:
>>  
>> Hi Dmitriy, 
>> Everything is still open and what Steve created is a POC to study the 
>> best serialization mechanism for OrientDB 2.0. Personally, I would like to 
>> manage schema-less fields at the tail of the records.
>>
>>  Lvc@
>>  
>>
>> On 23 May 2014 17:21, Dmitriy Krasnikov <[email protected]> wrote:
>>
>>> I tried to follow the discussion, but it moved too quickly from storing 
>>> field names and class serialization to binary serialization. 
>>> What is the final decision on storing fields for strict class definitions?
>>> I am a little wary of a global field catalog in cases where millions of 
>>> records are created with a schema-free layout and the system has to look 
>>> up and create new fields in the catalog for each record. Schema-full 
>>> versus schema-free class definitions sounds like the perfect solution.
>>> Are you still considering it for 2.0?
>>>   -- 
>>>
>>> --- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "OrientDB" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>>  
>>  
>>  
>  

