Re: [orientdb] Schema driven serialization #1890

Steve Sun, 25 May 2014 05:04:19 -0700

Hi Luca,

Can you email me contact details for Emanuele so I can get in touch with
him?  It would be useful if we could start talking at this point.


>
> While the serialization seems at good point, I've a few questions:
> - did you keep the layer between serialization and ODocument? I'd also
> like to explore an approach where ODocument doesn't contain the Map of
> fields but works directly against the byte[]

I fiddled with this for a while and spent a day or two messing around
with a custom implementation of the Map interface as a drop-in
replacement that used a backing array for values and the keyset from the
schema, so we could avoid Map.Entry objects for schema-defined fields.
It falls back to a HashMap for non-schema fields.  That seemed the least
intrusive way to do it, but I ended up discarding it: the various maps
used in ODocument are exposed elsewhere, and iterating over the entrySet
was very common.  In the end I think the overhead of that Map
implementation roughly cancelled out the GC savings from creating fewer
Entry objects, so I don't think there was much advantage.
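A minimal sketch of the kind of Map replacement described above, assuming a fixed list of schema field names per class. `SchemaBackedMap` and its internals are hypothetical names for illustration, not OrientDB code. It also shows where the idea loses ground: `entrySet()` still has to create an entry object per schema field whenever someone iterates.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not OrientDB code: schema-declared fields are stored in a
// fixed Object[] (one slot per schema field); all other fields go to a HashMap.
class SchemaBackedMap extends AbstractMap<String, Object> {
    private static final Object ABSENT = new Object(); // marks an unset schema slot

    private final String[] names;              // schema field names, by slot
    private final Map<String, Integer> slotOf; // name -> slot (would be shared per class)
    private final Object[] values;             // schema field values, by slot
    private final Map<String, Object> extra = new HashMap<>(); // non-schema fields

    SchemaBackedMap(String... schemaFields) {
        names = schemaFields.clone();
        slotOf = new HashMap<>();
        for (int i = 0; i < names.length; i++) slotOf.put(names[i], i);
        values = new Object[names.length];
        Arrays.fill(values, ABSENT);
    }

    @Override public Object put(String key, Object value) {
        Integer slot = slotOf.get(key);
        if (slot == null) return extra.put(key, value);
        Object old = values[slot];
        values[slot] = value;
        return old == ABSENT ? null : old;
    }

    @Override public Object get(Object key) {
        Integer slot = slotOf.get(key);
        if (slot == null) return extra.get(key);
        return values[slot] == ABSENT ? null : values[slot];
    }

    @Override public boolean containsKey(Object key) {
        Integer slot = slotOf.get(key);
        return slot == null ? extra.containsKey(key) : values[slot] != ABSENT;
    }

    @Override public Object remove(Object key) {
        Integer slot = slotOf.get(key);
        if (slot == null) return extra.remove(key);
        Object old = values[slot];
        values[slot] = ABSENT;
        return old == ABSENT ? null : old;
    }

    // Iteration still hands out one Map.Entry per schema field, so callers that walk
    // entrySet() often recreate the allocations the array was meant to avoid.
    // (Sketch only: schema entries are immutable snapshots; the iterator has no remove().)
    @Override public Set<Map.Entry<String, Object>> entrySet() {
        return new AbstractSet<Map.Entry<String, Object>>() {
            @Override public int size() {
                int n = extra.size();
                for (Object v : values) if (v != ABSENT) n++;
                return n;
            }

            @Override public Iterator<Map.Entry<String, Object>> iterator() {
                final Iterator<Map.Entry<String, Object>> extraIt = extra.entrySet().iterator();
                return new Iterator<Map.Entry<String, Object>>() {
                    private int i = skipAbsent(0);

                    private int skipAbsent(int from) {
                        while (from < values.length && values[from] == ABSENT) from++;
                        return from;
                    }

                    @Override public boolean hasNext() {
                        return i < values.length || extraIt.hasNext();
                    }

                    @Override public Map.Entry<String, Object> next() {
                        if (i < values.length) {
                            int slot = i;
                            i = skipAbsent(i + 1);
                            return new AbstractMap.SimpleImmutableEntry<>(names[slot], values[slot]);
                        }
                        return extraIt.next(); // throws NoSuchElementException when exhausted
                    }
                };
            }
        };
    }
}
```

Lookups by field name avoid Entry objects entirely; the cost only reappears when code iterates the whole map, which is exactly the access pattern that turned out to be common.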

I'm still not convinced that it is wise for field("fieldName") to simply
deserialize on the fly.  The issue remains that two invocations of
doc.field("field1") will return two different instances, f1a and f1b,
where f1a != f1b, and f1a.equals(f1b) holds only if equals() has been
implemented.
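To make the identity concern concrete, here is a small sketch. `LazyDoc`, `Address`, `fieldOnTheFly()` and `fieldCached()` are hypothetical stand-ins for ODocument and field(), assuming field values are kept as raw bytes and decoded on access.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical value type that, like many user classes, does not override equals().
class Address {
    final String city;
    Address(String city) { this.city = city; }
}

// Hypothetical document that keeps only serialized bytes per field name.
class LazyDoc {
    private final Map<String, byte[]> raw = new HashMap<>();
    private final Map<String, Object> decoded = new HashMap<>();

    void setAddress(String fieldName, String city) {
        raw.put(fieldName, city.getBytes(StandardCharsets.UTF_8));
        decoded.remove(fieldName); // drop any stale decoded instance
    }

    // Deserialize on the fly: every call decodes the bytes into a new instance.
    Address fieldOnTheFly(String fieldName) {
        byte[] bytes = raw.get(fieldName);
        return bytes == null ? null : new Address(new String(bytes, StandardCharsets.UTF_8));
    }

    // Decode once and keep the result, so repeated calls return the same instance.
    Address fieldCached(String fieldName) {
        return (Address) decoded.computeIfAbsent(fieldName, this::fieldOnTheFly);
    }
}
```

Caching the decoded value restores identity, but at the cost of keeping decoded objects on the document again, which is part of what working directly against the byte[] was meant to avoid.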

There is another way to approach it with backing arrays, though its only
real advantage is reducing Map.Entry object creation.  It would be far
more invasive to the ODocument class, which is why I tried the above
approach first.  If there is a demonstrable benefit to reducing
Map.Entry creation, it may be worth pursuing.

> - do you already have written serializers for all the supported types?

All except the collection types and CUSTOM.  I have LINK and EMBEDDED
working, just not the matching list/set/map types.  I wanted to discuss
the serialization format for those with someone more familiar with the
existing serialization before settling on it.

> - how many test cases are broken?

I have not tried to run the test cases yet.  Currently I just have a
simple class with a main() that creates a schema and saves a document,
then reloads the database and loads the document.  It's very rudimentary.

> - do you already have some numbers about first benchmarks?

I will need some guidance toward any benchmarking code; I haven't
attempted this yet.  That said, the only area where I can possibly see
performance being penalised is altering the schema (which should be a
rare event).

>
> Lvc@
>
>
>
> On 24 May 2014 02:45, Steve <[email protected]
> <mailto:[email protected]>> wrote:
>
>     Dmitriy,
>
>     Currently the name catalog is per class.  SCHEMALESS is a special
>     class so there is a global name catalog for schemaless documents.
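A rough sketch of a per-class name catalog with a shared fallback for schemaless documents. `NameCatalog` and `Catalogs` are hypothetical classes, not the POC's code; ids start at 1 on the assumption that nameId 0 is reserved as the embedded-name marker in the header format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-class catalog mapping field names to small integer ids.
class NameCatalog {
    private final Map<String, Integer> idOf = new HashMap<>();
    private final List<String> nameOf = new ArrayList<>();

    // Assumption: ids start at 1 because nameId == 0 is reserved as the
    // "embedded field name" marker in the record header.
    int idFor(String name) {
        return idOf.computeIfAbsent(name, n -> {
            nameOf.add(n);
            return nameOf.size(); // first name gets 1, second gets 2, ...
        });
    }

    String nameFor(int id) {
        return nameOf.get(id - 1);
    }
}

// Hypothetical registry: one catalog per class, and documents without a class
// share the catalog of the special SCHEMALESS class.
class Catalogs {
    static final String SCHEMALESS = "SCHEMALESS";
    private final Map<String, NameCatalog> byClass = new HashMap<>();

    NameCatalog forClass(String className) {
        String key = className == null ? SCHEMALESS : className;
        return byClass.computeIfAbsent(key, k -> new NameCatalog());
    }
}
```

Every new schemaless field name grows the shared SCHEMALESS catalog, so this lookup-or-create path is where the per-record cost of a global catalog would show up.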
>
>     I've just pushed an update that enables an option to embed field
>     names in the record header for fields that are not declared in the
>     schema.  This is a per-class option, although the header format
>     would allow it to be set per field if we implemented a couple of
>     variations of ODocument.field().  In theory you could set it
>     globally by setting the option on the SCHEMALESS class.
>
>     The header format variation is simply that nameId == 0 is
>     reserved as a marker for an embedded field name.  In that case the
>     name is stored immediately after nameId as varint(string length) +
>     stringBytes.
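A sketch of how one header entry could be written and read under that rule. It assumes an unsigned LEB128-style varint (7 bits per byte, high bit as continuation) and UTF-8 field names; the POC's actual varint and string encoding may differ, and `HeaderCodec` and `FieldRef` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical encoder/decoder for one field reference in a record header.
class HeaderCodec {
    // A field reference read back from the header.
    static final class FieldRef {
        final int nameId;  // catalog id, or 0 for an embedded name
        final String name; // non-null only when nameId == 0
        FieldRef(int nameId, String name) { this.nameId = nameId; this.name = name; }
    }

    // Assumption: unsigned LEB128-style varint, 7 payload bits per byte,
    // high bit set while more bytes follow.
    static void writeVarInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static int readVarInt(ByteBuffer in) {
        int result = 0;
        int shift = 0;
        byte b;
        do {
            b = in.get();
            result |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }

    // Catalogued field: just its nameId, which must be >= 1.
    static void writeNameId(ByteArrayOutputStream out, int nameId) {
        if (nameId <= 0) throw new IllegalArgumentException("nameId must be >= 1; 0 is reserved");
        writeVarInt(out, nameId);
    }

    // Embedded field name: marker 0, then varint(byte length) + name bytes (UTF-8 assumed).
    static void writeEmbeddedName(ByteArrayOutputStream out, String name) {
        byte[] bytes = name.getBytes(StandardCharsets.UTF_8);
        writeVarInt(out, 0);
        writeVarInt(out, bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    static FieldRef readFieldRef(ByteBuffer in) {
        int nameId = readVarInt(in);
        if (nameId != 0) return new FieldRef(nameId, null);
        byte[] bytes = new byte[readVarInt(in)];
        in.get(bytes);
        return new FieldRef(0, new String(bytes, StandardCharsets.UTF_8));
    }
}
```

Under these assumptions nameId 300 encodes as the two bytes 0xAC 0x02, while the embedded name "nick" takes six bytes: 0x00, 0x04 and the four name bytes.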
>
>     commit is here:
>     https://github.com/shadders/orientdb/commit/079a87decd051290addbb294930ddce63054a19d
>
>
>
>     On 24/05/14 01:30, Luca Garulli wrote:
>>     Hi Dmitriy,
>>     Everything is still open and what Steve created is a POC to study
>>     the best serialization mechanism for OrientDB 2.0. I personally
>>     like to manage schema-less fields at the tail of the records.
>>
>>     Lvc@
>>
>>
>>     On 23 May 2014 17:21, Dmitriy Krasnikov <[email protected]
>>     <mailto:[email protected]>> wrote:
>>
>>         I tried to follow the discussion, but it went from storing
>>         field names and serialization of classes to binary
>>         serialization too quickly.
>>         What is the final decision on storing fields for strict class
>>         definitions?
>>         I am a little worried about a global field catalog in cases
>>         where millions of records are created with a schema-free
>>         layout and the system has to look up and create new fields in
>>         the catalog for each record. Separate schema-full and
>>         schema-free class definitions sound like the perfect solution.
>>         Are you still considering it for 2.0?
>>         -- 
>>
>>         ---
>>         You received this message because you are subscribed to the
>>         Google Groups "OrientDB" group.
>>         To unsubscribe from this group and stop receiving emails from
>>         it, send an email to
>>         [email protected]
>>         <mailto:[email protected]>.
>>         For more options, visit https://groups.google.com/d/optout.
>>
>>
>
