Re: [orientdb] Schema Driven Binary Serialization - draft spec

Luca Garulli Mon, 07 Apr 2014 17:13:23 -0700

Hi Steve,
Sorry for the delay in answering.

I'm really impressed by all your work: the analysis, the code and the
documentation!


I think all the OrientDB users will thank you for this contribution, even
if we're just at the beginning :-)

I very much like the idea of class versioning to avoid the massive database
updates that RDBMSs require.

I'll try to answer your open questions, and add my own thoughts at the end.


> *Things to Deal With*
> *...*
>
> *OType.ANY*
>
> No binary serializer currently exists that can handle OType.ANY.  I need
> to find out if there is existing code to determine type from untyped
> input.  I assume there must be because there is an
> ODocument.field(fieldName, value) method.
>

Don't worry about ANY. Rather, we need to support CUSTOM with a custom
implementation of serialization, or just any object that implements
Serializable or Externalizable.
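
As a rough illustration of the "any Serializable object" case, the sketch
below uses standard Java serialization to turn a value into a byte[] and
back. CustomValueCodec and its methods are hypothetical names used only for
this example; they are not OrientDB API, and how the resulting bytes would be
stored in a CUSTOM field is left open.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical helper (not part of OrientDB): converts any Serializable
// value, including Externalizable ones, to a byte[] payload and back.
final class CustomValueCodec {
    private CustomValueCodec() {}

    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the ObjectOutputStream flushes everything into the buffer.
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Class of serialized value not found", e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> tags = new ArrayList<String>(Arrays.asList("a", "b"));
        byte[] payload = toBytes(tags);
        System.out.println(fromBytes(payload));
    }
}
```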


> *Persisting additional class metadata*
>
> There is a fundamental mismatch between the way that OrientDB persists
> classes and this scheme.  Namely that each OClassVersion (the current
> equivalent of OClassImpl) is a member of an OClassSet.  Each OClassSet
> shares a table of nameId -> name mappings between all of its child
> OClassVersions.  The logical way to persist this would be:
>
> OClassSet {
>     int classId;
>     Map<Integer, String> nameIdMap;
>     List<OClassVersion> versions;
> }
>

What's the content of nameIdMap? What does nameId stand for?


> Piggybacking OClassSet on top of OClassImpl doesn't seem the right way to
> do this.
>
> Additionally there will need to be persisted a database global map of
> classId -> OClassSet.
>
> I'm open to suggestions as to how to achieve this.  These special
> documents probably cannot be persisted themselves in the binary format
> (without some ugly hacking) as the OBinarySerializer is dependent on
> looking up the OClassSet and nameIds.
>

We have a Schema that can manage this. The Schema record is marshalled like
any other record, so we can add whatever we want to it.


>
> *Removing bytes after deserialization*
>
> Lazy serialization/deserialization is quite feasible by overriding the
> various ODocument.field() methods.  i.e. when we read a record we only
> parse the header (in fact only need to parse the first section of the
> header initially).  Then if a field is requested that hasn't been retrieved
> yet we scan the header entry and deserialize.  The question is then raised,
> under what circumstances is it too expensive to hold on to the backing byte
> array rather than just deserializing the remaining fields and releasing
> it.  It would be useful if there was some mechanism to determine if the
> record is part of a large query.  Or if the OBinDocument itself provides a
> method to initiate this so that OrientDB can manage it at a lower level.
>

I'd like to explore completely avoiding the Map<String,Object> behind
ODocument's _fieldValues. In fact, with efficient marshalling/unmarshalling
we could do it on the fly.

PROS:
- Less RAM used and fewer objects for the Garbage Collector (have you ever
seen tons of Map.Entry?)
- Fewer copies of buffers: the byte[] could be the same one read from the
OStorage layer
- No need for the Level2 cache anymore: DiskCache keeps pages, so storing the
unmarshalled Document no longer makes sense

CONS:
- Slower when the same field is accessed multiple times, but in this case
developers could call field() once and store the content in a local variable

WDYT?

We could also use a hybrid approach, or different implementations of
ODocument, to let the developer decide what to use.
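
To make the byte[]-only idea concrete, here is a minimal sketch of a document
that keeps no Map and unmarshals a field only when it is requested. The class
name ByteBackedDocument, the record layout and the string-only values are all
assumptions made for this example; this is not the layout of the draft spec
nor existing OrientDB code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical document that holds only the raw record bytes.
// Assumed layout: [int fieldCount] then, per field,
// [int nameLen][name bytes][int valueLen][value bytes], all strings in UTF-8.
final class ByteBackedDocument {
    private final byte[] record;

    ByteBackedDocument(byte[] record) {
        this.record = record;
    }

    // Builds a record in the assumed layout (used here only to get test data).
    static byte[] encode(Map<String, String> fields) {
        int size = 4;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            size += 8 + utf8(e.getKey()).length + utf8(e.getValue()).length;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        out.putInt(fields.size());
        for (Map.Entry<String, String> e : fields.entrySet()) {
            byte[] name = utf8(e.getKey());
            byte[] value = utf8(e.getValue());
            out.putInt(name.length).put(name).putInt(value.length).put(value);
        }
        return out.array();
    }

    // Scans the record on every call and decodes only the requested value.
    // Returns null if the field is not present.
    String field(String name) {
        byte[] wanted = utf8(name);
        ByteBuffer in = ByteBuffer.wrap(record);
        int count = in.getInt();
        for (int i = 0; i < count; i++) {
            int nameLen = in.getInt();
            boolean match = nameLen == wanted.length && regionEquals(in.position(), wanted);
            in.position(in.position() + nameLen);
            int valueLen = in.getInt();
            if (match) {
                return new String(record, in.position(), valueLen, StandardCharsets.UTF_8);
            }
            in.position(in.position() + valueLen); // skip the value without decoding it
        }
        return null;
    }

    private boolean regionEquals(int offset, byte[] expected) {
        for (int i = 0; i < expected.length; i++) {
            if (record[offset + i] != expected[i]) {
                return false;
            }
        }
        return true;
    }

    private static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("name", "Luca");
        fields.put("city", "Rome");
        ByteBackedDocument doc = new ByteBackedDocument(encode(fields));
        String city = doc.field("city"); // read once, keep in a local variable
        System.out.println(city);
    }
}
```

Every call to field() rescans the record, which is the cost listed under
CONS; a header with per-field offsets would shorten the scan.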


> *Current Code Cleanup*
>
> *Bring back OBinHeaderEntry*
>
> OBinProperty (extends OProperty) and OBinHeaderEntry both implement
> IBinHeaderEntry.
>
> OBinHeaderEntry was merged into OBinProperty in an effort to simplify.
> For an OBinRecordHeader we clone the schema declared OBinProperties from
> the schema then add additional OBinProperties for any other fields that
> exist (which is ugly because the properties are never added to the
> schema).  OBinHeaderEntry is both lighter weight, easier to object pool and
> makes a clear distinction between mutable and immutable.
>

Cool.


>
> *Tighten up the API*
>
> Nothing public unless it needs to be exposed.
>

Agreed.

*Poor documentation*

You're right, some parts are totally undocumented. The respective authors
of those classes/snippets will spend a few hours every week improving
them.

*Partial serialization*

I'd like also to explore the partial serialization case.

I mean the case when a user executes a query, browses the result set, changes
a document field and sends it back to the database to be saved.

We now keep track of changes in the ODocument (also used by indexes to stay
aligned), so we could marshal and overwrite only the changed field in the
byte[].

This feature must go together with abandoning the Map for storing field
values and using only the byte[].
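
A minimal sketch of overwriting only the changed field, assuming a record made
of fixed-width fields at known offsets. InPlaceRecord and its layout are
hypothetical and chosen only for illustration; variable-length fields would
need shifting or a rewrite of the record, which this sketch does not address.

```java
import java.nio.ByteBuffer;

// Hypothetical record with an assumed fixed layout:
// [long id @0][int age @8][int visits @12]
final class InPlaceRecord {
    static final int ID_OFFSET = 0;
    static final int AGE_OFFSET = 8;
    static final int VISITS_OFFSET = 12;
    static final int SIZE = 16;

    private final byte[] bytes;

    InPlaceRecord(byte[] bytes) {
        this.bytes = bytes;
    }

    static byte[] encode(long id, int age, int visits) {
        return ByteBuffer.allocate(SIZE).putLong(id).putInt(age).putInt(visits).array();
    }

    long getLong(int offset) {
        return ByteBuffer.wrap(bytes).getLong(offset);
    }

    int getInt(int offset) {
        return ByteBuffer.wrap(bytes).getInt(offset);
    }

    // Overwrites only the four bytes of the changed field; the wrapped buffer
    // writes through to the backing array, so nothing else is re-marshalled.
    void setInt(int offset, int value) {
        ByteBuffer.wrap(bytes).putInt(offset, value);
    }

    byte[] raw() {
        return bytes;
    }

    public static void main(String[] args) {
        InPlaceRecord record = new InPlaceRecord(encode(7L, 40, 3));
        record.setInt(AGE_OFFSET, 41); // the one changed field
        System.out.println(record.getInt(AGE_OFFSET));
    }
}
```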

Lvc@
