Re: [orientdb] Schema driven serialization #1890

Steve Mon, 17 Feb 2014 03:48:08 -0800

Thanks Andrey,

I'm still convinced that my idea is too simple and too obvious, so I must
be missing something.

If I am, I'd love someone to tell me what I've missed so I can understand
Orient better.  That was the main reason for asking the question.

On 17/02/14 21:31, Andrey Lomakin wrote:
> Hi Steve )).
> It seems like a good idea. I will add your comment to the issue.
>
>
> On Sun, Feb 16, 2014 at 5:53 AM, Steve <[email protected]> wrote:
>
>     This is probably going to be a stupid question because the
>     solution seems so obvious I must have missed something fundamental.
>
>     I found OrientDB when I gave up on MongoDB due to the issue of
>     storing field names in every document (for a lot of my data the
>     field names are larger than the data itself).  I just came across
>     issue #1890
>     <https://github.com/orientechnologies/orientdb/issues/1890> and
>     was happy to see that Orient considers this a priority, but I don't
>     quite understand the need for such a complex approach.
>
>     Why not simply maintain an internal index of field names and store
>     the index instead of the name?  It wouldn't matter if different
>     classes shared a field name, since the name is all you are
>     interested in.  To compact things further you could use a format
>     like Google Protocol Buffers' 'varint' type
>     <https://developers.google.com/protocol-buffers/docs/encoding#varints>.
>     If you altered the varint format so that the first 'grouping' was
>     16 bits rather than 8 (15 payload bits plus the continuation bit),
>     you'd have 32k field names available before needing to expand
>     (which would cover an awful lot of use cases).
>
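
To make the encoding above concrete, here is a minimal sketch in Java of the
altered varint. It is only an illustration under my own assumptions, not how
OrientDB or protobuf encode anything: the class name FieldIdVarint, the
big-endian byte order of the first group, and the use of its top bit as the
continuation flag are all hypothetical choices.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical helper: a varint whose first group is 16 bits instead of 8.
final class FieldIdVarint {

    // First group: 2 bytes, big-endian. Top bit = "more bytes follow",
    // low 15 bits = payload. Later groups: 1 byte each, 7 payload bits
    // plus a continuation bit, as in a standard varint.
    static byte[] encode(int id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int first = id & 0x7FFF;   // low 15 bits of the id
        int rest = id >>> 15;      // whatever does not fit in the first group
        if (rest != 0) {
            first |= 0x8000;       // set the continuation flag
        }
        out.write(first >>> 8);
        out.write(first & 0xFF);
        while (rest != 0) {
            int b = rest & 0x7F;
            rest >>>= 7;
            if (rest != 0) {
                b |= 0x80;
            }
            out.write(b);
        }
        return out.toByteArray();
    }

    static int decode(ByteBuffer in) {
        int first = ((in.get() & 0xFF) << 8) | (in.get() & 0xFF);
        int value = first & 0x7FFF;
        if ((first & 0x8000) == 0) {
            return value;          // common case: the id fits in two bytes
        }
        int shift = 15;
        while (shift <= 29) {
            int b = in.get() & 0xFF;
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
            shift += 7;
        }
        throw new IllegalArgumentException("varint too long for a 32-bit id");
    }
}
```

With this layout, ids 0 to 32767 take two bytes, and larger ids grow by one
byte for each additional 7 bits.
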
>     The lookup would be as trivial as an array lookup, and any overhead
>     would be more than offset by the benefit of being able to cache
>     many more records in memory thanks to the space savings.  Another
>     potential advantage is that you would only ever use one instance
>     of each field name String, which would vastly improve any map
>     lookups done internally.  If the current format writes the actual
>     field name as a string, then every time a field is read it's
>     reading a new String, so for every field * every record where a
>     map lookup is required it must compute hashCode() and run a
>     char-by-char equals().  That's 3 traversals of the string saved
>     on the first lookup (1 for hashCode() and 1 each for the two
>     strings compared in equals()) and 2 on subsequent lookups.
>
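
A rough sketch of the field name index described above, again in Java. The
class FieldNameDictionary and its methods are hypothetical names made up for
illustration, not an existing OrientDB API. The point is that decoding an id
is a plain list lookup and always returns the same String instance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical global dictionary: one id and one String instance per field name.
final class FieldNameDictionary {
    private final List<String> namesById = new ArrayList<>();
    private final Map<String, Integer> idsByName = new HashMap<>();

    // Returns the existing id for a name, or assigns the next free one.
    synchronized int idOf(String name) {
        Integer id = idsByName.get(name);
        if (id == null) {
            id = namesById.size();
            namesById.add(name);
            idsByName.put(name, id);
        }
        return id;
    }

    // Index-based lookup; always hands back the single stored instance,
    // so later map lookups can reuse its cached hash and hit the
    // identity shortcut in equals().
    synchronized String nameOf(int id) {
        return namesById.get(id);
    }
}
```

Different classes that share a field name would simply share the same id,
since the dictionary only cares about the name itself.
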
>     On the client side I suppose there is the issue of whether the
>     client should keep the entire lookup table in memory.  It could be
>     passed portions of it as needed and use something like a Trove map
>     for lookups.  Not quite as fast as an array lookup, but again I
>     would imagine the savings in memory, bandwidth, etc. would more
>     than offset the cost.
>
>     I must be missing something?
>     -- 
>      
>     ---
>     You received this message because you are subscribed to the Google
>     Groups "OrientDB" group.
>     To unsubscribe from this group and stop receiving emails from it,
>     send an email to [email protected]
>     <mailto:orient-database%[email protected]>.
>     For more options, visit https://groups.google.com/groups/opt_out.
>
>
>
>
> -- 
> Best regards,
> Andrey Lomakin.
>
> Orient Technologies
> the Company behind OrientDB
>

