Re: [orientdb] Schema driven serialization #1890

Andrey Lomakin Mon, 17 Feb 2014 03:32:41 -0800

Hi Steve :)
It seems like a good idea; I will add your comment to the issue.


On Sun, Feb 16, 2014 at 5:53 AM, Steve <[email protected]> wrote:

> This is probably going to be a stupid question, because the solution seems
> so obvious that I must have missed something fundamental.
>
> I found OrientDB when I gave up on MongoDB due to the issue of storing field
> names in every document (for a lot of my data the field names are larger
> than the data itself). I just came across issue
> #1890 <https://github.com/orientechnologies/orientdb/issues/1890> and am
> happy to see that Orient considers this a priority, but I don't quite
> understand the need for such a complex approach.
>
> Why not simply maintain an internal index of field names and store each
> field's index in the record instead of its name? It wouldn't really matter
> if you had different classes with the same field name, since the name is
> all you are interested in. To compact things further you could use a format
> like Google Protocol Buffers' 'varint'
> type <https://developers.google.com/protocol-buffers/docs/encoding#varints>.
> If you altered the varint format so the first 'grouping' was 16 bits
> rather than 8 (15 data bits plus a continuation bit), then you'd have 32k
> field names available before needing to expand (which would cover an awful
> lot of use cases).
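
A minimal sketch of the scheme in the paragraph above, in Python for brevity (OrientDB itself is Java). It assumes one global name-to-id table shared by all classes, and a hypothetical varint variant whose first group is a 16-bit word holding 15 data bits plus a continuation bit, so ids 0 to 32,767 fit in two bytes; any further groups are 7-bit, as in protobuf varints. `FieldNameTable`, `encode_id` and `decode_id` are illustrative names, not OrientDB APIs.

```python
class FieldNameTable:
    """Hypothetical global table: one id per distinct field name, shared by all classes."""

    def __init__(self) -> None:
        self._ids: dict[str, int] = {}
        self._names: list[str] = []

    def id_for(self, name: str) -> int:
        # Assign the next free id the first time a name is seen.
        if name not in self._ids:
            self._ids[name] = len(self._names)
            self._names.append(name)
        return self._ids[name]

    def name_for(self, field_id: int) -> str:
        # Decoding an id back to its name is a plain list (array) index.
        return self._names[field_id]


def encode_id(value: int) -> bytes:
    """Varint variant: first group is a 16-bit word (15 data bits + continuation bit)."""
    if value < 0:
        raise ValueError("field ids are non-negative")
    first = value & 0x7FFF
    value >>= 15
    if value:
        first |= 0x8000  # high bit set: more groups follow
    out = bytearray(first.to_bytes(2, "big"))
    while value:
        group = value & 0x7F
        value >>= 7
        if value:
            group |= 0x80  # continuation bit, as in protobuf varints
        out.append(group)
    return bytes(out)


def decode_id(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (field id, position just past the encoded id)."""
    first = int.from_bytes(data[pos:pos + 2], "big")
    pos += 2
    value = first & 0x7FFF
    more = first & 0x8000
    shift = 15
    while more:
        group = data[pos]
        pos += 1
        value |= (group & 0x7F) << shift
        shift += 7
        more = group & 0x80
    return value, pos
```

With this layout `encode_id(32767)` takes 2 bytes and `encode_id(32768)` takes 3. Writing the 16-bit word big-endian is an arbitrary choice for the sketch.
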
>
> The lookup would be as trivial as an array lookup, and any overhead would
> be more than offset by being able to cache many more records in memory
> thanks to the space savings. Another potential advantage is that you would
> only ever use one instance of each field name String, which would speed up
> any map lookups that are done internally. If the current format writes the
> actual field name as a string, then every time a field is read it produces
> a new String, so for every field of every record where a map lookup is
> required it must compute hashCode() and run a char-by-char equals(). Sharing
> one instance saves 3 traversals of the string on the first lookup (1 for
> hashCode() and 1 for each of the two strings in equals()) and 2 on
> subsequent lookups.
>
> On the client side, I suppose there is the question of whether the client
> should keep the entire lookup table in memory. It could be sent portions
> of it as needed and use something like a Trove map for lookups. That is not
> quite as fast as an array lookup, but again I would imagine the savings in
> memory, bandwidth, etc. would more than offset the cost.
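
A minimal sketch of the client-side idea, assuming a hypothetical `fetch_names` callback that asks the server for the names of a batch of ids. A plain Python dict plays the role the message gives to a Trove map (in Java, a primitive int-keyed map); only ids the client actually encounters are transferred, and each batch of unknown ids costs one round trip.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical server call: given a list of ids, return {id: name} for each.
FetchNames = Callable[[List[int]], Dict[int, str]]


class ClientFieldNameCache:
    """Lazily populated client-side copy of part of the field-name table."""

    def __init__(self, fetch_names: FetchNames) -> None:
        self._fetch_names = fetch_names
        self._names: Dict[int, str] = {}
        self.round_trips = 0

    def resolve(self, field_ids: Iterable[int]) -> List[str]:
        ids = list(field_ids)
        # Ask the server only for ids this client has never seen, in one batch.
        missing = sorted({i for i in ids if i not in self._names})
        if missing:
            self.round_trips += 1
            self._names.update(self._fetch_names(missing))
        return [self._names[i] for i in ids]
```

For example, backing the cache with a stand-in table `["name", "email", "age"]`, resolving ids `[0, 2]` costs one round trip, and resolving `[2, 0]` afterwards is served entirely from the local map.
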
>
> Am I missing something?
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "OrientDB" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>



-- 
Best regards,
Andrey Lomakin.

Orient Technologies
the Company behind OrientDB
