Hi Steve :) That seems like a good idea. I will add your comment to the issue.

On Sun, Feb 16, 2014 at 5:53 AM, Steve <[email protected]> wrote:

> This is probably going to be a stupid question, because the solution seems
> so obvious that I must have missed something fundamental.
>
> I found OrientDB when I gave up on MongoDB due to the issue of storing
> field names in every document (for a lot of my data the field names are
> larger than the data itself). I just came across issue #1890
> <https://github.com/orientechnologies/orientdb/issues/1890> and am happy
> to see that Orient considers this a priority, but I don't quite understand
> the need for such a complex approach.
>
> Why not simply maintain an internal index of field names and store the
> index? It wouldn't really matter if you had different classes with the
> same field name, since the name is all you are interested in. To compact
> things further you could use a format like the Google Protocol Buffers
> 'varint' type
> <https://developers.google.com/protocol-buffers/docs/encoding#varints>.
> If you altered the varint format so the first 'grouping' was 16 bits
> rather than 8, then you'd have 32k field names available before needing to
> expand (which would cover an awful lot of use cases).
>
> The lookup would be as trivial as an array lookup, and any overhead would
> be more than offset by the benefit of being able to cache many more
> records in memory due to the space savings. Another potential advantage
> is that you would only ever use one instance of each field name String,
> which would vastly improve any map lookups that are done internally. If
> the current format writes the actual field name as a string, then every
> time a field is read it produces a new string, so for every field * every
> record where a map lookup is required it must compute the hashcode and run
> a manual char-by-char equals(). That is 3 traversals of the string saved
> on the first lookup (1 for the hashcode and 1 for each of the two strings)
> and 2 for subsequent lookups.
>
> On the client side I suppose there is the issue of whether the client
> should keep the entire lookup table in memory. It could be passed portions
> of it as needed and use something like a Trove map for lookups. Not quite
> as fast as an array lookup but again I would imagine the savings in memory,
> bandwidth etc would more than offset the cost.
>
> I must be missing something?
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "OrientDB" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>

--
Best regards,
Andrey Lomakin.

Orient Technologies
the Company behind OrientDB
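
For illustration, the field-name index and the modified varint proposed in the quoted message could be sketched roughly as below. This is a minimal Python sketch, not OrientDB's record format: `FieldNameIndex`, `encode_id` and `decode_id` are hypothetical names, and the byte order of the first 16-bit group (big-endian, with the continuation flag in the top bit) is an assumption made only for this example.

```python
class FieldNameIndex:
    """Hypothetical global dictionary mapping field names to small integer ids."""

    def __init__(self):
        self._ids = {}    # name -> id
        self._names = []  # id -> name (the "array lookup" from the message)

    def id_for(self, name):
        """Return the id for a name, registering the name on first use."""
        idx = self._ids.get(name)
        if idx is None:
            idx = len(self._names)
            self._ids[name] = idx
            self._names.append(name)
        return idx

    def name_for(self, idx):
        """Resolve an id back to the single shared name string."""
        return self._names[idx]


def encode_id(value):
    """Varint variant: first group is 16 bits (15 payload bits plus a
    continuation flag), later groups are ordinary 8-bit varint groups."""
    if value < 0:
        raise ValueError("field ids must be non-negative")
    first = value & 0x7FFF
    value >>= 15
    if value:
        first |= 0x8000  # flag: more groups follow
    out = bytearray(first.to_bytes(2, "big"))
    while value:
        group = value & 0x7F
        value >>= 7
        if value:
            group |= 0x80
        out.append(group)
    return bytes(out)


def decode_id(buf, pos=0):
    """Decode one id starting at pos; return (value, next position)."""
    first = int.from_bytes(buf[pos:pos + 2], "big")
    pos += 2
    value = first & 0x7FFF
    shift = 15
    more = bool(first & 0x8000)
    while more:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        more = bool(byte & 0x80)
    return value, pos


index = FieldNameIndex()
field_id = index.id_for("customer_shipping_address")
encoded = encode_id(field_id)  # 2 bytes instead of the 25-character name
decoded_id, _ = decode_id(encoded)
assert index.name_for(decoded_id) == "customer_shipping_address"
```

With this layout, ids 0 through 32767 fit in two bytes, which matches the "32k field names" figure in the message; larger ids grow by one byte per additional 7 bits.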
