Thanks Andrey, I'm still convinced that my idea is too simple and too obvious so I must be missing something.
If I am, I'd love for someone to tell me what I've missed so I can understand Orient better. That was the main reason for asking the question.

On 17/02/14 21:31, Andrey Lomakin wrote:
> Hi Steve )).
> It seems good idea, I will put your comment inside issue.
>
> On Sun, Feb 16, 2014 at 5:53 AM, Steve <[email protected]> wrote:
>
> > This is probably going to be a stupid question, because the
> > solution seems so obvious that I must have missed something
> > fundamental.
> >
> > I found OrientDB when I gave up on MongoDB due to the issue of
> > storing field names in every document (for a lot of my data the
> > field names are larger than the data itself). I just came across
> > issue #1890
> > <https://github.com/orientechnologies/orientdb/issues/1890> and am
> > happy to see that Orient considers this a priority, but I don't
> > quite understand the need for such a complex approach.
> >
> > Why not simply maintain an internal index of field names and store
> > the index in each record instead of the name? It wouldn't really
> > matter if different classes had the same field name, since the
> > name is all you are interested in. To compact things further you
> > could use a format like Google protobuf's 'varint' type
> > <https://developers.google.com/protocol-buffers/docs/encoding#varints>.
> > If you altered the varint format so that the first 'grouping' was
> > 16 bits rather than 8, you'd have 32k field names available
> > before needing to expand (which would cover an awful lot of use
> > cases).
> >
> > The lookup would be as trivial as an array lookup, and any overhead
> > would be more than offset by being able to cache many more records
> > in memory thanks to the space savings. Another potential advantage
> > is that you would only ever use one instance of each field name
> > String, which would vastly improve any map lookups that are done
> > internally. If the current format writes the actual field name as
> > a string, then every time a field is read it is read into a new
> > String, so for every field * every record where a map lookup is
> > required it must compute the hashcode and run a manual
> > char-by-char equals(). That is 3 traversals of the string saved on
> > the first lookup (1 for hashcode and 1 for both strings) and 2 for
> > subsequent lookups.
> >
> > On the client side I suppose there is the issue of whether the
> > client should keep the entire lookup table in memory. It could be
> > passed portions of it as needed and use something like a Trove map
> > for lookups. Not quite as fast as an array lookup, but again I
> > would imagine the savings in memory, bandwidth etc. would more than
> > offset the cost.
> >
> > I must be missing something?
>
> --
> Best regards,
> Andrey Lomakin.
>
> Orient Technologies
> the Company behind OrientDB

--

---
You received this message because you are subscribed to the Google Groups "OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.
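
To make the proposal in the quoted message concrete, here is a rough Java sketch of the two pieces it describes: an index that maps each field name to a small integer id (and back, via an array-style lookup that always hands out the same String instance), and a protobuf-style varint encoding for writing those ids compactly. The class and method names (`FieldNameDictionary`, `idFor`, `nameFor`, `writeVarint`, `readVarint`) are made up for illustration and are not OrientDB's API or record format. It uses the standard varint with 7 payload bits per byte, not the modified 16-bit first grouping suggested above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only: not OrientDB's actual storage format or API. */
final class FieldNameDictionary {
    private final Map<String, Integer> idsByName = new HashMap<>();
    private final List<String> namesById = new ArrayList<>();

    /** Returns the existing id for a field name, or assigns the next free one. */
    public synchronized int idFor(String name) {
        Integer id = idsByName.get(name);
        if (id == null) {
            id = namesById.size();
            namesById.add(name);
            idsByName.put(name, id);
        }
        return id;
    }

    /** Index-based lookup; returns the same String instance for a given id every time. */
    public synchronized String nameFor(int id) {
        return namesById.get(id);
    }

    /** Writes a non-negative int as a base-128 varint (7 payload bits per byte). */
    public static void writeVarint(ByteArrayOutputStream out, int value) {
        if (value < 0) {
            throw new IllegalArgumentException("field ids must be non-negative");
        }
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // high bit set: more bytes follow
            value >>>= 7;
        }
        out.write(value); // last byte has the high bit clear
    }

    /** Reads a varint written by writeVarint, starting at the buffer's current position. */
    public static int readVarint(ByteBuffer in) {
        int result = 0;
        int shift = 0;
        while (true) {
            byte b = in.get();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
            if (shift > 28) {
                throw new IllegalStateException("varint is too long for an int");
            }
        }
    }
}
```

With this encoding, ids 0 to 127 take one byte and ids up to 16383 take two, so a record would store a byte or two per field in place of the full name. A reader would call `readVarint` and then `nameFor` to recover the shared String.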
