This is probably going to be a stupid question, because the solution seems so obvious that I must have missed something fundamental.
I found OrientDB when I gave up on MongoDB due to the issue of storing field names in every document (for a lot of my data the field names are larger than the data itself). I just came across issue #1890 <https://github.com/orientechnologies/orientdb/issues/1890> and was happy to see that Orient considers this a priority, but I don't quite understand the need for such a complex approach.

Why not simply maintain an internal index of field names and store the index instead of the name? It wouldn't really matter if different classes had the same field name, since the name is all you are interested in. To compact things further you could use a format like the Google Protocol Buffers 'varint' type <https://developers.google.com/protocol-buffers/docs/encoding#varints>. A standard varint spends 1 bit of each byte on a continuation flag and uses the other 7 for payload. If you altered the format so the first 'group' was 16 bits rather than 8, you'd have 15 payload bits, i.e. 32,768 (2^15) field names, before needing to expand, which would cover an awful lot of use cases.

The lookup would be as trivial as an array lookup, and any overhead would be more than offset by being able to cache many more records in memory thanks to the space savings.

Another potential advantage is that you would only ever use one instance of each field name String, which would vastly improve any map lookups done internally. If the current format writes the actual field name as a string, then every time a field is read a new String is created, so for every field of every record where a map lookup is required, it must compute the hashcode and run a char-by-char equals(). Sharing one instance would save 3 traversals of the string on the first lookup (1 for the hashcode and 1 for each of the two strings compared in equals()) and 2 on subsequent lookups. With a single shared instance, String's cached hashcode avoids recomputing the hash, and equals() returns immediately when both references point to the same object.

On the client side I suppose there is the question of whether the client should keep the entire lookup table in memory. It could be passed portions of it as needed and use something like a Trove map for lookups.
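A minimal sketch of the field-name index idea, assuming a single global dictionary shared by all classes. `FieldNameDictionary`, `ClientFieldCache` and their methods are hypothetical names for illustration, not OrientDB APIs. The server side maps id to name with an indexed list and name to id with a HashMap, so every decoded field name is the same canonical String instance. The client side holds only the portions it has been sent, using a plain HashMap as a stand-in for a primitive-keyed Trove map to avoid a third-party dependency.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FieldNameDemo {

    // Hypothetical server-side dictionary: one global id per distinct field name,
    // regardless of which class the field belongs to.
    static final class FieldNameDictionary {
        private final List<String> idToName = new ArrayList<>();
        private final Map<String, Integer> nameToId = new HashMap<>();

        // Returns the existing id for this name, or assigns the next free one.
        synchronized int idFor(String name) {
            Integer id = nameToId.get(name);
            if (id != null) {
                return id;
            }
            int newId = idToName.size();
            idToName.add(name);          // this instance becomes the canonical String
            nameToId.put(name, newId);
            return newId;
        }

        // Decoding is a plain indexed lookup and always returns the same String instance.
        synchronized String nameFor(int id) {
            return idToName.get(id);
        }

        // Copies ids [fromId, toId), e.g. to send one portion of the table to a client.
        synchronized Map<Integer, String> slice(int fromId, int toId) {
            Map<Integer, String> part = new HashMap<>();
            for (int i = fromId; i < toId && i < idToName.size(); i++) {
                part.put(i, idToName.get(i));
            }
            return part;
        }
    }

    // Hypothetical client-side cache holding only the portions it has been sent.
    static final class ClientFieldCache {
        private final Map<Integer, String> known = new HashMap<>();

        void merge(Map<Integer, String> part) {
            known.putAll(part);
        }

        // Returns null when the id has not been fetched yet; a real client would
        // then request the missing portion from the server.
        String nameFor(int id) {
            return known.get(id);
        }
    }

    public static void main(String[] args) {
        FieldNameDictionary dict = new FieldNameDictionary();
        int name = dict.idFor("name");
        int email = dict.idFor("email");
        int nameAgain = dict.idFor(new String("name")); // different instance, same id

        System.out.println(name + " " + email + " " + nameAgain); // prints "0 1 0"
        // Every decode returns the canonical instance, so equals() can short-circuit on ==.
        System.out.println(dict.nameFor(nameAgain) == dict.nameFor(name)); // prints "true"

        ClientFieldCache client = new ClientFieldCache();
        client.merge(dict.slice(0, 1));
        System.out.println(client.nameFor(0) + " " + client.nameFor(1)); // prints "name null"
    }
}
```

The `synchronized` methods are only there to keep the sketch safe if several threads register names; a real storage engine would likely choose a different concurrency scheme. On the server, decoding is an indexed list lookup, while the client cache trades that for a hash lookup on an Integer key.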
Not quite as fast as an array lookup, but again I would imagine the savings in memory, bandwidth, etc. would more than offset the cost. I must be missing something?
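Returning to the encoding idea, here is a minimal sketch of a varint whose first group is 16 bits instead of 8. The exact layout is an assumption for illustration: the first group is written as two bytes, high byte first, with the continuation flag in its top bit and 15 payload bits below it; any remaining bits follow as ordinary protobuf-style varint bytes (7 payload bits plus a continuation bit). `WideVarint` is a hypothetical name, not part of OrientDB or Protocol Buffers.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public final class WideVarint {

    private WideVarint() {}

    // Encodes a non-negative int. Ids 0..32767 fit in the 16-bit first group
    // and take exactly 2 bytes; larger ids spill into standard 7-bit varint bytes.
    public static byte[] encode(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative ids not supported");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int first = value & 0x7FFF;          // low 15 payload bits
        int rest = value >>> 15;
        if (rest != 0) {
            first |= 0x8000;                 // continuation flag in the group's top bit
        }
        out.write(first >>> 8);              // high byte first, so the flag lands in byte 0
        out.write(first & 0xFF);
        while (rest != 0) {                  // standard varint for the remaining bits
            int b = rest & 0x7F;
            rest >>>= 7;
            if (rest != 0) {
                b |= 0x80;
            }
            out.write(b);
        }
        return out.toByteArray();
    }

    // Decodes a value written by encode(), starting at offset 0.
    public static int decode(byte[] buf) {
        int first = ((buf[0] & 0xFF) << 8) | (buf[1] & 0xFF);
        int value = first & 0x7FFF;
        if ((first & 0x8000) == 0) {
            return value;
        }
        int shift = 15;
        int pos = 2;
        while (true) {
            int b = buf[pos++] & 0xFF;
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
            shift += 7;
        }
    }

    public static void main(String[] args) {
        for (int id : new int[] {0, 300, 32767, 32768, 1_000_000}) {
            byte[] enc = encode(id);
            System.out.println(id + " -> " + enc.length + " bytes " + Arrays.toString(enc)
                    + " -> " + decode(enc));
        }
    }
}
```

The trade-off compared with a standard varint: ids 0..127 cost 2 bytes here instead of 1, but ids 16384..32767 cost 2 bytes instead of 3, and every id up to 32767 has a fixed 2-byte size.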
