+1 for this (if someone is counting); it's very relevant for our use case (schema-mixed).
Regards,
-Stefan

On Tuesday, 18 February 2014 13:32:24 UTC, Steve Coughlan wrote:
>
> Curious to know: is there currently a 'defrag' tool or something of that
> nature? If so, that would be the ideal place to insert the schema
> consolidation process.
>
> On Sunday, February 16, 2014 1:53:27 PM UTC+10, Steve Coughlan wrote:
>>
>> This is probably going to be a stupid question, because the solution
>> seems so obvious I must have missed something fundamental.
>>
>> I found OrientDB when I gave up on MongoDB due to the issue of storing
>> field names in every document (for a lot of my data the field names are
>> larger than the data itself). I just came across issue #1890
>> <https://github.com/orientechnologies/orientdb/issues/1890> and was
>> happy to see that Orient considers this a priority, but I don't quite
>> understand the need for such a complex approach.
>>
>> Why not simply maintain an internal index of field names and store the
>> index? It wouldn't really matter if you had different classes with the
>> same field name, since the name is all you are interested in. To further
>> compact things you could use a format like Google protobuf's 'varint'
>> type <https://developers.google.com/protocol-buffers/docs/encoding#varints>.
>> If you altered the varint format so the first byte 'grouping' was 16
>> bits rather than 8, you'd have 32k field names available before needing
>> to expand (which would cover an awful lot of use cases).
>>
>> The lookup would be as trivial as an array lookup, and any overhead
>> would be more than offset by the benefit of being able to cache many
>> more records in memory due to the space savings. Another potential
>> advantage is that you would only ever use one instance of each
>> field-name String, vastly improving any map lookups that are done
>> internally. If the current format writes the actual field name as a
>> string, then every time a field is read a new string is created, so for
>> every field of every record where a map lookup is required, the map must
>> compute the hashcode and run a manual char-by-char equals(). Three
>> traversals of the string are saved on the first lookup (one for the
>> hashcode and one for each of the two strings in equals()) and two on
>> every subsequent lookup.
>>
>> On the client side, I suppose there is the issue of whether the client
>> should keep the entire lookup table in memory. It could be passed
>> portions of it as needed and use something like a Trove map for lookups.
>> Not quite as fast as an array lookup, but again I would imagine the
>> savings in memory, bandwidth, etc. would more than offset the cost.
>>
>> I must be missing something?
>>
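
To make the field-name index from the quoted proposal concrete, here is a
minimal sketch in Java. This is not OrientDB's implementation; the class and
method names are hypothetical, and a real version would also need persistence
and thread safety:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical field-name dictionary: maps each distinct field name to
    // a small integer id so that records need only store the id.
    final class FieldNameIndex {
        private final Map<String, Integer> idsByName = new HashMap<>();
        private final List<String> namesById = new ArrayList<>();

        // Returns the existing id for this name, or assigns the next one.
        int idFor(String name) {
            Integer id = idsByName.get(name);
            if (id == null) {
                id = namesById.size();
                namesById.add(name);
                idsByName.put(name, id);
            }
            return id;
        }

        // Decoding a record is a plain array lookup, and it always hands
        // back the same canonical String instance.
        String nameFor(int id) {
            return namesById.get(id);
        }
    }

Because nameFor() always returns the same instance, the interning benefit
described above falls out for free: java.util.HashMap compares keys by
reference before calling equals(), and String caches its hashcode, so
repeated lookups keyed by a canonical field name skip the char-by-char
comparison entirely.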
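
The modified varint is also easy to sketch. The layout below (one
continuation bit plus 15 payload bits in the 16-bit first group, then
standard 7-bits-per-byte groups) is one reading of the proposal, not a
defined format:

    import java.io.ByteArrayOutputStream;

    final class WideVarint {
        // Encode a non-negative id: the first group is 16 bits (1
        // continuation bit + 15 payload bits); later groups are ordinary
        // varint bytes (1 continuation bit + 7 payload bits).
        static byte[] encode(int value) {
            if (value < 0) throw new IllegalArgumentException("negative id");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int rest = value >>> 15;
            int first = (rest != 0 ? 0x8000 : 0) | (value & 0x7FFF);
            out.write(first >>> 8);          // first group, big-endian
            out.write(first & 0xFF);
            while (rest != 0) {              // low 7 bits first, like protobuf
                int bits = rest & 0x7F;
                rest >>>= 7;
                out.write((rest != 0 ? 0x80 : 0) | bits);
            }
            return out.toByteArray();
        }

        // Decode a value produced by encode(), starting at 'offset'.
        static int decode(byte[] buf, int offset) {
            int first = ((buf[offset] & 0xFF) << 8) | (buf[offset + 1] & 0xFF);
            int value = first & 0x7FFF;
            if ((first & 0x8000) != 0) {
                int shift = 15, i = offset + 2, b;
                do {
                    b = buf[i++] & 0xFF;
                    value |= (b & 0x7F) << shift;
                    shift += 7;
                } while ((b & 0x80) != 0);
            }
            return value;
        }
    }

Every id below 32768 then costs exactly two bytes, which is where the "32k
field names available before needing to expand" figure comes from.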
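
For the client side, a partial lookup table along the lines suggested above
might look like the following, assuming Trove's primitive-keyed
TIntObjectHashMap (gnu.trove). The fetchNameFromServer() call is a
hypothetical placeholder for whatever wire request the protocol would
actually use, not an existing API:

    import gnu.trove.map.hash.TIntObjectHashMap;

    // Hypothetical client-side cache holding only the portion of the
    // id -> field-name table seen so far, filled lazily on cache misses.
    final class ClientFieldNames {
        private final TIntObjectHashMap<String> cache =
                new TIntObjectHashMap<String>();

        String resolve(int id) {
            String name = cache.get(id);         // primitive int key, no boxing
            if (name == null) {
                name = fetchNameFromServer(id);  // placeholder for a wire call
                cache.put(id, name);
            }
            return name;
        }

        private String fetchNameFromServer(int id) {
            // Hypothetical: ask the server for the mapping of this one id.
            throw new UnsupportedOperationException("not sketched here");
        }
    }

Not quite an array lookup, as the post says, but an int-keyed open-addressing
map avoids boxing and keeps the per-lookup cost small.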
