Hi Steve,
My thoughts are also about improving performance when users make use of the
schema-full/mixed feature. The point is: why should I store the field name
when I've declared that a class has such names? Example:
// Employee is declared as a persistent class in the schema
class Employee {
  String name;
  String surname;
  int age;
}
My idea is to assign an id (short or integer) to each property and use that
id instead of the name. This would dramatically reduce record sizes and
memory consumption. We have to figure out a scheme where:
- schema-full -> best performance
- schema-mixed -> uses the schema's field ids for declared fields, then falls back to schema-free
- schema-free -> close to the current behavior: all the field names are stored in the record
We're also thinking about the best way to store fields and their values.
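A minimal Java sketch of the schema-mixed case, assuming a hypothetical per-class table that maps each declared property of `Employee` to a small numeric id. Declared fields are written as a marker byte plus the id; undeclared fields fall back to the full name, as in schema-free mode. The byte layout and the names `SchemaMixedSketch` and `fieldHeaderBytes` are illustrative only, not OrientDB's record format, and only the bytes spent on field identifiers are counted (values are ignored).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical schema-mixed field headers: declared properties are written as an id,
 * undeclared ones as their full name. Illustrative only, not OrientDB's format.
 */
class SchemaMixedSketch {

    // Hypothetical per-class schema: declared property name -> numeric id.
    static final Map<String, Integer> EMPLOYEE_SCHEMA = new HashMap<>();
    static {
        EMPLOYEE_SCHEMA.put("name", 0);
        EMPLOYEE_SCHEMA.put("surname", 1);
        EMPLOYEE_SCHEMA.put("age", 2);
    }

    /** Returns the number of bytes spent on field identifiers (values are not written). */
    static int fieldHeaderBytes(Map<String, Object> record, Map<String, Integer> schema) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String field : record.keySet()) {
            Integer id = schema.get(field);
            if (id != null) {
                // Declared property: marker byte 1, then a 1-byte id (enough for this toy schema).
                out.write(1);
                out.write(id);
            } else {
                // Undeclared property: marker byte 0, a length byte (names assumed
                // shorter than 256 bytes), then the UTF-8 bytes of the name.
                byte[] name = field.getBytes(StandardCharsets.UTF_8);
                out.write(0);
                out.write(name.length);
                out.write(name, 0, name.length);
            }
        }
        return out.size();
    }

    public static void main(String[] args) {
        Map<String, Object> employee = new LinkedHashMap<>();
        employee.put("name", "John");
        employee.put("surname", "Smith");
        employee.put("age", 42);
        employee.put("nickname", "JS"); // not declared in the schema

        System.out.println("schema-mixed header bytes: "
                + fieldHeaderBytes(employee, EMPLOYEE_SCHEMA));      // 16
        System.out.println("schema-free header bytes:  "
                + fieldHeaderBytes(employee, new HashMap<>()));      // 30
    }
}
```

For this sample record the field headers take 16 bytes in schema-mixed mode versus 30 when every name is stored; a schema-full record with no undeclared fields would pay only the id bytes.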
Lvc@
On 17 February 2014 12:46, Steve <[email protected]> wrote:
> Thanks Andrey,
>
> I'm still convinced that my idea is too simple and too obvious, so I must
> be missing something.
>
> If I am, I'd love someone to tell me what I've missed so I can understand
> Orient better. That was the main reason for asking the question.
>
>
> On 17/02/14 21:31, Andrey Lomakin wrote:
>
> Hi Steve )).
> It seems like a good idea; I will add your comment to the issue.
>
>
> On Sun, Feb 16, 2014 at 5:53 AM, Steve <[email protected]> wrote:
>
>> This is probably going to be a stupid question, because the solution
>> seems so obvious that I must have missed something fundamental.
>>
>> I found OrientDB when I gave up on MongoDB due to the issue of storing field
>> names in every document (for a lot of my data the field names are larger
>> than the data itself). I just came across issue
>> #1890 <https://github.com/orientechnologies/orientdb/issues/1890> and am happy
>> to see that Orient considers this a priority, but I don't quite
>> understand the need for such a complex approach.
>>
>> Why not simply maintain an internal index of field names and store each
>> name's index in the record instead of the name? It wouldn't really matter if
>> you had different classes with the same field name, since the name is all
>> you are interested in. To compact things further you could use a format like
>> the Google Protocol Buffers 'varint' type
>> <https://developers.google.com/protocol-buffers/docs/encoding#varints>.
>> If you altered the varint format so that the first 'grouping' was 16 bits
>> rather than 8 (15 payload bits plus the continuation bit), you'd have 32k
>> field names available before needing to expand (which would cover an awful
>> lot of use cases).
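A minimal Java sketch of the modified varint suggested here, assuming (this is not protobuf's actual wire format) that the first group is a big-endian 16-bit word carrying a continuation bit plus 15 payload bits, and that any following groups are ordinary protobuf-style bytes with a continuation bit plus 7 payload bits. `FieldIdVarint`, `encode` and `decode` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/**
 * Hypothetical modified varint: the first group is 16 bits (1 continuation bit +
 * 15 payload bits); each following group is 1 byte (1 continuation bit + 7 payload bits).
 */
class FieldIdVarint {

    static byte[] encode(int id) {
        if (id < 0) throw new IllegalArgumentException("field ids are non-negative");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int first = id & 0x7FFF;          // low 15 bits go into the 16-bit first group
        int rest = id >>> 15;             // remaining high bits, if any
        if (rest != 0) first |= 0x8000;   // continuation flag: more groups follow
        out.write(first >>> 8);           // first group, big-endian
        out.write(first & 0xFF);
        while (rest != 0) {
            int group = rest & 0x7F;      // next 7 payload bits
            rest >>>= 7;
            if (rest != 0) group |= 0x80; // continuation flag on this byte
            out.write(group);
        }
        return out.toByteArray();
    }

    static int decode(ByteBuffer in) {
        int first = ((in.get() & 0xFF) << 8) | (in.get() & 0xFF);
        int value = first & 0x7FFF;
        int shift = 15;
        boolean more = (first & 0x8000) != 0;
        while (more) {
            int b = in.get() & 0xFF;
            value |= (b & 0x7F) << shift;
            shift += 7;
            more = (b & 0x80) != 0;
        }
        return value;
    }

    public static void main(String[] args) {
        for (int id : new int[] {0, 127, 32767, 32768, 1_000_000}) {
            byte[] bytes = encode(id);
            int back = decode(ByteBuffer.wrap(bytes));
            System.out.println(id + " -> " + bytes.length + " bytes, decodes to " + back);
        }
    }
}
```

Running it reports 2 bytes for ids 0, 127 and 32767, and 3 bytes for 32768 and 1000000, each decoding back to the original value. For comparison, a standard protobuf varint already fits ids below 128 in one byte and ids below 16,384 in two, so the 16-bit first group mainly trades the one-byte case for a later two-to-three-byte boundary.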
>>
>> The lookup would be as trivial as an array lookup, and any overhead would
>> be more than offset by the benefit of being able to cache many more
>> records in memory thanks to the space savings. Another potential advantage
>> is that you would only ever use one instance of each field name String,
>> which would vastly improve any map lookups done internally. If the current
>> format writes the actual field name as a string, then every time a field is
>> read a new String is created, so for every field of every record where a
>> map lookup is required it must compute hashCode() and run a char-by-char
>> equals(). A shared instance saves 3 traversals of the string on the first
>> lookup (1 for hashCode() and 1 for each of the two strings in equals()) and
>> 2 on subsequent lookups (String caches its hash code).
>>
>> On the client side, I suppose there is the question of whether the client
>> should keep the entire lookup table in memory. It could be passed portions
>> of it as needed and use something like a Trove map for lookups. That is not
>> quite as fast as an array lookup, but again I would imagine the savings in
>> memory, bandwidth, etc. would more than offset the cost.
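A minimal Java sketch of a client that keeps only the part of the lookup table it has needed so far, assuming a hypothetical `fetchFromServer` callback that stands in for a server round trip and returns one name per id (a real protocol would more likely ship portions of the table in batches). It uses `java.util.HashMap` rather than Trove, so the `int` ids are boxed; a primitive-keyed map such as Trove's would avoid that.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

/**
 * Hypothetical client-side cache of the field-name table: ids are resolved locally
 * when known, otherwise fetched on demand and remembered.
 */
class ClientFieldNameCache {
    private final Map<Integer, String> known = new HashMap<>();
    private final IntFunction<String> fetchFromServer; // stand-in for a server round trip
    private int fetches = 0;

    ClientFieldNameCache(IntFunction<String> fetchFromServer) {
        this.fetchFromServer = fetchFromServer;
    }

    String nameOf(int id) {
        String name = known.get(id);
        if (name == null) {
            // Not cached yet: ask the server once, then serve it locally.
            name = fetchFromServer.apply(id);
            fetches++;
            known.put(id, name);
        }
        return name;
    }

    int fetchCount() {
        return fetches;
    }

    public static void main(String[] args) {
        String[] serverTable = {"name", "surname", "age"};
        ClientFieldNameCache cache = new ClientFieldNameCache(id -> serverTable[id]);
        for (int id : new int[] {1, 1, 2, 1}) {
            System.out.println(id + " -> " + cache.nameOf(id));
        }
        System.out.println("server fetches: " + cache.fetchCount()); // 2
    }
}
```

With the sample ids 1, 1, 2, 1 only two server fetches happen; repeated ids are served from the local map.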
>>
>> I must be missing something?
>> --
>>
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "OrientDB" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
>
>
> --
> Best regards,
> Andrey Lomakin.
>
> Orient Technologies
> the Company behind OrientDB
>