I am particularly interested in this topic too. I use schema-mixed mode in 
my application, and this technique would improve both performance and disk usage.

Staying on the topic, I would like to be able to choose where some 
properties are stored.
Take as an example how MySQL handles TEXT and BLOB: those fields can be 
stored off the table, with the table just holding a pointer to the location 
of the actual storage.

I would like to decide, at class creation and/or field creation time, how 
fields are managed.
For example, I would store declared schema fields in the class structure 
file and keep a pointer to undeclared fields (stored outside the class file).
This way, classes could contain only "hot fields" (fields used in SELECT 
WHERE clauses) and pointers to "cold fields".
Obviously, a query on a field outside the class structure would be more 
expensive.
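A minimal Java sketch of the hot/cold split described above. Everything in it is hypothetical: `ColdStore` stands in for whatever off-record storage would hold undeclared fields, and the record keeps only a `long` pointer into it. It is not OrientDB's API; it only illustrates why reading a cold field costs one extra lookup.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical off-record storage for "cold" (undeclared) fields.
final class ColdStore {
    private final Map<Long, Map<String, Object>> chunks = new HashMap<>();
    private long nextPointer = 1;

    long put(Map<String, Object> coldFields) {
        long pointer = nextPointer++;
        chunks.put(pointer, new HashMap<>(coldFields));
        return pointer;
    }

    Map<String, Object> get(long pointer) {
        return chunks.get(pointer);
    }
}

// A record that keeps declared ("hot") fields inline and only a pointer
// to its undeclared ("cold") fields.
final class HotColdRecord {
    static final long NO_COLD_FIELDS = -1;

    final Map<String, Object> hot = new LinkedHashMap<>();
    long coldPointer = NO_COLD_FIELDS;

    static HotColdRecord split(Map<String, Object> doc, Set<String> declared, ColdStore store) {
        HotColdRecord record = new HotColdRecord();
        Map<String, Object> cold = new HashMap<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (declared.contains(e.getKey())) {
                record.hot.put(e.getKey(), e.getValue());
            } else {
                cold.put(e.getKey(), e.getValue());
            }
        }
        if (!cold.isEmpty()) {
            record.coldPointer = store.put(cold);
        }
        return record;
    }

    Object field(String name, ColdStore store) {
        if (hot.containsKey(name)) {
            return hot.get(name);                  // hot path: value is in the record
        }
        if (coldPointer == NO_COLD_FIELDS) {
            return null;
        }
        return store.get(coldPointer).get(name);   // cold path: one extra indirection
    }
}

class HotColdDemo {
    public static void main(String[] args) {
        ColdStore store = new ColdStore();
        Map<String, Object> doc = new HashMap<>();
        doc.put("name", "Ada");
        doc.put("age", 36);
        doc.put("notes", "long free-form text");

        HotColdRecord record = HotColdRecord.split(doc, Set.of("name", "age"), store);
        System.out.println(record.hot.containsKey("notes")); // false
        System.out.println(record.field("notes", store));    // long free-form text
    }
}
```

Queries filtering on `name` or `age` only touch the record itself; anything on `notes` has to follow the pointer first.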

WDYT?

F.


On Monday, February 17, 2014 12:21:58 PM UTC, Lvc@ wrote:
>
> Hi Steve,
> my thoughts are also about improving performance when users make use of 
> the schema-full/mixed feature. The point is: why should I store the field 
> name when I've declared that a class has those fields? Example:
>
> // "persistent": the class and its properties are declared in the schema
> class Employee {
>   String name;
>   String surname;
>   int age;
> }
>
> My idea is to assign an id (short or integer) to each property and use 
> that id instead of the name. This would dramatically reduce record sizes 
> and memory consumption. We have to figure out a way where:
> - schema-full -> best performance
> - schema-mixed -> uses schema fields when declared, then goes schema-free
> - schema-free -> close to now: all the field names are stored in the record
> Then we're thinking about the best way to store fields and values.
>
> Lvc@
>
>
>
>
> On 17 February 2014 12:46, Steve <[email protected]> wrote:
>
>>  Thanks Andrey,
>>
>> I'm still convinced that my idea is too simple and too obvious, so I must 
>> be missing something.
>>
>> If I am, I'd love someone to tell me what I've missed so I can understand 
>> Orient better. That was the main reason for asking the question.
>>
>>
>> On 17/02/14 21:31, Andrey Lomakin wrote:
>>  
>> Hi Steve )). 
>> It seems a good idea; I will put your comment in the issue.
>>  
>>
>> On Sun, Feb 16, 2014 at 5:53 AM, Steve <[email protected]> wrote:
>>
>>>  This is probably going to be a stupid question because the solution 
>>> seems so obvious I must have missed something fundamental.
>>>
>>> I found OrientDB when I gave up on MongoDB due to the issue of storing 
>>> field names in every document (for a lot of my data, the field names are 
>>> larger than the data itself). I just came across issue 
>>> #1890 <https://github.com/orientechnologies/orientdb/issues/1890> and was 
>>> happy to see that Orient considers this a priority, but I don't quite 
>>> understand the need for such a complex approach.
>>>
>>> Why not simply maintain an internal index of field names and store each 
>>> name's index (an integer id) instead of the name? It wouldn't really 
>>> matter if you had different classes with the same field name, since the 
>>> name is all you are interested in. To further compact things, you could 
>>> use a format like Google protobuf's 'varint' 
>>> type <https://developers.google.com/protocol-buffers/docs/encoding#varints>. 
>>> If you altered the varint format so that the first 'group' was 16 bits 
>>> rather than 8, you'd have 32k field names (15 payload bits plus one 
>>> continuation bit) available before needing to expand, which would cover 
>>> an awful lot of use cases.
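A sketch of the modified varint described above, under one possible reading: the first group is 2 bytes carrying a continuation bit plus 15 payload bits, and any further groups are ordinary protobuf-style 7-bit groups, least significant first. The byte order inside the first group and the class name `FieldIdVarint` are assumptions for illustration.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical encoding: first group is 16 bits (1 continuation bit + 15 payload bits),
// following groups are standard varint bytes (1 continuation bit + 7 payload bits).
final class FieldIdVarint {
    private FieldIdVarint() {}

    static byte[] encode(int fieldId) {
        if (fieldId < 0) {
            throw new IllegalArgumentException("field ids must be non-negative");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int rest = fieldId >>> 15;
        int first = (fieldId & 0x7FFF) | (rest != 0 ? 0x8000 : 0);
        out.write(first >>> 8);   // continuation bit + high 7 payload bits
        out.write(first & 0xFF);  // low 8 payload bits
        while (rest != 0) {
            int group = rest & 0x7F;
            rest >>>= 7;
            out.write(rest != 0 ? (group | 0x80) : group);
        }
        return out.toByteArray();
    }

    static int decode(byte[] buf) {
        int first = ((buf[0] & 0xFF) << 8) | (buf[1] & 0xFF);
        int value = first & 0x7FFF;
        if ((first & 0x8000) == 0) {
            return value;          // ids 0..32767 fit in the 2-byte first group
        }
        int shift = 15;
        int pos = 2;
        while (true) {
            int b = buf[pos++] & 0xFF;
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
            shift += 7;
        }
    }
}

class FieldIdVarintDemo {
    public static void main(String[] args) {
        System.out.println(FieldIdVarint.encode(5).length);      // 2
        System.out.println(FieldIdVarint.encode(32767).length);  // 2
        System.out.println(FieldIdVarint.encode(32768).length);  // 3
        System.out.println(FieldIdVarint.decode(FieldIdVarint.encode(1_000_000))); // 1000000
    }
}
```

The trade-off: a standard protobuf varint stores ids below 128 in one byte but needs 3 bytes for an id like 32767, while this variant always spends 2 bytes and stays at 2 all the way up to 32767.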
>>>
>>> The lookup would be as trivial as an array lookup, and any overhead would 
>>> be more than offset by the benefit of being able to cache many more 
>>> records in memory thanks to the space savings. Another potential 
>>> advantage is that you would only ever use one instance of each field-name 
>>> String, which would vastly improve any map lookups done internally. If 
>>> the current format writes the actual field name as a string, then every 
>>> time a field is read a new String is created, so for every field of every 
>>> record where a map lookup is required, the map must compute the hashCode 
>>> and run a char-by-char equals(). That saves 3 traversals of the string on 
>>> the first lookup (1 for the hashCode and 1 for each of the two strings 
>>> compared in equals()) and 2 on subsequent lookups.
>>>
>>> On the client side, I suppose there is the issue of whether the client 
>>> should keep the entire lookup table in memory. It could be sent portions 
>>> of it as needed and use something like a Trove map for lookups. Not quite 
>>> as fast as an array lookup, but again I would imagine the savings in 
>>> memory, bandwidth, etc. would more than offset the cost.
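A sketch of the partial client-side table, assuming the server can resolve one id per request; the `IntFunction` stands in for that hypothetical round trip, and `ClientFieldCache` is an illustrative name. It uses `java.util.HashMap` to stay dependency-free; a primitive-keyed map such as Trove's `TIntObjectHashMap` would avoid boxing the `int` keys.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical client-side cache holding only the part of the field-name table
// the client has actually needed so far.
final class ClientFieldCache {
    private final Map<Integer, String> known = new HashMap<>();
    private final IntFunction<String> fetchFromServer; // assumed: one round trip per unknown id
    private int roundTrips = 0;

    ClientFieldCache(IntFunction<String> fetchFromServer) {
        this.fetchFromServer = fetchFromServer;
    }

    String nameOf(int id) {
        String name = known.get(id);
        if (name == null) {
            name = fetchFromServer.apply(id); // miss: ask the server once
            roundTrips++;
            known.put(id, name);
        }
        return name;
    }

    int roundTrips() {
        return roundTrips;
    }
}

class ClientFieldCacheDemo {
    public static void main(String[] args) {
        String[] serverTable = {"name", "surname", "age"};
        ClientFieldCache cache = new ClientFieldCache(id -> serverTable[id]);
        cache.nameOf(1);
        cache.nameOf(1);
        cache.nameOf(2);
        System.out.println(cache.roundTrips()); // 2
    }
}
```

A real client would more likely fetch ranges of ids per request, but the cost model is the same: a miss costs one round trip, and every later lookup is local.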
>>>
>>> I must be missing something?
>>>  -- 
>>>  
>>> --- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "OrientDB" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>  
>>
>>
>>  -- 
>> Best regards,
>> Andrey Lomakin.
>>
>> Orient Technologies
>> the Company behind OrientDB
>>
>
>
