Re: [orientdb] Schema driven serialization #1890

Luca Garulli Mon, 17 Feb 2014 11:12:36 -0800

Hi Fabrizio,
this is already possible by creating a separate record linked to the
principal one, so I wouldn't add more complexity here for something you can
already do with the standard API.
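A minimal sketch of the workaround described above, written in plain Java rather than against the OrientDB API: the principal record keeps its frequently used fields and stores only a link to a separate record that holds the bulky ones. `LinkedRecordStore`, `save`, `load` and `follow` are hypothetical stand-ins for a database; the `#cluster:position` id format mirrors OrientDB record ids, but the cluster numbers used with it are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a database: record id -> fields.
class LinkedRecordStore {
    private final Map<String, Map<String, Object>> records = new HashMap<>();
    private final Map<Integer, Integer> nextPosition = new HashMap<>();

    // Saves a record in the given cluster and returns an id of the form "#cluster:position".
    String save(int cluster, Map<String, ?> fields) {
        int position = nextPosition.merge(cluster, 1, Integer::sum) - 1;
        String rid = "#" + cluster + ":" + position;
        records.put(rid, new HashMap<>(fields));
        return rid;
    }

    Map<String, Object> load(String rid) {
        return records.get(rid);
    }

    // Follows a link field of the principal record to the connected record.
    Map<String, Object> follow(String principalRid, String linkField) {
        return load((String) load(principalRid).get(linkField));
    }
}
```

Queries on the principal record never read the linked one; only code that explicitly follows the link pays for loading it.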


Lvc@



On 17 February 2014 19:37, Fabrizio Fortino <[email protected]> wrote:

> I am particularly interested in this topic too. I use schema-mixed mode in
> my application, and this technique would improve performance and disk usage.
>
> Staying on topic, I would like to be able to choose where some
> properties are stored.
> Take as an example how MySQL handles TEXT and BLOB columns: their contents
> are stored outside the table, and the table just holds a pointer to the
> location of the actual storage.
>
> I would like to decide, at class creation and/or field creation time, how
> fields are managed.
> For example, I would store declared schema fields in the class structure
> file and keep a pointer to the undeclared fields (stored outside the class
> file). In this way classes could contain only "hot fields" (fields used in
> the WHERE clauses of SELECT queries) and pointers to "cold fields".
> Obviously a query on a field outside the class structure would be more
> expensive.
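A minimal sketch of the hot/cold split proposed above, assuming the set of hot fields is fixed when the class is created. `HotColdClass`, the `@cold` pointer field and the two in-memory lists standing in for the class file and the off-class storage are all hypothetical; the point is only that a filter on a hot field never touches cold storage, while a filter on a cold field costs an extra read per record.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical layout: hot fields stay inline in the "class file"; every other
// field is moved to separate cold storage and replaced by a pointer.
class HotColdClass {
    static final String COLD_POINTER = "@cold"; // hypothetical reserved field name

    private final Set<String> hotFields;
    private final List<Map<String, Object>> classFile = new ArrayList<>(); // inline part
    private final List<Map<String, Object>> coldFile = new ArrayList<>();  // off-class part
    int coldReads = 0; // how many times a query had to touch cold storage

    HotColdClass(Set<String> hotFields) {
        this.hotFields = new HashSet<>(hotFields); // chosen at class creation time
    }

    void insert(Map<String, ?> fields) {
        Map<String, Object> hot = new HashMap<>();
        Map<String, Object> cold = new HashMap<>();
        fields.forEach((name, value) -> (hotFields.contains(name) ? hot : cold).put(name, value));
        if (!cold.isEmpty()) {
            hot.put(COLD_POINTER, coldFile.size()); // pointer = position in cold storage
            coldFile.add(cold);
        }
        classFile.add(hot);
    }

    // Rough equivalent of "SELECT ... WHERE field = value".
    List<Map<String, Object>> where(String field, Object value) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> hot : classFile) {
            Object actual;
            if (hotFields.contains(field)) {
                actual = hot.get(field); // read inline, no extra I/O
            } else {
                Integer pointer = (Integer) hot.get(COLD_POINTER);
                actual = pointer == null ? null : readCold(pointer).get(field);
            }
            if (Objects.equals(actual, value)) result.add(hot);
        }
        return result;
    }

    private Map<String, Object> readCold(int pointer) {
        coldReads++;
        return coldFile.get(pointer);
    }
}
```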
>
> WDYT?
>
> F.
>
>
> On Monday, February 17, 2014 12:21:58 PM UTC, Lvc@ wrote:
>
>> Hi Steve,
>> my goal is also to improve performance when users make use of the
>> schema-full/mixed features. The point is: why should I store the field name
>> when I've already declared that a class has those fields? Example:
>>
>> // persistent class, declared in the schema
>> class Employee {
>>   String name;
>>   String surname;
>>   int age;
>> }
>>
>> My idea is to assign an id (short or integer) to each property and use
>> that id instead of the name. This would dramatically reduce record sizes
>> and memory consumption. We have to figure out a scheme where:
>> - schema-full -> best performance
>> - schema-mixed -> uses schema fields when declared, then falls back to
>> schema-free
>> - schema-free -> close to what we do now: all the field names are stored
>> in the record
>>
>> Then we're thinking about the best way to store fields and values.
>>
>> Lvc@
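A minimal sketch of the id-instead-of-name idea, covering the three modes listed above with one hypothetical serializer: properties declared in the schema are written as a 2-byte id, undeclared ones fall back to their name, and a serializer with no declarations behaves like a purely schema-free format. `MixedSerializer`, its per-field tag byte and the string-only values are assumptions made for brevity, not OrientDB's actual record format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical schema-mixed serializer; values are kept as strings for brevity.
class MixedSerializer {
    private static final byte DECLARED = 0, UNDECLARED = 1; // per-field tag
    private final Map<String, Short> idsByName = new HashMap<>();
    private final Map<Short, String> namesById = new HashMap<>();

    void declare(String property) {
        if (idsByName.containsKey(property)) return;
        short id = (short) idsByName.size();
        idsByName.put(property, id);
        namesById.put(id, property);
    }

    byte[] serialize(Map<String, String> fields) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeShort(fields.size());
            for (Map.Entry<String, String> field : fields.entrySet()) {
                Short id = idsByName.get(field.getKey());
                if (id != null) {
                    out.writeByte(DECLARED);
                    out.writeShort(id); // 2 bytes instead of the whole name
                } else {
                    out.writeByte(UNDECLARED);
                    out.writeUTF(field.getKey()); // schema-free fallback
                }
                out.writeUTF(field.getValue());
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    Map<String, String> deserialize(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int count = in.readUnsignedShort();
            Map<String, String> fields = new LinkedHashMap<>();
            for (int i = 0; i < count; i++) {
                String name = in.readByte() == DECLARED ? namesById.get(in.readShort()) : in.readUTF();
                fields.put(name, in.readUTF());
            }
            return fields;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With `name`, `surname` and `age` declared as in the `Employee` example, an Employee record serializes smaller than the same record with no declarations, and an extra undeclared field still round-trips through the name-based fallback.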
>>
>>
>>
>>
>> On 17 February 2014 12:46, Steve <[email protected]> wrote:
>>
>>>  Thanks Andrey,
>>>
>>> I'm still convinced that my idea is too simple and too obvious, so I must
>>> be missing something.
>>>
>>> If I am, I'd love someone to tell me what I've missed so I can understand
>>> Orient better. That was the main reason for asking the question.
>>>
>>>
>>> On 17/02/14 21:31, Andrey Lomakin wrote:
>>>
>>> Hi Steve :)
>>> It seems a good idea; I will add your comment to the issue.
>>>
>>>
>>> On Sun, Feb 16, 2014 at 5:53 AM, Steve <[email protected]> wrote:
>>>
>>>>  This is probably going to be a stupid question, because the solution
>>>> seems so obvious that I must have missed something fundamental.
>>>>
>>>> I found OrientDB when I gave up on MongoDB due to the issue of storing
>>>> field names in every document (for a lot of my data the field names are
>>>> larger than the data itself). I just came across issue
>>>> #1890 <https://github.com/orientechnologies/orientdb/issues/1890> and I'm
>>>> happy to see that Orient considers this a priority, but I don't quite
>>>> understand the need for such a complex approach.
>>>>
>>>> Why not simply maintain an internal index of field names and store each
>>>> name's index in the record instead of the name itself? It wouldn't really
>>>> matter if you had different classes with the same field name, since the
>>>> name is all you are interested in. To compact things further you could use
>>>> a format like Google Protocol Buffers' 'varint' type
>>>> <https://developers.google.com/protocol-buffers/docs/encoding#varints>.
>>>> If you altered the varint format so the first 'grouping' was 16 bits
>>>> rather than 8, then you'd have 32k field names available before needing to
>>>> expand (one bit of the group is the continuation flag, leaving 15 bits,
>>>> i.e. 32,768 ids), which would cover an awful lot of use cases.
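A minimal sketch of the two pieces proposed above, under stated assumptions: `FieldNameDictionary` is a hypothetical global name-to-index table shared by all classes, and `WideVarint` is one possible reading of the modified varint, with a 16-bit first group (1 continuation bit + 15 payload bits, written big-endian) followed by ordinary protobuf-style 8-bit groups (1 + 7). Neither is an existing OrientDB or protobuf API.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical global field-name dictionary: each name is stored once and
// records refer to it by index, regardless of which class uses it.
class FieldNameDictionary {
    private final Map<String, Integer> ids = new HashMap<>();
    private final List<String> names = new ArrayList<>();

    int idFor(String name) {
        return ids.computeIfAbsent(name, n -> {
            names.add(n);
            return names.size() - 1;
        });
    }

    String nameOf(int id) {
        return names.get(id); // array-backed lookup
    }
}

// Varint variant: first group is 16 bits, later groups are 8 bits.
class WideVarint {
    static byte[] encode(int value) {
        if (value < 0) throw new IllegalArgumentException("ids are non-negative");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int rest = value >>> 15;
        int first = (value & 0x7FFF) | (rest != 0 ? 0x8000 : 0);
        out.write(first >>> 8); // high byte of the 16-bit first group
        out.write(first & 0xFF);
        while (rest != 0) {
            int group = rest & 0x7F;
            rest >>>= 7;
            out.write(rest != 0 ? group | 0x80 : group); // set continuation bit if more follow
        }
        return out.toByteArray();
    }

    static int decode(byte[] buf) {
        int first = ((buf[0] & 0xFF) << 8) | (buf[1] & 0xFF);
        int value = first & 0x7FFF;
        if ((first & 0x8000) == 0) return value;
        int shift = 15;
        for (int i = 2; ; i++, shift += 7) {
            int group = buf[i] & 0xFF;
            value |= (group & 0x7F) << shift;
            if ((group & 0x80) == 0) return value;
        }
    }
}
```

With this layout every id below 32,768 costs exactly two bytes, and larger ids grow by one byte per additional 7 bits.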
>>>>
>>>> The lookup would be as trivial as an array lookup, and any overhead
>>>> would be more than offset by the benefit of being able to cache many more
>>>> records in memory thanks to the space savings. Another potential advantage
>>>> is that you would only ever use one instance of each field name String,
>>>> which would vastly improve any map lookups that are done internally. If
>>>> the current format writes the actual field name as a string, then every
>>>> time a field is read a new String is created, so for every field of every
>>>> record where a map lookup is required it must compute the hashcode and run
>>>> a manual char-by-char equals(). That saves 3 traversals of the string on
>>>> the first lookup (1 for the hashcode and 1 for each of the two strings
>>>> compared in equals()) and 2 on subsequent lookups.
>>>>
>>>> On the client side, I suppose there is the question of whether the client
>>>> should keep the entire lookup table in memory. It could be sent portions
>>>> of it as needed and use something like a Trove map for lookups. Not quite
>>>> as fast as an array lookup, but again I would imagine the savings in
>>>> memory, bandwidth, etc. would more than offset the cost.
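A minimal sketch of the client-side idea, assuming the server can hand out the name table in fixed-size chunks. `ClientFieldNames`, `CHUNK_SIZE` and the `fetchChunk` callback are hypothetical; a plain `HashMap<Integer, String>` stands in for a primitive-keyed map such as Trove's `TIntObjectHashMap`, which would avoid boxing the ids.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

// Hypothetical client-side view of the server's field-name table: names are
// fetched in chunks on first use and cached locally.
class ClientFieldNames {
    static final int CHUNK_SIZE = 256; // hypothetical transfer unit

    private final Map<Integer, String> cache = new HashMap<>(); // stand-in for a Trove int map
    private final IntFunction<List<String>> fetchChunk;          // hypothetical server call
    int fetches = 0;

    ClientFieldNames(IntFunction<List<String>> fetchChunk) {
        this.fetchChunk = fetchChunk;
    }

    String nameOf(int id) {
        String name = cache.get(id);
        if (name != null) return name; // already cached, no round trip
        int chunk = id / CHUNK_SIZE;
        List<String> names = fetchChunk.apply(chunk);
        fetches++;
        for (int i = 0; i < names.size(); i++) {
            cache.put(chunk * CHUNK_SIZE + i, names.get(i));
        }
        name = cache.get(id);
        if (name == null) throw new NoSuchElementException("unknown field id " + id);
        return name;
    }
}
```

Ids that fall in the same chunk cost a single round trip; the chunk size trades bandwidth per fetch against the number of fetches.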
>>>>
>>>> Am I missing something?
>>>>  --
>>>>
>>>> ---
>>>> You received this message because you are subscribed to the Google
>>>> Groups "OrientDB" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>>
>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>
>>>
>>>
>>>
>>>  --
>>> Best regards,
>>> Andrey Lomakin.
>>>
>>> Orient Technologies
>>> the Company behind OrientDB
>>>
>>
>
