Re: [orientdb] Re: Schema Driven Binary Serialization - draft spec

stefan Mon, 07 Apr 2014 11:47:46 -0700

Hi,

This is great news and potentially fills the gap in my plans regarding 
storage cost, thank you.


I hope that Luca and the team make this a top priority and that the 
documentation improves over time.

Regards,
  -Stefan

On Monday, 7 April 2014 13:00:49 UTC, Steve Coughlan wrote:
>
>  Although binary serialization gives us a substantial performance advantage, 
> the primary motive for this change was saving space (which in a big data 
> context IS a performance factor).  With that in mind I've put together a 
> comparison of a common type of data record using various methods of 
> serialization.  For this comparison I've picked the classic stock market 
> quote.  It's a common small record type that's used in large volumes (and 
> also happens to be the main entry barrier for me to using OrientDB):
>
> class Quote {
>     String ticker; //assume 3 chars
>     Date date;
>     float open;
>     float high;
>     float low;
>     float close;
>     long volume;
> }
>
> For the purpose of the exercise we assume that a stock price on average is 4 
> digits + 1 decimal point, e.g. $12.51 or for a penny stock $0.013.  We will 
> also assume that String encoded field values are enclosed in quotes as per 
> the current implementation (we'll ignore escape chars for this exercise).
> Also assume that Strings are encoded using 2 bytes/char.
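The encoding assumptions above can be checked with a few lines of Java. This is only a sketch: the class and method names are made up for illustration, the ':' delimiter and the "IBM" ticker are assumptions, and UTF-16 without a byte-order mark stands in for "2 bytes/char".

```java
import java.nio.charset.StandardCharsets;

class EncodingAssumptions {
    // Size of a string at 2 bytes/char: UTF-16 without a byte-order mark.
    static int stringBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Number of decimal digits in a millisecond timestamp.
    static int millisDigits(long millis) {
        return Long.toString(millis).length();
    }

    public static void main(String[] args) {
        // Key name plus a 1-char delimiter (':' is an assumed delimiter).
        System.out.println(stringBytes("open:"));
        // 3-char ticker (hypothetical value) plus 2 quote chars.
        System.out.println(stringBytes("\"IBM\""));
        // Digits in the current timestamp, as used for the date value.
        System.out.println(millisDigits(System.currentTimeMillis()));
    }
}
```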
>
>
>
> *String keys+values serialization (current implementation)*
>
> key: ticker - 6 chars + fieldValueDelimiter (1 char) = 14 bytes
> value: ticker - 3 chars + 2 " chars = 10 bytes
>
> key: date - 4 + 1 = 10 bytes
> value: date - 13 + 2 = 30 bytes // System.currentTimeMillis() returns a 13 
> digit number
>
> key: open - 4 + 1 = 10 bytes
> value: open - 5 + 1 = 12 bytes
>
> key: high - 4 + 1 = 10 bytes
> value: high - 5 + 1 = 12 bytes
>
> key: low - 3 + 1 = 8 bytes
> value: low - 5 + 1 = 12 bytes
>
> key: close - 5 + 1 = 12 bytes
> value: close - 5 + 1 = 12 bytes
>
> key: volume - 6 + 1 = 14 bytes
> value: volume - 8 + 1 = 18 bytes
>
> TOTAL = 184 bytes
>
> *String keys + binary values serialization*
> (assumes length byte for Strings)
>
> key: ticker - 6 chars + fieldValueDelimiter (1 char) = 14 bytes
> value: ticker - 1 byte + 3 chars = 7 bytes
>
> key: date - 4 + 1 = 10 bytes
> value: date - 8 bytes
>
> key: open - 4 + 1 = 10 bytes
> value: open - 4 bytes
>
> key: high - 4 + 1 = 10 bytes
> value: high - 4 bytes
>
> key: low - 3 + 1 = 8 bytes
> value: low - 4 bytes
>
> key: close - 5 + 1 = 12 bytes
> value: close - 4 bytes
>
> key: volume - 6 + 1 = 14 bytes
> value: volume - 8 bytes
>
> TOTAL = 117 bytes
>
>
> *Binary serialization without declared schema*
>
> *Header:*
> format, classId, version, headerLength, fieldCount, nullbitsLength, 
> nullbits = 7 bytes
> 7 fields * (nameId, datatype, offset, length) = 7 * 4 = 28 bytes
> dataLength = 4 bytes
>
> header total: 39 bytes
>
> *Data:*
> ticker = 6 bytes
> date = 8 bytes
> open, high, low, close = 4 * 4 = 16 bytes
> volume = 8 bytes
>
> data total: 38 bytes
>
> record total: 77 bytes
>
>
> *Binary serialization with declared schema*
>
> *Header:*
> format, classId, version, headerLength, fieldCount, nullbitsLength, 
> nullbits = 7 bytes
> ticker field: offset, length = 2 bytes
> dataLength = 4 bytes
>
> header total: 13 bytes
>
> *Data:*
> ticker = 6 bytes
> date = 8 bytes
> open, high, low, close = 4 * 4 = 16 bytes
> volume = 8 bytes
>
> data total: 38 bytes
>
> record total: 51 bytes
>
>
> The valid comparison currently is between the current implementation 
> (which doesn't change its serialized size regardless of whether the class 
> is schema declared) and either of the two binary examples.
>
> i.e. 184 bytes vs either 77 bytes or 51 bytes, which in terms of records 
> able to be cached means a factor of about 2.4 or 3.6 depending on whether 
> the class is schema declared.
>
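The tallies in this comparison can be recomputed mechanically from the stated assumptions (2 bytes/char, one delimiter char per key, 1-byte header entries, a 4-byte dataLength). The following Java sketch does that for all four layouts. The class, field and method names are invented for illustration, and the date is counted as an 8-byte value in the binary data.

```java
import java.util.List;

class QuoteSizeTally {
    static final int BYTES_PER_CHAR = 2;

    // One row per Quote field. stringChars is the length of the string-encoded
    // value including the extra chars counted in the email (quotes etc.).
    static final class Field {
        final String name;
        final int stringChars;
        final int binaryBytes;
        final boolean variableLength;

        Field(String name, int stringChars, int binaryBytes, boolean variableLength) {
            this.name = name;
            this.stringChars = stringChars;
            this.binaryBytes = binaryBytes;
            this.variableLength = variableLength;
        }
    }

    static final List<Field> FIELDS = List.of(
        new Field("ticker", 3 + 2, 6, true),   // 3 chars at 2 bytes/char
        new Field("date", 13 + 2, 8, false),   // long timestamp
        new Field("open", 5 + 1, 4, false),
        new Field("high", 5 + 1, 4, false),
        new Field("low", 5 + 1, 4, false),
        new Field("close", 5 + 1, 4, false),
        new Field("volume", 8 + 1, 8, false));

    // Key name plus one delimiter char, at 2 bytes/char.
    static int keyBytes(Field f) {
        return (f.name.length() + 1) * BYTES_PER_CHAR;
    }

    static int binaryDataBytes() {
        return FIELDS.stream().mapToInt(f -> f.binaryBytes).sum();
    }

    static int stringKeysAndValues() {
        return FIELDS.stream()
            .mapToInt(f -> keyBytes(f) + f.stringChars * BYTES_PER_CHAR).sum();
    }

    // Binary values; variable-length values carry one extra length byte.
    static int stringKeysBinaryValues() {
        return FIELDS.stream()
            .mapToInt(f -> keyBytes(f) + f.binaryBytes + (f.variableLength ? 1 : 0)).sum();
    }

    // 7 fixed header bytes, 4 bytes per field entry, 4-byte dataLength.
    static int binaryNoSchema() {
        return 7 + 4 * FIELDS.size() + 4 + binaryDataBytes();
    }

    // Only variable-length fields need an (offset, length) entry.
    static int binaryWithSchema() {
        int variable = (int) FIELDS.stream().filter(f -> f.variableLength).count();
        return 7 + 2 * variable + 4 + binaryDataBytes();
    }

    public static void main(String[] args) {
        int current = stringKeysAndValues();
        System.out.println("string keys+values:   " + current);
        System.out.println("string keys + binary: " + stringKeysBinaryValues());
        System.out.println("binary, no schema:    " + binaryNoSchema());
        System.out.println("binary, with schema:  " + binaryWithSchema());
        System.out.printf("cache factors: %.1f / %.1f%n",
            (double) current / binaryNoSchema(), (double) current / binaryWithSchema());
    }
}
```

Changing a row in `FIELDS` (for example a longer ticker) shows how sensitive each layout is to the assumptions.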
>
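To make the schema-declared layout concrete, here is a rough Java sketch that writes one quote into a byte array using the header and data entries listed for that case. It is not the OrientDB implementation: the class name, the id values, the field order and the choice to place the ticker after the fixed-length fields are all assumptions, and the 8-byte date is included in the data.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class SchemaRecordSketch {
    static final int FIXED_HEADER_BYTES = 7;      // format .. nullbits, 1 byte each
    static final int FIXED_DATA_BYTES = 8 + 4 * 4 + 8; // date + four floats + volume

    static byte[] encode(String ticker, long dateMillis, float open, float high,
                         float low, float close, long volume) {
        byte[] tickerBytes = ticker.getBytes(StandardCharsets.UTF_16BE); // 2 bytes/char
        int headerLength = FIXED_HEADER_BYTES + 2 + 4; // + ticker entry + dataLength
        int dataLength = FIXED_DATA_BYTES + tickerBytes.length;

        ByteBuffer buf = ByteBuffer.allocate(headerLength + dataLength);
        // Header: seven 1-byte entries (values are placeholders).
        buf.put((byte) 1);                  // format (hypothetical id)
        buf.put((byte) 42);                 // classId (hypothetical id)
        buf.put((byte) 0);                  // version
        buf.put((byte) headerLength);       // headerLength
        buf.put((byte) 7);                  // fieldCount
        buf.put((byte) 1);                  // nullbitsLength
        buf.put((byte) 0);                  // nullbits: no field is null
        // Only the variable-length field needs an offset and a length.
        buf.put((byte) FIXED_DATA_BYTES);   // ticker offset within the data section
        buf.put((byte) tickerBytes.length); // ticker length in bytes
        buf.putInt(dataLength);
        // Data: fixed-length fields in schema order, then the ticker.
        buf.putLong(dateMillis);
        buf.putFloat(open);
        buf.putFloat(high);
        buf.putFloat(low);
        buf.putFloat(close);
        buf.putLong(volume);
        buf.put(tickerBytes);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] record = encode("IBM", 1396896466000L, 12.5f, 12.6f, 12.4f, 12.51f, 10_000_000L);
        System.out.println("record bytes: " + record.length);
    }
}
```

Because the fixed-length fields always occupy the same positions, a reader that knows the schema can jump straight to any of them without consulting per-field header entries.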
> On 07/04/14 22:27, [email protected] wrote:
>  
>  
>  Steve,  
>
>  I see you mention serialization of sub-elements as well. 
>
>  How much effort do you think it is to get this working for embedded maps 
> and do you see that as something you will look into?
>
>  Regards,
>   -Stefan
>
> On Monday, 7 April 2014 12:25:33 UTC, [email protected] wrote: 
>>
>> Hi, 
>>
>>  Do you have any rough estimation regarding how much space this could 
>> save?
>> I know the question is very vague but I'm curious to know if you have 
>> done any comparison at all.
>>
>>  Luca, are you able to prioritize this to take advantage of this great 
>> work asap?
>>
>>  Steve, again, thank you very much.
>>
>>  Regards,
>>   -Stefán
>>
>> On Monday, 7 April 2014 03:41:11 UTC, Steve Coughlan wrote: 
>>>
>>> On 07/04/14 13:12, Steve wrote: 
>>> > For testing/debug it may be convenient to use a string data 
>>> > serialization format.  Or if using compressedbits this field may 
>>> > specify which compression algorithm or settings to use. 
>>>
>>> I should add a caveat to this statement.  This works as long as there is 
>>> no mismatch between serializers in the length used for fixed length 
>>> fields.  Some code changes would be required to allow for this, although 
>>> they would not be difficult to do. 
>>>
>>     -- 
>
> --- 
> You received this message because you are subscribed to the Google Groups 
> "OrientDB" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
>
> 

