Hi,
This is great news and potentially fills the gap in my plans regarding
storage cost, thank you.
I hope that Luca and the team make this a top priority and that the
documentation improves over time.
Regards,
-Stefan
On Monday, 7 April 2014 13:00:49 UTC, Steve Coughlan wrote:
>
> Although binary serialization gives us a substantial performance advantage,
> the primary motive for this change was saving space (which in a big data
> context IS a performance factor). With that in mind I've put together a
> comparison of a common type of data record using various methods of
> serialization. For this comparison I've picked the classic stock market
> quote. It's a common small record type that's used in large volumes (and
> also happens to be the main entry barrier for me to using OrientDB):
>
> import java.util.Date;
>
> class Quote {
>     String ticker; // assume 3 chars
>     Date date;
>     float open;
>     float high;
>     float low;
>     float close;
>     long volume;
> }
>
> For the purpose of the exercise we assume that a stock price on average is 4
> digits + 1 decimal point, e.g. $12.51 or, for a penny stock, $0.013. We will
> also assume that String-encoded field values are enclosed in quotes as per
> the current implementation (we'll ignore escape chars for this exercise).
> Also assume that Strings are encoded using 2 bytes/char.
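A minimal Java sketch of the counting rule described above, assuming 2 bytes per char, one fieldValueDelimiter char after each key and two quote chars around a quoted value. The class and method names are made up for illustration and are not part of OrientDB.

```java
public class StringSizeSketch {
    // Assumption taken from the post: every char costs 2 bytes.
    static final int BYTES_PER_CHAR = 2;

    // Key: the field name plus one fieldValueDelimiter char.
    static int keyBytes(String fieldName) {
        return (fieldName.length() + 1) * BYTES_PER_CHAR;
    }

    // Quoted value: the text plus an opening and a closing quote char.
    static int quotedValueBytes(String text) {
        return (text.length() + 2) * BYTES_PER_CHAR;
    }

    public static void main(String[] args) {
        // "date" is 4 chars, so the key costs (4 + 1) * 2 bytes.
        System.out.println(keyBytes("date"));                  // 10
        // A 13-digit millisecond timestamp costs (13 + 2) * 2 bytes.
        System.out.println(quotedValueBytes("1396875649000")); // 30
    }
}
```

The same two helpers cover every key in the record; only the number of extra chars around a value differs between field types.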
>
>
>
> *String keys+values serialization (current implementation)*
>
> key: ticker - 6 chars + fieldValueDelimiter (1 char) = 14 bytes
> value: ticker - 3 chars + 2 " chars = 10 bytes
>
> key: date - 4 + 1 = 10 bytes
> value: date - 13 + 2 = 30 bytes // System.currentTimeMillis() returns a
> 13-digit number
>
> key: open - 4 + 1 = 10 bytes
> value: open - 5 + 1 = 12 bytes
>
> key: high - 4 + 1 = 10 bytes
> value: high - 5 + 1 = 12 bytes
>
> key: low - 3 + 1 = 8 bytes
> value: low - 5 + 1 = 12 bytes
>
> key: close - 5 + 1 = 12 bytes
> value: close - 5 + 1 = 12 bytes
>
> key: volume - 6 + 1 = 14 bytes
> value: volume - 8 + 1 = 18 bytes
>
> TOTAL = 160 bytes
>
> *String keys+ Binary values serialization*
> (assumes length byte for Strings)
>
> key: ticker - 6 chars + fieldValueDelimiter (1 char) = 14 bytes
> value: ticker - 1 byte + 3 chars = 7 bytes
>
> key: date - 4 + 1 = 10 bytes
> value: date - 8 bytes
>
> key: open - 4 + 1 = 10 bytes
> value: open - 4 bytes
>
> key: high - 4 + 1 = 10 bytes
> value: high - 4 bytes
>
> key: low - 3 + 1 = 8 bytes
> value: low - 4 bytes
>
> key: close - 5 + 1 = 12 bytes
> value: close - 4 bytes
>
> key: volume - 6 + 1 = 14 bytes
> value: volume - 8 bytes
>
> TOTAL = 117 bytes
>
>
> *Binary serialization without declared schema*
>
> *Header:* format, classId, version, headerLength, fieldCount,
> nullbitsLength, nullbits = 7 bytes
> 7 fields * nameId, datatype, offset, length = 4 * 7 = 28 bytes
> dataLength = 4 bytes
>
> header total: 35 bytes
>
>
> *Data:* ticker = 6 bytes
> open, high, low, close = 4 * 4 = 16 bytes
> volume = 8 bytes
>
> data total: 30 bytes
>
> record total: 58 bytes
>
>
> *Binary serialization with declared schema*
>
> *Header:* format, classId, version, headerLength, fieldCount,
> nullbitsLength, nullbits = 7 bytes
> ticker field: offset, length = 2 bytes
> dataLength = 4 bytes
>
> header total: 13 bytes
>
>
> *Data:* ticker = 6 bytes
> open, high, low, close = 4 * 4 = 16 bytes
> volume = 8 bytes
>
> data total: 30 bytes
>
> record total: 43 bytes
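The schema-declared layout itemized above can be sketched with java.nio.ByteBuffer. This is an illustration only: it assumes one byte for each of the seven header entries, one byte each for the ticker offset and length, a 4-byte dataLength and a three-character ticker at 2 bytes/char. The class name, method name and header values are placeholders, not OrientDB's actual format. The date field does not appear in the data itemization, so it is not written here; stored as an 8-byte long it would add 8 bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class SchemaRecordSketch {

    // Encodes one quote using the byte widths itemized in the post.
    // Header values are placeholders; only the sizes matter here.
    static ByteBuffer encode(String ticker, float open, float high,
                             float low, float close, long volume) {
        // UTF-16BE gives 2 bytes per char for ordinary ticker symbols.
        byte[] tickerBytes = ticker.getBytes(StandardCharsets.UTF_16BE);
        int dataLength = tickerBytes.length + 4 * Float.BYTES + Long.BYTES;

        ByteBuffer buf = ByteBuffer.allocate(13 + dataLength);
        // format, classId, version, headerLength, fieldCount,
        // nullbitsLength, nullbits: assumed to take one byte each.
        buf.put(new byte[7]);
        // Only the variable-length ticker needs an offset and a length.
        buf.put((byte) 0).put((byte) tickerBytes.length);
        buf.putInt(dataLength);

        // Data section: ticker, the four prices, then the volume.
        buf.put(tickerBytes);
        buf.putFloat(open).putFloat(high).putFloat(low).putFloat(close);
        buf.putLong(volume);
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer record = encode("IBM", 12.51f, 12.80f, 12.40f, 12.75f, 1_250_000L);
        System.out.println(record.remaining()); // 43
    }
}
```

With the field types fixed by the schema, the fixed-width values need no per-field name, type or length entries, which is where the saving over the schema-less header comes from.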
>
>
> The valid comparison currently is between the current implementation
> (which doesn't change its serialized size regardless of whether the class
> is schema declared) and either of the two binary examples.
>
> i.e. 160 bytes vs either 58 bytes or 43 bytes which in terms of records
> able to be cached means a factor of 2.8 or 3.7 depending on whether the
> class is schema declared.
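The factors quoted above are simply the ratio of the record sizes. A small Java sketch using the three figures from the message (160, 58 and 43 bytes); the class and method names are hypothetical.

```java
import java.util.Locale;

public class CacheFactorSketch {
    // How many binary records fit in the space of one string record.
    static double factor(int stringBytes, int binaryBytes) {
        return (double) stringBytes / binaryBytes;
    }

    // Rounds to one decimal place, independent of the default locale.
    static String rounded(double value) {
        return String.format(Locale.ROOT, "%.1f", value);
    }

    public static void main(String[] args) {
        System.out.println(rounded(factor(160, 58))); // 2.8 (no declared schema)
        System.out.println(rounded(factor(160, 43))); // 3.7 (declared schema)
    }
}
```

For a cache of fixed size, the same ratio is the factor by which the number of cached records grows.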
>
>
> On 07/04/14 22:27, [email protected] wrote:
>
>
> Steve,
>
> I see you mention serialization of sub-elements as well.
>
> How much effort do you think it is to get this working for embedded maps
> and do you see that as something you will look into?
>
> Regards,
> -Stefan
>
> On Monday, 7 April 2014 12:25:33 UTC, [email protected] wrote:
>>
>> Hi,
>>
>> Do you have any rough estimation regarding how much space this could
>> save?
>> I know the question is very vague but I'm curious to know if you have
>> done any comparison at all.
>>
>> Luca, are you able to prioritize this so we can take advantage of this
>> great work ASAP?
>>
>> Steve, again, thank you very much.
>>
>> Regards,
>> -Stefán
>>
>> On Monday, 7 April 2014 03:41:11 UTC, Steve Coughlan wrote:
>>>
>>> On 07/04/14 13:12, Steve wrote:
>>> > For testing/debug it may be convenient to use a string data
>>> > serialization format. Or if using compressedbits this field may
>>> > specify which compression algorithm or settings to use.
>>>
>>> I should add a caveat to this statement: it holds only as long as the
>>> serializers agree on the length they use for fixed-length fields. Some
>>> code changes would be required to allow for this, although they would
>>> not be difficult to make.
>>>
>> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "OrientDB" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected] <javascript:>.
> For more options, visit https://groups.google.com/d/optout.
>
>
>