Re: BinaryObject pros/cons

Vladimir Ozerov Sun, 30 Oct 2016 12:25:34 -0700

Valya,

I have several concerns:
1) Correctness: hasField() will not work properly. But we can probably fix
that by adding this info to the schema.
2) Performance: we have lots of optimizations which depend on either a
"stable" object schema or a low number of schemas. We will effectively turn
them off. But what concerns me even more is that we may end up with an
enormous number of schemas. E.g., consider an object with 10 numeric fields.
If each field could be zero, we may end up with something like 2^10 schemas.
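
To illustrate the 2^10 estimate, here is a minimal sketch. It assumes a
schema is identified purely by the set of fields actually written, which is
a simplification for illustration and not how Ignite computes schema IDs.
The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

final class SchemaExplosionSketch {
    /** Hypothetical schema id: bit i is set when field i is non-zero and therefore written. */
    static int schemaOf(long[] fields) {
        int mask = 0;
        for (int i = 0; i < fields.length; i++) {
            if (fields[i] != 0)
                mask |= 1 << i;
        }
        return mask;
    }

    /** Counts distinct schemas over every zero/non-zero combination of fieldCount fields. */
    static int distinctSchemas(int fieldCount) {
        Set<Integer> schemas = new HashSet<>();
        for (int combo = 0; combo < (1 << fieldCount); combo++) {
            long[] fields = new long[fieldCount];
            for (int i = 0; i < fieldCount; i++)
                fields[i] = (combo >> i) & 1; // 0 = skipped on write, 1 = written
            schemas.add(schemaOf(fields));
        }
        return schemas.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctSchemas(10)); // prints 1024
    }
}
```

Every field that may or may not be written doubles the number of possible
layouts, so the count grows as 2^n in the number of such fields.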


Vladimir.

On Oct 29, 2016 at 0:37, "Valentin Kulichenko" <
[email protected]> wrote:

> Vova,
>
> Why do we need to write zeros and nulls in the first place? What's the
> value of having them in the byte array?
>
> -Val
>
> On Fri, Oct 28, 2016 at 1:18 AM, Vladimir Ozerov <[email protected]>
> wrote:
>
>> Valya,
>>
>> Currently a null value is written as one byte, while a zero value of long
>> type is written as 9 bytes. I want to improve that and write zeros as one
>> byte as well.
>>
>> As for var-length encoding, I am strongly against it. It saves IO and
>> memory at the cost of CPU. If we encode numbers in this way we will
>> slow down SQL (which is already not very fast, to be honest), because
>> instead of a single memory read, we will have to perform multiple
>> reads and then apply some mechanics to restore the original value. We
>> already have such a problem with Strings: Java stores them as UTF-16, but
>> we encode them as UTF-8. As a result, every read of a string field in SQL
>> results in decoding overhead.
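
The read-cost trade-off described above can be sketched as follows. This is
a minimal illustration using a generic LEB128-style varint (7 payload bits
per byte), not Ignite's actual binary format. The class and method names are
hypothetical.

```java
import java.nio.ByteBuffer;

final class VarLongSketch {
    /** Writes value as an unsigned varint: low 7 bits first, high bit marks "more bytes follow". */
    static void writeVarLong(ByteBuffer buf, long value) {
        while ((value & ~0x7FL) != 0) {
            buf.put((byte) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        buf.put((byte) value);
    }

    /** Reads a varint: one byte read, mask, shift and branch per encoded byte. */
    static long readVarLong(ByteBuffer buf) {
        long result = 0;
        int shift = 0;
        byte b;
        do {
            b = buf.get();
            result |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }

    /** Fixed-width read for comparison: a single 8-byte load. */
    static long readFixedLong(ByteBuffer buf) {
        return buf.getLong();
    }

    /** Number of bytes the varint encoding of value occupies (at most 10 for a long). */
    static int encodedSize(long value) {
        ByteBuffer buf = ByteBuffer.allocate(10);
        writeVarLong(buf, value);
        return buf.position();
    }
}
```

Small values shrink to one or two bytes, but every read loops over the
encoded bytes instead of issuing one fixed-width load.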
>>
>> Vladimir.
>>
>> On Fri, Oct 28, 2016 at 6:07 AM, Valentin Kulichenko <
>> [email protected]> wrote:
>>
>>> Cross-posting this to dev list.
>>>
>>> Vladimir,
>>>
>>> To be honest, I don't see much difference between null values for
>>> objects and zero values for primitives. From the BinaryObject semantics
>>> standpoint, both are default values for the corresponding types. These
>>> values will be returned from the BinaryObject.field() method regardless of
>>> whether we actually save them in the byte array or not. Having said that,
>>> why don't we just skip them during write?
>>>
>>> Your optimization will still be useful though, because there are often a
>>> lot of ints and longs that are not zeros, but are still small and can fit
>>> in 1-2 bytes. We already added such compaction in direct message
>>> marshaling and it reduced overall traffic by around 30%.
>>>
>>> -Val
>>>
>>>
>>> On Thu, Oct 27, 2016 at 2:21 PM, Vladimir Ozerov <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am not very concerned with the overhead of null fields, because
>>>> usually it won't be significant. However, there is a problem with zeros.
>>>> A user object might have lots of int/long zeros; this is not uncommon.
>>>> And each zero will consume 4-8 additional bytes. We will probably
>>>> implement a special optimization that writes such fields in a compact
>>>> format.
>>>>
>>>> Vladimir.
>>>>
>>>> On Thu, Oct 27, 2016 at 10:55 PM, vkulichenko <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Yes, null values consume memory. I believe this can be optimized, but I
>>>>> haven't seen issues with this so far. Unless you have hundreds of fields
>>>>> most of which are nulls (very rare case), the overhead is minimal.
>>>>>
>>>>> -Val
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-ignite-users.70518.x6.nabble.com/BinaryObject-pros-cons-tp8541p8563.html
>>>>> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
>>>>>
>>>>
>>>>
>>>
>>
>
