Re: Re[2]: IGNITE-6499 Compact NULL fields

2020-07-13 Thread Ilya Kasnacheev
Hello!

If we do improve it, I think we should go for a full re-think as opposed to
a single breaking change that doesn't actually improve that much.

Nevertheless, I think we can commit some improvements behind an opt-in
BinaryConfiguration setting.

Regards,
-- 
Ilya Kasnacheev


On Sat, 11 Jul 2020 at 01:27, steve.hostett...@gmail.com <
steve.hostett...@gmail.com> wrote:

> OK, gotcha, so it is not going to make it.
>
> Just to note that we have been dragging this along since before v2.0, and
> a reminder that someone else tried a similar thing before v2 and it got
> blocked because it was too much of a change for v2.
>
> This is typically the kind of thing that we can never change because the
> impact is too large.
>
> The BinaryObject format is far from optimal, and if we can change it
> neither incrementally nor in a big bang...
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>


Re: Re[2]: IGNITE-6499 Compact NULL fields

2020-07-10 Thread steve.hostett...@gmail.com
OK, gotcha, so it is not going to make it.

Just to note that we have been dragging this along since before v2.0, and a
reminder that someone else tried a similar thing before v2 and it got blocked
because it was too much of a change for v2.

This is typically the kind of thing that we can never change because the
impact is too large.

The BinaryObject format is far from optimal, and if we can change it
neither incrementally nor in a big bang...





Re: Re[2]: IGNITE-6499 Compact NULL fields

2020-07-08 Thread Ilya Kasnacheev
Hello!

Yes, I think this is a sensible approach.

Regards,
-- 
Ilya Kasnacheev


On Wed, 8 Jul 2020 at 14:46, Ivan Daschinsky wrote:

> I think that this feature can be handled as compactFooter. For example, C++
> doesn't support compactFooter and it is not an issue.
> Of course, this feature should be disabled by default, and should be
> enabled explicitly in BinaryConfiguration.
> Also, subsequent issues in jira about this feature support in platforms
> should be created.
>
>
> --
> Sincerely yours, Ivan Daschinskiy
>


Re: Re[2]: IGNITE-6499 Compact NULL fields

2020-07-08 Thread Ivan Daschinsky
I think that this feature can be handled the same way as compactFooter. For
example, C++ doesn't support compactFooter and it is not an issue.
Of course, this feature should be disabled by default, and should be
enabled explicitly in BinaryConfiguration.
Also, follow-up Jira issues should be created for supporting this feature in
the platforms.
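
For illustration, the opt-in idea above can be sketched in plain Java. This is only a sketch under assumptions: the flag name and its value are hypothetical and are not Ignite constants, and the real switch would presumably sit next to the existing compactFooter setting in BinaryConfiguration. It shows why a feature that is off by default leaves existing clients alone, while turning it on requires every reader to understand the new flag.

```java
class OptInFormatFlagSketch {
    // Hypothetical header flag bit for the proposed null compaction; not an Ignite constant.
    static final int FLAG_COMPACT_NULLS = 0x4000;

    /** Flags a writer would put in the object header; the feature stays off unless enabled. */
    static int headerFlags(boolean compactNullsEnabled) {
        return compactNullsEnabled ? FLAG_COMPACT_NULLS : 0;
    }

    /** A reader can decode an object only if it understands every flag set in the header. */
    static boolean canRead(int headerFlags, int supportedFlags) {
        return (headerFlags & ~supportedFlags) == 0;
    }

    public static void main(String[] args) {
        int oldClient = 0;                  // knows nothing about the new flag
        int newClient = FLAG_COMPACT_NULLS; // has been updated to support it

        // Default configuration: old clients are unaffected.
        System.out.println(canRead(headerFlags(false), oldClient));
        // Opted in: old clients can no longer read the objects.
        System.out.println(canRead(headerFlags(true), oldClient));
        // Opted in with an updated client.
        System.out.println(canRead(headerFlags(true), newClient));
    }
}
```

The per-platform follow-up tickets mentioned above correspond to moving each client from the "old" to the "new" set of supported flags.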

On Wed, 8 Jul 2020 at 14:31, Ilya Kasnacheev wrote:

> Hello!
>
> I think this is a blocker for this change. We already have binary format
> published:
>
> https://apacheignite.readme.io/docs/binary-client-protocol-data-format#complex-object
>
> Arguably, we cannot change it in a minor version of Apache Ignite, so this
> change has to target AI 3.0.
>
> Extending this binary format with e.g. new operations could probably be OK.
> But we have clients released on a different schedule in their own repos
> (and there are some 3rd party clients too), we can't release a minor
> version which will change this format unilaterally without any change of
> operation (same data, same calls, different result after upgrade, broken
> clients).
>
> Regards,
> --
> Ilya Kasnacheev


-- 
Sincerely yours, Ivan Daschinskiy


Re: Re[2]: IGNITE-6499 Compact NULL fields

2020-07-08 Thread Ilya Kasnacheev
Hello!

I think this is a blocker for this change. We have already published the
binary format:
https://apacheignite.readme.io/docs/binary-client-protocol-data-format#complex-object

Arguably, we cannot change it in a minor version of Apache Ignite, so this
change has to target AI 3.0.

Extending this binary format with, e.g., new operations could probably be OK.
But we have clients released on a different schedule in their own repos
(and there are some third-party clients too). We can't release a minor
version that changes this format unilaterally, without any change in the
operations themselves (same data, same calls, different result after upgrade,
broken clients).

Regards,
-- 
Ilya Kasnacheev


On Wed, 8 Jul 2020 at 13:43, Ivan Daschinsky wrote:

> Hi!
> Ilya, unfortunately yes, follow-up changes will have to be made in the C++,
> .NET and other platform code.
>
>
> --
> Sincerely yours, Ivan Daschinskiy
>


Re: Re[2]: IGNITE-6499 Compact NULL fields

2020-07-08 Thread Ivan Daschinsky
Hi!
Ilya, unfortunately yes, follow-up changes will have to be made in the C++,
.NET and other platform code.

On Wed, 8 Jul 2020 at 12:22, Ilya Kasnacheev wrote:

> Hello fellow devs,
>
> I just wanted to ask, how would this Binary Object format change affect
> thin clients? C++/.Net nodes? Etc?
>
> Is it fully backwards compatible or not?
>
> I think that realistically, we can only add binary-incompatible changes to
> Binary Object format in 3.0.
>
> Regards,
> --
> Ilya Kasnacheev


-- 
Sincerely yours, Ivan Daschinskiy


Re: Re[2]: IGNITE-6499 Compact NULL fields

2020-07-08 Thread Ilya Kasnacheev
Hello fellow devs,

I just wanted to ask: how would this Binary Object format change affect
thin clients, C++/.NET nodes, etc.?

Is it fully backwards compatible or not?

I think that, realistically, we can only make binary-incompatible changes to
the Binary Object format in 3.0.

Regards,
-- 
Ilya Kasnacheev


On Wed, 8 Jul 2020 at 09:05, Ivan Pavlukhin wrote:

> A side note. Now we have a neat URL for TC bot
> https://mtcga.ignite.apache.org/ (along with one in a gridgain
> domain).
>
>
> --
>
> Best regards,
> Ivan Pavlukhin
>


Re: Re[2]: IGNITE-6499 Compact NULL fields

2020-07-08 Thread Ivan Pavlukhin
A side note: we now have a neat URL for the TC Bot,
https://mtcga.ignite.apache.org/ (along with the one in the gridgain
domain).

2020-07-07 18:43 GMT+03:00, Zhenya Stanilovsky :
>
> request it, check for example [1]
>
> also you need to run [2] tests.
>
> [1]
> http://apache-ignite-developers.2346864.n4.nabble.com/Phani-Introduction-td47788.html
> [2] https://mtcga.gridgain.com
>
>
>
>


-- 

Best regards,
Ivan Pavlukhin


Re[2]: IGNITE-6499 Compact NULL fields

2020-07-07 Thread Zhenya Stanilovsky

Request it; see [1] for an example.

You also need to run the tests at [2].

[1] http://apache-ignite-developers.2346864.n4.nabble.com/Phani-Introduction-td47788.html
[2] https://mtcga.gridgain.com
>Hello,
>
>I looked at the ticket, and the only comment I can see is about creating a
>branch on git in the main repo and not in my fork. I do not have the rights
>to create a branch in the main repository. Am I missing something?
>
>Sorry, I probably misread the document, but I thought that I should fork the
>repo and then open a pull request, as I do not have the rights to create a
>branch.
>
>Thanks for your help
>
>
>
 
 
 
 

Re: IGNITE-6499 Compact NULL fields

2020-07-07 Thread steve.hostett...@gmail.com
Hello,

I looked at the ticket, and the only comment I can see is about creating a
branch on git in the main repo and not in my fork. I do not have the rights
to create a branch in the main repository. Am I missing something?

Sorry, I probably misread the document, but I thought that I should fork the
repo and then open a pull request, as I do not have the rights to create a
branch.

Thanks for your help





Re: IGNITE-6499 Compact NULL fields

2020-07-07 Thread Zhenya Stanilovsky

Steve, I left some comments in the ticket but have had no response yet.

 
>
>
>--- Forwarded message ---
>From: " steve.hostett...@gmail.com " < steve.hostett...@gmail.com >
>To:  dev@ignite.apache.org
>Cc:
>Subject: Re: Re[4]: IGNITE-6499 Compact NULL fields
>Date: Fri, 12 Jun 2020 16:15:37 +0300
>
>Hello,
>
>Stanilovsky Evgeny : would you mind having a look at the pull request.
>
>Thanks in advance
>
>
>
 
 
 
 

Re: Re[4]: IGNITE-6499 Compact NULL fields

2020-06-12 Thread steve.hostett...@gmail.com
Hello,

Stanilovsky Evgeny: would you mind having a look at the pull request?

Thanks in advance





Re: Re[4]: IGNITE-6499 Compact NULL fields

2020-05-28 Thread steve.hostett...@gmail.com
Hello,

I am still fighting to get my unit tests green. Something I do not really
understand is the object replacement mechanism. For instance, when
marshalling a LocalDateTime, it gets replaced by java.time.Ser; I assume
that's because it has a writeReplace method. But that takes precedence over a
custom serialiser explicitly declared for LocalDateTime, which is quite
confusing to me. Is there some documentation (other than the code) that I can
look at? At this point, I feel like I am making modifications blindly.
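
The JDK half of this behaviour can be reproduced without Ignite. The sketch below uses plain Java serialization only and makes no claim about the order in which Ignite's marshaller consults writeReplace and custom serialisers. It serializes a LocalDateTime and checks which class name ends up in the stream; the class and method names in the sketch are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;

class WriteReplaceDemo {
    /** Serializes a value with JDK serialization and returns the raw stream as a Latin-1 string. */
    static String serializedForm(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            // ObjectOutputStream calls the private writeReplace() of the value's class, if any,
            // and writes the returned object instead of the original one.
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(bytes.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String stream = serializedForm(LocalDateTime.of(2020, 5, 28, 12, 0));
        // The class descriptor written to the stream names the replacement class.
        System.out.println("mentions java.time.Ser: " + stream.contains("java.time.Ser"));
        System.out.println("mentions java.time.LocalDateTime: " + stream.contains("java.time.LocalDateTime"));
    }
}
```

Any marshaller that honours writeReplace before looking up a serialiser for the original class would show the same effect, which may be what happens in the case described above.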





Re: Re[4]: IGNITE-6499 Compact NULL fields

2020-05-25 Thread steve.hostett...@gmail.com
OK. Anyway, I will finish my patch; 2 unit tests are still not working. I
assume that if we wanted to apply zstd compression, we would reuse the
existing page compression algorithm in memory and not only for persistence.
That would probably be simpler and more straightforward anyway.

I will try to hurry and submit a patch for review.





Re[4]: IGNITE-6499 Compact NULL fields

2020-05-25 Thread Zhenya Stanilovsky

Compressing the whole binary inside Ignite.


 
>Sorry, I do not actually get what you are opposing: the compression of the
>binary, the null compaction, or both?
>And can you elaborate on why you are opposing it?
>
>
>
 
 
 
 

Re: Re[2]: IGNITE-6499 Compact NULL fields

2020-05-25 Thread steve.hostett...@gmail.com
Sorry, I do not actually get what you are opposing: the compression of the
binary, the null compaction, or both?
And can you elaborate on why you are opposing it?





Re[2]: IGNITE-6499 Compact NULL fields

2020-05-25 Thread Zhenya Stanilovsky

I am currently against this approach: anyone can compress a Binary Object
beforehand for later use, and no additional code is needed for that. This
discussion is only about the currently suboptimal storage of nulls, and it
looks like we can improve it without a performance penalty.

  
>Monday, 25 May 2020, 13:42 +03:00, from Ilya Kasnacheev:
> 
>Hello!
>
>That would be nice! My preferred compression method is zstd (it also has
>dictionary generation built in).
>
>Regards,
>--
>Ilya Kasnacheev
>

Re: IGNITE-6499 Compact NULL fields

2020-05-25 Thread Ilya Kasnacheev
Hello!

That would be nice! My preferred compression method is zstd (it also has
dictionary generation built in).

Regards,
-- 
Ilya Kasnacheev


On Mon, 25 May 2020 at 13:25, Hostettler, Steve <
steve.hostett...@wolterskluwer.com> wrote:

> I like the idea, especially because it also would apply across the board.
> So you propose to build the binary object and to apply dictionary based
> compression on top.
>
> I could quickly generate a bunch of binary objects from the tests and
> apply java compress/deflate with a dictionary based on the BinaryUtils
> elements.
> To compare with the null compaction and the varint.

RE: IGNITE-6499 Compact NULL fields

2020-05-25 Thread Hostettler, Steve
I like the idea, especially because it would also apply across the board.
So you propose to build the binary object and to apply dictionary-based
compression on top.

I could quickly generate a bunch of binary objects from the tests and apply
Java compress/deflate with a dictionary based on the BinaryUtils elements,
to compare with the null compaction and the varint.
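
A minimal sketch of such an experiment, using only java.util.zip, could look like the following. The record and dictionary contents are made-up placeholders rather than real BinaryUtils elements, and the class name is hypothetical. The same preset dictionary has to be supplied on both the compressing and the decompressing side.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class PresetDictionaryDemo {
    /** Deflates the input, optionally priming the compressor with a preset dictionary. */
    static byte[] compress(byte[] input, byte[] dictionary) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        if (dictionary != null)
            deflater.setDictionary(dictionary); // must be set before the first deflate() call
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished())
            out.write(buf, 0, deflater.deflate(buf));
        deflater.end();
        return out.toByteArray();
    }

    /** Inflates the data, handing over the dictionary when the stream asks for one. */
    static byte[] decompress(byte[] compressed, byte[] dictionary) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n > 0) {
                    out.write(buf, 0, n);
                } else if (inflater.needsDictionary()) {
                    if (dictionary == null)
                        throw new IllegalArgumentException("stream requires a preset dictionary");
                    inflater.setDictionary(dictionary);
                } else if (inflater.needsInput()) {
                    throw new IllegalArgumentException("truncated input");
                }
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException(e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Placeholder field names standing in for strings that recur across many objects.
        byte[] dict = "firstName lastName middleName nickname".getBytes(StandardCharsets.UTF_8);
        byte[] record = "firstName=Ann lastName=Lee middleName= nickname=".getBytes(StandardCharsets.UTF_8);

        byte[] plain = compress(record, null);
        byte[] withDict = compress(record, dict);
        System.out.println("raw=" + record.length + " deflate=" + plain.length
            + " deflate+dictionary=" + withDict.length);
        System.out.println("round trip ok: " + Arrays.equals(record, decompress(withDict, dict)));
    }
}
```

Running the same comparison over binary objects dumped from the tests would give the sizes to set against the null compaction and varint figures.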


-Original Message-
From: Ilya Kasnacheev  
Sent: Monday, May 25, 2020 12:05 PM
To: dev 
Subject: Re: IGNITE-6499 Compact NULL fields

Hello!

My take is the following: if conserving memory is needed at all, then we better 
invest in compression (such as dictionary-based row compression) rather than 
implementing varint, compact nulls, etc.

Dictionary-based compression can easily tackle varints, null patterns while 
also compressing strings and repeated values and even things we would never 
think out on our own.

It also has low complexity of our own code, no compatibility issues (people 
store binary objects in 3rd party storage, they do indeed) and low incidence of 
bugs.

Regards,
--
Ilya Kasnacheev


On Mon, 25 May 2020 at 12:51, Hostettler, Steve <
steve.hostett...@wolterskluwer.com> wrote:

> I went for a simpler approach (only with a null mask), and yes, the gain
> is high for smaller objects but low otherwise. I gain between 5-20% on
> my objects. But to me it is the stepping stone to easily implementing other
> optimisations like varint and schemaless without using raw. I am trying to
> fix the last unit tests to give you a better idea. If it is not worth it,
> then let's not do it, but I think it is worth a try.
>
>
> -Original Message-
> From: Ilya Kasnacheev 
> Sent: Monday, May 25, 2020 11:48 AM
> To: dev 
> Subject: Re: IGNITE-6499 Compact NULL fields
>
> Hello!
>
> I can't help but wonder how large a benefit it will give. I have checked
> the ticket description: it looks like the proposed scheme is elaborate and
> the benefit for non-extreme binary objects is rather tiny.
>
> WDYT?
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> On Mon, 18 May 2020 at 22:54, steve.hostett...@gmail.com <
> steve.hostett...@gmail.com> wrote:
>
> > Hello igniters,
> >
> > while I would like to help on the calcite because H2 optimiser (or 
> > the lack
> > thereof) is really killing us, I think that it would be wiser to 
> > start by contributing on something easier.
> >
> > Therefore I will tackle another problem that we have which is the 
> > memory consumption. I stumbled upon this IEP
> >
> > https://cwiki.apache.org/confluence/display/IGNITE/IEP-2%3A+Binary+object+format+improvements
> >
> > that is about optimising the binary marshaller.
> >
> > The low hanging fruit seemed to be the null compaction so I decided 
> > to start with it. Though I am sure I do see some hidden complexity.
> >
> > Here a couple of questions:
> > - Can I assign myself IGNITE-6499 and attach a patch?
> > - Who can I contact to help with the review. In the following page 
> > https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute

Re: IGNITE-6499 Compact NULL fields

2020-05-25 Thread Ilya Kasnacheev
Hello!

My take is the following: if conserving memory is needed at all, then we had
better invest in compression (such as dictionary-based row compression)
rather than implementing varint, compact nulls, etc.

Dictionary-based compression can easily handle varints and null patterns while
also compressing strings and repeated values, and even things we would never
think of on our own.

It also keeps our own code simple, has no compatibility issues (people do
indeed store binary objects in 3rd party storage) and has a low incidence of
bugs.
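
To make the dictionary idea concrete, here is a minimal sketch using Python's zlib with a preset dictionary. It is not how Ignite implements compression; the record layout, the DICTIONARY contents and the helper names are made up for illustration only.

```python
import zlib

# Hypothetical dictionary built from byte patterns that typical rows share.
DICTIONARY = (b"firstName=;lastName=;street=;city=;"
              b"country=Switzerland;currency=CHF;status=ACTIVE;")

# Hypothetical serialized row; most of its bytes also occur in the dictionary.
RECORD = (b"firstName=Steve;lastName=H;street=;city=;"
          b"country=Switzerland;currency=CHF;status=ACTIVE;")


def compress_with_dict(record, dictionary):
    """Deflate one record, letting it back-reference the shared dictionary."""
    comp = zlib.compressobj(level=9, zdict=dictionary)
    return comp.compress(record) + comp.flush()


def decompress_with_dict(blob, dictionary):
    """Inflate a record that was compressed with the same dictionary."""
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()


if __name__ == "__main__":
    plain = zlib.compress(RECORD, 9)
    packed = compress_with_dict(RECORD, DICTIONARY)
    print(len(RECORD), len(plain), len(packed))
```

A small row has little internal repetition, so plain deflate gains little; the shared dictionary is what lets empty fields and repeated values shrink without any change to the serialized format.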

Regards,
-- 
Ilya Kasnacheev



RE: IGNITE-6499 Compact NULL fields

2020-05-25 Thread Hostettler, Steve
I went for a simpler approach (only with a null mask), and yes, the gain is
high for smaller objects but low otherwise. I gain between 5% and 20% on my
objects. But to me it is the stepping stone to easily implementing other
optimisations like varint and schemaless without using raw. I am trying to fix
the latest unit tests to give you a better idea. If it is not worth it then
let's not do it, but I think it is worth a try.



Re: IGNITE-6499 Compact NULL fields

2020-05-25 Thread Ilya Kasnacheev
Hello!

I can't help but wonder how large a benefit it will give. I have checked the
ticket description: it looks like the proposed scheme is elaborate and the
benefit for non-extreme binary objects is rather tiny.

WDYT?

Regards,
-- 
Ilya Kasnacheev




Re: IGNITE-6499 Compact NULL fields

2020-05-18 Thread Zhenya Stanilovsky

Good catch, Steve! I have also been thinking about that.
The ticket is unassigned, so you can proceed with it.
You also need to ask for your JIRA account to be granted rights for further
ticket assignment.
A high-level review can be done by me (@zstan) and probably by @ivandasch.
You also need to explain how to run the new code (with the IGNITE-6499 fix)
with an old base.

See you!

IGNITE-6499 Compact NULL fields

2020-05-18 Thread steve.hostett...@gmail.com
Hello igniters,

While I would like to help on Calcite, because the H2 optimiser (or the lack
thereof) is really killing us, I think that it would be wiser to start by
contributing to something easier.

Therefore I will tackle another problem that we have, which is memory
consumption. I stumbled upon this IEP
https://cwiki.apache.org/confluence/display/IGNITE/IEP-2%3A+Binary+object+format+improvements
that is about optimising the binary marshaller.

The low hanging fruit seemed to be the null compaction so I decided to start
with it. Though I am sure I do see some hidden complexity.

Here are a couple of questions:
- Can I assign myself IGNITE-6499 and attach a patch?
- Who can I contact to help with the review? On the following page
https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute
there is no one assigned for marshalling.

On the details:
The compression is disabled by default as it is not compatible with objects
previously marshalled.

My approach was to go a bit beyond the JIRA. Not only do I remove the indexes
to null fields in the footer, I also remove the 0x65 bytes in the objects. I
did not remove them from the collections and arrays because they use
absolute positioning.

I gain between 5% and 20% depending on my test cases. Obviously the smaller
the object and the higher the number of nulls, the higher the compression
rate.

Based on that I can quite easily add varint compression (IGNITE-6418), which
should significantly increase the compression rate for objects with a lot of
integers and longs that hold only small numbers.
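
As a rough illustration of why varints help with small numbers, here is a minimal sketch of a base-128 (LEB128-style) encoding. The exact encoding planned for IGNITE-6418 is not described in this thread, so the scheme and the function names below are assumptions.

```python
def encode_varint(value):
    """Encode a non-negative int, 7 bits per byte, least significant group first."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data):
    """Return (value, number of bytes consumed)."""
    result = 0
    shift = 0
    for i, byte in enumerate(data):
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, i + 1
        shift += 7
    raise ValueError("truncated varint")


if __name__ == "__main__":
    for n in (1, 300, 2**31 - 1):
        print(n, encode_varint(n).hex(), len(encode_varint(n)))
```

In the example dump further down, an int field holding 1 is stored as a type byte plus four value bytes; with an encoding like this one the value part would shrink to a single byte, while values close to the 32-bit maximum would grow to five.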

The next step is to add a JMH micro-benchmark to check the impact in terms of
performance.


Example on a simple object w/ null compaction

Length=55 FooterPosition=50
0x67 // ValueType
0x01 // FormatVersion 
0x2b 0x00 //Flags userType=true hasSchema=true offset=1 compactFooter=true
0x78 0x66 0xbe 0x44 //TypeId 
0xf9 0xcd 0x07 0x57 //Hashcode 
0x37 0x00 0x00 0x00 //Length 
0x3d 0xa8 0x15 0xe4 //SchemaId 
0x32 0x00 0x00 0x00 //Footer position = 50 
0x03 0x01 0x00 0x00 0x00 0x03 0x01 0x00 0x00 0x00 0x09 0x03 0x00 0x00 0x00
0x61 0x62 0x63 0x09 0x03 0x00 0x00 0x00 0x61 0x62 0x63 
Footer length=5
0x18 0x1d 0x22 0x2a 0x47 

and w/o null compaction
Length=60 FooterPosition=53
0x67 // ValueType
0x01 // FormatVersion 
0x2b 0x00 //Flags userType=true hasSchema=true offset=1 compactFooter=true
0x78 0x66 0xbe 0x44 //TypeId 
0xa4 0x43 0x0e 0xf5 //Hashcode 
0x3c 0x00 0x00 0x00 //Length 
0x3d 0xa8 0x15 0xe4 //SchemaId 
0x35 0x00 0x00 0x00 //Footer position = 53 
0x03 0x01 0x00 0x00 0x00 0x03 0x01 0x00 0x00 0x00 0x09 0x03 0x00 0x00 0x00
0x61 0x62 0x63 0x65 0x65 0x65 0x09 0x03 0x00 0x00 0x00 0x61 0x62 0x63 
Footer length=7
0x18 0x1d 0x22 0x2a 0x2b 0x2c 0x2d 
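
Reading the two dumps side by side, the compacted footer appears to hold four one-byte offsets followed by 0x47. The sketch below assumes that this trailing byte is a bit mask in which bit i is set when field i is non-null; that is an interpretation of the dump, not something stated above, and the function name is made up.

```python
import struct

# Bytes of the "w/ null compaction" example: 24-byte header, data, footer.
COMPACT = bytes.fromhex(
    "67 01 2b 00 78 66 be 44 f9 cd 07 57 37 00 00 00 3d a8 15 e4 32 00 00 00 "
    "03 01 00 00 00 03 01 00 00 00 09 03 00 00 00 61 62 63 "
    "09 03 00 00 00 61 62 63 "
    "18 1d 22 2a 47"
)


def decode_compact_footer(obj, field_count):
    """Return one offset per schema field, or None where the field is null."""
    # Length is at byte 12 and footer position at byte 20 (little-endian ints).
    length, = struct.unpack_from("<i", obj, 12)
    footer_pos, = struct.unpack_from("<i", obj, 20)
    footer = obj[footer_pos:length]
    # Assumption: one-byte offsets for non-null fields, then a one-byte mask.
    offsets, mask = footer[:-1], footer[-1]
    result = []
    next_offset = 0
    for field in range(field_count):
        if (mask >> field) & 1:
            result.append(offsets[next_offset])
            next_offset += 1
        else:
            result.append(None)
    return result


if __name__ == "__main__":
    print(decode_compact_footer(COMPACT, 7))
```

Under that assumption the seven schema fields decode to offsets 24, 29, 34 for the first three, nulls for the next three, and 42 for the last one, which lines up with the uncompacted footer where the last field sits at 0x2d after three 0x65 bytes.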




--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/


[jira] [Created] (IGNITE-6499) Compact NULL fields binary representation

2017-09-26 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-6499:
--

             Summary: Compact NULL fields binary representation
                 Key: IGNITE-6499
                 URL: https://issues.apache.org/jira/browse/IGNITE-6499
             Project: Ignite
          Issue Type: Improvement
          Components: binary
    Affects Versions: 2.1
            Reporter: Alexandr Kuramshin
            Assignee: Vladimir Ozerov


The current compact footer implementation writes an offset for every field in
the schema. Depending on the serialized size of an object, an offset may be 1,
2 or 4 bytes.

Imagine an object in which some 100 fields are null. That takes from 100 to 400
bytes of overhead. For middle-sized objects (about 260 bytes) it doubles the
memory usage. For small objects (about 40 bytes) the memory usage increases by
a factor of 3 or 4.
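
The arithmetic above can be checked with a small sketch. The width thresholds (one byte below 256, two bytes below 65536, four bytes otherwise) are an assumption for illustration; the ticket only says that an offset may be 1, 2 or 4 bytes depending on the serialized size.

```python
def offset_width(object_length):
    """Bytes per footer offset for an object of the given size (assumed thresholds)."""
    if object_length < 0x100:
        return 1
    if object_length < 0x10000:
        return 2
    return 4


def footer_overhead(field_count, object_length):
    """Footer bytes when every schema field, null or not, gets an offset."""
    return field_count * offset_width(object_length)


if __name__ == "__main__":
    for size in (40, 260, 70000):
        extra = footer_overhead(100, size)
        print(size, extra, round((size + extra) / size, 2))
```

With 100 null fields this gives 100 extra bytes for a 40-byte object (about 3.5 times the size) and 200 extra bytes for a 260-byte object (a bit under double), in line with the figures quoted in the ticket.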

Two optimizations are proposed. Both should be implemented, and the most
efficient one should be selected dynamically upon object marshalling.

1. Write the field ID and offset only for the non-null fields in the footer.

2. Write a footer header, then field offsets only for the non-null fields, as
follows:

[0] bit mask for first 8 fields, 0 - field is null, 1 - field is non-null
[1] cumulative sum of "1" bits
[2] bit mask for the next 8 fields
[3] cumulative sum of "1" bits
... and so on
[N1...N2] offset of first non-null field
[N3...N4] offset of next non-null field
... and so on

If we want to read fields 0 to 7, then we read the first footer byte, step
through its bits and either find the offset index of the non-null field or find
that the field is null.

If we want to read fields from 8 onwards, then we read two footer bytes, take
the start offset index from the first byte, and then step through the bits of
the second to find the offset index of the non-null field or find that the
field is null.

This supports up to 255 non-null fields per binary object.

The overhead would be only 24 bytes per 100 null fields instead of 200 bytes
for the middle-sized object.
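
The second layout can be sketched as follows. This is one possible reading of the description above, with one-byte offsets for brevity and the cumulative count taken to include the group it follows; the function names are hypothetical and this is not code from the ticket.

```python
def build_footer(offsets):
    """offsets holds one entry per schema field, None for a null field."""
    header = bytearray()
    body = bytearray()
    running = 0
    for start in range(0, len(offsets), 8):
        mask = 0
        for bit, off in enumerate(offsets[start:start + 8]):
            if off is not None:
                mask |= 1 << bit
                body.append(off)          # offsets of non-null fields only
        running += bin(mask).count("1")   # must stay <= 255 to fit one byte
        header += bytes([mask, running])
    return bytes(header + body)


def field_offset(footer, field_count, field):
    """Look up a field's offset, or None if its mask bit is clear."""
    groups = (field_count + 7) // 8
    group, bit = divmod(field, 8)
    mask = footer[2 * group]
    if not (mask >> bit) & 1:
        return None
    # Non-null fields in earlier groups, read from the preceding sum byte.
    before = footer[2 * group - 1] if group else 0
    index = before + bin(mask & ((1 << bit) - 1)).count("1")
    return footer[2 * groups + index]


if __name__ == "__main__":
    fields = [24, None, 29] + [None] * 7 + [34]
    footer = build_footer(fields)
    print(footer.hex(), [field_offset(footer, len(fields), i) for i in (0, 1, 2, 10)])
```

Because each cumulative count is stored in a single byte, the sketch runs into the same limit of 255 non-null fields mentioned above. For 100 null fields it emits 13 mask and count pairs, so 26 bytes, close to the 24 bytes quoted in the ticket.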



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)