RE: IGNITE-6499 Compact NULL fields

2020-05-25 Thread Hostettler, Steve
I like the idea, especially because it also would apply across the board.
So you propose to build the binary object and to apply dictionary based 
compression on top.

I could quickly generate a bunch of binary objects from the tests and apply 
Java compress/deflate with a dictionary based on the BinaryUtils elements, 
then compare the result with the null compaction and the varint.
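
A minimal sketch of the experiment proposed above, assuming the JDK's java.util.zip.Deflater with a preset dictionary. The sample bytes are the 55-byte null-compacted object dumped elsewhere in this thread. The dictionary here is an assumption made for illustration (a previously seen object of the same type that differs only in its hash code), not the BinaryUtils-based dictionary mentioned in the message. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DictionaryDeflateSketch {
    /** The 55-byte null-compacted sample object from the dump in this thread. */
    static byte[] sampleObject() {
        int[] v = {
            0x67, 0x01, 0x2b, 0x00, 0x78, 0x66, 0xbe, 0x44, 0xf9, 0xcd, 0x07, 0x57,
            0x37, 0x00, 0x00, 0x00, 0x3d, 0xa8, 0x15, 0xe4, 0x32, 0x00, 0x00, 0x00,
            0x03, 0x01, 0x00, 0x00, 0x00, 0x03, 0x01, 0x00, 0x00, 0x00,
            0x09, 0x03, 0x00, 0x00, 0x00, 0x61, 0x62, 0x63,
            0x09, 0x03, 0x00, 0x00, 0x00, 0x61, 0x62, 0x63,
            0x18, 0x1d, 0x22, 0x2a, 0x47
        };
        byte[] out = new byte[v.length];
        for (int i = 0; i < v.length; i++)
            out[i] = (byte) v[i];
        return out;
    }

    /** Assumed dictionary: the same object with a different hash code (bytes 8..11). */
    static byte[] sampleDictionary() {
        byte[] dict = sampleObject();
        dict[8] = (byte) 0xa4; dict[9] = 0x43; dict[10] = 0x0e; dict[11] = (byte) 0xf5;
        return dict;
    }

    /** DEFLATE-compresses the input; a non-null dict primes the compressor window. */
    static byte[] deflate(byte[] input, byte[] dict) {
        Deflater d = new Deflater(Deflater.BEST_COMPRESSION);
        if (dict != null)
            d.setDictionary(dict);
        d.setInput(input);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!d.finished())
            out.write(buf, 0, d.deflate(buf));
        d.end();
        return out.toByteArray();
    }

    /** Inflates the data, supplying the dictionary when the stream asks for one. */
    static byte[] inflate(byte[] compressed, byte[] dict, int originalLen) {
        Inflater inf = new Inflater();
        inf.setInput(compressed);
        byte[] out = new byte[originalLen];
        int off = 0;
        try {
            while (!inf.finished() && off < out.length) {
                int n = inf.inflate(out, off, out.length - off);
                if (n == 0) {
                    if (inf.needsDictionary() && dict != null)
                        inf.setDictionary(dict);
                    else
                        break; // truncated input or missing dictionary
                }
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inf.end();
        }
        return Arrays.copyOf(out, off);
    }

    public static void main(String[] args) {
        byte[] obj = sampleObject();
        System.out.println("raw=" + obj.length
            + " deflate=" + deflate(obj, null).length
            + " deflate+dict=" + deflate(obj, sampleDictionary()).length);
    }
}
```

On objects this small the fixed overhead of the zlib stream matters, so any real comparison would need to be run over a representative set of objects rather than a single sample.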


-Original Message-
From: Ilya Kasnacheev  
Sent: Monday, May 25, 2020 12:05 PM
To: dev 
Subject: Re: IGNITE-6499 Compact NULL fields

Caution, this email may be from a sender outside Wolters Kluwer. Verify the 
sender and know the content is safe.

Hello!

My take is the following: if conserving memory is needed at all, then we had 
better invest in compression (such as dictionary-based row compression) rather 
than implementing varint, compact nulls, etc.

Dictionary-based compression can easily tackle varints and null patterns while 
also compressing strings, repeated values and even things we would never think 
of on our own.

It also keeps our own code simple, has no compatibility issues (people do 
indeed store binary objects in 3rd party storage) and has a low incidence of 
bugs.

Regards,
--
Ilya Kasnacheev



RE: IGNITE-6499 Compact NULL fields

2020-05-25 Thread Hostettler, Steve
I went for a simpler approach (with only a null mask), and yes, the gain is 
high for smaller objects but low otherwise. I gain between 5% and 20% on my 
objects. But to me it is the stepping stone to easily implementing other 
optimisations like varint and schemaless without using raw. I am trying to fix 
the latest unit tests to give you a better idea. If it is not worth it then 
let's not do it, but I think it is worth a try.


-Original Message-
From: Ilya Kasnacheev  
Sent: Monday, May 25, 2020 11:48 AM
To: dev 
Subject: Re: IGNITE-6499 Compact NULL fields

Hello!

I can't help but wonder how large a benefit it will give. I have checked the 
ticket description; it looks like the proposed scheme is elaborate and the 
benefit for non-extreme binary objects is rather tiny.

WDYT?

Regards,
--
Ilya Kasnacheev


Mon, 18 May 2020 at 22:54, steve.hostett...@gmail.com <
steve.hostett...@gmail.com>:

> Hello igniters,
>
> While I would like to help with the Calcite effort, because the H2 optimiser 
> (or the lack thereof) is really killing us, I think that it would be wiser 
> to start by contributing to something easier.
>
> Therefore I will tackle another problem that we have, which is memory 
> consumption. I stumbled upon this IEP
>
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-2%3A+Binary+object+format+improvements
>
> that is about optimising the binary marshaller.
>
> The low-hanging fruit seemed to be the null compaction, so I decided to 
> start with it, though I am sure there is some hidden complexity that I do 
> not see yet.
>
> Here are a couple of questions:
> - Can I assign myself IGNITE-6499 and attach a patch?
> - Who can I contact to help with the review? On the following page
> https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute
> there is no one assigned for marshalling.
>
> On the details:
> The compression is disabled by default as it is not compatible with 
> objects previously marshalled.
>
> My approach was to go a bit beyond the JIRA. Not only do I remove the 
> indexes to null fields in the footer, I also remove the 0x65 markers in 
> the objects. I did not remove them from the collections and arrays because 
> they are using absolute positioning.
>
> I gain between 5% and 20% depending on my test cases. Obviously the 
> smaller the object and the higher the number of nulls, the higher the 
> compression rate.
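
A simplified sketch of the null-mask idea, not the actual patch. It assumes every field is a nullable int, uses 0x65 as the null marker (as stated above) and 0x03 as an assumed int marker, and stores one bit per field in a leading bitmask. The footer savings mentioned above are not modelled, and all names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class NullMaskSketch {
    static final int NULL_MARKER = 0x65; // null marker named in the message
    static final int INT_MARKER = 0x03;  // assumed marker for an int value

    /** Writes the type marker followed by the value in little-endian order. */
    static void writeInt(ByteArrayOutputStream out, int v) {
        out.write(INT_MARKER);
        out.write(v);
        out.write(v >>> 8);
        out.write(v >>> 16);
        out.write(v >>> 24);
    }

    /** Baseline layout: every field is present, nulls cost one marker byte. */
    static byte[] encodePlain(Integer[] fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Integer f : fields) {
            if (f == null)
                out.write(NULL_MARKER);
            else
                writeInt(out, f);
        }
        return out.toByteArray();
    }

    /** Compacted layout: bitmask (bit set = null), then only the non-null fields. */
    static byte[] encodeWithNullMask(Integer[] fields) {
        byte[] mask = new byte[(fields.length + 7) / 8];
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (int i = 0; i < fields.length; i++) {
            if (fields[i] == null)
                mask[i / 8] |= 1 << (i % 8);
            else
                writeInt(body, fields[i]);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(mask, 0, mask.length);
        out.write(body.toByteArray(), 0, body.size());
        return out.toByteArray();
    }

    /** Reverses encodeWithNullMask; the field count must come from the schema. */
    static Integer[] decodeWithNullMask(byte[] data, int fieldCount) {
        Integer[] fields = new Integer[fieldCount];
        int pos = (fieldCount + 7) / 8; // first byte after the mask
        for (int i = 0; i < fieldCount; i++) {
            boolean isNull = ((data[i / 8] >> (i % 8)) & 1) != 0;
            if (!isNull) {
                pos++; // skip the type marker
                fields[i] = ByteBuffer.wrap(data, pos, 4)
                    .order(ByteOrder.LITTLE_ENDIAN).getInt();
                pos += 4;
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Integer[] fields = {1, null, null, 7, null};
        System.out.println("plain=" + encodePlain(fields).length
            + " masked=" + encodeWithNullMask(fields).length);
    }
}
```

With five fields of which three are null, the plain layout takes 13 bytes and the masked layout 11, which illustrates why the gain grows with the share of null fields and shrinks as the non-null payload dominates.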
>
> Based on that I can quite easily add varint compression (IGNITE-6418), 
> which should significantly increase the compression rate for objects with 
> a lot of integers and longs that only hold small numbers.
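
To illustrate why small numbers benefit, here is a minimal sketch of the common base-128 varint scheme (7 payload bits per byte, high bit as a continuation flag). The message does not say which encoding IGNITE-6418 would use, so this scheme and the class name are assumptions.

```java
import java.util.Arrays;

public class VarIntSketch {
    /** Encodes the value as an unsigned base-128 varint (1 to 5 bytes). */
    static byte[] encode(int value) {
        byte[] buf = new byte[5];
        int n = 0;
        while ((value & ~0x7F) != 0) {
            buf[n++] = (byte) ((value & 0x7F) | 0x80); // more bytes follow
            value >>>= 7;
        }
        buf[n++] = (byte) value; // last byte, continuation bit clear
        return Arrays.copyOf(buf, n);
    }

    /** Decodes a varint produced by encode. */
    static int decode(byte[] data) {
        int result = 0;
        int shift = 0;
        for (byte b : data) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0)
                break;
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        for (int v : new int[] {1, 127, 128, 300, Integer.MAX_VALUE, -1})
            System.out.println(v + " -> " + encode(v).length + " byte(s)");
    }
}
```

Values below 128 take a single byte instead of four, while negative values take five bytes in this unsigned form, so a real implementation would likely need something like zigzag encoding if negative numbers are common.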
>
> The next step is to add a JMH micro-benchmark to check the impact on 
> performance.
>
>
> Example on a simple object w/ null compaction
>
> Length=55 FooterPosition=50
> 0x67 // ValueType
> 0x01 // FormatVersion
> 0x2b 0x00 //Flags userType=true hasSchema=true offset=1 compactFooter=true
> 0x78 0x66 0xbe 0x44 //TypeId
> 0xf9 0xcd 0x07 0x57 //Hashcode
> 0x37 0x00 0x00 0x00 //Length
> 0x3d 0xa8 0x15 0xe4 //SchemaId
> 0x32 0x00 0x00 0x00 //Footer position = 50
> 0x03 0x01 0x00 0x00 0x00
> 0x03 0x01 0x00 0x00 0x00
> 0x09 0x03 0x00 0x00 0x00 0x61 0x62 0x63
> 0x09 0x03 0x00 0x00 0x00 0x61 0x62 0x63
> Footer length=5
> 0x18 0x1d 0x22 0x2a 0x47
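
As a reading aid for the dump above, here is a small sketch that decodes the 24 header bytes with a little-endian ByteBuffer. The field order follows the comments in the dump. The flag bit values are assumptions chosen to be consistent with 0x2b and the flags listed next to it, and the class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class BinaryHeaderSketch {
    // Assumed flag bits; together they give 0x2b as in the dump.
    static final int FLAG_USER_TYPE = 0x01;
    static final int FLAG_HAS_SCHEMA = 0x02;
    static final int FLAG_OFFSET_ONE_BYTE = 0x08;
    static final int FLAG_COMPACT_FOOTER = 0x20;

    /** The 24 header bytes of the null-compacted object quoted above. */
    static final byte[] HEADER = {
        0x67, 0x01, 0x2b, 0x00,
        0x78, 0x66, (byte) 0xbe, 0x44,
        (byte) 0xf9, (byte) 0xcd, 0x07, 0x57,
        0x37, 0x00, 0x00, 0x00,
        0x3d, (byte) 0xa8, 0x15, (byte) 0xe4,
        0x32, 0x00, 0x00, 0x00
    };

    /** Returns {type, version, flags, typeId, hashCode, length, schemaId, footerPos}. */
    static int[] parse(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        return new int[] {
            buf.get() & 0xFF,        // value type marker
            buf.get() & 0xFF,        // format version
            buf.getShort() & 0xFFFF, // flags
            buf.getInt(),            // type id
            buf.getInt(),            // hash code
            buf.getInt(),            // total length
            buf.getInt(),            // schema id
            buf.getInt()             // footer position
        };
    }

    public static void main(String[] args) {
        int[] h = parse(HEADER);
        System.out.println("length=" + h[5] + " footerPosition=" + h[7]
            + " compactFooter=" + ((h[2] & FLAG_COMPACT_FOOTER) != 0));
    }
}
```

The decoded length (55) and footer position (50) match the summary line of the dump, and the difference of 5 is the footer length it reports.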
>
> and w/o null compaction
> Length=60 FooterPosition=53
> 0x67 // ValueType
> 0x01 // FormatVersion
> 0x2b 0x00 //Flags userType=true hasSchema=true offset=1 compactFooter=true
> 0x78 0x66 0xbe 0x44 //TypeId
> 0xa4 0x43 0x0e 0xf5 //Hashcode
> 0x3c 0x00 0x00 0x00 //Length
> 0x3d 0xa8 0x15 0xe4 

RE: New SQL execution engine

2019-11-18 Thread Hostettler, Steve
Hi Roman,

Thanks a lot for the answer (and the pull request). As I said initially, I was 
under the impression that the reason was the lack of affinity.
I understand the reason and the current design, and I think we all agree that 
this is not optimal and that it should be reworked in the new design, 
especially the silent behavior. That being said, more than a warning, having 
parallel joins across partitions would be very helpful, but I understand that 
it is not straightforward.

As always you guys are very reactive and helpful. Keep up the great work. 
Appreciate it.

-Original Message-
From: Roman Kondakov  
Sent: Monday, November 18, 2019 11:04 AM
To: dev@ignite.apache.org
Subject: Re: New SQL execution engine

Hi, Steve

This behavior is actually not a bug, but this is not obvious. I'll try to 
explain.

When query parallelism = N is turned on, it means that each cache is divided 
into N parts from the SQL point of view. Every SQL query is executed 
independently over each particular part, and then results are merged together 
during the reducer step.

This is absolutely identical to distributed query execution, where instead of 
a single node with query parallelism = N, we have N nodes with query 
parallelism = 1. The SQL query is executed over each partition of data on all 
nodes and then the results are merged on the reducer.

As we can see, query parallelism is equivalent to the distributed query 
execution. When we do joins over distributed tables, we need to think about the 
collocation of data [1]. If data is not collocated, we get a wrong result. This 
happens silently, which is not good, IMO.
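
The effect described above can be reproduced with a toy simulation in plain Java, without Ignite. It assumes rows are assigned to parts by a simple modulo of a key (Ignite's real affinity function is different) and that each part joins only the rows it holds locally. The table contents and all names are hypothetical.

```java
public class CollocationSketch {
    /** Orders as {orderId, customerId}; every order has a matching customer. */
    static final int[][] ORDERS = {{1, 1}, {2, 1}, {3, 2}, {4, 3}, {5, 4}, {6, 2}};
    static final int[] CUSTOMER_IDS = {1, 2, 3, 4};

    /**
     * Counts join matches when the data is split into n parts and each part
     * joins only its local rows. Customers are placed by their id. Orders are
     * placed by customerId when collocated, otherwise by their own orderId.
     */
    static int partitionedJoinCount(int n, boolean collocateOnJoinKey) {
        int matches = 0;
        for (int part = 0; part < n; part++) {
            for (int[] order : ORDERS) {
                int key = collocateOnJoinKey ? order[1] : order[0];
                if (Math.floorMod(key, n) != part)
                    continue; // this order lives in another part
                for (int customerId : CUSTOMER_IDS) {
                    boolean local = Math.floorMod(customerId, n) == part;
                    if (local && customerId == order[1])
                        matches++;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println("1 part:               " + partitionedJoinCount(1, false));
        System.out.println("2 parts, not colloc.: " + partitionedJoinCount(2, false));
        System.out.println("2 parts, collocated:  " + partitionedJoinCount(2, true));
    }
}
```

With one part all six orders find their customer. With two parts and no collocation, only the orders that happen to share a part with their customer are matched, and nothing signals that rows are missing.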

I reworked your example a bit in order to impose collocation on the joining key 
and now the join returns the correct result [2].

The current approach to configuration and query execution is very awkward and 
should be completely redesigned in the new engine.

[1] https://apacheignite-sql.readme.io/docs/distributed-joins

[2] https://github.com/hostettler/igniteParallelQueries/pull/1


--
Kind Regards
Roman Kondakov

On 16.11.2019 12:50, steve.hostett...@gmail.com wrote:
> Actually I am now wondering whether this is just a bug and whether I 
> should record it as such, since the behavior is different with and 
> without the parallelism and there is no warning during execution or in the 
> API.
>
> Any thoughts?


RE: New SQL execution engine

2019-11-18 Thread Hostettler, Steve
Ivan,

Thanks, that is good news. I use Ignite as a platform rather than directly to 
run an in-house application, so these kinds of things make the generic code 
less generic.

Thanks a lot for the great work.

-Original Message-
From: Ivan Pavlukhin  
Sent: Monday, November 18, 2019 10:13 AM
To: dev 
Subject: Re: New SQL execution engine

Steve,

Yep, unfortunately query parallelism in its current form is counter-intuitive. 
But it was designed that way =( As Roman wrote:
> And of course this feature should also be available in the new engine, though 
> its architecture may be changed.
The architecture of parallel execution will definitely be reconsidered. Our 
current aim is that, in a one-node cluster, a query returns the same results 
regardless of parallelism.

Sat, 16 Nov 2019 at 12:48, steve.hostett...@gmail.com:
>
> Actually I am now wondering whether this is just a bug and whether I 
> should record it as such, since the behavior is different with and 
> without the parallelism and there is no warning during execution or in the 
> API.
>
> Any thoughts?



--
Best regards,
Ivan Pavlukhin


RE: New SQL execution engine

2019-11-15 Thread Hostettler, Steve
Hi Roman,

Actually it does not work as I expected. Please see 
https://github.com/hostettler/igniteParallelQueries
Run mvn clean install and then java -jar 
target/ignite-parallel-query-1.0.0-SNAPSHOT-jar-with-dependencies.jar

This demonstrates that the query does not return the same result with and 
without the flag. I understand that it is probably because I did not set an 
affinity, but it is very counter-intuitive.

Am I missing something?

-Original Message-
From: Roman Kondakov  
Sent: Friday, November 15, 2019 11:46 AM
To: dev@ignite.apache.org
Subject: Re: New SQL execution engine

Hi Steve,

it is possible to execute queries in parallel even in the current engine, see 
docs here [1]. And of course this feature should also be available in the new 
engine, though its architecture may be changed.

[1] https://apacheignite.readme.io/v2.0/docs/sql-performance-and-debugging#query-parallelism


-- 
Kind Regards
Roman Kondakov

On 15.11.2019 12:53, steve.hostett...@gmail.com wrote:
> Dear all,
>
> would it also be possible to have parallel execution of SQL queries on a
> single node with that approach?
> My understanding is that, for the moment, SQL queries are single-threaded
> on a given node if there is no affinity.
>
> Best Regards