[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function

2020-08-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16267 )

Change subject: IMPALA-10017: Implement ds_kll_union() function
..

IMPALA-10017: Implement ds_kll_union() function

This function receives a set of serialized Apache DataSketches KLL
sketches produced by ds_kll_sketch() and merges them into a single
sketch.

An example usage is to create a sketch for each partition of a table
and write these sketches to a separate table. Depending on which
partitions the user is interested in, the relevant sketches can then
be union-ed together to get an estimate. E.g.:
  SELECT
  ds_kll_quantile(ds_kll_union(sketch_col), 0.5)
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;
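
The union-then-query pattern in the example above can be sketched outside
Impala. The C++ below is a minimal illustration under loud assumptions:
ToySketch, sketch_tbl and median_of_partitions are hypothetical names, and
ToySketch keeps every value exactly, whereas a real KLL sketch is a
bounded-size approximation. It only shows that per-partition summaries can
be merged first and queried afterwards.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for a per-partition sketch. It stores every value,
// so answers are exact; a real KLL sketch is a bounded-size approximation.
struct ToySketch {
  std::vector<double> values;

  void update(double v) { values.push_back(v); }

  // Counterpart of the union step: fold another summary into this one.
  void merge(const ToySketch& other) {
    values.insert(values.end(), other.values.begin(), other.values.end());
  }

  // Value at the given normalized rank in [0, 1]; 0.5 is the median.
  // Returns 0.0 for an empty summary (an arbitrary choice for this toy).
  double quantile(double rank) const {
    assert(rank >= 0.0 && rank <= 1.0);
    if (values.empty()) return 0.0;
    std::vector<double> sorted(values);
    std::sort(sorted.begin(), sorted.end());
    std::size_t idx = static_cast<std::size_t>(rank * (sorted.size() - 1));
    return sorted[idx];
  }
};

// Merge the summaries of the requested partitions, then query the median.
// Partitions missing from sketch_tbl are skipped.
double median_of_partitions(const std::map<int, ToySketch>& sketch_tbl,
                            const std::vector<int>& wanted) {
  ToySketch merged;
  for (int partition : wanted) {
    auto it = sketch_tbl.find(partition);
    if (it != sketch_tbl.end()) merged.merge(it->second);
  }
  return merged.quantile(0.5);
}
```

With partition 1 holding {1, 2, 3} and partition 5 holding {4, 5},
median_of_partitions(sketch_tbl, {1, 5}) returns 3, regardless of what
other partitions are stored in sketch_tbl.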

Testing:
  - Apart from the automated tests added in this patch, I also
    tested ds_kll_union() on a bigger dataset to check that the
    serialization, deserialization and merging steps work well. I
    took TPCH25.lineitem, created a number of sketches grouped by
    l_shipdate and called ds_kll_union() on those sketches.

Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Reviewed-on: http://gerrit.cloudera.org:8080/16267
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/kll_sketches_from_impala.parquet
M testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test
M tests/query_test/test_datasketches.py
7 files changed, 204 insertions(+), 39 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Gerrit-Change-Number: 16267
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function

2020-08-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16267 )

Change subject: IMPALA-10017: Implement ds_kll_union() function
..


Patch Set 9: Verified+1


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Gerrit-Change-Number: 16267
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 08 Aug 2020 11:50:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function

2020-08-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16267 )

Change subject: IMPALA-10017: Implement ds_kll_union() function
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6253/ 
DRY_RUN=false


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Gerrit-Change-Number: 16267
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 08 Aug 2020 06:35:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function

2020-08-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16267 )

Change subject: IMPALA-10017: Implement ds_kll_union() function
..


Patch Set 9: Code-Review+2


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Gerrit-Change-Number: 16267
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 08 Aug 2020 06:35:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function

2020-08-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16267 )

Change subject: IMPALA-10017: Implement ds_kll_union() function
..


Patch Set 8: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6252/


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Gerrit-Change-Number: 16267
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 08 Aug 2020 01:16:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function

2020-08-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16267 )

Change subject: IMPALA-10017: Implement ds_kll_union() function
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6839/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Gerrit-Change-Number: 16267
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 07 Aug 2020 21:28:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function

2020-08-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16267 )

Change subject: IMPALA-10017: Implement ds_kll_union() function
..


Patch Set 8: Code-Review+2


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Gerrit-Change-Number: 16267
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 07 Aug 2020 21:04:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function

2020-08-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16267 )

Change subject: IMPALA-10017: Implement ds_kll_union() function
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6252/ 
DRY_RUN=false


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Gerrit-Change-Number: 16267
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 07 Aug 2020 21:04:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function

2020-08-07 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16267 )

Change subject: IMPALA-10017: Implement ds_kll_union() function
..


Patch Set 7: Code-Review+2

Carry +2 from Csaba


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Gerrit-Change-Number: 16267
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 07 Aug 2020 21:04:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function

2020-08-07 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16267 )

Change subject: IMPALA-10017: Implement ds_kll_union() function
..


Patch Set 7:

PS7 is a rebase with master.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Gerrit-Change-Number: 16267
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 07 Aug 2020 21:03:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function

2020-08-07 Thread Gabor Kaszab (Code Review)
Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16267

to look at the new patch set (#7).

Change subject: IMPALA-10017: Implement ds_kll_union() function
..

IMPALA-10017: Implement ds_kll_union() function

This function receives a set of serialized Apache DataSketches KLL
sketches produced by ds_kll_sketch() and merges them into a single
sketch.

An example usage is to create a sketch for each partition of a table
and write these sketches to a separate table. Depending on which
partitions the user is interested in, the relevant sketches can then
be union-ed together to get an estimate. E.g.:
  SELECT
  ds_kll_quantile(ds_kll_union(sketch_col), 0.5)
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
  - Apart from the automated tests added in this patch, I also
    tested ds_kll_union() on a bigger dataset to check that the
    serialization, deserialization and merging steps work well. I
    took TPCH25.lineitem, created a number of sketches grouped by
    l_shipdate and called ds_kll_union() on those sketches.

Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/kll_sketches_from_impala.parquet
M testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test
M tests/query_test/test_datasketches.py
7 files changed, 204 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/16267/7
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Gerrit-Change-Number: 16267
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function

2020-08-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16267 )

Change subject: IMPALA-10017: Implement ds_kll_union() function
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6824/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Gerrit-Change-Number: 16267
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 07 Aug 2020 10:56:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function

2020-08-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16267 )

Change subject: IMPALA-10017: Implement ds_kll_union() function
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6823/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Gerrit-Change-Number: 16267
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 07 Aug 2020 10:45:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function

2020-08-07 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16267 )

Change subject: IMPALA-10017: Implement ds_kll_union() function
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16267/3/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16267/3/be/src/exprs/aggregate-functions-ir.cc@1851
PS3, Line 1851:
> Can you also add a similar block for HLL (line 1796)? It is ok to do that i
Done



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Gerrit-Change-Number: 16267
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 07 Aug 2020 10:44:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function

2020-08-07 Thread Gabor Kaszab (Code Review)
Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16267

to look at the new patch set (#6).

Change subject: IMPALA-10017: Implement ds_kll_union() function
..

IMPALA-10017: Implement ds_kll_union() function

This function receives a set of serialized Apache DataSketches KLL
sketches produced by ds_kll_sketch() and merges them into a single
sketch.

An example usage is to create a sketch for each partition of a table
and write these sketches to a separate table. Depending on which
partitions the user is interested in, the relevant sketches can then
be union-ed together to get an estimate. E.g.:
  SELECT
  ds_kll_quantile(ds_kll_union(sketch_col), 0.5)
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
  - Apart from the automated tests added in this patch, I also
    tested ds_kll_union() on a bigger dataset to check that the
    serialization, deserialization and merging steps work well. I
    took TPCH25.lineitem, created a number of sketches grouped by
    l_shipdate and called ds_kll_union() on those sketches.

Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/kll_sketches_from_impala.parquet
M testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test
M tests/query_test/test_datasketches.py
7 files changed, 204 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/16267/6
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Gerrit-Change-Number: 16267
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function

2020-08-07 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16267 )

Change subject: IMPALA-10017: Implement ds_kll_union() function
..


Patch Set 5:

PS5 is a rebase with master.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Gerrit-Change-Number: 16267
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 07 Aug 2020 10:24:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function

2020-08-07 Thread Gabor Kaszab (Code Review)
Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16267

to look at the new patch set (#5).

Change subject: IMPALA-10017: Implement ds_kll_union() function
..

IMPALA-10017: Implement ds_kll_union() function

This function receives a set of serialized Apache DataSketches KLL
sketches produced by ds_kll_sketch() and merges them into a single
sketch.

An example usage is to create a sketch for each partition of a table
and write these sketches to a separate table. Depending on which
partitions the user is interested in, the relevant sketches can then
be union-ed together to get an estimate. E.g.:
  SELECT
  ds_kll_quantile(ds_kll_union(sketch_col), 0.5)
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
  - Apart from the automated tests added in this patch, I also
    tested ds_kll_union() on a bigger dataset to check that the
    serialization, deserialization and merging steps work well. I
    took TPCH25.lineitem, created a number of sketches grouped by
    l_shipdate and called ds_kll_union() on those sketches.

Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/kll_sketches_from_impala.parquet
M testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test
M tests/query_test/test_datasketches.py
7 files changed, 199 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/16267/5
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Gerrit-Change-Number: 16267
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function

2020-08-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16267 )

Change subject: IMPALA-10017: Implement ds_kll_union() function
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6806/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Gerrit-Change-Number: 16267
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 06 Aug 2020 13:15:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function

2020-08-06 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16267 )

Change subject: IMPALA-10017: Implement ds_kll_union() function
..


Patch Set 4: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16267/3/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16267/3/be/src/exprs/aggregate-functions-ir.cc@1851
PS3, Line 1851: etch)) {
> The code you linked is urelated here, it is for HLL sketches. However, the
Can you also add a similar block for HLL (line 1796)? It is ok to do that in
another patch, but I think it is simplest to do it here.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Gerrit-Change-Number: 16267
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 06 Aug 2020 13:12:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function

2020-08-06 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16267 )

Change subject: IMPALA-10017: Implement ds_kll_union() function
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16267/3/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16267/3/be/src/exprs/aggregate-functions-ir.cc@1851
PS3, Line 1851: etch)) {
> Can you add a try-catch block? I randomly checked a deserialize function an
The code you linked is unrelated here, it is for HLL sketches. However, the one
for KLL can also throw:
https://github.com/apache/impala/blob/074731e2bcf37643710f2fdf236829991a462fc3/be/src/thirdparty/datasketches/kll_sketch_impl.hpp#L534
E.g. ensure_minimum_memory() throws; I haven't checked the rest. I'll add a
try-catch block, thanks for spotting.
Done


http://gerrit.cloudera.org:8080/#/c/16267/3/be/src/exprs/aggregate-functions-ir.cc@1922
PS3, Line 1922:   DCHECK(!dst->is_null);
> merge can throw an exception, please put it in a try-catch block:
Done.
I also found one occurrence above, changed that as well.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Gerrit-Change-Number: 16267
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 06 Aug 2020 12:52:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function

2020-08-06 Thread Gabor Kaszab (Code Review)
Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16267

to look at the new patch set (#4).

Change subject: IMPALA-10017: Implement ds_kll_union() function
..

IMPALA-10017: Implement ds_kll_union() function

This function receives a set of serialized Apache DataSketches KLL
sketches produced by ds_kll_sketch() and merges them into a single
sketch.

An example usage is to create a sketch for each partition of a table
and write these sketches to a separate table. Depending on which
partitions the user is interested in, the relevant sketches can then
be union-ed together to get an estimate. E.g.:
  SELECT
  ds_kll_quantile(ds_kll_union(sketch_col), 0.5)
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
  - Apart from the automated tests added in this patch, I also
    tested ds_kll_union() on a bigger dataset to check that the
    serialization, deserialization and merging steps work well. I
    took TPCH25.lineitem, created a number of sketches grouped by
    l_shipdate and called ds_kll_union() on those sketches.

Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/kll_sketches_from_impala.parquet
M testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test
M tests/query_test/test_datasketches.py
7 files changed, 199 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/16267/4
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Gerrit-Change-Number: 16267
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function

2020-08-05 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16267 )

Change subject: IMPALA-10017: Implement ds_kll_union() function
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16267/3/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16267/3/be/src/exprs/aggregate-functions-ir.cc@1851
PS3, Line 1851: deserialize
Can you add a try-catch block? I randomly checked a deserialize function and it
can throw an exception:
https://github.com/apache/impala/blob/074731e2bcf37643710f2fdf236829991a462fc3/be/src/thirdparty/datasketches/HllSketchImplFactory.hpp#L86

It would be nice to do this for other similar calls to datasketches functions 
too.


http://gerrit.cloudera.org:8080/#/c/16267/3/be/src/exprs/aggregate-functions-ir.cc@1922
PS3, Line 1922:   dst_sketch->merge(src_sketch);
merge can throw an exception, please put it in a try-catch block:
https://github.com/apache/impala/blob/074731e2bcf37643710f2fdf236829991a462fc3/be/src/thirdparty/datasketches/kll_sketch_impl.hpp#L182
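
A minimal sketch of the try-catch pattern requested in the two comments
above. ToySketch and TryMergeSerialized are hypothetical stand-ins, not the
DataSketches or Impala API, and the way the actual patch reports the error
is not shown in this thread. The point is only that exceptions thrown while
deserializing or merging are caught at the call site and turned into a
failure result instead of escaping to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a third-party sketch type whose deserialize()
// and merge() may throw. The byte layout here is invented for illustration:
// the first byte is a parameter k, the remaining bytes count as items.
class ToySketch {
 public:
  explicit ToySketch(std::uint8_t k) : k_(k) {}

  static ToySketch deserialize(const std::vector<std::uint8_t>& bytes) {
    if (bytes.empty()) throw std::invalid_argument("empty sketch buffer");
    ToySketch sketch(bytes[0]);
    sketch.n_ = bytes.size() - 1;
    return sketch;
  }

  void merge(const ToySketch& other) {
    if (other.k_ != k_) throw std::invalid_argument("incompatible sketches");
    n_ += other.n_;
  }

  std::size_t n() const { return n_; }

 private:
  std::uint8_t k_;
  std::size_t n_ = 0;
};

// Deserializes `bytes` and merges the result into `dst`. On any exception,
// leaves `dst` untouched, stores the message in `error` and returns false.
bool TryMergeSerialized(const std::vector<std::uint8_t>& bytes, ToySketch* dst,
                        std::string* error) {
  assert(dst != nullptr && error != nullptr);
  try {
    dst->merge(ToySketch::deserialize(bytes));
    return true;
  } catch (const std::exception& e) {
    *error = e.what();
    return false;
  }
}
```

Catching std::exception by const reference covers both the deserialize and
the merge failure in one place, so a corrupt or incompatible input only
fails that call rather than terminating the process.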



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Gerrit-Change-Number: 16267
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 05 Aug 2020 15:10:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function

2020-07-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16267 )

Change subject: IMPALA-10017: Implement ds_kll_union() function
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6751/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Gerrit-Change-Number: 16267
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 31 Jul 2020 14:09:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function

2020-07-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16267 )

Change subject: IMPALA-10017: Implement ds_kll_union() function
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6750/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Gerrit-Change-Number: 16267
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 31 Jul 2020 14:05:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function

2020-07-31 Thread Gabor Kaszab (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16267

to look at the new patch set (#2).

Change subject: IMPALA-10017: Implement ds_kll_union() function
..

IMPALA-10017: Implement ds_kll_union() function

This function receives a set of serialized Apache DataSketches KLL
sketches produced by ds_kll_sketch() and merges them into a single
sketch.

An example usage is to create a sketch for each partition of a table
and write these sketches to a separate table. Depending on which
partitions the user is interested in, the relevant sketches can then
be union-ed together to get an estimate. E.g.:
  SELECT
  ds_kll_quantile(ds_kll_union(sketch_col), 0.5)
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
  - Apart from the automated tests added in this patch, I also
    tested ds_kll_union() on a bigger dataset to check that the
    serialization, deserialization and merging steps work well. I
    took TPCH25.lineitem, created a number of sketches grouped by
    l_shipdate and called ds_kll_union() on those sketches.

Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/kll_sketches_from_impala.parquet
M testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test
M tests/query_test/test_datasketches.py
7 files changed, 184 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/16267/2
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Gerrit-Change-Number: 16267
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function

2020-07-31 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16267 )


Change subject: IMPALA-10017: Implement ds_kll_union() function
..

IMPALA-10017: Implement ds_kll_union() function

This function receives a set of serialized Apache DataSketches KLL
sketches produced by ds_kll_sketch() and merges them into a single
sketch.

An example usage is to create a sketch for each partition of a table
and write these sketches to a separate table. Depending on which
partitions the user is interested in, the relevant sketches can then
be union-ed together to get an estimate. E.g.:
  SELECT
  ds_kll_quantile(ds_kll_union(sketch_col), 0.5)
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
  - Apart from the automated tests added in this patch, I also
    tested ds_kll_union() on a bigger dataset to check that the
    serialization, deserialization and merging steps work well. I
    took TPCH25.lineitem, created a number of sketches grouped by
    l_shipdate and called ds_kll_union() on those sketches.

Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
M testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test
M tests/query_test/test_datasketches.py
6 files changed, 184 insertions(+), 37 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/16267/1
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Gerrit-Change-Number: 16267
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab