[Impala-ASF-CR] IMPALA-10467: Implement ds_theta_union() function

2021-02-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17048 )

Change subject: IMPALA-10467: Implement ds_theta_union() function
..

IMPALA-10467: Implement ds_theta_union() function

This function receives a set of serialized Apache DataSketches Theta
sketches produced by ds_theta_sketch() and merges them into a single
sketch.

An example usage is to create a sketch for each partition of a table
and write these sketches to a separate table. Depending on which
partitions the user is interested in, the relevant sketches can then
be unioned together to get an estimate. E.g.:
  SELECT
  ds_theta_estimate(ds_theta_union(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;
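
A rough sketch of the full workflow described above, not taken from the patch: the source table `events`, its column `user_id`, and the alias `approx_distinct` are hypothetical names. It also assumes that ds_theta_sketch() accepts a BIGINT column and that its serialized result can be stored as a STRING column through CREATE TABLE AS SELECT.

```sql
-- Step 1: build one Theta sketch per partition and persist it.
-- `events` and `user_id` are hypothetical; sketch_col holds the
-- serialized sketch (assumed to be a STRING).
CREATE TABLE sketch_tbl AS
SELECT partition_col,
       ds_theta_sketch(user_id) AS sketch_col
FROM events
GROUP BY partition_col;

-- Step 2: merge only the sketches of the partitions of interest
-- and turn the merged sketch into a distinct-count estimate.
SELECT ds_theta_estimate(ds_theta_union(sketch_col)) AS approx_distinct
FROM sketch_tbl
WHERE partition_col IN (1, 5);
```

Under these assumptions the second query should approximate COUNT(DISTINCT user_id) over partitions 1 and 5 without rescanning `events`, since only the stored sketches are read.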

Testing:
  - Apart from the automated tests added in this patch, I also
tested ds_theta_union() on a bigger dataset to check that the
serialization, deserialization and merging steps work well. I
took TPCH25.lineitem, created a number of sketches grouped
by l_shipdate and called ds_theta_union() on those sketches.

Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
Reviewed-on: http://gerrit.cloudera.org:8080/17048
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/theta_sketches_from_impala.parquet
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
M tests/query_test/test_datasketches.py
7 files changed, 152 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/17048
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
Gerrit-Change-Number: 17048
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10467: Implement ds_theta_union() function

2021-02-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17048 )

Change subject: IMPALA-10467: Implement ds_theta_union() function
..


Patch Set 3: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
Gerrit-Change-Number: 17048
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 19 Feb 2021 13:32:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10467: Implement ds_theta_union() function

2021-02-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17048 )

Change subject: IMPALA-10467: Implement ds_theta_union() function
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8168/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
Gerrit-Change-Number: 17048
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 19 Feb 2021 08:12:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10467: Implement ds_theta_union() function

2021-02-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17048 )

Change subject: IMPALA-10467: Implement ds_theta_union() function
..


Patch Set 3: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
Gerrit-Change-Number: 17048
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 19 Feb 2021 07:50:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10467: Implement ds_theta_union() function

2021-02-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17048 )

Change subject: IMPALA-10467: Implement ds_theta_union() function
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6902/ 
DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
Gerrit-Change-Number: 17048
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 19 Feb 2021 07:50:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10467: Implement ds_theta_union() function

2021-02-18 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17048 )

Change subject: IMPALA-10467: Implement ds_theta_union() function
..


Patch Set 2: Code-Review+2

Thanks for implementing this! It seems that adding more and more DataSketches 
functionality is sometimes more copy-paste and renaming than actually 
implementing something new :)



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
Gerrit-Change-Number: 17048
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 19 Feb 2021 07:49:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10467: Implement ds_theta_union() function

2021-02-18 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/17048 )

Change subject: IMPALA-10467: Implement ds_theta_union() function
..

IMPALA-10467: Implement ds_theta_union() function

This function receives a set of serialized Apache DataSketches Theta
sketches produced by ds_theta_sketch() and merges them into a single
sketch.

An example usage is to create a sketch for each partition of a table
and write these sketches to a separate table. Depending on which
partitions the user is interested in, the relevant sketches can then
be unioned together to get an estimate. E.g.:
  SELECT
  ds_theta_estimate(ds_theta_union(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
  - Apart from the automated tests added in this patch, I also
tested ds_theta_union() on a bigger dataset to check that the
serialization, deserialization and merging steps work well. I
took TPCH25.lineitem, created a number of sketches grouped
by l_shipdate and called ds_theta_union() on those sketches.

Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/theta_sketches_from_impala.parquet
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
M tests/query_test/test_datasketches.py
7 files changed, 152 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17048/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
Gerrit-Change-Number: 17048
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10467: Implement ds_theta_union() function

2021-02-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17048 )

Change subject: IMPALA-10467: Implement ds_theta_union() function
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8111/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
Gerrit-Change-Number: 17048
Gerrit-PatchSet: 1
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 10 Feb 2021 01:52:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10467: Implement ds_theta_union() function

2021-02-09 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17048 )


Change subject: IMPALA-10467: Implement ds_theta_union() function
..

IMPALA-10467: Implement ds_theta_union() function

This function receives a set of serialized Apache DataSketches Theta
sketches produced by ds_theta_sketch() and merges them into a single
sketch.

An example usage is to create a sketch for each partition of a table
and write these sketches to a separate table. Depending on which
partitions the user is interested in, the relevant sketches can then
be unioned together to get an estimate. E.g.:
  SELECT
  ds_theta_estimate(ds_theta_union(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
  - Apart from the automated tests added in this patch, I also
tested ds_theta_union() on a bigger dataset to check that the
serialization, deserialization and merging steps work well. I
took TPCH25.lineitem, created a number of sketches grouped
by l_shipdate and called ds_theta_union() on those sketches.

Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/theta_sketches_from_impala.parquet
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
M tests/query_test/test_datasketches.py
7 files changed, 162 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17048/1

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
Gerrit-Change-Number: 17048
Gerrit-PatchSet: 1
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins