[Impala-ASF-CR] IMPALA-10868: cmake dependency on the latest datasketches 3.3.0

2022-01-04 Thread Alexander Saydakov (Code Review)
Alexander Saydakov has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/18118


Change subject: IMPALA-10868: cmake dependency on the latest datasketches 3.3.0
..

IMPALA-10868: cmake dependency on the latest datasketches 3.3.0

- added dependency on datasketches-cpp-3.3.0 using cmake
- removed older checked-in version of datasketches

Change-Id: I502ae31d8efd775b5bdeaf272530f7adff04e5b8
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-common.h
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-test.cc
D be/src/thirdparty/datasketches/AuxHashMap-internal.hpp
D be/src/thirdparty/datasketches/AuxHashMap.hpp
D be/src/thirdparty/datasketches/CompositeInterpolationXTable-internal.hpp
D be/src/thirdparty/datasketches/CompositeInterpolationXTable.hpp
D be/src/thirdparty/datasketches/CouponHashSet-internal.hpp
D be/src/thirdparty/datasketches/CouponHashSet.hpp
D be/src/thirdparty/datasketches/CouponList-internal.hpp
D be/src/thirdparty/datasketches/CouponList.hpp
D be/src/thirdparty/datasketches/CubicInterpolation-internal.hpp
D be/src/thirdparty/datasketches/CubicInterpolation.hpp
D be/src/thirdparty/datasketches/HarmonicNumbers-internal.hpp
D be/src/thirdparty/datasketches/HarmonicNumbers.hpp
D be/src/thirdparty/datasketches/Hll4Array-internal.hpp
D be/src/thirdparty/datasketches/Hll4Array.hpp
D be/src/thirdparty/datasketches/Hll6Array-internal.hpp
D be/src/thirdparty/datasketches/Hll6Array.hpp
D be/src/thirdparty/datasketches/Hll8Array-internal.hpp
D be/src/thirdparty/datasketches/Hll8Array.hpp
D be/src/thirdparty/datasketches/HllArray-internal.hpp
D be/src/thirdparty/datasketches/HllArray.hpp
D be/src/thirdparty/datasketches/HllSketch-internal.hpp
D be/src/thirdparty/datasketches/HllSketchImpl-internal.hpp
D be/src/thirdparty/datasketches/HllSketchImpl.hpp
D be/src/thirdparty/datasketches/HllSketchImplFactory.hpp
D be/src/thirdparty/datasketches/HllUnion-internal.hpp
D be/src/thirdparty/datasketches/HllUtil.hpp
D be/src/thirdparty/datasketches/LICENSE
D be/src/thirdparty/datasketches/MurmurHash3.h
D be/src/thirdparty/datasketches/README.md
D be/src/thirdparty/datasketches/RelativeErrorTables-internal.hpp
D be/src/thirdparty/datasketches/RelativeErrorTables.hpp
D be/src/thirdparty/datasketches/binomial_bounds.hpp
D be/src/thirdparty/datasketches/bounds_binomial_proportions.hpp
D be/src/thirdparty/datasketches/bounds_on_ratios_in_sampled_sets.hpp
D be/src/thirdparty/datasketches/bounds_on_ratios_in_theta_sketched_sets.hpp
D be/src/thirdparty/datasketches/ceiling_power_of_2.hpp
D be/src/thirdparty/datasketches/common_defs.hpp
D be/src/thirdparty/datasketches/compression_data.hpp
D be/src/thirdparty/datasketches/conditional_back_inserter.hpp
D be/src/thirdparty/datasketches/conditional_forward.hpp
D be/src/thirdparty/datasketches/count_zeros.hpp
D be/src/thirdparty/datasketches/coupon_iterator-internal.hpp
D be/src/thirdparty/datasketches/coupon_iterator.hpp
D be/src/thirdparty/datasketches/cpc_common.hpp
D be/src/thirdparty/datasketches/cpc_compressor.hpp
D be/src/thirdparty/datasketches/cpc_compressor_impl.hpp
D be/src/thirdparty/datasketches/cpc_confidence.hpp
D be/src/thirdparty/datasketches/cpc_sketch.hpp
D be/src/thirdparty/datasketches/cpc_sketch_impl.hpp
D be/src/thirdparty/datasketches/cpc_union.hpp
D be/src/thirdparty/datasketches/cpc_union_impl.hpp
D be/src/thirdparty/datasketches/cpc_util.hpp
D be/src/thirdparty/datasketches/hll.hpp
D be/src/thirdparty/datasketches/hll.private.hpp
D be/src/thirdparty/datasketches/icon_estimator.hpp
D be/src/thirdparty/datasketches/inv_pow2_table.hpp
D be/src/thirdparty/datasketches/kll_helper.hpp
D be/src/thirdparty/datasketches/kll_helper_impl.hpp
D be/src/thirdparty/datasketches/kll_quantile_calculator.hpp
D be/src/thirdparty/datasketches/kll_quantile_calculator_impl.hpp
D be/src/thirdparty/datasketches/kll_sketch.hpp
D be/src/thirdparty/datasketches/kll_sketch_impl.hpp
D be/src/thirdparty/datasketches/kxp_byte_lookup.hpp
D be/src/thirdparty/datasketches/memory_operations.hpp
D be/src/thirdparty/datasketches/serde.hpp
D be/src/thirdparty/datasketches/theta_a_not_b.hpp
D be/src/thirdparty/datasketches/theta_a_not_b_impl.hpp
D be/src/thirdparty/datasketches/theta_comparators.hpp
D be/src/thirdparty/datasketches/theta_constants.hpp
D be/src/thirdparty/datasketches/theta_helpers.hpp
D be/src/thirdparty/datasketches/theta_intersection.hpp
D be/src/thirdparty/datasketches/theta_intersection_base.hpp
D be/src/thirdparty/datasketches/theta_intersection_base_impl.hpp
D be/src/thirdparty/datasketches/theta_intersection_impl.hpp
D be/src/thirdparty/datasketches/theta_jaccard_similarity.hpp
D be/src/thirdparty/datasketches/theta_jaccard_similarity_base.hpp
D be/src/thirdparty/datasketches/theta_set_difference_base.hpp
D 

[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-10 Thread Alexander Saydakov (Code Review)
Alexander Saydakov has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 4:

Sorry about duplicate comments. I am not used to this Gerrit thing. The 
interface here is not very intuitive. And it seems there is no way to remove 
these comments once posted.


--
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 4
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 11 Dec 2021 01:16:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-10 Thread Alexander Saydakov (Code Review)
Alexander Saydakov has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 4:

(1 comment)
 > Some of the failed tests are related to running out of memory in JVM, and 
 > these are not known flaky tests

As far as I understand, the failed tests seem to have nothing to do with 
DataSketches UDFs.


--
To view, visit http://gerrit.cloudera.org:8080/17869

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 4
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 11 Dec 2021 01:08:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-10 Thread Alexander Saydakov (Code Review)
Alexander Saydakov has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 4:

> (1 comment)
 >
 > > Some of the failed tests are related to running out of memory in
 > JVM, and these are not known flaky tests

As far as I understand, the failed tests seem to have nothing to do with 
DataSketches UDFs.


--
To view, visit http://gerrit.cloudera.org:8080/17869

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 4
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 11 Dec 2021 01:06:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-10 Thread Alexander Saydakov (Code Review)
Alexander Saydakov has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 4:

(1 comment)

> Some of the failed tests are related to running out of memory in JVM, and 
> these are not known flaky tests
As far as I understand, the failed tests seem to have nothing to do with 
DataSketches UDFs.

http://gerrit.cloudera.org:8080/#/c/17869/4/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17869/4/be/src/exprs/aggregate-functions-ir.cc@2075
PS4, Line 2075:   agg_state_ptr->second = new (ctx->Allocate())
  :   datasketches::update_theta_sketch(
  :   datasketches::update_theta_sketch::builder().build())
> I looked through the patch again looking for potential leaks. but I didn't
Both failures seem unrelated to this change to me. One is about iceberg, 
another is about some statement expression limit.



--
To view, visit http://gerrit.cloudera.org:8080/17869

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 4
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 11 Dec 2021 01:06:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-10 Thread Alexander Saydakov (Code Review)
Alexander Saydakov has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17869/4/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17869/4/be/src/exprs/aggregate-functions-ir.cc@2075
PS4, Line 2075:   agg_state_ptr->second = new (ctx->Allocate())
  :   datasketches::update_theta_sketch(
  :   datasketches::update_theta_sketch::builder().build())
> I looked through the patch again looking for potential leaks. but I didn't
Space is allocated just above, and placement new is called to put an object 
there. std::uninitialized_fill_n does the same. I just did not quite like using 
it for one object. I think it is clearer to call placement new directly.
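
The equivalence described above can be sketched as follows. This is a minimal illustration, not the Impala code: `ToySketch`, `ConstructWithPlacementNew`, `ConstructWithFillN` and `RunPlacementDemo` are hypothetical names, and `std::malloc` stands in for whatever allocation the function context performs.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>
#include <new>

// Hypothetical stand-in for a sketch object; not the datasketches API.
struct ToySketch {
  explicit ToySketch(int lg_k_in) : lg_k(lg_k_in) { ++live_count; }
  ToySketch(const ToySketch& other) : lg_k(other.lg_k) { ++live_count; }
  ~ToySketch() { --live_count; }
  int lg_k;
  static int live_count;  // number of currently constructed objects
};
int ToySketch::live_count = 0;

// Construct one object in raw memory with placement new.
ToySketch* ConstructWithPlacementNew(void* raw, int lg_k) {
  return new (raw) ToySketch(lg_k);
}

// Same effect for a single object via std::uninitialized_fill_n, which
// copy-constructs the element from a temporary prototype.
ToySketch* ConstructWithFillN(void* raw, int lg_k) {
  ToySketch* p = static_cast<ToySketch*>(raw);
  std::uninitialized_fill_n(p, 1, ToySketch(lg_k));
  return p;
}

// Returns the number of live objects after both have been destroyed.
int RunPlacementDemo() {
  void* a = std::malloc(sizeof(ToySketch));
  void* b = std::malloc(sizeof(ToySketch));
  assert(a != nullptr && b != nullptr);
  ToySketch* s1 = ConstructWithPlacementNew(a, 12);
  ToySketch* s2 = ConstructWithFillN(b, 12);
  assert(s1->lg_k == s2->lg_k);
  // Raw memory has no owner that runs destructors, so call them explicitly.
  s1->~ToySketch();
  s2->~ToySketch();
  std::free(a);
  std::free(b);
  return ToySketch::live_count;
}
```

Both helpers leave one constructed object in the buffer. The fill variant additionally builds and destroys a temporary, which is one reason the direct placement new reads more clearly for a single object.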


http://gerrit.cloudera.org:8080/#/c/17869/4/be/src/exprs/aggregate-functions-ir.cc@2151
PS4, Line 2151: u.update(*dst_sketch_ptr);
> Couldn't we use std::move here? update seems to be optimized for the rvalue
Yes, we could since we discard this object later. I doubt it makes any 
difference in practice. Rvalue case is supported in the API, but I don't think 
it is handled differently in the Theta union. Anyway, this is not incorrect. In 
the worst case it is a missed slight optimization opportunity.
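
For context, this is roughly what an rvalue overload can buy in general. The sketch below uses hypothetical `ToySketch` and `ToyUnion` types and makes no claim about how the Theta union treats rvalues, which is exactly the point in doubt above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins; not the datasketches API.
struct ToySketch {
  std::vector<int> entries;
};

class ToyUnion {
 public:
  // Lvalue overload: must copy because the caller keeps its sketch.
  void update(const ToySketch& s) {
    merged_.insert(merged_.end(), s.entries.begin(), s.entries.end());
    ++copies_;
  }
  // Rvalue overload: may steal the buffer while the union is still empty.
  void update(ToySketch&& s) {
    if (merged_.empty()) {
      merged_ = std::move(s.entries);
      ++steals_;
    } else {
      merged_.insert(merged_.end(), s.entries.begin(), s.entries.end());
      ++copies_;
    }
  }
  std::size_t size() const { return merged_.size(); }
  int copies() const { return copies_; }
  int steals() const { return steals_; }

 private:
  std::vector<int> merged_;
  int copies_ = 0;
  int steals_ = 0;
};
```

Calling `u.update(std::move(sketch))` selects the second overload. If a union implementation simply copies in both overloads, the move changes nothing, so the call with an lvalue stays correct and loses at most a small optimization.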



--
To view, visit http://gerrit.cloudera.org:8080/17869

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 4
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 10 Dec 2021 18:16:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-08 Thread Alexander Saydakov (Code Review)
Alexander Saydakov has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 3:

(1 comment)

> please write once again if you want the code to be merged as it is or add 
> updates
I would suggest merging as is and doing another round of improvements separately 
once the dependency problem is resolved.

http://gerrit.cloudera.org:8080/#/c/17869/3/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17869/3/be/src/exprs/aggregate-functions-ir.cc@1689
PS3, Line 1689:   agg_state_ptr->second = new (ctx->Allocate())
  :   datasketches::hll_sketch(DS_SKETCH_CONFIG, DS_HLL_TYPE);
> I agree that it is a great improvement as it is.
If the processing can start from merge, then indeed we can do lazy 
initialization, but let's not worry about this right now.
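
A minimal sketch of what such lazy initialization could look like, assuming the state may first be touched by either update or merge. `ToySketch`, `AggState`, `GetOrCreate`, `Update` and `Merge` are hypothetical names; real aggregate state lives in memory obtained from the function context, not in a `std::unique_ptr`.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a sketch; counts how many were constructed.
struct ToySketch {
  ToySketch() { ++constructed; }
  int count = 0;
  static int constructed;
};
int ToySketch::constructed = 0;

struct AggState {
  std::unique_ptr<ToySketch> sketch;  // stays empty until first needed
};

// Creates the sketch on first use and returns it.
ToySketch& GetOrCreate(AggState& state) {
  if (!state.sketch) state.sketch = std::make_unique<ToySketch>();
  return *state.sketch;
}

void Update(AggState& state) { ++GetOrCreate(state).count; }

void Merge(const AggState& src, AggState& dst) {
  if (!src.sketch) return;  // nothing to merge, so nothing to construct
  GetOrCreate(dst).count += src.sketch->count;
}
```

With this shape, a destination state that only ever sees empty sources never pays for a constructor.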



--
To view, visit http://gerrit.cloudera.org:8080/17869

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 3
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 08 Dec 2021 20:43:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-03 Thread Alexander Saydakov (Code Review)
Alexander Saydakov has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17869/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17869/3//COMMIT_MSG@9
PS3, Line 9: - call destructors of sketch and union objects
> Thanks a lot for fixing this!
Sorry for the late reply. It did not seem to me that these comments invited 
discussion or that something was expected from me. I am replying just in case I 
misunderstood, and I hope to move this forward.

Yes, ideally we should have an allocator, and I even started writing one, but 
it turned out that the version of Apache DataSketches currently used in Impala 
is not ready for this yet. There are some changes in the library to support 
this, but they are not released yet. That should happen soon. In the meantime, this 
proposed change should make things better than before. I believe that the leaks 
should not happen in normal circumstances anymore (only in case of thrown 
exceptions).
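
To illustrate the allocator idea in general terms, here is a minimal standard-library-style allocator that reports every allocation to a tracker. It is a sketch under assumptions: `ByteTracker`, `TrackingAllocator` and `BytesInUseWhileAlive` are hypothetical, and the global counter merely stands in for whatever memory accounting Impala would want to hook in.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical global counter standing in for a query memory tracker.
struct ByteTracker {
  static std::size_t in_use;
};
std::size_t ByteTracker::in_use = 0;

template <typename T>
struct TrackingAllocator {
  using value_type = T;

  TrackingAllocator() noexcept = default;
  template <typename U>
  TrackingAllocator(const TrackingAllocator<U>&) noexcept {}

  T* allocate(std::size_t n) {
    T* p = static_cast<T*>(::operator new(n * sizeof(T)));
    ByteTracker::in_use += n * sizeof(T);  // count only successful allocations
    return p;
  }
  void deallocate(T* p, std::size_t n) noexcept {
    ByteTracker::in_use -= n * sizeof(T);
    ::operator delete(p);
  }
};

// All instances are interchangeable because the state is global.
template <typename T, typename U>
bool operator==(const TrackingAllocator<T>&, const TrackingAllocator<U>&) {
  return true;
}
template <typename T, typename U>
bool operator!=(const TrackingAllocator<T>&, const TrackingAllocator<U>&) {
  return false;
}

// Returns the tracked byte count while a container is still alive.
std::size_t BytesInUseWhileAlive() {
  std::vector<int, TrackingAllocator<int>> v;
  v.reserve(16);
  return ByteTracker::in_use;
}
```

Because the container releases memory through the same allocator when it is destroyed, the tracked count returns to zero without any manual bookkeeping, including when an exception unwinds the stack.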

The next step after adopting this change would be to fix the dependency on 
Apache Datasketches, so we could easily upgrade to the latest version, and then 
work on the allocator to take this to the next level. I believe there is a 
ticket open for the dependency issue.


http://gerrit.cloudera.org:8080/#/c/17869/3/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17869/3/be/src/exprs/aggregate-functions-ir.cc@1689
PS3, Line 1689:   agg_state_ptr->second = new (ctx->Allocate())
  :   datasketches::hll_sketch(DS_SKETCH_CONFIG, DS_HLL_TYPE);
> DsHllInit always initializes a hll_sketch while we'll only modify it if DsH
As I understand it, the update phase of the processing always comes before 
the merge phase. Even if I am wrong about that, the 
overhead of this unnecessary constructor is not that high, and it was there 
before, so no regression here. On the contrary, some unnecessary complexity was 
eliminated in this change. Even if not perfect, this should be an improvement.



--
To view, visit http://gerrit.cloudera.org:8080/17869

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 3
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 04 Dec 2021 05:28:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-10-07 Thread Alexander Saydakov (Code Review)
Hello Fucun Chu, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17869

to look at the new patch set (#3).

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..

IMPALA-10956: datasketches UDFs: memory leak and merge overhead

- call destructors of sketch and union objects
- avoid overhead of constructing union and getting result from it every time

Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
---
M be/src/exprs/aggregate-functions-ir.cc
1 file changed, 273 insertions(+), 195 deletions(-)
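
The two bullets of the commit message above can be pictured with a toy model. This is only a sketch: `ToyUnion`, `MergePerCall`, `MergeState`, `Init`, `Merge` and `Finalize` are hypothetical, and the integer sum stands in for a real sketch union.

```cpp
#include <cassert>
#include <new>

// Hypothetical stand-in for a union object; counts constructions and
// destructions so the two strategies can be compared.
struct ToyUnion {
  ToyUnion() { ++constructed; }
  ~ToyUnion() { ++destroyed; }
  void update(int v) { total += v; }
  int get_result() const { return total; }
  int total = 0;
  static int constructed;
  static int destroyed;
};
int ToyUnion::constructed = 0;
int ToyUnion::destroyed = 0;

// Per-call strategy: every merge builds a union, feeds both sides into it,
// and extracts the result again.
int MergePerCall(int dst, int src) {
  ToyUnion u;
  u.update(dst);
  u.update(src);
  return u.get_result();
}

// Persistent strategy: one union lives in the aggregation state for the
// whole merge phase.
struct MergeState {
  alignas(ToyUnion) unsigned char storage[sizeof(ToyUnion)];
  ToyUnion* u = nullptr;
};

void Init(MergeState& s) { s.u = new (s.storage) ToyUnion(); }

void Merge(MergeState& s, int src) { s.u->update(src); }

int Finalize(MergeState& s) {
  int result = s.u->get_result();
  s.u->~ToyUnion();  // objects built with placement new need an explicit destructor call
  s.u = nullptr;
  return result;
}
```

Merging N inputs costs N union constructions with the per-call strategy and one with the persistent strategy, and the explicit destructor call in `Finalize` is what keeps the constructed and destroyed counts balanced.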


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/17869/3
--
To view, visit http://gerrit.cloudera.org:8080/17869

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 3
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10956 datasketches UDFs: memory leak and merge overhead

2021-10-07 Thread Alexander Saydakov (Code Review)
Hello Fucun Chu, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17869

to look at the new patch set (#2).

Change subject: IMPALA-10956 datasketches UDFs: memory leak and merge overhead
..

IMPALA-10956 datasketches UDFs: memory leak and merge overhead

- call destructors of sketch and union objects
- avoid overhead of constructing union and getting result from it every time

Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
---
M be/src/exprs/aggregate-functions-ir.cc
1 file changed, 273 insertions(+), 195 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/17869/2
--
To view, visit http://gerrit.cloudera.org:8080/17869

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 2
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision

2021-09-24 Thread Alexander Saydakov (Code Review)
Alexander Saydakov has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17744 )

Change subject: IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a 
precision
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17744/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17744/1/be/src/exprs/aggregate-functions-ir.cc@1722
PS1, Line 1722: precision
precision is not the best name. I would suggest following the datasketches 
library and calling it lg_k.
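
The name reflects that the parameter is the base-2 logarithm of the sketch size k. A small sketch of that relationship follows; the helper `BucketsFromLgK` and the bounds of 4 and 21 are assumptions for illustration and should be checked against the library version in use.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Assumed bounds for illustration only.
constexpr int kMinLgK = 4;
constexpr int kMaxLgK = 21;

// Returns k = 2^lg_k, the number of buckets implied by lg_k.
std::uint32_t BucketsFromLgK(int lg_k) {
  if (lg_k < kMinLgK || lg_k > kMaxLgK) {
    throw std::invalid_argument("lg_k out of range: " + std::to_string(lg_k));
  }
  return std::uint32_t{1} << lg_k;
}
```

For example, `BucketsFromLgK(12)` yields 4096, so increasing lg_k by one doubles the sketch size.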



--
To view, visit http://gerrit.cloudera.org:8080/17744

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91a360bb046d4abb101641772b6159308bf6c014
Gerrit-Change-Number: 17744
Gerrit-PatchSet: 1
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 24 Sep 2021 22:24:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision

2021-09-24 Thread Alexander Saydakov (Code Review)
Alexander Saydakov has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17744 )

Change subject: IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a 
precision
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17744/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17744/1/be/src/exprs/aggregate-functions-ir.cc@1826
PS1, Line 1826: datasketches
Why max here, not the specified precision?



--
To view, visit http://gerrit.cloudera.org:8080/17744

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91a360bb046d4abb101641772b6159308bf6c014
Gerrit-Change-Number: 17744
Gerrit-PatchSet: 1
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 24 Sep 2021 21:38:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] datasketches: improved merge and memory allocation - avoid overhead of constructing union and getting result from it every time - call destructors of sketch and union objects

2021-09-24 Thread Alexander Saydakov (Code Review)
Alexander Saydakov has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17869


Change subject: datasketches: improved merge and memory allocation - avoid 
overhead of constructing union and getting result from it every time - call 
destructors of sketch and union objects
..

datasketches: improved merge and memory allocation
- avoid overhead of constructing union and getting result from it every time
- call destructors of sketch and union objects

Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
---
M be/src/exprs/aggregate-functions-ir.cc
1 file changed, 273 insertions(+), 195 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/17869/1
--
To view, visit http://gerrit.cloudera.org:8080/17869

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 1
Gerrit-Owner: Alexander Saydakov 


[Impala-ASF-CR] IMPALA-10901 cleaner and faster operations with datasketches

2021-09-03 Thread Alexander Saydakov (Code Review)
Hello Fucun Chu, Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17818

to look at the new patch set (#3).

Change subject: IMPALA-10901 cleaner and faster operations with datasketches
..

IMPALA-10901 cleaner and faster operations with datasketches

- serialize using bytes instead of stream
- avoid unnecessary constructor during deserialization
- simplified code slightly
- added original exception message to re-thrown generic message

Change-Id: I306a2489dac0f4d2d475e8f9987cd58bf95474bb
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-common.h
M be/src/exprs/datasketches-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test
M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
M testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
8 files changed, 259 insertions(+), 375 deletions(-)
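
Two of the bullets in the commit message above, serializing to bytes and keeping the original exception message when re-throwing, can be sketched as follows. All names (`ToySketch`, `SerializeToBytes`, `DeserializeFromBytes`, `DeserializeOrRethrow`) and the message text are hypothetical; this is not the code from the change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a sketch with two fixed-size fields.
struct ToySketch {
  std::uint32_t lg_k;
  std::uint64_t count;
};

// Writes the fields directly into a byte buffer, with no stream in between.
std::vector<std::uint8_t> SerializeToBytes(const ToySketch& s) {
  std::vector<std::uint8_t> bytes(sizeof(s.lg_k) + sizeof(s.count));
  std::memcpy(bytes.data(), &s.lg_k, sizeof(s.lg_k));
  std::memcpy(bytes.data() + sizeof(s.lg_k), &s.count, sizeof(s.count));
  return bytes;
}

// Reads the fields back; throws if the buffer is too small.
ToySketch DeserializeFromBytes(const std::uint8_t* data, std::size_t size) {
  ToySketch s;
  if (size < sizeof(s.lg_k) + sizeof(s.count)) {
    throw std::out_of_range("buffer too small: " + std::to_string(size) +
                            " bytes");
  }
  std::memcpy(&s.lg_k, data, sizeof(s.lg_k));
  std::memcpy(&s.count, data + sizeof(s.lg_k), sizeof(s.count));
  return s;
}

// Re-throws a generic error but appends the original message to it.
ToySketch DeserializeOrRethrow(const std::uint8_t* data, std::size_t size) {
  try {
    return DeserializeFromBytes(data, size);
  } catch (const std::exception& e) {
    throw std::runtime_error(std::string("Unable to deserialize sketch: ") +
                             e.what());
  }
}
```

Keeping `e.what()` in the re-thrown message means the user still sees the generic wording while the underlying cause remains visible for debugging.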


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/17818/3
--
To view, visit http://gerrit.cloudera.org:8080/17818

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I306a2489dac0f4d2d475e8f9987cd58bf95474bb
Gerrit-Change-Number: 17818
Gerrit-PatchSet: 3
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10901 cleaner and faster operations with datasketches

2021-08-31 Thread Alexander Saydakov (Code Review)
Hello Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17818

to look at the new patch set (#2).

Change subject: IMPALA-10901 cleaner and faster operations with datasketches
..

IMPALA-10901 cleaner and faster operations with datasketches

- serialize using bytes instead of stream
- avoid unnecessary constructor during deserialization
- simplified code slightly
- added original exception message to re-thrown generic message

Change-Id: I306a2489dac0f4d2d475e8f9987cd58bf95474bb
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-common.h
M be/src/exprs/datasketches-functions-ir.cc
4 files changed, 233 insertions(+), 342 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/17818/2
--
To view, visit http://gerrit.cloudera.org:8080/17818

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I306a2489dac0f4d2d475e8f9987cd58bf95474bb
Gerrit-Change-Number: 17818
Gerrit-PatchSet: 2
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] cleaner and faster operations with datasketches

2021-08-30 Thread Alexander Saydakov (Code Review)
Alexander Saydakov has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17818


Change subject: cleaner and faster operations with datasketches
..

cleaner and faster operations with datasketches

Change-Id: I306a2489dac0f4d2d475e8f9987cd58bf95474bb
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-common.h
M be/src/exprs/datasketches-functions-ir.cc
4 files changed, 203 insertions(+), 308 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/17818/1
--
To view, visit http://gerrit.cloudera.org:8080/17818

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I306a2489dac0f4d2d475e8f9987cd58bf95474bb
Gerrit-Change-Number: 17818
Gerrit-PatchSet: 1
Gerrit-Owner: Alexander Saydakov