Alex Behm has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
......................................................................


Patch Set 1:

(14 comments)

Did you verify the memory savings with a simple experiment?

http://gerrit.cloudera.org:8080/#/c/6025/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 146: // We assume that the dst buffer has already been allocated earlier. 
This increases
can simplify with:

Same as AllocBuffer() above but for re-allocating 'dst->ptr'.


Line 150:   uint8_t* ptr = ctx->Reallocate(dst->ptr, buf_len);
DCHECK(dst->ptr != NULL);


Line 918: const static int NUM_SAMPLES = NUM_BUCKETS * NUM_SAMPLES_PER_BUCKET;
NUM_SAMPLES -> MAX_NUM_SAMPLES


Line 974:   ReservoirSample<T>* at(int64_t idx) {
single line, const function


Line 979:   void push_back(ReservoirSample<T> s) {
use const&, otherwise 's' is copied twice


Line 982:     ReservoirSample<T>* arr = 
reinterpret_cast<ReservoirSample<T>*>(begin());
no need to reinterpret_cast


Line 987:   ReservoirSample<T>* begin() {
const function (and elsewhere)


Line 994:   ReservoirSample<T>* end() {
single line


Line 1005: void resizeReservoirSampleState(FunctionContext* ctx, StringVal* 
buffer) {
Why not move this function into ReservoirSampleState and make 
ReservoirSampleState auto-resize itself during push_back()? You can add ctx as 
a param to push_back.

That way all the logic is encapsulated inside ReservoirSampleState and not 
throughout various agg fn implementations


Line 1007:   // can fit 10 times as many samples, up to a maximum of 20,000. We 
do not allocate
instead of 20,000 use the constant MAX_NUM_SAMPLES


Line 1033:   // If the array gets filled due to updates or merges, we will 
reallocate a larger
suggest "we reallocate a larger buffer to hold up to a maximum of 
MAX_NUM_SAMPLES ReservoirSamples."


Line 1035:   int init_size = (NUM_SAMPLES / 1000);
It would be nice to have this encapsulated in the ReservoirSampleState as well.


Line 1061:       resizeReservoirSampleState<T>(ctx, dst);
if ReallocBuffer() failed, dst->ptr will be in an undefined state, and may 
cause a crash

I suggest moving the reallocation logic into the push_back() of 
ReservoirSampleState. You might use the return code of push_back() to signal 
success/failure, so you can handle the failure case here.


Line 1147:   DCHECK_EQ(src_val.len, sizeof(ReservoirSampleState<T>) +
you can add a helper function to ReservoirSampleState to do this size 
computation


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to