Alex Behm has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage ......................................................................
Patch Set 1: (14 comments) Did you verify the memory savings with a simple experiment? http://gerrit.cloudera.org:8080/#/c/6025/1/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: Line 146: // We assume that the dst buffer has already been allocated earlier. This increases can simplify with: Same as AllocBuffer() above but for re-allocating 'dst->ptr'. Line 150: uint8_t* ptr = ctx->Reallocate(dst->ptr, buf_len); DCHECK(dst->ptr != NULL); Line 918: const static int NUM_SAMPLES = NUM_BUCKETS * NUM_SAMPLES_PER_BUCKET; NUM_SAMPLES -> MAX_NUM_SAMPLES Line 974: ReservoirSample<T>* at(int64_t idx) { single line, const function Line 979: void push_back(ReservoirSample<T> s) { use const&, otherwise 's' is copied twice Line 982: ReservoirSample<T>* arr = reinterpret_cast<ReservoirSample<T>*>(begin()); no need to reinterpret_cast Line 987: ReservoirSample<T>* begin() { const function (and elsewhere) Line 994: ReservoirSample<T>* end() { single line Line 1005: void resizeReservoirSampleState(FunctionContext* ctx, StringVal* buffer) { Why not move this function into ReservoirSampleState and make ReservoirSampleState auto-resize itself during push_back()? You can add ctx as a param to push_back. That way all the logic is encapsulated inside ReservoirSampleState and not throughout various agg fn implementations Line 1007: // can fit 10 times as many samples, up to a maximum of 20,000. We do not allocate instead of 20,000 use the constant MAX_NUM_SAMPLES Line 1033: // If the array gets filled due to updates or merges, we will reallocate a larger suggest "we reallocate a larger buffer to hold up to a maximum of MAX_NUM_SAMPLES ReservoirSamples." Line 1035: int init_size = (NUM_SAMPLES / 1000); It would be nice to have this encapsulated in the ReservoirSampleState as well. Line 1061: resizeReservoirSampleState<T>(ctx, dst); if ReallocBuffer() failed, dst->ptr will be in an undefined state, and may cause a crash I suggest moving the reallocation logic into the push_back() of ReservoirSampleState. You might use the return code of push_back() to signal success/failure, so you can handle the failure case here. Line 1147: DCHECK_EQ(src_val.len, sizeof(ReservoirSampleState<T>) + you can add a helper function to ReservoirSampleState to do this size computation -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com> Gerrit-HasComments: Yes