Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage ......................................................................
Patch Set 1: (14 comments) http://gerrit.cloudera.org:8080/#/c/6025/1/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: Line 146: // We assume that the dst buffer has already been allocated earlier. This increases > can simplify with: Done Line 150: uint8_t* ptr = ctx->Reallocate(dst->ptr, buf_len); > DCHECK(dst->ptr != NULL); Done Line 918: const static int NUM_SAMPLES = NUM_BUCKETS * NUM_SAMPLES_PER_BUCKET; > NUM_SAMPLES -> MAX_NUM_SAMPLES Done Line 974: ReservoirSample<T>* at(int64_t idx) { > single line, const function Done Line 979: void push_back(ReservoirSample<T> s) { > use const&, otherwise 's' is copied twice Done Line 982: ReservoirSample<T>* arr = reinterpret_cast<ReservoirSample<T>*>(begin()); > no need to reinterpret_cast Done Line 987: ReservoirSample<T>* begin() { > const function (and elsewhere) Done Line 994: ReservoirSample<T>* end() { > single line Done Line 1005: void resizeReservoirSampleState(FunctionContext* ctx, StringVal* buffer) { > Why not move this function into ReservoirSampleState and make ReservoirSamp Unfortunately it's not possible to encapsulate because we also need to update the pointer in the StringVal buffer. It would be more complicated and error prone to pass it as one of parameters into this function. Line 1007: // can fit 10 times as many samples, up to a maximum of 20,000. We do not allocate > instead of 20,000 use the constant MAX_NUM_SAMPLES Done Line 1033: // If the array gets filled due to updates or merges, we will reallocate a larger > suggest "we reallocate a larger buffer to hold up to a maximum of MAX_NUM_S Done Line 1035: int init_size = (NUM_SAMPLES / 1000); > It would be nice to have this encapsulated in the ReservoirSampleState as w As mentioned above, it's not possible to encapsulate cleanly. Line 1061: resizeReservoirSampleState<T>(ctx, dst); > if ReallocBuffer() failed, dst->ptr will be in an undefined state, and may I just added a check here, because it's not possible to move the logic as you suggested. Line 1147: DCHECK_EQ(src_val.len, sizeof(ReservoirSampleState<T>) + > you can add a helper function to ReservoirSampleState to do this size compu Done. Also added IsFull(). -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Taras Bobrovytsky <[email protected]> Gerrit-HasComments: Yes
