Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage ......................................................................
Patch Set 8: (11 comments) http://gerrit.cloudera.org:8080/#/c/6025/8//COMMIT_MSG Commit Message: PS8, Line 22: No new tests were added, > can update this now Done http://gerrit.cloudera.org:8080/#/c/6025/8/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: Line 126: // TODO: this file should be cross compiled and then all of the builtin > is this done? This file ends with a "-ir" and it's in be/src/codegen/impala-ir.cc, so this must be done. Removed. Line 961: struct ReservoirSampleState { > i'd say this really turned into a class Done Line 967: // resize occurs, this needs to be updated from the outside. > what does 'from the outside' mean? I meant that whoever resizes and memcopies this struct over, they are also responsible for updating the capacity. It might be more clear if I remove that part. Line 977: ReservoirSampleState(int init_capacity) : > use standard formatting Done. Line 1016: // The array of ReservoirSamples starts right after ReservoirSampleState, so we use > that's often done by putting an array of size 1 at the end of the header st Done. Also made some of the functions non-const because we don't want a function like GetSample() to return a const ReservoirSample<T>*. Line 1025: int64_t GetNext64(int64_t max) { > while you're at it, this deserves a comment Done Line 1033: // Given a buffer that contains a ReservoirSampleState, resize the buffer so that it's > its nice catch Line 1040: if (new_capacity * 2 >= MAX_CAPACITY) new_capacity = MAX_CAPACITY; > if state->capacity is 10 and max_capacity is 40, this line sets new_capacit With the current constants, we would be resizing from about 8000 to 20,000. I think this is acceptable. Would it be better to resize to 16,000 then to 20,000? Line 1062: // If the array gets filled due to updates or merges, we reallocate a larger buffer to > you should put this (= a brief description of what you're doing) somewhere I moved this description higher up. I kept it mostly unmodified. http://gerrit.cloudera.org:8080/#/c/6025/8/testdata/workloads/functional-query/queries/QueryTest/aggregation.test File testdata/workloads/functional-query/queries/QueryTest/aggregation.test: PS8, Line 1163: mediam > spelling Done -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Jim Apple <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Taras Bobrovytsky <[email protected]> Gerrit-HasComments: Yes
