Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage ......................................................................
Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/6025/5/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: Line 1121: int r = rand() % state->num_samples; > I know you didn't author this line, but could you switch to a better PRNG t I think this should be addressed in a separate patch. I think we would have to benchmark the change, etc. PS5, Line 1122: ((double) state->source_size - r) / state->source_size > I'm not yet convinced this is correct. I'm not convinced it is correct in H I don't think this is correct either. The distribution of keys will not be the same as what's described in the blog post: https://gregable.com/2007/10/reservoir-sampling.html We should be generating a random number between 0 and 1 in update(), but we are "simulating" this here (see comment on line 1107), which gives us an incorrect distribution. I don't think increasing the denominator by 1 as you suggested would fix the problem. About your second point, in our case the weight k is always 1, so we don't need to raise r to some power. I think algorithm should be addressed in a separate patch. PS5, Line 1282: sort > I know this also is present in HEAD, but it can be http://en.cppreference.c Done -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Jim Apple <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Taras Bobrovytsky <[email protected]> Gerrit-HasComments: Yes
