Jim Apple has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
......................................................................

Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6025/5/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS5, Line 1122: ((double) state->source_size - r) / state->source_size
I'm not yet convinced this is correct. I'm not convinced it is correct in HEAD.

First, a nit: I think the denominator should be increased by 1. (I think the expected value of the maximum of two independent random variables over the uniform distribution on (0,1) is 2/3, not 3/4)

Second, while I understand that this produces values in the given range, I don't think I understand how this implements the algorithm mentioned in the .h file, in which values of weight k are mapped to r^(1/k), where r is uniform random from (0,1).

--
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Taras Bobrovytsky <[email protected]>
Gerrit-HasComments: Yes
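
The two probability facts behind the comment can be checked numerically. The sketch below is not Impala code: the function names, trial counts and seeds are all hypothetical, chosen only for illustration. It compares the closed form n/(n+1) for the expected maximum of n independent Uniform(0,1) draws against a simulation, and it also averages r^(1/k) for uniform r, which should match the maximum of k uniforms because P(r^(1/k) <= x) = x^k.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>

// Closed form: E[max of n i.i.d. Uniform(0,1) draws] = n / (n + 1).
// Hypothetical helper for illustration; not part of Impala.
double ExpectedMaxOfUniforms(int n) {
  assert(n > 0);
  return static_cast<double>(n) / (n + 1);
}

// Monte Carlo estimate of the same expectation, using a fixed seed.
double SimulatedMaxOfUniforms(int n, int trials, unsigned seed) {
  assert(n > 0 && trials > 0);
  std::mt19937 gen(seed);
  std::uniform_real_distribution<double> uniform(0.0, 1.0);
  double sum = 0.0;
  for (int t = 0; t < trials; ++t) {
    double max_val = 0.0;
    for (int i = 0; i < n; ++i) max_val = std::max(max_val, uniform(gen));
    sum += max_val;
  }
  return sum / trials;
}

// Monte Carlo mean of r^(1/k) with r ~ Uniform(0,1).
// P(r^(1/k) <= x) = P(r <= x^k) = x^k, which is also the CDF of the
// maximum of k uniforms, so the mean should again be k / (k + 1).
double SimulatedWeightedKey(int k, int trials, unsigned seed) {
  assert(k > 0 && trials > 0);
  std::mt19937 gen(seed);
  std::uniform_real_distribution<double> uniform(0.0, 1.0);
  double sum = 0.0;
  for (int t = 0; t < trials; ++t) sum += std::pow(uniform(gen), 1.0 / k);
  return sum / trials;
}
```

For n = 2 the closed form gives 2/3, the value quoted in the comment; 3/4 is what the same formula gives for n = 3. Calling SimulatedMaxOfUniforms(2, 200000, 42) and SimulatedWeightedKey(2, 200000, 42) should both land close to 2/3, up to sampling noise.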
