Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
......................................................................


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6025/5/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 1121:     int r = rand() % state->num_samples;
> I know you didn't author this line, but could you switch to a better PRNG t
I think this should be addressed in a separate patch. I think we would have to 
benchmark the change, etc.


PS5, Line 1122: ((double) state->source_size - r) / state->source_size
> I'm not yet convinced this is correct. I'm not convinced it is correct in H
I don't think this is correct either. The distribution of keys will not be the 
same as what's described in the blog post: 
https://gregable.com/2007/10/reservoir-sampling.html
We should be generating a random number between 0 and 1 in update(), but we are 
"simulating" this here (see comment on line 1107), which gives us an incorrect 
distribution.

I don't think increasing the denominator by 1 as you suggested would fix the 
problem.

About your second point, in our case the weight k is always 1, so we don't need 
to raise r to some power.

I think algorithm should be addressed in a separate patch.


PS5, Line 1282: sort
> I know this also is present in HEAD, but it can be http://en.cppreference.c
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Taras Bobrovytsky <[email protected]>
Gerrit-HasComments: Yes

Reply via email to