Henry Robinson has posted comments on this change.

Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet 
scanner
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5115/1//COMMIT_MSG
Commit Message:

Line 21:  * Tested computing SUM(col) for 1 billion distinct dictionary-encoded
> I'm assuming you are measuring response time. Since there is overall more w
Right, but I'm measuring the absolute difference in response time, not  the 
relative difference. So it doesn't totally matter what the rest of the query is 
doing, as long as it's stable (I agree that variance can drown out the signal). 
Thankfully all the runtimes I measured were remarkably stable. However, your 
next point is a very good one.


Line 23:  * No performance difference measured by introduction of extra
> I'm assuming you compared response times. With multi-threaded scans the los
Doh, of course you're right, and I wasn't properly measuring the overhead, 
because of parallelism.

With a single thread, decoding 1B decimals is about 1 second slower, which is 
about 1ns per decode. I've updated the commit message (still think that's not a 
perf difference worth chasing).


http://gerrit.cloudera.org:8080/#/c/5115/1/be/src/exec/parquet-common.h
File be/src/exec/parquet-common.h:

Line 203:   /// assumed to be BYTE_ARRAY, otherwise it is assumed to be 
FIXED_LEN_BYTE_ARRAY.
> Document return value?
Done


PS1, Line 328: fixed_len_size
> This name seems inaccurate since it can be variable. Maybe 'encoded_size'?
Hm, I think it's accurate - either it's set (> 0) and fixed, or it's not set 
(-1). 'encoded_size' wouldn't work well later in the method (because it's set 
to the size of the buffer, not the total encoded size). I could of course have 
another variable to manage that, but this seems ok to me.


PS1, Line 338: fixed_len_size
> What if fixed_len_size is negative?
Done.


http://gerrit.cloudera.org:8080/#/c/5115/1/testdata/bin/create-load-data.sh
File testdata/bin/create-load-data.sh:

Line 150:   # Decimal Parquet encodings written by Java Parquet library.
> If you're going to create the table as part of the test, might as well copy
Done.


-- 
To view, visit http://gerrit.cloudera.org:8080/5115
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to