Henry Robinson has posted comments on this change. Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner ......................................................................
Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/5115/1//COMMIT_MSG Commit Message: Line 21: * Tested computing SUM(col) for 1 billion distinct dictionary-encoded > I'm assuming you are measuring response time. Since there is overall more w Right, but I'm measuring the absolute difference in response time, not the relative difference. So it doesn't totally matter what the rest of the query is doing, as long as it's stable (I agree that variance can drown out the signal). Thankfully all the runtimes I measured were remarkably stable. However, your next point is a very good one. Line 23: * No performance difference measured by introduction of extra > I'm assuming you compared response times. With multi-threaded scans the los Doh, of course you're right, and I wasn't properly measuring the overhead, because of parallelism. With a single thread, decoding 1B decimals is about 1 second slower, which is about 1ns per decode. I've updated the commit message (still think that's not a perf difference worth chasing). http://gerrit.cloudera.org:8080/#/c/5115/1/be/src/exec/parquet-common.h File be/src/exec/parquet-common.h: Line 203: /// assumed to be BYTE_ARRAY, otherwise it is assumed to be FIXED_LEN_BYTE_ARRAY. > Document return value? Done PS1, Line 328: fixed_len_size > This name seems inaccurate since it can be variable. Maybe 'encoded_size'? Hm, I think it's accurate - either it's set (> 0) and fixed, or it's not set (-1). 'encoded_size' wouldn't work well later in the method (because it's set to the size of the buffer, not the total encoded size). I could of course have another variable to manage that, but this seems ok to me. PS1, Line 338: fixed_len_size > What if fixed_len_size is negative? Done. http://gerrit.cloudera.org:8080/#/c/5115/1/testdata/bin/create-load-data.sh File testdata/bin/create-load-data.sh: Line 150: # Decimal Parquet encodings written by Java Parquet library. > If you're going to create the table as part of the test, might as well copy Done. -- To view, visit http://gerrit.cloudera.org:8080/5115 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
