Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5412 Scan returns wrong partition-column values when 
scanning multiple partitions pointing to the same filesystem location.
......................................................................


Patch Set 1:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/7625/1//COMMIT_MSG
Commit Message:

PS1, Line 7: IMPALA-5412 Scan returns wrong partition-column values when 
scanning multiple partitions
           : pointing to the same filesystem location.
           : 
           : The maps storing file descriptors and file metadata were using 
filename as a key.
           : Multiple partitions pointing to the same filesystem location 
resulted that these
           : map entries were occasionally overwritted by the other partition 
poing to the same.
           : 
           : As a solution the map key was enhanced to contain a pair of 
partition ID and file name.
wrap at 60cols in git commit msgs


http://gerrit.cloudera.org:8080/#/c/7625/1/be/src/exec/base-sequence-scanner.cc
File be/src/exec/base-sequence-scanner.cc:

PS1, Line 163: ,s
space


PS1, Line 309:     
4 spaces


http://gerrit.cloudera.org:8080/#/c/7625/1/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 613:     context_->partition_descriptor()->id(), filename());
4 spaces


http://gerrit.cloudera.org:8080/#/c/7625/1/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

PS1, Line 266: string
remove string constructor here


PS1, Line 266: ,
space


http://gerrit.cloudera.org:8080/#/c/7625/1/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

PS1, Line 237:   /// Returns nullptr if the search doesn't find the descriptor.
not true; it has a dcheck


PS1, Line 360: pair_hash
misleading name because the hash is computed on the value.

why is it necessary to provide a custom hash?


PS1, Line 369: pair<int64_t, std::string>
Can you typedef this and using that here and in the .cc file ?


PS1, Line 381: pair<int64_t, std::string>
same


http://gerrit.cloudera.org:8080/#/c/7625/1/tests/metadata/test_partition_metadata.py
File tests/metadata/test_partition_metadata.py:

PS1, Line 80: avro
does this need to be avro? seems like text would work, and this test class is 
using the text/none dimension (see l37)


PS1, Line 81: ad
as?


PS1, Line 109: output
'output' isn't clear (it doesn't match up with the queries). say this is what 
would get returned by 
 select i, j from %s


PS1, Line 110: note, that shortcoming on avro writer would result the inserted 
value as NULL,
             :     # but the point is the second column for partition i
This isn't a clear sentence; why does the avro writer insert NULL?


PS1, Line 122: output:
             :     # [NULL, 1] 3 times
             :     # [NULL, 2] 3 times
again, this is confusing


-- 
To view, visit http://gerrit.cloudera.org:8080/7625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <laszlo.g...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to