Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5412 Scan returns wrong partition-column values when scanning multiple partitions pointing to the same filesystem location. ......................................................................
Patch Set 1: (15 comments) http://gerrit.cloudera.org:8080/#/c/7625/1//COMMIT_MSG Commit Message: PS1, Line 7: IMPALA-5412 Scan returns wrong partition-column values when scanning multiple partitions : pointing to the same filesystem location. : : The maps storing file descriptors and file metadata were using filename as a key. : Multiple partitions pointing to the same filesystem location resulted that these : map entries were occasionally overwritted by the other partition poing to the same. : : As a solution the map key was enhanced to contain a pair of partition ID and file name. wrap at 60cols in git commit msgs http://gerrit.cloudera.org:8080/#/c/7625/1/be/src/exec/base-sequence-scanner.cc File be/src/exec/base-sequence-scanner.cc: PS1, Line 163: ,s space PS1, Line 309: 4 spaces http://gerrit.cloudera.org:8080/#/c/7625/1/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 613: context_->partition_descriptor()->id(), filename()); 4 spaces http://gerrit.cloudera.org:8080/#/c/7625/1/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: PS1, Line 266: string remove string constructor here PS1, Line 266: , space http://gerrit.cloudera.org:8080/#/c/7625/1/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: PS1, Line 237: /// Returns nullptr if the search doesn't find the descriptor. not true; it has a dcheck PS1, Line 360: pair_hash misleading name because the hash is computed on the value. why is it necessary to provide a custom hash? PS1, Line 369: pair<int64_t, std::string> Can you typedef this and using that here and in the .cc file ? PS1, Line 381: pair<int64_t, std::string> same http://gerrit.cloudera.org:8080/#/c/7625/1/tests/metadata/test_partition_metadata.py File tests/metadata/test_partition_metadata.py: PS1, Line 80: avro does this need to be avro? seems like text would work, and this test class is using the text/none dimension (see l37) PS1, Line 81: ad as? PS1, Line 109: output 'output' isn't clear (it doesn't match up with the queries). say this is what would get returned by select i, j from %s PS1, Line 110: note, that shortcoming on avro writer would result the inserted value as NULL, : # but the point is the second column for partition i This isn't a clear sentence; why does the avro writer insert NULL? PS1, Line 122: output: : # [NULL, 1] 3 times : # [NULL, 2] 3 times again, this is confusing -- To view, visit http://gerrit.cloudera.org:8080/7625 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Laszlo Gaal <laszlo.g...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes