Gabor Kaszab has posted comments on this change. Change subject: IMPALA-5412 Fix scan result with partitions on same file ......................................................................
Patch Set 1: (20 comments) http://gerrit.cloudera.org:8080/#/c/7625/1/be/src/exec/base-sequence-scanner.cc File be/src/exec/base-sequence-scanner.cc: Line 90: context->partition_descriptor()->id(),stream_->filename())); > nit:space after , Done PS1, Line 163: ,s > space Done PS1, Line 309: > 4 spaces Done http://gerrit.cloudera.org:8080/#/c/7625/1/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 613: context_->partition_descriptor()->id(), filename()); > 4 spaces Done http://gerrit.cloudera.org:8080/#/c/7625/1/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: PS1, Line 266: , > space Done PS1, Line 266: string > Do we need to call the string constructor explicitly? Seems a bit weird. Done PS1, Line 266: string > remove string constructor here Done http://gerrit.cloudera.org:8080/#/c/7625/1/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: Line 197: /// Allocate a new scan range object, stored in the runtime state's object pool. For > This comment needs updating since partition_id is now required. Done PS1, Line 237: /// Returns nullptr if the search doesn't find the descriptor. > not true; it has a dcheck Done PS1, Line 360: pair_hash > misleading name because the hash is computed on the value. apparently, unordered_map doesn't accept pair as key by default, only if a custom hash is provided. Line 360: struct pair_hash { > We have some utilities like this already in util/container-util.h, let's mo Done Line 363: return std::hash<T2>{}(p.second); > Let's hash both values in the pair and combine them with boost::hash_combin Done PS1, Line 369: pair<int64_t, std::string> > Can you typedef this and using that here and in the .cc file ? Done PS1, Line 381: pair<int64_t, std::string> > same Done http://gerrit.cloudera.org:8080/#/c/7625/1/tests/metadata/test_partition_metadata.py File tests/metadata/test_partition_metadata.py: Line 71: data = self.execute_scalar("select sum(i), sum(j) from %s" % FQ_TBL_NAME) > Can we run this query with num_nodes=1 to verify that the same bug doesn't Done PS1, Line 81: ad > as? Done PS1, Line 81: imapala > Impala Done PS1, Line 109: output > 'output' isn't clear (it doesn't match up with the queries). say this is wh Done Line 110: # (note, that shortcoming on avro writer would result the inserted value as NULL, > How about we only query the second column to avoid this complication? Done PS1, Line 122: output: : # [NULL, 1] 3 times : # [NULL, 2] 3 times > again, this is confusing Done -- To view, visit http://gerrit.cloudera.org:8080/7625 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Laszlo Gaal <laszlo.g...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes