[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5412: Fix scan result with partitions on same file .. Patch Set 8: Code-Review+2 Verified+1 Carry +2 https://jenkins.impala.io/job/parallel-all-tests/1284/ -- To view, visit http://gerrit.cloudera.org:8080/7625 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file
Tim Armstrong has submitted this change and it was merged. Change subject: IMPALA-5412: Fix scan result with partitions on same file .. IMPALA-5412: Fix scan result with partitions on same file The maps storing file descriptors and file metadata were using filename as a key. Multiple partitions pointing to the same filesystem location resulted that these map entries were occasionally overwritted by the other partition poing to the same. As a solution the map key was enhanced to contain a pair of partition ID and file name. Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 Reviewed-on: http://gerrit.cloudera.org:8080/7625 Reviewed-by: Tim ArmstrongTested-by: Tim Armstrong --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/scanner-context.cc M be/src/util/container-util.h M tests/metadata/test_partition_metadata.py 8 files changed, 110 insertions(+), 47 deletions(-) Approvals: Tim Armstrong: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/7625 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5412: Fix scan result with partitions on same file .. Patch Set 7: Verified+1 https://jenkins.impala.io/job/parallel-all-tests/1284/ -- To view, visit http://gerrit.cloudera.org:8080/7625 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5412: Fix scan result with partitions on same file .. Patch Set 7: Running tests against the Impala-lzo branch with matching change: https://jenkins.impala.io/job/parallel-all-tests/1281/ -- To view, visit http://gerrit.cloudera.org:8080/7625 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5412: Fix scan result with partitions on same file .. Patch Set 7: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1040/ -- To view, visit http://gerrit.cloudera.org:8080/7625 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5412: Fix scan result with partitions on same file .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1040/ -- To view, visit http://gerrit.cloudera.org:8080/7625 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5412: Fix scan result with partitions on same file .. Patch Set 7: Code-Review+2 Rebase -- To view, visit http://gerrit.cloudera.org:8080/7625 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file
Alex Behm has posted comments on this change. Change subject: IMPALA-5412: Fix scan result with partitions on same file .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7625 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5412: Fix scan result with partitions on same file .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/7625/5/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: Line 365: typedef std::unordered_map FileDescMap; > extra space after PartitionFileKey Done http://gerrit.cloudera.org:8080/#/c/7625/5/tests/metadata/test_partition_metadata.py File tests/metadata/test_partition_metadata.py: Line 52: def test_multiple_partitions_same_location(self, vector, unique_database): > I think this test could use some cleanup, but I'm ok to accept this patch i I see your point but I don't have the time right now to do this refactoring. I think we should get this in and improve the tests as a separate step (if we think it's important). Line 103: # check if using num_nodes=1 has the same behaviour > # force all scan ranges to be on the same node Done -- To view, visit http://gerrit.cloudera.org:8080/7625 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file
Hello Matthew Jacobs, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7625 to look at the new patch set (#6). Change subject: IMPALA-5412: Fix scan result with partitions on same file .. IMPALA-5412: Fix scan result with partitions on same file The maps storing file descriptors and file metadata were using filename as a key. Multiple partitions pointing to the same filesystem location resulted that these map entries were occasionally overwritted by the other partition poing to the same. As a solution the map key was enhanced to contain a pair of partition ID and file name. Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/scanner-context.cc M be/src/util/container-util.h M tests/metadata/test_partition_metadata.py 8 files changed, 110 insertions(+), 47 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/7625/6 -- To view, visit http://gerrit.cloudera.org:8080/7625 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file
Alex Behm has posted comments on this change. Change subject: IMPALA-5412: Fix scan result with partitions on same file .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/7625/5/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: Line 365: typedef std::unordered_map FileDescMap; extra space after PartitionFileKey http://gerrit.cloudera.org:8080/#/c/7625/5/tests/metadata/test_partition_metadata.py File tests/metadata/test_partition_metadata.py: Line 52: def test_multiple_partitions_same_location(self, vector, unique_database): I think this test could use some cleanup, but I'm ok to accept this patch if you feel the cleanup is too cumbersome. * instead of limiting the file formats and relying on allow_unsupported_formats, we could use existing alltypes data (create table like, then create partitions pointing to known locations with data) * split up the read and write tests; it's good to have coverage of the write path, but none of the JIRAs mentioned here were bugs in the write path Line 103: # check if using num_nodes=1 has the same behaviour # force all scan ranges to be on the same node -- To view, visit http://gerrit.cloudera.org:8080/7625 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5412: Fix scan result with partitions on same file .. Patch Set 5: Code-Review+1 Looks good -- To view, visit http://gerrit.cloudera.org:8080/7625 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5412 Fix scan result with partitions on same file
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5412 Fix scan result with partitions on same file .. Patch Set 3: (3 comments) I ended up combining the tests in test_partition_metadata and adding parquet and seq. Reran the tests to confirm that it still failed before the fix. http://gerrit.cloudera.org:8080/#/c/7625/3/be/src/exec/hdfs-text-scanner.cc File be/src/exec/hdfs-text-scanner.cc: PS3, Line 589: c > nit: 4 spaces Done http://gerrit.cloudera.org:8080/#/c/7625/3/tests/metadata/test_partition_metadata.py File tests/metadata/test_partition_metadata.py: PS3, Line 86: Imapala > There's still an extra 'a' :) Done PS3, Line 125: > Trailing whitespace. Done -- To view, visit http://gerrit.cloudera.org:8080/7625 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file
Tim Armstrong has uploaded a new patch set (#5). Change subject: IMPALA-5412: Fix scan result with partitions on same file .. IMPALA-5412: Fix scan result with partitions on same file The maps storing file descriptors and file metadata were using filename as a key. Multiple partitions pointing to the same filesystem location resulted that these map entries were occasionally overwritted by the other partition poing to the same. As a solution the map key was enhanced to contain a pair of partition ID and file name. Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/scanner-context.cc M be/src/util/container-util.h M tests/metadata/test_partition_metadata.py 8 files changed, 109 insertions(+), 47 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/7625/5 -- To view, visit http://gerrit.cloudera.org:8080/7625 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5412 Fix scan result with partitions on same file
Tim Armstrong has uploaded a new patch set (#4). Change subject: IMPALA-5412 Fix scan result with partitions on same file .. IMPALA-5412 Fix scan result with partitions on same file The maps storing file descriptors and file metadata were using filename as a key. Multiple partitions pointing to the same filesystem location resulted that these map entries were occasionally overwritted by the other partition poing to the same. As a solution the map key was enhanced to contain a pair of partition ID and file name. Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/scanner-context.cc M be/src/util/container-util.h M tests/metadata/test_partition_metadata.py 8 files changed, 109 insertions(+), 47 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/7625/4 -- To view, visit http://gerrit.cloudera.org:8080/7625 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5412 Fix scan result with partitions on same file
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5412 Fix scan result with partitions on same file .. Patch Set 3: (1 comment) > (2 comments) > > Gabor told me that he won't be able to work on this for around 10 > days. From my point of view this is pretty close, so I think we > should consider filing a follow-on JIRA for the extra test coverage > and getting this in soonish to give it time to bake. Works for me. http://gerrit.cloudera.org:8080/#/c/7625/3/be/src/exec/hdfs-text-scanner.cc File be/src/exec/hdfs-text-scanner.cc: PS3, Line 589: c nit: 4 spaces -- To view, visit http://gerrit.cloudera.org:8080/7625 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5412 Fix scan result with partitions on same file
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5412 Fix scan result with partitions on same file .. Patch Set 3: (2 comments) Gabor told me that he won't be able to work on this for around 10 days. From my point of view this is pretty close, so I think we should consider filing a follow-on JIRA for the extra test coverage and getting this in soonish to give it time to bake. http://gerrit.cloudera.org:8080/#/c/7625/3/tests/metadata/test_partition_metadata.py File tests/metadata/test_partition_metadata.py: PS3, Line 86: Imapala There's still an extra 'a' :) PS3, Line 125: Trailing whitespace. -- To view, visit http://gerrit.cloudera.org:8080/7625 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5412 Fix scan result with partitions on same file
Gabor Kaszab has posted comments on this change. Change subject: IMPALA-5412 Fix scan result with partitions on same file .. Patch Set 1: (20 comments) http://gerrit.cloudera.org:8080/#/c/7625/1/be/src/exec/base-sequence-scanner.cc File be/src/exec/base-sequence-scanner.cc: Line 90: context->partition_descriptor()->id(),stream_->filename())); > nit:space after , Done PS1, Line 163: ,s > space Done PS1, Line 309: > 4 spaces Done http://gerrit.cloudera.org:8080/#/c/7625/1/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 613: context_->partition_descriptor()->id(), filename()); > 4 spaces Done http://gerrit.cloudera.org:8080/#/c/7625/1/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: PS1, Line 266: , > space Done PS1, Line 266: string > Do we need to call the string constructor explicitly? Seems a bit weird. Done PS1, Line 266: string > remove string constructor here Done http://gerrit.cloudera.org:8080/#/c/7625/1/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: Line 197: /// Allocate a new scan range object, stored in the runtime state's object pool. For > This comment needs updating since partition_id is now required. Done PS1, Line 237: /// Returns nullptr if the search doesn't find the descriptor. > not true; it has a dcheck Done PS1, Line 360: pair_hash > misleading name because the hash is computed on the value. apparently, unordered_map doesn't accept pair as key by default, only if a custom hash is provided. Line 360: struct pair_hash { > We have some utilities like this already in util/container-util.h, let's mo Done Line 363: return std::hash{}(p.second); > Let's hash both values in the pair and combine them with boost::hash_combin Done PS1, Line 369: pair> Can you typedef this and using that here and in the .cc file ? Done PS1, Line 381: pair > same Done http://gerrit.cloudera.org:8080/#/c/7625/1/tests/metadata/test_partition_metadata.py File tests/metadata/test_partition_metadata.py: Line 71: data = self.execute_scalar("select sum(i), sum(j) from %s" % FQ_TBL_NAME) > Can we run this query with num_nodes=1 to verify that the same bug doesn't Done PS1, Line 81: ad > as? Done PS1, Line 81: imapala > Impala Done PS1, Line 109: output > 'output' isn't clear (it doesn't match up with the queries). say this is wh Done Line 110: # (note, that shortcoming on avro writer would result the inserted value as NULL, > How about we only query the second column to avoid this complication? Done PS1, Line 122: output: : # [NULL, 1] 3 times : # [NULL, 2] 3 times > again, this is confusing Done -- To view, visit http://gerrit.cloudera.org:8080/7625 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5412 Fix scan result with partitions on same file
Gabor Kaszab has uploaded a new patch set (#3). Change subject: IMPALA-5412 Fix scan result with partitions on same file .. IMPALA-5412 Fix scan result with partitions on same file The maps storing file descriptors and file metadata were using filename as a key. Multiple partitions pointing to the same filesystem location resulted that these map entries were occasionally overwritted by the other partition poing to the same. As a solution the map key was enhanced to contain a pair of partition ID and file name. Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/scanner-context.cc M be/src/util/container-util.h M tests/metadata/test_partition_metadata.py 8 files changed, 120 insertions(+), 38 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/7625/3 -- To view, visit http://gerrit.cloudera.org:8080/7625 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5412 Fix scan result with partitions on same file
Gabor Kaszab has uploaded a new patch set (#2). Change subject: IMPALA-5412 Fix scan result with partitions on same file .. IMPALA-5412 Fix scan result with partitions on same file The maps storing file descriptors and file metadata were using filename as a key. Multiple partitions pointing to the same filesystem location resulted that these map entries were occasionally overwritted by the other partition poing to the same. As a solution the map key was enhanced to contain a pair of partition ID and file name. Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/scanner-context.cc M tests/metadata/test_partition_metadata.py 7 files changed, 109 insertions(+), 34 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/7625/2 -- To view, visit http://gerrit.cloudera.org:8080/7625 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong