[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4893: Efficiently update the rows read counter for sequence file .. Patch Set 7: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6522 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadkeGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4893: Efficiently update the rows read counter for sequence file .. Patch Set 7: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/504/ -- To view, visit http://gerrit.cloudera.org:8080/6522 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadkeGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-4893: Efficiently update the rows read counter for sequence file .. Patch Set 7: Code-Review+2 Carrying Tim's +2 for Anuj -- To view, visit http://gerrit.cloudera.org:8080/6522 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadkeGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4893: Efficiently update the rows read counter for sequence file .. Patch Set 6: Code-Review+2 We've had some build stability issues. I'm going to hold off on merging this for a bit. -- To view, visit http://gerrit.cloudera.org:8080/6522 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadkeGerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file
anujphadke has posted comments on this change. Change subject: IMPALA-4893: Efficiently update the rows read counter for sequence file .. Patch Set 6: (1 comment) Moed the set NUM_NODES=1 option from the hdfs_scanner_profile.test file to test_scanner.py http://gerrit.cloudera.org:8080/#/c/6522/5/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: PS5, Line 74: _format') > Can you also rename the file to match the test name? Just makes it easier t Done -- To view, visit http://gerrit.cloudera.org:8080/6522 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadkeGerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6522 to look at the new patch set (#6). Change subject: IMPALA-4893: Efficiently update the rows read counter for sequence file .. IMPALA-4893: Efficiently update the rows read counter for sequence file Update the rows read counter after processing the scan range instead of updating it after reading every row for sequence files to save CPU cycles. Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541 --- M be/src/exec/hdfs-sequence-scanner.cc A testdata/workloads/functional-query/queries/QueryTest/hdfs_scanner_profile.test M tests/query_test/test_scanners.py 3 files changed, 19 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/6522/6 -- To view, visit http://gerrit.cloudera.org:8080/6522 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadkeGerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file
anujphadke has posted comments on this change. Change subject: IMPALA-4893: Efficiently update the rows read counter for sequence file .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/6522/4/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: Line 70: > We can make this more concise: Done Line 74: self.run_test_case('QueryTest/rows-read', vector) > Might as well make this a separate test method, e.g. test_rows_read(). Or a Done -- To view, visit http://gerrit.cloudera.org:8080/6522 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadkeGerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4893: Efficiently update the rows read counter for sequence file .. Patch Set 4: (2 comments) Only minor style comments remaining. http://gerrit.cloudera.org:8080/#/c/6522/4/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: Line 70: if new_vector.get_value('table_format').file_format == 'kudu': We can make this more concise: in ('kudu, 'hbase') Line 74: self.run_test_case('QueryTest/rows-read', new_vector) Might as well make this a separate test method, e.g. test_rows_read(). Or actually that seems very specific. Maybe rename to something like test_hdfs_scanner_profile/hdfs-scanner-profile. -- To view, visit http://gerrit.cloudera.org:8080/6522 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadkeGerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4893: Efficiently update the rows read counter for sequence file .. Patch Set 3: I ran an exhaustive build and it looks like the test failed on other file formats. I think the problems I saw were: * If the data is split between multiple files we get values < 10k (i.e. we probably need to have a separate test where we set num_nodes=1). * Some scanners, e.g. Kudu, don't have this counter -- To view, visit http://gerrit.cloudera.org:8080/6522 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadkeGerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6522 to look at the new patch set (#3). Change subject: IMPALA-4893: Efficiently update the rows read counter for sequence file .. IMPALA-4893: Efficiently update the rows read counter for sequence file Update the rows read counter after processing the scan range instead of updating it after reading every row for sequence files to save CPU cycles. Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541 --- M be/src/exec/hdfs-sequence-scanner.cc M testdata/workloads/functional-query/queries/QueryTest/scanners.test 2 files changed, 6 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/6522/3 -- To view, visit http://gerrit.cloudera.org:8080/6522 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadkeGerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4893: Efficiently update the rows read counter for sequence file .. Patch Set 2: Code-Review+2 (1 comment) Looks good to me. Did you run the modified scanner tests for all the exhaustive file formats? I'll run that myself before merging. http://gerrit.cloudera.org:8080/#/c/6522/2/be/src/exec/hdfs-sequence-scanner.cc File be/src/exec/hdfs-sequence-scanner.cc: Line 345: if (stream_->eof()) { We put simple "if" statements on one like (https://google.github.io/styleguide/cppguide.html#Conditionals) if (stream_->eof()) break; -- To view, visit http://gerrit.cloudera.org:8080/6522 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadkeGerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file
anujphadke has posted comments on this change. Change subject: IMPALA-4893: Efficiently update the rows read counter for sequence file .. Patch Set 2: (1 comment) Added a test to test the RowsRead Counter http://gerrit.cloudera.org:8080/#/c/6522/1/be/src/exec/hdfs-sequence-scanner.cc File be/src/exec/hdfs-sequence-scanner.cc: Line 346: break; > I think we can avoid the duplicated logic if we break here instead of retur Done -- To view, visit http://gerrit.cloudera.org:8080/6522 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadkeGerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file
anujphadke has uploaded a new patch set (#2). Change subject: IMPALA-4893: Efficiently update the rows read counter for sequence file .. IMPALA-4893: Efficiently update the rows read counter for sequence file Update the rows read counter after processing the scan range instead of updating it after reading every row for sequence files to save CPU cycles. Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541 --- M be/src/exec/hdfs-sequence-scanner.cc M testdata/workloads/functional-query/queries/QueryTest/scanners.test 2 files changed, 8 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/6522/2 -- To view, visit http://gerrit.cloudera.org:8080/6522 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadkeGerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4893: Efficiently update the rows read counter for sequence file .. Patch Set 1: (1 comment) Can you add a test for the RowsRead counter? It would be nice extra coverage that I think we're currently missing. E.g. I think in scanners.test we do some full table scans of some functional tables, and that is run for all file formats. It looks like runtime_filters.test has some verification of RowsRead. http://gerrit.cloudera.org:8080/#/c/6522/1/be/src/exec/hdfs-sequence-scanner.cc File be/src/exec/hdfs-sequence-scanner.cc: Line 346: COUNTER_ADD(scan_node_->rows_read_counter(), num_rows_read); I think we can avoid the duplicated logic if we break here instead of returning. I.e. if (stream->eof()) break; -- To view, visit http://gerrit.cloudera.org:8080/6522 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadkeGerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file
anujphadke has uploaded a new change for review. http://gerrit.cloudera.org:8080/6522 Change subject: IMPALA-4893: Efficiently update the rows read counter for sequence file .. IMPALA-4893: Efficiently update the rows read counter for sequence file Update the rows read counter after processing the scan range instead of updating it after reading every row for sequence files to save CPU cycles. Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541 --- M be/src/exec/hdfs-sequence-scanner.cc 1 file changed, 7 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/6522/1 -- To view, visit http://gerrit.cloudera.org:8080/6522 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke