[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file

2017-04-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4893: Efficiently update the rows read counter for 
sequence file
..


Patch Set 7: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6522
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file

2017-04-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4893: Efficiently update the rows read counter for 
sequence file
..


Patch Set 7:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/504/

-- 
To view, visit http://gerrit.cloudera.org:8080/6522
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file

2017-04-25 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4893: Efficiently update the rows read counter for 
sequence file
..


Patch Set 7: Code-Review+2

Carrying Tim's +2 for Anuj

-- 
To view, visit http://gerrit.cloudera.org:8080/6522
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file

2017-04-14 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4893: Efficiently update the rows read counter for 
sequence file
..


Patch Set 6: Code-Review+2

We've had some build stability issues. I'm going to hold off on merging this 
for a bit.

-- 
To view, visit http://gerrit.cloudera.org:8080/6522
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file

2017-04-14 Thread anujphadke (Code Review)
anujphadke has posted comments on this change.

Change subject: IMPALA-4893: Efficiently update the rows read counter for 
sequence file
..


Patch Set 6:

(1 comment)

Moed the set NUM_NODES=1 option from the hdfs_scanner_profile.test  file to 
test_scanner.py

http://gerrit.cloudera.org:8080/#/c/6522/5/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

PS5, Line 74: _format')
> Can you also rename the file to match the test name? Just makes it easier t
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6522
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file

2017-04-14 Thread anujphadke (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6522

to look at the new patch set (#6).

Change subject: IMPALA-4893: Efficiently update the rows read counter for 
sequence file
..

IMPALA-4893: Efficiently update the rows read counter for sequence file

Update the rows read counter after processing the scan range instead of updating
it after reading every row for sequence files to save CPU cycles.

Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541
---
M be/src/exec/hdfs-sequence-scanner.cc
A 
testdata/workloads/functional-query/queries/QueryTest/hdfs_scanner_profile.test
M tests/query_test/test_scanners.py
3 files changed, 19 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/6522/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6522
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file

2017-04-12 Thread anujphadke (Code Review)
anujphadke has posted comments on this change.

Change subject: IMPALA-4893: Efficiently update the rows read counter for 
sequence file
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6522/4/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

Line 70: 
> We can make this more concise:
Done


Line 74: self.run_test_case('QueryTest/rows-read', vector)
> Might as well make this a separate test method, e.g. test_rows_read(). Or a
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6522
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file

2017-04-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4893: Efficiently update the rows read counter for 
sequence file
..


Patch Set 4:

(2 comments)

Only minor style comments remaining.

http://gerrit.cloudera.org:8080/#/c/6522/4/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

Line 70: if new_vector.get_value('table_format').file_format == 'kudu':
We can make this more concise:
  in ('kudu, 'hbase')


Line 74: self.run_test_case('QueryTest/rows-read', new_vector)
Might as well make this a separate test method, e.g. test_rows_read(). Or 
actually that seems very specific. Maybe rename to something like 
test_hdfs_scanner_profile/hdfs-scanner-profile.


-- 
To view, visit http://gerrit.cloudera.org:8080/6522
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file

2017-04-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4893: Efficiently update the rows read counter for 
sequence file
..


Patch Set 3:

I ran an exhaustive build and it looks like the test failed on other file 
formats. I think the problems I saw were:

* If the data is split between multiple files we get values < 10k (i.e. we 
probably need to have a separate test where we set num_nodes=1).
* Some scanners, e.g. Kudu, don't have this counter

-- 
To view, visit http://gerrit.cloudera.org:8080/6522
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file

2017-04-03 Thread anujphadke (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6522

to look at the new patch set (#3).

Change subject: IMPALA-4893: Efficiently update the rows read counter for 
sequence file
..

IMPALA-4893: Efficiently update the rows read counter for sequence file

Update the rows read counter after processing the scan range instead of updating
it after reading every row for sequence files to save CPU cycles.

Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541
---
M be/src/exec/hdfs-sequence-scanner.cc
M testdata/workloads/functional-query/queries/QueryTest/scanners.test
2 files changed, 6 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/6522/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6522
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file

2017-04-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4893: Efficiently update the rows read counter for 
sequence file
..


Patch Set 2: Code-Review+2

(1 comment)

Looks good to me. Did you run the modified scanner tests for all the exhaustive 
file formats? I'll run that myself before merging.

http://gerrit.cloudera.org:8080/#/c/6522/2/be/src/exec/hdfs-sequence-scanner.cc
File be/src/exec/hdfs-sequence-scanner.cc:

Line 345: if (stream_->eof()) {
We put simple "if" statements on one like 
(https://google.github.io/styleguide/cppguide.html#Conditionals)

  if (stream_->eof()) break;


-- 
To view, visit http://gerrit.cloudera.org:8080/6522
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file

2017-04-03 Thread anujphadke (Code Review)
anujphadke has posted comments on this change.

Change subject: IMPALA-4893: Efficiently update the rows read counter for 
sequence file
..


Patch Set 2:

(1 comment)

Added a test to test the RowsRead Counter

http://gerrit.cloudera.org:8080/#/c/6522/1/be/src/exec/hdfs-sequence-scanner.cc
File be/src/exec/hdfs-sequence-scanner.cc:

Line 346:   break;
> I think we can avoid the duplicated logic if we break here instead of retur
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6522
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file

2017-04-03 Thread anujphadke (Code Review)
anujphadke has uploaded a new patch set (#2).

Change subject: IMPALA-4893: Efficiently update the rows read counter for 
sequence file
..

IMPALA-4893: Efficiently update the rows read counter for sequence file

Update the rows read counter after processing the scan range instead of updating
it after reading every row for sequence files to save CPU cycles.

Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541
---
M be/src/exec/hdfs-sequence-scanner.cc
M testdata/workloads/functional-query/queries/QueryTest/scanners.test
2 files changed, 8 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/6522/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6522
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file

2017-03-31 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4893: Efficiently update the rows read counter for 
sequence file
..


Patch Set 1:

(1 comment)

Can you add a test for the RowsRead counter? It would be nice extra coverage 
that I think we're currently missing.

E.g. I think in scanners.test we do some full table scans of some functional 
tables, and that is run for all file formats.

It looks like runtime_filters.test has some verification of RowsRead.

http://gerrit.cloudera.org:8080/#/c/6522/1/be/src/exec/hdfs-sequence-scanner.cc
File be/src/exec/hdfs-sequence-scanner.cc:

Line 346:   COUNTER_ADD(scan_node_->rows_read_counter(), num_rows_read);
I think we can avoid the duplicated logic if we break here instead of 
returning. I.e.

if (stream->eof()) break;


-- 
To view, visit http://gerrit.cloudera.org:8080/6522
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file

2017-03-30 Thread anujphadke (Code Review)
anujphadke has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6522

Change subject: IMPALA-4893: Efficiently update the rows read counter for 
sequence file
..

IMPALA-4893: Efficiently update the rows read counter for sequence file

Update the rows read counter after processing the scan range instead of updating
it after reading every row for sequence files to save CPU cycles.

Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541
---
M be/src/exec/hdfs-sequence-scanner.cc
1 file changed, 7 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/6522/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6522
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke