[Impala-ASF-CR] IMPALA-4957: Don't crash Impalad on LLVM fatal error

2017-06-14 Thread Michael Ho (Code Review)
Michael Ho has abandoned this change.

Change subject: IMPALA-4957: Don't crash Impalad on LLVM fatal error
..


Abandoned

Abandoned until we come up with a better approach.

-- 
To view, visit http://gerrit.cloudera.org:8080/6318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I54706e261ed223eadde347b1184fb0102e03a3d6
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-5158: report buffer pool free memory in MemTracker

2017-06-14 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5158: report buffer pool free memory in MemTracker
..


Patch Set 4: Code-Review+2

rebase

-- 
To view, visit http://gerrit.cloudera.org:8080/6993
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I007eb258377b33fff9f3246580d80fa551837078
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5158: report buffer pool free memory in MemTracker

2017-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5158: report buffer pool free memory in MemTracker
..


Patch Set 3:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/734/

-- 
To view, visit http://gerrit.cloudera.org:8080/6993
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I007eb258377b33fff9f3246580d80fa551837078
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5316: Adds last day() function

2017-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5316: Adds last_day() function
..


IMPALA-5316: Adds last_day() function

This change adds last_day() function.
The function takes exactly one TIMESTAMP argument
and returns a TIMESTAMP that is the last date of the
input date's calendar month.
The function will return NULL when:
  1) The input argument cannot be implicitly casted to
 a TIMESTAMP.
  2) The TIMESTAMP argument is missing a date component.
  3) The TIMESTAMP argument is outside of the supported range:
 between 1400-01-31 00:00:00 and -12-31 23:59:59

Change-Id: I429c8734bddca3c37a2eedc211a16a4ffcb04370
Reviewed-on: http://gerrit.cloudera.org:8080/6991
Reviewed-by: Matthew Jacobs 
Tested-by: Impala Public Jenkins
---
M be/src/exprs/expr-test.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.h
M common/function-registry/impala_functions.py
4 files changed, 75 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Matthew Jacobs: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I429c8734bddca3c37a2eedc211a16a4ffcb04370
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Vincent Tran 


[Impala-ASF-CR] IMPALA-5316: Adds last day() function

2017-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5316: Adds last_day() function
..


Patch Set 5: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I429c8734bddca3c37a2eedc211a16a4ffcb04370
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Vincent Tran 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization

2017-06-14 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5036: Parquet count star optimization
..


Patch Set 4:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/6812/4/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 455: // There are no materialized slots, e.g.
There are no materialized slots and we are not optimizing count(*)


Line 456: // "select count(*) from alltypes where int_col > 5" and "select 
1 from alltypes".
The first query is not correct, in that case int_col is materialized. I think 
the second query is a sufficient example.


http://gerrit.cloudera.org:8080/#/c/6812/4/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

Line 171:   if (slot.getColumn() != null && !slot.getStats().hasStats()) 
return true;
Thanks. This was a bug in general, not really specific to your change.


http://gerrit.cloudera.org:8080/#/c/6812/4/testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test:

Line 105: from functional_parquet.alltypes
single line query


Line 150: # The optimization is disabled if the output of the count(*) inline 
view is being
Optimization is not applied because the inner count(*) is not materialized. The 
outer count(*) does not reference a base table.


Line 285: # Optimization is not applied when selecting from an empty table.
Do we apply the optimization when we reference a non-empty partitioned table, 
but then we prune all partitions?


Line 323: # materialized. Not materialized agg exprs are ignored.
Non-materialized agg exprs are ignored.


http://gerrit.cloudera.org:8080/#/c/6812/4/testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
File 
testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test:

Line 86: # Verify that 0 is returned when we are selecting from an empty table 
and the optimization
the optimization is not applied in this case right?


Line 95: # Verify that 0 is returned when all the partitioned columns are 
filtered.
when all partitions are pruned


http://gerrit.cloudera.org:8080/#/c/6812/4/tests/query_test/test_aggregation.py
File tests/query_test/test_aggregation.py:

Line 275: if (vector.get_value('table_format').file_format != 'text' or
seems weird, why not

vector.get_value('table_format').file_format != 'parquet'


Line 280: vector.get_value('exec_option')['batch_size'] = 1
Doesn't exhaustive run this test with multiple batch sizes already? If so, then 
no need for this.


-- 
To view, visit http://gerrit.cloudera.org:8080/6812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5499: avoid ephemeral port conflicts

2017-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5499: avoid ephemeral port conflicts
..


Patch Set 3: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id02c83e6f946a14b83f5e6561957d7ad81442835
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5499: avoid ephemeral port conflicts

2017-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5499: avoid ephemeral port conflicts
..


IMPALA-5499: avoid ephemeral port conflicts

We should not select the same port twice, which could happen because
multiple ports were selected consecutively without actually binding to
any of the ports.

Change-Id: Id02c83e6f946a14b83f5e6561957d7ad81442835
Reviewed-on: http://gerrit.cloudera.org:8080/7171
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M be/src/rpc/thrift-server-test.cc
M be/src/statestore/statestore-test.cc
M be/src/testutil/in-process-servers.cc
M be/src/util/network-util.cc
M be/src/util/network-util.h
5 files changed, 27 insertions(+), 17 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Id02c83e6f946a14b83f5e6561957d7ad81442835
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-4957: Don't crash Impalad on LLVM fatal error

2017-06-14 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4957: Don't crash Impalad on LLVM fatal error
..


Patch Set 5:

Where does this change stand? I think we decided this isn't safe to do, right? 
Should we abandon it or do you have a plan?

-- 
To view, visit http://gerrit.cloudera.org:8080/6318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I54706e261ed223eadde347b1184fb0102e03a3d6
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


Re: [Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

2017-06-14 Thread Alexander Behm
Planner test failed in PlannerTest.testKuduSelectivity. Please fix,

On Wed, Jun 14, 2017 at 9:16 PM, Impala Public Jenkins (Code Review) <
ger...@cloudera.org> wrote:

> Impala Public Jenkins has posted comments on this change.
>
> Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
> ..
>
>
> Patch Set 7: Verified-1
>
> Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/731/
>
> --
> To view, visit http://gerrit.cloudera.org:8080/7168
> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
>
> Gerrit-MessageType: comment
> Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
> Gerrit-PatchSet: 7
> Gerrit-Project: Impala-ASF
> Gerrit-Branch: master
> Gerrit-Owner: Vincent Tran 
> Gerrit-Reviewer: Alex Behm 
> Gerrit-Reviewer: Bharath Vissapragada 
> Gerrit-Reviewer: Impala Public Jenkins
> Gerrit-Reviewer: Vincent Tran 
> Gerrit-HasComments: No
>
> --
> You received this message because you are subscribed to the Google Groups
> "impala-cr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to impala-cr+unsubscr...@cloudera.com.
> For more options, visit https://groups.google.com/a/cloudera.com/d/optout.
>


[Impala-ASF-CR] IMPALA-4418: Extra blank lines in query result

2017-06-14 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4418: Extra blank lines in query result
..


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/7055/3//COMMIT_MSG
Commit Message:

Line 11: that returns 0 row.
This change avoids printing blank lines when the Impala shell fetches 0 rows 
from a statement.


http://gerrit.cloudera.org:8080/#/c/7055/3/shell/impala_shell.py
File shell/impala_shell.py:

Line 922:   # IMPALA-4418: Breaking out of the loop to prevent an empty 
newline printing
Break out of the loop to prevent printing an unnecessary empty line.

(the rest is clear from the code context)


http://gerrit.cloudera.org:8080/#/c/7055/3/tests/shell/test_shell_interactive.py
File tests/shell/test_shell_interactive.py:

Line 293: # IMPALA-4418
We use this format

# IMPALA-4418: DROP AND USE ...
# ...
# ...


Line 294: # DROP and USE are exception cases that client does not fetch.
DROP and USE are generally exceptional statements where the client does not 
fetch. However, when preceded by a comment, the Impala shell treats them like 
any other statement and will try to fetch - receiving 0 rows. For statements 
returning 0 rows we do not want an empty line in stdout.


Line 296: # CREATE [DATABASE|TABLE] should trigger a 0 row fetch
Remove this part about CREATE in favor of the more general formulation above.

Maybe add another test that does not require create, e.g.:
select * from functional.alltypes limit 0


-- 
To view, visit http://gerrit.cloudera.org:8080/7055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e18ce36be07ee90a16b007b1e30d5255ef8a839
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Vincent Tran 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

2017-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
..


Patch Set 7: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/731/

-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Vincent Tran 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5158: report buffer pool free memory in MemTracker

2017-06-14 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5158: report buffer pool free memory in MemTracker
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6993
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I007eb258377b33fff9f3246580d80fa551837078
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4987: Skip test rows availability when testing over a network.

2017-06-14 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4987: Skip test_rows_availability when testing over a 
network.
..


Patch Set 1:

Ping?

-- 
To view, visit http://gerrit.cloudera.org:8080/6144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4449d82ccd86fcf22ea96b36618c6de87458ffce
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4086: Add benchmark for simple scheduler

2017-06-14 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4086: Add benchmark for simple scheduler
..


Patch Set 5:

Tim, please do the +2 on this one once Lars response to your last comments and 
you're happy with the change.

-- 
To view, visit http://gerrit.cloudera.org:8080/4554
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I89ec1c6c1828bb0b86d1e13ce4dfc5a8df865c2e
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5315 Cast to timestamp fails for YYYY-M-D format

2017-06-14 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5315 Cast to timestamp fails for -M-D format
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7009/3/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 2216:   TimestampValue::Parse("2001-01-21 00:00:00"));
> So looks like both of these formats should be supported:
Sorry for the delayed response, I missed this question.

That's going to result in a lot of stamped out formats though, right? But I 
guess we don't have a great alternative given the way the code works today. 
Though you'll have to basically parse the input format in order to match it up 
since many of them will have the same lengths, so it seems like there should be 
a better way where we generate the parse information on-the-fly. But if you 
don't see a good way to do that efficiently, I guess stamping them out is okay.


-- 
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Vincent Tran 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5061: Populate null count in parquet::statistics

2017-06-14 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5061: Populate null_count in parquet::statistics
..


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7058/8/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

PS8, Line 96: has_values
is that dead now? if so, how about we delete it?


Line 107:   /// Stores whether the current object has been initialized with a 
set of values.
it looks like has_value_ doesn't indicate whether null_count_ is valid or not, 
right? Let's clarify that, maybe say: ... with min and max values.


-- 
To view, visit http://gerrit.cloudera.org:8080/7058
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4c49a63af84c2234f0633be63206cb52eb7e8ebb
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5085: large rows in BufferedTupleStreamV2

2017-06-14 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5085: large rows in BufferedTupleStreamV2
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6638/10/be/src/runtime/buffered-tuple-stream-v2.h
File be/src/runtime/buffered-tuple-stream-v2.h:

PS10, Line 284: AllocateRow
> Because it depends on the caller writing out the row in a specific format.
I guess let's go with "Custom".


-- 
To view, visit http://gerrit.cloudera.org:8080/6638
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4418: Extra blank lines in query result

2017-06-14 Thread Vincent Tran (Code Review)
Vincent Tran has posted comments on this change.

Change subject: IMPALA-4418: Extra blank lines in query result
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7055/2/shell/impala_shell.py
File shell/impala_shell.py:

Line 919:   if len(rows) == 0:
> Comment on why we are doing this
Done


http://gerrit.cloudera.org:8080/#/c/7055/2/tests/shell/test_shell_interactive.py
File tests/shell/test_shell_interactive.py:

Line 294:   # DROP and USE are exception cases that client does not fetch
> Prefix with IMPALA-4418
Done


Line 297:   run_impala_shell_interactive("drop database if exists d1;")
> Why not use a test similar to what was reported in the JIRA, i.e. issue a u
Good point. Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e18ce36be07ee90a16b007b1e30d5255ef8a839
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Vincent Tran 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5507: Add clear description to help information of KEYVAL option

2017-06-14 Thread Donghui Xu (Code Review)
Donghui Xu has uploaded a new patch set (#3).

Change subject: IMPALA-5507: Add clear description to help information of 
KEYVAL option
..

IMPALA-5507: Add clear description to help information of KEYVAL option

Help information of KEYVAL option in impala-shell is not clear enough.

I fix this issue by adding clear description to help information of
KEYVAL option.

Change-Id: I68cfc16838c6c0e7813f03dd4296f9eb54ec4c63
---
M shell/option_parser.py
1 file changed, 6 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/7179/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7179
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I68cfc16838c6c0e7813f03dd4296f9eb54ec4c63
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Donghui Xu 


[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization

2017-06-14 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-5036: Parquet count star optimization
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6812/3/testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test:

Line 324: WARNING: The following tables are missing relevant table and/or 
column statistics.
> This has nothing to do with my setup. It passes a private build on Jenkins.
This is fixed now, disregard the above comment.


-- 
To view, visit http://gerrit.cloudera.org:8080/6812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization

2017-06-14 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#4).

Change subject: IMPALA-5036: Parquet count star optimization
..

IMPALA-5036: Parquet count star optimization

Instead of materializing empty rows when computing count star, we use
the data stored in the Parquet RowGroup.num_rows field. The Parquet
scanner tuple is modified to have one slot into which we will write the
num rows statistic. The aggregate function is changed from count to a
special sum function that gets initialized to 0. We also add a rewrite
rule so that count() is rewritten to count(*) in order to make
sure that this optimization is applied in all cases.

Testing:
- Added functional and planner tests

Change-Id: I536b85c014821296aed68a0c68faadae96005e62
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionParams.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
A fe/src/main/java/org/apache/impala/rewrite/NormalizeCountStarRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/distinct.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
A testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
R testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
M tests/query_test/test_aggregation.py
M tests/query_test/test_parquet_stats.py
26 files changed, 874 insertions(+), 79 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6812/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization

2017-06-14 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-5036: Parquet count star optimization
..


Patch Set 3:

(28 comments)

http://gerrit.cloudera.org:8080/#/c/6812/3/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 445:   *dst_slot = file_metadata_.row_groups[row_group_idx_].num_rows;
> Bounds check against file_metadata_.num_rows (i.e. keep a running counter a
Done


Line 452:   }
> Why not else if as in the previous patch set? Else-if seems more accurate.
Reverted to else if. (I don't think it matters if we have else if or not, the 
behavior is identical in both cases)


Line 454:   if (scan_node_->IsZeroSlotTableScan()) {
> Why is this optimization not redundant now?  Maybe update the comment to in
Done


http://gerrit.cloudera.org:8080/#/c/6812/3/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

Line 226:   11: optional i64 parquet_count_star_slot_offset
> Would it be simpler to have this be one parameter and indicate truth by pas
Yes, I did something similar. (its now true is if this parameter is set).


Line 226:   11: optional i64 parquet_count_star_slot_offset
> i32 right?
Ah yes, because it's int instead of long in Java. Done


http://gerrit.cloudera.org:8080/#/c/6812/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 248:* Adds a new slot descriptor to the tuple descriptor of this scan. 
Also adds an entry
> * explain what is going to be stored in this new slot descriptor
Done


Line 249:* to 'optimizedAggSmap_' that replaces a count() with a special 
sum() function that
> that substitutes count(*) with sum_init_zero()
Done


Line 915: 
msg.hdfs_scan_node.setOptimize_parquet_count_star(optimizedAggSmap_ != null);
> Do we need to pass this to the BE? The presence/absence of the parquet_coun
Done


http://gerrit.cloudera.org:8080/#/c/6812/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

Line 1213:* table scans.
> instead of scanning the table (fix other places below also)
Done


http://gerrit.cloudera.org:8080/#/c/6812/3/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

Line 290:   public void testParquetStats() { 
runPlannerTestFile("parquet-stats-agg"); }
> testParquetStatsAgg()
Done


http://gerrit.cloudera.org:8080/#/c/6812/3/testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test:

Line 1: # Verify that that the parquet count(*) optimization is applied in all 
the cases.
> spell out "in all the cases" a little more and also mention that in one cas
Done


Line 22: |  |  output: sum_init_zero(functional_parquet.alltypes.parquet-stats: 
num_rows)
> Can we reduce this to just parquet-stats.num_rows? How do we create such a 
The slot descriptor label gets printed here that is set on line 263 in 
HdfsScanNode.java. The full path is printed by default. Are you suggesting to 
add some kind of extra plumbing how labels get printed?


Line 99:  DISTRIBUTEDPLAN
> Remove here and all tests below. I think showing the distributed plan for t
Done


Line 114: select month, count(*) from functional_parquet.alltypes group by 
month, year
> Add a negative test for this one:
Added a select count(year) from alltypes.


Line 172: select max(year), count(*) from functional_parquet.alltypes
> use avg() instead of max() because max() is going to be optimized in the sa
Done


Line 195: # IMPALA-5036
> JIRA number is not very descriptive. Describe what this test case is checki
Rewrote. Still feels like the description is not quite right.


Line 278: # The count(*) optimization is applied to the inline view even if 
there is a join.
> Add a negative test case that shows the query block must have one table ref
Done


Line 352: # tinyint_col is not partitioned so the optimization is disabled.
> tinyint_col is not a partition column
Done


Line 402: # Optimization is not applied in the case of count(null).
> is not applied to count(null)
Done


Line 451: # Optimization is not applied because the count(*) is not applied 
directly to the
> Optimization is not applied across query blocks, even though it would be co
Done


Line 453: select count(*) from ( select int_col from 
functional_parquet.alltypes) t
> Add a new test that shows we only consider materialized agg exprs, somethin
Done


Line 476: # Optimization is not applied because we are not scanning a Parquet 
table.
> Remove. This case is already covered above.
Done


http://gerrit.cloudera.org:8080/#/c/6812/3/testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
File 

[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization

2017-06-14 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#4).

Change subject: IMPALA-5036: Parquet count star optimization
..

IMPALA-5036: Parquet count star optimization

Instead of materializing empty rows when computing count star, we use
the data stored in the Parquet RowGroup.num_rows field. The Parquet
scanner tuple is modified to have one slot into which we will write the
num rows statistic. The aggregate function is changed from count to a
special sum function that gets initialized to 0. We also add a rewrite
rule so that count() is rewritten to count(*) in order to make
sure that this optimization is applied in all cases.

Testing:
- Added functional and planner tests

Change-Id: I536b85c014821296aed68a0c68faadae96005e62
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionParams.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
A fe/src/main/java/org/apache/impala/rewrite/NormalizeCountStarRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/distinct.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
A testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
R testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
M tests/query_test/test_aggregation.py
M tests/query_test/test_parquet_stats.py
26 files changed, 874 insertions(+), 79 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6812/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] DRAFT

2017-06-14 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#4).

Change subject: DRAFT
..

DRAFT

IMPALA-5036: Parquet count star optimization

Instead of materializing empty rows when computing count star, we use
the data stored in the Parquet RowGroup.num_rows field. The Parquet
scanner tuple is modified to have one slot into which we will write the
num rows statistic. The aggregate function is changed from count to a
special sum function that gets initialized to 0. We also add a rewrite
rule so that count() is rewritten to count(*) in order to make
sure that this optimization is applied in all cases.

Testing:
- Added functional and planner tests

Change-Id: I536b85c014821296aed68a0c68faadae96005e62
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionParams.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
A fe/src/main/java/org/apache/impala/rewrite/NormalizeCountStarRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/distinct.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
A testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
R testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
M tests/query_test/test_aggregation.py
M tests/query_test/test_parquet_stats.py
26 files changed, 874 insertions(+), 79 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6812/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4418: Extra blank lines in query result

2017-06-14 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4418: Extra blank lines in query result
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7055/2/shell/impala_shell.py
File shell/impala_shell.py:

Line 919:   if len(rows) == 0:
Comment on why we are doing this


http://gerrit.cloudera.org:8080/#/c/7055/2/tests/shell/test_shell_interactive.py
File tests/shell/test_shell_interactive.py:

Line 294:   # DROP and USE are exception cases that client does not fetch
Prefix with IMPALA-4418


Line 297:   run_impala_shell_interactive("drop database if exists d1;")
Why not use a test similar to what was reported in the JIRA, i.e. issue a use? 
According to the JIRA the use should do a zero row fetch.

Better to avoid adding dummy dbs and tables, especially with short names. Those 
might conflict with local stuff a dev has.


-- 
To view, visit http://gerrit.cloudera.org:8080/7055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e18ce36be07ee90a16b007b1e30d5255ef8a839
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Alex Behm 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5389: simplify BufferDescriptor lifetime

2017-06-14 Thread Tim Armstrong (Code Review)
Hello Henry Robinson,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7182

to look at the new patch set (#2).

Change subject: IMPALA-5389: simplify BufferDescriptor lifetime
..

IMPALA-5389: simplify BufferDescriptor lifetime

This is cleanup to make the code easier to understand in anticipation of
some trickier changes to I/O buffer management for IMPALA-4835. I do
not expect any changes in behaviour as a result of this patch.

* Use unique_ptr to make BufferDescriptor ownership transfer more
  explicit and allow enforcing that the buffers are not leaked via
  a DCHECK in ~BufferDescriptor.
* Remove 'free_buffer_descs_' cache. TCMalloc is good at caching small
  objects and there will likely be a lot less lock contention compared
  with a single global lock. The cache did not avoid calls to
  malloc() anyway because appending to std::list<> requires allocating
  a list node.
* Use std::deque instead of std::list in a couple of places - it offers
  O(1) push_*()/pop_*() at both ends and requires fewer calls into
  malloc()/free().

Testing:
Ran ASAN and exhaustive builds.

Change-Id: I007d098e9a1abb1f684be37b7f1ee6c03d3879b2
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/disk-io-mgr-scan-range.cc
M be/src/runtime/disk-io-mgr-stress.cc
M be/src/runtime/disk-io-mgr-test.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/disk-io-mgr.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/runtime/tmp-file-mgr.cc
11 files changed, 137 insertions(+), 202 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/7182/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7182
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I007d098e9a1abb1f684be37b7f1ee6c03d3879b2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Joe McDonnell 


[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

2017-06-14 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
..


Patch Set 4:

(6 comments)

Core tests passed (minus some known flakiness). My exhaustive tests got 
randomly killed, will retry with the latest patch.

http://gerrit.cloudera.org:8080/#/c/6527/4/be/src/exec/base-sequence-scanner.cc
File be/src/exec/base-sequence-scanner.cc:

PS4, Line 89: static_cast(
> Is this cast needed? It looks like this is already the type of scan_node_.
Not needed. Done.


http://gerrit.cloudera.org:8080/#/c/6527/4/be/src/exec/exec-node.cc
File be/src/exec/exec-node.cc:

Line 64: using std::unique_ptr;
> Aren't these in common/names.h?
Yup. Done.


http://gerrit.cloudera.org:8080/#/c/6527/4/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 52: using std::move;
> I think common/names.h has this
Done


http://gerrit.cloudera.org:8080/#/c/6527/4/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

PS4, Line 112: unique_ptr(
 : new RowBatch
> make_unique( ?
Done


http://gerrit.cloudera.org:8080/#/c/6527/4/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

PS4, Line 136: tow
> row
Done


http://gerrit.cloudera.org:8080/#/c/6527/4/be/src/exec/kudu-scan-node.cc
File be/src/exec/kudu-scan-node.cc:

Line 174: unique_ptr row_batch(new RowBatch(
> make_unique?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

2017-06-14 Thread Alex Behm (Code Review)
Alex Behm has uploaded a new patch set (#5).

Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
..

IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

Implements HdfsScanner::GetNext() for the Avro, RC File, and
Sequence File scanners. Changes ProcessSplit() to repeatedly call
GetNext() to share the core scanning code between the legacy
ProcessSplit() interface (ProcessSplit()) and the new GetNext()
interface.

Summary of changes:
- Slightly change code flow for initial scan range that
  only parses the file header. The new code sets
  'only_parsing_header_' in Open() and then honors
  that flag in GetNextInternal(). Before, all the logic
  was inside ProcessSpit().
- Replace 'finished_' with 'eos_'.
- Add a RowBatch parameter to various functions.
- Change Close() to free all resources when a nullptr
  RowBatch is passed.

Testing:
- Exhaustive tests passed on debug
- Core tests passed on asan
- TODO: Perf testing on cluster

Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/base-sequence-scanner.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/kudu-scan-node.cc
M be/src/exec/scan-node.h
M be/src/util/blocking-queue.h
M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test
26 files changed, 706 insertions(+), 816 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/6527/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-5389: simplify BufferDescriptor lifetime

2017-06-14 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5389: simplify BufferDescriptor lifetime
..


Patch Set 1: Code-Review+1

(5 comments)

This does seem more understandable to me. Just some small suggestions.

http://gerrit.cloudera.org:8080/#/c/7182/1/be/src/runtime/disk-io-mgr-scan-range.cc
File be/src/runtime/disk-io-mgr-scan-range.cc:

Line 85:   *buffer = nullptr;
maybe prefer buffer->reset() here (for me it's clearer not to use the 
overloaded operator).


http://gerrit.cloudera.org:8080/#/c/7182/1/be/src/runtime/disk-io-mgr.cc
File be/src/runtime/disk-io-mgr.cc:

PS1, Line 208: len_(0),
 : eosr_(false),
 : scan_range_offset_(0
we've been moving towards initializing members with constants in the header 
file. If, like me, you prefer that style suggest you do so here to keep the 
initializer list short.


PS1, Line 700: unique_ptr
make_unique(this, reader, range, buffer, buffer_size, reader->mem_tracker_)?


PS1, Line 710: std
remove std::?


http://gerrit.cloudera.org:8080/#/c/7182/1/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

PS1, Line 217: std::unique_ptr&& buffer
> I believe this can be std::unique_ptr, since w
Right, since you can't pass a unique_ptr<> except by moving it, I think 
pass-by-value is equivalent. There's an argument in favour of keeping this an 
rvalue ref though, because if we ever changed the type of the ptr to be 
copyable, pass-by-value would silently start copying at each callsite - it's 
the sort of latent bug that could go undetected. From a readability 
perspective, rvalue-refs are explicit that the argument is going to get 
consumed. 

I don't feel very strongly though. It would be good to be consistent, since 
there are other methods that take a unique_ptr by value.


-- 
To view, visit http://gerrit.cloudera.org:8080/7182
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I007d098e9a1abb1f684be37b7f1ee6c03d3879b2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5499: avoid ephemeral port conflicts

2017-06-14 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5499: avoid ephemeral port conflicts
..


Patch Set 3: Code-Review+2

carry +2

-- 
To view, visit http://gerrit.cloudera.org:8080/7171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id02c83e6f946a14b83f5e6561957d7ad81442835
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5389: simplify BufferDescriptor lifetime

2017-06-14 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change.

Change subject: IMPALA-5389: simplify BufferDescriptor lifetime
..


Patch Set 1:

(4 comments)

First pass looks good. A few minor comments.

http://gerrit.cloudera.org:8080/#/c/7182/1//COMMIT_MSG
Commit Message:

PS1, Line 11: test.
test?


PS1, Line 18: was did not
did not


http://gerrit.cloudera.org:8080/#/c/7182/1/be/src/runtime/disk-io-mgr.h
File be/src/runtime/disk-io-mgr.h:

PS1, Line 150: recycling of the buffer descriptor
Since we no longer reuse the buffer descriptor, I think this comment should 
change.


http://gerrit.cloudera.org:8080/#/c/7182/1/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

PS1, Line 217: std::unique_ptr&& buffer
I believe this can be std::unique_ptr, since we 
take ownership in all cases. I could be wrong.


-- 
To view, visit http://gerrit.cloudera.org:8080/7182
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I007d098e9a1abb1f684be37b7f1ee6c03d3879b2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5499: avoid ephemeral port conflicts

2017-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5499: avoid ephemeral port conflicts
..


Patch Set 3:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/732/

-- 
To view, visit http://gerrit.cloudera.org:8080/7171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id02c83e6f946a14b83f5e6561957d7ad81442835
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5499: avoid ephemeral port conflicts

2017-06-14 Thread anujphadke (Code Review)
anujphadke has posted comments on this change.

Change subject: IMPALA-5499: avoid ephemeral port conflicts
..


Patch Set 2: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id02c83e6f946a14b83f5e6561957d7ad81442835
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5499: avoid ephemeral port conflicts

2017-06-14 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5499: avoid ephemeral port conflicts
..


Patch Set 2: Code-Review+2

On second thought I am confident to +2 this. Please check with Anuj whether he 
has any other comments.

-- 
To view, visit http://gerrit.cloudera.org:8080/7171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id02c83e6f946a14b83f5e6561957d7ad81442835
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5499: avoid ephemeral port conflicts

2017-06-14 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5499: avoid ephemeral port conflicts
..


Patch Set 2: Code-Review+1

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7171/1/be/src/statestore/statestore-test.cc
File be/src/statestore/statestore-test.cc:

Line 76:   int subscriber_port = FindUnusedEphemeralPort(_ports);
> Is this the right place for it? Seems more like a unit test for FindUnusedE
I agree that adding a unit test would be the best place to put this but is 
probably not worth it. I thought adding it here will serve as a smoke test of 
sorts in case that functionality ever breaks inside FindUnusedEphemeralPort(). 
I'm also ok with leaving it as it is.


http://gerrit.cloudera.org:8080/#/c/7171/1/be/src/util/network-util.cc
File be/src/util/network-util.cc:

Line 197: }
> Changed back to a for() loop, which achieves the same thing. I was thinking
Thanks. I think this way it is also more obvious that the loop will always 
terminate.


http://gerrit.cloudera.org:8080/#/c/7171/1/be/src/util/network-util.h
File be/src/util/network-util.h:

Line 70: int FindUnusedEphemeralPort(std::vector* used_ports);
> I hit a compile error that I couldn't get around. Unsure if it's a compiler
I agree, let's keep it as is then.


-- 
To view, visit http://gerrit.cloudera.org:8080/7171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id02c83e6f946a14b83f5e6561957d7ad81442835
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

2017-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
..


Patch Set 7:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/731/

-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Vincent Tran 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

2017-06-14 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
..


Patch Set 7: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Vincent Tran 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.

2017-06-14 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build.
..


Patch Set 13:

Fixed a few compilation errors on RHEL5 (see env_posix.cc for almost all of 
them). 

This patch compiles on RHEL5.5 and 6.0, SLES11 and 12, Ubuntu 12.04 and
14.04 and Debian 7.0 and 8.0.

-- 
To view, visit http://gerrit.cloudera.org:8080/5715
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.

2017-06-14 Thread Henry Robinson (Code Review)
Henry Robinson has abandoned this change.

Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build.
..


Abandoned

Wrong Change-Id.

-- 
To view, visit http://gerrit.cloudera.org:8080/7186
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Ic37368b3a53e16ddb29ac3b4ea6f0108b29d1f2d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 


[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.

2017-06-14 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7186

Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build.
..

IMPALA-4669: [KUTIL] Add kudu_util library to the build.

A few miscellaneous changes to allow kudu_util to compile with Impala.

Add kudu_version.cc to substitute for the version.cc file that is
automatically built during the full Kudu build.

Set LZ4_DISABLE_DEPRECATE_WARNINGS to allow Kudu's compressor utility to
use deprecated names for LZ4 methods.

Add NO_NVM_SUPPORT flag to Kudu build (plan to upstream this later) to
disable building with nvm support, removing a library dependency.

Also remove imported FindOpenSSL.cmake in favour of the standard one provided
by cmake itself.

Finally, a few changes to allow compilation on RHEL5:

* Only use sched_getcpu() if supported
* Only include magic.h if available
* Workaround for kernels that don't have SOCK_NONBLOCK
* Workaround for kernels that don't have O_CLOEXEC (ignore the flag)
* Provide non-working implementation of fallocate()
* Disable inclusion of linux/fiemap.h - although this exists on RHEL5,
  it does not compile due to other #includes in env_posix.cc. We disable
  the path this is used for, since Impala does not call that code.
* Use Kudu's implementation of pipe(2), preadv(2) and pwritev(2) where
  it doesn't exist.

In most cases these changes simply force kutil to revert to a different
implementation that was already written for OSX support - this patch
generalises the logic to provide the implementation whenever the
required function doesn't exist.

This patch compiles on RHEL5.5 and 6.0, SLES11 and 12, Ubuntu 12.04 and
14.04 and Debian 7.0 and 8.0.

Change-Id: Ic37368b3a53e16ddb29ac3b4ea6f0108b29d1f2d
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/common/CMakeLists.txt
M be/src/common/config.h.in
A be/src/common/kudu_version.cc
M be/src/common/logging.cc
M be/src/exec/kudu-util.h
A be/src/kudu/gutil
M be/src/kudu/util/CMakeLists.txt
M be/src/kudu/util/cache.cc
M be/src/kudu/util/compression/compression_codec.cc
M be/src/kudu/util/env.cc
M be/src/kudu/util/env_posix.cc
M be/src/kudu/util/flags.cc
A be/src/kudu/util/kudu_export.h
M be/src/kudu/util/locks.h
M be/src/kudu/util/logging.cc
M be/src/kudu/util/minidump.cc
M be/src/kudu/util/net/socket.cc
M be/src/kudu/util/subprocess.cc
D cmake_modules/FindOpenSSL.cmake
21 files changed, 183 insertions(+), 93 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/7186/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7186
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic37368b3a53e16ddb29ac3b4ea6f0108b29d1f2d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 


[Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

2017-06-14 Thread Vincent Tran (Code Review)
Vincent Tran has posted comments on this change.

Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
..


Patch Set 6:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/7168/6/fe/src/test/java/org/apache/impala/analysis/ExprSelectivityTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprSelectivityTest.java:

Line 31: public void verifySelectivityStmt(String stmtStr, double 
expectedSel)
> we indent 2
Done


Line 33:   SelectStmt stmt = (SelectStmt) ParsesOk(stmtStr);
> simplify with AnalyzesOk()
Done


Line 38:   assertEquals(calculatedSel,expectedSel);
> space after comma
Done


Line 43:   String stmtStr = "select * from functional.alltypes where id " +
> simplify this to "select  from functional.alltypes"
Done


Line 52: private double getPredSel(int numPredChild) {
> Very specific to IN predicate. How about we hardcode the expected values fo
Done


Line 69:   verifySel("in (1,3,5,7)", getPredSel(4));
> Seems cleaner to provide the full expr as input and not assume "id". That w
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Vincent Tran 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5286/IMPALA-5283: Kudu column name case cleanup

2017-06-14 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5286/IMPALA-5283: Kudu column name case cleanup
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6902/2/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

Line 194:   1: required string columnName
This should be the lower case column name.

For the original Kudu column name, we should add a new field at the bottom 
where the Kudu-specific stuff is.


Line 336:   // Column names, in Kudu case.
Shouldn't everything be in Impala case by default? Based on your Patch Set 3, I 
don't this and the changes below in this file are needed.


http://gerrit.cloudera.org:8080/#/c/6902/2/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

Line 1246: List colNames = Lists.newArrayList();
These changes were not needed in your Patch Set 3 version. Can we remove this?


http://gerrit.cloudera.org:8080/#/c/6902/2/fe/src/main/java/org/apache/impala/catalog/KuduTable.java
File fe/src/main/java/org/apache/impala/catalog/KuduTable.java:

Line 109:   // Primary key column names, in Impala case.
Impala case -> lower case (fix here and elsewhere)


-- 
To view, visit http://gerrit.cloudera.org:8080/6902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I14aba88510012174716691b9946e1c7d54d01b44
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5286/IMPALA-5283: Kudu column name case cleanup

2017-06-14 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5286/IMPALA-5283: Kudu column name case cleanup
..


Patch Set 3:

(5 comments)

I actually prefer the original solution from the previous patch. Let me look at 
that in more detail.

http://gerrit.cloudera.org:8080/#/c/6902/3/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

Line 371:   // Column names in the casing that they appear in Kudu.
Mention that these are in an arbitrary order


http://gerrit.cloudera.org:8080/#/c/6902/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 959: key.equals(key.toLowerCase()), "Slot paths should be lower 
case: " + key);
use StringUtils.isAllLowerCase() here and elsewhere


http://gerrit.cloudera.org:8080/#/c/6902/3/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java:

Line 255:   Set colNames = Sets.newHashSet();
lowerCaseColNames


Line 258: if (colNames.contains(lowerCaseColName)) {
instead of the extra contains() you can check the return value of add()


Line 260:   String.format("Error loading Kudu table: conflicting 
column name '%s'",
Spell out what the problem is. "Conflicting column name" is not very clear.


-- 
To view, visit http://gerrit.cloudera.org:8080/6902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I14aba88510012174716691b9946e1c7d54d01b44
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4622: Add ALTER COLUMN statement.

2017-06-14 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4622: Add ALTER COLUMN statement.
..


Patch Set 4:

We should add an e2e test that exhaustively tests all combinations of column 
type, encoding and compression. In each case, we can do something very simple 
like insert one row and then do a select.

-- 
To view, visit http://gerrit.cloudera.org:8080/6955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id2e8bd65342b79644a0fdcd925e6f17797e89ad6
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4622: Add ALTER COLUMN statement.

2017-06-14 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4622: Add ALTER COLUMN statement.
..


Patch Set 4:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/6955/4/fe/src/main/java/org/apache/impala/analysis/AlterTableChangeColStmt.java
File fe/src/main/java/org/apache/impala/analysis/AlterTableChangeColStmt.java:

Line 44: public class AlterTableChangeColStmt extends AlterTableStmt {
We should move in the direction of making ALTER TABLE ALTER COLUMN the default. 
In that sense, I think we should rename this class to AlterTableAlterColumn and 
make corresponding renames everywhere else. It's fine if you prefer to do the 
renames in a follow-on patch.

The new AlterTableAlterColumn should have a static createChangeColStmt() to 
make it clear we are creating one of those.


Line 56: option.put(ColumnDef.Option.DEFAULT, new NullLiteral());
Why is it ok to use a NullLiteral for this purpose? Isn't NULL a legitimate 
default value that a user might want to set via ALTER TABLE ALTER COLUMN SET?


Line 88: // TODO: Support column-level DDL on HBase tables. Requires 
updating the column
Remove this TODO, seems useless


Line 91:   throw new AnalysisException("ALTER TABLE CHANGE COLUMN not 
currently supported " +
ALTER TABLE CHANGE/ALTER COLUMN


Line 97: for (FieldSchema fs: t.getMetaStoreTable().getPartitionKeys()) {
So weird. Can you clean this up while you are here? Feels easier to do these 
checks using Impala's Table/Column classes.


Line 126: if 
(!colName_.toLowerCase().equals(newColDef_.getColName().toLowerCase()) &&
Somewhat redundant with the checks in L97. Can you clean it up?


Line 146: "COLUMN statement: " + newColDef_.toString());
Mention that the new stmt should be used


http://gerrit.cloudera.org:8080/#/c/6955/4/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java:

Line 399:* Updates the column matching 'colName' to have the name, type, 
and options specified
I thought it was not possible to change the type?


Line 402:   public static void changeColumn(KuduTable tbl, String colName, 
TColumn newCol)
alterColumn()


Line 440: newCol.getColumnName(), tbl.getName());
We should display the existing column name, not the new name.


Line 441: alterKuduTable(tbl, alterTableOptions, errMsg);
Does this alteration apply to new data being added to Kudu, or does this 
operation rewrite all the column data in the new encoding/compression? We need 
to be careful with expensive operations, maybe we can avoid holding the table 
lock, we should add logging for debugging purposes, etc.


http://gerrit.cloudera.org:8080/#/c/6955/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 1133:   public void TestAlterColumn() throws AnalysisException {
TestAlterTableAlterColumn() for consistency

Does it make sense to merge this into the existing tests for CHANGE COLUMN?


http://gerrit.cloudera.org:8080/#/c/6955/4/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

Line 2358:   public void TestAlterColumn() {
TestAlterTableAlterColumn


Line 2359: for (String column : Lists.newArrayList("", "COLUMN")) {
By the way, the Postgres ALTER TABLE ALTER COLUMN allows changing multiple 
columns in the same stmt. No need to add that now, just wanted to make you 
aware.


http://gerrit.cloudera.org:8080/#/c/6955/4/testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test:

Line 459: # add a default to a row that didn't already have one
* What happens when adding/dropping a default value to a PK column altering 
(seems like we should catch that in analysis)
* Can you alter a non-nullable column to have a default value of NULL?
* Can you change the compression/encoding of PK columns?


Line 462: alter table kudu_tbl_to_alter alter column new_col1 set default 10 + 
5;
can you set the default back to NULL later?


http://gerrit.cloudera.org:8080/#/c/6955/4/testdata/workloads/functional-query/queries/QueryTest/kudu_describe.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_describe.test:

Line 49: alter table describe_test alter column c3
seems more appropriate to have this in kudu_alter.test


Line 51: describe describe_test;
Make sure that an insert works and that the values can be scanned.


-- 
To view, visit http://gerrit.cloudera.org:8080/6955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id2e8bd65342b79644a0fdcd925e6f17797e89ad6
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: 

[Impala-ASF-CR] IMPALA-5061: Populate null count in parquet::statistics

2017-06-14 Thread Pooja Nilangekar (Code Review)
Pooja Nilangekar has uploaded a new patch set (#8).

Change subject: IMPALA-5061: Populate null_count in parquet::statistics
..

IMPALA-5061: Populate null_count in parquet::statistics

The null_count in the statistics field is updated each time a null
value is encountered by parquet table writer. The value is written
to the parquet header if it has one or more null values in the
row_group.

Testing: Modified the existing end-to-end test in the
test_insert_parquet.py file to make sure each parquet header has
the appropriate null_count. Verified the correctness of the nulltable
test and added an additional test which populates a parquet file with
the functional_parquet.zipcode_incomes table and ensures that the
expected null_count is populated.

Change-Id: I4c49a63af84c2234f0633be63206cb52eb7e8ebb
---
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-column-stats.inline.h
M tests/query_test/test_insert_parquet.py
4 files changed, 129 insertions(+), 119 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/7058/8
-- 
To view, visit http://gerrit.cloudera.org:8080/7058
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4c49a63af84c2234f0633be63206cb52eb7e8ebb
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Pooja Nilangekar 


[Impala-ASF-CR] IMPALA-5488: Fix handling of exclusive HDFS file handles

2017-06-14 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5488: Fix handling of exclusive HDFS file handles
..


Patch Set 1: Code-Review+1

(2 comments)

The changes make sense to me. It's subtle so would be good to get another pair 
of eyes on it.

http://gerrit.cloudera.org:8080/#/c/7181/1/be/src/runtime/disk-io-mgr-scan-range.cc
File be/src/runtime/disk-io-mgr-scan-range.cc:

Line 304:   // Destroy the file handle and remove it from the cache
Nit: period at end for consistency?


Line 351:   // Destroy the file handle and remove it from the cache
Nit: period at end for consistency?


-- 
To view, visit http://gerrit.cloudera.org:8080/7181
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4c03696984285cc9ce463edd969c5149cd83a861
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5506: Add stdin description to help information of query file option

2017-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5506: Add stdin description to help information of 
query_file option
..


IMPALA-5506: Add stdin description to help information of query_file option

Help information of query_file option in impala-shell misses stdin
description.

I fix this issue by adding stdin description to help information of
query_file option.

Change-Id: I0264174fd062497c72891b31cf9ac1ba6c00f716
Reviewed-on: http://gerrit.cloudera.org:8080/7178
Reviewed-by: Henry Robinson 
Tested-by: Impala Public Jenkins
---
M shell/option_parser.py
1 file changed, 2 insertions(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Verified
  Henry Robinson: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7178
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I0264174fd062497c72891b31cf9ac1ba6c00f716
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Donghui Xu 
Gerrit-Reviewer: Donghui Xu 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins


[Impala-ASF-CR] IMPALA-5506: Add stdin description to help information of query file option

2017-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5506: Add stdin description to help information of 
query_file option
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7178
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0264174fd062497c72891b31cf9ac1ba6c00f716
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Donghui Xu 
Gerrit-Reviewer: Donghui Xu 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5389: simplify BufferDescriptor lifetime

2017-06-14 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7182

Change subject: IMPALA-5389: simplify BufferDescriptor lifetime
..

IMPALA-5389: simplify BufferDescriptor lifetime

This is cleanup to make the code easier to understand in anticipation of
some trickier changes to I/O buffer management for IMPALA-4835. I do
not expect any changes in behaviour as a result of the test.

* Use unique_ptr to make BufferDescriptor ownership transfer more
  explicit and allow enforcing that the buffers are not leaked via
  a DCHECK in ~BufferDescriptor.
* Remove 'free_buffer_descs_' cache. TCMalloc is good at caching small
  objects and there will likely be a lot less lock contention compared
  with a single global lock. The cache was did not avoid calls to
  malloc() anyway because appending to std::list<> requires allocating
  a list node.
* Use std::deque instead of std::list in a couple of places - it offers
  O(1) push_*()/pop_*() at both ends and requires fewer calls into
  malloc()/free().

Testing:
Ran ASAN and exhaustive builds.

Change-Id: I007d098e9a1abb1f684be37b7f1ee6c03d3879b2
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/disk-io-mgr-scan-range.cc
M be/src/runtime/disk-io-mgr-stress.cc
M be/src/runtime/disk-io-mgr-test.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/disk-io-mgr.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/runtime/tmp-file-mgr.cc
11 files changed, 134 insertions(+), 195 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/7182/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7182
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I007d098e9a1abb1f684be37b7f1ee6c03d3879b2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5085: large rows in BufferedTupleStreamV2

2017-06-14 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5085: large rows in BufferedTupleStreamV2
..


Patch Set 11:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6638/11/be/src/runtime/buffered-tuple-stream-v2.cc
File be/src/runtime/buffered-tuple-stream-v2.cc:

PS11, Line 121: DCHECK_GE
> is that suppose to be GT, given the comment?
Oops. Also needed to move a CHECK_CONSISTENCY() so that it is after the row is 
added.


http://gerrit.cloudera.org:8080/#/c/6638/10/be/src/runtime/buffered-tuple-stream-v2.h
File be/src/runtime/buffered-tuple-stream-v2.h:

PS10, Line 284: ssful, call
> What's "unsafe" about it?
Because it depends on the caller writing out the row in a specific format.


-- 
To view, visit http://gerrit.cloudera.org:8080/6638
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5085: large rows in BufferedTupleStreamV2

2017-06-14 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#12).

Change subject: IMPALA-5085: large rows in BufferedTupleStreamV2
..

IMPALA-5085: large rows in BufferedTupleStreamV2

The stream defaults to pages of default_page_len_. If a row doesn't
fit in that page, it will allocate another page up to max_page_len_
bytes and append a single row to that page, then immediately unpin
the page. This means that when writing a stream, the large
page only needs to be kept in memory temporarily, which helps with
memory requirements.  E.g. consider a hash join that is repartitioning
1 unpinned stream into 16 unpinned streams. We will need
default_page_len_ * 15 + max_page_len_ * 2 bytes of reservation because
when processing a large row we only need one large write buffer at a
time.

Also switches the stream to lazily allocating write pages, so that
we don't need to allocate a page until we know the size of the row
to go in it. This required a mechanism to "save" reservation in
PrepareForRead()/PrepareForWrite(). A SubReservation APi is added
to BufferPool for this purpose and the stream now saves read and
write reservation for lazy page allocation. It also saves reservation
instead of double-pinning pages in the read/write case.

The large row cases are not as optimised for memory consumption or
performance - queries processing very large numbers of large rows
are an extreme edge case that is likely to hit other performance
bottlenecks first. Pages with large rows can have up to 50%
internal fragmentation.

To avoid duplicating more logic between AddRow() and AllocateRow()
I restructured things so that AddRowSlow() is implemented in terms
of AllocateRowSlow(). AllocateRow() now takes a function as an
argument to populate the row.

Testing:
* Added tests for the case where 0 rows are added to the stream
* Extend BigRow to exercise the new code.
* Also test large strings and read/write streams.

Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5
---
M be/src/runtime/buffered-tuple-stream-v2-test.cc
M be/src/runtime/buffered-tuple-stream-v2.cc
M be/src/runtime/buffered-tuple-stream-v2.h
M be/src/runtime/buffered-tuple-stream-v2.inline.h
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/bufferpool/reservation-tracker-test.cc
M be/src/runtime/bufferpool/reservation-tracker.h
M common/thrift/generate_error_codes.py
10 files changed, 965 insertions(+), 332 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/6638/12
-- 
To view, visit http://gerrit.cloudera.org:8080/6638
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5488: Fix handling of exclusive HDFS file handles

2017-06-14 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7181

Change subject: IMPALA-5488: Fix handling of exclusive HDFS file handles
..

IMPALA-5488: Fix handling of exclusive HDFS file handles

This change fixes three issues:
1. File handle caching is expected to be disabled for
remote files (using exclusive HDFS file handles),
however the file handles are still being cached.
2. The retry logic for exclusive file handles is broken,
leading the number of open files to be incorrect.
3. There is no test coverage for disabling the file
handle cache.

To fix issue #1, when a scan range is requesting an
exclusive file handle from the cache, it will always
request a newly opened file handle. It also will destroy
the file handle when the scan range is closed.

To fix issue #2, exclusive file handles will no longer
retry IOs. Since the exclusive file handle is always
a fresh file handle, it will never have a bad file
handle from the cache. This returns the logic to
its state before IMPALA-4623 in these cases. If a
file handle is borrowed from the cache, then the
code will continue to retry once with a fresh handle.

To fix issue #3, custom_cluster/test_hdfs_fd_caching.py
now does both positive and negative tests for the file
handle cache. It verifies that setting
max_cached_file_handles to zero disables caching. It
also verifies that caching is disabled on remote
files. (This change will resolve IMPALA-5390.)

Change-Id: I4c03696984285cc9ce463edd969c5149cd83a861
---
M be/src/runtime/disk-io-mgr-scan-range.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/disk-io-mgr.h
M tests/common/skip.py
M tests/custom_cluster/test_hdfs_fd_caching.py
5 files changed, 101 insertions(+), 70 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/7181/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7181
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I4c03696984285cc9ce463edd969c5149cd83a861
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 


[Impala-ASF-CR] IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs

2017-06-14 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and 
beeswax RPCs
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7155/6/be/src/service/impala-beeswax-server.cc
File be/src/service/impala-beeswax-server.cc:

Line 301: error_log_ss << request_state->coord()->GetErrorLog();
> what does the error log look like without the double line break?  the alter
I misinterpreted L297, thinking it would always leave a trailing newline. I 
agree that it's better to change the condition instead.


-- 
To view, visit http://gerrit.cloudera.org:8080/7155
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs

2017-06-14 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and 
beeswax RPCs
..


Patch Set 6:

(1 comment)

This look correct to me, but let's see what the new failure is due to.

http://gerrit.cloudera.org:8080/#/c/7155/6/be/src/service/impala-beeswax-server.cc
File be/src/service/impala-beeswax-server.cc:

Line 301: error_log_ss << request_state->coord()->GetErrorLog();
what does the error log look like without the double line break?  the 
alternative would be to keep one or two newlines, but change the condition to 
if error_log_ss is empty or not (rather than checking the ok()).


-- 
To view, visit http://gerrit.cloudera.org:8080/7155
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5061: Populate null count in parquet::statistics

2017-06-14 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5061: Populate null_count in parquet::statistics
..


Patch Set 7:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7058/7/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

PS7, Line 474: Update null_count in page_stat_
page_stat_ doesn't exist. I think it would be easier to understand if you just 
inline ProcessNullValue here and update the comment accordingly. The extra 
level of indirection doesn't seem to help much with the readability.


http://gerrit.cloudera.org:8080/#/c/7058/7/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

PS7, Line 159: value
nit values


PS7, Line 160: values
to either value until...


-- 
To view, visit http://gerrit.cloudera.org:8080/7058
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4c49a63af84c2234f0633be63206cb52eb7e8ebb
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs

2017-06-14 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and 
beeswax RPCs
..


Patch Set 6:

Interestingly, now the error message changed. After about 200,000 executions of 
the tests, it seemed to complete once without throwing an error:

assert False, "Expected exception: %s" % expected_str
E   AssertionError: Expected exception: Column metadata states there are 50 
values, but read 100 values from column element.

I'll attach the logs to the JIRA.

-- 
To view, visit http://gerrit.cloudera.org:8080/7155
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1575: Yield admission control resources at query end

2017-06-14 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded a new patch set (#2).

Change subject: IMPALA-1575: Yield admission control resources at query end
..

IMPALA-1575: Yield admission control resources at query end

Currently, a query does not release admission control
resources until the client calls UnregisterQuery. Slow
clients can hold admission control resources even after
the query has reached a state where it will no longer
return any rows. Specifically, in the following
cases, the query is completed, but the client must
still call UnregisterQuery to release admission control
resources:

1. The query encounters an error and fails
2. The query is cancelled due to the idle query timeout
3. The query reaches eos (or the DML completes)
4. The client cancels the query without closing the query

This change releases admission control resources as
soon as the query reaches a state where it cannot
return any rows rather than waiting for the client
to explicitly end the query.

When cancelling a query, the coordinator asynchronously
notifies all fragment instances to cancel. The
coordinator does not wait for the fragment instances
to respond, so the cancel case can release admission
control resources while some fragment instances may
continue to run until the cancel takes effect. The
concern with this behavior is that the fragment
instances may continue to use memory and cause subsequent
admitted queries to fail. This is already possible today,
as a client can directly close a running query (which
cancels the query and unregisters the query immediately).
For example, the session idle timeout does this. However,
this change expands the circumstances where this can
happen.

Admission control based on mem_limit operates differently.
It relies on the reported memory usage of each
QueryState to generate a cumulative memory usage across
all of the instances. Admission control's behavior is
determined by when the QueryState releases its memory.

The existing behavior releases the query's memory on the
destruction of the QueryState, which occurs when the
query is unregistered. This matches the existing behavior
for admission control prior to this change. To support
the new behavior for mem_limit, the QueryState will now
release query resources when the last fragment instance
terminates. This unregisters the query memory tracker,
which results in the admission control memory resources
being freed.

To test both aspects of this change, the admission control
test (custom_cluster/test_admission_controller.py) has
been modified to use four different modes of ending a
query: client cancelling a query, the query hitting an
idle timeout, the query reaching eos, and the client
closing the query. The test uses a mix of all four.
After the query ends, all clients wait for the test
to complete before closing the query or closing the
connection. This ensures that the admission control
decisions are based entirely on the query end behavior.
This test works for both query admission control
and mem_limit admission control.

Change-Id: Ia5003d017b3142a160bacf7e3569ff26026b1700
---
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M tests/custom_cluster/test_admission_controller.py
7 files changed, 158 insertions(+), 102 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/7079/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7079
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia5003d017b3142a160bacf7e3569ff26026b1700
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-5506: Add stdin description to help information of query file option

2017-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5506: Add stdin description to help information of 
query_file option
..


Patch Set 2:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/730/

-- 
To view, visit http://gerrit.cloudera.org:8080/7178
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0264174fd062497c72891b31cf9ac1ba6c00f716
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Donghui Xu 
Gerrit-Reviewer: Donghui Xu 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5506: Add stdin description to help information of query file option

2017-06-14 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5506: Add stdin description to help information of 
query_file option
..


Patch Set 2: Code-Review+2

Thanks!

-- 
To view, visit http://gerrit.cloudera.org:8080/7178
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0264174fd062497c72891b31cf9ac1ba6c00f716
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Donghui Xu 
Gerrit-Reviewer: Donghui Xu 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5507: Add clear description to help information of KEYVAL option

2017-06-14 Thread Donghui Xu (Code Review)
Donghui Xu has uploaded a new patch set (#2).

Change subject: IMPALA-5507: Add clear description to help information of 
KEYVAL option
..

IMPALA-5507: Add clear description to help information of KEYVAL option

Help information of KEYVAL option in impala-shell is not clear enough.

I fix this issue by adding clear description to help information of
KEYVAL option.

Change-Id: I68cfc16838c6c0e7813f03dd4296f9eb54ec4c63
---
M shell/option_parser.py
1 file changed, 4 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/7179/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7179
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I68cfc16838c6c0e7813f03dd4296f9eb54ec4c63
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Donghui Xu 


[Impala-ASF-CR] IMPALA-5507: Add clear description to help information of KEYVAL option

2017-06-14 Thread Donghui Xu (Code Review)
Donghui Xu has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7179

Change subject: IMPALA-5507: Add clear description to help information of 
KEYVAL option
..

IMPALA-5507: Add clear description to help information of KEYVAL option

Help information of KEYVAL option in impala-shell is not clear enough.

I fix this issue by adding clear description to help information of
KEYVAL option.

Change-Id: I68cfc16838c6c0e7813f03dd4296f9eb54ec4c63
---
M shell/option_parser.py
1 file changed, 4 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/7179/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7179
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I68cfc16838c6c0e7813f03dd4296f9eb54ec4c63
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Donghui Xu 


[Impala-ASF-CR] IMPALA-5492: Fix incorrect newline character in the LDAP message

2017-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5492: Fix incorrect newline character in the LDAP message
..


Patch Set 6: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I30c601ab255a4882260f7be23b5763ef8ec76d28
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Donghui Xu 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Donghui Xu 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5492: Fix incorrect newline character in the LDAP message

2017-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5492: Fix incorrect newline character in the LDAP message
..


IMPALA-5492: Fix incorrect newline character in the LDAP message

The introduction has redundant '\' in impala-shell when using LDAP.

I fix this issue by removing extraneous '\' in introduction when
impala-shell using LDAP.

Change-Id: I30c601ab255a4882260f7be23b5763ef8ec76d28
Reviewed-on: http://gerrit.cloudera.org:8080/7166
Reviewed-by: Bharath Vissapragada 
Tested-by: Impala Public Jenkins
---
M shell/impala_shell.py
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Verified
  Bharath Vissapragada: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I30c601ab255a4882260f7be23b5763ef8ec76d28
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Donghui Xu 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Donghui Xu 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins


[Impala-ASF-CR] IMPALA-5506: Add stdin description to help information of query file option

2017-06-14 Thread Donghui Xu (Code Review)
Donghui Xu has posted comments on this change.

Change subject: IMPALA-5506: Add stdin description to help information of 
query_file option
..


Patch Set 2:

(1 comment)

I have modified the code according to your opinion.

http://gerrit.cloudera.org:8080/#/c/7178/1/shell/option_parser.py
File shell/option_parser.py:

Line 87: help="Execute the queries in the query file, 
delimited by ;."
> Please wrap to 90 chars.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7178
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0264174fd062497c72891b31cf9ac1ba6c00f716
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Donghui Xu 
Gerrit-Reviewer: Donghui Xu 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5506: Add stdin description to help information of query file option

2017-06-14 Thread Donghui Xu (Code Review)
Donghui Xu has uploaded a new patch set (#2).

Change subject: IMPALA-5506: Add stdin description to help information of 
query_file option
..

IMPALA-5506: Add stdin description to help information of query_file option

Help information of query_file option in impala-shell misses stdin
description.

I fix this issue by adding stdin description to help information of
query_file option.

Change-Id: I0264174fd062497c72891b31cf9ac1ba6c00f716
---
M shell/option_parser.py
1 file changed, 2 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/7178/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7178
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0264174fd062497c72891b31cf9ac1ba6c00f716
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Donghui Xu 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] IMPALA-5506: Add stdin description to help information of query file option

2017-06-14 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5506: Add stdin description to help information of 
query_file option
..


Patch Set 1:

(1 comment)

Thanks for doing this!

http://gerrit.cloudera.org:8080/#/c/7178/1/shell/option_parser.py
File shell/option_parser.py:

Line 87: help="Execute the queries in the query file or 
from STDIN indicated by - and end input with ctrl+d, delimited by ;")
Please wrap to 90 chars.

The message is good - but may I suggest a minor reword? How about:

"Execute the queries in the query file, delimited by ;. Queries
  may be read from stdin if the argument to -f is -."


-- 
To view, visit http://gerrit.cloudera.org:8080/7178
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0264174fd062497c72891b31cf9ac1ba6c00f716
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Donghui Xu 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes