[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors

2017-09-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5890: Abort queries if scanner hits IO errors
..


Patch Set 12:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1218/

-- 
To view, visit http://gerrit.cloudera.org:8080/8011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5923: Print binary ID as hex in ChildQuery::Cancel()

2017-09-12 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/8050

Change subject: IMPALA-5923: Print binary ID as hex in ChildQuery::Cancel()
..

IMPALA-5923: Print binary ID as hex in ChildQuery::Cancel()

ChildQuery::Cancel() prints a binary ID into the log which can show up
as random characters. One fix is to print it as a hex string.

I tested this by running test_cancellation::test_cancel_insert and
making sure the ID is printed as hex.

Change-Id: Ie1a9516d5c03524e2585255700bb84e8a301a7ee
---
M be/src/service/child-query.cc
M be/src/util/debug-util.cc
2 files changed, 3 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/8050/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8050
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie1a9516d5c03524e2585255700bb84e8a301a7ee
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 


[Impala-ASF-CR] IMPALA-5856: Fix outer join predicate assignment.

2017-09-12 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5856: Fix outer join predicate assignment.
..


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8039/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

PS1, Line 1381: // Right-most full-outer join table ref that is equal to or to 
the left of the
  : // table ref materializing 'tids'.
Trying to visualize this hurts my brain :) Not sure if it helps to understand 
the logic in this function. The comment in L1385 should suffice.


PS1, Line 1385: this
What is "this" refer to? After reading this more carefully, I think I 
understand what it means. I think it is better to move the comment below L1388.


PS1, Line 1398: globalState_.ojClauseByConjunct.get(e.getId());
getOjRef(e)?


http://gerrit.cloudera.org:8080/#/c/8039/1/testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
File testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test:

PS1, Line 1020: from the On-clause of a left outer join
Out of curiosity, did you check if we have the same issue if there is a left 
outer join followed by a full outer join? If we don't have a test like that, 
maybe add it for completion.


Line 1028:  and t1.bigint_col > 10 and t2.bigint_col > 30
To make it more interesting, add a where clause with one predicate that 
references t1 and t2 and one that references t3.


-- 
To view, visit http://gerrit.cloudera.org:8080/8039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93db34d988cb66e00aa05d7dc161e0ca47042acb
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5211: Simplifying ifnull conditional.

2017-09-12 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5211: Simplifying ifnull conditional.
..


Patch Set 6:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/7829/6//COMMIT_MSG
Commit Message:

Line 9: Re-writes ifnull(x, y) into if(x IS DISTINCT FROM y, x, NULL),
Comment somewhat hard to read, maybe create bullet points foe each change, 
should be clear which one is a "conversion" and which one adds a new rewrite 
rule


I prefer "convert" instead of re-write because the latter has 
a very specific meaning to us (done via ExprRewriteRule)


Line 18: The error messages are slightly altered by this change:
Not just error messages, might be easier to just state that the original 
ifnull() is printed as the converted if() everywhere including error messages, 
explain plans, column labels, etc.


Line 20: Before:
Not sure we need to flesh this out so much. Like you said below, this is what 
you already get with similar conversions, e.g., NVL2() or decode()


http://gerrit.cloudera.org:8080/#/c/7829/6/be/src/exprs/conditional-functions-ir.cc
File be/src/exprs/conditional-functions-ir.cc:

Line 46
I like red!


http://gerrit.cloudera.org:8080/#/c/7829/6/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

Line 85:   public static Expr createExpr(FunctionName fnName, FunctionParams 
params)
I think we should add tests along the lines of what we have for NVL2() in 
AnalyzeExprsTest#TestFunctionCallExpr


Line 86:   throws AnalysisException {
This isn't right - createExpr() is and should only be called from the parser 
from which we should not throw an AnalysisException. We should strive to keep 
parsing and analysis distinct phases.

I think the right way to fix the dilemma below is to only transform the 
NULLIF() into IF() if there are exactly two arguments - otherwise we can just 
create a FunctionCallExpr with the original NULLIF function name which will 
later fail gracefully during analysis.


Line 108:   NullLiteral.create(Type.NULL) // NULL
Use new NullLiteral(). The create() returns an *analyzed* NullLiteral which is 
not what you want because we have not started analysis yet (still in parsing).

create() is typically used for creating an analyzed NullLiteral of a specific 
(non-NULL) type


http://gerrit.cloudera.org:8080/#/c/7829/6/fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
File fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java:

Line 152: // Unary functions like isfalse, isnotfalse, istrue, isnottrue, 
nullvalue,
Move to class comment


Line 155: // nullif and nvl2 are re-written into an if.
re-written -> converted


Line 204:* Note that FunctionalCallExpr.createExpr() re-writes "nvl2" into 
"if",
re-writes -> converts


http://gerrit.cloudera.org:8080/#/c/7829/6/fe/src/main/java/org/apache/impala/rewrite/SimplifyDistinctFromRule.java
File fe/src/main/java/org/apache/impala/rewrite/SimplifyDistinctFromRule.java:

Line 39:   public Expr apply(Expr expr, Analyzer analyzer) throws 
AnalysisException {
Doesn't throw. I just noticed we declare throws unnecessarily in a bunch of 
other rules. Mind cleaning those up, too?


http://gerrit.cloudera.org:8080/#/c/7829/6/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

Line 387: RewritesOk("nvl2(null, 1, 2)", rules, "2");
Let's try not to mix these tests. Constant folding should be tested in the 
constant folding test. The fact that this is a conditional function is 
secondary. The tests are grouped by rewrite rule - not by type of expr.


Line 424: // nullif: handled by folding
move to constant folding test


Line 583: // nullif: handled by rewrite to if and simplification
converted to if and simplified


Line 584: RewritesOk("nullif(bool_col, bool_col)", rules, "NULL");
Move into SimplifyConditionalsTest


http://gerrit.cloudera.org:8080/#/c/7829/6/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

Line 90:   public void ParserAnalysisError(String stmt, String 
expectedErrorString) {
Parser does analyze.

Also I don't see this used.


-- 
To view, visit http://gerrit.cloudera.org:8080/7829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id91ca968a0c0be44e1ec54ad8602f91a5cb2e0e5
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors

2017-09-12 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5890: Abort queries if scanner hits IO errors
..


Patch Set 12: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/8011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5927: Fix enable distcc for zsh

2017-09-12 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/8049

Change subject: IMPALA-5927: Fix enable_distcc for zsh
..

IMPALA-5927: Fix enable_distcc for zsh

enable_distcc didn't work on zsh anymore since it relies on automatic
variable splitting, which only works in bash.

This change makes zsh temporarily emulate sh to make enable_distcc work
again. I tested it on zsh and bash.

Change-Id: I88284e4f68c309bb46ce4b5a842ccc576cd487ed
---
M bin/distcc/distcc_env.sh
1 file changed, 2 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8049/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I88284e4f68c309bb46ce4b5a842ccc576cd487ed
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 


[Impala-ASF-CR] IMPALA-3437: DECIMAL V2: avoid implicit decimal->double conversion

2017-09-12 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-3437: DECIMAL_V2: avoid implicit decimal->double 
conversion
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7916/5/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java:

Line 1331: Type.DOUBLE, ScalarType.createDecimalType(5, 1));
> Why is the result not DECIMAL(2,1)?
I mean DECIMAL(3,1) of course.


-- 
To view, visit http://gerrit.cloudera.org:8080/7916
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie419a75784eec2294947103e6e1465dfadfc29da
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3437: DECIMAL V2: avoid implicit decimal->double conversion

2017-09-12 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-3437: DECIMAL_V2: avoid implicit decimal->double 
conversion
..


Patch Set 5:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/7916/5/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 513:* Converts numeric literal in the expr tree rooted at this expr to 
return floating
literals


Line 516:* Decimal has a higher processing cost than floating point and we 
should not pay
Please add subsections "DECIMAL_V1" and "DECIMAL_V2" to explain the rationale 
for their casting behavior. In particular, does decimal still have a higher 
processing cost?

This way it becomes clear that the old behavior was intended to maximize 
performance and the new behavior is intended to provide accurate/consistent 
behavior (but does it take a perf hit for that?)


Line 529:   protected void convertNumericLiteralsFromDecimal(Analyzer analyzer)
I can't wait for this function to be deleted.


http://gerrit.cloudera.org:8080/#/c/7916/5/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java:

Line 1306:* Test expressions that resolve to different types depend on the 
DECIMAL_V2 setting.
Test expressions that return different decimal types depending on the 
DECIMAL_V2 setting.


Line 1319:* Test expressions that resolve to the same type with either 
DECIMAL_V2 value.
resolve to -> return

(in the FE 'resolve' has a specific meaning for SlotRefs and TableRefs which we 
should not confuse here)


Line 1331: Type.DOUBLE, ScalarType.createDecimalType(5, 1));
Why is the result not DECIMAL(2,1)?
Decimal type seems weird, but not the focus of this patch.


Line 1342: // DECIMAL_V2: floating point + decimal expr = decimal
This does not seem to explain what happens for "floating point + numeric 
literal" like in the DECIMAL_V1 case.

What happens in cases where the numeric literal is a float because it does not 
fit into our max decimal? Needs test.

Might be clearer if you list:
DECIMAL_V2: float + decimal literal = decimal
DECIMAL_V2: float + float literal = float


Line 1386: // Multiplying a floating point type with any other type always 
results in a double.
Why? Seems inconsistent.


Line 1409: checkDecimalReturnType("select round(1.2345, 2) * pow(10, 10)", 
Type.DOUBLE);
add same test with "+"


Line 1426: // Test behavior of compound expressions with a single column 
and many literals.
column -> slot ref

(and elsewhere)


Line 1465: checkDecimalReturnType("select 1.123e-2", 
ScalarType.createDecimalType(5, 5));
What about literals that don't fit into a decimal?


-- 
To view, visit http://gerrit.cloudera.org:8080/7916
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie419a75784eec2294947103e6e1465dfadfc29da
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5905: add script for all-build-options job

2017-09-12 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-5905: add script for all-build-options job
..


Patch Set 2: Code-Review+2

(3 comments)

Can you add a note about ccache and ninja?

http://gerrit.cloudera.org:8080/#/c/8043/1/bin/all-build-options.sh
File bin/all-build-options.sh:

Line 1
> Yeah the jenkins/ subdirectory makes sense to me, there's a lot of junk in 
I named the Jenkins job "all-build-options". I'm OK with a name change of any 
sort, or leaving it the same. Maybe "build-with-all-flag-combinations.sh"?


Line 21
Required the ninja build system and ccache to be installed, which are not 
strictly build requirements.


PS1, Line 35: 
: 
: 
: 
: 
: 
: 
> I just preserved this logic from the original Jenkins job script. I didn't 
One difference: if clean.sh fails in the call to buildall, that call fails but 
we continue to test more build options. This way, if we can't clean, we just 
dies since all the remaining calls to buildall.sh should also fail but will 
provide no additional information.


-- 
To view, visit http://gerrit.cloudera.org:8080/8043
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e78f05c41e3ccd59af599b00e453e7f88b2bb34
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5597: Try casting targetExpr when building runtime filter plan

2017-09-12 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5597: Try casting targetExpr when building runtime 
filter plan
..


Patch Set 4: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5597: Try casting targetExpr when building runtime filter plan

2017-09-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5597: Try casting targetExpr when building runtime 
filter plan
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1217/

-- 
To view, visit http://gerrit.cloudera.org:8080/7949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3516: Avoid writing to /tmp in testing

2017-09-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-3516: Avoid writing to /tmp in testing
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1216/

-- 
To view, visit http://gerrit.cloudera.org:8080/8047
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9e503eb7d333c1b89dc8aea87cf30504838c44f9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3516: Avoid writing to /tmp in testing

2017-09-12 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-3516: Avoid writing to /tmp in testing
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/8047
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9e503eb7d333c1b89dc8aea87cf30504838c44f9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors

2017-09-12 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5890: Abort queries if scanner hits IO errors
..


Patch Set 10:

(2 comments)

Changed to INTERNAL_ERROR for the possible error in GetNextBuffer().

http://gerrit.cloudera.org:8080/#/c/8011/10/be/src/exec/scanner-context.cc
File be/src/exec/scanner-context.cc:

PS10, Line 200: DISK_IO_ERROR
> Seems to me that INTERNAL_ERROR should be in the list of unrecoverable erro
Done


PS10, Line 200: DISK_IO_ERROR
> Makes sense to me.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/8011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors

2017-09-12 Thread Lars Volker (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/8011

to look at the new patch set (#12).

Change subject: IMPALA-5890: Abort queries if scanner hits IO errors
..

IMPALA-5890: Abort queries if scanner hits IO errors

Prior to this fix, an error in ScannerContext::Stream::GetNextBuffer()
could leave the stream in an inconsistent state:

- The DiskIoMgr hits EOF unexpected, cancels the scan range and enqueues
a buffer with eosr set.
- The ScannerContext::Stream tries to read more bytes, but since it has
hit eosr, it tries to read beyond the end of the scan range using
DiskIoMgr::Read().
- The previous read error resulted in a new file handle being opened.
The now truncated, smaller file causes the seek to fail.
- Then during error handling, the BaseSequenceScanner calls SkipToSync()
and trips over the NULL pointer in in the IO buffer.

In my reproduction this only happens with the file handle cache enabled,
which causes Impala to see two different sized handles: the one from the
cache when the query starts, and the one after reopening the file.

To fix this, we change the I/O manager to always return DISK_IO_ERROR
for errors and we abort a query if we receive such an error in the
scanner.

This change also fixes GetBytesInternal() to maintain the invariant that
the output buffer points to the boundary buffer whenever the latter
contains some data.

I tested this by running the repro from the JIRA and impalad did not
crash but aborted the queries. I also ran the repro with
abort_on_error=1, and with the file handle cache disabled.

Text files are not affected by this problem, since the
text scanner doesn't try to recover from errors during ProcessRange()
but wraps it in RETURN_IF_ERROR instead. With this change queries abort
with the same error.

Parquet files are also not affected since they have the metadata at the
end. Truncated files immediately fail with this error:
WARNINGS: File 
'hdfs://localhost:20500/test-warehouse/tpch.partsupp_parquet/foo.0.parq'
has an invalid version number: 

Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509
---
M be/src/common/status.h
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/disk-io-mgr-scan-range.cc
M be/src/runtime/disk-io-mgr-test.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/runtime-state.cc
M common/thrift/generate_error_codes.py
9 files changed, 104 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8011/12
-- 
To view, visit http://gerrit.cloudera.org:8080/8011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5597: Try casting targetExpr when building runtime filter plan

2017-09-12 Thread Tianyi Wang (Code Review)
Tianyi Wang has posted comments on this change.

Change subject: IMPALA-5597: Try casting targetExpr when building runtime 
filter plan
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7949/3/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

Line 587:   } catch (IllegalStateException e) {
> catch (Exception e)
Done


Line 592: // Types of targetExpr and srcExpr should be exactly the same 
since runtime filter is
> Types of targetExpr and srcExpr must be exactly the same since runtime filt
Done


Line 597:   } catch (AnalysisException | IllegalArgumentException e) {
> catch (Exception e)
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5597: Try casting targetExpr when building runtime filter plan

2017-09-12 Thread Tianyi Wang (Code Review)
Tianyi Wang has uploaded a new patch set (#4).

Change subject: IMPALA-5597: Try casting targetExpr when building runtime 
filter plan
..

IMPALA-5597: Try casting targetExpr when building runtime filter plan

This patch fixes a bug that fails a precondition check when generating
runtime filter plans. The lhs and rhs or join predicate might have
different types when the eq predicate function accepts wildcard-typed
parameters. In this case in existing code the types of source and target
expr will be found mismatch and an exception will be thrown when
generating runtime filters. This patch tries to cast target expr to be
of the same type as source expr. A testcase is added to joins.test

Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6
---
M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test
6 files changed, 53 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/7949/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tianyi Wang 


[Impala-ASF-CR] IMPALA-3516: Avoid writing to /tmp in testing

2017-09-12 Thread Tianyi Wang (Code Review)
Tianyi Wang has posted comments on this change.

Change subject: IMPALA-3516: Avoid writing to /tmp in testing
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8047/1/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

Line 18: 
> what happened here?
Sorry. Didn't notice it in diff.


http://gerrit.cloudera.org:8080/#/c/8047/1/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

Line 90:   private final String outDir_ = 
System.getenv("IMPALA_FE_TEST_LOGS_DIR")
> I suggest setting this in setUp() and checking whether the env var is set. 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/8047
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9e503eb7d333c1b89dc8aea87cf30504838c44f9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3516: Avoid writing to /tmp in testing

2017-09-12 Thread Tianyi Wang (Code Review)
Tianyi Wang has uploaded a new patch set (#2).

Change subject: IMPALA-3516: Avoid writing to /tmp in testing
..

IMPALA-3516: Avoid writing to /tmp in testing

Currently some parts of the tests write to /tmp:
1. PlannerTest result files are written to /tmp/PlannerTest
2. FE tests load libfesupport, which writes logs to /tmp
3. Updated results in EE tests (run-tests.py --update_results) is
   written to /tmp
This patch changes them into writing to $IMPALA_HOME/logs. Specifically:
1. PlannerTest result files are written to
   $IMPALA_FE_TEST_LOGS_DIR/PlannerTest
2. libfesupport logs are written to $IMPALA_FE_TEST_LOGS_DIR
3. Updated EE test results are written to $IMPALA_EE_TEST_LOGS_DIR

Change-Id: I9e503eb7d333c1b89dc8aea87cf30504838c44f9
---
M be/src/service/fe-support.cc
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M tests/common/impala_test_suite.py
3 files changed, 12 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/8047/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8047
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9e503eb7d333c1b89dc8aea87cf30504838c44f9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 


[Impala-ASF-CR] IMPALA-1767 Adds predicate to test boolean values true, false, unknown.

2017-09-12 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change.

Change subject: IMPALA-1767 Adds predicate to test boolean values true, false, 
unknown.
..


Patch Set 7:

(21 comments)

http://gerrit.cloudera.org:8080/#/c/8014/1/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

PS1, Line 2876: 
> don't think you need that
Done. removed in latest patch.


PS1, Line 2878: 
> same here
Done. removed in latest patch.


Line 2891:   | UNMATCHED_STRING_LITERAL:l expr:e
> nit: extra spaces (see marked as red).
Done. removed in latest patch.


http://gerrit.cloudera.org:8080/#/c/8014/2/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

PS2, Line 273:   KW_UNBOUNDED, KW_UNCACHED, KW_UNION, KW_UPDATE, KW_UPDATE_FN, 
KW_UPSERT, KW_USE,
> nit: long line (see vertical separator, that's ok for .test files but for e
Done. reverted this change when I moved away from using a keyword.


http://gerrit.cloudera.org:8080/#/c/8014/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

PS6, Line 337: rules.add(BetweenToCompoundRule.INSTANCE);
> Maybe add a brief comment about this rule as well?
Both rules treated the same way, so updated this comment.


PS6, Line 1093: ss
> nit: them
Done


PS6, Line 1098: 
  : // TODO: add a method to Expr to take care of this.
> remove. Similar comment above (L1082).
Done


http://gerrit.cloudera.org:8080/#/c/8014/2/fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java
File fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java:

PS2, Line 26: 
> "a boolean test predicate."? I think you can also add the grammar spec here
Done


PS2, Line 33:  This predicate needs to be rewritten into a Compo
> For literals we usually make them static as well; also, remove "_" from the
Done


PS2, Line 38: ivate
> No need for "this". We use "_" in the name to indicate private fields.
Done


PS2, Line 41: blic BoolTestPredicate(Expr e, LiteralExpr v, boolean isNegated) {
: super();
> You can use the addChild() function here, same thing.
Done


PS2, Line 88: analyzeImpl(analyzer);
> Will that work with something like "select 1 is true"? For instance, "selec
since "select 1 = true" is supported, it makes sense to inject an implicit cast 
here as well. I changed this so that I don't type-check here since it would 
duplicate the type-checking that "eq" already does. so, to be consistent with 
"eq", I let the lhs through and assume that the analyzed rewritten expr will 
handle type-checking (and subqueries, etc). the main downside I see with this 
is that the error message from "eq" is exposed, which could be surprising to 
the user since its at a lower level of abstraction.


PS2, Line 94: LHS is type-checked follow
> Similar comment as above. e.g. select 1 is 1;
I've specified the parser rule to require that the rhs is true, false, or 
unknown (that's per the standard). I'd prefer to check as-is; let me know if 
you think it makes more sense to relax the parser and move more flexibility 
here.


http://gerrit.cloudera.org:8080/#/c/8014/6/fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java
File fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java:

PS6, Line 29: 
> "a compound predicate."
Done


Line 64: result = new CompoundPredicate(CompoundPredicate.Operator.NOT, 
result, null);
> Preconditions.checkNotNull(result);
Done


http://gerrit.cloudera.org:8080/#/c/8014/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java:

PS6, Line 578: 
> nit: extra space
Done


PS6, Line 580: TestBoolTestPredicate
> A few more test cases to consider:
added select list tests here

added the others to the rewrites and end-to-end since the analysis pass does 
not do much now. if there is a sure-fire way to be consistent with the casting 
rules used by "=" (and without code duplication), it would make more sense to 
add the checks to analysis (and therefore more tests here).


PS6, Line 586: 
> Also use "unknown"
whoops, those nulls should not have been here.


PS6, Line 591: 
> This doesn't seem to be consistent with how we handle for instance expr lik
see comment above.


PS6, Line 591: 
> Add a few more negative cases. Example:
added negative tests and coverage to rewrites and end-to-end. the analysis for 
this expr does not do as much.


http://gerrit.cloudera.org:8080/#/c/8014/6/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

PS6, Line 168: RewritesOk("50.0 between null and 5000", rule,
> Did you figure that out? If so, remove the TODO.
the rewritten query does not come with parens, as shown, but I verified that 
the sql string returned from the shell (e.g., via a show create table ) 

[Impala-ASF-CR] IMPALA-1767 Adds predicate to test boolean values true, false, unknown.

2017-09-12 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has uploaded a new patch set (#7).

Change subject: IMPALA-1767 Adds predicate to test boolean values true, false, 
unknown.
..

IMPALA-1767 Adds predicate to test boolean values true, false, unknown.

Adds a new expression to represent the following boolean predicate:
 IS [NOT] (TRUE | FALSE | UNKNOWN)
The expression is rewritten into compound expressions using
negation, is null, and equality. As a result, there are no BE
operators to eval this predicate.

Testing:
- Frontend: parser, analyzer, rewriter, tosql
- EndToEnd query expressions

Change-Id: I5bd0aa82a0316ac236b7ecf8612ef4b30c016a00
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
A fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java
A fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
9 files changed, 467 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/8014/7
-- 
To view, visit http://gerrit.cloudera.org:8080/8014
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5bd0aa82a0316ac236b7ecf8612ef4b30c016a00
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-4987: Fix flaky test test row availability.py

2017-09-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4987: Fix flaky test test_row_availability.py
..


IMPALA-4987: Fix flaky test test_row_availability.py

This patch keeps test_row_availbility from randomly failing. In this test
the time interval between the 'Rows available' timeline event and the
previous event in the runtime profile is measured in order to make sure
that the rows become available after a specific amount of time. This
measurement is not correct since the previous event is that the
coordinator finished sending the query fragments to the backends, which
means the execution on some backends might have already started. This
patch tracks another event "Ready to start" as the beginning of the time
interval instead. The coordinator begins to send the query fragments to
the backends after this event so the time check should always pass.

Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3
Reviewed-on: http://gerrit.cloudera.org:8080/8036
Reviewed-by: Alex Behm 
Tested-by: Impala Public Jenkins
---
M tests/query_test/test_rows_availability.py
1 file changed, 30 insertions(+), 21 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/8036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang 


[Impala-ASF-CR] IMPALA-4987: Fix flaky test test row availability.py

2017-09-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4987: Fix flaky test test_row_availability.py
..


Patch Set 3: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/8036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors

2017-09-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5890: Abort queries if scanner hits IO errors
..


Patch Set 11: Code-Review+1

The stricter invariant makes sense to me.

-- 
To view, visit http://gerrit.cloudera.org:8080/8011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5597: Try casting targetExpr when building runtime filter plan

2017-09-12 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5597: Try casting targetExpr when building runtime 
filter plan
..


Patch Set 3:

(4 comments)

lgtm after this round

http://gerrit.cloudera.org:8080/#/c/7949/3/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

Line 587:   } catch (IllegalStateException e) {
catch (Exception e)


Line 592: // Types of targetExpr and srcExpr should be exactly the same 
since runtime filter is
Types of targetExpr and srcExpr must be exactly the same since runtime filters 
are based on hashing.


Line 597:   } catch (AnalysisException | IllegalArgumentException e) {
catch (Exception e)


http://gerrit.cloudera.org:8080/#/c/7949/2/testdata/workloads/functional-query/queries/QueryTest/joins.test
File testdata/workloads/functional-query/queries/QueryTest/joins.test:

Line 774
> Done. What does propagation mean here?
I agree "propagation" may be confusing. I think it refers to the fact we are 
"sprinkling" runtime filters throughout the plan tree. I think it's more 
confusing than not (might be better to rename in a separate patch).


-- 
To view, visit http://gerrit.cloudera.org:8080/7949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3516: Avoid writing to /tmp in testing

2017-09-12 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-3516: Avoid writing to /tmp in testing
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8047/1/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

Line 18: 
what happened here?


http://gerrit.cloudera.org:8080/#/c/8047/1/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

Line 90:   private final String outDir_ = 
System.getenv("IMPALA_FE_TEST_LOGS_DIR")
I suggest setting this in setUp() and checking whether the env var is set. 
Otherwise, might be hard to debug what went wrong.


-- 
To view, visit http://gerrit.cloudera.org:8080/8047
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9e503eb7d333c1b89dc8aea87cf30504838c44f9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates

2017-09-12 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has abandoned this change.

Change subject: IMPALA-5881: Use native allocation while building catalog 
updates
..


Abandoned

While this change works as expected, I think we can be more uniform in our 
approach so that the system is easy to reason about. With this patch, its 
possible that the Catalog can support large topic size while Impalads can't 
deserialize it (for example, a single table larger than 2GB). While we explore 
other options, I'm abandoning this change.

-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5920: Remove admission control dependency on YARN RM jar

2017-09-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5920: Remove admission control dependency on YARN RM jar
..


Patch Set 2:

(3 comments)

It's a bit unfortunate that we have to pull in this much code to get it to 
work, but it seems like you've already pruned out the unnecessary stuff.

http://gerrit.cloudera.org:8080/#/c/8035/2//COMMIT_MSG
Commit Message:

Line 23: 'common/yarn-extras', taken from Hadoop 2.6.0.
Is the original source code that you cribbed this from available publicly 
somewhere? Would be handy to diff.


http://gerrit.cloudera.org:8080/#/c/8035/2/common/yarn-extras/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java
File 
common/yarn-extras/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java:

Line 19: //YARNUTIL: MODIFIED
Is there an easy way I can see the diff? If you point me to the hadoop source 
this was derived from I can do it myself.


http://gerrit.cloudera.org:8080/#/c/8035/2/fe/pom.xml
File fe/pom.xml:

Line 111:   system
Is systemPath necessary? We seem to be able to resolve data-source-api above 
without specifying this.


-- 
To view, visit http://gerrit.cloudera.org:8080/8035
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7efdd8ebea298836ca2a82c0a4ae037ac9285bcf
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors

2017-09-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5890: Abort queries if scanner hits IO errors
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8011/10/be/src/exec/scanner-context.cc
File be/src/exec/scanner-context.cc:

PS10, Line 200: DISK_IO_ERROR
> Seems to me that INTERNAL_ERROR should be in the list of unrecoverable erro
Makes sense to me.


-- 
To view, visit http://gerrit.cloudera.org:8080/8011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors

2017-09-12 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5890: Abort queries if scanner hits IO errors
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8011/10/be/src/exec/scanner-context.cc
File be/src/exec/scanner-context.cc:

PS10, Line 200: DISK_IO_ERROR
> I can change that, but then we also have to abort the query in GetNextInter
Seems to me that INTERNAL_ERROR should be in the list of unrecoverable errors.  
Who knows what state things are in when hitting one of them (which signifies a 
bug). Tim, what do you think?

If you think it's too risky to make that change now, we can do it later.


-- 
To view, visit http://gerrit.cloudera.org:8080/8011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3516: Avoid writing to /tmp in testing

2017-09-12 Thread Tianyi Wang (Code Review)
Tianyi Wang has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/8047

Change subject: IMPALA-3516: Avoid writing to /tmp in testing
..

IMPALA-3516: Avoid writing to /tmp in testing

Currently some parts of the tests write to /tmp:
1. PlannerTest result files are written to /tmp/PlannerTest
2. FE tests load libfesupport, which writes logs to /tmp
3. Updated results in EE tests (run-tests.py --update_results) is
   written to /tmp
This patch changes them into writing to $IMPALA_HOME/logs. Specifically:
1. PlannerTest result files are written to
   $IMPALA_FE_TEST_LOGS_DIR/PlannerTest
2. libfesupport logs are written to $IMPALA_FE_TEST_LOGS_DIR
3. Updated EE test results are written to $IMPALA_EE_TEST_LOGS_DIR

Change-Id: I9e503eb7d333c1b89dc8aea87cf30504838c44f9
---
M be/src/service/fe-support.cc
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M tests/common/impala_test_suite.py
3 files changed, 8 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/8047/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8047
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I9e503eb7d333c1b89dc8aea87cf30504838c44f9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-12 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 3: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5905: add script for all-build-options job

2017-09-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#2).

Change subject: IMPALA-5905: add script for all-build-options job
..

IMPALA-5905: add script for all-build-options job

This checks in a modified version of the job script for
https://jenkins.impala.io/view/Experimental/job/all-build-options
which adds UBSAN and TSAN.

The script is also modified to not reference any jenkins environment
variables, e.g. WORKSPACE, since the Jenkins job script intermingled
references to those with the script logic.

Change-Id: I6e78f05c41e3ccd59af599b00e453e7f88b2bb34
---
A bin/jenkins/all-build-options.sh
1 file changed, 57 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/8043/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8043
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6e78f05c41e3ccd59af599b00e453e7f88b2bb34
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors

2017-09-12 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5890: Abort queries if scanner hits IO errors
..


Patch Set 10:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8011/10/be/src/exec/scanner-context.cc
File be/src/exec/scanner-context.cc:

PS10, Line 154: if the scan range has returned eosr before
> if that was the case, why can we have !eosr at line 152?
I think this was an old comment and I couldn't convince myself anymore that it 
was correct. I also discovered that the DCHECK should read (!status.ok || ...) 
because if the status is OK, that implies the buffer must not be NULL).


PS10, Line 155: must
> why "must"? or do you mean "can't"?
Yes, can't is what I meant to say. Removed the comment though.


PS10, Line 155: this is the first time the function
  : // is called 
> why is that significant? and why is it okay to have io_buffer_==NULL in tha
Removed the comment.


PS10, Line 200: DISK_IO_ERROR
> why not INTERNAL_ERROR, since that's the code that indicates an internal er
I can change that, but then we also have to abort the query in 
GetNextInternal() on INTERNAL_ERROR. I think we re-used DISK_IO_ERROR here to 
keep the set of aborting errors small. Would you prefer to switch to 
INTERNAL_ERROR?


PS10, Line 351:   if (boundary_buffer_bytes_left_ > 0 &&
  :   (output_buffer_pos_ != _buffer_pos_ ||
  :   output_buffer_bytes_left_ != 
_buffer_bytes_left_)) {
  : return false;
> this is effectively a double negative statement, which makes it harder to r
Done. I think in the else case (boundary_buffer_bytes_left_ == 0) the buffer 
pointers could point to either buffer.


-- 
To view, visit http://gerrit.cloudera.org:8080/8011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors

2017-09-12 Thread Lars Volker (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/8011

to look at the new patch set (#11).

Change subject: IMPALA-5890: Abort queries if scanner hits IO errors
..

IMPALA-5890: Abort queries if scanner hits IO errors

Prior to this fix, an error in ScannerContext::Stream::GetNextBuffer()
could leave the stream in an inconsistent state:

- The DiskIoMgr hits EOF unexpected, cancels the scan range and enqueues
a buffer with eosr set.
- The ScannerContext::Stream tries to read more bytes, but since it has
hit eosr, it tries to read beyond the end of the scan range using
DiskIoMgr::Read().
- The previous read error resulted in a new file handle being opened.
The now truncated, smaller file causes the seek to fail.
- Then during error handling, the BaseSequenceScanner calls SkipToSync()
and trips over the NULL pointer in in the IO buffer.

In my reproduction this only happens with the file handle cache enabled,
which causes Impala to see two different sized handles: the one from the
cache when the query starts, and the one after reopening the file.

To fix this, we change the I/O manager to always return DISK_IO_ERROR
for errors and we abort a query if we receive such an error in the
scanner.

This change also fixes GetBytesInternal() to maintain the invariant that
the output buffer points to the boundary buffer whenever the latter
contains some data.

I tested this by running the repro from the JIRA and impalad did not
crash but aborted the queries. I also ran the repro with
abort_on_error=1, and with the file handle cache disabled.

Text files are not affected by this problem, since the
text scanner doesn't try to recover from errors during ProcessRange()
but wraps it in RETURN_IF_ERROR instead. With this change queries abort
with the same error.

Parquet files are also not affected since they have the metadata at the
end. Truncated files immediately fail with this error:
WARNINGS: File 
'hdfs://localhost:20500/test-warehouse/tpch.partsupp_parquet/foo.0.parq'
has an invalid version number: 

Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509
---
M be/src/common/status.h
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/disk-io-mgr-scan-range.cc
M be/src/runtime/disk-io-mgr-test.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/runtime-state.cc
M common/thrift/generate_error_codes.py
9 files changed, 97 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8011/11
-- 
To view, visit http://gerrit.cloudera.org:8080/8011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5905: add script for all-build-options job

2017-09-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5905: add script for all-build-options job
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8043/1/bin/all-build-options.sh
File bin/all-build-options.sh:

Line 1: #!/usr/bin/env bash
> Nit: I don't like the file name. all-build-options.sh sounds like it's some
Yeah the jenkins/ subdirectory makes sense to me, there's a lot of junk in bin/


PS1, Line 35: do
:   OPTIONS[4]=$BUILD_SHARED_LIBS
:   if ! ./bin/clean.sh
:   then
: echo "Clean failed"
: exit 1
:   fi
> You could remove noclean in line 26 and remove this. It's obviously not exa
I just preserved this logic from the original Jenkins job script. I didn't add 
this so I don't know if there was some deeper motivation or it.

I could probably try to clean this and other things up but I'd prefer to check 
this in with the minimal changes required to make it work then worry about 
cleanup later.


-- 
To view, visit http://gerrit.cloudera.org:8080/8043
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e78f05c41e3ccd59af599b00e453e7f88b2bb34
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3437: DECIMAL V2: avoid implicit decimal->double conversion

2017-09-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3437: DECIMAL_V2: avoid implicit decimal->double 
conversion
..


Patch Set 5:

Updated with some additional tests to get more exhaustive coverage of the 
different possible permutations.

-- 
To view, visit http://gerrit.cloudera.org:8080/7916
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie419a75784eec2294947103e6e1465dfadfc29da
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3437: DECIMAL V2: avoid implicit decimal->double conversion

2017-09-12 Thread Tim Armstrong (Code Review)
Hello Greg Rahn,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7916

to look at the new patch set (#5).

Change subject: IMPALA-3437: DECIMAL_V2: avoid implicit decimal->double 
conversion
..

IMPALA-3437: DECIMAL_V2: avoid implicit decimal->double conversion

This changes the behaviour of applying an arithmetic operator to
constant DECIMAL and non-DECIMAL arguments. In DECIMAL_V1, this
caused an implicit conversion to floating point, which caused
users a lot of confusion in some cases. In DECIMAL_V2 the typing
rules are simplified: constant decimals are treated the same as any
other decimals.

Testing:
Added some expression tests for different arithmetic operators
and binary predicates (the two Expr subclasses that call
convertNumericLiteralsFromDecimal()).

Extended analyzer tests to test DECIMAL_V2 behaviour. Added many
additional test for various combinations of literals and non-literals to
get better coverage of existing and new behaviour.

Ran core tests.

Change-Id: Ie419a75784eec2294947103e6e1465dfadfc29da
---
M be/src/exprs/expr-test.cc
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
3 files changed, 200 insertions(+), 49 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/7916/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7916
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie419a75784eec2294947103e6e1465dfadfc29da
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Greg Rahn 


[Impala-ASF-CR] IMPALA-5597: Try casting targetExpr when building runtime filter plan

2017-09-12 Thread Tianyi Wang (Code Review)
Tianyi Wang has posted comments on this change.

Change subject: IMPALA-5597: Try casting targetExpr when building runtime 
filter plan
..


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7949/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

Line 585:   targetExpr = targetExpr.substitute(smap, analyzer,false);
> Better to use try/catch and return null in case the re-analysis (part of su
Done


Line 587: Type srcType = filter.getSrcExpr().getType();
> Add a comment explaining why the types must be identical
Done


Line 591:   } catch(AnalysisException | IllegalArgumentException e){
> catch Exception
Done


Line 594:   Preconditions.checkState(targetExpr.getType().equals(srcType));
> remove (obviously true from code above)
Done


http://gerrit.cloudera.org:8080/#/c/7949/2/testdata/workloads/functional-query/queries/QueryTest/joins.test
File testdata/workloads/functional-query/queries/QueryTest/joins.test:

Line 774:  QUERY
> Also add a planner test in runtime-filter-propagation.test
Done. What does propagation mean here?


Line 775: # IMPALA-5597: Tests that a join on clause with lhs and rhs having 
different types will
> Comment is wrong.
Done


Line 777: select *
> EE test should go into runtime_row_filters.test
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5597: Try casting targetExpr when building runtime filter plan

2017-09-12 Thread Tianyi Wang (Code Review)
Tianyi Wang has uploaded a new patch set (#3).

Change subject: IMPALA-5597: Try casting targetExpr when building runtime 
filter plan
..

IMPALA-5597: Try casting targetExpr when building runtime filter plan

This patch fixes a bug that fails a precondition check when generating
runtime filter plans. The lhs and rhs or join predicate might have
different types when the eq predicate function accepts wildcard-typed
parameters. In this case in existing code the types of source and target
expr will be found mismatch and an exception will be thrown when
generating runtime filters. This patch tries to cast target expr to be
of the same type as source expr. A testcase is added to joins.test

Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6
---
M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test
6 files changed, 54 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/7949/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tianyi Wang 


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-12 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#3).

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..

IMPALA-5538: Use explicit catalog versions for deleted objects

This commit changes the way deletions are handled in the catalog and
disseminated to the impalad nodes through the statestore. Previously,
deletions of catalog objects were not explicitly annotated with the
catalog version in which the deletion occured and the impalads were
using the max catalog version in a catalog update in order to decide
whether a deletion should be applied to the local catalog cache or not.
This works correctly under the assumption that
all the changes that occurred in the catalog between an update's min
and max catalog version are included in that update, i.e. no gaps or
missing updates. With the upcoming fix for IMPALA-5058, that constraint
will be relaxed, thus allowing for gaps in the catalog updates.

To avoid breaking the existing behavior, this patch introduced the
following changes:
* Deletions in the catalog are explicitly recorded in a log with
the catalog version in which they occurred. As before, added and deleted
catalog objects are sent to the statestore.
* Topic entries associated with deleted catalog objects have non-empty
values (besided keys) that contain minimal object metadata including the
catalog version.
* Statestore is no longer using the existence or not of
topic entry values in order to identify deleted topic entries. Deleted
topic entries should be explicitly marked as such by the statestore
subscribers that produce them.
* Statestore subscribers now use the 'deleted' flag to determine if a
topic entry corresponds to a deleted item.
* Impalads use the deleted objects' catalog versions when updating the
local catalog cache from a catalog update and not the update's maximum
catalog version.

Testing:
- No new tests were added as these paths are already exercised by
existing tests.
- Run all core tests.

Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M be/src/service/impala-server.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M common/thrift/CatalogInternalService.thrift
M common/thrift/StatestoreService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/statestore/test_statestore.py
16 files changed, 421 insertions(+), 336 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-12 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 2:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/7731/2/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

PS2, Line 233:  0, if they have then skip this step and use that data
> nit: perhaps easier to read? "...version 0. If so, then skip ..."
Done


Line 305:   BuildTopicUpdatesHelper(catalog_objects.updated_objects, false);
> I like the previous version more where this is inlined into L292. A quick c
Done


Line 317: if (catalog_object.catalog_version <= last_sent_catalog_version_) 
continue;
> Move this above L312?
Done


PS2, Line 336: if (topic_deletions) {
 :   VLOG(1) << "Publishing deletion: " << entry_key << "@"
 :   << catalog_object.catalog_version;
 : } else {
 :   VLOG(1) << "Publishing update: " << entry_key << "@"
 :   << catalog_object.catalog_version;
 : }
> perhaps remove some redundancy this way:
Done


http://gerrit.cloudera.org:8080/#/c/7731/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 1340: LOG(INFO) << "Received large catalog update(>100mb): "
> Received a large catalog item?
Done


PS2, Line 1366: if (item.deleted) {
  : VLOG(3) << "Deleted item: " << item.key << " version: "
  : << catalog_object.catalog_version;
  :   } else {
  : VLOG(3) << "Added item: " << item.key << " version: "
  : << catalog_object.catalog_version;
  :   }
> same suggestion as in catalog_server to reduce redundancy.
Done


http://gerrit.cloudera.org:8080/#/c/7731/2/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

Line 86:   3: required bool deleted = false;
> Isn't it better to not have a default value to make sure this is always set
Done


http://gerrit.cloudera.org:8080/#/c/7731/2/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java:

Line 56:  *   determine which catalog objects were deleted since the last 
catalog update topic.
> catalog topic update
Done


Line 57:  *   Once the catalog update topic is constructed, the old deleted 
catalog objects are
> catalog topic update
Done


Line 74:   // Retrieve all the removed catalog objects with version > 
'fromVersion'.
> /** style comment
Done


http://gerrit.cloudera.org:8080/#/c/7731/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

PS2, Line 152: .
> the word "delta" implies that we're dealing with inserts, deletes, and poss
Done


Line 347:   // Identify the catalog objects that were updated (added/dropped) 
in the catalog with
> /** style comment
Done


Line 351: Set addedCatalogObjectKeys = Sets.newHashSet();
> updatedCatalogObjects?
Done


Line 363: TCatalogObject catalogTbl = new 
TCatalogObject(TCatalogObjectType.TABLE,
> Why not just iterate over db.getTables()?
Done


Line 441: 
addedCatalogObjectKeys.add(CatalogDeltaLog.toCatalogObjectKey(privilege));
> Instead of having a duplicate add() in all places, how about creating the a
Done


Line 448: for (TCatalogObject removedObject: 
deltaLog_.retrieveObjects(fromVersion)) {
> Nice! Much clearer.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors

2017-09-12 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5890: Abort queries if scanner hits IO errors
..


Patch Set 10:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8011/10/be/src/exec/scanner-context.cc
File be/src/exec/scanner-context.cc:

PS10, Line 154: if the scan range has returned eosr before
if that was the case, why can we have !eosr at line 152?


PS10, Line 155: must
why "must"? or do you mean "can't"?


PS10, Line 155: this is the first time the function
  : // is called 
why is that significant? and why is it okay to have io_buffer_==NULL in that 
case?


PS10, Line 200: DISK_IO_ERROR
why not INTERNAL_ERROR, since that's the code that indicates an internal error 
(i.e. bug).


PS10, Line 351:   if (boundary_buffer_bytes_left_ > 0 &&
  :   (output_buffer_pos_ != _buffer_pos_ ||
  :   output_buffer_bytes_left_ != 
_buffer_bytes_left_)) {
  : return false;
this is effectively a double negative statement, which makes it harder to read. 
 How about applying deMorgan's to remove the double negation:


return boundary_buffer_bytes_left_ == 0 ||
   (output_buffer_pos_ == _buffer_pos_ &&
output_buffer_bytes_left == _buffer_bytes_left_);

Alternatively, it may be even easier to read if you just put the DCHECK in 
there:

if (boundary_buffer_bytes_left_ > 0) {
DCHECK_EQ(output_buffer_pos_, _buffer_pos_);
DCHECK_EQ(output_buffer_bytes_left_, _buffer_bytes_left_);
}

(which you could also use at line 228-229).  Also, is there an invariant for 
the "else" case w.r.t. io_buffer* that makes sense to check?


-- 
To view, visit http://gerrit.cloudera.org:8080/8011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5905: add script for all-build-options job

2017-09-12 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change.

Change subject: IMPALA-5905: add script for all-build-options job
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8043/1/bin/all-build-options.sh
File bin/all-build-options.sh:

Line 1: #!/usr/bin/env bash
Nit: I don't like the file name. all-build-options.sh sounds like it's 
something that spits out the build options, rather than something that actually 
then builds using all those options. "build-with-all-options.sh" is closer to 
what this does.

On the other hand, if we have a directory for Jenkins jobs, and the script 
names are one to one with job names, I have no problem with this name. As a new 
developer, though, figuring out what's what in bin/ is already a challenge.


PS1, Line 35: do
:   OPTIONS[4]=$BUILD_SHARED_LIBS
:   if ! ./bin/clean.sh
:   then
: echo "Clean failed"
: exit 1
:   fi
You could remove noclean in line 26 and remove this. It's obviously not exactly 
the same, but it's supposed to be!


-- 
To view, visit http://gerrit.cloudera.org:8080/8043
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e78f05c41e3ccd59af599b00e453e7f88b2bb34
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates

2017-09-12 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5881: Use native allocation while building catalog 
updates
..


Patch Set 19: Code-Review+2

Thanks for the reviews. Carrying +2.

-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates

2017-09-12 Thread Bharath Vissapragada (Code Review)
Hello Dimitris Tsirogiannis, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7955

to look at the new patch set (#19).

Change subject: IMPALA-5881: Use native allocation while building catalog 
updates
..

IMPALA-5881: Use native allocation while building catalog updates

This patch moves the allocation of thrift structures for serializing
the output of GetAllCatalogObjects() call into the native side. This
is done to prevent the Catalog from hitting JVM array limitations
(2GB maximum size) at scale. Additionally this patch also extends the
--compact_catalog_top=true to apply TCompactProtocol for the above
serialization to reduce the in-memory footprint.

This patch also caps the native allocations to not go beyond 4GB due
to Thrift library limitations. (Thrift internally uses uint32_t
datatype to represent the message size which limits the size to
~4.2GB).

Testing: Passed ASAN + HDFS core and DEBUG + HDFS exhaustive.

Deployed the patch on a 16 node cluster and tested it on a
Catalog-update topic of 3.5GB (uncompressed) or 780MB (compressed).

Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
---
M be/src/catalog/catalog.cc
M be/src/catalog/catalog.h
M be/src/rpc/jni-thrift-util.h
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
A fe/src/main/java/org/apache/impala/thrift/NativeAllocator.java
A fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java
A fe/src/main/java/org/apache/impala/thrift/TNativeSerializer.java
A fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java
A fe/src/test/java/org/apache/impala/thrift/TNativeSerializerTest.java
12 files changed, 731 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/7955/19
-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates

2017-09-12 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5881: Use native allocation while building catalog 
updates
..


Patch Set 18:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/7955/18/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

PS18, Line 117: DCHECK_LE(len, numeric_limits::max());
> This DCHECK needs to be run against nbuffer.buffer_len directly.
I'm not totally sure about this part, but looks like the static_cast() can 
silently fail. Either way I think its good to make this check on buffer_len


http://gerrit.cloudera.org:8080/#/c/7955/18/fe/src/main/java/org/apache/impala/service/JniCatalog.java
File fe/src/main/java/org/apache/impala/service/JniCatalog.java:

PS18, Line 155: Catalog
> nit: catalog
Done


http://gerrit.cloudera.org:8080/#/c/7955/18/fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java
File 
fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java:

Line 61: nbaos.write(b, offset, len);
> try catch and assert if we catch an exception or use the exceptionCaught pa
As discussed, not really needed since we expect the write to throw if an error 
occurs and then it is caught by the test.


Line 67:   // Makes sure that the memory is freed by the end of nboas.write().
> Expects a write() to fail and checks that the memory is freed after the uns
Done


Line 71:   nbaos.write(b, offset, len);
> Add an assert that fails if the write succeeded.
Done


Line 95:   public void nbaosTest() {
> NbaosTest
Done


Line 99: testAllocator  = new NativeTestAllocator();
> move into L96
Done


Line 100: // Check that initial allocation failure in NBAOS c'tor
> remove "that"
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5905: add script for all-build-options job

2017-09-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/8043

Change subject: IMPALA-5905: add script for all-build-options job
..

IMPALA-5905: add script for all-build-options job

This checks in a modified version of the job script for
https://jenkins.impala.io/view/Experimental/job/all-build-options
which adds UBSAN and TSAN.

The script is also modified to not reference any jenkins environment
variables, e.g. WORKSPACE, since the Jenkins job script intermingled
references to those with the script logic.

Change-Id: I6e78f05c41e3ccd59af599b00e453e7f88b2bb34
---
A bin/all-build-options.sh
1 file changed, 57 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/8043/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8043
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I6e78f05c41e3ccd59af599b00e453e7f88b2bb34
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates

2017-09-12 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5881: Use native allocation while building catalog 
updates
..


Patch Set 18: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7955/18/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

PS18, Line 117: DCHECK_LE(len, numeric_limits::max());
Is this actually needed? Isn't the cast in the line above going to fail if the 
condition in the DCHECK is false?


http://gerrit.cloudera.org:8080/#/c/7955/18/fe/src/main/java/org/apache/impala/service/JniCatalog.java
File fe/src/main/java/org/apache/impala/service/JniCatalog.java:

PS18, Line 155: Catalog
nit: catalog


-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Bump Kudu version to 3f49724

2017-09-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: Bump Kudu version to 3f49724
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/8040
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I21aff562e2bca90436a8d0206ffca44b712bc1f1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] Bump Kudu version to 3f49724

2017-09-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: Bump Kudu version to 3f49724
..


Bump Kudu version to 3f49724

Change-Id: I21aff562e2bca90436a8d0206ffca44b712bc1f1
Reviewed-on: http://gerrit.cloudera.org:8080/8040
Reviewed-by: Matthew Jacobs 
Tested-by: Impala Public Jenkins
---
M bin/impala-config.sh
1 file changed, 2 insertions(+), 2 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Matthew Jacobs: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/8040
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I21aff562e2bca90436a8d0206ffca44b712bc1f1
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors

2017-09-12 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5890: Abort queries if scanner hits IO errors
..


Patch Set 10:

I fixed more test errors and will run an exhaustive build in parallel now.

-- 
To view, visit http://gerrit.cloudera.org:8080/8011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors

2017-09-12 Thread Lars Volker (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/8011

to look at the new patch set (#10).

Change subject: IMPALA-5890: Abort queries if scanner hits IO errors
..

IMPALA-5890: Abort queries if scanner hits IO errors

Prior to this fix, an error in ScannerContext::Stream::GetNextBuffer()
could leave the stream in an inconsistent state:

- The DiskIoMgr hits EOF unexpected, cancels the scan range and enqueues
a buffer with eosr set.
- The ScannerContext::Stream tries to read more bytes, but since it has
hit eosr, it tries to read beyond the end of the scan range using
DiskIoMgr::Read().
- The previous read error resulted in a new file handle being opened.
The now truncated, smaller file causes the seek to fail.
- Then during error handling, the BaseSequenceScanner calls SkipToSync()
and trips over the NULL pointer in in the IO buffer.

In my reproduction this only happens with the file handle cache enabled,
which causes Impala to see two different sized handles: the one from the
cache when the query starts, and the one after reopening the file.

To fix this, we change the I/O manager to always return DISK_IO_ERROR
for errors and we abort a query if we receive such an error in the
scanner.

This change also fixes GetBytesInternal() to maintain the invariant that
the output buffer points to the boundary buffer whenever the latter
contains some data.

I tested this by running the repro from the JIRA and impalad did not
crash but aborted the queries. I also ran the repro with
abort_on_error=1, and with the file handle cache disabled.

Text files are not affected by this problem, since the
text scanner doesn't try to recover from errors during ProcessRange()
but wraps it in RETURN_IF_ERROR instead. With this change queries abort
with the same error.

Parquet files are also not affected since they have the metadata at the
end. Truncated files immediately fail with this error:
WARNINGS: File 
'hdfs://localhost:20500/test-warehouse/tpch.partsupp_parquet/foo.0.parq'
has an invalid version number: 

Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509
---
M be/src/common/status.h
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/disk-io-mgr-scan-range.cc
M be/src/runtime/disk-io-mgr-test.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/runtime-state.cc
M common/thrift/generate_error_codes.py
9 files changed, 106 insertions(+), 54 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8011/10
-- 
To view, visit http://gerrit.cloudera.org:8080/8011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5250: Unify decompressor output length semantics

2017-09-12 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5250: Unify decompressor output_length semantics
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8030/1//COMMIT_MSG
Commit Message:

Line 26: instead of the "fast" version, for the latter is insecure with 
corrupted
> Is there an individual benchmark?
Not sure how useful compression-test.cc is. I was thinking more along the lines 
of end-to-end tests that have heavy data exchanges.


http://gerrit.cloudera.org:8080/#/c/8030/1/be/src/util/decompress-test.cc
File be/src/util/decompress-test.cc:

Line 196: }
> I'm not sure if my interpretation of JIRA is correct. In my understanding:
Your interpretation is correct.

I'm just referring to a specific broken file I've seen where I believe the 
output_len should be 0. Can you please make sure that your change fixes the bug 
for that specific broken file and check whether these tests cover that scenario?


http://gerrit.cloudera.org:8080/#/c/8030/1/be/src/util/decompress.cc
File be/src/util/decompress.cc:

Line 426: // If size_only is false, output buffer size must be at least 
output_len. *output_len is
> Done. It is set to 0 in this case, while other decompressor may leave it un
Agree this in/out pattern is not great and we should consider your proposal 
(but not in this patch).

I'm ok with setting to 0 everywhere, but it might be easier to just leave 
output_len untouched. I saw in the code that in some places it is set to 0, but 
not in others. My main ask is for consistency and documenting the behavior in a 
comment.


-- 
To view, visit http://gerrit.cloudera.org:8080/8030
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifd42942b169921a7eb53940c3762bc45bb82a993
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates

2017-09-12 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5881: Use native allocation while building catalog 
updates
..


Patch Set 18:

Taking a look now.

-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4987: Fix flaky test test row availability.py

2017-09-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4987: Fix flaky test test_row_availability.py
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1215/

-- 
To view, visit http://gerrit.cloudera.org:8080/8036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5597: Try casting targetExpr when building runtime filter plan

2017-09-12 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5597: Try casting targetExpr when building runtime 
filter plan
..


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/7949/2/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

PS2, Line 866: 
Thanks for removing


http://gerrit.cloudera.org:8080/#/c/7949/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

Line 585:   targetExpr = targetExpr.substitute(smap, analyzer,false);
Better to use try/catch and return null in case the re-analysis (part of 
substitute()) fails for some reason.

Space before "false"


Line 587: Type srcType = filter.getSrcExpr().getType();
Add a comment explaining why the types must be identical


Line 591:   } catch(AnalysisException | IllegalArgumentException e){
catch Exception

formatting (space after catch, space before "{")


Line 594:   Preconditions.checkState(targetExpr.getType().equals(srcType));
remove (obviously true from code above)


http://gerrit.cloudera.org:8080/#/c/7949/2/testdata/workloads/functional-query/queries/QueryTest/joins.test
File testdata/workloads/functional-query/queries/QueryTest/joins.test:

Line 774:  QUERY
Also add a planner test in runtime-filter-propagation.test


Line 775: # IMPALA-5597: Tests that a join on clause with lhs and rhs having 
different types will
Comment is wrong.


Line 777: select *
EE test should go into runtime_row_filters.test


-- 
To view, visit http://gerrit.cloudera.org:8080/7949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4987: Fix flaky test test row availability.py

2017-09-12 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4987: Fix flaky test test_row_availability.py
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/8036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4987: Fix flaky test test row availability.py

2017-09-12 Thread Tianyi Wang (Code Review)
Tianyi Wang has posted comments on this change.

Change subject: IMPALA-4987: Fix flaky test test_row_availability.py
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8036/2//COMMIT_MSG
Commit Message:

Line 7: IMPALA-4987: Fix flaky test test_row_availability.py
> Sorry to be adamant about grammar nits, but let's get them fixed.
Thanks for the careful review. I should definitely improve on this.


Line 9: This patch keeps test_row_availbility from random failing. In this test
> from randomly failing
Done


Line 14: coordinator finished sending query to the backends, which means the
> sending the query fragments
Done


http://gerrit.cloudera.org:8080/#/c/8036/2/tests/query_test/test_rows_availability.py
File tests/query_test/test_rows_availability.py:

Line 1:   # Licensed to the Apache Software Foundation (ASF) under one
> space
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/8036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4987: Fix flaky test test row availability.py

2017-09-12 Thread Tianyi Wang (Code Review)
Tianyi Wang has uploaded a new patch set (#3).

Change subject: IMPALA-4987: Fix flaky test test_row_availability.py
..

IMPALA-4987: Fix flaky test test_row_availability.py

This patch keeps test_row_availbility from randomly failing. In this test
the time interval between the 'Rows available' timeline event and the
previous event in the runtime profile is measured in order to make sure
that the rows become available after a specific amount of time. This
measurement is not correct since the previous event is that the
coordinator finished sending the query fragments to the backends, which
means the execution on some backends might have already started. This
patch tracks another event "Ready to start" as the beginning of the time
interval instead. The coordinator begins to send the query fragments to
the backends after this event so the time check should always pass.

Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3
---
M tests/query_test/test_rows_availability.py
1 file changed, 30 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8036/3
-- 
To view, visit http://gerrit.cloudera.org:8080/8036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tianyi Wang 


[Impala-ASF-CR] IMPALA-5860: upgrade to LLVM 3.9.1

2017-09-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#2).

Change subject: IMPALA-5860: upgrade to LLVM 3.9.1
..

IMPALA-5860: upgrade to LLVM 3.9.1

LLVM made a few API changes:
* Misc minor changes to function and type signatures
* The CloneFunction() API changed semantics (http://reviews.llvm.org/D18628)

TODO: need to rebase and update toolchain to a later version. It has
gotten behind with the frequent Kudu version bumps.

Testing:
Ran core and ASAN tests.

Perf:
Ran single node TPC-H and targeted perf with scale factor 60. Both
improved on average.

+--+---+-++++
| Workload | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
+--+---+-++++
| TPCH(60) | parquet / none / none | 17.82   | -5.01% | 11.64  | -4.23% 
|
+--+---+-++++

+--+--+---++-++++-+---+
| Workload | Query| File Format   | Avg(s) | Base Avg(s) | 
Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Clients | Iters |
+--+--+---++-++++-+---+
| TPCH(60) | TPCH-Q1  | parquet / none / none | 27.97  | 27.59   |   +1.36% 
  |   0.39%|   0.41%| 1   | 5 |
| TPCH(60) | TPCH-Q20 | parquet / none / none | 5.81   | 5.78|   +0.44% 
  |   0.73%|   0.21%| 1   | 5 |
| TPCH(60) | TPCH-Q21 | parquet / none / none | 62.98  | 62.98   |   +0.01% 
  |   5.56%|   1.07%| 1   | 5 |
| TPCH(60) | TPCH-Q15 | parquet / none / none | 8.45   | 8.46|   -0.20% 
  |   0.40%|   0.38%| 1   | 5 |
| TPCH(60) | TPCH-Q4  | parquet / none / none | 5.57   | 5.59|   -0.41% 
  |   0.43%|   0.80%| 1   | 5 |
| TPCH(60) | TPCH-Q6  | parquet / none / none | 3.16   | 3.17|   -0.45% 
  |   0.78%|   1.70%| 1   | 5 |
| TPCH(60) | TPCH-Q5  | parquet / none / none | 7.41   | 7.47|   -0.92% 
  |   0.71%|   1.06%| 1   | 5 |
| TPCH(60) | TPCH-Q9  | parquet / none / none | 33.45  | 33.78   |   -0.99% 
  |   1.15%|   0.85%| 1   | 5 |
| TPCH(60) | TPCH-Q11 | parquet / none / none | 2.00   | 2.03|   -1.34% 
  |   1.71%|   2.24%| 1   | 5 |
| TPCH(60) | TPCH-Q2  | parquet / none / none | 4.71   | 4.79|   -1.60% 
  |   1.49%|   1.95%| 1   | 5 |
| TPCH(60) | TPCH-Q18 | parquet / none / none | 46.48  | 47.71   |   -2.58% 
  |   1.04%|   0.38%| 1   | 5 |
| TPCH(60) | TPCH-Q14 | parquet / none / none | 5.85   | 6.02|   -2.84% 
  |   0.44%|   0.70%| 1   | 5 |
| TPCH(60) | TPCH-Q22 | parquet / none / none | 6.51   | 6.76|   -3.71% 
  |   2.29%|   2.42%| 1   | 5 |
| TPCH(60) | TPCH-Q19 | parquet / none / none | 7.27   | 7.63|   -4.69% 
  |   1.33%|   0.78%| 1   | 5 |
| TPCH(60) | TPCH-Q10 | parquet / none / none | 13.19  | 13.84   |   -4.73% 
  |   0.42%|   1.44%| 1   | 5 |
| TPCH(60) | TPCH-Q13 | parquet / none / none | 21.95  | 23.12   |   -5.03% 
  |   0.25%|   1.19%| 1   | 5 |
| TPCH(60) | TPCH-Q16 | parquet / none / none | 5.29   | 5.57|   -5.04% 
  |   0.85%|   0.78%| 1   | 5 |
| TPCH(60) | TPCH-Q7  | parquet / none / none | 42.05  | 44.33   |   -5.16% 
  |   2.07%|   2.28%| 1   | 5 |
| TPCH(60) | TPCH-Q12 | parquet / none / none | 19.77  | 21.00   |   -5.87% 
  |   8.14%|   5.09%| 1   | 5 |
| TPCH(60) | TPCH-Q3  | parquet / none / none | 11.46  | 12.32   |   -6.94% 
  |   0.76%|   0.53%| 1   | 5 |
| TPCH(60) | TPCH-Q17 | parquet / none / none | 40.09  | 49.28   |   
-18.64%  |   2.09%|   0.67%| 1   | 5 |
| TPCH(60) | TPCH-Q8  | parquet / none / none | 10.63  | 13.47   | I 
-21.08%  | * 12.34% * | * 21.09% * | 1   | 5 |
+--+--+---++-++++-+---+

+---+---+-++++
| Workload  | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) 
| Delta(GeoMean) |
+---+---+-++++
| TARGETED-PERF(60) | parquet / none / none | 22.38   | -1.24% 

[Impala-ASF-CR] IMPALA-3877: support unpatched LLVM

2017-09-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#2).

Change subject: IMPALA-3877: support unpatched LLVM
..

IMPALA-3877: support unpatched LLVM

The p1 patch we use for LLVM avoided merging of structurally
identical Struct types in unpredictable ways when linking in
IR UDF modules. This avoided hitting type assertions when
generating calls to IR UDfs.

This implements an alternative solution, which is to bitcast
the arguments when calling IR UDFs. This means we do not need to carry
the patch when we upgrade LLVM.

Testing:
Ran core tests with unpatched LLVM 3.8, including the IR UDF test that
originally required the patch to pass.

Change-Id: I3dfbe44ed8a1464b9b0991fd54e72b194ad6155d
---
M be/src/codegen/CMakeLists.txt
M be/src/codegen/codegen-anyval.cc
A be/src/codegen/codegen-util.cc
A be/src/codegen/codegen-util.h
M be/src/codegen/llvm-codegen.h
M be/src/exprs/expr-codegen-test.cc
6 files changed, 154 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/7973/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7973
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3dfbe44ed8a1464b9b0991fd54e72b194ad6155d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5250: Unify decompressor output length semantics

2017-09-12 Thread Tianyi Wang (Code Review)
Tianyi Wang has posted comments on this change.

Change subject: IMPALA-5250: Unify decompressor output_length semantics
..


Patch Set 1:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/8030/1//COMMIT_MSG
Commit Message:

Line 9: This patch makes the semantics output_length parameter in
> the semantics of the output_length
Done


Line 10: Codec::ProcessBlock to be the same. In existing code different
> Codec::ProcessBlock() to be the same across all codecs. In the existing cod
Done


Line 13:to actual decompressed length, but it does not set it to actual
> to the actual decompressed length
Done


Line 16:be exactly the same as actual decompressed length, otherwise
> as the actual decompressed length
Done


Line 19:actual decompressed length and will set it to actual decompressed
> the actual decompressed length
Done


Line 21: This inconsistency leads to a bug where the error message is
> This inconsistency lead to a bug where 
Do you mean changing it to plural?


Line 24: decompressors can handle oversized output buffer correctly.
> can handle an oversized output buffer correctly.
Done


Line 25: Lz4Decompressor will use the "safe" version of decompression function
> will use the "safe" instead of the "fast" decompression function
Done


Line 26: instead of the "fast" version, for the latter is insecure with 
corrupted
> We use LZ4 for compressing exchange data, so we check the perf impact of th
Is there an individual benchmark?
There is one for compression in be/src/experiments/compression-test.cc.


Line 27: data and requires decompressed length to be known.
> and requires the decompressed length to be known
Done


http://gerrit.cloudera.org:8080/#/c/8030/1/be/src/util/decompress-test.cc
File be/src/util/decompress-test.cc:

Line 153:   // Test the behavior when the decompressor is given too little / 
too much space
> "." at end of sentence
Done


Line 155:   // correct output size when the space is enough, and does not write 
beyond the output
> the correct output size
Done


Line 157:   void DecompressOverUnderSizedOutputBuffer(Codec *compressor, Codec 
*decompressor,
> Out style is "Codec* compressor"
Done


Line 196: }
> I think we should test more directly the scenario mentioned in the JIRA whe
I'm not sure if my interpretation of JIRA is correct. In my understanding:
1. The parquet file is corrupted so either 
current_page_header_.uncompressed_page_size is greater or 
current_page_header_.compressed_page_size is less than what it should be. 
2. Either way we allocated a larger buffer and the decompressor decompressed 
the (probably incomplete) data successfully. 
3. The part of the buffer not written by the decompressor is uninitilized. 
Because the decompressor returned OK, somehow we read this uninitilized part 
later and trigger undeterministic errors.
The problem here is in 2: The decompressor should report how much data it 
decompressed even if the decompression is successful. Then ReadDataPage() can 
check whether it is the same as current_page_header_.uncompressed_page_size.
So I think we should test whether output_len is set properly with an successful 
decompression. I don't understand "set to 0" part. When should it be set to 0?


http://gerrit.cloudera.org:8080/#/c/8030/1/be/src/util/decompress.cc
File be/src/util/decompress.cc:

Line 426: // If size_only is false, output buffer size must be at least 
output_len. *output_len is
> Mention what output_len is set to if a non-OK status is returned.
Done. It is set to 0 in this case, while other decompressor may leave it 
unmodified or set it to an arbitrary intermediate value.
I think the complication here is caused by the design that 'output_len' and 
'output' are output parameters. When a parameter is both input and output, and 
could be passed as reference further into callees of callee, there could be all 
kinds of bugs forgetting to set the correct value here and there. I think if 
output and output_len is in the return value, creating such bugs is much more 
difficult, and the code will be much easier to follow since the data flow is 
clear and explicit.

I think we should have a Status type for return values. It represents either 
a value or an error status, and then we can avoid output parameters in new 
codes.


Line 427: // also updated with actual output size.
> the actual output size
Done


Line 575:   // LZ4_decompress_fast will cause a segmentation fault if passed a 
nullptr output.
> Comment still relevant?
Removed.


Line 579:   if (ret < 0) {
> single line
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/8030
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifd42942b169921a7eb53940c3762bc45bb82a993
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 

[Impala-ASF-CR] IMPALA-5250: Unify decompressor output length semantics

2017-09-12 Thread Tianyi Wang (Code Review)
Tianyi Wang has uploaded a new patch set (#2).

Change subject: IMPALA-5250: Unify decompressor output_length semantics
..

IMPALA-5250: Unify decompressor output_length semantics

This patch makes the semantics of the output_length parameter in
Codec::ProcessBlock to be the same across all codecs. In existing code
different decompressor treats output_length differently:
1. SnappyDecompressor needs output_length to be greater than or equal
   to the actual decompressed length, but it does not set it to the
   actual decompressed length after decompression.
2. SnappyBlockDecompressor and Lz4Decompressor require output_length to
   be exactly the same as the actual decompressed length, otherwise
   decompression fails.
3. Other decompressors need output_length to be greater than or equal to
   the actual decompressed length and will set it to actual decompressed
   length if oversized.
This inconsistency leads to a bug where the error message is
undeterministic when the compressed block is corrupted. This patch makes
all decompressor behave as 3 and adds a testcase checking that
decompressors can handle an oversized output buffer correctly.
Lz4Decompressor will use the "safe" instead of the "fast" decompression
function, for the latter is insecure with corrupted data and requires
the decompressed length to be known.

Change-Id: Ifd42942b169921a7eb53940c3762bc45bb82a993
---
M be/src/util/decompress-test.cc
M be/src/util/decompress.cc
2 files changed, 31 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/8030/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8030
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifd42942b169921a7eb53940c3762bc45bb82a993
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 


[Impala-ASF-CR] IMPALA-4987: Fix flaky test test row availability.py

2017-09-12 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4987: Fix flaky test test_row_availability.py
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8036/2//COMMIT_MSG
Commit Message:

Line 7: IMPALA-4987: Fix flaky test test_row_availability.py
Sorry to be adamant about grammar nits, but let's get them fixed.


Line 9: This patch keeps test_row_availbility from random failing. In this test
from randomly failing


Line 14: coordinator finished sending query to the backends, which means the
sending the query fragments


http://gerrit.cloudera.org:8080/#/c/8036/2/tests/query_test/test_rows_availability.py
File tests/query_test/test_rows_availability.py:

Line 1:   # Licensed to the Apache Software Foundation (ASF) under one
space


-- 
To view, visit http://gerrit.cloudera.org:8080/8036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates

2017-09-12 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5881: Use native allocation while building catalog 
updates
..


Patch Set 18: Code-Review+1

(6 comments)

Dimitris, any more comments?

http://gerrit.cloudera.org:8080/#/c/7955/18/fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java
File 
fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java:

Line 61: nbaos.write(b, offset, len);
try catch and assert if we catch an exception or use the exceptionCaught 
pattern you use for the c'tor test below


Line 67:   // Makes sure that the memory is freed by the end of nboas.write().
Expects a write() to fail and checks that the memory is freed after the 
unsuccessful write().


Line 71:   nbaos.write(b, offset, len);
Add an assert that fails if the write succeeded.


Line 95:   public void nbaosTest() {
NbaosTest


Line 99: testAllocator  = new NativeTestAllocator();
move into L96


Line 100: // Check that initial allocation failure in NBAOS c'tor
remove "that"


-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] PREVIEW: IMPALA-3437: DECIMAL V2: avoid implicit decimal->double conversion

2017-09-12 Thread Greg Rahn (Code Review)
Greg Rahn has posted comments on this change.

Change subject: PREVIEW: IMPALA-3437: DECIMAL_V2: avoid implicit 
decimal->double conversion
..


Patch Set 4: Code-Review+1

LGTM

-- 
To view, visit http://gerrit.cloudera.org:8080/7916
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie419a75784eec2294947103e6e1465dfadfc29da
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Greg Rahn 
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] Explain Boost setting needed for 96-bit timestamps

2017-09-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: [DOCS] Explain Boost setting needed for 96-bit timestamps
..


Patch Set 2: Code-Review-1

I think that the documented solution is not the correct solution to the 
problem. It may work by coincidence but relies on some very brittle assumptions 
about Impala exporting boost functions that have binary compatibility with the 
version of boost used by the UDF.

In future we will likely remove, update or modify the version of boost used in 
Impala.

-- 
To view, visit http://gerrit.cloudera.org:8080/7983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4b67cd7762f682c3a054e0d9641080aa51801c83
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors

2017-09-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5890: Abort queries if scanner hits IO errors
..


Patch Set 9: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/8011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates

2017-09-12 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new patch set (#18).

Change subject: IMPALA-5881: Use native allocation while building catalog 
updates
..

IMPALA-5881: Use native allocation while building catalog updates

This patch moves the allocation of thrift structures for serializing
the output of GetAllCatalogObjects() call into the native side. This
is done to prevent the Catalog from hitting JVM array limitations
(2GB maximum size) at scale. Additionally this patch also extends the
--compact_catalog_top=true to apply TCompactProtocol for the above
serialization to reduce the in-memory footprint.

This patch also caps the native allocations to not go beyond 4GB due
to Thrift library limitations. (Thrift internally uses uint32_t
datatype to represent the message size which limits the size to
~4.2GB).

Testing: Passed ASAN + HDFS core and DEBUG + HDFS exhaustive.

Deployed the patch on a 16 node cluster and tested it on a
Catalog-update topic of 3.5GB (uncompressed) or 780MB (compressed).

Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
---
M be/src/catalog/catalog.cc
M be/src/catalog/catalog.h
M be/src/rpc/jni-thrift-util.h
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
A fe/src/main/java/org/apache/impala/thrift/NativeAllocator.java
A fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java
A fe/src/main/java/org/apache/impala/thrift/TNativeSerializer.java
A fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java
A fe/src/test/java/org/apache/impala/thrift/TNativeSerializerTest.java
12 files changed, 731 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/7955/18
-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates

2017-09-12 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5881: Use native allocation while building catalog 
updates
..


Patch Set 17:

(7 comments)

Fixed a subtle bug in the test allocator.

http://gerrit.cloudera.org:8080/#/c/7955/17/fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java
File 
fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java:

Line 40:   || size >= 
NativeByteArrayOutputStream.BUFFER_MAX_SIZE_LIMIT) {
> Our real allocator would not OOM if the size is big, at least not in this p
Done


Line 60:   private void writeAndCheck(NativeByteArrayOutputStream nboas,
> I'd prefer to split this up into writeOk() and writeNotOk(). Otherwise, tes
Done


Line 78: NativeByteArrayOutputStream nboas =
> The first allocation happens here in the c'tor. We need to test that as wel
Done


Line 90: nboas = new NativeByteArrayOutputStream(testAllocator);
> nit: nbaos
Done


Line 108: 
> remove extraneous newines
Done


Line 114: testAllocator = new NativeTestAllocator();
> Add a helper function for these smaller tests that creates a new allocator,
Done


http://gerrit.cloudera.org:8080/#/c/7955/17/fe/src/test/java/org/apache/impala/thrift/TNativeSerializerTest.java
File fe/src/test/java/org/apache/impala/thrift/TNativeSerializerTest.java:

Line 81:   public void testBasicSerialization()
> We usually call these TestBasicSerialization (first letter is uppercase)
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4987: Fix flaky test test row availability.py

2017-09-12 Thread Tianyi Wang (Code Review)
Tianyi Wang has posted comments on this change.

Change subject: IMPALA-4987: Fix flaky test test_row_availability.py
..


Patch Set 1:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/8036/1//COMMIT_MSG
Commit Message:

Line 9: This patch keeps test_row_availbility from random failure. In this test
> Please take a pass to correct wording/grammar. For example, it should be:
Done


Line 10: the time interval between 'Rows available' event and the previous event
> the 'Rows available' timeline event
Done


Line 11: in runtime profile is measured in order to make sure that rows become
> in the runtime profile
Done


Line 12: available after a specific amount of time. This is not correct since
> Instead of 'This' try to rephrase/repeat what 'This' refers to for clarity,
Done


Line 13: the previous event is that the coordinator finishes sending query to
> finished sending the query to the backends
Done


Line 14: backends, which means the execution on backend might have already
> on some backends might have already started
Done


Line 15: started. This patch tracks another event "ready to start" as the
> "Read to start"
Done


Line 17: query to backends after this event so the time check should always 
pass.
> the query to backends
Done


http://gerrit.cloudera.org:8080/#/c/8036/1/tests/query_test/test_rows_availability.py
File tests/query_test/test_rows_availability.py:

Line 81: row_avail_time_ms = 
self.__parse_time_ms(self.__find_time(line))
> rows_avail_time_ms (not just one row is available)
Done


Line 92: "'ready to start' event.\nExpected the event to be marked no 
earlier than "\
> 'Ready to start'
Done


Line 93: "%sms after the 'ready to start' event.\nQuery: %s"\
> 'Ready to start'
Done


Line 99: """Find event time point in a line from runtime timeline."""
> from the runtime profile timeline
Done


Line 100: # Example line: "- Rows available: 3s311ms (2s300ms)"
> Example should include what this function returns
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/8036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5736: Add impala-shell argument to set default query options

2017-09-12 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change.

Change subject: IMPALA-5736: Add impala-shell argument to set default query 
options
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8038/1//COMMIT_MSG
Commit Message:

Line 12: Examples:
I think you can add a test in tests/shell/test_shell_interactive.py. 
test_var_substitution() has an example where options are provided. I didn't see 
an ~/.impalarc test.

  @pytest.mark.execute_serially
  def test_var_substitution(self):
cmds = open(os.path.join(QUERY_FILE_PATH, 
'test_var_substitution.sql')).read()
args = '''--var=foo=123 --var=BAR=456 --delimited "--output_delimiter= " '''
result = run_impala_shell_interactive(cmds, shell_args=args)
assert_var_substitution(result)


PS1, Line 16: .impalarc:
: [impala]
: query_options=EXPLAIN_LEVEL=2,MT_DOP=2
This might be hard to do, but I think

[impala]
[[query_options]]
EXPLAIN_LEVEL=2
MT_DOP=2

may be better.


http://gerrit.cloudera.org:8080/#/c/8038/1/shell/option_parser.py
File shell/option_parser.py:

PS1, Line 160:   parser.add_option("--var", dest="keyval", action="append",
 : help="Define variable(s) to be used within 
the Impala session."
 :  " It must follow the pattern 
\"KEY=VALUE\","
 :  " KEY starts with an alphabetic 
character and"
 :  " contains alphanumeric characters or 
underscores.")
 :   parser.add_option("--query_options", dest="query_options",
 : help="Sets default query options. The format 
is comma separated"
 :   
A little bit of bike-shedding:

Impala-shell seems to support --var a=b --var c=d, but for query options it's 
"--query_options=a=b,c=d". I think the former approach is preferable, because 
it means we don't have to worry about commas in settings. It's also consistent 
with --var.


-- 
To view, visit http://gerrit.cloudera.org:8080/8038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I26a3b67230c80a99bd246b6af205d558fec9a986
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4987: Fix flaky test test row availability.py

2017-09-12 Thread Tianyi Wang (Code Review)
Tianyi Wang has uploaded a new patch set (#2).

Change subject: IMPALA-4987: Fix flaky test test_row_availability.py
..

IMPALA-4987: Fix flaky test test_row_availability.py

This patch keeps test_row_availbility from random failing. In this test
the time interval between the 'Rows available' timeline event and the
previous event in the runtime profile is measured in order to make sure
that the rows become available after a specific amount of time. This
measurement is not correct since the previous event is that the
coordinator finished sending query to the backends, which means the
execution on some backends might have already started. This patch tracks
another event "Ready to start" as the beginning of time interval
instead. The coordinator begins to send the query to the backends after
this event so the time check should always pass.

Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3
---
M tests/query_test/test_rows_availability.py
1 file changed, 31 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8036/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 


[Impala-ASF-CR] IMPALA-5597: Try casting targetExpr when building runtime filter plan

2017-09-12 Thread Tianyi Wang (Code Review)
Tianyi Wang has uploaded a new patch set (#2).

Change subject: IMPALA-5597: Try casting targetExpr when building runtime 
filter plan
..

IMPALA-5597: Try casting targetExpr when building runtime filter plan

This patch fixes a bug that fails a precondition check when generating
runtime filter plans. The lhs and rhs or join predicate might have
different types when the eq predicate function accepts wildcard-typed
parameters. In this case in existing code the types of source and target
expr will be found mismatch and an exception will be thrown when
generating runtime filters. This patch tries to cast target expr to be
of the same type as source expr. A testcase is added to joins.test

Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6
---
M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-query/queries/QueryTest/joins.test
5 files changed, 20 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/7949/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tianyi Wang 


[Impala-ASF-CR] IMPALA-5597: Check predicate children types when building runtime filter plan

2017-09-12 Thread Tianyi Wang (Code Review)
Tianyi Wang has restored this change.

Change subject: IMPALA-5597: Check predicate children types when building 
runtime filter plan
..


Restored

A new fix is implemented.

-- 
To view, visit http://gerrit.cloudera.org:8080/7949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: restore
Gerrit-Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tianyi Wang 


Re: [Impala-ASF-CR] [DOCS] Explain Boost setting needed for 96-bit timestamps

2017-09-12 Thread Tim Armstrong
I've been talking to John out of band about this. I had some more meta
concerns about documenting this and whether it will encourage customers to
do dangerous things. Need to organise my thoughts and enumerate concerns.

On 12 Sep. 2017 9:42 am, "Matthew Jacobs (Code Review)" 
wrote:

> Matthew Jacobs has posted comments on this change.
>
> Change subject: [DOCS] Explain Boost setting needed for 96-bit timestamps
> ..
>
>
> Patch Set 2:
>
> (3 comments)
>
> http://gerrit.cloudera.org:8080/#/c/7983/2/docs/shared/impala_common.xml
> File docs/shared/impala_common.xml:
>
> PS2, Line 848: our
> who is 'our' here?
>
>
> PS2, Line 850: ,
>  : which matches the representation for
> TIMESTAMP in Impala
> , and is required in order to use TIMESTAMP.
>
>
> Line 855: 
> it might be good to just say that you should probably just define this
> variable always, unless you have a really good reason not to
>
>
> --
> To view, visit http://gerrit.cloudera.org:8080/7983
> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
>
> Gerrit-MessageType: comment
> Gerrit-Change-Id: I4b67cd7762f682c3a054e0d9641080aa51801c83
> Gerrit-PatchSet: 2
> Gerrit-Project: Impala-ASF
> Gerrit-Branch: master
> Gerrit-Owner: John Russell 
> Gerrit-Reviewer: John Russell 
> Gerrit-Reviewer: Matthew Jacobs 
> Gerrit-Reviewer: Michael Ho
> Gerrit-Reviewer: Taras Bobrovytsky 
> Gerrit-Reviewer: Tim Armstrong 
> Gerrit-HasComments: Yes
>


[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors

2017-09-12 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#9).

Change subject: IMPALA-5890: Abort queries if scanner hits IO errors
..

IMPALA-5890: Abort queries if scanner hits IO errors

Prior to this fix, an error in ScannerContext::Stream::GetNextBuffer()
could leave the stream in an inconsistent state:

- The DiskIoMgr hits EOF unexpected, cancels the scan range and enqueues
a buffer with eosr set.
- The ScannerContext::Stream tries to read more bytes, but since it has
hit eosr, it tries to read beyond the end of the scan range using
DiskIoMgr::Read().
- The previous read error resulted in a new file handle being opened.
The now truncated, smaller file causes the seek to fail.
- Then during error handling, the BaseSequenceScanner calls SkipToSync()
and trips over the NULL pointer in in the IO buffer.

In my reproduction this only happens with the file handle cache enabled,
which causes Impala to see two different sized handles: the one from the
cache when the query starts, and the one after reopening the file.

To fix this, we change the I/O manager to always return DISK_IO_ERROR
for errors and we abort a query if we receive such an error in the
scanner.

This change also fixes GetBytesInternal() to maintain the invariant that
the output buffer points to the boundary buffer whenever the latter
contains some data.

I tested this by running the repro from the JIRA and impalad did not
crash but aborted the queries. I also ran the repro with
abort_on_error=1, and with the file handle cache disabled.

Text files are not affected by this problem, since the
text scanner doesn't try to recover from errors during ProcessRange()
but wraps it in RETURN_IF_ERROR instead. With this change queries abort
with the same error.

Parquet files are also not affected since they have the metadata at the
end. Truncated files immediately fail with this error:
WARNINGS: File 
'hdfs://localhost:20500/test-warehouse/tpch.partsupp_parquet/foo.0.parq'
has an invalid version number: 

Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509
---
M be/src/common/status.h
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/disk-io-mgr-scan-range.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/runtime-state.cc
M common/thrift/generate_error_codes.py
8 files changed, 100 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8011/9
-- 
To view, visit http://gerrit.cloudera.org:8080/8011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors

2017-09-12 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5890: Abort queries if scanner hits IO errors
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8011/8/be/src/exec/base-sequence-scanner.cc
File be/src/exec/base-sequence-scanner.cc:

Line 180: 
state_->LogError(ErrorMsg(TErrorCode::SEQUENCE_SCANNER_PARSE_ERROR,
> Unfortunately it looks like when abort_on_error=1 we depend on this status 
Done. I opened IMPALA-5922 to track this and left a TODO in the code.


-- 
To view, visit http://gerrit.cloudera.org:8080/8011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors

2017-09-12 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5890: Abort queries if scanner hits IO errors
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8011/8/be/src/exec/base-sequence-scanner.cc
File be/src/exec/base-sequence-scanner.cc:

Line 180: 
state_->LogError(ErrorMsg(TErrorCode::SEQUENCE_SCANNER_PARSE_ERROR,
> Unfortunately it looks like when abort_on_error=1 we depend on this status 
Hm, that's unfortunate. I wonder how many of these are already reporting 
filename/offset, but yeah, until we fix them all we'll have to keep the old 
structure. Let's be sure to continue to chip away at these kind of things over 
time, though. And let's leave a TODO explaining where we'd like to get to (use 
LogOrReturnError()).


-- 
To view, visit http://gerrit.cloudera.org:8080/8011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Bump Kudu version to 3f49724

2017-09-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: Bump Kudu version to 3f49724
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1214/

-- 
To view, visit http://gerrit.cloudera.org:8080/8040
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I21aff562e2bca90436a8d0206ffca44b712bc1f1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala

2017-09-12 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7954/2/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

PS2, Line 81: ptime
Have you considered catching the error when the invalid ptime object is 
created? What is the benefit of doing it here?


-- 
To view, visit http://gerrit.cloudera.org:8080/7954
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5736: Add impala-shell argument to set default query options

2017-09-12 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5736: Add impala-shell argument to set default query 
options
..


Patch Set 1:

(1 comment)

Looks good to me, only a minor comment.

http://gerrit.cloudera.org:8080/#/c/8038/1/shell/impala_shell.py
File shell/impala_shell.py:

PS1, Line 1308: keyvals
Can you inline this?


-- 
To view, visit http://gerrit.cloudera.org:8080/8038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I26a3b67230c80a99bd246b6af205d558fec9a986
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder

2017-09-12 Thread Pranay Singh (Code Review)
Pranay Singh has uploaded a new patch set (#2).

Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
..

IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder

Currently DictDecoder class and DictEncoder class uses std::vector
to store the tables mapping codeword to value and vice-versa. It is
hard to detect the memory usage by these tables when they becomes
very large, since this memory is not accounted by Impala's memory
mangement infrastructure.

This patch uses existing dictionary_pool_ to track the memory used
by std::vector in DictDecoder class, whereas an existing memory pool
for DictEncoder class is used when allocating memory for std::vector
in DictEncoder class.

Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299
---
M be/src/exec/parquet-column-readers.cc
M be/src/util/dict-encoding.h
M be/src/util/dict-test.cc
3 files changed, 42 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/8034/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Pranay Singh


[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder

2017-09-12 Thread Pranay Singh (Code Review)
Pranay Singh has posted comments on this change.

Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8034/1/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

PS1, Line 215:  MemTracker* decoder_mem_tracker() { return 
decoder_mem_tracker_.get(); }
> The dictionary is only used for Parquet, so I think the MemPool should not 
I also think using existing memory pool, dictionary_pool_  is a better option, 
I missed it. I'll use dictionary_pool_ instead


http://gerrit.cloudera.org:8080/#/c/8034/1/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

PS1, Line 213: dict_decoder_(new 
MemPool(parent->scan_node_->decoder_mem_tracker())),
> It is important not to create objects on a per-column basis unless truly ne
I'll use dictionary_pool_ instead of creating a new memory pool for every 
column reads


http://gerrit.cloudera.org:8080/#/c/8034/1/be/src/util/dict-encoding.h
File be/src/util/dict-encoding.h:

PS1, Line 234: *val_ptr = *dict_[index];
> dict_ needs to return T's directly rather than T*'s. This is a performance 
I'll use memcpy to copy the contents of dictionary to the buffer passed.


-- 
To view, visit http://gerrit.cloudera.org:8080/8034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Pranay Singh
Gerrit-HasComments: Yes


[Impala-ASF-CR] [DOCS] Explain Boost setting needed for 96-bit timestamps

2017-09-12 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: [DOCS] Explain Boost setting needed for 96-bit timestamps
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7983/2/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

PS2, Line 848: our
who is 'our' here?


PS2, Line 850: ,
 : which matches the representation for 
TIMESTAMP in Impala
, and is required in order to use TIMESTAMP.


Line 855: 
it might be good to just say that you should probably just define this variable 
always, unless you have a really good reason not to


-- 
To view, visit http://gerrit.cloudera.org:8080/7983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4b67cd7762f682c3a054e0d9641080aa51801c83
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Bump Kudu version to 3f49724

2017-09-12 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: Bump Kudu version to 3f49724
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/8040
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I21aff562e2bca90436a8d0206ffca44b712bc1f1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder

2017-09-12 Thread Pranay Singh (Code Review)
Pranay Singh has uploaded a new patch set (#2).

Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
..

IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder

Currently DictDecoder class and DictEncoder class uses std::vector
to store the tables mapping codeword to value and vice-versa. It is
hard to detect the memory usage by these tables when they becomes
very large, since this memory is not accounted by Impala's memory
mangement infrastructure.

This patch uses existing dictionary_pool_ to track the memory used
by std::vector in DictDecoder class, whereas an existing memory pool
for DictEncoder class is used when allocating memory for std::vector
in DictEncoder class.

Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299
---
M be/src/exec/parquet-column-readers.cc
M be/src/util/dict-encoding.h
M be/src/util/dict-test.cc
3 files changed, 42 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/8034/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Joe McDonnell 


[Impala-ASF-CR] Bump Kudu version to 3f49724

2017-09-12 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/8040

Change subject: Bump Kudu version to 3f49724
..

Bump Kudu version to 3f49724

Change-Id: I21aff562e2bca90436a8d0206ffca44b712bc1f1
---
M bin/impala-config.sh
1 file changed, 2 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/8040/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8040
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I21aff562e2bca90436a8d0206ffca44b712bc1f1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors

2017-09-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5890: Abort queries if scanner hits IO errors
..


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8011/8/be/src/exec/base-sequence-scanner.cc
File be/src/exec/base-sequence-scanner.cc:

Line 180: 
state_->LogError(ErrorMsg(TErrorCode::SEQUENCE_SCANNER_PARSE_ERROR,
Unfortunately it looks like when abort_on_error=1 we depend on this status 
wrapping to add a filename and offset for some parse errors, e.g. "Table schema 
is not a record". I think the filename is pretty essential to understand and 
fix any errors so I think we should be careful not to drop it.

Ideally I think all parse errors from the scanners would just include that 
context when original constructed. Not sure how much of a project it would be 
to go through all the sequence scanners and fix that. It's much cleaner using 
LogOrReturnError().

Maybe in the meantime we should just do:

if (!status.IsCancelled() && !status.IsMemLimitExceeded() && 
!status.IsDiskIoError()) {
state_->LogError(ErrorMsg(TErrorCode::SEQUENCE_SCANNER_PARSE_ERROR, ...)
  }


http://gerrit.cloudera.org:8080/#/c/8011/6/be/src/runtime/disk-io-mgr.cc
File be/src/runtime/disk-io-mgr.cc:

Line 544: return Status(TErrorCode::DISK_IO_ERROR,
> Should we re-use DISK_IO_ERROR for those, even though they're technically n
I think it makes sense to use DISK_IO_ERROR for now if we interpret its meaning 
broadly as "an error that prevented the disk I/O manager completing a request 
range".


-- 
To view, visit http://gerrit.cloudera.org:8080/8011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5856: Fix outer join predicate assignment.

2017-09-12 Thread Alex Behm (Code Review)
Alex Behm has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/8039

Change subject: IMPALA-5856: Fix outer join predicate assignment.
..

IMPALA-5856: Fix outer join predicate assignment.

Fixes incorrect assignment of join predicates with
the following properties:
- from the On-clause of a left outer join
- only references the left-hand side tuples (not the
  right hand side tuple)
- references full-outer joined tuples; the full outer
  join appears on the left

Testing:
- a core/hdfs run passed
- added new regression test

Change-Id: I93db34d988cb66e00aa05d7dc161e0ca47042acb
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
2 files changed, 43 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/8039/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I93db34d988cb66e00aa05d7dc161e0ca47042acb
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 


[Impala-ASF-CR] IMPALA-5736: Add impala-shell argument to set default query options

2017-09-12 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/8038

Change subject: IMPALA-5736: Add impala-shell argument to set default query 
options
..

IMPALA-5736: Add impala-shell argument to set default query options

Query options can be set from command line and impala rc as comma
separated key=value pairs.

Examples:
command line:
impala-shell.sh --query_options=EXPLAIN_LEVEL=2,MT_DOP=2

.impalarc:
[impala]
query_options=EXPLAIN_LEVEL=2,MT_DOP=2

Note that the option set in command line will overwrite the one
in impalarc instead of updating it as a set. For example, if
impalarc contains "query_options=EXPLAIN_LEVEL=2", and the shell
command contains --query_options="MT_DOP=2", only MT_DOP will be set,
and EXPLAIN_LEVEL will use its default value.

Change-Id: I26a3b67230c80a99bd246b6af205d558fec9a986
---
M shell/impala_shell.py
M shell/impala_shell_config_defaults.py
M shell/option_parser.py
3 files changed, 13 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/8038/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I26a3b67230c80a99bd246b6af205d558fec9a986
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Csaba Ringhofer 


[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors

2017-09-12 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#8).

Change subject: IMPALA-5890: Abort queries if scanner hits IO errors
..

IMPALA-5890: Abort queries if scanner hits IO errors

Prior to this fix, an error in ScannerContext::Stream::GetNextBuffer()
could leave the stream in an inconsistent state:

- The DiskIoMgr hits EOF unexpected, cancels the scan range and enqueues
a buffer with eosr set.
- The ScannerContext::Stream tries to read more bytes, but since it has
hit eosr, it tries to read beyond the end of the scan range using
DiskIoMgr::Read().
- The previous read error resulted in a new file handle being opened.
The now truncated, smaller file causes the seek to fail.
- Then during error handling, the BaseSequenceScanner calls SkipToSync()
and trips over the NULL pointer in in the IO buffer.

In my reproduction this only happens with the file handle cache enabled,
which causes Impala to see two different sized handles: the one from the
cache when the query starts, and the one after reopening the file.

To fix this, we change the I/O manager to always return DISK_IO_ERROR
for errors and we abort a query if we receive such an error in the
scanner.

This change also fixes GetBytesInternal() to maintain the invariant that
the output buffer points to the boundary buffer whenever the latter
contains some data.

I tested this by running the repro from the JIRA and impalad did not
crash but aborted the queries. I also ran the repro with
abort_on_error=1, and with the file handle cache disabled.

Text files are not affected by this problem, since the
text scanner doesn't try to recover from errors during ProcessRange()
but wraps it in RETURN_IF_ERROR instead. With this change queries abort
with the same error.

Parquet files are also not affected since they have the metadata at the
end. Truncated files immediately fail with this error:
WARNINGS: File 
'hdfs://localhost:20500/test-warehouse/tpch.partsupp_parquet/foo.0.parq'
has an invalid version number: 

Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/disk-io-mgr-scan-range.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/runtime-state.cc
M common/thrift/generate_error_codes.py
7 files changed, 86 insertions(+), 48 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8011/8
-- 
To view, visit http://gerrit.cloudera.org:8080/8011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4082: Remove todo item in getRegionsInRange

2017-09-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4082: Remove todo item in getRegionsInRange
..


IMPALA-4082: Remove todo item in getRegionsInRange

HbaseTable.GetRegionsInRange is a function copied and modified from
HTable in Hbase 0.95.1 or ealier. The HTable function uses cached region
location and the modified version gets more up-to-date information.
There is a todo item for the removal of this modified function if
Hbase provides the same functionality itself. In Hbase 0.95.2, the
HTable function is renamed to getKeysAndRegionsInRange and in Hbase
0.99 it became a private function. Thus this todo item is no longer
needed and is removed by this patch.

Change-Id: I35f0a3cd9363b55b3cde237beaf3037a7967582a
Reviewed-on: http://gerrit.cloudera.org:8080/8018
Reviewed-by: Lars Volker 
Reviewed-by: Alex Behm 
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
1 file changed, 8 insertions(+), 8 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Lars Volker: Looks good to me, but someone else must approve
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/8018
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I35f0a3cd9363b55b3cde237beaf3037a7967582a
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-4082: Remove todo item in getRegionsInRange

2017-09-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4082: Remove todo item in getRegionsInRange
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/8018
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I35f0a3cd9363b55b3cde237beaf3037a7967582a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5894: [DOCS] Clarify placement of STRAIGHT JOIN hint

2017-09-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5894: [DOCS] Clarify placement of STRAIGHT_JOIN hint
..


IMPALA-5894: [DOCS] Clarify placement of STRAIGHT_JOIN hint

"immediately after the SELECT keyword" was mentioned in a
few places for STRAIGHT_JOIN. I reworded all instances to
mention that [DISTINCT | ALL] can also come before the
hint name.

Change-Id: I3cac1afccc132f389b2017ad217fdf7e7b04513a
Reviewed-on: http://gerrit.cloudera.org:8080/8031
Reviewed-by: Alex Behm 
Tested-by: Impala Public Jenkins
---
M docs/shared/impala_common.xml
M docs/topics/impala_hints.xml
M docs/topics/impala_perf_joins.xml
3 files changed, 12 insertions(+), 5 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/8031
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I3cac1afccc132f389b2017ad217fdf7e7b04513a
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins


[Impala-ASF-CR] IMPALA-5894: [DOCS] Clarify placement of STRAIGHT JOIN hint

2017-09-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5894: [DOCS] Clarify placement of STRAIGHT_JOIN hint
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/8031
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cac1afccc132f389b2017ad217fdf7e7b04513a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] Explain Boost setting needed for 96-bit timestamps

2017-09-12 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: [DOCS] Explain Boost setting needed for 96-bit timestamps
..


Patch Set 2:

Tim and Michael, I added you as reviewers based on your frequent activity in 
be/src/exprs/. If someone else is more familiar with the compilation aspect 
w.r.t. TIMESTAMP, please suggest any other reviewer.

-- 
To view, visit http://gerrit.cloudera.org:8080/7983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4b67cd7762f682c3a054e0d9641080aa51801c83
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5894: [DOCS] Clarify placement of STRAIGHT JOIN hint

2017-09-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5894: [DOCS] Clarify placement of STRAIGHT_JOIN hint
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-docs-submit/153/

-- 
To view, visit http://gerrit.cloudera.org:8080/8031
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cac1afccc132f389b2017ad217fdf7e7b04513a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No