[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings

2018-11-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/11814 )

Change subject: IMPALA-7586: fix predicate pushdown of escaped strings
..

IMPALA-7586: fix predicate pushdown of escaped strings

This fixes a class of bugs where the planner incorrectly uses the raw
string from the parser instead of the unescaped string. This occurs in
several places that push predicates down to the storage layer:
* Kudu scans
* HBase scans
* Data source scans

There are some more complex issues with escapes and the LIKE predicate
that are tracked separately by IMPALA-2422.

This also uncovered a different issue with RCFiles that is tracked by
IMPALA-7778 and is worked around by the tests added.

In order to make bugs like this more obvious in future, I renamed
getValue() to getValueWithOriginalEscapes().

Testing:
Added regression test that tests handling of backslash escapes on all
file formats. I did not add a regression test for the data source bug
since it seems to require some major modification of the data source
test infrastructure.

Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
Reviewed-on: http://gerrit.cloudera.org:8080/11814
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins 
---
M fe/src/main/java/org/apache/impala/analysis/AdminFnStmt.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/ExtractFromExpr.java
M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M testdata/data/README
A testdata/data/strings_with_quotes.csv
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A 
testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test
A testdata/workloads/functional-query/queries/QueryTest/string-escaping.test
M tests/query_test/test_scanners.py
15 files changed, 199 insertions(+), 14 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/11814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
Gerrit-Change-Number: 11814
Gerrit-PatchSet: 9
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings

2018-11-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11814 )

Change subject: IMPALA-7586: fix predicate pushdown of escaped strings
..


Patch Set 8: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/11814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
Gerrit-Change-Number: 11814
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 01 Nov 2018 21:27:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings

2018-11-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11814 )

Change subject: IMPALA-7586: fix predicate pushdown of escaped strings
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1250/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
Gerrit-Change-Number: 11814
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 01 Nov 2018 18:00:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings

2018-11-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11814 )

Change subject: IMPALA-7586: fix predicate pushdown of escaped strings
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3403/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/11814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
Gerrit-Change-Number: 11814
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 01 Nov 2018 17:26:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings

2018-11-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11814 )

Change subject: IMPALA-7586: fix predicate pushdown of escaped strings
..


Patch Set 8:

Rebase onto https://gerrit.cloudera.org/#/c/11800/ required updating test files 
to use new quoting.


--
To view, visit http://gerrit.cloudera.org:8080/11814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
Gerrit-Change-Number: 11814
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 01 Nov 2018 17:25:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings

2018-11-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11814 )

Change subject: IMPALA-7586: fix predicate pushdown of escaped strings
..


Patch Set 8: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/11814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
Gerrit-Change-Number: 11814
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 01 Nov 2018 17:25:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings

2018-11-01 Thread Tim Armstrong (Code Review)
Hello Bharath Vissapragada, Csaba Ringhofer, Impala Public Jenkins, Vuk 
Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11814

to look at the new patch set (#8).

Change subject: IMPALA-7586: fix predicate pushdown of escaped strings
..

IMPALA-7586: fix predicate pushdown of escaped strings

This fixes a class of bugs where the planner incorrectly uses the raw
string from the parser instead of the unescaped string. This occurs in
several places that push predicates down to the storage layer:
* Kudu scans
* HBase scans
* Data source scans

There are some more complex issues with escapes and the LIKE predicate
that are tracked separately by IMPALA-2422.

This also uncovered a different issue with RCFiles that is tracked by
IMPALA-7778 and is worked around by the tests added.

In order to make bugs like this more obvious in future, I renamed
getValue() to getValueWithOriginalEscapes().

Testing:
Added regression test that tests handling of backslash escapes on all
file formats. I did not add a regression test for the data source bug
since it seems to require some major modification of the data source
test infrastructure.

Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
---
M fe/src/main/java/org/apache/impala/analysis/AdminFnStmt.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/ExtractFromExpr.java
M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M testdata/data/README
A testdata/data/strings_with_quotes.csv
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A 
testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test
A testdata/workloads/functional-query/queries/QueryTest/string-escaping.test
M tests/query_test/test_scanners.py
15 files changed, 199 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/11814/8
--
To view, visit http://gerrit.cloudera.org:8080/11814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
Gerrit-Change-Number: 11814
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings

2018-10-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11814 )

Change subject: IMPALA-7586: fix predicate pushdown of escaped strings
..


Patch Set 7: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/3393/


--
To view, visit http://gerrit.cloudera.org:8080/11814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
Gerrit-Change-Number: 11814
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 01 Nov 2018 02:16:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings

2018-10-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11814 )

Change subject: IMPALA-7586: fix predicate pushdown of escaped strings
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3393/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/11814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
Gerrit-Change-Number: 11814
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 31 Oct 2018 22:14:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings

2018-10-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11814 )

Change subject: IMPALA-7586: fix predicate pushdown of escaped strings
..


Patch Set 7: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/11814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
Gerrit-Change-Number: 11814
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 31 Oct 2018 22:14:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings

2018-10-31 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11814 )

Change subject: IMPALA-7586: fix predicate pushdown of escaped strings
..


Patch Set 6: Code-Review+2

Fix looks simple. +2ing myself to unblock it.


--
To view, visit http://gerrit.cloudera.org:8080/11814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
Gerrit-Change-Number: 11814
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 31 Oct 2018 22:01:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings

2018-10-30 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11814 )

Change subject: IMPALA-7586: fix predicate pushdown of escaped strings
..


Patch Set 6: Code-Review+1

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11814/5/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java:

http://gerrit.cloudera.org:8080/#/c/11814/5/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@319
PS5, Line 319: getExplainString
> The behaviour here seems ok to me - it calls toSql(), which consistently es
The behavior overall seems ok to me as well. But in this case, it diverged from 
the original conjunct that was used to push down.


http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/data/strings_with_quotes.csv
File testdata/data/strings_with_quotes.csv:

http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/data/strings_with_quotes.csv@11
PS5, Line 11: foo\\"bar,11
> Done. I assume you meant that you wanted the value in the table in include
Yep.


http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test
File 
testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test:

http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test@4
PS5, Line 4: 
> I think it's better to exercise and track the known-bad behaviour than leav
Agreed.



--
To view, visit http://gerrit.cloudera.org:8080/11814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
Gerrit-Change-Number: 11814
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 30 Oct 2018 21:24:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings

2018-10-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11814 )

Change subject: IMPALA-7586: fix predicate pushdown of escaped strings
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1214/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
Gerrit-Change-Number: 11814
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 30 Oct 2018 20:02:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings

2018-10-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11814 )

Change subject: IMPALA-7586: fix predicate pushdown of escaped strings
..


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/11814/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11814/5//COMMIT_MSG@9
PS5, Line 9: This fixes a class of bugs where the planner incorrectly uses the 
raw
   : string from the parser instead of the unescaped string.
> fyi, I have found a similar issue in partition pruning, see IMPALA-7784.
ack


http://gerrit.cloudera.org:8080/#/c/11814/5//COMMIT_MSG@26
PS5, Line 26: Added regression test that tests handling of backslash escapes on 
all file
: formats. I did not add a regression test for the data source bug 
since it
> nit: long lines, please wrap at 72
Done


http://gerrit.cloudera.org:8080/#/c/11814/5/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java:

http://gerrit.cloudera.org:8080/#/c/11814/5/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@319
PS5, Line 319: getExplainString
> It looks like this always prints the normalized/unescaped string even thoug
The behaviour here seems ok to me - it calls toSql(), which consistently 
escapes things. If we didn't escape things, then the expressions in the explain 
output might not be valid expressions in our SQL expressions, which seems 
inconvenient.


http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/data/strings_with_quotes.csv
File testdata/data/strings_with_quotes.csv:

http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/data/strings_with_quotes.csv@11
PS5, Line 11:
> add foo\"bar, 11 too? That shouldn't be returned with ..where s = "foo\"bar
Done. I assume you meant that you wanted the value in the table in include the 
backslash, which requires a double backslash here.


http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test
File 
testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test:

http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test@1
PS5, Line 1: 
> Can you add an extra comment to make it even clearer that this file contain
Done


http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test@4
PS5, Line 4: # IMPALA-7778: escapes are ignored so output is incorrect
> Can't we just xfail if format == 'rc' instead of doing this? or is it becau
I think it's better to exercise and track the known-bad behaviour than leave it 
unexercised.

Also, honestly, I don't know if there is enough usage of RC at this point that 
anyone will be motivated to fix it.



--
To view, visit http://gerrit.cloudera.org:8080/11814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
Gerrit-Change-Number: 11814
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 30 Oct 2018 19:28:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings

2018-10-30 Thread Tim Armstrong (Code Review)
Hello Bharath Vissapragada, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11814

to look at the new patch set (#6).

Change subject: IMPALA-7586: fix predicate pushdown of escaped strings
..

IMPALA-7586: fix predicate pushdown of escaped strings

This fixes a class of bugs where the planner incorrectly uses the raw
string from the parser instead of the unescaped string. This occurs in
several places that push predicates down to the storage layer:
* Kudu scans
* HBase scans
* Data source scans

There are some more complex issues with escapes and the LIKE predicate
that are tracked separately by IMPALA-2422.

This also uncovered a different issue with RCFiles that is tracked by
IMPALA-7778 and is worked around by the tests added.

In order to make bugs like this more obvious in future, I renamed
getValue() to getValueWithOriginalEscapes().

Testing:
Added regression test that tests handling of backslash escapes on all
file formats. I did not add a regression test for the data source bug
since it seems to require some major modification of the data source
test infrastructure.

Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
---
M fe/src/main/java/org/apache/impala/analysis/AdminFnStmt.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/ExtractFromExpr.java
M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M testdata/data/README
A testdata/data/strings_with_quotes.csv
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A 
testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test
A testdata/workloads/functional-query/queries/QueryTest/string-escaping.test
M tests/query_test/test_scanners.py
15 files changed, 199 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/11814/6
--
To view, visit http://gerrit.cloudera.org:8080/11814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
Gerrit-Change-Number: 11814
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings

2018-10-30 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11814 )

Change subject: IMPALA-7586: fix predicate pushdown of escaped strings
..


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11814/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11814/5//COMMIT_MSG@9
PS5, Line 9: This fixes a class of bugs where the planner incorrectly uses the 
raw
   : string from the parser instead of the unescaped string.
fyi, I have found a similar issue in partition pruning, see IMPALA-7784.


http://gerrit.cloudera.org:8080/#/c/11814/5//COMMIT_MSG@26
PS5, Line 26: Added regression test that tests handling of backslash escapes on 
all file
: formats. I did not add a regression test for the data source bug 
since it
nit: long lines, please wrap at 72


http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test
File 
testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test:

http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test@1
PS5, Line 1: 
Can you add an extra comment to make it even clearer that this file contains 
buggy results that we plan to fix later?



--
To view, visit http://gerrit.cloudera.org:8080/11814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
Gerrit-Change-Number: 11814
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 30 Oct 2018 17:55:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings

2018-10-30 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11814 )

Change subject: IMPALA-7586: fix predicate pushdown of escaped strings
..


Patch Set 5: Code-Review+1

(3 comments)

A bunch of minor comments. Fix lgtm.

http://gerrit.cloudera.org:8080/#/c/11814/5/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java:

http://gerrit.cloudera.org:8080/#/c/11814/5/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@319
PS5, Line 319: getExplainString
It looks like this always prints the normalized/unescaped string even though we 
used the original string for predicates in the scan nodes.  We could've 
probably diagnosed the issue faster if this was right.


http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/data/strings_with_quotes.csv
File testdata/data/strings_with_quotes.csv:

http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/data/strings_with_quotes.csv@11
PS5, Line 11:
add foo\"bar, 11 too? That shouldn't be returned with ..where s = "foo\"bar"..


http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test
File 
testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test:

http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test@4
PS5, Line 4: # IMPALA-7778: escapes are ignored so output is incorrect
Can't we just xfail if format == 'rc' instead of doing this? or is it because 
if someone fixes it in the future, this test starts failing and they know it 
right away?



--
To view, visit http://gerrit.cloudera.org:8080/11814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
Gerrit-Change-Number: 11814
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 30 Oct 2018 17:38:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings

2018-10-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11814 )

Change subject: IMPALA-7586: fix predicate pushdown of escaped strings
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1211/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
Gerrit-Change-Number: 11814
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 30 Oct 2018 16:52:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings

2018-10-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11814 )

Change subject: IMPALA-7586: fix predicate pushdown of escaped strings
..


Patch Set 5:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/1210/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/11814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
Gerrit-Change-Number: 11814
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 30 Oct 2018 16:01:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings

2018-10-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/11814 )

Change subject: IMPALA-7586: fix predicate pushdown of escaped strings
..

IMPALA-7586: fix predicate pushdown of escaped strings

This fixes a class of bugs where the planner incorrectly uses the raw
string from the parser instead of the unescaped string. This occurs in
several places that push predicates down to the storage layer:
* Kudu scans
* HBase scans
* Data source scans

There are some more complex issues with escapes and the LIKE predicate
that are tracked separately by IMPALA-2422.

This also uncovered a different issue with RCFiles that is tracked by
IMPALA-7778 and is worked around by the tests added.

In order to make bugs like this more obvious in future, I renamed
getValue() to getValueWithOriginalEscapes().

Testing:
Added regression test that tests handling of backslash escapes on all file
formats. I did not add a regression test for the data source bug since it
seems to require some major modification of the data source test
infrastructure.

Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
---
M fe/src/main/java/org/apache/impala/analysis/AdminFnStmt.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/ExtractFromExpr.java
M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M testdata/data/README
A testdata/data/strings_with_quotes.csv
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A 
testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test
A testdata/workloads/functional-query/queries/QueryTest/string-escaping.test
M tests/query_test/test_scanners.py
15 files changed, 193 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/11814/5
--
To view, visit http://gerrit.cloudera.org:8080/11814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
Gerrit-Change-Number: 11814
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong