[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11814 ) Change subject: IMPALA-7586: fix predicate pushdown of escaped strings .. IMPALA-7586: fix predicate pushdown of escaped strings This fixes a class of bugs where the planner incorrectly uses the raw string from the parser instead of the unescaped string. This occurs in several places that push predicates down to the storage layer: * Kudu scans * HBase scans * Data source scans There are some more complex issues with escapes and the LIKE predicate that are tracked separately by IMPALA-2422. This also uncovered a different issue with RCFiles that is tracked by IMPALA-7778 and is worked around by the tests added. In order to make bugs like this more obvious in future, I renamed getValue() to getValueWithOriginalEscapes(). Testing: Added regression test that tests handling of backslash escapes on all file formats. I did not add a regression test for the data source bug since it seems to require some major modification of the data source test infrastructure. Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28 Reviewed-on: http://gerrit.cloudera.org:8080/11814 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/analysis/AdminFnStmt.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/ExtractFromExpr.java M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M testdata/data/README A testdata/data/strings_with_quotes.csv M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test A testdata/workloads/functional-query/queries/QueryTest/string-escaping.test M tests/query_test/test_scanners.py 15 files changed, 199 insertions(+), 14 deletions(-) Approvals: Tim Armstrong: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/11814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28 Gerrit-Change-Number: 11814 Gerrit-PatchSet: 9 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11814 ) Change subject: IMPALA-7586: fix predicate pushdown of escaped strings .. Patch Set 8: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/11814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28 Gerrit-Change-Number: 11814 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 01 Nov 2018 21:27:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11814 ) Change subject: IMPALA-7586: fix predicate pushdown of escaped strings .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1250/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28 Gerrit-Change-Number: 11814 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 01 Nov 2018 18:00:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11814 ) Change subject: IMPALA-7586: fix predicate pushdown of escaped strings .. Patch Set 8: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3403/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/11814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28 Gerrit-Change-Number: 11814 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 01 Nov 2018 17:26:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11814 ) Change subject: IMPALA-7586: fix predicate pushdown of escaped strings .. Patch Set 8: Rebase onto https://gerrit.cloudera.org/#/c/11800/ required updating test files to use new quoting. -- To view, visit http://gerrit.cloudera.org:8080/11814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28 Gerrit-Change-Number: 11814 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 01 Nov 2018 17:25:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11814 ) Change subject: IMPALA-7586: fix predicate pushdown of escaped strings .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28 Gerrit-Change-Number: 11814 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 01 Nov 2018 17:25:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings
Hello Bharath Vissapragada, Csaba Ringhofer, Impala Public Jenkins, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11814 to look at the new patch set (#8). Change subject: IMPALA-7586: fix predicate pushdown of escaped strings .. IMPALA-7586: fix predicate pushdown of escaped strings This fixes a class of bugs where the planner incorrectly uses the raw string from the parser instead of the unescaped string. This occurs in several places that push predicates down to the storage layer: * Kudu scans * HBase scans * Data source scans There are some more complex issues with escapes and the LIKE predicate that are tracked separately by IMPALA-2422. This also uncovered a different issue with RCFiles that is tracked by IMPALA-7778 and is worked around by the tests added. In order to make bugs like this more obvious in future, I renamed getValue() to getValueWithOriginalEscapes(). Testing: Added regression test that tests handling of backslash escapes on all file formats. I did not add a regression test for the data source bug since it seems to require some major modification of the data source test infrastructure. Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28 --- M fe/src/main/java/org/apache/impala/analysis/AdminFnStmt.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/ExtractFromExpr.java M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M testdata/data/README A testdata/data/strings_with_quotes.csv M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test A testdata/workloads/functional-query/queries/QueryTest/string-escaping.test M tests/query_test/test_scanners.py 15 files changed, 199 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/11814/8 -- To view, visit http://gerrit.cloudera.org:8080/11814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28 Gerrit-Change-Number: 11814 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11814 ) Change subject: IMPALA-7586: fix predicate pushdown of escaped strings .. Patch Set 7: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/3393/ -- To view, visit http://gerrit.cloudera.org:8080/11814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28 Gerrit-Change-Number: 11814 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 01 Nov 2018 02:16:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11814 ) Change subject: IMPALA-7586: fix predicate pushdown of escaped strings .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3393/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/11814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28 Gerrit-Change-Number: 11814 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 31 Oct 2018 22:14:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11814 ) Change subject: IMPALA-7586: fix predicate pushdown of escaped strings .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28 Gerrit-Change-Number: 11814 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 31 Oct 2018 22:14:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11814 ) Change subject: IMPALA-7586: fix predicate pushdown of escaped strings .. Patch Set 6: Code-Review+2 Fix looks simple. +2ing myself to unblock it. -- To view, visit http://gerrit.cloudera.org:8080/11814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28 Gerrit-Change-Number: 11814 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 31 Oct 2018 22:01:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11814 ) Change subject: IMPALA-7586: fix predicate pushdown of escaped strings .. Patch Set 6: Code-Review+1 (3 comments) http://gerrit.cloudera.org:8080/#/c/11814/5/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java: http://gerrit.cloudera.org:8080/#/c/11814/5/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@319 PS5, Line 319: getExplainString > The behaviour here seems ok to me - it calls toSql(), which consistently es The behavior overall seems ok to me as well. But in this case, it diverged from the original conjunct that was used to push down. http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/data/strings_with_quotes.csv File testdata/data/strings_with_quotes.csv: http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/data/strings_with_quotes.csv@11 PS5, Line 11: foo\\"bar,11 > Done. I assume you meant that you wanted the value in the table in include Yep. http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test File testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test: http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test@4 PS5, Line 4: > I think it's better to exercise and track the known-bad behaviour than leav Agreed. -- To view, visit http://gerrit.cloudera.org:8080/11814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28 Gerrit-Change-Number: 11814 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 30 Oct 2018 21:24:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11814 ) Change subject: IMPALA-7586: fix predicate pushdown of escaped strings .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1214/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28 Gerrit-Change-Number: 11814 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 30 Oct 2018 20:02:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11814 ) Change subject: IMPALA-7586: fix predicate pushdown of escaped strings .. Patch Set 5: (6 comments) http://gerrit.cloudera.org:8080/#/c/11814/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11814/5//COMMIT_MSG@9 PS5, Line 9: This fixes a class of bugs where the planner incorrectly uses the raw : string from the parser instead of the unescaped string. > fyi, I have found a similar issue in partition pruning, see IMPALA-7784. ack http://gerrit.cloudera.org:8080/#/c/11814/5//COMMIT_MSG@26 PS5, Line 26: Added regression test that tests handling of backslash escapes on all file : formats. I did not add a regression test for the data source bug since it > nit: long lines, please wrap at 72 Done http://gerrit.cloudera.org:8080/#/c/11814/5/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java: http://gerrit.cloudera.org:8080/#/c/11814/5/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@319 PS5, Line 319: getExplainString > It looks like this always prints the normalized/unescaped string even thoug The behaviour here seems ok to me - it calls toSql(), which consistently escapes things. If we didn't escape things, then the expressions in the explain output might not be valid expressions in our SQL expressions, which seems inconvenient. http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/data/strings_with_quotes.csv File testdata/data/strings_with_quotes.csv: http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/data/strings_with_quotes.csv@11 PS5, Line 11: > add foo\"bar, 11 too? That shouldn't be returned with ..where s = "foo\"bar Done. I assume you meant that you wanted the value in the table in include the backslash, which requires a double backslash here. http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test File testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test: http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test@1 PS5, Line 1: > Can you add an extra comment to make it even clearer that this file contain Done http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test@4 PS5, Line 4: # IMPALA-7778: escapes are ignored so output is incorrect > Can't we just xfail if format == 'rc' instead of doing this? or is it becau I think it's better to exercise and track the known-bad behaviour than leave it unexercised. Also, honestly, I don't know if there is enough usage of RC at this point that anyone will be motivated to fix it. -- To view, visit http://gerrit.cloudera.org:8080/11814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28 Gerrit-Change-Number: 11814 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 30 Oct 2018 19:28:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings
Hello Bharath Vissapragada, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11814 to look at the new patch set (#6). Change subject: IMPALA-7586: fix predicate pushdown of escaped strings .. IMPALA-7586: fix predicate pushdown of escaped strings This fixes a class of bugs where the planner incorrectly uses the raw string from the parser instead of the unescaped string. This occurs in several places that push predicates down to the storage layer: * Kudu scans * HBase scans * Data source scans There are some more complex issues with escapes and the LIKE predicate that are tracked separately by IMPALA-2422. This also uncovered a different issue with RCFiles that is tracked by IMPALA-7778 and is worked around by the tests added. In order to make bugs like this more obvious in future, I renamed getValue() to getValueWithOriginalEscapes(). Testing: Added regression test that tests handling of backslash escapes on all file formats. I did not add a regression test for the data source bug since it seems to require some major modification of the data source test infrastructure. Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28 --- M fe/src/main/java/org/apache/impala/analysis/AdminFnStmt.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/ExtractFromExpr.java M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M testdata/data/README A testdata/data/strings_with_quotes.csv M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test A testdata/workloads/functional-query/queries/QueryTest/string-escaping.test M tests/query_test/test_scanners.py 15 files changed, 199 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/11814/6 -- To view, visit http://gerrit.cloudera.org:8080/11814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28 Gerrit-Change-Number: 11814 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/11814 ) Change subject: IMPALA-7586: fix predicate pushdown of escaped strings .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/11814/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11814/5//COMMIT_MSG@9 PS5, Line 9: This fixes a class of bugs where the planner incorrectly uses the raw : string from the parser instead of the unescaped string. fyi, I have found a similar issue in partition pruning, see IMPALA-7784. http://gerrit.cloudera.org:8080/#/c/11814/5//COMMIT_MSG@26 PS5, Line 26: Added regression test that tests handling of backslash escapes on all file : formats. I did not add a regression test for the data source bug since it nit: long lines, please wrap at 72 http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test File testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test: http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test@1 PS5, Line 1: Can you add an extra comment to make it even clearer that this file contains buggy results that we plan to fix later? -- To view, visit http://gerrit.cloudera.org:8080/11814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28 Gerrit-Change-Number: 11814 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 30 Oct 2018 17:55:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11814 ) Change subject: IMPALA-7586: fix predicate pushdown of escaped strings .. Patch Set 5: Code-Review+1 (3 comments) A bunch of minor comments. Fix lgtm. http://gerrit.cloudera.org:8080/#/c/11814/5/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java: http://gerrit.cloudera.org:8080/#/c/11814/5/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@319 PS5, Line 319: getExplainString It looks like this always prints the normalized/unescaped string even though we used the original string for predicates in the scan nodes. We could've probably diagnosed the issue faster if this was right. http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/data/strings_with_quotes.csv File testdata/data/strings_with_quotes.csv: http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/data/strings_with_quotes.csv@11 PS5, Line 11: add foo\"bar, 11 too? That shouldn't be returned with ..where s = "foo\"bar".. http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test File testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test: http://gerrit.cloudera.org:8080/#/c/11814/5/testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test@4 PS5, Line 4: # IMPALA-7778: escapes are ignored so output is incorrect Can't we just xfail if format == 'rc' instead of doing this? or is it because if someone fixes it in the future, this test starts failing and they know it right away? -- To view, visit http://gerrit.cloudera.org:8080/11814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28 Gerrit-Change-Number: 11814 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 30 Oct 2018 17:38:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11814 ) Change subject: IMPALA-7586: fix predicate pushdown of escaped strings .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1211/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28 Gerrit-Change-Number: 11814 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 30 Oct 2018 16:52:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11814 ) Change subject: IMPALA-7586: fix predicate pushdown of escaped strings .. Patch Set 5: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/1210/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/11814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28 Gerrit-Change-Number: 11814 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 30 Oct 2018 16:01:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7586: fix predicate pushdown of escaped strings
Tim Armstrong has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/11814 ) Change subject: IMPALA-7586: fix predicate pushdown of escaped strings .. IMPALA-7586: fix predicate pushdown of escaped strings This fixes a class of bugs where the planner incorrectly uses the raw string from the parser instead of the unescaped string. This occurs in several places that push predicates down to the storage layer: * Kudu scans * HBase scans * Data source scans There are some more complex issues with escapes and the LIKE predicate that are tracked separately by IMPALA-2422. This also uncovered a different issue with RCFiles that is tracked by IMPALA-7778 and is worked around by the tests added. In order to make bugs like this more obvious in future, I renamed getValue() to getValueWithOriginalEscapes(). Testing: Added regression test that tests handling of backslash escapes on all file formats. I did not add a regression test for the data source bug since it seems to require some major modification of the data source test infrastructure. Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28 --- M fe/src/main/java/org/apache/impala/analysis/AdminFnStmt.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/ExtractFromExpr.java M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M testdata/data/README A testdata/data/strings_with_quotes.csv M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-query/queries/QueryTest/string-escaping-rcfile-bug.test A testdata/workloads/functional-query/queries/QueryTest/string-escaping.test M tests/query_test/test_scanners.py 15 files changed, 193 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/11814/5 -- To view, visit http://gerrit.cloudera.org:8080/11814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28 Gerrit-Change-Number: 11814 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong