Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/21825 )
Change subject: IMPALA-889: Add trim() function matching ANSI SQL definition ...................................................................... Patch Set 6: (6 comments) http://gerrit.cloudera.org:8080/#/c/21825/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21825/6//COMMIT_MSG@55 PS6, Line 55: However, despite : that logic in the code, having "TRIM" just as a keyword still results in : syntax error (e.g. in queries like "create table t1(trim int)" ). With the current state of the patch "create table t1(trim int)" would succeed, right? The sentence could be clearer about this, for example "having "TRIM" just as a keyword still result"->"having "TRIM" as a keyword would still result" http://gerrit.cloudera.org:8080/#/c/21825/6/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/21825/6/fe/src/main/cup/sql-parser.cup@3870 PS6, Line 3870: "select sum(count(id))". I don't get it, how does this affect sum(count(id))? All the new causes contain KW_FROM, which should distinguish it from sum/count http://gerrit.cloudera.org:8080/#/c/21825/6/fe/src/main/jflex/sql-scanner.flex File fe/src/main/jflex/sql-scanner.flex: http://gerrit.cloudera.org:8080/#/c/21825/6/fe/src/main/jflex/sql-scanner.flex@421 PS6, Line 421: "leading", "trailing", "both" I still think that it would be better to keep these as reserved and add them as keywords. My problem with removing them from reservedWords is that this will lead to allowing statements like "create table t1(trailing int);". This is currently not allowed in Impala and Hive also doesn't allow it. If we allow it with the patch, that means that users can write queries that contain trealing/leading/both without backticks, and later disallowing this would break these queries. An example why we would disallow this in the future is moving to the calcite based planner, which probably use similar parsing rules as Hive. So I prefer to be as restrictive as possible (without breaking existing queries of course). http://gerrit.cloudera.org:8080/#/c/21825/6/testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test File testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test: http://gerrit.cloudera.org:8080/#/c/21825/6/testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test@398 PS6, Line 398: ---- QUERY I would prefer to reorganize the tests a bit: For queries that are semantically equivalent to existing trim/rtrim/ltrim tests in this file, I would put the new test next to the original, e.g. after select id, name, rtrim(name, '?????') from utf8_str_tiny; I would put select id, name, trim(trailing name, '?????') from utf8_str_tiny; For other tests I don't think that the utf8 handling is important, so for example select id, name, trim(left('??????‘’“”????????',5) from name) from utf8_str_tiny; doesn't seem to trim anything from the dataset. A simple test like select id, name, trim(left('??????‘’“”????????',5) from "!a!"; seems more useful. Also, tests where utf8 handling is not relevant could be moved to other files, for example expr-test.cc or exprs.test. Currently most trim tests are in expr-test.cc http://gerrit.cloudera.org:8080/#/c/21825/6/testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test@400 PS6, Line 400: leading I don't see any test for `both` - can you add one test with charset, and one without charset? http://gerrit.cloudera.org:8080/#/c/21825/6/testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test@700 PS6, Line 700: set utf8_mode=false; The current tests always use utf8_mode=true, while this test uses utf8_trim, so the non-utf8 code path is not tested. -- To view, visit http://gerrit.cloudera.org:8080/21825 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c4fa6d0d8d0684c4b6d8dac8fd531d205e4f7b4 Gerrit-Change-Number: 21825 Gerrit-PatchSet: 6 Gerrit-Owner: Mihaly Szjatinya <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Mihaly Szjatinya <[email protected]> Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]> Gerrit-Reviewer: Peter Rozsa <[email protected]> Gerrit-Comment-Date: Mon, 14 Oct 2024 12:27:13 +0000 Gerrit-HasComments: Yes
