Zach Amsden has posted comments on this change.

Change subject: IMPALA-4729: Implement REPLACE()
......................................................................
Patch Set 6:

(1 comment)

After pre-loading the data, we can see that replace() now beats regexp_replace() on all of these simple replacements, returning identical results in 1.33s to 1.63s versus 1.73s to 3.63s. The first version lost horribly on replacing a single space with 17 spaces, so the fancier buffer sizing for expanding patterns was actually required. (I lost the first few lines of the output below; I haven't figured out how to get more scrollback in bash on Windows yet ;)

+------------------------------------------------------------------+
| sum(length(regexp_replace(l_comment, ' ', '                 '))) |
+------------------------------------------------------------------+
| 496964585                                                        |
+------------------------------------------------------------------+
Fetched 1 row(s) in 3.63s
[localhost:21000] > select sum(length(replace(l_comment, ' ', '                 '))) from tpch.lineitem;
Query: select sum(length(replace(l_comment, ' ', '                 '))) from tpch.lineitem
Query submitted at: 2017-02-03 19:21:42 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=504aabb65f306ab1:7601c2c800000000
+-----------------------------------------------------------+
| sum(length(replace(l_comment, ' ', '                 '))) |
+-----------------------------------------------------------+
| 496964585                                                 |
+-----------------------------------------------------------+
Fetched 1 row(s) in 1.63s
[localhost:21000] > select sum(length(regexp_replace(l_comment, ' ', ''))) from tpch.lineitem;
Query: select sum(length(regexp_replace(l_comment, ' ', ''))) from tpch.lineitem
Query submitted at: 2017-02-03 19:21:58 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=9440a311864f0940:f89d1f3e00000000
+-------------------------------------------------+
| sum(length(regexp_replace(l_comment, ' ', ''))) |
+-------------------------------------------------+
| 137874248                                       |
+-------------------------------------------------+
Fetched 1 row(s) in 3.04s
[localhost:21000] > select sum(length(replace(l_comment, ' ', ''))) from tpch.lineitem;
Query: select sum(length(replace(l_comment, ' ', ''))) from tpch.lineitem
Query submitted at: 2017-02-03 19:22:09 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=c345d0f14967fd99:4b23ef8c00000000
+------------------------------------------+
| sum(length(replace(l_comment, ' ', ''))) |
+------------------------------------------+
| 137874248                                |
+------------------------------------------+
Fetched 1 row(s) in 1.54s
[localhost:21000] > select sum(length(regexp_replace(l_comment, 'e', 'I'))) from tpch.lineitem;
Query: select sum(length(regexp_replace(l_comment, 'e', 'I'))) from tpch.lineitem
Query submitted at: 2017-02-03 19:22:47 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=d8405ff45581ef67:7df386ab00000000
+--------------------------------------------------+
| sum(length(regexp_replace(l_comment, 'e', 'i'))) |
+--------------------------------------------------+
| 158997209                                        |
+--------------------------------------------------+
Fetched 1 row(s) in 2.84s
[localhost:21000] > select sum(length(replace(l_comment, 'e', 'I'))) from tpch.lineitem;
Query: select sum(length(replace(l_comment, 'e', 'I'))) from tpch.lineitem
Query submitted at: 2017-02-03 19:22:58 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=5f42bba668201666:623a45f200000000
+-------------------------------------------+
| sum(length(replace(l_comment, 'e', 'i'))) |
+-------------------------------------------+
| 158997209                                 |
+-------------------------------------------+
Fetched 1 row(s) in 1.63s
[localhost:21000] > select sum(length(regex_replace(l_comment, 'he', 'HE'))) from tpch.lineitem;
Query: select sum(length(regex_replace(l_comment, 'he', 'HE'))) from tpch.lineitem
Query submitted at: 2017-02-03 19:23:30 (Coordinator: http://impala-dev:25000)
ERROR: AnalysisException: default.regex_replace() unknown
[localhost:21000] > select sum(length(regexp_replace(l_comment, 'he', 'HE'))) from tpch.lineitem;
Query: select sum(length(regexp_replace(l_comment, 'he', 'HE'))) from tpch.lineitem
Query submitted at: 2017-02-03 19:23:37 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=134a4fc63cf86279:372eeb6f00000000
+----------------------------------------------------+
| sum(length(regexp_replace(l_comment, 'he', 'he'))) |
+----------------------------------------------------+
| 158997209                                          |
+----------------------------------------------------+
Fetched 1 row(s) in 1.73s
[localhost:21000] > select sum(length(replace(l_comment, 'he', 'HE'))) from tpch.lineitem;
Query: select sum(length(replace(l_comment, 'he', 'HE'))) from tpch.lineitem
Query submitted at: 2017-02-03 19:23:45 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=1541a32bc801a543:8efcb85300000000
+---------------------------------------------+
| sum(length(replace(l_comment, 'he', 'he'))) |
+---------------------------------------------+
| 158997209                                   |
+---------------------------------------------+
Fetched 1 row(s) in 1.53s
[localhost:21000] > select sum(length(regexp_replace(l_comment, 'comment', '//'))) from tpch.lineitem;
Query: select sum(length(regexp_replace(l_comment, 'comment', '//'))) from tpch.lineitem
Query submitted at: 2017-02-03 19:24:22 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=5e453cd4807ee411:987abebe00000000
+---------------------------------------------------------+
| sum(length(regexp_replace(l_comment, 'comment', '//'))) |
+---------------------------------------------------------+
| 158997209                                               |
+---------------------------------------------------------+
Fetched 1 row(s) in 1.74s
[localhost:21000] > select sum(length(replace(l_comment, 'comment', '//'))) from tpch.lineitem;
Query: select sum(length(replace(l_comment, 'comment', '//'))) from tpch.lineitem
Query submitted at: 2017-02-03 19:24:30 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=854f49d5c5c3951f:271f7f2200000000
+--------------------------------------------------+
| sum(length(replace(l_comment, 'comment', '//'))) |
+--------------------------------------------------+
| 158997209                                        |
+--------------------------------------------------+
Fetched 1 row(s) in 1.33s
[localhost:21000] >

http://gerrit.cloudera.org:8080/#/c/5776/6/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

Line 2619: /* Since "IF", "TRUNCATE" are keywords, need to special case these functions */
> update comment

Done

-- 
To view, visit http://gerrit.cloudera.org:8080/5776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zach Amsden <[email protected]>
Gerrit-HasComments: Yes
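To make the point about buffer sizing for expanding patterns concrete, here is a minimal C++ sketch of one possible approach, assuming std::string inputs rather than Impala's own string types. It counts non-overlapping matches in a first pass, computes the exact output length, reserves that once, and then fills the buffer in a second pass, so an expanding replacement (one space to 17 spaces) never reallocates mid-copy. ReplaceAll is a hypothetical name; this is an illustration, not the code in this patch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only; this is not the REPLACE() code in
// this patch and it assumes std::string inputs rather than Impala's string types.
// Replaces every non-overlapping occurrence of `pattern` in `str` with
// `replacement`, scanning left to right (plain substring match, no regex).
std::string ReplaceAll(const std::string& str, const std::string& pattern,
                       const std::string& replacement) {
  // In this sketch an empty pattern is treated as "no match".
  if (pattern.empty()) return str;

  // Pass 1: count matches so the output length is known before allocating.
  std::size_t matches = 0;
  for (std::size_t pos = str.find(pattern); pos != std::string::npos;
       pos = str.find(pattern, pos + pattern.size())) {
    ++matches;
  }
  if (matches == 0) return str;

  // Exact output length. Matches do not overlap, so matches * pattern.size()
  // never exceeds str.size() and the unsigned subtraction cannot wrap.
  const std::size_t out_len =
      str.size() - matches * pattern.size() + matches * replacement.size();
  std::string out;
  out.reserve(out_len);  // one allocation, even when the pattern expands

  // Pass 2: copy each unmatched span followed by the replacement text.
  std::size_t start = 0;
  for (std::size_t pos = str.find(pattern); pos != std::string::npos;
       pos = str.find(pattern, start)) {
    out.append(str, start, pos - start);
    out.append(replacement);
    start = pos + pattern.size();
  }
  out.append(str, start, std::string::npos);  // trailing unmatched text
  return out;
}

// Usage:
//   ReplaceAll("a b c", " ", "--")       returns "a--b--c"
//   ReplaceAll("the theme", "he", "HE")  returns "tHE tHEme"
//   ReplaceAll("x y", " ", "")           returns "xy"
```

The same length arithmetic is consistent with the query results above: taking 158997209 as the unchanged total from the length-preserving queries, removing the spaces shortens it by 158997209 - 137874248 = 21122961 characters, and expanding each of those spaces to 17 adds 16 characters apiece, giving 158997209 + 21122961 * 16 = 496964585.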
