[jira] [Commented] (HIVE-22929) Performance: quoted identifier parsing uses throwaway Regex via String.replaceAll()
[ https://issues.apache.org/jira/browse/HIVE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17050614#comment-17050614 ] Jesus Camacho Rodriguez commented on HIVE-22929: +1 > Performance: quoted identifier parsing uses throwaway Regex via > String.replaceAll() > --- > > Key: HIVE-22929 > URL: https://issues.apache.org/jira/browse/HIVE-22929 > Project: Hive > Issue Type: Bug >Reporter: Gopal Vijayaraghavan >Assignee: Krisztian Kasa >Priority: Major > Attachments: HIVE-22929.1.patch, HIVE-22929.2.patch, > HIVE-22929.2.patch, HIVE-22929.2.patch, HIVE-22929.2.patch, > String.replaceAll.png > > > !String.replaceAll.png! > https://github.com/apache/hive/blob/master/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g#L530 > {code} > '`' ( '``' | ~('`') )* '`' { setText(getText().substring(1, > getText().length() -1 ).replaceAll("``", "`")); } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22929) Performance: quoted identifier parsing uses throwaway Regex via String.replaceAll()
[ https://issues.apache.org/jira/browse/HIVE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17050480#comment-17050480 ] Hive QA commented on HIVE-22929: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12995468/HIVE-22929.2.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:green}SUCCESS:{color} +1 due to 18096 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/20933/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20933/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20933/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12995468 - PreCommit-HIVE-Build > Performance: quoted identifier parsing uses throwaway Regex via > String.replaceAll() > --- > > Key: HIVE-22929 > URL: https://issues.apache.org/jira/browse/HIVE-22929 > Project: Hive > Issue Type: Bug >Reporter: Gopal Vijayaraghavan >Assignee: Krisztian Kasa >Priority: Major > Attachments: HIVE-22929.1.patch, HIVE-22929.2.patch, > HIVE-22929.2.patch, HIVE-22929.2.patch, HIVE-22929.2.patch, > String.replaceAll.png > > > !String.replaceAll.png! > https://github.com/apache/hive/blob/master/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g#L530 > {code} > '`' ( '``' | ~('`') )* '`' { setText(getText().substring(1, > getText().length() -1 ).replaceAll("``", "`")); } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22929) Performance: quoted identifier parsing uses throwaway Regex via String.replaceAll()
[ https://issues.apache.org/jira/browse/HIVE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17050414#comment-17050414 ] Hive QA commented on HIVE-22929: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 1m 3s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 1m 50s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20933/dev-support/hive-personality.sh | | git revision | master / 9cdf97f | | asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20933/yetus/patch-asflicense-problems.txt | | modules | C: parser U: parser | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20933/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > Performance: quoted identifier parsing uses throwaway Regex via > String.replaceAll() > --- > > Key: HIVE-22929 > URL: https://issues.apache.org/jira/browse/HIVE-22929 > Project: Hive > Issue Type: Bug >Reporter: Gopal Vijayaraghavan >Assignee: Krisztian Kasa >Priority: Major > Attachments: HIVE-22929.1.patch, HIVE-22929.2.patch, > HIVE-22929.2.patch, HIVE-22929.2.patch, HIVE-22929.2.patch, > String.replaceAll.png > > > !String.replaceAll.png! > https://github.com/apache/hive/blob/master/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g#L530 > {code} > '`' ( '``' | ~('`') )* '`' { setText(getText().substring(1, > getText().length() -1 ).replaceAll("``", "`")); } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22929) Performance: quoted identifier parsing uses throwaway Regex via String.replaceAll()
[ https://issues.apache.org/jira/browse/HIVE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17050047#comment-17050047 ] Hive QA commented on HIVE-22929: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 1m 3s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 1m 44s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20926/dev-support/hive-personality.sh | | git revision | master / 94dca16 | | asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20926/yetus/patch-asflicense-problems.txt | | modules | C: parser U: parser | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20926/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > Performance: quoted identifier parsing uses throwaway Regex via > String.replaceAll() > --- > > Key: HIVE-22929 > URL: https://issues.apache.org/jira/browse/HIVE-22929 > Project: Hive > Issue Type: Bug >Reporter: Gopal Vijayaraghavan >Assignee: Krisztian Kasa >Priority: Major > Attachments: HIVE-22929.1.patch, HIVE-22929.2.patch, > HIVE-22929.2.patch, HIVE-22929.2.patch, String.replaceAll.png > > > !String.replaceAll.png! > https://github.com/apache/hive/blob/master/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g#L530 > {code} > '`' ( '``' | ~('`') )* '`' { setText(getText().substring(1, > getText().length() -1 ).replaceAll("``", "`")); } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22929) Performance: quoted identifier parsing uses throwaway Regex via String.replaceAll()
[ https://issues.apache.org/jira/browse/HIVE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048181#comment-17048181 ] Hive QA commented on HIVE-22929: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12994900/HIVE-22929.2.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 18090 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[cbo_query20] (batchId=305) org.apache.hadoop.hive.metastore.TestHiveMetaStoreAlterColumnPar.org.apache.hadoop.hive.metastore.TestHiveMetaStoreAlterColumnPar (batchId=253) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/20881/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20881/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20881/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12994900 - PreCommit-HIVE-Build > Performance: quoted identifier parsing uses throwaway Regex via > String.replaceAll() > --- > > Key: HIVE-22929 > URL: https://issues.apache.org/jira/browse/HIVE-22929 > Project: Hive > Issue Type: Bug >Reporter: Gopal Vijayaraghavan >Assignee: Krisztian Kasa >Priority: Major > Attachments: HIVE-22929.1.patch, HIVE-22929.2.patch, > String.replaceAll.png > > > !String.replaceAll.png! > https://github.com/apache/hive/blob/master/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g#L530 > {code} > '`' ( '``' | ~('`') )* '`' { setText(getText().substring(1, > getText().length() -1 ).replaceAll("``", "`")); } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22929) Performance: quoted identifier parsing uses throwaway Regex via String.replaceAll()
[ https://issues.apache.org/jira/browse/HIVE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048141#comment-17048141 ] Hive QA commented on HIVE-22929: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 1m 12s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 2m 17s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20881/dev-support/hive-personality.sh | | git revision | master / 1fc351b | | asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20881/yetus/patch-asflicense-problems.txt | | modules | C: parser U: parser | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20881/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > Performance: quoted identifier parsing uses throwaway Regex via > String.replaceAll() > --- > > Key: HIVE-22929 > URL: https://issues.apache.org/jira/browse/HIVE-22929 > Project: Hive > Issue Type: Bug >Reporter: Gopal Vijayaraghavan >Assignee: Krisztian Kasa >Priority: Major > Attachments: HIVE-22929.1.patch, HIVE-22929.2.patch, > String.replaceAll.png > > > !String.replaceAll.png! > https://github.com/apache/hive/blob/master/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g#L530 > {code} > '`' ( '``' | ~('`') )* '`' { setText(getText().substring(1, > getText().length() -1 ).replaceAll("``", "`")); } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22929) Performance: quoted identifier parsing uses throwaway Regex via String.replaceAll()
[ https://issues.apache.org/jira/browse/HIVE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047636#comment-17047636 ] Krisztian Kasa commented on HIVE-22929: --- [~gopalv] Uploaded a patch fixing only the HiveLexer using StringUtils.replace. I will address other issues in a separate jira if needed. > Performance: quoted identifier parsing uses throwaway Regex via > String.replaceAll() > --- > > Key: HIVE-22929 > URL: https://issues.apache.org/jira/browse/HIVE-22929 > Project: Hive > Issue Type: Bug >Reporter: Gopal Vijayaraghavan >Assignee: Krisztian Kasa >Priority: Major > Attachments: HIVE-22929.1.patch, HIVE-22929.2.patch, > String.replaceAll.png > > > !String.replaceAll.png! > https://github.com/apache/hive/blob/master/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g#L530 > {code} > '`' ( '``' | ~('`') )* '`' { setText(getText().substring(1, > getText().length() -1 ).replaceAll("``", "`")); } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22929) Performance: quoted identifier parsing uses throwaway Regex via String.replaceAll()
[ https://issues.apache.org/jira/browse/HIVE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047608#comment-17047608 ] Krisztian Kasa commented on HIVE-22929: --- [~kgyrtkirk] Run benchmark using hive-jmh: Sample query: {code:java} select `x+1```, `y`, rank() over(partition by `x+1``` order by `y`) \n" + "from t4 where `x+1``` = '10' group by `x+1```, `y` having `x+1``` = '10' {code} Settings for each method under test: {code:java} @Benchmark @Warmup(iterations = 10, time = 2, timeUnit = TimeUnit.MILLISECONDS) @Measurement(iterations = 1000, time = 2, timeUnit = TimeUnit.MILLISECONDS) {code} Results: {code:java} Benchmark Mode Cnt Score Error Units BacktickDedup.BacktickDedupImpl.charDedup avgt 1000 0.717 ± 0.095 us/op BacktickDedup.BacktickDedupImpl.charDedupArrayavgt 1000 1.007 ± 0.077 us/op BacktickDedup.BacktickDedupImpl.regExUtilsReplaceAll avgt 1000 1.495 ± 0.060 us/op BacktickDedup.BacktickDedupImpl.stringReplace avgt 1000 1.856 ± 0.430 us/op BacktickDedup.BacktickDedupImpl.stringReplaceAll avgt 1000 2.056 ± 0.391 us/op BacktickDedup.BacktickDedupImpl.stringUtilsReplaceavgt 1000 0.545 ± 0.065 us/op {code} Your function is called charDedupArray here. And charDedup is a version where *String.charAt( i )* is used instead of *String.toCharArray()* + iterating through the array. Conclusion: non-regex based methods are perform better. > Performance: quoted identifier parsing uses throwaway Regex via > String.replaceAll() > --- > > Key: HIVE-22929 > URL: https://issues.apache.org/jira/browse/HIVE-22929 > Project: Hive > Issue Type: Bug >Reporter: Gopal Vijayaraghavan >Assignee: Krisztian Kasa >Priority: Major > Attachments: HIVE-22929.1.patch, String.replaceAll.png > > > !String.replaceAll.png! > https://github.com/apache/hive/blob/master/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g#L530 > {code} > '`' ( '``' | ~('`') )* '`' { setText(getText().substring(1, > getText().length() -1 ).replaceAll("``", "`")); } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22929) Performance: quoted identifier parsing uses throwaway Regex via String.replaceAll()
[ https://issues.apache.org/jira/browse/HIVE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047330#comment-17047330 ] Krisztian Kasa commented on HIVE-22929: --- [~gopalv] I use JDK8. Seems that *StringUtils.replace* is the best: {code:java} long count = 1000; ... start = System.currentTimeMillis(); for (int i = 0; i < count; ++i) { result = StringUtils.replace("sample sample", "am", "b"); } System.out.println("StringUtils.replace: " + (System.currentTimeMillis() - start)); System.out.println(result); {code} {code:java} StringUtils.replace: 777 sbple sbple {code} Thanks > Performance: quoted identifier parsing uses throwaway Regex via > String.replaceAll() > --- > > Key: HIVE-22929 > URL: https://issues.apache.org/jira/browse/HIVE-22929 > Project: Hive > Issue Type: Bug >Reporter: Gopal Vijayaraghavan >Assignee: Krisztian Kasa >Priority: Major > Attachments: HIVE-22929.1.patch, String.replaceAll.png > > > !String.replaceAll.png! > https://github.com/apache/hive/blob/master/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g#L530 > {code} > '`' ( '``' | ~('`') )* '`' { setText(getText().substring(1, > getText().length() -1 ).replaceAll("``", "`")); } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22929) Performance: quoted identifier parsing uses throwaway Regex via String.replaceAll()
[ https://issues.apache.org/jira/browse/HIVE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047329#comment-17047329 ] Zoltan Haindrich commented on HIVE-22929: - okay; one more note: I think the reason that replaceAll lit up on the profiler might have other backgrounds beyond the pure weight of the function; we have a few syn-preds in the grammar - those can blow up parsing time quite easily - and may overstress things like a replaceAll; I it would be helpfull to get a counter #{replaceAll calls at the key location} and compare it against #{` quoted things in the input query} > Performance: quoted identifier parsing uses throwaway Regex via > String.replaceAll() > --- > > Key: HIVE-22929 > URL: https://issues.apache.org/jira/browse/HIVE-22929 > Project: Hive > Issue Type: Bug >Reporter: Gopal Vijayaraghavan >Assignee: Krisztian Kasa >Priority: Major > Attachments: HIVE-22929.1.patch, String.replaceAll.png > > > !String.replaceAll.png! > https://github.com/apache/hive/blob/master/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g#L530 > {code} > '`' ( '``' | ~('`') )* '`' { setText(getText().substring(1, > getText().length() -1 ).replaceAll("``", "`")); } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22929) Performance: quoted identifier parsing uses throwaway Regex via String.replaceAll()
[ https://issues.apache.org/jira/browse/HIVE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047310#comment-17047310 ] Gopal Vijayaraghavan commented on HIVE-22929: - bq. Could you please provide a sample query for this? it might be interesting to take a look at it - in case it heats up something like a "String.replaceAll" function This was from a profile with 100k queries running on a system - the best approximation is that it is 56 different queries with varying predicates & giant list of projection column names, with 256 concurrent users. Wasn't from a single query or a single user. > Performance: quoted identifier parsing uses throwaway Regex via > String.replaceAll() > --- > > Key: HIVE-22929 > URL: https://issues.apache.org/jira/browse/HIVE-22929 > Project: Hive > Issue Type: Bug >Reporter: Gopal Vijayaraghavan >Assignee: Krisztian Kasa >Priority: Major > Attachments: HIVE-22929.1.patch, String.replaceAll.png > > > !String.replaceAll.png! > https://github.com/apache/hive/blob/master/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g#L530 > {code} > '`' ( '``' | ~('`') )* '`' { setText(getText().substring(1, > getText().length() -1 ).replaceAll("``", "`")); } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22929) Performance: quoted identifier parsing uses throwaway Regex via String.replaceAll()
[ https://issues.apache.org/jira/browse/HIVE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047309#comment-17047309 ] Gopal Vijayaraghavan commented on HIVE-22929: - Or instead of using the JDK versions (with diff impls), we could use https://commons.apache.org/proper/commons-lang/apidocs/src-html/org/apache/commons/lang3/StringUtils.html#line.5602 which looks nearly identical to the JDK11 implementation. > Performance: quoted identifier parsing uses throwaway Regex via > String.replaceAll() > --- > > Key: HIVE-22929 > URL: https://issues.apache.org/jira/browse/HIVE-22929 > Project: Hive > Issue Type: Bug >Reporter: Gopal Vijayaraghavan >Assignee: Krisztian Kasa >Priority: Major > Attachments: HIVE-22929.1.patch, String.replaceAll.png > > > !String.replaceAll.png! > https://github.com/apache/hive/blob/master/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g#L530 > {code} > '`' ( '``' | ~('`') )* '`' { setText(getText().substring(1, > getText().length() -1 ).replaceAll("``", "`")); } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22929) Performance: quoted identifier parsing uses throwaway Regex via String.replaceAll()
[ https://issues.apache.org/jira/browse/HIVE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047308#comment-17047308 ] Zoltan Haindrich commented on HIVE-22929: - [~kkasa] do we need the power of regexp to address the original issue? I think it was something like: {code} str.replaceAll("``","`") {code} I think a special tailored utility function would probably perform even better...could you see how this performs: {code} public String fx(String s, char c) { StringBuilder sb = new StringBuilder(); char[] cc = s.toCharArray(); char l = 0; for (int i = 0; i < cc.length; i++) { char curr = cc[i]; if (l == '`') { l = 0; continue; } else { l = curr; } sb.append(curr); } return sb.toString(); } {code} note: for performance measurements you can write tests under the itests/hive-jmh ; there are a few there already. ...I think another approach could be: since we have a lexer here (somewhere) ...we might be able to convince it to process these escapings/etc for us - not sure at what cost it could do that... I would recommend to not change this all over the place - it might not affect general performance; for example the performance gain at places like a ddltask is irrelevant...I think it's best to focus on the performance issue at hand - the impact of the same issue is usually neglegible at other places [~gopalv] Could you please provide a sample query for this? it might be interesting to take a look at it - in case it heats up something like a "String.replaceAll" function > Performance: quoted identifier parsing uses throwaway Regex via > String.replaceAll() > --- > > Key: HIVE-22929 > URL: https://issues.apache.org/jira/browse/HIVE-22929 > Project: Hive > Issue Type: Bug >Reporter: Gopal Vijayaraghavan >Assignee: Krisztian Kasa >Priority: Major > Attachments: HIVE-22929.1.patch, String.replaceAll.png > > > !String.replaceAll.png! > https://github.com/apache/hive/blob/master/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g#L530 > {code} > '`' ( '``' | ~('`') )* '`' { setText(getText().substring(1, > getText().length() -1 ).replaceAll("``", "`")); } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22929) Performance: quoted identifier parsing uses throwaway Regex via String.replaceAll()
[ https://issues.apache.org/jira/browse/HIVE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047303#comment-17047303 ] Gopal Vijayaraghavan commented on HIVE-22929: - I just looked at my JDK install and it has {code} /** * Replaces each substring of this string that matches the literal target * sequence with the specified literal replacement sequence. The * replacement proceeds from the beginning of the string to the end, for * example, replacing "aa" with "b" in the string "aaa" will result in * "ba" rather than "ab". * * @param target The sequence of char values to be replaced * @param replacement The replacement sequence of char values * @return The resulting string * @since 1.5 */ public String replace(CharSequence target, CharSequence replacement) { String tgtStr = target.toString(); String replStr = replacement.toString(); int j = indexOf(tgtStr); if (j < 0) { return this; } int tgtLen = tgtStr.length(); int tgtLen1 = Math.max(tgtLen, 1); int thisLen = length(); int newLenHint = thisLen - tgtLen + replStr.length(); if (newLenHint < 0) { throw new OutOfMemoryError(); } StringBuilder sb = new StringBuilder(newLenHint); int i = 0; do { sb.append(this, i, j).append(replStr); i = j + tgtLen; } while (j < thisLen && (j = indexOf(tgtStr, j + tgtLen1)) > 0); return sb.append(this, i, thisLen).toString(); } {code} Are you using JDK11 or JDK8 to run this? > Performance: quoted identifier parsing uses throwaway Regex via > String.replaceAll() > --- > > Key: HIVE-22929 > URL: https://issues.apache.org/jira/browse/HIVE-22929 > Project: Hive > Issue Type: Bug >Reporter: Gopal Vijayaraghavan >Assignee: Krisztian Kasa >Priority: Major > Attachments: HIVE-22929.1.patch, String.replaceAll.png > > > !String.replaceAll.png! > https://github.com/apache/hive/blob/master/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g#L530 > {code} > '`' ( '``' | ~('`') )* '`' { setText(getText().substring(1, > getText().length() -1 ).replaceAll("``", "`")); } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22929) Performance: quoted identifier parsing uses throwaway Regex via String.replaceAll()
[ https://issues.apache.org/jira/browse/HIVE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047281#comment-17047281 ] Krisztian Kasa commented on HIVE-22929: --- [~gopalv] String.replace implementation is: {code} public String replace(CharSequence target, CharSequence replacement) { return Pattern.compile(target.toString(), Pattern.LITERAL).matcher( this).replaceAll(Matcher.quoteReplacement(replacement.toString())); } {code} So it also calls Pattern.compile with *target* every time it called. The difference between replace and replaceAll is: {code} replace Pattern.compile(target.toString(), Pattern.LITERAL) {code} {code} replaceAll Pattern.compile(regex) {code} I did some testing: {code} public static final Pattern REGEX = Pattern.compile("am", Pattern.LITERAL); @Test public void testReplacePerf() { long count = 1000; long start = System.currentTimeMillis(); for (int i = 0; i < count; ++i) { String s = "sample sample".replaceAll("am", "b"); } System.out.println("String.replaceAll: " + (System.currentTimeMillis() - start)); start = System.currentTimeMillis(); for (int i = 0; i < count; ++i) { String s = "sample sample".replace("am", "b"); } System.out.println("String.replace: " + (System.currentTimeMillis() - start)); start = System.currentTimeMillis(); for (int i = 0; i < count; ++i) { String s = RegExUtils.replaceAll("sample sample", REGEX, "b"); } System.out.println("Precompiled regex + RegExUtils.replaceAll:" + (System.currentTimeMillis() - start)); } {code} {code} String.replaceAll: 3997 String.replace: 3028 Precompiled regex + RegExUtils.replaceAll:2164 {code} Please share your thoughts. > Performance: quoted identifier parsing uses throwaway Regex via > String.replaceAll() > --- > > Key: HIVE-22929 > URL: https://issues.apache.org/jira/browse/HIVE-22929 > Project: Hive > Issue Type: Bug >Reporter: Gopal Vijayaraghavan >Assignee: Krisztian Kasa >Priority: Major > Attachments: HIVE-22929.1.patch, String.replaceAll.png > > > !String.replaceAll.png! > https://github.com/apache/hive/blob/master/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g#L530 > {code} > '`' ( '``' | ~('`') )* '`' { setText(getText().substring(1, > getText().length() -1 ).replaceAll("``", "`")); } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22929) Performance: quoted identifier parsing uses throwaway Regex via String.replaceAll()
[ https://issues.apache.org/jira/browse/HIVE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047054#comment-17047054 ] Gopal Vijayaraghavan commented on HIVE-22929: - {code} QuotedIdentifier : -'`' ( '``' | ~('`') )* '`' { setText(getText().substring(1, getText().length() -1 ).replaceAll("``", "`")); } +'`' ( '``' | ~('`') )* '`' { setText(RegExUtils.replaceAll(getText().substring(1, getText().length() -1 ), QUOTED_REGEX, "`")); } {code} For the single literal case, what we needed to use was String.replace(), not a Regex. > Performance: quoted identifier parsing uses throwaway Regex via > String.replaceAll() > --- > > Key: HIVE-22929 > URL: https://issues.apache.org/jira/browse/HIVE-22929 > Project: Hive > Issue Type: Bug >Reporter: Gopal Vijayaraghavan >Assignee: Krisztian Kasa >Priority: Major > Attachments: HIVE-22929.1.patch, String.replaceAll.png > > > !String.replaceAll.png! > https://github.com/apache/hive/blob/master/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g#L530 > {code} > '`' ( '``' | ~('`') )* '`' { setText(getText().substring(1, > getText().length() -1 ).replaceAll("``", "`")); } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22929) Performance: quoted identifier parsing uses throwaway Regex via String.replaceAll()
[ https://issues.apache.org/jira/browse/HIVE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047031#comment-17047031 ] Hive QA commented on HIVE-22929: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12994772/HIVE-22929.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:green}SUCCESS:{color} +1 due to 18073 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/20858/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20858/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20858/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12994772 - PreCommit-HIVE-Build > Performance: quoted identifier parsing uses throwaway Regex via > String.replaceAll() > --- > > Key: HIVE-22929 > URL: https://issues.apache.org/jira/browse/HIVE-22929 > Project: Hive > Issue Type: Bug >Reporter: Gopal Vijayaraghavan >Assignee: Krisztian Kasa >Priority: Major > Attachments: HIVE-22929.1.patch, String.replaceAll.png > > > !String.replaceAll.png! > https://github.com/apache/hive/blob/master/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g#L530 > {code} > '`' ( '``' | ~('`') )* '`' { setText(getText().substring(1, > getText().length() -1 ).replaceAll("``", "`")); } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22929) Performance: quoted identifier parsing uses throwaway Regex via String.replaceAll()
[ https://issues.apache.org/jira/browse/HIVE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047009#comment-17047009 ] Hive QA commented on HIVE-22929: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 7s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 39s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 26s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 59s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 58s{color} | {color:blue} parser in master has 3 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 0s{color} | {color:blue} ql in master has 1531 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 18s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 29s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 31s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 49s{color} | {color:red} ql: The patch generated 7 new + 724 unchanged - 16 fixed = 731 total (was 740) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 15s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 31m 49s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20858/dev-support/hive-personality.sh | | git revision | master / ffba5d6 | | Default Java | 1.8.0_111 | | findbugs | v3.0.1 | | checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-20858/yetus/diff-checkstyle-ql.txt | | asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20858/yetus/patch-asflicense-problems.txt | | modules | C: parser ql U: . | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20858/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > Performance: quoted identifier parsing uses throwaway Regex via > String.replaceAll() > --- > > Key: HIVE-22929 > URL: https://issues.apache.org/jira/browse/HIVE-22929 > Project: Hive > Issue Type: Bug >Reporter: Gopal Vijayaraghavan >Assignee: Krisztian Kasa >Priority: Major > Attachments: HIVE-22929.1.patch, String.replaceAll.png > > > !String.replaceAll.png! > https://github.com/apache/hive/blob/master/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g#L530 > {code} > '`' ( '``' | ~('`') )* '`' { setText(getText().substring(1, > getText().length() -1 ).replaceAll("``", "`")); } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22929) Performance: quoted identifier parsing uses throwaway Regex via String.replaceAll()
[ https://issues.apache.org/jira/browse/HIVE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045797#comment-17045797 ] Gopal Vijayaraghavan commented on HIVE-22929: - At least for a first pass, only the ones which use character constants, not regexes. {code} org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:String escapedTokenText = curTok.getText().replaceAll("`", "``"); {code} is to be replaced with String.replace(), but for now we'll leave behind {code} org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: String expr_no_tok = expr_flattened.replaceAll("tok_\\S+", ""); {code} for a later pass to be replaced by Pattern.compile() as a class constant, create a .matcher in each instance + .replaceAll(). > Performance: quoted identifier parsing uses throwaway Regex via > String.replaceAll() > --- > > Key: HIVE-22929 > URL: https://issues.apache.org/jira/browse/HIVE-22929 > Project: Hive > Issue Type: Bug >Reporter: Gopal Vijayaraghavan >Assignee: Krisztian Kasa >Priority: Major > Attachments: String.replaceAll.png > > > !String.replaceAll.png! > https://github.com/apache/hive/blob/master/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g#L530 > {code} > '`' ( '``' | ~('`') )* '`' { setText(getText().substring(1, > getText().length() -1 ).replaceAll("``", "`")); } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22929) Performance: quoted identifier parsing uses throwaway Regex via String.replaceAll()
[ https://issues.apache.org/jira/browse/HIVE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045536#comment-17045536 ] Krisztian Kasa commented on HIVE-22929: --- [~gopalv] replaceAll is used lots of times. Should all of them are changed? These are the ones I found in the ql project: {code:java} org/apache/hadoop/hive/ql/ddl/misc/conf/ShowConfOperation.java: output.write(description.replaceAll(" *\n *", " ").getBytes("UTF-8")); org/apache/hadoop/hive/ql/ddl/table/column/show/ShowColumnsOperation.java: columnPattern = columnPattern.replaceAll("\\*", ".*"); org/apache/hadoop/hive/ql/exec/Utilities.java: String ret = taskid.replaceAll(".*_[mr]_", "").replaceAll(".*_(map|reduce)_", ""); org/apache/hadoop/hive/ql/exec/Utilities.java: return filename.replaceAll(oldTaskId, newTaskId); org/apache/hadoop/hive/ql/exec/Utilities.java: String class_file = klass.getName().replaceAll("\\.", "/") + ".class"; org/apache/hadoop/hive/ql/exec/Utilities.java:return path.replaceAll("!.*$", ""); org/apache/hadoop/hive/ql/exec/tez/HivePreWarmProcessor.java: String klass = je.getName().replace(".class","").replaceAll("/","\\."); org/apache/hadoop/hive/ql/exec/tez/monitoring/TezJobMonitor.java: .map(k -> k.replaceAll(" ", "_")) org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java:return sb.toString().replaceAll("GeneratedConstructorAccessor[0-9]*", "GeneratedConstructorAccessor"); org/apache/hadoop/hive/ql/exec/vector/expressions/CastStringToDateWithFormat.java: Date date = formatter.parseDate(dateString.replaceAll("\u", "")); org/apache/hadoop/hive/ql/exec/vector/expressions/CastStringToTimestampWithFormat.java: Timestamp timestamp = formatter.parseTimestamp(inputString.replaceAll("\u", "")); org/apache/hadoop/hive/ql/exec/vector/expressions/FilterStringColRegExpStringScalar.java: return new PhoneNumberChecker(pattern.replaceAll("d", "d").replaceAll("\\(", "(").replaceAll("\\)", ")")); org/apache/hadoop/hive/ql/io/parquet/LeafFilterFactory.java: return value.replaceAll(regex, ""); org/apache/hadoop/hive/ql/lockmgr/HiveLockObject.java:return in.replaceAll(":",""); org/apache/hadoop/hive/ql/metadata/HiveUtils.java: identifier = identifier.replaceAll("`", "``"); org/apache/hadoop/hive/ql/metadata/JarUtils.java:String class_file = my_class.getName().replaceAll("\\.", "/") + ".class"; org/apache/hadoop/hive/ql/metadata/JarUtils.java:toReturn = toReturn.replaceAll("\\+", "%2B"); org/apache/hadoop/hive/ql/metadata/JarUtils.java:return toReturn.replaceAll("!.*$", ""); org/apache/hadoop/hive/ql/metadata/JarUtils.java: String class_file = klass.getName().replaceAll("\\.", "/") + ".class"; org/apache/hadoop/hive/ql/metadata/JarUtils.java:return path.replaceAll("!.*$", ""); org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java: tablePattern = tablePattern.replaceAll("\\*", ".*"); org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java: tablePattern = tablePattern.replaceAll("\\*", ".*"); org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java: dbPatternList.add(Pattern.compile(element.replaceAll("\\*", ".*")).matcher("")); org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java: tblPatternList.add(Pattern.compile(element.replaceAll("\\*", ".*")).matcher("")); org/apache/hadoop/hive/ql/metadata/formatting/MetaDataFormatUtils.java: String unescapedValue = (isLastLinePadded && value != null) ? value.replaceAll("n|r|rn", "\n") : value; org/apache/hadoop/hive/ql/metadata/formatting/MetaDataFormatUtils.java: (isOutputPadded && value != null) ? value.replaceAll("n|r|rn","\n"):value; org/apache/hadoop/hive/ql/optimizer/physical/NullScanTaskDispatcher.java: return partSpec.toString().replaceAll("[{}:/#\\?, ]+", "_"); org/apache/hadoop/hive/ql/optimizer/spark/SplitSparkWorkResolver.java: clonedParentWork.setName(clonedParentWork.getName().replaceAll("^([a-zA-Z]+)(\\s+)(\\d+)", org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java:return colName.replaceAll("`", "``"); org/apache/hadoop/hive/ql/parse/ReplicationSpec.java:long currReplStateLong = Long.parseLong(currReplState.replaceAll("\\D","")); org/apache/hadoop/hive/ql/parse/ReplicationSpec.java:long replacementReplStateLong = Long.parseLong(replacementReplState.replaceAll("\\D","")); org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: String expr_no_tok = expr_flattened.replaceAll("tok_\\S+", ""); org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: String expr_formatted = expr_no_tok.replaceAll("\\W", " ").trim().replaceAll("\\s+", "_"); org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: sb.append(curTok.getText().replaceAll("`", "``"));