[jira] Commented: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791481#action_12791481 ]

Hadoop QA commented on PIG-965:
---

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12428184/poregex2.patch
against trunk revision 890596.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to cause Findbugs to fail.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/130/testReport/
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/130/console

This message is automatically generated.

PERFORMANCE: optimize common case in matches (PORegex)
--
Key: PIG-965
URL: https://issues.apache.org/jira/browse/PIG-965
Project: Pig
Issue Type: Improvement
Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi
Attachments: automaton.jar, poregex2.patch

Some frequently seen use cases of the 'matches' comparison operator have the following properties:
1. The rhs is a constant string, e.g. c1 matches 'abc%'
2. Regexes that look for a matching prefix, suffix, etc. are very common, e.g. 'abc%', '%abc', '%abc%'

To optimize for these common cases, PORegex.java can be changed to:
1. Compile the pattern (rhs of matches) and re-use it if the pattern string has not changed.
2. Use string comparisons for simple common regexes (in 2 above).

The implementation of the Hive like clause uses similar optimizations.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
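The two optimizations proposed in the issue description can be sketched roughly as follows. This is only an illustration, not the code in poregex2.patch: the class name SimpleMatchSketch and its methods are hypothetical, the patterns are written in Java regex form (abc.* rather than abc%), and the fast paths assume that input values contain no line terminators (where '.' would not match by default).

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not Pig's PORegex: chooses a cheap matcher for a constant pattern. */
final class SimpleMatchSketch {
    private enum Kind { REGEX, PREFIX, SUFFIX, CONTAINS }

    private final Kind kind;
    private final String literal;   // text left after stripping leading/trailing ".*"
    private final Pattern compiled; // optimization 1: compiled once, reused for every row

    SimpleMatchSketch(String regex) {
        String core = regex;
        boolean lead = core.startsWith(".*");
        if (lead) {
            core = core.substring(2);
        }
        boolean trail = core.endsWith(".*");
        if (trail) {
            core = core.substring(0, core.length() - 2);
        }
        // Take a fast path only when what remains is plain letters and digits,
        // so no regex metacharacter can be misread as literal text.
        boolean plain = !core.isEmpty()
                && core.chars().allMatch(Character::isLetterOrDigit);
        if (plain && lead && trail) {
            kind = Kind.CONTAINS;   // .*abc.*
        } else if (plain && trail) {
            kind = Kind.PREFIX;     // abc.*
        } else if (plain && lead) {
            kind = Kind.SUFFIX;     // .*abc
        } else {
            kind = Kind.REGEX;      // anything else goes through java.util.regex
        }
        literal = core;
        compiled = Pattern.compile(regex);
    }

    boolean usesFastPath() {
        return kind != Kind.REGEX;
    }

    /** Optimization 2: plain string comparison for the simple shapes. */
    boolean matches(String input) {
        switch (kind) {
            case PREFIX:   return input.startsWith(literal);
            case SUFFIX:   return input.endsWith(literal);
            case CONTAINS: return input.contains(literal);
            default:       return compiled.matcher(input).matches();
        }
    }
}
```

The matcher would be built once per constant pattern and then called per row, for example new SimpleMatchSketch("abc.*").matches(value).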
[jira] Commented: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791561#action_12791561 ]

Thejas M Nair commented on PIG-965:
---

+1. The new patch looks good.

The Hudson findbugs and core-tests checks will fail because the build does not include the attached jar while compiling.
[jira] Commented: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791668#action_12791668 ]

Olga Natkovich commented on PIG-965:
---

Thanks, Thejas. I will run the test-commit tests manually and commit if they pass.
[jira] Commented: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791096#action_12791096 ]

Ankit Modi commented on PIG-965:
---

Here are numbers comparing optimization 1 + 2 against optimization 1 + dk.brics.

dk.brics.RunAutomaton is as fast as optimization 2 and also provides similar speeds on a set of additional expressions.

|| Query || svn_trunk || std_dev || Optimization 1 + 2 || std_dev || Optimization 1 + brics.RunAutomaton || std_dev ||
| .\*ABCD.\* | 33.87 | 0.71 | 18.77 | 0.71 | 18.94 | 0.02 |
| .\*ABCD | 30.06 | 2.91 | 18.44 | 0.05 | 18.94 | 0.03 |
| ABCD.\* | 21.93 | 2.91 | 18.35 | 0.1 | 18.85 | 0.04 |

Values are averaged over 3 runs.
[jira] Commented: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791225#action_12791225 ]

Hadoop QA commented on PIG-965:
---

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12428066/poregex2.patch
against trunk revision 890596.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to cause Findbugs to fail.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/127/testReport/
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/127/console

This message is automatically generated.
[jira] Commented: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790104#action_12790104 ]

Hadoop QA commented on PIG-965:
---

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12427913/poregex2.patch
against trunk revision 889870.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to cause Findbugs to fail.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/121/testReport/
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/121/console

This message is automatically generated.
[jira] Commented: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790545#action_12790545 ]

Ankit Modi commented on PIG-965:
---

* NonConstantRegex - I did not think of equals. I added a length check first because it can detect a change in length faster and, to the best of my knowledge, length() is just a getter. And yes, as you mentioned, equals also checks for the same object and does an instanceof test, which is not useful in our case.
* The numbers published above use dk.brics.automaton.RunAutomaton. Do you want me to publish numbers for a larger set of regexes?

I'll create a patch for the rest of the comments.
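The re-compile-only-on-change idea discussed for the non-constant case can be sketched as below. This is a minimal illustration under assumptions, not the NonConstantRegex class from the patch: CachedPatternSketch and compileCount are hypothetical names. Note that OpenJDK's String.equals compares lengths itself before comparing characters, so the explicit length check is shown only to mirror the discussion.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: reuse the compiled pattern while the pattern string stays the same. */
final class CachedPatternSketch {
    private String lastRegex;     // pattern string seen on the previous call
    private Pattern lastPattern;  // its compiled form
    int compileCount = 0;         // exposed only so the sketch can be checked

    boolean matches(String input, String regex) {
        // Cheap length comparison first, full comparison only when lengths agree.
        if (lastRegex == null
                || lastRegex.length() != regex.length()
                || !lastRegex.equals(regex)) {
            lastPattern = Pattern.compile(regex);
            lastRegex = regex;
            compileCount++;
        }
        return lastPattern.matcher(input).matches();
    }
}
```

When consecutive rows carry the same pattern string, the pattern is compiled once; a different string triggers exactly one new compilation.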
[jira] Commented: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789380#action_12789380 ]

Hadoop QA commented on PIG-965:
---

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12427730/automaton.jar
against trunk revision 889346.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch.
-1 patch. The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/117/console

This message is automatically generated.
[jira] Commented: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789668#action_12789668 ]

Thejas M Nair commented on PIG-965:
---

Review comments:
* The regex will always be on the rhs, so we don't need the code/classes that try to determine which side has the regular expression based on which side has the constant.
* In determineBestRegexMethod, we need to add (? to the list of regex strings not supported by dk.brics (in javaRegexOnly). It has a special meaning in Java regex that is not honored by dk.brics.
* In determineBestRegexMethod, we are dealing with cases like \d (choose Java regex) and \\d (choose dk.brics), but not with \\\d (which should choose Java regex). That is, we need to go back until we find a non-'\' char.
* In RegexInit.compile(..), the following messages are more appropriate at debug level, not at info. At info level, they might also confuse the user.
+log.info("Got an IllegalArgumentException for Pattern: " + pattern );
+log.info(e.getMessage());
+log.info("Switching to java.util.regex" );
* The following comment in PORegex.java seems to be out of place:
// This is a BinaryComparisonOperator hence there can only be two inputs
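The "go back until we find a non-'\' char" rule from the review can be illustrated with a small sketch. It is not the patch's determineBestRegexMethod; EscapeSketch, isEscaped, usesPredefinedClass, and the short list of class letters are all hypothetical and chosen only for illustration. A character is escaped exactly when it is preceded by an odd number of consecutive backslashes.

```java
/** Hypothetical sketch of the backslash-counting rule; not the code from the patch. */
final class EscapeSketch {
    // Illustrative subset of letters that form predefined classes when escaped (\d, \s, \w, ...).
    private static final String CLASS_LETTERS = "dDsSwW";

    /** True if the character at index is preceded by an odd number of backslashes. */
    static boolean isEscaped(String regex, int index) {
        int backslashes = 0;
        for (int i = index - 1; i >= 0 && regex.charAt(i) == '\\'; i--) {
            backslashes++;
        }
        return backslashes % 2 == 1;
    }

    /** True if the regex contains an escaped class letter, i.e. needs java.util.regex semantics. */
    static boolean usesPredefinedClass(String regex) {
        for (int i = 0; i < regex.length(); i++) {
            if (CLASS_LETTERS.indexOf(regex.charAt(i)) >= 0 && isEscaped(regex, i)) {
                return true;
            }
        }
        return false;
    }
}
```

With this rule, the regex \d and the regex \\\d are both reported as using a predefined class, while \\d (a literal backslash followed by the letter d) is not.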
[jira] Commented: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784849#action_12784849 ]

Thejas M Nair commented on PIG-965:
---

In the above performance numbers, I assume optimization 2 (custom string comparison) is used only for the regex .*ABCD.*, while optimization 1 (re-using the compiled pattern) is used with dk.brics.automaton as well. Can you please confirm?

From the performance numbers, it looks like we don't need to do optimization 2. We can just use dk.brics.automaton for the common regexes as well and keep the Pig code simpler.
[jira] Commented: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784596#action_12784596 ]

Ankit Modi commented on PIG-965:
---

I implemented a patch with optimizations 1 and 2 mentioned above, and another patch with optimizations 1 and 2 plus dk.brics.automaton.

dk.brics.automaton does not support all features of java.util.regex, so the second patch takes that into account and switches to java.util.regex if the regex can only be handled by java.util.regex.

Here are the numbers:

|| Regex || svn_trunk || Optimization 1 and 2 || dk.brics.automaton || comments ||
| .\*ABCD.\* | 92.74 | 50.92 | 49.32 | Here only optimization 2 is used |
| .\*[A-F]{2,3}.\* | 152.3 | 133.48 | 105.93 | dk.brics.automaton is used |
| A.B.C.D | 54.492 | 44.46 | 44.66 | dk.brics.automaton is used |
| .\*([A-F]{4})\w\*\1.\* | 129.29 | 112.89 | 109.43 | java.util.regex used in all cases |
| .\*\[A-F\]\{4\}\w\*[N-Z]\{3\}.\* | 129.63 | 108.11 | 54.42 | dk.brics.automaton used |

These results were obtained using Local Mode on 1 Billion lines of data in the following format:
f1:Chararray(100) of random chars from [A-Z]
f2:int random integer

dk.brics.automaton provides good performance for complex regexes.
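The "switch to java.util.regex when the faster engine cannot handle the pattern" behaviour might look roughly like the sketch below. It does not depend on dk.brics.automaton: the fastCompiler argument is a hypothetical stand-in for whatever builds the faster matcher, and the sketch assumes that this stand-in signals an unsupported pattern by throwing IllegalArgumentException. RegexFallbackSketch is likewise an illustrative name, not a class from the patch.

```java
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.regex.Pattern;

/** Hypothetical sketch: prefer a fast matcher, fall back to java.util.regex when it refuses. */
final class RegexFallbackSketch {
    static Predicate<String> compile(String regex,
                                     Function<String, Predicate<String>> fastCompiler) {
        try {
            // Assumption: the fast engine rejects patterns it cannot handle.
            return fastCompiler.apply(regex);
        } catch (IllegalArgumentException e) {
            // Fall back to the full-featured engine; compile once, reuse per row.
            Pattern p = Pattern.compile(regex);
            return s -> p.matcher(s).matches();
        }
    }
}
```

A rejection by the fast engine is only one of the cases discussed in this thread; patterns that the fast engine accepts but interprets differently would still need an up-front check before it is chosen.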
[jira] Commented: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766985#action_12766985 ]

Thejas M Nair commented on PIG-965:
---

I found another regex library that is supposed to be faster than java.util.regex: dk.brics.automaton.RegExp (BSD license, used in Apache Nutch). It does not support all features of Java regex, but it is a candidate that can be used for the purposes of this patch (common, simpler regexes).

It is faster than Java regex, but much slower than 'optimization 2' (see numbers in the code comments below).

{code}
String prefix = "123";
Pattern p = Pattern.compile("123.*");
RegExp r = new RegExp("123.*");
Automaton a = r.toAutomaton();
while ((str = in.readLine()) != null) {
    // optimization 1 - takes 30 secs
    //if ((p.matcher(str).matches()))
    //    matches++;

    // optimization 2 - takes 15 secs
    // (assumes str is at least as long as prefix)
    //int len = prefix.length();
    //boolean matched = true;
    //for (int i = 0; i < len; i++) {
    //    if (prefix.charAt(i) != str.charAt(i)) {
    //        matched = false;
    //        break;
    //    }
    //}
    //if (matched)
    //    matches++;

    // dk.brics.automaton - takes 25 secs
    //if (a.run(str))
    //    matches++;
    tot++;
}
{code}
[jira] Commented: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757242#action_12757242 ]

Thejas M Nair commented on PIG-965:
---

The 'common' use case to which these optimizations apply usually has a constant string specifying the pattern. It makes sense to use this optimization (specifically optimization 2) only in such cases, so that the worst case is not worse off.

Another thing to check is whether there are alternative, faster regex implementations.
[jira] Commented: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756673#action_12756673 ]

Thejas M Nair commented on PIG-965:
---

The Hive like clause implementation is here:
http://svn.apache.org/viewvc/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLike.java?revision=802066&view=markup

I ran simple tests with a simple Java program to see the impact of these optimizations. Optimization 1 reduces runtime to 1/2; optimization 2 reduces runtime to 1/4.

{code}
int matches = 0;
int tot = 0;
String prefix = "123";
Pattern p = Pattern.compile("123.*");
while ((str = in.readLine()) != null) {
    // without proposed optimizations
    // test setups 1 and 2 took 9 secs, 126 secs respectively
    //if (str.matches("123.*"))
    //    matches++;

    // with optimization 1
    // test setups 1, 2 took 4, 57 secs respectively
    //if ((p.matcher(str).matches()))
    //    matches++;

    // with optimization 2
    // test setups 1, 2 took 2.5, 25 secs respectively
    // (assumes str is at least as long as prefix)
    //int len = prefix.length();
    //boolean matched = true;
    //for (int i = 0; i < len; i++) {
    //    if (prefix.charAt(i) != str.charAt(i)) {
    //        matched = false;
    //        break;
    //    }
    //}
    //if (matched)
    //    matches++;
    tot++;
}
System.out.println("matches " + matches + " tot " + tot);
{code}