Thejas M Nair commented on PIG-965:

The 'common' use case to which these optimizations apply usually has a constant 
string specifying the pattern. It makes sense to apply the optimizations 
(specifically optimization 2) only in such cases, so that the worst case does 
not get slower.
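
A rough sketch of what restricting optimization 2 to constant patterns could 
look like. The class ConstantPatternMatcher and its methods are hypothetical 
names, not existing Pig code. It assumes the pattern is a Java regex, so the 
'abc%' style examples in the issue are written here as 'abc.*'. The pattern is 
classified once at construction time, so the per-row cost is never higher than 
a plain regex match. One caveat the sketch ignores: '.' does not match line 
terminators by default, so the string fast paths can differ from the regex for 
inputs that contain newlines.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not Pig's PORegex implementation.
final class ConstantPatternMatcher {
    private enum Kind { PREFIX, SUFFIX, CONTAINS, REGEX }

    private static final String WILDCARD = ".*";
    // Characters that have a special meaning in a Java regex.
    private static final String META = "\\.[]{}()*+?^$|";

    private final Kind kind;
    private final String literal;   // used by the string fast paths
    private final Pattern compiled; // used only when kind == REGEX

    // Called once, when the rhs of 'matches' is known to be a constant.
    ConstantPatternMatcher(String regex) {
        String core = regex;
        boolean lead = core.startsWith(WILDCARD);
        if (lead) {
            core = core.substring(WILDCARD.length());
        }
        boolean trail = core.endsWith(WILDCARD);
        if (trail) {
            core = core.substring(0, core.length() - WILDCARD.length());
        }

        if ((lead || trail) && isLiteral(core)) {
            kind = lead ? (trail ? Kind.CONTAINS : Kind.SUFFIX) : Kind.PREFIX;
            literal = core;
            compiled = null;
        } else {
            // Anything else falls back to the general regex engine.
            kind = Kind.REGEX;
            literal = null;
            compiled = Pattern.compile(regex);
        }
    }

    // True when no character of s is a regex metacharacter.
    private static boolean isLiteral(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (META.indexOf(s.charAt(i)) >= 0) {
                return false;
            }
        }
        return true;
    }

    boolean usesStringComparison() {
        return kind != Kind.REGEX;
    }

    boolean matches(String value) {
        switch (kind) {
            case PREFIX:   return value.startsWith(literal);
            case SUFFIX:   return value.endsWith(literal);
            case CONTAINS: return value.contains(literal);
            default:       return compiled.matcher(value).matches();
        }
    }
}
```

With this shape, new ConstantPatternMatcher("abc.*") would use startsWith, 
while a pattern such as "a[bc]+.*" keeps going through the regex engine.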

Another thing to check is whether there are faster alternative regex implementations.

> PERFORMANCE: optimize common case in matches (PORegex)
> ------------------------------------------------------
>                 Key: PIG-965
>                 URL: https://issues.apache.org/jira/browse/PIG-965
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Thejas M Nair
> Some frequently seen use cases of the 'matches' comparison operator have the 
> following properties -
> 1. The rhs is a constant string, e.g. "c1 matches 'abc%'".
> 2. Regexes that look for a matching prefix, suffix, etc. are very common, 
> e.g. 'abc%', '%abc', '%abc%'.
> To optimize for these common cases, PORegex.java can be changed to -
> 1. Compile the pattern (rhs of matches) and re-use it if the pattern string 
> has not changed.
> 2. Use string comparisons for simple common regexes (in 2 above).
> The implementation of the Hive 'like' clause uses similar optimizations.
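
Optimization 1 in the description above could be sketched roughly as follows. 
CachedRegexMatcher is a hypothetical class, not the actual PORegex code, and 
the compileCount field exists only to make the caching visible. The sketch 
keeps a single cached Pattern and is not thread-safe.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of "compile once, re-use while the pattern
// string is unchanged"; not taken from Pig's source.
final class CachedRegexMatcher {
    private String lastRegex;      // pattern string seen on the previous call
    private Pattern lastCompiled;  // compiled form of lastRegex
    private int compileCount;      // illustration only: number of compilations

    boolean matches(String value, String regex) {
        // Recompile only when the pattern string differs from the cached one.
        if (lastCompiled == null || !regex.equals(lastRegex)) {
            lastCompiled = Pattern.compile(regex);
            lastRegex = regex;
            compileCount++;
        }
        return lastCompiled.matcher(value).matches();
    }

    int compileCount() {
        return compileCount;
    }
}
```

When the rhs is a constant, every row after the first pays only for the 
string equality check and the match itself, not for Pattern.compile.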
