Yes, I think so too: it should behave like 'matches' and REGEX_EXTRACT_ALL. The problem is with REGEX_EXTRACT.
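
To make the difference concrete, here is a minimal standalone sketch. It is not Pig's actual code: the class name RegexExtractSketch and the two helper methods are hypothetical and only use java.util.regex. One helper extracts a group with Matcher.find(), which is what REGEX_EXTRACT appears to do, and the other uses Matcher.matches(), which is the behavior I would expect.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not the Pig implementation.
class RegexExtractSketch {

    // find() accepts the first subsequence that matches the pattern,
    // so a reluctant group such as (.+?) can stop after one character.
    public static String extractWithFind(String input, String regex, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    // matches() requires the entire input to match the pattern,
    // so the reluctant group has to grow until the rest of the pattern fits.
    public static String extractWithMatches(String input, String regex, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String value = "hdfs://mygrid.com/projects/";
        String regex = "(.+?)/?";
        System.out.println(extractWithFind(value, regex, 1));    // h
        System.out.println(extractWithMatches(value, regex, 1)); // hdfs://mygrid.com/projects
    }
}
```

With the same input and pattern, the find() variant returns only the first character, while the matches() variant returns the path without the trailing slash.
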
A simple workaround is to use REGEX_EXTRACT_ALL, e.g.
REGEX_EXTRACT_ALL(value, '(.+?)/?').$0, but that defeats the purpose of
having REGEX_EXTRACT, and the bug is difficult to spot.

I filed an issue: https://issues.apache.org/jira/browse/PIG-2514

Romain

On Sun, Feb 5, 2012 at 9:41 PM, Dmitriy Ryaboy <[email protected]> wrote:

> I think the intent is to behave the same way as the Pig "matches" operator
> (which, unsurprisingly, uses the Java matches method).
>
> RegexExtractAll becomes quite confusing if it means "extract all matched
> subexpressions of the first match of the expression" (one might expect
> "all" to refer to all matches of the expression itself).
>
> At the very least, the behavior should be documented.
>
> On Fri, Feb 3, 2012 at 5:29 PM, Romain Rigaux <[email protected]> wrote:
>
> > Hello,
> >
> > REGEX_EXTRACT is using Matcher.find() instead of Matcher.matches() and so
> > does not work with some non-greedy regular expressions.
> >
> > Is this the intended behavior?
> >
> > Thanks,
> >
> > Romain
> >
> > http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Matcher.html
> >
> > - The matches
> >   <http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Matcher.html#matches()>
> >   method attempts to match the entire input sequence against the pattern.
> > - The find
> >   <http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Matcher.html#find()>
> >   method scans the input sequence looking for the next subsequence that
> >   matches the pattern.
> >
> > System.out.println("Pig's way with m.find()");
> > String a = "hdfs://mygrid.com/projects/";
> > Matcher m = Pattern.compile("(.+?)/?").matcher(a);
> > System.out.println(m.find());
> > System.out.println(m.group(1));
> > System.out.println(m.start());
> > System.out.println(m.end());
> >
> > System.out.println("\nm.matches()");
> > a = "hdfs://mygrid.com/projects/";
> > m = Pattern.compile("(.+?)/?").matcher(a);
> > System.out.println(m.matches());
> > System.out.println(m.group(1));
> > System.out.println(m.start());
> > System.out.println(m.end());
> >
> > System.out.println("\nREGEX_EXTRACT m.find()");
> > Tuple t = TupleFactory.getInstance().newTuple();
> > t.append(a);
> > t.append("(.+?)/?");
> > t.append(1);
> > System.out.println(new TestPigExtractAll().new REGEX_EXTRACT().exec(t));
> >
> > Output:
> >
> > Pig's way with m.find()
> > true
> > h
> > 0
> > 1
> >
> > m.matches()
> > true
> > hdfs://mygrid.com/projects
> > 0
> > 27
> >
> > REGEX_EXTRACT m.find()
> > h
