Hey Jonathan,

You need to escape the backslash itself as well, because the backslash is
the escape character in Pig string literals:

b = FILTER a BY REGEX_EXTRACT(f1,'(.*)\\:(.*)',1) == 'hi';

If you wanted to match a literal backslash in the input, the pattern
would need four of them: '\\\\'.
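
The same doubling should apply to the "." you mentioned: written as '\\.'
in the Pig script, it reaches the regex engine as \. and matches a literal
dot. As far as I know, Pig passes the pattern to java.util.regex, and Java
string literals unescape backslashes the same way, so here is a rough Java
sketch of the behaviour. The class and method names (EscapeDemo, extract)
are made up for illustration and are not part of Pig.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that approximates REGEX_EXTRACT(input, regex, group).
// It is not Pig's implementation, only a sketch of the escaping rules.
class EscapeDemo {

    // Returns the requested capture group, or null if the pattern is not found.
    static String extract(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        // The literal "\\:" is unescaped to the regex \: (a literal colon).
        System.out.println(extract("(.*)\\:(.*)", "hi:1", 1));

        // The literal "\\." is unescaped to the regex \. (a literal dot).
        System.out.println(extract("(.*)\\.(.*)", "a.b", 1));

        // An unescaped dot matches any character, so "axb" still matches.
        System.out.println(extract("(.*).(.*)", "axb", 1));

        // Four backslashes in the literal become the regex \\ (one backslash).
        System.out.println(extract("(.*)\\\\(.*)", "a\\b", 1));
    }
}
```

Each string literal loses one level of backslashes before the regex engine
sees it, which is why everything has to be doubled.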

Best,
-Sven


On Tue, Apr 19, 2011 at 4:00 PM, Jonathan Hoover <[email protected]> wrote:

> Hello,
>
> I am having a problem escaping a ":" and a "." in a regular expression
> within the REGEX_EXTRACT() function shown at
> http://pig.apache.org/docs/r0.8.0/piglatin_ref2.html#REGEX_EXTRACT. Here's
> a simplified example, though the example in the docs gives me the problem as
> well. I've tried it without the "\" in front of the ":", but that doesn't
> work right either (returns the whole line). So, how do I escape the ":", and
> also I need to escape a "." as well in my actual script.
>
> ------INPUT FILE------
> hi:1    num1    num2    num3
> hi:20   num1    blah    boo
> ho:30   num1    blah    foo
> bar:30  foo     foo     foo
> bar:40  foo     far     away
> bar:40  far     far     far
>
> ------PIG SCRIPT------
> a = LOAD 'fromabs-colons' USING PigStorage AS (f1,f2,f3,f4);
> b = FILTER a BY REGEX_EXTRACT(f1,'(.*)\:(.*)',1) == 'hi';
> DUMP b;
>
> ------WHAT I EXPECT---
> (hi:1,num1,num2,num3)
> (hi:20,num1,blah,foo)
>
> ------ERROR I GET-----
> 2011-04-19 22:55:43,844 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1000: Error during parsing. Lexical error at line 1, column 40.
>  Encountered: ":" (58), after : "\'(.*)\\"
>
> ------PIG VERSION-----
> Apache Pig version 0.8.0..1103222002 (r1084466)
>
