That was Pig 0.10. This line:

matched = FOREACH counts_raw GENERATE com.kebinger.pigbat.BYTES_TO_INT(key,0) as portal_id, (html matches '(?s).*generator" content="WordPress.*|.*wp-content.*') as wp_match:boolean;

gives me the error:

ERROR 1200: <file count_wordpress_pages.pig, line 18, column 93> Syntax error, unexpected symbol at or near 'html'

Taking off the parens gives:

ERROR 1200: <file count_wordpress_pages.pig, line 18, column 97> mismatched input 'matches' expecting SEMI_COLON

Converting to an int, as suggested later in the thread, does work:

matched = FOREACH counts_raw GENERATE com.kebinger.pigbat.BYTES_TO_INT(key,0) as portal_id, (html matches '(?s).*generator" content="WordPress.*|.*wp-content.*' ? 1 : 0) as wp_match:int;

So the int approach is a nice workaround.

On Thu, Sep 27, 2012 at 12:38 PM, Alan Gates <[email protected]> wrote:
> What version of Pig are you using?
>
> Alan.
>
> On Sep 27, 2012, at 8:54 AM, James Kebinger wrote:
>
> > Hello, I'm having some trouble doing something I thought would be easy:
> > I'd like to use matches to generate a boolean flag, but this doesn't
> > seem to compile:
> >
> > FOREACH html_pages GENERATE portal_id, html matches 'some pattern' as
> > wp_match:boolean;
> >
> > I've tried wrapping it in parens too, with no luck.
> >
> > Is this possible, or am I out of luck?
> >
> > thanks
> >
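For anyone puzzling over the pattern itself: Pig's `matches` is documented as using Java regular expressions and requiring the whole string to match. Assuming those semantics, the plain-Java sketch below mimics the `? 1 : 0` workaround with `Matcher.matches()`. The class name, method name and sample HTML are made up for illustration. It also shows why the leading `(?s)` (DOTALL) matters: without it, `.` stops at newlines, so a multi-line HTML page never matches as a whole.

```java
import java.util.regex.Pattern;

public class WpMatchDemo {
    // Same regex as in the Pig script. (?s) enables DOTALL for the whole
    // pattern (both alternatives), so '.' also matches line breaks.
    private static final Pattern WP = Pattern.compile(
        "(?s).*generator\" content=\"WordPress.*|.*wp-content.*");

    // Hypothetical helper mirroring the bincond workaround:
    // 1 if the ENTIRE string matches (Matcher.matches), else 0.
    static int wpMatch(String html) {
        return WP.matcher(html).matches() ? 1 : 0;
    }

    public static void main(String[] args) {
        // Made-up sample pages, each spanning several lines.
        String wp = "<html>\n<head>\n<meta name=\"generator\" content=\"WordPress 3.4\" />\n</head>\n</html>";
        String plugin = "<html>\n<script src=\"/wp-content/plugins/x.js\"></script>\n</html>";
        String other = "<html>\n<body>hello</body>\n</html>";

        System.out.println(wpMatch(wp));     // 1
        System.out.println(wpMatch(plugin)); // 1
        System.out.println(wpMatch(other));  // 0

        // Same pattern without (?s): '.*' cannot cross the newlines,
        // so a whole-string match on the multi-line page fails.
        Pattern noDotAll = Pattern.compile(
            ".*generator\" content=\"WordPress.*|.*wp-content.*");
        System.out.println(noDotAll.matcher(wp).matches()); // false
    }
}
```

Because `matches` anchors at both ends, the leading and trailing `.*` are what let the pattern find the marker anywhere in the page; dropping them would only match pages consisting of nothing but the marker text.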
