Hello, I'm having an issue with regular expressions in Pig.
Specifically, I'm loading an Apache access log and trying to break out the parameters from the query string:

    logs = LOAD '$input' using logloader as (remoteHost:CHARARRAY, hyphen:CHARARRAY, hyphen2:CHARARRAY,
        time:CHARARRAY, method:CHARARRAY, uri:CHARARRAY, protocol:CHARARRAY, statusCode:CHARARRAY,
        responseSize:CHARARRAY, treferer:CHARARRAY, agent:CHARARRAY);
    full_logs = FOREACH logs GENERATE time, uri, FLATTEN(REGEX_EXTRACT(uri, 'id=[0-9]', 2));

The uri looks like:

    /khello.html?ref=http%3A%2F%2Fwww.google.com%2F&k=4165427574dfdb75e0a37a8c13ab757d4273a283&id=1234

However, when I run this simple Pig script, I get the uri but not the id parameter. I then tried \d instead of [0-9], and it still doesn't work. In PHP, both [0-9] and \d work (I get 'id=1' and '1'), so I'm not sure what I'm doing wrong.

Thanks in advance.
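For reference, here is a small Java sketch of what I think is going on. It assumes (I haven't confirmed this in the Pig source) that REGEX_EXTRACT is backed by java.util.regex and that its third argument is a capture-group number, where group 0 is the whole match. The regexExtract helper is my own hypothetical stand-in, not Pig's API. If that assumption holds, my pattern 'id=[0-9]' has no capture groups at all, so asking for group 2 can only yield nothing, and [0-9] without a quantifier matches a single digit anyway.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExtractDemo {
    // Hypothetical stand-in for Pig's REGEX_EXTRACT(input, regex, index).
    // Assumption: it returns capture group `index` of the first match, or
    // null when there is no match or the pattern has no such group.
    public static String regexExtract(String input, String regex, int index) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find() || index > m.groupCount()) {
            return null;
        }
        return m.group(index);
    }

    public static void main(String[] args) {
        String uri = "/khello.html?ref=http%3A%2F%2Fwww.google.com%2F"
                + "&k=4165427574dfdb75e0a37a8c13ab757d4273a283&id=1234";

        // Original pattern: zero capture groups, so group 2 does not exist.
        System.out.println(regexExtract(uri, "id=[0-9]", 2));    // null
        // Group 0 is the whole match; [0-9] alone matches just one digit.
        System.out.println(regexExtract(uri, "id=[0-9]", 0));    // id=1
        // Wrap one-or-more digits in a group and ask for group 1.
        System.out.println(regexExtract(uri, "id=([0-9]+)", 1)); // 1234
        // Same with \d; the backslash is doubled inside a Java string literal.
        System.out.println(regexExtract(uri, "id=(\\d+)", 1));   // 1234
    }
}
```

If that is right, the Pig equivalent would presumably be REGEX_EXTRACT(uri, 'id=([0-9]+)', 1). I also suspect that Pig string literals, like Java ones, need the backslash doubled ('id=(\\d+)'), which may be why \d seemed not to work either.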
