I ended up fixing this issue. I did change it to a bag afterwards, but the main problem was that regexextractall was returning everything as a string (via group), which meant that max, avg, etc. could not find a matching function for what should have been a bag of tuples of doubles.
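The UDF itself isn't included in this thread, so here is a minimal, hypothetical Java sketch of the typing idea described in this message: inspect each capture group's regex text, turn groups made only of `\d` into `Long`, and leave everything else (including `\w`) as a `String`. Treating `\d+\.\d+` as `Double` is an added assumption, since averages are usually wanted over doubles. It is plain Java with no Pig dependency; `TypedExtractAll`, `groupBodies`, `inferKind` and `extractAll` are made-up names, and wrapping this in a Pig `EvalFunc` that returns a `Tuple` is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone sketch (no Pig dependency): extract all capture groups
// and give each one a Java type inferred from that group's regex text.
class TypedExtractAll {
    enum Kind { LONG, DOUBLE, STRING }

    // Returns the source text of each capturing group, in the order Java numbers
    // them (by opening parenthesis). Assumes no non-capturing groups, lookarounds,
    // or parentheses inside character classes.
    static List<String> groupBodies(String regex) {
        List<String> bodies = new ArrayList<>();
        Deque<int[]> open = new ArrayDeque<>(); // {body start index, group slot}
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character, e.g. \( or \d
            } else if (c == '(') {
                open.push(new int[] {i + 1, bodies.size()});
                bodies.add(null); // reserve the slot for this group number
            } else if (c == ')' && !open.isEmpty()) {
                int[] o = open.pop();
                bodies.set(o[1], regex.substring(o[0], i));
            }
        }
        return bodies;
    }

    // Digits only (\d+, \d\d, ...) -> LONG; digits, \. , digits -> DOUBLE;
    // anything else, including \w, -> STRING.
    static Kind inferKind(String body) {
        if (body.matches("(\\\\d[+*?]?)+")) return Kind.LONG;
        if (body.matches("(\\\\d[+*?]?)+\\\\\\.(\\\\d[+*?]?)+")) return Kind.DOUBLE;
        return Kind.STRING;
    }

    // Returns one typed value per group, or null if the whole input does not match.
    static List<Object> extractAll(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) return null;
        List<String> bodies = groupBodies(regex);
        List<Object> out = new ArrayList<>();
        for (int g = 1; g <= m.groupCount(); g++) {
            String s = m.group(g);
            String body = g - 1 < bodies.size() ? bodies.get(g - 1) : null;
            Kind k = body == null ? Kind.STRING : inferKind(body);
            if (s == null || (s.isEmpty() && k != Kind.STRING)) {
                out.add(null); // unmatched group, or nothing to parse
            } else if (k == Kind.LONG) {
                out.add(Long.parseLong(s));
            } else if (k == Kind.DOUBLE) {
                out.add(Double.parseDouble(s));
            } else {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object> row = extractAll("GET /index.html 200 0.25",
                "(\\w+) (\\S+) (\\d+) (\\d+\\.\\d+)");
        System.out.println(row); // [GET, /index.html, 200, 0.25]
        System.out.println(row.get(3).getClass().getSimpleName()); // Double
    }
}
```

The point of the design is that one generic function covers every log line format: the regexp alone decides which fields come back as numbers rather than strings, so there is no need for a separate UDF per line type.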
I ended up writing a new UDF for extractall that returns types based on whether \d or \w was used in the regexp. Flattening the extracted output to specific types didn't work, but the new UDF solved the issue. I'd appreciate feedback on the UDF and the approach; I will post it on pastebin early next week. If there's a better way, please let me know. I went down this route because I wanted to avoid writing a new UDF for each log line type I needed to parse.

Many thanks,
Jon

On 24 Jun 2011, at 23:45, Dmitriy Ryaboy wrote:
