[ 
https://issues.apache.org/jira/browse/PIG-472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638468#action_12638468
 ] 

Ian Holsman commented on PIG-472:
---------------------------------

Hey guys,

I just noticed that RegexLoader stops when it hits a bad line (one that doesn't 
match the regex pattern). 
I modified the loader so that it skips those bad lines and continues, logging a 
warning for each one (by changing the 'if' to a 'while').

svn isn't working for me right now, so excuse the paste:

        // Keep reading until a line matches; non-matching lines are skipped.
        while ((line = in.readLine(utf8, recordDel)) != null) {
            // Strip a trailing carriage return left over from CRLF input.
            if (line.length() > 0 && line.charAt(line.length() - 1) == '\r')
                line = line.substring(0, line.length() - 1);

            matcher.reset(line);
            if (matcher.find()) {
                ArrayList<Datum> list = new ArrayList<Datum>();

                // Each capture group becomes one DataAtom in the tuple.
                for (int i = 1; i <= matcher.groupCount(); i++) {
                    list.add(new DataAtom(matcher.group(i)));
                }
                return new Tuple(list);
            }
            else {
                log.warn("Warning: Line " + line + " did not match the regex. Skipping it");
            }
        }
        // End of input: no more tuples.
        return null;
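
For anyone who wants to try the skip-and-continue behaviour outside of Pig, here 
is a minimal standalone sketch of the same loop. It is not the RegExLoader code: 
the class name SkippingRegexReader, the getNext() and getSkipped() methods, and 
the use of plain strings in place of DataAtom/Tuple are all assumptions made for 
illustration, and it writes to System.err where the loader uses its logger.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the loader: returns capture groups as strings.
final class SkippingRegexReader {
    private final BufferedReader in;
    private final Matcher matcher;
    private int skipped = 0;

    SkippingRegexReader(BufferedReader in, String regex) {
        this.in = in;
        // One reusable Matcher, reset for every line.
        this.matcher = Pattern.compile(regex).matcher("");
    }

    // Returns the groups of the next matching line, or null at end of input.
    List<String> getNext() {
        try {
            String line;
            // 'while' rather than 'if': a bad line no longer ends the read.
            while ((line = in.readLine()) != null) {
                matcher.reset(line);
                if (matcher.find()) {
                    List<String> fields = new ArrayList<>();
                    for (int i = 1; i <= matcher.groupCount(); i++) {
                        fields.add(matcher.group(i));
                    }
                    return fields;
                }
                skipped++;
                System.err.println("Line did not match the regex, skipping it: " + line);
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Number of lines skipped so far because they did not match.
    int getSkipped() {
        return skipped;
    }
}
```

Calling getNext() repeatedly until it returns null walks every matching line, 
and getSkipped() then tells you how many lines were dropped along the way. 
BufferedReader.readLine() already strips the line terminator, so the manual 
'\r' handling from the paste is not needed in this sketch.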



> load files based on user provided regular expressions
> -----------------------------------------------------
>
>                 Key: PIG-472
>                 URL: https://issues.apache.org/jira/browse/PIG-472
>             Project: Pig
>          Issue Type: New Feature
>          Components: data, grunt
>    Affects Versions: 0.1.0
>            Reporter: Earl Cahill
>             Fix For: 0.1.0
>
>         Attachments: RegExLoader-PIG-472
>
>
> Want to be able to load files based on regular expressions.  Each group 
> specified in parentheses should end up as a DataAtom, and the list of 
> DataAtoms should end up in a Tuple.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
