[
https://issues.apache.org/jira/browse/PIG-472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638468#action_12638468
]
Ian Holsman commented on PIG-472:
---------------------------------
Hey guys,
I just noticed that RegexLoader stops when it finds a bad line (one that
doesn't match the regex pattern).
I modified the loader so that it skips those bad lines and continues, logging
a warning for each one (changing the 'if' to a 'while').
svn isn't working for me right now, so excuse the paste:
    while ((line = in.readLine(utf8, recordDel)) != null) {
        // Strip a trailing carriage return left over from CRLF line endings.
        if (line.length() > 0 && line.charAt(line.length() - 1) == '\r')
            line = line.substring(0, line.length() - 1);

        matcher.reset(line);
        if (matcher.find()) {
            // One DataAtom per capture group, in order.
            ArrayList<Datum> list = new ArrayList<Datum>();

            for (int i = 1; i <= matcher.groupCount(); i++) {
                list.add(new DataAtom(matcher.group(i)));
            }
            return new Tuple(list);
        } else {
            // Bad line: log it and keep reading instead of stopping.
            log.warn("Warning: Line " + line + " did not match the regex. Skipping it");
        }
    }
    // No more input.
    return null;
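
For anyone who wants to try the skip-and-continue behaviour outside of Pig, here is a minimal standalone sketch of the same loop using only the JDK. It is not the RegexLoader code: the class name SkippingRegexReader, the next() method and the getSkipped() counter are made up for illustration, plain strings stand in for DataAtom/Tuple, and System.err stands in for the logger.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the loader: returns the capture groups of each
// matching line and skips (but counts) lines that do not match.
class SkippingRegexReader {
    private final BufferedReader in;
    private final Matcher matcher;
    private int skipped = 0;

    SkippingRegexReader(BufferedReader in, Pattern pattern) {
        this.in = in;
        // Create the matcher once and reset it for every line.
        this.matcher = pattern.matcher("");
    }

    // Returns the groups of the next matching line, or null at end of input.
    List<String> next() {
        try {
            String line;
            // BufferedReader.readLine() already strips "\n", "\r" and "\r\n",
            // so no manual '\r' trimming is needed in this sketch.
            while ((line = in.readLine()) != null) {
                matcher.reset(line);
                if (matcher.find()) {
                    List<String> groups = new ArrayList<>();
                    for (int i = 1; i <= matcher.groupCount(); i++) {
                        groups.add(matcher.group(i));
                    }
                    return groups;
                }
                // Bad line: note it and keep looping instead of stopping.
                skipped++;
                System.err.println("Line " + line
                        + " did not match the regex. Skipping it");
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int getSkipped() {
        return skipped;
    }
}
```

Calling next() repeatedly until it returns null walks through every matching line. The important part is that the non-matching branch falls through to the next loop iteration rather than returning, which is the same effect as the 'if' to 'while' change above.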
> load files based on user provided regular expressions
> -----------------------------------------------------
>
> Key: PIG-472
> URL: https://issues.apache.org/jira/browse/PIG-472
> Project: Pig
> Issue Type: New Feature
> Components: data, grunt
> Affects Versions: 0.1.0
> Reporter: Earl Cahill
> Fix For: 0.1.0
>
> Attachments: RegExLoader-PIG-472
>
>
> Want to be able to load files based on regular expressions. Each group
> specified in parentheses should end up as a DataAtom, and the list of
> DataAtoms should end up in a Tuple.