[ 
https://issues.apache.org/jira/browse/PIG-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Zaliva updated PIG-593:
-----------------------------

    Status: Patch Available  (was: Open)

deco /Users/lord/java/pig-0.1.1/contrib/piggybank/java/src> svn diff
Index: main/java/org/apache/pig/piggybank/storage/RegExLoader.java
===================================================================
--- main/java/org/apache/pig/piggybank/storage/RegExLoader.java (revision 730029)
+++ main/java/org/apache/pig/piggybank/storage/RegExLoader.java (working copy)
@@ -57,7 +57,8 @@
         Matcher matcher = pattern.matcher("");
 
         String line;
-        if ((line = in.readLine(utf8, recordDel)) != null) {
+        while((line = in.readLine(utf8, recordDel)) != null) 
+        {
             if (line.length() > 0 && line.charAt(line.length() - 1) == '\r')
                 line = line.substring(0, line.length() - 1);
 





> RegExLoader stops on a non-matching line
> ----------------------------------------
>
>                 Key: PIG-593
>                 URL: https://issues.apache.org/jira/browse/PIG-593
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.1.0
>            Reporter: Vadim Zaliva
>
> Class RegExLoader and all its subclasses stop if any line does not match the
> provided regular expression.
> In particular, I have noticed this when CombinedLogLoader stopped at the 
> following line:
> 58.210.62.24 - - [29/Dec/2008:23:06:57 -0800] "GET /tor/browse/?id=24746&rel=FLY999%40Jack's+Teen+America+22%2FFLY999原創%40單掛D.C.資訊交流網+Jack's+Teen+America+22+cd1.avi HTTP/1.1" 8952 200 "http://img252.imageshack.us/tor/browse/?id=24746&rel=FLY999%40Jack%27s+Teen+America+22" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; )" "-"
> It looks like some Japanese characters here do not match the \S expression used.
> In general, I expect it to skip such lines rather than stop processing the data file.
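
The skip-instead-of-stop behaviour requested above can be illustrated outside Pig with a minimal sketch. It is not the RegExLoader code: the class name, the firstGroups helper, the sample pattern, and the sample input are all hypothetical, and the sketch collects every match in one pass, whereas the real loader presumably hands back one record per call. The only point is that a `while` loop moves on to the next line when a line does not match, where an `if` would give up after the first one.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SkipNonMatchingLines {

    // Hypothetical helper (not part of Pig): returns the first capture group
    // of every line that matches the pattern and skips lines that do not.
    public static List<String> firstGroups(String text, Pattern pattern) {
        List<String> result = new ArrayList<>();
        Matcher matcher = pattern.matcher("");
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line;
            // 'while' rather than 'if': a non-matching line simply falls
            // through to the next read instead of ending the scan.
            while ((line = in.readLine()) != null) {
                matcher.reset(line);
                if (matcher.find()) {
                    result.add(matcher.group(1));
                }
            }
        } catch (IOException e) {
            // A StringReader should not fail; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample input: the middle line does not match the pattern.
        String data = "10.0.0.1 GET /a\nnot a log line\n10.0.0.2 GET /b\n";
        Pattern p = Pattern.compile("^(\\S+) GET (\\S+)$");
        System.out.println(firstGroups(data, p)); // prints [10.0.0.1, 10.0.0.2]
    }
}
```

With an `if` in place of the `while`, the same input would stop producing output as soon as the scan reached "not a log line".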

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.