RegExLoader hangs on lines that don't match the regular expression
------------------------------------------------------------------

                 Key: PIG-1449
                 URL: https://issues.apache.org/jira/browse/PIG-1449
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.7.0
            Reporter: Justin Sanders
            Priority: Minor


In the 0.7.0 changes to RegExLoader there was a bug introduced where the code 
will stay in the while loop if the line isn't matched.  Before 0.7.0 these 
lines would be skipped if they didn't match the regular expression.  The result 
is the mapper will not respond and will time out with "Task attempt_X failed to 
report status for 600 seconds. Killing!".

Here are the steps to recreate the bug:

Create a text file in HDFS with the following lines:

test1
testA
test2

Run the following pig script:

REGISTER /usr/local/pig/contrib/piggybank/java/piggybank.jar;
test = LOAD '/path/to/test.txt' using 
org.apache.pig.piggybank.storage.MyRegExLoader('(test\\d)') AS (line);
dump test;

Expected result:

(test1)
(test3)

Actual result:

Job fails to complete after 600 second timeout waiting on the mapper to 
complete.  The mapper hangs at 33% since it can process the first line but gets 
stuck into the while loop on the second line.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to