Fail underlying M/R jobs when concatenated gzip and bz2 files are provided as 
input
-----------------------------------------------------------------------------------

                 Key: PIG-1304
                 URL: https://issues.apache.org/jira/browse/PIG-1304
             Project: Pig
          Issue Type: New Feature
    Affects Versions: 0.6.0
            Reporter: Viraj Bhat


I have the following tab-separated text files, each compressed with bzip2 (\t below stands for a literal <TAB>):
{code}
$ bzcat A.txt.bz2 
1\ta
2\taa

$ bzcat B.txt.bz2
1\tb
2\tbb

$ cat *.bz2 > test/mymerge.bz2
$ bzcat test/mymerge.bz2
1\ta
2\taa
1\tb
2\tbb

$ hadoop fs -put test/mymerge.bz2 /user/viraj

{code}

I then run a Pig script to print the contents of the merged bz2 file.

{code}
A = load '/user/viraj/bzipgetmerge/mymerge.bz2' using PigStorage();
dump A;
{code}

I get only the records from the first of the bz2 files I concatenated:

(1,a)
(2,aa)

The M/R jobs neither fail nor emit any warning; they silently drop the
remaining records. Could we throw a warning or fail the underlying map job in
this case? Could this be done in Pig's Bzip2TextInputFormat class?
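
To illustrate the kind of check being asked for, here is a minimal Python sketch (not Pig or Hadoop code). It decodes only the first compressed stream, as a single-stream reader would, and reports any bytes left over after it. The function names `split_first_stream` and `require_single_stream` are hypothetical, and the sketch assumes the whole input fits in memory. Trailing bytes are treated as a sign of concatenation, although they could in principle be padding or garbage.

```python
import bz2
import zlib


def split_first_stream(data: bytes, kind: str):
    """Decode only the first compressed stream in `data`.

    Returns (payload, remainder), where remainder holds whatever bytes
    follow the end of the first stream.
    """
    if kind == "bz2":
        decoder = bz2.BZ2Decompressor()
    elif kind == "gzip":
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
    else:
        raise ValueError("unsupported kind: %r" % kind)

    payload = decoder.decompress(data)
    if not decoder.eof:
        raise ValueError("first %s stream is truncated" % kind)
    return payload, decoder.unused_data


def require_single_stream(data: bytes, kind: str) -> bytes:
    """Return the decoded payload, or fail if more data follows the first stream."""
    payload, remainder = split_first_stream(data, kind)
    if remainder:
        raise ValueError(
            "%d bytes follow the first %s stream; input looks concatenated"
            % (len(remainder), kind)
        )
    return payload


if __name__ == "__main__":
    a = bz2.compress(b"1\ta\n2\taa\n")
    b = bz2.compress(b"1\tb\n2\tbb\n")
    payload, remainder = split_first_stream(a + b, "bz2")
    print(payload.decode().splitlines())  # records from the first stream only
    print(len(remainder) == len(b))       # the second stream is left unread
```

A check of this shape would let a reader fail fast, or log a warning, instead of silently returning only the first stream's records.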

