Fail underlying M/R jobs when concatenated gzip and bz2 files are provided as input
-----------------------------------------------------------------------------------
Key: PIG-1304
URL: https://issues.apache.org/jira/browse/PIG-1304
Project: Pig
Issue Type: New Feature
Affects Versions: 0.6.0
Reporter: Viraj Bhat

I have the following text files, which are bzip2-compressed (\t = <TAB>):
{code}
$ bzcat A.txt.bz2
1\ta
2\taa
$ bzcat B.txt.bz2
1\tb
2\tbb
$ cat *.bz2 > test/mymerge.bz2
$ bzcat test/mymerge.bz2
1\ta
2\taa
1\tb
2\tbb
$ hadoop fs -put test/mymerge.bz2 /user/viraj
{code}
I then wrote a Pig script to print the contents of the merged bz2 file:
{code}
A = load '/user/viraj/bzipgetmerge/mymerge.bz2' using PigStorage();
dump A;
{code}
I get only the records from the first of the bz2 files I concatenated:
(1,a)
(2,aa)

The M/R jobs neither fail nor emit any warning; they silently drop the remaining records.

Is there a way to throw a warning or fail the underlying Map job? Could this be done in the Bzip2TextInputFormat class in Pig?
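As an illustration of the kind of check being asked for, here is a minimal sketch in Python that counts the bzip2 streams packed into one file. It is not Pig or Hadoop code and does not use the Bzip2TextInputFormat class; the function name count_bz2_streams is hypothetical, and the sample data is rebuilt in memory from the records shown above. It relies only on the standard-library bz2 module, whose BZ2Decompressor stops at the end of the first stream and exposes the leftover bytes in unused_data.

```python
import bz2


def count_bz2_streams(data: bytes) -> int:
    """Count the independent bzip2 streams stored back to back in `data`.

    Hypothetical helper for illustration; it decompresses everything in
    memory, so it only suits small files.
    """
    streams = 0
    remaining = data
    while remaining:
        decompressor = bz2.BZ2Decompressor()
        # Decompress one stream; the output itself is not needed here.
        decompressor.decompress(remaining)
        if not decompressor.eof:
            raise ValueError("truncated bzip2 stream")
        streams += 1
        # Bytes after the end of this stream belong to the next one.
        remaining = decompressor.unused_data
    return streams


# Rebuild the scenario from the report: two bz2 files joined with `cat`.
a = bz2.compress(b"1\ta\n2\taa\n")
b = bz2.compress(b"1\tb\n2\tbb\n")
merged = a + b

# A reader that handles a single stream sees only the first file's records.
first_only = bz2.BZ2Decompressor().decompress(merged)

if count_bz2_streams(merged) > 1:
    print("warning: input contains concatenated bzip2 streams")
```

A check along these lines could run before the file is uploaded with hadoop fs -put, so that a concatenated input is rejected or recompressed instead of being read partially.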