bzip2 compression and local mode bugs

                 Key: PIG-752
             Project: Pig
          Issue Type: Bug
            Reporter: David Ciemiewicz

Problem 1) Using the .bz2 file extension does not store results with bzip2 
compression in local mode (-exectype local)

If I use the .bz2 filename extension in a STORE statement on HDFS, the results 
are stored with bzip2 compression.
If I use the .bz2 filename extension in a STORE statement on the local file 
system, the results are NOT stored with bzip2 compression.

A = load 'events.test' using PigStorage();
store A into 'events.test.bz2' using PigStorage();

C = load 'events.test.bz2' using PigStorage();
C = limit C 10;

dump C;

-bash-3.00$ pig -exectype local compact.bz2.pig

-bash-3.00$ file events.test
events.test: ASCII English text, with very long lines
-bash-3.00$ file events.test.bz2
events.test.bz2: ASCII English text, with very long lines

-bash-3.00$ cat events.test | bzip2 > events.test.bz2
-bash-3.00$ file events.test.bz2
events.test.bz2: bzip2 compressed data, block size = 900k

The output format in local mode is definitely not bzip2, but it should be.
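
The same check can be made without the file utility. Below is a minimal sketch 
in Python, not part of Pig; looks_like_bzip2 is a hypothetical helper name. It 
relies only on the fact that a bzip2 stream starts with the bytes "BZh" 
followed by a block-size digit from 1 to 9 (the "block size = 900k" reported 
above corresponds to the digit 9).

```python
def looks_like_bzip2(path):
    """Return True if the file at `path` starts with a bzip2 stream header.

    Hypothetical helper for illustration; it inspects only the first
    four bytes and does not validate the rest of the stream.
    """
    with open(path, "rb") as f:
        header = f.read(4)
    # Header layout: b"BZh" followed by one ASCII digit 1-9 (block size / 100k).
    return (
        len(header) == 4
        and header[:3] == b"BZh"
        and header[3:4] in b"123456789"
    )
```

Calling looks_like_bzip2('events.test.bz2') on the file written by the local 
mode STORE above would be expected to return False, and True for the file 
produced by the bzip2 command line tool.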

Problem 2) Pig in local mode does not decompress bzip2-compressed files on 
load, but it should, to be consistent with HDFS

A = load 'events.test.bz2' using PigStorage();
A = limit A 10;
dump A;

The output should be human readable but is instead garbage, indicating no 
decompression took place during the load:

-bash-3.00$ pig -exectype local read.bz2.pig
USING: /grid/0/gs/pig/current
2009-04-03 18:26:30,455 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-04-03 18:26:30,456 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
((R? 6?*m?&???g, 
a???e????)B??9?                          ?44
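
To illustrate the expected behaviour, here is a minimal sketch in Python of 
extension-based decompression on load. This is not Pig's implementation; 
first_lines is a hypothetical helper, and the UTF-8 encoding is an assumption 
about the input. It mirrors the "limit A 10; dump A;" script above by 
returning the first n lines, decompressing only when the filename ends in .bz2.

```python
import bz2
import itertools


def first_lines(path, n=10):
    """Return the first `n` lines of `path` as text.

    Hypothetical helper: files whose name ends in ".bz2" are read through
    bz2.open so they are decompressed transparently; all others are read as-is.
    """
    opener = bz2.open if path.endswith(".bz2") else open
    # errors="replace" keeps the sketch from failing on non-UTF-8 bytes.
    with opener(path, "rt", encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in itertools.islice(f, n)]
```

With this approach, first_lines('events.test') and 
first_lines('events.test.bz2') would return the same readable records, 
provided the second file really is a bzip2-compressed copy of the first.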
