bzip2 compression and local mode bugs
-------------------------------------

                 Key: PIG-752
                 URL: https://issues.apache.org/jira/browse/PIG-752
             Project: Pig
          Issue Type: Bug
            Reporter: David Ciemiewicz


Problem 1)  use of .bz2 file extension does not store results bzip2 compressed 
in Local mode (-exectype local)

If I use the .bz2 filename extension in a STORE statement on HDFS, the results 
are stored with bzip2 compression.
If I use the .bz2 filename extension in a STORE statement on local file system, 
the results are NOT stored with bzip2 compression.

compact.bz2.pig:
{code}
A = load 'events.test' using PigStorage();
store A into 'events.test.bz2' using PigStorage();

C = load 'events.test.bz2' using PigStorage();
C = limit C 10;

dump C;
{code}

{code}
-bash-3.00$ pig -exectype local compact.bz2.pig

-bash-3.00$ file events.test
events.test: ASCII English text, with very long lines
-bash-3.00$ file events.test.bz2
events.test.bz2: ASCII English text, with very long lines

-bash-3.00$ cat events.test | bzip2 > events.test.bz2
-bash-3.00$ file events.test.bz2
events.test.bz2: bzip2 compressed data, block size = 900k
{code}

The output format in local mode is definitely not bzip2, but it should be.
{code}


Problem 2) pig in local mode does not decompress bzip2 compressed files, but 
should, to be consistent with HDFS

read.bz2.pig:
{code}
A = load 'events.test.bz2' using PigStorage();
A = limit A 10;
dump A;
{code}

The output should be human readable but is instead garbage, indicating no 
decompression took place during the load:

{code}
-bash-3.00$ pig -exectype local read.bz2.pig
USING: /grid/0/gs/pig/current
2009-04-03 18:26:30,455 [main] INFO  
org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-04-03 18:26:30,456 [main] INFO  
org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
(BZh91AY&syoz?u?...@{????????????????????x_?d?|u????-??mK???;??????????????4?C??)
((R? 6?*m?&???g, 
?6?Zj?????k,???0?????QT?d???hY?#m????J?>????????[j???z?m?t?u?K)??K5+??)?m?E7j?X?8????????a??
??U?p@@????MT?$?B?P??N??=???(????z<}gk...@????c$\??i????]?g:?J)
a(R?,?u?v???...@?i@??J??!D?)???A?PP?IY??m?
(m????P(i?4,#F[?I)@????>?...@??|7^?}U??w????wg,?u?$?T???????((Q!D?=`*?}h????????P??_|??=?(??2???m=?????xG?(?rC?B?(33??:4?N???????t????|??T?*??k??????NT?x???=?fyv?w>f??????4z???4t?)
(?oou?t???Kwl?????3?n????CM?WS?;l???P?s?x
a???e????)B??9?                          ?44
((?...@4?)
(f????)
(?...@+?d?0@>?U)
(Q?SR)
-bash-3.00$ 
{code}



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to