What was the compression ratio you saw?
I get the correct results, but the data size is almost the same as the uncompressed text.

searches = load '/user/testuser/aol_search_logs.txt' as (ID : int, Query : chararray, QueryTime : chararray, ItemRank : int, ClickURL : chararray);
store searches into '/user/testuser/aol_search_logs.avro' using AvroStorage();

I also tried:

SET avro.output.codec snappy
SET mapred.output.compress true
searches = load '/user/testuser/aol_search_logs.avro' using org.apache.pig.piggybank.storage.avro.AvroStorage();
store searches into '/user/testuser/aol_search_logs.snappy.avro' using org.apache.pig.piggybank.storage.avro.AvroStorage();

-Thejas



On 10/22/12 6:02 AM, Ruslan Al-Fakikh wrote:
How do you generate your Avro files?
It worked OK for me with:

SET avro.mapred.deflate.level 5
inputData = LOAD 'input path' USING
org.apache.pig.piggybank.storage.avro.AvroStorage();
STORE inputData INTO 'output path' USING
org.apache.pig.piggybank.storage.avro.AvroStorage();

But I did these tests a long time ago with an old version.

Ruslan

On Sun, Oct 21, 2012 at 9:22 AM, Thejas Nair wrote:
Based on the AvroStorage code and documentation, it looks like compression is
enabled by default, with the codec set to "deflate". But the file size is almost
the same as that of the uncompressed tab-separated text data.

This is probably a bug in AvroStorage, but I wanted to check if this is
somehow expected, before I open a jira to track it.
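
One way to check whether a codec was applied at all is to read the `avro.codec` entry from the header of an output file. The following is a minimal sketch, assuming a part file has already been copied to local disk; it follows the Avro object container file layout (4-byte magic, then a metadata map) and uses only the Python standard library. The function names are made up for this illustration and are not part of Pig or Avro.

```python
import io

MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def read_long(stream):
    """Decode one Avro long (variable-length, zigzag encoded)."""
    shift = 0
    accum = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of Avro header")
        value = byte[0]
        accum |= (value & 0x7F) << shift
        if not value & 0x80:
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)


def read_bytes(stream):
    """Read a length-prefixed byte string (also used for Avro strings)."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of Avro header")
    return data


def read_avro_codec(stream):
    """Return the codec name recorded in the file header metadata."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero-count block terminates the map
            break
        if count < 0:  # negative count: a block byte size follows
            read_long(stream)
            count = -count
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    # A missing avro.codec entry means the "null" (uncompressed) codec.
    return meta.get("avro.codec", b"null").decode("utf-8")


# Hypothetical usage; the local file name is an assumption:
# with open("part-file.avro", "rb") as f:
#     print(read_avro_codec(f))
```

If this reports "null" for a file written with the settings above, the codec setting did not reach the writer; if it reports "deflate" or "snappy", the blocks are compressed and the size has to be explained some other way.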

Format                        Size
Uncompressed txt              2.12 GB
avro (default compression)    2.09 GB
avro + snappy compression     2.09 GB
lzo compressed txt            0.69 GB
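
To put the sizes above in terms of a compression ratio, here is a small sketch that divides the uncompressed size by each output size. The numbers are the ones reported in this message; the dictionary keys and the function name are only labels chosen for the illustration.

```python
# Sizes in GB, as reported in the message above.
sizes_gb = {
    "uncompressed txt": 2.12,
    "avro (default compression)": 2.09,
    "avro + snappy compression": 2.09,
    "lzo compressed txt": 0.69,
}


def compression_ratios(sizes, baseline="uncompressed txt"):
    """Return baseline size divided by each size (higher means smaller output)."""
    base = sizes[baseline]
    return {name: base / size for name, size in sizes.items()}


ratios = compression_ratios(sizes_gb)
for name, ratio in ratios.items():
    print(f"{name:28s} {ratio:.2f}x")
```

Both Avro outputs come out at roughly 1.01x, against roughly 3.07x for the LZO-compressed text, which is what suggests the Avro data blocks may not be getting compressed.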


Thanks,
Thejas

