FYI, deflateCodec(9) rarely improves compression over level 6, but is much
slower to write. 

Also, unless you increase the block size in the file to over 256KB it
probably won't improve it at all.  The primary thing that larger
deflate/gzip compression levels do is increase the size of the lookback
window for finding duplicate segments.
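
If you want to check the block-size effect before touching your Avro job, here is a minimal sketch using only java.util.zip. It assumes that each file block is compressed on its own, with no history carried over from the previous block. BlockSizeDemo, sampleData and totalCompressed are made-up names for illustration, and the generated text is only a stand-in for your real records. In Avro's Java API I believe the block size is controlled by DataFileWriter.setSyncInterval, but verify that against your version.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

final class BlockSizeDemo {
    // Hypothetical record-like sample text; substitute your real data.
    static byte[] sampleData(int records) {
        Random rnd = new Random(42);
        String[] regions = {"us-east", "us-west", "eu-central", "ap-south"};
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < records; i++) {
            sb.append("user=").append(rnd.nextInt(5000))
              .append(",region=").append(regions[rnd.nextInt(regions.length)])
              .append(",status=ok\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Deflate one slice on its own, sharing no history with other slices.
    static int deflatedLength(byte[] data, int off, int len, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(data, off, len);
            deflater.finish();
            byte[] buf = new byte[8192];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(buf);
            }
            return total;
        } finally {
            deflater.end();
        }
    }

    // Total compressed size when the input is cut into independent blocks.
    static long totalCompressed(byte[] data, int blockSize, int level) {
        long total = 0;
        for (int off = 0; off < data.length; off += blockSize) {
            int len = Math.min(blockSize, data.length - off);
            total += deflatedLength(data, off, len, level);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] data = sampleData(50_000);
        int[] blockSizes = {4 << 10, 16 << 10, 64 << 10, 256 << 10, 1 << 20};
        for (int blockSize : blockSizes) {
            System.out.printf("block=%7d bytes  level6=%d  level9=%d%n",
                    blockSize,
                    totalCompressed(data, blockSize, 6),
                    totalCompressed(data, blockSize, 9));
        }
    }
}
```

Run it against a sample of your own data rather than the generated text; the gap between level 6 and level 9 at each block size is the number to look at.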

In short, try different compression levels and buffer sizes on your actual
data and see what works best for you. The best choice is almost never
compression level 9.

I often end up with compression level 3 or 1 when I need the speed, and
level 6 or 7 with larger blocks for 'archival' use.

A useful link comparing speed to compression ratio for gzip (gzip is deflate
with a different header and CRC) is:
http://tukaani.org/lzma/benchmarks.html

As you can see, compression level 9 is typically 2 to 3 times slower than
level 6 and gives only a marginally better compression ratio.
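
To reproduce that kind of comparison on your own machine, a rough sketch along these lines should do. It uses java.util.zip.Deflater directly rather than Avro, so treat it as an approximation of what the codec does. DeflateLevelTiming, sampleData and measure are names I made up, the vocabulary-based input is a placeholder for real data, and a single timed pass is only a crude measurement.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

final class DeflateLevelTiming {
    // Hypothetical input: words drawn at random from a small vocabulary.
    static byte[] sampleData(int words) {
        String[] vocab = {"avro", "schema", "union", "record", "codec",
                          "deflate", "block", "datum", "writer", "reader"};
        Random rnd = new Random(7);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words; i++) {
            sb.append(vocab[rnd.nextInt(vocab.length)])
              .append(rnd.nextInt(100))
              .append(' ');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Returns {compressed bytes, elapsed nanoseconds} for one pass at the given level.
    static long[] measure(byte[] data, int level) {
        long start = System.nanoTime();
        Deflater deflater = new Deflater(level);
        long size = 0;
        try {
            deflater.setInput(data);
            deflater.finish();
            byte[] buf = new byte[8192];
            while (!deflater.finished()) {
                size += deflater.deflate(buf);
            }
        } finally {
            deflater.end();
        }
        return new long[] {size, System.nanoTime() - start};
    }

    public static void main(String[] args) {
        byte[] data = sampleData(500_000);
        measure(data, 6); // warm-up pass so one-time startup cost does not skew the first row
        for (int level : new int[] {1, 3, 6, 9}) {
            long[] r = measure(data, level);
            System.out.printf("level %d: %d -> %d bytes in %.1f ms%n",
                    level, data.length, r[0], r[1] / 1e6);
        }
    }
}
```

The absolute times will vary from run to run; what matters is how the size and time columns move relative to each other as the level goes up.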

On 8/8/11 12:56 PM, "Poole, Samuel [USA]" <[email protected]> wrote:

> Thank you very much. Yes, this works well. I then took it one step further
> to try to get the schema put in the file and also to apply encoding.
> 
> FOO fooObj = ....
> BAR barObj = ....
> BAR barObj2 = ....
>         ByteArrayOutputStream out = new ByteArrayOutputStream();
>         DatumWriter<SpecificRecord> writer =
>                 new SpecificDatumWriter<SpecificRecord>(yourSchema);
> 
>         DataFileWriter filewriter = new DataFileWriter(writer);
>         CodecFactory codec = CodecFactory.deflateCodec(9);
>         filewriter.setCodec(codec);
> 
>         filewriter.create(yourSchema, out);
> 
>         encoder = EncoderFactory.get().binaryEncoder(out, encoder);
> 
>         filewriter.append(fooObj);
>         filewriter.append(barObj);
>         filewriter.append(barObj2);
> 
>         OutputStream outstream =
>                 new FileOutputStream("/somefolder/somefile.avro");
>         out.writeTo(outstream);
> 
> This code works, but now I have an issue with reading the file....
>  
> When I read the file, I can only see the first datum in the union.  I know
> that all of the datums were written to the file because of the size of the
> file, but I can't read all of the datums.
>  
> Here is my code to read the union file.
>  
> Schema yourSchema = Schema.parse(new File("/somefolder/someschema.avro"));
> 
> DatumReader<SpecificRecord> datumreader =
>         new SpecificDatumReader<SpecificRecord>(yourSchema);
> DataFileReader reader =
>         new DataFileReader(new File("/somefolder/somefile.avro"), datumreader);
> 
> if(reader.hasNext()){
>     SpecificRecord result=(SpecificRecord) reader.next();
>     System.out.println(result.getClass());
> }
>  
> Not sure if I have a problem with how I created the file or how I am reading
> the file....
>  
> Any ideas?
>  
>  
> 
> From: Vyacheslav Zholudev [[email protected]]
> Sent: Monday, August 08, 2011 12:52 PM
> To: [email protected]
> Subject: Re: Java Example of writing a union
> 
> I'm assuming for now that you are using a specific writer and you have a union
> schema with two records FOO and BAR (you should get two classes FOO and BAR
> generated by avro tools):
> 
> FOO fooObj = ....
> BAR barObj = ....
> BAR barObj2 = ....
>         ByteArrayOutputStream out = new ByteArrayOutputStream();
>         DatumWriter<SpecificRecord> writer =
>                 new SpecificDatumWriter<SpecificRecord>(yourSchema);
>         // Pass null to create a new encoder, or an existing one to reuse it.
>         BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
>         writer.write(fooObj, encoder);
>         writer.write(barObj, encoder);
>         writer.write(barObj2, encoder);
>         encoder.flush();
>         out.close();
> 
> Does it make sense?
> 
> Vyacheslav
> 
> On Aug 8, 2011, at 3:53 PM, Sam Poole wrote:
> 
>> Does anybody have an example of writing a file that uses a union schema?  I
>> am having problems trying to write a file that uses a union schema because
>> once I set the schema, I can't add an individual datum because it is not
>> part of a union.
>> 
>> 
>> 
>> --
>> View this message in context:
>> http://apache-avro.679487.n3.nabble.com/Java-Example-of-writing-a-union-tp3235624p3235624.html
>> Sent from the Avro - Users mailing list archive at Nabble.com
>> <http://Nabble.com> .
> 

