Doug, I am using Avro 1.4.1. The problem is not that Avro is slow; AvroStorage does a recursive schema validation, and that is what makes it so slow. It is fixed now.
Felix

On Fri, Mar 4, 2011 at 9:25 AM, Doug Cutting <[email protected]> wrote:

> On 03/01/2011 09:05 PM, felix gao wrote:
> > I am running some comparison tests on a data set that I converted to
> > avro with deflator set to level 6. The original logs consists of 2880
> > uncompressed http access logs with a total size of 1.4TB. The Compressed
> > avro log is about 2/3 of the size. However, when I ran the same pig job
> > on the raw logs, it is blazing fast during the initial map phase.
> > Finished in under 40 min. When I ran the same pig job with avro files,
> > the initial map phase took 8 minutes to only finish 10%. I am wondering
> > is there any way to figure out what is slowing down the map?
>
> What version of Avro are you using? How are you integrating Avro with Pig?
>
> Also, for speed, you might try level=1 (Deflater.BEST_SPEED).
>
> Doug
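
For anyone who wants to see the level 1 versus level 6 trade-off mentioned above, here is a rough sketch. It uses Python's standard zlib module, which implements the same DEFLATE algorithm, and not the Avro API. The helper name compress_at_level and the sample log line are made up for illustration, and the numbers will differ on real access logs.

```python
import time
import zlib


def compress_at_level(data, level):
    """Compress data with DEFLATE at the given level.

    Returns (compressed_size_in_bytes, elapsed_seconds).
    Hypothetical helper for illustration; not part of Avro or Pig.
    """
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return len(compressed), elapsed


# Made-up sample resembling an HTTP access log line, repeated to get a
# measurable amount of input. Real logs are far less repetitive.
sample = (
    b'127.0.0.1 - - [01/Mar/2011:21:05:00 -0800] '
    b'"GET /index.html HTTP/1.1" 200 2326\n'
) * 50000

if __name__ == "__main__":
    # Level 1 corresponds to Deflater.BEST_SPEED; level 6 is the setting
    # used for the original conversion.
    for level in (1, 6):
        size, seconds = compress_at_level(sample, level)
        print("level=%d size=%d bytes time=%.4f s" % (level, size, seconds))
```

This only measures the write side, so it says nothing about the map-phase slowdown itself, which turned out to come from the schema validation in AvroStorage.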
