On 03/01/2011 09:05 PM, felix gao wrote:
> I am running some comparison tests on a data set that I converted to
> avro with deflator set to level 6. The original logs consists of 2880
> uncompressed http access logs with a total size of 1.4TB. The Compressed
> avro log is about 2/3 of the size. However, when I ran the same pig job
> on the raw logs, it is blazing fast during the initial map phase.
> Finished in under 40 min. When I ran the same pig job with avro files,
> the initial map phase took 8 minutes to only finish 10%. I am wondering
> is there any way to figure out what is slowing down the map?

What version of Avro are you using? How are you integrating Avro with Pig?

Also, for speed, you might try level=1 (Deflater.BEST_SPEED).

Doug
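
To get a rough feel for the level=1 versus level=6 trade-off, here is a minimal sketch that uses only java.util.zip.Deflater from the JDK. It does not touch Avro or Pig. The class name DeflateLevels, the helpers sampleLog and compressedSize, and the synthetic log lines are all made up for illustration; real access logs will compress differently, and this measures compression only, not the decompression done during the map phase.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

class DeflateLevels {
    // Hypothetical helper: builds synthetic, access-log-like text (not real data).
    public static byte[] sampleLog(int lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            sb.append("10.0.0.").append(i % 256)
              .append(" - - \"GET /page/").append(i % 1000)
              .append(" HTTP/1.1\" 200 ").append(1000 + i % 5000)
              .append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Compresses the input at the given deflate level and returns the output size in bytes.
    public static int compressedSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end(); // release native zlib resources
        return out.size();
    }

    public static void main(String[] args) {
        byte[] data = sampleLog(200_000);
        // Deflater.BEST_SPEED is level 1; 6 is the level mentioned in the original message.
        for (int level : new int[] {Deflater.BEST_SPEED, 6}) {
            long start = System.nanoTime();
            int size = compressedSize(data, level);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("level=%d size=%d bytes time=%d ms%n", level, size, ms);
        }
    }
}
```

Timings from a single run like this are noisy (JIT warm-up, GC), so treat the output as a rough indication rather than a benchmark.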
