subject:"Avro vs Parquet performance on Pig"

Re: Avro vs Parquet performance on Pig

2019-02-15 Thread Mario Ferreira

I was under the impression that ORC files with snappy compression would prove to be better unless your processing was columnar in nature. Isn't that the case?

On Thu, Feb 7, 2019, 21:54 Russell Jurney wrote:
> Sorry if this isn't helpful, but the other obvious thing is to store
> intermediate

Re: Avro vs Parquet performance on Pig

2019-02-11 Thread Rohini Palaniswamy

You might need https://issues.apache.org/jira/browse/PIG-4092

On Thu, Feb 7, 2019 at 3:54 PM Russell Jurney wrote:
> Sorry if this isn't helpful, but the other obvious thing is to store
> intermediate data in Parquet whenever you repeat code/data that can be
> shared between jobs. If tests

Re: Avro vs Parquet performance on Pig

2019-02-07 Thread Russell Jurney

Sorry if this isn't helpful, but the other obvious thing is to store intermediate data in Parquet whenever you repeat code/data that can be shared between jobs, if tests indicate it is faster. Before Parquet this wasn't necessarily advantageous, as IO from disk is slower than IO through RAM which

Re: Avro vs Parquet performance on Pig

2019-02-07 Thread Michael Doo

Indeed. When loading Parquet using org.apache.parquet.pig.ParquetLoader(), we're specifying the schema for which columns we want to load.

On 2/7/19, 5:14 PM, "Russell Jurney" wrote:
> Well, the obvious thing is to load only those columns you need. Just in case you’re not doing this.

Re: Avro vs Parquet performance on Pig

2019-02-07 Thread Russell Jurney

Well, the obvious thing is to load only those columns you need. Just in case you’re not doing this.

On Thu, Feb 7, 2019 at 2:04 PM Michael Doo wrote:
> Hey all,
> I’ve been migrating some processes over from ingesting Avro to ingesting
> Parquet. In Spark, we’re seeing 2x-8x performance gains

Avro vs Parquet performance on Pig

2019-02-07 Thread Michael Doo

Hey all, I’ve been migrating some processes over from ingesting Avro to ingesting Parquet. In Spark, we’re seeing 2x-8x performance gains when using Parquet over Avro. In Pig, similar processes are about the same runtime between the two formats (and sometimes even higher using Parquet). We’ve


6 matches
