Hi Gavin,

You can increase the speed by choosing a better encoding. A little bit of ETL goes a long way.

e.g. Since you're working with Spark SQL, your data is probably tabular. You could convert it to CSV so the field names don't have to be parsed on every record (which will also shrink the files). Better still, check whether you can store your files as Parquet or Avro.
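
As a rough sketch of a one-time conversion job (assuming Spark 1.4's DataFrameReader/Writer API; the HDFS paths and table name below are just placeholders for your setup):

import org.apache.spark.sql.SQLContext

// One-off ETL job: parse the JSON once and persist it as Parquet.
val sqlContext = new SQLContext(sc)  // sc: your existing SparkContext

// Spark scans the JSON to infer the schema, then loads it as a DataFrame.
val df = sqlContext.read.json("hdfs:///data/events.json")

// Write a columnar, compressed, schema-aware copy.
df.write.parquet("hdfs:///data/events.parquet")

// Later jobs read the pre-parsed Parquet directly; no JSON parsing at all.
val events = sqlContext.read.parquet("hdfs:///data/events.parquet")
events.registerTempTable("events")
sqlContext.sql("SELECT COUNT(*) FROM events").show()

After that one-time conversion, every subsequent query works off the Parquet copy and skips JSON parsing entirely.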

Yours,
Ewan

On 28/08/15 03:58, Gavin Yue wrote:
Hey

I am using the Json4s-Jackson parser that comes with Spark to parse roughly 80M records totaling 900 MB.

But it is slow: it took my 50 nodes (16-core CPUs, 100 GB memory each) roughly 30 minutes to parse the JSON for use with Spark SQL.

Jackson's own benchmarks suggest parsing should take only milliseconds.

Is there any way to increase the speed?

I am using Spark 1.4 on Hadoop 2.7 with Java 8.

Thanks a lot !