Hi Gavin,

You can increase the speed by choosing a better encoding. A little bit of ETL goes a long way.

e.g. Since you're working with Spark SQL, your data is probably tabular. You could convert it to CSV so the field names don't have to be parsed on every record (which will also shrink the files). Better still, check whether you can store your files as Parquet or Avro.
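
As a rough sketch of a one-time conversion job (assuming Spark 1.4's DataFrameReader/Writer API; the HDFS paths and table name below are just placeholders for your setup):

import org.apache.spark.sql.SQLContext

// One-off ETL job: parse the JSON once and persist it as Parquet.
val sqlContext = new SQLContext(sc)  // sc: your existing SparkContext

// Spark scans the JSON to infer the schema, then loads it as a DataFrame.
val df = sqlContext.read.json("hdfs:///data/events.json")

// Write a columnar, compressed, schema-aware copy.
df.write.parquet("hdfs:///data/events.parquet")

// Later jobs read the pre-parsed Parquet directly; no JSON parsing at all.
val events = sqlContext.read.parquet("hdfs:///data/events.parquet")
events.registerTempTable("events")
sqlContext.sql("SELECT COUNT(*) FROM events").show()

After that one-time conversion, every subsequent query works off the Parquet copy and skips JSON parsing entirely.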

Yours,
Ewan

On 28/08/15 03:58, Gavin Yue wrote:
Hey

I am using the Json4s-Jackson parser that comes with Spark to parse roughly 80M records totaling 900 MB.

But it is slow: it took my 50 nodes (16-core CPUs, 100 GB memory each) roughly 30 minutes to parse the JSON for use with Spark SQL.

Jackson's own benchmarks suggest parsing should take only milliseconds.

Is there any way to increase the speed?

I am using Spark 1.4 on Hadoop 2.7 with Java 8.

Thanks a lot !