Hi Gavin,
You can increase the speed by choosing a better encoding; a little ETL
goes a long way.
For example, since you're working with Spark SQL you probably have
tabular data, so you could use CSV: that avoids parsing the field names
on every record and also reduces the file size. Better still, check
whether you can store your files as Parquet or Avro.
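To make the one-time ETL concrete, here is a minimal sketch against the Spark 1.4 DataFrame API (the HDFS paths, app name, and table name are placeholders; it needs a running Spark cluster):

```scala
// One-time ETL: read the JSON once, rewrite it as Parquet, and let all
// later Spark SQL jobs skip JSON parsing entirely.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("json-to-parquet"))
val sqlContext = new SQLContext(sc)

// Spark infers the schema by scanning the JSON input.
val df = sqlContext.read.json("hdfs:///data/records.json")

// Write columnar Parquet; the schema is stored with the data.
df.write.parquet("hdfs:///data/records.parquet")

// Subsequent jobs load the Parquet directly, which is schema-aware
// and far cheaper than re-parsing 80M JSON records:
val fast = sqlContext.read.parquet("hdfs:///data/records.parquet")
fast.registerTempTable("records")
```

The 30-minute cost is then paid once at conversion time rather than on every query.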
Yours,
Ewan
On 28/08/15 03:58, Gavin Yue wrote:
Hey
I am using the Json4s-Jackson parser that comes with Spark to parse roughly 80M
records with a total size of 900 MB.
But the speed is slow. It took my 50 nodes (16-core CPU, 100 GB memory each)
roughly 30 minutes to parse the JSON for use with Spark SQL.
Jackson's own benchmarks say parsing should be at the millisecond level.
Is there any way to increase the speed?
I am using Spark 1.4 on Hadoop 2.7 with Java 8.
Thanks a lot !
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org