Re: Compare performance of sqlContext.jsonFile and sqlContext.jsonRDD

2014-12-11 Thread Cheng Lian
There are several overloaded versions of both |jsonFile| and |jsonRDD|. Schema inference is kind of expensive, since it requires an extra Spark job. You can avoid it by storing the inferred schema and then using it with the following two methods:
* |def jsonFile(path: String
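A minimal sketch of the "infer once, then reuse the schema" idea, assuming the Spark 1.1/1.2 Scala API (`SQLContext`, `SchemaRDD`, and the `StructType` alias imported from `org.apache.spark.sql`, plus the `jsonFile(path, schema)` and `jsonRDD(json, schema)` overloads). The object name `JsonSchemaReuse`, the helper `run`, and the sample records are hypothetical, chosen only so the example runs without external files.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SchemaRDD, StructType}

// Hypothetical example object; not part of Spark.
object JsonSchemaReuse {
  def run(sc: SparkContext): (Long, Long) = {
    val sqlContext = new SQLContext(sc)

    // Small in-memory JSON lines standing in for a representative sample (made-up records).
    val sample = sc.parallelize(Seq(
      """{"id": 1, "name": "a"}""",
      """{"id": 2, "name": "b"}"""))

    // Pay for schema inference once: this runs the extra Spark job over `sample`.
    val schema: StructType = sqlContext.jsonRDD(sample).schema

    // Reuse the stored schema with jsonRDD(json, schema): no inference job here.
    val more = sc.parallelize(Seq("""{"id": 3, "name": "c"}"""))
    val fromRdd: SchemaRDD = sqlContext.jsonRDD(more, schema)

    // jsonFile(path, schema) does the same for a file of JSON lines (temp file for the demo).
    val path = Files.createTempFile("records", ".json")
    Files.write(path, """{"id": 4, "name": "d"}""".getBytes(StandardCharsets.UTF_8))
    val fromFile: SchemaRDD = sqlContext.jsonFile(path.toString, schema)

    (fromRdd.count(), fromFile.count())
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("json-schema-reuse").setMaster("local[2]"))
    try println(run(sc)) finally sc.stop()
  }
}
```

One design caveat: the stored schema only contains fields seen in the data it was inferred from, so fields that never appear in the sample will not be picked up later. The sample should therefore be representative of the full dataset.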

Compare performance of sqlContext.jsonFile and sqlContext.jsonRDD

2014-12-10 Thread Rakesh Nair
A couple of questions:
1. "sqlContext.jsonFile" reads a JSON file, infers the schema of the stored data, and then returns a SchemaRDD. Alternatively, I could create a SchemaRDD by reading the file as text (which returns an RDD[String]) and then calling the "jsonRDD" method. My question is: is the "jsonFile" way of c
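The two routes described in the question can be sketched side by side as follows, assuming the Spark 1.1/1.2 Scala API (`SQLContext.jsonFile(path)`, `SparkContext.textFile(path)`, `SQLContext.jsonRDD(rdd)`). The object name `JsonFileVsJsonRdd`, the helper `run`, and the two-record input file are hypothetical. Both calls here use the single-argument overloads, so both run schema inference; this sketch only shows the equivalent setups and makes no claim about which is faster.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SchemaRDD}

// Hypothetical example object; not part of Spark.
object JsonFileVsJsonRdd {
  def run(sc: SparkContext): (SchemaRDD, SchemaRDD) = {
    val sqlContext = new SQLContext(sc)

    // Hypothetical input: one JSON object per line, written to a temp file.
    val path = Files.createTempFile("people", ".json")
    val json = Seq(
      """{"name": "a", "age": 30}""",
      """{"name": "b", "age": 25}""").mkString("\n")
    Files.write(path, json.getBytes(StandardCharsets.UTF_8))

    // Route 1: jsonFile reads the file, infers the schema, returns a SchemaRDD.
    val viaFile: SchemaRDD = sqlContext.jsonFile(path.toString)

    // Route 2: read the same file as text (RDD[String]), then infer via jsonRDD.
    val lines = sc.textFile(path.toString)
    val viaRdd: SchemaRDD = sqlContext.jsonRDD(lines)

    (viaFile, viaRdd)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("jsonfile-vs-jsonrdd").setMaster("local[2]"))
    try {
      val (viaFile, viaRdd) = run(sc)
      viaFile.printSchema()
      viaRdd.printSchema()
    } finally sc.stop()
  }
}
```

Both routes should produce the same schema and the same rows for the same input; timing each call on real data is the straightforward way to compare them.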