Hi,

I am new to Spark and ran into a problem while trying to load a dataset.

The dataset is in JSON format, and I'd like to load it as an RDD.

Because one record may span multiple lines, SparkContext.textFile() does
not work here: it returns one element per line of input. I also tried using
json4s to parse the JSON manually and then merging the records into an RDD
one by one, but that approach is inconvenient and inefficient.
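
To make the textFile() problem concrete, here is a minimal sketch. The path
data/records.json, the object name and the local master setting are
hypothetical, and I am assuming the json4s Jackson backend is on the
classpath. Each element of the RDD is a single line, so a record that spans
several lines reaches the parser only as fragments.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.json4s.JObject
import org.json4s.jackson.JsonMethods.parse
import scala.util.Try

object LineBasedJsonLoad {
  def main(args: Array[String]): Unit = {
    // Local master is an assumption for trying this out on one machine.
    val conf = new SparkConf().setAppName("line-based-json-load").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Hypothetical path to a file whose JSON records span multiple lines.
    val path = "data/records.json"

    // textFile() yields one RDD element per line of the file.
    val lines: RDD[String] = sc.textFile(path)

    // Parse each line on its own and keep only lines that are complete
    // JSON objects. Fragments of a multi-line record are dropped here.
    val objects: RDD[JObject] = lines
      .map(line => Try(parse(line)).toOption)
      .collect { case Some(obj: JObject) => obj }

    println(s"lines: ${lines.count()}, complete objects: ${objects.count()}")
    sc.stop()
  }
}
```

This only gives the right result when every record sits on a single line,
which is not the case for my data.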

There seems to be a JsonRDD in Spark SQL, but it appears to be intended
for queries only.

Could anyone suggest how to load JSON data as an RDD? For example, given
a file path, load the dataset as an RDD[JObject].

Thank you very much!

Regards,
J
