Hi Daniel,
Right now, you need to do the transformation manually. The feature you need
is under development (https://issues.apache.org/jira/browse/SPARK-4190).
Thanks,
Yin
On Tue, Nov 4, 2014 at 2:44 AM, Gerard Maas wrote:
You could transform the JSON to a case class instead of serializing it back
to a String. The resulting RDD[MyCaseClass] is then directly usable as a
SchemaRDD via the implicit conversion provided by 'import
sqlContext.createSchemaRDD'. The rest of your pipeline then remains the same.
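[A minimal sketch of that approach, assuming Spark 1.1 and json4s; the
LogEntry case class, its fields, and the outpath variable are illustrative,
not taken from the original pipeline:]

import org.apache.spark.sql.SQLContext
import org.json4s._
import org.json4s.native.JsonMethods._

// Hypothetical schema for one cleaned log record
case class LogEntry(timestamp: String, level: String, message: String)

val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD  // implicit RDD[Product] => SchemaRDD

implicit val formats = DefaultFormats

// Parse each line straight into the case class instead of back to a String
val entries = sc.textFile(inpath)
  .map(line => parse(line).extract[LogEntry])

entries.registerTempTable("logs")   // usable from SQL queries
entries.saveAsParquetFile(outpath)  // write the cleaned data as Parquet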
-kr,
I am trying to convert terabytes of JSON log files into Parquet files,
but I need to clean the data a little first.
I ended up doing the following:
// assuming json4s for the parse / JObject / JField extractors
import org.json4s._
import org.json4s.native.JsonMethods._

val txt = sc.textFile(inpath).coalesce(800)
val json = (for {
  line <- txt
  JObject(child) = parse(line)
  child2 = (for {
    JField(
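[The message is truncated above. For reference, a self-contained sketch of
the kind of json4s cleaning pass the snippet appears to be doing; the
"drop null fields" rule is purely illustrative, not the original logic:]

import org.json4s._
import org.json4s.native.JsonMethods._

// Parse one line, drop fields whose value is null, and re-render the
// cleaned object as a compact JSON string.
def cleanLine(line: String): String = {
  val JObject(child) = parse(line)
  val child2 = for {
    JField(name, value) <- child
    if value != JNull
  } yield JField(name, value)
  compact(render(JObject(child2)))
}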