Hi Brad,

It is a bug. I have filed https://issues.apache.org/jira/browse/SPARK-2908
to track it. It will be fixed soon.
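
Until then, it may help to check which input records match the failing pattern. The sketch below is plain Python 3 with no Spark dependency. It flags fields whose value is null or an empty array inside an array element, which is the shape of the two failing examples quoted below. The helper name `null_typed_paths` is hypothetical, and the rule is an assumption inferred from the seven examples in this thread, not taken from Spark's schema inference code.

```python
import json


def null_typed_paths(value, path="", in_array=False):
    """Yield paths of fields that are null or [] inside an array element.

    Assumption: this mirrors only the failing shapes reported in this
    thread (test5 and test6); it is not Spark's actual inference logic.
    """
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = path + "." + key if path else key
            if in_array and (child is None or child == []):
                # No element type can be inferred for this field.
                yield child_path
            else:
                yield from null_typed_paths(child, child_path, in_array)
    elif isinstance(value, list):
        for item in value:
            # Everything below this point lives inside an array element.
            yield from null_typed_paths(item, path + "[]", True)


if __name__ == "__main__":
    records = [
        '{"record": {"children": null}}',
        '{"record": [{"children": "foobar"}]}',
        '{"record": [{"children": null}]}',
        '{"record": [{"children": []}]}',
    ]
    for raw in records:
        print(raw, list(null_typed_paths(json.loads(raw))))
```

A record that yields no paths corresponds to the cases reported as working. A record that yields a path such as `record[].children` corresponds to the cases that raise `Unsupported datatype NullType`.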

Thanks,

Yin


On Thu, Aug 7, 2014 at 10:55 AM, Brad Miller <bmill...@eecs.berkeley.edu>
wrote:

> Hi All,
>
> I'm having a bit of trouble with nested data structures in pyspark with
> saveAsParquetFile.  I'm running master (as of yesterday) with this pull
> request added: https://github.com/apache/spark/pull/1802.
>
> # these all work
> > sqlCtx.jsonRDD(sc.parallelize(['{"record": null}'])).saveAsParquetFile('/tmp/test0')
> > sqlCtx.jsonRDD(sc.parallelize(['{"record": []}'])).saveAsParquetFile('/tmp/test1')
> > sqlCtx.jsonRDD(sc.parallelize(['{"record": {"children": null}}'])).saveAsParquetFile('/tmp/test2')
> > sqlCtx.jsonRDD(sc.parallelize(['{"record": {"children": []}}'])).saveAsParquetFile('/tmp/test3')
> > sqlCtx.jsonRDD(sc.parallelize(['{"record": [{"children": "foobar"}]}'])).saveAsParquetFile('/tmp/test4')
>
> # this FAILS
> > sqlCtx.jsonRDD(sc.parallelize(['{"record": [{"children": null}]}'])).saveAsParquetFile('/tmp/test5')
> Py4JJavaError: An error occurred while calling o706.saveAsParquetFile.
> : java.lang.RuntimeException: Unsupported datatype NullType
>
> # this FAILS
> > sqlCtx.jsonRDD(sc.parallelize(['{"record": [{"children": []}]}'])).saveAsParquetFile('/tmp/test6')
> Py4JJavaError: An error occurred while calling o719.saveAsParquetFile.
> : java.lang.RuntimeException: Unsupported datatype NullType
>
> Based on the documentation and the examples that work, it seems like the
> failing examples are meant to be supported.  I was unable to find an open
> issue for this.  Does anybody know if there is one, or whether I should
> create one?
>
> best,
> -Brad
>
