On Wed, May 27, 2015 at 01:13:43PM -0700, Ted Yu wrote:
> Can you tell us a bit more about (the schema of) your JSON?

It's fairly simple: 22 fields whose values are mostly strings or integers, except that some of the fields are objects containing HTTP header/value pairs. I'd guess something in those latter fields is causing the problems. The data set is 800M rows that I didn't create myself, and I'm in the process of building a simpler test case. What I mostly wanted to know is whether there's an obvious mechanism I'm missing for getting jsonRDD to spit out more information about which specific rows it's having problems with.
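One way to narrow this down independently of Spark is to pre-scan the input. Below is a minimal plain-Python sketch, assuming one JSON object per line; `json_type` and `find_suspect_rows` are hypothetical helpers, not part of Spark. It reports lines that fail to parse, lines whose top-level value isn't an object, and top-level fields whose JSON type differs from the type first seen for that field (e.g. a header field that is an object in one row and a string in another). It does not reproduce jsonRDD's own logic, so a clean scan doesn't prove jsonRDD will accept the data.

```python
import json


def json_type(value):
    """Map a decoded JSON value to a coarse type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def find_suspect_rows(lines, max_reports=20):
    """Scan one-object-per-line JSON and return (line_number, reason) pairs.

    A line is reported if it fails to parse, if its top-level value is not an
    object, or if a top-level field has a different JSON type than it had on
    the line where that field was first seen. Scanning stops once
    max_reports problems have been collected.
    """
    first_seen = {}  # field name -> (type name, line number first seen)
    reports = []
    for lineno, line in enumerate(lines, start=1):
        if len(reports) >= max_reports:
            break
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            reports.append((lineno, f"unparseable: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            reports.append((lineno, f"top-level value is {json_type(record)}, not object"))
            continue
        for field, value in record.items():
            kind = json_type(value)
            if kind == "null":
                continue  # treat null as compatible with any type
            seen_kind, seen_line = first_seen.setdefault(field, (kind, lineno))
            if seen_kind != kind:
                reports.append(
                    (lineno, f"field {field!r} is {kind}, was {seen_kind} on line {seen_line}")
                )
    return reports[:max_reports]
```

Usage would be something like `for lineno, reason in find_suspect_rows(open(path)): print(lineno, reason)`, where `path` is one of the input files; with 800M rows it's probably more practical to run it per file or over a sample. Integers and doubles are deliberately reported as different types, so loosen `json_type` if that turns out to be too noisy.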

> You can find sample JSON in sql/core/src/test/scala/org/apache/spark/sql/json/TestJsonData.scala

I know jsonRDD works in general; I've used it before without problems. It even works on subsets of this data.
Mike Stone

