Hey all, I'm trying to create a DataFrame by reflection (using Spark 1.4.0), but I'm running into a parameter of my case class not being supported. Minimal example:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
import com.vividsolutions.jts.geom.Coordinate

case class Foo(id: String, coord: Coordinate)

val fooRDD = sc.parallelize(List(Foo("a", new Coordinate(0.0, 0.0)), Foo("b", new Coordinate(1.0, 1.0))))
fooRDD.toDF()

This results in:

java.lang.UnsupportedOperationException: Schema for type com.vividsolutions.jts.geom.Coordinate is not supported
  at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:148)
  at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:28)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:123)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:121)
  ...

Fair enough, I can work around it, but how can I tell whether a type is supported? According to the documentation [1], the Coordinate class is Serializable (not sure if that matters), and it's just three doubles. I'd guess it would fit a StructType, which is indeed what gets used when I define my own case class Coordinate(x: Double, y: Double, z: Double). Does schema inference then only work with basic types and Scala case classes?

Given all this, what is the best practice for handling unsupported types? Avoid them? Map to Row and supply an explicit schema? Or is there maybe some implicit conversion possible (my Scala-fu is not 100% there yet)?

Many thanks for any insights!

Sander

[1] http://www.vividsolutions.com/jts/javadoc/com/vividsolutions/jts/geom/Coordinate.html
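P.S. To make the "map to Row and use explicit schema" option concrete, here's a rough sketch of what I had in mind; the nested struct layout and the field names x/y/z are just my own choice (Coordinate exposes them as public fields), not anything Spark prescribes:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Represent the JTS Coordinate as a nested struct of three doubles
// (field names x, y, z chosen by me).
val coordType = StructType(Seq(
  StructField("x", DoubleType, nullable = false),
  StructField("y", DoubleType, nullable = false),
  StructField("z", DoubleType, nullable = false)))

val schema = StructType(Seq(
  StructField("id", StringType, nullable = false),
  StructField("coord", coordType, nullable = false)))

// Convert each Foo into a Row matching the schema, then build the DataFrame explicitly
// instead of relying on reflection.
val rowRDD = fooRDD.map(f => Row(f.id, Row(f.coord.x, f.coord.y, f.coord.z)))
val fooDF = sqlContext.createDataFrame(rowRDD, schema)

It works, but it feels like a lot of boilerplate for what is essentially three doubles, which is why I'm asking what the recommended approach is.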