Hey all,

I'm trying to create a DataFrame by reflection (using Spark 1.4.0), but I
run into one of the parameters of my case class not being supported.
Minimal example:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
import com.vividsolutions.jts.geom.Coordinate

case class Foo(id: String, coord: Coordinate)
val fooRDD = sc.parallelize(List(
  Foo("a", new Coordinate(0.0, 0.0)),
  Foo("b", new Coordinate(1.0, 1.0))))

fooRDD.toDF()

This results in:

java.lang.UnsupportedOperationException: Schema for type com.vividsolutions.jts.geom.Coordinate is not supported
  at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:148)
  at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:28)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:123)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:121)
...


Fair enough, I can work around it, but how can I tell whether a type is
supported? According to the documentation [1], the Coordinate class is
Serializable (not sure if that matters), and it's just three doubles. I'd
guess it would fit a StructType, which is indeed what I get when I define
my own case class Coordinate(x: Double, y: Double, z: Double) instead.
Does schema inference only work with basic types and Scala case classes?
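
For concreteness, this is the workaround I mean (just a sketch; I renamed
things to Coord and Bar here so they don't clash with the JTS import and
the Foo above, and I simply copy the three public fields of the JTS
Coordinate into my own case class):

case class Coord(x: Double, y: Double, z: Double)
case class Bar(id: String, coord: Coord)

val barRDD = sc.parallelize(List(
  Bar("a", Coord(0.0, 0.0, 0.0)),
  Bar("b", Coord(1.0, 1.0, 0.0))))

// works: schema is id: string, coord: struct<x: double, y: double, z: double>
barRDD.toDF()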

With all this, what is the best practice for handling unsupported types?
Not use them at all? Map to Row and provide an explicit schema (roughly as
sketched below)? Or is there maybe some implicit conversion possible (my
Scala-fu is not 100% there yet)?
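
By "map to Row" I mean roughly the sketch below (the field names and the
nullability flags are just my guesses), reusing fooRDD from above:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// explicit schema: a struct of three doubles for the coordinate
val coordType = StructType(Seq(
  StructField("x", DoubleType, nullable = false),
  StructField("y", DoubleType, nullable = false),
  StructField("z", DoubleType, nullable = true)))  // z is NaN for 2D points

val schema = StructType(Seq(
  StructField("id", StringType, nullable = false),
  StructField("coord", coordType, nullable = false)))

// flatten each Foo into a Row, pulling the doubles out of the JTS Coordinate
val rowRDD = fooRDD.map(f => Row(f.id, Row(f.coord.x, f.coord.y, f.coord.z)))
val df = sqlContext.createDataFrame(rowRDD, schema)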

Many thanks for any insights!

Sander


[1]
http://www.vividsolutions.com/jts/javadoc/com/vividsolutions/jts/geom/Coordinate.html
