Re: SparkSQL not honoring schema

2014-12-10 Thread Michael Armbrust
As the Scala doc for applySchema says, "It is important to make sure that
the structure of every [[Row]] of the provided RDD matches the provided
schema. Otherwise, there will be runtime exceptions." We don't check, as
doing runtime reflection on all of the data would be very expensive. You
will only get errors if you try to manipulate the data; otherwise it
will pass it through.

I have, though, written some debugging code (a developer API, not
guaranteed to be stable) that you can use:

import org.apache.spark.sql.execution.debug._
schemaRDD.typeCheck()
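
For illustration, here is a minimal sketch of the failure mode (it assumes
the Spark 1.2-era SQLContext API and an existing SparkContext sc; the table
and field names are made up):

import org.apache.spark.sql._

val sqlContext = new SQLContext(sc)

// Hand-crafted schema: (name: String, age: Int).
val schema = StructType(Seq(
  StructField("name", StringType, nullable = false),
  StructField("age", IntegerType, nullable = false)))

// The second Row is malformed: "age" holds a String, not an Int.
val rows = sc.parallelize(Seq(Row("alice", 30), Row("bob", "thirty")))

// applySchema does not inspect the data, so no error is raised here.
val schemaRDD = sqlContext.applySchema(rows, schema)
schemaRDD.registerTempTable("people")

// A query that only passes the value through may appear to work; one that
// interprets it as an Int fails at runtime with a cast error.
sqlContext.sql("SELECT age + 1 FROM people").collect()

// schemaRDD.typeCheck() (the debug hook above) would instead flag the
// mismatch eagerly.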

On Wed, Dec 10, 2014 at 6:19 PM, Alessandro Baretta alexbare...@gmail.com
wrote:

 Hello,

 I defined a SchemaRDD by applying a hand-crafted StructType to an RDD. Some
 of the Rows in the RDD are malformed--that is, they do not conform to the
 schema defined by the StructType. When running a select statement on this
 SchemaRDD, I would expect SparkSQL to either reject the malformed rows or
 fail. Instead, it returns whatever data it finds, even if malformed. Is
 this the desired behavior? Is there no method in SparkSQL to check the data
 for validity with respect to the schema?

 Thanks.

 Alex



Re: SparkSQL not honoring schema

2014-12-10 Thread Alessandro Baretta
Hey Michael,

Thanks for the clarification. I was actually assuming the query would fail.
OK, so this means I will have to do the validation in an RDD transformation
feeding into the SchemaRDD.
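
For example, something like this hypothetical pre-validation pass (the
names and the handled types are illustrative, against the Spark 1.2-era
API) that keeps only Rows whose runtime values match the schema before
applySchema is called:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql._

// True when every value in the Row matches the runtime class expected by
// the corresponding schema field; extend the match for other DataTypes.
def conforms(schema: StructType)(row: Row): Boolean =
  row.length == schema.fields.length &&
  schema.fields.zipWithIndex.forall { case (field, i) =>
    val v = row(i)
    if (v == null) field.nullable
    else field.dataType match {
      case StringType  => v.isInstanceOf[String]
      case IntegerType => v.isInstanceOf[Int]
      case LongType    => v.isInstanceOf[Long]
      case DoubleType  => v.isInstanceOf[Double]
      case _           => true
    }
  }

// Drop (or route elsewhere) any Row that does not conform.
def validated(rows: RDD[Row], schema: StructType): RDD[Row] =
  rows.filter(conforms(schema))

// val clean = sqlContext.applySchema(validated(rowRDD, schema), schema)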
