SparkSQL not honoring schema

2014-12-10 Thread Alessandro Baretta
Hello, I defined a SchemaRDD by applying a hand-crafted StructType to an RDD. Some of the Rows in the RDD are malformed--that is, they do not conform to the schema defined by the StructType. When running a select statement on this SchemaRDD I would expect SparkSQL to either reject the malformed

Re: SparkSQL not honoring schema

2014-12-10 Thread Michael Armbrust
As the Scaladoc for applySchema says: "It is important to make sure that the structure of every [[Row]] of the provided RDD matches the provided schema. Otherwise, there will be runtime exceptions." We don't check, as doing runtime reflection on all of the data would be very expensive. You will

Re: SparkSQL not honoring schema

2014-12-10 Thread Alessandro Baretta
Hey Michael, Thanks for the clarification. I was actually assuming the query would fail. OK, so this means I will have to do the validation in an RDD transformation feeding into the SchemaRDD.
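
The validation step mentioned above could look roughly like the following. This is only a minimal sketch in plain Python, not code from the thread: `Field`, `conforms`, and `split_rows` are hypothetical names, rows are modeled as tuples, and ordinary lists stand in for the RDD. In Spark, a predicate like `conforms` would presumably run inside an RDD transformation (for example a filter) before the schema is applied.

```python
from typing import Any, List, NamedTuple, Sequence, Tuple


class Field(NamedTuple):
    """Hypothetical stand-in for one field of a schema (not a Spark class)."""
    name: str
    py_type: type
    nullable: bool = True


def conforms(row: Sequence[Any], schema: Sequence[Field]) -> bool:
    """Return True if the row has the right arity and every value matches its field."""
    if len(row) != len(schema):
        return False
    for value, field in zip(row, schema):
        if value is None:
            if not field.nullable:
                return False
        # Exact type check: bool is a subclass of int, so isinstance would let it through.
        elif type(value) is not field.py_type:
            return False
    return True


def split_rows(rows: Sequence[Sequence[Any]],
               schema: Sequence[Field]) -> Tuple[List[Any], List[Any]]:
    """Separate conforming rows from malformed ones so the latter can be logged or dropped."""
    good: List[Any] = []
    bad: List[Any] = []
    for row in rows:
        (good if conforms(row, schema) else bad).append(row)
    return good, bad


# Assumed example schema and data, purely for illustration.
schema = [Field("id", int, nullable=False), Field("name", str)]
rows = [(1, "a"), (2, None), ("3", "c"), (4,), (None, "e")]
good, bad = split_rows(rows, schema)
```

Keeping the malformed rows in a separate collection, rather than silently discarding them, makes it possible to report how much of the input was rejected before the query runs.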