Re: SQLContext.applySchema strictness
Applying schema is a pretty low-level operation, and I would expect most
users would use the type-safe interfaces. If you are unsure, you can always
run:

    import org.apache.spark.sql.execution.debug._
    schemaRDD.typeCheck()

and it will tell you if you have made any mistakes.

Michael

On Sat, Feb 14, 2015 at 1:05 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
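Michael's typeCheck() tip needs a running Spark context, but the effect of such an eager pass can be sketched in plain Scala. In this sketch, `SimpleField` and `typeCheck` are made-up stand-ins for illustration, not the Spark API:

```scala
// Minimal stand-in for a schema field: a name plus the expected runtime class.
case class SimpleField(name: String, expected: Class[_])

// Eagerly scan every row and collect a message for each value whose runtime
// type does not match its declared field type -- the spirit of typeCheck().
def typeCheck(rows: Seq[Seq[Any]], schema: Seq[SimpleField]): Seq[String] =
  for {
    (row, i)       <- rows.zipWithIndex
    (value, field) <- row.zip(schema)
    if !field.expected.isInstance(value)
  } yield s"row $i: field '${field.name}' expected ${field.expected.getSimpleName}, got ${value.getClass.getSimpleName}"

// Mirrors the data from the original question: only the Boolean row is valid.
val errors = typeCheck(
  rows   = Seq(Seq(0), Seq(true), Seq("stuff")),
  schema = Seq(SimpleField("test", classOf[java.lang.Boolean]))
)
```

Running this flags rows 0 and 2, which is exactly the kind of mistake an eager check surfaces before any query runs.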
Re: SQLContext.applySchema strictness
Would it make sense to add an optional validate parameter to applySchema(),
which defaults to false, both to give users the option to check the schema
immediately and to make the default behavior clearer?

On Sat Feb 14 2015 at 9:18:59 AM Michael Armbrust wrote:
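As a rough sketch of that proposal (a hypothetical signature, not Spark's actual API), the flag would keep today's lazy behavior as the default while letting callers opt in to an eager scan up front:

```scala
// Hypothetical sketch of an applySchema with an opt-in `validate` flag.
// The schema is modeled here as (field name, expected runtime class) pairs.
def applySchemaSketch(
    rows: Seq[Seq[Any]],
    schema: Seq[(String, Class[_])],
    validate: Boolean = false): Seq[Seq[Any]] = {
  if (validate) {
    // Eager path: fail fast on the first mismatched value.
    for ((row, i) <- rows.zipWithIndex; (value, (name, cls)) <- row.zip(schema))
      require(cls.isInstance(value),
        s"row $i: field '$name' is not a ${cls.getSimpleName}")
  }
  rows // default path: no scan, mismatches surface later on access
}

val schema = Seq(("test", classOf[java.lang.Boolean]))
val silently = applySchemaSketch(Seq(Seq(0)), schema)                    // no error
val caught   = scala.util.Try(applySchemaSketch(Seq(Seq(0)), schema, validate = true))
```

With `validate = false` the bad row passes through untouched; with `validate = true` the same call fails immediately, which is the clearer default-versus-opt-in split being proposed.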
Re: SQLContext.applySchema strictness
Doing runtime type checking is very expensive, so we only do it when
necessary (i.e. when you perform an operation like adding two columns
together).

On Sat, Feb 14, 2015 at 2:19 AM, nitin wrote:
Re: SQLContext.applySchema strictness
AFAIK, this is the expected behavior. You have to make sure that the schema
matches the row. It won't give any error when you apply the schema, as it
doesn't validate the nature of the data.

--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/SQLContext-applySchema-strictness-tp21650p21653.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: SQLContext.applySchema strictness
OK, but what about on an action, like collect()? Shouldn't it be able to
determine the correctness at that time?

On Fri, Feb 13, 2015 at 4:49 PM, Yin Huai wrote:
Re: SQLContext.applySchema strictness
Hi Justin,

It is expected. We do not check whether the provided schema matches the
rows, since doing so would require scanning every row.

Thanks,

Yin

On Fri, Feb 13, 2015 at 1:33 PM, Justin Pihony wrote:

> Per the documentation:
>
> It is important to make sure that the structure of every Row of the
> provided RDD matches the provided schema. Otherwise, there will be a
> runtime exception.
>
> However, it appears that this is not being enforced:
>
>     import org.apache.spark.sql._
>     val sqlContext = new SQLContext(sc)
>     val struct = StructType(List(StructField("test", BooleanType, true)))
>     val myData = sc.parallelize(List(Row(0), Row(true), Row("stuff")))
>     val schemaData = sqlContext.applySchema(myData, struct) // No error
>     schemaData.collect()(0).getBoolean(0) // Only now will I receive an error
>
> Is this expected or a bug?
>
> Thanks,
> Justin
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/SQLContext-applySchema-strictness-tp21650.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
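This also answers the earlier question about collect(): materializing a Row never touches individual fields, so nothing is cast until a typed accessor runs. A minimal plain-Scala sketch of that behavior (`SketchRow` is a made-up stand-in, not Spark's Row):

```scala
// The stored values are untyped (Any); nothing is checked at construction
// or when the row is handed back, mirroring how collect() can succeed on
// schema-violating data.
class SketchRow(values: Seq[Any]) {
  // The cast to Boolean happens only here, so a mismatched value blows up
  // at access time with a ClassCastException, not before.
  def getBoolean(i: Int): Boolean = values(i).asInstanceOf[Boolean]
}

val collected = Seq(new SketchRow(Seq(0)))                    // "collect": still no error
val failure   = scala.util.Try(collected.head.getBoolean(0))  // the error appears here
```

Building and collecting the rows succeeds; only the `getBoolean(0)` call fails, which matches the behavior in the original report.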