All values in Hive are always nullable, though you should still not be seeing this error.
It should be addressed by this patch:
https://github.com/apache/spark/pull/3150

On Fri, Dec 5, 2014 at 2:36 AM, Hao Ren <inv...@gmail.com> wrote:

> Hi,
>
> I am using SparkSQL on 1.1.0 branch.
>
> The following code leads to a scala.MatchError at
> org.apache.spark.sql.catalyst.expressions.Cast.cast$lzycompute(Cast.scala:247)
>
> val scm = StructType(inputRDD.schema.fields.init :+
>   StructField("list",
>     ArrayType(
>       StructType(
>         Seq(StructField("date", StringType, nullable = false),
>           StructField("nbPurchase", IntegerType, nullable = false)))),
>     nullable = false))
>
> // purchaseRDD is RDD[sql.ROW] whose schema is corresponding to scm. It is
> // transformed from inputRDD
> val schemaRDD = hiveContext.applySchema(purchaseRDD, scm)
> schemaRDD.registerTempTable("t_purchase")
>
> Here's the stackTrace:
> scala.MatchError: ArrayType(StructType(List(StructField(date,StringType,true), StructField(n_reachat,IntegerType,true))),true) (of class org.apache.spark.sql.catalyst.types.ArrayType)
>   at org.apache.spark.sql.catalyst.expressions.Cast.cast$lzycompute(Cast.scala:247)
>   at org.apache.spark.sql.catalyst.expressions.Cast.cast(Cast.scala:247)
>   at org.apache.spark.sql.catalyst.expressions.Cast.eval(Cast.scala:263)
>   at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:84)
>   at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:66)
>   at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:50)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:149)
>   at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
>   at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>   at org.apache.spark.scheduler.Task.run(Task.scala:54)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
>
> The strange thing is that nullable of date and nbPurchase field are set to
> true while it were false in the code. If I set both to true, it works. But,
> in fact, they should not be nullable.
>
> Here's what I find at Cast.scala:247 on 1.1.0 branch
>
> private[this] lazy val cast: Any => Any = dataType match {
>   case StringType => castToString
>   case BinaryType => castToBinary
>   case DecimalType => castToDecimal
>   case TimestampType => castToTimestamp
>   case BooleanType => castToBoolean
>   case ByteType => castToByte
>   case ShortType => castToShort
>   case IntegerType => castToInt
>   case FloatType => castToFloat
>   case LongType => castToLong
>   case DoubleType => castToDouble
> }
>
> Any idea? Thank you.
>
> Hao
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/scala-MatchError-on-SparkSQL-when-creating-ArrayType-of-StructType-tp20459.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
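
For anyone wondering why the quoted `cast` match produces a scala.MatchError: a Scala `match` with no case for the value it receives throws at runtime, and the quoted match lists only atomic types. Below is a minimal, Spark-free sketch of that behaviour. Everything in it (the `DataType` stand-ins, `CastSketch`, `castAtomicOnly`, `castWithArrays`) is hypothetical and written for illustration only; it is not Spark's code and it is not a description of what the linked patch actually changes.

```scala
// Hypothetical stand-ins for Catalyst data types. These are NOT Spark's classes.
abstract class DataType
case object StringType extends DataType
case object IntegerType extends DataType
case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType

object CastSketch {
  // Same shape as the quoted match: only atomic types have a case,
  // so any other DataType falls through and throws scala.MatchError.
  def castAtomicOnly(dataType: DataType): Any => Any = dataType match {
    case StringType  => (v: Any) => v.toString
    case IntegerType => (v: Any) => v.toString.toInt
  }

  // One possible way to cover arrays (an assumption, for illustration):
  // build the element cast recursively and map it over the sequence.
  def castWithArrays(dataType: DataType): Any => Any = dataType match {
    case StringType  => (v: Any) => v.toString
    case IntegerType => (v: Any) => v.toString.toInt
    case ArrayType(elementType, _) =>
      val castElement = castWithArrays(elementType)
      (v: Any) => v.asInstanceOf[Seq[Any]].map(castElement)
  }

  def main(args: Array[String]): Unit = {
    val arrayOfInts = ArrayType(IntegerType, containsNull = true)
    // Selecting the cast function is already enough to trigger the error.
    val failed = scala.util.Try(castAtomicOnly(arrayOfInts)).isFailure
    println(s"atomic-only match failed: $failed")
    println(castWithArrays(arrayOfInts)(Seq("1", "2")))
  }
}
```

In the sketch the failure happens when the cast function is selected, before any row is converted, which matches the `cast$lzycompute` frame at the top of the quoted stack trace.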