Re: org.apache.spark.sql.types.GenericArrayData cannot be cast to org.apache.spark.sql.catalyst.InternalRow

2016-03-07 Thread dmt
Is there a workaround? My dataset contains billions of rows, and it would be
nice to ignore/exclude the few lines that are badly formatted.
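For reference, one way to exclude the bad lines is to pre-filter the newline-delimited JSON before handing it to Spark. Below is a minimal sketch in plain Python (standard library only, no Spark involved); the nested request/user field names follow the records quoted elsewhere in this thread, and the helper names are illustrative, not from any Spark API:

```python
import json

def is_well_formed(line):
    """Return True when a record matches the expected shape:
    request.user must be a JSON object, not an array."""
    try:
        record = json.loads(line)
    except ValueError:  # line is not valid JSON at all
        return False
    if not isinstance(record, dict):
        return False
    request = record.get("request")
    if not isinstance(request, dict):
        return False
    # The schema expects a struct here, so anything other than
    # a JSON object (e.g. an array) is considered malformed.
    return isinstance(request.get("user"), dict)

def filter_records(lines):
    """Keep only the well-formed newline-delimited JSON records."""
    return [line for line in lines if is_well_formed(line)]
```

Running this over the raw text files first (or inside a plain RDD `filter` before applying the schema) would drop the handful of malformed rows instead of failing the job.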



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/org-apache-spark-sql-types-GenericArrayData-cannot-be-cast-to-org-apache-spark-sql-catalyst-Internalw-tp26377p26420.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: org.apache.spark.sql.types.GenericArrayData cannot be cast to org.apache.spark.sql.catalyst.InternalRow

2016-03-07 Thread dmt
I have found why the exception is raised.
I have defined a JSON schema, using org.apache.spark.sql.types.StructType,
that expects this kind of record:

{
  "request": {
    "user": {
      "id": 123
    }
  }
}

There's a bad record in my dataset that defines the field "user" as an array
instead of a JSON object:

{
  "request": {
    "user": []
  }
}
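The mismatch between the two records can be seen with plain Python (a sketch using the standard json module, not Spark code): the valid record yields an object (dict) for "user" while the bad one yields an array (list), which is the same shape conflict that makes Spark materialise a GenericArrayData where the schema's struct expects an InternalRow:

```python
import json

good = json.loads('{"request": {"user": {"id": 123}}}')
bad = json.loads('{"request": {"user": []}}')

# The schema expects a struct (a JSON object) at request.user ...
print(type(good["request"]["user"]).__name__)  # dict
# ... but the bad record carries an array instead, which does not
# fit the declared StructType for that field.
print(type(bad["request"]["user"]).__name__)   # list
```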

I have created the following issue :
https://issues.apache.org/jira/browse/SPARK-13719




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/org-apache-spark-sql-types-GenericArrayData-cannot-be-cast-to-org-apache-spark-sql-catalyst-Internalw-tp26377p26417.html



org.apache.spark.sql.types.GenericArrayData cannot be cast to org.apache.spark.sql.catalyst.InternalRow

2016-03-02 Thread dmt
Hi,

the following error is raised using Spark 1.5.2 or 1.6.0, in standalone
mode, on my computer.
Has anyone had the same problem, and do you know what might cause this
exception? Thanks in advance.

16/03/02 15:12:27 WARN TaskSetManager: Lost task 9.0 in stage 0.0 (TID 9, 192.168.1.36): java.lang.ClassCastException: org.apache.spark.sql.types.GenericArrayData cannot be cast to org.apache.spark.sql.catalyst.InternalRow
    at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getStruct(rows.scala:50)
    at org.apache.spark.sql.catalyst.expressions.GenericMutableRow.getStruct(rows.scala:247)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.eval(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$$anonfun$create$2.apply(GeneratePredicate.scala:67)
    at org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$$anonfun$create$2.apply(GeneratePredicate.scala:67)
    at org.apache.spark.sql.execution.Filter$$anonfun$4$$anonfun$apply$4.apply(basicOperators.scala:117)
    at org.apache.spark.sql.execution.Filter$$anonfun$4$$anonfun$apply$4.apply(basicOperators.scala:115)
    at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:390)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:365)
    at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.start(TungstenAggregationIterator.scala:622)
    at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.org$apache$spark$sql$execution$aggregate$TungstenAggregate$$anonfun$$executePartition$1(TungstenAggregate.scala:110)
    at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
    at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
    at org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:64)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

16/03/02 15:12:27 INFO TaskSetManager: Starting task 9.1 in stage 0.0 (TID 17, 192.168.1.36, PROCESS_LOCAL, 2236 bytes)
16/03/02 15:12:27 INFO TaskSetManager: Finished task 11.0 in stage 0.0 (TID 11) in 921 ms on 192.168.1.36 (10/17)
16/03/02 15:12:27 INFO TaskSetManager: Finished task 13.0 in stage 0.0 (TID 13) in 871 ms on 192.168.1.36 (11/17)
16/03/02 15:12:27 INFO TaskSetManager: Finished task 14.0 in stage 0.0 (TID 14) in 885 ms on 192.168.1.36 (12/17)
16/03/02 15:12:27 INFO TaskSetManager: Finished task 8.0 in stage 0.0 (TID 8) in 981 ms on 192.168.1.36 (13/17)
16/03/02 15:12:27 INFO TaskSetManager: Finished task 15.0 in stage 0.0 (TID 15) in 844 ms on 192.168.1.36 (14/17)
16/03/02 15:12:27 INFO TaskSetManager: Finished task 10.0 in stage 0.0 (TID 10) in 1007 ms on 192.168.1.36 (15/17)
16/03/02 15:12:28 INFO TaskSetManager: Lost task 9.1 in stage 0.0 (TID 17) on executor 192.168.1.36: java.lang.ClassCastException (org.apache.spark.sql.types.GenericArrayData cannot be cast to org.apache.spark.sql.catalyst.InternalRow) [duplicate 1]
16/03/02 15:12:28 INFO TaskSetManager: Starting task 9.2 in stage 0.0 (TID 18, 192.168.1.36, PROCESS_LOCAL, 2236 bytes)
16/03/02 15:12:28 INFO TaskSetManager: Finished task 16.0 in stage 0.0 (TID 16) in 537 ms on 192.168.1.36 (16/17)
16/03/02 15:12:28 INFO TaskSetManager: Lost task 9.2 in stage 0.0 (TID 18) on executor 192.168.1.36: java.lang.ClassCastException (org.apache.spark.sql.types.GenericArrayData cannot be cast to org.apache.spark.sql.catalyst.InternalRow) [duplicate 2]
16/03/02 15:12:28 INFO TaskSetManager: Starting task 9.3 in stage 0.0 (TID 19, 192.168.1.36, PROCESS_LOCAL, 2236 bytes)
16/03/02 15:12:29 WARN TaskSetManager: Lost task 9.3 in stage 0.0 (TID 19, 192.168.1.36): java.lang.ClassCastException

16/03/02 15:12:29 ERROR TaskSetManager: Task 9 in stage 0.0 failed 4 times; aborting job
16/03/02