Re: Spark SQL - Exception only when using cacheTable
This is how the table was created:

transactions = parts.map(lambda p: Row(
    customer_id=long(p[0]),
    chain=int(p[1]),
    dept=int(p[2]),
    category=int(p[3]),
    company=int(p[4]),
    brand=int(p[5]),
    date=str(p[6]),
    productsize=float(p[7]),
    productmeasure=str(p[8]),
    purchasequantity=int(p[9]),
    purchaseamount=float(p[10])))

# Infer the schema, and register the SchemaRDD as a table
schemaTransactions = sqlContext.inferSchema(transactions)
schemaTransactions.registerTempTable("transactions")
sqlContext.cacheTable("transactions")

t = sqlContext.sql("SELECT * FROM transactions WHERE purchaseamount >= 50")
t.count()

Thank you,
poiuytrez

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Exception-only-when-using-cacheTable-tp16031p16262.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
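[Editor's note: a ClassCastException that appears only after cacheTable is often a symptom of the in-memory columnar store hitting a record whose runtime type does not match the schema inferred from the first rows. A pure-Python sanity check over the raw records can surface such a record before caching. The helper below is a hypothetical sketch, not code from this thread; the field layout mirrors the poster's lambda.]

```python
# Hypothetical helper, not from the thread: validate raw records before
# inferSchema/cacheTable. If one record's field fails to convert to the
# expected type, this raises a clear ValueError at load time instead of a
# ClassCastException deep inside the cached columnar scan.
def parse_transaction(p):
    """Convert one split CSV record into typed fields, failing loudly
    (with the offending record) instead of silently mixing types."""
    try:
        return {
            "customer_id": int(p[0]),
            "chain": int(p[1]),
            "dept": int(p[2]),
            "category": int(p[3]),
            "company": int(p[4]),
            "brand": int(p[5]),
            "date": str(p[6]),
            "productsize": float(p[7]),
            "productmeasure": str(p[8]),
            "purchasequantity": int(p[9]),
            "purchaseamount": float(p[10]),
        }
    except ValueError as e:
        raise ValueError("bad record %r: %s" % (p, e))

# Possible use with the poster's pipeline (sketch):
# transactions = parts.map(lambda p: Row(**parse_transaction(p)))
```

Running this over a sample (e.g. `parts.take(1000)`) on the driver is a cheap way to check whether the data, not the cache, is the problem.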
Re: Spark SQL - Exception only when using cacheTable
…es, most recent failure: Lost task 120.3 in stage 7.0 (TID 2248, spark-w-0.c.db.internal): java.lang.ClassCastException:

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Thank you

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Exception-only-when-using-cacheTable-tp16031p16138.html
Re: Spark SQL - Exception only when using cacheTable
I am using the Python API. Unfortunately, I cannot find an equivalent of the isCached method in the SQLContext section of the documentation: https://spark.apache.org/docs/1.1.0/api/python/index.html

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Exception-only-when-using-cacheTable-tp16031p16137.html
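[Editor's note: isCached indeed has no public PySpark equivalent in 1.1. As a sketch under an assumption: PySpark's SQLContext wraps the Scala SQLContext in the internal attribute `_ssql_ctx` (a py4j proxy), through which the Scala method can be reached. This is internal, unsupported API and may change between releases.]

```python
# Sketch under an assumption: sql_context._ssql_ctx is PySpark's internal
# py4j handle to the underlying Scala SQLContext (Spark 1.1 internals).
# Not public API; use only for debugging.
def table_is_cached(sql_context, table_name):
    """Ask the underlying Scala SQLContext whether `table_name` is cached."""
    return sql_context._ssql_ctx.isCached(table_name)

# Hypothetical usage in a PySpark shell:
# table_is_cached(sqlContext, "transactions")
```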
Re: Spark SQL - Exception only when using cacheTable
Hi Poiuytrez, what version of Spark are you using? Exception details like the stack trace are really needed to investigate this issue. You can find them in the executor logs, or just browse the application stderr/stdout links from the Spark Web UI.

On 10/9/14 9:37 PM, poiuytrez wrote:

> Hello, I have a weird issue. This request works fine:
>
> sqlContext.sql("SELECT customer_id FROM transactions WHERE purchaseamount >= 200").count()
>
> However, when I cache the table before making the request:
>
> sqlContext.cacheTable("transactions")
> sqlContext.sql("SELECT customer_id FROM transactions WHERE purchaseamount >= 200").count()
>
> I am getting an exception on one of the tasks:
>
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 120 in stage 104.0 failed 4 times, most recent failure: Lost task 120.3 in stage 104.0 (TID 20537, spark-w-0.c.internal): java.lang.ClassCastException: (I have no details after the ':')
>
> Any ideas of what could be wrong?
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Exception-only-when-using-cacheTable-tp16031.html
Re: Spark SQL - Exception only when using cacheTable
Can you try checking whether the table is actually being cached? You can use the isCached method. More details are here: http://spark.apache.org/docs/1.0.2/api/java/org/apache/spark/sql/SQLContext.html

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Exception-only-when-using-cacheTable-tp16031p16123.html
Spark SQL - Exception only when using cacheTable
Hello,

I have a weird issue. This request works fine:

sqlContext.sql("SELECT customer_id FROM transactions WHERE purchaseamount >= 200").count()

However, when I cache the table before making the request:

sqlContext.cacheTable("transactions")
sqlContext.sql("SELECT customer_id FROM transactions WHERE purchaseamount >= 200").count()

I am getting an exception on one of the tasks:

: org.apache.spark.SparkException: Job aborted due to stage failure: Task 120 in stage 104.0 failed 4 times, most recent failure: Lost task 120.3 in stage 104.0 (TID 20537, spark-w-0.c.internal): java.lang.ClassCastException: (I have no details after the ':')

Any ideas of what could be wrong?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Exception-only-when-using-cacheTable-tp16031.html