Hello All, I am trying to query a Hive table using Spark SQL from my Java code, but I am getting the following error:
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: org.apache.spark.sql.hive.api.java.JavaHiveContext
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1049)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)

I am using Spark 1.0.2. My code snippet is as below:

    JavaHiveContext hiveContext = null;
    JavaSparkContext jsCtx = ......;
    hiveContext = new JavaHiveContext(jsCtx);
    hiveContext.hql("select col1, col2 from table1")

The usual advice for this exception is not to pass any non-serializable object into a Spark closure function (map, reduce, etc.), so that it does not get distributed across multiple machines. But I am not using any closure functions here, so I am not sure how to handle this issue.

Can you please advise how to resolve this problem?

Thanks,
Bijoy
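
P.S. For completeness, here is a fuller, self-contained version of what I am running. The app name, master URL, and table name below are placeholders; the collect() is simply the action that triggers the job.

    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.api.java.Row;
    import org.apache.spark.sql.hive.api.java.JavaHiveContext;

    public class HiveQueryExample {
        public static void main(String[] args) {
            // Placeholder app name and master URL
            SparkConf conf = new SparkConf()
                    .setAppName("HiveQueryExample")
                    .setMaster("local[2]");
            JavaSparkContext jsCtx = new JavaSparkContext(conf);

            // Wrap the JavaSparkContext in a JavaHiveContext (Spark 1.0.2 Java API)
            JavaHiveContext hiveContext = new JavaHiveContext(jsCtx);

            // Run the Hive query; collect() is the action that kicks off the job
            List<Row> rows = hiveContext.hql("select col1, col2 from table1").collect();
            for (Row row : rows) {
                System.out.println(row);
            }

            jsCtx.stop();
        }
    }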