Could you provide details on how to reproduce the issue, such as the exact Spark branch, the command used to build Spark, how you built Hive, and the queries/commands you ran?
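For example, a recipe along these lines is the level of detail that helps (the tag and Maven options below are only an illustrative sketch, not a known-good build command):

    git clone https://github.com/apache/spark.git
    cd spark
    git checkout v1.2.0-snapshot0
    mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package

plus the Hive commit you built from and the exact statements you typed in the Hive shell.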
We are running Hive on Spark all the time. Our pre-commit test runs without any issue.

Thanks,
Xuefu

On Tue, Dec 2, 2014 at 4:13 AM, yuemeng1 <yueme...@huawei.com> wrote:

> hi, XueFu
> I checked out a Spark branch from the Spark GitHub repository (tag:
> v1.2.0-snapshot0) and compared its pom.xml with
> spark-parent-1.2.0-SNAPSHOT.pom (taken from
> http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/).
> The only difference is the following:
>
> in spark-parent-1.2.0-SNAPSHOT.pom:
> <artifactId>spark-parent</artifactId>
> <version>1.2.0-SNAPSHOT</version>
>
> and in v1.2.0-snapshot0:
> <artifactId>spark-parent</artifactId>
> <version>1.2.0</version>
>
> I think there is no essential difference, so I built v1.2.0-snapshot0 and
> deployed it on my Spark cluster.
> When I run a query joining two tables, it still gives the error I showed
> you earlier:
>
> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most
> recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18):
> java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
> at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
> at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
> at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
> at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:56)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
>
> Driver stacktrace:
>
> I think my Spark cluster doesn't have any problem, so why does it always
> give me this error?
>
> On 2014/12/2 13:39, Xuefu Zhang wrote:
>
> You need to build your spark assembly from the Spark 1.2 branch. This
> should give you both a Spark build and the spark-assembly jar, which you
> need to copy to Hive's lib directory. A snapshot is fine; Spark 1.2 hasn't
> been released yet.
>
> --Xuefu
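A minimal sketch of that copy step, assuming a default Maven-built Spark 1.2 layout and a HIVE_HOME environment variable (the exact assembly jar name depends on the Hadoop profile used):

    # after the build, the assembly jar typically sits under assembly/target/scala-2.10/
    cp assembly/target/scala-2.10/spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar $HIVE_HOME/lib/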
> On Mon, Dec 1, 2014 at 7:41 PM, yuemeng1 <yueme...@huawei.com> wrote:
>
>> hi, XueFu
>> Thanks a lot for the information, but as far as I know, the latest Spark
>> version on GitHub is spark-snapshot-1.3; there is no spark-1.2, only a
>> branch-1.2 with spark-snapshot-1.2. Can you tell me which Spark version I
>> should build? For now, spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar
>> produces the error below.
>>
>> On 2014/12/2 11:03, Xuefu Zhang wrote:
>>
>> It seems that the wrong class, HiveInputFormat, is loaded. The stacktrace
>> is way off the current Hive code. You need to build Spark 1.2 and copy the
>> spark-assembly jar to Hive's lib directory, and that's it.
>>
>> --Xuefu
>>
>> On Mon, Dec 1, 2014 at 6:22 PM, yuemeng1 <yueme...@huawei.com> wrote:
>>
>>> hi, I built a Hive on Spark package, and my Spark assembly jar is
>>> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar. When I run a query in the
>>> Hive shell, I first set all the configuration that Hive needs for Spark,
>>> and then I execute a join query:
>>>
>>> select distinct st.sno, sname from student st join score sc
>>> on (st.sno = sc.sno) where sc.cno IN (11, 12, 13) and st.sage > 28;
>>>
>>> but it failed with the following error in the Spark web UI:
>>>
>>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times,
>>> most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18):
>>> java.lang.NullPointerException
>>> at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>>> at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>>> at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>>> at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>>> at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>>> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>>> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>> at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>> at java.lang.Thread.run(Thread.java:722)
>>>
>>> Driver stacktrace:
>>>
>>> Can you help me with this problem? I think my build was successful.
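For reference, the Hive-side setup mentioned in the original report (setting the Spark-related properties in the Hive shell before running the query) usually amounts to something like the following session; the master URL and memory value here are placeholders, not the reporter's actual settings:

    hive> set hive.execution.engine=spark;
    hive> set spark.master=<spark master URL>;
    hive> set spark.executor.memory=2g;
    hive> select distinct st.sno, sname from student st join score sc
        >   on (st.sno = sc.sno) where sc.cno in (11, 12, 13) and st.sage > 28;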