When you build Spark, remove -Phive as well as -Pyarn. When you run Hive queries, you may also need to run "set spark.home=/path/to/spark/dir;" first.
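For example (the master address and the Spark directory below are placeholders for your own values; the mvn line is simply your build command from below with the two profiles dropped):

  mvn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package

and then, before running the query in the Hive shell:

  set spark.home=/path/to/spark/dir;
  set hive.execution.engine=spark;
  set spark.master=spark://<your-master-host>:7077;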
Thanks,
Xuefu

On Tue, Dec 2, 2014 at 6:29 PM, yuemeng1 <yueme...@huawei.com> wrote:

> hi, Xuefu, thanks a lot for your help. Here is more detail to reproduce this issue:
>
> 1) I checked out the Spark branch of Hive from GitHub
> (https://github.com/apache/hive/tree/spark, the revision from Nov 29, because the
> current head gives an error: Caused by: java.lang.RuntimeException: Unable to
> instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient).
> The build command was: mvn clean package -DskipTests -Phadoop-2 -Pdist
> After the build I took the package from
> /home/ym/hive-on-spark/hive1129/hive/packaging/target
> (apache-hive-0.15.0-SNAPSHOT-bin.tar.gz).
>
> 2) I checked out Spark from
> https://github.com/apache/spark/tree/v1.2.0-snapshot0, because the Spark branch-1.2
> has spark-parent version 1.2.1-SNAPSHOT, so I chose v1.2.0-snapshot0. I compared
> this Spark's pom.xml with spark-parent-1.2.0-SNAPSHOT.pom (taken from
> http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/),
> and the only difference is in the spark-parent version. The build command was:
>
> mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean package
>
> 3) Commands I execute in the Hive shell:
> ./hive --auxpath /opt/hispark/spark/assembly/target/scala-2.10/spark-assembly-1.2.0-hadoop2.4.0.jar
> (this jar has already been copied to Hive's lib directory)
> create table student(sno int,sname string,sage int,ssex string) row format
> delimited FIELDS TERMINATED BY ',';
> create table score(sno int,cno int,sage int) row format delimited FIELDS
> TERMINATED BY ',';
> load data local inpath
> '/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/student.txt'
> into table student;
> load data local inpath
> '/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/score.txt'
> into table score;
> set hive.execution.engine=spark;
> set spark.master=spark://10.175.xxx.xxx:7077;
> set spark.eventLog.enabled=true;
> set spark.executor.memory=9086m;
> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
> select distinct st.sno,sname from student st join score sc
> on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28; (this query works in MR)
>
> 4)
> student.txt file:
> 1,rsh,27,female
> 2,kupo,28,male
> 3,astin,29,female
> 4,beike,30,male
> 5,aili,31,famle
>
> score.txt file:
> 1,10,80
> 2,11,85
> 3,12,90
> 4,13,95
> 5,14,100
>
> On 2014/12/2 23:28, Xuefu Zhang wrote:
> > Could you provide details on how to reproduce the issue? such as the
> > exact Spark branch, the command to build Spark, how you build Hive, and
> > what queries/commands you run.
> >
> > We are running Hive on Spark all the time. Our pre-commit test runs
> > without any issue.
> >
> > Thanks,
> > Xuefu
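As a quick sanity check on the assembly you pass via --auxpath, you could also list its contents: if it was built with -Phive, it will most likely carry its own copy of the Hive classes (for instance an older org.apache.hadoop.hive.ql.io.HiveInputFormat) that can shadow the ones from your Hive 0.15 build. The jar name below is the one from your --auxpath; adjust the path if yours differs:

  jar tf spark-assembly-1.2.0-hadoop2.4.0.jar | grep 'hive/ql/io/HiveInputFormat'

If that prints a match, rebuilding Spark without -Phive and replacing the jar under Hive's lib directory should put the right class on the classpath.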
> > On Tue, Dec 2, 2014 at 4:13 AM, yuemeng1 <yueme...@huawei.com> wrote:
> >
>> hi, Xuefu,
>> I checked out a Spark branch from the Spark GitHub (tag v1.2.0-snapshot0) and
>> compared this Spark's pom.xml with spark-parent-1.2.0-SNAPSHOT.pom (taken from
>> http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/),
>> and the only difference is the following:
>> in spark-parent-1.2.0-SNAPSHOT.pom
>> <artifactId>spark-parent</artifactId>
>> <version>1.2.0-SNAPSHOT</version>
>> and in v1.2.0-snapshot0
>> <artifactId>spark-parent</artifactId>
>> <version>1.2.0</version>
>> I think there is no essential difference, so I built v1.2.0-snapshot0 and deployed
>> it as my Spark cluster.
>> When I run a query joining two tables, it still gives the error I showed you earlier:
>>
>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most
>> recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18):
>> java.lang.NullPointerException
>> at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>> at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>> at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>> at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>> at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>> at org.apache.spark.scheduler.Task.run(Task.scala:56)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> at java.lang.Thread.run(Thread.java:722)
>>
>> Driver stacktrace:
>>
>> I think my Spark cluster doesn't have any problem, so why does it always give me
>> this error?
>>
>> On 2014/12/2 13:39, Xuefu Zhang wrote:
>>
>> You need to build your spark-assembly from the Spark 1.2 branch. This should
>> give you both a Spark build and a spark-assembly jar, which you need to copy
>> to Hive's lib directory. A snapshot is fine; Spark 1.2 hasn't been released yet.
>>
>> --Xuefu
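For example, once the Spark 1.2 build finishes, copying the assembly into Hive's lib directory is just a plain file copy (the exact jar name depends on your Hadoop profile, and /path/to/hive is a placeholder for wherever you unpacked the Hive tarball):

  cp assembly/target/scala-2.10/spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar /path/to/hive/lib/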
>> On Mon, Dec 1, 2014 at 7:41 PM, yuemeng1 <yueme...@huawei.com> wrote:
>>
>>> hi, Xuefu,
>>> thanks a lot for your information, but as far as I know the latest Spark version
>>> on GitHub is spark-snapshot-1.3; there is no spark-1.2, only a branch-1.2 with
>>> spark-snapshot-1.2. Can you tell me which Spark version I should build? For now,
>>> the spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar I built produces the error below.
>>>
>>> On 2014/12/2 11:03, Xuefu Zhang wrote:
>>>
>>> It seems that the wrong class, HiveInputFormat, is loaded. The stacktrace
>>> is way off the current Hive code. You need to build Spark 1.2 and copy the
>>> spark-assembly jar to Hive's lib directory, and that's it.
>>>
>>> --Xuefu
>>>
>>> On Mon, Dec 1, 2014 at 6:22 PM, yuemeng1 <yueme...@huawei.com> wrote:
>>>
>>>> hi, I built a Hive on Spark package and my spark-assembly jar is
>>>> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar. When I run a query in the Hive
>>>> shell, I first set everything Hive on Spark requires, and then I execute a
>>>> join query:
>>>> select distinct st.sno,sname from student st join score sc
>>>> on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;
>>>> but it failed with the following error in the Spark web UI:
>>>>
>>>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most
>>>> recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18):
>>>> java.lang.NullPointerException
>>>> at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>>>> at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>>>> at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>>>> at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>>>> at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>>>> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>>>> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>> at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>> at java.lang.Thread.run(Thread.java:722)
>>>>
>>>> Driver stacktrace:
>>>>
>>>> Can you help me deal with this problem? I think my build succeeded.