Yuemeng, you can find out how to edit the wiki docs here: About This Wiki <https://cwiki.apache.org/confluence/display/Hive/AboutThisWiki#AboutThisWiki-Howtogetpermissiontoedit>.
-- Lefty

On Wed, Dec 3, 2014 at 10:05 PM, Xuefu Zhang <xzh...@cloudera.com> wrote:
> Hi Yuemeng,
>
> I'm glad that Hive on Spark finally works for you. As you know, this
> project is still in development and has yet to be released, so please
> forgive the lack of proper documentation. We have a "Get Started" page
> that's linked in HIVE-7292. If you can improve the document there, it
> would be very helpful for other Hive users.
>
> Thanks,
> Xuefu
>
> On Wed, Dec 3, 2014 at 5:42 PM, yuemeng1 <yueme...@huawei.com> wrote:
>
>> Hi, thanks a lot for your help; with it, my Hive on Spark setup now works
>> well. It took me a long time to install and deploy, so here is some
>> advice: I think we need to improve the installation documentation so that
>> users can compile and install in the least amount of time.
>> 1) Say which Spark version to pick from the Spark GitHub repository when
>> building Spark instead of downloading a pre-built one, and give the right
>> build command (without -Pyarn and -Phive).
>> 2) If users hit a build error such as
>> [ERROR] /hive/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobStatus.java:
>> [22,24] cannot find symbol
>> [ERROR] symbol: class JobExecutionStatus
>> tell them what they can do.
>> For our users: let them use it first, and then they can judge whether it
>> is good or bad. If you need, I can add something to the Get Started
>> document.
>>
>> Thanks,
>> yuemeng
>>
>> On 2014/12/3 11:03, Xuefu Zhang wrote:
>>
>> When you build Spark, remove -Phive as well as -Pyarn. When you run Hive
>> queries, you may need to run "set spark.home=/path/to/spark/dir;".
>>
>> Thanks,
>> Xuefu
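(A minimal sketch of what that advice amounts to, taking the Spark build
command quoted later in this thread and dropping the two profiles; the
spark.home path is a placeholder:)

    # Build Spark without the -Pyarn and -Phive profiles:
    mvn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package

    -- then, in the Hive shell, before running queries:
    set spark.home=/path/to/spark/dir;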
>> On Tue, Dec 2, 2014 at 6:29 PM, yuemeng1 <yueme...@huawei.com> wrote:
>>
>>> Hi Xuefu, thanks a lot for your help. Here are more details to reproduce
>>> this issue:
>>> 1) I checked out the spark branch from the Hive GitHub repository
>>> (https://github.com/apache/hive/tree/spark) on Nov 29, because the
>>> current version fails with: Caused by: java.lang.RuntimeException:
>>> Unable to instantiate
>>> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.
>>> The build command was: mvn clean package -DskipTests -Phadoop-2 -Pdist
>>> After the build I took the package from
>>> /home/ym/hive-on-spark/hive1129/hive/packaging/target
>>> (apache-hive-0.15.0-SNAPSHOT-bin.tar.gz).
>>> 2) I checked out Spark from
>>> https://github.com/apache/spark/tree/v1.2.0-snapshot0, because Spark
>>> branch-1.2 has spark-parent version 1.2.1-SNAPSHOT, so I chose
>>> v1.2.0-snapshot0. I compared this Spark's pom.xml with
>>> spark-parent-1.2.0-SNAPSHOT.pom (taken from
>>> http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/),
>>> and the only difference is the spark-parent version. The build command
>>> was:
>>>
>>> mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean
>>> package
>>>
>>> 3) Commands I executed in the Hive shell:
>>> ./hive --auxpath
>>> /opt/hispark/spark/assembly/target/scala-2.10/spark-assembly-1.2.0-hadoop2.4.0.jar
>>> (this jar was already copied to the Hive lib directory)
>>> create table student(sno int,sname string,sage int,ssex string) row
>>> format delimited FIELDS TERMINATED BY ',';
>>> create table score(sno int,cno int,sage int) row format delimited FIELDS
>>> TERMINATED BY ',';
>>> load data local inpath
>>> '/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/student.txt'
>>> into table student;
>>> load data local inpath
>>> '/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/score.txt'
>>> into table score;
>>> set hive.execution.engine=spark;
>>> set spark.master=spark://10.175.xxx.xxx:7077;
>>> set spark.eventLog.enabled=true;
>>> set spark.executor.memory=9086m;
>>> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
>>> select distinct st.sno,sname from student st join score sc
>>> on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;
>>> (this query works with the MR engine)
>>> 4) Data files:
>>> student.txt
>>> 1,rsh,27,female
>>> 2,kupo,28,male
>>> 3,astin,29,female
>>> 4,beike,30,male
>>> 5,aili,31,famle
>>>
>>> score.txt
>>> 1,10,80
>>> 2,11,85
>>> 3,12,90
>>> 4,13,95
>>> 5,14,100
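(For reference, a quick check against the sample data above: in score the
columns are sno, cno, sage, so sc.cno IN (11,12,13) matches sno 2, 3, and 4,
and st.sage > 28 then keeps only students 3 and 4. Assuming the data loads as
shown, the query should return:)

    3   astin
    4   beike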
>>> On 2014/12/2 23:28, Xuefu Zhang wrote:
>>>
>>> Could you provide details on how to reproduce the issue? Such as the
>>> exact Spark branch, the command to build Spark, how you build Hive, and
>>> what queries/commands you run.
>>>
>>> We are running Hive on Spark all the time. Our pre-commit test runs
>>> without any issue.
>>>
>>> Thanks,
>>> Xuefu
>>>
>>> On Tue, Dec 2, 2014 at 4:13 AM, yuemeng1 <yueme...@huawei.com> wrote:
>>>
>>>> Hi Xuefu,
>>>> I checked out a Spark tag from the Spark GitHub repository
>>>> (tags:v1.2.0-snapshot0) and compared this Spark's pom.xml with
>>>> spark-parent-1.2.0-SNAPSHOT.pom (taken from
>>>> http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/),
>>>> and the only difference is the following:
>>>> in spark-parent-1.2.0-SNAPSHOT.pom
>>>> <artifactId>spark-parent</artifactId>
>>>> <version>1.2.0-SNAPSHOT</version>
>>>> and in v1.2.0-snapshot0
>>>> <artifactId>spark-parent</artifactId>
>>>> <version>1.2.0</version>
>>>> I think there is no essential difference, so I built v1.2.0-snapshot0
>>>> and deployed it as my Spark cluster.
>>>> When I run the query joining the two tables, it still gives the error I
>>>> showed you earlier:
>>>>
>>>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most
>>>> recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18):
>>>> java.lang.NullPointerException
>>>> at
>>>> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>>>> at
>>>> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>>>> at
>>>> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>>>> at
>>>> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>>>> at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>>>> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>>>> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>> at
>>>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>> at
>>>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>>> at
>>>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>> at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>> at java.lang.Thread.run(Thread.java:722)
>>>>
>>>> Driver stacktrace:
>>>>
>>>> I think my Spark cluster does not have any problem, so why does it
>>>> always give me this error?
>>>>
>>>> On 2014/12/2 13:39, Xuefu Zhang wrote:
>>>>
>>>> You need to build your Spark assembly from the Spark 1.2 branch. This
>>>> should give you both a Spark build as well as the spark-assembly jar,
>>>> which you need to copy to Hive's lib directory. A snapshot is fine, as
>>>> Spark 1.2 hasn't been released yet.
>>>>
>>>> --Xuefu
>>>>
>>>> On Mon, Dec 1, 2014 at 7:41 PM, yuemeng1 <yueme...@huawei.com> wrote:
>>>>
>>>>> Hi Xuefu,
>>>>> Thanks a lot for the information, but as far as I know the latest Spark
>>>>> version on GitHub is a 1.3 snapshot; there is no spark-1.2 tag, only a
>>>>> branch-1.2 with a 1.2 snapshot. Can you tell me which Spark version I
>>>>> should build? For now, spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar
>>>>> produces the error shown below.
>>>>>
>>>>> On 2014/12/2 11:03, Xuefu Zhang wrote:
>>>>>
>>>>> It seems that the wrong class, HiveInputFormat, is loaded. The
>>>>> stacktrace is way off the current Hive code. You need to build Spark
>>>>> 1.2 and copy the spark-assembly jar to Hive's lib directory, and that's
>>>>> it.
>>>>>
>>>>> --Xuefu
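(A minimal sketch of the copy step Xuefu describes; both paths are
placeholders, with the assembly location taken from the --auxpath used in the
repro steps earlier in this thread:)

    # Copy the spark-assembly jar from the Spark build into Hive's lib directory:
    cp /path/to/spark/assembly/target/scala-2.10/spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar \
       /path/to/hive/lib/

(Alternatively, the jar can be passed with --auxpath when starting the Hive
shell, as in the repro steps above.)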
>>>>> On Mon, Dec 1, 2014 at 6:22 PM, yuemeng1 <yueme...@huawei.com> wrote:
>>>>>
>>>>>> Hi, I built a Hive on Spark package and my Spark assembly jar is
>>>>>> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar. When I run a query in
>>>>>> the Hive shell, I first set everything Hive needs for Spark, and then
>>>>>> I execute a join query:
>>>>>> select distinct st.sno,sname from student st join score sc
>>>>>> on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;
>>>>>> but it fails with the following error in the Spark web UI:
>>>>>>
>>>>>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times,
>>>>>> most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18):
>>>>>> java.lang.NullPointerException
>>>>>> at
>>>>>> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>>>>>> at
>>>>>> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>>>>>> at
>>>>>> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>>>>>> at
>>>>>> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>>>>>> at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>>>>>> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>>>>>> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>>>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>>>> at
>>>>>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>>>> at
>>>>>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>>>>> at
>>>>>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>>>> at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>>>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>>>>>> at
>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>>> at
>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>>> at java.lang.Thread.run(Thread.java:722)
>>>>>>
>>>>>> Driver stacktrace:
>>>>>>
>>>>>> Can you help me deal with this problem? I think my build was
>>>>>> successful.