Could you provide details on how to reproduce the issue, such as the exact Spark branch, the command used to build Spark, how you built Hive, and the queries/commands you ran?
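For example, a recipe along these lines is the level of detail that helps (the tag and Maven options below are only an illustrative sketch, not a known-good build command):

    git clone https://github.com/apache/spark.git
    cd spark
    git checkout v1.2.0-snapshot0
    mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package

plus the Hive commit you built from and the exact statements you typed in the Hive shell.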
We are running Hive on Spark all the time. Our pre-commit test runs without any issue.

Thanks,
Xuefu

On Tue, Dec 2, 2014 at 4:13 AM, yuemeng1 <yueme...@huawei.com> wrote:

> hi, XueFu
> I checked out a Spark branch from the Spark GitHub repository (tag:
> v1.2.0-snapshot0) and compared its pom.xml with
> spark-parent-1.2.0-SNAPSHOT.pom (taken from
> http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/).
> The only difference is the following:
>
> in spark-parent-1.2.0-SNAPSHOT.pom:
> <artifactId>spark-parent</artifactId>
> <version>1.2.0-SNAPSHOT</version>
>
> and in v1.2.0-snapshot0:
> <artifactId>spark-parent</artifactId>
> <version>1.2.0</version>
>
> I think there is no essential difference, so I built v1.2.0-snapshot0 and
> deployed it on my Spark cluster.
> When I run a query joining two tables, it still gives the error I showed
> you earlier:
>
> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most
> recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18):
> java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
> at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
> at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
> at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
> at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:56)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
>
> Driver stacktrace:
>
> I think my Spark cluster doesn't have any problem, so why does it always
> give me this error?
>
> On 2014/12/2 13:39, Xuefu Zhang wrote:
>
> You need to build your spark assembly from the Spark 1.2 branch. This
> should give you both a Spark build and the spark-assembly jar, which you
> need to copy to Hive's lib directory. A snapshot is fine; Spark 1.2 hasn't
> been released yet.
>
> --Xuefu
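A minimal sketch of that copy step, assuming a default Maven-built Spark 1.2 layout and a HIVE_HOME environment variable (the exact assembly jar name depends on the Hadoop profile used):

    # after the build, the assembly jar typically sits under assembly/target/scala-2.10/
    cp assembly/target/scala-2.10/spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar $HIVE_HOME/lib/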
> On Mon, Dec 1, 2014 at 7:41 PM, yuemeng1 <yueme...@huawei.com> wrote:
>
>> hi, XueFu
>> Thanks a lot for the information, but as far as I know, the latest Spark
>> version on GitHub is spark-snapshot-1.3; there is no spark-1.2, only a
>> branch-1.2 with spark-snapshot-1.2. Can you tell me which Spark version I
>> should build? For now, spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar
>> produces the error below.
>>
>> On 2014/12/2 11:03, Xuefu Zhang wrote:
>>
>> It seems that the wrong class, HiveInputFormat, is loaded. The stacktrace
>> is way off the current Hive code. You need to build Spark 1.2 and copy the
>> spark-assembly jar to Hive's lib directory, and that's it.
>>
>> --Xuefu
>>
>> On Mon, Dec 1, 2014 at 6:22 PM, yuemeng1 <yueme...@huawei.com> wrote:
>>
>>> hi, I built a Hive on Spark package, and my Spark assembly jar is
>>> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar. When I run a query in the
>>> Hive shell, I first set all the configuration that Hive needs for Spark,
>>> and then I execute a join query:
>>>
>>> select distinct st.sno, sname from student st join score sc
>>> on (st.sno = sc.sno) where sc.cno IN (11, 12, 13) and st.sage > 28;
>>>
>>> but it failed with the following error in the Spark web UI:
>>>
>>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times,
>>> most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18):
>>> java.lang.NullPointerException
>>> at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>>> at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>>> at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>>> at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>>> at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>>> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>>> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>> at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>> at java.lang.Thread.run(Thread.java:722)
>>>
>>> Driver stacktrace:
>>>
>>> Can you help me with this problem? I think my build was successful.
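For reference, the Hive-side setup mentioned in the original report (setting the Spark-related properties in the Hive shell before running the query) usually amounts to something like the following session; the master URL and memory value here are placeholders, not the reporter's actual settings:

    hive> set hive.execution.engine=spark;
    hive> set spark.master=<spark master URL>;
    hive> set spark.executor.memory=2g;
    hive> select distinct st.sno, sname from student st join score sc
        >   on (st.sno = sc.sno) where sc.cno in (11, 12, 13) and st.sage > 28;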