I understand, thank you for the explanation. However, I ran in yarn-client mode, submitted with nohup, and I could see the logs flowing into the log file throughout the life of the job. Everything worked well on the Spark side; YARN just reported success long before the job actually completed. I would love to understand if I am missing anything here.
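One way to keep YARN's final status tied to the driver's actual lifetime is to submit in cluster mode, where the driver runs inside the YARN ApplicationMaster rather than on the gateway host under nohup. A minimal sketch of such a submission is below; the script name is hypothetical, and the queue and resource settings simply mirror the conf from the quoted job:

```shell
# Hypothetical script path; queue and resource settings mirror the quoted job's conf.
# With --deploy-mode cluster the driver runs in the ApplicationMaster, so YARN's
# reported final status reflects the driver's real exit, and the nohup'd gateway
# process no longer affects the job's lifetime.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --queue root.Applications \
  --num-executors 50 \
  --executor-memory 10g \
  --executor-cores 4 \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  --conf spark.sql.shuffle.partitions=1000 \
  historical_meter_load.py
```

In yarn-client mode, by contrast, the driver lives in the local submitting process, which is why its fate can diverge from what YARN reports.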
On Fri, Sep 30, 2016 at 8:32 PM, Timur Shenkao <t...@timshenkao.su> wrote:

> It's not weird behavior. Did you run the job in cluster mode?
> I suspect your driver died / finished / stopped after 12 hours but your
> job continued. It's possible, as you didn't output anything to the console
> on the driver node.
>
> Quite a long time ago, when I first tried Spark Streaming, I launched
> PySpark Streaming jobs in PyCharm & the pyspark console and "killed" them
> via Ctrl+Z. The drivers were gone, but the YARN containers (where the
> computations on the slaves were performed) remained.
> Nevertheless, I believe the final result in "some table" is corrupted.
>
> On Fri, Sep 30, 2016 at 9:33 AM, ayan guha <guha.a...@gmail.com> wrote:
>
>> Hi
>>
>> I just observed a little weird behavior:
>>
>> I ran a pyspark job, a very simple one.
>>
>> conf = SparkConf()
>> conf.setAppName("Historical Meter Load")
>> conf.set("spark.yarn.queue", "root.Applications")
>> conf.set("spark.executor.instances", "50")
>> conf.set("spark.executor.memory", "10g")
>> conf.set("spark.yarn.executor.memoryOverhead", "2048")
>> conf.set("spark.sql.shuffle.partitions", "1000")
>> conf.set("spark.executor.cores", "4")
>> sc = SparkContext(conf=conf)
>> sqlContext = HiveContext(sc)
>>
>> df = sqlContext.sql("some sql")
>>
>> c = df.count()
>>
>> df.filter(df["RNK"] == 1).write.mode("overwrite").saveAsTable("some table")
>>
>> sc.stop()
>>
>> This is running on a CDH 5.7 cluster, Spark 1.6.0.
>>
>> Behavior observed: after a few hours of running (definitely over 12H,
>> but not sure exactly when), YARN reported the job as Completed, finished
>> successfully, whereas the job kept running (I can see this from the
>> Application Master link) for 22H. The timing of the job is expected. The
>> behavior of YARN is not.
>>
>> Is it a known issue? Is it a pyspark-specific issue, or the same with
>> Scala as well?
>>
>>
>> --
>> Best Regards,
>> Ayan Guha
>>
>
>

--
Best Regards,
Ayan Guha