Re: how to debug this kind of error, e.g. "lost executor"?

2015-02-11 Thread Praveen Garg
Try increasing the value of spark.yarn.executor.memoryOverhead. Its default 
value is 384 MB in Spark 1.1. This error generally occurs when your process's 
memory usage exceeds its maximum allocation. Use the following property to 
increase the memory overhead.
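A minimal sketch of how that property might be set programmatically (assuming Spark 1.1/1.2 on YARN; the application name is a placeholder, and the value is in MB):

```scala
// Sketch: raise the per-executor memory overhead before creating the context.
// "GraphXApp" is a placeholder; the value is in MB (the Spark 1.1 default is 384).
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("GraphXApp")
  .set("spark.yarn.executor.memoryOverhead", "1024")
val sc = new SparkContext(conf)
```

The same property can equally be passed at submit time, e.g. `--conf spark.yarn.executor.memoryOverhead=1024` on `spark-submit`.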

From: Yifan LI <iamyifa...@gmail.com>
Date: Friday, 6 February 2015 3:53 pm
To: Ankur Srivastava <ankur.srivast...@gmail.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: how to debug this kind of error, e.g. "lost executor"?

Hi Ankur,

Thanks very much for your help, but I am using v1.2, so it is already SORT…

Let me know if you have any other advice. :)

Best,
Yifan LI





On 05 Feb 2015, at 17:56, Ankur Srivastava <ankur.srivast...@gmail.com> wrote:


Li, I cannot tell you the reason for this exception, but I have seen this kind 
of error when using the HASH-based shuffle manager (the default until v1.2). 
Try the SORT shuffle manager.

Hopefully that will help

Thanks
Ankur

Does anyone have an idea where I can find the detailed log of that lost 
executor (why it was lost)?

Thanks in advance!





On 05 Feb 2015, at 16:14, Yifan LI <iamyifa...@gmail.com> wrote:

Hi,

I am running a GraphX application with heavy memory/CPU overhead. I think the 
memory is sufficient, and I have set the RDDs’ StorageLevel to MEMORY_AND_DISK.

But I found that some tasks failed with the following errors:

java.io.FileNotFoundException: 
/data/spark/local/spark-local-20150205151711-9700/09/rdd_3_275 (No files or 
folders of this type)

ExecutorLostFailure (executor 11 lost)


So, finally that stage failed:

org.apache.spark.shuffle.FetchFailedException: java.io.FileNotFoundException: 
/data/spark/local/spark-local-20150205151711-587a/16/shuffle_11_219_0.index


Does anyone have pointers? Where can I get more details on this issue?


Best,
Yifan LI









Re: how to debug this kind of error, e.g. "lost executor"?

2015-02-06 Thread Yifan LI
Hi Ankur,

Thanks very much for your help, but I am using v1.2, so it is already SORT…

Let me know if you have any other advice. :)

Best,
Yifan LI





> On 05 Feb 2015, at 17:56, Ankur Srivastava  wrote:
> 
> Li, I cannot tell you the reason for this exception, but I have seen this kind 
> of error when using the HASH-based shuffle manager (the default until v1.2). 
> Try the SORT shuffle manager.
> 
> Hopefully that will help
> 
> Thanks
> Ankur
> 
> 
> Does anyone have an idea where I can find the detailed log of that lost 
> executor (why it was lost)?
> 
> Thanks in advance!
> 
> 
> 
> 
> 
>> On 05 Feb 2015, at 16:14, Yifan LI wrote:
>> 
>> Hi,
>> 
>> I am running a GraphX application with heavy memory/CPU overhead. I think the 
>> memory is sufficient, and I have set the RDDs’ StorageLevel to MEMORY_AND_DISK.
>> 
>> But I found that some tasks failed with the following errors:
>> 
>> java.io.FileNotFoundException: 
>> /data/spark/local/spark-local-20150205151711-9700/09/rdd_3_275 (No files or 
>> folders of this type)
>> 
>> ExecutorLostFailure (executor 11 lost)
>> 
>> 
>> So, finally that stage failed:
>> 
>> org.apache.spark.shuffle.FetchFailedException: 
>> java.io.FileNotFoundException: 
>> /data/spark/local/spark-local-20150205151711-587a/16/shuffle_11_219_0.index
>> 
>> 
>> Does anyone have pointers? Where can I get more details on this issue?
>> 
>> 
>> Best,
>> Yifan LI
>> 
>> 
>> 
>> 
>> 
> 



Re: how to debug this kind of error, e.g. "lost executor"?

2015-02-05 Thread Xuefeng Wu
Could you find the shuffle files? Or were the files deleted by other processes?

Yours, Xuefeng Wu 吴雪峰 敬上

> On 5 Feb 2015, at 11:14 pm, Yifan LI  wrote:
> 
> Hi,
> 
> I am running a GraphX application with heavy memory/CPU overhead. I think the 
> memory is sufficient, and I have set the RDDs’ StorageLevel to MEMORY_AND_DISK.
> 
> But I found that some tasks failed with the following errors:
> 
> java.io.FileNotFoundException: 
> /data/spark/local/spark-local-20150205151711-9700/09/rdd_3_275 (No files or 
> folders of this type)
> 
> ExecutorLostFailure (executor 11 lost)
> 
> 
> So, finally that stage failed:
> 
> org.apache.spark.shuffle.FetchFailedException: java.io.FileNotFoundException: 
> /data/spark/local/spark-local-20150205151711-587a/16/shuffle_11_219_0.index
> 
> 
> Does anyone have pointers? Where can I get more details on this issue?
> 
> 
> Best,
> Yifan LI
> 
> 
> 
> 
> 


Re: how to debug this kind of error, e.g. "lost executor"?

2015-02-05 Thread Ankur Srivastava
Li, I cannot tell you the reason for this exception, but I have seen this
kind of error when using the HASH-based shuffle manager (the default
until v1.2). Try the SORT shuffle manager.
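A minimal sketch of how the SORT shuffle manager might be selected explicitly (assuming Spark 1.1/1.2; the application name is a placeholder):

```scala
// Sketch: request the sort-based shuffle manager via SparkConf.
// "hash" was the default before Spark 1.2; "sort" became the default in 1.2.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("GraphXApp") // placeholder name
  .set("spark.shuffle.manager", "sort")
val sc = new SparkContext(conf)
```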

Hopefully that will help

Thanks
Ankur

Does anyone have an idea where I can find the detailed log of that lost
executor (why it was lost)?

Thanks in advance!





On 05 Feb 2015, at 16:14, Yifan LI  wrote:

Hi,

I am running a GraphX application with heavy memory/CPU overhead. I think the
memory is sufficient, and I have set the RDDs’ StorageLevel to MEMORY_AND_DISK.

But I found that some tasks failed with the following errors:

java.io.FileNotFoundException:
/data/spark/local/spark-local-20150205151711-9700/09/rdd_3_275 (No files or
folders of this type)

ExecutorLostFailure (executor 11 lost)


So, finally that stage failed:

org.apache.spark.shuffle.FetchFailedException:
java.io.FileNotFoundException:
/data/spark/local/spark-local-20150205151711-587a/16/shuffle_11_219_0.index


Does anyone have pointers? Where can I get more details on this issue?


Best,
Yifan LI


Re: how to debug this kind of error, e.g. "lost executor"?

2015-02-05 Thread Yifan LI

Does anyone have an idea where I can find the detailed log of that lost 
executor (why it was lost)?

Thanks in advance!





> On 05 Feb 2015, at 16:14, Yifan LI  wrote:
> 
> Hi,
> 
> I am running a GraphX application with heavy memory/CPU overhead. I think the 
> memory is sufficient, and I have set the RDDs’ StorageLevel to MEMORY_AND_DISK.
> 
> But I found that some tasks failed with the following errors:
> 
> java.io.FileNotFoundException: 
> /data/spark/local/spark-local-20150205151711-9700/09/rdd_3_275 (No files or 
> folders of this type)
> 
> ExecutorLostFailure (executor 11 lost)
> 
> 
> So, finally that stage failed:
> 
> org.apache.spark.shuffle.FetchFailedException: java.io.FileNotFoundException: 
> /data/spark/local/spark-local-20150205151711-587a/16/shuffle_11_219_0.index
> 
> 
> Does anyone have pointers? Where can I get more details on this issue?
> 
> 
> Best,
> Yifan LI
> 
> 
> 
> 
> 



how to debug this kind of error, e.g. "lost executor"?

2015-02-05 Thread Yifan LI
Hi,

I am running a GraphX application with heavy memory/CPU overhead. I think the 
memory is sufficient, and I have set the RDDs’ StorageLevel to MEMORY_AND_DISK.

But I found that some tasks failed with the following errors:
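A minimal sketch of how that persistence level might be applied when loading a GraphX graph (assuming Spark/GraphX 1.2; `sc` is an existing SparkContext and the edge-list path is a placeholder):

```scala
// Sketch: load a graph with both edge and vertex RDDs persisted at
// MEMORY_AND_DISK, so partitions that do not fit in memory spill to
// disk instead of being recomputed.
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.storage.StorageLevel

val graph = GraphLoader.edgeListFile(
  sc,
  "hdfs:///path/to/edges.txt", // placeholder path
  edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
  vertexStorageLevel = StorageLevel.MEMORY_AND_DISK)
```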

java.io.FileNotFoundException: 
/data/spark/local/spark-local-20150205151711-9700/09/rdd_3_275 (No files or 
folders of this type)

ExecutorLostFailure (executor 11 lost)


So, finally that stage failed:

org.apache.spark.shuffle.FetchFailedException: java.io.FileNotFoundException: 
/data/spark/local/spark-local-20150205151711-587a/16/shuffle_11_219_0.index


Does anyone have pointers? Where can I get more details on this issue?


Best,
Yifan LI