I think you should check the RPC target; maybe the NodeManager has a memory issue, such as GC pressure. Check that out first.
Also, I wonder why you assign --executor-cores 8?
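For what it's worth, a commonly cited tuning guideline is to keep executors to roughly five cores each to avoid I/O contention. A minimal sketch of such a submission, where the jar name, memory, and executor count are placeholder assumptions, not values from this thread:

    # Hypothetical submission; all values are illustrative assumptions,
    # not the configuration discussed in this thread.
    spark-submit \
      --master yarn \
      --num-executors 16 \
      --executor-cores 5 \
      --executor-memory 8g \
      myapp.jar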
2017-07-29 7:40 GMT+08:00 jeff saremi :
Asking this on a tangent:
Is there any way for the shuffle data to be replicated to more than one server?
Thanks
From: jeff saremi
Sent: Friday, July 28, 2017 4:38:08 PM
To: Juan Rodríguez Hortalá
Cc: user@spark.apache.org
Subject: Re:
Thanks, Juan, for taking the time.
Here's more info:
- This is running on YARN in master mode
- See config params below
- This is a corporate environment. In general, nodes should not be added to or removed from the cluster that often. Even if that were the case, I would expect it to be one or two.
Hi Jeff,
Can you provide more information about how you are running your job? In
particular:
- Which cluster manager are you using? Is it YARN, Mesos, or Spark
Standalone?
- Which configuration options are you using to submit the job? In
particular, are you using dynamic allocation or the external shuffle service? (A generic sketch of both follows below.)
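For reference, submitting with dynamic allocation and the external shuffle service enabled typically looks something like this; this is a generic sketch with placeholder executor bounds, not the configuration from this thread:

    # Generic illustration of dynamic allocation plus the external
    # shuffle service on YARN; the min/max executor bounds are
    # placeholder assumptions.
    spark-submit \
      --master yarn \
      --conf spark.dynamicAllocation.enabled=true \
      --conf spark.shuffle.service.enabled=true \
      --conf spark.dynamicAllocation.minExecutors=2 \
      --conf spark.dynamicAllocation.maxExecutors=20 \
      myapp.jar

Note that on YARN the external shuffle service also has to be running on each NodeManager (the spark_shuffle auxiliary service configured in yarn-site.xml) for this to work.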
We have a Spark job that is neither very complex nor very large, and it keeps dying with this error.
I have researched it and have not seen any convincing explanation as to why.
I am not using a shuffle service. Which server is the one that is refusing the
connection?
If I go to the server that is being
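On the question of which server is refusing the connection: when no external shuffle service is in use, shuffle blocks are served directly by the remote executors' block managers, which listen on a random port unless one is configured. A minimal sketch of pinning that port so the refusing endpoint is easier to identify (the port number is an arbitrary assumption):

    # Pin the executors' block manager port so connection-refused
    # errors point at a known port; 40000 is an arbitrary choice.
    spark-submit \
      --master yarn \
      --conf spark.blockManager.port=40000 \
      myapp.jar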