Hi Steve,
Thanks for your feedback. From your email, I could gather the following two
important points:
1. Report failures to something (cluster manager) which can opt to
destroy the node and request a new one
2. Pluggable failure detection algorithms (sketched below)
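To make #2 concrete, here is a minimal sketch of what a pluggable
failure-detection policy could look like. Every name below is hypothetical;
no such public interface exists in Spark today:

    // Hypothetical sketch of point #2; all names are made up for illustration.
    trait FailureDetectionPolicy {
      /** Invoked on every task failure; decides what, if anything, to blacklist. */
      def onTaskFailure(executorId: String, host: String, cause: Throwable): Seq[Decision]
    }

    sealed trait Decision
    final case class BlacklistExecutor(executorId: String, timeoutMs: Long) extends Decision
    final case class BlacklistNode(host: String, timeoutMs: Long) extends Decision

    // A trivial example policy: blacklist an executor for an hour after any failure.
    class AggressivePolicy extends FailureDetectionPolicy {
      def onTaskFailure(executorId: String, host: String, cause: Throwable): Seq[Decision] =
        Seq(BlacklistExecutor(executorId, timeoutMs = 60 * 60 * 1000L))
    }

A cluster-manager integration (point #1) could then act on BlacklistNode
decisions by decommissioning the node and requesting a replacement.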
Regarding #1, the current blacklisting [...]
> [...] intervals.
> - W.r.t. turning it on by default: Do we have a sense of how many teams are
> using blacklisting today with the current default settings? It may be
> worth changing the defaults for a release or two and gathering feedback to
> help make a call on turning it on by default.
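For reference, these are the knobs being discussed. A minimal sketch,
assuming the Spark 2.4-era configuration names (the values shown are the
shipped defaults, and none of this is active unless spark.blacklist.enabled
is set):

    import org.apache.spark.SparkConf

    // The current opt-in and its per-task/per-stage thresholds.
    val conf = new SparkConf()
      .set("spark.blacklist.enabled", "true")                       // default: false
      .set("spark.blacklist.task.maxTaskAttemptsPerExecutor", "1")  // default: 1
      .set("spark.blacklist.task.maxTaskAttemptsPerNode", "2")      // default: 2
      .set("spark.blacklist.stage.maxFailedTasksPerExecutor", "2")  // default: 2
      .set("spark.blacklist.stage.maxFailedExecutorsPerNode", "2")  // default: 2
      .set("spark.blacklist.timeout", "1h")                         // default: 1h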
> [...] might have failed after 4-6 stages,
> depending on how it played out. (FWIW, this was running one executor per
> node).
>
> -Chris
>
> On Fri, Mar 29, 2019 at 1:48 PM Ankur Gupta wrote:
>
>> Thanks Reynold! That is certainly useful to know.
>>
>> @Chris Will [...]
>
> On Thu, Mar 28, 2019 at 3:32 PM Ankur Gupta <ankur.gu...@cloudera.com.invalid> wrote:
>
>> Hi all,
>>
>> This is a follow-on to my PR: https://github.com/apache/spark/pull/24208,
>> where I aimed to enable blacklisting for fetch failure by default. From the
>> comments, there is interest in the community to enable the overall
>> blacklisting feature by default. I have listed down 3 different things
>> that we [...]
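For context, the behavior that PR proposes to make the default already
exists as an explicit opt-in. A minimal sketch, again assuming Spark
2.4-era configuration names:

    import org.apache.spark.SparkConf

    // Today this is an opt-in; the PR proposes flipping the default.
    val conf = new SparkConf()
      .set("spark.blacklist.enabled", "true")
      .set("spark.blacklist.application.fetchFailure.enabled", "true") // blacklist on fetch failure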
> [...] they're not really that
> useful for debugging. So a solution that keeps that behavior, but
> writes INFO logs to this new sink, would be great.
>
> If you can come up with a solution to those problems I think this
> could be a good feature.
>
>
> On Wed, Aug 22, 2018 at 10:01 AM [...] wrote:
>> For a vanilla Spark on YARN client application, I think the user could
>> redirect the console output to a log file and provide both the driver log
>> and the YARN application log to the customers; this does not seem like a
>> big overhead.
>>
>> Just my two cents.
>>
>> Thanks
Hi all,
I want to highlight a problem that we face here at Cloudera and start a
discussion on how to go about solving it.
*Problem Statement:*
Our customers reach out to us when they face problems in their Spark
Applications. Those problems can be related to Spark, environment issues,
their own code [...]