[ANNOUNCE] Apache Spark 3.0.3 released

2021-06-24 Thread Yi Wu
We are happy to announce the availability of Spark 3.0.3!

Spark 3.0.3 is a maintenance release containing stability fixes. This
release is based on the branch-3.0 maintenance branch of Spark. We strongly
recommend that all 3.0 users upgrade to this stable release.

To download Spark 3.0.3, head over to the download page:
https://spark.apache.org/downloads.html

To view the release notes:
https://spark.apache.org/releases/spark-release-3-0-3.html
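
For build-tool users, a minimal sbt sketch for picking up the release
(assuming the Scala 2.12 artifacts published for the 3.0 line; adjust the
modules to whatever your application actually uses):

// build.sbt (sketch)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.0.3" % "provided",
  "org.apache.spark" %% "spark-sql"  % "3.0.3" % "provided"
)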

We would like to acknowledge all community members for contributing to this
release. This release would not have been possible without you.

Yi


Re: [DISCUSS] Spark cannot identify the problem executor

2020-09-13 Thread Yi Wu
The FetchFailed error from Task B is forwarded to the DAGScheduler as well.
A FetchFailed already means the output of the upstream stage is missing, so
the DAGScheduler will resubmit that upstream stage, which in turn
reschedules the upstream task of Task B.
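
For context, here is a minimal sketch of the blacklist settings being
discussed in this thread (assuming the Spark 2.x/3.0 configuration names;
the application name is hypothetical):

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf()
  .setAppName("blacklist-demo") // hypothetical app name
  // Blacklist executors/nodes that repeatedly fail tasks.
  .set("spark.blacklist.enabled", "true")
  // On a FetchFailed, blacklist the executor the failed fetch points at
  // for the rest of the application.
  .set("spark.blacklist.application.fetchFailure.enabled", "true")
  // Default is 4, which is why the job in the quoted report fails after
  // a task has failed 4 times.
  .set("spark.task.maxFailures", "4")

val spark = SparkSession.builder().config(conf).getOrCreate()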

On Mon, Sep 14, 2020 at 10:39 AM 陈晓宇  wrote:

> Thanks Yi Wu and Sean. Here I mean shuffle data, without the external
> shuffle service.
>
> spark.blacklist.application.fetchFailure.enabled=true seems to be the
> answer; I was not aware of it, thanks for pointing it out. I will give it
> a try.
>
> However, I wonder how it would work: when Task B reports FetchFailed, this
> blacklist flag can be used to identify executor A, and tasks will no longer
> be scheduled on executor A. But would the upstream task for Task B (which
> was previously running on executor A) be rescheduled by the DAG scheduler?
> The DAG scheduler only reschedules a task when it thinks the task's output
> is missing (please correct me if I am wrong). And unless executor A fails
> to report heartbeats for the timeout period, the driver still believes the
> output is there on executor A.
>
> Thanks again.
>
> On Fri, Sep 11, 2020 at 9:24 PM Yi Wu  wrote:
>
>> What do you mean by "read from executor A"? I can think of several paths
>> for an executor to read something from another remote executor:
>>
>> 1. shuffle data
>> If the executor fails to fetch the shuffle data, I think it will result
>> in a FetchFailed for the task. For this case, the blacklist can identify
>> the problematic executor A
>> if spark.blacklist.application.fetchFailure.enabled=true.
>>
>> 2. RDD block
>> If the executor fails to fetch RDD blocks, I think the task would just do
>> the computation by itself instead of failing.
>>
>> 3. Broadcast block
>> If the executor fails to fetch the broadcast block, the task seems to
>> fail in this case, and the blacklist doesn't handle it well.
>>
>> Thanks,
>> Yi
>>
>> On Fri, Sep 11, 2020 at 8:43 PM Sean Owen  wrote:
>>
>>> -dev, +user
>>> Executors do not communicate directly, so I don't think that's quite
>>> what you are seeing. You'd have to clarify.
>>>
>>> On Fri, Sep 11, 2020 at 12:08 AM 陈晓宇  wrote:
>>> >
>>> > Hello all,
>>> >
>>> > We've been using Spark 2.3 with the blacklist enabled, and we often hit
>>> a problem where executor A has some issue (like a connection problem):
>>> tasks on executor B and executor C fail saying they cannot read from
>>> executor A. Finally the job fails because a task on executor B has failed
>>> 4 times.
>>> >
>>> > I wonder whether there is any existing fix or discussion of how to
>>> identify executor A as the problem node.
>>> >
>>> > Thanks
>>>


Re: [DISCUSS] Spark cannot identify the problem executor

2020-09-11 Thread Yi Wu
What do you mean by "read from executor A"? I can think of several paths
for an executor to read something from another remote executor:

1. shuffle data
If the executor fails to fetch the shuffle data, I think it will result in
a FetchFailed for the task. For this case, the blacklist can identify the
problematic executor A
if spark.blacklist.application.fetchFailure.enabled=true.

2. RDD block
If the executor fails to fetch RDD blocks, I think the task would just do
the computation by itself instead of failing.

3. Broadcast block
If the executor fails to fetch the broadcast block, the task seems to fail
in this case, and the blacklist doesn't handle it well (see the sketch
after this list).
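
To make paths 2 and 3 concrete, a minimal sketch (the data and names are
hypothetical) that touches a cached RDD block and a broadcast block:

import org.apache.spark.broadcast.Broadcast
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder().appName("block-fetch-demo").getOrCreate()
val sc = spark.sparkContext

// 2. RDD block: cached partitions can live on remote executors. If a remote
// cached block cannot be fetched, the partition is recomputed from lineage
// rather than the task failing outright.
val cached = sc.parallelize(1 to 1000000, 8)
  .persist(StorageLevel.MEMORY_AND_DISK)
cached.count()

// 3. Broadcast block: broadcast data is also fetched over the network; if
// that fetch fails, the task itself fails, and the blacklist does not
// account for it.
val lookup: Broadcast[Map[Int, String]] = sc.broadcast(Map(0 -> "even", 1 -> "odd"))
cached.map(i => lookup.value(i % 2)).count()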

Thanks,
Yi

On Fri, Sep 11, 2020 at 8:43 PM Sean Owen  wrote:

> -dev, +user
> Executors do not communicate directly, so I don't think that's quite
> what you are seeing. You'd have to clarify.
>
> On Fri, Sep 11, 2020 at 12:08 AM 陈晓宇  wrote:
> >
> > Hello all,
> >
> > We've been using Spark 2.3 with the blacklist enabled, and we often hit a
> problem where executor A has some issue (like a connection problem): tasks
> on executor B and executor C fail saying they cannot read from executor A.
> Finally the job fails because a task on executor B has failed 4 times.
> >
> > I wonder whether there is any existing fix or discussion of how to
> identify executor A as the problem node.
> >
> > Thanks
>