spark lacks fault tolerance with dynamic partition overwrite

2020-04-02 Thread Koert Kuipers
i wanted to highlight here the issue we are facing with dynamic partition overwrite. it seems that any task that writes to disk using this feature and needs to be retried fails upon retry, leading to a failure of the entire job. we have seen this issue show up with preemption (task gets kil

Fwd: Automatic PR labeling

2020-04-02 Thread Hyukjin Kwon
Seems like this email did not cc the mailing list, so I am forwarding it for trackability. -- Forwarded message - From: Ismaël Mejía Date: Thu, Apr 2, 2020 at 4:46 PM Subject: Re: Automatic PR labeling To: Hyukjin Kwon +1 Just for ref there is a really simple Github App for this: https:

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-02 Thread Takeshi Yamamuro
Also, I think the 3.0 release should include all the SQL document updates: https://issues.apache.org/jira/browse/SPARK-28588 On Fri, Apr 3, 2020 at 12:36 AM Sean Owen wrote: > (If it wasn't stated explicitly, yeah I think we knew there are a few > important unresolved issues and that this

Re: Automatic PR labeling

2020-04-02 Thread Hyukjin Kwon
Awesome! On Fri, Apr 3, 2020 at 7:13 AM Nicholas Chammas wrote: > SPARK-31330 : > Automatically label PRs based on the paths they touch > > On Wed, Apr 1, 2020 at 11:34 PM Hyukjin Kwon wrote: > >> @Nicholas Chammas Would you be interested >> in tacki

Re: [DISCUSS] filling affected versions on JIRA issue

2020-04-02 Thread Jungtaek Lim
On Fri, Apr 3, 2020 at 12:31 AM Sean Owen wrote: > On Wed, Apr 1, 2020 at 10:28 PM Jungtaek Lim > wrote: > > The definition of "latest version" would matter, especially when we are > preparing a minor+ version release. > > > > For example, lots of people (even including committers) filed an >

Beginner PR against the Catalog API

2020-04-02 Thread Nicholas Chammas
I recently submitted my first Scala PR. It's very simple, though I don't know if I've done things correctly since I'm not a regular Scala user. SPARK-31000 : Add ability to set table description in the catalog https://github.com/apache/spark/pull

Re: Automatic PR labeling

2020-04-02 Thread Nicholas Chammas
SPARK-31330 : Automatically label PRs based on the paths they touch On Wed, Apr 1, 2020 at 11:34 PM Hyukjin Kwon wrote: > @Nicholas Chammas Would you be interested in > taking a look? I would love this to be done. > > On Wed, Mar 25, 2020 at 10:30 AM

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-02 Thread Sean Owen
(If it wasn't stated explicitly, yeah I think we knew there are a few important unresolved issues and that this RC was going to fail. Let's all please test anyway of course, to flush out any additional issues, rather than wait. Pipelining and all that.) On Thu, Apr 2, 2020 at 10:31 AM Maxim Gekk

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-02 Thread Maxim Gekk
-1 (non-binding) The problem of compatibility with Spark 2.4 in reading/writing dates/timestamps hasn't been solved completely so far. In particular, the sub-task https://issues.apache.org/jira/browse/SPARK-31328 hasn't been resolved yet. Maxim Gekk Software Engineer Databricks, Inc. On Wed, Apr 1

Re: [DISCUSS] filling affected versions on JIRA issue

2020-04-02 Thread Sean Owen
On Wed, Apr 1, 2020 at 10:28 PM Jungtaek Lim wrote: > The definition of "latest version" would matter, especially when we are > preparing a minor+ version release. > > For example, lots of people (even including committers) filed an > "improvement" issue with setting fix version to 3.0, which