Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-21 Thread Shixiong(Ryan) Zhu
FYI. I found two more blockers:

https://issues.apache.org/jira/browse/SPARK-23475
https://issues.apache.org/jira/browse/SPARK-23481

On Wed, Feb 21, 2018 at 9:45 AM, Xiao Li wrote:

> Hi, Ryan,
>
> In this release, Data Source V2 is experimental. We are still collecting
> feedback from the community and will improve the related APIs and
> implementation in the 2.4 release.
>
> Thanks,
>
> Xiao

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-21 Thread Justin Miller
Ah, gotcha. Thanks for letting me know. We’ve been using the patch in production
for a couple of weeks now and it’s been working great. If anyone else runs into
the issue (non-compacted topics have “gaps” in offsets), feel free to have them
e-mail me and I can try to help them get going with patching their own systems
until 2.3.1 is available.

Thanks,
Justin

> On Feb 21, 2018, at 10:43 AM, Xiao Li wrote:
>
> Hi, Justin,
>
> Based on my understanding, SPARK-17147 is also not a regression, so it
> cannot be included in Spark 2.3.0. We have to wait for the committers who are
> familiar with Spark Streaming to decide whether we can fix the issue
> in Spark 2.3.1.
> 
> Since this is open source, feel free to add the patch in your local build.
> 
> Thanks for using Spark!
> 
> Xiao

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-21 Thread Michael Armbrust
I'm -1 on any changes that aren't fixing major regressions from 2.2 at this
point. Also, in any case where it's possible, we should be flipping new
features off if they are still regressing, rather than continuing to
attempt to fix them.

Since it's experimental, I would support backporting the DataSourceV2
patches into 2.3.1 so that there is more opportunity for feedback as the
API matures.

On Wed, Feb 21, 2018 at 11:32 AM, Shixiong(Ryan) Zhu <
shixi...@databricks.com> wrote:

> FYI. I found two more blockers:
>
> https://issues.apache.org/jira/browse/SPARK-23475
> https://issues.apache.org/jira/browse/SPARK-23481

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-21 Thread Xiao Li
Hi, Ryan,

In this release, Data Source V2 is experimental. We are still collecting
feedback from the community and will improve the related APIs and
implementation in the 2.4 release.

Thanks,

Xiao

2018-02-21 9:43 GMT-08:00 Xiao Li:

> Hi, Justin,
>
> Based on my understanding, SPARK-17147 is also not a regression, so it
> cannot be included in Spark 2.3.0. We have to wait for the committers who
> are familiar with Spark Streaming to decide whether we can fix the
> issue in Spark 2.3.1.
>
> Since this is open source, feel free to add the patch in your local build.
>
> Thanks for using Spark!
>
> Xiao
>
>
> 2018-02-21 9:36 GMT-08:00 Ryan Blue:
>
>> No problem if we can't add them. This is experimental anyway, so this
>> release should be more about validating the API and the start of our
>> implementation. I just don't think we can recommend that anyone actually
>> use DataSourceV2 without these patches.

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-21 Thread Xiao Li
Hi, Justin,

Based on my understanding, SPARK-17147 is also not a regression, so it
cannot be included in Spark 2.3.0. We have to wait for the committers who
are familiar with Spark Streaming to decide whether we can fix the
issue in Spark 2.3.1.

Since this is open source, feel free to add the patch in your local build.

Thanks for using Spark!

Xiao


2018-02-21 9:36 GMT-08:00 Ryan Blue:

> No problem if we can't add them. This is experimental anyway, so this
> release should be more about validating the API and the start of our
> implementation. I just don't think we can recommend that anyone actually
> use DataSourceV2 without these patches.

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-21 Thread Ryan Blue
No problem if we can't add them. This is experimental anyway, so this
release should be more about validating the API and the start of our
implementation. I just don't think we can recommend that anyone actually
use DataSourceV2 without these patches.

On Wed, Feb 21, 2018 at 9:21 AM, Wenchen Fan wrote:

> SPARK-23323 adds a new API, and I'm not sure we can still do that at this
> stage of the release. Besides, users can work around it by calling the Spark
> output commit coordinator themselves in their data source.
>
> SPARK-23203 is non-trivial and didn't fix any known bugs, so it's hard to
> convince other people that it's safe to add it to the release during the RC
> phase.
>
> SPARK-23418 depends on the above one.
>
> Generally, these would have been good to have in Spark 2.3 if they had been
> merged before the RC. I think this is a lesson we should learn from: we should
> work on the stuff we want in the release before the RC, instead of after.

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-21 Thread Wenchen Fan
SPARK-23323 adds a new API, and I'm not sure we can still do that at this
stage of the release. Besides, users can work around it by calling the Spark
output commit coordinator themselves in their data source.

SPARK-23203 is non-trivial and didn't fix any known bugs, so it's hard to
convince other people that it's safe to add it to the release during the RC
phase.

SPARK-23418 depends on the above one.

Generally, these would have been good to have in Spark 2.3 if they had been
merged before the RC. I think this is a lesson we should learn from: we should
work on the stuff we want in the release before the RC, instead of after.

On Thu, Feb 22, 2018 at 1:01 AM, Ryan Blue wrote:

> What does everyone think about getting some of the newer DataSourceV2
> improvements in? It should be low risk because it is a new code path, and
> v2 isn't very usable without things like support for using the output
> commit coordinator to deconflict writes.
>
> The ones I'd like to get in are:
> * Use the output commit coordinator:
> https://issues.apache.org/jira/browse/SPARK-23323
> * Use immutable trees and the same push-down logic as other read paths:
> https://issues.apache.org/jira/browse/SPARK-23203
> * Don't allow users to supply schemas when they aren't supported:
> https://issues.apache.org/jira/browse/SPARK-23418
>
> I think it would make the 2.3.0 release more usable for anyone interested
> in the v2 read and write paths.
>
> Thanks!
>
> On Tue, Feb 20, 2018 at 7:07 PM, Weichen Xu 
> wrote:
>
>> +1
>>
>> On Wed, Feb 21, 2018 at 10:07 AM, Marcelo Vanzin 
>> wrote:
>>
>>> Done, thanks!
>>>
>>> On Tue, Feb 20, 2018 at 6:05 PM, Sameer Agarwal 
>>> wrote:
>>> > Sure, please feel free to backport.
>>> >
>>> > On 20 February 2018 at 18:02, Marcelo Vanzin wrote:
>>> >>
>>> >> Hey Sameer,
>>> >>
>>> >> Mind including https://github.com/apache/spark/pull/20643
>>> >> (SPARK-23468)  in the new RC? It's a minor bug since I've only hit it
>>> >> with older shuffle services, but it's pretty safe.
>>> >>
>>> >> On Tue, Feb 20, 2018 at 5:58 PM, Sameer Agarwal 
>>> >> wrote:
>>> >> > This RC has failed due to
>>> >> > https://issues.apache.org/jira/browse/SPARK-23470.
>>> >> > Now that the fix has been merged in 2.3 (thanks Marcelo!), I'll
>>> >> > follow up with an RC5 soon.
>>> >> >
>>> >> > On 20 February 2018 at 16:49, Ryan Blue  wrote:
>>> >> >>
>>> >> >> +1
>>> >> >>
>>> >> >> Build & tests look fine, checked signature and checksums for src
>>> >> >> tarball.
>>> >> >>
>>> >> >> On Tue, Feb 20, 2018 at 12:54 PM, Shixiong(Ryan) Zhu
>>> >> >>  wrote:
>>> >> >>>
>>> >> >>> I'm -1 because of the UI regression
>>> >> >>> https://issues.apache.org/jira/browse/SPARK-23470 : the All Jobs
>>> >> >>> page may be too slow and cause "read timeout" when there are lots
>>> >> >>> of jobs and stages. This is one of the most important pages because
>>> >> >>> when it's broken, it's pretty hard to use Spark Web UI.
>>> >> >>>
>>> >> >>>
>>> >> >>> On Tue, Feb 20, 2018 at 4:37 AM, Marco Gaido <marcogaid...@gmail.com> wrote:
>>> >> 
>>> >>  +1
>>> >> 
>>> >>  2018-02-20 12:30 GMT+01:00 Hyukjin Kwon :
>>> >> >
>>> >> > +1 too
>>> >> >
>>> >> > 2018-02-20 14:41 GMT+09:00 Takuya UESHIN <ues...@happy-camper.st>:
>>> >> >>
>>> >> >> +1
>>> >> >>
>>> >> >>
>>> >> >> On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang wrote:
>>> >> >>>
>>> >> >>> +1
>>> >> >>>
>>> >> >>>
>>> >> >>> On Tue, Feb 20, 2018 at 1:09 PM, Wenchen Fan wrote:
>>> >> 
>>> >>  +1
>>> >> 
>>> >>  On Tue, Feb 20, 2018 at 12:53 PM, Reynold Xin
>>> >>  
>>> >>  wrote:
>>> >> >
>>> >> > +1
>>> >> >
>>> >> > On Feb 20, 2018, 5:51 PM +1300, Sameer Agarwal
>>> >> > , wrote:
>>> >> >>
>>> >> >> this file shouldn't be included?
>>> >> >>
>>> >> >> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/spark-parent_2.11.iml
>>> >> >
>>> >> >
>>> >> > I've now deleted this file
>>> >> >
>>> >> >> From: Sameer Agarwal 
>>> >> >> Sent: Saturday, February 17, 2018 1:43:39 PM
>>> >> >> To: Sameer Agarwal
>>> >> >> Cc: dev
>>> >> >> Subject: Re: [VOTE] Spark 2.3.0 (RC4)
>>> >> >>
>>> >> >> I'll start with a +1 once again.
>>> >> >>
>>> >> >> All blockers reported against RC3 have been resolved and the
>>> >> >> builds are healthy.
>>> >> >>
>>> >> >> On 17 February 2018 at 13:41, Sameer Agarwal
>>> >> 

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-21 Thread Justin Miller
Greetings,

I would also like to ask if the following ticket could make it into 2.3.0. I’m
currently testing the code in production, as we were (very occasionally) running
into non-consecutive offsets on non-compacted topics. I imagine other people
will encounter similar issues if they’re doing 15+ billion records a day.

https://github.com/apache/spark/pull/20572 (SPARK-17147)

Thanks,
Justin

> On Feb 21, 2018, at 10:21 AM, kant kodali  wrote:
> 
> Hi All,
> 
> +1 for the tickets proposed by Ryan Blue
> 
> Any possible chance of this one getting into 2.3.0?
> https://issues.apache.org/jira/browse/SPARK-23406
> It's a very important feature for us, so if it doesn't make the cut I would
> have to cherry-pick this commit and compile from source for our production
> release.
> 
> Thanks!
> 
> On Wed, Feb 21, 2018 at 9:01 AM, Ryan Blue  > wrote:
> What does everyone think about getting some of the newer DataSourceV2 
> improvements in? It should be low risk because it is a new code path, and v2 
> isn't very usable without things like support for using the output commit 
> coordinator to deconflict writes.
> 
> The ones I'd like to get in are:
> * Use the output commit coordinator: 
> https://issues.apache.org/jira/browse/SPARK-23323 
> 
> * Use immutable trees and the same push-down logic as other read paths: 
> https://issues.apache.org/jira/browse/SPARK-23203 
> 
> * Don't allow users to supply schemas when they aren't supported: 
> https://issues.apache.org/jira/browse/SPARK-23418 
> 
> 
> I think it would make the 2.3.0 release more usable for anyone interested in 
> the v2 read and write paths.
> 
> Thanks!
> 
> On Tue, Feb 20, 2018 at 7:07 PM, Weichen Xu wrote:
> +1
> 
> On Wed, Feb 21, 2018 at 10:07 AM, Marcelo Vanzin wrote:
> Done, thanks!
> 
> On Tue, Feb 20, 2018 at 6:05 PM, Sameer Agarwal wrote:
> > Sure, please feel free to backport.
> >
> > On 20 February 2018 at 18:02, Marcelo Vanzin wrote:
> >>
> >> Hey Sameer,
> >>
> >> Mind including https://github.com/apache/spark/pull/20643 
> >> 
> >> (SPARK-23468)  in the new RC? It's a minor bug since I've only hit it
> >> with older shuffle services, but it's pretty safe.
> >>
> >> On Tue, Feb 20, 2018 at 5:58 PM, Sameer Agarwal
> >> wrote:
> >> > This RC has failed due to
> >> > https://issues.apache.org/jira/browse/SPARK-23470.
> >> > Now that the fix has been merged in 2.3 (thanks Marcelo!), I'll follow
> >> > up
> >> > with an RC5 soon.
> >> >
> >> > On 20 February 2018 at 16:49, Ryan Blue wrote:
> >> >>
> >> >> +1
> >> >>
> >> >> Build & tests look fine, checked signature and checksums for src
> >> >> tarball.
> >> >>
> >> >> On Tue, Feb 20, 2018 at 12:54 PM, Shixiong(Ryan) Zhu
> >> >> wrote:
> >> >>>
> >> >>> I'm -1 because of the UI regression
> >> >>> https://issues.apache.org/jira/browse/SPARK-23470 : the All Jobs page
> >> >>> may be
> >> >>> too slow and cause "read timeout" when there are lots of jobs and
> >> >>> stages.
> >> >>> This is one of the most important pages because when it's broken, it's
> >> >>> pretty hard to use Spark Web UI.
> >> >>>
> >> >>>
> >> >>> On Tue, Feb 20, 2018 at 4:37 AM, Marco Gaido
> >> >>> wrote:
> >> 
> >>  +1
> >> 
> >>  2018-02-20 12:30 GMT+01:00 Hyukjin Kwon:
> >> >
> >> > +1 too
> >> >
> >> > 2018-02-20 14:41 GMT+09:00 Takuya UESHIN:
> >> >>
> >> >> +1
> >> >>
> >> >>
> >> >> On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang
> >> >> wrote:
> >> >>>
> >> >>> +1
> >> >>>
> >> >>>
> >> >>> On Tue, Feb 20, 2018 at 1:09 PM, Wenchen Fan wrote:
> >> 
> >>  +1
> >> 
> >>  On Tue, Feb 20, 2018 at 12:53 PM, Reynold Xin
> >>  >
> 

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-21 Thread Wenchen Fan
SPARK-23406 fixes a bug in a feature that is new in Spark 2.3, so it is not a
regression. I think we have to fix it in 2.3.1, but I'm less sure about
2.3.0.

On Thu, Feb 22, 2018 at 1:21 AM, kant kodali  wrote:

> Hi All,
>
> +1 for the tickets proposed by Ryan Blue
>
> Any possible chance of this one
> https://issues.apache.org/jira/browse/SPARK-23406 getting into 2.3.0? It's a
> very important feature for us so if it doesn't make the cut I would have to
> cherry-pick this commit and compile from the source for our production release.
>
> Thanks!
>
> On Wed, Feb 21, 2018 at 9:01 AM, Ryan Blue 
> wrote:
>
>> What does everyone think about getting some of the newer DataSourceV2
>> improvements in? It should be low risk because it is a new code path, and
>> v2 isn't very usable without things like support for using the output
>> commit coordinator to deconflict writes.
>>
>> The ones I'd like to get in are:
>> * Use the output commit coordinator:
>> https://issues.apache.org/jira/browse/SPARK-23323
>> * Use immutable trees and the same push-down logic as other read paths:
>> https://issues.apache.org/jira/browse/SPARK-23203
>> * Don't allow users to supply schemas when they aren't supported:
>> https://issues.apache.org/jira/browse/SPARK-23418
>>
>> I think it would make the 2.3.0 release more usable for anyone interested
>> in the v2 read and write paths.
>>
>> Thanks!
>>
>> On Tue, Feb 20, 2018 at 7:07 PM, Weichen Xu 
>> wrote:
>>
>>> +1
>>>
>>> On Wed, Feb 21, 2018 at 10:07 AM, Marcelo Vanzin 
>>> wrote:
>>>
 Done, thanks!

 On Tue, Feb 20, 2018 at 6:05 PM, Sameer Agarwal 
 wrote:
 > Sure, please feel free to backport.
 >
 > On 20 February 2018 at 18:02, Marcelo Vanzin 
 wrote:
 >>
 >> Hey Sameer,
 >>
 >> Mind including https://github.com/apache/spark/pull/20643
 >> (SPARK-23468)  in the new RC? It's a minor bug since I've only hit it
 >> with older shuffle services, but it's pretty safe.
 >>
 >> On Tue, Feb 20, 2018 at 5:58 PM, Sameer Agarwal 
 >> wrote:
 >> > This RC has failed due to
 >> > https://issues.apache.org/jira/browse/SPARK-23470.
 >> > Now that the fix has been merged in 2.3 (thanks Marcelo!), I'll
 follow
 >> > up
 >> > with an RC5 soon.
 >> >
 >> > On 20 February 2018 at 16:49, Ryan Blue  wrote:
 >> >>
 >> >> +1
 >> >>
 >> >> Build & tests look fine, checked signature and checksums for src
 >> >> tarball.
 >> >>
 >> >> On Tue, Feb 20, 2018 at 12:54 PM, Shixiong(Ryan) Zhu
 >> >>  wrote:
 >> >>>
 >> >>> I'm -1 because of the UI regression
 >> >>> https://issues.apache.org/jira/browse/SPARK-23470 : the All
 Jobs page
 >> >>> may be
 >> >>> too slow and cause "read timeout" when there are lots of jobs and
 >> >>> stages.
 >> >>> This is one of the most important pages because when it's
 broken, it's
 >> >>> pretty hard to use Spark Web UI.
 >> >>>
 >> >>>
 >> >>> On Tue, Feb 20, 2018 at 4:37 AM, Marco Gaido <
 marcogaid...@gmail.com>
 >> >>> wrote:
 >> 
 >>  +1
 >> 
 >>  2018-02-20 12:30 GMT+01:00 Hyukjin Kwon :
 >> >
 >> > +1 too
 >> >
 >> > 2018-02-20 14:41 GMT+09:00 Takuya UESHIN <
 ues...@happy-camper.st>:
 >> >>
 >> >> +1
 >> >>
 >> >>
 >> >> On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang
 >> >> 
 >> >> wrote:
 >> >>>
 >> >>> +1
 >> >>>
 >> >>>
 >> >>> On Tue, Feb 20, 2018 at 1:09 PM, Wenchen Fan wrote:
 >> 
 >>  +1
 >> 
 >>  On Tue, Feb 20, 2018 at 12:53 PM, Reynold Xin
 >>  
 >>  wrote:
 >> >
 >> > +1
 >> >
 >> > On Feb 20, 2018, 5:51 PM +1300, Sameer Agarwal
 >> > , wrote:
 >> >>
 >> >> this file shouldn't be included?
 >> >>
 >> >> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/spark-parent_2.11.iml
 >> >
 >> >
 >> > I've now deleted this file
 >> >
 >> >> From: Sameer Agarwal 
 >> >> Sent: Saturday, February 17, 2018 1:43:39 PM
 >> >> To: Sameer Agarwal
 >> >> Cc: dev
 >> >> Subject: Re: [VOTE] Spark 2.3.0 (RC4)
 >> >>
 >> >> I'll start with a +1 once again.
 >> >>
 >> >> All blockers reported against RC3 have been resolved and
 the
 >> >> builds are healthy.

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-21 Thread kant kodali
Hi All,

+1 for the tickets proposed by Ryan Blue

Any possible chance of this one
https://issues.apache.org/jira/browse/SPARK-23406 getting into 2.3.0? It's
a very important feature for us so if it doesn't make the cut I would have
to cherry-pick this commit and compile from the source for our production
release.

Thanks!

On Wed, Feb 21, 2018 at 9:01 AM, Ryan Blue 
wrote:

> What does everyone think about getting some of the newer DataSourceV2
> improvements in? It should be low risk because it is a new code path, and
> v2 isn't very usable without things like support for using the output
> commit coordinator to deconflict writes.
>
> The ones I'd like to get in are:
> * Use the output commit coordinator:
> https://issues.apache.org/jira/browse/SPARK-23323
> * Use immutable trees and the same push-down logic as other read paths:
> https://issues.apache.org/jira/browse/SPARK-23203
> * Don't allow users to supply schemas when they aren't supported:
> https://issues.apache.org/jira/browse/SPARK-23418
>
> I think it would make the 2.3.0 release more usable for anyone interested
> in the v2 read and write paths.
>
> Thanks!
>
> On Tue, Feb 20, 2018 at 7:07 PM, Weichen Xu 
> wrote:
>
>> +1
>>
>> On Wed, Feb 21, 2018 at 10:07 AM, Marcelo Vanzin 
>> wrote:
>>
>>> Done, thanks!
>>>
>>> On Tue, Feb 20, 2018 at 6:05 PM, Sameer Agarwal 
>>> wrote:
>>> > Sure, please feel free to backport.
>>> >
>>> > On 20 February 2018 at 18:02, Marcelo Vanzin 
>>> wrote:
>>> >>
>>> >> Hey Sameer,
>>> >>
>>> >> Mind including https://github.com/apache/spark/pull/20643
>>> >> (SPARK-23468)  in the new RC? It's a minor bug since I've only hit it
>>> >> with older shuffle services, but it's pretty safe.
>>> >>
>>> >> On Tue, Feb 20, 2018 at 5:58 PM, Sameer Agarwal 
>>> >> wrote:
>>> >> > This RC has failed due to
>>> >> > https://issues.apache.org/jira/browse/SPARK-23470.
>>> >> > Now that the fix has been merged in 2.3 (thanks Marcelo!), I'll
>>> follow
>>> >> > up
>>> >> > with an RC5 soon.
>>> >> >
>>> >> > On 20 February 2018 at 16:49, Ryan Blue  wrote:
>>> >> >>
>>> >> >> +1
>>> >> >>
>>> >> >> Build & tests look fine, checked signature and checksums for src
>>> >> >> tarball.
>>> >> >>
>>> >> >> On Tue, Feb 20, 2018 at 12:54 PM, Shixiong(Ryan) Zhu
>>> >> >>  wrote:
>>> >> >>>
>>> >> >>> I'm -1 because of the UI regression
>>> >> >>> https://issues.apache.org/jira/browse/SPARK-23470 : the All Jobs
>>> page
>>> >> >>> may be
>>> >> >>> too slow and cause "read timeout" when there are lots of jobs and
>>> >> >>> stages.
>>> >> >>> This is one of the most important pages because when it's broken,
>>> it's
>>> >> >>> pretty hard to use Spark Web UI.
>>> >> >>>
>>> >> >>>
>>> >> >>> On Tue, Feb 20, 2018 at 4:37 AM, Marco Gaido <
>>> marcogaid...@gmail.com>
>>> >> >>> wrote:
>>> >> 
>>> >>  +1
>>> >> 
>>> >>  2018-02-20 12:30 GMT+01:00 Hyukjin Kwon :
>>> >> >
>>> >> > +1 too
>>> >> >
>>> >> > 2018-02-20 14:41 GMT+09:00 Takuya UESHIN <
>>> ues...@happy-camper.st>:
>>> >> >>
>>> >> >> +1
>>> >> >>
>>> >> >>
>>> >> >> On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang
>>> >> >> 
>>> >> >> wrote:
>>> >> >>>
>>> >> >>> +1
>>> >> >>>
>>> >> >>>
>>> >> >>> On Tue, Feb 20, 2018 at 1:09 PM, Wenchen Fan wrote:
>>> >> 
>>> >>  +1
>>> >> 
>>> >>  On Tue, Feb 20, 2018 at 12:53 PM, Reynold Xin
>>> >>  
>>> >>  wrote:
>>> >> >
>>> >> > +1
>>> >> >
>>> >> > On Feb 20, 2018, 5:51 PM +1300, Sameer Agarwal
>>> >> > , wrote:
>>> >> >>
>>> >> >> this file shouldn't be included?
>>> >> >>
>>> >> >> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/spark-parent_2.11.iml
>>> >> >
>>> >> >
>>> >> > I've now deleted this file
>>> >> >
>>> >> >> From: Sameer Agarwal 
>>> >> >> Sent: Saturday, February 17, 2018 1:43:39 PM
>>> >> >> To: Sameer Agarwal
>>> >> >> Cc: dev
>>> >> >> Subject: Re: [VOTE] Spark 2.3.0 (RC4)
>>> >> >>
>>> >> >> I'll start with a +1 once again.
>>> >> >>
>>> >> >> All blockers reported against RC3 have been resolved and
>>> the
>>> >> >> builds are healthy.
>>> >> >>
>>> >> >> On 17 February 2018 at 13:41, Sameer Agarwal
>>> >> >> 
>>> >> >> wrote:
>>> >> >>>
>>> >> >>> Please vote on releasing the following candidate as Apache
>>> >> >>> Spark
>>> >> >>> version 2.3.0. The vote is open until Thursday February
>>> 22,
>>> >> >>> 2018 at 

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-21 Thread Xiao Li
Hi, Ryan,

Thank you for bringing it up. Since we are already at RC4, we can only
accept regression fixes in the 2.3 branch. This is also the strategy followed
in previous Spark releases.

The Data Source V2 API is newly introduced in this release. At this stage, we
are unable to accept any changes to it. We have to stop
adding new features/changes to the to-be-released branches. Sorry about
that.

Thanks,

Xiao







2018-02-21 9:01 GMT-08:00 Ryan Blue :

> What does everyone think about getting some of the newer DataSourceV2
> improvements in? It should be low risk because it is a new code path, and
> v2 isn't very usable without things like support for using the output
> commit coordinator to deconflict writes.
>
> The ones I'd like to get in are:
> * Use the output commit coordinator:
> https://issues.apache.org/jira/browse/SPARK-23323
> * Use immutable trees and the same push-down logic as other read paths:
> https://issues.apache.org/jira/browse/SPARK-23203
> * Don't allow users to supply schemas when they aren't supported:
> https://issues.apache.org/jira/browse/SPARK-23418
>
> I think it would make the 2.3.0 release more usable for anyone interested
> in the v2 read and write paths.
>
> Thanks!
>
> On Tue, Feb 20, 2018 at 7:07 PM, Weichen Xu 
> wrote:
>
>> +1
>>
>> On Wed, Feb 21, 2018 at 10:07 AM, Marcelo Vanzin 
>> wrote:
>>
>>> Done, thanks!
>>>
>>> On Tue, Feb 20, 2018 at 6:05 PM, Sameer Agarwal 
>>> wrote:
>>> > Sure, please feel free to backport.
>>> >
>>> > On 20 February 2018 at 18:02, Marcelo Vanzin 
>>> wrote:
>>> >>
>>> >> Hey Sameer,
>>> >>
>>> >> Mind including https://github.com/apache/spark/pull/20643
>>> >> (SPARK-23468)  in the new RC? It's a minor bug since I've only hit it
>>> >> with older shuffle services, but it's pretty safe.
>>> >>
>>> >> On Tue, Feb 20, 2018 at 5:58 PM, Sameer Agarwal 
>>> >> wrote:
>>> >> > This RC has failed due to
>>> >> > https://issues.apache.org/jira/browse/SPARK-23470.
>>> >> > Now that the fix has been merged in 2.3 (thanks Marcelo!), I'll
>>> follow
>>> >> > up
>>> >> > with an RC5 soon.
>>> >> >
>>> >> > On 20 February 2018 at 16:49, Ryan Blue  wrote:
>>> >> >>
>>> >> >> +1
>>> >> >>
>>> >> >> Build & tests look fine, checked signature and checksums for src
>>> >> >> tarball.
>>> >> >>
>>> >> >> On Tue, Feb 20, 2018 at 12:54 PM, Shixiong(Ryan) Zhu
>>> >> >>  wrote:
>>> >> >>>
>>> >> >>> I'm -1 because of the UI regression
>>> >> >>> https://issues.apache.org/jira/browse/SPARK-23470 : the All Jobs
>>> page
>>> >> >>> may be
>>> >> >>> too slow and cause "read timeout" when there are lots of jobs and
>>> >> >>> stages.
>>> >> >>> This is one of the most important pages because when it's broken,
>>> it's
>>> >> >>> pretty hard to use Spark Web UI.
>>> >> >>>
>>> >> >>>
>>> >> >>> On Tue, Feb 20, 2018 at 4:37 AM, Marco Gaido <
>>> marcogaid...@gmail.com>
>>> >> >>> wrote:
>>> >> 
>>> >>  +1
>>> >> 
>>> >>  2018-02-20 12:30 GMT+01:00 Hyukjin Kwon :
>>> >> >
>>> >> > +1 too
>>> >> >
>>> >> > 2018-02-20 14:41 GMT+09:00 Takuya UESHIN <
>>> ues...@happy-camper.st>:
>>> >> >>
>>> >> >> +1
>>> >> >>
>>> >> >>
>>> >> >> On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang
>>> >> >> 
>>> >> >> wrote:
>>> >> >>>
>>> >> >>> +1
>>> >> >>>
>>> >> >>>
>>> >> >>> On Tue, Feb 20, 2018 at 1:09 PM, Wenchen Fan wrote:
>>> >> 
>>> >>  +1
>>> >> 
>>> >>  On Tue, Feb 20, 2018 at 12:53 PM, Reynold Xin
>>> >>  
>>> >>  wrote:
>>> >> >
>>> >> > +1
>>> >> >
>>> >> > On Feb 20, 2018, 5:51 PM +1300, Sameer Agarwal
>>> >> > , wrote:
>>> >> >>
>>> >> >> this file shouldn't be included?
>>> >> >>
>>> >> >> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/spark-parent_2.11.iml
>>> >> >
>>> >> >
>>> >> > I've now deleted this file
>>> >> >
>>> >> >> From: Sameer Agarwal 
>>> >> >> Sent: Saturday, February 17, 2018 1:43:39 PM
>>> >> >> To: Sameer Agarwal
>>> >> >> Cc: dev
>>> >> >> Subject: Re: [VOTE] Spark 2.3.0 (RC4)
>>> >> >>
>>> >> >> I'll start with a +1 once again.
>>> >> >>
>>> >> >> All blockers reported against RC3 have been resolved and
>>> the
>>> >> >> builds are healthy.
>>> >> >>
>>> >> >> On 17 February 2018 at 13:41, Sameer Agarwal
>>> >> >> 
>>> >> >> wrote:
>>> >> >>>
>>> >> >>> Please vote on releasing the following candidate as Apache
>>> >> >>> Spark
>>> >> 

FINAL REMINDER: CFP for Apache EU Roadshow Closes 25th February

2018-02-21 Thread Sharan F

Hello Apache Supporters and Enthusiasts

This is your FINAL reminder that the Call for Papers (CFP) for the 
Apache EU Roadshow is closing soon. Our Apache EU Roadshow will focus on 
Cloud, IoT, Apache Tomcat, Apache Http and will run from 13-14 June 2018 
in Berlin.
Note that the CFP deadline has been extended to 25th February, and
it will be your final opportunity to submit a talk for this event.


Please make your submissions at http://apachecon.com/euroadshow18/

Also note that early bird ticket registrations to attend FOSS Backstage,
including the Apache EU Roadshow, have also been extended and will be
available until 23rd February. Please register at
https://foss-backstage.de/tickets


We look forward to seeing you in Berlin!

Thanks
Sharan Foga, VP Apache Community Development

PLEASE NOTE: You are receiving this message because you are subscribed 
to a user@ or dev@ list of one or more Apache Software Foundation projects.




Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-21 Thread Ryan Blue
What does everyone think about getting some of the newer DataSourceV2
improvements in? It should be low risk because it is a new code path, and
v2 isn't very usable without things like support for using the output
commit coordinator to deconflict writes.

The ones I'd like to get in are:
* Use the output commit coordinator:
https://issues.apache.org/jira/browse/SPARK-23323
* Use immutable trees and the same push-down logic as other read paths:
https://issues.apache.org/jira/browse/SPARK-23203
* Don't allow users to supply schemas when they aren't supported:
https://issues.apache.org/jira/browse/SPARK-23418

I think it would make the 2.3.0 release more usable for anyone interested
in the v2 read and write paths.

Thanks!

On Tue, Feb 20, 2018 at 7:07 PM, Weichen Xu 
wrote:

> +1
>
> On Wed, Feb 21, 2018 at 10:07 AM, Marcelo Vanzin 
> wrote:
>
>> Done, thanks!
>>
>> On Tue, Feb 20, 2018 at 6:05 PM, Sameer Agarwal 
>> wrote:
>> > Sure, please feel free to backport.
>> >
>> > On 20 February 2018 at 18:02, Marcelo Vanzin 
>> wrote:
>> >>
>> >> Hey Sameer,
>> >>
>> >> Mind including https://github.com/apache/spark/pull/20643
>> >> (SPARK-23468)  in the new RC? It's a minor bug since I've only hit it
>> >> with older shuffle services, but it's pretty safe.
>> >>
>> >> On Tue, Feb 20, 2018 at 5:58 PM, Sameer Agarwal 
>> >> wrote:
>> >> > This RC has failed due to
>> >> > https://issues.apache.org/jira/browse/SPARK-23470.
>> >> > Now that the fix has been merged in 2.3 (thanks Marcelo!), I'll
>> follow
>> >> > up
>> >> > with an RC5 soon.
>> >> >
>> >> > On 20 February 2018 at 16:49, Ryan Blue  wrote:
>> >> >>
>> >> >> +1
>> >> >>
>> >> >> Build & tests look fine, checked signature and checksums for src
>> >> >> tarball.
>> >> >>
>> >> >> On Tue, Feb 20, 2018 at 12:54 PM, Shixiong(Ryan) Zhu
>> >> >>  wrote:
>> >> >>>
>> >> >>> I'm -1 because of the UI regression
>> >> >>> https://issues.apache.org/jira/browse/SPARK-23470 : the All Jobs
>> page
>> >> >>> may be
>> >> >>> too slow and cause "read timeout" when there are lots of jobs and
>> >> >>> stages.
>> >> >>> This is one of the most important pages because when it's broken,
>> it's
>> >> >>> pretty hard to use Spark Web UI.
>> >> >>>
>> >> >>>
>> >> >>> On Tue, Feb 20, 2018 at 4:37 AM, Marco Gaido <
>> marcogaid...@gmail.com>
>> >> >>> wrote:
>> >> 
>> >>  +1
>> >> 
>> >>  2018-02-20 12:30 GMT+01:00 Hyukjin Kwon :
>> >> >
>> >> > +1 too
>> >> >
>> >> > 2018-02-20 14:41 GMT+09:00 Takuya UESHIN > >:
>> >> >>
>> >> >> +1
>> >> >>
>> >> >>
>> >> >> On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang
>> >> >> 
>> >> >> wrote:
>> >> >>>
>> >> >>> +1
>> >> >>>
>> >> >>>
>> >> >>> On Tue, Feb 20, 2018 at 1:09 PM, Wenchen Fan wrote:
>> >> 
>> >>  +1
>> >> 
>> >>  On Tue, Feb 20, 2018 at 12:53 PM, Reynold Xin
>> >>  
>> >>  wrote:
>> >> >
>> >> > +1
>> >> >
>> >> > On Feb 20, 2018, 5:51 PM +1300, Sameer Agarwal
>> >> > , wrote:
>> >> >>
>> >> >> this file shouldn't be included?
>> >> >>
>> >> >> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/spark-parent_2.11.iml
>> >> >
>> >> >
>> >> > I've now deleted this file
>> >> >
>> >> >> From: Sameer Agarwal 
>> >> >> Sent: Saturday, February 17, 2018 1:43:39 PM
>> >> >> To: Sameer Agarwal
>> >> >> Cc: dev
>> >> >> Subject: Re: [VOTE] Spark 2.3.0 (RC4)
>> >> >>
>> >> >> I'll start with a +1 once again.
>> >> >>
>> >> >> All blockers reported against RC3 have been resolved and the
>> >> >> builds are healthy.
>> >> >>
>> >> >> On 17 February 2018 at 13:41, Sameer Agarwal
>> >> >> 
>> >> >> wrote:
>> >> >>>
>> >> >>> Please vote on releasing the following candidate as Apache
>> >> >>> Spark
>> >> >>> version 2.3.0. The vote is open until Thursday February 22,
>> >> >>> 2018 at 8:00:00
>> >> >>> am UTC and passes if a majority of at least 3 PMC +1 votes
>> are
>> >> >>> cast.
>> >> >>>
>> >> >>>
>> >> >>> [ ] +1 Release this package as Apache Spark 2.3.0
>> >> >>>
>> >> >>> [ ] -1 Do not release this package because ...
>> >> >>>
>> >> >>>
>> >> >>> To learn more about Apache Spark, please see
>> >> >>> https://spark.apache.org/
>> >> >>>
>> >> >>> The tag to be voted on is v2.3.0-rc4:
>> >> >>>