Re: Auto-linking from PRs to Jira tickets

2020-03-09 Thread Holden Karau
Oh got it. That sounds cool.

On Mon, Mar 9, 2020 at 6:25 PM Nicholas Chammas 
wrote:

> Right, what I'm talking about is linking in the other direction, from
> GitHub to Jira.
>
> i.e. you can type "SPARK-1234" in plain text on a PR, and GitHub will
> automatically turn it into a link to the appropriate ticket on Jira.
>
> On Mon, Mar 9, 2020 at 8:21 PM Holden Karau  wrote:
>
>>
>>
>> On Mon, Mar 9, 2020 at 2:14 PM Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>>> This is a feature of GitHub itself and would auto-link directly from the
>>> PR back to Jira.
>>>
>>> I haven't looked at the PR dashboard in a while, but I believe you're
>>> referencing a feature of the dashboard  that
>>> people won't get unless they look at the dashboard itself.
>>>
>>> What GitHub is offering is an ability to auto-link any mention of a Jira
>>> ticket anywhere in a PR discussion (and hopefully also in the PR title,
>>> though I'm not sure) directly back to Jira.
>>>
>> So the dashboard has a bot which would update the JIRA tickets based on
>> the PRs. It might be broken, though.
>>
>>>
>>> I suppose if you're in the habit of using the dashboard regularly it
>>> won't make a big difference. I typically land on a PR via a notification in
>>> GitHub or via email. If I want to look up the referenced Jira ticket, I have
>>> to copy it from the PR title and navigate to issues.apache.org and
>>> paste it in.
>>>
>>> On Mon, Mar 9, 2020 at 4:46 PM Holden Karau 
>>> wrote:
>>>
 I think we used to do this with the same bot that runs the PR
 dashboard, is it no longer working?

 On Mon, Mar 9, 2020 at 12:28 PM Nicholas Chammas <
 nicholas.cham...@gmail.com> wrote:

> https://github.blog/2019-10-14-introducing-autolink-references/
>
> GitHub has a feature for auto-linking from PRs to external tickets.
> It's only available for their paid plans, but perhaps Apache has some
> arrangement with them where we can get that feature.
>
> Since we include Jira ticket numbers in every PR title, it would be
> great if each PR auto-linked back to the relevant Jira tickets. (We 
> already
> have auto-linking from Jira to PRs.)
>
> Has someone looked into this already, or should I file a ticket with
> INFRA and see what they say?
>
> Nick
>
> --
 Twitter: https://twitter.com/holdenkarau
 Books (Learning Spark, High Performance Spark, etc.):
 https://amzn.to/2MaRAG9  
 YouTube Live Streams: https://www.youtube.com/user/holdenkarau

>>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
> --
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: Auto-linking from PRs to Jira tickets

2020-03-09 Thread Nicholas Chammas
Right, what I'm talking about is linking in the other direction, from
GitHub to Jira.

i.e. you can type "SPARK-1234" in plain text on a PR, and GitHub will
automatically turn it into a link to the appropriate ticket on Jira.
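
For concreteness: an autolink reference on GitHub is just a key prefix (e.g.
"SPARK-") plus a URL template with a <num> placeholder, configured per
repository. GitHub also exposes this through its REST API these days; below is
a minimal, illustrative sketch in Python. The owner/repo/token values are
placeholders, and for apache/spark the required admin rights sit with ASF
INFRA, so in practice this would go through them:

    import requests

    OWNER, REPO = "apache", "spark"
    TOKEN = "ghp_..."  # placeholder: a token with admin rights on the repo

    # Create an autolink reference: any "SPARK-1234" mention in PR titles,
    # descriptions, and comments then links to the matching JIRA ticket.
    resp = requests.post(
        f"https://api.github.com/repos/{OWNER}/{REPO}/autolinks",
        headers={
            "Authorization": f"token {TOKEN}",
            "Accept": "application/vnd.github+json",
        },
        json={
            "key_prefix": "SPARK-",
            # GitHub substitutes the ticket number for <num>.
            "url_template": "https://issues.apache.org/jira/browse/SPARK-<num>",
        },
    )
    resp.raise_for_status()
    print(resp.json())  # echoes the created autolink reference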

On Mon, Mar 9, 2020 at 8:21 PM Holden Karau  wrote:

>
>
> On Mon, Mar 9, 2020 at 2:14 PM Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> This is a feature of GitHub itself and would auto-link directly from the
>> PR back to Jira.
>>
>> I haven't looked at the PR dashboard in a while, but I believe you're
>> referencing a feature of the dashboard  that
>> people won't get unless they look at the dashboard itself.
>>
>> What GitHub is offering is an ability to auto-link any mention of a Jira
>> ticket anywhere in a PR discussion (and hopefully also in the PR title,
>> though I'm not sure) directly back to Jira.
>>
> So the dashboard has a bot which would update the JIRA tickets based on
> the PRs. It might be broken, though.
>
>>
>> I suppose if you're in the habit of using the dashboard regularly it
>> won't make a big difference. I typically land on a PR via a notification in
>> GitHub or via email. If I want to look up the referenced Jira ticket, I have
>> to copy it from the PR title and navigate to issues.apache.org and paste
>> it in.
>>
>> On Mon, Mar 9, 2020 at 4:46 PM Holden Karau  wrote:
>>
>>> I think we used to do this with the same bot that runs the PR dashboard,
>>> is it no longer working?
>>>
>>> On Mon, Mar 9, 2020 at 12:28 PM Nicholas Chammas <
>>> nicholas.cham...@gmail.com> wrote:
>>>
 https://github.blog/2019-10-14-introducing-autolink-references/

 GitHub has a feature for auto-linking from PRs to external tickets.
 It's only available for their paid plans, but perhaps Apache has some
 arrangement with them where we can get that feature.

 Since we include Jira ticket numbers in every PR title, it would be
 great if each PR auto-linked back to the relevant Jira tickets. (We already
 have auto-linking from Jira to PRs.)

 Has someone looked into this already, or should I file a ticket with
 INFRA and see what they say?

 Nick

 --
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


Re: Auto-linking from PRs to Jira tickets

2020-03-09 Thread Holden Karau
On Mon, Mar 9, 2020 at 2:14 PM Nicholas Chammas 
wrote:

> This is a feature of GitHub itself and would auto-link directly from the
> PR back to Jira.
>
> I haven't looked at the PR dashboard in a while, but I believe you're
> referencing a feature of the dashboard  that
> people won't get unless they look at the dashboard itself.
>
> What GitHub is offering is an ability to auto-link any mention of a Jira
> ticket anywhere in a PR discussion (and hopefully also in the PR title,
> though I'm not sure) directly back to Jira.
>
So the dashboard has a bot which would update the JIRA tickets based on the
PRs. It might be broken, though.
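
(For context, the JIRA-side linking that bot handles boils down to recognizing
ticket IDs in PR titles. A toy sketch of that matching step in Python; the
actual script - I believe dev/github_jira_sync.py in the Spark repo - does
much more, so treat this purely as an illustration:)

    import re

    JIRA_ID = re.compile(r"\b(SPARK-\d+)\b")

    def jira_ids(pr_title: str) -> list[str]:
        """Extract every JIRA ticket ID mentioned in a PR title."""
        return JIRA_ID.findall(pr_title)

    # A typical Spark PR title (made up for the example).
    title = "[SPARK-1234][SQL] Fix predicate pushdown for nested columns"
    assert jira_ids(title) == ["SPARK-1234"]
    for ticket in jira_ids(title):
        print(f"https://issues.apache.org/jira/browse/{ticket}")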

>
> I suppose if you're in the habit of using the dashboard regularly it won't
> make a big difference. I typically land on a PR via a notification in
> GitHub or via email. If I want to look up the referenced Jira ticket, I have
> to copy it from the PR title and navigate to issues.apache.org and paste
> it in.
>
> On Mon, Mar 9, 2020 at 4:46 PM Holden Karau  wrote:
>
>> I think we used to do this with the same bot that runs the PR dashboard,
>> is it no longer working?
>>
>> On Mon, Mar 9, 2020 at 12:28 PM Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>>> https://github.blog/2019-10-14-introducing-autolink-references/
>>>
>>> GitHub has a feature for auto-linking from PRs to external tickets. It's
>>> only available for their paid plans, but perhaps Apache has some
>>> arrangement with them where we can get that feature.
>>>
>>> Since we include Jira ticket numbers in every PR title, it would be
>>> great if each PR auto-linked back to the relevant Jira tickets. (We already
>>> have auto-linking from Jira to PRs.)
>>>
>>> Has someone looked into this already, or should I file a ticket with
>>> INFRA and see what they say?
>>>
>>> Nick
>>>
>>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
> --
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-09 Thread Holden Karau
+1 (binding) on the original proposal.

On Mon, Mar 9, 2020 at 1:32 PM Michael Heuer  wrote:

> +1 (non-binding)
>
> I am disappointed however that this only mentions API and not dependencies
> and transitive dependencies.
>
I think upgrading dependencies continues to be reasonable.

>
> As Spark does not provide separation between its runtime classpath and the
> classpath used by applications, I believe Spark's dependencies and
> transitive dependencies should be considered part of the API for this
> policy.  Breaking dependency upgrades and incompatible dependency versions
> are the source of much frustration.
>
I myself have also faced this frustration. I believe we've increased some
shading to help here. Are there specific pain points you've experienced?
Maybe we can factor this discussion into another thread.

>
>

>michael
>
>
> On Mar 9, 2020, at 2:16 PM, Takuya UESHIN  wrote:
>
> +1 (binding)
>
>
> On Mon, Mar 9, 2020 at 11:49 AM Xingbo Jiang 
> wrote:
>
>> +1 (non-binding)
>>
>> Cheers,
>>
>> Xingbo
>>
>> On Mon, Mar 9, 2020 at 9:35 AM Xiao Li  wrote:
>>
>>> +1 (binding)
>>>
>>> Xiao
>>>
>>> On Mon, Mar 9, 2020 at 8:33 AM Denny Lee  wrote:
>>>
 +1 (non-binding)

 On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon 
 wrote:

> The proposal itself seems good in terms of the factors to consider.
> Thanks, Michael.
>
> Several concerns mentioned look like good points, in particular:
>
> > ... assuming that this is for public stable APIs, not APIs that are
> marked as unstable, evolving, etc. ...
> I would like to confirm this. We already have API annotations such as
> Experimental, Unstable, etc. and the implication of each is still
> effective. If it's for stable APIs, it makes sense to me as well.
>
> > ... can we expand on 'when' an API change can occur ?  Since we are
> proposing to diverge from semver. ...
> I think this is a good point. If we're proposing to diverge
> from semver, the delta compared to semver will have to be clarified to
> avoid different personal interpretations of the somewhat general 
> principles.
>
> > ... can we narrow down on the migration from Apache Spark 2.4.5 to
> Apache Spark 3.0+? ...
>
> Assuming these concerns will be addressed, +1 (binding).
>
>
> On Mon, Mar 9, 2020 at 4:53 PM, Takeshi Yamamuro wrote:
>
>> +1 (non-binding)
>>
>> Bests,
>> Takeshi
>>
>> On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang <
>> gengliang.w...@databricks.com> wrote:
>>
>>> +1 (non-binding)
>>>
>>> Gengliang
>>>
>>> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia <
>>> matei.zaha...@gmail.com> wrote:
>>>
 +1 as well.

 Matei

 On Mar 9, 2020, at 12:05 AM, Wenchen Fan 
 wrote:

 +1 (binding), assuming that this is for public stable APIs, not
 APIs that are marked as unstable, evolving, etc.

 On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía 
 wrote:

> +1 (non-binding)
>
> Michael's section on the trade-offs of maintaining / removing an
> API is one of
> the best reads I have seen on this mailing list. Enthusiastic +1
>
> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun <
> dongjoon.h...@gmail.com> wrote:
> >
> > This new policy has a good intention, but can we narrow down on
> the migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
> >
> > I saw that there already exists a reverting PR to bring back
> Spark 1.4 and 1.5 APIs based on this AS-IS suggestion.
> >
> > The AS-IS policy clearly mentions the JVM/Scala-level
> difficulty, and that's nice.
> >
> > However, for the other cases, it sounds like `recommending older
> APIs as much as possible` due to the following.
> >
> >  > How long has the API been in Spark?
> >
> > We had better be more careful when we add a new policy and
> should aim not to mislead users and 3rd-party library developers
> into saying "older is better".
> >
> > Technically, I'm wondering who will use new APIs in their
> examples (in books and on StackOverflow) if they always need to write an
> additional warning like `this only works at 2.4.0+`.
> >
> > Bests,
> > Dongjoon.
> >
> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan <
> mri...@gmail.com> wrote:
> >>
> >> I am in broad agreement with the proposal; as any developer, I prefer
> >> stable, well-designed APIs :-)
> >>
> >> Can we tie the proposal to stability guarantees given by Spark and
> >> reasonable expectations from users?
> >> In my opinion, an unstable or 

Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-09 Thread Burak Yavuz
+1

On Mon, Mar 9, 2020 at 4:55 PM Reynold Xin  wrote:

> +1
>
>
>
> On Mon, Mar 09, 2020 at 3:53 PM, John Zhuge  wrote:
>
>> +1 (non-binding)
>>
>> On Mon, Mar 9, 2020 at 1:32 PM Michael Heuer  wrote:
>>
>>> +1 (non-binding)
>>>
>>> I am disappointed however that this only mentions API and not
>>> dependencies and transitive dependencies.
>>>
>>> As Spark does not provide separation between its runtime classpath and
>>> the classpath used by applications, I believe Spark's dependencies and
>>> transitive dependencies should be considered part of the API for this
>>> policy.  Breaking dependency upgrades and incompatible dependency versions
>>> are the source of much frustration.
>>>
>>>michael
>>>
>>>
>>> On Mar 9, 2020, at 2:16 PM, Takuya UESHIN 
>>> wrote:
>>>
>>> +1 (binding)
>>>
>>>
>>> On Mon, Mar 9, 2020 at 11:49 AM Xingbo Jiang 
>>> wrote:
>>>
 +1 (non-binding)

 Cheers,

 Xingbo

 On Mon, Mar 9, 2020 at 9:35 AM Xiao Li  wrote:

> +1 (binding)
>
> Xiao
>
> On Mon, Mar 9, 2020 at 8:33 AM Denny Lee 
> wrote:
>
>> +1 (non-binding)
>>
>> On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon 
>> wrote:
>>
>>> The proposal itself seems good in terms of the factors to consider.
>>> Thanks, Michael.
>>>
>>> Several concerns mentioned look like good points, in particular:
>>>
>>> > ... assuming that this is for public stable APIs, not APIs that
>>> are marked as unstable, evolving, etc. ...
>>> I would like to confirm this. We already have API annotations such
>>> as Experimental, Unstable, etc. and the implication of each is still
>>> effective. If it's for stable APIs, it makes sense to me as well.
>>>
>>> > ... can we expand on 'when' an API change can occur ?  Since we
>>> are proposing to diverge from semver. ...
>>> I think this is a good point. If we're proposing to diverge
>>> from semver, the delta compared to semver will have to be clarified to
>>> avoid different personal interpretations of the somewhat general 
>>> principles.
>>>
>>> > ... can we narrow down on the migration from Apache Spark 2.4.5 to
>>> Apache Spark 3.0+? ...
>>>
>>> Assuming these concerns will be addressed, +1 (binding).
>>>
>>>
>>> On Mon, Mar 9, 2020 at 4:53 PM, Takeshi Yamamuro wrote:
>>>
 +1 (non-binding)

 Bests,
 Takeshi

 On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang <
 gengliang.w...@databricks.com> wrote:

> +1 (non-binding)
>
> Gengliang
>
> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia <
> matei.zaha...@gmail.com> wrote:
>
>> +1 as well.
>>
>> Matei
>>
>> On Mar 9, 2020, at 12:05 AM, Wenchen Fan 
>> wrote:
>>
>> +1 (binding), assuming that this is for public stable APIs, not
>> APIs that are marked as unstable, evolving, etc.
>>
>> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía 
>> wrote:
>>
>>> +1 (non-binding)
>>>
>>> Michael's section on the trade-offs of maintaining / removing an
>>> API is one of
>>> the best reads I have seen on this mailing list. Enthusiastic +1
>>>
>>> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun <
>>> dongjoon.h...@gmail.com> wrote:
>>> >
>>> > This new policy has a good intention, but can we narrow down
>>> on the migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
>>> >
>>> > I saw that there already exists a reverting PR to bring back
>>> Spark 1.4 and 1.5 APIs based on this AS-IS suggestion.
>>> >
>>> > The AS-IS policy clearly mentions the JVM/Scala-level
>>> difficulty, and that's nice.
>>> >
>>> > However, for the other cases, it sounds like `recommending
>>> older APIs as much as possible` due to the following.
>>> >
>>> >  > How long has the API been in Spark?
>>> >
>>> > We had better be more careful when we add a new policy and
>>> should aim not to mislead users and 3rd-party library developers
>>> into saying "older is better".
>>> >
>>> > Technically, I'm wondering who will use new APIs in their
>>> examples (in books and on StackOverflow) if they always need to write
>>> an additional warning like `this only works at 2.4.0+`.
>>> >
>>> > Bests,
>>> > Dongjoon.
>>> >
>>> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan <
>>> mri...@gmail.com> wrote:
>>> >>
>>> >> I am in broad agreement with the proposal; as any developer, I prefer
>>> >> stable, well-designed APIs :-)
>>> >>
>>> >> Can we tie the proposal to stability 

Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-09 Thread Reynold Xin
+1

On Mon, Mar 09, 2020 at 3:53 PM, John Zhuge <jzh...@apache.org> wrote:

> 
> +1 (non-binding)
> 
> 
> On Mon, Mar 9, 2020 at 1:32 PM Michael Heuer <heue...@gmail.com> wrote:
> 
> 
>> +1 (non-binding)
>> 
>> 
>> I am disappointed however that this only mentions API and not dependencies
>> and transitive dependencies.
>> 
>> 
>> As Spark does not provide separation between its runtime classpath and the
>> classpath used by applications, I believe Spark's dependencies and
>> transitive dependencies should be considered part of the API for this
>> policy.  Breaking dependency upgrades and incompatible dependency versions
>> are the source of much frustration.
>> 
>> 
>>    michael
>> 
>> 
>> 
>> 
>>> On Mar 9, 2020, at 2:16 PM, Takuya UESHIN <ues...@happy-camper.st> wrote:
>>> 
>>> +1 (binding)
>>> 
>>> 
>>> 
>>> On Mon, Mar 9, 2020 at 11:49 AM Xingbo Jiang <jiangxb1...@gmail.com> wrote:
>>> 
>>> 
 +1 (non-binding)
 
 
 Cheers,
 
 
 Xingbo
 
 On Mon, Mar 9, 2020 at 9:35 AM Xiao Li <lix...@databricks.com> wrote:
 
 
> +1 (binding)
> 
> 
> Xiao
> 
> On Mon, Mar 9, 2020 at 8:33 AM Denny Lee <denny.g@gmail.com> wrote:
> 
> 
>> +1 (non-binding)
>> 
>> 
>> On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>> 
>> 
>>> The proposal itself seems good in terms of the factors to consider.
>>> Thanks, Michael.
>>> 
>>> 
>>> Several concerns mentioned look like good points, in particular:
>>> 
>>> > ... assuming that this is for public stable APIs, not APIs that are
>>> marked as unstable, evolving, etc. ...
>>> I would like to confirm this. We already have API annotations such as
>>> Experimental, Unstable, etc. and the implication of each is still
>>> effective. If it's for stable APIs, it makes sense to me as well.
>>> 
>>> > ... can we expand on 'when' an API change can occur ?  Since we are
>>> proposing to diverge from semver. ...
>>> 
>>> I think this is a good point. If we're proposing to diverge from semver,
>>> the delta compared to semver will have to be clarified to avoid 
>>> different
>>> personal interpretations of the somewhat general principles.
>>> 
>>> > ... can we narrow down on the migration from Apache Spark 2.4.5 to
>>> Apache Spark 3.0+? ...
>>> 
>>> Assuming these concerns will be addressed, +1 (binding).
>>> 
>>>  
>>> On Mon, Mar 9, 2020 at 4:53 PM, Takeshi Yamamuro <linguin@gmail.com> wrote:
>>> 
>>> 
 +1 (non-binding)
 
 
 Bests,
 Takeshi
 
 On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang <gengliang.w...@databricks.com> wrote:
 
 
> +1 (non-binding)
> 
> 
> Gengliang
> 
> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia <matei.zaha...@gmail.com> wrote:
> 
> 
>> +1 as well.
>> 
>> 
>> Matei
>> 
>> 
>>> On Mar 9, 2020, at 12:05 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>> 
>>> +1 (binding), assuming that this is for public stable APIs, not 
>>> APIs that
>>> are marked as unstable, evolving, etc.
>>> 
>>> 
>>> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía <ieme...@gmail.com> wrote:
>>> 
>>> 
 +1 (non-binding)
 
 Michael's section on the trade-offs of maintaining / removing an
 API is
 one of
 the best reads I have seen on this mailing list. Enthusiastic +1
 
 On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
 >
 > This new policy has a good intention, but can we narrow down on 
 > the
 migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
 >
 > I saw that there already exists a reverting PR to bring back 
 > Spark 1.4
 and 1.5 APIs based on this AS-IS suggestion.
 >
 > The AS-IS policy clearly mentions the JVM/Scala-level
 > difficulty,
 and that's nice.
 >
 > However, for the other cases, it sounds like `recommending older 
 > APIs as
 much as possible` due to the following.
 >
 >      

Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-09 Thread John Zhuge
+1 (non-binding)

On Mon, Mar 9, 2020 at 1:32 PM Michael Heuer  wrote:

> +1 (non-binding)
>
> I am disappointed however that this only mentions API and not dependencies
> and transitive dependencies.
>
> As Spark does not provide separation between its runtime classpath and the
> classpath used by applications, I believe Spark's dependencies and
> transitive dependencies should be considered part of the API for this
> policy.  Breaking dependency upgrades and incompatible dependency versions
> are the source of much frustration.
>
>michael
>
>
> On Mar 9, 2020, at 2:16 PM, Takuya UESHIN  wrote:
>
> +1 (binding)
>
>
> On Mon, Mar 9, 2020 at 11:49 AM Xingbo Jiang 
> wrote:
>
>> +1 (non-binding)
>>
>> Cheers,
>>
>> Xingbo
>>
>> On Mon, Mar 9, 2020 at 9:35 AM Xiao Li  wrote:
>>
>>> +1 (binding)
>>>
>>> Xiao
>>>
>>> On Mon, Mar 9, 2020 at 8:33 AM Denny Lee  wrote:
>>>
 +1 (non-binding)

 On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon 
 wrote:

> The proposal itself seems good in terms of the factors to consider.
> Thanks, Michael.
>
> Several concerns mentioned look like good points, in particular:
>
> > ... assuming that this is for public stable APIs, not APIs that are
> marked as unstable, evolving, etc. ...
> I would like to confirm this. We already have API annotations such as
> Experimental, Unstable, etc. and the implication of each is still
> effective. If it's for stable APIs, it makes sense to me as well.
>
> > ... can we expand on 'when' an API change can occur ?  Since we are
> proposing to diverge from semver. ...
> I think this is a good point. If we're proposing to diverge
> from semver, the delta compared to semver will have to be clarified to
> avoid different personal interpretations of the somewhat general 
> principles.
>
> > ... can we narrow down on the migration from Apache Spark 2.4.5 to
> Apache Spark 3.0+? ...
>
> Assuming these concerns will be addressed, +1 (binding).
>
>
> On Mon, Mar 9, 2020 at 4:53 PM, Takeshi Yamamuro wrote:
>
>> +1 (non-binding)
>>
>> Bests,
>> Takeshi
>>
>> On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang <
>> gengliang.w...@databricks.com> wrote:
>>
>>> +1 (non-binding)
>>>
>>> Gengliang
>>>
>>> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia <
>>> matei.zaha...@gmail.com> wrote:
>>>
 +1 as well.

 Matei

 On Mar 9, 2020, at 12:05 AM, Wenchen Fan 
 wrote:

 +1 (binding), assuming that this is for public stable APIs, not
 APIs that are marked as unstable, evolving, etc.

 On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía 
 wrote:

> +1 (non-binding)
>
> Michael's section on the trade-offs of maintaining / removing an
> API is one of
> the best reads I have seen on this mailing list. Enthusiastic +1
>
> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun <
> dongjoon.h...@gmail.com> wrote:
> >
> > This new policy has a good intention, but can we narrow down on
> the migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
> >
> > I saw that there already exists a reverting PR to bring back
> Spark 1.4 and 1.5 APIs based on this AS-IS suggestion.
> >
> > The AS-IS policy clearly mentions the JVM/Scala-level
> difficulty, and that's nice.
> >
> > However, for the other cases, it sounds like `recommending older
> APIs as much as possible` due to the following.
> >
> >  > How long has the API been in Spark?
> >
> > We had better be more careful when we add a new policy and
> should aim not to mislead users and 3rd-party library developers
> into saying "older is better".
> >
> > Technically, I'm wondering who will use new APIs in their
> examples (in books and on StackOverflow) if they always need to write an
> additional warning like `this only works at 2.4.0+`.
> >
> > Bests,
> > Dongjoon.
> >
> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan <
> mri...@gmail.com> wrote:
> >>
> >> I am in broad agreement with the proposal; as any developer, I prefer
> >> stable, well-designed APIs :-)
> >>
> >> Can we tie the proposal to stability guarantees given by Spark and
> >> reasonable expectations from users?
> >> In my opinion, an unstable or evolving API could change - while an
> >> experimental API which has been around for ages should be more
> >> conservatively handled.
> >> Which brings into question how the stability guarantees
> >> specified by annotations interact with the proposal.

Re: Auto-linking from PRs to Jira tickets

2020-03-09 Thread Nicholas Chammas
This is a feature of GitHub itself and would auto-link directly from the PR
back to Jira.

I haven't looked at the PR dashboard in a while, but I believe you're
referencing a feature of the dashboard  that
people won't get unless they look at the dashboard itself.

What GitHub is offering is an ability to auto-link any mention of a Jira
ticket anywhere in a PR discussion (and hopefully also in the PR title,
though I'm not sure) directly back to Jira.

I suppose if you're in the habit of using the dashboard regularly it won't
make a big difference. I typically land on a PR via a notification in
GitHub or via email. If I want to look up the referenced Jira ticket, I have
to copy it from the PR title and navigate to issues.apache.org and paste it
in.

On Mon, Mar 9, 2020 at 4:46 PM Holden Karau  wrote:

> I think we used to do this with the same bot that runs the PR dashboard,
> is it no longer working?
>
> On Mon, Mar 9, 2020 at 12:28 PM Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> https://github.blog/2019-10-14-introducing-autolink-references/
>>
>> GitHub has a feature for auto-linking from PRs to external tickets. It's
>> only available for their paid plans, but perhaps Apache has some
>> arrangement with them where we can get that feature.
>>
>> Since we include Jira ticket numbers in every PR title, it would be great
>> if each PR auto-linked back to the relevant Jira tickets. (We already have
>> auto-linking from Jira to PRs.)
>>
>> Has someone looked into this already, or should I file a ticket with
>> INFRA and see what they say?
>>
>> Nick
>>
>> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


Re: Auto-linking from PRs to Jira tickets

2020-03-09 Thread Holden Karau
I think we used to do this with the same bot that runs the PR dashboard, is
it no longer working?

On Mon, Mar 9, 2020 at 12:28 PM Nicholas Chammas 
wrote:

> https://github.blog/2019-10-14-introducing-autolink-references/
>
> GitHub has a feature for auto-linking from PRs to external tickets. It's
> only available for their paid plans, but perhaps Apache has some
> arrangement with them where we can get that feature.
>
> Since we include Jira ticket numbers in every PR title, it would be great
> if each PR auto-linked back to the relevant Jira tickets. (We already have
> auto-linking from Jira to PRs.)
>
> Has someone looked into this already, or should I file a ticket with INFRA
> and see what they say?
>
> Nick
>
> --
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-09 Thread Michael Heuer
+1 (non-binding)

I am disappointed however that this only mentions API and not dependencies and 
transitive dependencies.

As Spark does not provide separation between its runtime classpath and the 
classpath used by applications, I believe Spark's dependencies and transitive 
dependencies should be considered part of the API for this policy.  Breaking 
dependency upgrades and incompatible dependency versions are the source of much 
frustration.
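
To make that concern concrete: when Spark and an application both pull the
same library at different versions onto one classpath, only one version wins.
Here is a toy Python sketch of the kind of audit this implies - diffing
dependency manifests across two releases. The manifest format and version
numbers below are made up (Spark's dev/deps/ files are one real source of
such lists):

    import re

    def parse_manifest(lines):
        """Map artifact name -> version for 'artifact-version.jar' entries."""
        deps = {}
        for line in lines:
            m = re.match(r"(.+)-(\d[\w.\-]*)\.jar$", line.strip())
            if m:
                deps[m.group(1)] = m.group(2)
        return deps

    # Hypothetical jar lists for two Spark versions.
    old = parse_manifest(["jackson-databind-2.6.7.jar", "guava-14.0.1.jar"])
    new = parse_manifest(["jackson-databind-2.10.0.jar", "guava-14.0.1.jar"])

    for artifact in sorted(old.keys() & new.keys()):
        if old[artifact] != new[artifact]:
            print(f"{artifact}: {old[artifact]} -> {new[artifact]} "
                  "(potentially breaking for applications)")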

   michael


> On Mar 9, 2020, at 2:16 PM, Takuya UESHIN  wrote:
> 
> +1 (binding)
> 
> 
> On Mon, Mar 9, 2020 at 11:49 AM Xingbo Jiang wrote:
> +1 (non-binding)
> 
> Cheers,
> 
> Xingbo
> 
> On Mon, Mar 9, 2020 at 9:35 AM Xiao Li wrote:
> +1 (binding)
> 
> Xiao
> 
> On Mon, Mar 9, 2020 at 8:33 AM Denny Lee wrote:
> +1 (non-binding)
> 
> On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon wrote:
> The proposal itself seems good in terms of the factors to consider. Thanks, Michael.
> 
> Several concerns mentioned look like good points, in particular:
> 
> > ... assuming that this is for public stable APIs, not APIs that are marked 
> > as unstable, evolving, etc. ...
> I would like to confirm this. We already have API annotations such as 
> Experimental, Unstable, etc. and the implication of each is still effective. 
> If it's for stable APIs, it makes sense to me as well.
> 
> > ... can we expand on 'when' an API change can occur ?  Since we are 
> > proposing to diverge from semver. ...
> I think this is a good point. If we're proposing to diverge from semver, the 
> delta compared to semver will have to be clarified to avoid different 
> personal interpretations of the somewhat general principles.
> 
> > ... can we narrow down on the migration from Apache Spark 2.4.5 to Apache 
> > Spark 3.0+? ...
> 
> Assuming these concerns will be addressed, +1 (binding).
> 
>  
> On Mon, Mar 9, 2020 at 4:53 PM, Takeshi Yamamuro wrote:
> +1 (non-binding)
> 
> Bests,
> Takeshi
> 
> On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang wrote:
> +1 (non-binding)
> 
> Gengliang
> 
> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia wrote:
> +1 as well.
> 
> Matei
> 
>> On Mar 9, 2020, at 12:05 AM, Wenchen Fan wrote:
>> 
>> +1 (binding), assuming that this is for public stable APIs, not APIs that 
>> are marked as unstable, evolving, etc.
>> 
>> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía wrote:
>> +1 (non-binding)
>> 
>> Michael's section on the trade-offs of maintaining / removing an API is one
>> of
>> the best reads I have seen on this mailing list. Enthusiastic +1
>> 
>> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun wrote:
>> >
>> > This new policy has a good intention, but can we narrow down on the
>> > migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
>> >
>> > I saw that there already exists a reverting PR to bring back Spark 1.4 and 
>> > 1.5 APIs based on this AS-IS suggestion.
>> >
>> > The AS-IS policy clearly mentions the JVM/Scala-level difficulty,
>> > and that's nice.
>> >
>> > However, for the other cases, it sounds like `recommending older APIs as 
>> > much as possible` due to the following.
>> >
>> >  > How long has the API been in Spark?
>> >
>> > We had better be more careful when we add a new policy and should aim not
>> > to mislead users and 3rd-party library developers into saying "older is
>> > better".
>> >
>> > Technically, I'm wondering who will use new APIs in their examples (in
>> > books and on StackOverflow) if they always need to write an additional
>> > warning like `this only works at 2.4.0+`.
>> >
>> > Bests,
>> > Dongjoon.
>> >
>> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan wrote:
>> >>
>> >> I am in broad agreement with the proposal; as any developer, I prefer
>> >> stable, well-designed APIs :-)
>> >>
>> >> Can we tie the proposal to stability guarantees given by Spark and
>> >> reasonable expectations from users?
>> >> In my opinion, an unstable or evolving API could change - while an
>> >> experimental API which has been around for ages should be more
>> >> conservatively handled.
>> >> Which brings into question how the stability guarantees
>> >> specified by annotations interact with the proposal.
>> >>
>> >> Also, can we expand on 'when' an API change can occur ?  Since we are
>> >> proposing to diverge from semver.
>> >> Patch release ? Minor release ? Only major release ? Based on 'impact'
>> >> of API ? Stability guarantees ?
>> >>
>> >> Regards,
>> >> Mridul
>> >>
>> >>
>> >>
>> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust wrote:
>> >> >
>> >> > I'll start off the vote with a strong +1 (binding).
>> >> >
>> >> > On Fri, Mar 6, 2020 at 

Auto-linking from PRs to Jira tickets

2020-03-09 Thread Nicholas Chammas
https://github.blog/2019-10-14-introducing-autolink-references/

GitHub has a feature for auto-linking from PRs to external tickets. It's
only available for their paid plans, but perhaps Apache has some
arrangement with them where we can get that feature.

Since we include Jira ticket numbers in every PR title, it would be great
if each PR auto-linked back to the relevant Jira tickets. (We already have
auto-linking from Jira to PRs.)

Has someone looked into this already, or should I file a ticket with INFRA
and see what they say?

Nick


Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-09 Thread Takuya UESHIN
+1 (binding)


On Mon, Mar 9, 2020 at 11:49 AM Xingbo Jiang  wrote:

> +1 (non-binding)
>
> Cheers,
>
> Xingbo
>
> On Mon, Mar 9, 2020 at 9:35 AM Xiao Li  wrote:
>
>> +1 (binding)
>>
>> Xiao
>>
>> On Mon, Mar 9, 2020 at 8:33 AM Denny Lee  wrote:
>>
>>> +1 (non-binding)
>>>
>>> On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon  wrote:
>>>
 The proposal itself seems good in terms of the factors to consider.
 Thanks, Michael.

 Several concerns mentioned look like good points, in particular:

 > ... assuming that this is for public stable APIs, not APIs that are
 marked as unstable, evolving, etc. ...
 I would like to confirm this. We already have API annotations such as
 Experimental, Unstable, etc. and the implication of each is still
 effective. If it's for stable APIs, it makes sense to me as well.

 > ... can we expand on 'when' an API change can occur ?  Since we are
 proposing to diverge from semver. ...
 I think this is a good point. If we're proposing to diverge from semver,
 the delta compared to semver will have to be clarified to avoid different
 personal interpretations of the somewhat general principles.

 > ... can we narrow down on the migration from Apache Spark 2.4.5 to
 Apache Spark 3.0+? ...

 Assuming these concerns will be addressed, +1 (binding).


 On Mon, Mar 9, 2020 at 4:53 PM, Takeshi Yamamuro wrote:

> +1 (non-binding)
>
> Bests,
> Takeshi
>
> On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang <
> gengliang.w...@databricks.com> wrote:
>
>> +1 (non-binding)
>>
>> Gengliang
>>
>> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia <
>> matei.zaha...@gmail.com> wrote:
>>
>>> +1 as well.
>>>
>>> Matei
>>>
>>> On Mar 9, 2020, at 12:05 AM, Wenchen Fan 
>>> wrote:
>>>
>>> +1 (binding), assuming that this is for public stable APIs, not APIs
>>> that are marked as unstable, evolving, etc.
>>>
>>> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía 
>>> wrote:
>>>
 +1 (non-binding)

 Michael's section on the trade-offs of maintaining / removing an
 API is one of
 the best reads I have seen on this mailing list. Enthusiastic +1

 On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun <
 dongjoon.h...@gmail.com> wrote:
 >
 > This new policy has a good intention, but can we narrow down on
 the migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
 >
 > I saw that there already exists a reverting PR to bring back
 Spark 1.4 and 1.5 APIs based on this AS-IS suggestion.
 >
 > The AS-IS policy clearly mentions the JVM/Scala-level
 difficulty, and that's nice.
 >
 > However, for the other cases, it sounds like `recommending older
 APIs as much as possible` due to the following.
 >
 >  > How long has the API been in Spark?
 >
 > We had better be more careful when we add a new policy and should
 aim not to mislead users and 3rd-party library developers into saying
 "older is better".
 >
 > Technically, I'm wondering who will use new APIs in their
 examples (in books and on StackOverflow) if they always need to write an
 additional warning like `this only works at 2.4.0+`.
 >
 > Bests,
 > Dongjoon.
 >
 > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan <
 mri...@gmail.com> wrote:
 >>
 >> I am in broad agreement with the proposal; as any developer, I prefer
 >> stable, well-designed APIs :-)
 >>
 >> Can we tie the proposal to stability guarantees given by Spark and
 >> reasonable expectations from users?
 >> In my opinion, an unstable or evolving API could change - while an
 >> experimental API which has been around for ages should be more
 >> conservatively handled.
 >> Which brings into question how the stability guarantees
 >> specified by annotations interact with the proposal.
 >>
 >> Also, can we expand on 'when' an API change can occur ?  Since
 we are
 >> proposing to diverge from semver.
 >> Patch release ? Minor release ? Only major release ? Based on
 'impact'
 >> of API ? Stability guarantees ?
 >>
 >> Regards,
 >> Mridul
 >>
 >>
 >>
 >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust <
 mich...@databricks.com> wrote:
 >> >
 >> > I'll start off the vote with a strong +1 (binding).
 >> >
 >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <
 mich...@databricks.com> wrote:
 >> >>
 >> >> I propose to add the following text to Spark's Semantic

Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-09 Thread Xingbo Jiang
+1 (non-binding)

Cheers,

Xingbo

On Mon, Mar 9, 2020 at 9:35 AM Xiao Li  wrote:

> +1 (binding)
>
> Xiao
>
> On Mon, Mar 9, 2020 at 8:33 AM Denny Lee  wrote:
>
>> +1 (non-binding)
>>
>> On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon  wrote:
>>
>>> The proposal itself seems good in terms of the factors to consider.
>>> Thanks, Michael.
>>>
>>> Several concerns mentioned look like good points, in particular:
>>>
>>> > ... assuming that this is for public stable APIs, not APIs that are
>>> marked as unstable, evolving, etc. ...
>>> I would like to confirm this. We already have API annotations such as
>>> Experimental, Unstable, etc. and the implication of each is still
>>> effective. If it's for stable APIs, it makes sense to me as well.
>>>
>>> > ... can we expand on 'when' an API change can occur ?  Since we are
>>> proposing to diverge from semver. ...
>>> I think this is a good point. If we're proposing to diverge from semver,
>>> the delta compared to semver will have to be clarified to avoid different
>>> personal interpretations of the somewhat general principles.
>>>
>>> > ... can we narrow down on the migration from Apache Spark 2.4.5 to
>>> Apache Spark 3.0+? ...
>>>
>>> Assuming these concerns will be addressed, +1 (binding).
>>>
>>>
>>> On Mon, Mar 9, 2020 at 4:53 PM, Takeshi Yamamuro wrote:
>>>
 +1 (non-binding)

 Bests,
 Takeshi

 On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang <
 gengliang.w...@databricks.com> wrote:

> +1 (non-binding)
>
> Gengliang
>
> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia 
> wrote:
>
>> +1 as well.
>>
>> Matei
>>
>> On Mar 9, 2020, at 12:05 AM, Wenchen Fan  wrote:
>>
>> +1 (binding), assuming that this is for public stable APIs, not APIs
>> that are marked as unstable, evolving, etc.
>>
>> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía 
>> wrote:
>>
>>> +1 (non-binding)
>>>
>>> Michael's section on the trade-offs of maintaining / removing an API
>>> is one of
>>> the best reads I have seen on this mailing list. Enthusiastic +1
>>>
>>> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun <
>>> dongjoon.h...@gmail.com> wrote:
>>> >
>>> > This new policy has a good intention, but can we narrow down on
>>> the migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
>>> >
>>> > I saw that there already exists a reverting PR to bring back Spark
>>> 1.4 and 1.5 APIs based on this AS-IS suggestion.
>>> >
>>> > The AS-IS policy clearly mentions the JVM/Scala-level
>>> difficulty, and that's nice.
>>> >
>>> > However, for the other cases, it sounds like `recommending older
>>> APIs as much as possible` due to the following.
>>> >
>>> >  > How long has the API been in Spark?
>>> >
>>> > We had better be more careful when we add a new policy and should
>>> aim not to mislead users and 3rd-party library developers into saying
>>> "older is better".
>>> >
>>> > Technically, I'm wondering who will use new APIs in their examples
>>> (in books and on StackOverflow) if they always need to write an additional
>>> warning like `this only works at 2.4.0+`.
>>> >
>>> > Bests,
>>> > Dongjoon.
>>> >
>>> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan <
>>> mri...@gmail.com> wrote:
>>> >>
>>> >> I am in broad agreement with the proposal; as any developer, I prefer
>>> >> stable, well-designed APIs :-)
>>> >>
>>> >> Can we tie the proposal to stability guarantees given by Spark and
>>> >> reasonable expectations from users?
>>> >> In my opinion, an unstable or evolving API could change - while an
>>> >> experimental API which has been around for ages should be more
>>> >> conservatively handled.
>>> >> Which brings into question how the stability guarantees
>>> >> specified by annotations interact with the proposal.
>>> >>
>>> >> Also, can we expand on 'when' an API change can occur ?  Since we
>>> are
>>> >> proposing to diverge from semver.
>>> >> Patch release ? Minor release ? Only major release ? Based on
>>> 'impact'
>>> >> of API ? Stability guarantees ?
>>> >>
>>> >> Regards,
>>> >> Mridul
>>> >>
>>> >>
>>> >>
>>> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust <
>>> mich...@databricks.com> wrote:
>>> >> >
>>> >> > I'll start off the vote with a strong +1 (binding).
>>> >> >
>>> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <
>>> mich...@databricks.com> wrote:
>>> >> >>
>>> >> >> I propose to add the following text to Spark's Semantic
>>> Versioning policy and adopt it as the rubric that should be used when
>>> deciding to break APIs (even at major versions such as 3.0).
>>> >> >>
>>> >> >>
>>> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm. As
>>> this is a 

Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-09 Thread Xiao Li
+1 (binding)

Xiao

On Mon, Mar 9, 2020 at 8:33 AM Denny Lee  wrote:

> +1 (non-binding)
>
> On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon  wrote:
>
>> The proposal itself seems good in terms of the factors to consider. Thanks, Michael.
>>
>> Several concerns mentioned look like good points, in particular:
>>
>> > ... assuming that this is for public stable APIs, not APIs that are
>> marked as unstable, evolving, etc. ...
>> I would like to confirm this. We already have API annotations such as
>> Experimental, Unstable, etc. and the implication of each is still
>> effective. If it's for stable APIs, it makes sense to me as well.
>>
>> > ... can we expand on 'when' an API change can occur ?  Since we are
>> proposing to diverge from semver. ...
>> I think this is a good point. If we're proposing to diverge from semver,
>> the delta compared to semver will have to be clarified to avoid different
>> personal interpretations of the somewhat general principles.
>>
>> > ... can we narrow down on the migration from Apache Spark 2.4.5 to
>> Apache Spark 3.0+? ...
>>
>> Assuming these concerns will be addressed, +1 (binding).
>>
>>
>> On Mon, Mar 9, 2020 at 4:53 PM, Takeshi Yamamuro wrote:
>>
>>> +1 (non-binding)
>>>
>>> Bests,
>>> Takeshi
>>>
>>> On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang <
>>> gengliang.w...@databricks.com> wrote:
>>>
 +1 (non-binding)

 Gengliang

 On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia 
 wrote:

> +1 as well.
>
> Matei
>
> On Mar 9, 2020, at 12:05 AM, Wenchen Fan  wrote:
>
> +1 (binding), assuming that this is for public stable APIs, not APIs
> that are marked as unstable, evolving, etc.
>
> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía  wrote:
>
>> +1 (non-binding)
>>
>> Michael's section on the trade-offs of maintaining / removing an API
>> is one of
>> the best reads I have seen on this mailing list. Enthusiastic +1
>>
>> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun 
>> wrote:
>> >
>> > This new policy has a good intention, but can we narrow down on the
>> migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
>> >
>> > I saw that there already exists a reverting PR to bring back Spark
>> 1.4 and 1.5 APIs based on this AS-IS suggestion.
>> >
>> > The AS-IS policy clearly mentions the JVM/Scala-level
>> difficulty, and that's nice.
>> >
>> > However, for the other cases, it sounds like `recommending older
>> APIs as much as possible` due to the following.
>> >
>> >  > How long has the API been in Spark?
>> >
>> > We had better be more careful when we add a new policy and should
>> aim not to mislead users and 3rd-party library developers into saying
>> "older is better".
>> >
>> > Technically, I'm wondering who will use new APIs in their examples
>> (in books and on StackOverflow) if they always need to write an additional
>> warning like `this only works at 2.4.0+`.
>> >
>> > Bests,
>> > Dongjoon.
>> >
>> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan <
>> mri...@gmail.com> wrote:
>> >>
>> >> I am in broad agreement with the proposal; as any developer, I prefer
>> >> stable, well-designed APIs :-)
>> >>
>> >> Can we tie the proposal to stability guarantees given by Spark and
>> >> reasonable expectations from users?
>> >> In my opinion, an unstable or evolving API could change - while an
>> >> experimental API which has been around for ages should be more
>> >> conservatively handled.
>> >> Which brings into question how the stability guarantees
>> >> specified by annotations interact with the proposal.
>> >>
>> >> Also, can we expand on 'when' an API change can occur ?  Since we
>> are
>> >> proposing to diverge from semver.
>> >> Patch release ? Minor release ? Only major release ? Based on
>> 'impact'
>> >> of API ? Stability guarantees ?
>> >>
>> >> Regards,
>> >> Mridul
>> >>
>> >>
>> >>
>> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust <
>> mich...@databricks.com> wrote:
>> >> >
>> >> > I'll start off the vote with a strong +1 (binding).
>> >> >
>> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <
>> mich...@databricks.com> wrote:
>> >> >>
>> >> >> I propose to add the following text to Spark's Semantic
>> Versioning policy and adopt it as the rubric that should be used when
>> deciding to break APIs (even at major versions such as 3.0).
>> >> >>
>> >> >>
>> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm. As
>> this is a procedural vote, the measure will pass if there are more
>> favourable votes than unfavourable ones. PMC votes are binding, but the
>> community is encouraged to add their voice to the discussion.
>> >> >>
>> >> >>
>> >> >> [ ] +1 - Spark 

Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-09 Thread Denny Lee
+1 (non-binding)

On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon  wrote:

> The proposal itself seems good in terms of the factors to consider. Thanks, Michael.
>
> Several concerns mentioned look like good points, in particular:
>
> > ... assuming that this is for public stable APIs, not APIs that are
> marked as unstable, evolving, etc. ...
> I would like to confirm this. We already have API annotations such as
> Experimental, Unstable, etc. and the implication of each is still
> effective. If it's for stable APIs, it makes sense to me as well.
>
> > ... can we expand on 'when' an API change can occur ?  Since we are
> proposing to diverge from semver. ...
> I think this is a good point. If we're proposing to diverge from semver,
> the delta compared to semver will have to be clarified to avoid different
> personal interpretations of the somewhat general principles.
>
> > ... can we narrow down on the migration from Apache Spark 2.4.5 to
> Apache Spark 3.0+? ...
>
> Assuming these concerns will be addressed, +1 (binding).
>
>
> On Mon, Mar 9, 2020 at 4:53 PM, Takeshi Yamamuro wrote:
>
>> +1 (non-binding)
>>
>> Bests,
>> Takeshi
>>
>> On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang <
>> gengliang.w...@databricks.com> wrote:
>>
>>> +1 (non-binding)
>>>
>>> Gengliang
>>>
>>> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia 
>>> wrote:
>>>
 +1 as well.

 Matei

 On Mar 9, 2020, at 12:05 AM, Wenchen Fan  wrote:

 +1 (binding), assuming that this is for public stable APIs, not APIs
 that are marked as unstable, evolving, etc.

 On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía  wrote:

> +1 (non-binding)
>
> Michael's section on the trade-offs of maintaining / removing an API
> is one of
> the best reads I have seen on this mailing list. Enthusiastic +1
>
> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun 
> wrote:
> >
> > This new policy has a good intention, but can we narrow down on the
> migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
> >
> > I saw that there already exists a reverting PR to bring back Spark
> 1.4 and 1.5 APIs based on this AS-IS suggestion.
> >
> > The AS-IS policy clearly mentions the JVM/Scala-level
> difficulty, and that's nice.
> >
> > However, for the other cases, it sounds like `recommending older
> APIs as much as possible` due to the following.
> >
> >  > How long has the API been in Spark?
> >
> > We had better be more careful when we add a new policy and should
> aim not to mislead users and 3rd-party library developers into saying
> "older is better".
> >
> > Technically, I'm wondering who will use new APIs in their examples
> (in books and on StackOverflow) if they always need to write an additional
> warning like `this only works at 2.4.0+`.
> >
> > Bests,
> > Dongjoon.
> >
> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan 
> wrote:
> >>
> >> I am in broad agreement with the proposal; as any developer, I prefer
> >> stable, well-designed APIs :-)
> >>
> >> Can we tie the proposal to stability guarantees given by Spark and
> >> reasonable expectations from users?
> >> In my opinion, an unstable or evolving API could change - while an
> >> experimental API which has been around for ages should be more
> >> conservatively handled.
> >> Which brings into question how the stability guarantees
> >> specified by annotations interact with the proposal.
> >>
> >> Also, can we expand on 'when' an API change can occur ?  Since we
> are
> >> proposing to diverge from semver.
> >> Patch release ? Minor release ? Only major release ? Based on
> 'impact'
> >> of API ? Stability guarantees ?
> >>
> >> Regards,
> >> Mridul
> >>
> >>
> >>
> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust <
> mich...@databricks.com> wrote:
> >> >
> >> > I'll start off the vote with a strong +1 (binding).
> >> >
> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <
> mich...@databricks.com> wrote:
> >> >>
> >> >> I propose to add the following text to Spark's Semantic
> Versioning policy and adopt it as the rubric that should be used when
> deciding to break APIs (even at major versions such as 3.0).
> >> >>
> >> >>
> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm. As
> this is a procedural vote, the measure will pass if there are more
> favourable votes than unfavourable ones. PMC votes are binding, but the
> community is encouraged to add their voice to the discussion.
> >> >>
> >> >>
> >> >> [ ] +1 - Spark should adopt this policy.
> >> >>
> >> >> [ ] -1  - Spark should not adopt this policy.
> >> >>
> >> >>
> >> >> 
> >> >>
> >> >>
> >> >> Considerations When Breaking APIs
> >> >>
> >> >> The Spark 

Re: Keytab, Proxy User & Principal

2020-03-09 Thread Lars Francke
I just wanted to bump this to see if anyone has any opinions on this?

On Fri, Feb 28, 2020 at 3:20 PM Lars Francke  wrote:

> Hi,
>
> I understand that we forbid specifying "principal" & "proxy user" at the
> same time because the current logic would just stage the keytab and the
> proxy user could then use that to gain full access, circumventing any
> security.
>
> But we have a use-case for Livy where a different semantic would be great:
> Livy is supposed to submit a job for other users. It does so by specifying
> "proxy user" and it relies on the local credential cache (outside of Java)
> to contain the proper tickets (it runs kinit in a background thread).
>
> This will only work if Livy runs in an environment where it's the only
> user working with that credentials cache. Unfortunately that's not always
> the case when multiple services share the same user.
>
> (One thing we'll try is to use the KRB5CCNAME environment variable to
> point to a different credential cache for Livy, but I'm not sure yet if
> that's being passed on to the spawned Spark process)
>
> Could we allow specifying a keytab and principal together with a proxy
> user, where they are used only for the initial login to submit the job and
> are not shipped to the cluster? This way jobs wouldn't need to rely on the
> operating system.
>
> Maybe I'm missing something as well?
>
> Cheers,
> Lars
>
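
On the KRB5CCNAME idea: environment variables set on the launcher process are
inherited by the spark-submit child it spawns, so the approach should be
testable along these lines. A hedged sketch in Python with made-up paths and
principals; it only exercises the OS-level ticket cache, not the
keytab-plus-proxy-user change proposed above:

    import os
    import subprocess

    env = dict(os.environ)
    # Point Kerberos at a credential cache dedicated to Livy, so it cannot
    # collide with other services sharing the same OS user.
    env["KRB5CCNAME"] = "FILE:/var/run/livy/krb5cc_livy"

    # kinit into that cache from Livy's keytab (background renewal omitted).
    subprocess.run(
        ["kinit", "-kt", "/etc/security/keytabs/livy.keytab",
         "livy/host@EXAMPLE.COM"],
        env=env, check=True,
    )

    # spark-submit inherits KRB5CCNAME and should pick up the Livy-specific
    # cache when submitting on behalf of the end user.
    subprocess.run(
        ["spark-submit", "--master", "yarn", "--proxy-user", "alice",
         "job.py"],
        env=env, check=True,
    )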


Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-09 Thread Hyukjin Kwon
The proposal itself seems good in terms of the factors to consider. Thanks, Michael.

Several concerns mentioned look like good points, in particular:

> ... assuming that this is for public stable APIs, not APIs that are
marked as unstable, evolving, etc. ...
I would like to confirm this. We already have API annotations such as
Experimental, Unstable, etc. and the implication of each is still
effective. If it's for stable APIs, it makes sense to me as well.

> ... can we expand on 'when' an API change can occur ?  Since we are
proposing to diverge from semver. ...
I think this is a good point. If we're proposing to diverge from semver, the
delta compared to semver will have to be clarified to avoid different
personal interpretations of the somewhat general principles.

> ... can we narrow down on the migration from Apache Spark 2.4.5 to Apache
Spark 3.0+? ...

Assuming these concerns will be addressed, +1 (binding).


On Mon, Mar 9, 2020 at 4:53 PM, Takeshi Yamamuro wrote:

> +1 (non-binding)
>
> Bests,
> Takeshi
>
> On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang <
> gengliang.w...@databricks.com> wrote:
>
>> +1 (non-binding)
>>
>> Gengliang
>>
>> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia 
>> wrote:
>>
>>> +1 as well.
>>>
>>> Matei
>>>
>>> On Mar 9, 2020, at 12:05 AM, Wenchen Fan  wrote:
>>>
>>> +1 (binding), assuming that this is for public stable APIs, not APIs
>>> that are marked as unstable, evolving, etc.
>>>
>>> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía  wrote:
>>>
 +1 (non-binding)

 Michael's section on the trade-offs of maintaining / removing an API
 is one of
 the best reads I have seen on this mailing list. Enthusiastic +1

 On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun 
 wrote:
 >
 > This new policy has a good intention, but can we narrow down on the
 migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
 >
 > I saw that there already exists a reverting PR to bring back Spark
 1.4 and 1.5 APIs based on this AS-IS suggestion.
 >
 > The AS-IS policy clearly mentions the JVM/Scala-level
 difficulty, and that's nice.
 >
 > However, for the other cases, it sounds like `recommending older APIs
 as much as possible` due to the following.
 >
 >  > How long has the API been in Spark?
 >
 > We had better be more careful when we add a new policy and should aim
 not to mislead users and 3rd-party library developers into saying "older is
 better".
 >
 > Technically, I'm wondering who will use new APIs in their examples
 (in books and on StackOverflow) if they always need to write an additional
 warning like `this only works at 2.4.0+`.
 >
 > Bests,
 > Dongjoon.
 >
 > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan 
 wrote:
 >>
 >> I am in broad agreement with the proposal; as any developer, I prefer
 >> stable, well-designed APIs :-)
 >>
 >> Can we tie the proposal to stability guarantees given by Spark and
 >> reasonable expectations from users?
 >> In my opinion, an unstable or evolving API could change - while an
 >> experimental API which has been around for ages should be more
 >> conservatively handled.
 >> Which brings into question how the stability guarantees
 >> specified by annotations interact with the proposal.
 >>
 >> Also, can we expand on 'when' an API change can occur ?  Since we are
 >> proposing to diverge from semver.
 >> Patch release ? Minor release ? Only major release ? Based on
 'impact'
 >> of API ? Stability guarantees ?
 >>
 >> Regards,
 >> Mridul
 >>
 >>
 >>
 >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust <
 mich...@databricks.com> wrote:
 >> >
 >> > I'll start off the vote with a strong +1 (binding).
 >> >
 >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <
 mich...@databricks.com> wrote:
 >> >>
 >> >> I propose to add the following text to Spark's Semantic
 Versioning policy and adopt it as the rubric that should be used when
 deciding to break APIs (even at major versions such as 3.0).
 >> >>
 >> >>
 >> >> I'll leave the vote open until Tuesday, March 10th at 2pm. As
 this is a procedural vote, the measure will pass if there are more
 favourable votes than unfavourable ones. PMC votes are binding, but the
 community is encouraged to add their voice to the discussion.
 >> >>
 >> >>
 >> >> [ ] +1 - Spark should adopt this policy.
 >> >>
 >> >> [ ] -1  - Spark should not adopt this policy.
 >> >>
 >> >>
 >> >> 
 >> >>
 >> >>
 >> >> Considerations When Breaking APIs
 >> >>
 >> >> The Spark project strives to avoid breaking APIs or silently
 changing behavior, even at major versions. While this is not always
 possible, the balance of the following factors should be considered before
 choosing to break an API.
 

Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-09 Thread Takeshi Yamamuro
+1 (non-binding)

Bests,
Takeshi

On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang 
wrote:

> +1 (non-binding)
>
> Gengliang
>
> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia 
> wrote:
>
>> +1 as well.
>>
>> Matei
>>
>> On Mar 9, 2020, at 12:05 AM, Wenchen Fan  wrote:
>>
>> +1 (binding), assuming that this is for public stable APIs, not APIs that
>> are marked as unstable, evolving, etc.
>>
>> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía  wrote:
>>
>>> +1 (non-binding)
>>>
>>> Michael's section on the trade-offs of maintaining / removing an API is
>>> one of
>>> the best reads I have seen in this mailing list. Enthusiastic +1
>>>
>>> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun 
>>> wrote:
>>> >
>>> > This new policy has a good intention, but can we narrow down on the
>>> migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
>>> >
>>> > I saw that there already exists a reverting PR to bring back Spark 1.4
>>> and 1.5 APIs based on this AS-IS suggestion.
>>> >
>>> > The AS-IS policy clearly mentions the JVM/Scala-level
>>> difficulty, which is nice.
>>> >
>>> > However, for the other cases, it sounds like `recommending older APIs
>>> as much as possible` due to the following.
>>> >
>>> >  > How long has the API been in Spark?
>>> >
>>> > We had better be more careful when we add a new policy and should aim
>>> not to mislead users and 3rd-party library developers into thinking
>>> "older is better".
>>> >
>>> > Technically, I'm wondering who will use new APIs in their examples (in
>>> books and on StackOverflow) if they always need to write an additional
>>> warning like `this only works on 2.4.0+`.
>>> >
>>> > Bests,
>>> > Dongjoon.
>>> >
>>> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan 
>>> wrote:
>>> >>
>>> >> I am in broad agreement with the proposal; as any developer, I prefer
>>> >> stable, well-designed APIs :-)
>>> >>
>>> >> Can we tie the proposal to the stability guarantees given by Spark and
>>> >> the reasonable expectations of users?
>>> >> In my opinion, an unstable or evolving API could change, while an
>>> >> experimental API which has been around for ages should be handled more
>>> >> conservatively.
>>> >> Which brings into question how the stability guarantees specified by
>>> >> annotations interact with the proposal.
>>> >>
>>> >> Also, can we expand on 'when' an API change can occur, since we are
>>> >> proposing to diverge from semver?
>>> >> Patch release? Minor release? Only major release? Based on the
>>> >> 'impact' of the API? Stability guarantees?
>>> >>
>>> >> Regards,
>>> >> Mridul
>>> >>
>>> >>
>>> >>
>>> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust <
>>> mich...@databricks.com> wrote:
>>> >> >
>>> >> > I'll start off the vote with a strong +1 (binding).
>>> >> >
>>> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <
>>> mich...@databricks.com> wrote:
>>> >> >>
>>> >> >> I propose to add the following text to Spark's Semantic Versioning
>>> policy and adopt it as the rubric that should be used when deciding to
>>> break APIs (even at major versions such as 3.0).
>>> >> >>
>>> >> >>
>>> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm. As this
>>> is a procedural vote, the measure will pass if there are more favourable
>>> votes than unfavourable ones. PMC votes are binding, but the community is
>>> encouraged to add their voice to the discussion.
>>> >> >>
>>> >> >>
>>> >> >> [ ] +1 - Spark should adopt this policy.
>>> >> >>
>>> >> >> [ ] -1  - Spark should not adopt this policy.
>>> >> >>
>>> >> >>
>>> >> >> 
>>> >> >>
>>> >> >>
>>> >> >> Considerations When Breaking APIs
>>> >> >>
>>> >> >> The Spark project strives to avoid breaking APIs or silently
>>> changing behavior, even at major versions. While this is not always
>>> possible, the balance of the following factors should be considered before
>>> choosing to break an API.
>>> >> >>
>>> >> >>
>>> >> >> Cost of Breaking an API
>>> >> >>
>>> >> >> Breaking an API almost always has a non-trivial cost to the users
>>> of Spark. A broken API means that Spark programs need to be rewritten
>>> before they can be upgraded. However, there are a few considerations when
>>> thinking about what the cost will be:
>>> >> >>
>>> >> >> Usage - an API that is actively used in many different places is
>>> always very costly to break. While it is hard to know usage for sure, there
>>> are a bunch of ways that we can estimate:
>>> >> >>
>>> >> >> How long has the API been in Spark?
>>> >> >>
>>> >> >> Is the API common even for basic programs?
>>> >> >>
>>> >> >> How often do we see recent questions in JIRA or mailing lists?
>>> >> >>
>>> >> >> How often does it appear in StackOverflow or blogs?
>>> >> >>
>>> >> >> Behavior after the break - How will a program that works today
>>> work after the break? The following are listed roughly in order of
>>> increasing severity:
>>> >> >>
>>> >> >> Will there be a compiler or linker error?
>>> >> >>
>>> >> >> Will there be a runtime exception?
>>> >> >>
>>> 
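
The usage-estimation questions quoted above lend themselves to rough
tooling. A minimal sketch that counts textual hits of an API name across a
corpus of user code; the corpus path and API name are hypothetical, and raw
string counts are only a crude proxy for real usage:

    import java.nio.file.{Files, Paths}
    import scala.collection.JavaConverters._

    object ApiUsageCount {
      // Count non-overlapping occurrences of `needle` in `haystack`.
      private def occurrences(haystack: String, needle: String): Int = {
        var i = haystack.indexOf(needle)
        var n = 0
        while (i >= 0) { n += 1; i = haystack.indexOf(needle, i + needle.length) }
        n
      }

      // Sum occurrences of an API name over every .scala file in a corpus.
      def count(corpusDir: String, apiName: String): Long = {
        val stream = Files.walk(Paths.get(corpusDir))
        try
          stream.iterator().asScala
            .filter(_.toString.endsWith(".scala"))
            .map(p => occurrences(new String(Files.readAllBytes(p)), apiName).toLong)
            .sum
        finally stream.close()
      }

      def main(args: Array[String]): Unit =
        println(count("corpus/", "unionAll")) // hypothetical corpus and API
    }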

Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-09 Thread Gengliang Wang
+1 (non-binding)

Gengliang

On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia 
wrote:

> +1 as well.
>
> Matei
>
> On Mar 9, 2020, at 12:05 AM, Wenchen Fan  wrote:
>
> +1 (binding), assuming that this is for public stable APIs, not APIs that
> are marked as unstable, evolving, etc.
>
> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía  wrote:
>
>> +1 (non-binding)
>>
>> Michael's section on the trade-offs of maintaining / removing an API is
>> one of
>> the best reads I have seen in this mailing list. Enthusiastic +1
>>
>> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun 
>> wrote:
>> >
>> > This new policy has a good intention, but can we narrow down on the
>> migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
>> >
>> > I saw that there already exists a reverting PR to bring back Spark 1.4
>> and 1.5 APIs based on this AS-IS suggestion.
>> >
>> > The AS-IS policy clearly mentions the JVM/Scala-level difficulty,
>> which is nice.
>> >
>> > However, for the other cases, it sounds like `recommending older APIs
>> as much as possible` due to the following.
>> >
>> >  > How long has the API been in Spark?
>> >
>> > We had better be more careful when we add a new policy and should aim
>> not to mislead users and 3rd-party library developers into thinking
>> "older is better".
>> >
>> > Technically, I'm wondering who will use new APIs in their examples (in
>> books and on StackOverflow) if they always need to write an additional
>> warning like `this only works on 2.4.0+`.
>> >
>> > Bests,
>> > Dongjoon.
>> >
>> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan 
>> wrote:
>> >>
>> >> I am in broad agreement with the proposal; as any developer, I prefer
>> >> stable, well-designed APIs :-)
>> >>
>> >> Can we tie the proposal to the stability guarantees given by Spark and
>> >> the reasonable expectations of users?
>> >> In my opinion, an unstable or evolving API could change, while an
>> >> experimental API which has been around for ages should be handled more
>> >> conservatively.
>> >> Which brings into question how the stability guarantees specified by
>> >> annotations interact with the proposal.
>> >>
>> >> Also, can we expand on 'when' an API change can occur, since we are
>> >> proposing to diverge from semver?
>> >> Patch release? Minor release? Only major release? Based on the
>> >> 'impact' of the API? Stability guarantees?
>> >>
>> >> Regards,
>> >> Mridul
>> >>
>> >>
>> >>
>> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust <
>> mich...@databricks.com> wrote:
>> >> >
>> >> > I'll start off the vote with a strong +1 (binding).
>> >> >
>> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <
>> mich...@databricks.com> wrote:
>> >> >>
>> >> >> I propose to add the following text to Spark's Semantic Versioning
>> policy and adopt it as the rubric that should be used when deciding to
>> break APIs (even at major versions such as 3.0).
>> >> >>
>> >> >>
>> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm. As this
>> is a procedural vote, the measure will pass if there are more favourable
>> votes than unfavourable ones. PMC votes are binding, but the community is
>> encouraged to add their voice to the discussion.
>> >> >>
>> >> >>
>> >> >> [ ] +1 - Spark should adopt this policy.
>> >> >>
>> >> >> [ ] -1  - Spark should not adopt this policy.
>> >> >>
>> >> >>
>> >> >> 
>> >> >>
>> >> >>
>> >> >> Considerations When Breaking APIs
>> >> >>
>> >> >> The Spark project strives to avoid breaking APIs or silently
>> changing behavior, even at major versions. While this is not always
>> possible, the balance of the following factors should be considered before
>> choosing to break an API.
>> >> >>
>> >> >>
>> >> >> Cost of Breaking an API
>> >> >>
>> >> >> Breaking an API almost always has a non-trivial cost to the users
>> of Spark. A broken API means that Spark programs need to be rewritten
>> before they can be upgraded. However, there are a few considerations when
>> thinking about what the cost will be:
>> >> >>
>> >> >> Usage - an API that is actively used in many different places is
>> always very costly to break. While it is hard to know usage for sure, there
>> are a bunch of ways that we can estimate:
>> >> >>
>> >> >> How long has the API been in Spark?
>> >> >>
>> >> >> Is the API common even for basic programs?
>> >> >>
>> >> >> How often do we see recent questions in JIRA or mailing lists?
>> >> >>
>> >> >> How often does it appear in StackOverflow or blogs?
>> >> >>
>> >> >> Behavior after the break - How will a program that works today
>> work after the break? The following are listed roughly in order of
>> increasing severity:
>> >> >>
>> >> >> Will there be a compiler or linker error?
>> >> >>
>> >> >> Will there be a runtime exception?
>> >> >>
>> >> >> Will that exception happen after significant processing has been
>> done?
>> >> >>
>> >> >> Will we silently return different answers? (very hard to debug,
>> might not even notice!)
>> >> >>
>> >> >>
>> >> >> Cost of 
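
The "behavior after the break" ladder quoted above can be made concrete
with a small hypothetical; the same call site fares very differently
depending on the kind of break:

    // Hypothetical library API, version 1: parses a decimal integer.
    object V1 { def parse(s: String): Int = s.trim.toInt }

    object SeverityLadder extends App {
      // (1) Source break: if parse were removed or renamed, this line
      //     would fail with a compiler error (the cheapest break to find).
      println(V1.parse(" 42 ")) // 42

      // (2) Binary break: if only the signature changed (say, returning
      //     Long) and a jar compiled against V1 ran against the new class,
      //     the failure would surface at runtime as a NoSuchMethodError.

      // (3) Silent change: version 2 keeps the signature but changes the
      //     semantics, e.g. parsing base-16 by default. No error anywhere,
      //     just a different answer (the most severe case in the list).
      object V2 { def parse(s: String): Int = Integer.parseInt(s.trim, 16) }
      println(V2.parse(" 42 ")) // 66, not 42
    }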

Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-09 Thread Matei Zaharia
+1 as well.

Matei

> On Mar 9, 2020, at 12:05 AM, Wenchen Fan  wrote:
> 
> +1 (binding), assuming that this is for public stable APIs, not APIs that are 
> marked as unstable, evolving, etc.
> 
> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía  wrote:
> +1 (non-binding)
> 
> Michael's section on the trade-offs of maintaining / removing an API is one
> of
> the best reads I have seen in this mailing list. Enthusiastic +1
> 
> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun  wrote:
> >
> > This new policy has a good intention, but can we narrow down on the
> > migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
> >
> > I saw that there already exists a reverting PR to bring back Spark 1.4 and 
> > 1.5 APIs based on this AS-IS suggestion.
> >
> > The AS-IS policy clearly mentions the JVM/Scala-level difficulty, which
> > is nice.
> >
> > However, for the other cases, it sounds like `recommending older APIs as 
> > much as possible` due to the following.
> >
> >  > How long has the API been in Spark?
> >
> > We had better be more careful when we add a new policy and should aim not
> > to mislead users and 3rd-party library developers into thinking "older is
> > better".
> >
> > Technically, I'm wondering who will use new APIs in their examples (in
> > books and on StackOverflow) if they always need to write an additional
> > warning like `this only works on 2.4.0+`.
> >
> > Bests,
> > Dongjoon.
> >
> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan  wrote:
> >>
> >> I am in broad agreement with the proposal; as any developer, I prefer
> >> stable, well-designed APIs :-)
> >>
> >> Can we tie the proposal to the stability guarantees given by Spark and
> >> the reasonable expectations of users?
> >> In my opinion, an unstable or evolving API could change, while an
> >> experimental API which has been around for ages should be handled more
> >> conservatively.
> >> Which brings into question how the stability guarantees specified by
> >> annotations interact with the proposal.
> >>
> >> Also, can we expand on 'when' an API change can occur, since we are
> >> proposing to diverge from semver?
> >> Patch release? Minor release? Only major release? Based on the
> >> 'impact' of the API? Stability guarantees?
> >>
> >> Regards,
> >> Mridul
> >>
> >>
> >>
> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust  wrote:
> >> >
> >> > I'll start off the vote with a strong +1 (binding).
> >> >
> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust  wrote:
> >> >>
> >> >> I propose to add the following text to Spark's Semantic Versioning 
> >> >> policy and adopt it as the rubric that should be used when deciding to 
> >> >> break APIs (even at major versions such as 3.0).
> >> >>
> >> >>
> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm. As this is a 
> >> >> procedural vote, the measure will pass if there are more favourable 
> >> >> votes than unfavourable ones. PMC votes are binding, but the community 
> >> >> is encouraged to add their voice to the discussion.
> >> >>
> >> >>
> >> >> [ ] +1 - Spark should adopt this policy.
> >> >>
> >> >> [ ] -1  - Spark should not adopt this policy.
> >> >>
> >> >>
> >> >> 
> >> >>
> >> >>
> >> >> Considerations When Breaking APIs
> >> >>
> >> >> The Spark project strives to avoid breaking APIs or silently changing 
> >> >> behavior, even at major versions. While this is not always possible, 
> >> >> the balance of the following factors should be considered before 
> >> >> choosing to break an API.
> >> >>
> >> >>
> >> >> Cost of Breaking an API
> >> >>
> >> >> Breaking an API almost always has a non-trivial cost to the users of 
> >> >> Spark. A broken API means that Spark programs need to be rewritten 
> >> >> before they can be upgraded. However, there are a few considerations 
> >> >> when thinking about what the cost will be:
> >> >>
> >> >> Usage - an API that is actively used in many different places is
> >> >> always very costly to break. While it is hard to know usage for sure, 
> >> >> there are a bunch of ways that we can estimate:
> >> >>
> >> >> How long has the API been in Spark?
> >> >>
> >> >> Is the API common even for basic programs?
> >> >>
> >> >> How often do we see recent questions in JIRA or mailing lists?
> >> >>
> >> >> How often does it appear in StackOverflow or blogs?
> >> >>
> >> >> Behavior after the break - How will a program that works today work
> >> >> after the break? The following are listed roughly in order of 
> >> >> increasing severity:
> >> >>
> >> >> Will there be a compiler or linker error?
> >> >>
> >> >> Will there be a runtime exception?
> >> >>
> >> >> Will that exception happen after significant processing has been done?
> >> >>
> >> >> Will we silently return different answers? (very hard to debug, might 
> >> >> not even notice!)
> >> >>
> >> 
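
One practical guard against the source/binary breaks discussed above is a
binary-compatibility check in the build. Spark uses MiMa for this; the
build.sbt fragment below is only an illustrative sketch (the coordinates,
plugin version, and excluded method are made up, not Spark's actual
configuration):

    // project/plugins.sbt:
    //   addSbtPlugin("com.typesafe" % "sbt-mima-plugin" % "0.6.4")

    // build.sbt: compare the current build against the last release.
    import com.typesafe.tools.mima.core._

    mimaPreviousArtifacts := Set("org.apache.spark" %% "spark-core" % "2.4.5")

    // Each intentional break must be acknowledged with an explicit filter,
    // which keeps the cost of the break visible at review time.
    mimaBinaryIssueFilters += ProblemFilters.exclude[DirectMissingMethodProblem](
      "org.apache.spark.SomeClass.removedMethod")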

Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-09 Thread Wenchen Fan
+1 (binding), assuming that this is for public stable APIs, not APIs that
are marked as unstable, evolving, etc.

On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía  wrote:

> +1 (non-binding)
>
> Michael's section on the trade-offs of maintaining / removing an API is
> one of
> the best reads I have seen in this mailing list. Enthusiastic +1
>
> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun 
> wrote:
> >
> > This new policy has a good intention, but can we narrow down on the
> migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
> >
> > I saw that there already exists a reverting PR to bring back Spark 1.4
> and 1.5 APIs based on this AS-IS suggestion.
> >
> > The AS-IS policy clearly mentions the JVM/Scala-level difficulty,
> which is nice.
> >
> > However, for the other cases, it sounds like `recommending older APIs as
> much as possible` due to the following.
> >
> >  > How long has the API been in Spark?
> >
> > We had better be more careful when we add a new policy and should aim
> not to mislead users and 3rd-party library developers into thinking
> "older is better".
> >
> > Technically, I'm wondering who will use new APIs in their examples (in
> books and on StackOverflow) if they always need to write an additional
> warning like `this only works on 2.4.0+`.
> >
> > Bests,
> > Dongjoon.
> >
> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan 
> wrote:
> >>
> >> I am in broad agreement with the proposal; as any developer, I prefer
> >> stable, well-designed APIs :-)
> >>
> >> Can we tie the proposal to the stability guarantees given by Spark and
> >> the reasonable expectations of users?
> >> In my opinion, an unstable or evolving API could change, while an
> >> experimental API which has been around for ages should be handled more
> >> conservatively.
> >> Which brings into question how the stability guarantees specified by
> >> annotations interact with the proposal.
> >>
> >> Also, can we expand on 'when' an API change can occur, since we are
> >> proposing to diverge from semver?
> >> Patch release? Minor release? Only major release? Based on the
> >> 'impact' of the API? Stability guarantees?
> >>
> >> Regards,
> >> Mridul
> >>
> >>
> >>
> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust 
> wrote:
> >> >
> >> > I'll start off the vote with a strong +1 (binding).
> >> >
> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <
> mich...@databricks.com> wrote:
> >> >>
> >> >> I propose to add the following text to Spark's Semantic Versioning
> policy and adopt it as the rubric that should be used when deciding to
> break APIs (even at major versions such as 3.0).
> >> >>
> >> >>
> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm. As this
> is a procedural vote, the measure will pass if there are more favourable
> votes than unfavourable ones. PMC votes are binding, but the community is
> encouraged to add their voice to the discussion.
> >> >>
> >> >>
> >> >> [ ] +1 - Spark should adopt this policy.
> >> >>
> >> >> [ ] -1  - Spark should not adopt this policy.
> >> >>
> >> >>
> >> >> 
> >> >>
> >> >>
> >> >> Considerations When Breaking APIs
> >> >>
> >> >> The Spark project strives to avoid breaking APIs or silently
> changing behavior, even at major versions. While this is not always
> possible, the balance of the following factors should be considered before
> choosing to break an API.
> >> >>
> >> >>
> >> >> Cost of Breaking an API
> >> >>
> >> >> Breaking an API almost always has a non-trivial cost to the users of
> Spark. A broken API means that Spark programs need to be rewritten before
> they can be upgraded. However, there are a few considerations when thinking
> about what the cost will be:
> >> >>
> >> >> Usage - an API that is actively used in many different places is
> always very costly to break. While it is hard to know usage for sure, there
> are a bunch of ways that we can estimate:
> >> >>
> >> >> How long has the API been in Spark?
> >> >>
> >> >> Is the API common even for basic programs?
> >> >>
> >> >> How often do we see recent questions in JIRA or mailing lists?
> >> >>
> >> >> How often does it appear in StackOverflow or blogs?
> >> >>
> >> >> Behavior after the break - How will a program that works today work
> after the break? The following are listed roughly in order of increasing
> severity:
> >> >>
> >> >> Will there be a compiler or linker error?
> >> >>
> >> >> Will there be a runtime exception?
> >> >>
> >> >> Will that exception happen after significant processing has been
> done?
> >> >>
> >> >> Will we silently return different answers? (very hard to debug,
> might not even notice!)
> >> >>
> >> >>
> >> >> Cost of Maintaining an API
> >> >>
> >> >> Of course, the above does not mean that we will never break any
> APIs. We must also consider the cost both to the project and to our users
> of keeping the API in question.
> >> >>
> >> >> Project Costs - Every API we have needs to be tested and needs to
> keep working as other parts of the