Re: ASF policy violation and Scala version issues

2023-06-07 Thread Grisha Weintraub
Dongjoon,

I followed the conversation, and in my opinion, your concern is totally
legitimate.
It just feels that the discussion is focused solely on Databricks even
though, as I said above, the same issue occurs with other vendors as well.
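
For what it's worth, anyone can check what a given distribution reports
from a spark-shell session. A minimal sketch (`spark` is the session object
the shell provides; the values in the comments are only assumptions about
what a particular vendor build prints):

    // Version string compiled into the running build; this is what the
    // cluster reports, e.g. "3.1.2" or "3.1.2-amazon".
    println(spark.version)

    // The same constant, reachable without a SparkSession in scope.
    println(org.apache.spark.SPARK_VERSION)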


On Wed, Jun 7, 2023 at 10:28 PM Dongjoon Hyun 
wrote:

> To Grisha, we are talking about what the right way is and how to comply
> with the ASF legal advice which I shared in this thread from the
> "legal-discuss@" mailing list.
>
> https://lists.apache.org/thread/mzhggd0rpz8t4d7vdsbhkp38mvd3lty4
>  (legal-discuss@)
> https://www.apache.org/foundation/marks/downstream.html#source (ASF
> Website)
>
> Dongjoon
>
>
> On Wed, Jun 7, 2023 at 12:16 PM Grisha Weintraub <
> grisha.weintr...@gmail.com> wrote:
>
>> Yes, in Spark UI you have it as "3.1.2-amazon", but when you create a
>> cluster it's just Spark 3.1.2.
>>
>> On Wed, Jun 7, 2023 at 10:05 PM Nan Zhu  wrote:
>>
>>>
>>>  for EMR, I think they show 3.1.2-amazon in Spark UI, no?
>>>
>>>
>>> On Wed, Jun 7, 2023 at 11:30 Grisha Weintraub <
>>> grisha.weintr...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am not taking sides here, but just for fairness, I think it should be
>>>> noted that AWS EMR does exactly the same thing.
>>>> We choose the EMR version (e.g., 6.4.0) and it has an associated Spark
>>>> version (e.g., 3.1.2).
>>>> The Spark version here is not the original Apache version but AWS's own
>>>> Spark distribution.
>>>>
>>>> On Wed, Jun 7, 2023 at 8:24 PM Dongjoon Hyun 
>>>> wrote:
>>>>
>>>>> I disagree with you in several ways.
>>>>>
>>>>> The following is not a *minor* change like the given examples
>>>>> (alterations to the start-up and shutdown scripts, configuration files,
>>>>> file layout etc.).
>>>>>
>>>>> > The change you cite meets the 4th point, minor change, made for
>>>>> integration reasons.
>>>>>
>>>>> The following is also wrong. There was no such state of Apache Spark
>>>>> 3.4.0 after the 3.4.0 tag was created. The Apache Spark community didn't
>>>>> allow the Scala-reverting patches in either the `master` branch or
>>>>> `branch-3.4`.
>>>>>
>>>>> > There is no known technical objection; this was after all at one
>>>>> point the state of Apache Spark.
>>>>>
>>>>> Is the following your main point? So, you are selling a box "including
>>>>> Harry Potter by J. K. Rowling, whose main character is Barry instead of
>>>>> Harry", but it's okay because you didn't sell the book itself? And, as a
>>>>> cloud vendor, you lend out the box instead of selling it, like a private
>>>>> library?
>>>>>
>>>>> > There is no standalone distribution of Apache Spark anywhere here.
>>>>>
>>>>> We are not asking for a big thing. Why are you so reluctant to say you
>>>>> are not "Apache Spark 3.4.0", simply by saying "Apache Spark
>>>>> 3.4.0-databricks"? What is the marketing reason here?
>>>>>
>>>>> Dongjoon.
>>>>>
>>>>>
>>>>> On Wed, Jun 7, 2023 at 9:27 AM Sean Owen  wrote:
>>>>>
>>>>>> Hi Dongjoon, I think this conversation is not advancing anymore. I
>>>>>> personally consider the matter closed unless you can find other support
>>>>>> or respond with more specifics. While this perhaps should be on
>>>>>> private@, I think it's not wrong as an instructive discussion on dev@.
>>>>>>
>>>>>> I don't believe you've made a clear argument about the problem, or
>>>>>> how it relates specifically to policy. Nevertheless I will show you my
>>>>>> logic.
>>>>>>
>>>>>> You are asserting that a vendor cannot call a product Apache Spark
>>>>>> 3.4.0 if it omits a patch updating a Scala maintenance version. This
>>>>>> difference has no known impact on usage, as far as I can tell.
>>>>>>
>>>>>> Let's see what policy requires:
>>>>>>
>>>>>> 1/ All source code changes must meet at least one of the acceptable
>>>>>> changes criteria set out below:
>>>>>> - The change has been accepted by the relevant Apache project community
>>>>>> for inclusion in a future release. Note that the process used to accept
>>>>>> changes and how that acceptance is documented varies between projects.
>>>>>> - A change is a fix for an undisclosed security issue; and the fix is
>>>>>> not publicly disclosed as a security fix; and the Apache project has
>>>>>> been notified of both the issue and the proposed fix; and the PMC has
>>>>>> rejected neither the vulnerability report nor the proposed fix.
>>>>>> - A change is a fix for a bug; and the Apache project has been notified
>>>>>> of both the bug and the proposed fix; and the PMC has rejected neither
>>>>>> the bug report nor the proposed fix.
>>>>>> - Minor changes (e.g. alterations to the start-up and shutdown scripts,
>>>>>> configuration files, file layout etc.) to integrate with the target
>>>>>> platform, providing the Apache project has not objected to those
>>>>>> changes.
>>>>>>
>>>>>> The change you cite meets the 4th point, a minor change made for
>>>>>> integration reasons. There is no known technical objection; this was
>>>>>> after all at one point the state of Apache Spark.
>>>>>>
>>>>>> 2/ A version number must be used that both clearly differentiates it
>>>>>> from an Apache Software Foundation release and clearly identifies the
>>>>>> Apache Software Foundation version on which the software is based.
>>>>>>
>>>>>> Keep in mind the product here is not "Apache Spark", but the "Databricks
>>>>>> Runtime 13.1 (including Apache Spark 3.4.0)". That is, there is far more
>>>>>> than a version number differentiating this product from Apache Spark.
>>>>>> There is no standalone distribution of Apache Spark anywhere here. I
>>>>>> believe that easily matches the intent.
>>>>>>
>>>>>> 3/ The documentation must clearly identify the Apache Software
>>>>>> Foundation version on which the software is based.
>>>>>>
>>>>>> Clearly, yes.
>>>>>>
>>>>>> 4/ The end user expects that the distribution channel will back-port
>>>>>> fixes. It is not necessary to back-port all fixes. Selection of fixes to
>>>>>> back-port must be consistent with the update policy of that distribution
>>>>>> channel.
>>>>>>
>>>>>> I think this is safe to say too. Indeed this explicitly contemplates not
>>>>>> back-porting a change.
>>>>>>
>>>>>> Backing up, you can see from this document that the spirit of it is:
>>>>>> don't include changes in your own Apache Foo x.y that aren't wanted by
>>>>>> the project, and still call it Apache Foo x.y. I don't believe your case
>>>>>> matches this spirit either.
>>>>>>
>>>>>> I do think it's not crazy to suggest, hey vendor, would you call this
>>>>>> "Apache Spark + patches" or ".vendor123". But that's at best a
>>>>>> suggestion,
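
For reference, the concrete difference debated above is observable at
runtime: the Scala maintenance version a build ships with is independent of
the Spark version string it reports. A minimal sketch, again assuming a
spark-shell session, with illustrative version numbers in the comments:

    // Scala library version actually on the classpath; e.g. "2.12.17"
    // for the upstream 3.4.0 build, or an older 2.12.x in a build where
    // the maintenance bump was omitted.
    println(scala.util.Properties.versionNumberString)

    // Spark's reported version string, for comparison; e.g. "3.4.0".
    println(org.apache.spark.SPARK_VERSION)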
