Re: ASF policy violation and Scala version issues

2023-06-16 Thread Dongjoon Hyun
I want to add two updates to this thread.


First, the Ammonite library issue is resolved for Scala 2.12/2.13. For Scala 3, we 
can talk later in the scope of Spark 4.

SPARK-44041 Upgrade Ammonite to 2.5.9

This unblocked the following, and we have started to evaluate them.

SPARK-43832 Upgrade Scala to 2.12.18
SPARK-40497 Upgrade Scala to 2.13.11


Second, Sean, Mich, and Grisha shared their different perspectives on the ASF legal 
issue and the Apache Spark PMC's role in this thread. Thank you again for sharing 
your ideas explicitly. I'm going to start a vote on that specific question to build 
consensus and reach a conclusion.


Dongjoon


On 2023/06/12 08:15:39 Dongjoon Hyun wrote:
> Let me add my answers about a few Scala questions, Jungtaek.
> 
> > Are we concerned that a library has not released a new version
> > bumping the Scala version, when that Scala version was
> > announced less than a week ago?
> 
> No, our concern is the newly introduced inability to upgrade Scala
> in the Apache Spark environment.
> 
> 
> 
> > Shall we respect the efforts of all maintainers of open source projects
> > we use as dependencies, regardless of whether they are ASF projects or
> > individuals?
> 
> We not only respect all those efforts; Yang Jie and I have also been
> participating in those individual projects to help both them and us.
> I believe we have aimed for the best collaboration there.
> 
> 
> > Bumping a bugfix version is not always safe,
> > especially for Scala, where they use semver one level down -
> > their minor version is almost another project's major version
> > (a similar amount of pain on upgrading).
> 
> I agree with you in two ways.
> 
> 1. Before adding the Ammonite dependency, the Apache Spark community itself was
> one of the major Scala users participating in new-version testing, and we gave
> active feedback to the Scala community. In addition, we decided by ourselves
> whether or not to consume a new version. Now the Apache Spark community has lost
> that ability because the build fails at the dependency-download step. We are
> waiting because we don't have an alternative. That's the big difference: being
> able to choose, or not.
> 
> 2. Again, I must reiterate that this is one of the reasons why I reported the
> issue: there is a company claiming something non-Apache, effectively "Apache Spark
> 3.4.0 minus SPARK-40436", under the name "Apache Spark 3.4.0".
> 
> 
> Dongjoon.
> 




Re: ASF policy violation and Scala version issues

2023-06-12 Thread Dongjoon Hyun
Let me add my answers about a few Scala questions, Jungtaek.

> Are we concerned that a library has not released a new version
> bumping the Scala version, when that Scala version was
> announced less than a week ago?

No, our concern is the newly introduced inability to upgrade Scala
in the Apache Spark environment.



> Shall we respect the efforts of all maintainers of open source projects
> we use as dependencies, regardless of whether they are ASF projects or
> individuals?

We not only respect all those efforts; Yang Jie and I have also been
participating in those individual projects to help both them and us.
I believe we have aimed for the best collaboration there.


> Bumping a bugfix version is not always safe,
> especially for Scala, where they use semver one level down -
> their minor version is almost another project's major version
> (a similar amount of pain on upgrading).

I agree with you in two ways.

1. Before adding the Ammonite dependency, the Apache Spark community itself was
one of the major Scala users participating in new-version testing, and we gave
active feedback to the Scala community. In addition, we decided by ourselves
whether or not to consume a new version. Now the Apache Spark community has lost
that ability because the build fails at the dependency-download step (see the
short build sketch below). We are waiting because we don't have an alternative.
That's the big difference: being able to choose, or not.

2. Again, I must reiterate that this is one of the reasons why I reported the
issue: there is a company claiming something non-Apache, effectively "Apache Spark
3.4.0 minus SPARK-40436", under the name "Apache Spark 3.4.0".
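
For context, the failure mode is mechanical: Ammonite is cross-published against
the full Scala version, so the artifact name embeds the patch release. A minimal
sbt-style sketch (illustrative only, not Spark's actual build definition) of why
the bump fails before any code compiles:

```scala
// Illustrative sketch, not Spark's actual build definition.
// Ammonite is published per full Scala version (CrossVersion.full), so the
// resolved artifact is named e.g. ammonite_2.13.11. Until com.lihaoyi publishes
// that artifact, bumping scalaVersion fails at dependency resolution,
// before any Spark code is compiled.
ThisBuild / scalaVersion := "2.13.11"

libraryDependencies += "com.lihaoyi" % "ammonite" % "2.5.9" cross CrossVersion.full
```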


Dongjoon.


Re: ASF policy violation and Scala version issues

2023-06-11 Thread yangjie01
Yes, you're right.

From: Jungtaek Lim
Date: Monday, June 12, 2023, 11:37
To: Dongjoon Hyun
Cc: yangjie01, Grisha Weintraub, Nan Zhu, Sean Owen, "dev@spark.apache.org"
Subject: Re: ASF policy violation and Scala version issues

Are we concerned that a library has not released a new version bumping the 
Scala version, when that Scala version was announced less than a week ago?
Shall we respect the efforts of all maintainers of open source projects we use 
as dependencies, regardless of whether they are ASF projects or individuals? 
Individual projects consist of volunteers (unlike projects which are backed by 
small and big companies). Please remember they have day jobs separate from 
these projects.

Also, if you look at the thread for 2.13.11
<https://contributors.scala-lang.org/t/scala-2-13-11-release-planning/6088/16>,
they found two regressions in only 3 days, even before they announced the 
version. Bumping a bugfix version is not always safe, especially for Scala, 
where they use semver one level down - their minor version is almost another 
project's major version (a similar amount of pain on upgrading).

Btw, I see this is an effort to support JDK 21, but GA of JDK 21 is planned 
for September 19, according to the post in InfoQ
<https://www.infoq.com/news/2023/04/java-news-roundup-mar27-2023/>.
Do we need to be coupled with a Java version which is not even released yet? 
Shall we postpone this to Spark 4.0, since we say supporting JDK 21 is a stretch 
goal for Spark 3.5 rather than a blocker?
This is not a complete view, but one post about JDK usage among LTS versions
<https://newrelic.com/resources/report/2023-state-of-the-java-ecosystem>
shows that JDK 17 adoption is still below 10% although it was released 1.5 years 
ago, and last year it was below 0.5%. In the real world, Java 11 is still the 
majority and still growing, and 17 is slowly catching up. Even if JDK 21 were 
released tomorrow, we would have more than a year to support it.



On Mon, Jun 12, 2023 at 4:54 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
Yes, that's exactly the pain point. I totally agree with you.
For now, we are focusing more on other things, but we need to resolve this 
situation soon.

Dongjoon.


On Sun, Jun 11, 2023 at 1:21 AM yangjie01 <yangji...@baidu.com> wrote:
Perhaps we should reconsider our reliance on and use of Ammonite? There is 
still no new Ammonite version available one week after the release of Scala 
2.12.18 and 2.13.11. The question about the release plan in the Ammonite 
community also has not received a response, which I did not expect. Of course, 
we can also wait for a while before making a decision.

```
Scala version upgrade is blocked by the Ammonite library dev cycle currently.

Although we discussed it here and it had good intentions,
the current master branch cannot use the latest Scala.

- https://lists.apache.org/thread/4nk5ddtmlobdt8g3z8xbqjclzkhlsdfk
"Ammonite as REPL for Spark Connect"
 SPARK-42884 Add Ammonite REPL integration

Specifically, the following are blocked and I'm monitoring the Ammonite 
repository.
- SPARK-40497 Upgrade Scala to 2.13.11
- SPARK-43832 Upgrade Scala to 2.12.18
- According to https://github.com/com-lihaoyi/Ammonite/issues,
  Scala 3.3.0 LTS support also looks infeasible.

Although we may be able to wait for a while, there are two fundamental 
solutions
to unblock this situation in a long-term maintenance perspective.
- Replace it with a Scala-shell based implementation
- Move `connector/connect/client/jvm/pom.xml` outside from Spark repo.
   Maybe, we can put it into the new repo like Rust and Go client.
```
From: Grisha Weintraub <grisha.weintr...@gmail.com>
Date: Thursday, June 8, 2023, 04:05
To: Dongjoon Hyun <dongjoon.h...@gmail.com>
Cc: Nan Zhu <zhunanmcg...@gmail.com>, Sean Owen <sro...@gmail.com>, "dev@spark.apache.org" <dev@spark.apache.org>
Subject: Re: ASF policy violation and Scala version issues

Dongjoon,

I followed the conversation, and in my opinion, your concern is totally legit.
It just feels that the discussion is focused solely on Databricks, and as I 
said above, the same issue occurs in other vendors as well.


On Wed, Jun 7, 2023 at 10:28 PM Dongjoon Hyun 
mailto:dongjoon.h...@gmail

Re: ASF policy violation and Scala version issues

2023-06-11 Thread Jungtaek Lim
Are we concerned that a library has not released a new version bumping
the Scala version, when that Scala version was announced less than a week ago?
Shall we respect the efforts of all maintainers of open source projects we
use as dependencies, regardless of whether they are ASF projects or
individuals? Individual projects consist of volunteers (unlike projects
which are backed by small and big companies). Please remember they have
day jobs separate from these projects.

Also, if you look at the thread for 2.13.11
<https://contributors.scala-lang.org/t/scala-2-13-11-release-planning/6088/16>,
they found two regressions in only 3 days, even before they announced the
version. Bumping a bugfix version is not always safe, especially for Scala,
where they use semver one level down - their minor version is almost another
project's major version (a similar amount of pain on upgrading).
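
To make the versioning point concrete, a purely illustrative sbt sketch
(standard sbt settings, not Spark's build) of how Scala's scheme maps onto
compatibility:

```scala
// Purely illustrative sbt sketch. Scala treats 2.12 -> 2.13 like a major bump:
// artifacts are published per binary version (foo_2.12 vs foo_2.13) and are not
// binary compatible, so every library has to cross-build for each one. A patch
// bump such as 2.13.10 -> 2.13.11 keeps binary compatibility, but, as noted
// above, it can still surface source-level regressions.
ThisBuild / scalaVersion       := "2.13.11"
ThisBuild / crossScalaVersions := Seq("2.12.18", "2.13.11")
```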

Btw, I see this is an effort to support JDK 21, but GA of JDK 21 is
planned for September 19, according to the post in InfoQ
<https://www.infoq.com/news/2023/04/java-news-roundup-mar27-2023/>. Do we
need to be coupled with a Java version which is not even released yet?
Shall we postpone this to Spark 4.0, since we say supporting JDK 21 is a
stretch goal for Spark 3.5 rather than a blocker?
This is not a complete view, but one post about JDK usage among LTS versions
<https://newrelic.com/resources/report/2023-state-of-the-java-ecosystem>
shows that JDK 17 adoption is still below 10% although it was released 1.5 years
ago, and last year it was below 0.5%. In the real world, Java 11 is still the
majority and still growing, and 17 is slowly catching up. Even if JDK 21 were
released tomorrow, we would have more than a year to support it.



On Mon, Jun 12, 2023 at 4:54 AM Dongjoon Hyun 
wrote:

> Yes, that's exactly the pain point. I totally agree with you.
> For now, we are focusing more on other things, but we need to resolve this
> situation soon.
>
> Dongjoon.
>
>
> On Sun, Jun 11, 2023 at 1:21 AM yangjie01  wrote:
>
>> Perhaps we should reconsider our reliance on and use of Ammonite? There
>> are still no new available versions of Ammonite one week after the release
>> of Scala 2.12.18 and 2.13.11. The question related to version release in
>> the Ammonite community also did not receive a response, which makes me feel
>> this is unexpected. Of course, we can also wait for a while before making a
>> decision.
>>
>>
>>
>> ```
>>
>> Scala version upgrade is blocked by the Ammonite library dev cycle
>> currently.
>>
>> Although we discussed it here and it had good intentions,
>> the current master branch cannot use the latest Scala.
>>
>> - https://lists.apache.org/thread/4nk5ddtmlobdt8g3z8xbqjclzkhlsdfk
>> "Ammonite as REPL for Spark Connect"
>>  SPARK-42884 Add Ammonite REPL integration
>>
>> Specifically, the following are blocked and I'm monitoring the
>> Ammonite repository.
>> - SPARK-40497 Upgrade Scala to 2.13.11
>> - SPARK-43832 Upgrade Scala to 2.12.18
>> - According to https://github.com/com-lihaoyi/Ammonite/issues
>>  ,
>>   Scala 3.3.0 LTS support also looks infeasible.
>>
>> Although we may be able to wait for a while, there are two
>> fundamental solutions
>> to unblock this situation in a long-term maintenance perspective.
>> - Replace it with a Scala-shell based implementation
>> - Move `connector/connect/client/jvm/pom.xml` outside from Spark repo.
>>    Maybe, we can put it into the new repo like Rust and Go client.
>>
>> ```
>>
>> *From:* Grisha Weintraub
>> *Date:* Thursday, June 8, 2023, 04:05
>> *To:* Dongjoon Hyun
>> *Cc:* Nan Zhu, Sean Owen, "dev@spark.apache.org"
>> *Subject:* Re: ASF policy violation and Scala version issues
>>
>>
>>
>> Dongjoon,
>>
>>
>>
>> I followed the conversation, and in my opinion, your concern is totally
>> legit.
>> It just feels that the discussion is focused solely on Databricks, and as
>> I said above, the same issue occurs in other vendors as well.
>>
>>
>>
>>
>>
>> On Wed, Jun 7, 2023 at 10:28 PM Dongjoon Hyun 
>> wrote:
>>
>> To Grisha, we are talking about what is the right way and how to comply
>> with ASF legal advice which I shared in this thread from "legal-

Re: ASF policy violation and Scala version issues

2023-06-11 Thread Dongjoon Hyun
Yes, that's exactly the pain point. I totally agree with you.
For now, we are focusing more on other things, but we need to resolve this
situation soon.

Dongjoon.


On Sun, Jun 11, 2023 at 1:21 AM yangjie01  wrote:

> Perhaps we should reconsider our reliance on and use of Ammonite? There
> are still no new available versions of Ammonite one week after the release
> of Scala 2.12.18 and 2.13.11. The question related to version release in
> the Ammonite community also did not receive a response, which makes me feel
> this is unexpected. Of course, we can also wait for a while before making a
> decision.
>
>
>
> ```
>
> Scala version upgrade is blocked by the Ammonite library dev cycle
> currently.
>
> Although we discussed it here and it had good intentions,
> the current master branch cannot use the latest Scala.
>
> - https://lists.apache.org/thread/4nk5ddtmlobdt8g3z8xbqjclzkhlsdfk
> "Ammonite as REPL for Spark Connect"
>  SPARK-42884 Add Ammonite REPL integration
>
> Specifically, the following are blocked and I'm monitoring the
> Ammonite repository.
> - SPARK-40497 Upgrade Scala to 2.13.11
> - SPARK-43832 Upgrade Scala to 2.12.18
> - According to https://github.com/com-lihaoyi/Ammonite/issues
>  ,
>   Scala 3.3.0 LTS support also looks infeasible.
>
> Although we may be able to wait for a while, there are two fundamental
> solutions
> to unblock this situation in a long-term maintenance perspective.
> - Replace it with a Scala-shell based implementation
> - Move `connector/connect/client/jvm/pom.xml` outside from Spark repo.
>Maybe, we can put it into the new repo like Rust and Go client.
>
> ```
>
> *From:* Grisha Weintraub
> *Date:* Thursday, June 8, 2023, 04:05
> *To:* Dongjoon Hyun
> *Cc:* Nan Zhu, Sean Owen, "dev@spark.apache.org"
> *Subject:* Re: ASF policy violation and Scala version issues
>
>
>
> Dongjoon,
>
>
>
> I followed the conversation, and in my opinion, your concern is totally
> legit.
> It just feels that the discussion is focused solely on Databricks, and as
> I said above, the same issue occurs in other vendors as well.
>
>
>
>
>
> On Wed, Jun 7, 2023 at 10:28 PM Dongjoon Hyun 
> wrote:
>
> To Grisha, we are talking about what is the right way and how to comply
> with ASF legal advice which I shared in this thread from "legal-discuss@"
> mailing thread.
>
>
>
> https://lists.apache.org/thread/mzhggd0rpz8t4d7vdsbhkp38mvd3lty4 (legal-discuss@)
>
> https://www.apache.org/foundation/marks/downstream.html#source (ASF Website)
>
>
>
> Dongjoon
>
>
>
>
>
> On Wed, Jun 7, 2023 at 12:16 PM Grisha Weintraub <
> grisha.weintr...@gmail.com> wrote:
>
> Yes, in Spark UI you have it as "3.1.2-amazon", but when you create a
> cluster it's just Spark 3.1.2.
>
>
>
> On Wed, Jun 7, 2023 at 10:05 PM Nan Zhu  wrote:
>
>
>
>  for EMR, I think they show 3.1.2-amazon in Spark UI, no?
>
>
>
>
>
> On Wed, Jun 7, 2023 at 11:30 Grisha Weintraub 
> wrote:
>
> Hi,
>
>
>
> I am not taking sides here, but just for fairness, I think it should be
> noted that AWS EMR does exactly the same thing.
>
> We choose the EMR version (e.g., 6.4.0) and it has an associated Spark
> version (e.g., 3.1.2).
>
> The Spark version here is not the original Apache version but AWS Spark
> distribution.
>
>
>
> On Wed, Jun 7, 2023 at 8:24 PM Dongjoon Hyun 
> wrote:
>
> I disagree with you in several ways.
>
>
>
> The following is not a *minor* change like the given examples (alterations
> to the start-up and shutdown scripts, configuration files, file layout
> etc.).
>
>
>
> > The change you cite meets the 4th point, minor change, made for
> integration reasons.
>
>
>
> The following is also wrong. There is no such point of state of Apache
> Spark 3.4.0 after 3.4.0 tag creation. Apache Spark community didn't allow
> Scala reverting patches in both `master` branch and `branch-3.4`.
>
>
>
> > There is no known t

Re: ASF policy violation and Scala version issues

2023-06-11 Thread yangjie01
Perhaps we should reconsider our reliance on and use of Ammonite? There is 
still no new Ammonite version available one week after the release of Scala 
2.12.18 and 2.13.11. The question about the release plan in the Ammonite 
community also has not received a response, which I did not expect. Of course, 
we can also wait for a while before making a decision.

```
Scala version upgrade is blocked by the Ammonite library dev cycle currently.

Although we discussed it here and it had good intentions,
the current master branch cannot use the latest Scala.

- https://lists.apache.org/thread/4nk5ddtmlobdt8g3z8xbqjclzkhlsdfk
"Ammonite as REPL for Spark Connect"
 SPARK-42884 Add Ammonite REPL integration

Specifically, the following are blocked and I'm monitoring the Ammonite 
repository.
- SPARK-40497 Upgrade Scala to 2.13.11
- SPARK-43832 Upgrade Scala to 2.12.18
- According to https://github.com/com-lihaoyi/Ammonite/issues,
  Scala 3.3.0 LTS support also looks infeasible.

Although we may be able to wait for a while, there are two fundamental 
solutions
to unblock this situation in a long-term maintenance perspective.
- Replace it with a Scala-shell based implementation
- Move `connector/connect/client/jvm/pom.xml` outside from Spark repo.
   Maybe, we can put it into the new repo like Rust and Go client.
```
From: Grisha Weintraub
Date: Thursday, June 8, 2023, 04:05
To: Dongjoon Hyun
Cc: Nan Zhu, Sean Owen, "dev@spark.apache.org"
Subject: Re: ASF policy violation and Scala version issues

Dongjoon,

I followed the conversation, and in my opinion, your concern is totally legit.
It just feels that the discussion is focused solely on Databricks, and as I 
said above, the same issue occurs in other vendors as well.


On Wed, Jun 7, 2023 at 10:28 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
To Grisha, we are talking about what is the right way and how to comply with 
ASF legal advice which I shared in this thread from "legal-discuss@" mailing 
thread.

https://lists.apache.org/thread/mzhggd0rpz8t4d7vdsbhkp38mvd3lty4 (legal-discuss@)
https://www.apache.org/foundation/marks/downstream.html#source (ASF Website)

Dongjoon


On Wed, Jun 7, 2023 at 12:16 PM Grisha Weintraub <grisha.weintr...@gmail.com> wrote:
Yes, in Spark UI you have it as "3.1.2-amazon", but when you create a cluster 
it's just Spark 3.1.2.

On Wed, Jun 7, 2023 at 10:05 PM Nan Zhu <zhunanmcg...@gmail.com> wrote:

 for EMR, I think they show 3.1.2-amazon in Spark UI, no?


On Wed, Jun 7, 2023 at 11:30 Grisha Weintraub <grisha.weintr...@gmail.com> wrote:
Hi,

I am not taking sides here, but just for fairness, I think it should be noted 
that AWS EMR does exactly the same thing.
We choose the EMR version (e.g., 6.4.0) and it has an associated Spark version 
(e.g., 3.1.2).
The Spark version here is not the original Apache version but AWS Spark 
distribution.

On Wed, Jun 7, 2023 at 8:24 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
I disagree with you in several ways.

The following is not a *minor* change like the given examples (alterations to 
the start-up and shutdown scripts, configuration files, file layout etc.).

> The change you cite meets the 4th point, minor change, made for integration 
> reasons.

The following is also wrong. There is no such point of state of Apache Spark 
3.4.0 after 3.4.0 tag creation. Apache Spark community didn't allow Scala 
reverting patches in both `master` branch and `branch-3.4`.

> There is no known technical objection; this was after all at one point the 
> state of Apache Spark.

Is the following your main point? So, you are selling a box "including Harry 
Potter by J. K. Rolling whose main character is Barry instead of Harry", but 
it's okay because you didn't sell the book itself? And, as a cloud-vendor, you 
borrowed the box instead of selling it like private libraries?

> There is no standalone distribution of Apache Spark anywhere here.

We are not asking a big thing. Why are you so reluctant to say you are not 
"Apache Spark 3.4.0" by simply saying "Apache Spark 3.4.0-databricks". What is 
the marketing reason here?

Dongjoon.


On Wed, Jun 7, 2023 at 9:27 AM Sean Owen <sro...@gmail.com> wrote:
Hi Do

Re: ASF policy violation and Scala version issues

2023-06-07 Thread Grisha Weintraub
Dongjoon,

I followed the conversation, and in my opinion, your concern is totally
legit.
It just feels that the discussion is focused solely on Databricks, and as I
said above, the same issue occurs with other vendors as well.


On Wed, Jun 7, 2023 at 10:28 PM Dongjoon Hyun 
wrote:

> To Grisha, we are talking about what is the right way and how to comply
> with ASF legal advice which I shared in this thread from "legal-discuss@"
> mailing thread.
>
> https://lists.apache.org/thread/mzhggd0rpz8t4d7vdsbhkp38mvd3lty4
>  (legal-discuss@)
> https://www.apache.org/foundation/marks/downstream.html#source (ASF
> Website)
>
> Dongjoon
>
>
> On Wed, Jun 7, 2023 at 12:16 PM Grisha Weintraub <
> grisha.weintr...@gmail.com> wrote:
>
>> Yes, in Spark UI you have it as "3.1.2-amazon", but when you create a
>> cluster it's just Spark 3.1.2.
>>
>> On Wed, Jun 7, 2023 at 10:05 PM Nan Zhu  wrote:
>>
>>>
>>>  for EMR, I think they show 3.1.2-amazon in Spark UI, no?
>>>
>>>
>>> On Wed, Jun 7, 2023 at 11:30 Grisha Weintraub <
>>> grisha.weintr...@gmail.com> wrote:
>>>
 Hi,

 I am not taking sides here, but just for fairness, I think it should be
 noted that AWS EMR does exactly the same thing.
 We choose the EMR version (e.g., 6.4.0) and it has an associated Spark
 version (e.g., 3.1.2).
 The Spark version here is not the original Apache version but AWS Spark
 distribution.

 On Wed, Jun 7, 2023 at 8:24 PM Dongjoon Hyun 
 wrote:

> I disagree with you in several ways.
>
> The following is not a *minor* change like the given examples
> (alterations to the start-up and shutdown scripts, configuration files,
> file layout etc.).
>
> > The change you cite meets the 4th point, minor change, made for
> integration reasons.
>
> The following is also wrong. There is no such point of state of Apache
> Spark 3.4.0 after 3.4.0 tag creation. Apache Spark community didn't allow
> Scala reverting patches in both `master` branch and `branch-3.4`.
>
> > There is no known technical objection; this was after all at one
> point the state of Apache Spark.
>
> Is the following your main point? So, you are selling a box "including
> Harry Potter by J. K. Rolling whose main character is Barry instead of
> Harry", but it's okay because you didn't sell the book itself? And, as a
> cloud-vendor, you borrowed the box instead of selling it like private
> libraries?
>
> > There is no standalone distribution of Apache Spark anywhere here.
>
> We are not asking a big thing. Why are you so reluctant to say you are
> not "Apache Spark 3.4.0" by simply saying "Apache Spark 3.4.0-databricks".
> What is the marketing reason here?
>
> Dongjoon.
>
>
> On Wed, Jun 7, 2023 at 9:27 AM Sean Owen  wrote:
>
>> Hi Dongjoon, I think this conversation is not advancing anymore. I
>> personally consider the matter closed unless you can find other support 
>> or
>> respond with more specifics. While this perhaps should be on private@,
>> I think it's not wrong as an instructive discussion on dev@.
>>
>> I don't believe you've made a clear argument about the problem, or
>> how it relates specifically to policy. Nevertheless I will show you my
>> logic.
>>
>> You are asserting that a vendor cannot call a product Apache Spark
>> 3.4.0 if it omits a patch updating a Scala maintenance version. This
>> difference has no known impact on usage, as far as I can tell.
>>
>> Let's see what policy requires:
>>
>> 1/ All source code changes must meet at least one of the acceptable
>> changes criteria set out below:
>> - The change has accepted by the relevant Apache project community
>> for inclusion in a future release. Note that the process used to accept
>> changes and how that acceptance is documented varies between projects.
>> - A change is a fix for an undisclosed security issue; and the fix is
>> not publicly disclosed as as security fix; and the Apache project has 
>> been
>> notified of the both issue and the proposed fix; and the PMC has rejected
>> neither the vulnerability report nor the proposed fix.
>> - A change is a fix for a bug; and the Apache project has been
>> notified of both the bug and the proposed fix; and the PMC has rejected
>> neither the bug report nor the proposed fix.
>> - Minor changes (e.g. alterations to the start-up and shutdown
>> scripts, configuration files, file layout etc.) to integrate with the
>> target platform providing the Apache project has not objected to those
>> changes.
>>
>> The change you cite meets the 4th point, minor change, made for
>> integration reasons. There is no known technical objection; this was 
>> after
>> all at one point the state of Apache Spark.
>>
>>
>> 2/ A version number must be 

Re: ASF policy violation and Scala version issues

2023-06-07 Thread Dongjoon Hyun
To Grisha, we are talking about what the right way is and how to comply
with the ASF legal advice which I shared in this thread from the "legal-discuss@"
mailing thread.

https://lists.apache.org/thread/mzhggd0rpz8t4d7vdsbhkp38mvd3lty4
 (legal-discuss@)
https://www.apache.org/foundation/marks/downstream.html#source (ASF Website)

Dongjoon


On Wed, Jun 7, 2023 at 12:16 PM Grisha Weintraub 
wrote:

> Yes, in Spark UI you have it as "3.1.2-amazon", but when you create a
> cluster it's just Spark 3.1.2.
>
> On Wed, Jun 7, 2023 at 10:05 PM Nan Zhu  wrote:
>
>>
>>  for EMR, I think they show 3.1.2-amazon in Spark UI, no?
>>
>>
>> On Wed, Jun 7, 2023 at 11:30 Grisha Weintraub 
>> wrote:
>>
>>> Hi,
>>>
>>> I am not taking sides here, but just for fairness, I think it should be
>>> noted that AWS EMR does exactly the same thing.
>>> We choose the EMR version (e.g., 6.4.0) and it has an associated Spark
>>> version (e.g., 3.1.2).
>>> The Spark version here is not the original Apache version but AWS Spark
>>> distribution.
>>>
>>> On Wed, Jun 7, 2023 at 8:24 PM Dongjoon Hyun 
>>> wrote:
>>>
 I disagree with you in several ways.

 The following is not a *minor* change like the given examples
 (alterations to the start-up and shutdown scripts, configuration files,
 file layout etc.).

 > The change you cite meets the 4th point, minor change, made for
 integration reasons.

 The following is also wrong. There is no such point of state of Apache
 Spark 3.4.0 after 3.4.0 tag creation. Apache Spark community didn't allow
 Scala reverting patches in both `master` branch and `branch-3.4`.

 > There is no known technical objection; this was after all at one
 point the state of Apache Spark.

 Is the following your main point? So, you are selling a box "including
 Harry Potter by J. K. Rolling whose main character is Barry instead of
 Harry", but it's okay because you didn't sell the book itself? And, as a
 cloud-vendor, you borrowed the box instead of selling it like private
 libraries?

 > There is no standalone distribution of Apache Spark anywhere here.

 We are not asking a big thing. Why are you so reluctant to say you are
 not "Apache Spark 3.4.0" by simply saying "Apache Spark 3.4.0-databricks".
 What is the marketing reason here?

 Dongjoon.


 On Wed, Jun 7, 2023 at 9:27 AM Sean Owen  wrote:

> Hi Dongjoon, I think this conversation is not advancing anymore. I
> personally consider the matter closed unless you can find other support or
> respond with more specifics. While this perhaps should be on private@,
> I think it's not wrong as an instructive discussion on dev@.
>
> I don't believe you've made a clear argument about the problem, or how
> it relates specifically to policy. Nevertheless I will show you my logic.
>
> You are asserting that a vendor cannot call a product Apache Spark
> 3.4.0 if it omits a patch updating a Scala maintenance version. This
> difference has no known impact on usage, as far as I can tell.
>
> Let's see what policy requires:
>
> 1/ All source code changes must meet at least one of the acceptable
> changes criteria set out below:
> - The change has accepted by the relevant Apache project community for
> inclusion in a future release. Note that the process used to accept 
> changes
> and how that acceptance is documented varies between projects.
> - A change is a fix for an undisclosed security issue; and the fix is
> not publicly disclosed as as security fix; and the Apache project has been
> notified of the both issue and the proposed fix; and the PMC has rejected
> neither the vulnerability report nor the proposed fix.
> - A change is a fix for a bug; and the Apache project has been
> notified of both the bug and the proposed fix; and the PMC has rejected
> neither the bug report nor the proposed fix.
> - Minor changes (e.g. alterations to the start-up and shutdown
> scripts, configuration files, file layout etc.) to integrate with the
> target platform providing the Apache project has not objected to those
> changes.
>
> The change you cite meets the 4th point, minor change, made for
> integration reasons. There is no known technical objection; this was after
> all at one point the state of Apache Spark.
>
>
> 2/ A version number must be used that both clearly differentiates it
> from an Apache Software Foundation release and clearly identifies the
> Apache Software Foundation version on which the software is based.
>
> Keep in mind the product here is not "Apache Spark", but the
> "Databricks Runtime 13.1 (including Apache Spark 3.4.0)". That is, there 
> is
> far more than a version number differentiating this product from Apache
> Spark. There is no standalone distribution of 

Re: ASF policy violation and Scala version issues

2023-06-07 Thread Grisha Weintraub
Yes, in the Spark UI you have it as "3.1.2-amazon", but when you create a
cluster it's just Spark 3.1.2.
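
For anyone who wants to check this themselves, a minimal illustrative snippet
for spark-shell (only standard public Spark API, nothing vendor-specific):

```scala
// Minimal illustrative check from spark-shell on the cluster in question.
// The reported version comes from the Spark jars themselves, so a vendor build
// can expose a suffixed string (e.g. "3.1.2-amazon" as mentioned above) even
// when the product page only says "Spark 3.1.2".
println(spark.version)                  // `spark` is the SparkSession provided by spark-shell
println(org.apache.spark.SPARK_VERSION) // same constant, compiled into spark-core
```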

On Wed, Jun 7, 2023 at 10:05 PM Nan Zhu  wrote:

>
>  for EMR, I think they show 3.1.2-amazon in Spark UI, no?
>
>
> On Wed, Jun 7, 2023 at 11:30 Grisha Weintraub 
> wrote:
>
>> Hi,
>>
>> I am not taking sides here, but just for fairness, I think it should be
>> noted that AWS EMR does exactly the same thing.
>> We choose the EMR version (e.g., 6.4.0) and it has an associated Spark
>> version (e.g., 3.1.2).
>> The Spark version here is not the original Apache version but AWS Spark
>> distribution.
>>
>> On Wed, Jun 7, 2023 at 8:24 PM Dongjoon Hyun 
>> wrote:
>>
>>> I disagree with you in several ways.
>>>
>>> The following is not a *minor* change like the given examples
>>> (alterations to the start-up and shutdown scripts, configuration files,
>>> file layout etc.).
>>>
>>> > The change you cite meets the 4th point, minor change, made for
>>> integration reasons.
>>>
>>> The following is also wrong. There is no such point of state of Apache
>>> Spark 3.4.0 after 3.4.0 tag creation. Apache Spark community didn't allow
>>> Scala reverting patches in both `master` branch and `branch-3.4`.
>>>
>>> > There is no known technical objection; this was after all at one point
>>> the state of Apache Spark.
>>>
>>> Is the following your main point? So, you are selling a box "including
>>> Harry Potter by J. K. Rolling whose main character is Barry instead of
>>> Harry", but it's okay because you didn't sell the book itself? And, as a
>>> cloud-vendor, you borrowed the box instead of selling it like private
>>> libraries?
>>>
>>> > There is no standalone distribution of Apache Spark anywhere here.
>>>
>>> We are not asking a big thing. Why are you so reluctant to say you are
>>> not "Apache Spark 3.4.0" by simply saying "Apache Spark 3.4.0-databricks".
>>> What is the marketing reason here?
>>>
>>> Dongjoon.
>>>
>>>
>>> On Wed, Jun 7, 2023 at 9:27 AM Sean Owen  wrote:
>>>
 Hi Dongjoon, I think this conversation is not advancing anymore. I
 personally consider the matter closed unless you can find other support or
 respond with more specifics. While this perhaps should be on private@,
 I think it's not wrong as an instructive discussion on dev@.

 I don't believe you've made a clear argument about the problem, or how
 it relates specifically to policy. Nevertheless I will show you my logic.

 You are asserting that a vendor cannot call a product Apache Spark
 3.4.0 if it omits a patch updating a Scala maintenance version. This
 difference has no known impact on usage, as far as I can tell.

 Let's see what policy requires:

 1/ All source code changes must meet at least one of the acceptable
 changes criteria set out below:
 - The change has accepted by the relevant Apache project community for
 inclusion in a future release. Note that the process used to accept changes
 and how that acceptance is documented varies between projects.
 - A change is a fix for an undisclosed security issue; and the fix is
 not publicly disclosed as as security fix; and the Apache project has been
 notified of the both issue and the proposed fix; and the PMC has rejected
 neither the vulnerability report nor the proposed fix.
 - A change is a fix for a bug; and the Apache project has been notified
 of both the bug and the proposed fix; and the PMC has rejected neither the
 bug report nor the proposed fix.
 - Minor changes (e.g. alterations to the start-up and shutdown scripts,
 configuration files, file layout etc.) to integrate with the target
 platform providing the Apache project has not objected to those changes.

 The change you cite meets the 4th point, minor change, made for
 integration reasons. There is no known technical objection; this was after
 all at one point the state of Apache Spark.


 2/ A version number must be used that both clearly differentiates it
 from an Apache Software Foundation release and clearly identifies the
 Apache Software Foundation version on which the software is based.

 Keep in mind the product here is not "Apache Spark", but the
 "Databricks Runtime 13.1 (including Apache Spark 3.4.0)". That is, there is
 far more than a version number differentiating this product from Apache
 Spark. There is no standalone distribution of Apache Spark anywhere here. I
 believe that easily matches the intent.


 3/ The documentation must clearly identify the Apache Software
 Foundation version on which the software is based.

 Clearly, yes.


 4/ The end user expects that the distribution channel will back-port
 fixes. It is not necessary to back-port all fixes. Selection of fixes to
 back-port must be consistent with the update policy of that distribution
 channel.

 I think this is safe 

Re: ASF policy violation and Scala version issues

2023-06-07 Thread Nan Zhu
 for EMR, I think they show 3.1.2-amazon in Spark UI, no?


On Wed, Jun 7, 2023 at 11:30 Grisha Weintraub 
wrote:

> Hi,
>
> I am not taking sides here, but just for fairness, I think it should be
> noted that AWS EMR does exactly the same thing.
> We choose the EMR version (e.g., 6.4.0) and it has an associated Spark
> version (e.g., 3.1.2).
> The Spark version here is not the original Apache version but AWS Spark
> distribution.
>
> On Wed, Jun 7, 2023 at 8:24 PM Dongjoon Hyun 
> wrote:
>
>> I disagree with you in several ways.
>>
>> The following is not a *minor* change like the given examples
>> (alterations to the start-up and shutdown scripts, configuration files,
>> file layout etc.).
>>
>> > The change you cite meets the 4th point, minor change, made for
>> integration reasons.
>>
>> The following is also wrong. There is no such point of state of Apache
>> Spark 3.4.0 after 3.4.0 tag creation. Apache Spark community didn't allow
>> Scala reverting patches in both `master` branch and `branch-3.4`.
>>
>> > There is no known technical objection; this was after all at one point
>> the state of Apache Spark.
>>
>> Is the following your main point? So, you are selling a box "including
>> Harry Potter by J. K. Rolling whose main character is Barry instead of
>> Harry", but it's okay because you didn't sell the book itself? And, as a
>> cloud-vendor, you borrowed the box instead of selling it like private
>> libraries?
>>
>> > There is no standalone distribution of Apache Spark anywhere here.
>>
>> We are not asking a big thing. Why are you so reluctant to say you are
>> not "Apache Spark 3.4.0" by simply saying "Apache Spark 3.4.0-databricks".
>> What is the marketing reason here?
>>
>> Dongjoon.
>>
>>
>> On Wed, Jun 7, 2023 at 9:27 AM Sean Owen  wrote:
>>
>>> Hi Dongjoon, I think this conversation is not advancing anymore. I
>>> personally consider the matter closed unless you can find other support or
>>> respond with more specifics. While this perhaps should be on private@,
>>> I think it's not wrong as an instructive discussion on dev@.
>>>
>>> I don't believe you've made a clear argument about the problem, or how
>>> it relates specifically to policy. Nevertheless I will show you my logic.
>>>
>>> You are asserting that a vendor cannot call a product Apache Spark 3.4.0
>>> if it omits a patch updating a Scala maintenance version. This difference
>>> has no known impact on usage, as far as I can tell.
>>>
>>> Let's see what policy requires:
>>>
>>> 1/ All source code changes must meet at least one of the acceptable
>>> changes criteria set out below:
>>> - The change has accepted by the relevant Apache project community for
>>> inclusion in a future release. Note that the process used to accept changes
>>> and how that acceptance is documented varies between projects.
>>> - A change is a fix for an undisclosed security issue; and the fix is
>>> not publicly disclosed as as security fix; and the Apache project has been
>>> notified of the both issue and the proposed fix; and the PMC has rejected
>>> neither the vulnerability report nor the proposed fix.
>>> - A change is a fix for a bug; and the Apache project has been notified
>>> of both the bug and the proposed fix; and the PMC has rejected neither the
>>> bug report nor the proposed fix.
>>> - Minor changes (e.g. alterations to the start-up and shutdown scripts,
>>> configuration files, file layout etc.) to integrate with the target
>>> platform providing the Apache project has not objected to those changes.
>>>
>>> The change you cite meets the 4th point, minor change, made for
>>> integration reasons. There is no known technical objection; this was after
>>> all at one point the state of Apache Spark.
>>>
>>>
>>> 2/ A version number must be used that both clearly differentiates it
>>> from an Apache Software Foundation release and clearly identifies the
>>> Apache Software Foundation version on which the software is based.
>>>
>>> Keep in mind the product here is not "Apache Spark", but the "Databricks
>>> Runtime 13.1 (including Apache Spark 3.4.0)". That is, there is far more
>>> than a version number differentiating this product from Apache Spark. There
>>> is no standalone distribution of Apache Spark anywhere here. I believe that
>>> easily matches the intent.
>>>
>>>
>>> 3/ The documentation must clearly identify the Apache Software
>>> Foundation version on which the software is based.
>>>
>>> Clearly, yes.
>>>
>>>
>>> 4/ The end user expects that the distribution channel will back-port
>>> fixes. It is not necessary to back-port all fixes. Selection of fixes to
>>> back-port must be consistent with the update policy of that distribution
>>> channel.
>>>
>>> I think this is safe to say too. Indeed this explicitly contemplates not
>>> back-porting a change.
>>>
>>>
>>> Backing up, you can see from this document that the spirit of it is:
>>> don't include changes in your own Apache Foo x.y that aren't wanted by the
>>> project, and still 

Re: ASF policy violation and Scala version issues

2023-06-07 Thread Grisha Weintraub
Hi,

I am not taking sides here, but just for fairness, I think it should be
noted that AWS EMR does exactly the same thing.
We choose the EMR version (e.g., 6.4.0) and it has an associated Spark
version (e.g., 3.1.2).
The Spark version here is not the original Apache version but AWS's Spark
distribution.

On Wed, Jun 7, 2023 at 8:24 PM Dongjoon Hyun 
wrote:

> I disagree with you in several ways.
>
> The following is not a *minor* change like the given examples (alterations
> to the start-up and shutdown scripts, configuration files, file layout
> etc.).
>
> > The change you cite meets the 4th point, minor change, made for
> integration reasons.
>
> The following is also wrong. There is no such point of state of Apache
> Spark 3.4.0 after 3.4.0 tag creation. Apache Spark community didn't allow
> Scala reverting patches in both `master` branch and `branch-3.4`.
>
> > There is no known technical objection; this was after all at one point
> the state of Apache Spark.
>
> Is the following your main point? So, you are selling a box "including
> Harry Potter by J. K. Rolling whose main character is Barry instead of
> Harry", but it's okay because you didn't sell the book itself? And, as a
> cloud-vendor, you borrowed the box instead of selling it like private
> libraries?
>
> > There is no standalone distribution of Apache Spark anywhere here.
>
> We are not asking a big thing. Why are you so reluctant to say you are not
> "Apache Spark 3.4.0" by simply saying "Apache Spark 3.4.0-databricks". What
> is the marketing reason here?
>
> Dongjoon.
>
>
> On Wed, Jun 7, 2023 at 9:27 AM Sean Owen  wrote:
>
>> Hi Dongjoon, I think this conversation is not advancing anymore. I
>> personally consider the matter closed unless you can find other support or
>> respond with more specifics. While this perhaps should be on private@, I
>> think it's not wrong as an instructive discussion on dev@.
>>
>> I don't believe you've made a clear argument about the problem, or how it
>> relates specifically to policy. Nevertheless I will show you my logic.
>>
>> You are asserting that a vendor cannot call a product Apache Spark 3.4.0
>> if it omits a patch updating a Scala maintenance version. This difference
>> has no known impact on usage, as far as I can tell.
>>
>> Let's see what policy requires:
>>
>> 1/ All source code changes must meet at least one of the acceptable
>> changes criteria set out below:
>> - The change has accepted by the relevant Apache project community for
>> inclusion in a future release. Note that the process used to accept changes
>> and how that acceptance is documented varies between projects.
>> - A change is a fix for an undisclosed security issue; and the fix is not
>> publicly disclosed as as security fix; and the Apache project has been
>> notified of the both issue and the proposed fix; and the PMC has rejected
>> neither the vulnerability report nor the proposed fix.
>> - A change is a fix for a bug; and the Apache project has been notified
>> of both the bug and the proposed fix; and the PMC has rejected neither the
>> bug report nor the proposed fix.
>> - Minor changes (e.g. alterations to the start-up and shutdown scripts,
>> configuration files, file layout etc.) to integrate with the target
>> platform providing the Apache project has not objected to those changes.
>>
>> The change you cite meets the 4th point, minor change, made for
>> integration reasons. There is no known technical objection; this was after
>> all at one point the state of Apache Spark.
>>
>>
>> 2/ A version number must be used that both clearly differentiates it from
>> an Apache Software Foundation release and clearly identifies the Apache
>> Software Foundation version on which the software is based.
>>
>> Keep in mind the product here is not "Apache Spark", but the "Databricks
>> Runtime 13.1 (including Apache Spark 3.4.0)". That is, there is far more
>> than a version number differentiating this product from Apache Spark. There
>> is no standalone distribution of Apache Spark anywhere here. I believe that
>> easily matches the intent.
>>
>>
>> 3/ The documentation must clearly identify the Apache Software Foundation
>> version on which the software is based.
>>
>> Clearly, yes.
>>
>>
>> 4/ The end user expects that the distribution channel will back-port
>> fixes. It is not necessary to back-port all fixes. Selection of fixes to
>> back-port must be consistent with the update policy of that distribution
>> channel.
>>
>> I think this is safe to say too. Indeed this explicitly contemplates not
>> back-porting a change.
>>
>>
>> Backing up, you can see from this document that the spirit of it is:
>> don't include changes in your own Apache Foo x.y that aren't wanted by the
>> project, and still call it Apache Foo x.y. I don't believe your case
>> matches this spirit either.
>>
>> I do think it's not crazy to suggest, hey vendor, would you call this
>> "Apache Spark + patches" or ".vendor123". But that's at best a suggestion,

Re: ASF policy violation and Scala version issues

2023-06-07 Thread Mich Talebzadeh
OK, is this the crux of the matter?

We are not asking a big thing ...

First, who are we here? Members?

In my opinion, without being overly specific, this discussion has lost its
objectivity. However, with reference to your point, I am sure a simple
vote will clarify the position in a fairer way.

For me, +1 for Sean's assertions, meaning I agree with Sean that we should
move on and close this discussion.

Respectfully

Mich Talebzadeh,
Lead Solutions Architect/Engineering Lead
Palantir Technologies Limited
London
United Kingdom


   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Wed, 7 Jun 2023 at 18:24, Dongjoon Hyun  wrote:

> I disagree with you in several ways.
>
> The following is not a *minor* change like the given examples (alterations
> to the start-up and shutdown scripts, configuration files, file layout
> etc.).
>
> > The change you cite meets the 4th point, minor change, made for
> integration reasons.
>
> The following is also wrong. There is no such point of state of Apache
> Spark 3.4.0 after 3.4.0 tag creation. Apache Spark community didn't allow
> Scala reverting patches in both `master` branch and `branch-3.4`.
>
> > There is no known technical objection; this was after all at one point
> the state of Apache Spark.
>
> Is the following your main point? So, you are selling a box "including
> Harry Potter by J. K. Rolling whose main character is Barry instead of
> Harry", but it's okay because you didn't sell the book itself? And, as a
> cloud-vendor, you borrowed the box instead of selling it like private
> libraries?
>
> > There is no standalone distribution of Apache Spark anywhere here.
>
> We are not asking a big thing. Why are you so reluctant to say you are not
> "Apache Spark 3.4.0" by simply saying "Apache Spark 3.4.0-databricks". What
> is the marketing reason here?
>
> Dongjoon.
>
>
> On Wed, Jun 7, 2023 at 9:27 AM Sean Owen  wrote:
>
>> Hi Dongjoon, I think this conversation is not advancing anymore. I
>> personally consider the matter closed unless you can find other support or
>> respond with more specifics. While this perhaps should be on private@, I
>> think it's not wrong as an instructive discussion on dev@.
>>
>> I don't believe you've made a clear argument about the problem, or how it
>> relates specifically to policy. Nevertheless I will show you my logic.
>>
>> You are asserting that a vendor cannot call a product Apache Spark 3.4.0
>> if it omits a patch updating a Scala maintenance version. This difference
>> has no known impact on usage, as far as I can tell.
>>
>> Let's see what policy requires:
>>
>> 1/ All source code changes must meet at least one of the acceptable
>> changes criteria set out below:
>> - The change has accepted by the relevant Apache project community for
>> inclusion in a future release. Note that the process used to accept changes
>> and how that acceptance is documented varies between projects.
>> - A change is a fix for an undisclosed security issue; and the fix is not
>> publicly disclosed as as security fix; and the Apache project has been
>> notified of the both issue and the proposed fix; and the PMC has rejected
>> neither the vulnerability report nor the proposed fix.
>> - A change is a fix for a bug; and the Apache project has been notified
>> of both the bug and the proposed fix; and the PMC has rejected neither the
>> bug report nor the proposed fix.
>> - Minor changes (e.g. alterations to the start-up and shutdown scripts,
>> configuration files, file layout etc.) to integrate with the target
>> platform providing the Apache project has not objected to those changes.
>>
>> The change you cite meets the 4th point, minor change, made for
>> integration reasons. There is no known technical objection; this was after
>> all at one point the state of Apache Spark.
>>
>>
>> 2/ A version number must be used that both clearly differentiates it from
>> an Apache Software Foundation release and clearly identifies the Apache
>> Software Foundation version on which the software is based.
>>
>> Keep in mind the product here is not "Apache Spark", but the "Databricks
>> Runtime 13.1 (including Apache Spark 3.4.0)". That is, there is far more
>> than a version number differentiating this product from Apache Spark. There
>> is no standalone distribution of Apache Spark anywhere here. I believe that
>> easily matches the intent.
>>
>>
>> 3/ The documentation must clearly identify the Apache Software Foundation
>> version on which the software is based.
>>
>> Clearly, yes.
>>
>>
>> 4/ The end user expects that the distribution channel will 

Re: ASF policy violation and Scala version issues

2023-06-07 Thread Dongjoon Hyun
I disagree with you in several ways.

The following is not a *minor* change like the given examples (alterations
to the start-up and shutdown scripts, configuration files, file layout
etc.).

> The change you cite meets the 4th point, minor change, made for
integration reasons.

The following is also wrong. There was no such state of Apache Spark 3.4.0
after the 3.4.0 tag was created. The Apache Spark community didn't allow the
Scala-reverting patches in either the `master` branch or `branch-3.4`.

> There is no known technical objection; this was after all at one point
the state of Apache Spark.

Is the following your main point? So, you are selling a box "including
Harry Potter by J. K. Rowling, whose main character is Barry instead of
Harry", but it's okay because you didn't sell the book itself? And, as a
cloud vendor, you lent the box out instead of selling it, like a private
library?

> There is no standalone distribution of Apache Spark anywhere here.

We are not asking for a big thing. Why are you so reluctant to say that you are
not "Apache Spark 3.4.0", by simply saying "Apache Spark 3.4.0-databricks"? What
is the marketing reason here?

Dongjoon.


On Wed, Jun 7, 2023 at 9:27 AM Sean Owen  wrote:

> Hi Dongjoon, I think this conversation is not advancing anymore. I
> personally consider the matter closed unless you can find other support or
> respond with more specifics. While this perhaps should be on private@, I
> think it's not wrong as an instructive discussion on dev@.
>
> I don't believe you've made a clear argument about the problem, or how it
> relates specifically to policy. Nevertheless I will show you my logic.
>
> You are asserting that a vendor cannot call a product Apache Spark 3.4.0
> if it omits a patch updating a Scala maintenance version. This difference
> has no known impact on usage, as far as I can tell.
>
> Let's see what policy requires:
>
> 1/ All source code changes must meet at least one of the acceptable
> changes criteria set out below:
> - The change has accepted by the relevant Apache project community for
> inclusion in a future release. Note that the process used to accept changes
> and how that acceptance is documented varies between projects.
> - A change is a fix for an undisclosed security issue; and the fix is not
> publicly disclosed as as security fix; and the Apache project has been
> notified of the both issue and the proposed fix; and the PMC has rejected
> neither the vulnerability report nor the proposed fix.
> - A change is a fix for a bug; and the Apache project has been notified of
> both the bug and the proposed fix; and the PMC has rejected neither the bug
> report nor the proposed fix.
> - Minor changes (e.g. alterations to the start-up and shutdown scripts,
> configuration files, file layout etc.) to integrate with the target
> platform providing the Apache project has not objected to those changes.
>
> The change you cite meets the 4th point, minor change, made for
> integration reasons. There is no known technical objection; this was after
> all at one point the state of Apache Spark.
>
>
> 2/ A version number must be used that both clearly differentiates it from
> an Apache Software Foundation release and clearly identifies the Apache
> Software Foundation version on which the software is based.
>
> Keep in mind the product here is not "Apache Spark", but the "Databricks
> Runtime 13.1 (including Apache Spark 3.4.0)". That is, there is far more
> than a version number differentiating this product from Apache Spark. There
> is no standalone distribution of Apache Spark anywhere here. I believe that
> easily matches the intent.
>
>
> 3/ The documentation must clearly identify the Apache Software Foundation
> version on which the software is based.
>
> Clearly, yes.
>
>
> 4/ The end user expects that the distribution channel will back-port
> fixes. It is not necessary to back-port all fixes. Selection of fixes to
> back-port must be consistent with the update policy of that distribution
> channel.
>
> I think this is safe to say too. Indeed this explicitly contemplates not
> back-porting a change.
>
>
> Backing up, you can see from this document that the spirit of it is: don't
> include changes in your own Apache Foo x.y that aren't wanted by the
> project, and still call it Apache Foo x.y. I don't believe your case
> matches this spirit either.
>
> I do think it's not crazy to suggest, hey vendor, would you call this
> "Apache Spark + patches" or ".vendor123". But that's at best a suggestion,
> and I think it does nothing in particular for users. You've made the
> suggestion, and I do not see that some police action from the PMC must follow.
>
>
> I think you're simply objecting to a vendor choice, but that is not
> on-topic here unless you can specifically rebut the reasoning above and
> show it's connected.
>
>
> On Wed, Jun 7, 2023 at 11:02 AM Dongjoon Hyun  wrote:
>
>> Sean, it seems that you are confused here. We are not talking about your
>> upper system (the 

Re: ASF policy violation and Scala version issues

2023-06-07 Thread Sean Owen
Hi Dongjoon, I think this conversation is not advancing anymore. I
personally consider the matter closed unless you can find other support or
respond with more specifics. While this perhaps should be on private@, I
think it's not wrong as an instructive discussion on dev@.

I don't believe you've made a clear argument about the problem, or how it
relates specifically to policy. Nevertheless I will show you my logic.

You are asserting that a vendor cannot call a product Apache Spark 3.4.0 if
it omits a patch updating a Scala maintenance version. This difference has
no known impact on usage, as far as I can tell.

Let's see what policy requires:

1/ All source code changes must meet at least one of the acceptable changes
criteria set out below:
- The change has been accepted by the relevant Apache project community for
inclusion in a future release. Note that the process used to accept changes
and how that acceptance is documented varies between projects.
- A change is a fix for an undisclosed security issue; and the fix is not
publicly disclosed as a security fix; and the Apache project has been
notified of both the issue and the proposed fix; and the PMC has rejected
neither the vulnerability report nor the proposed fix.
- A change is a fix for a bug; and the Apache project has been notified of
both the bug and the proposed fix; and the PMC has rejected neither the bug
report nor the proposed fix.
- Minor changes (e.g. alterations to the start-up and shutdown scripts,
configuration files, file layout etc.) to integrate with the target
platform providing the Apache project has not objected to those changes.

The change you cite meets the 4th point, minor change, made for integration
reasons. There is no known technical objection; this was after all at one
point the state of Apache Spark.


2/ A version number must be used that both clearly differentiates it from
an Apache Software Foundation release and clearly identifies the Apache
Software Foundation version on which the software is based.

Keep in mind the product here is not "Apache Spark", but the "Databricks
Runtime 13.1 (including Apache Spark 3.4.0)". That is, there is far more
than a version number differentiating this product from Apache Spark. There
is no standalone distribution of Apache Spark anywhere here. I believe that
easily matches the intent.


3/ The documentation must clearly identify the Apache Software Foundation
version on which the software is based.

Clearly, yes.


4/ The end user expects that the distribution channel will back-port fixes.
It is not necessary to back-port all fixes. Selection of fixes to back-port
must be consistent with the update policy of that distribution channel.

I think this is safe to say too. Indeed this explicitly contemplates not
back-porting a change.


Backing up, you can see from this document that the spirit of it is: don't
include changes in your own Apache Foo x.y that aren't wanted by the
project, and still call it Apache Foo x.y. I don't believe your case
matches this spirit either.

I do think it's not crazy to suggest, hey vendor, would you call this
"Apache Spark + patches" or ".vendor123". But that's at best a suggestion,
and I think it does nothing in particular for users. You've made the
suggestion, and I do not see that some police action from the PMC must follow.


I think you're simply objecting to a vendor choice, but that is not
on-topic here unless you can specifically rebut the reasoning above and
show it's connected.


On Wed, Jun 7, 2023 at 11:02 AM Dongjoon Hyun  wrote:

> Sean, it seems that you are confused here. We are not talking about your
> upper system (the notebook environment). We are talking about the
> submodule, "Apache Spark 3.4.0-databricks". Whatever you call it, both of
> us knows "Apache Spark 3.4.0-databricks" is different from "Apache Spark
> 3.4.0". You should not use "3.4.0" at your subsystem.
>
> > This also is aimed at distributions of "Apache Foo", not products that
> > "include Apache Foo", which are clearly not Apache Foo.
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: ASF policy violation and Scala version issues

2023-06-07 Thread Dongjoon Hyun
Sean, it seems that you are confused here. We are not talking about your upper 
system (the notebook environment). We are talking about the submodule, "Apache 
Spark 3.4.0-databricks". Whatever you call it, both of us knows "Apache Spark 
3.4.0-databricks" is different from "Apache Spark 3.4.0". You should not use 
"3.4.0" at your subsystem.

> This also is aimed at distributions of "Apache Foo", not products that
> "include Apache Foo", which are clearly not Apache Foo.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: ASF policy violation and Scala version issues

2023-06-07 Thread Sean Owen
(With consent, shall we move this to the PMC list?)

No, I don't think that's what this policy says.

First, could you please be more specific here? Why do you think a certain
release is at odds with this?
Because so far you've mentioned, I think, not taking a Scala maintenance
release update.

But this says things like:

The source code on which the software is based must either be identical to
an Apache Software Foundation source code release or all of the following
must also be true:
  ...
  - The end user expects that the distribution channel will back-port
fixes. It is not necessary to back-port all fixes. Selection of fixes to
back-port must be consistent with the update policy of that distribution
channel.

That describes what you're talking about.

This also is aimed at distributions of "Apache Foo", not products that
"include Apache Foo", which are clearly not Apache Foo.
The spirit of it is, more generally: don't keep new features and fixes to
yourself. That does not seem to apply here.

On Tue, Jun 6, 2023 at 11:34 PM Dongjoon Hyun 
wrote:

> Hi, All and Matei (as the Chair of Spark PMC).
>
> For the ASF policy violation part, here is a legal recommendation
> documentation (draft) from `legal-discuss@`.
>
> https://www.apache.org/foundation/marks/downstream.html#source
>
> > A version number must be used that both clearly differentiates it from
> an Apache Software Foundation release and clearly identifies the Apache
> Software Foundation version on which the software is based.
>
> In short, Databricks should not claim its product like "Apache Spark
> 3.4.0". The version number should clearly differentiate it from Apache
> Spark 3.4.0. I hope we can conclude this together in this way and move our
> focus forward to the other remaining issues.
>
> To Matei, could you do the legal follow-up officially with Databricks with
> the above info?
>
> If there is a person to do this, I believe you are the best person to
> drive this.
>
> Thank you in advance.
>
> Dongjoon.
>
>
> On Tue, Jun 6, 2023 at 2:49 PM Dongjoon Hyun  wrote:
>
>> It goes to "legal-discuss@".
>>
>> https://lists.apache.org/thread/mzhggd0rpz8t4d7vdsbhkp38mvd3lty4
>>
>> I hope we can conclude the legal part clearly and shortly in one way or
>> another which we will follow with confidence.
>>
>> Dongjoon
>>
>> On 2023/06/06 20:06:42 Dongjoon Hyun wrote:
>> > Thank you, Sean, Mich, Holden, again.
>> >
>> > For this specific part, let's ask the ASF board via bo...@apache.org to
>> > find a right answer because it's a controversial legal issue here.
>> >
>> > > I think you'd just prefer Databricks make a different choice, which is
>> > legitimate, but, an issue to take up with Databricks, not here.
>> >
>> > Dongjoon.
>> >
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Re: ASF policy violation and Scala version issues

2023-06-06 Thread Dongjoon Hyun
Hi, All and Matei (as the Chair of Spark PMC).

For the ASF policy violation part, here is a legal recommendation
documentation (draft) from `legal-discuss@`.

https://www.apache.org/foundation/marks/downstream.html#source

> A version number must be used that both clearly differentiates it from an
Apache Software Foundation release and clearly identifies the Apache
Software Foundation version on which the software is based.

In short, Databricks should not claim its product like "Apache Spark
3.4.0". The version number should clearly differentiate it from Apache
Spark 3.4.0. I hope we can conclude this together in this way and move our
focus forward to the other remaining issues.

To Matei, could you do the legal follow-up officially with Databricks with
the above info?

If there is a person to do this, I believe you are the best person to drive
this.

Thank you in advance.

Dongjoon.


On Tue, Jun 6, 2023 at 2:49 PM Dongjoon Hyun  wrote:

> It goes to "legal-discuss@".
>
> https://lists.apache.org/thread/mzhggd0rpz8t4d7vdsbhkp38mvd3lty4
>
> I hope we can conclude the legal part clearly and shortly in one way or
> another which we will follow with confidence.
>
> Dongjoon
>
> On 2023/06/06 20:06:42 Dongjoon Hyun wrote:
> > Thank you, Sean, Mich, Holden, again.
> >
> > For this specific part, let's ask the ASF board via bo...@apache.org to
> > find a right answer because it's a controversial legal issue here.
> >
> > > I think you'd just prefer Databricks make a different choice, which is
> > legitimate, but, an issue to take up with Databricks, not here.
> >
> > Dongjoon.
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: ASF policy violation and Scala version issues

2023-06-06 Thread Mich Talebzadeh
Hello,

This explanation is splendidly detailed and deserves careful consideration.
However, as a first thought with regard to the point raised below, I
quote:

"... There is a company claiming something non-Apache like "Apache Spark
3.4.0 minus SPARK-40436" with the name "Apache Spark 3.4.0."

There is a potential risk for consumers of the product on offer, which can
be summarised as below:
To maintain the integrity of the Apache Spark project and ensure reliable
and secure software, it is a common practice to use official releases from
the ASF. If a third-party company is claiming to provide a modified version
of Apache Spark (in the form of software as a service), it is strongly
recommended "for consumers" to carefully review the modifications
involved, understand the reasoning behind these modifications and/or
omissions, and evaluate the potential implications before using and
maintaining this offering in production environments. The third-party
company has to clearly state and advertise the reasoning behind this
so-called hacking, specifically with reference to
 "### 3. Action Items ### --We should communicate and help the company to
fix the misleading messages and remove Scala-version segmentation
situations per Spark version".

HTH

Mich Talebzadeh,
Lead Solutions Architect/Engineering Lead
Palantir Technologies Limited
London
United Kingdom


   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Mon, 5 Jun 2023 at 08:46, Dongjoon Hyun  wrote:

> Hi, All and Matei (as the Chair of Apache Spark PMC).
>
> Sorry for a long email, I want to share two topics and corresponding
> action items.
> You can go to "Section 3: Action Items" directly for the conclusion.
>
>
> ### 1. ASF Policy Violation ###
>
> ASF has a rule for "MAY I CALL MY MODIFIED CODE 'APACHE'?"
>
> https://www.apache.org/foundation/license-faq.html#Name-changes
>
> For example, when we call `Apache Spark 3.4.0`, it's supposed to be the
> same as one of our official distributions.
>
> https://downloads.apache.org/spark/spark-3.4.0/
>
> Specifically, in terms of the Scala version, we believe it should have
> Scala 2.12.17 because of 'SPARK-40436 Upgrade Scala to 2.12.17'.
>
> There is a company claiming something non-Apache like "Apache Spark 3.4.0
> minus SPARK-40436" with the name "Apache Spark 3.4.0."
>
> - The company website shows "X.Y (includes Apache Spark 3.4.0, Scala
> 2.12)"
> - The runtime logs "23/06/05 04:23:27 INFO SparkContext: Running Spark
> version 3.4.0"
> - UI shows Apache Spark logo and `3.4.0`.
> - However, Scala Version is '2.12.15'
>
> [image: Screenshot 2023-06-04 at 9.37.16 PM.png][image: Screenshot
> 2023-06-04 at 10.14.45 PM.png]
>
> Lastly, this is not a single instance. For example, the same company also
> claims "Apache Spark 3.3.2" with a mismatched Scala version.
>
>
> ### 2. Scala Issues ###
>
> In addition to (1), although we proceeded with good intentions and great
> care
> including dev mailing list discussion, there are several concerning areas
> which
> need more attention and our love.
>
> a) Scala Spark users will experience UX inconvenience from Spark 3.5.
>
> SPARK-42493 Make Python the first tab for code examples
>
> For the record, we discussed it here.
> - https://lists.apache.org/thread/1p8s09ysrh4jqsfd47qdtrl7rm4rrs05
>   "[DISCUSS] Show Python code examples first in Spark documentation"
>
> b) Scala version upgrade is blocked by the Ammonite library dev cycle
> currently.
>
> Although we discussed it here and it had good intentions,
> the current master branch cannot use the latest Scala.
>
> - https://lists.apache.org/thread/4nk5ddtmlobdt8g3z8xbqjclzkhlsdfk
> "Ammonite as REPL for Spark Connect"
>  SPARK-42884 Add Ammonite REPL integration
>
> Specifically, the following are blocked and I'm monitoring the
> Ammonite repository.
> - SPARK-40497 Upgrade Scala to 2.13.11
> - SPARK-43832 Upgrade Scala to 2.12.18
> - According to https://github.com/com-lihaoyi/Ammonite/issues ,
>   Scala 3.3.0 LTS support also looks infeasible.
>
> Although we may be able to wait for a while, there are two fundamental
> solutions
> to unblock this situation from a long-term maintenance perspective.
> - Replace it with a Scala-shell based implementation
> - Move `connector/connect/client/jvm/pom.xml` outside from Spark repo.
>Maybe, we can put it into the new repo like Rust and Go client.
>
> c) Scala 2.13 and above needs Apache Spark 4.0.
>
> In "Apache Spark 3.5.0 Expectations?" and 

Re: ASF policy violation and Scala version issues

2023-06-06 Thread Dongjoon Hyun
It goes to "legal-discuss@".

https://lists.apache.org/thread/mzhggd0rpz8t4d7vdsbhkp38mvd3lty4

I hope we can conclude the legal part clearly and shortly in one way or another 
which we will follow with confidence.

Dongjoon

On 2023/06/06 20:06:42 Dongjoon Hyun wrote:
> Thank you, Sean, Mich, Holden, again.
> 
> For this specific part, let's ask the ASF board via bo...@apache.org to
> find a right answer because it's a controversial legal issue here.
> 
> > I think you'd just prefer Databricks make a different choice, which is
> legitimate, but, an issue to take up with Databricks, not here.
> 
> Dongjoon.
> 

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: ASF policy violation and Scala version issues

2023-06-06 Thread Dongjoon Hyun
Thank you, Sean, Mich, Holden, again.

For this specific part, let's ask the ASF board via bo...@apache.org to
find a right answer because it's a controversial legal issue here.

> I think you'd just prefer Databricks make a different choice, which is
legitimate, but, an issue to take up with Databricks, not here.

Dongjoon.


Re: ASF policy violation and Scala version issues

2023-06-06 Thread Holden Karau
So I think if the Spark PMC wants to ask Databricks something, that could be
reasonable (although I'm a little fuzzy as to the ask), but that
conversation might belong on private@ (I could be wrong, of course).

On Tue, Jun 6, 2023 at 3:29 AM Mich Talebzadeh 
wrote:

> I concur with you Sean.
>
> If I understand correctly the point raised by the thread owner, in the
> heterogeneous environments in which we work, it is up to the practitioner to
> ensure that there is version compatibility among OS versions, the Spark
> version, and the target artefact under consideration. For example, if I try
> to connect to Google BigQuery from Spark 3.4.0, my OS, or for that matter
> the Docker container, needs to run Java 8 regardless of Spark's Java version;
> otherwise it will fail.
>
> I think these details should be left to the trenches, because these
> arguments about versioning become tangential in the big picture. Case in
> point, my current OS Scala version is 2.13.8 but it works fine with Spark
> built on 2.12.17.
>
> HTH
>
> Mich Talebzadeh,
> Lead Solutions Architect/Engineering Lead
> Palantir Technologies Limited
> London
> United Kingdom
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Tue, 6 Jun 2023 at 01:37, Sean Owen  wrote:
>
>> I think the issue is whether a distribution of Spark is so materially
>> different from OSS that it causes problems for the larger community of
>> users. There's a legitimate question of whether such a thing can be called
>> "Apache Spark + changes", as describing it that way becomes meaningfully
>> inaccurate. And if it's inaccurate, then it's a trademark usage issue, and
>> a matter for the PMC to act on. I certainly recall this type of problem
>> from the early days of Hadoop - the project itself had 2 or 3 live branches
>> in development (was it 0.20.x vs 0.23.x vs 1.x? YARN vs no YARN?) picked up
>> by different vendors and it was unclear what "Apache Hadoop" meant in a
>> vendor distro. Or frankly, upstream.
>>
>> In comparison, variation in Scala maintenance release seems trivial. I'm
>> not clear from the thread what actual issue this causes to users. Is there
>> more to it - does this go hand in hand with JDK version and Ammonite, or
> are those separate? What's an example of the practical user issue? Like, I
>> compile vs Spark 3.4.0 and because of Scala version differences it doesn't
>> run on some vendor distro? That's not great, but seems like a vendor
>> problem. Unless you tell me we are getting tons of bug reports to OSS Spark
>> as a result or something.
>>
>> Is the implication that something in OSS Spark is being blocked to prefer
>> some set of vendor choices? because the changes you're pointing to seem to
>> be going into Apache Spark, actually. It'd be more useful to be specific
>> and name names at this point, seems fine.
>>
>> The rest of this is just a discussion about Databricks choices. (If it's
>> not clear, I'm at Databricks but do not work on the Spark distro). We can
>> discuss but it seems off-topic _if_ it can't be connected to a problem for
>> OSS Spark. Anyway:
>>
>> If it helps, _some_ important patches are described at
>> https://docs.databricks.com/release-notes/runtime/maintenance-updates.html
>> ; I don't think this is exactly hidden.
>>
>> Out of curiosity, how would you describe this software in the UI instead?
>> "3.4.0" is shorthand, because this is a little dropdown menu; the terminal
>> output is likewise not a place to list all patches. You would propose
>> requesting calling this "3.4.0 + patches"? That's the best I can think of,
>> but I don't think it addresses what you're getting at anyway. I think you'd
>> just prefer Databricks make a different choice, which is legitimate, but,
>> an issue to take up with Databricks, not here.
>>
>>
>> On Mon, Jun 5, 2023 at 6:58 PM Dongjoon Hyun 
>> wrote:
>>
>>> Hi, Sean.
>>>
>>> "+ patches" or "powered by Apache Spark 3.4.0" is not a problem as you
>>> mentioned. For the record, I also didn't bring up any old story here.
>>>
>>> > "Apache Spark 3.4.0 + patches"
>>>
>>> However, "including Apache Spark 3.4.0" still causes confusion even in a
>>> different way because of those missing patches, SPARK-40436 (Upgrade Scala
>>> to 2.12.17) and SPARK-39414 (Upgrade Scala to 2.12.16). Technically,
>>> Databricks Runtime doesn't include Apache Spark 3.4.0 while it claims it to
>>> the users.
>>>
>>> [image: image.png]
>>>
>>> It's a sad story from the Apache Spark Scala perspective because the
>>> users cannot even try to use the correct Scala 2.12.17 version in the
>>> runtime.

Re: ASF policy violation and Scala version issues

2023-06-06 Thread Mich Talebzadeh
I concur with you Sean.

If I understand correctly the point raised by the thread owner, in the
heterogeneous environments in which we work, it is up to the practitioner to
ensure that there is version compatibility among OS versions, the Spark
version, and the target artefact under consideration. For example, if I try
to connect to Google BigQuery from Spark 3.4.0, my OS, or for that matter
the Docker container, needs to run Java 8 regardless of Spark's Java version;
otherwise it will fail.

I think these details should be left to the trenches, because these
arguments about versioning become tangential in the big picture. Case in
point, my current OS Scala version is 2.13.8 but it works fine with Spark
built on 2.12.17.

HTH

Mich Talebzadeh,
Lead Solutions Architect/Engineering Lead
Palantir Technologies Limited
London
United Kingdom


   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Tue, 6 Jun 2023 at 01:37, Sean Owen  wrote:

> I think the issue is whether a distribution of Spark is so materially
> different from OSS that it causes problems for the larger community of
> users. There's a legitimate question of whether such a thing can be called
> "Apache Spark + changes", as describing it that way becomes meaningfully
> inaccurate. And if it's inaccurate, then it's a trademark usage issue, and
> a matter for the PMC to act on. I certainly recall this type of problem
> from the early days of Hadoop - the project itself had 2 or 3 live branches
> in development (was it 0.20.x vs 0.23.x vs 1.x? YARN vs no YARN?) picked up
> by different vendors and it was unclear what "Apache Hadoop" meant in a
> vendor distro. Or frankly, upstream.
>
> In comparison, variation in Scala maintenance release seems trivial. I'm
> not clear from the thread what actual issue this causes to users. Is there
> more to it - does this go hand in hand with JDK version and Ammonite, or
> are those separate? What's an example of the practical user issue? Like, I
> compile vs Spark 3.4.0 and because of Scala version differences it doesn't
> run on some vendor distro? That's not great, but seems like a vendor
> problem. Unless you tell me we are getting tons of bug reports to OSS Spark
> as a result or something.
>
> Is the implication that something in OSS Spark is being blocked to prefer
> some set of vendor choices? because the changes you're pointing to seem to
> be going into Apache Spark, actually. It'd be more useful to be specific
> and name names at this point, seems fine.
>
> The rest of this is just a discussion about Databricks choices. (If it's
> not clear, I'm at Databricks but do not work on the Spark distro). We can
> discuss but it seems off-topic _if_ it can't be connected to a problem for
> OSS Spark. Anyway:
>
> If it helps, _some_ important patches are described at
> https://docs.databricks.com/release-notes/runtime/maintenance-updates.html
> ; I don't think this is exactly hidden.
>
> Out of curiosity, how would you describe this software in the UI instead?
> "3.4.0" is shorthand, because this is a little dropdown menu; the terminal
> output is likewise not a place to list all patches. You would propose
> requesting calling this "3.4.0 + patches"? That's the best I can think of,
> but I don't think it addresses what you're getting at anyway. I think you'd
> just prefer Databricks make a different choice, which is legitimate, but,
> an issue to take up with Databricks, not here.
>
>
> On Mon, Jun 5, 2023 at 6:58 PM Dongjoon Hyun 
> wrote:
>
>> Hi, Sean.
>>
>> "+ patches" or "powered by Apache Spark 3.4.0" is not a problem as you
>> mentioned. For the record, I also didn't bring up any old story here.
>>
>> > "Apache Spark 3.4.0 + patches"
>>
>> However, "including Apache Spark 3.4.0" still causes confusion even in a
>> different way because of those missing patches, SPARK-40436 (Upgrade Scala
>> to 2.12.17) and SPARK-39414 (Upgrade Scala to 2.12.16). Technically,
>> Databricks Runtime doesn't include Apache Spark 3.4.0 while it claims it to
>> the users.
>>
>> [image: image.png]
>>
>> It's a sad story from the Apache Spark Scala perspective because the
>> users cannot even try to use the correct Scala 2.12.17 version in the
>> runtime.
>>
>> All items I've shared are connected via a single theme, hurting Apache
>> Spark Scala users.
>> From (1) building Spark, (2) creating a fragmented Scala Spark runtime
>> environment and (3) hidden user-facing documentation.
>>
>> Of course, I don't think those are designed in an organized way
>> intentionally. It just happens at the same time.
>>
>> Based on your comments, let me ask you two 

Re: ASF policy violation and Scala version issues

2023-06-05 Thread Sean Owen
I think the issue is whether a distribution of Spark is so materially
different from OSS that it causes problems for the larger community of
users. There's a legitimate question of whether such a thing can be called
"Apache Spark + changes", as describing it that way becomes meaningfully
inaccurate. And if it's inaccurate, then it's a trademark usage issue, and
a matter for the PMC to act on. I certainly recall this type of problem
from the early days of Hadoop - the project itself had 2 or 3 live branches
in development (was it 0.20.x vs 0.23.x vs 1.x? YARN vs no YARN?) picked up
by different vendors and it was unclear what "Apache Hadoop" meant in a
vendor distro. Or frankly, upstream.

In comparison, variation in Scala maintenance release seems trivial. I'm
not clear from the thread what actual issue this causes to users. Is there
more to it - does this go hand in hand with JDK version and Ammonite, or
are those separate? What's an example of the practical user issue? Like, I
compile vs Spark 3.4.0 and because of Scala version differences it doesn't
run on some vendor distro? That's not great, but seems like a vendor
problem. Unless you tell me we are getting tons of bug reports to OSS Spark
as a result or something.

Is the implication that something in OSS Spark is being blocked to prefer
some set of vendor choices? because the changes you're pointing to seem to
be going into Apache Spark, actually. It'd be more useful to be specific
and name names at this point, seems fine.

The rest of this is just a discussion about Databricks choices. (If it's
not clear, I'm at Databricks but do not work on the Spark distro). We can
discuss but it seems off-topic _if_ it can't be connected to a problem for
OSS Spark. Anyway:

If it helps, _some_ important patches are described at
https://docs.databricks.com/release-notes/runtime/maintenance-updates.html
; I don't think this is exactly hidden.

Out of curiosity, how would you describe this software in the UI instead?
"3.4.0" is shorthand, because this is a little dropdown menu; the terminal
output is likewise not a place to list all patches. You would propose
requesting calling this "3.4.0 + patches"? That's the best I can think of,
but I don't think it addresses what you're getting at anyway. I think you'd
just prefer Databricks make a different choice, which is legitimate, but,
an issue to take up with Databricks, not here.


On Mon, Jun 5, 2023 at 6:58 PM Dongjoon Hyun 
wrote:

> Hi, Sean.
>
> "+ patches" or "powered by Apache Spark 3.4.0" is not a problem as you
> mentioned. For the record, I also didn't bring up any old story here.
>
> > "Apache Spark 3.4.0 + patches"
>
> However, "including Apache Spark 3.4.0" still causes confusion even in a
> different way because of those missing patches, SPARK-40436 (Upgrade Scala
> to 2.12.17) and SPARK-39414 (Upgrade Scala to 2.12.16). Technically,
> Databricks Runtime doesn't include Apache Spark 3.4.0 while it claims it to
> the users.
>
> [image: image.png]
>
> It's a sad story from the Apache Spark Scala perspective because the users
> cannot even try to use the correct Scala 2.12.17 version in the runtime.
>
> All items I've shared are connected via a single theme, hurting Apache
> Spark Scala users.
> From (1) building Spark, (2) creating a fragmented Scala Spark runtime
> environment and (3) hidden user-facing documentation.
>
> Of course, I don't think those are designed in an organized way
> intentionally. It just happens at the same time.
>
> Based on your comments, let me ask you two questions. (1) When Databricks
> builds its internal Spark from its private code repository, is it a company
> policy to always expose "Apache 3.4.0" to the users like the following by
> ignoring all changes (whatever they are). And, (2) Do you insist that it is
> normative and clear to the users and the community?
>
> > - The runtime logs "23/06/05 04:23:27 INFO SparkContext: Running Spark
> version 3.4.0"
> > - UI shows Apache Spark logo and `3.4.0`.
>
>>
>>


Re: ASF policy violation and Scala version issues

2023-06-05 Thread Dongjoon Hyun
Hi, Sean.

"+ patches" or "powered by Apache Spark 3.4.0" is not a problem as you
mentioned. For the record, I also didn't bring up any old story here.

> "Apache Spark 3.4.0 + patches"

However, "including Apache Spark 3.4.0" still causes confusion even in a
different way because of those missing patches, SPARK-40436 (Upgrade Scala
to 2.12.17) and SPARK-39414 (Upgrade Scala to 2.12.16). Technically,
Databricks Runtime doesn't include Apache Spark 3.4.0 while it claims it to
the users.

[image: image.png]

It's a sad story from the Apache Spark Scala perspective because the users
cannot even try to use the correct Scala 2.12.17 version in the runtime.

All items I've shared are connected via a single theme, hurting Apache
Spark Scala users.
From (1) building Spark, (2) creating a fragmented Scala Spark runtime
environment and (3) hidden user-facing documentation.

Of course, I don't think those are designed in an organized way
intentionally. It just happens at the same time.

Based on your comments, let me ask you two questions. (1) When Databricks
builds its internal Spark from its private code repository, is it a company
policy to always expose "Apache 3.4.0" to the users like the following by
ignoring all changes (whatever they are). And, (2) Do you insist that it is
normative and clear to the users and the community?

> - The runtime logs "23/06/05 04:23:27 INFO SparkContext: Running Spark
version 3.4.0"
> - UI shows Apache Spark logo and `3.4.0`.


Dongjoon.


On Mon, Jun 5, 2023 at 10:40 AM Sean Owen  wrote:

> On Mon, Jun 5, 2023 at 12:01 PM Dongjoon Hyun 
> wrote:
>
>> 1. For the naming, yes, but the company should use different version
>> numbers instead of the exact "3.4.0". As I shared the screenshot in my
>> previous email, the company exposes "Apache Spark 3.4.0" exactly because
>> they build their distribution without changing their version number at all.
>>
>
> I don't believe this is supported by guidance on the underlying issue
> here, which is trademark. There is nothing wrong with nominative use, and I
> think that's what this is. A thing can be "Apache Spark 3.4.0 + patches"
> and be described that way.
> Calling it "Apache Spark 3.4.0.vendor123" is argubaly more confusing IMHO,
> as there is no such Apache Spark version.
>
>
>
>> 2. According to
>> https://mvnrepository.com/artifact/org.apache.spark/spark-core,
>> all the other companies followed "Semantic Versioning" or added
>> additional version numbers to their distributions, didn't they? AFAIK,
>> nobody claims to take over the exact "3.4.0" version string at the source
>> code level like this company.
>>
>
> Here you're talking about software artifact numbering, for companies that
> were also releasing their own maintenance branch of OSS. That pretty much
> requires some sub-versioning scheme. I think that's fine too, although as
> above I think this is arguably _worse_ w.r.t. reuse of the Apache name and
> namespace.
> I'm not aware of any policy on this, and don't find this problematic
> myself. Doesn't mean it's right, but does mean implicitly this has never
> before been viewed as an issue?
>
> The one I'm aware of was releasing a product "including Apache Spark 2.0"
> before it existed, which does seem to potentially cause confusion, and that
> was addressed.
>
> Can you describe what policy is violated? we can disagree about what we'd
> prefer or not, but the question is, what if anything is disallowed? I'm not
> seeing that.
>
>
>> 3. This company not only causes the 'Scala Version Segmentation'
>> environment in a subtle way, but also defames Apache Spark 3.4.0 by
>> removing many bug fixes of SPARK-40436 (Upgrade Scala to 2.12.17) and
>> SPARK-39414 (Upgrade Scala to 2.12.16) for some unknown reason. Apparently,
> this does not look like a superior version of Apache Spark 3.4.0. For me, it's
>> the inferior version. If a company disagrees with Scala 2.12.17 for some
>> internal reason, they are able to stick to 2.12.15, of course. However,
>> Apache Spark PMC should not allow them to lie to the customers that "Apache
>> Spark 3.4.0" uses Scala 2.12.15 by default. That's the reason why I
>> initiated this email because I'm considering this as a serious blocker to
>> make Apache Spark Scala improvement.
>> - https://github.com/scala/scala/releases/tag/v2.12.17 (21 Merged
>> PRs)
>> - https://github.com/scala/scala/releases/tag/v2.12.16 (68 Merged
>> PRs)
>>
>
> To be clear, this seems unrelated to your first two points above?
>
> I'm having trouble following what you are arguing here. You are saying a
> vendor release based on "Apache Spark 3.4.0" is not the same in some
> material way that you don't like. That's a fine position to take, but I
> think the product is still substantially describable as "Apache Spark
> 3.4.0 + patches". You can take up the issue with the vendor.
>
> But more importantly, I am not seeing how that constrains anything in
> Apache Spark? those updates were merged to OSS. But even taking up 

Re: ASF policy violation and Scala version issues

2023-06-05 Thread Sean Owen
On Mon, Jun 5, 2023 at 12:01 PM Dongjoon Hyun 
wrote:

> 1. For the naming, yes, but the company should use different version
> numbers instead of the exact "3.4.0". As I shared the screenshot in my
> previous email, the company exposes "Apache Spark 3.4.0" exactly because
> they build their distribution without changing their version number at all.
>

I don't believe this is supported by guidance on the underlying issue here,
which is trademark. There is nothing wrong with nominative use, and I think
that's what this is. A thing can be "Apache Spark 3.4.0 + patches" and be
described that way.
Calling it "Apache Spark 3.4.0.vendor123" is argubaly more confusing IMHO,
as there is no such Apache Spark version.



> 2. According to
> https://mvnrepository.com/artifact/org.apache.spark/spark-core,
> all the other companies followed "Semantic Versioning" or added
> additional version numbers to their distributions, didn't they? AFAIK,
> nobody claims to take over the exact "3.4.0" version string at the source
> code level like this company.
>

Here you're talking about software artifact numbering, for companies that
were also releasing their own maintenance branch of OSS. That pretty much
requires some sub-versioning scheme. I think that's fine too, although as
above I think this is arguably _worse_ w.r.t. reuse of the Apache name and
namespace.
I'm not aware of any policy on this, and don't find this problematic
myself. Doesn't mean it's right, but does mean implicitly this has never
before been viewed as an issue?

The one I'm aware of was releasing a product "including Apache Spark 2.0"
before it existed, which does seem to potentially cause confusion, and that
was addressed.

Can you describe what policy is violated? we can disagree about what we'd
prefer or not, but the question is, what if anything is disallowed? I'm not
seeing that.


> 3. This company not only causes the 'Scala Version Segmentation'
> environment in a subtle way, but also defames Apache Spark 3.4.0 by
> removing many bug fixes of SPARK-40436 (Upgrade Scala to 2.12.17) and
> SPARK-39414 (Upgrade Scala to 2.12.16) for some unknown reason. Apparently,
> this does not look like a superior version of Apache Spark 3.4.0. For me, it's
> the inferior version. If a company disagrees with Scala 2.12.17 for some
> internal reason, they are able to stick to 2.12.15, of course. However,
> Apache Spark PMC should not allow them to lie to the customers that "Apache
> Spark 3.4.0" uses Scala 2.12.15 by default. That's the reason why I
> initiated this email because I'm considering this as a serious blocker to
> make Apache Spark Scala improvement.
> - https://github.com/scala/scala/releases/tag/v2.12.17 (21 Merged PRs)
> - https://github.com/scala/scala/releases/tag/v2.12.16 (68 Merged PRs)
>

To be clear, this seems unrelated to your first two points above?

I'm having trouble following what you are arguing here. You are saying a
vendor release based on "Apache Spark 3.4.0" is not the same in some
material way that you don't like. That's a fine position to take, but I
think the product is still substantially describable as "Apache Spark
3.4.0 + patches". You can take up the issue with the vendor.

But more importantly, I am not seeing how that constrains anything in
Apache Spark? those updates were merged to OSS. But even taking up the
point you describe, why is the scala maintenance version even such a
material issue that is so severe it warrants PMC action?

Could you connect the dots a little more?


>


Re: ASF policy violation and Scala version issues

2023-06-05 Thread Dongjoon Hyun
Thank you, Sean.

I'll reply as a comment for some areas first.

>  I believe releasing "Apache Foo X.Y + patches" is acceptable,
> if it is substantially Apache Foo X.Y.

1. For the naming, yes, but the company should use different version
numbers instead of the exact "3.4.0". As I shared the screenshot in my
previous email, the company exposes "Apache Spark 3.4.0" exactly because
they build their distribution without changing their version number at all.


>  I'm sure this one is about Databricks but I'm also sure Cloudera,
Hortonworks, etc had Spark releases with patches, too.

2. According to
https://mvnrepository.com/artifact/org.apache.spark/spark-core,
all the other companies followed "Semantic Versioning" or added additional
version numbers to their distributions, didn't they? AFAIK, nobody claims
to take over the exact "3.4.0" version string at the source code level like
this company.
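
As a sketch of the distinction being drawn here (the ".vendor123"-style
coordinate below is hypothetical, echoing the suffix used elsewhere in this
thread; it is not a published artifact):

    // build.sbt fragment: an upstream coordinate vs. a clearly differentiated vendor one.
    val upstreamSparkCore = "org.apache.spark" %% "spark-core" % "3.4.0"           // ASF release on Maven Central
    val vendorSparkCore   = "org.apache.spark" %% "spark-core" % "3.4.0.vendor123" // vendor build, version differentiated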


> The principle here is consumer confusion.
> Is anyone substantially misled?
> Here I don't think so.

3. This company not only causes the 'Scala Version Segmentation'
environment in a subtle way, but also defames Apache Spark 3.4.0 by
removing many bug fixes of SPARK-40436 (Upgrade Scala to 2.12.17) and
SPARK-39414 (Upgrade Scala to 2.12.16) for some unknown reason. Apparently,
this does not look like a superior version of Apache Spark 3.4.0. For me, it's
the inferior version. If a company disagrees with Scala 2.12.17 for some
internal reason, they are able to stick to 2.12.15, of course. However,
Apache Spark PMC should not allow them to lie to the customers that "Apache
Spark 3.4.0" uses Scala 2.12.15 by default. That's the reason why I
initiated this email because I'm considering this as a serious blocker to
make Apache Spark Scala improvement.
- https://github.com/scala/scala/releases/tag/v2.12.17 (21 Merged PRs)
- https://github.com/scala/scala/releases/tag/v2.12.16 (68 Merged PRs)


> 2b/ If a single dependency blocks important updates, yeah it's fair to
remove it, IMHO. I wouldn't remove in 3.5 unless the other updates are
critical, and it's not clear they are. In 4.0 yes.

4. Apache Spark 3.5 is not released yet and we haven't cut the feature
branch. So, it's the opposite. I believe it's the best time to remove it
before cutting branch-3.5.


Dongjoon



On Mon, Jun 5, 2023 at 5:58 AM Sean Owen  wrote:

> 1/ Regarding naming - I believe releasing "Apache Foo X.Y + patches" is
> acceptable, if it is substantially Apache Foo X.Y. This is common practice
> for downstream vendors. It's fair nominative use. The principle here is
> consumer confusion. Is anyone substantially misled? Here I don't think so.
> I know that we have in the past decided it would not be OK, for example, to
> release a product with "Apache Spark 4.0" now as there is no such release,
> even building from master. A vendor should elaborate the changes
> somewhere, ideally. I'm sure this one is about Databricks but I'm also sure
> Cloudera, Hortonworks, etc had Spark releases with patches, too.
>
> 2a/ That issue seems to be about just flipping which code sample is shown
> by default. It seemed widely agreed that this would slightly help more users
> than it harms. I agree with the change and don't see a need to escalate.
> The question of further Python parity is a big one but is separate.
>
> 2b/ If a single dependency blocks important updates, yeah it's fair to
> remove it, IMHO. I wouldn't remove in 3.5 unless the other updates are
> critical, and it's not clear they are. In 4.0 yes.
>
> 2c/ Scala 2.13 is already supported in 3.x, and does not require 4.0. This
> was about what the default non-Scala release convenience binaries use.
> Sticking to 2.12 in 3.x doesn't seem like an issue, even desirable.
>
> 2d/ Same as 2b
>
> 3/ I don't think 1/ is an incident. Yes to moving towards 4.0 after 3.5,
> IMHO, and to removing Ammonite in 4.0 if there is no resolution forthcoming
>
> On Mon, Jun 5, 2023 at 2:46 AM Dongjoon Hyun 
> wrote:
>
>> Hi, All and Matei (as the Chair of Apache Spark PMC).
>>
>> Sorry for a long email, I want to share two topics and corresponding
>> action items.
>> You can go to "Section 3: Action Items" directly for the conclusion.
>>
>>
>> ### 1. ASF Policy Violation ###
>>
>> ASF has a rule for "MAY I CALL MY MODIFIED CODE 'APACHE'?"
>>
>> https://www.apache.org/foundation/license-faq.html#Name-changes
>>
>> For example, when we call `Apache Spark 3.4.0`, it's supposed to be the
>> same as one of our official distributions.
>>
>> https://downloads.apache.org/spark/spark-3.4.0/
>>
>> Specifically, in terms of the Scala version, we believe it should have
>> Scala 2.12.17 because of 'SPARK-40436 Upgrade Scala to 2.12.17'.
>>
>> There is a company claiming something non-Apache like "Apache Spark 3.4.0
>> minus SPARK-40436" with the name "Apache Spark 3.4.0."
>>
>> - The company website shows "X.Y (includes Apache Spark 3.4.0, Scala
>> 2.12)"
>> - The runtime logs "23/06/05 04:23:27 INFO SparkContext: 

Re: ASF policy violation and Scala version issues

2023-06-05 Thread Sean Owen
1/ Regarding naming - I believe releasing "Apache Foo X.Y + patches" is
acceptable, if it is substantially Apache Foo X.Y. This is common practice
for downstream vendors. It's fair nominative use. The principle here is
consumer confusion. Is anyone substantially misled? Here I don't think so.
I know that we have in the past decided it would not be OK, for example, to
release a product with "Apache Spark 4.0" now as there is no such release,
even building from master. A vendor should elaborate the changes
somewhere, ideally. I'm sure this one is about Databricks but I'm also sure
Cloudera, Hortonworks, etc had Spark releases with patches, too.

2a/ That issue seems to be about just flipping which code sample is shown
by default. It seemed widely agreed that this would slightly help more users
than it harms. I agree with the change and don't see a need to escalate.
The question of further Python parity is a big one but is separate.

2b/ If a single dependency blocks important updates, yeah it's fair to
remove it, IMHO. I wouldn't remove in 3.5 unless the other updates are
critical, and it's not clear they are. In 4.0 yes.

2c/ Scala 2.13 is already supported in 3.x, and does not require 4.0. This
was about what the default non-Scala release convenience binaries use.
Sticking to 2.12 in 3.x doesn't seem like an issue, even desirable.

2d/ Same as 2b

3/ I don't think 1/ is an incident. Yes to moving towards 4.0 after 3.5,
IMHO, and to removing Ammonite in 4.0 if there is no resolution forthcoming

On Mon, Jun 5, 2023 at 2:46 AM Dongjoon Hyun 
wrote:

> Hi, All and Matei (as the Chair of Apache Spark PMC).
>
> Sorry for a long email, I want to share two topics and corresponding
> action items.
> You can go to "Section 3: Action Items" directly for the conclusion.
>
>
> ### 1. ASF Policy Violation ###
>
> ASF has a rule for "MAY I CALL MY MODIFIED CODE 'APACHE'?"
>
> https://www.apache.org/foundation/license-faq.html#Name-changes
>
> For example, when we call `Apache Spark 3.4.0`, it's supposed to be the
> same as one of our official distributions.
>
> https://downloads.apache.org/spark/spark-3.4.0/
>
> Specifically, in terms of the Scala version, we believe it should have
> Scala 2.12.17 because of 'SPARK-40436 Upgrade Scala to 2.12.17'.
>
> There is a company claiming something non-Apache like "Apache Spark 3.4.0
> minus SPARK-40436" with the name "Apache Spark 3.4.0."
>
> - The company website shows "X.Y (includes Apache Spark 3.4.0, Scala
> 2.12)"
> - The runtime logs "23/06/05 04:23:27 INFO SparkContext: Running Spark
> version 3.4.0"
> - UI shows Apache Spark logo and `3.4.0`.
> - However, Scala Version is '2.12.15'
>
> [image: Screenshot 2023-06-04 at 9.37.16 PM.png][image: Screenshot
> 2023-06-04 at 10.14.45 PM.png]
>
> Lastly, this is not a single instance. For example, the same company also
> claims "Apache Spark 3.3.2" with a mismatched Scala version.
>
>
> ### 2. Scala Issues ###
>
> In addition to (1), although we proceeded with good intentions and great
> care
> including dev mailing list discussion, there are several concerning areas
> which
> need more attention and our love.
>
> a) Scala Spark users will experience UX inconvenience from Spark 3.5.
>
> SPARK-42493 Make Python the first tab for code examples
>
> For the record, we discussed it here.
> - https://lists.apache.org/thread/1p8s09ysrh4jqsfd47qdtrl7rm4rrs05
>   "[DISCUSS] Show Python code examples first in Spark documentation"
>
> b) Scala version upgrade is blocked by the Ammonite library dev cycle
> currently.
>
> Although we discussed it here and it had good intentions,
> the current master branch cannot use the latest Scala.
>
> - https://lists.apache.org/thread/4nk5ddtmlobdt8g3z8xbqjclzkhlsdfk
> "Ammonite as REPL for Spark Connect"
>  SPARK-42884 Add Ammonite REPL integration
>
> Specifically, the following are blocked and I'm monitoring the
> Ammonite repository.
> - SPARK-40497 Upgrade Scala to 2.13.11
> - SPARK-43832 Upgrade Scala to 2.12.18
> - According to https://github.com/com-lihaoyi/Ammonite/issues ,
>   Scala 3.3.0 LTS support also looks infeasible.
>
> Although we may be able to wait for a while, there are two fundamental
> solutions
> to unblock this situation from a long-term maintenance perspective.
> - Replace it with a Scala-shell based implementation
> - Move `connector/connect/client/jvm/pom.xml` outside from Spark repo.
>Maybe, we can put it into the new repo like Rust and Go client.
>
> c) Scala 2.13 and above needs Apache Spark 4.0.
>
> In "Apache Spark 3.5.0 Expectations?" and "Apache Spark 4.0
> Timeframe?" threads,
> we discussed Spark 3.5.0 scope and decided to revert
> "SPARK-43836 Make Scala 2.13 as default in Spark 3.5".
> Apache Spark 4.0.0 is the only way to support Scala 2.13 or higher.
>
> - https://lists.apache.org/thread/3x6dh17bmy20n3frtt3crgxjydnxh2o0
> ("Apache Spark 3.5.0 

ASF policy violation and Scala version issues

2023-06-05 Thread Dongjoon Hyun
Hi, All and Matei (as the Chair of Apache Spark PMC).

Sorry for a long email, I want to share two topics and corresponding action
items.
You can go to "Section 3: Action Items" directly for the conclusion.


### 1. ASF Policy Violation ###

ASF has a rule for "MAY I CALL MY MODIFIED CODE 'APACHE'?"

https://www.apache.org/foundation/license-faq.html#Name-changes

For example, when we call `Apache Spark 3.4.0`, it's supposed to be the
same as one of our official distributions.

https://downloads.apache.org/spark/spark-3.4.0/

Specifically, in terms of the Scala version, we believe it should have
Scala 2.12.17 because of 'SPARK-40436 Upgrade Scala to 2.12.17'.

There is a company claiming something non-Apache like "Apache Spark 3.4.0
minus SPARK-40436" with the name "Apache Spark 3.4.0."

- The company website shows "X.Y (includes Apache Spark 3.4.0, Scala
2.12)"
- The runtime logs "23/06/05 04:23:27 INFO SparkContext: Running Spark
version 3.4.0"
- UI shows Apache Spark logo and `3.4.0`.
- However, Scala Version is '2.12.15'

[image: Screenshot 2023-06-04 at 9.37.16 PM.png][image: Screenshot
2023-06-04 at 10.14.45 PM.png]

Lastly, this is not a single instance. For example, the same company also
claims "Apache Spark 3.3.2" with a mismatched Scala version.


### 2. Scala Issues ###

In addition to (1), although we proceeded with good intentions and great
care
including dev mailing list discussion, there are several concerning areas
which
need more attention and our love.

a) Scala Spark users will experience UX inconvenience from Spark 3.5.

SPARK-42493 Make Python the first tab for code examples

For the record, we discussed it here.
- https://lists.apache.org/thread/1p8s09ysrh4jqsfd47qdtrl7rm4rrs05
  "[DISCUSS] Show Python code examples first in Spark documentation"

b) Scala version upgrade is blocked by the Ammonite library dev cycle
currently.

Although we discussed it here and it had good intentions,
the current master branch cannot use the latest Scala.

- https://lists.apache.org/thread/4nk5ddtmlobdt8g3z8xbqjclzkhlsdfk
"Ammonite as REPL for Spark Connect"
 SPARK-42884 Add Ammonite REPL integration

Specifically, the following are blocked and I'm monitoring the Ammonite
repository.
- SPARK-40497 Upgrade Scala to 2.13.11
- SPARK-43832 Upgrade Scala to 2.12.18
- According to https://github.com/com-lihaoyi/Ammonite/issues ,
  Scala 3.3.0 LTS support also looks infeasible.
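
To make the coupling concrete, here is a minimal build-definition sketch
(sbt syntax; the module name and the Ammonite version shown are
illustrative, but publishing against the full Scala version is how Ammonite
artifacts are actually named):

    // Ammonite artifacts carry the full Scala version (e.g. ammonite_2.12.17),
    // so a Scala patch bump only resolves after Ammonite cross-publishes for
    // that exact version.
    lazy val connectJvmClient = project.settings(
      scalaVersion := "2.12.18", // the bump SPARK-43832 wants
      libraryDependencies += ("com.lihaoyi" % "ammonite" % "2.5.8")
        .cross(CrossVersion.full) // resolves ammonite_2.12.18, which fails until it exists
    )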

Although we may be able to wait for a while, there are two fundamental
solutions
to unblock this situation from a long-term maintenance perspective.
- Replace it with a Scala-shell based implementation
- Move `connector/connect/client/jvm/pom.xml` out of the Spark repo.
   Maybe, we can put it into a new repo like the Rust and Go clients.

c) Scala 2.13 and above needs Apache Spark 4.0.

In "Apache Spark 3.5.0 Expectations?" and "Apache Spark 4.0 Timeframe?"
threads,
we discussed Spark 3.5.0 scope and decided to revert
"SPARK-43836 Make Scala 2.13 as default in Spark 3.5".
Apache Spark 4.0.0 is the only way to support Scala 2.13 or higher.

- https://lists.apache.org/thread/3x6dh17bmy20n3frtt3crgxjydnxh2o0
("Apache Spark 3.5.0 Expectations?")
- https://lists.apache.org/thread/xhkgj60j361gdpywoxxz7qspp2w80ry6
("Apache Spark 4.0 Timeframe?")

 A candidate (or mentioned) timeframe was "Spark 4.0.0: 2024.06" and
Scala 3.3.0 LTS.
 - https://scala-lang.org/blog/2023/05/30/scala-3.3.0-released.html

d) Java 21 LTS is Apache Spark 3.5.0's stretch goal

SPARK-43831 Build and Run Spark on Java 21

However, this needs SPARK-40497 (Scala 2.13.11) and SPARK-43832 (Scala
2.12.18)
which are blocked by the Ammonite library, as mentioned in (b).


### 3. Action Items ###

To provide clarity to the Apache Spark Scala community,

- We should communicate and help the company to fix the misleading messages
and
  remove Scala-version segmentation situations per Spark version.

- Apache Spark PMC should include this incident report and the result
  in the next Apache Spark Quarterly Report (August).

- I will start a vote for Apache Spark 4.0.0 timeframe next week after
receiving more feedback.
  Since 4.0.0 is not limited to the Scala issues, we will vote on the
timeline only.

- Lastly, we need to re-evaluate the risk of the `Ammonite` library before the
Apache Spark 3.5.0 release.
  If it blocks the Scala upgrade and Java 21 support, we had better avoid it at
all costs.


WDYT?

Thanks,
Dongjoon.