Re: [VOTE] Apache Spark 3.0.0 RC1

2020-05-07 Thread Jungtaek Lim
I don't see any new features/functions for these blockers. For SPARK-31257 (which I filed and marked as a blocker), I agree unifying create table syntax shouldn't be a blocker for Spark 3.0.0, as that is a new feature, but even if we put the proposal aside, the problem remains the same and

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-05-07 Thread Xiao Li
Below are the three major blockers. I think we should start discussing how to unblock the release. -

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-05-07 Thread Sean Owen
So, this RC1 doesn't pass of course, but what's the status of RC2 - are there outstanding issues? On Tue, Mar 31, 2020 at 10:04 PM Reynold Xin wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.0.0. > > The vote is open until 11:59pm Pacific time Fri Apr 3,

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-10 Thread Marcelo Vanzin
-0.5, mostly because this requires extra things not in the default packaging... But if you add the hadoop-aws libraries and dependencies to Spark built with Hadoop 3, things don't work: $ ./bin/spark-shell --jars s3a://blah 20/04/10 16:28:32 WARN Utils: Your hostname, vanzin-t480 resolves to a

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-09 Thread Jungtaek Lim
Thanks for sharing the blockers, Wenchen. SPARK-31404 has sub-tasks, which means all sub-tasks are blockers for this release; do I understand that correctly? Xiao, I sincerely respect the practice the Spark community has been following, so please treat this as my 2 cents. Just would like to see the

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-09 Thread Wenchen Fan
The ongoing critical issues I'm aware of are: SPARK-31257 : Fix ambiguous two different CREATE TABLE syntaxes SPARK-31404 : backward compatibility issues after switching to Proleptic Gregorian

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-09 Thread Xiao Li
Only low-risk or high-value bug fixes and documentation changes are allowed to merge to branch-3.0. I expect all committers to follow the same rules as we did in the previous releases. Xiao On Thu, Apr 9, 2020 at 6:13 PM Jungtaek Lim wrote: > Looks like around 80

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-09 Thread Jungtaek Lim
Looks like around 80 commits have landed on branch-3.0 since we cut RC1 (I know many of them version the configs or add docs). Shall we announce the blocker-only phase and maintain the list of blockers to restrict the changes on the branch? This would make everyone being

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-09 Thread Jungtaek Lim
I went through some manual tests for the new features of Structured Streaming in Spark 3.0.0. (Please let me know if there are more features we'd like to test manually.) * file source cleanup - both "archive" and "delete" work. Query fails as expected when the input directory is the output

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-03 Thread Sean Owen
Aside from the other issues mentioned here, which probably do require another RC, this looks pretty good to me. I built on Ubuntu 19 and ran with Java 11, -Pspark-ganglia-lgpl -Pkinesis-asl -Phadoop-3.2 -Phive-2.3 -Pyarn -Pmesos -Pkubernetes -Phive-thriftserver -Djava.version=11 I did see the

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-02 Thread Takeshi Yamamuro
Also, I think the 3.0 release had better include all the SQL documentation updates: https://issues.apache.org/jira/browse/SPARK-28588 On Fri, Apr 3, 2020 at 12:36 AM Sean Owen wrote: > (If it wasn't stated explicitly, yeah I think we knew there are a few > important unresolved issues and that

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-02 Thread Sean Owen
(If it wasn't stated explicitly, yeah I think we knew there are a few important unresolved issues and that this RC was going to fail. Let's all please test anyway of course, to flush out any additional issues, rather than wait. Pipelining and all that.) On Thu, Apr 2, 2020 at 10:31 AM Maxim Gekk

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-02 Thread Maxim Gekk
-1 (non-binding) The problem of compatibility with Spark 2.4 in reading/writing dates/timestamps hasn't been solved completely so far. In particular, the sub-task https://issues.apache.org/jira/browse/SPARK-31328 hasn't been resolved yet. Maxim Gekk Software Engineer Databricks, Inc. On Wed, Apr

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-01 Thread Ryan Blue
-1 (non-binding) I agree with Jungtaek. The change to create datasource tables instead of Hive tables by default (no USING or STORED AS clauses) has created confusing behavior and should either be rolled back or fixed before 3.0. On Wed, Apr 1, 2020 at 5:12 AM Sean Owen wrote: > Those are not

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-01 Thread Sean Owen
Those are not per se release blockers. They are (perhaps important) improvements to functionality. I don't know who is active and able to review that part of the code; I'd look for authors of changes in the surrounding code. The question here isn't so much what one would like to see in this

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-01 Thread Dr. Kent Yao
-1 Do not release this package because v3.0.0 is the 3rd major release since we added Spark On Kubernetes. Can we make it more production-ready as it has been experimental for more than 2 years? The main practical adoption of Spark on Kubernetes is to take on the role of other cluster

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-01 Thread Reynold Xin
The Apache Software Foundation requires voting before any release can be published. On Tue, Mar 31, 2020 at 11:27 PM, Stephen Coy <s...@infomedia.com.au.invalid> wrote: > >> On 1 Apr 2020, at 5:20 pm, Sean Owen <sro...@gmail.com> wrote: >> >> It can be

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-01 Thread Stephen Coy
On 1 Apr 2020, at 5:20 pm, Sean Owen <sro...@gmail.com> wrote: It can be published as "3.0.0-rc1" but how do we test that to vote on it without some other RC1 RC1 I’m not sure what you mean by this question?

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-01 Thread Sean Owen
You just mvn -DskipTests install the source release. That is the primary artifact we're testing. But yes you could put the jars in your local repo too. I think this is pretty standard practice. Obviously the RC can't be published as "3.0.0". It can be published as "3.0.0-rc1" but how do we test

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-03-31 Thread Stephen Coy
Therefore, if I want to build my product against these jars I need to either locally install these jars or checkout and build the RC tag. I guess I need to build anyway because I need a spark-hadoop-cloud_2.12-3.0.0.jar. BTW, it would be incredibly handy to have this in the distro, or at least

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-03-31 Thread Jungtaek Lim
-1 (non-binding) I filed SPARK-31257 as a blocker, and now others are starting to agree that it's a critical issue which should be dealt with before releasing Spark 3.0. Please refer to the recent comments in https://github.com/apache/spark/pull/28026 It won't delay the release pretty much, as we can either revert

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-03-31 Thread Stephen Coy
That is a very unusual practice... On 1 Apr 2020, at 3:32 pm, Sean Owen <sro...@gmail.com> wrote: These are release candidates, not the final release, so they won't be published to Maven Central. The naming matches what the final release would be. On Tue, Mar 31, 2020 at 11:25 PM

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-03-31 Thread Wenchen Fan
Yea, release candidates are different from the preview version, as release candidates are not official releases, so they won't appear in Maven Central, can't be downloaded from the official Spark website, etc. On Wed, Apr 1, 2020 at 12:32 PM Sean Owen wrote: > These are release candidates, not

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-03-31 Thread Sean Owen
These are release candidates, not the final release, so they won't be published to Maven Central. The naming matches what the final release would be. On Tue, Mar 31, 2020 at 11:25 PM Stephen Coy wrote: > Furthermore, the spark jars in these bundles all look like release > versions: > >

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-03-31 Thread Stephen Coy
Furthermore, the spark jars in these bundles all look like release versions: [scoy@Steves-Core-i9 spark-3.0.0-bin-hadoop3.2]$ ls -l jars/spark-* -rw-r--r--@ 1 scoy staff 9261223 31 Mar 20:55 jars/spark-catalyst_2.12-3.0.0.jar -rw-r--r--@ 1 scoy staff 9720421 31 Mar 20:55

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-03-31 Thread Stephen Coy
The download artifacts all seem to have the "RC1" missing from their names, e.g. spark-3.0.0-bin-hadoop3.2.tgz Cheers, Steve C On 1 Apr 2020, at 2:04 pm, Reynold Xin <r...@databricks.com> wrote: Please vote on releasing the following candidate as Apache Spark version 3.0.0. The