Re: [VOTE] Apache Spark 3.0.0 RC1

2020-05-07 Thread Jungtaek Lim
I don't see any new features/functions among these blockers.

For SPARK-31257 (which I filed and marked as a blocker), I agree that
unifying the CREATE TABLE syntaxes shouldn't be a blocker for Spark 3.0.0,
as that is a new feature; but even if we put the proposal aside, the
underlying problem remains, and I think it's still a blocker.

We have a discussion thread for SPARK-31257 - let's revive that thread if
we aren't able to adopt the proposed solution in Spark 3.0.0.
https://lists.apache.org/thread.html/rf1acfaaa3de2d3129575199c28e7d529d38f2783e7d3c5be2ac8923d%40%3Cdev.spark.apache.org%3E

On Fri, May 8, 2020 at 9:41 AM Xiao Li  wrote:

> Below are the three major blockers. I think we should start discussing how
> to unblock the release.
>
>    - https://issues.apache.org/jira/browse/SPARK-31257
>    - https://issues.apache.org/jira/browse/SPARK-31399
>    - https://issues.apache.org/jira/browse/SPARK-31404
>
> At this stage, for the features/functions that are not supported in the
> previous releases, we should simply throw an exception and document it
> as a limitation. We do not need to fix everything before the release;
> not all of these issues are blockers.
>
> Can we start RC2 next week?
>
> Xiao
>
>
> On Thu, May 7, 2020 at 5:28 PM Sean Owen  wrote:
>
>> So, this RC1 doesn't pass of course, but what's the status of RC2 - are
>> there outstanding issues?
>>
>> On Tue, Mar 31, 2020 at 10:04 PM Reynold Xin  wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 3.0.0.
>>>
>>> The vote is open until 11:59pm Pacific time Fri Apr 3, and passes if a
>>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 3.0.0
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v3.0.0-rc1 (commit
>>> 6550d0d5283efdbbd838f3aeaf0476c7f52a0fb1):
>>> https://github.com/apache/spark/tree/v3.0.0-rc1
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc1-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1341/
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc1-docs/
>>>
>>> The list of bug fixes going into 2.4.5 can be found at the following URL:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12339177
>>>
>>> This release is using the release script of the tag v3.0.0-rc1.
>>>
>>>
>>> FAQ
>>>
>>> =
>>> How can I help test this release?
>>> =
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>> the current RC and see if anything important breaks; for Java/Scala,
>>> you can add the staging repository to your project's resolvers and test
>>> with the RC (make sure to clean up the artifact cache before/after so
>>> you don't end up building with an out-of-date RC going forward).
>>>
>>> ===
>>> What should happen to JIRA tickets still targeting 3.0.0?
>>> ===
>>> The current list of open tickets targeted at 3.0.0 can be found at:
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 3.0.0
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should
>>> be worked on immediately. Everything else please retarget to an
>>> appropriate release.
>>>
>>> ==
>>> But my bug isn't fixed?
>>> ==
>>> In order to make timely releases, we will typically not hold the
>>> release unless the bug in question is a regression from the previous
>>> release. That being said, if there is something which is a regression
>>> that has not been correctly targeted please ping me or a committer to
>>> help target the issue.
>>>
>>>
>>> Note: I fully expect this RC to fail.
>>>


Re: [VOTE] Apache Spark 3.0.0 RC1

2020-05-07 Thread Xiao Li
Below are the three major blockers. I think we should start discussing how
to unblock the release.


   
   - https://issues.apache.org/jira/browse/SPARK-31257
   - https://issues.apache.org/jira/browse/SPARK-31399
   - https://issues.apache.org/jira/browse/SPARK-31404

At this stage, for the features/functions that are not supported in the
previous releases, we should simply throw an exception and document it
as a limitation. We do not need to fix everything before the release;
not all of these issues are blockers.
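The "throw an exception and document it as a limitation" approach can be sketched as follows. This is a generic illustration only: Spark itself would use its own exception types (such as AnalysisException), and the feature name below is hypothetical.

```java
// Minimal sketch of gating an unsupported feature behind an explicit error,
// rather than blocking the release on a full implementation.
// "CHAR type in partition columns" is a hypothetical example feature.
public class UnsupportedFeatureSketch {

    static void requireSupported(boolean supported, String feature) {
        if (!supported) {
            // Documented limitation: fail fast with a clear message instead
            // of silently producing wrong behavior.
            throw new UnsupportedOperationException(
                feature + " is not supported in this release; "
                    + "see the migration guide for details.");
        }
    }

    public static void main(String[] args) {
        try {
            requireSupported(false, "CHAR type in partition columns");
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point is simply to fail fast with a clear, documented message, which keeps the gap visible to users without holding the release.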

Can we start RC2 next week?

Xiao


On Thu, May 7, 2020 at 5:28 PM Sean Owen  wrote:

> So, this RC1 doesn't pass of course, but what's the status of RC2 - are
> there outstanding issues?
>
> [...]



Re: [VOTE] Apache Spark 3.0.0 RC1

2020-05-07 Thread Sean Owen
So, this RC1 doesn't pass of course, but what's the status of RC2 - are
there outstanding issues?

On Tue, Mar 31, 2020 at 10:04 PM Reynold Xin  wrote:

> [...]


Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-10 Thread Marcelo Vanzin
-0.5, mostly because this requires extra things that are not in the default
packaging...

If you add the hadoop-aws libraries and their dependencies to Spark built
with Hadoop 3, things don't work:

$ ./bin/spark-shell --jars s3a://blah
20/04/10 16:28:32 WARN Utils: Your hostname, vanzin-t480 resolves to a
loopback address: 127.0.1.1; using 192.168.2.14 instead (on interface
wlp3s0)
20/04/10 16:28:32 WARN Utils: Set SPARK_LOCAL_IP if you need to bind
to another address
20/04/10 16:28:32 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where
applicable
20/04/10 16:28:32 WARN MetricsConfig: Cannot locate configuration:
tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
Exception in thread "main" java.lang.NoSuchMethodError:
com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)V
  at org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:816)
  at org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:792)
  at org.apache.hadoop.fs.s3a.S3AUtils.getAWSAccessKeys(S3AUtils.java:747)
  at org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider.<init>(SimpleAWSCredentialsProvider.java:58)
  at org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProviderSet(S3AUtils.java:600)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:260)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
  at org.apache.spark.deploy.DependencyUtils$.resolveGlobPath(DependencyUtils.scala:191)

That's because Hadoop 3.2 uses Guava 27 while Spark still ships Guava 14
(which is fine for Hadoop 2).
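For context on that error: Guava 14 only provides the varargs overload checkArgument(boolean, String, Object...), while later Guava versions (the fixed-arity overloads were added around Guava 20, if I recall correctly) also ship checkArgument(boolean, String, Object, Object), which is what Hadoop 3.2's S3A code is compiled against. The JVM resolves the call by its exact descriptor, so the fixed-arity method is simply absent at runtime. A self-contained sketch of that exact-descriptor lookup, using a stand-in class rather than Guava itself:

```java
// Self-contained illustration (a stand-in class, not Guava): the JVM resolves
// a method by its exact descriptor. Code compiled against Guava 27's
// fixed-arity checkArgument(boolean, String, Object, Object) therefore fails
// with NoSuchMethodError on Guava 14, which only ships the varargs form.
public class OverloadLookupDemo {

    // Stand-in for Guava 14: only the varargs overload exists.
    static void checkArgument(boolean expression, String template, Object... args) {
        if (!expression) {
            throw new IllegalArgumentException(String.format(template, args));
        }
    }

    public static void main(String[] args) throws Exception {
        Class<?> c = OverloadLookupDemo.class;

        // The varargs form (compiled as an Object[] parameter) is found.
        System.out.println(c.getDeclaredMethod(
                "checkArgument", boolean.class, String.class, Object[].class));

        try {
            // The fixed-arity (Object, Object) form that Hadoop 3.2 links
            // against does not exist here, just like on Guava 14.
            c.getDeclaredMethod("checkArgument",
                    boolean.class, String.class, Object.class, Object.class);
        } catch (NoSuchMethodException e) {
            System.out.println("Not found: " + e.getMessage());
        }
    }
}
```

Which is why shading or upgrading the bundled Guava, rather than just adding jars, is needed to make the two coexist.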


On Tue, Mar 31, 2020 at 8:05 PM Reynold Xin  wrote:

> [...]

-- 
Marcelo Vanzin

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-09 Thread Jungtaek Lim
Thanks for sharing the blockers, Wenchen. SPARK-31404 has sub-tasks; does
that mean all of its sub-tasks are blockers for this release?

Xiao, I sincerely respect the practices the Spark community has
established, so please take this as just my two cents. I would simply like
to see how the community can stay focused on such a huge release: even
counting only bugs, improvements, and new features, nearly 2,000 issues
have been resolved in Spark 3.0.0 alone. That volume is quite different
from the usual bugfix and minor releases, which suggests special care is
needed.


On Fri, Apr 10, 2020 at 1:22 PM Wenchen Fan  wrote:

> [...]

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-09 Thread Wenchen Fan
The ongoing critical issues I'm aware of are:
SPARK-31257: fix the ambiguity between the two different CREATE TABLE syntaxes
SPARK-31404: backward compatibility issues after switching to the Proleptic
Gregorian calendar
SPARK-31399: the closure cleaner is broken in Spark 3.0
SPARK-28067: incorrect results in decimal aggregation with whole-stage
codegen enabled

That said, I'm -1 (binding) on RC1.

Please reply to this thread if you know of more critical issues that should
be fixed before 3.0.

Thanks,
Wenchen


On Fri, Apr 10, 2020 at 10:01 AM Xiao Li  wrote:

> [...]

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-09 Thread Xiao Li
Only low-risk or high-value bug fixes and documentation changes are
allowed to merge into branch-3.0. I expect all the committers to follow
the same rules as in the previous releases.

Xiao

On Thu, Apr 9, 2020 at 6:13 PM Jungtaek Lim 
wrote:

> [...]

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-09 Thread Jungtaek Lim
Looks like around 80 commits have landed on branch-3.0 since we cut
RC1 (I know many of them version the configs or add docs). Shall we
announce a blocker-only phase and maintain the list of blockers to
restrict changes on the branch? The current churn makes everyone hesitant
to test RC1 (see how few people have tested RC1 in this thread), as they
will probably need to repeat the same tests on RC2.

On Thu, Apr 9, 2020 at 5:50 PM Jungtaek Lim 
wrote:

> I went through some manual tests of the new Structured Streaming
> features in Spark 3.0.0. (Please let me know if there are more features
> we'd like to test manually.)
>
> * file source cleanup - both "archive" and "delete" work. The query fails as
> expected when the input directory is the output directory of a file sink.
> * kafka source/sink - "header" works for both source and sink, "group id
> prefix" and "static group id" work, and start offset by timestamp is
> confirmed to work for the streaming case.
> * event log features with streaming queries - enabled it, confirmed compaction
> works, SHS can read compacted event logs, and downloading an event log in
> SHS works by zipping the event log directory. The original functionality with
> a single event log file works as well.
>
> Looks good, though there are still plenty of commits pushed to branch-3.0
> after RC1, which makes me feel it may not be safe to carry over the
> test results from RC1 to RC2.
>
> On Sat, Apr 4, 2020 at 12:49 AM Sean Owen  wrote:
>
>> Aside from the other issues mentioned here, which probably do require
>> another RC, this looks pretty good to me.
>>
>> I built on Ubuntu 19 and ran with Java 11, -Pspark-ganglia-lgpl
>> -Pkinesis-asl -Phadoop-3.2 -Phive-2.3 -Pyarn -Pmesos -Pkubernetes
>> -Phive-thriftserver -Djava.version=11
>>
>> I did see the following test failures, but as usual, I'm not sure
>> whether it's specific to me. Anyone else see these, particularly the R
>> warnings?
>>
>>
>> PythonUDFSuite:
>> org.apache.spark.sql.execution.python.PythonUDFSuite *** ABORTED ***
>>   java.lang.RuntimeException: Unable to load a Suite class that was
>> discovered in the runpath:
>> org.apache.spark.sql.execution.python.PythonUDFSuite
>>   at
>> org.scalatest.tools.DiscoverySuite$.getSuiteInstance(DiscoverySuite.scala:81)
>>   at
>> org.scalatest.tools.DiscoverySuite.$anonfun$nestedSuites$1(DiscoverySuite.scala:38)
>>   at
>> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>>   at scala.collection.Iterator.foreach(Iterator.scala:941)
>>   at scala.collection.Iterator.foreach$(Iterator.scala:941)
>>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
>>   at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>>   at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>>   at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>>   at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>>
>>
>> - SPARK-25158: Executor accidentally exit because
>> ScriptTransformationWriterThread throw Exception *** FAILED ***
>>   Expected exception org.apache.spark.SparkException to be thrown, but
>> no exception was thrown (SQLQuerySuite.scala:2384)
>>
>>
>> * checking for missing documentation entries ... WARNING
>> Undocumented code objects:
>>   ‘%<=>%’ ‘add_months’ ‘agg’ ‘approxCountDistinct’ ‘approxQuantile’
>>   ‘approx_count_distinct’ ‘arrange’ ‘array_contains’ ‘array_distinct’
>> ...
>>  WARNING
>> ‘qpdf’ is needed for checks on size reduction of PDFs
>>

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-09 Thread Jungtaek Lim
I went through some manual tests of the new features of Structured
Streaming in Spark 3.0.0. (Please let me know if there are more features
we'd like to test manually.)

* file source cleanup - both "archive" and "delete" modes work. The query
fails as expected when the input directory is the output directory of a
file sink.
* kafka source/sink - "headers" work for both source and sink, "group id
prefix" and "static group id" work, and start offset by timestamp is
confirmed to work for the streaming case.
* event log support for streaming queries - enabled it, confirmed compaction
works, the SHS can read compacted event logs, and downloading an event log
from the SHS works by zipping the event log directory. The original
single-file event log functionality works as well.

Looks good, though there are still plenty of commits pushed to branch-3.0
after RC1, which makes me feel it may not be safe to carry the RC1 test
results over to RC2.

On Sat, Apr 4, 2020 at 12:49 AM Sean Owen  wrote:

> Aside from the other issues mentioned here, which probably do require
> another RC, this looks pretty good to me.
>
> I built on Ubuntu 19 and ran with Java 11, -Pspark-ganglia-lgpl
> -Pkinesis-asl -Phadoop-3.2 -Phive-2.3 -Pyarn -Pmesos -Pkubernetes
> -Phive-thriftserver -Djava.version=11
>
> I did see the following test failures, but as usual, I'm not sure
> whether it's specific to me. Anyone else see these, particularly the R
> warnings?
>
>
> PythonUDFSuite:
> org.apache.spark.sql.execution.python.PythonUDFSuite *** ABORTED ***
>   java.lang.RuntimeException: Unable to load a Suite class that was
> discovered in the runpath:
> org.apache.spark.sql.execution.python.PythonUDFSuite
>   at
> org.scalatest.tools.DiscoverySuite$.getSuiteInstance(DiscoverySuite.scala:81)
>   at
> org.scalatest.tools.DiscoverySuite.$anonfun$nestedSuites$1(DiscoverySuite.scala:38)
>   at
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>   at scala.collection.Iterator.foreach(Iterator.scala:941)
>   at scala.collection.Iterator.foreach$(Iterator.scala:941)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
>   at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>   at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>
>
> - SPARK-25158: Executor accidentally exit because
> ScriptTransformationWriterThread throw Exception *** FAILED ***
>   Expected exception org.apache.spark.SparkException to be thrown, but
> no exception was thrown (SQLQuerySuite.scala:2384)
>
>
> * checking for missing documentation entries ... WARNING
> Undocumented code objects:
>   ‘%<=>%’ ‘add_months’ ‘agg’ ‘approxCountDistinct’ ‘approxQuantile’
>   ‘approx_count_distinct’ ‘arrange’ ‘array_contains’ ‘array_distinct’
> ...
>  WARNING
> ‘qpdf’ is needed for checks on size reduction of PDFs
>

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-03 Thread Sean Owen
Aside from the other issues mentioned here, which probably do require
another RC, this looks pretty good to me.

I built on Ubuntu 19 and ran with Java 11, -Pspark-ganglia-lgpl
-Pkinesis-asl -Phadoop-3.2 -Phive-2.3 -Pyarn -Pmesos -Pkubernetes
-Phive-thriftserver -Djava.version=11

I did see the following test failures, but as usual, I'm not sure
whether it's specific to me. Anyone else see these, particularly the R
warnings?


PythonUDFSuite:
org.apache.spark.sql.execution.python.PythonUDFSuite *** ABORTED ***
  java.lang.RuntimeException: Unable to load a Suite class that was
discovered in the runpath:
org.apache.spark.sql.execution.python.PythonUDFSuite
  at 
org.scalatest.tools.DiscoverySuite$.getSuiteInstance(DiscoverySuite.scala:81)
  at 
org.scalatest.tools.DiscoverySuite.$anonfun$nestedSuites$1(DiscoverySuite.scala:38)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
  at scala.collection.Iterator.foreach(Iterator.scala:941)
  at scala.collection.Iterator.foreach$(Iterator.scala:941)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
  at scala.collection.IterableLike.foreach(IterableLike.scala:74)
  at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
  at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
  at scala.collection.TraversableLike.map(TraversableLike.scala:238)


- SPARK-25158: Executor accidentally exit because
ScriptTransformationWriterThread throw Exception *** FAILED ***
  Expected exception org.apache.spark.SparkException to be thrown, but
no exception was thrown (SQLQuerySuite.scala:2384)


* checking for missing documentation entries ... WARNING
Undocumented code objects:
  ‘%<=>%’ ‘add_months’ ‘agg’ ‘approxCountDistinct’ ‘approxQuantile’
  ‘approx_count_distinct’ ‘arrange’ ‘array_contains’ ‘array_distinct’
...
 WARNING
‘qpdf’ is needed for checks on size reduction of PDFs

On Tue, Mar 31, 2020 at 10:04 PM Reynold Xin  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 3.0.0.
>
> The vote is open until 11:59pm Pacific time Fri Apr 3, and passes if a 
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.0.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.0.0-rc1 (commit 
> 6550d0d5283efdbbd838f3aeaf0476c7f52a0fb1):
> https://github.com/apache/spark/tree/v3.0.0-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1341/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc1-docs/
>
> The list of bug fixes going into 2.4.5 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12339177
>
> This release is using the release script of the tag v3.0.0-rc1.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> > with the RC (make sure to clean up the artifact cache before/after so
> > you don't end up building with an out-of-date RC going forward).
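For the Java/Scala path mentioned above, adding the staging repository to a Maven project's resolvers looks roughly like this (the URL is the staging repository given for this RC; the repository id is illustrative):

```xml
<!-- pom.xml: resolve the RC artifacts from the staging repository -->
<repositories>
  <repository>
    <id>spark-3.0.0-rc1-staging</id>
    <url>https://repository.apache.org/content/repositories/orgapachespark-1341/</url>
  </repository>
</repositories>
```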
>
> ===
> What should happen to JIRA tickets still targeting 3.0.0?
> ===
> The current list of open tickets targeted at 3.0.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 3.0.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
> Note: I fully expect this RC to fail.
>
>
>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-02 Thread Takeshi Yamamuro
Also, I think the 3.0 release had better include all the SQL documentation
updates:
https://issues.apache.org/jira/browse/SPARK-28588

On Fri, Apr 3, 2020 at 12:36 AM Sean Owen  wrote:

> (If it wasn't stated explicitly, yeah I think we knew there are a few
> important unresolved issues and that this RC was going to fail. Let's
> all please test anyway of course, to flush out any additional issues,
> rather than wait. Pipelining and all that.)
>
> On Thu, Apr 2, 2020 at 10:31 AM Maxim Gekk 
> wrote:
> >
> > -1 (non-binding)
> >
> > The problem of compatibility with Spark 2.4 in reading/writing
> dates/timestamps hasn't been solved completely so far. In particular, the
> sub-task https://issues.apache.org/jira/browse/SPARK-31328 hasn't
> resolved yet.
> >
> > Maxim Gekk
> >
> > Software Engineer
> >
> > Databricks, Inc.
> >
> >
> >
> > On Wed, Apr 1, 2020 at 7:09 PM Ryan Blue 
> wrote:
> >>
> >> -1 (non-binding)
> >>
> >> I agree with Jungtaek. The change to create datasource tables instead
> of Hive tables by default (no USING or STORED AS clauses) has created
> confusing behavior and should either be rolled back or fixed before 3.0.
>
>

-- 
---
Takeshi Yamamuro


Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-02 Thread Sean Owen
(If it wasn't stated explicitly, yeah I think we knew there are a few
important unresolved issues and that this RC was going to fail. Let's
all please test anyway of course, to flush out any additional issues,
rather than wait. Pipelining and all that.)

On Thu, Apr 2, 2020 at 10:31 AM Maxim Gekk  wrote:
>
> -1 (non-binding)
>
> The problem of compatibility with Spark 2.4 in reading/writing 
> dates/timestamps hasn't been solved completely so far. In particular, the 
> sub-task https://issues.apache.org/jira/browse/SPARK-31328 hasn't resolved 
> yet.
>
> Maxim Gekk
>
> Software Engineer
>
> Databricks, Inc.
>
>
>
> On Wed, Apr 1, 2020 at 7:09 PM Ryan Blue  wrote:
>>
>> -1 (non-binding)
>>
>> I agree with Jungtaek. The change to create datasource tables instead of 
>> Hive tables by default (no USING or STORED AS clauses) has created confusing 
>> behavior and should either be rolled back or fixed before 3.0.




Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-02 Thread Maxim Gekk
-1 (non-binding)

The problem of compatibility with Spark 2.4 in reading/writing
dates/timestamps hasn't been solved completely so far. In particular, the
sub-task https://issues.apache.org/jira/browse/SPARK-31328 hasn't been
resolved yet.

Maxim Gekk

Software Engineer

Databricks, Inc.


On Wed, Apr 1, 2020 at 7:09 PM Ryan Blue  wrote:

> -1 (non-binding)
>
> I agree with Jungtaek. The change to create datasource tables instead of
> Hive tables by default (no USING or STORED AS clauses) has created
> confusing behavior and should either be rolled back or fixed before 3.0.
>
> On Wed, Apr 1, 2020 at 5:12 AM Sean Owen  wrote:
>
>> Those are not per se release blockers. They are (perhaps important)
>> improvements to functionality. I don't know who is active and able to
>> review that part of the code; I'd look for authors of changes in the
>> surrounding code. The question here isn't so much what one would like
>> to see in this release, but evaluating whether the release is sound
>> and free of show-stopper problems. There will always be potentially
>> important changes and fixes to come.
>>
>> On Wed, Apr 1, 2020 at 5:31 AM Dr. Kent Yao  wrote:
>> >
>> > -1
>> > Do not release this package because v3.0.0 is the 3rd major release
>> since we
>> > added Spark On Kubernetes. Can we make it more production-ready as it
>> has
>> > been experimental for more than 2 years?
>> >
>> > The main practical adoption of Spark on Kubernetes is to take on the
>> role of
>> > other cluster managers(mainly YARN). And the storage layer(mainly HDFS)
>> > would be more likely kept anyway. But Spark on Kubernetes with HDFS
>> seems
>> > not to work properly.
>> >
>> > e.g.
>> > This ticket and PR were submitted 7 months ago, and never get reviewed.
>> > https://issues.apache.org/jira/browse/SPARK-29974
>> > https://issues.apache.org/jira/browse/SPARK-28992
>> > https://github.com/apache/spark/pull/25695
>> >
>> > And this.
>> > https://issues.apache.org/jira/browse/SPARK-28896
>> > https://github.com/apache/spark/pull/25609
>> >
>> > In terms of how often this module is updated, it seems to be stable.
>> > But in terms of how often PRs for this module are reviewed, it seems
>> that it
>> > will stay experimental for a long time.
>> >
>> > Thanks.
>> >
>> >
>> >
>> > --
>> > Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>> >
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>


Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-01 Thread Ryan Blue
-1 (non-binding)

I agree with Jungtaek. The change to create datasource tables instead of
Hive tables by default (no USING or STORED AS clauses) has created
confusing behavior and should either be rolled back or fixed before 3.0.

On Wed, Apr 1, 2020 at 5:12 AM Sean Owen  wrote:

> Those are not per se release blockers. They are (perhaps important)
> improvements to functionality. I don't know who is active and able to
> review that part of the code; I'd look for authors of changes in the
> surrounding code. The question here isn't so much what one would like
> to see in this release, but evaluating whether the release is sound
> and free of show-stopper problems. There will always be potentially
> important changes and fixes to come.
>
> On Wed, Apr 1, 2020 at 5:31 AM Dr. Kent Yao  wrote:
> >
> > -1
> > Do not release this package because v3.0.0 is the 3rd major release
> since we
> > added Spark On Kubernetes. Can we make it more production-ready as it has
> > been experimental for more than 2 years?
> >
> > The main practical adoption of Spark on Kubernetes is to take on the
> role of
> > other cluster managers(mainly YARN). And the storage layer(mainly HDFS)
> > would be more likely kept anyway. But Spark on Kubernetes with HDFS seems
> > not to work properly.
> >
> > e.g.
> > This ticket and PR were submitted 7 months ago, and never get reviewed.
> > https://issues.apache.org/jira/browse/SPARK-29974
> > https://issues.apache.org/jira/browse/SPARK-28992
> > https://github.com/apache/spark/pull/25695
> >
> > And this.
> > https://issues.apache.org/jira/browse/SPARK-28896
> > https://github.com/apache/spark/pull/25609
> >
> > In terms of how often this module is updated, it seems to be stable.
> > But in terms of how often PRs for this module are reviewed, it seems
> that it
> > will stay experimental for a long time.
> >
> > Thanks.
> >
> >
> >
> > --
> > Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
> >
>
>

-- 
Ryan Blue
Software Engineer
Netflix


Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-01 Thread Sean Owen
Those are not per se release blockers. They are (perhaps important)
improvements to functionality. I don't know who is active and able to
review that part of the code; I'd look for authors of changes in the
surrounding code. The question here isn't so much what one would like
to see in this release, but evaluating whether the release is sound
and free of show-stopper problems. There will always be potentially
important changes and fixes to come.

On Wed, Apr 1, 2020 at 5:31 AM Dr. Kent Yao  wrote:
>
> -1
> Do not release this package because v3.0.0 is the 3rd major release since we
> added Spark On Kubernetes. Can we make it more production-ready as it has
> been experimental for more than 2 years?
>
> The main practical adoption of Spark on Kubernetes is to take on the role of
> other cluster managers(mainly YARN). And the storage layer(mainly HDFS)
> would be more likely kept anyway. But Spark on Kubernetes with HDFS seems
> not to work properly.
>
> e.g.
> This ticket and PR were submitted 7 months ago, and never get reviewed.
> https://issues.apache.org/jira/browse/SPARK-29974
> https://issues.apache.org/jira/browse/SPARK-28992
> https://github.com/apache/spark/pull/25695
>
> And this.
> https://issues.apache.org/jira/browse/SPARK-28896
> https://github.com/apache/spark/pull/25609
>
> In terms of how often this module is updated, it seems to be stable.
> But in terms of how often PRs for this module are reviewed, it seems that it
> will stay experimental for a long time.
>
> Thanks.
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>




Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-01 Thread Dr. Kent Yao
-1
Do not release this package because v3.0.0 is the 3rd major release since we
added Spark On Kubernetes. Can we make it more production-ready as it has
been experimental for more than 2 years? 

The main practical adoption of Spark on Kubernetes is to take on the role of
other cluster managers(mainly YARN). And the storage layer(mainly HDFS)
would be more likely kept anyway. But Spark on Kubernetes with HDFS seems
not to work properly.

e.g.
This ticket and PR were submitted 7 months ago and never got reviewed.
https://issues.apache.org/jira/browse/SPARK-29974
https://issues.apache.org/jira/browse/SPARK-28992
https://github.com/apache/spark/pull/25695

And this.
https://issues.apache.org/jira/browse/SPARK-28896
https://github.com/apache/spark/pull/25609

In terms of how often this module is updated, it seems to be stable. 
But in terms of how often PRs for this module are reviewed, it seems that it
will stay experimental for a long time.

Thanks.



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/




Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-01 Thread Reynold Xin
The Apache Software Foundation requires voting before any release can be 
published.

On Tue, Mar 31, 2020 at 11:27 PM, Stephen Coy < s...@infomedia.com.au.invalid > 
wrote:

> 
> 
>> On 1 Apr 2020, at 5:20 pm, Sean Owen wrote:
>> 
>> It can be published as "3.0.0-rc1" but how do we test that to vote on it
>> without some other RC1 RC1
>> 
> 
> 
> I’m not sure what you mean by this question?
> 
> 
> 
> 
> This email contains confidential information of and is the copyright of
> Infomedia. It must not be forwarded, amended or disclosed without consent
> of the sender. If you received this message by mistake, please advise the
> sender and delete all copies. Security of transmission on the internet
> cannot be guaranteed, could be infected, intercepted, or corrupted and you
> should ensure you have suitable antivirus protection in place. By sending
> us your or any third party personal details, you consent to (or confirm
> you have obtained consent from such third parties) to Infomedia’s privacy
> policy. http://www.infomedia.com.au/privacy-policy/
>



Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-01 Thread Stephen Coy

On 1 Apr 2020, at 5:20 pm, Sean Owen wrote:

It can be published as "3.0.0-rc1" but how do we test that to vote on it 
without some other RC1 RC1

I’m not sure what you mean by this question?




Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-01 Thread Sean Owen
You just mvn -DskipTests install the source release. That is the primary
artifact we're testing. But yes you could put the jars in your local repo
too.
I think this is pretty standard practice. Obviously the RC can't be
published as "3.0.0". It can be published as "3.0.0-rc1" but how do we test
that to vote on it without some other RC1 RC1.
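Relatedly, the "clean up the artifact cache before/after" advice elsewhere in the thread can be sketched as a small helper that removes cached org.apache.spark artifacts from the conventional local Maven and Ivy cache locations, so the next build resolves the RC jars fresh (the cache paths are the usual defaults and are an assumption; adjust for your setup):

```python
import shutil
from pathlib import Path

def purge_spark_artifacts(home=None):
    """Delete cached org.apache.spark artifacts from the default Maven
    and Ivy cache locations, returning the paths that were removed."""
    home = Path(home) if home else Path.home()
    candidates = [
        home / ".m2" / "repository" / "org" / "apache" / "spark",
        home / ".ivy2" / "cache" / "org.apache.spark",
    ]
    removed = []
    for path in candidates:
        if path.exists():
            shutil.rmtree(path)       # remove the whole cached subtree
            removed.append(str(path))
    return removed
```

Run it before installing the RC jars and again after testing, so later builds don't silently pick up stale RC artifacts.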

On Wed, Apr 1, 2020 at 12:30 AM Stephen Coy  wrote:

> Therefore, if I want to build my product against these jars I need to
> either locally install these jars or checkout and build the RC tag.
>
> I guess I need to build anyway because I need
> a spark-hadoop-cloud_2.12-3.0.0.jar. BTW, it would be incredibly handy to
> have this in the distro, or at least in Maven Central.
>
> Thanks,
>
> Steve C
>
> On 1 Apr 2020, at 3:48 pm, Wenchen Fan  wrote:
>
> Yea, release candidates are different from the preview version, as release
> candidates are not official releases, so they won't appear in Maven
> Central, can't be downloaded from the official Spark website, etc.
>
> On Wed, Apr 1, 2020 at 12:32 PM Sean Owen  wrote:
>
>> These are release candidates, not the final release, so they won't be
>> published to Maven Central. The naming matches what the final release would
>> be.
>>
>> On Tue, Mar 31, 2020 at 11:25 PM Stephen Coy <
>> s...@infomedia.com.au.invalid> wrote:
>>
>>> Furthermore, the spark jars in these bundles all look like release
>>> versions:
>>>
>>> [scoy@Steves-Core-i9 spark-3.0.0-bin-hadoop3.2]$ ls -l jars/spark-*
>>> -rw-r--r--@ 1 scoy  staff  9261223 31 Mar 20:55
>>> jars/spark-catalyst_2.12-3.0.0.jar
>>> -rw-r--r--@ 1 scoy  staff  9720421 31 Mar 20:55
>>> jars/spark-core_2.12-3.0.0.jar
>>> -rw-r--r--@ 1 scoy  staff   430854 31 Mar 20:55
>>> jars/spark-graphx_2.12-3.0.0.jar
>>> -rw-r--r--@ 1 scoy  staff  2076394 31 Mar 20:55
>>> jars/spark-hive-thriftserver_2.12-3.0.0.jar
>>> -rw-r--r--@ 1 scoy  staff   690789 31 Mar 20:55
>>> jars/spark-hive_2.12-3.0.0.jar
>>> -rw-r--r--@ 1 scoy  staff   369189 31 Mar 20:55
>>> jars/spark-kubernetes_2.12-3.0.0.jar
>>> -rw-r--r--@ 1 scoy  staff    59870 31 Mar 20:55
>>> jars/spark-kvstore_2.12-3.0.0.jar
>>> -rw-r--r--@ 1 scoy  staff    75930 31 Mar 20:55
>>> jars/spark-launcher_2.12-3.0.0.jar
>>> -rw-r--r--@ 1 scoy  staff   294692 31 Mar 20:55
>>> jars/spark-mesos_2.12-3.0.0.jar
>>> -rw-r--r--@ 1 scoy  staff   111915 31 Mar 20:55
>>> jars/spark-mllib-local_2.12-3.0.0.jar
>>> -rw-r--r--@ 1 scoy  staff  5884976 31 Mar 20:55
>>> jars/spark-mllib_2.12-3.0.0.jar
>>> -rw-r--r--@ 1 scoy  staff  2397168 31 Mar 20:55
>>> jars/spark-network-common_2.12-3.0.0.jar
>>> -rw-r--r--@ 1 scoy  staff    87065 31 Mar 20:55
>>> jars/spark-network-shuffle_2.12-3.0.0.jar
>>> -rw-r--r--@ 1 scoy  staff    52605 31 Mar 20:55
>>> jars/spark-repl_2.12-3.0.0.jar
>>> -rw-r--r--@ 1 scoy  staff    30347 31 Mar 20:55
>>> jars/spark-sketch_2.12-3.0.0.jar
>>> -rw-r--r--@ 1 scoy  staff  7092213 31 Mar 20:55
>>> jars/spark-sql_2.12-3.0.0.jar
>>> -rw-r--r--@ 1 scoy  staff  1137675 31 Mar 20:55
>>> jars/spark-streaming_2.12-3.0.0.jar
>>> -rw-r--r--@ 1 scoy  staff     9049 31 Mar 20:55
>>> jars/spark-tags_2.12-3.0.0-tests.jar
>>> -rw-r--r--@ 1 scoy  staff    15149 31 Mar 20:55
>>> jars/spark-tags_2.12-3.0.0.jar
>>> -rw-r--r--@ 1 scoy  staff    51089 31 Mar 20:55
>>> jars/spark-unsafe_2.12-3.0.0.jar
>>> -rw-r--r--@ 1 scoy  staff   329764 31 Mar 20:55
>>> jars/spark-yarn_2.12-3.0.0.jar
>>>
>>> At least they have not yet shown up on Maven Central…
>>>
>>> Steve C
>>>
>>> On 1 Apr 2020, at 3:18 pm, Stephen Coy 
>>> wrote:
>>>
>>> The download artifacts all seem to have "RC1" missing from their
>>> names.
>>>
>>> e.g. spark-3.0.0-bin-hadoop3.2.tgz
>>>
>>> Cheers,
>>>
>>> Steve C
>>>

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-03-31 Thread Stephen Coy
Therefore, if I want to build my product against these jars I need to either 
locally install these jars or checkout and build the RC tag.

I guess I need to build anyway because I need a 
spark-hadoop-cloud_2.12-3.0.0.jar. BTW, it would be incredibly handy to have 
this in the distro, or at least in Maven Central.

Thanks,

Steve C

On 1 Apr 2020, at 3:48 pm, Wenchen Fan wrote:

Yea, release candidates are different from the preview version, as release
candidates are not official releases, so they won't appear in Maven Central,
can't be downloaded from the official Spark website, etc.

On Wed, Apr 1, 2020 at 12:32 PM Sean Owen wrote:
These are release candidates, not the final release, so they won't be published 
to Maven Central. The naming matches what the final release would be.

On Tue, Mar 31, 2020 at 11:25 PM Stephen Coy wrote:
Furthermore, the spark jars in these bundles all look like release versions:

[scoy@Steves-Core-i9 spark-3.0.0-bin-hadoop3.2]$ ls -l jars/spark-*
-rw-r--r--@ 1 scoy  staff  9261223 31 Mar 20:55 
jars/spark-catalyst_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff  9720421 31 Mar 20:55 jars/spark-core_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff   430854 31 Mar 20:55 jars/spark-graphx_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff  2076394 31 Mar 20:55 
jars/spark-hive-thriftserver_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff   690789 31 Mar 20:55 jars/spark-hive_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff   369189 31 Mar 20:55 
jars/spark-kubernetes_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff    59870 31 Mar 20:55 
jars/spark-kvstore_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff    75930 31 Mar 20:55 
jars/spark-launcher_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff   294692 31 Mar 20:55 jars/spark-mesos_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff   111915 31 Mar 20:55 
jars/spark-mllib-local_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff  5884976 31 Mar 20:55 jars/spark-mllib_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff  2397168 31 Mar 20:55 
jars/spark-network-common_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff    87065 31 Mar 20:55 
jars/spark-network-shuffle_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff    52605 31 Mar 20:55 jars/spark-repl_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff    30347 31 Mar 20:55 jars/spark-sketch_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff  7092213 31 Mar 20:55 jars/spark-sql_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff  1137675 31 Mar 20:55 
jars/spark-streaming_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff 9049 31 Mar 20:55 
jars/spark-tags_2.12-3.0.0-tests.jar
-rw-r--r--@ 1 scoy  staff15149 31 Mar 20:55 jars/spark-tags_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff51089 31 Mar 20:55 jars/spark-unsafe_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff   329764 31 Mar 20:55 jars/spark-yarn_2.12-3.0.0.jar

At least they have not yet shown up on Maven Central…

Steve C

On 1 Apr 2020, at 3:18 pm, Stephen Coy <s...@infomedia.com.au.INVALID> wrote:

The download artifacts all seem to be missing the “RC1” from their names.

e.g. spark-3.0.0-bin-hadoop3.2.tgz

Cheers,

Steve C

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-03-31 Thread Jungtaek Lim
-1 (non-binding)

I filed SPARK-31257 as a blocker, and now others are starting to agree that it's a
critical issue which should be dealt with before releasing Spark 3.0. Please
refer to the recent comments in https://github.com/apache/spark/pull/28026

It shouldn't delay the release much, as we can either revert
SPARK-30098 or turn on the legacy config by default. That's just a matter of
choice and doesn't require huge effort.

On Wed, Apr 1, 2020 at 12:04 PM Reynold Xin  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 3.0.0.
>
> The vote is open until 11:59pm Pacific time Fri Apr 3, and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.0.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.0.0-rc1 (commit
> 6550d0d5283efdbbd838f3aeaf0476c7f52a0fb1):
> https://github.com/apache/spark/tree/v3.0.0-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1341/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc1-docs/
>
> The list of bug fixes going into 3.0.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12339177
>
> This release is using the release script of the tag v3.0.0-rc1.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.0.0?
> ===
> The current list of open tickets targeted at 3.0.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.0.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
> Note: I fully expect this RC to fail.
>
>
>
>


Re: [VOTE] Apache Spark 3.0.0 RC1

2020-03-31 Thread Stephen Coy
That is a very unusual practice...

On 1 Apr 2020, at 3:32 pm, Sean Owen <sro...@gmail.com> wrote:

These are release candidates, not the final release, so they won't be published 
to Maven Central. The naming matches what the final release would be.

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-03-31 Thread Wenchen Fan
Yea, release candidates are different from the preview version: release
candidates are not official releases, so they won't appear in Maven
Central, can't be downloaded from the official Spark website, etc.

On Wed, Apr 1, 2020 at 12:32 PM Sean Owen  wrote:

> These are release candidates, not the final release, so they won't be
> published to Maven Central. The naming matches what the final release would
> be.

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-03-31 Thread Sean Owen
These are release candidates, not the final release, so they won't be
published to Maven Central. The naming matches what the final release would
be.


Re: [VOTE] Apache Spark 3.0.0 RC1

2020-03-31 Thread Stephen Coy
Furthermore, the spark jars in these bundles all look like release versions:

[scoy@Steves-Core-i9 spark-3.0.0-bin-hadoop3.2]$ ls -l jars/spark-*
-rw-r--r--@ 1 scoy  staff  9261223 31 Mar 20:55 jars/spark-catalyst_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff  9720421 31 Mar 20:55 jars/spark-core_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff   430854 31 Mar 20:55 jars/spark-graphx_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff  2076394 31 Mar 20:55 jars/spark-hive-thriftserver_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff   690789 31 Mar 20:55 jars/spark-hive_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff   369189 31 Mar 20:55 jars/spark-kubernetes_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff    59870 31 Mar 20:55 jars/spark-kvstore_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff    75930 31 Mar 20:55 jars/spark-launcher_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff   294692 31 Mar 20:55 jars/spark-mesos_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff   111915 31 Mar 20:55 jars/spark-mllib-local_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff  5884976 31 Mar 20:55 jars/spark-mllib_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff  2397168 31 Mar 20:55 jars/spark-network-common_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff    87065 31 Mar 20:55 jars/spark-network-shuffle_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff    52605 31 Mar 20:55 jars/spark-repl_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff    30347 31 Mar 20:55 jars/spark-sketch_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff  7092213 31 Mar 20:55 jars/spark-sql_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff  1137675 31 Mar 20:55 jars/spark-streaming_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff     9049 31 Mar 20:55 jars/spark-tags_2.12-3.0.0-tests.jar
-rw-r--r--@ 1 scoy  staff    15149 31 Mar 20:55 jars/spark-tags_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff    51089 31 Mar 20:55 jars/spark-unsafe_2.12-3.0.0.jar
-rw-r--r--@ 1 scoy  staff   329764 31 Mar 20:55 jars/spark-yarn_2.12-3.0.0.jar

At least they have not yet shown up on Maven Central…

Steve C

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-03-31 Thread Stephen Coy
The download artifacts all seem to be missing the “RC1” from their names.

e.g. spark-3.0.0-bin-hadoop3.2.tgz

Cheers,

Steve C

On 1 Apr 2020, at 2:04 pm, Reynold Xin <r...@databricks.com> wrote:


Please vote on releasing the following candidate as Apache Spark version 3.0.0.

The vote is open until 11:59pm Pacific time Fri Apr 3, and passes if a majority 
+1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.0.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v3.0.0-rc1 (commit 
6550d0d5283efdbbd838f3aeaf0476c7f52a0fb1):
https://github.com/apache/spark/tree/v3.0.0-rc1

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc1-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS
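
For anyone verifying artifacts for the first time, a minimal sketch of the digest check (the tarball below is a local stand-in so the commands run anywhere; the published .sha512 files sit next to each artifact in the dist directory, and `gpg --verify` with the KEYS file covers the signatures):

```shell
# Stand-in bytes so this sketch runs locally; for the real check, download the
# tarball and its .sha512 companion from the v3.0.0-rc1-bin directory above.
printf 'stand-in artifact bytes\n' > spark-3.0.0-bin-hadoop3.2.tgz
sha512sum spark-3.0.0-bin-hadoop3.2.tgz > spark-3.0.0-bin-hadoop3.2.tgz.sha512
# Recompute the digest and compare against the recorded value.
sha512sum -c spark-3.0.0-bin-hadoop3.2.tgz.sha512
```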

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1341/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc1-docs/

The list of bug fixes going into 3.0.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12339177

This release is using the release script of the tag v3.0.0-rc1.


FAQ

=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks; in Java/Scala
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).
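
The PySpark route can be sketched as follows. The pip install URL is an assumption about the dist layout and is left commented so the sketch runs without network access:

```shell
# Isolate the RC test install in a fresh virtual env.
python3 -m venv spark-rc-test
. spark-rc-test/bin/activate
# Confirm pip runs inside the env before installing anything.
python -m pip --version
# Hypothetical RC tarball location; uncomment to actually fetch and install it:
# python -m pip install \
#   "https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz"
deactivate
```

For the Java/Scala side, the analogous step is adding the staging repository URL from this email to your build's resolvers.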

===
What should happen to JIRA tickets still targeting 3.0.0?
===
The current list of open tickets targeted at 3.0.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target Version/s" 
= 3.0.0

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==
In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


Note: I fully expect this RC to fail.




This email contains confidential information of and is the copyright of 
Infomedia. It must not be forwarded, amended or disclosed without consent of 
the sender. If you received this message by mistake, please advise the sender 
and delete all copies. Security of transmission on the internet cannot be 
guaranteed, could be infected, intercepted, or corrupted and you should ensure 
you have suitable antivirus protection in place. By sending us your or any 
third party personal details, you consent to (or confirm you have obtained 
consent from such third parties) to Infomedia’s privacy policy. 
http://www.infomedia.com.au/privacy-policy/