Re: [VOTE] Release Spark 2.4.6 (RC8)

2020-06-05 Thread Holden Karau
Binding +1 and the vote passes. I’ll upload the release today/this weekend.

On a personal note, I hope everyone is doing as well as possible with
the pandemic and police violence. I've been grappling with the implications of
our work as a community.

+1s (* binding):
Wenchen Fan *
Sean Owen *
DB Tsai *
Prashant Sharma *
Mridul Muralidharan *
Tom Graves *
Dongjoon Hyun *

0s:
None

-1s:
None

Comments without a vote:
Nicholas Chammas
Xiao Li
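The call for votes states the rule this tally is checked against: the vote passes if a majority of PMC votes are +1, with a minimum of three +1 votes. Below is a minimal sketch of that check. It reads "majority" as the ASF's majority-approval rule (more binding +1s than binding -1s, with 0s not counted either way), and the function name is hypothetical. Holden's own binding +1 is not in the list above, so the sketch uses only the seven listed votes.

```python
from collections import Counter


def majority_approval(binding_votes, min_plus_ones=3):
    """Return True if binding votes meet ASF-style majority approval.

    binding_votes: iterable of +1, 0 or -1 from binding (PMC) voters.
    The vote passes with at least `min_plus_ones` binding +1s and more
    binding +1s than binding -1s; 0s are recorded but ignored.
    """
    tally = Counter(binding_votes)
    plus, minus = tally[1], tally[-1]
    return plus >= min_plus_ones and plus > minus


# The binding +1s listed in the summary above; no 0s or -1s were cast.
binding_plus_ones = [
    "Wenchen Fan", "Sean Owen", "DB Tsai", "Prashant Sharma",
    "Mridul Muralidharan", "Tom Graves", "Dongjoon Hyun",
]
votes = [1] * len(binding_plus_ones)

if __name__ == "__main__":
    print(majority_approval(votes))  # True: 7 binding +1s, 0 binding -1s
```

The minimum of three +1s is a separate condition from the majority: two +1s and no -1s would still fail.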

On Wed, Jun 3, 2020 at 12:49 PM Holden Karau  wrote:

> If this is something we expect to mostly impact new users I think we can
> push them towards Spark 3 instead of introducing a behaviour change in 2.4.6

Re: [VOTE] Release Spark 2.4.6 (RC8)

2020-06-03 Thread Holden Karau
If this is something we expect to mostly impact new users I think we can
push them towards Spark 3 instead of introducing a behaviour change in 2.4.6

On Wed, Jun 3, 2020 at 12:34 PM Mridul Muralidharan 
wrote:

>
> Is this a behavior change in 2.4.x from an earlier version?
> Or are we proposing to introduce functionality to help with adoption?
>
> Regards,
> Mridul

Re: [VOTE] Release Spark 2.4.6 (RC8)

2020-06-03 Thread Mridul Muralidharan
Is this a behavior change in 2.4.x from an earlier version?
Or are we proposing to introduce functionality to help with adoption?

Regards,
Mridul


On Wed, Jun 3, 2020 at 10:32 AM Xiao Li  wrote:

> Yes. Spark 3.0 RC2 works well.
>
> I think the current behavior in Spark 2.4 affects adoption, especially
> for new users who want to try Spark in their local environment.
>
> It impacts all our built-in clients, like the Scala shell and PySpark.
> Should we consider backporting it to 2.4?
>
> Although this fixes the bug, it will also introduce a behavior change.
> We should publicly document it and mention it in the release notes. Let us
> review it more carefully and understand the risk and impact.
>
> Thanks,
>
> Xiao

Re: [VOTE] Release Spark 2.4.6 (RC8)

2020-06-03 Thread Xiao Li
Yes. Spark 3.0 RC2 works well.

I think the current behavior in Spark 2.4 affects adoption, especially
for new users who want to try Spark in their local environment.

It impacts all our built-in clients, like the Scala shell and PySpark.
Should we consider backporting it to 2.4?

Although this fixes the bug, it will also introduce a behavior change. We
should publicly document it and mention it in the release notes. Let us
review it more carefully and understand the risk and impact.

Thanks,

Xiao

On Wed, Jun 3, 2020 at 10:12 AM Nicholas Chammas wrote:

> I believe that was fixed in 3.0 and there was a decision not to backport
> the fix: SPARK-31170 

Re: [VOTE] Release Spark 2.4.6 (RC8)

2020-06-03 Thread Nicholas Chammas
I believe that was fixed in 3.0 and there was a decision not to backport
the fix: SPARK-31170 

On Wed, Jun 3, 2020 at 1:04 PM Xiao Li  wrote:

> I just downloaded it on my local MacBook and tried to create a table using
> the pre-built PySpark. It seems that the conf "spark.sql.warehouse.dir" does
> not take effect: it is trying to create a directory at
> "file:/user/hive/warehouse/t1". I have not done any investigation yet. Have
> any of you hit the same issue?

Re: [VOTE] Release Spark 2.4.6 (RC8)

2020-06-03 Thread Xiao Li
I just downloaded it on my local MacBook and tried to create a table using
the pre-built PySpark. It seems that the conf "spark.sql.warehouse.dir" does
not take effect: it is trying to create a directory at
"file:/user/hive/warehouse/t1". I have not done any investigation yet. Have
any of you hit the same issue?

C02XT0U7JGH5:bin lixiao$ ./pyspark --conf spark.sql.warehouse.dir="/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6"
Python 2.7.16 (default, Jan 27 2020, 04:46:15)
[GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.37.14)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
20/06/03 09:56:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.6
      /_/

Using Python version 2.7.16 (default, Jan 27 2020 04:46:15)
SparkSession available as 'spark'.
>>> spark.sql("set spark.sql.warehouse.dir").show(truncate=False)
+-----------------------+-------------------------------------------------+
|key                    |value                                            |
+-----------------------+-------------------------------------------------+
|spark.sql.warehouse.dir|/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6|
+-----------------------+-------------------------------------------------+

>>> spark.sql("create table t1 (col1 int)")
20/06/03 09:56:29 WARN HiveMetaStore: Location: file:/user/hive/warehouse/t1 specified for non-external table:t1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6/python/pyspark/sql/session.py", line 767, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u'org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:file:/user/hive/warehouse/t1 is not a directory or unable to create one);'

On Wed, Jun 3, 2020 at 9:18 AM Dongjoon Hyun wrote:

> +1
>
> Bests,
> Dongjoon

Re: [VOTE] Release Spark 2.4.6 (RC8)

2020-06-03 Thread Dongjoon Hyun
+1

Bests,
Dongjoon

On Wed, Jun 3, 2020 at 5:59 AM Tom Graves 
wrote:

>  +1
>
> Tom


Re: [VOTE] Release Spark 2.4.6 (RC8)

2020-06-03 Thread Tom Graves
+1

Tom

On Sunday, May 31, 2020, 06:47:09 PM CDT, Holden Karau wrote:

Please vote on releasing the following candidate as Apache Spark version 2.4.6.

The vote is open until June 5th at 9AM PST and passes if a majority +1 PMC 
votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 2.4.6
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

There are currently no issues targeting 2.4.6 (try project = SPARK AND
"Target Version/s" = "2.4.6" AND status in (Open, Reopened, "In Progress"))

The tag to be voted on is v2.4.6-rc8 (commit
807e0a484d1de767d1f02bd8a622da6450bdf940):
https://github.com/apache/spark/tree/v2.4.6-rc8

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.4.6-rc8-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS
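The release files above ship with digests. Below is a minimal sketch of checking a downloaded artifact against its `.sha512` file using only Python's standard library. The function names are hypothetical, and since the exact digest-file layout is not stated here, the parser accepts both the `gpg --print-md SHA512` layout (`name: AB12 CD34 ...`, possibly wrapped) and the `sha512sum` layout (`<hex>  name`). A matching digest only shows the download is intact; it does not replace verifying the signature against the KEYS file.

```python
import hashlib
import re


def sha512_hex(path, chunk_size=1 << 20):
    """Stream the file through SHA-512 so large tarballs need not fit in memory."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def parse_sha512_text(text):
    """Extract the hex digest from a .sha512 file's contents.

    Accepts the `gpg --print-md SHA512` layout ("name: AB12 CD34 ...",
    possibly wrapped over several lines) and the `sha512sum` layout
    ("ab12cd34...  name"). Raises ValueError if no 128-digit digest is found.
    """
    text = text.strip()
    if not text:
        raise ValueError("empty digest text")
    if ":" in text.splitlines()[0]:
        body = text.split(":", 1)[1]  # gpg layout: digest follows the first colon
    else:
        body = text.split()[0]  # sha512sum layout: digest is the first field
    digest = re.sub(r"\s+", "", body).lower()
    if not re.fullmatch(r"[0-9a-f]{128}", digest):
        raise ValueError("no SHA-512 hex digest found")
    return digest


def matches_published_digest(path, digest_text):
    """True if the file's SHA-512 equals the digest given in digest_text."""
    return sha512_hex(path) == parse_sha512_text(digest_text)
```

Hashing in fixed-size chunks keeps memory use flat regardless of the tarball size; comparing lowercase hex strings makes the check insensitive to the upper-case, space-grouped style `gpg` prints.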

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1349/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.4.6-rc8-docs/

The list of bug fixes going into 2.4.6 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12346781

This release is using the release script of the tag v2.4.6-rc8.

FAQ

=
What happened to the other RCs?=

The parallel maven build caused some flakiness so I wasn't comfortable 
releasing them. I backported the fix from the 3.0 branch for this release. I've 
got a proposed change to the build script so that we only push tags when once 
the build is a success for the future, but it does not block this release.
=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks, in the Java/Scala
you can add the staging repository to your projects resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out of date RC going forward).

===
What should happen to JIRA tickets still targeting 2.4.6?
===

The current list of open tickets targeted at 2.4.6 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target Version/s" 
= 2.4.6

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 
YouTube Live Streams: https://www.youtube.com/user/holdenkarau  

Re: [VOTE] Release Spark 2.4.6 (RC8)

2020-06-02 Thread Mridul Muralidharan
+1 (binding)


Thanks,
Mridul




Re: [VOTE] Release Spark 2.4.6 (RC8)

2020-06-01 Thread Prashant Sharma
+1

Thanks!



Re: [VOTE] Release Spark 2.4.6 (RC8)

2020-05-31 Thread Holden Karau
Yes, that's correct: the release script needs a bit of work, and it has
diverged a bit from 3.0 as well. I'll follow up with some more PRs in
addition to the current one I have.

On Sun, May 31, 2020 at 10:08 PM Sean Owen  wrote:

> I suspect there were some problems with the release script to fix.
>
> +1 from me, same as last time. This still appears to be OK in licenses and
> sigs, and source compiles and passes tests.
>

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: [VOTE] Release Spark 2.4.6 (RC8)

2020-05-31 Thread DB Tsai
+1 (binding), thanks!

On Sun, May 31, 2020 at 9:23 PM Wenchen Fan  wrote:

> +1 (binding), although I don't know why we jump from RC 3 to RC 8...
>

- DB


Re: [VOTE] Release Spark 2.4.6 (RC8)

2020-05-31 Thread Sean Owen
I suspect there were some problems with the release script to fix.

+1 from me, same as last time. This still appears to be OK in licenses and
sigs, and source compiles and passes tests.

On Sun, May 31, 2020 at 11:23 PM Wenchen Fan  wrote:

> +1 (binding), although I don't know why we jump from RC 3 to RC 8...
>


Re: [VOTE] Release Spark 2.4.6 (RC8)

2020-05-31 Thread Wenchen Fan
+1 (binding), although I don't know why we jump from RC 3 to RC 8...



[VOTE] Release Spark 2.4.6 (RC8)

2020-05-31 Thread Holden Karau
Please vote on releasing the following candidate as Apache Spark
version 2.4.6.

The vote is open until June 5th at 9AM PST and passes if a majority of +1 PMC
votes are cast, with a minimum of three +1 votes.
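
To make the passing rule above concrete, here is a small Python sketch of the tally. It follows the ASF notion of "majority approval" as it is commonly described (at least three binding +1 votes and more binding +1 than binding -1 votes); the Vote record, its fields and the voter names are hypothetical, and ASF policy plus the release manager remain the authority.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int     # +1, 0 or -1
    binding: bool  # True if cast by a PMC member (hypothetical field)


def release_vote_passes(votes: Iterable[Vote], minimum_plus_ones: int = 3) -> bool:
    """Sketch of ASF-style majority approval for a release vote.

    Only binding votes are counted; 0 votes are recorded but do not change
    the outcome. The vote passes when binding +1s reach the minimum and
    outnumber binding -1s.
    """
    binding = [v for v in votes if v.binding]
    plus_ones = sum(1 for v in binding if v.value == 1)
    minus_ones = sum(1 for v in binding if v.value == -1)
    return plus_ones >= minimum_plus_ones and plus_ones > minus_ones


# Hypothetical tally: three binding +1s, one non-binding +1, one binding 0.
example = [
    Vote("pmc-a", 1, True),
    Vote("pmc-b", 1, True),
    Vote("pmc-c", 1, True),
    Vote("contributor-d", 1, False),
    Vote("pmc-e", 0, True),
]
print(release_vote_passes(example))  # True
```

Non-binding votes are still welcome on the list; they simply do not enter this count.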

[ ] +1 Release this package as Apache Spark 2.4.6
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

There are currently no open issues targeting 2.4.6 (to check, try the JIRA query
project = SPARK AND "Target Version/s" = "2.4.6" AND status in (Open, Reopened, "In Progress")).
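
The JIRA query quoted above can be turned into a shareable search link. The Python sketch below only builds the URL; it assumes the usual JIRA issue-navigator path /issues/?jql= on issues.apache.org and does not contact the server. The helper names are hypothetical.

```python
from urllib.parse import urlencode

JIRA_BASE = "https://issues.apache.org/jira"  # assumed base of the ASF JIRA instance


def open_issues_jql(project: str, version: str) -> str:
    """JQL for unresolved issues whose Target Version/s field is `version`."""
    return (
        f'project = {project} AND "Target Version/s" = "{version}" '
        'AND status in (Open, Reopened, "In Progress")'
    )


def jira_search_url(jql: str, base: str = JIRA_BASE) -> str:
    """Browser URL that runs `jql` in JIRA's issue navigator (assumed /issues/ path)."""
    query = urlencode({"jql": jql})  # percent-encodes quotes, '=', parentheses
    return f"{base}/issues/?{query}"


print(jira_search_url(open_issues_jql("SPARK", "2.4.6")))
```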

The tag to be voted on is v2.4.6-rc8 (commit
807e0a484d1de767d1f02bd8a622da6450bdf940):
https://github.com/apache/spark/tree/v2.4.6-rc8

The release files, including signatures, digests, etc., can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.4.6-rc8-bin/
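
Before voting, reviewers typically check the digests and signatures of the files in the directory above. The following Python sketch covers only the SHA-512 digest check. It assumes each artifact ships with a .sha512 file in either the grouped, upper-case layout produced by gpg --print-md SHA512 or the "hash  filename" layout of sha512sum; the helper names are hypothetical. Signature verification against the KEYS file (for example with gpg --verify) is a separate step and is not shown.

```python
import hashlib
import re


def sha512_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large release tarballs need not fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_sha512(digest_text: str) -> str:
    """Pull the 128-hex-digit SHA-512 value out of a digest file's contents.

    Assumed layouts:
      gpg --print-md SHA512:  "name.tgz: 1C86B0F5 ABCD..." (grouped, may wrap)
      sha512sum:              "1c86b0f5abcd...  name.tgz"
    Whitespace is removed first so grouped or wrapped hex becomes contiguous.
    """
    compact = re.sub(r"\s+", "", digest_text)
    match = re.search(r"[0-9a-fA-F]{128}", compact)
    if match is None:
        raise ValueError("no SHA-512 digest found")
    return match.group(0).lower()


def verify_sha512(path: str, digest_text: str) -> bool:
    """True if the file's SHA-512 matches the one recorded in digest_text."""
    return sha512_of(path) == expected_sha512(digest_text)
```

For example, a downloaded tarball could be checked with verify_sha512(tarball_path, open(tarball_path + ".sha512").read()), where tarball_path is whatever file you fetched from the -bin directory.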

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1349/
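
For Java/Scala testing, it can help to confirm that an artifact is actually present in the staging repository before pointing a build at it. A minimal sketch, assuming the standard Maven 2 repository layout; the spark-core_2.11 coordinate is only an example, since 2.4.x artifacts carry a Scala-version suffix, and the helper name is hypothetical. In sbt, the repository itself would typically be added with a line such as resolvers += "Spark RC staging" at "https://repository.apache.org/content/repositories/orgapachespark-1349/".

```python
# Staging repository from the vote email above.
STAGING_REPO = "https://repository.apache.org/content/repositories/orgapachespark-1349/"


def maven_artifact_url(repo: str, group_id: str, artifact_id: str, version: str,
                       ext: str = "jar", classifier: str = "") -> str:
    """Build an artifact URL using the standard Maven 2 repository layout:
    <repo>/<groupId as path>/<artifactId>/<version>/<artifactId>-<version>[-<classifier>].<ext>
    """
    group_path = group_id.replace(".", "/")
    suffix = f"-{classifier}" if classifier else ""
    filename = f"{artifact_id}-{version}{suffix}.{ext}"
    return f"{repo.rstrip('/')}/{group_path}/{artifact_id}/{version}/{filename}"


# Example coordinate (an assumption): the Scala 2.11 build of spark-core.
print(maven_artifact_url(STAGING_REPO, "org.apache.spark", "spark-core_2.11", "2.4.6", ext="pom"))
```

Requesting the printed URL should succeed while the artifact is staged; a 404 most likely means the coordinate is wrong.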

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.4.6-rc8-docs/

The list of bug fixes going into 2.4.6 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12346781

This release uses the release script from the tag v2.4.6-rc8.

FAQ

===============================
What happened to the other RCs?
===============================

The parallel Maven build caused some flakiness, so I wasn't comfortable
releasing those RCs. I backported the fix from the 3.0 branch for this release.
I've also proposed a change to the build script so that, in the future, tags
are only pushed once the build succeeds, but that change does not block this
release.

=================================
How can I help test this release?
=================================

If you are a Spark user, you can help us test this release by taking
an existing Spark workload, running it on this release candidate, and
reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install the
current RC, and see if anything important breaks. In Java/Scala, you can add
the staging repository to your project's resolvers and test with the RC (make
sure to clean up the artifact cache before and after, so you don't end up
building with an out-of-date RC going forward).
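
The PySpark and cache-cleanup advice above could be scripted roughly as follows. This is a sketch under two assumptions: that the RC's -bin directory contains a pip-installable pyspark-2.4.6.tar.gz (its URL can then be passed to pip install inside a fresh virtual env), and that the Java/Scala build uses a Maven-style local repository such as ~/.m2/repository. Ivy and Coursier caches use different layouts and are not handled, and the helper names are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List

# Assumed location of the PySpark tarball: the RC's -bin directory.
RC_BIN_DIR = "https://dist.apache.org/repos/dist/dev/spark/v2.4.6-rc8-bin/"


def pyspark_rc_url(version: str = "2.4.6", bin_dir: str = RC_BIN_DIR) -> str:
    """URL to hand to `pip install` inside a fresh virtual env (assumed file name)."""
    return f"{bin_dir.rstrip('/')}/pyspark-{version}.tar.gz"


def cached_spark_artifacts(local_repo: Path, version: str) -> List[Path]:
    """Find <local_repo>/org/apache/spark/<artifact>/<version> directories.

    Assumes a Maven-style local repository (e.g. ~/.m2/repository); Ivy and
    Coursier caches use other layouts and are not searched.
    """
    spark_dir = local_repo / "org" / "apache" / "spark"
    if not spark_dir.is_dir():
        return []
    return sorted(p for p in spark_dir.glob(f"*/{version}") if p.is_dir())


def purge_cached_spark(local_repo: Path, version: str, dry_run: bool = True) -> List[Path]:
    """Return the cached directories for `version`, deleting them unless dry_run."""
    targets = cached_spark_artifacts(local_repo, version)
    if not dry_run:
        for path in targets:
            shutil.rmtree(path)
    return targets


if __name__ == "__main__":
    print(pyspark_rc_url())
    # Dry run first: list what would be removed from the default Maven cache.
    for path in purge_cached_spark(Path.home() / ".m2" / "repository", "2.4.6"):
        print("would remove", path)
```

Defaulting to a dry run keeps the script from deleting anything until you have looked at the list.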

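=========================================================
What should happen to JIRA tickets still targeting 2.4.6?
=========================================================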

The current list of open tickets targeted at 2.4.6 can be found by going to
https://issues.apache.org/jira/projects/SPARK and searching for
"Target Version/s" = 2.4.6.

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else should be retargeted to an
appropriate release.

=======================
But my bug isn't fixed?
=======================

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is a regression that has not been
correctly targeted, please ping me or a committer to help target the issue.


-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau