Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-02-06 Thread Hyukjin Kwon
Looks like we resolved all standing issues known so far. I will start
another RC next Monday PST.

2021년 2월 4일 (목) 오전 12:03, Kent Yao 님이 작성:

> Sending https://github.com/apache/spark/pull/31460
>
> Based on my research so far, when there is an existing
> *io.file.buffer.size* in hive-site.xml, the hadoopConf finally gets reset
> by that value.
> In many real-world cases, when interacting with the Hive catalog through
> Spark SQL, users may just share the hive-site.xml from their Hive jobs and
> make a copy to SPARK_HOME/conf w/o modification. In Spark, when we
> generate Hadoop configurations, we use *spark.buffer.size* (65536) to
> reset *io.file.buffer.size* (4096). But when we load the hive-site.xml, we
> may ignore this behavior and reset *io.file.buffer.size* again according
> to hive-site.xml.
>
> The PR fixes:
> 1. The configuration priority for setting Hadoop and Hive configs here is
> not right; literally, the order should be *spark > spark.hive >
> spark.hadoop > hive > hadoop*.
> 2. This breaks the *spark.buffer.size* config's behavior for tuning IO
> performance w/ HDFS if there is an existing io.file.buffer.size in
> hive-site.xml.
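The precedence Kent describes can be sketched as an ordered merge, lowest priority first. This is an illustration of the intended ordering only, not Spark's actual implementation; the key name and default values are taken from the messages above, and the hive-site value is invented.

```python
# Sketch of the intended precedence when building the Hadoop configuration:
# spark > spark.hive > spark.hadoop > hive > hadoop.
# Lower-priority sources are applied first so higher-priority ones overwrite them.

def merge_hadoop_conf(hadoop, hive, spark_hadoop, spark_hive, spark):
    """Merge config sources in ascending priority order; later sources win."""
    merged = {}
    for source in (hadoop, hive, spark_hadoop, spark_hive, spark):
        merged.update(source)
    return merged

hadoop_defaults = {"io.file.buffer.size": "4096"}
hive_site = {"io.file.buffer.size": "131072"}   # hypothetical shared hive-site.xml entry
spark_level = {"io.file.buffer.size": "65536"}  # derived from spark.buffer.size

conf = merge_hadoop_conf(hadoop_defaults, hive_site, {}, {}, spark_level)
print(conf["io.file.buffer.size"])  # 65536: the spark-level value wins
```

Under this ordering, a hive-site.xml carrying its own io.file.buffer.size can no longer clobber the spark-level value, which is the behavior the PR restores.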
>
> *Kent Yao *
> @ Data Science Center, Hangzhou Research Institute, NetEase Corp.
> *a spark enthusiast*
> *kyuubi: a unified multi-tenant JDBC interface for large-scale data
> processing and analytics, built on top of Apache Spark.*
> *spark-authorizer: a Spark SQL extension which provides SQL Standard
> Authorization for Apache Spark.*
> *spark-postgres: a library for reading data from and transferring data to
> Postgres / Greenplum with Spark SQL and DataFrames, 10~100x faster.*
> *spark-func-extras: a library that brings excellent and useful functions
> from various modern database management systems to Apache Spark.*
>
>
>
> On 02/3/2021 15:36, Maxim Gekk wrote:
>
> Hi All,
>
> > Also I am investigating a performance regression in some TPC-DS queries
> (q88 for instance) that is caused by a recent commit in 3.1 ...
>
> I have found that the perf regression is caused by the Hadoop config:
> io.file.buffer.size = 4096
> Before the commit
> https://github.com/apache/spark/commit/278f6f45f46ccafc7a31007d51ab9cb720c9cb14,
> we had:
> io.file.buffer.size = 65536
>
> Maxim Gekk
>
> Software Engineer
>
> Databricks, Inc.
>
>
> On Wed, Feb 3, 2021 at 2:37 AM Hyukjin Kwon  wrote:
>
>> Yeah, agreed. I've changed it. Thanks for the heads up, Tom.
>>
>> On Wed, Feb 3, 2021 at 8:31 AM, Tom Graves wrote:
>>
>>> OK, thanks for the update. That is marked as an improvement; if it's a
>>> blocker, can we mark it as such and describe why? I searched JIRAs and
>>> didn't see any critical or blocker issues open.
>>>
>>> Tom
>>> On Tuesday, February 2, 2021, 05:12:24 PM CST, Hyukjin Kwon <
>>> gurwls...@gmail.com> wrote:
>>>
>>>
>>> There is one here: https://github.com/apache/spark/pull/31440. There
>>> look to be several issues being identified (to confirm that this is an
>>> issue in OSS too) and fixed in parallel.
>>> There have been a few unexpected delays here as several more issues were
>>> found. I will try to file and share relevant JIRAs as soon as I can confirm.
>>>
>>> On Wed, Feb 3, 2021 at 2:36 AM, Tom Graves wrote:
>>>
>>> Just curious: do we have an update on the next RC? Is there a JIRA for
>>> the TPC-DS issue?
>>>
>>> Thanks,
>>> Tom
>>>
>>> On Wednesday, January 27, 2021, 05:46:27 PM CST, Hyukjin Kwon <
>>> gurwls...@gmail.com> wrote:
>>>
>>>
>>> Just to share the current status, most of the known issues were
>>> resolved. Let me know if there are some more.
>>> One thing left is a performance regression in TPCDS being investigated.
>>> Once this is identified (and fixed if it should be), I will cut another RC
>>> right away.
>>> I roughly expect to cut another RC next Monday.
>>>
>>> Thanks guys.
>>>
>>> On Wed, Jan 27, 2021 at 5:26 AM, Terry Kim wrote:
>>>
>>> Hi,
>>>
>>> Please check if the following regression should be included:
>>> https://github.com/apache/spark/pull/31352
>>>
>>> Thanks,
>>> Terry
>>>
>>> On Tue, Jan 26, 2021 at 7:54 AM Holden Karau 
>>> wrote:
>>>
>>> If we're ok waiting for it, I’d like to get
>>> https://github.com/apache/spark/pull/31298 in as well (it’s not a
>>> regression but it is a bug fix).
>>>
>>> On Tue, Jan 26, 2021 at 6:38 AM Hyukjin Kwon 
>>> wrote:
>>>
>>> It looks like a cool one but it's a pretty big one and affects the plans
>>> considerably ... maybe it's best to avoid adding it into 3.1.1 in
>>> particular during the RC period if this isn't a clear regression that
>>> affects many users.
>>>
>>> On Tue, Jan 26, 2021 at 11:23 PM, Peter Toth wrote:
>>>
>>> Hey,
>>>
>>> Sorry for chiming in a bit late, but I would like to suggest my PR (
>>> https://github.com/apache/spark/pull/28885) for review and inclusion
>>> 

Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-02-03 Thread Kent Yao








Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-02-02 Thread Maxim Gekk
Hi All,

> Also I am investigating a performance regression in some TPC-DS queries
(q88 for instance) that is caused by a recent commit in 3.1 ...

I have found that the perf regression is caused by the Hadoop config:
io.file.buffer.size = 4096
Before the commit
https://github.com/apache/spark/commit/278f6f45f46ccafc7a31007d51ab9cb720c9cb14,
we had:
io.file.buffer.size = 65536
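Elsewhere in the thread, Kent Yao traces this reset to hive-site.xml loading: a file copied unmodified into SPARK_HOME/conf can carry its own io.file.buffer.size. A hypothetical fragment of such a file (the value shown is illustrative):

```xml
<!-- hive-site.xml (hypothetical): copied unmodified into SPARK_HOME/conf,
     this property overrides the 65536 Spark derives from spark.buffer.size -->
<configuration>
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
</configuration>
```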

Maxim Gekk

Software Engineer

Databricks, Inc.


On Wed, Feb 3, 2021 at 2:37 AM Hyukjin Kwon  wrote:

> Yeah, agreed. I've changed it. Thanks for the heads up, Tom.
>
> On Wed, Feb 3, 2021 at 8:31 AM, Tom Graves wrote:
>
>> OK, thanks for the update. That is marked as an improvement; if it's a
>> blocker, can we mark it as such and describe why? I searched JIRAs and
>> didn't see any critical or blocker issues open.
>>
>> Tom
>> On Tuesday, February 2, 2021, 05:12:24 PM CST, Hyukjin Kwon <
>> gurwls...@gmail.com> wrote:
>>
>>
>> There is one here: https://github.com/apache/spark/pull/31440. There
>> look to be several issues being identified (to confirm that this is an
>> issue in OSS too) and fixed in parallel.
>> There have been a few unexpected delays here as several more issues were
>> found. I will try to file and share relevant JIRAs as soon as I can confirm.
>>
>> On Wed, Feb 3, 2021 at 2:36 AM, Tom Graves wrote:
>>
>> Just curious: do we have an update on the next RC? Is there a JIRA for
>> the TPC-DS issue?
>>
>> Thanks,
>> Tom
>>
>> On Wednesday, January 27, 2021, 05:46:27 PM CST, Hyukjin Kwon <
>> gurwls...@gmail.com> wrote:
>>
>>
>> Just to share the current status, most of the known issues were resolved.
>> Let me know if there are some more.
>> One thing left is a performance regression in TPCDS being investigated.
>> Once this is identified (and fixed if it should be), I will cut another RC
>> right away.
>> I roughly expect to cut another RC next Monday.
>>
>> Thanks guys.
>>
>> On Wed, Jan 27, 2021 at 5:26 AM, Terry Kim wrote:
>>
>> Hi,
>>
>> Please check if the following regression should be included:
>> https://github.com/apache/spark/pull/31352
>>
>> Thanks,
>> Terry
>>
>> On Tue, Jan 26, 2021 at 7:54 AM Holden Karau 
>> wrote:
>>
>> If we're ok waiting for it, I’d like to get
>> https://github.com/apache/spark/pull/31298 in as well (it’s not a
>> regression but it is a bug fix).
>>
>> On Tue, Jan 26, 2021 at 6:38 AM Hyukjin Kwon  wrote:
>>
>> It looks like a cool one but it's a pretty big one and affects the plans
>> considerably ... maybe it's best to avoid adding it into 3.1.1 in
>> particular during the RC period if this isn't a clear regression that
>> affects many users.
>>
>> On Tue, Jan 26, 2021 at 11:23 PM, Peter Toth wrote:
>>
>> Hey,
>>
>> Sorry for chiming in a bit late, but I would like to suggest my PR (
>> https://github.com/apache/spark/pull/28885) for review and inclusion
>> into 3.1.1.
>>
>> Currently, invalid reuse reference nodes appear in many queries, causing
>> performance issues and incorrect explain plans. Now that
>> https://github.com/apache/spark/pull/31243 got merged these invalid
>> references can be easily found in many of our golden files on master:
>> https://github.com/apache/spark/pull/28885#issuecomment-767530441.
>> But the issue isn't master (3.2) specific; actually, it has been there
>> since 3.0, when Dynamic Partition Pruning was added.
>> So it is not a regression from 3.0 to 3.1.1, but in some cases (like
>> TPC-DS q23b) it causes a performance regression from 2.4 to 3.x.
>>
>> Thanks,
>> Peter
>>
>> On Tue, Jan 26, 2021 at 6:30 AM Hyukjin Kwon  wrote:
>>
>> Guys, I plan to make an RC as soon as we have no visible issues. I have
>> merged a few correctness fixes. Here's how things look:
>> - https://github.com/apache/spark/pull/31319 waiting for a review (I
>> will do it too soon).
>> - https://github.com/apache/spark/pull/31336
>> - I know Max is investigating the perf regression, which hopefully will
>> be fixed soon.
>>
>> Are there any more blockers or correctness issues? Please ping me or say
>> it out here.
>> I would like to avoid making an RC when there are clearly some issues to
>> be fixed.
>> If you're investigating something suspicious, that's fine too. It's
>> better to make sure we're safe instead of rushing an RC without finishing
>> the investigation.
>>
>> Thanks all.
>>
>>
>> On Fri, Jan 22, 2021 at 6:19 PM, Hyukjin Kwon wrote:
>>
>> Sure, thanks guys. I'll start another RC after the fixes. Looks like
>> we're almost there.
>>
>> On Fri, 22 Jan 2021, 17:47 Wenchen Fan,  wrote:
>>
>> BTW, there is a correctness bug being fixed at
>> https://github.com/apache/spark/pull/30788 . It's not a regression, but
>> the fix is very simple and it would be better to start the next RC after
>> merging that fix.
>>
>> On Fri, Jan 22, 2021 at 3:54 PM Maxim Gekk 
>> wrote:
>>
>> Also I am investigating a performance regression in some TPC-DS queries
>> (q88 for instance) that is caused by a recent commit in 3.1, highly likely
>> in the period from 19th November, 2020 to 18th December, 2020.
>>
>> Maxim Gekk
>>
>> Software Engineer
>>
>> 

Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-02-02 Thread Hyukjin Kwon
Yeah, agreed. I've changed it. Thanks for the heads up, Tom.

On Wed, Feb 3, 2021 at 8:31 AM, Tom Graves wrote:

> OK, thanks for the update. That is marked as an improvement; if it's a
> blocker, can we mark it as such and describe why? I searched JIRAs and
> didn't see any critical or blocker issues open.
>
> Tom
> On Tuesday, February 2, 2021, 05:12:24 PM CST, Hyukjin Kwon <
> gurwls...@gmail.com> wrote:
>
>
> There is one here: https://github.com/apache/spark/pull/31440. There
> look to be several issues being identified (to confirm that this is an
> issue in OSS too) and fixed in parallel.
> There have been a few unexpected delays here as several more issues were
> found. I will try to file and share relevant JIRAs as soon as I can confirm.
>
> On Wed, Feb 3, 2021 at 2:36 AM, Tom Graves wrote:
>
> Just curious: do we have an update on the next RC? Is there a JIRA for
> the TPC-DS issue?
>
> Thanks,
> Tom
>
> On Wednesday, January 27, 2021, 05:46:27 PM CST, Hyukjin Kwon <
> gurwls...@gmail.com> wrote:
>
>
> Just to share the current status, most of the known issues were resolved.
> Let me know if there are some more.
> One thing left is a performance regression in TPCDS being investigated.
> Once this is identified (and fixed if it should be), I will cut another RC
> right away.
> I roughly expect to cut another RC next Monday.
>
> Thanks guys.
>
> On Wed, Jan 27, 2021 at 5:26 AM, Terry Kim wrote:
>
> Hi,
>
> Please check if the following regression should be included:
> https://github.com/apache/spark/pull/31352
>
> Thanks,
> Terry
>
> On Tue, Jan 26, 2021 at 7:54 AM Holden Karau  wrote:
>
> If we're ok waiting for it, I’d like to get
> https://github.com/apache/spark/pull/31298 in as well (it’s not a
> regression but it is a bug fix).
>
> On Tue, Jan 26, 2021 at 6:38 AM Hyukjin Kwon  wrote:
>
> It looks like a cool one but it's a pretty big one and affects the plans
> considerably ... maybe it's best to avoid adding it into 3.1.1 in
> particular during the RC period if this isn't a clear regression that
> affects many users.
>
> On Tue, Jan 26, 2021 at 11:23 PM, Peter Toth wrote:
>
> Hey,
>
> Sorry for chiming in a bit late, but I would like to suggest my PR (
> https://github.com/apache/spark/pull/28885) for review and inclusion into
> 3.1.1.
>
> Currently, invalid reuse reference nodes appear in many queries, causing
> performance issues and incorrect explain plans. Now that
> https://github.com/apache/spark/pull/31243 got merged these invalid
> references can be easily found in many of our golden files on master:
> https://github.com/apache/spark/pull/28885#issuecomment-767530441.
> But the issue isn't master (3.2) specific; actually, it has been there
> since 3.0, when Dynamic Partition Pruning was added.
> So it is not a regression from 3.0 to 3.1.1, but in some cases (like
> TPC-DS q23b) it causes a performance regression from 2.4 to 3.x.
>
> Thanks,
> Peter
>
> On Tue, Jan 26, 2021 at 6:30 AM Hyukjin Kwon  wrote:
>
> Guys, I plan to make an RC as soon as we have no visible issues. I have
> merged a few correctness fixes. Here's how things look:
> - https://github.com/apache/spark/pull/31319 waiting for a review (I will
> do it too soon).
> - https://github.com/apache/spark/pull/31336
> - I know Max is investigating the perf regression, which hopefully will
> be fixed soon.
>
> Are there any more blockers or correctness issues? Please ping me or say
> it out here.
> I would like to avoid making an RC when there are clearly some issues to
> be fixed.
> If you're investigating something suspicious, that's fine too. It's better
> to make sure we're safe instead of rushing an RC without finishing the
> investigation.
>
> Thanks all.
>
>
> On Fri, Jan 22, 2021 at 6:19 PM, Hyukjin Kwon wrote:
>
> Sure, thanks guys. I'll start another RC after the fixes. Looks like we're
> almost there.
>
> On Fri, 22 Jan 2021, 17:47 Wenchen Fan,  wrote:
>
> BTW, there is a correctness bug being fixed at
> https://github.com/apache/spark/pull/30788 . It's not a regression, but
> the fix is very simple and it would be better to start the next RC after
> merging that fix.
>
> On Fri, Jan 22, 2021 at 3:54 PM Maxim Gekk 
> wrote:
>
> Also I am investigating a performance regression in some TPC-DS queries
> (q88 for instance) that is caused by a recent commit in 3.1, highly likely
> in the period from 19th November, 2020 to 18th December, 2020.
>
> Maxim Gekk
>
> Software Engineer
>
> Databricks, Inc.
>
>
> On Fri, Jan 22, 2021 at 10:45 AM Wenchen Fan  wrote:
>
> -1 as I just found a regression in 3.1. A self-join query works well in
> 3.0 but fails in 3.1. It's being fixed at
> https://github.com/apache/spark/pull/31287
>
> On Fri, Jan 22, 2021 at 4:34 AM Tom Graves 
> wrote:
>
> +1
>
> built from tarball, verified sha and regular CI and tests all pass.
>
> Tom
>
> On Monday, January 18, 2021, 06:06:42 AM CST, Hyukjin Kwon <
> gurwls...@gmail.com> wrote:
>
>
> Please vote on releasing the following candidate as Apache Spark version
> 3.1.1.
>
> The vote is open until January 22nd 4PM PST and 

Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-02-02 Thread Tom Graves
OK, thanks for the update. That is marked as an improvement; if it's a blocker,
can we mark it as such and describe why? I searched JIRAs and didn't see any
critical or blocker issues open.
Tom

On Tuesday, February 2, 2021, 05:12:24 PM CST, Hyukjin Kwon wrote:
 
There is one here: https://github.com/apache/spark/pull/31440. There look to be
several issues being identified (to confirm that this is an issue in OSS too)
and fixed in parallel.
There have been a few unexpected delays here as several more issues were found.
I will try to file and share relevant JIRAs as soon as I can confirm.

On Wed, Feb 3, 2021 at 2:36 AM, Tom Graves wrote:

Just curious: do we have an update on the next RC? Is there a JIRA for the
TPC-DS issue?
Thanks,
Tom
On Wednesday, January 27, 2021, 05:46:27 PM CST, Hyukjin Kwon 
 wrote:  
 
 Just to share the current status, most of the known issues were resolved. Let 
me know if there are some more.
One thing left is a performance regression in TPCDS being investigated. Once 
this is identified (and fixed if it should be), I will cut another RC right 
away.
I roughly expect to cut another RC next Monday.

Thanks guys.
On Wed, Jan 27, 2021 at 5:26 AM, Terry Kim wrote:

Hi,
Please check if the following regression should be included: 
https://github.com/apache/spark/pull/31352
Thanks,
Terry
On Tue, Jan 26, 2021 at 7:54 AM Holden Karau  wrote:

If we're ok waiting for it, I’d like to get 
https://github.com/apache/spark/pull/31298 in as well (it’s not a regression 
but it is a bug fix).
On Tue, Jan 26, 2021 at 6:38 AM Hyukjin Kwon  wrote:

It looks like a cool one but it's a pretty big one and affects the plans 
considerably ... maybe it's best to avoid adding it into 3.1.1 in particular 
during the RC period if this isn't a clear regression that affects many users.
On Tue, Jan 26, 2021 at 11:23 PM, Peter Toth wrote:

Hey,
Sorry for chiming in a bit late, but I would like to suggest my PR 
(https://github.com/apache/spark/pull/28885) for review and inclusion into 
3.1.1.

Currently, invalid reuse reference nodes appear in many queries, causing 
performance issues and incorrect explain plans. Now that 
https://github.com/apache/spark/pull/31243 got merged these invalid references 
can be easily found in many of our golden files on master: 
https://github.com/apache/spark/pull/28885#issuecomment-767530441.
But the issue isn't master (3.2) specific; actually, it has been there since
3.0, when Dynamic Partition Pruning was added.
So it is not a regression from 3.0 to 3.1.1, but in some cases (like TPC-DS
q23b) it causes a performance regression from 2.4 to 3.x.

Thanks,
Peter
On Tue, Jan 26, 2021 at 6:30 AM Hyukjin Kwon  wrote:

Guys, I plan to make an RC as soon as we have no visible issues. I have merged 
a few correctness fixes. Here's how things look:
- https://github.com/apache/spark/pull/31319 waiting for a review (I will do it 
too soon).
- https://github.com/apache/spark/pull/31336
- I know Max is investigating the perf regression, which hopefully will be
fixed soon.

Are there any more blockers or correctness issues? Please ping me or say it out 
here.
I would like to avoid making an RC when there are clearly some issues to be 
fixed.
If you're investigating something suspicious, that's fine too. It's better to 
make sure we're safe instead of rushing an RC without finishing the 
investigation.

Thanks all.


On Fri, Jan 22, 2021 at 6:19 PM, Hyukjin Kwon wrote:

Sure, thanks guys. I'll start another RC after the fixes. Looks like we're 
almost there.
On Fri, 22 Jan 2021, 17:47 Wenchen Fan,  wrote:

BTW, there is a correctness bug being fixed at 
https://github.com/apache/spark/pull/30788 . It's not a regression, but the fix 
is very simple and it would be better to start the next RC after merging that 
fix.
On Fri, Jan 22, 2021 at 3:54 PM Maxim Gekk  wrote:

Also I am investigating a performance regression in some TPC-DS queries (q88 
for instance) that is caused by a recent commit in 3.1, highly likely in the 
period from 19th November, 2020 to 18th December, 2020.
Maxim Gekk

Software Engineer

Databricks, Inc.


On Fri, Jan 22, 2021 at 10:45 AM Wenchen Fan  wrote:

-1 as I just found a regression in 3.1. A self-join query works well in 3.0 but 
fails in 3.1. It's being fixed at https://github.com/apache/spark/pull/31287
On Fri, Jan 22, 2021 at 4:34 AM Tom Graves  wrote:

 +1
built from tarball, verified sha and regular CI and tests all pass.
Tom
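The "verified sha" step above can be scripted; here is a minimal Python sketch. The release filenames in the commented-out usage are hypothetical, and the parsing of the published .sha512 file may need adjusting to its exact format.

```python
import hashlib

def sha512_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-512 hex digest of a file, streaming it in chunks."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical usage against a downloaded release candidate tarball:
# expected = open("spark-3.1.1-bin-hadoop3.2.tgz.sha512").read().split()[-1]
# assert sha512_of_file("spark-3.1.1-bin-hadoop3.2.tgz") == expected.lower()
```

Streaming in chunks keeps memory flat even for multi-hundred-megabyte tarballs.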
On Monday, January 18, 2021, 06:06:42 AM CST, Hyukjin Kwon 
 wrote:  
 
 Please vote on releasing the following candidate as Apache Spark version 3.1.1.
The vote is open until January 22nd 4PM PST and passes if a majority +1 PMC 
votes are cast, with a minimum of 3 +1 votes.
[ ] +1 Release this package as Apache Spark 3.1.0
[ ] -1 Do not release this package because ...
To learn more about Apache Spark, please see http://spark.apache.org/
The tag to be voted on is v3.1.1-rc1 (commit 
53fe365edb948d0e05a5ccb62f349cd9fcb4bb5d):
https://github.com/apache/spark/tree/v3.1.1-rc1

Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-02-02 Thread Hyukjin Kwon
There is one here: https://github.com/apache/spark/pull/31440. There look to be
several issues being identified (to confirm that this is an issue in OSS too)
and fixed in parallel.
There have been a few unexpected delays here as several more issues were
found. I will try to file and share relevant JIRAs as soon as I can confirm.

On Wed, Feb 3, 2021 at 2:36 AM, Tom Graves wrote:

> Just curious: do we have an update on the next RC? Is there a JIRA for
> the TPC-DS issue?
>
> Thanks,
> Tom
>
> On Wednesday, January 27, 2021, 05:46:27 PM CST, Hyukjin Kwon <
> gurwls...@gmail.com> wrote:
>
>
> Just to share the current status, most of the known issues were resolved.
> Let me know if there are some more.
> One thing left is a performance regression in TPCDS being investigated.
> Once this is identified (and fixed if it should be), I will cut another RC
> right away.
> I roughly expect to cut another RC next Monday.
>
> Thanks guys.
>
> On Wed, Jan 27, 2021 at 5:26 AM, Terry Kim wrote:
>
> Hi,
>
> Please check if the following regression should be included:
> https://github.com/apache/spark/pull/31352
>
> Thanks,
> Terry
>
> On Tue, Jan 26, 2021 at 7:54 AM Holden Karau  wrote:
>
> If we're ok waiting for it, I’d like to get
> https://github.com/apache/spark/pull/31298 in as well (it’s not a
> regression but it is a bug fix).
>
> On Tue, Jan 26, 2021 at 6:38 AM Hyukjin Kwon  wrote:
>
> It looks like a cool one but it's a pretty big one and affects the plans
> considerably ... maybe it's best to avoid adding it into 3.1.1 in
> particular during the RC period if this isn't a clear regression that
> affects many users.
>
> On Tue, Jan 26, 2021 at 11:23 PM, Peter Toth wrote:
>
> Hey,
>
> Sorry for chiming in a bit late, but I would like to suggest my PR (
> https://github.com/apache/spark/pull/28885) for review and inclusion into
> 3.1.1.
>
> Currently, invalid reuse reference nodes appear in many queries, causing
> performance issues and incorrect explain plans. Now that
> https://github.com/apache/spark/pull/31243 got merged these invalid
> references can be easily found in many of our golden files on master:
> https://github.com/apache/spark/pull/28885#issuecomment-767530441.
> But the issue isn't master (3.2) specific; actually, it has been there
> since 3.0, when Dynamic Partition Pruning was added.
> So it is not a regression from 3.0 to 3.1.1, but in some cases (like
> TPC-DS q23b) it causes a performance regression from 2.4 to 3.x.
>
> Thanks,
> Peter
>
> On Tue, Jan 26, 2021 at 6:30 AM Hyukjin Kwon  wrote:
>
> Guys, I plan to make an RC as soon as we have no visible issues. I have
> merged a few correctness fixes. Here's how things look:
> - https://github.com/apache/spark/pull/31319 waiting for a review (I will
> do it too soon).
> - https://github.com/apache/spark/pull/31336
> - I know Max is investigating the perf regression, which hopefully will
> be fixed soon.
>
> Are there any more blockers or correctness issues? Please ping me or say
> it out here.
> I would like to avoid making an RC when there are clearly some issues to
> be fixed.
> If you're investigating something suspicious, that's fine too. It's better
> to make sure we're safe instead of rushing an RC without finishing the
> investigation.
>
> Thanks all.
>
>
> On Fri, Jan 22, 2021 at 6:19 PM, Hyukjin Kwon wrote:
>
> Sure, thanks guys. I'll start another RC after the fixes. Looks like we're
> almost there.
>
> On Fri, 22 Jan 2021, 17:47 Wenchen Fan,  wrote:
>
> BTW, there is a correctness bug being fixed at
> https://github.com/apache/spark/pull/30788 . It's not a regression, but
> the fix is very simple and it would be better to start the next RC after
> merging that fix.
>
> On Fri, Jan 22, 2021 at 3:54 PM Maxim Gekk 
> wrote:
>
> Also I am investigating a performance regression in some TPC-DS queries
> (q88 for instance) that is caused by a recent commit in 3.1, highly likely
> in the period from 19th November, 2020 to 18th December, 2020.
>
> Maxim Gekk
>
> Software Engineer
>
> Databricks, Inc.
>
>
> On Fri, Jan 22, 2021 at 10:45 AM Wenchen Fan  wrote:
>
> -1 as I just found a regression in 3.1. A self-join query works well in
> 3.0 but fails in 3.1. It's being fixed at
> https://github.com/apache/spark/pull/31287
>
> On Fri, Jan 22, 2021 at 4:34 AM Tom Graves 
> wrote:
>
> +1
>
> built from tarball, verified sha and regular CI and tests all pass.
>
> Tom
>
> On Monday, January 18, 2021, 06:06:42 AM CST, Hyukjin Kwon <
> gurwls...@gmail.com> wrote:
>
>
> Please vote on releasing the following candidate as Apache Spark version
> 3.1.1.
>
> The vote is open until January 22nd 4PM PST and passes if a majority +1
> PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.1.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.1.1-rc1 (commit
> 53fe365edb948d0e05a5ccb62f349cd9fcb4bb5d):
> https://github.com/apache/spark/tree/v3.1.1-rc1
>
> The 

Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-02-02 Thread Tom Graves
Just curious: do we have an update on the next RC? Is there a JIRA for the
TPC-DS issue?
Thanks,
Tom
On Wednesday, January 27, 2021, 05:46:27 PM CST, Hyukjin Kwon 
 wrote:  
 
 Just to share the current status, most of the known issues were resolved. Let 
me know if there are some more.
One thing left is a performance regression in TPCDS being investigated. Once 
this is identified (and fixed if it should be), I will cut another RC right 
away.
I roughly expect to cut another RC next Monday.

Thanks guys.
On Wed, Jan 27, 2021 at 5:26 AM, Terry Kim wrote:

Hi,
Please check if the following regression should be included: 
https://github.com/apache/spark/pull/31352
Thanks,
Terry
On Tue, Jan 26, 2021 at 7:54 AM Holden Karau  wrote:

If we're ok waiting for it, I’d like to get 
https://github.com/apache/spark/pull/31298 in as well (it’s not a regression 
but it is a bug fix).
On Tue, Jan 26, 2021 at 6:38 AM Hyukjin Kwon  wrote:

It looks like a cool one but it's a pretty big one and affects the plans 
considerably ... maybe it's best to avoid adding it into 3.1.1 in particular 
during the RC period if this isn't a clear regression that affects many users.
On Tue, Jan 26, 2021 at 11:23 PM, Peter Toth wrote:

Hey,
Sorry for chiming in a bit late, but I would like to suggest my PR 
(https://github.com/apache/spark/pull/28885) for review and inclusion into 
3.1.1.

Currently, invalid reuse reference nodes appear in many queries, causing 
performance issues and incorrect explain plans. Now that 
https://github.com/apache/spark/pull/31243 got merged these invalid references 
can be easily found in many of our golden files on master: 
https://github.com/apache/spark/pull/28885#issuecomment-767530441.
But the issue isn't master (3.2) specific; actually, it has been there since
3.0, when Dynamic Partition Pruning was added.
So it is not a regression from 3.0 to 3.1.1, but in some cases (like TPC-DS
q23b) it causes a performance regression from 2.4 to 3.x.

Thanks,
Peter
On Tue, Jan 26, 2021 at 6:30 AM Hyukjin Kwon  wrote:

Guys, I plan to make an RC as soon as we have no visible issues. I have merged 
a few correctness fixes. Here's how things look:
- https://github.com/apache/spark/pull/31319 waiting for a review (I will do it 
too soon).
- https://github.com/apache/spark/pull/31336
- I know Max is investigating the perf regression, which hopefully will be
fixed soon.

Are there any more blockers or correctness issues? Please ping me or say it out 
here.
I would like to avoid making an RC when there are clearly some issues to be 
fixed.
If you're investigating something suspicious, that's fine too. It's better to 
make sure we're safe instead of rushing an RC without finishing the 
investigation.

Thanks all.


2021년 1월 22일 (금) 오후 6:19, Hyukjin Kwon 님이 작성:

Sure, thanks guys. I'll start another RC after the fixes. Looks like we're 
almost there.
On Fri, 22 Jan 2021, 17:47 Wenchen Fan,  wrote:

BTW, there is a correctness bug being fixed at 
https://github.com/apache/spark/pull/30788 . It's not a regression, but the fix 
is very simple and it would be better to start the next RC after merging that 
fix.
On Fri, Jan 22, 2021 at 3:54 PM Maxim Gekk  wrote:

Also I am investigating a performance regression in some TPC-DS queries (q88 
for instance) that is caused by a recent commit in 3.1, highly likely in the 
period from 19th November, 2020 to 18th December, 2020.
Maxim Gekk

Software Engineer

Databricks, Inc.


On Fri, Jan 22, 2021 at 10:45 AM Wenchen Fan  wrote:

-1 as I just found a regression in 3.1. A self-join query works well in 3.0 but 
fails in 3.1. It's being fixed at https://github.com/apache/spark/pull/31287
On Fri, Jan 22, 2021 at 4:34 AM Tom Graves  wrote:

 +1
built from tarball, verified sha and regular CI and tests all pass.
Tom
On Monday, January 18, 2021, 06:06:42 AM CST, Hyukjin Kwon 
 wrote:  
 
 Please vote on releasing the following candidate as Apache Spark version 3.1.1.
The vote is open until January 22nd 4PM PST and passes if a majority +1 PMC 
votes are cast, with a minimum of 3 +1 votes.
[ ] +1 Release this package as Apache Spark 3.1.0[ ] -1 Do not release this 
package because ...
To learn more about Apache Spark, please see http://spark.apache.org/
The tag to be voted on is v3.1.1-rc1 (commit 
53fe365edb948d0e05a5ccb62f349cd9fcb4bb5d):https://github.com/apache/spark/tree/v3.1.1-rc1
The release files, including signatures, digests, etc. can be found 
at:https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/
Signatures used for Spark RCs can be found in this 
file:https://dist.apache.org/repos/dist/dev/spark/KEYS
The staging repository for this release can be found 
at:https://repository.apache.org/content/repositories/orgapachespark-1364
The documentation corresponding to this release can be found 
at:https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/

The list of bug fixes going into 3.1.1 can be found at the following 
URL:https://s.apache.org/41kf2
This release is using the release script 

Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-27 Thread Hyukjin Kwon
Just to share the current status, most of the known issues were resolved.
Let me know if there are some more.
One thing left is a performance regression in TPCDS being investigated.
Once this is identified (and fixed if it should be), I will cut another RC
right away.
I roughly expect to cut another RC next Monday.

Thanks guys.

2021년 1월 27일 (수) 오전 5:26, Terry Kim 님이 작성:

> Hi,
>
> Please check if the following regression fix should be included:
> https://github.com/apache/spark/pull/31352
>
> Thanks,
> Terry
>
> On Tue, Jan 26, 2021 at 7:54 AM Holden Karau  wrote:
>
>> If we're ok waiting for it, I’d like to get
>> https://github.com/apache/spark/pull/31298 in as well (it’s not a
>> regression but it is a bug fix).
>>
>> On Tue, Jan 26, 2021 at 6:38 AM Hyukjin Kwon  wrote:
>>
>>> It looks like a cool one but it's a pretty big one and affects the plans
>>> considerably ... maybe it's best to avoid adding it into 3.1.1 in
>>> particular during the RC period if this isn't a clear regression that
>>> affects many users.
>>>
>>> 2021년 1월 26일 (화) 오후 11:23, Peter Toth 님이 작성:
>>>
 Hey,

 Sorry for chiming in a bit late, but I would like to suggest my PR (
 https://github.com/apache/spark/pull/28885) for review and inclusion
 into 3.1.1.

 Currently, invalid reuse reference nodes appear in many queries,
 causing performance issues and incorrect explain plans. Now that
 https://github.com/apache/spark/pull/31243 got merged, these invalid
 references can be easily found in many of our golden files on master:
 https://github.com/apache/spark/pull/28885#issuecomment-767530441.
 But the issue isn't specific to master (3.2); it has actually been there
 since 3.0, when Dynamic Partition Pruning was added.
 So it is not a regression from 3.0 to 3.1.1, but in some cases (like
 TPCDS q23b) it causes a performance regression from 2.4 to 3.x.

 Thanks,
 Peter

 On Tue, Jan 26, 2021 at 6:30 AM Hyukjin Kwon 
 wrote:

> Guys, I plan to make an RC as soon as we have no visible issues. I
> have merged a few correctness fixes. These look relevant:
> - https://github.com/apache/spark/pull/31319 waiting for a review (I
> will do it soon, too).
> - https://github.com/apache/spark/pull/31336
> - I know Max is investigating the perf regression, which hopefully
> will be fixed soon.
>
> Are there any more blockers or correctness issues? Please ping me or
> say it out here.
> I would like to avoid making an RC when there are clearly some issues
> to be fixed.
> If you're investigating something suspicious, that's fine too. It's
> better to make sure we're safe instead of rushing an RC without finishing
> the investigation.
>
> Thanks all.
>
>
> 2021년 1월 22일 (금) 오후 6:19, Hyukjin Kwon 님이 작성:
>
>> Sure, thanks guys. I'll start another RC after the fixes. Looks like
>> we're almost there.
>>
>> On Fri, 22 Jan 2021, 17:47 Wenchen Fan,  wrote:
>>
>>> BTW, there is a correctness bug being fixed at
>>> https://github.com/apache/spark/pull/30788 . It's not a regression,
>>> but the fix is very simple and it would be better to start the next RC
>>> after merging that fix.
>>>
>>> On Fri, Jan 22, 2021 at 3:54 PM Maxim Gekk <
>>> maxim.g...@databricks.com> wrote:
>>>
 Also I am investigating a performance regression in some TPC-DS
 queries (q88 for instance) that is caused by a recent commit in 3.1, 
 highly
 likely in the period from 19th November, 2020 to 18th December, 2020.

 Maxim Gekk

 Software Engineer

 Databricks, Inc.


 On Fri, Jan 22, 2021 at 10:45 AM Wenchen Fan 
 wrote:

> -1 as I just found a regression in 3.1. A self-join query works
> well in 3.0 but fails in 3.1. It's being fixed at
> https://github.com/apache/spark/pull/31287
>
> On Fri, Jan 22, 2021 at 4:34 AM Tom Graves
>  wrote:
>
>> +1
>>
>> built from tarball, verified sha and regular CI and tests all
>> pass.
>>
>> Tom
>>
>> On Monday, January 18, 2021, 06:06:42 AM CST, Hyukjin Kwon <
>> gurwls...@gmail.com> wrote:
>>
>>
>> Please vote on releasing the following candidate as Apache Spark
>> version 3.1.1.
>>
>> The vote is open until January 22nd 4PM PST and passes if a
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.1.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see
>> http://spark.apache.org/
>>
>> The tag to be voted on is v3.1.1-rc1 (commit
>> 

Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-26 Thread Terry Kim
Hi,

Please check if the following regression fix should be included:
https://github.com/apache/spark/pull/31352

Thanks,
Terry

On Tue, Jan 26, 2021 at 7:54 AM Holden Karau  wrote:

> If we're ok waiting for it, I’d like to get
> https://github.com/apache/spark/pull/31298 in as well (it’s not a
> regression but it is a bug fix).
>
> On Tue, Jan 26, 2021 at 6:38 AM Hyukjin Kwon  wrote:
>
>> It looks like a cool one but it's a pretty big one and affects the plans
>> considerably ... maybe it's best to avoid adding it into 3.1.1 in
>> particular during the RC period if this isn't a clear regression that
>> affects many users.
>>
>> 2021년 1월 26일 (화) 오후 11:23, Peter Toth 님이 작성:
>>
>>> Hey,
>>>
>>> Sorry for chiming in a bit late, but I would like to suggest my PR (
>>> https://github.com/apache/spark/pull/28885) for review and inclusion
>>> into 3.1.1.
>>>
>>> Currently, invalid reuse reference nodes appear in many queries, causing
>>> performance issues and incorrect explain plans. Now that
>>> https://github.com/apache/spark/pull/31243 got merged, these invalid
>>> references can be easily found in many of our golden files on master:
>>> https://github.com/apache/spark/pull/28885#issuecomment-767530441.
>>> But the issue isn't specific to master (3.2); it has actually been there
>>> since 3.0, when Dynamic Partition Pruning was added.
>>> So it is not a regression from 3.0 to 3.1.1, but in some cases (like
>>> TPCDS q23b) it causes a performance regression from 2.4 to 3.x.
>>>
>>> Thanks,
>>> Peter
>>>
>>> On Tue, Jan 26, 2021 at 6:30 AM Hyukjin Kwon 
>>> wrote:
>>>
 Guys, I plan to make an RC as soon as we have no visible issues. I have
 merged a few correctness fixes. These look relevant:
 - https://github.com/apache/spark/pull/31319 waiting for a review (I
 will do it soon, too).
 - https://github.com/apache/spark/pull/31336
 - I know Max is investigating the perf regression, which hopefully
 will be fixed soon.

 Are there any more blockers or correctness issues? Please ping me or
 say it out here.
 I would like to avoid making an RC when there are clearly some issues
 to be fixed.
 If you're investigating something suspicious, that's fine too. It's
 better to make sure we're safe instead of rushing an RC without finishing
 the investigation.

 Thanks all.


 2021년 1월 22일 (금) 오후 6:19, Hyukjin Kwon 님이 작성:

> Sure, thanks guys. I'll start another RC after the fixes. Looks like
> we're almost there.
>
> On Fri, 22 Jan 2021, 17:47 Wenchen Fan,  wrote:
>
>> BTW, there is a correctness bug being fixed at
>> https://github.com/apache/spark/pull/30788 . It's not a regression,
>> but the fix is very simple and it would be better to start the next RC
>> after merging that fix.
>>
>> On Fri, Jan 22, 2021 at 3:54 PM Maxim Gekk 
>> wrote:
>>
>>> Also I am investigating a performance regression in some TPC-DS
>>> queries (q88 for instance) that is caused by a recent commit in 3.1, 
>>> highly
>>> likely in the period from 19th November, 2020 to 18th December, 2020.
>>>
>>> Maxim Gekk
>>>
>>> Software Engineer
>>>
>>> Databricks, Inc.
>>>
>>>
>>> On Fri, Jan 22, 2021 at 10:45 AM Wenchen Fan 
>>> wrote:
>>>
 -1 as I just found a regression in 3.1. A self-join query works
 well in 3.0 but fails in 3.1. It's being fixed at
 https://github.com/apache/spark/pull/31287

 On Fri, Jan 22, 2021 at 4:34 AM Tom Graves
  wrote:

> +1
>
> built from tarball, verified sha and regular CI and tests all pass.
>
> Tom
>
> On Monday, January 18, 2021, 06:06:42 AM CST, Hyukjin Kwon <
> gurwls...@gmail.com> wrote:
>
>
> Please vote on releasing the following candidate as Apache Spark
> version 3.1.1.
>
> The vote is open until January 22nd 4PM PST and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.1.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see
> http://spark.apache.org/
>
> The tag to be voted on is v3.1.1-rc1 (commit
> 53fe365edb948d0e05a5ccb62f349cd9fcb4bb5d):
> https://github.com/apache/spark/tree/v3.1.1-rc1
>
> The release files, including signatures, digests, etc. can be
> found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
>
> 

Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-26 Thread Holden Karau
If we're ok waiting for it, I’d like to get
https://github.com/apache/spark/pull/31298 in as well (it’s not a
regression but it is a bug fix).

On Tue, Jan 26, 2021 at 6:38 AM Hyukjin Kwon  wrote:

> It looks like a cool one but it's a pretty big one and affects the plans
> considerably ... maybe it's best to avoid adding it into 3.1.1 in
> particular during the RC period if this isn't a clear regression that
> affects many users.
>
> 2021년 1월 26일 (화) 오후 11:23, Peter Toth 님이 작성:
>
>> Hey,
>>
>> Sorry for chiming in a bit late, but I would like to suggest my PR (
>> https://github.com/apache/spark/pull/28885) for review and inclusion
>> into 3.1.1.
>>
>> Currently, invalid reuse reference nodes appear in many queries, causing
>> performance issues and incorrect explain plans. Now that
>> https://github.com/apache/spark/pull/31243 got merged, these invalid
>> references can be easily found in many of our golden files on master:
>> https://github.com/apache/spark/pull/28885#issuecomment-767530441.
>> But the issue isn't specific to master (3.2); it has actually been there
>> since 3.0, when Dynamic Partition Pruning was added.
>> So it is not a regression from 3.0 to 3.1.1, but in some cases (like
>> TPCDS q23b) it causes a performance regression from 2.4 to 3.x.
>>
>> Thanks,
>> Peter
>>
>> On Tue, Jan 26, 2021 at 6:30 AM Hyukjin Kwon  wrote:
>>
>>> Guys, I plan to make an RC as soon as we have no visible issues. I have
>>> merged a few correctness fixes. These look relevant:
>>> - https://github.com/apache/spark/pull/31319 waiting for a review (I
>>> will do it soon, too).
>>> - https://github.com/apache/spark/pull/31336
>>> - I know Max is investigating the perf regression, which hopefully
>>> will be fixed soon.
>>>
>>> Are there any more blockers or correctness issues? Please ping me or say
>>> it out here.
>>> I would like to avoid making an RC when there are clearly some issues to
>>> be fixed.
>>> If you're investigating something suspicious, that's fine too. It's
>>> better to make sure we're safe instead of rushing an RC without finishing
>>> the investigation.
>>>
>>> Thanks all.
>>>
>>>
>>> 2021년 1월 22일 (금) 오후 6:19, Hyukjin Kwon 님이 작성:
>>>
 Sure, thanks guys. I'll start another RC after the fixes. Looks like
 we're almost there.

 On Fri, 22 Jan 2021, 17:47 Wenchen Fan,  wrote:

> BTW, there is a correctness bug being fixed at
> https://github.com/apache/spark/pull/30788 . It's not a regression,
> but the fix is very simple and it would be better to start the next RC
> after merging that fix.
>
> On Fri, Jan 22, 2021 at 3:54 PM Maxim Gekk 
> wrote:
>
>> Also I am investigating a performance regression in some TPC-DS
>> queries (q88 for instance) that is caused by a recent commit in 3.1, 
>> highly
>> likely in the period from 19th November, 2020 to 18th December, 2020.
>>
>> Maxim Gekk
>>
>> Software Engineer
>>
>> Databricks, Inc.
>>
>>
>> On Fri, Jan 22, 2021 at 10:45 AM Wenchen Fan 
>> wrote:
>>
>>> -1 as I just found a regression in 3.1. A self-join query works well
>>> in 3.0 but fails in 3.1. It's being fixed at
>>> https://github.com/apache/spark/pull/31287
>>>
>>> On Fri, Jan 22, 2021 at 4:34 AM Tom Graves
>>>  wrote:
>>>
 +1

 built from tarball, verified sha and regular CI and tests all pass.

 Tom

 On Monday, January 18, 2021, 06:06:42 AM CST, Hyukjin Kwon <
 gurwls...@gmail.com> wrote:


 Please vote on releasing the following candidate as Apache Spark
 version 3.1.1.

 The vote is open until January 22nd 4PM PST and passes if a
 majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

 [ ] +1 Release this package as Apache Spark 3.1.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 The tag to be voted on is v3.1.1-rc1 (commit
 53fe365edb948d0e05a5ccb62f349cd9fcb4bb5d):
 https://github.com/apache/spark/tree/v3.1.1-rc1

 The release files, including signatures, digests, etc. can be found
 at:
 https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/

 Signatures used for Spark RCs can be found in this file:
 https://dist.apache.org/repos/dist/dev/spark/KEYS

 The staging repository for this release can be found at:

 https://repository.apache.org/content/repositories/orgapachespark-1364

 The documentation corresponding to this release can be found at:
 https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/

 The list of bug fixes going into 3.1.1 can be found at the
 following URL:
 https://s.apache.org/41kf2

 

Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-26 Thread Hyukjin Kwon
It looks like a cool one but it's a pretty big one and affects the plans
considerably ... maybe it's best to avoid adding it into 3.1.1 in
particular during the RC period if this isn't a clear regression that
affects many users.

2021년 1월 26일 (화) 오후 11:23, Peter Toth 님이 작성:

> Hey,
>
> Sorry for chiming in a bit late, but I would like to suggest my PR (
> https://github.com/apache/spark/pull/28885) for review and inclusion into
> 3.1.1.
>
> Currently, invalid reuse reference nodes appear in many queries, causing
> performance issues and incorrect explain plans. Now that
> https://github.com/apache/spark/pull/31243 got merged, these invalid
> references can be easily found in many of our golden files on master:
> https://github.com/apache/spark/pull/28885#issuecomment-767530441.
> But the issue isn't specific to master (3.2); it has actually been there
> since 3.0, when Dynamic Partition Pruning was added.
> So it is not a regression from 3.0 to 3.1.1, but in some cases (like TPCDS
> q23b) it causes a performance regression from 2.4 to 3.x.
>
> Thanks,
> Peter
>
> On Tue, Jan 26, 2021 at 6:30 AM Hyukjin Kwon  wrote:
>
>> Guys, I plan to make an RC as soon as we have no visible issues. I have
>> merged a few correctness fixes. These look relevant:
>> - https://github.com/apache/spark/pull/31319 waiting for a review (I
>> will do it soon, too).
>> - https://github.com/apache/spark/pull/31336
>> - I know Max is investigating the perf regression, which hopefully will
>> be fixed soon.
>>
>> Are there any more blockers or correctness issues? Please ping me or say
>> it out here.
>> I would like to avoid making an RC when there are clearly some issues to
>> be fixed.
>> If you're investigating something suspicious, that's fine too. It's
>> better to make sure we're safe instead of rushing an RC without finishing
>> the investigation.
>>
>> Thanks all.
>>
>>
>> 2021년 1월 22일 (금) 오후 6:19, Hyukjin Kwon 님이 작성:
>>
>>> Sure, thanks guys. I'll start another RC after the fixes. Looks like
>>> we're almost there.
>>>
>>> On Fri, 22 Jan 2021, 17:47 Wenchen Fan,  wrote:
>>>
 BTW, there is a correctness bug being fixed at
 https://github.com/apache/spark/pull/30788 . It's not a regression,
 but the fix is very simple and it would be better to start the next RC
 after merging that fix.

 On Fri, Jan 22, 2021 at 3:54 PM Maxim Gekk 
 wrote:

> Also I am investigating a performance regression in some TPC-DS
> queries (q88 for instance) that is caused by a recent commit in 3.1, 
> highly
> likely in the period from 19th November, 2020 to 18th December, 2020.
>
> Maxim Gekk
>
> Software Engineer
>
> Databricks, Inc.
>
>
> On Fri, Jan 22, 2021 at 10:45 AM Wenchen Fan 
> wrote:
>
>> -1 as I just found a regression in 3.1. A self-join query works well
>> in 3.0 but fails in 3.1. It's being fixed at
>> https://github.com/apache/spark/pull/31287
>>
>> On Fri, Jan 22, 2021 at 4:34 AM Tom Graves
>>  wrote:
>>
>>> +1
>>>
>>> built from tarball, verified sha and regular CI and tests all pass.
>>>
>>> Tom
>>>
>>> On Monday, January 18, 2021, 06:06:42 AM CST, Hyukjin Kwon <
>>> gurwls...@gmail.com> wrote:
>>>
>>>
>>> Please vote on releasing the following candidate as Apache Spark
>>> version 3.1.1.
>>>
>>> The vote is open until January 22nd 4PM PST and passes if a majority
>>> +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 3.1.0
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see
>>> http://spark.apache.org/
>>>
>>> The tag to be voted on is v3.1.1-rc1 (commit
>>> 53fe365edb948d0e05a5ccb62f349cd9fcb4bb5d):
>>> https://github.com/apache/spark/tree/v3.1.1-rc1
>>>
>>> The release files, including signatures, digests, etc. can be found
>>> at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>>
>>> https://repository.apache.org/content/repositories/orgapachespark-1364
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/
>>>
>>> The list of bug fixes going into 3.1.1 can be found at the following
>>> URL:
>>> https://s.apache.org/41kf2
>>>
>>> This release is using the release script of the tag v3.1.1-rc1.
>>>
>>> FAQ
>>>
>>> ===
>>> What happened to 3.1.0?
>>> ===
>>>
>>> There was a technical issue during Apache Spark 3.1.0 preparation,
>>> and it was discussed and decided to skip 3.1.0.
>>> Please 

Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-26 Thread Peter Toth
Hey,

Sorry for chiming in a bit late, but I would like to suggest my PR (
https://github.com/apache/spark/pull/28885) for review and inclusion into
3.1.1.

Currently, invalid reuse reference nodes appear in many queries, causing
performance issues and incorrect explain plans. Now that
https://github.com/apache/spark/pull/31243 got merged, these invalid
references can be easily found in many of our golden files on master:
https://github.com/apache/spark/pull/28885#issuecomment-767530441.
But the issue isn't specific to master (3.2); it has actually been there since
3.0, when Dynamic Partition Pruning was added.
So it is not a regression from 3.0 to 3.1.1, but in some cases (like TPCDS
q23b) it causes a performance regression from 2.4 to 3.x.

Thanks,
Peter

On Tue, Jan 26, 2021 at 6:30 AM Hyukjin Kwon  wrote:

> Guys, I plan to make an RC as soon as we have no visible issues. I have
> merged a few correctness fixes. These look relevant:
> - https://github.com/apache/spark/pull/31319 waiting for a review (I will
> do it soon, too).
> - https://github.com/apache/spark/pull/31336
> - I know Max is investigating the perf regression, which hopefully will
> be fixed soon.
>
> Are there any more blockers or correctness issues? Please ping me or say
> it out here.
> I would like to avoid making an RC when there are clearly some issues to
> be fixed.
> If you're investigating something suspicious, that's fine too. It's better
> to make sure we're safe instead of rushing an RC without finishing the
> investigation.
>
> Thanks all.
>
>
> 2021년 1월 22일 (금) 오후 6:19, Hyukjin Kwon 님이 작성:
>
>> Sure, thanks guys. I'll start another RC after the fixes. Looks like
>> we're almost there.
>>
>> On Fri, 22 Jan 2021, 17:47 Wenchen Fan,  wrote:
>>
>>> BTW, there is a correctness bug being fixed at
>>> https://github.com/apache/spark/pull/30788 . It's not a regression, but
>>> the fix is very simple and it would be better to start the next RC after
>>> merging that fix.
>>>
>>> On Fri, Jan 22, 2021 at 3:54 PM Maxim Gekk 
>>> wrote:
>>>
 Also I am investigating a performance regression in some TPC-DS queries
 (q88 for instance) that is caused by a recent commit in 3.1, highly likely
 in the period from 19th November, 2020 to 18th December, 2020.

 Maxim Gekk

 Software Engineer

 Databricks, Inc.


 On Fri, Jan 22, 2021 at 10:45 AM Wenchen Fan 
 wrote:

> -1 as I just found a regression in 3.1. A self-join query works well
> in 3.0 but fails in 3.1. It's being fixed at
> https://github.com/apache/spark/pull/31287
>
> On Fri, Jan 22, 2021 at 4:34 AM Tom Graves
>  wrote:
>
>> +1
>>
>> built from tarball, verified sha and regular CI and tests all pass.
>>
>> Tom
>>
>> On Monday, January 18, 2021, 06:06:42 AM CST, Hyukjin Kwon <
>> gurwls...@gmail.com> wrote:
>>
>>
>> Please vote on releasing the following candidate as Apache Spark
>> version 3.1.1.
>>
>> The vote is open until January 22nd 4PM PST and passes if a majority
>> +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.1.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v3.1.1-rc1 (commit
>> 53fe365edb948d0e05a5ccb62f349cd9fcb4bb5d):
>> https://github.com/apache/spark/tree/v3.1.1-rc1
>>
>> The release files, including signatures, digests, etc. can be found
>> at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1364
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/
>>
>> The list of bug fixes going into 3.1.1 can be found at the following
>> URL:
>> https://s.apache.org/41kf2
>>
>> This release is using the release script of the tag v3.1.1-rc1.
>>
>> FAQ
>>
>> ===
>> What happened to 3.1.0?
>> ===
>>
>> There was a technical issue during Apache Spark 3.1.0 preparation,
>> and it was discussed and decided to skip 3.1.0.
>> Please see
>> https://spark.apache.org/news/next-official-release-spark-3.1.1.html
>> for more details.
>>
>> =
>> How can I help test this release?
>> =
>>
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're 

Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-25 Thread Hyukjin Kwon
Guys, I plan to make an RC as soon as we have no visible issues. I have
merged a few correctness fixes. These look relevant:
- https://github.com/apache/spark/pull/31319 waiting for a review (I will
do it soon, too).
- https://github.com/apache/spark/pull/31336
- I know Max is investigating the perf regression, which hopefully will
be fixed soon.

Are there any more blockers or correctness issues? Please ping me or say it
out here.
I would like to avoid making an RC when there are clearly some issues to be
fixed.
If you're investigating something suspicious, that's fine too. It's better
to make sure we're safe instead of rushing an RC without finishing the
investigation.

Thanks all.


2021년 1월 22일 (금) 오후 6:19, Hyukjin Kwon 님이 작성:

> Sure, thanks guys. I'll start another RC after the fixes. Looks like we're
> almost there.
>
> On Fri, 22 Jan 2021, 17:47 Wenchen Fan,  wrote:
>
>> BTW, there is a correctness bug being fixed at
>> https://github.com/apache/spark/pull/30788 . It's not a regression, but
>> the fix is very simple and it would be better to start the next RC after
>> merging that fix.
>>
>> On Fri, Jan 22, 2021 at 3:54 PM Maxim Gekk 
>> wrote:
>>
>>> Also I am investigating a performance regression in some TPC-DS queries
>>> (q88 for instance) that is caused by a recent commit in 3.1, highly likely
>>> in the period from 19th November, 2020 to 18th December, 2020.
>>>
>>> Maxim Gekk
>>>
>>> Software Engineer
>>>
>>> Databricks, Inc.
>>>
>>>
>>> On Fri, Jan 22, 2021 at 10:45 AM Wenchen Fan 
>>> wrote:
>>>
 -1 as I just found a regression in 3.1. A self-join query works well in
 3.0 but fails in 3.1. It's being fixed at
 https://github.com/apache/spark/pull/31287

 On Fri, Jan 22, 2021 at 4:34 AM Tom Graves 
 wrote:

> +1
>
> built from tarball, verified sha and regular CI and tests all pass.
>
> Tom
>
> On Monday, January 18, 2021, 06:06:42 AM CST, Hyukjin Kwon <
> gurwls...@gmail.com> wrote:
>
>
> Please vote on releasing the following candidate as Apache Spark
> version 3.1.1.
>
> The vote is open until January 22nd 4PM PST and passes if a majority
> +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.1.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.1.1-rc1 (commit
> 53fe365edb948d0e05a5ccb62f349cd9fcb4bb5d):
> https://github.com/apache/spark/tree/v3.1.1-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1364
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/
>
> The list of bug fixes going into 3.1.1 can be found at the following
> URL:
> https://s.apache.org/41kf2
>
> This release is using the release script of the tag v3.1.1-rc1.
>
> FAQ
>
> ===
> What happened to 3.1.0?
> ===
>
> There was a technical issue during Apache Spark 3.1.0 preparation, and
> it was discussed and decided to skip 3.1.0.
> Please see
> https://spark.apache.org/news/next-official-release-spark-3.1.1.html
> for more details.
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC via "pip install
> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/pyspark-3.1.1.tar.gz
> "
> and see if anything important breaks.
> In Java/Scala, you can add the staging repository to your project's
> resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.1.1?
> ===
>
> The current list of open tickets targeted at 3.1.1 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.1.1
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that 

Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-22 Thread Hyukjin Kwon
Sure, thanks guys. I'll start another RC after the fixes. Looks like we're
almost there.

On Fri, 22 Jan 2021, 17:47 Wenchen Fan,  wrote:

> BTW, there is a correctness bug being fixed at
> https://github.com/apache/spark/pull/30788 . It's not a regression, but
> the fix is very simple and it would be better to start the next RC after
> merging that fix.
>
> On Fri, Jan 22, 2021 at 3:54 PM Maxim Gekk 
> wrote:
>
>> Also I am investigating a performance regression in some TPC-DS queries
>> (q88 for instance) that is caused by a recent commit in 3.1, highly likely
>> in the period from 19th November, 2020 to 18th December, 2020.
>>
>> Maxim Gekk
>>
>> Software Engineer
>>
>> Databricks, Inc.
>>
>>
>> On Fri, Jan 22, 2021 at 10:45 AM Wenchen Fan  wrote:
>>
>>> -1 as I just found a regression in 3.1. A self-join query works well in
>>> 3.0 but fails in 3.1. It's being fixed at
>>> https://github.com/apache/spark/pull/31287
>>>
>>> On Fri, Jan 22, 2021 at 4:34 AM Tom Graves 
>>> wrote:
>>>
 +1

 built from tarball, verified sha and regular CI and tests all pass.

 Tom

 On Monday, January 18, 2021, 06:06:42 AM CST, Hyukjin Kwon <
 gurwls...@gmail.com> wrote:


 Please vote on releasing the following candidate as Apache Spark
 version 3.1.1.

 The vote is open until January 22nd 4PM PST and passes if a majority +1
 PMC votes are cast, with a minimum of 3 +1 votes.

 [ ] +1 Release this package as Apache Spark 3.1.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see http://spark.apache.org/

 The tag to be voted on is v3.1.1-rc1 (commit
 53fe365edb948d0e05a5ccb62f349cd9fcb4bb5d):
 https://github.com/apache/spark/tree/v3.1.1-rc1

 The release files, including signatures, digests, etc. can be found at:
 https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/

 Signatures used for Spark RCs can be found in this file:
 https://dist.apache.org/repos/dist/dev/spark/KEYS
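
The "verified sha" step mentioned in the +1 votes above can be sketched in shell. This is a minimal offline example: a locally created stand-in file replaces the real tarball, and the file names are assumptions; for an actual RC you would download the artifact and its published .sha512 (plus the .asc and KEYS files for the signature check) from the v3.1.1-rc1-bin/ directory linked above. Note that Apache checksum file formats vary across projects, so a published .sha512 may need reformatting before check mode accepts it.

```shell
# Checksum verification sketch (GNU coreutils). The stand-in file keeps
# the commands self-contained; for a real check, use the downloaded
# artifact and its published .sha512 file instead.
set -e
cd "$(mktemp -d)"
printf 'stand-in release bytes\n' > pyspark-3.1.1.tar.gz
sha512sum pyspark-3.1.1.tar.gz > pyspark-3.1.1.tar.gz.sha512
sha512sum -c pyspark-3.1.1.tar.gz.sha512   # prints "pyspark-3.1.1.tar.gz: OK"
# Signature check (needs network and the KEYS file; shown for shape only):
# gpg --import KEYS
# gpg --verify pyspark-3.1.1.tar.gz.asc pyspark-3.1.1.tar.gz
```

A mismatch makes `sha512sum -c` exit non-zero, so the check is easy to script into a vote-testing run.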

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1364

 The documentation corresponding to this release can be found at:
 https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/

 The list of bug fixes going into 3.1.1 can be found at the following
 URL:
 https://s.apache.org/41kf2

 This release was built using the release script from the v3.1.1-rc1 tag.

 FAQ

 ===
 What happened to 3.1.0?
 ===

 There was a technical issue during Apache Spark 3.1.0 preparation, and
 it was discussed and decided to skip 3.1.0.
 Please see
 https://spark.apache.org/news/next-official-release-spark-3.1.1.html
 for more details.

 =
 How can I help test this release?
 =

 If you are a Spark user, you can help us test this release by taking
 an existing Spark workload, running it on this release candidate, and
 reporting any regressions.

 If you're working in PySpark, you can set up a virtual env, install
 the current RC via "pip install
 https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/pyspark-3.1.1.tar.gz",
 and see if anything important breaks.
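 The PySpark smoke test described above can be sketched as below. The pip URL is the one given in this email; the env name is arbitrary, and the install step is left commented out because it needs network access:

```shell
# Create an isolated env for smoke-testing the RC (env name is arbitrary).
python3 -m venv spark-rc-env
# Confirm the env's own pip is the one that will be used.
./spark-rc-env/bin/pip --version
# Then install the RC and import it (requires network access):
# ./spark-rc-env/bin/pip install \
#   https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/pyspark-3.1.1.tar.gz
# ./spark-rc-env/bin/python -c "import pyspark; print(pyspark.__version__)"
```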
 In Java/Scala, you can add the staging repository to your project's
 resolvers and test with the RC (make sure to clean up the artifact
 cache before/after so you don't end up building with an out-of-date RC
 going forward).
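 For the Java/Scala side, a minimal sbt sketch of adding the staging repository follows. The repository URL is the one from this email; the resolver label and the choice of spark-sql as the test dependency are illustrative:

```scala
// build.sbt fragment -- "spark-staging-1364" is just a local label.
resolvers += "spark-staging-1364" at
  "https://repository.apache.org/content/repositories/orgapachespark-1364/"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.1.1"
```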

 ===
 What should happen to JIRA tickets still targeting 3.1.1?
 ===

 The current list of open tickets targeted at 3.1.1 can be found at:
 https://issues.apache.org/jira/projects/SPARK and search for "Target
 Version/s" = 3.1.1

 Committers should look at those and triage. Extremely important bug
 fixes, documentation, and API tweaks that impact compatibility should
 be worked on immediately. Everything else please retarget to an
 appropriate release.

 ==
 But my bug isn't fixed?
 ==

 In order to make timely releases, we will typically not hold the
 release unless the bug in question is a regression from the previous
 release. That being said, if there is something which is a regression
 that has not been correctly targeted please ping me or a committer to
 help target the issue.




Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-22 Thread Wenchen Fan
BTW, there is a correctness bug being fixed at
https://github.com/apache/spark/pull/30788 . It's not a regression, but the
fix is very simple and it would be better to start the next RC after
merging that fix.



Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-21 Thread Maxim Gekk
Also I am investigating a performance regression in some TPC-DS queries
(q88 for instance) that is caused by a recent commit in 3.1, highly likely
in the period from 19th November, 2020 to 18th December, 2020.

Maxim Gekk

Software Engineer

Databricks, Inc.




Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-21 Thread Wenchen Fan
-1 as I just found a regression in 3.1. A self-join query works well in 3.0
but fails in 3.1. It's being fixed at
https://github.com/apache/spark/pull/31287



Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-21 Thread Tom Graves
+1

built from tarball, verified sha and regular CI and tests all pass.

Tom

Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-21 Thread Holden Karau
Hi folks,

Just an FYI -- I've found a potential race condition in this RC with block
manager decommissioning and the torrent broadcast factory
(https://issues.apache.org/jira/browse/SPARK-34193).

I don't think this should block the release (it's not a regression), so
my +1 stands as-is; I haven't triggered it more than once.

Cheers,

Holden

On Wed, Jan 20, 2021 at 9:05 PM Mridul Muralidharan 
wrote:

> +1
>
> Signatures, digests, etc check out fine.
> Checked out tag and build/tested with -Pyarn -Phadoop-2.7 -Phive
> -Phive-thriftserver -Pmesos -Pkubernetes
>
> The sha512 signature for spark-3.1.1.tgz tripped up my scripts :-)
>
>
> Regards,
> Mridul
>
>
> On Wed, Jan 20, 2021 at 8:17 PM 郑瑞峰  wrote:
>
>> +1 (non-binding)
>>
>> Thank you, Hyukjin
>>
>> Bests,
>> Ruifeng
>>
>> -- Original Message --
>> *From:* "Dongjoon Hyun" ;
>> *Sent:* Wednesday, January 20, 2021, 1:57 PM
>> *To:* "Holden Karau";
>> *Cc:* "Sean Owen";"Hyukjin Kwon"> >;"dev";
>> *Subject:* Re: [VOTE] Release Spark 3.1.1 (RC1)
>>
>> +1
>>
>> I additionally
>> - Ran JDBC integration test
>> - Ran with AWS EKS 1.16
>> - Ran unit tests with Python 3.9.1 combination (numpy 1.19.5, pandas
>> 1.2.0, scipy 1.6.0)
>>   (PyArrow is not tested because it's not supported in Python 3.9.x. This
>> is documented via SPARK-34162)
>>
>> There is some ongoing work under the umbrella JIRA (SPARK-33507:
>> Improve and fix cache behavior in v1 and v2).
>> I believe it can be finished in 3.2.0, and we can add notes about it to
>> the 3.1.1 release notes.
>>
>> Thank you, Hyukjin and all.
>>
>> Bests,
>> Dongjoon.
>>
>> On Tue, Jan 19, 2021 at 10:49 AM Holden Karau 
>> wrote:
>>
>>> +1, pip installs on Python 3.8
>>>
>>> One potential thing we might want to consider if there ends up being
>>> another RC is that the error message for installing with Python2 could be
>>> clearer.
>>>
>>> Processing ./pyspark-3.1.1.tar.gz
>>> ERROR: Command errored out with exit status 1:
>>>  command: /tmp/py3.1/bin/python2 -c 'import sys, setuptools,
>>> tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-lmlitE/setup.py'"'"';
>>> __file__='"'"'/tmp/pip-req-build-lmlitE/setup.py'"'"';f=getattr(tokenize,
>>> '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"',
>>> '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))'
>>> egg_info --egg-base /tmp/pip-pip-egg-info-W1BsIL
>>>  cwd: /tmp/pip-req-build-lmlitE/
>>> Complete output (6 lines):
>>> Traceback (most recent call last):
>>>   File "", line 1, in 
>>>   File "/tmp/pip-req-build-lmlitE/setup.py", line 31
>>> file=sys.stderr)
>>> ^
>>> SyntaxError: invalid syntax
>>> 
>>> ERROR: Command errored out with exit status 1: python setup.py egg_info
>>> Check the logs for full command output.
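The traceback above is ordinary Python 2 behavior rather than a packaging problem: setup.py line 31 uses the Python-3-only print function keyword argument, which Python 2 cannot parse. A minimal illustration (the message text here is made up):

```shell
# Python 3 accepts print(..., file=sys.stderr); Python 2 would raise the
# same SyntaxError seen above when parsing this line.
python3 -c 'import sys; print("requires python3", file=sys.stderr)' 2>&1
# prints "requires python3"
```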
>>>
>>>
>>>
>>> On Tue, Jan 19, 2021 at 10:26 AM Sean Owen  wrote:
>>>
>>>> +1 from me. Same results as in 3.1.0 testing.
>>>>

Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-20 Thread Mridul Muralidharan
+1

Signatures, digests, etc check out fine.
Checked out tag and build/tested with -Pyarn -Phadoop-2.7 -Phive
-Phive-thriftserver -Pmesos -Pkubernetes

The sha512 signature for spark-3.1.1.tgz tripped up my scripts :-)


Regards,
Mridul
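On the sha512 point: `sha512sum -c` expects plain `HASH  FILE` lines, while Spark RC `.sha512` files have at times used a GPG-style multi-line format, which is the kind of mismatch that trips verification scripts. A sketch of the plain-format check, run against a placeholder file rather than the real artifact:

```shell
# Placeholder stand-in for the release tarball.
printf 'placeholder release tarball\n' > spark-3.1.1.tgz
# Generate a checksum file in the format sha512sum -c understands...
sha512sum spark-3.1.1.tgz > spark-3.1.1.tgz.sha512
# ...and verify it; prints "spark-3.1.1.tgz: OK" on success.
sha512sum -c spark-3.1.1.tgz.sha512
```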



Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-20 Thread Terry Kim
+1 (non-binding)

(Also ran .NET for Apache Spark E2E tests, which touch many of DataFrame,
Function APIs)

Thanks,
Terry



Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-20 Thread Jacek Laskowski
Hi,

+1 (non-binding)

1. Built locally using AdoptOpenJDK (build 11.0.9+11) with
-Pyarn,kubernetes,hive-thriftserver,scala-2.12 -DskipTests
2. Ran batch and streaming demos using Spark on Kubernetes (minikube) using
spark-shell (client deploy mode) and spark-submit --deploy-mode cluster

I reported a non-blocking issue with "the only developer Matei" (
https://issues.apache.org/jira/browse/SPARK-34158)

Found a minor, non-blocking (but annoying) issue in Spark on k8s that is
different from 3.0.1: the following message should really be silenced, like
the other debug messages in ExecutorPodsAllocator:

21/01/19 12:23:26 DEBUG ExecutorPodsAllocator: ResourceProfile Id: 0 pod
allocation status: 2 running, 0 pending. 0 unacknowledged.
21/01/19 12:23:27 DEBUG ExecutorPodsAllocator: ResourceProfile Id: 0 pod
allocation status: 2 running, 0 pending. 0 unacknowledged.
21/01/19 12:23:28 DEBUG ExecutorPodsAllocator: ResourceProfile Id: 0 pod
allocation status: 2 running, 0 pending. 0 unacknowledged.
21/01/19 12:23:29 DEBUG ExecutorPodsAllocator: ResourceProfile Id: 0 pod
allocation status: 2 running, 0 pending. 0 unacknowledged.
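If these messages get in the way during testing, a log4j.properties override along these lines should quiet them. The logger name below is inferred from the class name shown in the log; verify the exact package path in the Spark k8s module before relying on it:

```properties
# Hypothetical log4j.properties entry; check the ExecutorPodsAllocator
# package path against the Spark source before use.
log4j.logger.org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator=INFO
```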

Pozdrawiam,
Jacek Laskowski

https://about.me/JacekLaskowski
"The Internals Of" Online Books 
Follow me on https://twitter.com/jaceklaskowski




On Mon, Jan 18, 2021 at 1:06 PM Hyukjin Kwon  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 3.1.1.
>
> The vote is open until January 22nd 4PM PST and passes if a majority +1
> PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.1.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.1.1-rc1 (commit
> 53fe365edb948d0e05a5ccb62f349cd9fcb4bb5d):
> https://github.com/apache/spark/tree/v3.1.1-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1364
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/
>
> The list of bug fixes going into 3.1.1 can be found at the following URL:
> https://s.apache.org/41kf2
>
> This release is using the release script of the tag v3.1.1-rc1.
>
> FAQ
>
> ===
> What happened to 3.1.0?
> ===
>
> There was a technical issue during Apache Spark 3.1.0 preparation, and it
> was discussed and decided to skip 3.1.0.
> Please see
> https://spark.apache.org/news/next-official-release-spark-3.1.1.html for
> more details.
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running it on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark, you can set up a virtual env and install
> the current RC via "pip install
> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/pyspark-3.1.1.tar.gz
> "
> and see if anything important breaks.
> In Java/Scala, you can add the staging repository to your project's
> resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.1.1?
> ===
>
> The current list of open tickets targeted at 3.1.1 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.1.1
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
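[Editorial note] The release files above ship with .sha512 digests that can be checked locally. A minimal sketch in Python, using a stand-in file (the artifact name and contents here are placeholders, not the actual release files):

```python
import hashlib

def sha512_hex(path):
    """Compute the SHA-512 digest of a file, reading in 1 MiB chunks."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path, expected_hex):
    """Compare a file's digest against the expected hex string."""
    return sha512_hex(path) == expected_hex.strip().lower()

# Stand-in artifact: in practice this would be e.g. pyspark-3.1.1.tar.gz,
# with the expected digest taken from the matching .sha512 file.
with open("artifact.bin", "wb") as f:
    f.write(b"example release artifact")

expected = hashlib.sha512(b"example release artifact").hexdigest()
print(verify("artifact.bin", expected))  # True
```

Signatures are checked separately with gpg against the KEYS file; the sketch above covers only the digest side.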


Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-19 Thread Dongjoon Hyun
+1

I additionally
- Ran JDBC integration test
- Ran with AWS EKS 1.16
- Ran unit tests with the Python 3.9.1 combination (numpy 1.19.5, pandas 1.2.0,
scipy 1.6.0)
  (PyArrow was not tested because it's not yet supported on Python 3.9.x; this
is documented via SPARK-34162.)

There is some ongoing work in the umbrella JIRA (SPARK-33507: Improve
and fix cache behavior in v1 and v2).
I believe it can be completed in 3.2.0, and we can add some comments to the
release notes for 3.1.1.

Thank you, Hyukjin and all.

Bests,
Dongjoon.

On Tue, Jan 19, 2021 at 10:49 AM Holden Karau  wrote:

> +1, pip installs on Python 3.8
>
> On Tue, Jan 19, 2021 at 10:26 AM Sean Owen  wrote:
>
>> +1 from me. Same results as in 3.1.0 testing.
>>

Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-19 Thread Holden Karau
+1, pip installs on Python 3.8

One potential thing we might want to consider, if there ends up being
another RC, is that the error message for installing with Python 2 could be
clearer.

Processing ./pyspark-3.1.1.tar.gz
ERROR: Command errored out with exit status 1:
 command: /tmp/py3.1/bin/python2 -c 'import sys, setuptools, tokenize;
sys.argv[0] = '"'"'/tmp/pip-req-build-lmlitE/setup.py'"'"';
__file__='"'"'/tmp/pip-req-build-lmlitE/setup.py'"'"';f=getattr(tokenize,
'"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"',
'"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))'
egg_info --egg-base /tmp/pip-pip-egg-info-W1BsIL
 cwd: /tmp/pip-req-build-lmlitE/
Complete output (6 lines):
Traceback (most recent call last):
  File "", line 1, in 
  File "/tmp/pip-req-build-lmlitE/setup.py", line 31
file=sys.stderr)
^
SyntaxError: invalid syntax

ERROR: Command errored out with exit status 1: python setup.py egg_info
Check the logs for full command output.
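[Editorial note] The traceback above is setup.py tripping over Python-3-only syntax (`print(..., file=sys.stderr)`) before any version check can run. A clearer message would need an explicit guard, written in syntax Python 2 can still parse. A hypothetical sketch, not the actual PySpark setup.py:

```python
import sys

def check_python_version(version_info=sys.version_info):
    """Fail fast with a readable message on Python 2."""
    # To actually help, this check (and everything parsed before it)
    # must stick to syntax Python 2 can parse; otherwise the interpreter
    # dies with a SyntaxError before the message is ever printed --
    # which is exactly the failure shown above.
    if version_info[0] < 3:
        sys.stderr.write(
            "ERROR: PySpark 3.x requires Python 3; "
            "detected Python %d.%d.\n" % (version_info[0], version_info[1])
        )
        return False
    return True

print(check_python_version())  # True under any Python 3 interpreter
```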



On Tue, Jan 19, 2021 at 10:26 AM Sean Owen  wrote:

> +1 from me. Same results as in 3.1.0 testing.
>

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-19 Thread Sean Owen
+1 from me. Same results as in 3.1.0 testing.



Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-19 Thread John Zhuge
+1 (non-binding)

On Tue, Jan 19, 2021 at 4:08 AM JackyLee  wrote:

> +1
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>

-- 
John Zhuge


Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-19 Thread JackyLee
+1



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-19 Thread Prashant Sharma
+1

On Tue, Jan 19, 2021 at 4:38 PM Yang,Jie(INF)  wrote:

> +1
>
>
>
> *From:* Gengliang Wang 
> *Date:* Tuesday, January 19, 2021, 3:04 PM
> *To:* Jungtaek Lim 
> *Cc:* Yuming Wang , Hyukjin Kwon ,
> dev 
> *Subject:* Re: [VOTE] Release Spark 3.1.1 (RC1)
>
>
>
> +1 (non-binding)
>

Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-19 Thread Yang,Jie(INF)
+1

From: Gengliang Wang 
Date: Tuesday, January 19, 2021, 3:04 PM
To: Jungtaek Lim 
Cc: Yuming Wang , Hyukjin Kwon , dev 

Subject: Re: [VOTE] Release Spark 3.1.1 (RC1)

+1 (non-binding)



Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-18 Thread Gengliang Wang
+1 (non-binding)



Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-18 Thread Jungtaek Lim
+1 (non-binding)

* verified signature and sha for all files (there's a glitch which I'll
describe below)
* built source (DISCLAIMER: didn't run tests) and made custom distribution,
and built a docker image based on the distribution
  - used profiles: kubernetes, hadoop-3.2, hadoop-cloud
* ran some SS PySpark queries (Rate to Kafka, Kafka to Kafka) with Spark on
k8s (used MinIO - s3 compatible - as checkpoint location)
  - for Kafka reader, tested both approaches: newer (offset via admin
client) and older (offset via consumer)
* ran simple batch query with magic committer against MinIO storage &
dynamic volume provisioning (with NFS)
* verified DataStreamReader.table & DataStreamWriter.toTable work in
PySpark (which also exercises the Scala API)
* ran test stateful SS queries and checked the new additions of SS UI
(state store & watermark information)

A glitch from verifying the sha: the sha512 file format differs between the
source tar.gz and the other artifacts. My tool succeeded with the others and
failed with the source tar.gz, though I confirmed the sha itself is the same.
Not a blocker, but it would be ideal to make the formats consistent.

Thanks for driving the release process!
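[Editorial note] For context on the sha512 format difference mentioned above: checksum tools commonly emit either a single `<lowercase hex>  <filename>` line (shasum/coreutils style) or a `<filename>: <grouped uppercase hex>` form (gpg --print-md style). The illustration below is an assumption about the shapes involved, not taken from the Spark release scripts; the exact grouping is made up for demonstration:

```python
import hashlib
import textwrap

data = b"spark-3.1.1.tgz contents (stand-in)"
digest = hashlib.sha512(data).hexdigest()  # 128 lowercase hex chars

# Style 1: `shasum -a 512` / GNU coreutils -- one line:
# "<lowercase hex><two spaces><filename>"
shasum_style = "%s  %s" % (digest, "spark-3.1.1.tgz")

# Style 2: `gpg --print-md SHA512`-like -- filename, colon, then the
# uppercase hex split into space-separated groups (grouping here is
# an assumption for illustration).
gpg_style = "spark-3.1.1.tgz: " + " ".join(textwrap.wrap(digest.upper(), 8))

print(shasum_style)
print(gpg_style)
# Both carry the same digest; only the textual format differs, which is
# why a checker expecting one format fails on the other.
```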

On Tue, Jan 19, 2021 at 2:25 PM Yuming Wang  wrote:

> +1.
>
> On Tue, Jan 19, 2021 at 7:54 AM Hyukjin Kwon  wrote:
>
>> I forgot to say :). I'll start with my +1.
>>
>> On Mon, 18 Jan 2021, 21:06 Hyukjin Kwon,  wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 3.1.1.
>>>
>>> The vote is open until January 22nd 4PM PST and passes if a majority +1
>>> PMC votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 3.1.0
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v3.1.1-rc1 (commit
>>> 53fe365edb948d0e05a5ccb62f349cd9fcb4bb5d):
>>> https://github.com/apache/spark/tree/v3.1.1-rc1
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1364
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/
>>>
>>> The list of bug fixes going into 3.1.1 can be found at the following URL:
>>> https://s.apache.org/41kf2
>>>
>>> This release is using the release script of the tag v3.1.1-rc1.
>>>
>>> FAQ
>>>
>>> ===
>>> What happened to 3.1.0?
>>> ===
>>>
>>> There was a technical issue during Apache Spark 3.1.0 preparation, and
>>> it was discussed and decided to skip 3.1.0.
>>> Please see
>>> https://spark.apache.org/news/next-official-release-spark-3.1.1.html
>>> for more details.
>>>
>>> =
>>> How can I help test this release?
>>> =
>>>
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>> the current RC via "pip install
>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/pyspark-3.1.1.tar.gz
>>> "
>>> and see if anything important breaks.
>>> In Java/Scala, you can add the staging repository to your project's
>>> resolvers and test with the RC (make sure to clean up the artifact
>>> cache before/after so you don't end up building with an out-of-date
>>> RC going forward).
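For sbt builds, pointing the resolver at the staging repository might look like the sketch below (the staging URL is the one listed earlier in this email; spark-sql is just an example module):

```scala
// build.sbt sketch: resolve artifacts from the 3.1.1 RC1 staging repository.
// Depend on whichever Spark modules your project actually uses.
resolvers += "Spark 3.1.1 RC1 staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1364"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.1.1"
```

Remember to clear the resolved artifacts from your local cache (e.g. the `org/apache/spark` entries) once the vote concludes, so later builds pick up the released artifacts instead.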
>>>
>>> ===
>>> What should happen to JIRA tickets still targeting 3.1.1?
>>> ===
>>>
>>> The current list of open tickets targeted at 3.1.1 can be found at:
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 3.1.1
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should
>>> be worked on immediately. Everything else please retarget to an
>>> appropriate release.
>>>
>>> ==
>>> But my bug isn't fixed?
>>> ==
>>>
>>> In order to make timely releases, we will typically not hold the
>>> release unless the bug in question is a regression from the previous
>>> release. That being said, if there is something which is a regression
>>> that has not been correctly targeted please ping me or a committer to
>>> help target the issue.
>>>
>>>


Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-18 Thread Yuming Wang
+1.

On Tue, Jan 19, 2021 at 7:54 AM Hyukjin Kwon  wrote:

> I forgot to say :). I'll start with my +1.
>
> On Mon, 18 Jan 2021, 21:06 Hyukjin Kwon,  wrote:
>
>> [full vote email quoted above; snipped]


Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-18 Thread Hyukjin Kwon
I forgot to say :). I'll start with my +1.

On Mon, 18 Jan 2021, 21:06 Hyukjin Kwon,  wrote:

> [full vote email quoted above; snipped]