Re: Random expr in join key not support

2021-10-19 Thread Ye Xianjin
> For that, you can add a table subquery and do it in the select list.

Do you mean something like this:
select * from t1 join (select floor(random()*9) + id as x from t2) m on t1.id = m.x ?

Yes, that works. But that raises another question: these two queries seem 
semantically equivalent, yet we treat them differently: one raises an analysis 
exception, while the other works fine. 
Should we treat them the same?
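The subquery workaround works because the random value is computed in the select list (once per row) rather than re-evaluated inside the join condition; this is essentially the classic key-salting trick for skew. Below is a minimal Python sketch of the idea, not Spark code — the names `salted_join` and `SALT_BUCKETS` are made up for illustration, and it assumes a small, non-skewed right side:

```python
import random

SALT_BUCKETS = 4  # assumed bucket count; tune to the actual skew


def salted_join(skewed, small):
    """Equi-join `skewed` and `small` on "id", spreading hot keys over salts."""
    # Compute the random salt in the "select list" (once per row), never
    # inside the join condition itself -- this keeps the join key deterministic.
    left = [dict(row, salt=random.randrange(SALT_BUCKETS)) for row in skewed]
    # Replicate the small side across every salt value so no match is lost.
    right = [dict(row, salt=s) for row in small for s in range(SALT_BUCKETS)]
    # Build and probe a hash table on the composite (id, salt) key.
    table = {}
    for r in right:
        table.setdefault((r["id"], r["salt"]), []).append(r)
    return [(l, r) for l in left for r in table.get((l["id"], l["salt"]), [])]
```

The hot key's rows now land in up to `SALT_BUCKETS` different partitions, at the cost of replicating the small side, which is the same trade-off the SQL rewrite makes.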

Sent from my iPhone

> On Oct 20, 2021, at 9:55 AM, Yingyi Bu  wrote:
> 
> 
> Per SQL spec, I think your join query can only be run as a NestedLoopJoin or 
> CartesianProduct.  See page 241 in SQL-99 
> (http://web.cecs.pdx.edu/~len/sql1999.pdf).
> In other words, it might be a correctness bug in other systems if they run 
> your query as a hash join.
> 
> > Here the purpose of adding a random value to the join key is to resolve the 
> > data skew problem.
> 
> For that, you can add a table subquery and do it in the select list.
> 
> Best,
> Yingyi
> 
> 
>> On Tue, Oct 19, 2021 at 12:46 AM Lantao Jin  wrote:
>> In PostgreSQL and Presto, the query below works well
>> sql> create table t1 (id int);
>> sql> create table t2 (id int);
>> sql> select * from t1 join t2 on t1.id = floor(random() * 9) + t2.id;
>> 
>> But Spark throws "Error in query: nondeterministic expressions are only allowed 
>> in Project, Filter, Aggregate or Window". Why doesn't Spark support random 
>> expressions in a join condition?
>> Here, the purpose of adding a random value to the join key is to resolve the 
>> data skew problem.
>> 
>> Thanks,
>> Lantao


Re: Random expr in join key not support

2021-10-19 Thread Yingyi Bu
Per SQL spec, I think your join query can only be run as a NestedLoopJoin
or CartesianProduct.  See page 241 in SQL-99 (
http://web.cecs.pdx.edu/~len/sql1999.pdf).
In other words, it might be a correctness bug in other systems if they run
your query as a hash join.
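The distinction can be sketched in plain Python: under SQL semantics the ON condition is evaluated once per row *pair* (so a nondeterministic expression draws a fresh value each time), whereas a hash join evaluates the key expression once per *row* before building and probing a table. This is a toy illustration, not Spark internals — the tables and function names are made up:

```python
import random

# Two tiny "tables", mirroring the t1/t2 example from the thread.
t1 = [{"id": i} for i in range(3)]
t2 = [{"id": i} for i in range(3)]


def nested_loop_join(left, right, cond):
    # SQL semantics: the ON condition is evaluated once per row pair,
    # so a nondeterministic expression gets a fresh draw for each pair.
    return [(l, r) for l in left for r in right if cond(l, r)]


def hash_join(left, right, left_key, right_key):
    # A hash join evaluates the key expression once per row, builds a
    # table from one side, then probes it. For a nondeterministic key
    # this is a different contract, hence the potential correctness bug.
    table = {}
    for r in right:
        table.setdefault(right_key(r), []).append(r)
    return [(l, r) for l in left for r in table.get(left_key(l), [])]


# With a condition like
#   lambda l, r: l["id"] == random.randrange(9) + r["id"]
# the two strategies are no longer interchangeable: the nested-loop join
# draws a random value per pair, the hash join only once per row.
```

For deterministic keys the two produce the same result, which is exactly why the distinction only surfaces once a random expression appears in the join key.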

> Here the purpose of adding a random value to the join key is to resolve the
> data skew problem.

For that, you can add a table subquery and do it in the select list.

Best,
Yingyi


On Tue, Oct 19, 2021 at 12:46 AM Lantao Jin  wrote:

> In PostgreSQL and Presto, the query below works well
> sql> create table t1 (id int);
> sql> create table t2 (id int);
> sql> select * from t1 join t2 on t1.id = floor(random() * 9) + t2.id;
>
> But Spark throws "Error in query: nondeterministic expressions are only
> allowed in Project, Filter, Aggregate or Window". Why doesn't Spark support
> random expressions in a join condition?
> Here, the purpose of adding a random value to the join key is to resolve the
> data skew problem.
>
> Thanks,
> Lantao
>


Re: [ANNOUNCE] Apache Spark 3.2.0

2021-10-19 Thread Wenchen Fan
Yeah, the file naming is a bit confusing; we can fix it in the next release.
"3.2" actually means 3.2 or higher, so it's not a big deal, I think.

Congrats and thanks!

On Wed, Oct 20, 2021 at 3:44 AM Jungtaek Lim 
wrote:

> Thanks to Gengliang for driving this huge release!
>
> On Wed, Oct 20, 2021 at 1:50 AM Dongjoon Hyun 
> wrote:
>
>> Thank you so much, Gengliang and all!
>>
>> Dongjoon.
>>
>> On Tue, Oct 19, 2021 at 8:48 AM Xiao Li  wrote:
>>
>>> Thank you, Gengliang!
>>>
>>> Congrats to our community and all the contributors!
>>>
>>> Xiao
>>>
>>> Henrik Peng  wrote on Tue, Oct 19, 2021, at 8:26 AM:
>>>
 Congrats and thanks!


 Gengliang Wang wrote on Tue, Oct 19, 2021, at 10:16 PM:

> Hi all,
>
> Apache Spark 3.2.0 is the third release of the 3.x line. With
> tremendous contribution from the open-source community, this release
> managed to resolve in excess of 1,700 Jira tickets.
>
> We'd like to thank our contributors and users for their contributions
> and early feedback to this release. This release would not have been
> possible without you.
>
> To download Spark 3.2.0, head over to the download page:
> https://spark.apache.org/downloads.html
>
> To view the release notes:
> https://spark.apache.org/releases/spark-release-3-2-0.html
>



Re: [ANNOUNCE] Apache Spark 3.2.0

2021-10-19 Thread Jungtaek Lim
Thanks to Gengliang for driving this huge release!

On Wed, Oct 20, 2021 at 1:50 AM Dongjoon Hyun 
wrote:

> Thank you so much, Gengliang and all!
>
> Dongjoon.
>
> On Tue, Oct 19, 2021 at 8:48 AM Xiao Li  wrote:
>
>> Thank you, Gengliang!
>>
>> Congrats to our community and all the contributors!
>>
>> Xiao
>>
>> Henrik Peng  wrote on Tue, Oct 19, 2021, at 8:26 AM:
>>
>>> Congrats and thanks!
>>>
>>>
>>> Gengliang Wang wrote on Tue, Oct 19, 2021, at 10:16 PM:
>>>
 Hi all,

 Apache Spark 3.2.0 is the third release of the 3.x line. With
 tremendous contribution from the open-source community, this release
 managed to resolve in excess of 1,700 Jira tickets.

 We'd like to thank our contributors and users for their contributions
 and early feedback to this release. This release would not have been
 possible without you.

 To download Spark 3.2.0, head over to the download page:
 https://spark.apache.org/downloads.html

 To view the release notes:
 https://spark.apache.org/releases/spark-release-3-2-0.html

>>>


Re: [ANNOUNCE] Apache Spark 3.2.0

2021-10-19 Thread Dongjoon Hyun
Thank you so much, Gengliang and all!

Dongjoon.

On Tue, Oct 19, 2021 at 8:48 AM Xiao Li  wrote:

> Thank you, Gengliang!
>
> Congrats to our community and all the contributors!
>
> Xiao
>
> Henrik Peng  wrote on Tue, Oct 19, 2021, at 8:26 AM:
>
>> Congrats and thanks!
>>
>>
>> Gengliang Wang wrote on Tue, Oct 19, 2021, at 10:16 PM:
>>
>>> Hi all,
>>>
>>> Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous
>>> contribution from the open-source community, this release managed to
>>> resolve in excess of 1,700 Jira tickets.
>>>
>>> We'd like to thank our contributors and users for their contributions
>>> and early feedback to this release. This release would not have been
>>> possible without you.
>>>
>>> To download Spark 3.2.0, head over to the download page:
>>> https://spark.apache.org/downloads.html
>>>
>>> To view the release notes:
>>> https://spark.apache.org/releases/spark-release-3-2-0.html
>>>
>>


Re: [ANNOUNCE] Apache Spark 3.2.0

2021-10-19 Thread Xiao Li
Thank you, Gengliang!

Congrats to our community and all the contributors!

Xiao

Henrik Peng  wrote on Tue, Oct 19, 2021, at 8:26 AM:

> Congrats and thanks!
>
>
> Gengliang Wang wrote on Tue, Oct 19, 2021, at 10:16 PM:
>
>> Hi all,
>>
>> Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous
>> contribution from the open-source community, this release managed to
>> resolve in excess of 1,700 Jira tickets.
>>
>> We'd like to thank our contributors and users for their contributions and
>> early feedback to this release. This release would not have been possible
>> without you.
>>
>> To download Spark 3.2.0, head over to the download page:
>> https://spark.apache.org/downloads.html
>>
>> To view the release notes:
>> https://spark.apache.org/releases/spark-release-3-2-0.html
>>
>


Re: [ANNOUNCE] Apache Spark 3.2.0

2021-10-19 Thread Prasad Paravatha
Works now. Thanks!
Minor thing: the version naming convention could cause confusion, 
i.e. the name on the UI vs. the tgz file name.

> On Oct 19, 2021, at 10:09 AM, Gengliang Wang  wrote:
> 
> 
> Hi Prasad,
> 
> Thanks for reporting the issue. The link was wrong. It should be fixed now.
> Could you try again on https://spark.apache.org/downloads.html?
> 
>> On Tue, Oct 19, 2021 at 10:53 PM Prasad Paravatha 
>>  wrote:
>> https://www.apache.org/dyn/closer.lua/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.3.tgz
>> 
>> FYI, unable to download from this location. 
>> Also, I don’t see a Hadoop 3.3 version in the dist.
>> 
>> 
 On Oct 19, 2021, at 9:39 AM, Bode, Meikel, NMA-CFD 
  wrote:
 
>>> 
>>> Many thanks! 
>>> 
>>>  
>>> 
>>> From: Gengliang Wang  
>>> Sent: Dienstag, 19. Oktober 2021 16:16
>>> To: dev ; user 
>>> Subject: [ANNOUNCE] Apache Spark 3.2.0
>>> 
>>>  
>>> 
>>> Hi all,
>>> 
>>>  
>>> 
>>> Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous 
>>> contribution from the open-source community, this release managed to 
>>> resolve in excess of 1,700 Jira tickets.
>>> 
>>>  
>>> 
>>> We'd like to thank our contributors and users for their contributions and 
>>> early feedback to this release. This release would not have been possible 
>>> without you.
>>> 
>>>  
>>> 
>>> To download Spark 3.2.0, head over to the download page: 
>>> https://spark.apache.org/downloads.html
>>> 
>>>  
>>> 
>>> To view the release notes: 
>>> https://spark.apache.org/releases/spark-release-3-2-0.html


Re: [ANNOUNCE] Apache Spark 3.2.0

2021-10-19 Thread Gengliang Wang
Hi Prasad,

Thanks for reporting the issue. The link was wrong. It should be fixed now.
Could you try again on https://spark.apache.org/downloads.html?

On Tue, Oct 19, 2021 at 10:53 PM Prasad Paravatha <
prasad.parava...@gmail.com> wrote:

>
> https://www.apache.org/dyn/closer.lua/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.3.tgz
>
> FYI, unable to download from this location.
> Also, I don’t see a Hadoop 3.3 version in the dist.
>
>
> On Oct 19, 2021, at 9:39 AM, Bode, Meikel, NMA-CFD <
> meikel.b...@bertelsmann.de> wrote:
>
> 
>
> Many thanks! 
>
>
>
> *From:* Gengliang Wang 
> *Sent:* Dienstag, 19. Oktober 2021 16:16
> *To:* dev ; user 
> *Subject:* [ANNOUNCE] Apache Spark 3.2.0
>
>
>
> Hi all,
>
>
>
> Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous
> contribution from the open-source community, this release managed to
> resolve in excess of 1,700 Jira tickets.
>
>
>
> We'd like to thank our contributors and users for their contributions and
> early feedback to this release. This release would not have been possible
> without you.
>
>
>
> To download Spark 3.2.0, head over to the download page:
> https://spark.apache.org/downloads.html
> 
>
>
>
> To view the release notes:
> https://spark.apache.org/releases/spark-release-3-2-0.html
> 
>
>


Re: [ANNOUNCE] Apache Spark 3.2.0

2021-10-19 Thread Prasad Paravatha
https://www.apache.org/dyn/closer.lua/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.3.tgz

FYI, unable to download from this location. 
Also, I don’t see a Hadoop 3.3 version in the dist.


> On Oct 19, 2021, at 9:39 AM, Bode, Meikel, NMA-CFD 
>  wrote:
> 
> 
> Many thanks! 
>  
> From: Gengliang Wang  
> Sent: Dienstag, 19. Oktober 2021 16:16
> To: dev ; user 
> Subject: [ANNOUNCE] Apache Spark 3.2.0
>  
> Hi all,
>  
> Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous 
> contribution from the open-source community, this release managed to resolve 
> in excess of 1,700 Jira tickets.
>  
> We'd like to thank our contributors and users for their contributions and 
> early feedback to this release. This release would not have been possible 
> without you.
>  
> To download Spark 3.2.0, head over to the download page: 
> https://spark.apache.org/downloads.html
>  
> To view the release notes: 
> https://spark.apache.org/releases/spark-release-3-2-0.html


RE: [ANNOUNCE] Apache Spark 3.2.0

2021-10-19 Thread Bode, Meikel, NMA-CFD
Many thanks! 

From: Gengliang Wang 
Sent: Dienstag, 19. Oktober 2021 16:16
To: dev ; user 
Subject: [ANNOUNCE] Apache Spark 3.2.0

Hi all,

Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous 
contribution from the open-source community, this release managed to resolve in 
excess of 1,700 Jira tickets.

We'd like to thank our contributors and users for their contributions and early 
feedback to this release. This release would not have been possible without you.

To download Spark 3.2.0, head over to the download page: 
https://spark.apache.org/downloads.html

To view the release notes: 
https://spark.apache.org/releases/spark-release-3-2-0.html


Re: [ANNOUNCE] Apache Spark 3.2.0

2021-10-19 Thread Mridul Muralidharan
Congratulations everyone !
And thanks, Gengliang, for shepherding the release out :-)

Regards,
Mridul

On Tue, Oct 19, 2021 at 9:25 AM Yuming Wang  wrote:

> Congrats and thanks!
>
> On Tue, Oct 19, 2021 at 10:17 PM Gengliang Wang  wrote:
>
>> Hi all,
>>
>> Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous
>> contribution from the open-source community, this release managed to
>> resolve in excess of 1,700 Jira tickets.
>>
>> We'd like to thank our contributors and users for their contributions and
>> early feedback to this release. This release would not have been possible
>> without you.
>>
>> To download Spark 3.2.0, head over to the download page:
>> https://spark.apache.org/downloads.html
>>
>> To view the release notes:
>> https://spark.apache.org/releases/spark-release-3-2-0.html
>>
>


Re: [ANNOUNCE] Apache Spark 3.2.0

2021-10-19 Thread Yuming Wang
Congrats and thanks!

On Tue, Oct 19, 2021 at 10:17 PM Gengliang Wang  wrote:

> Hi all,
>
> Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous
> contribution from the open-source community, this release managed to
> resolve in excess of 1,700 Jira tickets.
>
> We'd like to thank our contributors and users for their contributions and
> early feedback to this release. This release would not have been possible
> without you.
>
> To download Spark 3.2.0, head over to the download page:
> https://spark.apache.org/downloads.html
>
> To view the release notes:
> https://spark.apache.org/releases/spark-release-3-2-0.html
>


[ANNOUNCE] Apache Spark 3.2.0

2021-10-19 Thread Gengliang Wang
Hi all,

Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous
contribution from the open-source community, this release managed to
resolve in excess of 1,700 Jira tickets.

We'd like to thank our contributors and users for their contributions and
early feedback to this release. This release would not have been possible
without you.

To download Spark 3.2.0, head over to the download page:
https://spark.apache.org/downloads.html

To view the release notes:
https://spark.apache.org/releases/spark-release-3-2-0.html


Random expr in join key not support

2021-10-19 Thread Lantao Jin
In PostgreSQL and Presto, the query below works well
sql> create table t1 (id int);
sql> create table t2 (id int);
sql> select * from t1 join t2 on t1.id = floor(random() * 9) + t2.id;

But Spark throws "Error in query: nondeterministic expressions are only
allowed in Project, Filter, Aggregate or Window". Why doesn't Spark support
random expressions in a join condition?
Here, the purpose of adding a random value to the join key is to resolve the
data skew problem.

Thanks,
Lantao