Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-18 Thread Maxim Gekk
Hi Kent, > Shall we backport the fix from the master to 3.3 too? Yes, we shall. Maxim Gekk Software Engineer Databricks, Inc. On Thu, May 19, 2022 at 6:44 AM Kent Yao wrote: > Hi, > > I verified the simple case below with the binary release, and it looks > like a bug to me. > >

Re: A scene with unstable Spark performance

2022-05-18 Thread Chang Chen
This is a case where resources are fixed within the same SparkContext, but SQL queries have different priorities. Some queries are only allowed to run if there are spare resources; once a high-priority query comes in, the task sets of those queries are either killed or stalled. If we set a high priority pool's

Re: Behaviour of Append & Overwrite modes when table is not present when using df.write in Spark 3

2022-05-18 Thread Sourabh Badhya
Requesting some suggestions on this. Thanks in advance, Sourabh Badhya On Mon, May 9, 2022 at 5:16 PM Sourabh Badhya wrote: > Hi team, > > I would like to know the behaviour of Append & Overwrite modes when table > is not present and whether automatic table creation is > supported/unsupported

Re: Introducing "Pandas API on Spark" component in JIRA, and use "PS" PR title component

2022-05-18 Thread Dongjoon Hyun
+1 Thank you for the suggestion, Hyukjin. Dongjoon. On Wed, May 18, 2022 at 11:08 AM Bjørn Jørgensen wrote: > +1 > But can we have the PR title and the PR label be the same, PS? > > On Wed, May 18, 2022 at 6:57 PM Xinrong Meng wrote: > >> Great! >> >> It saves us from always specifying "Pandas API on

Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-18 Thread Kent Yao
Hi, I verified the simple case below with the binary release, and it looks like a bug to me. bin/spark-sql -e "select date '2018-11-17' > 1" Error in query: Invalid call to toAttribute on unresolved object; 'Project [unresolvedalias((2018-11-17 > 1), None)] +- OneRowRelation Both 3.2 releases

Re: Introducing "Pandas API on Spark" component in JIRA, and use "PS" PR title component

2022-05-18 Thread Bjørn Jørgensen
+1 But can we have the PR title and the PR label be the same, PS? On Wed, May 18, 2022 at 6:57 PM Xinrong Meng wrote: > Great! > > It saves us from always specifying "Pandas API on Spark" in PR titles. > > Thanks! > > > Xinrong Meng > > Software Engineer > > Databricks > > > On Tue, May 17, 2022 at 1:08 AM

Re: Introducing "Pandas API on Spark" component in JIRA, and use "PS" PR title component

2022-05-18 Thread Xinrong Meng
Great! It saves us from always specifying "Pandas API on Spark" in PR titles. Thanks! Xinrong Meng Software Engineer Databricks On Tue, May 17, 2022 at 1:08 AM Maciej wrote: > Sounds good! > > +1 > > On 5/17/22 06:08, Yikun Jiang wrote: > > It's a pretty good idea, +1. > > > > To be

Re: Unable to create view due to up cast error when migrating from Hive to Spark

2022-05-18 Thread Wenchen Fan
A view is essentially a SQL query. It's fragile to share views between Spark and Hive because different systems have different SQL dialects. They may interpret the view SQL query differently and introduce unexpected behaviors. In this case, Spark returns decimal type for gender * 0.3 - 0.1 but
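To make the type difference concrete, the sketch below uses Python's `decimal` module as a stand-in for a SQL decimal type and Python `float` as a stand-in for a double. The value `gender = 1` is hypothetical, and the sketch makes no claim about which type a particular engine picks for the literals; it only shows that the same expression gives different results under the two types.

```python
from decimal import Decimal

gender = 1  # hypothetical sample value, not taken from the thread

# Literals interpreted as exact decimals.
as_decimal = Decimal(gender) * Decimal("0.3") - Decimal("0.1")

# Literals interpreted as binary floating point (double).
as_double = gender * 0.3 - 0.1

print(type(as_decimal).__name__, as_decimal)
print(type(as_double).__name__, as_double)
```

A view definition that depends on one of these result types can therefore fail or behave differently when another system resolves the expression to the other type.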

Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-18 Thread Yuming Wang
-1. There is a regression: https://github.com/apache/spark/pull/36595 On Wed, May 18, 2022 at 4:11 PM Martin Grigorov wrote: > Hi, > > [X] +1 Release this package as Apache Spark 3.3.0 > > Tested: > - make local distribution from sources (with ./dev/make-distribution.sh > --tgz --name

Unable to create view due to up cast error when migrating from Hive to Spark

2022-05-18 Thread beliefer
During the migration from Hive to Spark, we ran into a problem with the SQL used to create views in Hive. SQL that legally creates a view in Hive raises an error when executed in Spark SQL. The SQL is as follows: CREATE VIEW test_db.my_view AS select case when age > 12

Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-18 Thread Martin Grigorov
Hi, [X] +1 Release this package as Apache Spark 3.3.0 Tested: - make local distribution from sources (with ./dev/make-distribution.sh --tgz --name with-volcano -Pkubernetes,volcano,hadoop-3) - create a Docker image (with JDK 11) - run Pi example on -- local -- Kubernetes with default scheduler