Re: A scene with unstable Spark performance

2022-05-17 Thread Bowen Song
Hi, Spark dynamic resource allocation cannot solve my problem, because resources in the production environment are limited. Given that constraint, I would like to reserve resources so that the jobs of different groups can be scheduled in time. Thank you, Bowen Song

Unable to create view due to up cast error when migrating from Hive to Spark

2022-05-17 Thread beliefer
During the migration from Hive to Spark, there was a problem when a view created in Hive was used in Spark SQL. The original Hive SQL is shown below: CREATE VIEW myView AS SELECT CASE WHEN age > 12 THEN CAST(gender * 0.3 - 0.1 AS double) END AS TT, gender, age FROM myTable; Users use Spark SQL

Re: Reverse proxy for Spark UI on Kubernetes

2022-05-17 Thread bo yang
Yes, it should be possible. Any interest in working on this together? Need more hands to add more features here :)
On Tue, May 17, 2022 at 2:06 PM Holden Karau wrote:
> Could we make it do the same sort of history server fallback approach?
>
> On Tue, May 17, 2022 at 10:41 PM bo yang wrote:
>
>>

Re: Data Engineering Track at ApacheCon (October 3-6, New Orleans) - CFP ends 23/05

2022-05-17 Thread Pasha Finkelshtein
Hi Ismaël, Looks like I do: https://github.com/JetBrains/kotlin-spark-api :) Regards, Pasha
On Wed, May 18, 2022, 01:18 Ismaël Mejía wrote:
> Hello Pasha,
>
> This is not only for Apache project maintainers; if you contribute to or
> maintain another tool that integrates with an existing Apache project

Re: Data Engineering Track at ApacheCon (October 3-6, New Orleans) - CFP ends 23/05

2022-05-17 Thread Ismaël Mejía
Hello Pasha, This is not only for Apache project maintainers; if you contribute to or maintain another tool that integrates with an existing Apache project to do or improve common Data Engineering tasks, it can definitely fit. Regards, Ismaël
On Tue, May 17, 2022 at 11:23 PM Pasha Finkelshtein

Re: Data Engineering Track at ApacheCon (October 3-6, New Orleans) - CFP ends 23/05

2022-05-17 Thread Pasha Finkelshtein
Hi Ismaël, Thank you, it's interesting. Is this message relevant only to maintainers/contributors of top-level Apache projects, or does it apply to other maintainers of Apache-licensed software too? Regards, Pasha
On Wed, May 18, 2022, 00:05 Ismaël Mejía wrote:
> Hello,
>
> ApacheCon North America is back in

Re: Reverse proxy for Spark UI on Kubernetes

2022-05-17 Thread Holden Karau
Could we make it do the same sort of history server fallback approach?
On Tue, May 17, 2022 at 10:41 PM bo yang wrote:
> It is like Web Application Proxy in YARN (
> https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html),
> to provide easy access for Spark

Data Engineering Track at ApacheCon (October 3-6, New Orleans) - CFP ends 23/05

2022-05-17 Thread Ismaël Mejía
Hello, ApacheCon North America is back in person this year in October. https://apachecon.com/acna2022/ Together with Jarek Potiuk, we are organizing for the first time a Data Engineering Track as part of ApacheCon. You might be wondering why a different track if we already have the Big Data

Re: Reverse proxy for Spark UI on Kubernetes

2022-05-17 Thread bo yang
It is like the Web Application Proxy in YARN (https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html): it provides easy access to the Spark UI while the Spark application is running. When running Spark on Kubernetes with S3, there is no YARN. The reverse proxy here is
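For illustration only, the routing step of a reverse proxy like this might be sketched as follows. This is not the code shared in the thread. The `/sparkui/<app>/...` URL layout, the `resolve_backend` helper, the `spark` namespace, and the `<app>-ui-svc` service name on port 4040 are all assumptions made for the example, so adjust them to whatever the cluster actually exposes.

```python
# Hypothetical naming pattern for the in-cluster Spark UI service of an application.
# The "-ui-svc" suffix and port 4040 are assumptions for this sketch.
UI_SERVICE_TEMPLATE = "http://{app}-ui-svc.{namespace}.svc.cluster.local:{port}"


def resolve_backend(request_path, namespace="spark", port=4040, prefix="/sparkui"):
    """Map a proxy path such as /sparkui/<app>/<rest> to a Spark UI backend URL.

    The first path segment after the prefix is treated as the application name;
    everything after it is forwarded unchanged to that application's UI service.
    """
    if not request_path.startswith(prefix + "/"):
        raise ValueError(f"path does not start with {prefix}/: {request_path!r}")

    # Strip the prefix, then split "<app>/<rest>" at the first slash.
    remainder = request_path[len(prefix) + 1:]
    app, _, rest = remainder.partition("/")
    if not app:
        raise ValueError("missing application name in path")

    base = UI_SERVICE_TEMPLATE.format(app=app, namespace=namespace, port=port)
    return f"{base}/{rest}"
```

A proxy built this way would call `resolve_backend("/sparkui/my-app/jobs/")` for each incoming request and forward the request to the returned URL, which is roughly the role the YARN Web Application Proxy plays on a YARN cluster.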

Re: Reverse proxy for Spark UI on Kubernetes

2022-05-17 Thread bo yang
Thanks Holden :)
On Mon, May 16, 2022 at 11:12 PM Holden Karau wrote:
> Oh that’s rad
>
> On Tue, May 17, 2022 at 7:47 AM bo yang wrote:
>
>> Hi Spark Folks,
>>
>> I built a web reverse proxy to access Spark UI on Kubernetes (working
>> together with
>>

Re: Re: Re: Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-17 Thread Hyukjin Kwon
There might be other blockers. Let's wait and see.
On Tue, May 17, 2022 at 8:59 PM beliefer wrote:
> OK. Let it go into 3.3.1
>
> At 2022-05-17 18:59:13, "Hyukjin Kwon" wrote:
>
> I think most users won't be affected since aggregate pushdown is disabled
> by default.
>
> On Tue, 17 May 2022 at 19:53,

Re:Re: Re: Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-17 Thread beliefer
OK. Let it go into 3.3.1
At 2022-05-17 18:59:13, "Hyukjin Kwon" wrote:
I think most users won't be affected since aggregate pushdown is disabled by default.
On Tue, 17 May 2022 at 19:53, beliefer wrote:
If we do not include https://github.com/apache/spark/pull/36556, we will introduce a breaking change when

Re: Re: Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-17 Thread Hyukjin Kwon
And it seems like it won't be a breaking change, because adding a new method won't break binary compatibility.
On Tue, 17 May 2022 at 19:59, Hyukjin Kwon wrote:
> I think most users won't be affected since aggregate pushdown is disabled
> by default.
>
> On Tue, 17 May 2022 at 19:53, beliefer wrote:
>
>> If

Re: Re: Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-17 Thread Hyukjin Kwon
I think most users won't be affected since aggregate pushdown is disabled by default.
On Tue, 17 May 2022 at 19:53, beliefer wrote:
> If we do not include https://github.com/apache/spark/pull/36556, we will
> introduce a breaking change when we merge it into 3.3.1
>
> At 2022-05-17 18:26:12, "Hyukjin Kwon"

Re:Re: Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-17 Thread beliefer
If we do not include https://github.com/apache/spark/pull/36556, we will introduce a breaking change when we merge it into 3.3.1
At 2022-05-17 18:26:12, "Hyukjin Kwon" wrote:
We need to add https://github.com/apache/spark/pull/36556 to RC2. We will likely have to change the version being added if RC2 passes.

Re: Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-17 Thread Hyukjin Kwon
We need to add https://github.com/apache/spark/pull/36556 to RC2. We will likely have to change the version being added if RC2 passes. Since this is a new API/improvement, I would prefer not to block the release on that.
On Tue, 17 May 2022 at 19:19, beliefer wrote:
> We need to add

Re:Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-17 Thread beliefer
We need to add https://github.com/apache/spark/pull/36556 to RC2.
At 2022-05-17 17:37:13, "Hyukjin Kwon" wrote:
That seems like a test-only issue. I made a quick followup at https://github.com/apache/spark/pull/36576.
On Tue, 17 May 2022 at 03:56, Sean Owen wrote:
I'm still seeing failures

Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-17 Thread Hyukjin Kwon
That seems like a test-only issue. I made a quick followup at https://github.com/apache/spark/pull/36576.
On Tue, 17 May 2022 at 03:56, Sean Owen wrote:
> I'm still seeing failures related to the function registry, like:
>
> ExpressionsSchemaSuite:
> - Check schemas for expression examples ***

Unable to create view due to up cast error when migrating from Hive to Spark

2022-05-17 Thread beliefer
During the migration from Hive to Spark, there was a problem with the SQL used to create views in Hive. The problem is that SQL that validly creates a view in Hive raises an error when executed in Spark SQL. The SQL is as follows: CREATE VIEW myView AS SELECT CASE WHEN age > 12 THEN

Re: Introducing "Pandas API on Spark" component in JIRA, and use "PS" PR title component

2022-05-17 Thread Maciej
Sounds good! +1
On 5/17/22 06:08, Yikun Jiang wrote:
> It's a pretty good idea, +1.
>
> To be clear in Github:
>
> - For each PR Title: [SPARK-XXX][PYTHON][PS] The Pandas on spark pr title
> (*still keep [PYTHON]* and [PS] new added)
>
> - For PR label: new added: `PANDAS API ON Spark`, still

Re: Reverse proxy for Spark UI on Kubernetes

2022-05-17 Thread Holden Karau
Oh that’s rad
On Tue, May 17, 2022 at 7:47 AM bo yang wrote:
> Hi Spark Folks,
>
> I built a web reverse proxy to access Spark UI on Kubernetes (working
> together with https://github.com/GoogleCloudPlatform/spark-on-k8s-operator).
> Want to share here in case other people have similar need.