Re: [VOTE] Add new `Versions` in Apache Spark JIRA for Versioning of Spark Operator

2024-04-12 Thread huaxin gao
+1 On Fri, Apr 12, 2024 at 9:07 AM Dongjoon Hyun wrote: > +1 > > Thank you! > > I hope we can customize `dev/merge_spark_pr.py` script per repository > after this PR. > > Dongjoon. > > On 2024/04/12 03:28:36 "L. C. Hsieh" wrote: > > Hi all, > > > > Thanks for all discussions in the thread of

Re: [VOTE] Add new `Versions` in Apache Spark JIRA for Versioning of Spark Operator

2024-04-12 Thread Dongjoon Hyun
+1 Thank you! I hope we can customize `dev/merge_spark_pr.py` script per repository after this PR. Dongjoon. On 2024/04/12 03:28:36 "L. C. Hsieh" wrote: > Hi all, > > Thanks for all discussions in the thread of "Versioning of Spark > Operator":

Re: [DISCUSS] SPARK-44444: Use ANSI SQL mode by default

2024-04-12 Thread Wenchen Fan
+1, the existing "NULL on error" behavior is terrible for data quality. I have one concern about error reporting with DataFrame APIs. Query execution is lazy and where the error happens can be far away from where the dataframe/column was created. We are improving it (PR

[DISCUSS] Spark 4.0.0 release

2024-04-12 Thread Wenchen Fan
Hi all, It's close to the previously proposed 4.0.0 release date (June 2024), and I think it's time to prepare for it and discuss the ongoing projects: - ANSI by default - Spark Connect GA - Structured Logging - Streaming state store data source - new data type VARIANT - STRING

Re: [DISCUSS] SPARK-44444: Use ANSI SQL mode by default

2024-04-12 Thread L. C. Hsieh
+1 I believe ANSI mode is well developed after many releases. No doubt it could be used. Since it is very easy to disable it to restore the current behavior, I guess the impact could be limited. Do we know the possible impacts, such as what the major changes are (e.g., what kind of
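
A minimal sketch of the behavior change under discussion, using the documented `spark.sql.ansi.enabled` switch (the local session and the literal value are illustrative, not from the thread):

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()

# Legacy (non-ANSI) behavior: an invalid cast silently returns NULL.
spark.conf.set("spark.sql.ansi.enabled", "false")
spark.sql("SELECT CAST('abc' AS INT)").show()  # -> NULL

# ANSI behavior: the same expression raises a runtime error instead,
# and users can flip the flag back to restore the old semantics.
spark.conf.set("spark.sql.ansi.enabled", "true")
# spark.sql("SELECT CAST('abc' AS INT)").show()  # -> raises a CAST_INVALID_INPUT error
```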

Re: [DISCUSS] SPARK-44444: Use ANSI SQL mode by default

2024-04-11 Thread Gengliang Wang
+1, enabling Spark's ANSI SQL mode in version 4.0 will significantly enhance data quality and integrity. I fully support this initiative. > In other words, the current Spark ANSI SQL implementation becomes the first implementation for Spark SQL users to face at first while providing

Re: [PySpark]: DataFrameWriterV2.overwrite fails with spark connect

2024-04-11 Thread Ruifeng Zheng
Toki Takahashi, Thanks for reporting this, I created https://issues.apache.org/jira/browse/SPARK-47828 to track this bug. I will take a look. On Thu, Apr 11, 2024 at 10:11 PM Toki Takahashi wrote: > Hi Community, > > I get the following error when using Spark Connect in PySpark 3.5.1 > and

[VOTE] Add new `Versions` in Apache Spark JIRA for Versioning of Spark Operator

2024-04-11 Thread L. C. Hsieh
Hi all, Thanks for all discussions in the thread of "Versioning of Spark Operator": https://lists.apache.org/thread/zhc7nb2sxm8jjxdppq8qjcmlf4rcsthh I would like to create this vote to get the consensus for versioning of the Spark Kubernetes Operator. The proposal is to use an independent

[DISCUSS] SPARK-44444: Use ANSI SQL mode by default

2024-04-11 Thread Dongjoon Hyun
Hi, All. Thanks to you, we've been achieving many things and have on-going SPIPs. I believe it's time to scope Apache Spark 4.0.0 (SPARK-44111) more narrowly by asking your opinions about Apache Spark's ANSI SQL mode. https://issues.apache.org/jira/browse/SPARK-44111 Prepare Apache Spark

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-04-11 Thread Jungtaek Lim
I'm still having a hard time reviewing this. I am handling a bunch of context right now, and the change is non-trivial to review in parallel. I see people were OK with the algorithm at a high level, but from a code perspective it's not easy to understand without knowledge of DRA. It would take

Re: External Spark shuffle service for k8s

2024-04-11 Thread Bjørn Jørgensen
I think this answers your question about what to do if you need more space on nodes. https://spark.apache.org/docs/latest/running-on-kubernetes.html#local-storage Local Storage Spark supports using volumes to spill
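
To make the linked docs concrete, a hedged sketch of the `spark.kubernetes.*.volumes.*` properties they describe for mounting spill space (the claim name, storage class, size, and mount path are placeholder values; in practice these are usually passed to spark-submit):

```
from pyspark.sql import SparkSession

# Volumes whose name starts with "spark-local-dir-" are used by Spark
# as local (spill) storage, per the running-on-kubernetes docs above.
vol = "spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1"

spark = (
    SparkSession.builder
    .config(f"{vol}.options.claimName", "OnDemand")  # dynamically provisioned PVC
    .config(f"{vol}.options.storageClass", "gp2")    # placeholder storage class
    .config(f"{vol}.options.sizeLimit", "500Gi")     # placeholder size
    .config(f"{vol}.mount.path", "/data")
    .config(f"{vol}.mount.readOnly", "false")
    .getOrCreate()
)
```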

[PySpark]: DataFrameWriterV2.overwrite fails with spark connect

2024-04-11 Thread Toki Takahashi
Hi Community, I get the following error when using Spark Connect in PySpark 3.5.1 and writing with DataFrameWriterV2.overwrite. ``` > df.writeTo('db.table').overwrite(F.col('id')==F.lit(1)) ... SparkConnectGrpcException: (org.apache.spark.sql.connect.common.InvalidPlanInput) Expression with ID:
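
A minimal sketch of the failing call path, assuming a Spark Connect endpoint at `sc://localhost:15002` and an existing v2 table `db.table` (both hypothetical):

```
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Pure Spark Connect session; no local JVM is started.
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# DataFrameWriterV2.overwrite replaces only the rows matching the
# condition. Per the report, this raises SparkConnectGrpcException
# under PySpark 3.5.1 with Spark Connect (tracked as SPARK-47828).
df.writeTo("db.table").overwrite(F.col("id") == F.lit(1))
```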

Re: [External] Re: Versioning of Spark Operator

2024-04-11 Thread Ofir Manor
A related question - what is the expected release cadence? At least for the next 12-18 months? Since this is a new subproject, I am personally hoping it would have a faster cadence at first, maybe once a month or once every couple of months... If so, that would affect versioning. Also, if it

Re: External Spark shuffle service for k8s

2024-04-11 Thread Bjørn Jørgensen
" In the end for my usecase I started using pvcs and pvc aware scheduling along with decommissioning. So far performance is good with this choice." How did you do this? tor. 11. apr. 2024 kl. 04:13 skrev Arun Ravi : > Hi Everyone, > > I had to explored IBM's and AWS's S3 shuffle plugins (some

Re: Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-10 Thread Holden Karau
On Wed, Apr 10, 2024 at 9:54 PM Binwei Yang wrote: > > Gluten currently already support Velox backend and Clickhouse backend. > data fusion support is also proposed but no one worked on it. > > Gluten isn't a POC. It's under actively developing but some companies > already used it. > > > On

Re: Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-10 Thread Binwei Yang
Gluten currently already supports the Velox backend and the ClickHouse backend. DataFusion support is also proposed but no one has worked on it. Gluten isn't a POC. It's under active development, but some companies already use it. On 2024/04/11 03:32:01 Dongjoon Hyun wrote: > I'm interested in your

Re: SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-10 Thread Binwei Yang
Gluten's Java part is pretty stable now. The development is more in the C++ code and the Velox code, as well as the ClickHouse backend. The SPIP doesn't plan to introduce the whole Gluten stack into Spark, but rather a way to serialize the Spark physical plan so it can be sent to a native backend, through JNI or gRPC.

Re: SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-10 Thread Binwei Yang
We (Gluten and Arrow guys) actually did plan to put the plan conversion in the substrait-java repo. But to me it makes more sense to put it in the Spark repo. Native library and accelerator support will be more and more important in the future. On 2024/04/10 08:29:08 Wenchen Fan wrote: >

Re: Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-10 Thread Dongjoon Hyun
I'm interested in your claim. Could you elaborate or provide some evidence for your claim, *a door for all native libraries*, Binwei? For example, is there any POC for that claim? Or did I miss something in that SPIP? Dongjoon. On Wed, Apr 10, 2024 at 8:19 PM Binwei Yang wrote: > > The

Re: Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-10 Thread Binwei Yang
The SPIP is not for the current Gluten, but opens a door for support of all native libraries and accelerators. On 2024/04/11 00:27:43 Weiting Chen wrote: > Yes, the 1st Apache release (v1.2.0) for Gluten will be in September. > For Spark version support, currently Gluten v1.1.1 supports Spark 3.2 and

Re: External Spark shuffle service for k8s

2024-04-10 Thread Arun Ravi
Hi Everyone, I explored IBM's and AWS's S3 shuffle plugins (some time back), and I also explored AWS FSx for Lustre in a few of my production jobs which have ~20TB of shuffle operations with 200-300 executors. What I have observed is that S3 and FSx behaviour was fine during the write phase, however I

Re: Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-10 Thread Weiting Chen
Yes, the 1st Apache release (v1.2.0) for Gluten will be in September. For Spark version support, currently Gluten v1.1.1 supports Spark 3.2 and 3.3. We are planning to support Spark 3.4 and 3.5 in Gluten v1.2.0. Spark 4.0 support for Gluten depends on the release schedule in the Spark community. On

Re: SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-10 Thread L. C. Hsieh
+1 for Wenchen's point. I don't see a strong reason to pull these transformations into Spark instead of keeping them in third party packages/projects. On Wed, Apr 10, 2024 at 5:32 AM Wenchen Fan wrote: > > It's good to reduce duplication between different native accelerators of > Spark, and

Re: Versioning of Spark Operator

2024-04-10 Thread L. C. Hsieh
This approach makes sense to me. If the Spark K8s operator is aligned with Spark versions, for example, it uses 4.0.0 now. Because these JIRA tickets are not actually targeting Spark 4.0.0, it will cause confusion and more questions, like: when we are going to cut a Spark release, should we include

Re: Versioning of Spark Operator

2024-04-10 Thread bo yang
Cool, looks like we have two options here. Option 1: Spark Operator and Connect Go Client versioning independent of Spark, e.g. starting with 0.1.0. Pros: they can evolve versions independently. Cons: people will need an extra step to decide the version when using Spark Operator and Connect Go

Re: SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-10 Thread Mich Talebzadeh
I read the SPIP. I have a number of points, if I may. - Maturity of Gluten: as the excerpt mentions, Gluten is a project, and its feature set and stability IMO are still under development. Integrating a non-core component could introduce risks if it is not fully mature. - Complexity: integrating

Re: SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-10 Thread Wenchen Fan
It's good to reduce duplication between different native accelerators of Spark, and AFAIK there is already a project trying to solve it: https://substrait.io/ I'm not sure why we need to do this inside Spark, instead of doing the unification for a wider scope (for all engines, not only Spark).

Re: Versioning of Spark Operator

2024-04-10 Thread Dongjoon Hyun
Ya, that would work. Inevitably, I looked at the Apache Flink K8s Operator's JIRA and GitHub repo. It looks reasonable to me. Although they share the same JIRA, they choose different patterns per place. 1. In the POM file and Maven artifact, an independent version number: 1.8.0. 2. The tag is also based on

Re: Versioning of Spark Operator

2024-04-10 Thread L. C. Hsieh
Yea, I guess, for example, the first release of Spark K8s Operator would be something like 0.1.0 instead of 4.0.0. It sounds hard to align with Spark versions because of that? On Tue, Apr 9, 2024 at 10:15 AM Dongjoon Hyun wrote: > > Ya, that's simple and possible. > > However, it may cause

Re: SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-09 Thread Holden Karau
I like the idea of improving the flexibility of Spark's physical plans and really anything that might reduce code duplication among the ~4 or so different accelerators. Twitter: https://twitter.com/holdenkarau Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9

Re: Versioning of Spark Operator

2024-04-09 Thread L. C. Hsieh
For Spark Operator, I think the answer is yes. My impression is that Spark Operator should be Spark version-agnostic. Zhou, please correct me if I'm wrong. I am not sure about the Spark Connect Go client, but if it is going to talk with a Spark cluster, I guess it should still be related to

Re: Versioning of Spark Operator

2024-04-09 Thread Dongjoon Hyun
Do we have a compatibility matrix of the Apache Spark Connect Go client already, Bo? Specifically, I'm wondering which versions the existing Apache Spark Connect Go repository is able to support as of now. We know that it is supposed to be compatible always, but do we have a way to verify that actually

Re: Versioning of Spark Operator

2024-04-09 Thread bo yang
Thanks Liang-Chi for the Spark Operator work, and also the discussion here! For Spark Operator and the Connect Go Client, I am guessing they need to support multiple versions of Spark? e.g. the same Spark Operator may support running multiple versions of Spark, and the Connect Go Client might support

Re: Versioning of Spark Operator

2024-04-09 Thread Dongjoon Hyun
Ya, that's simple and possible. However, it may cause many confusions because it implies that new `Spark K8s Operator 4.0.0` and `Spark Connect Go 4.0.0` follow the same `Semantic Versioning` policy like Apache Spark 4.0.0. In addition, `Versioning` is directly related to the Release Cadence.

Re: Versioning of Spark Operator

2024-04-09 Thread DB Tsai
Aligning with Spark releases is sensible, as it allows us to guarantee that the Spark operator functions correctly with the new version while also maintaining support for previous versions. DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 > On Apr 9, 2024, at 9:45 AM, Mridul

Re: Versioning of Spark Operator

2024-04-09 Thread Mridul Muralidharan
I am trying to understand if we can simply align with Spark's version for this? It makes the release and JIRA management much simpler for developers and intuitive for users. Regards, Mridul On Tue, Apr 9, 2024 at 10:09 AM Dongjoon Hyun wrote: > Hi, Liang-Chi. > > Thank you for leading

Re: Versioning of Spark Operator

2024-04-09 Thread Dongjoon Hyun
Hi, Liang-Chi. Thank you for leading the Apache Spark K8s operator as a shepherd. I took a look at the `Apache Spark Connect Go` repo mentioned in the thread. Sadly, there is no release at all and no activity in the last 6 months. It seems to be the first time for the Apache Spark community to consider

Re: SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-09 Thread Dongjoon Hyun
Thank you for sharing, Jia. I have the same questions as in Weiting's previous thread. Do you think you can share the future milestone of Apache Gluten? I'm wondering when the first stable release will come and how we can coordinate across the ASF communities. > This project is still under

Re: Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-09 Thread Dongjoon Hyun
Thank you for sharing, Weiting. Do you think you can share the future milestone of Apache Gluten? I'm wondering when the first stable release will come and how we can coordinate across the ASF communities. > This project is still under active development now, and doesn't have a stable release. >

Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-09 Thread WeitingChen
Hi all, We are excited to introduce a new Apache incubating project called Gluten. Gluten serves as a middleware layer designed to offload Spark to native engines like Velox or ClickHouse. For more detailed information, please visit the project repository at

SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-09 Thread Ke Jia
Apache Spark currently lacks an official mechanism to support cross-platform execution of physical plans. The Gluten project offers a mechanism that utilizes the Substrait standard to convert and optimize Spark's physical plans. By introducing Gluten's plan conversion, validation, and fallback

Versioning of Spark Operator

2024-04-08 Thread L. C. Hsieh
Hi all, We've opened the dedicated repository of Spark Kubernetes Operator, and the first PR is created. Thank you for the review from the community so far. About the versioning of Spark Operator, there are questions. As we are using Spark JIRA, when we are going to merge PRs, we need to choose

Unsubscribe

2024-04-08 Thread bruce COTTMAN
- To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Re: Apache Spark 3.4.3 (?)

2024-04-08 Thread Dongjoon Hyun
Thank you, Holden, Mridul, Kent, Liang-Chi, Mich, Jungtaek. I added `Target Version: 3.4.3` to SPARK-47318 and am going to continue to prepare for RC1 (April 15th). Dongjoon.

Re: External Spark shuffle service for k8s

2024-04-08 Thread Mich Talebzadeh
Hi, First, thanks everyone for their contributions. I was going to reply to @Enrico Minack but noticed additional info. As I understand it, Apache Uniffle, for example, is an incubating project aimed at providing a pluggable shuffle service for Spark. So basically, all these "external shuffle

Re: External Spark shuffle service for k8s

2024-04-08 Thread Vakaris Baškirov
I see that both Uniffle and Celeborn support S3/HDFS backends, which is great. In case someone is using S3/HDFS, I wonder what the advantages would be of using Celeborn or Uniffle vs. the IBM shuffle service plugin or the Cloud Shuffle Storage Plugin from AWS

Re: External Spark shuffle service for k8s

2024-04-08 Thread roryqi
Apache Uniffle (incubating) may be another solution. You can see https://github.com/apache/incubator-uniffle https://uniffle.apache.org/blog/2023/07/21/Uniffle%20-%20New%20chapter%20for%20the%20shuffle%20in%20the%20cloud%20native%20era Mich Talebzadeh wrote on Mon, Apr 8, 2024 at 07:15: > Splendid > > The

Re: Apache Spark 3.4.3 (?)

2024-04-07 Thread Jungtaek Lim
Sounds like a plan. +1 (non-binding) Thanks for volunteering! On Sun, Apr 7, 2024 at 5:45 AM Dongjoon Hyun wrote: > Hi, All. > > Apache Spark 3.4.2 tag was created on Nov 24th and `branch-3.4` has 85 > commits including important security and correctness patches like > SPARK-45580, SPARK-46092,

Fwd: Apache Spark 3.4.3 (?)

2024-04-07 Thread Mich Talebzadeh
Mich Talebzadeh, Technologist | Solutions Architect | Data Engineer | Generative AI London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* The information provided is correct

Re: Apache Spark 3.4.3 (?)

2024-04-07 Thread L. C. Hsieh
+1 Thanks Dongjoon! On Sun, Apr 7, 2024 at 1:56 AM Kent Yao wrote: > > +1, thank you, Dongjoon > > > Kent > > Holden Karau wrote on Sun, Apr 7, 2024 at 14:54: > > > > Sounds good to me :) > > > > Twitter: https://twitter.com/holdenkarau > > Books (Learning Spark, High Performance Spark, etc.): > >

Re: External Spark shuffle service for k8s

2024-04-07 Thread Mich Talebzadeh
Thanks Cheng for the heads up. I will have a look. Cheers Mich Talebzadeh, Technologist | Solutions Architect | Data Engineer | Generative AI London United Kingdom view my Linkedin profile

Re: External Spark shuffle service for k8s

2024-04-07 Thread Cheng Pan
Instead of the External Shuffle Service, Apache Celeborn might be a good option as a Remote Shuffle Service for Spark on K8s. There are some useful resources you might be interested in. [1] https://celeborn.apache.org/ [2] https://www.youtube.com/watch?v=s5xOtG6Venw [3]

Re: External Spark shuffle service for k8s

2024-04-07 Thread Mich Talebzadeh
Splendid! The configurations below can be used with k8s deployments of Spark. Spark applications running on k8s can utilize these configurations to seamlessly access data stored in Google Cloud Storage (GCS) and Amazon S3. For Google GCS we may have spark_config_gcs = {
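
A hedged sketch of where that truncated example was likely heading: typical Hadoop connector settings for GCS and S3 applied to a session (the keyfile path and credential placeholders are assumptions, not from the original mail):

```
from pyspark.sql import SparkSession

spark_config_gcs = {
    "spark.hadoop.fs.gs.impl":
        "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
    "spark.hadoop.google.cloud.auth.service.account.enable": "true",
    "spark.hadoop.google.cloud.auth.service.account.json.keyfile":
        "/mnt/secrets/gcs-key.json",  # placeholder path
}

spark_config_s3 = {
    "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "spark.hadoop.fs.s3a.access.key": "<ACCESS_KEY>",  # placeholder
    "spark.hadoop.fs.s3a.secret.key": "<SECRET_KEY>",  # placeholder
}

builder = SparkSession.builder
for key, value in {**spark_config_gcs, **spark_config_s3}.items():
    builder = builder.config(key, value)
spark = builder.getOrCreate()
```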

Re: External Spark shuffle service for k8s

2024-04-07 Thread Vakaris Baškirov
There is an IBM shuffle service plugin that supports S3 https://github.com/IBM/spark-s3-shuffle Though I would think a feature like this could be a part of the main Spark repo. Trino already has out-of-box support for s3 exchange (shuffle) and it's very useful. Vakaris On Sun, Apr 7, 2024 at

Re: Apache Spark 3.4.3 (?)

2024-04-07 Thread Kent Yao
+1, thank you, Dongjoon. Kent Holden Karau wrote on Sun, Apr 7, 2024 at 14:54: > > Sounds good to me :) > > Twitter: https://twitter.com/holdenkarau > Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 > YouTube Live Streams: https://www.youtube.com/user/holdenkarau > > > On Sat,

Re: Apache Spark 3.4.3 (?)

2024-04-06 Thread Mridul Muralidharan
Hi Dongjoon, Thanks for volunteering! I would suggest waiting for SPARK-47318 to be merged as well for 3.4. Regards, Mridul On Sat, Apr 6, 2024 at 6:49 PM Dongjoon Hyun wrote: > Hi, All. > > Apache Spark 3.4.2 tag was created on Nov 24th and `branch-3.4` has 85 > commits including important

Re: Apache Spark 3.4.3 (?)

2024-04-06 Thread Holden Karau
Sounds good to me :) Twitter: https://twitter.com/holdenkarau Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 YouTube Live Streams: https://www.youtube.com/user/holdenkarau On Sat, Apr 6, 2024 at 2:51 PM Dongjoon Hyun wrote: > Hi, All.

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-04-06 Thread Pavan Kotikalapudi
Hi Jungtaek, Status on SPARK-24815: Thomas Graves is reviewing the draft PR. I need to add documentation about the configs and usage details; I am planning to do that this week. He did mention

Re: External Spark shuffle service for k8s

2024-04-06 Thread Mich Talebzadeh
Thanks for your suggestion that I take it as a workaround. Whilst this workaround can potentially address storage allocation issues, I was more interested in exploring solutions that offer a more seamless integration with large distributed file systems like HDFS, GCS, or S3. This would ensure

Apache Spark 3.4.3 (?)

2024-04-06 Thread Dongjoon Hyun
Hi, All. Apache Spark 3.4.2 tag was created on Nov 24th and `branch-3.4` has 85 commits including important security and correctness patches like SPARK-45580, SPARK-46092, SPARK-46466, SPARK-46794, and SPARK-46862. https://github.com/apache/spark/releases/tag/v3.4.2 $ git log --oneline

Re: External Spark shuffle service for k8s

2024-04-06 Thread Bjørn Jørgensen
You can make a PVC on K8s and call it 300gb. Make a folder in your Dockerfile: WORKDIR /opt/spark/work-dir RUN chmod g+w /opt/spark/work-dir Then start Spark adding this: .config("spark.kubernetes.driver.volumes.persistentVolumeClaim.300gb.options.claimName", "300gb") \
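
A hedged sketch assembling the full set of mount options implied by that snippet, using the documented `spark.kubernetes.driver.volumes.*` properties (the claim name `300gb` and the work-dir path come from the mail; the rest are assumptions):

```
from pyspark.sql import SparkSession

prefix = "spark.kubernetes.driver.volumes.persistentVolumeClaim.300gb"

spark = (
    SparkSession.builder
    .config(f"{prefix}.options.claimName", "300gb")
    .config(f"{prefix}.mount.path", "/opt/spark/work-dir")
    .config(f"{prefix}.mount.readOnly", "false")
    # Point scratch/spill space at the mounted volume.
    .config("spark.local.dir", "/opt/spark/work-dir")
    .getOrCreate()
)
```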

External Spark shuffle service for k8s

2024-04-06 Thread Mich Talebzadeh
I have seen some older references to a shuffle service for k8s, although it is not clear they are talking about a generic shuffle service for k8s. Anyhow, with the advent of GenAI and the need to allow for a larger volume of data, I was wondering if there has been any more work on this matter.

[VOTE][RESULT] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-03 Thread Hyukjin Kwon
The vote passes with 19+1s (13 binding +1s). (* = binding) +1: Haejoon Lee Ruifeng Zheng(*) Dongjoon Hyun(*) Gengliang Wang(*) Mridul Muralidharan(*) Liang-Chi Hsieh(*) Takuya Ueshin(*) Kent Yao Chao Sun(*) Hussein Awala Xiao Li(*) Yuanjian Li(*) Denny Lee Felix Cheung(*) Bo Yang Xinrong Meng(*)

Participate in the ASF 25th Anniversary Campaign

2024-04-03 Thread Brian Proffitt
Hi everyone, As part of The ASF’s 25th anniversary campaign[1], we will be celebrating projects and communities in multiple ways. We invite all projects and contributors to participate in the following ways: * Individuals - submit your first contribution:

Ready for Review: spark-kubernetes-operator Alpha Release

2024-04-02 Thread Zhou Jiang
Hi dev members, I am writing to let you know that the first pull request has been raised to the newly established spark-kubernetes-operator, as previously discussed within the group. This PR includes the alpha version release of this project.

Re: Scheduling jobs using FAIR pool

2024-04-02 Thread Varun Shah
Hi Hussein, Thanks for clarifying my doubts. It means that even if I configure 2 separate pools for 2 jobs, or submit the 2 jobs in the same pool, the submission time takes effect only when both jobs are "running" in parallel (i.e., if job 1 gets all resources, job 2 has to wait unless

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-02 Thread Tom Graves
+1 Tom On Sunday, March 31, 2024 at 10:09:28 PM CDT, Ruifeng Zheng wrote: +1 On Mon, Apr 1, 2024 at 10:06 AM Haejoon Lee wrote: +1 On Mon, Apr 1, 2024 at 10:15 AM Hyukjin Kwon wrote: Hi all, I'd like to start the vote for SPIP: Pure Python Package in PyPI (Spark Connect) 

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-02 Thread Hyukjin Kwon
Yes On Tue, Apr 2, 2024 at 6:36 PM Femi Anthony wrote: > So, to clarify - the purpose of this package is to enable connectivity to > a remote Spark cluster without having to install any local JVM > dependencies, right ? > > Sent from my iPhone > > On Mar 31, 2024, at 10:07 PM, Haejoon Lee >

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-02 Thread Femi Anthony
So, to clarify - the purpose of this package is to enable connectivity to a remote Spark cluster without having to install any local JVM dependencies, right? Sent from my iPhone On Mar 31, 2024, at 10:07 PM, Haejoon Lee wrote: +1 On Mon, Apr 1, 2024 at 10:15 AM Hyukjin Kwon
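
In other words, a minimal sketch of what the pure-Python package enables, assuming a reachable Spark Connect endpoint (the host and port are placeholders):

```
from pyspark.sql import SparkSession

# No local JVM: the builder only opens a gRPC channel to the server.
spark = SparkSession.builder.remote("sc://spark.example.com:15002").getOrCreate()

# Executed on the remote cluster; results are streamed back.
spark.range(5).show()
```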

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-01 Thread Mridul Muralidharan
+1 Regards, Mridul On Mon, Apr 1, 2024 at 11:26 PM Holden Karau wrote: > +1 > > Twitter: https://twitter.com/holdenkarau > Books (Learning Spark, High Performance Spark, etc.): > https://amzn.to/2MaRAG9 > YouTube Live Streams:

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-01 Thread Holden Karau
+1 Twitter: https://twitter.com/holdenkarau Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 YouTube Live Streams: https://www.youtube.com/user/holdenkarau On Mon, Apr 1, 2024 at 5:44 PM Xinrong Meng wrote: > +1 > > Thank you @Hyukjin

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-01 Thread Xinrong Meng
+1 Thank you @Hyukjin Kwon On Mon, Apr 1, 2024 at 10:19 AM Felix Cheung wrote: > +1 > -- > *From:* Denny Lee > *Sent:* Monday, April 1, 2024 10:06:14 AM > *To:* Hussein Awala > *Cc:* Chao Sun ; Hyukjin Kwon ; > Mridul Muralidharan ; dev > *Subject:* Re: [VOTE]

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-01 Thread bo yang
+1 (non-binding) On Mon, Apr 1, 2024 at 10:19 AM Felix Cheung wrote: > +1 > -- > *From:* Denny Lee > *Sent:* Monday, April 1, 2024 10:06:14 AM > *To:* Hussein Awala > *Cc:* Chao Sun ; Hyukjin Kwon ; > Mridul Muralidharan ; dev > *Subject:* Re: [VOTE] SPIP: Pure

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-01 Thread Felix Cheung
+1 From: Denny Lee Sent: Monday, April 1, 2024 10:06:14 AM To: Hussein Awala Cc: Chao Sun ; Hyukjin Kwon ; Mridul Muralidharan ; dev Subject: Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect) +1 (non-binding) On Mon, Apr 1, 2024 at 9:24 AM Hussein

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-01 Thread Yuanjian Li
+1 Chao Sun wrote on Mon, Apr 1, 2024 at 07:56: > +1 > > On Sun, Mar 31, 2024 at 10:31 PM Hyukjin Kwon > wrote: > >> Oh I didn't send the discussion thread out as it's pretty simple, >> non-invasive and the discussion was sort of done as part of the Spark >> Connect initial discussion .. >> >> On Mon, Apr

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-01 Thread Denny Lee
+1 (non-binding) On Mon, Apr 1, 2024 at 9:24 AM Hussein Awala wrote: > +1 (non-binding). To add to the difference it will make: it will also > simplify package maintenance and make it easy to release a bug fix/new feature > without needing to wait for a PySpark release. > > On Mon, Apr 1, 2024 at

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-01 Thread Xiao Li
+1 Hussein Awala wrote on Mon, Apr 1, 2024 at 08:07: > +1 (non-binding). To add to the difference it will make: it will also > simplify package maintenance and make it easy to release a bug fix/new feature > without needing to wait for a PySpark release. > > On Mon, Apr 1, 2024 at 4:56 PM Chao Sun wrote: > >> +1

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-01 Thread Hussein Awala
+1 (non-binding). To add to the difference it will make: it will also simplify package maintenance and make it easy to release a bug fix/new feature without needing to wait for a PySpark release. On Mon, Apr 1, 2024 at 4:56 PM Chao Sun wrote: > +1 > > On Sun, Mar 31, 2024 at 10:31 PM Hyukjin Kwon >

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-01 Thread Chao Sun
+1 On Sun, Mar 31, 2024 at 10:31 PM Hyukjin Kwon wrote: > Oh I didn't send the discussion thread out as it's pretty simple, > non-invasive and the discussion was sort of done as part of the Spark > Connect initial discussion .. > > On Mon, Apr 1, 2024 at 1:59 PM Mridul Muralidharan > wrote: >

Re: Scheduling jobs using FAIR pool

2024-04-01 Thread Hussein Awala
IMO the questions are not limited to Databricks. > The Round-Robin distribution of executors only work in case of empty executors (achievable by enabling dynamic allocation). In case the jobs (part of the same pool) requires all executors, second jobs will still need to wait. This feature in

Re: Scheduling jobs using FAIR pool

2024-04-01 Thread Varun Shah
Hi Mich, I did not post in the Databricks community, as most of the questions were related to Spark itself. But let me also post the question on the Databricks community. Thanks, Varun Shah On Mon, Apr 1, 2024, 16:28 Mich Talebzadeh wrote: > Hi, > > Have you put this question to Databricks forum

Re: Scheduling jobs using FAIR pool

2024-04-01 Thread Mich Talebzadeh
Hi, Have you put this question to the Databricks forum (Data Engineering - Databricks)? Mich Talebzadeh, Technologist | Solutions Architect | Data Engineer | Generative AI London United Kingdom view my Linkedin profile

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-01 Thread Kent Yao
+1 (non-binding). Thank you, Hyukjin. Kent Yao Takuya UESHIN wrote on Mon, Apr 1, 2024 at 18:04: > > +1 > > On Sun, Mar 31, 2024 at 6:16 PM Hyukjin Kwon wrote: >> >> Hi all, >> >> I'd like to start the vote for SPIP: Pure Python Package in PyPI (Spark >> Connect) >> >> JIRA >> Prototype >> SPIP doc >> >>

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread L. C. Hsieh
+1 Thanks Hyukjin. On Sun, Mar 31, 2024 at 10:52 PM Dongjoon Hyun wrote: > > +1 > > Thank you, Hyukjin. > > Dongjoon > > On Sun, Mar 31, 2024 at 19:07 Haejoon Lee > wrote: >> >> +1 >> >> On Mon, Apr 1, 2024 at 10:15 AM Hyukjin Kwon wrote: >>> >>> Hi all, >>> >>> I'd like to start the vote

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread Takuya UESHIN
+1 On Sun, Mar 31, 2024 at 6:16 PM Hyukjin Kwon wrote: > Hi all, > > I'd like to start the vote for SPIP: Pure Python Package in PyPI (Spark > Connect) > > JIRA > Prototype > SPIP doc >

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread Hyukjin Kwon
Oh I didn't send the discussion thread out as it's pretty simple, non-invasive and the discussion was sort of done as part of the Spark Connect initial discussion .. On Mon, Apr 1, 2024 at 1:59 PM Mridul Muralidharan wrote: > > Can you point me to the SPIP’s discussion thread please ? > I was

Scheduling jobs using FAIR pool

2024-03-31 Thread Varun Shah
Hi Community, I am currently exploring the best use of "Scheduler Pools" for executing jobs in parallel, and require clarification and suggestions on a few points. The implementation consists of executing "Structured Streaming" jobs on Databricks using AutoLoader. Each stream is executed with
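
For readers following the thread, a hedged sketch of how FAIR scheduler pools are wired up in PySpark (the pool name and allocation-file path are made-up examples; the pool must be defined in `fairscheduler.xml`):

```
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.scheduler.mode", "FAIR")
    # Optional file defining each pool's weight/minShare/schedulingMode.
    .config("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")
    .getOrCreate()
)

# Jobs submitted from this thread go to the named pool; each
# concurrently started stream can select its own pool first.
spark.sparkContext.setLocalProperty("spark.scheduler.pool", "streaming_pool")
```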

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread Mridul Muralidharan
Can you point me to the SPIP’s discussion thread please? I was not able to find it, but I was on vacation, and so might have missed this … Regards, Mridul On Sun, Mar 31, 2024 at 9:08 PM Haejoon Lee wrote: > +1 > > On Mon, Apr 1, 2024 at 10:15 AM Hyukjin Kwon wrote: > >> Hi all, >> >> I'd

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread Gengliang Wang
+1 On Sun, Mar 31, 2024 at 8:24 PM Dongjoon Hyun wrote: > +1 > > Thank you, Hyukjin. > > Dongjoon > > On Sun, Mar 31, 2024 at 19:07 Haejoon Lee > wrote: > >> +1 >> >> On Mon, Apr 1, 2024 at 10:15 AM Hyukjin Kwon >> wrote: >> >>> Hi all, >>> >>> I'd like to start the vote for SPIP: Pure Python

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread Dongjoon Hyun
+1 Thank you, Hyukjin. Dongjoon On Sun, Mar 31, 2024 at 19:07 Haejoon Lee wrote: > +1 > > On Mon, Apr 1, 2024 at 10:15 AM Hyukjin Kwon wrote: > >> Hi all, >> >> I'd like to start the vote for SPIP: Pure Python Package in PyPI (Spark >> Connect) >> >> JIRA

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread Ruifeng Zheng
+1 On Mon, Apr 1, 2024 at 10:06 AM Haejoon Lee wrote: > +1 > > On Mon, Apr 1, 2024 at 10:15 AM Hyukjin Kwon wrote: > >> Hi all, >> >> I'd like to start the vote for SPIP: Pure Python Package in PyPI (Spark >> Connect) >> >> JIRA >> Prototype

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread Haejoon Lee
+1 On Mon, Apr 1, 2024 at 10:15 AM Hyukjin Kwon wrote: > Hi all, > > I'd like to start the vote for SPIP: Pure Python Package in PyPI (Spark > Connect) > > JIRA > Prototype > SPIP doc >

[VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread Hyukjin Kwon
Hi all, I'd like to start the vote for SPIP: Pure Python Package in PyPI (Spark Connect) JIRA Prototype SPIP doc

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-28 Thread Pavan Kotikalapudi
Hi Andrew, Sandy, Jerry, Thomas, Marcelo, Wenchen, YangJie, Shixiong, My apologies, I have tagged so many of you (on multiple emails); I am in the process of finding the core contributors of the Dynamic Resource Allocation (DRA) feature in apache/spark, I could

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-28 Thread Pavan Kotikalapudi
Hi Jungtaek, Sorry for the late reply. I understand the concerns about finding PMC members; I had similar concerns in the past. Do you think we have something to improve in the SPIP (certain areas) so that it would get traction from PMC members? Or might this SPIP not be a priority to the PMC

Re: The dedicated repository for Kubernetes Operator for Apache Spark

2024-03-28 Thread Dongjoon Hyun
Thank you, Liang-Chi! Dongjoon. On Wed, Mar 27, 2024 at 10:56 PM L. C. Hsieh wrote: > Hi all, > > For the passed SPIP: An Official Kubernetes Operator for Apache Spark, > the developers have been working on code cleaning and refactoring for > open source in the last few months. They are ready

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2024-03-28 Thread L. C. Hsieh
Hi Vakaris, Sorry for the late reply. Thanks for being interested in the official operator. The developers have been working on code cleaning and refactoring the internal codes for open source in the last few months. They are ready to contribute the code to Spark. We will create a dedicated

The dedicated repository for Kubernetes Operator for Apache Spark

2024-03-27 Thread L. C. Hsieh
Hi all, For the passed SPIP: An Official Kubernetes Operator for Apache Spark, the developers have been working on code cleaning and refactoring for open source in the last few months. They are ready to contribute the code to Spark now. As we discussed, I will go to create a dedicated repository

Re: Allowing Unicode Whitespace in Lexer

2024-03-27 Thread Mich Talebzadeh
Looks fine, except that processing all Unicode whitespace characters might add overhead to the parsing process, potentially impacting performance, although I think this is a moot point. +1 Mich Talebzadeh, Technologist | Solutions Architect | Data Engineer | Generative AI London United Kingdom

Re: Allowing Unicode Whitespace in Lexer

2024-03-27 Thread Gengliang Wang
+1, this is a reasonable change. Gengliang On Wed, Mar 27, 2024 at 9:54 AM serge rielau.com wrote: > Going once, going twice, …. last call for objections > On Mar 23, 2024 at 5:29 PM -0700, serge rielau.com , > wrote: > > Hello, > > I have a PR https://github.com/apache/spark/pull/45620 ready
