Re: [VOTE] Release Spark 4.0.0-preview2 (RC1)

2024-09-16 Thread Holden Karau
+1 Twitter: https://twitter.com/holdenkarau Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 YouTube Live Streams: https://www.youtube.com/user/holdenkarau Pronouns: she/her On Mon, Sep 16, 2024 at 10:55 AM Zhou Jiang wrote: > + 1 > Sent

Re: [VOTE] Document and Feature Preview via GitHub Pages

2024-09-11 Thread Holden Karau
+1 On Wed, Sep 11, 2024 at 6:45 PM Xiao Li wrote: > +1 > > Hyukjin

Re: [VOTE] Deprecate SparkR

2024-08-21 Thread Holden Karau
+1 On Wed, Aug 21, 2024 at 8:59 PM Herman van Hovell wrote: > +1 >

Re: [DISCUSS] Deprecating SparkR

2024-08-12 Thread Holden Karau
+1 Are the sparklyr folks on this list? On Mon, Aug 12, 2024 at 5:22

Re: [VOTE] Archive Spark Documentations in Apache Archives

2024-08-12 Thread Holden Karau
+1 On Mon, Aug 12, 2024 at 10:17 AM Dongjoon Hyun wrote: > +1 for the proposals > - enhancing the release process to put the docs to `release` directory in > order to archive. > - uploading old releases via SVN manually to archive. > > Since deletion is not a scope of this vote, I don't see any

Re: [VOTE] Using Github Issues for Spark-Connect-Go _only_ issues.

2024-08-12 Thread Holden Karau
+0 (binding) On Mon, Aug 12, 2024 at 9:14 AM Matthew Powers wrote:

Re: [DISCUSS] Differentiate Spark without Spark Connect from Spark Connect

2024-07-24 Thread Holden Karau
sou...@gmail.com> wrote: >>> > >> >>> > >>> I'd propose not to change the name of "Spark Connect" - the name >>> > >>> represents the characteristic of the mode (separation of layer for >>> client >>> > &

Re: [DISCUSS] Differentiate Spark without Spark Connect from Spark Connect

2024-07-20 Thread Holden Karau
I think perhaps Spark Connect could be phrased as “Basic* Spark” & existing Spark could be “Full Spark” given the API limitations of Spark Connect. *I was also thinking Core here but we’ve used core to refer to the RDD APIs for too long to reuse it here.

Re: issue forwarding SPARK_CONF_DIR to start workers

2024-07-20 Thread Holden Karau
This might be a good discussion for the dev@ list; I don’t know much about SLURM deployments personally.

Re: [DISCUSS] Why do we remove RDD usage and RDD-backed code?

2024-07-13 Thread Holden Karau
rted, and opened a new one https://github.com/apache/spark/pull/47341. > > On Sat, 13 Jul 2024 at 15:40, Hyukjin Kwon wrote: > >> Yeah that's fine. I'll revert and open a fresh PR including my own >> followup when I get back home later today. >> >> On

Re: [DISCUSS] Why do we remove RDD usage and RDD-backed code?

2024-07-12 Thread Holden Karau
tead of a plan string ser/de. We made similar changes > in JSON and CSV schema inference (it was an RDD before) > > On Sat, Jul 13, 2024 at 10:33 AM Holden Karau > wrote: > >> My bad I meant to say I believe the provided justification is >> inappropriate. >> >>

Re: [DISCUSS] Why do we remove RDD usage and RDD-backed code?

2024-07-12 Thread Holden Karau
On Fri, Jul 12, 2024 at 5:14 PM Holden Karau wrote: > So looking at the PR it does not appear to be removing any RDD APIs but > the justification provided for changing the ML backend to use the DataFrame > APIs is indeed concerning. > > This PR appears to have been merged with

Re: [DISCUSS] Why do we remove RDD usage and RDD-backed code?

2024-07-12 Thread Holden Karau
So looking at the PR it does not appear to be removing any RDD APIs but the justification provided for changing the ML backend to use the DataFrame APIs is indeed concerning. This PR appears to have been merged without proper review (or providing an opportunity for review). I’d like to remind peo

Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-04 Thread Holden Karau
+1 Although given it's a US holiday maybe keep the vote open for an extra day? On Thu,

Re: [External Mail] Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-02 Thread Holden Karau
+1 On Tue, Jul 2, 2024 at 10:18 PM yangjie01 wrote: > +1 (non-binding) > > > > *From**

Re: [DISCUSS] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-02 Thread Holden Karau
I guess my one concern here would be: are we going to expand the dependencies that are visible on the class path for non-connect users? One of the pain points that folks experience when upgrading can come from those changing. Otherwise this seems pretty reasonable.

[jira] [Created] (SPARK-48362) Add CollectSetWIthLimit

2024-05-20 Thread Holden Karau (Jira)
Holden Karau created SPARK-48362: Summary: Add CollectSetWIthLimit Key: SPARK-48362 URL: https://issues.apache.org/jira/browse/SPARK-48362 Project: Spark Issue Type: Improvement

[jira] [Assigned] (SPARK-44953) Log a warning (or automatically disable) when shuffle tracking is enabled along side another DA supported mechanism

2024-05-13 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau reassigned SPARK-44953: Assignee: binjie yang > Log a warning (or automatically disable) when shuffle track

[jira] [Resolved] (SPARK-44953) Log a warning (or automatically disable) when shuffle tracking is enabled along side another DA supported mechanism

2024-05-13 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau resolved SPARK-44953. -- Resolution: Fixed > Log a warning (or automatically disable) when shuffle tracking is enab

Re: [DISCUSS] Spark 4.0.0 release

2024-05-08 Thread Holden Karau
. >> Is there some point of contact that can provide me needed context and >> permissions? >> I'd also love to see why the costs are high and see how we can reduce >> them... >> >> Thanks, >> Nimrod >> >> On Wed, May 8, 2024 at 8:26 AM Hol

Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Holden Karau
> will be automated and the only thing which will be manual is to sign the > release for security reasons that would be reasonable. > > Thanks, > Nimrod > > > On Wed, May 8, 2024, 00:54, Holden Karau < > holden.ka...@gmail.com> wrote: > >> Indeed. We could c

Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Holden Karau
anymore, my pgp >> key is lost, etc.). I'll start the RC process at my tomorrow. Thanks for >> your patience! >> >> Wenchen >> >> On Fri, May 3, 2024 at 7:47 AM yangjie01 wrote: >> >>> +1 >>> >>> >>> >>> *发

Re: ASF board report draft for May

2024-05-06 Thread Holden Karau
release) >>>> >>>> In addition, Apache Spark PMC received an official notice from ASF >>>> Infra team. >>>> >>>> https://lists.apache.org/thread/rgy1cg17tkd3yox7qfq87ht12sqclkbg >>>> > [NOTICE] Apache Spark's Gi

Re: ASF board report draft for May

2024-05-06 Thread Holden Karau
as much as possible, we >> opened a blocker-level JIRA issue and have been working on it. >> - https://infra.apache.org/github-actions-policy.html >> >> Please include a sentence that Apache Spark PMC is working on under the >> following umbrella JIRA issue. >&g

Re: ASF board report draft for May

2024-05-05 Thread Holden Karau
Do we want to include that we’re planning on having a preview release of Spark 4 so folks can see the APIs “soon”?

[jira] [Updated] (SPARK-48101) When using INSERT OVERWRITE with Spark CTEs they may not be fully resolved

2024-05-02 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-48101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-48101: - Priority: Minor (was: Major) > When using INSERT OVERWRITE with Spark CTEs they may not

[jira] [Created] (SPARK-48101) When using INSERT OVERWRITE with Spark CTEs they may not be fully resolved

2024-05-02 Thread Holden Karau (Jira)
Holden Karau created SPARK-48101: Summary: When using INSERT OVERWRITE with Spark CTEs they may not be fully resolved Key: SPARK-48101 URL: https://issues.apache.org/jira/browse/SPARK-48101 Project

Re: [DISCUSS] Spark 4.0.0 release

2024-05-01 Thread Holden Karau
+1 :) yay previews On Wed, May 1, 2024 at 5:36 PM Chao Sun wrote: > +1 > > On Wed, May 1, 2024 at 5:23 PM Xiao Li wrote: > >> +1 for next Monday. >> >> We can do more previews when the other features are ready for preview. >> >> On Wed, May 1, 2024 at 08:46, Tathagata Das wrote: >> >>> Next week sounds great

Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-26 Thread Holden Karau
+1 On Fri, Apr 26, 2024 at 12:06 PM L. C. Hsieh wrote: > +1 > > On Fri, Apr 26, 2024

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-25 Thread Holden Karau
+1 On Thu, Apr 25, 2024 at 11:18 AM Maciej wrote: > +1 > > Best regards, > Maciej Szy

[Bug 2018504] Re: cups-browsed is using an excessive amount of CPU

2024-04-19 Thread Holden Karau
+1, also running into this. If I restart cups the issue goes away for "a while", though (interestingly, printing does not seem to impact cups, meaning it's probably behavior that is unrelated to the printing).

Re: [VOTE] SPARK-44444: Use ANSI SQL mode by default

2024-04-13 Thread Holden Karau
+1 -- even if it's not perfect now is the time to change default values On Sat, Apr 13, 2024 at 4:11 PM Hyukjin Kwon wrote: > +1 > > On Sun, Apr 14, 2024 at 7:46 AM Chao Sun wrote: > >> +1. >> >> This feature is very helpful for guarding against correctness issues, >> such as null results due t

Re: Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-10 Thread Holden Karau
On Wed, Apr 10, 2024 at 9:54 PM Binwei Yang wrote: > > Gluten currently already support Velox backend and Clickhouse backend. > data fusion support is also proposed but no one worked on it. > > Gluten isn't a POC. It's under actively developing but some companies > already used it. > > > On 2024/

Re: SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-09 Thread Holden Karau
I like the idea of improving flexibility of Spark's physical plans and really anything that might reduce code duplication among the ~4 or so different accelerators.

Re: Apache Spark 3.4.3 (?)

2024-04-06 Thread Holden Karau
Sounds good to me :) On Sat, Apr 6, 2024 at 2:51 PM Dongjoon Hyun wrote: > Hi, All. >

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-01 Thread Holden Karau
+1 On Mon, Apr 1, 2024 at 5:44 PM Xinrong Meng wrote: > +1 > > Thank you @Hyukjin Kwo

[jira] [Created] (SPARK-47672) Avoid double evaluation of non-trivial projected elements from filter pushdown

2024-04-01 Thread Holden Karau (Jira)
Holden Karau created SPARK-47672: Summary: Avoid double evaluation of non-trivial projected elements from filter pushdown Key: SPARK-47672 URL: https://issues.apache.org/jira/browse/SPARK-47672

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-12 Thread Holden Karau
+1 On Mon, Mar 11, 2024 at 7:44 PM Reynold Xin wrote: > +1 > > > On Mon, Mar 11 2024

[jira] [Created] (SPARK-47220) log4j race condition during shutdown

2024-02-28 Thread Holden Karau (Jira)
Holden Karau created SPARK-47220: Summary: log4j race condition during shutdown Key: SPARK-47220 URL: https://issues.apache.org/jira/browse/SPARK-47220 Project: Spark Issue Type: Improvement

Re: Generating config docs automatically

2024-02-21 Thread Holden Karau
I think this is a good idea. I like having everything in one source of truth rather than two (so option 1 sounds like a good idea); but that’s just my opinion. I'd be happy to help with reviews though. On Wed, Feb 21, 2024 at 6:37 AM Nicholas Chammas wrote: > I know config documentation is not t

Re: Spark 4.0 Query Analyzer Bug Report

2024-02-20 Thread Holden Karau
Do you mean Spark 3.4? 4.0 is very much not released yet. Also it would help if you could share your query & more of the logs leading up to the error. On Tue, Feb 20, 2024 at 3:07 PM Sharma, Anup wrote: > Hi Spark team, > > > > We ran into a dataframe issue after upgrading from spark 3.1 to 4.

[jira] [Resolved] (SPARK-47077) sbt build is broken due to selenium change

2024-02-16 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau resolved SPARK-47077. -- Resolution: Cannot Reproduce After blowing away my maven + ivy cache it works fine – should

[jira] [Created] (SPARK-47077) sbt build is broken due to selenium change

2024-02-16 Thread Holden Karau (Jira)
Holden Karau created SPARK-47077: Summary: sbt build is broken due to selenium change Key: SPARK-47077 URL: https://issues.apache.org/jira/browse/SPARK-47077 Project: Spark Issue Type

[jira] [Updated] (SPARK-47001) Pushdown Verification in Optimizer.scala should support changed data types

2024-02-16 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-47001: - Description: When pushing a filter down in a union the data type may not match exactly if the

Re: Dynamically Support Spark Native Engine in Iceberg

2024-02-13 Thread Holden Karau
This is great work! Very excited to see this. On Tue, Feb 13, 2024 at 4:38 PM huaxin gao wrote: > Hello Iceberg community, > > As you may already know, Project Comet, a plugin to > accelerate Spark query execution via lev

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-13 Thread Holden Karau
This looks really cool :) Out of interest, what are the differences in the approach between this and Gluten? On Tue, Feb 13, 2024 at 12:42 PM Chao Sun wrote: > Hi all, > > We are very happy to announce that Project Comet, a plugin to > accelerate Spark query execution via leveraging DataFusion a

[jira] [Created] (SPARK-47031) Union of with non-determinstic expression should be non-deterministic

2024-02-12 Thread Holden Karau (Jira)
Holden Karau created SPARK-47031: Summary: Union of with non-determinstic expression should be non-deterministic Key: SPARK-47031 URL: https://issues.apache.org/jira/browse/SPARK-47031 Project: Spark

Re: [Spark-Core] Improving Reliability of spark when Executors OOM

2024-01-16 Thread Holden Karau
Oh interesting solution, a co-worker was suggesting something similar using resource profiles to increase memory -- but your approach avoids a lot of complexity. I like it (and we could extend it out to support resource profile growth too). I think an SPIP sounds like a great next step. On Tue, Ja

Re: Spark-Connect: Param `--packages` does not take effect for executors.

2023-12-04 Thread Holden Karau
So I think this sounds like a bug to me, in the help options for both regular spark-submit and ./sbin/start-connect-server.sh we say: " --packages Comma-separated list of maven coordinates of jars to include on the driver and executor classpaths. Will

Re: Classpath isolation per SparkSession without Spark Connect

2023-11-27 Thread Holden Karau
So I don’t think we make any particular guarantees around class path isolation there, so even if it does work it’s something you’d need to pay attention to on upgrades. Class path isolation is tricky to get right. On Mon, Nov 27, 2023 at 2:58 PM Faiz Halde wrote: > Hello, > > We are using spark

Re: [VOTE] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-14 Thread Holden Karau
+1 On Tue, Nov 14, 2023 at 10:21 AM DB Tsai wrote: > +1 > > DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 > > On Nov 14, 2023, at 10:14 AM, Vakaris Baškirov < > vakaris.bashki...@gmail.com> wrote: > > +1 (non-binding) > > > On Tue, Nov 14, 2023 at 8:03 PM Chao Sun wrote: > >> +1

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-12 Thread Holden Karau
To be clear: I am generally supportive of the idea (+1) but have some follow-up questions: Have we taken the time to learn from the other operators? Do we have a compatible CRD/API or not (and if so why?) The API seems to assume that everything is packaged in the container in advance, but I imagin

Re: Apache Spark 3.4.2 (?)

2023-11-06 Thread Holden Karau
+1 On Mon, Nov 6, 2023 at 4:30 PM yangjie01 wrote: > +1 > > > > *From**: *Yuming Wang > *Date**: *Tuesday, November 7, 2023 07:00 > *To**: *Santosh Pingale > *Cc**: *Dongjoon Hyun , dev > > *Subject**: *Re: Apache Spark 3.4.2 (?) > > > > +1 > > > > On Tue, Nov 7, 2023 at 3:55 AM Santosh Pingale > wrote: > > M

[jira] [Created] (SPARK-45712) Provide a command line flag to override the log4j properties file

2023-10-27 Thread Holden Karau (Jira)
Holden Karau created SPARK-45712: Summary: Provide a command line flag to override the log4j properties file Key: SPARK-45712 URL: https://issues.apache.org/jira/browse/SPARK-45712 Project: Spark

[jira] [Updated] (SPARK-45563) Spark history files backend currently depend on polling for loading into the history server

2023-10-16 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-45563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-45563: - Affects Version/s: 4.0.0 (was: 3.3.0) (was

[jira] [Updated] (SPARK-45563) Spark history files backend currently depend on polling for loading into the history server

2023-10-16 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-45563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-45563: - Issue Type: Improvement (was: Bug) > Spark history files backend currently depend on poll

[jira] [Updated] (SPARK-45563) Spark history files backend currently depend on polling for loading into the history server

2023-10-16 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-45563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-45563: - Description: The spark history server FS currently depends on polling for loading history

[jira] [Updated] (SPARK-45563) Spark history files backend currently depend on polling for loading into the history server

2023-10-16 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-45563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-45563: - Summary: Spark history files backend currently depend on polling for loading into the history

[jira] [Created] (SPARK-45563) Spark rolling history files currently depend on polling for loading into the history server

2023-10-16 Thread Holden Karau (Jira)
Holden Karau created SPARK-45563: Summary: Spark rolling history files currently depend on polling for loading into the history server Key: SPARK-45563 URL: https://issues.apache.org/jira/browse/SPARK-45563

[jira] [Resolved] (SPARK-44735) Log a warning when inserting columns with the same name by row that don't match up

2023-10-13 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau resolved SPARK-44735. -- Fix Version/s: 4.0.0 Resolution: Fixed > Log a warning when inserting columns with

[jira] [Assigned] (SPARK-44735) Log a warning when inserting columns with the same name by row that don't match up

2023-10-13 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau reassigned SPARK-44735: Assignee: Jia Fan > Log a warning when inserting columns with the same name by row t

Re: Write Spark Connection client application in Go

2023-09-12 Thread Holden Karau
That’s so cool! Great work y’all :) On Tue, Sep 12, 2023 at 8:14 PM bo yang wrote: > Hi Spark Friends, > > Anyone interested in using Golang to write Spark application? We created a > Spark > Connect Go Client library . > Would love to hear feedback/t

Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-07 Thread Holden Karau
+1 pip installing seems to function :) On Thu, Sep 7, 2023 at 7:22 PM Yuming Wang wrote: > +1. > > On Thu, Sep 7, 2023 at 10:33 PM yangjie01 > wrote: > >> +1 >> >> >> >> *From**: *Gengliang Wang >> *Date**: *Thursday, September 7, 2023 12:53 >> *To**: *Yuanjian Li >> *Cc**: *Xiao Li , "her...@databricks.com.

Re: [VOTE] Release Apache Spark 3.5.0 (RC3)

2023-09-02 Thread Holden Karau
Can we delay the next RC cut until after Labor Day? On Sat, Sep 2, 2023 at 9:59 PM Yuanjian Li wrote: > Thank you for all the reports! > The vote has failed. I plan to cut RC4 in two days. > > @Dipayan Dev I quickly skimmed through the > corresponding ticket, and it doesn't seem to be a regress

[jira] [Created] (SPARK-44992) Add support for rack information from an environment variable

2023-08-28 Thread Holden Karau (Jira)
Holden Karau created SPARK-44992: Summary: Add support for rack information from an environment variable Key: SPARK-44992 URL: https://issues.apache.org/jira/browse/SPARK-44992 Project: Spark

Re: Elasticsearch support for Spark 3.x

2023-08-27 Thread Holden Karau
What’s the version of the ES connector you are using? On Sat, Aug 26, 2023 at 10:17 AM Dipayan Dev wrote: > Hi All, > > We're using Spark 2.4.x to write dataframe into the Elasticsearch index. > As we're upgrading to Spark 3.3.0, it throwing out error > Caused by: java.lang.ClassNotFoundExceptio

[jira] [Created] (SPARK-44970) Spark History File Uploads Can Fail on S3

2023-08-25 Thread Holden Karau (Jira)
Holden Karau created SPARK-44970: Summary: Spark History File Uploads Can Fail on S3 Key: SPARK-44970 URL: https://issues.apache.org/jira/browse/SPARK-44970 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-44955) Add the option for dynamically marking containers for preemption based data

2023-08-24 Thread Holden Karau (Jira)
Holden Karau created SPARK-44955: Summary: Add the option for dynamically marking containers for preemption based data Key: SPARK-44955 URL: https://issues.apache.org/jira/browse/SPARK-44955 Project

[jira] [Created] (SPARK-44954) Make DEA algorithms pluggable

2023-08-24 Thread Holden Karau (Jira)
Holden Karau created SPARK-44954: Summary: Make DEA algorithms pluggable Key: SPARK-44954 URL: https://issues.apache.org/jira/browse/SPARK-44954 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-44953) Log a warning (or automatically disable) when shuffle tracking is enabled along side another DA supported mechanism

2023-08-24 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-44953: - Parent: SPARK-44951 Issue Type: Sub-task (was: Improvement) > Log a warning

[jira] [Created] (SPARK-44953) Log a warning (or automatically disable) when shuffle tracking is enabled along side another DA supported mechanism

2023-08-24 Thread Holden Karau (Jira)
Holden Karau created SPARK-44953: Summary: Log a warning (or automatically disable) when shuffle tracking is enabled along side another DA supported mechanism Key: SPARK-44953 URL: https://issues.apache.org/jira

[jira] [Created] (SPARK-44951) Improve Spark Dynamic Allocation

2023-08-24 Thread Holden Karau (Jira)
Holden Karau created SPARK-44951: Summary: Improve Spark Dynamic Allocation Key: SPARK-44951 URL: https://issues.apache.org/jira/browse/SPARK-44951 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-44950) Improve Spark Driver Launch Time

2023-08-24 Thread Holden Karau (Jira)
Holden Karau created SPARK-44950: Summary: Improve Spark Driver Launch Time Key: SPARK-44950 URL: https://issues.apache.org/jira/browse/SPARK-44950 Project: Spark Issue Type: Improvement

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-23 Thread Holden Karau
g Lead >>> London >>> United Kingdom >>> >>> >>>view my Linkedin profile >>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> >>> >>> >>> https://en.everybodywiki.com/Mich_Talebzadeh >>> >>

[jira] [Created] (SPARK-44769) Add SQL statement to create an empty array with a type

2023-08-10 Thread Holden Karau (Jira)
Holden Karau created SPARK-44769: Summary: Add SQL statement to create an empty array with a type Key: SPARK-44769 URL: https://issues.apache.org/jira/browse/SPARK-44769 Project: Spark Issue

[jira] [Updated] (SPARK-42035) Add a config flag to force exit on JDK major version mismatch

2023-08-09 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-42035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-42035: - Target Version/s: 4.0.0 > Add a config flag to force exit on JDK major version misma

[jira] [Updated] (SPARK-42261) K8s will not allocate more execs if there are any pending execs until next snapshot

2023-08-09 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-42261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-42261: - Target Version/s: 4.0.0 > K8s will not allocate more execs if there are any pending execs un

[jira] [Updated] (SPARK-44511) Allow insertInto to succeed with partion columns specified when they match those on the target table

2023-08-09 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-44511: - Target Version/s: 4.0.0 > Allow insertInto to succeed with partion columns specified when t

[jira] [Updated] (SPARK-42361) Add an option to use external storage to distribute JAR set in cluster mode on Kube

2023-08-09 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-42361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-42361: - Target Version/s: 4.0.0 > Add an option to use external storage to distribute JAR set

[jira] [Updated] (SPARK-42260) Log when the K8s Exec Pods Allocator Stalls

2023-08-09 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-42260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-42260: - Target Version/s: 4.0.0 > Log when the K8s Exec Pods Allocator Sta

[jira] [Commented] (SPARK-44727) Improve the error message for dynamic allocation conditions

2023-08-09 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752496#comment-17752496 ] Holden Karau commented on SPARK-44727: -- Do you have more context [~chen

[jira] [Updated] (SPARK-42035) Add a config flag to force exit on JDK major version mismatch

2023-08-09 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-42035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-42035: - Description: JRE version mismatches can cause errors which are difficult to debug (potentially

[jira] [Updated] (SPARK-34337) Reject disk blocks when out of disk space

2023-08-09 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-34337: - Target Version/s: 4.0.0 > Reject disk blocks when out of disk sp

[jira] [Created] (SPARK-44735) Log a warning when inserting columns with the same name by row that don't match up

2023-08-08 Thread Holden Karau (Jira)
Holden Karau created SPARK-44735: Summary: Log a warning when inserting columns with the same name by row that don't match up Key: SPARK-44735 URL: https://issues.apache.org/jira/browse/SPARK-

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-08 Thread Holden Karau
2023 at 23:42, Mich Talebzadeh >>>> wrote: >>>> >>>> >>>> >>>> Hi, >>>> >>>> >>>> >>>> From what I have seen spark on a serverless cluster has hard up getting >>>> the dr

Re: ASF board report draft for August 2023

2023-08-08 Thread Holden Karau
Maybe add a link to the 4.0 JIRA where we are tracking the current plans for 4.0? On Tue, Aug 8, 2023 at 9:33 AM Dongjoon Hyun wrote: > Thank you, Matei. > > It looks good to me. > > Dongjoon > > On Mon, Aug 7, 2023 at 22:54 Matei Zaharia > wrote: > >> It’s time to send our quarterly report to

Re: Dynamic allocation does not deallocate executors

2023-08-08 Thread Holden Karau
for any > loss, damage or destruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such lo

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-07 Thread Holden Karau
Oooh fascinating. I’m going on call this week so it will take me a while but I do want to review this :) On Mon, Aug 7, 2023 at 5:30 PM Pavan Kotikalapudi wrote: > Hi Spark Dev, > > I have extended traditional DRA to work for structured streaming > use-case. > > Here is an initial Implementation

Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-07 Thread Holden Karau
Oh great point On Mon, Aug 7, 2023 at 2:23 PM bo yang wrote: > Thanks Holden for bringing this up! > > Maybe another thing to think about is how to make dynamic allocation more > friendly with Kubernetes and disaggregated shuffle storage? > > > > On Mon, Aug 7, 2023

[jira] [Commented] (SPARK-44050) Unable to Mount ConfigMap in Driver Pod - ConfigMap Creation Issue

2023-08-07 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751796#comment-17751796 ] Holden Karau commented on SPARK-44050: -- Ah interesting, it sounds like the

[jira] [Commented] (SPARK-44508) Add user guide for Python UDTFs

2023-08-07 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751795#comment-17751795 ] Holden Karau commented on SPARK-44508: -- I'm not sure this should be

Improving Dynamic Allocation Logic for Spark 4+

2023-08-07 Thread Holden Karau
So I'm wondering if there is interest in revisiting some of how Spark is doing its dynamic allocation for Spark 4+? Some things that I've been thinking about: - Advisory user input (e.g. a way to say after X is done I know I need Y where Y might be a bunch of GPU machines) - Configurable toler

Re: Dynamic allocation does not deallocate executors

2023-08-07 Thread Holden Karau
I think you need to set "spark.dynamicAllocation.shuffleTracking.enabled" to false. On Mon, Aug 7, 2023 at 2:50 AM Mich Talebzadeh wrote: > Yes I have seen cases where the driver gone but a couple of executors > hanging on. Sounds like a code issue. > > HTH > > Mich Talebzadeh, > Solutions
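
A minimal sketch of how the setting mentioned above could be passed on the command line, assuming the job is launched with spark-submit and its `--conf key=value` option. The helper name `conf_args` is hypothetical and only for illustration. Whether turning shuffle tracking off is appropriate depends on the deployment, so treat this as a starting point rather than a recommendation.

```python
# Hypothetical helper (not part of Spark): turn a dict of settings into
# spark-submit "--conf key=value" arguments.
def conf_args(settings):
    args = []
    for key, value in sorted(settings.items()):
        # Spark expects lowercase "true"/"false" for boolean settings.
        if isinstance(value, bool):
            value = "true" if value else "false"
        args += ["--conf", f"{key}={value}"]
    return args


settings = {
    "spark.dynamicAllocation.enabled": True,
    # The setting suggested in the reply above.
    "spark.dynamicAllocation.shuffleTracking.enabled": False,
}

# Print the arguments so they can be appended to a spark-submit command.
print(" ".join(conf_args(settings)))
```

The printed arguments can be appended to an existing spark-submit invocation; the same key/value pairs could instead go in `spark-defaults.conf`.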

[jira] [Commented] (SPARK-24282) Add support for PMML export for the Standard Scaler Stage

2023-08-07 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-24282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751758#comment-17751758 ] Holden Karau commented on SPARK-24282: -- I don't think we're going t

[jira] [Resolved] (SPARK-28740) Add support for building with bloop

2023-08-07 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau resolved SPARK-28740. -- Resolution: Won't Fix > Add support for building wi

[jira] [Commented] (SPARK-32111) Cleanup locks and docs in CoarseGrainedSchedulerBackend

2023-08-07 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751754#comment-17751754 ] Holden Karau commented on SPARK-32111: -- I think this could be a good target

[jira] [Updated] (SPARK-32111) Cleanup locks and docs in CoarseGrainedSchedulerBackend

2023-08-07 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-32111: - Target Version/s: 4.0.0 Affects Version/s: 4.0.0 > Cleanup locks and docs

[jira] [Updated] (SPARK-44578) Support pushing down BoundFunction in DSv2

2023-07-31 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-44578: - Description: See [https://github.com/apache/iceberg/pull/7886#discussion_r1257537662]  (was
