Re: LLM script for error message improvement

2023-08-02 Thread Hyukjin Kwon
I think adding that dev tool script to improve the error message is fine. On Thu, 3 Aug 2023 at 10:24, Haejoon Lee wrote: > Dear contributors, I hope you are doing well! > > I see there are contributors who are interested in working on error > message improvements and persistent contribution,

LLM script for error message improvement

2023-08-02 Thread Haejoon Lee
Dear contributors, I hope you are doing well! I see there are contributors who are interested in working on error message improvements and persistent contribution, so I want to share an llm-based error message improvement script for helping your contribution. You can find a detail for the script

Query hints visible to DSV2 connectors?

2023-08-02 Thread Alex Cruise
Hey folks, I'm adding an optional feature to my DSV2 connector where it can choose between a row-based or columnar PartitionReader dynamically depending on a query's schema. I'd like to be able to supply a hint at query time that's visible to the connector, but at the moment I can't see any way

[VOTE][RESULT] XML data source support

2023-08-02 Thread Sandip Agarwala
The vote passes with 7 +1s (4 binding +1s). Thank you all for your comments and votes! (* = binding) Adrian Pop-Tifrea Hyukjin Kwon * Jia Fan Mich Talebzadeh Maciej Szymkiewicz * Sean Owen * Xiao Li * SPIP link:

Re: [Reminder] Spark 3.5 RC Cut

2023-08-02 Thread Bjørn Jørgensen
@Dongjoon Hyun FYI [image: image.png] We had better ask common-...@hadoop.apache.org. On Wed, 2 Aug 2023 at 18:03, Dongjoon Hyun wrote: > Oh, I got it, Emil and Bjorn. > > Dongjoon. > > On Wed, Aug 2, 2023 at 12:32 AM Bjørn Jørgensen > wrote: > >> "*As far as I can tell this makes both 3.3.5 and

Re: [Reminder] Spark 3.5 RC Cut

2023-08-02 Thread Dongjoon Hyun
Oh, I got it, Emil and Bjorn. Dongjoon. On Wed, Aug 2, 2023 at 12:32 AM Bjørn Jørgensen wrote: > "*As far as I can tell this makes both 3.3.5 and 3.3.6 unusable with s3 > without providing an alternative committer code.*" > > https://github.com/apache/hadoop/pull/5706#issuecomment-1619927992 >

Re: [Reminder] Spark 3.5 RC Cut

2023-08-02 Thread Bjørn Jørgensen
"*As far as I can tell this makes both 3.3.5 and 3.3.6 unusable with s3 without providing an alternative committer code.*" https://github.com/apache/hadoop/pull/5706#issuecomment-1619927992 ons. 2. aug. 2023 kl. 08:05 skrev Emil Ejbyfeldt : > > Apache Spark is not affected by HADOOP-18757

Re: [Reminder] Spark 3.5 RC Cut

2023-08-02 Thread Emil Ejbyfeldt
> Apache Spark is not affected by HADOOP-18757 because it is not a part of > both Apache Hadoop 3.3.5 and 3.3.6. I am not sure I am following what you are trying to say here. Is that the jira is saying that only 3.3.5 is affected? Here I think the Jira is just incorrect. The jira was created

Re: [Reminder] Spark 3.5 RC Cut

2023-08-01 Thread Dongjoon Hyun
It's still invalid information, Emil. Apache Spark is not affected by HADOOP-18757 because it is not a part of both Apache Hadoop 3.3.5 and 3.3.6. HADOOP-18757 seems to be merged just two weeks ago and there is no Apache Hadoop release with it, isn't it? Could you check your local branch once

Re: [Reminder] Spark 3.5 RC Cut

2023-08-01 Thread Emil Ejbyfeldt
Hi, Yes, sorry about that; I seem to have messed up the link. It should have been https://issues.apache.org/jira/browse/HADOOP-18757 Best, Emil On 01/08/2023 19:08, Dongjoon Hyun wrote: Hi, Emil. HADOOP-18568 is still open and it seems to be never a part of the Hadoop trunk branch. Do you

Re: [Reminder] Spark 3.5 RC Cut

2023-08-01 Thread Dongjoon Hyun
Hi, Emil. HADOOP-18568 is still open and it seems to be never a part of the Hadoop trunk branch. Do you mean another JIRA? Dongjoon. On Tue, Aug 1, 2023 at 2:59 AM Emil Ejbyfeldt wrote: > Hi, > > We previously ran some experiments on builds from the 3.5 branch and > noticed that Hadoop had

Re: [Reminder] Spark 3.5 RC Cut

2023-08-01 Thread Emil Ejbyfeldt
Hi, We previously ran some experiments on builds from the 3.5 branch and noticed that Hadoop had a regression (https://issues.apache.org/jira/browse/HADOOP-18568) in their s3a committer affecting 3.3.5 and 3.3.6 (Spark 3.4 uses hadoop 3.3.4). This fix has been merged into Hadoop and will be

Re: Time for Spark 3.3.3 release?

2023-07-31 Thread Ruifeng Zheng
+1, thank you Yuming On Tue, Aug 1, 2023 at 10:40 AM Yuming Wang wrote: > Thank you. I will prepare 3.3.3-rc1 soon. > > On Sun, Jul 30, 2023 at 12:15 AM Dongjoon Hyun > wrote: > >> +1 >> >> Thank you for volunteering, Yuming. >> >> Dongjoon >> >> >> On Fri, Jul 28, 2023 at 11:35 AM Yuming Wang

Re: Time for Spark 3.3.3 release?

2023-07-31 Thread Yuming Wang
Thank you. I will prepare 3.3.3-rc1 soon. On Sun, Jul 30, 2023 at 12:15 AM Dongjoon Hyun wrote: > +1 > > Thank you for volunteering, Yuming. > > Dongjoon > > > On Fri, Jul 28, 2023 at 11:35 AM Yuming Wang wrote: > >> Hi Spark devs, >> >> Since Apache Spark 3.3.2 tag creation (Feb 11), 60

Re: [VOTE] SPIP: XML data source support

2023-07-29 Thread Hyukjin Kwon
+1 On Sat, 29 Jul 2023 at 22:49, Maciej wrote: > +1 > > Best regards, > Maciej Szymkiewicz > > Web: https://zero323.net > PGP: A30CEF0C31A501EC > > On 7/29/23 11:28, Mich Talebzadeh wrote: > > +1 for me. > > Though Databricks did a good job releasing the code. > > GitHub - databricks/spark-xml:

Re: Time for Spark 3.3.3 release?

2023-07-29 Thread Dongjoon Hyun
+1 Thank you for volunteering, Yuming. Dongjoon On Fri, Jul 28, 2023 at 11:35 AM Yuming Wang wrote: > Hi Spark devs, > > Since Apache Spark 3.3.2 tag creation (Feb 11), 60 patches > have > arrived at branch-3.3. > > Shall we make

Re: [VOTE] SPIP: XML data source support

2023-07-29 Thread Maciej
+1 Best regards, Maciej Szymkiewicz Web: https://zero323.net PGP: A30CEF0C31A501EC On 7/29/23 11:28, Mich Talebzadeh wrote: +1 for me. Though Databricks did a good job releasing the code. GitHub - databricks/spark-xml: XML data source for Spark SQL and DataFrames

Re: [VOTE] SPIP: XML data source support

2023-07-29 Thread Mich Talebzadeh
+1 for me. Though Databricks did a good job releasing the code. GitHub - databricks/spark-xml: XML data source for Spark SQL and DataFrames Mich Talebzadeh, Solutions Architect/Engineering Lead Palantir

[Reminder] Spark 3.5 RC Cut

2023-07-29 Thread Yuanjian Li
Hi everyone, Following the release timeline, I will cut the RC on *Tuesday, Aug 1st at 1 pm PST* as scheduled.
Date: Event
July 17th 2023: Code freeze. Release branch cut.
Late July 2023: QA period. Focus on bug fixes, tests, stability and docs. Generally, no new features merged.
August 2023

Re: [VOTE] SPIP: XML data source support

2023-07-28 Thread Jia Fan
+1 > On Jul 29, 2023, at 13:06, Adrian Pop-Tifrea wrote: > > +1, the more data source formats, the better, and if the solution is already > thoroughly tested, I say we should go for it. > > On Sat, Jul 29, 2023, 06:35 Xiao Li > wrote: >> +1 >> >> On Fri, Jul 28, 2023 at

Re: [VOTE] SPIP: XML data source support

2023-07-28 Thread Adrian Pop-Tifrea
+1, the more data source formats, the better, and if the solution is already thoroughly tested, I say we should go for it. On Sat, Jul 29, 2023, 06:35 Xiao Li wrote: > +1 > > On Fri, Jul 28, 2023 at 15:54 Sean Owen wrote: > >> +1 I think that porting the package 'as is' into Spark is probably

Re: [VOTE] SPIP: XML data source support

2023-07-28 Thread Xiao Li
+1 On Fri, Jul 28, 2023 at 15:54 Sean Owen wrote: > +1 I think that porting the package 'as is' into Spark is probably > worthwhile. > That's relatively easy; the code is already pretty battle-tested and not > that big and even originally came from Spark code, so is more or less > similar

Re: [VOTE] SPIP: XML data source support

2023-07-28 Thread Sean Owen
+1 I think that porting the package 'as is' into Spark is probably worthwhile. That's relatively easy; the code is already pretty battle-tested and not that big and even originally came from Spark code, so is more or less similar already. One thing it never got was DSv2 support, which means XML

[VOTE] SPIP: XML data source support

2023-07-28 Thread Sandip Agarwala
Dear Spark community, I would like to start the vote for "SPIP: XML data source support". XML is a widely used data format. An external spark-xml package ( https://github.com/databricks/spark-xml) is available to read and write XML data in spark. Making spark-xml built-in will provide a better

Re: Apache Arrow integration issue with Spark involving Netty

2023-07-28 Thread Dane Pitkin
Update! Netty has reverted the affecting change in v4.1.96. See netty commit here[1] and arrow PR to upgrade here[2]. The upcoming release of arrow-memory-netty v13 should work with netty versions <4.1.94 and >=4.1.96. [1]
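For readers who want to check a dependency tree against the version range quoted in this message, here is a minimal sketch of that range as a predicate. It only encodes the statement above (Netty versions <4.1.94 and >=4.1.96 "should work" with the upcoming arrow-memory-netty v13); the function names are hypothetical and are not part of Arrow, Netty, or Spark.

```python
from typing import Tuple


def parse_netty_version(version: str) -> Tuple[int, int, int]:
    """Parse a string such as '4.1.96.Final' into (4, 1, 96).

    Any qualifier after the third component (e.g. 'Final') is ignored.
    """
    major, minor, patch = version.split(".")[:3]
    return int(major), int(minor), int(patch)


def in_expected_working_range(netty_version: str) -> bool:
    """Hypothetical helper: True if the version is <4.1.94 or >=4.1.96.

    This mirrors the range quoted in the message; it does not test
    actual runtime compatibility.
    """
    v = parse_netty_version(netty_version)
    return v < (4, 1, 94) or v >= (4, 1, 96)


if __name__ == "__main__":
    for candidate in ("4.1.93.Final", "4.1.94.Final", "4.1.95.Final", "4.1.96.Final"):
        print(candidate, in_expected_working_range(candidate))
```

Comparing parsed integer tuples rather than raw strings avoids ordering mistakes such as "4.1.100" sorting before "4.1.96".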

Time for Spark 3.3.3 release?

2023-07-28 Thread Yuming Wang
Hi Spark devs, Since Apache Spark 3.3.2 tag creation (Feb 11), 60 patches have arrived at branch-3.3. Shall we make a new release, Apache Spark 3.3.3, as the third release at branch-3.3? I'd like to volunteer as the release manager

Re: Spark 3.0.0 EOL

2023-07-26 Thread Manu Zhang
Yes, I'm referring to this line. > The last minor release within a major a release will typically be maintained for longer as an “LTS” release Basically, I'm asking whether 3.5 will be the last 3.x release since we are already discussing 4.0. Thanks, Manu On Wed, Jul 26, 2023 at 7:35 PM Sean

Re: Spark 3.0.0 EOL

2023-07-26 Thread Sean Owen
There aren't "LTS" releases, though you might expect the last 3.x release will see maintenance releases longer. See end of https://spark.apache.org/versioning-policy.html On Wed, Jul 26, 2023 at 3:56 AM Manu Zhang wrote: > Will Apache Spark 3.5 be a LTS version? > > Thanks, > Manu > > On Mon,

Re: Spark 3.0.0 EOL

2023-07-26 Thread Manu Zhang
Will Apache Spark 3.5 be a LTS version? Thanks, Manu On Mon, Jul 24, 2023 at 4:26 PM Dongjoon Hyun wrote: > As Hyukjin replied, Apache Spark 3.0.0 is already in EOL status. > > To Pralabh, FYI, in the community, > > - Apache Spark 3.2 also reached the EOL already. >

Re: Spark 3.3 + parquet 1.10

2023-07-24 Thread Mich Talebzadeh
Personally, I have not done it myself. CCing the Spark user group in case someone there has tried it. HTH Mich Talebzadeh, Solutions Architect/Engineering Lead Palantir Technologies Limited London United Kingdom view my Linkedin profile

Re: Spark 3.3 + parquet 1.10

2023-07-24 Thread Pralabh Kumar
Spark 3.3 in OSS is built with Parquet 1.12. Simply compiling it against Parquet 1.10 results in a build failure, so I am wondering whether anyone has built Spark 3.3 with Parquet 1.10. Regards Pralabh Kumar On Mon, Jul 24, 2023 at 3:04 PM Mich Talebzadeh wrote: > Hi, > > Where is this limitation

Re: Spark 3.3 + parquet 1.10

2023-07-24 Thread Mich Talebzadeh
Hi, Where is this limitation coming from (using 1.1.0)? That is 2018 build Have you tried Spark 3.3 with parquet writes as is? Just a small PoC will prove it. HTH Mich Talebzadeh, Solutions Architect/Engineering Lead Palantir Technologies Limited London United Kingdom view my Linkedin

Spark 3.3 + parquet 1.10

2023-07-24 Thread Pralabh Kumar
Hi Dev community. I have a quick question with respect to Spark 3.3. Currently Spark 3.3 is built with Parquet 1.12. However, has anyone tried Spark 3.3 with Parquet 1.10? We at Uber are planning to migrate to Spark 3.3, but we are limited to using Parquet 1.10. Has anyone tried building Spark

Re: Spark 3.0.0 EOL

2023-07-24 Thread Dongjoon Hyun
As Hyukjin replied, Apache Spark 3.0.0 is already in EOL status. To Pralabh, FYI, in the community, - Apache Spark 3.2 also reached the EOL already. https://lists.apache.org/thread/n4mdfwr5ksgpmrz0jpqp335qpvormos1 If you are considering Apache Spark 4, here is the other 3.x timeline, -

Re: Spark 3.0.0 EOL

2023-07-24 Thread Hyukjin Kwon
It's already EOL. On Mon, Jul 24, 2023 at 4:17 PM Pralabh Kumar wrote: > Hi Dev Team > > If possible, can you please provide the Spark 3.0.0 EOL timelines. > > Regards > Pralabh Kumar

Spark 3.0.0 EOL

2023-07-24 Thread Pralabh Kumar
Hi Dev Team, If possible, can you please provide the Spark 3.0.0 EOL timelines. Regards Pralabh Kumar

Re: Spark 3.4.0 and 3.4.1 and Java version in Dockerfile

2023-07-23 Thread Mich Talebzadeh
Worth noting that these official dockerfiles https://hub.docker.com/_/spark were created with a valid java version *openjdk version "11.0.19" 2023-04-18* docker run -it* apache/spark:3.4.1-scala2.12-java11-r-ubuntu* /bin/bash ## downloaded from the above repository

Re: Spark 3.4.0 and 3.4.1 and Java version in Dockerfile

2023-07-22 Thread Mich Talebzadeh
Yes thanks, I know the answer. That was not what I was looking for. The provided script should work one way or another, which it does not. Good that someone has raised the issue already. That Jira was raised in Oct 2022. Mich Talebzadeh, Solutions Architect/Engineering Lead Palantir Technologies

Re: Spark 3.4.0 and 3.4.1 and Java version in Dockerfile

2023-07-22 Thread Bjørn Jørgensen
https://hub.docker.com/_/openjdk DEPRECATION NOTICE This image is officially deprecated and all users are recommended to find and use suitable replacements ASAP. Some examples of other Official Image alternatives (listed in alphabetical order with no intentional or implied preference): -

Spark 3.4.0 and 3.4.1 and Java version in Dockerfile

2023-07-22 Thread Mich Talebzadeh
Hi, I was checking the contents of the Dockerfile for Java in the Spark directory, i.e. ${SPARK_HOME}/kubernetes/dockerfiles/spark/Dockerfile, in version 3.4.1. I recall that in 3.4.0 I adjusted the Dockerfile content, replacing #ARG java_image_tag=17-jre #FROM eclipse-temurin:${java_image_tag}

Re: Spark Docker Official Image is now available

2023-07-22 Thread Mich Talebzadeh
Hi, It would help if the Spark binaries were added to PATH in the Docker images. They used to be there in previous versions such as 3.1.3. docker run -it apache/spark:3.4.1-scala2.12-java11-r-ubuntu /bin/bash spark@e48cc28ff89e:/opt/spark/work-dir$ which spark-submit spark@e48cc28ff89e:/opt/spark/work-dir$

Re: Spark Docker Official Image is now available

2023-07-20 Thread Kent Yao
Thank you, Yikun! Kent On Thu, Jul 20, 2023 at 19:25, Dongjoon Hyun wrote: > Thank you! > > Dongjoon > > On Thu, Jul 20, 2023 at 8:40 AM Xiao Li > wrote: > >> Thank you, Yikun! This is great! >> >> On Wed, Jul 19, 2023 at 7:55 PM Ruifeng Zheng wrote: >> >>> Awesome, thank you YiKun for driving this! >>>

Re: Spark Docker Official Image is now available

2023-07-20 Thread Dongjoon Hyun
Thank you! Dongjoon On Thu, Jul 20, 2023 at 8:40 AM Xiao Li wrote: > Thank you, Yikun! This is great! > > On Wed, Jul 19, 2023 at 7:55 PM Ruifeng Zheng wrote: > >> Awesome, thank you YiKun for driving this! >> >> On Thu, Jul 20, 2023 at 9:12 AM Hyukjin Kwon >> wrote: >> >>> This is amazing,

Re: Spark Docker Official Image is now available

2023-07-20 Thread Xiao Li
Thank you, Yikun! This is great! On Wed, Jul 19, 2023 at 7:55 PM Ruifeng Zheng wrote: > Awesome, thank you YiKun for driving this! > > On Thu, Jul 20, 2023 at 9:12 AM Hyukjin Kwon wrote: > >> This is amazing, finally! >> >> On Thu, 20 Jul 2023 at 10:10, Yikun Jiang wrote: >> >>> The spark

Re: Spark Docker Official Image is now available

2023-07-19 Thread Ruifeng Zheng
Awesome, thank you YiKun for driving this! On Thu, Jul 20, 2023 at 9:12 AM Hyukjin Kwon wrote: > This is amazing, finally! > > On Thu, 20 Jul 2023 at 10:10, Yikun Jiang wrote: > >> The spark Docker Official Image is now available: >> https://hub.docker.com/_/spark >> >> $ docker run -it --rm

Re: Spark Docker Official Image is now available

2023-07-19 Thread Hyukjin Kwon
This is amazing, finally! On Thu, 20 Jul 2023 at 10:10, Yikun Jiang wrote: > The spark Docker Official Image is now available: > https://hub.docker.com/_/spark > > $ docker run -it --rm *spark* /opt/spark/bin/spark-shell > $ docker run -it --rm *spark*:python3 /opt/spark/bin/pyspark > $ docker

Spark Docker Official Image is now available

2023-07-19 Thread Yikun Jiang
The spark Docker Official Image is now available: https://hub.docker.com/_/spark $ docker run -it --rm *spark* /opt/spark/bin/spark-shell $ docker run -it --rm *spark*:python3 /opt/spark/bin/pyspark $ docker run -it --rm *spark*:r /opt/spark/bin/sparkR We had a longer review journey than we

Re: [DISCUSS] SPIP: XML data source support

2023-07-19 Thread Maciej
That's a great idea, as long as we can keep additional dependencies under control. Best regards, Maciej Szymkiewicz Web:https://zero323.net PGP: A30CEF0C31A501EC On 7/19/23 18:22, Franco Patano wrote: +1 Many people have struggled with incorporating this separate library into their Spark

Re: [DISCUSS] SPIP: XML data source support

2023-07-19 Thread Franco Patano
+1 Many people have struggled with incorporating this separate library into their Spark pipelines. On Wed, Jul 19, 2023 at 10:53 AM Burak Yavuz wrote: > +1 on adding to Spark. Community involvement will make the XML reader > better. > > Best, > Burak > > On Wed, Jul 19, 2023 at 3:25 AM Martin

Re: [DISCUSS] SPIP: XML data source support

2023-07-19 Thread Burak Yavuz
+1 on adding to Spark. Community involvement will make the XML reader better. Best, Burak On Wed, Jul 19, 2023 at 3:25 AM Martin Andersson wrote: > Alright, makes sense to add it then. > -- > *From:* Hyukjin Kwon > *Sent:* Wednesday, July 19, 2023 11:01 > *To:*

Re: [DISCUSS] SPIP: XML data source support

2023-07-19 Thread Martin Andersson
Alright, makes sense to add it then. From: Hyukjin Kwon Sent: Wednesday, July 19, 2023 11:01 To: Martin Andersson Cc: Sandip Agarwala ; dev@spark.apache.org Subject: Re: [DISCUSS] SPIP: XML data source support EXTERNAL SENDER. Do not click links or open

Re: [DISCUSS] SPIP: XML data source support

2023-07-19 Thread Hyukjin Kwon
Here are the benefits of having it as a built-in source: - We can leverage the community to improve the Spark XML (not within Databricks repositories). - We can share the same core for XML expressions (e.g., from_xml and to_xml like from_csv, from_json, etc.). - It is more to

Re: [DISCUSS] SPIP: XML data source support

2023-07-19 Thread Martin Andersson
How much of an effort is it to use the spark-xml library today? What's the drawback to keeping this as an external library as-is? Best Regards, Martin From: Hyukjin Kwon Sent: Wednesday, July 19, 2023 01:27 To: Sandip Agarwala Cc: dev@spark.apache.org Subject:

Re: [DISCUSS] SPIP: XML data source support

2023-07-18 Thread Hyukjin Kwon
Yeah I support this. XML is a pretty outdated format TBH, but it is still used in many legacy systems. For example, the Wikipedia dump is one case. Even when you look at stats for CSV vs XML vs JSON, some show that XML is used more than CSV. On Wed, Jul 19, 2023 at 12:58 AM Sandip Agarwala <

Re: Spark Scala SBT Local build fails

2023-07-18 Thread Varun Shah
++ DEV community On Mon, Jul 17, 2023 at 4:14 PM Varun Shah wrote: > Resending this message with a proper Subject line > > Hi Spark Community, > > I am trying to set up my forked apache/spark project locally for my 1st > Open Source Contribution, by building and creating a package as mentioned

Re: Spark 3.5 Branch Cut

2023-07-17 Thread Yuanjian Li
Further reminder for the release timeline:
Date: Event
July 17th 2023: Code freeze. Release branch cut.
Late July 2023: QA period. Focus on bug fixes, tests, stability and docs. Generally, no new features merged.
August 2023: Release candidates (RC), voting, etc. until final release passes
Please

Re: Spark 3.5 Branch Cut

2023-07-17 Thread Raghu Angadi
Thanks Yuanjian for accepting these for warmfix. Raghu. On Mon, Jul 17, 2023 at 1:04 PM Yuanjian Li wrote: > Hi, all > > FYI, I cut branch-3.5 as https://github.com/apache/spark/tree/branch-3.5 > > Here is the complete list of exception merge requests received before the > cut: > >- > >

Re: Spark 3.5 Branch Cut

2023-07-17 Thread Dongjoon Hyun
Thank you so much, Yuanjian! Dongjoon. On Mon, Jul 17, 2023 at 1:05 PM Yuanjian Li wrote: > Hi, all > > FYI, I cut branch-3.5 as https://github.com/apache/spark/tree/branch-3.5 > > Here is the complete list of exception merge requests received before the > cut: > >- > >SPARK-44421:

Spark 3.5 Branch Cut

2023-07-17 Thread Yuanjian Li
Hi, all FYI, I cut branch-3.5 as https://github.com/apache/spark/tree/branch-3.5 Here is the complete list of exception merge requests received before the cut: - SPARK-44421: Reattach to existing execute in Spark Connect (server mechanism) - SPARK-44423: Reattach to existing

Re: [Reminder] Spark 3.5 Branch Cut

2023-07-16 Thread Herman van Hovell
Hi Yuanjian, For the ongoing encoder work for the connect scala client I'd like to get the following tickets in: - SPARK-44396 : Direct Arrow Deserialization - SPARK-9 :

Re: Data Contracts

2023-07-16 Thread Phillip Henry
No worries. Have you had a chance to look at it? Since this thread has gone dead, I assume there is no appetite for adding data contract functionality..? Regards, Phillip On Mon, 19 Jun 2023, 11:23 Deepak Sharma, wrote: > Sorry for using simple in my last email . > It’s not gonna to be

Re: [Reminder] Spark 3.5 Branch Cut

2023-07-15 Thread Enrico Minack
Speaking of JdbcDialect, is there any interest in getting upserts for JDBC into 3.5.0? [SPARK-19335][SPARK-38200][SQL] Add upserts for writing to JDBC: https://github.com/apache/spark/pull/41518 [SPARK-19335][SPARK-38200][SQL] Add upserts for writing to JDBC using MERGE INTO with temp table:

Re: [Reminder] Spark 3.5 Branch Cut

2023-07-14 Thread Jia Fan
Can we put [SPARK-44262][SQL] Add `dropTable` and `getInsertStatement` to JdbcDialect into 3.5.0? https://github.com/apache/spark/pull/41855 Since this is the last major version update of 3.x, I think we need to make sure JdbcDialect can support more databases. On Sat, Jul 15, 2023, Gengliang Wang

Re: [Reminder] Spark 3.5 Branch Cut

2023-07-14 Thread Gengliang Wang
Hi Yuanjian, Besides the abovementioned changes, it would be great to include the UI page for Spark Connect: SPARK-44394. Best Regards, Gengliang On Fri, Jul 14, 2023 at 11:44 AM Julek Sompolski wrote: > Thank you, > My changes that you

Re: [Reminder] Spark 3.5 Branch Cut

2023-07-14 Thread Julek Sompolski
Thank you, My changes that you listed are tracked under this Epic: https://issues.apache.org/jira/browse/SPARK-43754 I am also working on https://issues.apache.org/jira/browse/SPARK-44422, didn't mention it before because I have hopes that this one will make it before the cut. (Unrelated) My

Re: [Reminder] Spark 3.5 Branch Cut

2023-07-14 Thread Raghu Angadi
Thank you. We plan to get the remaining major pieces in for Streaming Spark Connect (Epic SPARK-42938). I would like to request a warmfix exception for the following tweaks and improvements over the next two weeks (all in the same epic). -

Re: Time for Spark v3.5.0 release

2023-07-14 Thread Yuanjian Li
Thanks for raising all the requests. Let's stick to the previously agreed branch cut time. Based on past practice, let's label the above requests as exception features. I have just sent out a branch cut reminder titled "[Reminder] Spark 3.5 Branch Cut." Please ensure that all your requests are

[Reminder] Spark 3.5 Branch Cut

2023-07-14 Thread Yuanjian Li
Hi everyone, As discussed earlier in "Time for Spark v3.5.0 release", I will cut branch-3.5 on *Monday, July 17th at 1 pm PST* as scheduled. Please plan your PR merge accordingly with the given timeline. Currently, we have received the following exception merge requests: - SPARK-44421:

Re: Time for Spark v3.5.0 release

2023-07-14 Thread Julek Sompolski
I am working on SPARK-44421, SPARK-44423 and SPARK-44424 in Spark Connect to support execution reconnection. A week or two of warmfix grace period would be much appreciated for this work. Best regards, Juliusz Sompolski On Fri, Jul 14, 2023 at 5:40 PM Raghu Angadi wrote: > We have a bunch of

Re: Time for Spark v3.5.0 release

2023-07-14 Thread Raghu Angadi
We have a bunch of work in progress for Spark Connect trying to meet the branch cut deadline. Moving to 17th is certainly welcome. Is it feasible to extend it by a couple of more days? Alternatively, we could have a relaxed warmfix process for Spark Connect code for a week or two since it does

Unsubscribe

2023-07-13 Thread Dumas Hwang

unsubscribe

2023-07-13 Thread Raffael Bottoli Schemmer
unsubscribe

Re: Apache Arrow integration issue with Spark involving Netty

2023-07-13 Thread Dane Pitkin
I just want to add that there is a Spark Jira issue[1] for upgrading Netty once Arrow v13.0.0 is released this month. [1] https://issues.apache.org/jira/projects/SPARK/issues/SPARK-44212 On Thu, Jul 6, 2023 at 2:25 PM Dane Pitkin wrote: > Hi all, > > The next release of Apache Arrow v13.0.0

Re: [VOTE][RESULT] Python Data Source API

2023-07-11 Thread Mich Talebzadeh
Hi Allison, Great job and thanks for your efforts in driving this. Looking forward to seeing it in action soon! Best Mich Talebzadeh, Solutions Architect/Engineering Lead Palantir Technologies Limited London United Kingdom view my Linkedin profile

[VOTE][RESULT] Python Data Source API

2023-07-10 Thread Allison Wang
The vote passes with 12 +1s (8 binding +1s) and one +0 (binding). (* = binding) +1: - Hyukjin Kwon * - Xiao Li * - Denny Lee - Martin Grund - Mich Talebzadeh - Huaxin Gao * - Holden Karau * - Reynold Xin * - Jungtaek Lim - Ruifeng Zheng * - Takuya Ueshin * - Matei Zaharia * +0: Maciej

Re: [VOTE][SPIP] Python Data Source API

2023-07-10 Thread Jungtaek Lim
Just to be fully sure, SPIP does not cover streaming, but if the performance is not great compared to the JVM based implementation in any way (which I expect so), I don't think it's good to integrate with streaming which targets lower latency. That's the reason I gave +1 although it's not covering

Re: [VOTE][SPIP] Python Data Source API

2023-07-10 Thread Matei Zaharia
+1 > On Jul 10, 2023, at 10:19 AM, Takuya UESHIN wrote: > > +1 > > On Sun, Jul 9, 2023 at 10:05 PM Ruifeng Zheng > wrote: >> +1 >> >> On Mon, Jul 10, 2023 at 8:20 AM Jungtaek Lim > > wrote: >>> +1 >>> >>> On Sat, Jul 8, 2023

Re: [VOTE][SPIP] Python Data Source API

2023-07-10 Thread Takuya UESHIN
+1 On Sun, Jul 9, 2023 at 10:05 PM Ruifeng Zheng wrote: > +1 > > On Mon, Jul 10, 2023 at 8:20 AM Jungtaek Lim > wrote: > >> +1 >> >> On Sat, Jul 8, 2023 at 4:13 AM Reynold Xin >> wrote: >> >>> +1! >>> >>> >>> On Fri, Jul 7 2023 at 11:58 AM, Holden Karau >>> wrote: >>> +1 On

Unsubscribe

2023-07-10 Thread Bode, Meikel
Unsubscribe

Re: [VOTE][SPIP] Python Data Source API

2023-07-09 Thread Ruifeng Zheng
+1 On Mon, Jul 10, 2023 at 8:20 AM Jungtaek Lim wrote: > +1 > > On Sat, Jul 8, 2023 at 4:13 AM Reynold Xin > wrote: > >> +1! >> >> >> On Fri, Jul 7 2023 at 11:58 AM, Holden Karau >> wrote: >> >>> +1 >>> >>> On Fri, Jul 7, 2023 at 9:55 AM huaxin gao >>> wrote: >>> +1 On Fri,

Re: [VOTE][SPIP] Python Data Source API

2023-07-09 Thread Jungtaek Lim
+1 On Sat, Jul 8, 2023 at 4:13 AM Reynold Xin wrote: > +1! > > > On Fri, Jul 7 2023 at 11:58 AM, Holden Karau > wrote: > >> +1 >> >> On Fri, Jul 7, 2023 at 9:55 AM huaxin gao wrote: >> >>> +1 >>> >>> On Fri, Jul 7, 2023 at 8:59 AM Mich Talebzadeh < >>> mich.talebza...@gmail.com> wrote: >>>

Re: [VOTE][SPIP] Python Data Source API

2023-07-07 Thread Reynold Xin
+1! On Fri, Jul 7 2023 at 11:58 AM, Holden Karau < hol...@pigscanfly.ca > wrote: > > +1 > > > On Fri, Jul 7, 2023 at 9:55 AM huaxin gao < huaxin.ga...@gmail.com > wrote: > > > >> +1 >> >> >> On Fri, Jul 7, 2023 at 8:59 AM Mich Talebzadeh < mich.talebza...@gmail.com >> > wrote: >> >>

Re: [VOTE][SPIP] Python Data Source API

2023-07-07 Thread Holden Karau
+1 On Fri, Jul 7, 2023 at 9:55 AM huaxin gao wrote: > +1 > > On Fri, Jul 7, 2023 at 8:59 AM Mich Talebzadeh > wrote: > >> +1 for me >> >> Mich Talebzadeh, >> Solutions Architect/Engineering Lead >> Palantir Technologies Limited >> London >> United Kingdom >> >> >>view my Linkedin profile

Re: [VOTE][SPIP] Python Data Source API

2023-07-07 Thread huaxin gao
+1 On Fri, Jul 7, 2023 at 8:59 AM Mich Talebzadeh wrote: > +1 for me > > Mich Talebzadeh, > Solutions Architect/Engineering Lead > Palantir Technologies Limited > London > United Kingdom > > >view my Linkedin profile > > > >

Re: [VOTE][SPIP] Python Data Source API

2023-07-07 Thread Mich Talebzadeh
+1 for me Mich Talebzadeh, Solutions Architect/Engineering Lead Palantir Technologies Limited London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your own risk.

Re: [VOTE][SPIP] Python Data Source API

2023-07-07 Thread Martin Grund
+1 (non-binding) On Fri, Jul 7, 2023 at 12:05 AM Denny Lee wrote: > +1 (non-binding) > > On Fri, Jul 7, 2023 at 00:50 Maciej wrote: > >> +0 >> >> Best regards, >> Maciej Szymkiewicz >> >> Web: https://zero323.net >> PGP: A30CEF0C31A501EC >> >> On 7/6/23 17:41, Xiao Li wrote: >> >> +1 >> >>

Re: [VOTE][SPIP] Python Data Source API

2023-07-06 Thread Denny Lee
+1 (non-binding) On Fri, Jul 7, 2023 at 00:50 Maciej wrote: > +0 > > Best regards, > Maciej Szymkiewicz > > Web: https://zero323.net > PGP: A30CEF0C31A501EC > > On 7/6/23 17:41, Xiao Li wrote: > > +1 > > Xiao > > On Wed, Jul 5, 2023 at 17:28, Hyukjin Kwon wrote: > >> +1. >> >> See

Apache Arrow integration issue with Spark involving Netty

2023-07-06 Thread Dane Pitkin
Hi all, The next release of Apache Arrow v13.0.0 coming this month[1] has upgraded Netty to v4.1.94.Final[2] due to a moderate severity CVE[3]. We are seeing that Spark using Netty v4.1.93.Final is not compatible with Arrow v13.0.0, throwing an exception at runtime[4]. There has been some talk in

Re: [VOTE][SPIP] Python Data Source API

2023-07-06 Thread Maciej
+0 Best regards, Maciej Szymkiewicz Web: https://zero323.net PGP: A30CEF0C31A501EC On 7/6/23 17:41, Xiao Li wrote: +1 Xiao On Wed, Jul 5, 2023 at 17:28, Hyukjin Kwon wrote: +1. See https://youtu.be/yj7XlTB1Jvc?t=604 :-). On Thu, 6 Jul 2023 at 09:15, Allison Wang wrote: Hi all,

Re: [VOTE][SPIP] Python Data Source API

2023-07-06 Thread Xiao Li
+1 Xiao On Wed, Jul 5, 2023 at 17:28, Hyukjin Kwon wrote: > +1. > > See https://youtu.be/yj7XlTB1Jvc?t=604 :-). > > On Thu, 6 Jul 2023 at 09:15, Allison Wang > wrote: > >> Hi all, >> >> I'd like to start the vote for SPIP: Python Data Source API. >> >> The high-level summary for the SPIP is that it aims to

Re: [VOTE][SPIP] Python Data Source API

2023-07-05 Thread Hyukjin Kwon
+1. See https://youtu.be/yj7XlTB1Jvc?t=604 :-). On Thu, 6 Jul 2023 at 09:15, Allison Wang wrote: > Hi all, > > I'd like to start the vote for SPIP: Python Data Source API. > > The high-level summary for the SPIP is that it aims to introduce a simple > API in Python for Data Sources. The idea

[VOTE][SPIP] Python Data Source API

2023-07-05 Thread Allison Wang
Hi all, I'd like to start the vote for SPIP: Python Data Source API. The high-level summary for the SPIP is that it aims to introduce a simple API in Python for Data Sources. The idea is to enable Python developers to create data sources without learning Scala or dealing with the complexities of

Re: Time for Spark v3.5.0 release

2023-07-04 Thread Xinrong Meng
+1 Thank you! On Tue, Jul 4, 2023 at 3:04 PM Jungtaek Lim wrote: > +1 > > On Wed, Jul 5, 2023 at 2:23 AM L. C. Hsieh wrote: > >> +1 >> >> Thanks Yuanjian. >> >> On Tue, Jul 4, 2023 at 7:45 AM yangjie01 wrote: >> > >> > +1 >> > >> > >> > >> > From: Maxim Gekk >> > Date: Tuesday, July 4, 2023 17:24 >> >

Re: Time for Spark v3.5.0 release

2023-07-04 Thread Jungtaek Lim
+1 On Wed, Jul 5, 2023 at 2:23 AM L. C. Hsieh wrote: > +1 > > Thanks Yuanjian. > > On Tue, Jul 4, 2023 at 7:45 AM yangjie01 wrote: > > > > +1 > > > > > > > > From: Maxim Gekk > > Date: Tuesday, July 4, 2023 17:24 > > To: Kent Yao > > Cc: "dev@spark.apache.org" > > Subject: Re: Time for Spark v3.5.0

Re: Time for Spark v3.5.0 release

2023-07-04 Thread L. C. Hsieh
+1 Thanks Yuanjian. On Tue, Jul 4, 2023 at 7:45 AM yangjie01 wrote: > > +1 > > > > From: Maxim Gekk > Date: Tuesday, July 4, 2023 17:24 > To: Kent Yao > Cc: "dev@spark.apache.org" > Subject: Re: Time for Spark v3.5.0 release > > > > +1 > > On Tue, Jul 4, 2023 at 11:55 AM Kent Yao wrote: > > +1, thank you

Re: Time for Spark v3.5.0 release

2023-07-04 Thread yangjie01
+1 From: Maxim Gekk Date: Tuesday, July 4, 2023 17:24 To: Kent Yao Cc: "dev@spark.apache.org" Subject: Re: Time for Spark v3.5.0 release +1 On Tue, Jul 4, 2023 at 11:55 AM Kent Yao wrote: +1, thank you Kent On 2023/07/04 05:32:52 Dongjoon Hyun wrote: > +1 > > Thank you, Yuanjian

Re: Time for Spark v3.5.0 release

2023-07-04 Thread Jia Fan
+1 On Tue, Jul 4, 2023 at 17:23, Maxim Gekk wrote: > +1 > > On Tue, Jul 4, 2023 at 11:55 AM Kent Yao wrote: > >> +1, thank you >> >> Kent >> >> On 2023/07/04 05:32:52 Dongjoon Hyun wrote: >> > +1 >> > >> > Thank you, Yuanjian >> > >> > Dongjoon >> > >> > On Tue, Jul 4, 2023 at 1:03 AM Hyukjin Kwon >> wrote:

Re: Time for Spark v3.5.0 release

2023-07-04 Thread Maxim Gekk
+1 On Tue, Jul 4, 2023 at 11:55 AM Kent Yao wrote: > +1, thank you > > Kent > > On 2023/07/04 05:32:52 Dongjoon Hyun wrote: > > +1 > > > > Thank you, Yuanjian > > > > Dongjoon > > > > On Tue, Jul 4, 2023 at 1:03 AM Hyukjin Kwon > wrote: > > > > > Yeah one day postponed shouldn't be a big deal.

Re: Time for Spark v3.5.0 release

2023-07-04 Thread Kent Yao
+1, thank you Kent On 2023/07/04 05:32:52 Dongjoon Hyun wrote: > +1 > > Thank you, Yuanjian > > Dongjoon > > On Tue, Jul 4, 2023 at 1:03 AM Hyukjin Kwon wrote: > > > Yeah one day postponed shouldn't be a big deal. > > > > On Tue, Jul 4, 2023 at 7:10 AM Yuanjian Li wrote: > > > >> Hi All,

Re: Time for Spark v3.5.0 release

2023-07-03 Thread Dongjoon Hyun
+1 Thank you, Yuanjian Dongjoon On Tue, Jul 4, 2023 at 1:03 AM Hyukjin Kwon wrote: > Yeah one day postponed shouldn't be a big deal. > > On Tue, Jul 4, 2023 at 7:10 AM Yuanjian Li wrote: > >> Hi All, >> >> According to the Spark versioning policy at >>
