Re: [VOTE] SPARK-44444: Use ANSI SQL mode by default

2024-04-13 Thread Chao Sun
+1. This feature is very helpful for guarding against correctness issues, such as null results due to invalid input or math overflows. It’s been there for a while now and it’s a good time to enable it by default as Spark enters the next major release. On Sat, Apr 13, 2024 at 3:27 PM Dongjoon
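As a rough illustration of the two failure modes mentioned above (null results from invalid input, and silent math overflow), the plain-Python sketch below mimics both behaviours side by side. It does not call Spark: `cast_to_int` and `add_int32` are hypothetical helpers written for this example, and the `ansi` flag only stands in for the `spark.sql.ansi.enabled` setting.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def cast_to_int(text, ansi):
    """Parse a string as an integer.

    Hypothetical helper: with ansi=False a bad input yields None (a "null"),
    with ansi=True it raises instead of hiding the problem.
    """
    try:
        return int(text.strip())
    except ValueError:
        if ansi:
            raise ValueError(f"invalid input for INT: {text!r}")
        return None


def add_int32(a, b, ansi):
    """Add two 32-bit integers.

    Hypothetical helper: with ansi=False an overflow wraps around like
    two's-complement arithmetic, with ansi=True it raises.
    """
    total = a + b
    if INT32_MIN <= total <= INT32_MAX:
        return total
    if ansi:
        raise OverflowError("integer overflow")
    # Wrap the out-of-range result back into the signed 32-bit range.
    return (total - INT32_MIN) % 2**32 + INT32_MIN
```

With the flag off, `cast_to_int("abc", ansi=False)` quietly produces `None` and `add_int32(INT32_MAX, 1, ansi=False)` wraps to the most negative value; with the flag on, both calls raise, which is the guard the message refers to.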

Re: [VOTE] Add new `Versions` in Apache Spark JIRA for Versioning of Spark Operator

2024-04-12 Thread Chao Sun
+1 On Fri, Apr 12, 2024 at 4:23 PM Xiao Li wrote: > +1 > > > > > On Fri, Apr 12, 2024 at 14:30 bo yang wrote: > >> +1 >> > >> On Fri, Apr 12, 2024 at 12:34 PM huaxin gao >> wrote: >> >>> +1 >>> >>> On Fri, Apr 12, 2024 at 9:07 AM Dongjoon Hyun >>> wrote: >>> +1 Thank you!

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-01 Thread Chao Sun
+1 On Sun, Mar 31, 2024 at 10:31 PM Hyukjin Kwon wrote: > Oh I didn't send the discussion thread out as it's pretty simple, > non-invasive and the discussion was sort of done as part of the Spark > Connect initial discussion .. > > On Mon, Apr 1, 2024 at 1:59 PM Mridul Muralidharan > wrote: >

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-12 Thread Chao Sun
+1 On Tue, Mar 12, 2024 at 8:03 AM Xiao Li wrote: > +1 > > On Tue, Mar 12, 2024 at 6:09 AM Holden Karau > wrote: > >> +1 >> >> Twitter: https://twitter.com/holdenkarau >> Books (Learning Spark, High Performance Spark, etc.): >> https://amzn.to/2MaRAG9 >> YouTube Live

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-19 Thread Chao Sun
*Disclaimer:* The information provided is correct to the best of my > knowledge, sourced from both personal expertise and other resources but of > course cannot be guaranteed. It is essential to note that, as with any > advice, one verified and tested result holds more weight than a thousand

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-14 Thread Chao Sun
> wanted to try out some realtime aggregate performance on top of parquet and > spark dataframes. > > Thanks and Regards > Praveen > > > On Wed, Feb 14, 2024 at 9:20 AM Chao Sun wrote: > >> > Out of interest what are the differences in the approach between this

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-14 Thread Chao Sun
one! >> >> Yufei >> >> >> On Tue, Feb 13, 2024 at 12:43 PM Chao Sun wrote: >>> >>> Hi all, >>> >>> We are very happy to announce that Project Comet, a plugin to >>> accelerate Spark query execution via leveraging DataFusion an

Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-13 Thread Chao Sun
Hi all, We are very happy to announce that Project Comet, a plugin to accelerate Spark query execution via leveraging DataFusion and Arrow, has now been open sourced under the Apache Arrow umbrella. Please check the project repo https://github.com/apache/arrow-datafusion-comet for more details if

Re: [VOTE] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-14 Thread Chao Sun
+1 On Tue, Nov 14, 2023 at 9:52 AM L. C. Hsieh wrote: > > +1 > > On Tue, Nov 14, 2023 at 9:46 AM Ye Zhou wrote: > > > > +1(Non-binding) > > > > On Tue, Nov 14, 2023 at 9:42 AM L. C. Hsieh wrote: > >> > >> Hi all, > >> > >> I’d like to start a vote for SPIP: An Official Kubernetes Operator for

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread Chao Sun
+1 On Thu, Nov 9, 2023 at 6:36 PM Xiao Li wrote: > > +1 > > On Thu, Nov 9, 2023 at 16:53, huaxin gao wrote: >> >> +1 >> >> On Thu, Nov 9, 2023 at 3:14 PM DB Tsai wrote: >>> >>> +1 >>> >>> To be completely transparent, I am employed in the same department as Zhou >>> at Apple. >>> >>> I support this

Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-04 Thread Chao Sun
Congratulations! On Wed, Oct 4, 2023 at 5:11 AM Jungtaek Lim wrote: > Congrats! > > On Wed, Oct 4, 2023 at 5:04 PM, yangjie01 wrote: > >> Congratulations! >> >> >> >> Jie Yang >> >> >> >> *From:* Dongjoon Hyun >> *Date:* Wednesday, October 4, 2023 13:04 >> *To:* Hyukjin Kwon >> *Cc:* Hussein Awala , Rui

Re: [VOTE] Release Spark 3.4.1 (RC1)

2023-06-22 Thread Chao Sun
+1 On Thu, Jun 22, 2023 at 6:52 AM Yuming Wang wrote: > > +1. > > On Thu, Jun 22, 2023 at 4:41 PM Jacek Laskowski wrote: >> >> +1 >> >> Builds and runs fine on Java 17, macOS. >> >> $ ./dev/change-scala-version.sh 2.13 >> $ mvn \ >>

Re: [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-12 Thread Chao Sun
+1 On Mon, Jun 12, 2023 at 12:50 PM kazuyuki tanimura wrote: > +1 (non-binding) > > Thank you! > Kazu > > > On Jun 12, 2023, at 11:32 AM, Holden Karau wrote: > > -0 > > I'd like to see more of a doc around what we're planning on for a 4.0 > before we pick a target release date etc. (feels like

Re: Apache Spark 3.4.1 Release?

2023-06-08 Thread Chao Sun
+1 too On Thu, Jun 8, 2023 at 2:34 PM kazuyuki tanimura wrote: > > +1 (non-binding), Thank you Dongjoon > > Kazu

Re: [CONNECT] New Clients for Go and Rust

2023-05-25 Thread Chao Sun
+1 on separate repo too On Thu, May 25, 2023 at 12:43 PM Dongjoon Hyun wrote: > > +1 for starting on a separate repo. > > Dongjoon. > > On Thu, May 25, 2023 at 9:53 AM yangjie01 wrote: >> >> +1 on start this with a separate repo. >> >> Which new clients can be placed in the main repo should be

hadoop-2 profile to be removed in 3.5.0

2023-04-14 Thread Chao Sun
Hi all, Just a heads up that `hadoop-2` profile is going to be removed in Apache Spark 3.5.0. This has been discussed previously through this email thread: https://lists.apache.org/thread/z4jdy9959b6zj9t726zl0zcrk4hzs0xs and is now realized via https://issues.apache.org/jira/browse/SPARK-42452

Re: [VOTE] Release Apache Spark 3.4.0 (RC7)

2023-04-10 Thread Chao Sun
+1 (non-binding) On Mon, Apr 10, 2023 at 12:41 AM Ruifeng Zheng wrote: > +1 (non-binding) > > -- > Ruifeng Zheng > ruife...@foxmail.com > >

Re: [VOTE] Release Apache Spark 3.2.4 (RC1)

2023-04-10 Thread Chao Sun
+1 (non-binding) On Mon, Apr 10, 2023 at 7:07 AM yangjie01 wrote: > +1 (non-binding) > > > > *From:* Sean Owen > *Date:* Monday, April 10, 2023 21:19 > *To:* Dongjoon Hyun > *Cc:* "dev@spark.apache.org" > *Subject:* Re: [VOTE] Release Apache Spark 3.2.4 (RC1) > > > > +1 from me > > > > On Sun,

Re: Apache Spark 3.2.4 EOL Release?

2023-04-04 Thread Chao Sun
+1 On Tue, Apr 4, 2023 at 11:12 AM Holden Karau wrote: > +1 > > On Tue, Apr 4, 2023 at 11:04 AM L. C. Hsieh wrote: > >> +1 >> >> Sounds good and thanks Dongjoon for driving this. >> >> On 2023/04/04 17:24:54 Dongjoon Hyun wrote: >> > Hi, All. >> > >> > Since Apache Spark 3.2.0 passed RC7 vote

Re: [ANNOUNCE] Apache Spark 3.3.2 released

2023-02-17 Thread Chao Sun
Thanks Liang-Chi! On Fri, Feb 17, 2023 at 1:28 AM kazuyuki tanimura wrote: > Great, Thank you Liang-Chi > > Kazu > > On Feb 17, 2023, at 1:02 AM, Wanqiang Ji wrote: > > Congratulations! > > On Fri, Feb 17, 2023 at 4:59 PM L. C. Hsieh wrote: > > > We are happy to announce the availability of

Re: [VOTE] Release Spark 3.3.2 (RC1)

2023-02-13 Thread Chao Sun
+1 On Mon, Feb 13, 2023 at 9:20 AM L. C. Hsieh wrote: > > If it is not supported in Spark 3.3.x, it looks like an improvement at > Spark 3.4. > For such cases we usually do not back port. I think this is also why > the PR did not back port when it was merged. > > I'm okay if there is consensus

Re: Time for release v3.3.2

2023-01-30 Thread Chao Sun
+1, thanks Liang-Chi for volunteering! Chao On Mon, Jan 30, 2023 at 5:51 PM L. C. Hsieh wrote: > > Hi Spark devs, > > As you know, it has been 4 months since Spark 3.3.1 was released on > 2022/10, it seems a good time to think about next maintenance release, > i.e. Spark 3.3.2. > > I'm thinking

Re: Time for Spark 3.4.0 release?

2023-01-04 Thread Chao Sun
+1, thanks! Chao On Wed, Jan 4, 2023 at 1:56 AM Mridul Muralidharan wrote: > > +1, Thanks ! > > Regards, > Mridul > > On Wed, Jan 4, 2023 at 2:20 AM Gengliang Wang wrote: > >> +1, thanks for driving the release! >> >> >> Gengliang >> >> On Tue, Jan 3, 2023 at 10:55 PM Dongjoon Hyun >> wrote:

[ANNOUNCE] Apache Spark 3.2.3 released

2022-11-29 Thread Chao Sun
We are happy to announce the availability of Apache Spark 3.2.3! Spark 3.2.3 is a maintenance release containing stability fixes. This release is based on the branch-3.2 maintenance branch of Spark. We strongly recommend all 3.2 users to upgrade to this stable release. To download Spark 3.2.3,

Re: [VOTE][RESULT] Release Spark 3.2.3, RC1

2022-11-18 Thread Chao Sun
(*) - Ruifeng Zheng - Chao Sun +0: None -1: None On Fri, Nov 18, 2022 at 10:35 AM Chao Sun wrote: > > Oops, sorry! I thought he voted but for some reason I didn't see his > vote in the email thread. Strange. Now I found it in here: > https://lists.apache.org/thread/gh2oktrndxopqnyxbsvp

Re: [VOTE][RESULT] Release Spark 3.2.3, RC1

2022-11-18 Thread Chao Sun
sing Sean Owen's vote. > > - Mridul > > > > On Fri, Nov 18, 2022 at 11:51 AM Chao Sun wrote: >> >> The vote passes with 11 +1s (5 binding +1s). >> Thanks to all who helped with the release! >> >> (* = binding) >> +1: >> - Dongjoon Hyu

[VOTE][RESULT] Release Spark 3.2.3, RC1

2022-11-18 Thread Chao Sun
The vote passes with 11 +1s (5 binding +1s). Thanks to all who helped with the release! (* = binding) +1: - Dongjoon Hyun (*) - L. C. Hsieh (*) - Huaxin Gao (*) - Kazuyuki Tanimura - Mridul Muralidharan (*) - Yuming Wang - Chris Nauroth - Yang Jie - Wenchen Fan (*) - Ruifeng Zheng - Chao Sun +0

Re: [VOTE] Release Spark 3.2.3 (RC1)

2022-11-18 Thread Chao Sun
g,Jie(INF)"; > *Cc:* "Chris Nauroth";"Yuming > Wang";"Dongjoon > Hyun";"huaxin gao";"L. > C. Hsieh";"Chao Sun";"dev"< > dev@spark.apache.org>; > *Subject:* Re: [VOTE] Release Spark 3.2.3 (RC1) >

[VOTE] Release Spark 3.2.3 (RC1)

2022-11-14 Thread Chao Sun
Please vote on releasing the following candidate as Apache Spark version 3.2.3. The vote is open until 11:59pm Pacific time Nov 17th and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 3.2.3 [ ] -1 Do not release this package
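The passing rule quoted above can be written down as a small check. This is a sketch of one reading of that rule, not official Apache tooling: it assumes "binding" votes are the ones cast by PMC members, that the vote passes when binding +1s outnumber binding -1s, and that the minimum of three applies to binding +1s. The `Vote` type and `vote_passes` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int      # +1, 0, or -1
    binding: bool   # assumed True when cast by a PMC member


def vote_passes(votes, min_binding_plus_ones=3):
    """Return True when binding +1s reach the minimum and outnumber binding -1s."""
    plus = sum(1 for v in votes if v.binding and v.value > 0)
    minus = sum(1 for v in votes if v.binding and v.value < 0)
    return plus >= min_binding_plus_ones and plus > minus
```

Non-binding votes are recorded but do not affect the outcome in this reading, so a thread with many non-binding +1s and only two binding ones would not pass.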

Re: [ANNOUNCE] Apache Spark 3.3.1 released

2022-10-26 Thread Chao Sun
Congrats everyone, and thanks Yuming for driving the release! On Wed, Oct 26, 2022 at 7:37 AM beliefer wrote: > > Congratulations to everyone who has contributed to this release. > > > At 2022-10-26 14:21:36, "Yuming Wang" wrote: > > We are happy to announce the availability of Apache Spark 3.3.1! >

Re: [VOTE] Release Spark 3.3.1 (RC4)

2022-10-18 Thread Chao Sun
+1. Thanks Yuming! Chao On Tue, Oct 18, 2022 at 1:18 PM Thomas graves wrote: > > +1. Ran internal test suite. > > Tom > > On Sun, Oct 16, 2022 at 9:14 PM Yuming Wang wrote: > > > > Please vote on releasing the following candidate as Apache Spark version > > 3.3.1. > > > > The vote is open

Apache Spark 3.2.3 Release?

2022-10-18 Thread Chao Sun
Hi All, It's been more than 3 months since 3.2.2 (tagged on Jul 11) was released. There are now 66 patches accumulated in branch-3.2, including 2 correctness issues. Is it a good time to start a new release? If there's no objection, I'd like to volunteer as the release manager for the 3.2.3

Re: Welcome Yikun Jiang as a Spark committer

2022-10-09 Thread Chao Sun
Congratulations Yikun! On Sun, Oct 9, 2022 at 11:14 AM vaquar khan wrote: > Congratulations. > > Regards, > Vaquar khan > > On Sun, Oct 9, 2022, 6:46 AM 叶先进 wrote: > >> Congrats >> >> On Oct 9, 2022, at 16:44, XiDuo You wrote: >> >> Congratulations, Yikun! >> >> On Sun, Oct 9, 2022, Maxim Gekk

Re: Dropping Apache Spark Hadoop2 Binary Distribution?

2022-10-05 Thread Chao Sun
+1 > and specifically may allow us to finally move off of the ancient version of Guava (?) I think the Guava issue comes from Hive 2.3 dependency, not Hadoop. On Wed, Oct 5, 2022 at 1:55 PM Xinrong Meng wrote: > +1. > > On Wed, Oct 5, 2022 at 1:53 PM Xiao Li > wrote: > >> +1. >> >> Xiao >>

Re: [DISCUSS] SPIP: Support Docker Official Image for Spark

2022-09-20 Thread Chao Sun
+1 (non-binding) On Mon, Sep 19, 2022 at 10:17 PM Wenchen Fan wrote: > > +1 > > On Mon, Sep 19, 2022 at 2:59 PM Yang,Jie(INF) wrote: >> >> +1 (non-binding) >> >> >> >> Yang Jie >> >> >> From: Yikun Jiang >> Sent: September 19, 2022 14:23:14 >> To: Denny Lee >> Cc: bo

Re: [VOTE] Release Spark 3.3.1 (RC1)

2022-09-18 Thread Chao Sun
It'd be really nice if we can include https://issues.apache.org/jira/browse/SPARK-40169 in this release, since otherwise it'll introduce a perf regression with Parquet column index disabled. On Sat, Sep 17, 2022 at 2:08 PM Sean Owen wrote: > > +1 LGTM. I tested Scala 2.13 + Java 11 on Ubuntu

Re: Spark 3.3.0/3.2.2: java.io.IOException: can not read class org.apache.parquet.format.PageHeader: don't know what type: 15

2022-09-01 Thread Chao Sun
Hi Fengyu, Do you still have the Parquet file that caused the error? Could you open a JIRA and attach the file to it? I can take a look. Chao On Thu, Sep 1, 2022 at 4:03 AM FengYu Cao wrote: > > I'm trying to upgrade our spark (3.2.1 now) > > but with spark 3.3.0 and spark 3.2.2, we had error

Re: Welcome Xinrong Meng as a Spark committer

2022-08-09 Thread Chao Sun
Congratulations! On Tue, Aug 9, 2022 at 1:00 PM huaxin gao wrote: > > Congratulations! > > On Tue, Aug 9, 2022 at 12:47 PM Dongjoon Hyun wrote: >> >> Congrat! :) >> >> Dongjoon. >> >> On Tue, Aug 9, 2022 at 10:40 AM Takuya UESHIN wrote: >> > >> > Congratulations, Xinrong! >> > >> > On Tue, Aug

Re: Welcoming three new PMC members

2022-08-09 Thread Chao Sun
Congrats everyone! On Tue, Aug 9, 2022 at 5:36 PM Dongjoon Hyun wrote: > > Congrat to all! > > Dongjoon. > > On Tue, Aug 9, 2022 at 5:13 PM Takuya UESHIN wrote: > > > > Congratulations! > > > > On Tue, Aug 9, 2022 at 4:57 PM Hyukjin Kwon wrote: > >> > >> Congrats everybody! > >> > >> On Wed,

Re: Update Spark 3.4 Release Window?

2022-07-21 Thread Chao Sun
+1 for Jan 2023 (Code freeze) and Feb 2023 (RC). Chao On Thu, Jul 21, 2022 at 11:43 AM L. C. Hsieh wrote: > > I'm also +1 for Feb. 2023 (RC) and Jan. 2023 (Code freeze). > > Liang-Chi > > On Wed, Jul 20, 2022 at 2:02 PM Dongjoon Hyun wrote: > > > > I fixed typos :) > > > > +1 for February 2023

Re: [VOTE] Release Spark 3.2.2 (RC1)

2022-07-14 Thread Chao Sun
+1 (non-binding) On Thu, Jul 14, 2022 at 12:40 AM Wenchen Fan wrote: > > +1 > > On Wed, Jul 13, 2022 at 7:29 PM Yikun Jiang wrote: >> >> +1 (non-binding) >> >> Checked out tag and built from source on Linux aarch64 and ran some basic >> test. >> >> >> Regards, >> Yikun >> >> >> On Wed, Jul 13,

Re: [VOTE] Release Spark 3.3.0 (RC6)

2022-06-13 Thread Chao Sun
+1 (non-binding) Thanks, Chao On Mon, Jun 13, 2022 at 5:37 PM Cheng Su wrote: > +1 (non-binding). > > > > Thanks, > > Cheng Su > > > > *From: *L. C. Hsieh > *Date: *Monday, June 13, 2022 at 5:13 PM > *To: *dev > *Subject: *Re: [VOTE] Release Spark 3.3.0 (RC6) > > +1 > > On Mon, Jun 13, 2022

Re: [VOTE][SPIP] Spark Connect

2022-06-13 Thread Chao Sun
+1 (non-binding) On Mon, Jun 13, 2022 at 5:11 PM Hyukjin Kwon wrote: > +1 > > On Tue, 14 Jun 2022 at 08:50, Yuming Wang wrote: > >> +1. >> >> On Tue, Jun 14, 2022 at 2:20 AM Matei Zaharia >> wrote: >> >>> +1, very excited about this direction. >>> >>> Matei >>> >>> On Jun 13, 2022, at 11:07

Re: SIGMOD System Award for Apache Spark

2022-05-13 Thread Chao Sun
Huge congrats to the whole community! On Fri, May 13, 2022 at 1:56 AM Wenchen Fan wrote: > Great! Congratulations to everyone! > > On Fri, May 13, 2022 at 10:38 AM Gengliang Wang wrote: > >> Congratulations to the whole spark community! >> >> On Fri, May 13, 2022 at 10:14 AM Jungtaek Lim < >>

Re: Apache Spark 3.3 Release

2022-03-16 Thread Chao Sun
branch > together. This situation only becomes worse and worse because there is no way > to block the other patches from landing unintentionally if we don't cut a > branch. > > [SPARK-38335][SQL] Implement parser support for DEFAULT column values > > Let's cut `branch-3.3` Today f

Re: Apache Spark 3.3 Release

2022-03-15 Thread Chao Sun
K-38548][SQL] New SQL function: try_sum >> Do you mean we should include them, or exclude them from 3.3? > > > If possible, I hope these features can be shipped with Spark 3.3. > > > > On Tue, Mar 15, 2022 at 10:06, Chao Sun wrote: >> >> Hi Xiao, >> >> For the foll

Re: Apache Spark 3.3 Release

2022-03-15 Thread Chao Sun
Hi Xiao, For the following list: #35789 [SPARK-32268][SQL] Row-level Runtime Filtering #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader #35848 [SPARK-38548][SQL] New SQL function: try_sum Do you mean we should include them, or exclude them from 3.3? Thanks, Chao

Re: Apache Spark 3.3 Release

2022-03-14 Thread Chao Sun
SPIPs as well but I'm not involved in those so not quite sure whether they are intended for 3.3 release. Chao On Mon, Mar 14, 2022 at 8:53 PM Xiao Li wrote: > > Could you please list which features we want to finish before the branch cut? > How long will they take? > > Xiao

Re: Apache Spark 3.3 Release

2022-03-14 Thread Chao Sun
Hi Max, As there is still some ongoing work for the above listed SPIPs, can we still merge them after the branch cut? Thanks, Chao On Mon, Mar 14, 2022 at 6:12 AM Maxim Gekk wrote: > Hi All, > > Since there are no actual blockers for Spark 3.3.0 and significant > objections, I am going to

Re: [VOTE] SPIP: Catalog API for view metadata

2022-02-03 Thread Chao Sun
+1 (non-binding). Looking forward to this feature! On Thu, Feb 3, 2022 at 2:32 PM Ryan Blue wrote: > +1 for the SPIP. I think it's well designed and it has worked quite well > at Netflix for a long time. > > On Thu, Feb 3, 2022 at 2:04 PM John Zhuge wrote: > >> Hi Spark community, >> >> I’d

Re: [ANNOUNCE] Apache Spark 3.2.1 released

2022-01-28 Thread Chao Sun
Thanks Huaxin for driving the release! On Fri, Jan 28, 2022 at 5:37 PM Ruifeng Zheng wrote: > It's Great! > Congrats and thanks, huaxin! > > > -- Original -- > *From:* "huaxin gao" ; > *Date:* Saturday, Jan 29, 2022 9:07 AM > *To:* "dev";"user"; > *Subject:* [ANNOUNCE] Apache

Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-24 Thread Chao Sun
+1 (non-binding) On Mon, Jan 24, 2022 at 6:32 AM Michael Heuer wrote: > +1 (non-binding) > >michael > > > On Jan 24, 2022, at 7:30 AM, Gengliang Wang wrote: > > +1 (non-binding) > > On Mon, Jan 24, 2022 at 6:26 PM Dongjoon Hyun > wrote: > >> +1 >> >> Dongjoon. >> >> On Sat, Jan 22, 2022

Re: [VOTE] Release Spark 3.2.1 (RC1)

2022-01-12 Thread Chao Sun
+1 (non-binding). Thanks Huaxin for driving the release! On Tue, Jan 11, 2022 at 11:56 PM Ruifeng Zheng wrote: > +1 (non-binding) > > Thanks, ruifeng zheng > > -- Original -- > *From:* "Cheng Su" ; > *Date:* Wed, Jan 12, 2022 02:54 PM > *To:* "Qian Sun";"huaxin

Re: [VOTE] SPIP: Row-level operations in Data Source V2

2021-11-14 Thread Chao Sun
+1 (non-binding). Thanks Anton for the work! On Sun, Nov 14, 2021 at 10:01 AM Ryan Blue wrote: > +1 > > Thanks to Anton for all this great work! > > On Sat, Nov 13, 2021 at 8:24 AM Mich Talebzadeh > wrote: > >> +1 non-binding >> >> >> >>view my Linkedin profile >>

Re: [VOTE][RESULT] SPIP: Storage Partitioned Join for Data Source V2

2021-11-02 Thread Chao Sun
Thanks all for voting on this proposal! On Tue, Nov 2, 2021 at 9:39 AM Liang Chi Hsieh wrote: > Hi all, > > The vote passed with the following 9 +1 votes and no -1 or +0 votes: > Liang-Chi Hsieh* > Russell Spitzer > Dongjoon Hyun* > Huaxin Gao > Ryan Blue > DB Tsai* > Holden Karau* > Cheng Su >

Re: [DISCUSS] SPIP: Storage Partitioned Join for Data Source V2

2021-10-27 Thread Chao Sun
lit-wise join. >>> >>> And two questions for further improvements: >>> 1. Can we apply this idea to partitioned file source tables >>> (non-bucketed) as well? >>> 2. What if the table has many partitions? Shall we apply certain join >>> alg

Re: [DISCUSS] SPIP: Storage Partitioned Join for Data Source V2

2021-10-26 Thread Chao Sun
> great to consider aggregate as well when doing this proposal. > > > >1. Any major use cases in mind except Hive bucketed table? > > > > Just curious if there’s any other use cases we are targeting as part of > SPIP. > > > > Thanks, > > Che

Re: [DISCUSS] SPIP: Storage Partitioned Join for Data Source V2

2021-10-26 Thread Chao Sun
i wrote: >>>>> >>>>>> +1 on this SPIP. >>>>>> >>>>>> This is a more generalized version of bucketed tables and bucketed >>>>>> joins which can eliminate very expensive data shuffles when joins, and >>>>&

[DISCUSS] SPIP: Storage Partitioned Join for Data Source V2

2021-10-22 Thread Chao Sun
Hi, Ryan and I drafted a design doc to support a new type of join: storage partitioned join which covers bucket join support for DataSourceV2 but is more general. The goal is to let Spark leverage distribution properties reported by data sources and eliminate shuffle whenever possible. Design
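To make the idea concrete, here is a toy sketch in plain Python, not Spark code and not the design in the doc. It assumes both tables were already written into the same number of buckets using the same function of the join key (the kind of distribution property a data source could report), so each pair of buckets can be joined locally and no rows have to move between buckets. `bucket_of`, `write_bucketed`, and `storage_partitioned_join` are hypothetical names.

```python
from collections import defaultdict


def bucket_of(key, num_buckets):
    """Assign an integer join key to a bucket (assumed identical for both tables)."""
    return key % num_buckets


def write_bucketed(rows, num_buckets):
    """Lay out (key, value) rows into buckets, as a bucketing data source would on write."""
    buckets = [[] for _ in range(num_buckets)]
    for key, value in rows:
        buckets[bucket_of(key, num_buckets)].append((key, value))
    return buckets


def storage_partitioned_join(left_buckets, right_buckets):
    """Inner-join two bucketed tables bucket by bucket, without redistributing rows."""
    if len(left_buckets) != len(right_buckets):
        raise ValueError("bucket counts differ; a shuffle would be needed")
    joined = []
    for left, right in zip(left_buckets, right_buckets):
        # Build a small hash index over one side of this bucket only.
        index = defaultdict(list)
        for key, value in right:
            index[key].append(value)
        for key, left_value in left:
            for right_value in index.get(key, []):
                joined.append((key, left_value, right_value))
    return joined
```

The check on bucket counts marks the point where the layouts are incompatible; in that case this sketch simply refuses rather than falling back to a shuffle.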

Re: [VOTE] Release Spark 3.2.0 (RC7)

2021-10-08 Thread Chao Sun
+1 (non-binding) On Fri, Oct 8, 2021 at 1:01 AM Maxim Gekk wrote: > +1 (non-binding) > > On Fri, Oct 8, 2021 at 10:44 AM Mich Talebzadeh > wrote: > >> +1 (non-binding) >> >> >> >>view my Linkedin profile >> >> >> >> >>

Re: [VOTE] Release Spark 3.2.0 (RC5)

2021-09-28 Thread Chao Sun
Looks like it's related to https://github.com/apache/spark/pull/34085. I filed https://issues.apache.org/jira/browse/SPARK-36873 to fix it. On Mon, Sep 27, 2021 at 6:00 PM Chao Sun wrote: > Thanks. Trying it on my local machine now but it will probably take a > while. I think https://gith

Re: [VOTE] Release Spark 3.2.0 (RC5)

2021-09-27 Thread Chao Sun
rises > only in 'test'. > > On Mon, Sep 27, 2021 at 6:58 PM Chao Sun wrote: > >> Hmm it may be related to the commit. Sean: how do I reproduce this? >> >> On Mon, Sep 27, 2021 at 4:56 PM Sean Owen wrote: >> >>> Another "is anyone else seeing this"? in

Re: [VOTE] Release Spark 3.2.0 (RC5)

2021-09-27 Thread Chao Sun
Hmm it may be related to the commit. Sean: how do I reproduce this? On Mon, Sep 27, 2021 at 4:56 PM Sean Owen wrote: > Another "is anyone else seeing this"? in compiling common/yarn-network: > > [ERROR] [Error] >

Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-21 Thread Chao Sun
hadoop version < 3.x? > > Regards > Venkata krishnan > > > On Tue, Sep 21, 2021 at 3:33 PM Chao Sun wrote: > >> I just created SPARK-36820 for the above LZ4 test issue. Will post a PR >> there soon. >> >> On Tue, Sep 21, 2021 at 2:05 PM Chao Sun

Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-21 Thread Chao Sun
I just created SPARK-36820 for the above LZ4 test issue. Will post a PR there soon. On Tue, Sep 21, 2021 at 2:05 PM Chao Sun wrote: > Mridul, is the LZ4 failure about Parquet? I think Parquet currently uses > Hadoop compression codec while Hadoop 2.7 still depends on native lib for >

Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-21 Thread Chao Sun
Mridul, is the LZ4 failure about Parquet? I think Parquet currently uses Hadoop compression codec while Hadoop 2.7 still depends on native lib for the LZ4. Maybe we should run the test only for Hadoop 3.2 profile. On Tue, Sep 21, 2021 at 10:08 AM Mridul Muralidharan wrote: > > Signatures,

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-31 Thread Chao Sun
How long will it take? Normally, in the RC stage, we always revert the > upgrade made in the current release. We did the parquet upgrade multiple > times in the previous releases for avoiding the major delay in our Spark > release > > Thanks, > > Xiao > > > On Tue, A

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-31 Thread Chao Sun
The Apache Parquet community found an issue [1] in 1.12.0 which could cause an incorrect file offset to be written and subsequent reads of the same file to fail. A fix has been proposed in the same JIRA and we may have to wait until a new release is available so that we can upgrade Spark with the

Re: [DISCUSS] Rename hadoop-3.2/hadoop-2.7 profile to hadoop-3/hadoop-2?

2021-06-25 Thread Chao Sun
+1 for targeting the renaming for Apache Spark 3.3 at the current phase. > > On Fri, Jun 25, 2021 at 6:55 AM DB Tsai wrote: > >> +1 on renaming. >> >> DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 >> >> On Jun 24, 2021, at 11:41 AM, Chao Sun wrote:

[DISCUSS] Rename hadoop-3.2/hadoop-2.7 profile to hadoop-3/hadoop-2?

2021-06-24 Thread Chao Sun
Hi, As Spark master has upgraded to Hadoop-3.3.1, the current Maven profile name hadoop-3.2 is no longer accurate, and it may confuse Spark users when they realize the actual version is not Hadoop 3.2.x. Therefore, I created https://issues.apache.org/jira/browse/SPARK-33880 to change the profile

Re: [VOTE] Release Spark 3.1.2 (RC1)

2021-05-27 Thread Chao Sun
+1 (non-binding) - thanks Dongjoon for the work! On Wed, May 26, 2021 at 8:35 PM Dongjoon Hyun wrote: > +1 > > Bests, > Dongjoon > > On Wed, May 26, 2021 at 7:55 PM Kent Yao wrote: > >> +1, non-binding >> >> *Kent Yao * >> @ Data Science Center, Hangzhou Research Institute, NetEase Corp. >> *a

Re: [ANNOUNCE] Apache Spark 2.4.8 released

2021-05-18 Thread Chao Sun
Great work Liang-Chi! On Tue, May 18, 2021 at 1:14 AM Maxim Gekk wrote: > Congratulations everyone with the new release, and thanks to Liang-Chi. > > Maxim Gekk > > Software Engineer > > Databricks, Inc. > > > On Tue, May 18, 2021 at 11:06 AM Yuming Wang wrote: > >> Great work, Liang-Chi! >>

Re: Apache Spark 3.1.2 Release?

2021-05-17 Thread Chao Sun
+1. Thanks Dongjoon for doing this! On Mon, May 17, 2021 at 7:58 PM John Zhuge wrote: > +1, thanks Dongjoon! > > On Mon, May 17, 2021 at 7:50 PM Yuming Wang wrote: > >> +1. >> >> On Tue, May 18, 2021 at 9:06 AM Hyukjin Kwon wrote: >> >>> +1 thanks for driving me >>> >>> On Tue, 18 May 2021,

Re: Welcoming six new Apache Spark committers

2021-03-26 Thread Chao Sun
Congrats everyone! On Fri, Mar 26, 2021 at 6:23 PM Mridul Muralidharan wrote: > > Congratulations, looking forward to more exciting contributions ! > > Regards, > Mridul > > On Fri, Mar 26, 2021 at 8:21 PM Dongjoon Hyun > wrote: > >> >> Congratulations! :) >> >> Bests, >> Dongjoon. >> >> On

Re: [VOTE] SPIP: Add FunctionCatalog

2021-03-08 Thread Chao Sun
+1 (non-binding) On Mon, Mar 8, 2021 at 5:13 PM John Zhuge wrote: > +1 (non-binding) > > On Mon, Mar 8, 2021 at 4:32 PM Holden Karau wrote: > >> +1 (binding) >> >> On Mon, Mar 8, 2021 at 3:56 PM Ryan Blue wrote: >> >>> Hi everyone, I’d like to start a vote for the FunctionCatalog design >>>

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-04 Thread Chao Sun
+1 on Dongjoon's proposal. Great to see this is getting moved forward and thanks everyone for the insightful discussion! On Thu, Mar 4, 2021 at 8:58 AM Ryan Blue wrote: > Okay, great. I'll update the SPIP doc and call a vote in the next day or > two. > > On Thu, Mar 4, 2021 at 8:26 AM Erik

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-12 Thread Chao Sun
This is an important feature which can unblock several other projects including bucket join support for DataSource v2, complete support for enforcing DataSource v2 distribution requirements on the write path, etc. I like Ryan's proposals which look simple and elegant, with nice support on function

Migrating BinaryFileFormat to DSv2?

2020-09-10 Thread Chao Sun
Hi all, As we are moving all data sources to v2, I'm wondering whether it makes sense to do the same for `BinaryFileFormat`, which only has a v1 implementation at the moment. I'm also curious which other data sources haven't been migrated yet. Thanks, Chao

Re: [SparkSql] Casting of Predicate Literals

2020-08-26 Thread Chao Sun
r > data types are a lot trickier and we should analyze them one by one. > > On Tue, Aug 25, 2020 at 7:31 PM Chao Sun wrote: > >> Hi, >> >> So just realized there were already multiple attempts on this issue in >> the past. From the discussion it seems the

Re: [SparkSql] Casting of Predicate Literals

2020-08-25 Thread Chao Sun
to enable pushdown in this case. What do you think? Thanks for your input! Chao [1]: https://github.com/apache/spark/pull/8718 [2]: https://github.com/apache/spark/pull/27648 On Mon, Aug 24, 2020 at 1:57 PM Chao Sun wrote: > > Currently we can't. This is something we should improve, by

Re: [SparkSql] Casting of Predicate Literals

2020-08-24 Thread Chao Sun
> Currently we can't. This is something we should improve, by either pushing down the cast to the data source, or simplifying the predicates to eliminate the cast. Hi all, I've created https://issues.apache.org/jira/browse/SPARK-32694 to track this. Welcome to comment on the JIRA. On Wed, Aug
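One way to picture "simplifying the predicates to eliminate the cast" is the rewrite sketched below. It is an illustration under stated assumptions, not Spark's optimizer rule: it handles only a 16-bit integer column compared with a wider integer literal, i.e. a predicate of the form `CAST(col AS INT) <op> literal`, and it ignores NULL semantics (a NULL column value would make the comparison NULL rather than true or false). `rewrite_comparison` and the tuple encoding of the result are hypothetical.

```python
import operator

SHORT_MIN, SHORT_MAX = -2**15, 2**15 - 1

OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
}


def rewrite_comparison(column, op, literal):
    """Rewrite CAST(column AS INT) <op> literal for a 16-bit integer column.

    Returns ("compare", column, op, literal) when the literal fits the column's
    type, so the cast can be dropped and the predicate pushed down as-is, or
    ("constant", bool) when the literal lies outside the column's range.
    """
    if op not in OPS:
        raise ValueError(f"unsupported operator: {op}")
    if SHORT_MIN <= literal <= SHORT_MAX:
        return ("compare", column, op, literal)
    # Out-of-range literal: every possible column value lies on the same side
    # of it, so evaluating the comparison at one in-range value (0) decides it.
    return ("constant", OPS[op](0, literal))
```

For example, `rewrite_comparison("c", "<", 10)` keeps the comparison without the cast, while a literal such as 100000 can never be reached by a 16-bit column, so the predicate folds to a constant instead.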

subscribe

2018-04-05 Thread Chao Sun