Re: [VOTE] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-15 Thread Yikun Jiang
+1 Regards, Yikun On Wed, Nov 15, 2023 at 4:26 PM huaxin gao wrote: > +1 > > On Tue, Nov 14, 2023 at 10:45 AM Holden Karau > wrote: > >> +1 >> >> On Tue, Nov 14, 2023 at 10:21 AM DB Tsai wrote: >> >>> +1 >>> >>> DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 >>> >>> On Nov 14,

Spark Docker Official image (Java 17) coming soon

2023-11-13 Thread Yikun Jiang
We added Java 17 support for the Apache Spark Docker Official Image at [1]. (Thanks to @vakarisbk for the effort.) Once [2] is merged, the first Java 17 series Docker Official Image will be available. You can also try the GHCR test images: all-in-one image: ghcr.io/apache/spark-docker/spark

Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-26 Thread Yikun Jiang
+1, I believe it is a wise choice to update the EOL policy for the documentation based on the real needs of community users. Regards, Yikun On Tue, Sep 26, 2023 at 1:06 PM Ruifeng Zheng wrote: > +1 > > On Tue, Sep 26, 2023 at 12:51 PM Hyukjin Kwon > wrote: > >> Hi all, >> >> I would like to

Re: Volcano in spark distro

2023-08-22 Thread Yikun Jiang
@Santosh We tried to add this in v3.3.0. [1] The main reasons for not adding it at that time were: 1. Volcano multi-arch was not supported before v1.7.0. (Already upgraded to 1.7.0 since Spark 3.4.0.) 2. Spark on K8s + Volcano was experimental. (We have removed the experimental label [2].) Consider spark

Spark Docker Official Image is now available

2023-07-19 Thread Yikun Jiang
The Spark Docker Official Image is now available: https://hub.docker.com/_/spark
$ docker run -it --rm spark /opt/spark/bin/spark-shell
$ docker run -it --rm spark:python3 /opt/spark/bin/pyspark
$ docker run -it --rm spark:r /opt/spark/bin/sparkR
We had a longer review journey than we

Re: [VOTE][SPIP] PySpark Test Framework

2023-06-24 Thread Yikun Jiang
+1 Regards, Yikun On Fri, Jun 23, 2023 at 6:17 AM L. C. Hsieh wrote: > +1 > > On Thu, Jun 22, 2023 at 3:10 PM Xinrong Meng wrote: > > > > +1 > > > > Thanks for driving that! > > > > On Wed, Jun 21, 2023 at 10:25 PM Ruifeng Zheng > wrote: > >> > >> +1 > >> > >> On Thu, Jun 22, 2023 at 1:11 

Re: [DISCUSS] Unified Apache Spark Docker image tag?

2023-05-09 Thread Yikun Jiang
ible necessity at some point in the > future. > - `deprecation` means no breaking change. > > > Dongjoon > > > > On Tue, May 9, 2023 at 12:01 AM Yikun Jiang wrote: > >> > It seems that your reply (the following) didn't reach out to the >> mailing list co

Re: [DISCUSS] Unified Apache Spark Docker image tag?

2023-05-09 Thread Yikun Jiang
ache/spark:scala`. > > I believe (1) and (2) were our mistakes. We had better recover them ASAP. > For Java questions, I prefer to be consistent with Apache Spark repo's > default. > > Dongjoon. > > On 2023/05/08 08:56:26 Yikun Jiang wrote: > > This is a call for dis

Re: [DISCUSS] Unified Apache Spark Docker image tag?

2023-05-09 Thread Yikun Jiang
olr > https://hub.docker.com/_/zookeeper > > In short, according to the SPIP's `Docker Official Image` definition, new > images should go to (1) only in order to achieve `Support Docker Official > Image for Spark`, shouldn't they? > > Dongjoon. > > On Mon, May 8, 202

[DISCUSS] Unified Apache Spark Docker image tag?

2023-05-08 Thread Yikun Jiang
This is a call for discussion on how we can unify the Apache Spark Docker image tags smoothly. As you might know, there is an apache/spark-docker repo that stores the Dockerfiles and helps publish the Docker images; it is also intended to replace the original

Re: [VOTE] Release Apache Spark 3.4.0 (RC7)

2023-04-10 Thread Yikun Jiang
+1 (non-binding) Also ran the docker image related test (signatures/standalone/k8s) with rc7: https://github.com/apache/spark-docker/pull/32 Regards, Yikun On Tue, Apr 11, 2023 at 4:44 AM Jacek Laskowski wrote: > +1 > > * Built fine with Scala 2.13 > and

Re: [VOTE] Release Spark 3.3.2 (RC1)

2023-02-12 Thread Yikun Jiang
+1, Test 3.3.2-rc1 with spark-docker: - Downloading rc4 tgz, validate the key. - Extract bin and build image - Run K8s IT, standalone test of R/Python/Scala/All image [1] [1] https://github.com/apache/spark-docker/pull/29 Regards, Yikun On Mon, Feb 13, 2023 at 10:25 AM yangjie01 wrote: >

Re: Publish Apache Spark official image under the new rules?

2022-11-10 Thread Yikun Jiang
ero/spark/tags?page=1=last_updated> (docker hub) . Regards, Yikun On Thu, Nov 10, 2022 at 6:27 PM Yikun Jiang wrote: > Hi, all > > Last month the vote of "Support Docker Official Image for Spark > <https://issues.apache.org/jira/browse/SPARK-40513>" passed. > &

Publish Apache Spark official image under the new rules?

2022-11-10 Thread Yikun Jiang
Hi, all Last month the vote of "Support Docker Official Image for Spark" passed. # Progress of SPIP: ## Completed: - A new GitHub repo created: https://github.com/apache/spark-docker - Add "Spark Docker

Re: [VOTE] Release Spark 3.3.1 (RC4)

2022-10-18 Thread Yikun Jiang
+1, the tests also passed with the spark-docker workflow (downloading the rc4 tgz, extracting it, building the image, running the K8s IT) [1] https://github.com/Yikun/spark-docker/pull/9 Regards, Yikun On Wed, Oct 19, 2022 at 8:59 AM Wenchen Fan wrote: > +1 > > On Wed, Oct 19, 2022 at 4:59 AM Chao Sun wrote: > >> +1. Thanks

Re: Enforcing scalafmt on Spark Connect - connector/connect

2022-10-14 Thread Yikun Jiang
+1, I also think it's a good idea. BTW, we might also consider adding some notes about `lint-scala` in [1], just like `lint-python` in pyspark [2]. [1] https://spark.apache.org/developer-tools.html [2] https://spark.apache.org/docs/latest/api/python/development/contributing.html Regards, Yikun

Re: Welcome Yikun Jiang as a Spark committer

2022-10-09 Thread Yikun Jiang
;> >>>>>> Congrats, Yikun! >>>>>> >>>>>> -- >>>>>> Ruifeng Zheng >>>>>> ruife...@foxmail.com >>>>>> >>>>>> <https://wx.mail.qq.com/home/i

Re: [VOTE] SPIP: Support Docker Official Image for Spark

2022-09-21 Thread Yikun Jiang
+1 (non-binding) Regards, Yikun On Thu, Sep 22, 2022 at 9:43 AM Hyukjin Kwon wrote: > Starting with my +1. > > On Thu, 22 Sept 2022 at 10:41, Hyukjin Kwon wrote: > >> Hi all, >> >> I would like to start a vote for SPIP: "Support Docker Official Image >> for Spark" >> >> The goal of the SPIP

Re: [DISCUSS] SPIP: Support Docker Official Image for Spark

2022-09-21 Thread Yikun Jiang
upta > > > On Wed, Sep 21, 2022 at 9:19 PM Xiao Li wrote: > >> +1 >> >> Yikun Jiang wrote on Wed, Sep 21, 2022 at 07:22: >> >>> Thanks for all your inputs! BTW, I also created a JIRA to track related >>> work: https://issues.apache.org/jira/browse/SP

Re: [DISCUSS] SPIP: Support Docker Official Image for Spark

2022-09-21 Thread Yikun Jiang
022 at 11:08 PM Qian SUN wrote: > >> +1. >> It's valuable, can I be involved in this work? >> >> Yikun Jiang wrote on Mon, Sep 19, 2022 at 08:15: >> >>> Hi, all >>> >>> I would like to start the discussion for supporting Docker Official >>>

Re: [DISCUSS] SPIP: Support Docker Official Image for Spark

2022-09-19 Thread Yikun Jiang
Wang wrote on Mon, Sep 19, 2022 at 10:18: >> >>> +1. >>> >>> On Mon, Sep 19, 2022 at 9:44 AM Kent Yao wrote: >>> >>>> +1 >>>> >>>> Gengliang Wang wrote on Mon, Sep 19, 2022 at 09:23: >>>> > >>>> > +1, thanks for

[DISCUSS] SPIP: Support Docker Official Image for Spark

2022-09-18 Thread Yikun Jiang
Hi, all I would like to start the discussion for supporting Docker Official Image for Spark. This SPIP is proposed to add Docker Official Image(DOI) to ensure the Spark Docker images meet the quality standards for Docker images, to provide

Re: Welcoming three new PMC members

2022-08-10 Thread Yikun Jiang
Congratulations! Regards, Yikun On Wed, Aug 10, 2022 at 3:19 PM Maciej wrote: > Congratulations! > > On 8/10/22 08:14, Yi Wu wrote: > > Congrats everyone! > > > > > > > > On Wed, Aug 10, 2022 at 11:33 AM Yuanjian Li > > wrote: > > > > Congrats everyone! > >

Re: Welcome Xinrong Meng as a Spark committer

2022-08-09 Thread Yikun Jiang
Congratulations! Regards, Yikun On Tue, Aug 9, 2022 at 4:13 PM Hyukjin Kwon wrote: > Hi all, > > The Spark PMC recently added Xinrong Meng as a committer on the project. > Xinrong is the major contributor of PySpark especially Pandas API on Spark. > She has guided a lot of new contributors

Re: [SPARK-39515] Improve scheduled jobs in GitHub Actions

2022-07-14 Thread Yikun Jiang
ards, Yikun On Mon, Jun 27, 2022 at 12:05 AM Yikun Jiang wrote: > > There’s one last task to simply caching the Docker image ( > https://issues.apache.org/jira/browse/SPARK-39522). > I will have to be less active for this week and next week because of the > Spark Summit. Would

Re: [VOTE] Release Spark 3.2.2 (RC1)

2022-07-13 Thread Yikun Jiang
+1 (non-binding) Checked out the tag, built from source on Linux aarch64, and ran some basic tests. Regards, Yikun On Wed, Jul 13, 2022 at 5:54 AM Mridul Muralidharan wrote: > > +1 > > Signatures, digests, etc check out fine. > Checked out tag and build/tested with "-Pyarn -Pmesos

Re: Apache Spark 3.2.2 Release?

2022-07-07 Thread Yikun Jiang
+1 (non-binding) Thanks! Regards, Yikun On Thu, Jul 7, 2022 at 1:57 PM Mridul Muralidharan wrote: > +1 > > Thanks for driving this Dongjoon ! > > Regards, > Mridul > > On Thu, Jul 7, 2022 at 12:36 AM Gengliang Wang wrote: > >> +1. >> Thank you, Dongjoon. >> >> On Wed, Jul 6, 2022 at 10:21

Re: [SPARK-39515] Improve scheduled jobs in GitHub Actions

2022-06-26 Thread Yikun Jiang
nnot join that quick meeting. >>>> I have another schedule at South Bay around 7PM and need to leave San >>>> Francisco at least 5PM. >>>> >>>> Dongjoon. >>>> >>>> >>>> On Wed, Jun 22, 2022 at 3:39 AM Hyukjin Kwon >

Re: Re: [VOTE][SPIP] Spark Connect

2022-06-15 Thread Yikun Jiang
+1 (non-binding) A lighter client will definitely help other ecosystems integrate more easily with Spark! Regards, Yikun On Thu, Jun 16, 2022 at 12:54 AM Gengliang Wang wrote: > +1 (non-binding) > > On Wed, Jun 15, 2022 at 9:32 AM Dongjoon Hyun > wrote: > >> +1 >> >> On Wed, Jun 15, 2022 at

Re: [VOTE] Release Spark 3.3.0 (RC5)

2022-06-06 Thread Yikun Jiang
+1 (non-binding)
1. Verify binary checksums and signatures.
2. Check Kubernetes and PySpark documentation.
3. Verify K8s integration test on aarch64.
4. Verify customized scheduler and Volcano integration test with Volcano 1.5.1.
On Mon, Jun 6, 2022 at 3:43 PM Dongjoon Hyun wrote: > +1. > > I
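The checksum half of step 1 above could be sketched in Python as follows. This is only an illustration, not the procedure actually used in the vote: it assumes the checksum is published as a SHA-512 hex digest, the helper names `sha512_hex` and `sha512_matches` are hypothetical, and the demo uses a throwaway file rather than a real release artifact. Signature verification (normally done with GnuPG) is not covered.

```python
import hashlib
import os
import tempfile


def sha512_hex(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sha512_matches(path, expected_hex):
    """Compare a file's digest with an expected value, ignoring case and whitespace."""
    normalized = "".join(expected_hex.split()).lower()
    return sha512_hex(path) == normalized


# Demo on a throwaway file (a hypothetical stand-in for a release tarball).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example artifact bytes")
    demo_path = tmp.name

expected = hashlib.sha512(b"example artifact bytes").hexdigest()
ok = sha512_matches(demo_path, expected)        # digest of the unmodified file
tampered = sha512_matches(demo_path, "0" * 128)  # deliberately wrong digest
os.remove(demo_path)
print(ok, tampered)
```

Reading in chunks keeps memory use flat for large tarballs, and normalizing the expected value tolerates digests that are published in upper case or split across whitespace.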

Re: Introducing "Pandas API on Spark" component in JIRA, and use "PS" PR title component

2022-05-16 Thread Yikun Jiang
It's a pretty good idea, +1. To be clear, in GitHub:
- For each PR title: [SPARK-XXX][PYTHON][PS] The Pandas on Spark PR title (*still keep [PYTHON]* and [PS] newly added)
- For PR labels: newly added: `PANDAS API ON Spark`, still keep: `PYTHON`, `CORE` (*still keep `PYTHON`, `CORE`* and `PANDAS API
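The title convention described above could be checked mechanically. The sketch below is an assumption-laden illustration, not an existing Spark tool: the regular expressions and the helper names `component_tags` and `is_pandas_on_spark_title` are hypothetical, and the sample titles other than the Volcano one are made up.

```python
import re

# One or more leading JIRA tags such as [SPARK-12345], followed by one or more
# bracketed component tags such as [PYTHON][PS], then the title text.
# The exact pattern is an assumption made for this sketch.
TITLE_RE = re.compile(r"^(?:\[SPARK-\d+\])+((?:\[[A-Z0-9 ]+\])+)\s+\S")
TAG_RE = re.compile(r"\[([A-Z0-9 ]+)\]")


def component_tags(title):
    """Return the component tags of a PR title, or [] if it does not follow the convention."""
    match = TITLE_RE.match(title)
    if match is None:
        return []
    return TAG_RE.findall(match.group(1))


def is_pandas_on_spark_title(title):
    """True when the title keeps [PYTHON] and also carries the new [PS] tag."""
    tags = component_tags(title)
    return "PYTHON" in tags and "PS" in tags


print(component_tags("[SPARK-12345][PYTHON][PS] Add a hypothetical feature"))
print(is_pandas_on_spark_title("[SPARK-12345][PYTHON] Fix a hypothetical bug"))
```

Titles with several JIRA ids, such as `[SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1`, are accepted as well and yield only the component tags.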

Re: SIGMOD System Award for Apache Spark

2022-05-14 Thread Yikun Jiang
Awesome! Congrats to the whole community! On Fri, May 13, 2022 at 3:44 AM Matei Zaharia wrote: > Hi all, > > We recently found out that Apache Spark received > the SIGMOD System Award this > year, given by SIGMOD (the ACM’s data management research

Final recap: SPIP: Support Customized Kubernetes Scheduler

2022-03-24 Thread Yikun Jiang
Last month, on Feb 24, 2022, I synced some progress on "Support Customized Kubernetes Scheduler" [1]. Another month has passed, and with the 3.3 release cut there have also been some changes to the SPIP, which I'd like to share here:

Re: Apache Spark 3.3 Release

2022-03-15 Thread Yikun Jiang
> To make our release time more predictable, let us collect the PRs and wait three more days before the branch cut? For SPIP: Support Customized Kubernetes Schedulers: #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1 Three more days are

Re: Apache Spark 3.3 Release

2022-03-04 Thread Yikun Jiang
@Maxim Thanks for driving the release! > Not sure about SPARK-36057 since the current state. @Igor Costa Thanks for your attention. As Dongjoon said, the basic framework capabilities of the custom scheduler are already supported, and we are also planning to mark this as beta in 3.3.0. Of course, we will do more

Re: `running-on-kubernetes` page render bad in v3.2.1(latest) website

2022-03-03 Thread Yikun Jiang
It has already been fixed by: https://github.com/apache/spark/pull/35572 Sorry for the noise. Just ignore my previous email.

`running-on-kubernetes` page render bad in v3.2.1(latest) website

2022-03-02 Thread Yikun Jiang
Looks like the `running-on-kubernetes` page encountered some problems when published. [1] https://spark.apache.org/docs/latest/running-on-kubernetes.html#spark-properties (You can see the bad formatting after #spark-properties) - I also checked the master branch (set up a local env) and also v3.2.0 (

Re: Recap on current status of "SPIP: Support Customized Kubernetes Schedulers"

2022-02-24 Thread Yikun Jiang
@dongjoon-hyun @yangwwei Thanks! @Mich Thanks for testing it. I'm not very familiar with GKE, and I'm also not quite sure whether it differs from upstream K8s in configuration, internal network, or the scheduler implementation itself. As far as I know, different K8s vendors also maintain their own

Recap on current status of "SPIP: Support Customized Kubernetes Schedulers"

2022-02-23 Thread Yikun Jiang
First, many thanks for all your help (Spark/Volcano/YuniKorn communities) in making this SPIP happen! Especially, @dongjoon-hyun @holdenk @william-wang @attilapiros @HyukjinKwon @martin-g @yangwwei @tgravescs The SPIP is nearing the end of this stage. It can be said that it is beta available at the basic

[VOTE][RESULT] SPIP: Support Customized Kubernetes Schedulers Proposal

2022-01-20 Thread Yikun Jiang
Hi all, The vote passed with the following 14 +1 votes and no -1 or +0 votes:
Bowen Li
Weiwei Yang
Chenya Zhang
Chaoran Yu
William Wang
Holden Karau *
bo yang
Mich Talebzadeh
John Zhuge
Thomas Graves *
Kent Yao
Mridul Muralidharan *
Ryan Blue
Yikun Jiang
* = binding
Thank you guys all for your
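The tally in the result above can be reproduced with a few lines of Python. This is just a worked check of the arithmetic: the names and `*` markers are copied from the email as shown, while the `tally` helper is a hypothetical name introduced for this sketch.

```python
# Voter names as listed in the result email; a trailing "*" marks a binding vote.
votes = [
    "Bowen Li", "Weiwei Yang", "Chenya Zhang", "Chaoran Yu", "William Wang",
    "Holden Karau *", "bo yang", "Mich Talebzadeh", "John Zhuge",
    "Thomas Graves *", "Kent Yao", "Mridul Muralidharan *", "Ryan Blue",
    "Yikun Jiang",
]


def tally(entries):
    """Split entries into (binding, non_binding) name lists based on the "*" marker."""
    binding = [e.rstrip(" *") for e in entries if e.rstrip().endswith("*")]
    non_binding = [e for e in entries if not e.rstrip().endswith("*")]
    return binding, non_binding


binding, non_binding = tally(votes)
print(len(votes), len(binding), len(non_binding))
```

With the list as given, this counts 14 votes in total, 3 of them marked binding and 11 non-binding.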

Re: [VOTE][SPIP] Support Customized Kubernetes Schedulers Proposal

2022-01-20 Thread Yikun Jiang
haven't had time to look at the implementation >>>> details is please make sure resource aware scheduling and the stage >>>> level scheduling still work or any caveats are documented. Feel free >>>> to ping me if questions in these areas. >>>> >>

Re: Tries on migrating Spark Linux arm64 Job from Jenkins to GitHub Actions

2022-01-08 Thread Yikun Jiang
elf-hosted action from the spark community for future reference. Regards, Yikun Yikun Jiang wrote on Sun, Jan 9, 2022 at 11:33: > Hi, all > > I tried to verify the possibility of *Linux arm64 scheduled job *using > self-hosted action, below is some progress and I would like to hear > suggestion

Tries on migrating Spark Linux arm64 Job from Jenkins to GitHub Actions

2022-01-08 Thread Yikun Jiang
Hi, all I tried to verify the feasibility of a *Linux arm64 scheduled job* using a self-hosted action. Below is some progress, and I would like to hear your suggestions on the next step (continue or stop). Related JIRA: SPARK-35607 *## About

[VOTE][SPIP] Support Customized Kubernetes Schedulers Proposal

2022-01-05 Thread Yikun Jiang
Hi all, I’d like to start a vote for SPIP: "Support Customized Kubernetes Schedulers Proposal" The SPIP is to support customized Kubernetes schedulers in Spark on Kubernetes. Please also refer to: - Previous discussion in dev mailing list: [DISCUSSION] SPIP: Support Volcano/Alternative

Re: [DISCUSSION] SPIP: Support Volcano/Alternative Schedulers Proposal

2022-01-05 Thread Yikun Jiang
t;>> Here are parts of performance indications in Volcano. >>>>> 1. Scheduler throughput: 1.5k pod/s (default scheduler: 100 Pod/s) >>>>> 2. Spark application performance improved 30%+ with minimal resource >>>>> reservation feature in case of insufficient reso

Re: [DISCUSSION] SPIP: Support Volcano/Alternative Schedulers Proposal

2022-01-04 Thread Yikun Jiang
guidance on how to best contribute? > > > > Best, > > Janak > > > > *From:* Mich Talebzadeh > *Sent:* Tuesday, January 4, 2022 2:12 AM > *To:* Yikun Jiang > *Cc:* dev ; Weiwei Yang ; Holden > Karau ; wang.platf...@gmail.com; Prasad Paravatha < >

Re: [DISCUSSION] SPIP: Support Volcano/Alternative Schedulers Proposal

2022-01-04 Thread Yikun Jiang
ease help to abstract out the things in common > and allow Spark to plug different implementations? I'd be happy to work > with you guys on this issue. > > > On Tue, Nov 30, 2021 at 6:49 PM Yikun Jiang wrote: > >> @Weiwei @Chenya >> >> > Thanks for b

Re: [DISCUSSION] SPIP: Support Volcano/Alternative Schedulers Proposal

2021-12-01 Thread Yikun Jiang
allow Spark to plug different implementations? I'd be happy to work > with you guys on this issue. > > > On Tue, Nov 30, 2021 at 6:49 PM Yikun Jiang wrote: > >> @Weiwei @Chenya >> >> > Thanks for bringing this up. This is quite interesting, we definitely >>

Re: [DISCUSSION] SPIP: Support Volcano/Alternative Schedulers Proposal

2021-11-30 Thread Yikun Jiang
extra cost and potential inconsistency of maintaining >>>> different layers of resource strategies. One interesting topic we hope to >>>> discuss more about is dynamic allocation, which would benefit from native >>>> coordination between Spark and r

[DISCUSSION] SPIP: Support Volcano/Alternative Schedulers Proposal

2021-11-30 Thread Yikun Jiang
Hey everyone, I'd like to start a discussion on "Support Volcano/Alternative Schedulers Proposal". This SPIP proposes making Spark K8s schedulers provide more YARN-like features (such as queues and minimum resources before scheduling jobs) that many folks want on Kubernetes. The goal of

Re: Spark on Kubernetes scheduler variety

2021-06-29 Thread Yikun Jiang
Use it at your own risk. Any and all responsibility for any > loss, damage or destruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from >

Re: Spark on Kubernetes scheduler variety

2021-06-25 Thread Yikun Jiang
-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4 And Regards, Yikun Yikun Jiang wrote on Fri, Jun 25, 2021 at 11:53 AM: > Hi, folks. > > As @Klaus mentioned, We have some work on Spark on k8s with volcano native > support. Also, there were also some production deployment validation from &g

Re: Spark on Kubernetes scheduler variety

2021-06-24 Thread Yikun Jiang
Hi, folks. As @Klaus mentioned, we have done some work on Spark on K8s with native Volcano support. There has also been some production deployment validation by our partners in China, such as JingDong, XiaoHongShu, and VIPshop. We are also preparing to propose an initial design and POC[3] on a shared

Re: UPDATE: Apache Spark 3.2 Release

2021-06-17 Thread Yikun Jiang
> - Apache Hadoop 3.3.2 becomes the default Hadoop profile for Apache Spark 3.2 via SPARK-29250 today. We are observing big improvements in S3 use cases. Please try it and share your experience.
It should be Apache Hadoop 3.3.1 [1]. :) Note that Apache Hadoop 3.3.0 is the first Hadoop release

Re: [DISCUSS] Multiple columns adding/replacing support in PySpark DataFrame API

2021-05-04 Thread Yikun Jiang
t; > withColumn". > > +1 for exposing the *withColumns* > > Regards > Saurabh Chawla > > On Thu, Apr 22, 2021 at 1:03 PM Yikun Jiang wrote: > >> Hi, all >> >> *Background:* >> >> Currently, there is a withColumns >> <https://g

[DISCUSS] Multiple columns adding/replacing support in PySpark DataFrame API

2021-04-22 Thread Yikun Jiang
Hi, all *Background:* Currently, there is a withColumns [1] method to help users/devs add/replace multiple columns at once. But this method is private

Re: please read: current state and the future of the apache spark build system

2021-04-15 Thread Yikun Jiang
Many thanks for your work on infra, @Shane. In particular, we (@huangtianhua and I) got a great deal of help from you when making the Arm CI work. [1] > prepare jenkins worker ansible configs and stick in the spark repo https://github.com/apache/spark/pull/32178 I took a quick glance at it; it seems it

Re: K8s Integration test is unable to run because of the unavailable libs

2021-03-22 Thread Yikun Jiang
Hey, Yi Wu. Looks like it's just an apt installation problem: we should run apt update to refresh the local package cache list before we install "gnupg". I opened an issue on JIRA [1] and tried to fix it in [2]; hope this helps. [1] https://issues.apache.org/jira/browse/SPARK-34820 [2]

Re: [VOTE] Release Spark 3.1.1 (RC2)

2021-02-09 Thread Yikun Jiang
+1, Tested build and basic features on an aarch64 (ARM64) environment. Regards, Yikun Yuming Wang wrote on Tue, Feb 9, 2021 at 8:24 PM: > +1. Tested a batch of queries with YARN client mode. > > On Tue, Feb 9, 2021 at 2:57 PM 郑瑞峰 wrote: > >> +1 (non-binding) >> >> Thank you, Hyukjin >> >> >> --

Re: [DISCUSS] Add RocksDB StateStore

2021-02-07 Thread Yikun Jiang
I worked on RocksDB multi-arch support and version upgrades in Kafka/Storm/Flink [1][2][3]. To keep these issues from happening again in Spark, I want to give some input here about RocksDB version selection from a multi-arch support point of view. Hope it helps. RocksDB adds Arm64 support