Re: [DISCUSS] Add RocksDB StateStore

2021-02-09 Thread Hyukjin Kwon
I'm good with this too. On Tue, Feb 9, 2021 at 4:16 PM DB Tsai wrote: > +1 to add it as an external module so people can test it out and give > feedback easier. > > On Mon, Feb 8, 2021 at 10:22 PM Gabor Somogyi > wrote: > > > > +1 adding it any way. > > > > On Mon, 8 Feb 2021, 21:54 Holden Karau, wro

Re: [DISCUSS] Add RocksDB StateStore

2021-02-09 Thread Hyukjin Kwon
To clarify, I mean I am okay with adding it as an external module :-) On Tue, Feb 9, 2021 at 11:10 PM Hyukjin Kwon wrote: > I'm good with this too. > > On Tue, Feb 9, 2021 at 4:16 PM DB Tsai wrote: > >> +1 to add it as an external module so people can test it out and

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-09 Thread Hyukjin Kwon
Just dropping a few lines. I remember that one of the goals in DSv2 is to correct the mistakes we made in the current Spark code. There would not be much point if we just follow and mimic what Spark currently does. It might just end up with another copy of Spark APIs, e.g. Expression

Re: Apache Spark 3.0.2 Release ?

2021-02-12 Thread Hyukjin Kwon
Yeah, +1 too. On Sat, Feb 13, 2021 at 4:49 AM Dongjoon Hyun wrote: > Thank you, Sean! > > On Fri, Feb 12, 2021 at 11:41 AM Sean Owen wrote: > >> Sounds like a fine time to me, sure. >> >> On Fri, Feb 12, 2021 at 1:39 PM Dongjoon Hyun >> wrote: >> >>> Hi, All. >>> >>> As of today, `branch-3.0` has 307

Re: [VOTE] Release Spark 3.1.1 (RC2)

2021-02-12 Thread Hyukjin Kwon
es >> >> I keep getting test failures >> with org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite: removing this >> suite gets the build through though - does anyone have suggestions on how >> to fix it ? >> Perhaps a local problem at my end ? >> >> >>

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-12 Thread Hyukjin Kwon
> hard-to-debug behavior due to use of reflective method signature > >>> searches. > >>> The merits on both sides can hopefully be more properly examined with > >>> code, > >>> so I look forward to seeing an implementation of Wenchen's ideas

Re: [DISCUSS] assignee practice on committers+ (possible issue on preemption)

2021-02-15 Thread Hyukjin Kwon
I remember I raised a similar issue a long time ago in the dev mailing list. I agree that setting no assignee makes sense in most of the cases, and also think we share similar thoughts about the assignee on umbrella JIRAs, followup tasks, the case when it's clear with a design doc, etc. It makes me

Re: [VOTE] Release Spark 3.0.2 (RC1)

2021-02-16 Thread Hyukjin Kwon
+1 On Tue, Feb 16, 2021 at 5:10 PM Prashant Sharma wrote: > +1 > > On Tue, Feb 16, 2021 at 1:22 PM Dongjoon Hyun > wrote: > >> Please vote on releasing the following candidate as Apache Spark version >> 3.0.2. >> >> The vote is open until February 19th 9AM (PST) and passes if a majority >> +1 PMC vote

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-16 Thread Hyukjin Kwon
ete support for >>> enforcing DataSource v2 distribution requirements on the write path, etc. I >>> like Ryan's proposals which look simple and elegant, with nice support on >>> function overloading and variadic arguments. On the other hand, I think >>>

Re: Please use Jekyll via "bundle exec" from now on

2021-02-18 Thread Hyukjin Kwon
Thanks Attila for fixing and sharing this. On Thu, Feb 18, 2021 at 6:17 PM Attila Zsolt Piros wrote: > Hello everybody, > > To pin the exact same version of Jekyll across all the contributors, Ruby > Bundler is introduced. > This way the differences in the generated documentation, which were caused >

Re: [VOTE] Release Spark 3.1.1 (RC2)

2021-02-19 Thread Hyukjin Kwon
77920 > > Pozdrawiam, > Jacek Laskowski > > https://about.me/JacekLaskowski > "The Internals Of" Online Books <https://books.japila.pl/> > Follow me on https://twitter.com/jaceklaskowski > > <https://twitter.com/jaceklaskowski> > > > On Sat, Fe

[VOTE] Release Spark 3.1.1 (RC3)

2021-02-21 Thread Hyukjin Kwon
Please vote on releasing the following candidate as Apache Spark version 3.1.1. The vote is open until February 24th 11PM PST and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 3.1.1 [ ] -1 Do not release this package because
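The passing rule quoted in the vote email above can be sketched as a small tally function. This is only an illustration under stated assumptions: "majority" is read here as more binding +1 votes than binding -1 votes, the `Vote` type and `vote_passes` name are hypothetical, and the voter names in the usage lines are made up.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vote:
    """One ballot on a release vote (hypothetical type, for illustration only)."""
    voter: str
    value: int      # +1, 0, or -1
    binding: bool   # True when cast by a PMC member


def vote_passes(votes: Iterable[Vote], min_plus_ones: int = 3) -> bool:
    """Return True when binding +1s reach the minimum and outnumber binding -1s.

    Assumption: only binding (PMC) votes count toward the result;
    non-binding votes are advisory and ignored by this check.
    """
    binding = [v for v in votes if v.binding]
    plus = sum(1 for v in binding if v.value == 1)
    minus = sum(1 for v in binding if v.value == -1)
    return plus >= min_plus_ones and plus > minus


# Usage with made-up voters: three binding +1s and one non-binding -1.
ballots = [
    Vote("pmc-a", 1, True),
    Vote("pmc-b", 1, True),
    Vote("pmc-c", 1, True),
    Vote("contributor-d", -1, False),
]
print(vote_passes(ballots))
```

Keeping the binding flag on each ballot, rather than keeping two separate lists, makes it easy to report both totals (all +1s and binding +1s) the way result emails usually do.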

Re: [VOTE] Release Spark 3.1.1 (RC3)

2021-02-21 Thread Hyukjin Kwon
Starting with my +1 (binding). On Mon, Feb 22, 2021 at 3:56 PM Hyukjin Kwon wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.1.1. > > The vote is open until February 24th 11PM PST and passes if a majority +1 > PMC votes are cast, with a minimu

Re: [VOTE] Release Spark 3.1.1 (RC3)

2021-02-24 Thread Hyukjin Kwon
Mridul > > > On Mon, Feb 22, 2021 at 12:57 AM Hyukjin Kwon wrote: > >> Please vote on releasing the following candidate as Apache Spark version >> 3.1.1. >> >> The vote is open until February 24th 11PM PST and passes if a majority +1 >> PMC votes are cast

Re: [VOTE] Release Spark 3.1.1 (RC3)

2021-02-25 Thread Hyukjin Kwon
>> >>>>> -1 Could we extend the voting deadline? >>>>> >>>>> A few TPC-DS queries (q17, q18, q39a, q39b) are returning different >>>>> results between Spark 3.0 and Spark 3.1. We need a few more days to >>>>> un

[VOTE][RESULT] Release Spark 3.1.1 (RC3)

2021-02-26 Thread Hyukjin Kwon
The vote passes with 15 +1s (6 binding +1s). (* = binding) +1 - Hyukjin Kwon * - Jungtaek Lim - Herman van Hovell * - Sean Owen * - Yuming Wang - Gengliang Wang - John Zhuge - Takeshi Yamamuro - Cheng Su - Maxim Gekk - Gabor Somogyi - Dongjoon Hyun * - Terry Kim - Mridul Muralidharan * - Xiao Li

Re: Apache Spark 3.2 Expectation

2021-02-26 Thread Hyukjin Kwon
I have an idea that I'll send an email to discuss next week or the week after next. I did not have enough bandwidth to drive both together at the same time. I would appreciate it if we had some more time for 3.2. In addition, it would also be great if we follow the schedule and catch potential bl

Please take a look at the draft of the Spark 3.1.1 release notes

2021-02-27 Thread Hyukjin Kwon
Hi all, I am preparing to publish and announce Spark 3.1.1. This is the draft of the release note, and I plan to edit a bit more and use it as the final release note. Please take a look and let me know if I missed any major changes or something. https://docs.google.com/document/d/1x6zzgRsZ4u1DgUh

Re: Please take a look at the draft of the Spark 3.1.1 release notes

2021-03-01 Thread Hyukjin Kwon
or sharing, Hyukjin! > > Dongjoon. > > On Sat, Feb 27, 2021 at 12:36 AM Hyukjin Kwon wrote: > > Hi all, > > I am preparing to publish and announce Spark 3.1.1. > This is the draft of the release note, and I plan to edit a bit more and > use it as the final release note

Re: Please take a look at the draft of the Spark 3.1.1 release notes

2021-03-01 Thread Hyukjin Kwon
section" ? Currently, they refer to " > https://spark.apache.org/docs/3.0.0/.. <https://spark.apache.org/docs>.". > I think that they should refer to "https://spark.apache.org/docs/3.1.1/.. > <https://spark.apache.org/docs>." > > Regards, >

[ANNOUNCE] Announcing Apache Spark 3.1.1

2021-03-02 Thread Hyukjin Kwon
We are excited to announce Spark 3.1.1 today. Apache Spark 3.1.1 is the second release of the 3.x line. This release adds Python type annotations and Python dependency management support as part of Project Zen. Other major updates include improved ANSI SQL compliance support, history server suppor

Re: Apache Spark 2.4.8 (and EOL of 2.4)

2021-03-03 Thread Hyukjin Kwon
Yeah, I would prefer to have a 2.4.8 release as an EOL too. I don't mind having 2.4.9 as EOL too if that's preferred by more people. On Thu, Mar 4, 2021 at 4:01 AM Sean Owen wrote: > Sure, I'm even arguing that 2.4.8 could possibly be the final release. No > objection of course to continuing to backp

Re: [ANNOUNCE] Announcing Apache Spark 3.1.1

2021-03-03 Thread Hyukjin Kwon
es / >>>>>> Greenplum >>>>>> with Spark SQL and DataFrames, 10~100x faster.* >>>>>> *spark-func-extras <https://github.com/yaooqinn/spark-func-extras>A >>>>>> library that brings excellent and useful functions fro

Re: Apache Spark 2.4.8 (and EOL of 2.4)

2021-03-04 Thread Hyukjin Kwon
eleasing 2.4.8 and thanks, Liang-chi, for volunteering. > > Btw, anyone roughly know how many v2.4 users still are based on some > stats > > (e.g., # of v2.4.7 downloads from the official repos)? > > Most users have started using v3.x? > > > > On Thu, Mar 4, 2021 at 8:

Re: Apache Spark 3.2 Expectation

2021-03-11 Thread Hyukjin Kwon
Just for an update, I will send a discussion email about my idea late this week or early next week. On Thu, Mar 11, 2021 at 7:00 PM Wenchen Fan wrote: > There are many projects going on right now, such as new DS v2 APIs, ANSI > interval types, join improvement, disaggregated shuffle, etc. I don't > thi

Re: [VOTE] SPIP: Add FunctionCatalog

2021-03-11 Thread Hyukjin Kwon
+1 On Fri, Mar 12, 2021 at 2:54 PM Jungtaek Lim wrote: > +1 (non-binding) Excellent description on SPIP doc! Thanks for the amazing > effort! > > On Wed, Mar 10, 2021 at 3:19 AM Liang-Chi Hsieh wrote: > >> >> +1 (non-binding). >> >> Thanks for the work! >> >> >> Erik Krogen wrote >> > +1 from me (non-

[DISCUSS] Support pandas API layer on PySpark

2021-03-13 Thread Hyukjin Kwon
Hi all, I would like to start the discussion on supporting pandas API layer on Spark. If we have a general consensus on having it in PySpark, I will initiate and drive an SPIP with a detailed explanation about the implementation’s overview and structure. I would appreciate it if I can know whe

Re: [DISCUSS] Support pandas API layer on PySpark

2021-03-14 Thread Hyukjin Kwon
maintained > better in Spark but it's well maintained. It adds some overhead to > maintaining Spark conversely. On the upside it makes it a little more > discoverable. Are there more 'synergies'? > > On Sat, Mar 13, 2021, 7:57 PM Hyukjin Kwon wrote: > >> Hi

Re: [DISCUSS] Support pandas API layer on PySpark

2021-03-16 Thread Hyukjin Kwon
Thank you guys for all your feedback. I will start working on SPIP with Koalas team. I would expect the SPIP can be sent late this week or early next week. I inlined and answered the questions unanswered as below: Is the community developing the pandas API layer for Spark interested in being par

Re: [DISCUSS] Support pandas API layer on PySpark

2021-03-17 Thread Hyukjin Kwon
osed integration >> would provide the hooks via Pandas' ExtensionArray interface to allow >> Spark to performantly interchange jagged/ragged lists to/from python >> UDFs. >> >> Cheers >> Andrew >> >> On Tue, Mar 16, 2021 at 8:15 PM Hyukjin K

Re: [DISCUSS] Support pandas API layer on PySpark

2021-03-17 Thread Hyukjin Kwon
ad. > > Especially when some people here are thinking about making it the > default/replacing the regular API I would strongly suggest defaulting to an > indexing mechanism that is not changing the query plan. > > Best, > Georg > > Am Mi., 17. März 2021 um 12:13 Uhr schrieb

Re: [DISCUSS] Support pandas API layer on PySpark

2021-03-17 Thread Hyukjin Kwon
Thanks Nicholas for the pointer :-). On Thu, 18 Mar 2021, 00:11 Nicholas Chammas, wrote: > On Tue, Mar 16, 2021 at 9:15 PM Hyukjin Kwon wrote: > >> I am currently thinking we will have to convert the Koalas tests to use >> unittests to match with PySpark for now. >

[VOTE] SPIP: Support pandas API layer on PySpark

2021-03-26 Thread Hyukjin Kwon
Hi all, I’d like to start a vote for SPIP: Support pandas API layer on PySpark. The proposal is to embrace Koalas in PySpark to have the pandas API layer on PySpark. Please also refer to: - Previous discussion in dev mailing list: [DISCUSS] Support pandas API layer on PySpark

Re: [VOTE] SPIP: Support pandas API layer on PySpark

2021-03-26 Thread Hyukjin Kwon
I'll start with my +1 (binding) On Fri, 26 Mar 2021, 23:52 Hyukjin Kwon, wrote: > Hi all, > > I’d like to start a vote for SPIP: Support pandas API layer on PySpark. > > The proposal is to embrace Koalas in PySpark to have the pandas API layer > on PySpark. >

Re: Welcoming six new Apache Spark committers

2021-03-26 Thread Hyukjin Kwon
Congrats guys. Well deserved! On Sat, 27 Mar 2021, 05:28 Matei Zaharia, wrote: > Hi all, > > The Spark PMC recently voted to add several new committers. Please join me > in welcoming them to their new role! Our new committers are: > > - Maciej Szymkiewicz (contributor to PySpark) > - Max Gekk (c

Re: [VOTE] SPIP: Support pandas API layer on PySpark

2021-03-29 Thread Hyukjin Kwon
The vote passed with the following 20 +1 votes and no -1 or +0 votes: Hyukjin Kwon* Dongjoon Hyun* Maciej Szymkiewicz Bryan Cutler Reynold Xin* Liang-Chi Hsieh Takeshi Yamamuro Xiao Li* Mridul Muralidharan* Gengliang Wang Matei Zaharia* Maxim Gekk 郑瑞峰 (Ruifeng Zheng) Denny Lee Kousuke Saruta

Re: Apache Spark 2.4.8 (and EOL of 2.4)

2021-04-04 Thread Hyukjin Kwon
I would +1 for just going ahead. That looks flaky to me too. Thanks Liang-Chi for driving this! On Sun, 4 Apr 2021, 18:17 Liang-Chi Hsieh, wrote: > Hi devs, > > Currently no open issues or ongoing issues targeting 2.4. > > On QA test dashboard, only spark-branch-2.4-test-sbt-hadoop-2.6 is in red >

Re: Support User Defined Types in pandas_udf for Spark's own Python API

2021-04-06 Thread Hyukjin Kwon
't remove the existing APIs in general)? > > > Fourthly, PySpark is still not Pythonic enough. For example, I hear > complaints such as "why does > > PySpark follow pascalCase?" or "PySpark APIs are difficult to learn", > and APIs are very difficult to c

Increase the number of parallel jobs in GitHub Actions at ASF organization level

2021-04-06 Thread Hyukjin Kwon
Hi all, I am an Apache Spark PMC member, and would like to know the future plan about GitHub Actions in ASF. Please also see the INFRA ticket I filed: https://issues.apache.org/jira/browse/INFRA-21646. I am aware of the limited GitHub Actions resources that are shared across all projects in ASF, and man

Re: Increase the number of parallel jobs in GitHub Actions at ASF organization level

2021-04-07 Thread Hyukjin Kwon
suffer from the lack of resources. I appreciate the resources provided to us but that does not resolve the issue of the development being slowed down. On Wed, Apr 7, 2021 at 5:52 PM Greg Stein wrote: > On Wed, Apr 7, 2021 at 12:25 AM Hyukjin Kwon wrote: > >> Hi all, >> >>

Re: Increase the number of parallel jobs in GitHub Actions at ASF organization level

2021-04-07 Thread Hyukjin Kwon
anisation shares all resources across the projects. On Wed, Apr 7, 2021 at 10:04 PM Martin Grigorov wrote: > > > On Wed, Apr 7, 2021 at 3:41 PM Hyukjin Kwon wrote: > >> Hi Greg, >> >> I raised this thread to figure out a way that we can work together to >> resolve this

Re: [DISCUSS] Build error message guideline

2021-04-07 Thread Hyukjin Kwon
LGTM (I took a look, and had some offline discussions w/ some corrections before it came out) On Thu, Apr 8, 2021 at 5:28 AM Karen wrote: > Hi all, > > As discussed in SPIP: Standardize Exception Messages in Spark ( > https://docs.google.com/document/d/1XGj1o3xAFh8BA7RCn3DtwIPC6--hIFOaNUNSlpaOIZs/edit?

Re: Increase the number of parallel jobs in GitHub Actions at ASF organization level

2021-04-07 Thread Hyukjin Kwon
if we can make it a business: we allow > individual projects to sign deals with Github to get dedicated resources. > It's a bit wasteful to ask every project to set up its own dev ops, > using Github Action is more convenient. Maybe we should raise it to Github? > > On Wed, Apr 7, 2021 a

Re: [VOTE] Release Spark 2.4.8 (RC2)

2021-04-12 Thread Hyukjin Kwon
+1 On Tue, 13 Apr 2021, 02:58 Sean Owen, wrote: > +1 same result as last RC for me. > > On Mon, Apr 12, 2021, 12:53 AM Liang-Chi Hsieh wrote: > >> Please vote on releasing the following candidate as Apache Spark version >> 2.4.8. >> >> The vote is open until Apr 15th at 9AM PST and passes if a

Re: [DISCUSS] Build error message guideline

2021-04-13 Thread Hyukjin Kwon
are these > guidelines with the wider community. A good landing page for contributors > could be https://spark.apache.org/contributing.html. What do you think? > > Thank you, > > Karen Feng > > On Wed, Apr 7, 2021 at 8:19 PM Hyukjin Kwon wrote: > >> LGTM (I took a look

[PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-13 Thread Hyukjin Kwon
Hi all, After https://github.com/apache/spark/pull/32092 merged, now we run the GitHub Actions workflows in your forked repository. In short, please see this example HyukjinKwon#34 1. You create a PR and your repository triggers the workflow. You

Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-13 Thread Hyukjin Kwon
repository). Please check the build notified by github-actions bot before merging it. There will be follow-up work to reflect the status of the forked repository's build in the status of the PR. On Wed, Apr 14, 2021 at 1:42 PM Hyukjin Kwon wrote: > Hi all, > > After https://github.com/apach

Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-14 Thread Hyukjin Kwon
t;>> On Wed, Apr 14, 2021 at 1:00 PM Gengliang Wang wrote: >>> >>>> Thanks for the amazing work, Hyukjin! >>>> I created a PR for trial and it looks well so far: >>>> https://github.com/apache/spark/pull/32158 >>>> >>>> On

Re: please read: current state and the future of the apache spark build system

2021-04-14 Thread Hyukjin Kwon
Thanks Shane!! On Thu, 15 Apr 2021, 09:03 shane knapp ☠, wrote: > medium term (in 6 months): >> * prepare jenkins worker ansible configs and stick in the spark repo >> - nothing fancy, but enough to config ubuntu workers >> - could be used to create docker containers for testing in >> THE CL

Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-14 Thread Hyukjin Kwon
ilapiros/spark/runs/2344911058?check_suite_focus=true >> (some other failures noticed) >> >> >> Bests, >> >> Kent >> >> On Wed, Apr 14, 2021 at 11:34 PM Dongjoon Hyun wrote: >> > >> > Thank you again, Hyukjin. >> > >> >

Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-14 Thread Hyukjin Kwon
The fix will be straightforward. In the GitHub Actions workflow, we can either: - remove the fast-forward option and see if it works - or git rebase before merging the branch On Thu, Apr 15, 2021 at 11:00 AM Hyukjin Kwon wrote: > I think it works mostly correctly as Dongjoon investigated and shared > (Th

Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-14 Thread Hyukjin Kwon
The issue is fixed now. Please keep monitoring this. Thank you all! The Spark community is super active and cooperative! On Thu, Apr 15, 2021 at 11:01 AM Hyukjin Kwon wrote: > The fix will be straightforward. We can either, in Github Actions > workflow,: > - remove fast forward option and

Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-14 Thread Hyukjin Kwon
aooqinn/spark-func-extras> A library that > brings useful functions from various modern database management systems to > Apache > Spark <http://spark.apache.org/>. > > > > On 04/15/2021 12:09, Hyukjin Kwon > wrote: > > The issue is fixed now. Please keep m

Re: [VOTE] Release Spark 2.4.8 (RC3)

2021-04-28 Thread Hyukjin Kwon
+1 On Thu, 29 Apr 2021, 07:08 Sean Owen, wrote: > +1 from me too, same result as last time. > > On Wed, Apr 28, 2021 at 11:33 AM Liang-Chi Hsieh wrote: > >> Please vote on releasing the following candidate as Apache Spark version >> 2.4.8. >> >> The vote is open until May 4th at 9AM PST and pas

Re: [VOTE] Release Spark 2.4.8 (RC4)

2021-05-10 Thread Hyukjin Kwon
+1 On Mon, May 10, 2021 at 4:45 PM John Zhuge wrote: > No, just try to build a Java project with Maven RC repo. > > Validated checksum and signature; ran RAT checks; built the source and ran > unit tests. > > +1 (non-binding) > > On Sun, May 9, 2021 at 11:10 PM Liang-Chi Hsieh wrote: > >> Yea, I don't

Re: Apache Spark 3.1.2 Release?

2021-05-17 Thread Hyukjin Kwon
+1 thanks for driving me On Tue, 18 May 2021, 09:33 Holden Karau, wrote: > +1 and thanks for volunteering to be the RM :) > > On Mon, May 17, 2021 at 4:09 PM Takeshi Yamamuro > wrote: > >> Thank you, Dongjoon~ sgtm, too. >> >> On Tue, May 18, 2021 at 7:34 AM Cheng Su wrote: >> >>> +1 for a new

Re: [ANNOUNCE] Apache Spark 2.4.8 released

2021-05-17 Thread Hyukjin Kwon
Yay! On Tue, May 18, 2021 at 12:57 PM Liang-Chi Hsieh wrote: > We are happy to announce the availability of Spark 2.4.8! > > Spark 2.4.8 is a maintenance release containing stability, correctness, and > security fixes. > This release is based on the branch-2.4 maintenance branch of Spark. We > strongly

Re: Resolves too old JIRAs as incomplete

2021-05-19 Thread Hyukjin Kwon
Yeah, I wanted to discuss this. I agree since 2.4.x became EOL. On Thu, May 20, 2021 at 10:54 AM Sean Owen wrote: > I agree. Such old JIRAs are 99% obsolete. If anyone objects to a > particular issue being closed, they can comment and we can reopen. It's a > very reversible thing. There is value in keep

Re: Resolves too old JIRAs as incomplete

2021-05-24 Thread Hyukjin Kwon
Awesome, thanks Takeshi! On Tue, May 25, 2021 at 10:59 AM Takeshi Yamamuro wrote: > FYI: > > Thank you for all the comments. > I closed 754 tickets in bulk a few minutes ago. > Please let me know if there is any problem. > > Bests, > Takeshi > > On Fri, May 21, 2021 at 10:29 AM Kent Yao wrote: > >> +1

Re: [VOTE] Release Spark 3.1.2 (RC1)

2021-05-25 Thread Hyukjin Kwon
+1 On Wed, May 26, 2021 at 9:00 AM Cheng Su wrote: > +1 (non-binding) > > > > Checked the related commits in commit history manually. > > > > Thanks! > > Cheng Su > > > > *From: *Takeshi Yamamuro > *Date: *Tuesday, May 25, 2021 at 4:47 PM > *To: *Dongjoon Hyun , dev > *Subject: *Re: [VOTE] Release Sp

Re: [ANNOUNCE] Apache Spark 3.1.2 released

2021-06-01 Thread Hyukjin Kwon
Awesome! On Wed, Jun 2, 2021 at 9:59 AM Dongjoon Hyun wrote: > We are happy to announce the availability of Spark 3.1.2! > > Spark 3.1.2 is a maintenance release containing stability fixes. This > release is based on the branch-3.1 maintenance branch of Spark. We strongly > recommend all 3.1 users to u

Re: Apache Spark 3.0.3 Release?

2021-06-08 Thread Hyukjin Kwon
Yeah, +1 On Wed, Jun 9, 2021 at 12:06 PM Yi Wu wrote: > Hi, All. > > Since Apache Spark 3.0.2 tag creation (Feb 16), > new 119 patches (92 issues > > resolved) arrived at branch-3.0. > > Shall we make a new release, Apache Spark 3.0.3, a

Re: Apache Spark 3.2 Expectation

2021-06-15 Thread Hyukjin Kwon
e September 2. >> >> I'm updating the release dates in >> https://github.com/apache/spark-website/pull/331 >> >> Thanks, >> Wenchen >> >> On Thu, Mar 11, 2021 at 11:17 PM Dongjoon Hyun >> wrote: >> >>> Thank you, Xiao, Wenc

Re: Apache Spark 3.2 Expectation

2021-06-16 Thread Hyukjin Kwon
>>>>> >> StateStore and session window, we're working on them and expect to >>>>> have >>>>> >> them >>>>> >> in the new release. >>>>> >> >>>>> >> So I propose to postpone the branch cut date.

Re: Apache Spark 3.2 Expectation

2021-06-16 Thread Hyukjin Kwon
*GA -> QA On Thu, 17 Jun 2021, 15:16 Hyukjin Kwon, wrote: > I think we would make sure treating these items in the list as exceptions > from the code freeze, and discourage to push new APIs and features though. > > GA period ideally we should focus on bug fixes and polishing.

Re: [VOTE] Release Spark 3.0.3 (RC1)

2021-06-20 Thread Hyukjin Kwon
+1 On Mon, Jun 21, 2021 at 2:19 PM Dongjoon Hyun wrote: > +1 > > Thank you, Yi. > > Bests, > Dongjoon. > > > On Sat, Jun 19, 2021 at 6:57 PM Yuming Wang wrote: > >> +1 >> >> Tested a batch of production query with Thrift Server. >> >> On Sat, Jun 19, 2021 at 3:04 PM Mridul Muralidharan >> wrote: >> >

Flaky build in GitHub Actions

2021-07-20 Thread Hyukjin Kwon
Hi all, Looks like there's something going on in the machines in GitHub Actions. The build is now very flaky and keeps dying with symptoms like I guess out-of-memory (?). I will try to take a closer look tomorrow but it would be great if you guys find some time to take a look into it 🙏

Re: Flaky build in GitHub Actions

2021-07-20 Thread Hyukjin Kwon
I filed a ticket at GitHub. I will share more details when I get a response from them. On Tue, Jul 20, 2021 at 7:30 PM Hyukjin Kwon wrote: > Hi all, > > Looks like there's something going on in the machines in GitHub Actions. > The build is now very flaky and keeps dying with symp

Re: Flaky build in GitHub Actions

2021-07-21 Thread Hyukjin Kwon
ntime, I'm assuming if things pass Jenkins we are OK with merging yes? > > On Wed, Jul 21, 2021 at 10:03 AM Dongjoon Hyun > wrote: > >> Thank you, Hyukjin! >> >> Dongjoon. >> >> On Tue, Jul 20, 2021 at 8:53 PM Hyukjin Kwon wrote: >> >>>

Re: Flaky build in GitHub Actions

2021-07-25 Thread Hyukjin Kwon
ecific to Apache Spark repo. On Thu, Jul 22, 2021 at 9:40 AM Hyukjin Kwon wrote: > FYI, @Liang-Chi Hsieh is trying to control the memory > in the test base at https://github.com/apache/spark/pull/33447 which > looks almost promising now. > While I don't object to merge things, would n

Re: ASF board report draft for August

2021-08-09 Thread Hyukjin Kwon
There is an SPIP passed and ready for Spark 3.2: pandas API on Spark: - JIRA: SPIP: Support pandas API layer on PySpark ( https://issues.apache.org/jira/browse/SPARK-34849) - Vote: [VOTE] SPIP: Support pandas API layer on PySpark ( https://www.mail-archive.com/dev@spark.apache.org/msg27605.html) -

Re: ASF board report draft for August

2021-08-09 Thread Hyukjin Kwon
> Are you referring to what version of Koala project? 1.8.1? Yes, the latest version 1.8.1. On Tue, Aug 10, 2021 at 11:07 AM Igor Costa wrote: > Hi Matei, nice update > > > Just one question, when you mention “ We are working on Spark 3.2.0 as > our next release, with a release candidate likely to com

Re: Time to start publishing Spark Docker Images?

2021-08-12 Thread Hyukjin Kwon
+1, I think we generally agreed upon having it. Thanks Holden for the heads-up and driving this. +@Dongjoon Hyun FYI On Thu, Jul 22, 2021 at 12:22 PM Kent Yao wrote: > +1 > > Bests, > > *Kent Yao * > @ Data Science Center, Hangzhou Research Institute, NetEase Corp. > *a spark enthusiast* > *kyuubi

Re: -1s on committed but not released code?

2021-08-19 Thread Hyukjin Kwon
Yeah, I think we can discuss and revert it (or fix it) per the veto set. Often problems are found later after code is merged. On Fri, Aug 20, 2021 at 4:08 AM Mridul Muralidharan wrote: > Hi Holden, > > In the past, I have seen discussions on the merged pr to thrash out the > details. > Usually it

Re: CRAN package SparkR

2021-08-31 Thread Hyukjin Kwon
Oh I missed this. Yes, can we simply get the user's confirmation when we install.spark? IIRC, the auto installation is only triggered by interactive shell so getting user's confirmation should be fine. On Fri, Jun 18, 2021 at 2:54 AM Felix Cheung wrote: > Any suggestion or comment on this? They are goin

Re: CRAN package SparkR

2021-09-01 Thread Hyukjin Kwon
be enough. This checks for > interactive() > > > https://github.com/apache/spark/blob/c6a2021fec5bab9069fbfba33f75d4415ea76e99/R/pkg/R/sparkR.R#L658 > > > On Tue, Aug 31, 2021 at 5:55 PM Hyukjin Kwon wrote: > >> Oh I missed this. Yes, can we simply get the user'

Re: Adding Spark 4 to JIRA for targetted versions

2021-09-13 Thread Hyukjin Kwon
BTW, I vaguely remember that adding a new version affects the default version for the merging script to use for JIRA resolution. e.g., now it's 3.3.0 but it becomes 4.0.0 ... Maybe it's nicer to double check how it's affected. On Tue, Sep 14, 2021 at 1:32 PM Dongjoon Hyun wrote: > I'm fine to have the

Re: [DISCUSS] SPIP: Storage Partitioned Join for Data Source V2

2021-10-26 Thread Hyukjin Kwon
Makes sense to me. It would be great to have some feedback from people such as @Wenchen Fan @Cheng Su @angers zhu . On Tue, 26 Oct 2021 at 17:25, Dongjoon Hyun wrote: > +1 for this SPIP. > > On Sun, Oct 24, 2021 at 9:59 AM huaxin gao wrote: > >> +1. Thanks for lifting the current restri

Update Spark 3.3 release window?

2021-10-27 Thread Hyukjin Kwon
Hi all, Spark 3.2 is out. Shall we update the release window at https://spark.apache.org/versioning-policy.html? I am thinking of mid-March 2022 (5 months after the 3.2 release) for code freeze and onward.

DataFrame.mapInArrow

2021-11-09 Thread Hyukjin Kwon
Hi dev, I proposed DataFrame.mapInArrow (https://github.com/apache/spark/pull/34505) which allows users to directly leverage Arrow batch to plug in other external systems easily. I would like to make sure this design of API covers most use cases, and would like to know if there is other feedback

Re: DataFrame.mapInArrow

2021-11-10 Thread Hyukjin Kwon
Last reminder: I plan to merge this in a few more days. Any feedback and review would be very appreciated. On Tue, 9 Nov 2021 at 21:51, Hyukjin Kwon wrote: > Hi dev, > > I proposed DataFrame.mapInArrow ( > https://github.com/apache/spark/pull/34505) which allows users to > di

Re: DataFrame.mapInArrow

2021-11-10 Thread Hyukjin Kwon
Sure, thanks Holden :-). On Thu, 11 Nov 2021 at 15:53, Holden Karau wrote: > Sorry I've been busy, I'll try and take a look tomorrow, excited to see > this progress though :) > > On Wed, Nov 10, 2021 at 9:01 PM Hyukjin Kwon wrote: > >> Last reminder: I plan t

Re: [FYI] Build and run tests on Java 17 for Apache Spark 3.3

2021-11-12 Thread Hyukjin Kwon
Awesome! On Sat, Nov 13, 2021 at 12:04 PM Xiao Li wrote: > Thank you! Great job! > > Xiao > > > On Fri, Nov 12, 2021 at 7:02 PM Mridul Muralidharan > wrote: > >> >> Nice job ! >> There are some nice API's which should be interesting to explore with JDK >> 17 :-) >> >> Regards. >> Mridul >> >> O

Re: Supports Dynamic Table Options for Spark SQL

2021-11-15 Thread Hyukjin Kwon
My biggest concern with the syntax in hints is that Spark SQL's options can change results (e.g., CSV's header options) whereas hints are generally not designed to affect the external results if I am not mistaken. This is counterintuitive. I left the comment in the PR but what's the real benefit ov

Re: Jira components cleanup

2021-11-28 Thread Hyukjin Kwon
Thanks Nicholas for raising this, and Sean for updating it! On Tue, 16 Nov 2021 at 03:27, Sean Owen wrote: > Done. Now let's see if that generated 86 update emails! > > On Mon, Nov 15, 2021 at 11:03 AM Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> >> https://issues.apache.org/jira

Re: [DISCUSSION] SPIP: Support Volcano/Alternative Schedulers Proposal

2021-11-30 Thread Hyukjin Kwon
Adding @Holden Karau @Dongjoon Hyun @wuyi FYI On Tue, 30 Nov 2021 at 17:46, Yikun Jiang wrote: > Hey everyone, > > I'd like to start a discussion on "Support Volcano/Alternative Schedulers > Proposal". > > This SPIP is proposed to make spark k8s schedulers provide more YARN like > features (s

Re: [Apache Spark Jenkins] build system shutting down Dec 23th, 2021

2021-12-06 Thread Hyukjin Kwon
Thanks, Shane. On Tue, 7 Dec 2021 at 09:19, Dongjoon Hyun wrote: > I really want to thank you for all your help. > You've done so many things for the Apache Spark community. > > Sincerely, > Dongjoon > > > On Mon, Dec 6, 2021 at 12:02 PM shane knapp ☠ wrote: > >> hey everyone! >> >> after a mar

Time for Spark 3.2.1?

2021-12-06 Thread Hyukjin Kwon
Hi all, It's been two months since the Spark 3.2.0 release, and we have fixed many bugs and regressions. What do you guys think about rolling a Spark 3.2.1 release? cc @huaxin gao FYI, who I happened to hear is interested in rolling the maintenance release :-).

Re: Time for Spark 3.2.1?

2021-12-07 Thread Hyukjin Kwon
>> It's been ~6 months since the last 3.0.x and 3.1.x releases, too; a new >>> release of those wouldn't hurt either, if any of our release managers have >>> the time or inclination. 3.0.x is reaching unofficial end-of-life around >>> now anyway. >>>

Re: Time for Spark 3.2.1?

2021-12-07 Thread Hyukjin Kwon
SGTM! On Wed, 8 Dec 2021 at 09:07, huaxin gao wrote: > I prefer to start rolling the release in January if there is no need to > publish it sooner :) > > On Tue, Dec 7, 2021 at 3:59 PM Hyukjin Kwon wrote: > >> Oh BTW, I realised that it's a holiday season soon this mo

Re: Hadoop profile change to hadoop-2 and hadoop-3 since Spark 3.3

2021-12-11 Thread Hyukjin Kwon
cc @Holden Karau @DB Tsai @Imran Rashid @Mridul Muralidharan FYI On Thu, 9 Dec 2021 at 14:07, angers zhu wrote: > Hi all, > > Since Spark 3.2, we have supported Hadoop 3.3.1 now, but its profile name > is *hadoop-3.2* (and *hadoop-2.7*) that is not correct. > So we made a change in https://g

Re: Hadoop profile change to hadoop-2 and hadoop-3 since Spark 3.3

2021-12-11 Thread Hyukjin Kwon
and @tgra...@apache.org too On Sat, 11 Dec 2021 at 21:38, Hyukjin Kwon wrote: > cc @Holden Karau @DB Tsai @Imran > Rashid @Mridul Muralidharan FYI > > On Thu, 9 Dec 2021 at 14:07, angers zhu wrote: > >> Hi all, >> >> Since Spark 3.2, we have supported

Re: Conda Python Env in K8S

2021-12-24 Thread Hyukjin Kwon
Can you share the logs, settings, environment, etc. and file a JIRA? There are integration test cases for K8S support, and I myself also tested it before. It would be helpful if you try what I did at https://databricks.com/blog/2020/12/22/how-to-manage-python-dependencies-in-pyspark.html and see if

[DISCUSS] Rename 'SQL' to 'SQL / DataFrame', and 'Query' to 'Execution' in SQL UI page

2022-03-27 Thread Hyukjin Kwon
Hi all, I have been investigating the improvements for Pandas API on Spark specifically in UI. I chatted with a couple of people, and decided to send an email here to discuss more. Currently, both SQL and DataFrame API are shown in “SQL” tab as below: [image: Screen Shot 2022-03-25 at 12.18.14 P

PR builder not working now

2022-04-11 Thread Hyukjin Kwon
Hi all, There is a bug in GitHub Actions' RESTful API (see https://github.com/HyukjinKwon/spark/actions?query=branch%3Adebug-ga-detection as an example). So, currently the OSS PR builder doesn't work properly and shows a screen such as https://github.com/apache/spark/pull/36157/checks?check_run_id=

Re: PR builder not working now

2022-04-18 Thread Hyukjin Kwon
It's still persistent. I will send an email to GitHub support today. On Wed, 13 Apr 2022 at 11:04, Dongjoon Hyun wrote: > Thank you for sharing that information! > > Bests > Dongjoon. > > > On Mon, Apr 11, 2022 at 10:29 PM Hyukjin Kwon wrote: > >> Hi all, >

Re: PR builder not working now

2022-04-19 Thread Hyukjin Kwon
It's fixed now. On Tue, 19 Apr 2022 at 08:33, Hyukjin Kwon wrote: > It's still persistent. I will send an email to GitHub support today > > On Wed, 13 Apr 2022 at 11:04, Dongjoon Hyun > wrote: > >> Thank you for sharing that information! >> >> Bests

Re: [VOTE] Release Spark 3.3.0 (RC1)

2022-05-10 Thread Hyukjin Kwon
I expect to see RC2 too. I guess he just sticks to the standard, leaving the vote open till the end. It hasn't got enough +1s anyway :-). On Wed, 11 May 2022 at 10:17, Holden Karau wrote: > Technically release don't follow vetos (see > https://www.apache.org/foundation/voting.html ) it's up to t

Re: Contributor data in github-page no longer updated after May 1

2022-05-11 Thread Hyukjin Kwon
It's very likely a GitHub issue. On Wed, 11 May 2022 at 18:01, Yang,Jie(INF) wrote: > Hi, teams > > > > The contributors data in the following page seems no longer updated after > May 1, Can anyone fix it? > > > > > https://github.com/apache/spark/graphs/contributors?from=2022-05-01&to=2022-05-1

Re: SIGMOD System Award for Apache Spark

2022-05-12 Thread Hyukjin Kwon
Awesome! On Fri, May 13, 2022 at 5:29 AM Mosharaf Chowdhury wrote: > Wow! Congratulations to everyone indeed. > > On Thu, May 12, 2022 at 3:44 PM Matei Zaharia > wrote: > >> Hi all, >> >> We recently found out that Apache Spark received >> the SIGMOD Sys
