Re: Running lint-java during PR builds?

2018-05-28 Thread Hyukjin Kwon
> backed up since - if I recall - all of ASF shares one queue. > >> > >> At the number of PRs Spark has this could be a big issue. > >> > >> > >> > >> From: Marcelo Vanzin <van...@cloudera.com> > >> Sent: Monday, May 21, 2018 9:08:28 AM >

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-03 Thread Hyukjin Kwon
+1 On Sun, Jun 3, 2018 at 9:25 PM, Ricardo Almeida wrote: > +1 (non-binding) > > On 3 June 2018 at 09:23, Dongjoon Hyun wrote: > >> +1 >> >> Bests, >> Dongjoon. >> >> On Sat, Jun 2, 2018 at 8:09 PM, Denny Lee wrote: >> >>> +1 >>> >>> On Sat, Jun 2, 2018 at 4:53 PM Nicholas Chammas < >>>

Re: [build system] meet your build engineer @ spark ai summit SF 2018

2018-06-07 Thread Hyukjin Kwon
I regret that I couldn't make it to Spark Summit :(. On Wed, Jun 6, 2018 at 3:25 AM, Holden Karau wrote: > That's awesome! > > On Tue, Jun 5, 2018 at 12:23 PM, shane knapp wrote: > >> just a reminder to come meet your build engineer! >> >> we'll also be having a couple of demos of current projects in

Re: [VOTE] Spark 2.2.2 (RC2)

2018-07-01 Thread Hyukjin Kwon
enchen Fan wrote: >> >>> +1 >>> >>> On Thu, Jun 28, 2018 at 10:19 AM zhenya Sun wrote: >>> >>>> +1 >>>> >>>> On Jun 28, 2018, at 10:15 AM, Hyukjin Kwon wrote: >>>> >>>> +1 >>>> >>>> 201

Re: Spark-XML maintenance

2017-10-28 Thread Hyukjin Kwon
I am sorry about the delay. I was super busy ... Will try to take a look at all of it within the next week. 2017-10-27 2:23 GMT+09:00 Reynold Xin : > Adding Hyukjin who has been maintaining it. > > The easiest is probably to leave comments in the repo. > > On Thu, Oct 26,

Re: Thoughts on Cloudpickle Update

2018-01-15 Thread Hyukjin Kwon
Hi Bryan, Yup, I support matching the version. I have pushed Spark's copy forward a few times before to match https://github.com/cloudpipe/cloudpickle, and also contributed a few fixes to cloudpickle itself. I believe our copy is closest to 0.4.1. I have been trying to follow up the changes in

Re: Thoughts on Cloudpickle Update

2018-01-19 Thread Hyukjin Kwon
n 19, 2018 7:28 PM, "Hyukjin Kwon" <gurwls...@gmail.com> wrote: > > > Is it an option to match the latest version of cloudpickle and still > set protocol level 2? > > IMHO, I think this can be an option but I am not fully sure yet if we > should/could go ahead for it w

Re: Thoughts on Cloudpickle Update

2018-01-19 Thread Hyukjin Kwon
l at that point for if we want it > to go to master or master & branch-2.3? > > On Fri, Jan 19, 2018 at 12:30 AM, Hyukjin Kwon <gurwls...@gmail.com> > wrote: > >> > So given that it fixes some real world bugs, any particular reason >> why? Would you be com

Re: Thoughts on Cloudpickle Update

2018-01-18 Thread Hyukjin Kwon
protocol level 2? >> >> I agree that upgrading to try and match version 0.4.2 would be a good >> starting point. Unless no one objects, I will open up a JIRA and try to do >> this. >> >> Thanks, >> Bryan >> >> On Mon, Jan 15, 2018 at 7:5

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-19 Thread Hyukjin Kwon
Ah, I see. For 1), I overlooked Felix's input here. I couldn't foresee this when I added this documentation because it worked in my simple demo: https://spark-test.github.io/sparksqldoc/search.html?q=approx https://spark-test.github.io/sparksqldoc/#approx_percentile Will try to investigate this

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-20 Thread Hyukjin Kwon
+1 too 2018-02-20 14:41 GMT+09:00 Takuya UESHIN : > +1 > > > On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang > wrote: > >> +1 >> >> >> On Tue, Feb 20, 2018 at 1:09 PM, Wenchen Fan wrote: >> >>> +1 >>> >>> On Tue, Feb 20, 2018 at 12:53 PM,

Re: [VOTE] Spark 2.3.0 (RC5)

2018-02-24 Thread Hyukjin Kwon
+1 2018-02-24 16:57 GMT+09:00 Bryan Cutler : > +1 > Tests passed and additionally ran Arrow related tests and did some perf > checks with python 2.7.14 > > On Fri, Feb 23, 2018 at 6:18 PM, Holden Karau > wrote: > >> Note: given the state of Jenkins I'd

Re: Assign SPARK JIRA 18844 to me

2018-01-03 Thread Hyukjin Kwon
No need to assign. Just leave a comment saying you are working on it and open a PR. 2018-01-04 14:09 GMT+09:00 Sandeep Kr. Choudhary < tssandeepkumarchoudh...@gmail.com>: > Hi All, > > This is Sandeep from India. I was trying to solve the SPARK- >

Re: Review notification bot

2018-07-30 Thread Hyukjin Kwon
Holden, so is it a fork of https://github.com/facebookarchive/mention-bot? Would you mind telling me where I can see its configuration? On Mon, Jul 23, 2018 at 10:16 AM, Holden Karau wrote: > Yeah so the issue with codeowners is it will only assign to committers on > the repo (the Beam project

Re: [Spark SQL] Future of CalendarInterval

2018-07-30 Thread Hyukjin Kwon
FYI, org.apache.spark.unsafe.types.CalendarInterval is undocumented in both scaladoc/javadoc (entire unsafe module) but org.apache.spark.sql.types.CalendarIntervalType is exposed ( https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.types.CalendarIntervalType ) +1 for

Re: Review notification bot

2018-07-30 Thread Hyukjin Kwon
nal, is there something you want to try and > change? > > On Mon, Jul 30, 2018 at 7:30 PM Hyukjin Kwon wrote: > >> I see. Thanks. I was wondering if I can see the configuration file since >> that looks needed (https://github.com/holdenk/mention-bot#configuration) >> but I

Re: Review notification bot

2018-07-30 Thread Hyukjin Kwon
he folks being pinged are not just committers. The hope > is to get more code authors who aren't committers involved in the reviews > and then eventually become committers. > > On Mon, Jul 30, 2018, 9:09 PM Hyukjin Kwon wrote: > >> *reviewers: I mean people who committed the PR given

Re: Review notification bot

2018-07-30 Thread Hyukjin Kwon
ing is the form in my own repo (set up for K8s > deployment) - http://github.com/holdenk/mention-bot > > On Mon, Jul 30, 2018 at 3:15 AM Hyukjin Kwon wrote: > >> Holden, so, is it a fork in >> https://github.com/facebookarchive/mention-bot? Would you mind if I ask >> w

Re: Review notification bot

2018-07-31 Thread Hyukjin Kwon
nt-409035244 > > Is the issue that @-mentions cause emails too? > > Is there any option to maybe only consider pinging someone if they've > touched the code within the last N months? > > On Tue, Jul 31, 2018 at 2:31 AM Hyukjin Kwon wrote: > >> > I originally did t

Re: Review notification bot

2018-07-31 Thread Hyukjin Kwon
>> >>>> Also if we are going to use this, can we rename the bot to something >>>> like spark-bot, rather than holden's personal bot? >>>> >>> I originally did that, but GitHub told me I could only have one personal >>> and one bot account.

Re: Review notification bot

2018-07-30 Thread Hyukjin Kwon
*reviewers: I mean people who committed the PR, based on my observation. On Tue, Jul 31, 2018 at 11:50 AM, Hyukjin Kwon wrote: > I was wondering if we can leave the configuration open and accept some > custom configurations, IMHO, because I saw some people less related or less > active are con

Re: [build system] bumped pull request builder job timeout to 400mins

2018-08-07 Thread Hyukjin Kwon
Thanks, Shane. On Wed, Aug 8, 2018 at 1:05 AM, shane knapp wrote: > i hate doing this, because our tests and builds take WAY too long, > but this should help get PRs through before the code freeze. > > -- > Shane Knapp > UC Berkeley EECS Research / RISELab Staff Technical Lead >

Re: best way to run one python test?

2018-08-20 Thread Hyukjin Kwon
s.py#L74-L97 > > those don't matter in most cases I guess? > > On Sun, Aug 19, 2018 at 11:54 PM Hyukjin Kwon wrote: > >> There's an informal way to run specific tests. For instance: >> >> SPARK_TESTING=1 ../bin/pyspark pyspark.sql.tests VectorizedUDFTests >> >>

Porting or explicitly linking project style in Apache Spark based on https://github.com/databricks/scala-style-guide

2018-08-23 Thread Hyukjin Kwon
Hi all, I usually follow https://github.com/databricks/scala-style-guide for Apache Spark's style, which in practice is generally the same as the style of Spark's code base. The thing is, we don't explicitly mention this anywhere in Apache Spark as far as I can tell. Can we explicitly mention this or

Re: Porting or explicitly linking project style in Apache Spark based on https://github.com/databricks/scala-style-guide

2018-08-23 Thread Hyukjin Kwon
o follow the code around the code you're changing. > > > > On Thu, Aug 23, 2018 at 8:14 PM Hyukjin Kwon > wrote: > > Hi all, > > > > I usually follow https://github.com/databricks/scala-style-guide for > Apache Spark's style, which is usually generally the same with the S

Re: Porting or explicitly linking project style in Apache Spark based on https://github.com/databricks/scala-style-guide

2018-08-23 Thread Hyukjin Kwon
possible. > > > > On Thu, Aug 23, 2018 at 6:38 PM Hyukjin Kwon wrote: > >> If you meant "Code Style Guide", many of them are missing and it refers to >> https://docs.scala-lang.org/style/ not >> https://github.com/databricks/scala-style-guide (please correct me if I

Re: best way to run one python test?

2018-08-19 Thread Hyukjin Kwon
There's an informal way to run specific tests. For instance: SPARK_TESTING=1 ../bin/pyspark pyspark.sql.tests VectorizedUDFTests I have a partial fix locally that lets our testing script support this, but I haven't had enough time to make a PR for it yet. On Mon, Aug 20, 2018 at 11:08 AM, Imran

[DISCUSS] USING syntax for Datasource V2

2018-08-20 Thread Hyukjin Kwon
Hi all, I have been trying to follow up on `USING` syntax support, since it currently looks unsupported whereas the `format` API supports this. I have been trying to understand why and talked with Ryan. Ryan knows all the details, and he and I thought it would be good to post here - I just started to look

Re: [R] discuss: removing lint-r checks for old branches

2018-08-19 Thread Hyukjin Kwon
SGTM too On Sun, Aug 12, 2018 at 7:41 AM, shane knapp wrote: > they do seem like real failures on branches 2.0 and 2.1. > > regarding infrastructure, centos and ubuntu have lintr pinned to > 1.0.1.9000, and installed via: > devtools::install_github('jimhester/lintr@5431140') > > builds on branches 2.2+

Re: [discuss][minor] impending python 3.x jenkins upgrade... 3.5.x? 3.6.x?

2018-08-19 Thread Hyukjin Kwon
Actually, Python 3.7 has been released ( https://www.python.org/downloads/release/python-370/) too, and I fixed the compatibility issues accordingly - https://github.com/apache/spark/pull/21714 There has been an issue with 3.6 (compared to lower versions of Python, including 3.5) -

Stale PR update and review request

2018-07-15 Thread Hyukjin Kwon
Hi all, I was checking https://spark-prs.appspot.com/users to see who has more than 10 PRs. viirya 13 mgaido91 12 wangyum 12 maropu

Re: Cleaning Spark releases from mirrors, and the flakiness of HiveExternalCatalogVersionsSuite

2018-07-16 Thread Hyukjin Kwon
+1 On Tue, Jul 17, 2018 at 7:34 AM, Sean Owen wrote: > Fix is committed to branches back through 2.2.x, where this test was added. > > There is still some issue; I'm seeing that archive.apache.org is > rate-limiting downloads and frequently returning 503 errors. > > We can help, I guess, by avoiding

Re: Build timeout -- continuous-integration/appveyor/pr — AppVeyor build failed

2018-07-24 Thread Hyukjin Kwon
ve a lot of time to debug *why* this happened, or > how to go about triggering another build, but at the very least we should > up the timeout. > > On Sun, May 13, 2018 at 7:38 PM, Hyukjin Kwon wrote: > >> Yup, I am not saying it's required but might be better since that's >> written

Re: Build timeout -- continuous-integration/appveyor/pr — AppVeyor build failed

2018-07-24 Thread Hyukjin Kwon
not mistaken. On Wed, Jul 25, 2018 at 9:44 AM, shane knapp wrote: > out of curiosity: why are we using appveyor again? > > closing and reopening PRs solely to retrigger builds seems... cumbersome. > > shane > > On Tue, Jul 24, 2018 at 6:09 PM, Hyukjin Kwon wrote: > >> loo

Re: code freeze and branch cut for Apache Spark 2.4

2018-09-05 Thread Hyukjin Kwon
Oops, one more - https://github.com/apache/spark/pull/6. I just read this thread. On Thu, Sep 6, 2018 at 12:12 PM, Sean Owen wrote: > (I slipped https://github.com/apache/spark/pull/22340 in for Scala 2.12. > Maybe it really is the last one. In any event, yes go ahead with a 2.4 RC) > > On Wed, Sep

Re: python test infrastructure

2018-09-05 Thread Hyukjin Kwon
> 1. all of the output in target/test-reports & python/unit-tests.log should be included in the jenkins archived artifacts. Hmmm, I thought they were already archived ( https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95734/artifact/target/unit-tests.log ). FWIW, unit-tests.log

Re: no logging in pyspark code?

2018-09-05 Thread Hyukjin Kwon
FYI, we do have basic logging via the warnings module. On Tue, Aug 28, 2018 at 2:05 AM, Imran Rashid wrote: > ah, great, thanks! sorry I missed that, I'll watch that jira. > > On Mon, Aug 27, 2018 at 12:41 PM Ilan Filonenko wrote: > >> A JIRA has been opened up on this exact topic: SPARK-25236 >>

Re: Spark JIRA tags clarification and management

2018-09-06 Thread Hyukjin Kwon
Does anyone know if we still use the starter or newbie tags as well? On Tue, Sep 4, 2018 at 10:00 PM, Kazuaki Ishizaki wrote: > Of course, we would like to eliminate all of the following tags > > "flanky" or "flankytest" > > Kazuaki Ishizaki > > > > Fr

Re: data source api v2 refactoring

2018-09-07 Thread Hyukjin Kwon
BTW, just for clarification, are we holding DataSource V2 related PRs for now until we finish this refactoring? On Fri, Sep 7, 2018 at 12:52 AM, Ryan Blue wrote: > Wenchen, > > I'm not really sure what you're proposing here. What is a `LogicalWrite`? > Is it something that mirrors the read side in your PR?

Re: Branch 2.4 is cut

2018-09-06 Thread Hyukjin Kwon
Thanks, Wenchen. On Thu, Sep 6, 2018 at 3:32 PM, Wenchen Fan wrote: > Hi all, > > I've cut the branch-2.4 since all the major blockers are resolved. If no > objections I'll shortly followup with an RC to get the QA started in > parallel. > > Committers, please only merge PRs to branch-2.4 that are bug

Re: Spark Streaming : Multiple sources found for csv : Error

2018-08-30 Thread Hyukjin Kwon
Yea, this is exactly what I have been worried about with the recent changes (discussed in https://issues.apache.org/jira/browse/SPARK-24924). See https://github.com/apache/spark/pull/17916. This should be fine in higher Spark versions. FYI, +Wenchen and Dongjoon. I want to add Thomas Graves and Gengliang

Re: [DISCUSS] move away from python doctests

2018-08-31 Thread Hyukjin Kwon
IMHO, one thing we should consider before this is refactoring all the PySpark tests into separate files paired with the main code. Right now, we put all those unit tests into a few files, which makes the tests hard to follow. On Fri, Aug 31, 2018 at 2:05 PM, Felix Cheung wrote: > +1 on what Li said. > >

Re: [VOTE] SPARK 2.3.2 (RC1)

2018-07-08 Thread Hyukjin Kwon
think we will live with this bug for a long time anyway. On Mon, Jul 9, 2018 at 9:28 AM, Saisai Shao wrote: > Thanks @Hyukjin Kwon . Yes I'm using python2 to > build docs, looks like Python2 with Sphinx has issues. > > What is the pending thing for this PR ( > https://github.com/apache/s

Re: V2.3 Scala API to Github Links Incorrect

2018-04-15 Thread Hyukjin Kwon
> *To: *"Thakrar, Jayesh" <jthak...@conversantmedia.com> > *Cc: *"dev@spark.apache.org" <dev@spark.apache.org>, Hyukjin Kwon < > gurwls...@gmail.com> > *Subject: *Re: V2.3 Scala API to Github Links Incorrect > > > > [+Hyukjin] > > > > Thank

Re: Welcome Zhenhua Wang as a Spark committer

2018-04-01 Thread Hyukjin Kwon
Congratulations, Zhenhua Wang! Very well deserved. 2018-04-02 13:28 GMT+08:00 Wenchen Fan : > Hi all, > > The Spark PMC recently added Zhenhua Wang as a committer on the project. > Zhenhua is the major contributor of the CBO project, and has been > contributing across several

Re: Welcoming some new committers

2018-03-03 Thread Hyukjin Kwon
Congratulations !! On 3 Mar 2018 4:43 pm, "Saisai Shao" wrote: > Congrats to everyone! > > Thanks > Jerry > > 2018-03-03 15:30 GMT+08:00 Liang-Chi Hsieh : > >> >> Congrats to everyone! >> >> >> Kazuaki Ishizaki wrote >> > Congratulations to everyone! >>

Re: [VOTE] Spark 2.2.2 (RC2)

2018-06-27 Thread Hyukjin Kwon
+1 On Thu, Jun 28, 2018 at 8:42 AM, Sean Owen wrote: > +1 from me too. > > On Wed, Jun 27, 2018 at 3:31 PM Tom Graves > wrote: > >> Please vote on releasing the following candidate as Apache Spark version >> 2.2.2. >> >> The vote is open until Mon, July 2nd @ 9PM UTC (2PM PDT) and passes if a >>

Re: Remove Flume support in 3.0.0?

2018-10-12 Thread Hyukjin Kwon
Yea, I thought we were already going to remove this. +1 for removing it anyway. On Fri, Oct 12, 2018 at 1:44 AM, Wenchen Fan wrote: > Note that, it was deprecated in 2.3.0 already: > https://spark.apache.org/docs/2.3.0/streaming-flume-integration.html > > On Fri, Oct 12, 2018 at 12:46 AM Reynold

Re: GitHub is out of order

2018-10-21 Thread Hyukjin Kwon
Yea.. please ignore my duplicated comments if there are any. I didn't know this was happening globally; I thought it was a problem specific to me, so I left the same comments multiple times. On Mon, Oct 22, 2018 at 12:40 PM, Dongjoon Hyun wrote: > Hi, All. > > Currently, GitHub is out of order. Apache Spark

Re: GitHub is out of order

2018-10-22 Thread Hyukjin Kwon
It's chaotic now.. can we turn off Jenkins for a while if GitHub is going to be out of order for a while? My notifications are full of AmplabJenkins bot messages ... On Mon, 22 Oct 2018, 1:13 pm Hyukjin Kwon, wrote: > Yea.. please ignore my duplicated comments if they exist. I didn't k

Re: Hadoop 3 support

2018-10-17 Thread Hyukjin Kwon
See the discussion at https://github.com/apache/spark/pull/21588 On Wed, Oct 17, 2018 at 5:06 AM, t4 wrote: > has anyone got spark jars working with hadoop3.1 that they can share? i am > looking to be able to use the latest hadoop-aws fixes from v3.1 > > > > -- > Sent from:

Re: [VOTE] SPARK 2.4.0 (RC4)

2018-10-23 Thread Hyukjin Kwon
I am sorry for raising this late. Out of curiosity, does anyone know why we don't treat SPARK-24935 (https://github.com/apache/spark/pull/22144) as a blocker? It looks like it broke API compatibility and an actual use case of an external library (https://github.com/DataSketches/sketches-hive). Also,

Re: [VOTE] SPARK 2.4.0 (RC4)

2018-10-23 Thread Hyukjin Kwon
I am searching for and checking PRs or JIRAs that mention regressions. Let me leave a link - it might be good to double check https://github.com/apache/spark/pull/22514 as well. On Tue, Oct 23, 2018 at 11:58 PM, Stavros Kontopoulos < stavros.kontopou...@lightbend.com> wrote: > Sean, > > I will try it

Re: [VOTE] SPARK 2.4.0 (RC4)

2018-10-23 Thread Hyukjin Kwon
https://github.com/apache/spark/pull/22514 sounds like a regression that affects Hive CTAS in the write path (by not converting them into Spark's internal data sources, and therefore causing a performance regression). But yea, I am not sure we should block the release on this. https://github.com/apache/spark/pull/22144

Re: What's a blocker?

2018-10-24 Thread Hyukjin Kwon
> Let's understand statements like "X is not a blocker" to mean "I don't think that X is a blocker". Interpretations not proclamations, backed up by reasons, not all of which are appeals to policy and precedent. Might not be a big deal and a bit off topic, but I would rather hope people explicitly avoid

Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-29 Thread Hyukjin Kwon
+1 On Tue, Oct 30, 2018 at 11:03 AM, Gengliang Wang wrote: > +1 > > > On Oct 30, 2018, at 10:41 AM, Sean Owen wrote: > > > > +1 > > > > Same result as in RC4 from me, and the issues I know of that were > > raised with RC4 are resolved. I tested vs Scala 2.12 and 2.11. > > > > These items are still targeted to

Re: Some PRs not automatically linked to JIRAs

2018-10-30 Thread Hyukjin Kwon
here. Thanks. On Mon, Oct 1, 2018 at 7:15 PM, Hyukjin Kwon wrote: > Seems fixed but looks it starts to leave duplicated PR links for some > recent JIRAs. Not a big deal but are they being run in multiple places > maybe? > > For instance, > > https://issues.apache.org/jira/brows

[discuss] SparkR CRAN feasibility check server problem

2018-11-01 Thread Hyukjin Kwon
Hi all, I want to raise the CRAN failure issue because it has started to block Spark PRs from time to time. Since the number of PRs in the Spark community is growing hugely, it is critical that this not block other PRs. There has been a problem at CRAN (see https://github.com/apache/spark/pull/20005 for analysis). To

Re: DataSourceV2 hangouts sync

2018-10-25 Thread Hyukjin Kwon
I didn't know I live in the same timezone as you, Wenchen :D. Monday or Wednesday at 5PM PDT sounds good to me too FWIW. On Fri, Oct 26, 2018 at 8:29 AM, Ryan Blue wrote: > Good point. How about Monday or Wednesday at 5PM PDT then? > > Everyone, please reply to me (no need to spam the list) with

Re: DataSourceV2 hangouts sync

2018-10-25 Thread Hyukjin Kwon
+1 ! On Fri, Oct 26, 2018 at 7:21 AM, Dongjoon Hyun wrote: > +1. Thank you for volunteering, Ryan! > > Bests, > Dongjoon. > > > On Thu, Oct 25, 2018 at 4:19 PM Xiao Li wrote: > >> +1 >> >> On Thu, Oct 25, 2018 at 4:16 PM, Reynold Xin wrote: >> >>> +1 >>> >>> >>> >>> On Thu, Oct 25, 2018 at 4:12 PM Li Jin wrote:

Re: Arrow optimization in conversion from R DataFrame to Spark DataFrame

2018-11-10 Thread Hyukjin Kwon
> Thanks Hyukjin! Very cool results >> >> Shivaram >> On Fri, Nov 9, 2018 at 10:58 AM Felix Cheung >> wrote: >> > >> > Very cool! >> > >> > >> > >> > From: Hyukjin Kwon >> > Sent

Re: [discuss] SparkR CRAN feasibility check server problem

2018-11-10 Thread Hyukjin Kwon
ematic. > > > On Thu, Nov 1, 2018 at 7:35 PM Hyukjin Kwon wrote: > >> Hi all, >> >> I want to raise the CRAN failure issue because it started to block Spark >> PRs time to time. Since the number >> of PRs grows hugely in Spark community, this is critical to n

Arrow optimization in conversion from R DataFrame to Spark DataFrame

2018-11-08 Thread Hyukjin Kwon
Hi all, I am trying to introduce R Arrow optimization by reusing the PySpark Arrow optimization. It makes the conversion from R DataFrame to Spark DataFrame roughly 900% ~ 1200% faster. It looks to be working fine so far; however, I would appreciate it if you could take some time to take a look

Re: [discuss] SparkR CRAN feasibility check server problem

2018-11-11 Thread Hyukjin Kwon
--- > *From:* Liang-Chi Hsieh > *Sent:* Saturday, November 10, 2018 2:32 AM > *To:* dev@spark.apache.org > *Subject:* Re: [discuss] SparkR CRAN feasibility check server problem > > > Yeah, thanks Hyukjin Kwon for bringing this up for discussion. > > I don't know how h

New PySpark test style

2018-11-13 Thread Hyukjin Kwon
Hi all, https://github.com/apache/spark/pull/23021 was recently merged, which splits a big single file containing all the tests into smaller files. I picked one example to follow, NumPy, because the current style looks closer to NumPy's structure and looks easier to follow. Please see

Re: DataSourceV2 APIs creating multiple instances of DataSourceReader and hence not preserving the state

2018-10-09 Thread Hyukjin Kwon
I took a look at the code. val source = classOf[MyDataSource].getCanonicalName; spark.read.format(source).load().collect() It does indeed look like it is called twice. First of all, it looks like it creates one first to read the schema for the logical plan

Re: [VOTE] SPARK 2.4.0 (RC3)

2018-10-11 Thread Hyukjin Kwon
So, which date is it? On Thu, Oct 11, 2018 at 1:48 AM, Garlapati, Suryanarayana (Nokia - IN/Bangalore) < suryanarayana.garlap...@nokia.com> wrote: > Might be you need to change the date(Oct 1 has already passed). > > > > >> The vote is open until October 1 PST and passes if a majority +1 PMC > votes

Re: Possible bug in DatasourceV2

2018-10-11 Thread Hyukjin Kwon
See https://github.com/apache/spark/pull/22688 +Wenchen, it looks like the problem is raised here. This might have to be considered a blocker ... On Thu, 11 Oct 2018, 2:48 pm assaf.mendelson, wrote: > Hi, > > I created a datasource writer WITHOUT a reader. When I do, I get an > exception:

Re: Possible bug in DatasourceV2

2018-10-11 Thread Hyukjin Kwon
oAttributes, options, ident, userSpecifiedSchema) > > } > > > > Correct this? > > > > Or even creating a new create which simply gets the schema as non optional? > > > > Thanks, > > Assaf > > > > *From:* Hyukjin Kwon [mailto:gurwl

Re: Some PRs not automatically linked to JIRAs

2018-10-01 Thread Hyukjin Kwon
/browse/SPARK-25564 On Mon, Sep 17, 2018 at 10:09 PM, Ilan Filonenko wrote: > Same over here: > > https://issues.apache.org/jira/browse/SPARK-25291 / > https://github.com/apache/spark/pull/22415 > > On Sun, Sep 16, 2018 at 10:09 PM Hyukjin Kwon wrote: > >> Seems sa

Re: welcome a new batch of committers

2018-10-03 Thread Hyukjin Kwon
Yay! Every one of you deserves it. Congratulations! On Wed, Oct 3, 2018 at 4:59 PM, Reynold Xin wrote: > Hi all, > > The Apache Spark PMC has recently voted to add several new committers to > the project, for their contributions: > > - Shane Knapp (contributor to infra) > - Dongjoon Hyun

Jenkins automatic disabling service - who and why?

2018-09-03 Thread Hyukjin Kwon
Hi all, I recently noticed that we started to block Jenkins tests in old PRs. For instance, see https://github.com/apache/spark/pull/18447 I don't object to this idea, but can I at least ask who started this and why? Is it for notification purposes or to save resources? Did I miss some

Spark JIRA tags clarification and management

2018-09-03 Thread Hyukjin Kwon
Hi all, I recently noticed that tags are often used to classify JIRAs. I was thinking we had better explicitly document which tags are used and what each tag means. For instance, we documented "Contributing to JIRA Maintenance" at https://spark.apache.org/contributing.html before (thanks, Sean Owen)

Re: Jenkins automatic disabling service - who and why?

2018-09-03 Thread Hyukjin Kwon
own because when we type "ok to test", the Jenkins request goes away. On Mon, Sep 3, 2018 at 8:54 PM, Hyukjin Kwon wrote: > Not a big deal but it has been few months since I saw this, and wondering > why it suddenly asks Jenkins admin verification from at certain point. > > I had a sma

Re: Jenkins automatic disabling service - who and why?

2018-09-03 Thread Hyukjin Kwon
e web app UI? > > On Mon, Sep 3, 2018, 1:54 AM Hyukjin Kwon wrote: > >> Hi all, >> >> I lately noticed we started to block Jenkins tests in old PRs. For >> instance, see https://github.com/apache/spark/pull/18447 >> I don't explicitly object this idea but

Re: Spark JIRA tags clarification and management

2018-09-03 Thread Hyukjin Kwon
like rel note. Would be good to clarify. > > -- > *From:* Reynold Xin > *Sent:* Sunday, September 2, 2018 11:50 PM > *To:* Hyukjin Kwon > *Cc:* dev > *Subject:* Re: Spark JIRA tags clarification and management > > It would be great to documen

Re: Spark JIRA tags clarification and management

2018-09-03 Thread Hyukjin Kwon
Thanks, Reynold. +Adding Xiao and Wenchen, whom I have often seen use tags. Are there any tags you think we should document further? On Tue, Sep 4, 2018 at 9:27 AM, Reynold Xin wrote: > The most common ones we do are: > > releasenotes > > correctness > > > > On Mon, Sep 3, 2

Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-01-15 Thread Hyukjin Kwon
Resolving HIVE-16391 means Hive would release a 1.2.x that contains the fixes from our Hive fork (correct me if I am mistaken). To be honest, and as a personal opinion, that basically means asking Hive to take care of Spark's dependency. Hive seems to be moving ahead with 3.1.x, and no one would use the

Re: Removing old HiveMetastore(0.12~0.14) from Spark 3.0.0?

2019-01-22 Thread Hyukjin Kwon
Yea, I was thinking about that too. They are too old to keep. +1 for removing them. On Wed, Jan 23, 2019 at 11:30 AM, Dongjoon Hyun wrote: > Hi, All. > > Currently, Apache Spark supports Hive Metastore(HMS) 0.12 ~ 2.3. > Among them, HMS 0.x releases look very old since we are in 2019. > If these

Re: [discuss] SparkR CRAN feasibility check server problem

2018-12-12 Thread Hyukjin Kwon
on will be in > https://issues.apache.org/jira/browse/SPARK-24152. I will post here if I > get > reply from CRAN admin. > > Thanks. > > > Liang-Chi Hsieh wrote > > Thanks for letting me know! I will look into it and ask CRAN admin for > > help. > > > > > > Hyuk

Re: How can I help?

2018-12-17 Thread Hyukjin Kwon
Please take a look at https://spark.apache.org/contributing.html. It contains virtually all the information needed for contributing. On Tue, Dec 18, 2018 at 3:54 AM, Raghunadh Madamanchi wrote: > Hi, > > I am Raghu, I live in Dallas,TX. > Having 15+ years of Experience in Software Development and

Re: Apache Spark git repo moved to gitbox.apache.org

2018-12-11 Thread Hyukjin Kwon
Me too. I want to add some input as well if that would be helpful. On Wed, 12 Dec 2018, 8:20 am Reynold Xin wrote: > Thanks, Sean. Which INFRA ticket is it? It's creating a lot of noise so I > want to put some pressure myself there too. > > > On Mon, Dec 10, 2018 at 9:51 AM, Sean Owen wrote: > >> Agree,

Re: Apache Spark git repo moved to gitbox.apache.org

2018-12-10 Thread Hyukjin Kwon
BTW, should I be able to close PRs via the GitHub UI right now, or is there another way to do it? I'm not seeing the close button. On Tue, Dec 11, 2018 at 1:51 AM, Sean Owen wrote: > Agree, I'll ask on the INFRA ticket and follow up. That's a lot of extra > noise. > > On Mon, Dec 10, 2018 at 11:37 AM

Re: Apache Spark git repo moved to gitbox.apache.org

2018-12-10 Thread Hyukjin Kwon
Ah, sorry. I missed it. It works correctly. Thanks. On Tue, Dec 11, 2018 at 10:47 AM, Sean Owen wrote: > Did you do the step where you sync your GitHub and ASF account? After an > hour you should get an email and then you can. > > On Mon, Dec 10, 2018, 8:01 PM Hyukjin Kwon >> BTW

Re: [discuss] SparkR CRAN feasibility check server problem

2018-12-12 Thread Hyukjin Kwon
of this problem..! On Mon, Nov 12, 2018 at 1:47 PM, Hyukjin Kwon wrote: > I made a PR to officially drop R prior to version 3.4 ( > https://github.com/apache/spark/pull/23012). > The tests will probably fail for now since it produces warnings for using > R 3.1.x. > > On Sun, Nov 11, 2018 at 3:0

Re: Run a specific PySpark test or group of tests

2018-12-05 Thread Hyukjin Kwon
oes support unittest-based tests >>> <https://docs.pytest.org/en/latest/unittest.html>, allowing for >>> incremental adoption. I'll see how convenient it is to use with our current >>> test layout. >>> >>> On Tue, Aug 15, 2017 at 1:03 AM Hyukjin Kwon

Re: Run a specific PySpark test or group of tests

2018-12-05 Thread Hyukjin Kwon
It's merged now and documented on the developer tools page - http://spark.apache.org/developer-tools.html#individual-tests Have some fun with PySpark testing! On Wed, Dec 5, 2018 at 4:30 PM, Hyukjin Kwon wrote: > Hey all, I kind of met the goal with a minimised fix with keeping > available framework and o

A user of thincrs has selected this issue. Deadline: Xxx, Xxx X, XXXX XX:XX

2018-12-01 Thread Hyukjin Kwon
Just out of curiosity, does anyone know what kind of account this is? https://issues.apache.org/jira/secure/ViewProfile.jspa?name=Thincrs I was wondering if it's a bot for some purpose.

Re: [VOTE] SPARK 2.2.3 (RC1)

2019-01-10 Thread Hyukjin Kwon
+1 Thanks. On Fri, Jan 11, 2019 at 7:01 AM, Takeshi Yamamuro wrote: > ok, thanks for the check. > > best, > takeshi > > On Fri, Jan 11, 2019 at 1:37 AM Dongjoon Hyun > wrote: > >> Hi, Takeshi. >> >> Yep. It's not a release blocker. We don't need that as Sean mentioned >> already. >> Since you are the

Re: Ask for reviewing on Structured Streaming PRs

2019-01-13 Thread Hyukjin Kwon
But it's true that, IMHO, there's less activity in SS in general. That should be noted. Maybe it's also because committers are busy with other things. Yea, I agree that one actionable strategy for now might be to make the PR description as clear as possible to make the review easier, and then ping them

Re: from_csv

2018-09-16 Thread Hyukjin Kwon
+1 for this idea since text parsing in CSV/JSON is quite common. One thing to consider is schema inference, as with the JSON functionality. In the case of JSON, we added schema_of_json for it, and the same thing should apply to CSV too. If we see more need for it, we can consider a function

Re: Should python-2 be supported in Spark 3.0?

2018-09-16 Thread Hyukjin Kwon
I think we can deprecate it in 3.x.0 and remove it in Spark 4.0.0. Many people still use Python 2. Also, technically, 2.7 support has not been officially dropped yet - https://pythonclock.org/ On Mon, Sep 17, 2018 at 9:31 AM, Aakash Basu wrote: > Removing support for an API in a major release makes poor

Re: Some PRs not automatically linked to JIRAs

2018-09-16 Thread Hyukjin Kwon
It seems the same thing is happening again. For instance, - https://issues.apache.org/jira/browse/SPARK-25440 / https://github.com/apache/spark/pull/22429 - https://issues.apache.org/jira/browse/SPARK-25429 / https://github.com/apache/spark/pull/22420 On Thu, Aug 3, 2017 at 9:06 AM, Hyukjin Kwon wrote: >

Re: Apache Spark git repo moved to gitbox.apache.org

2018-12-18 Thread Hyukjin Kwon
Similar issues are going on in spark-website as well. I also filed a ticket at https://issues.apache.org/jira/browse/INFRA-17469. On Wed, Dec 12, 2018 at 9:02 AM, Reynold Xin wrote: > I filed a ticket: https://issues.apache.org/jira/browse/INFRA-17403 > > Please add your support there. > > > On Tue,

Re: Noisy spark-website notifications

2018-12-19 Thread Hyukjin Kwon
Yea, that's a bit noisy .. To be honest, I would just disable it completely. I filed https://issues.apache.org/jira/browse/INFRA-17469 before. I would appreciate more input there :-) On Thu, Dec 20, 2018 at 11:22 AM, Nicholas Chammas wrote: > I'd prefer it if we disabled all git

Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-25 Thread Hyukjin Kwon
Hi all, We really need to upgrade the minimum version soon. It's actually slowing down PySpark development, for instance, through the overhead of currently having to test the full matrix of Arrow and Pandas versions. It also currently requires adding some weird hacks or ugly code. Some bugs

Re: PySpark syntax vs Pandas syntax

2019-03-26 Thread Hyukjin Kwon
BTW, I am working on the documentation related to this subject at https://issues.apache.org/jira/browse/SPARK-26022 to describe the differences. On Tue, Mar 26, 2019 at 3:34 PM, Reynold Xin wrote: > We have some early stuff there but not quite ready to talk about it in > public yet (I hope soon

Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-28 Thread Hyukjin Kwon
st 2.7 >> and 3.5. >> >> On Mon, Mar 25, 2019 at 9:44 PM Reynold Xin wrote: >> >>> +1 on doing this in 3.0. >>> >>> >>> On Mon, Mar 25, 2019 at 9:31 PM, Felix Cheung >> > wrote: >>> >>>> I’m +1 if 3.0 >>

Re: [pyspark] dataframe map_partition

2019-03-10 Thread Hyukjin Kwon
Because dapply in R and Scalar Pandas UDFs in Python are similar and cover each other. FWIW, this sounds somewhat like SPARK-26413 and SPARK-26412. On Sat, Mar 9, 2019 at 12:32 PM, peng yu wrote: > Cool, thanks for letting me know, but why not support dapply >

Re: [VOTE] Release Apache Spark 2.3.3 (RC2)

2019-02-08 Thread Hyukjin Kwon
Sorry for the last-minute vote. +1 On Fri, Feb 8, 2019 at 10:15 AM, Takeshi Yamamuro wrote: > Thanks, all. > > Yea, I think we don't need to block the release, too. > > > Jungtaek > Thanks! That is very helpful! > If you find something, please let me know. > > Best, > Takeshi > > On Fri, Feb 8, 2019
