[DISCUSS] New sections in Github Pull Request description template

2019-07-23 Thread Hyukjin Kwon
Hi all, I would like to discuss some new sections under "## What changes were proposed in this pull request?": ### Do the changes affect _any_ user/dev-facing input or output? (Please answer yes or no. If yes, answer the questions below) ### What was the previous behavior? (Please provid

Re: In Apache Spark JIRA, spark/dev/github_jira_sync.py not running properly

2019-07-19 Thread Hyukjin Kwon
the same. The API key used by the bot is rejected > by Apache JIRA and forwarded to CAPTCHA. > > Bests, > Dongjoon. > > On Thu, Jul 18, 2019 at 8:24 PM Hyukjin Kwon wrote: > >> Hi all, >> >> Seems this issue is happening again. Seems the PR link is pro

Re: In Apache Spark JIRA, spark/dev/github_jira_sync.py not running properly

2019-07-18 Thread Hyukjin Kwon
owse/SPARK-28440 https://issues.apache.org/jira/browse/SPARK-28436 https://issues.apache.org/jira/browse/SPARK-28434 https://issues.apache.org/jira/browse/SPARK-28433 https://issues.apache.org/jira/browse/SPARK-28431 Josh and Dongjoon, do you guys maybe have any idea? On Thu, Apr 25, 2019 at 3:09 PM Hyukjin

Re: Contribution help needed for sub-tasks of an umbrella JIRA - port *.sql tests to improve coverage of Python, Pandas, Scala UDF cases

2019-07-09 Thread Hyukjin Kwon
n Tue, Jul 9, 2019 at 6:17 AM Hyukjin Kwon wrote: > >> Hi all, >> >> I am currently targeting to improve Python, Pandas UDFs Scala UDF test >> cases by integrating our existing *.sql files at >> https://issues.apache.org/jira/browse/SPARK-27921 >> >>

Contribution help needed for sub-tasks of an umbrella JIRA - port *.sql tests to improve coverage of Python, Pandas, Scala UDF cases

2019-07-08 Thread Hyukjin Kwon
Hi all, I am currently targeting to improve Python, Pandas UDFs Scala UDF test cases by integrating our existing *.sql files at https://issues.apache.org/jira/browse/SPARK-27921 I would appreciate that anyone who's interested in Spark contribution takes some sub-tasks. It's too many for me to do

Re: Disabling `Merge Commits` from GitHub Merge Button

2019-07-01 Thread Hyukjin Kwon
+1 On Tue, Jul 2, 2019 at 9:39 AM Takeshi Yamamuro wrote: > I'm also using the script in both cases, anyway +1. > > On Tue, Jul 2, 2019 at 5:58 AM Sean Owen wrote: > >> I'm using the merge script in both repos. I think that was the best >> practice? >> So, sure, I'm fine with disabling it. >> >> On Mo

Re: Exposing JIRA issue types at GitHub PRs

2019-06-16 Thread Hyukjin Kwon
Labels look good and useful. On Sat, 15 Jun 2019, 02:36 Dongjoon Hyun, wrote: > Now, you can see the exposed component labels (ordered by the number of > PRs) here and click the component to search. > > https://github.com/apache/spark/labels?sort=count-desc > > Dongjoon. > > > On Fri, Jun 14

Re: [DISCUSS] Increasing minimum supported version of Pandas

2019-06-15 Thread Hyukjin Kwon
On Fri, Jun 14, 2019 at 11:36 AM Felix Cheung > wrote: > >> How about pyArrow? >> >> -- >> *From:* Holden Karau >> *Sent:* Friday, June 14, 2019 11:06:15 AM >> *To:* Felix Cheung >> *Cc:* Bryan Cutler; Dongjoon

Re: [DISCUSS] Increasing minimum supported version of Pandas

2019-06-13 Thread Hyukjin Kwon
I am +1 to go for 0.23.2 - it brings some overhead to test PyArrow and pandas combinations. Spark 3 should be a good time to increase it. On Fri, Jun 14, 2019 at 9:46 AM Bryan Cutler wrote: > Hi All, > > We would like to discuss increasing the minimum supported version of > Pandas in Spark, which is current

Re: Exposing JIRA issue types at GitHub PRs

2019-06-12 Thread Hyukjin Kwon
Yea, I think we can automate this process via, for instance, https://github.com/apache/spark/blob/master/dev/github_jira_sync.py +1 for such sort of automatic categorizing and matching metadata between JIRA and github Adding Josh and Sean as well. On Thu, 13 Jun 2019, 13:17 Dongjoon Hyun, wrote

Re: Resolving all JIRAs affecting EOL releases

2019-05-20 Thread Hyukjin Kwon
<https://issues.apache.org/jira/browse/SPARK-22766> > 2. > 3. > > > On Sun, May 19, 2019 at 6:43 PM Hyukjin Kwon wrote: > >> Thanks Shane .. the URL I linked somehow didn't work in other people >> browser. Hope this link works: >> >

Re: Resolving all JIRAs affecting EOL releases

2019-05-19 Thread Hyukjin Kwon
20%3C%3D%20-52w I will take an action around this time tomorrow considering there were some more changes to make at the last minute. On Sun, May 19, 2019 at 6:39 PM Hyukjin Kwon wrote: > I will add one more condition for "updated". So, it will additionally > avoid things updated wit

Re: Resolving all JIRAs affecting EOL releases

2019-05-19 Thread Hyukjin Kwon
it has been reported against > 2.1.0. > > On the other hand, I'd go further and close _anything_ not updated in a > long time, like a year (or 2 if feeling conservative). That is there's > probably a lot of old cruft out there that wasn't marked with an Affected > Ver

Re: Resolving all JIRAs affecting EOL releases

2019-05-18 Thread Hyukjin Kwon
May 17, 2019 at 9:07 AM Imran Rashid > wrote: > >> +1, thanks for taking this on >> >> On Wed, May 15, 2019 at 7:26 PM Hyukjin Kwon wrote: >> >>> oh, wait. 'Incomplete' can still make sense in this way then. >>> Yes, I am good with '

Re: Resolving all JIRAs affecting EOL releases

2019-05-15 Thread Hyukjin Kwon
haven't > otherwise used that. Maybe that's simpler than a label. But, anything like > that sounds good. > > On Wed, May 15, 2019 at 8:40 PM Hyukjin Kwon wrote: > >> BTW, affected version became a required field (I don't remember when >> exactly was .. I b

Re: Resolving all JIRAs affecting EOL releases

2019-05-15 Thread Hyukjin Kwon
oh, wait. 'Incomplete' can still make sense in this way then. Yes, I am good with 'Incomplete' too. On Thu, May 16, 2019 at 11:24 AM Hyukjin Kwon wrote: > I actually recently used 'Incomplete' a bit when the JIRA is basically > too poorly formed (like just copying

Re: Resolving all JIRAs affecting EOL releases

2019-05-15 Thread Hyukjin Kwon
a start. >> I like the idea of closing things that only affect an EOL release, >> but, many items aren't marked, so may need to cast the net wider. >> >> I think only then does it make sense to look at bothering to reproduce >> or evaluate the 1000s that will stil

Re: Resolving all JIRAs affecting EOL releases

2019-05-15 Thread Hyukjin Kwon
ectedVersion in > versionMatch("^2.4.*") OR affectedVersion in versionMatch("^2.3.*") OR > affectedVersion in versionMatch("^2.2.*")) > AND priority NOT IN (Urgent, Blocker, Critical, High) > > > On Wed, May 15, 2019, 14:55 Hyukjin Kwon wrote: > >

Resolving all JIRAs affecting EOL releases

2019-05-15 Thread Hyukjin Kwon
Hi all, I would like to propose resolving all JIRAs that affect EOL releases (2.2 and below), as well as those with no affected version specified. I was rather against this approach and considered it a last resort when we discussed it roughly 3 years ago. Now I think we should go ahead with it. See below. I hav

Re: In Apache Spark JIRA, spark/dev/github_jira_sync.py not running properly

2019-04-24 Thread Hyukjin Kwon
4%2Flib%2Fjira%2Fresilientsession.py&line=57&logInsertId=5cc1483600029309a7af76d5&logNanos=1556170805012269000&nestedLogIndex=3&project=spark-prs&src=ac>, >> in raise_on_error r.status_code, error, r.url, request=request, response=r, >> **kwargs) JIRAError: JiraE

Re: In Apache Spark JIRA, spark/dev/github_jira_sync.py not running properly

2019-04-24 Thread Hyukjin Kwon
Can anyone take a look at this one? The number of JIRAs in OPEN status is increasing rapidly (from around 2400 to 2600). On Fri, Apr 19, 2019 at 8:05 PM Hyukjin Kwon wrote: > Hi all, > > Looks 'spark/dev/github_jira_sync.py' is not running correctly somewhere. > Usually the JIRA's stat

Re: pyspark.sql.functions ide friendly

2019-04-19 Thread Hyukjin Kwon
+1 I'm good with changing too. On Thu, 18 Apr 2019, 01:18 Reynold Xin, wrote: > Are you talking about the ones that are defined in a dictionary? If yes, > that was actually not that great in hindsight (makes it harder to read & > change), so I'm OK changing it. > > E.g. > > _functions = { >

In Apache Spark JIRA, spark/dev/github_jira_sync.py not running properly

2019-04-19 Thread Hyukjin Kwon
Hi all, It looks like 'spark/dev/github_jira_sync.py' is not running correctly somewhere. Usually a JIRA's status should be updated to "IN PROGRESS" when somebody opens a PR against it. It looks like it now only leaves a link and does not change the JIRA's status. Can someone else who knows where it's running
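
The behavior described in this message (a PR opened against a JIRA should move that JIRA to "In Progress", not only leave a link) can be sketched roughly as below. This is a minimal illustration and not the actual dev/github_jira_sync.py: the helper names `jira_keys_in_title` and `next_status` are hypothetical, and the sketch assumes PR titles carry keys in the usual "[SPARK-NNNN]" form. It makes no calls to JIRA or GitHub.

```python
import re

# Hypothetical pattern: assumes issue keys appear in PR titles as SPARK-<digits>.
JIRA_KEY = re.compile(r"\bSPARK-\d+\b")


def jira_keys_in_title(title):
    """Return the distinct SPARK-NNNN keys mentioned in a PR title, in order."""
    keys = []
    for key in JIRA_KEY.findall(title):
        if key not in keys:
            keys.append(key)
    return keys


def next_status(current_status, has_open_pr):
    """Decide which status a sync job would want for an issue (illustrative only).

    An issue that is still "Open" and has an open PR should become
    "In Progress"; any other status is left untouched.
    """
    if has_open_pr and current_status == "Open":
        return "In Progress"
    return current_status


keys = jira_keys_in_title("[SPARK-28440][SPARK-28436] Example title")
statuses = [next_status("Open", True), next_status("Resolved", True)]
```

The symptom reported in the thread corresponds to the linking step working while the status transition is skipped or fails.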

Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-28 Thread Hyukjin Kwon
kins test against 2.7 >> and 3.5. >> >> On Mon, Mar 25, 2019 at 9:44 PM Reynold Xin wrote: >> >>> +1 on doing this in 3.0. >>> >>> >>> On Mon, Mar 25, 2019 at 9:31 PM, Felix Cheung >> > wrote: >>> >>>> I’m +1

Re: PySpark syntax vs Pandas syntax

2019-03-26 Thread Hyukjin Kwon
BTW, I am working on the documentation related to this subject at https://issues.apache.org/jira/browse/SPARK-26022 to describe the difference. On Tue, Mar 26, 2019 at 3:34 PM Reynold Xin wrote: > We have some early stuff there but not quite ready to talk about it in > public yet (I hope soon though).

Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-25 Thread Hyukjin Kwon
Hi all, We really need to upgrade the minimal version soon. It's actually slowing down PySpark development, for instance, through the overhead of sometimes having to test the full matrix of Arrow and Pandas versions. Also, it currently requires adding some weird hacks or ugly code. Some bugs exist

Re: Request to disable a bot account, 'Thincrs' in JIRA of Apache Spark

2019-03-13 Thread Hyukjin Kwon
Thanks, I opened https://issues.apache.org/jira/browse/INFRA-18004 On Thu, Mar 14, 2019 at 8:35 AM Marcelo Vanzin wrote: > Go for it. I would do it now, instead of waiting, since there's been > enough time for them to take action. > > On Wed, Mar 13, 2019 at 4:32 PM Hyukjin Kwon wro

Re: Request to disable a bot account, 'Thincrs' in JIRA of Apache Spark

2019-03-13 Thread Hyukjin Kwon
It looks like this bot keeps working. I am going to open an INFRA JIRA to block this bot in a few days. Please let me know if you guys have a different idea to prevent this. On Wed, Mar 13, 2019 at 8:16 AM Hyukjin Kwon wrote: > Hi whom it may concern in Thincrs > > > > I am still observing t

Re: [pyspark] dataframe map_partition

2019-03-10 Thread Hyukjin Kwon
Because both dapply in R and Scalar Pandas UDF in Python are similar, and cover each other. FWIW, it somewhat sounds like SPARK-26413 and SPARK-26412 On Sat, Mar 9, 2019 at 12:32 PM peng yu wrote: > Cool, thanks for letting me know, but why not support dapply > http://spark.apache.org/docs/2.0.0/api/R

Re: [build system] Jenkins stopped working

2019-02-19 Thread Hyukjin Kwon
M shane knapp >>>> wrote: >>>> >>>>> yep, it got wedged. issued a restart and it should be back up in a >>>>> few minutes. >>>>> >>>>> On Tue, Feb 19, 2019 at 7:32 AM Parth Gandhi >>>>> wrote: >>&

[build system] Jenkins stopped working

2019-02-19 Thread Hyukjin Kwon
Hi all, It looks like Jenkins stopped working. Did I maybe miss a thread, or has nobody reported this yet? Thanks!

Re: [ANNOUNCE] Announcing Apache Spark 2.3.3

2019-02-18 Thread Hyukjin Kwon
Yay! Good job Takeshi! On Mon, 18 Feb 2019, 14:47 Takeshi Yamamuro We are happy to announce the availability of Spark 2.3.3! > > Apache Spark 2.3.3 is a maintenance release, based on the branch-2.3 > maintenance branch of Spark. We strongly recommend all 2.3.x users to > upgrade to this stable re

Re: Vectorized R gapply[Collect]() implementation

2019-02-14 Thread Hyukjin Kwon
wesome! > > > -- > *From:* Shivaram Venkataraman > *Sent:* Saturday, February 9, 2019 8:33 AM > *To:* Hyukjin Kwon > *Cc:* dev; Felix Cheung; Bryan Cutler; Liang-Chi Hsieh; Shivaram > Venkataraman > *Subject:* Re: Vectorized R gapply[Collect]()

Re: Time to cut an Apache 2.4.1 release?

2019-02-12 Thread Hyukjin Kwon
+1 for 2.4.1 On Tue, Feb 12, 2019 at 4:56 PM Dongjin Lee wrote: > > SPARK-23539 is a non-trivial improvement, so probably would not be > back-ported to 2.4.x. > > Got it. It seems reasonable. > > Committers: > > Please don't omit SPARK-23539 from 2.5.0. Kafka community needs this > feature. > > Thanks,

Vectorized R gapply[Collect]() implementation

2019-02-09 Thread Hyukjin Kwon
Guys, as a continuation of the Arrow optimization for R DataFrame to Spark DataFrame, I am trying to make a vectorized gapply[Collect] implementation as an experiment, like vectorized Pandas UDFs. It brought 820%+ performance improvement. See https://github.com/apache/spark/pull/23746 Please come and ta

Re: [VOTE] Release Apache Spark 2.3.3 (RC2)

2019-02-08 Thread Hyukjin Kwon
Sorry for the last minute vote. +1 On Fri, Feb 8, 2019 at 10:15 AM Takeshi Yamamuro wrote: > Thanks, all. > > Yea, I think we don't need to block the release, too. > > > Jungtaek > Thanks! That is very helpful! > If you find something, please let me know. > > Best, > Takeshi > > On Fri, Feb 8, 2019 at

Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-02-04 Thread Hyukjin Kwon
al is to minimize > the risk and ensure the release stability and quality. > > On Mon, Feb 4, 2019 at 12:01 PM Hyukjin Kwon wrote: > >> Xiao, to check if I understood correctly, do you mean the below? >> >> 1. Use our fork with Hadoop 2.x profile for now, and use Hive 2.x with >>

Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-02-04 Thread Hyukjin Kwon
n't clear whether those concerns specifically argue against >> these PRs. >> >> >> On Fri, Feb 1, 2019 at 2:03 PM Felix Cheung >> wrote: >> > >> > What’s the update and next step on this? >> > >> > We have real users getting blo

Missing SparkR in CRAN

2019-01-24 Thread Hyukjin Kwon
Hi all, I happened to find that SparkR is missing from CRAN. See https://cran.r-project.org/web/packages/SparkR/index.html IIRC, I saw some threads about this on the spark-dev mailing list a long time ago. Is a fix in progress somewhere, or is it something I misunderstood?

Re: Removing old HiveMetastore(0.12~0.14) from Spark 3.0.0?

2019-01-22 Thread Hyukjin Kwon
Yea, I was thinking about that too. They are too old to keep. +1 for removing them. On Wed, Jan 23, 2019 at 11:30 AM Dongjoon Hyun wrote: > Hi, All. > > Currently, Apache Spark supports Hive Metastore(HMS) 0.12 ~ 2.3. > Among them, HMS 0.x releases look very old since we are in 2019. > If these are

Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-01-15 Thread Hyukjin Kwon
Resolving HIVE-16391 means Hive to release 1.2.x that contains the fixes of our Hive fork (correct me if I am mistaken). Just to be honest by myself and as a personal opinion, that basically says Hive to take care of Spark's dependency. Hive looks going ahead for 3.1.x and no one would use the new

Re: Ask for reviewing on Structured Streaming PRs

2019-01-13 Thread Hyukjin Kwon
But it's true that imho there's less activity in SS in general. Should be noted. Maybe it's also because committers are busy for other stuffs. Yea, I agree that one actionable strategy for now might be to make the PR description as clear as possible to make the review easier, and then ping them in

Re: [VOTE] SPARK 2.2.3 (RC1)

2019-01-10 Thread Hyukjin Kwon
+1 Thanks. On Fri, Jan 11, 2019 at 7:01 AM Takeshi Yamamuro wrote: > ok, thanks for the check. > > best, > takeshi > > On Fri, Jan 11, 2019 at 1:37 AM Dongjoon Hyun > wrote: > >> Hi, Takeshi. >> >> Yep. It's not a release blocker. We don't need that as Sean mentioned >> already. >> Since you are the

Re: Noisy spark-website notifications

2018-12-19 Thread Hyukjin Kwon
Yea, that's a bit noisy .. I would just completely disable it to be honest. I filed https://issues.apache.org/jira/browse/INFRA-17469 before. I would appreciate it if there were more input there :-) On Thu, Dec 20, 2018 at 11:22 AM Nicholas Chammas wrote: > I'd prefer it if we disabled all git noti

Re: Apache Spark git repo moved to gitbox.apache.org

2018-12-18 Thread Hyukjin Kwon
Similar issues are going on in spark-website as well. I also filed a ticket at https://issues.apache.org/jira/browse/INFRA-17469. On Wed, Dec 12, 2018 at 9:02 AM Reynold Xin wrote: > I filed a ticket: https://issues.apache.org/jira/browse/INFRA-17403 > > Please add your support there. > > > On Tue, De

Re: How can I help?

2018-12-17 Thread Hyukjin Kwon
Please take a look at https://spark.apache.org/contributing.html . It contains virtually all the information you need to contribute. On Tue, Dec 18, 2018 at 3:54 AM Raghunadh Madamanchi wrote: > Hi, > > I am Raghu, I live in Dallas,TX. > Having 15+ years of Experience in Software Development and Des

Re: [discuss] SparkR CRAN feasibility check server problem

2018-12-12 Thread Hyukjin Kwon
cussion will be in > https://issues.apache.org/jira/browse/SPARK-24152. I will post here if I > get > reply from CRAN admin. > > Thanks. > > > Liang-Chi Hsieh wrote > > Thanks for letting me know! I will look into it and ask CRAN admin for > > help. > > > > > >

Re: [discuss] SparkR CRAN feasibility check server problem

2018-12-12 Thread Hyukjin Kwon
this problem..! On Mon, Nov 12, 2018 at 1:47 PM Hyukjin Kwon wrote: > I made a PR to officially drop R prior to version 3.4 ( > https://github.com/apache/spark/pull/23012). > The tests will probably fail for now since it produces warnings for using > R 3.1.x. > > On Sun, Nov 11, 2018 at

Re: Apache Spark git repo moved to gitbox.apache.org

2018-12-11 Thread Hyukjin Kwon
Me too. I want to put some input as well if that can be helpful. On Wed, 12 Dec 2018, 8:20 am Reynold Xin Thanks, Sean. Which INFRA ticket is it? It's creating a lot of noise so I > want to put some pressure myself there too. > > > On Mon, Dec 10, 2018 at 9:51 AM, Sean Owen wrote: > >> Agree, I'

Re: Apache Spark git repo moved to gitbox.apache.org

2018-12-10 Thread Hyukjin Kwon
Ah, sorry. I missed it. It works correctly. Thanks. On Tue, Dec 11, 2018 at 10:47 AM Sean Owen wrote: > Did you do the step where you sync your GitHub and ASF account? After an > hour you should get an email and then you can. > > On Mon, Dec 10, 2018, 8:01 PM Hyukjin Kwon >> BTW, s

Re: Apache Spark git repo moved to gitbox.apache.org

2018-12-10 Thread Hyukjin Kwon
BTW, should I be able to close PRs via GitHub UI right now or is there another way to do it? It looks like I'm not seeing the close button. On Tue, Dec 11, 2018 at 1:51 AM Sean Owen wrote: > Agree, I'll ask on the INFRA ticket and follow up. That's a lot of extra > noise. > > On Mon, Dec 10, 2018 at 11:37 AM

Re: Run a specific PySpark test or group of tests

2018-12-05 Thread Hyukjin Kwon
It's merged now and in the developer tools page - http://spark.apache.org/developer-tools.html#individual-tests Have some fun with PySpark testing! On Wed, Dec 5, 2018 at 4:30 PM Hyukjin Kwon wrote: > Hey all, I kind of met the goal with a minimised fix with keeping > available framework

Re: Run a specific PySpark test or group of tests

2018-12-05 Thread Hyukjin Kwon
es support unittest-based tests >>> <https://docs.pytest.org/en/latest/unittest.html>, allowing for >>> incremental adoption. I'll see how convenient it is to use with our current >>> test layout. >>> >>> On Tue, Aug 15, 2017 at 1:03 AM Hyukji

A user of thincrs has selected this issue. Deadline: Xxx, Xxx X, XXXX XX:XX

2018-12-01 Thread Hyukjin Kwon
Just out of curiosity, does any one know what kind of account it is? https://issues.apache.org/jira/secure/ViewProfile.jspa?name=Thincrs Was wondering if it's a bot for some purposes

Re: Some PRs not automatically linked to JIRAs

2018-11-21 Thread Hyukjin Kwon
://issues.apache.org/jira/browse/SPARK-26104 the links are still duplicated. It looks like the scripts are being run in multiple places. On Tue, Oct 30, 2018 at 4:58 PM Hyukjin Kwon wrote: > Duplicated link problem looks still persistent: > > https://issues.apache.org/jira/browse/SPARK-2588

New PySpark test style

2018-11-13 Thread Hyukjin Kwon
Hi all, https://github.com/apache/spark/pull/23021 was merged recently; it splits a big single file that contains all the tests into smaller files. I picked one example to follow, NumPy, because the current style looks closer to the NumPy structure and looks easier to follow. Please see https:

Re: [discuss] SparkR CRAN feasibility check server problem

2018-11-11 Thread Hyukjin Kwon
--- > *From:* Liang-Chi Hsieh > *Sent:* Saturday, November 10, 2018 2:32 AM > *To:* dev@spark.apache.org > *Subject:* Re: [discuss] SparkR CRAN feasibility check server problem > > > Yeah, thanks Hyukjin Kwon for bringing this up for discussion. > > I don't know

Re: [discuss] SparkR CRAN feasibility check server problem

2018-11-10 Thread Hyukjin Kwon
lso be problematic. > > > On Thu, Nov 1, 2018 at 7:35 PM Hyukjin Kwon wrote: > >> Hi all, >> >> I want to raise the CRAN failure issue because it started to block Spark >> PRs time to time. Since the number >> of PRs grows hugely in Spark community, this i

Re: Arrow optimization in conversion from R DataFrame to Spark DataFrame

2018-11-10 Thread Hyukjin Kwon
> >> Thanks Hyukjin! Very cool results >> >> Shivaram >> On Fri, Nov 9, 2018 at 10:58 AM Felix Cheung >> wrote: >> > >> > Very cool! >> > >> > >> > >> > From: Hyukjin Kwon >>

Arrow optimization in conversion from R DataFrame to Spark DataFrame

2018-11-08 Thread Hyukjin Kwon
Hi all, I am trying to introduce R Arrow optimization by reusing PySpark Arrow optimization. It makes the R DataFrame to Spark DataFrame conversion roughly 900% ~ 1200% faster. It looks to be working fine so far; however, I would appreciate it if you guys had some time to take a look (https://github.com/apache/spar

[discuss] SparkR CRAN feasibility check server problem

2018-11-01 Thread Hyukjin Kwon
Hi all, I want to raise the CRAN failure issue because it has started to block Spark PRs from time to time. Since the number of PRs in the Spark community is growing hugely, it is critical not to block other PRs. There has been a problem at CRAN (See https://github.com/apache/spark/pull/20005 for analysis). To c

Re: Some PRs not automatically linked to JIRAs

2018-10-30 Thread Hyukjin Kwon
here. Thanks. On Mon, Oct 1, 2018 at 7:15 PM Hyukjin Kwon wrote: > Seems fixed but looks it starts to leave duplicated PR links for some > recent JIRAs. Not a big deal but are they being ran in multiple places > maybe? > > For instance, > > https://issues.apache.org/jira/brows

Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-29 Thread Hyukjin Kwon
+1 On Tue, Oct 30, 2018 at 11:03 AM Gengliang Wang wrote: > +1 > > > On Oct 30, 2018, at 10:41 AM, Sean Owen wrote: > > > > +1 > > > > Same result as in RC4 from me, and the issues I know of that were > > raised with RC4 are resolved. I tested vs Scala 2.12 and 2.11. > > > > These items are still targeted to 2.

Re: DataSourceV2 hangouts sync

2018-10-25 Thread Hyukjin Kwon
I didn't know I live in the same timezone as you, Wenchen :D. Monday or Wednesday at 5PM PDT sounds good to me too FWIW. On Fri, Oct 26, 2018 at 8:29 AM Ryan Blue wrote: > Good point. How about Monday or Wednesday at 5PM PDT then? > > Everyone, please reply to me (no need to spam the list) with which

Re: DataSourceV2 hangouts sync

2018-10-25 Thread Hyukjin Kwon
+1 ! On Fri, Oct 26, 2018 at 7:21 AM Dongjoon Hyun wrote: > +1. Thank you for volunteering, Ryan! > > Bests, > Dongjoon. > > > On Thu, Oct 25, 2018 at 4:19 PM Xiao Li wrote: > >> +1 >> >> On Thu, Oct 25, 2018 at 4:16 PM Reynold Xin wrote: >> >>> +1 >>> >>> >>> >>> On Thu, Oct 25, 2018 at 4:12 PM Li Jin wrote: >>

Re: What's a blocker?

2018-10-24 Thread Hyukjin Kwon
> Let's understand statements like "X is not a blocker" to mean "I don't think that X is a blocker". Interpretations not proclamations, backed up by reasons, not all of which are appeals to policy and precedent. Might not be a big deal and out of the topic but I rather hope people explicitly avoid

Re: [VOTE] SPARK 2.4.0 (RC4)

2018-10-23 Thread Hyukjin Kwon
https://github.com/apache/spark/pull/22514 sounds like a regression that affects Hive CTAS in write path (by not replacing them into Spark internal datasources; therefore performance regression). but yea I suspect if we should block the release by this. https://github.com/apache/spark/pull/22144 i

Re: [VOTE] SPARK 2.4.0 (RC4)

2018-10-23 Thread Hyukjin Kwon
I am searching and checking some PRs or JIRAs that state regression. Let me leave a link - it might be good to double check https://github.com/apache/spark/pull/22514 as well. On Tue, Oct 23, 2018 at 11:58 PM Stavros Kontopoulos < stavros.kontopou...@lightbend.com> wrote: > Sean, > > I will try it agai

Re: [VOTE] SPARK 2.4.0 (RC4)

2018-10-23 Thread Hyukjin Kwon
I am sorry for raising this late. Out of curiosity, does anyone know why we don't treat SPARK-24935 (https://github.com/apache/spark/pull/22144) as a blocker? It looks it broke a API compatibility, and an actual usecase of an external library (https://github.com/DataSketches/sketches-hive) Also, l

Re: GitHub is out of order

2018-10-22 Thread Hyukjin Kwon
It's chaotic now.. can we turn off Jenkins for a while if GitHub is going to be out of order for a while? My notifications are full of AmplabJenkins bot messages ... On Mon, 22 Oct 2018, 1:13 pm Hyukjin Kwon, wrote: > Yea.. please ignore my duplicated comments if they exist. I did

Re: GitHub is out of order

2018-10-21 Thread Hyukjin Kwon
Yea.. please ignore my duplicated comments if they exist. I didn't know it was happening globally; I thought it was a problem specific to me, so I left duplicated comments multiple times. On Mon, Oct 22, 2018 at 12:40 PM Dongjoon Hyun wrote: > Hi, All. > > Currently, GitHub is out of order. Apache Spark repo

Re: Hadoop 3 support

2018-10-17 Thread Hyukjin Kwon
See the discussion at https://github.com/apache/spark/pull/21588 On Wed, Oct 17, 2018 at 5:06 AM t4 wrote: > has anyone got spark jars working with hadoop3.1 that they can share? i am > looking to be able to use the latest hadoop-aws fixes from v3.1 > > > > -- > Sent from: http://apache-spark-develop

Re: Remove Flume support in 3.0.0?

2018-10-12 Thread Hyukjin Kwon
Yea, I thought we were already going to remove this. +1 for removing it anyway. On Fri, Oct 12, 2018 at 1:44 AM Wenchen Fan wrote: > Note that, it was deprecated in 2.3.0 already: > https://spark.apache.org/docs/2.3.0/streaming-flume-integration.html > > On Fri, Oct 12, 2018 at 12:46 AM Reynold Xin

Re: Possible bug in DatasourceV2

2018-10-11 Thread Hyukjin Kwon
er data > sources that don't have table concept. We should opt-in for the schema > validation of append operator. > > On Thu, Oct 11, 2018 at 8:12 PM Hyukjin Kwon wrote: > >> That's why I initially suggested to revert this part out of Spark 2.4 and >> have more dis

Re: Possible bug in DatasourceV2

2018-10-11 Thread Hyukjin Kwon
ema.toAttributes, options, ident, userSpecifiedSchema) > > } > > > > Correct this? > > > > Or even creating a new create which simply gets the schema as non optional? > > > > Thanks, > > Assaf > > > > *From:* Hyukjin Kwon [mailto:

Re: Possible bug in DatasourceV2

2018-10-11 Thread Hyukjin Kwon
See https://github.com/apache/spark/pull/22688 +Wenchen, this looks like the problem that was raised. This might have to be considered as a blocker ... On Thu, 11 Oct 2018, 2:48 pm assaf.mendelson, wrote: > Hi, > > I created a datasource writer WITHOUT a reader. When I do, I get an > exception: org.apache.s

Re: [VOTE] SPARK 2.4.0 (RC3)

2018-10-10 Thread Hyukjin Kwon
So, which date is it? On Thu, Oct 11, 2018 at 1:48 AM Garlapati, Suryanarayana (Nokia - IN/Bangalore) < suryanarayana.garlap...@nokia.com> wrote: > Might be you need to change the date(Oct 1 has already passed). > > > > >> The vote is open until October 1 PST and passes if a majority +1 PMC > votes are

Re: DataSourceV2 APIs creating multiple instances of DataSourceReader and hence not preserving the state

2018-10-09 Thread Hyukjin Kwon
I took a look at the code. val source = classOf[MyDataSource].getCanonicalName spark.read.format(source).load().collect() It does indeed look like it is called twice. First of all, it looks like it creates it first to read the schema for a logical plan test.org.apache.spark.sql.sources.v2.MyDataSourceReader.(MyDataSour

Re: welcome a new batch of committers

2018-10-03 Thread Hyukjin Kwon
Yay! You guys all individually deserve it. Congratulations! On Wed, Oct 3, 2018 at 4:59 PM Reynold Xin wrote: > Hi all, > > The Apache Spark PMC has recently voted to add several new committers to > the project, for their contributions: > > - Shane Knapp (contributor to infra) > - Dongjoon Hyun (con

Re: Some PRs not automatically linked to JIRAs

2018-10-01 Thread Hyukjin Kwon
/browse/SPARK-25564 On Mon, Sep 17, 2018 at 10:09 PM Ilan Filonenko wrote: > Same over here: > > https://issues.apache.org/jira/browse/SPARK-25291 / > https://github.com/apache/spark/pull/22415 > > On Sun, Sep 16, 2018 at 10:09 PM Hyukjin Kwon wrote: > >> Seems same thing is

Re: Some PRs not automatically linked to JIRAs

2018-09-16 Thread Hyukjin Kwon
Seems the same thing is happening again. For instance, - https://issues.apache.org/jira/browse/SPARK-25440 / https://github.com/apache/spark/pull/22429 - https://issues.apache.org/jira/browse/SPARK-25429 / https://github.com/apache/spark/pull/22420 On Thu, Aug 3, 2017 at 9:06 AM Hyukjin Kwon wrote: >

Re: from_csv

2018-09-16 Thread Hyukjin Kwon
+1 for this idea since text parsing in CSV/JSON is quite common. One thing is about schema inference likewise with JSON functionality. In case of JSON, we added schema_of_json for it and same thing should be able to apply to CSV too. If we see some more needs for it, we can consider a function lik

Re: Should python-2 be supported in Spark 3.0?

2018-09-16 Thread Hyukjin Kwon
I think we can deprecate it in 3.x.0 and remove it in Spark 4.0.0. Many people still use Python 2. Also, technically 2.7 support is not officially dropped yet - https://pythonclock.org/ On Mon, Sep 17, 2018 at 9:31 AM Aakash Basu wrote: > Removing support for an API in a major release makes poor sense

Re: data source api v2 refactoring

2018-09-07 Thread Hyukjin Kwon
BTW, just for clarification, are we holding Datasource V2 related PRs for now until we finish this refactoring? On Fri, Sep 7, 2018 at 12:52 AM Ryan Blue wrote: > Wenchen, > > I'm not really sure what you're proposing here. What is a `LogicalWrite`? > Is it something that mirrors the read side in your PR? >

Re: Spark JIRA tags clarification and management

2018-09-06 Thread Hyukjin Kwon
Does anyone know if we still use starter or newbie tags as well? On Tue, Sep 4, 2018 at 10:00 PM Kazuaki Ishizaki wrote: > Of course, we would like to eliminate all of the following tags > > "flanky" or "flankytest" > > Kazuaki Ishizaki > > > > Fr

Re: Branch 2.4 is cut

2018-09-06 Thread Hyukjin Kwon
Thanks, Wenchen. On Thu, Sep 6, 2018 at 3:32 PM Wenchen Fan wrote: > Hi all, > > I've cut the branch-2.4 since all the major blockers are resolved. If no > objections I'll shortly followup with an RC to get the QA started in > parallel. > > Committers, please only merge PRs to branch-2.4 that are bug f

Re: no logging in pyspark code?

2018-09-05 Thread Hyukjin Kwon
FYI, we do have basic logging via the warnings module. On Tue, Aug 28, 2018 at 2:05 AM Imran Rashid wrote: > ah, great, thanks! sorry I missed that, I'll watch that jira. > > On Mon, Aug 27, 2018 at 12:41 PM Ilan Filonenko wrote: > >> A JIRA has been opened up on this exact topic: SPARK-25236 >>
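
As a rough illustration of what "logging via the warnings module" can look like in Python code, here is a minimal sketch using only the standard library. It is not taken from PySpark; the function name `resolve_batch_size` and the message text are made up for the example.

```python
import warnings


def resolve_batch_size(value=None):
    """Hypothetical helper: warn the caller instead of logging when a default is used."""
    if value is None:
        # warnings.warn reports the condition to the user without raising.
        warnings.warn("batch size not set; falling back to 1", UserWarning)
        return 1
    return value


# Capture emitted warnings so they can be inspected programmatically.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not suppressed
    result = resolve_batch_size()
```

Unlike the logging module, warnings are de-duplicated by default and can be filtered or turned into errors by the caller, which is one reason a library might use them for user-facing notices.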

Re: code freeze and branch cut for Apache Spark 2.4

2018-09-05 Thread Hyukjin Kwon
Oops, one more - https://github.com/apache/spark/pull/6. I just read this thread. On Thu, Sep 6, 2018 at 12:12 PM Sean Owen wrote: > (I slipped https://github.com/apache/spark/pull/22340 in for Scala 2.12. > Maybe it really is the last one. In any event, yes go ahead with a 2.4 RC) > > On Wed, Sep

Re: python test infrastructure

2018-09-05 Thread Hyukjin Kwon
> 1. all of the output in target/test-reports & python/unit-tests.log should be included in the jenkins archived artifacts. Hmmm, I thought they are already archived ( https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95734/artifact/target/unit-tests.log ). FWIW, unit-tests.log ar

Re: Spark JIRA tags clarification and management

2018-09-03 Thread Hyukjin Kwon
Thanks, Reynold. +Adding Xiao and Wenchen, who I often saw using tags. Would you have some tags you think we should document more? On Tue, Sep 4, 2018 at 9:27 AM Reynold Xin wrote: > The most common ones we do are: > > releasenotes > > correctness > > > > On Mon, Sep 3, 2

Re: Spark JIRA tags clarification and management

2018-09-03 Thread Hyukjin Kwon
cess, like rel note. Would be good to clarify. > > -- > *From:* Reynold Xin > *Sent:* Sunday, September 2, 2018 11:50 PM > *To:* Hyukjin Kwon > *Cc:* dev > *Subject:* Re: Spark JIRA tags clarification and management > > It would be great to do

Re: Jenkins automatic disabling service - who and why?

2018-09-03 Thread Hyukjin Kwon
ot shown because when we type "ok to test", the Jenkins prompt goes away. On Mon, Sep 3, 2018 at 8:54 PM, Hyukjin Kwon wrote: > Not a big deal, but it has been a few months since I saw this, and I am wondering > why it suddenly started asking for Jenkins admin verification at a certain point. > > I had

Re: Jenkins automatic disabling service - who and why?

2018-09-03 Thread Hyukjin Kwon
eb app UI? > > On Mon, Sep 3, 2018, 1:54 AM Hyukjin Kwon wrote: > >> Hi all, >> >> I recently noticed we started to block Jenkins tests in old PRs. For >> instance, see https://github.com/apache/spark/pull/18447 >> I don't explicitly object to this idea but at

Jenkins automatic disabling service - who and why?

2018-09-02 Thread Hyukjin Kwon
Hi all, I recently noticed we started to block Jenkins tests in old PRs. For instance, see https://github.com/apache/spark/pull/18447 I don't explicitly object to this idea, but can I at least ask who started this and why? Is it for notification purposes or to save resources? Did I miss some discussio

Spark JIRA tags clarification and management

2018-09-02 Thread Hyukjin Kwon
Hi all, I recently noticed tags are often used to classify JIRAs. I think we had better explicitly document which tags are used and explain what each tag means. For instance, we documented "Contributing to JIRA Maintenance" at https://spark.apache.org/contributing.html before (thanks, Sean Owen)

Re: [DISCUSS] move away from python doctests

2018-08-31 Thread Hyukjin Kwon
IMHO, one thing we should consider before this is refactoring the PySpark tests so that each test file pairs with the main code it covers. Right now, we put all those unit tests into just a few files, which makes the tests hard to follow. On Fri, Aug 31, 2018 at 2:05 PM, Felix Cheung wrote: > +1 on what Li said. > > A

Re: Spark Streaming : Multiple sources found for csv : Error

2018-08-30 Thread Hyukjin Kwon
Yeah, this is exactly what I have been worried about with the recent changes (discussed in https://issues.apache.org/jira/browse/SPARK-24924). See https://github.com/apache/spark/pull/17916. This should be fine in later Spark versions. FYI, +Wenchen and Dongjoon. I want to add Thomas Graves and Gengliang Wang

Re: Porting or explicitly linking project style in Apache Spark based on https://github.com/databricks/scala-style-guide

2018-08-23 Thread Hyukjin Kwon
t, if possible. > > > > On Thu, Aug 23, 2018 at 6:38 PM Hyukjin Kwon wrote: > >> If you meant "Code Style Guide", many of them are missing and it refers to >> https://docs.scala-lang.org/style/ not >> https://github.com/databricks/scala-style-guide (please correct me

Re: Porting or explicitly linking project style in Apache Spark based on https://github.com/databricks/scala-style-guide

2018-08-23 Thread Hyukjin Kwon
s always to follow the code around the code you're changing. > > > > On Thu, Aug 23, 2018 at 8:14 PM Hyukjin Kwon > wrote: > > Hi all, > > > > I usually follow https://github.com/databricks/scala-style-guide for > Apache Spark's style, which is generally

Porting or explicitly linking project style in Apache Spark based on https://github.com/databricks/scala-style-guide

2018-08-23 Thread Hyukjin Kwon
Hi all, I usually follow https://github.com/databricks/scala-style-guide for Apache Spark's style, which is generally the same as Spark's code base in practice. The thing is, we don't explicitly mention this within Apache Spark as far as I can tell. Can we explicitly mention this or por
