Filter applied on merged Parquet schemas with new column fails.

2015-10-27 Thread Hyukjin Kwon
When enabling mergeSchema and a predicate filter, this fails since Parquet filters are pushed down regardless of the schema of each split (or rather each file). Dominic Ricard reported this issue (https://issues.apache.org/jira/browse/SPARK-11103). Even though this would work okay by setting

Differences between Spark APIs for Hadoop 1.x and Hadoop 2.x in terms of performance, progress reporting and IO metrics.

2015-12-09 Thread Hyukjin Kwon
Hi all, I am writing this email to both user-group and dev-group since this is applicable to both. I am now working on the Spark XML datasource ( https://github.com/databricks/spark-xml). This uses an InputFormat implementation which I downgraded to Hadoop 1.x for version compatibility. However, I

Re: Differences between Spark APIs for Hadoop 1.x and Hadoop 2.x in terms of performance, progress reporting and IO metrics.

2015-12-09 Thread Hyukjin Kwon
change, right? > > It’s not a big change to 2.x API. if you agree, I can do, but I cannot > promise the time within one or two weeks because of my daily job. > > > > > > On Dec 9, 2015, at 5:01 PM, Hyukjin Kwon <gurwls...@gmail.com> wrote: > > Hi all, > >

coalesce at DataFrame missing argument for shuffle.

2015-12-10 Thread Hyukjin Kwon
Hi all, I happened upon the coalesce() function and found that it takes different arguments for RDD and DataFrame. It looks like the shuffle option is missing for DataFrame. I understand repartition() works exactly as coalesce() with shuffling, but it looks a bit weird that the same functions take different

Re: Welcoming Yanbo Liang as a committer

2016-06-05 Thread Hyukjin Kwon
Congratulations! 2016-06-04 11:48 GMT+09:00 Matei Zaharia : > Hi all, > > The PMC recently voted to add Yanbo Liang as a committer. Yanbo has been a > super active contributor in many areas of MLlib. Please join me in > welcoming Yanbo! > > Matei >

Unable to compile and test Spark in IntelliJ

2016-01-18 Thread Hyukjin Kwon
Hi all, I have usually been working with Spark in IntelliJ. Before this PR, https://github.com/apache/spark/commit/7cd7f2202547224593517b392f56e49e4c94cabc for `[SPARK-12575][SQL] Grammar parity with existing SQL parser`, I was able to just open the project and then run some tests with IntelliJ

RE: Unable to compile and test Spark in IntelliJ

2016-01-26 Thread Hyukjin Kwon
then remake project > > > > Thanks, > > William Mao > > > > *From:* Iulian Dragoș [mailto:iulian.dra...@typesafe.com] > *Sent:* Wednesday, January 27, 2016 12:12 AM > *To:* Hyukjin Kwon > *Cc:* dev@spark.apache.org > *Subject:* Re: Unable to compile and test Spa

Ability to auto-detect input data for datasources (by file extension).

2016-02-18 Thread Hyukjin Kwon
Hi all, I am planning to submit a PR for https://issues.apache.org/jira/browse/SPARK-8000. Currently, the file format is not detected by the file extension, unlike compression codecs, which are detected. I am thinking of introducing another interface (a function) at DataSourceRegister just like

Inconsistent file extensions and omitting file extensions written by CSV, TEXT and JSON data sources.

2016-03-08 Thread Hyukjin Kwon
Hi all, Currently, the output from the CSV, TEXT and JSON data sources does not have file extensions such as .csv, .txt and .json (except for compression extensions such as .gz, .deflate and .bz2). In addition, it looks like Parquet has extensions such as .gz.parquet or .snappy.parquet according to

Re: Inconsistent file extensions and omitting file extensions written by CSV, TEXT and JSON data sources.

2016-03-09 Thread Hyukjin Kwon
n internal representation, and I would > not expect them to have such an extension. For example, you're not > really guaranteed that the way the data breaks up leaves each file a > valid JSON doc. > > On Wed, Mar 9, 2016 at 5:49 AM, Hyukjin Kwon <gurwls...@gmail.com> wrote: >

PySpark, spill-related (possibly psutil) issue, throwing an exception '_fill_function() takes exactly 4 arguments (5 given)'

2016-03-06 Thread Hyukjin Kwon
Hi all, While testing some code in PySpark, I met a weird issue. This works fine on Spark 1.6.0 but it looks like it does not for Spark 2.0.0. When I simply run *logData = sc.textFile(path).coalesce(1) *with some big files in stand-alone local mode without HDFS, this simply throws the

Re: PySpark, spill-related (possibly psutil) issue, throwing an exception '_fill_function() takes exactly 4 arguments (5 given)'

2016-03-06 Thread Hyukjin Kwon
Just in case, my Python version is 2.7.10. 2016-03-07 11:19 GMT+09:00 Hyukjin Kwon <gurwls...@gmail.com>: > Hi all, > > While I am testing some codes in PySpark, I met a weird issue. > > This works fine at Spark 1.6.0 but it looks it does not for Spark 2.0.0. > >

Re: Null pointer exception when using com.databricks.spark.csv

2016-03-29 Thread Hyukjin Kwon
Hi, I guess this is not a CSV-datasource-specific problem. Does loading any file (e.g. textFile()) work as well? I think this is related to this thread, http://apache-spark-user-list.1001560.n3.nabble.com/Error-while-running-example-scala-application-using-spark-submit-td10056.html .

Coding style question (about extra anonymous closure within functional transformations)

2016-04-13 Thread Hyukjin Kwon
Hi all, I recently noticed that there are some usages of functional transformations (e.g. map, foreach, etc.) with an extra anonymous closure. For example, ...map(item => { ... }) which can be written simply as below: ...map { item => ... } I wrote a regex to find all of them and

Proposal of closing some PRs which at least one of committers suggested so

2016-04-22 Thread Hyukjin Kwon
Hi all, I realised that there are many open PRs and this is somewhat problematic after the past discussion ( http://apache-spark-developers-list.1001551.n3.nabble.com/auto-closing-pull-requests-that-have-been-inactive-gt-30-days-td17208.html ). I looked through them PR by PR and could make a list

Re: [vote] Apache Spark 2.0.0-preview release (rc1)

2016-05-19 Thread Hyukjin Kwon
I happened to test SparkR on Windows 7 (32-bit) and it seems some tests fail. Could this be a reason to downvote? For more details of the tests, please see https://github.com/apache/spark/pull/13165#issuecomment-220515182 2016-05-20 13:44 GMT+09:00 Takuya UESHIN

Re: Using Travis for JDK7/8 compilation and lint-java.

2016-05-23 Thread Hyukjin Kwon
+1 - I wouldn't be bothered if a build becomes longer if I can write cleaner code without manually running it. I have just looked through the related PRs and JIRAs and it looks generally okay and reasonable to me. 2016-05-23 18:54 GMT+09:00 Steve Loughran : > > On 23

Question about enabling some of missing rules.

2016-05-15 Thread Hyukjin Kwon
Hi all, Lately, I made a list of rules currently not applied to Spark from http://www.scalastyle.org/rules-dev.html and then I tried to test them. I found two rules that I think might be helpful, but I am not too sure. Could I ask whether both can be added? *RedundantIfChecker *(See

Re: Question about enabling some of missing rules.

2016-05-15 Thread Hyukjin Kwon
s style to > gradually improve over time without a sudden, sweeping change that breaks > everybody's workflow. So far nobody's been able to put such a system > together, as far as I know. > > Nick > > On Sun, May 15, 2016 at 9:51 PM Hyukjin Kwon <gurwls...@gmail.com>

Proposal of closing some PRs and maybe some PRs abandoned by its author

2016-05-06 Thread Hyukjin Kwon
Hi all, This is similar to the proposal of closing PRs I asked about before. I think the PRs suggested to be closed below are closable, but I am not very sure about the PRs apparently abandoned by their authors for at least a month. I remember the discussion about auto-closing PRs before. So, I included the PRs

Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Hyukjin Kwon
I also think these might not have to be closed only because they are inactive. How about closing issues after 30 days when the last comment is a committer's and the author has not responded? IMHO, if the committers are not sure whether the patch would be useful, then I think they should

Re: Question about Scala style, explicit typing within transformation functions and anonymous val.

2016-04-17 Thread Hyukjin Kwon
declare them. >> >> For 3 again it depends on context. >> >> >> So while it is a good idea to change 1 to reflect a more consistent code >> base (and maybe we should codify it), it is almost always a bad idea to >> change 2 and 3 just for the sake of chang

Question about Scala style, explicit typing within transformation functions and anonymous val.

2016-04-17 Thread Hyukjin Kwon
Hi all, First of all, I am sorry that this is relatively trivial and too minor, but I just want to be clear on this and careful with more PRs in the future. Recently, I submitted a PR (https://github.com/apache/spark/pull/12413) about Scala style and this was merged. In this PR, I changed

Re: Recent Jenkins always fails in specific two tests

2016-04-17 Thread Hyukjin Kwon
+1 Yea, I am facing this problem as well, https://github.com/apache/spark/pull/12452 I thought they were spurious because the tests pass locally. 2016-04-18 3:26 GMT+09:00 Kazuaki Ishizaki : > I realized that recent Jenkins among different pull requests always

Re: Coding style question (about extra anonymous closure within functional transformations)

2016-04-14 Thread Hyukjin Kwon
e same way that you do. Looking at a few similar cases, I've only found the bytecode produced to be the same regardless of which style is used. >> >> On Wed, Apr 13, 2016 at 7:46 PM, Hyukjin Kwon <gurwls...@gmail.com> wrote: >>> >>> Hi all, >>> >

Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Hyukjin Kwon
> > On Mon, Apr 18, 2016 at 10:30 PM, Ted Yu <yuzhih...@gmail.com> wrote: > >> During the months of November / December, the 30-day period should be >> relaxed. >> >> Some people (at least in the US) may take extended vacation during that time. >> >> For Chinese deve

Re: orc/parquet sql conf

2016-07-25 Thread Hyukjin Kwon
For question 1, it is possible but not supported yet. Please refer to https://github.com/apache/spark/pull/13775 Thanks! 2016-07-25 19:01 GMT+09:00 Ovidiu-Cristian MARCU < ovidiu-cristian.ma...@inria.fr>: > Hi, > > Assuming I have some data in both ORC/Parquet formats, and some complex >

Re: Sorting within partitions is not maintained in parquet?

2016-08-11 Thread Hyukjin Kwon
I just took a quick look at this. It seems not a Parquet-specific problem but one for data sources implementing FileFormat. In 1.6, it seems partitions are apparently made per file, but in 2.0 a partition can hold multiple files. So, in your case the files are multiple but the partitions are fewer, meaning each

Re: Does schema merge on keys with different types is allowed?

2016-06-28 Thread Hyukjin Kwon
I have tested that issue manually and looked into the code before. It seems it does not support finding a compatible type. https://github.com/apache/spark/blob/b914e1930fd5c5f2808f92d4958ec6fbeddf2e30/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L396-L465 2016-06-29
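As an illustration of what "finding a compatible type" during schema merge would mean, here is a hypothetical plain-Python sketch. This is not Spark's actual StructType.merge implementation; the type names and widening rules below are simplified stand-ins.

```python
# Hypothetical sketch of compatible-type lookup during schema merge.
# Spark's StructType.merge instead raises on incompatible leaf types.

# A tiny widening lattice: int can widen to long, int/long to double.
WIDENING = {
    ("int", "long"): "long",
    ("int", "double"): "double",
    ("long", "double"): "double",
}

def compatible_type(a, b):
    """Return a common type for two leaf types, or None if incompatible."""
    if a == b:
        return a
    return WIDENING.get((a, b)) or WIDENING.get((b, a))

def merge_field(field_a, field_b):
    """Merge two (name, type) fields; raise if no compatible type exists."""
    name_a, type_a = field_a
    name_b, type_b = field_b
    assert name_a == name_b, "only same-named fields are merged"
    common = compatible_type(type_a, type_b)
    if common is None:
        # Mirrors the failure mode for e.g. an int key vs. a string key.
        raise ValueError(
            "Failed to merge incompatible types %s and %s" % (type_a, type_b))
    return (name_a, common)
```

With this sketch, merging an int field with a long field widens to long, while int vs. string fails, matching the behaviour described above.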

Inquiry about Spark's behaviour for configurations in Hadoop configuration instance via read/write.options()

2016-08-04 Thread Hyukjin Kwon
Hi all, If my understanding is correct, Spark now supports setting some options on the Hadoop configuration instance via the read/write.option(..) API. However, I recently saw some comments and opinions about this. If I understood them correctly, they were as below: - Respecting all the

Re: Welcoming Felix Cheung as a committer

2016-08-08 Thread Hyukjin Kwon
Congratulations! 2016-08-09 7:47 GMT+09:00 Xiao Li : > Congrats Felix! > > 2016-08-08 15:04 GMT-07:00 Herman van Hövell tot Westerflier > : > > Congrats Felix! > > > > On Mon, Aug 8, 2016 at 11:57 PM, dhruve ashar > wrote:

Re: PSA: Java 8 unidoc build

2017-02-06 Thread Hyukjin Kwon
Oh, Joseph, thanks. It is nice to inform this in dev mailing list. Let me please leave another PR to refer, https://github.com/apache/spark/pull/16013 and the JIRA you kindly opened, https://issues.apache.org/jira/browse/SPARK-18692 On 7 Feb 2017 9:13 a.m., "Joseph Bradley"

Re: PSA: Java 8 unidoc build

2017-02-06 Thread Hyukjin Kwon
(One more.. https://github.com/apache/spark/pull/15999 this describes some more cases that might easily be mistaken) On 7 Feb 2017 9:21 a.m., "Hyukjin Kwon" <gurwls...@gmail.com> wrote: > Oh, Joseph, thanks. It is nice to inform this in dev mailing list. > > Let m

Re: welcoming Burak and Holden as committers

2017-01-24 Thread Hyukjin Kwon
Congratulations!! 2017-01-25 9:22 GMT+09:00 Takeshi Yamamuro : > Congrats! > > // maropu > > On Wed, Jan 25, 2017 at 9:20 AM, Kousuke Saruta > wrote: > >> Congrats, Burak and Holden! >> >> - Kousuke >> >> On 2017/01/25 6:36, Herman van Hövell tot

Re: Support for decimal separator (comma or period) in spark 2.1

2017-02-23 Thread Hyukjin Kwon
Please take a look at https://issues.apache.org/jira/browse/SPARK-18359. 2017-02-23 21:53 GMT+09:00 Arkadiusz Bicz : > Thank you Sam for the answer, I have solved the problem by loading all decimal > columns as strings and replacing all commas with dots but this solution is >
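The workaround quoted above (load decimal columns as strings, then replace commas with dots before casting) can be sketched in plain Python. The sample values are illustrative, not taken from the original thread.

```python
# Sketch of the workaround quoted above: parse decimals that may use a
# comma as the decimal separator by normalizing to a dot first.

def parse_decimal(value):
    """Parse a decimal string that may use ',' as the decimal separator."""
    return float(value.replace(",", "."))

# Illustrative values mixing comma and dot separators.
values = ["1,5", "2.25", "10,0"]
parsed = [parse_decimal(v) for v in values]
# parsed holds [1.5, 2.25, 10.0]
```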

Re: spark support on windows

2017-01-16 Thread Hyukjin Kwon
Hi, I just looked through Jacek's page and I believe that is the correct way. That seems to be a Hadoop-library-specific issue[1]. To my knowledge, winutils and the binaries in the private repo are built by a Hadoop PMC member on a dedicated Windows VM, which I believe makes them pretty trustworthy.

Inconsistency for nullvalue handling CSV: see SPARK-16462, SPARK-16460, SPARK-15144, SPARK-17290 and SPARK-16903

2016-08-29 Thread Hyukjin Kwon
Hi all, PR: https://github.com/apache/spark/pull/14118 JIRAs https://issues.apache.org/jira/browse/SPARK-17290 https://issues.apache.org/jira/browse/SPARK-16903 https://issues.apache.org/jira/browse/SPARK-16462 https://issues.apache.org/jira/browse/SPARK-16460

Change the settings in AppVeyor to prevent triggering the tests in other PRs in other branches

2016-09-09 Thread Hyukjin Kwon
Hi all, Currently, it seems the settings in AppVeyor are the defaults and run some tests on different branches. For example, https://github.com/apache/spark/pull/15023 https://github.com/apache/spark/pull/15022 It seems this happens only in other branches as they don’t have appveyor.yml and try to

Re: Change the settings in AppVeyor to prevent triggering the tests in other PRs in other branches

2016-09-09 Thread Hyukjin Kwon
FYI, I just ran the SparkR tests on Windows for branch-2.0 and 1.6. branch-2.0 - https://github.com/spark-test/spark/pull/7 branch-1.6 - https://github.com/spark-test/spark/pull/8 2016-09-10 0:59 GMT+09:00 Hyukjin Kwon <gurwls...@gmail.com>: > Yes, if we don't have any PRs to other

Re: Change the settings in AppVeyor to prevent triggering the tests in other PRs in other branches

2016-09-09 Thread Hyukjin Kwon
> INFRA ticket. > > Thanks > Shivaram > > On Fri, Sep 9, 2016 at 6:55 AM, Hyukjin Kwon <gurwls...@gmail.com> wrote: > > Hi all, > > > > > > Currently, it seems the settings in AppVeyor is default and runs some > tests > > on different branches.

Re: Change the settings in AppVeyor to prevent triggering the tests in other PRs in other branches

2016-09-09 Thread Hyukjin Kwon
not pass. On 10 Sep 2016 12:52 a.m., "Shivaram Venkataraman" < shiva...@eecs.berkeley.edu> wrote: > One thing we could do is to backport the commit to branch-2.0 and > branch-1.6 -- Do you think that will fix the problem ? > > On Fri, Sep 9, 2016 at 8:50 AM, Hyukjin Kwon

Re: @scala.annotation.varargs or @_root_.scala.annotation.varargs?

2016-09-08 Thread Hyukjin Kwon
I was also wondering why it is written like this. I took a look at this before and wanted to fix them, but I found https://github.com/apache/spark/pull/12077/files#r58041468 So, I kind of persuaded myself that committers already know about it and there is a reason for

Re: @scala.annotation.varargs or @_root_.scala.annotation.varargs?

2016-09-23 Thread Hyukjin Kwon
right? > >> > >> On Thu, Sep 8, 2016 at 5:07 PM, Reynold Xin <r...@databricks.com> > wrote: > >> > There is a package called scala. > >> > > >> > > >> > On Friday, September 9, 2016, Hyukjin Kwon <gurwls...@gmail.com> &

Re: Spark Improvement Proposals

2016-10-07 Thread Hyukjin Kwon
I am glad that I was not the only one thinking this. I also agree with Holden, Sean and Cody. Everything I wanted to say has already been said. 2016-10-08 1:16 GMT+09:00 Holden Karau : > First off, thanks Cody for taking the time to put together these proposals > - I think it has

Re: PSA: JIRA resolutions and meanings

2016-10-08 Thread Hyukjin Kwon
I am uncertain too. It'd be great if these were documented too. FWIW, in my case, I privately asked and told Sean first that I was going to look through the JIRAs and resolve some via the conventions Sean suggested. (Definitely all blame should be on me if I have done something terribly

Re: Suggestion in README.md for guiding pull requests/JIRAs (probably about linking CONTRIBUTING.md or wiki)

2016-10-09 Thread Hyukjin Kwon
ps://github.com/apache/spark/blob/master/.githu > b/PULL_REQUEST_TEMPLATE > > I wouldn't want to duplicate info too much, but more pointers to a single > source of information seems OK. Although I don't know if it will help much, > sure, pointers from README.md are OK. > > On

Suggestion in README.md for guiding pull requests/JIRAs (probably about linking CONTRIBUTING.md or wiki)

2016-10-09 Thread Hyukjin Kwon
Hi all, I just noticed that the README.md (https://github.com/apache/spark) does not directly describe the steps, or link, to follow for creating a PR or JIRA. I know it is probably sensible to search Google for the contribution guides first before trying to make a PR/JIRA, but it seems not

Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-26 Thread Hyukjin Kwon
+1 (non-binding) 2016-09-27 13:22 GMT+09:00 Denny Lee : > +1 on testing with Python2. > > > On Mon, Sep 26, 2016 at 3:13 PM Krishna Sankar > wrote: > >> I do run both Python and Scala. But via iPython/Python2 with my own test >> code. Not running the

Re: welcoming Xiao Li as a committer

2016-10-04 Thread Hyukjin Kwon
Congratulations! 2016-10-04 15:51 GMT+09:00 Dilip Biswal : > Hi Xiao, > > Congratulations Xiao !! This is indeed very well deserved !! > > Regards, > Dilip Biswal > Tel: 408-463-4980 > dbis...@us.ibm.com > > > > From:Reynold Xin > To:

Re: Mini-Proposal: Make it easier to contribute to the contributing to Spark Guide

2016-10-18 Thread Hyukjin Kwon
+1 if the docs can be exposed more. On 19 Oct 2016 2:04 a.m., "Shivaram Venkataraman" < shiva...@eecs.berkeley.edu> wrote: > +1 - Given that our website is now on github > (https://github.com/apache/spark-website), I think we can move most of > our wiki into the main website. That way we'll only

Component naming in the PR title

2016-11-12 Thread Hyukjin Kwon
Hi all, First of all, this might be minor but I have just been curious about the component part of different PR titles. So, I looked through the Spark wiki again and found the description not quite the same as the PRs. It seems it says: Pull Request ... 1. The PR title

Re: [SQL][JDBC] Possible regression in JDBC reader

2016-11-25 Thread Hyukjin Kwon
I believe https://github.com/apache/spark/pull/15975 fixes this regression. I am sorry for the trouble. 2016-11-25 22:23 GMT+09:00 Sean Owen : > See https://github.com/apache/spark/pull/15499#discussion_r89008564 in > particular. Hyukjin / Xiao do we need to undo part of

Re: How do I convert json_encoded_blob_column into a data frame? (This may be a feature request)

2016-11-16 Thread Hyukjin Kwon
Maybe it sounds like you are looking for from_json/to_json functions after en/decoding properly. On 16 Nov 2016 6:45 p.m., "kant kodali" wrote: > > > https://spark.apache.org/docs/2.0.2/sql-programming-guide. > html#json-datasets > > "Spark SQL can automatically infer the
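As a plain-Python illustration of what decoding a JSON-encoded blob column means (this is not the Spark from_json API itself; the rows and field names below are made up for illustration):

```python
import json

# Conceptual illustration: a column holding JSON-encoded strings is
# decoded into structured fields, which is what Spark's from_json does
# at DataFrame level. Rows and field names here are hypothetical.

rows = [
    {"id": 1, "blob": '{"name": "a", "score": 10}'},
    {"id": 2, "blob": '{"name": "b", "score": 20}'},
]

# Replace the raw string blob with its parsed structure in each row.
decoded = [dict(row, blob=json.loads(row["blob"])) for row in rows]
# decoded[0]["blob"] is now a dict such as {"name": "a", "score": 10}
```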

Re: Component naming in the PR title

2016-11-13 Thread Hyukjin Kwon
ories to maintain. > > On Sat, Nov 12, 2016, 18:27 Hyukjin Kwon <gurwls...@gmail.com> wrote: > >> Hi all, >> >> >> First of all, this might be minor but I just have been curious of >> different PR titles in particular component part. So, I lo

Re: Looking for a Spark-Python expert

2016-10-11 Thread Hyukjin Kwon
Just as one of those subscribed to the dev/user mailing lists, I would like to avoid receiving a flood of job-recruiting emails. In my personal opinion, allowing them might mean this list is effectively being used as a means of profit for an organisation. On 7 Oct 2016 5:05

Re: Why the json file used by sparkSession.read.json must be a valid json object per line

2016-10-15 Thread Hyukjin Kwon
Hi, The reason is simply that the JSON data source depends on Hadoop's LineRecordReader when we first read the files. There is a workaround for this in this link, http://searchdatascience.com/spark-adventures-1-processing-multi-line-json-files/ I hope this is helpful. Thanks!
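Since LineRecordReader hands each line to the parser independently, every line must be a self-contained JSON document. A small plain-Python illustration of why a pretty-printed file fails per-line parsing (no Spark involved; the sample strings are made up):

```python
import json

# Line-based reading means each line must be a complete JSON document.

json_lines = '{"a": 1}\n{"a": 2}'   # valid: one object per line
pretty_doc = '{\n  "a": 1\n}'       # pretty-printed: spans several lines

# Per-line parsing works for the first form...
records = [json.loads(line) for line in json_lines.splitlines()]
# records is [{"a": 1}, {"a": 2}]

def parses_per_line(text):
    """Return True if every line of text is a valid JSON document."""
    try:
        for line in text.splitlines():
            json.loads(line)
        return True
    except ValueError:
        return False

# ...but fails for the pretty-printed document, whose first line is '{'.
```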

Re: Quick request: prolific PR openers, review your open PRs

2017-01-04 Thread Hyukjin Kwon
Let me double-check mine too. 2017-01-04 21:57 GMT+09:00 Liang-Chi Hsieh : > > Ok. I will go through and check my open PRs. > > > Sean Owen wrote > > Just saw that there are many people with >= 8 open PRs. Some are > > legitimately in flight but many are probably stale. To set

Re: [Important for PySpark Devs]: Master now tests with Python 2.7 rather than 2.6 - please retest any Python PRs

2017-03-29 Thread Hyukjin Kwon
Thank you for informing this. On 30 Mar 2017 3:52 a.m., "Holden Karau" wrote: > Hi PySpark Developers, > > In https://issues.apache.org/jira/browse/SPARK-19955 / > https://github.com/apache/spark/pull/17355, as part of our continued > Python 2.6 deprecation

Re: Build completed: spark 866-master

2017-03-06 Thread Hyukjin Kwon
Xin" <r...@databricks.com> wrote: > >> Most of the previous notifications were caught as spam. We should really >> disable this. >> >> >> On Sat, Mar 4, 2017 at 4:17 PM Hyukjin Kwon <gurwls...@gmail.com> wrote: >> >>> Oh BTW, I was asked a

Re: Build completed: spark 866-master

2017-03-04 Thread Hyukjin Kwon
I think we should ask to disable this within Web UI configuration. In this JIRA, https://issues.apache.org/jira/browse/INFRA-12590, Daniel said > ... configured to send build results to dev@spark.apache.org. In the case of my accounts, I manually went to https://ci.appveyor.com/ notifications

Re: Build completed: spark 866-master

2017-03-04 Thread Hyukjin Kwon
Oh BTW, I was asked about this by Reynold a few months ago and gave a similar answer. I think I am not supposed to receive the emails (not sure, but I have not received them), so I am not too sure whether this has been happening regularly or only occasionally. On 5 Mar 2017 9:08 a.m., "Hyukjin Kwon"

Re: Tests failing with run-tests.py SyntaxError

2017-07-28 Thread Hyukjin Kwon
Yes, that's my guess just given information here without a close look. On 28 Jul 2017 11:03 pm, "Sean Owen" <so...@cloudera.com> wrote: I see, does that suggest that a machine has 2.6, when it should use 2.7? On Fri, Jul 28, 2017 at 2:58 PM Hyukjin Kwon <gurwls.

Re: Tests failing with run-tests.py SyntaxError

2017-07-28 Thread Hyukjin Kwon
That looks to be due to a dict comprehension, which is, IIRC, not allowed in Python 2.6.x. I checked the release note to be sure before - https://issues.apache.org/jira/browse/SPARK-20149 On 28 Jul 2017 9:56 pm, "Sean Owen" wrote: > File "./dev/run-tests.py", line 124 >
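The construct in question can be shown in a few lines; the module names below are made up, not the actual contents of run-tests.py. A dict comprehension is a SyntaxError on Python 2.6, while the equivalent dict(generator) spelling also works there:

```python
# Illustrative data, not the real run-tests.py content.
modules_to_test = ["pyspark-core", "pyspark-sql"]

# Dict comprehension: valid on Python 2.7+ / 3.x, SyntaxError on 2.6.
durations = {m: len(m) for m in modules_to_test}

# Equivalent 2.6-compatible spelling using dict() over a generator.
durations_26 = dict((m, len(m)) for m in modules_to_test)

# Both build the same mapping.
```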

Re: Some PRs not automatically linked to JIRAs

2017-08-02 Thread Hyukjin Kwon
"in progress", but that's not a big deal. >> Otherwise that helps until we can find out why it's not doing this >> automatically. I'm not familiar with that script, can anyone run it to >> apply to a single JIRA they are working on? >> >> On Wed, Aug 2, 2017 at

Re: Some PRs not automatically linked to JIRAs

2017-08-02 Thread Hyukjin Kwon
I was wondering about this too. Yes, actually, I have been manually adding some links by following the same steps as the script. I was thinking it'd be nicer to run this manually once, and then I ran it against a single JIRA first -

Question about manually running dev/github_jira_sync.py

2017-08-02 Thread Hyukjin Kwon
Hi all, I lately realised that we seem to have some problem between wherever dev/github_jira_sync.py is executed and JIRA. So, I see committers or issue reporters manually leaving PR links in their JIRAs, or multiple PRs open for the same JIRA. Would anyone mind if I manually run this script? I

Re: Run a specific PySpark test or group of tests

2017-08-14 Thread Hyukjin Kwon
For me, I would like this if it can be done with relatively small changes. How about adding more granular options, for example, specifying or filtering a smaller set of test goals in the run-tests.py script? I think it'd be quite a small change and we could roughly reach this goal, if I understood

Re: Crowdsourced triage Scapegoat compiler plugin warnings

2017-07-13 Thread Hyukjin Kwon
ults have been analyzed. I suspect > there's more than enough to act on already. I think we should wait until > after 2.2 is done. > Anybody prefer how to proceed here -- just open a JIRA to take care of a > batch of related types of issues and go for it? > > On Sat, Jun 17, 2017 at 4

Re: [VOTE] Apache Spark 2.2.0 (RC6)

2017-07-06 Thread Hyukjin Kwon
+1 2017-07-07 6:41 GMT+09:00 Reynold Xin : > +1 > > > On Fri, Jun 30, 2017 at 6:44 PM, Michael Armbrust > wrote: > >> Please vote on releasing the following candidate as Apache Spark version >> 2.2.0. The vote is open until Friday, July 7th, 2017 at

Re: [VOTE] Apache Spark 2.2.0 (RC1)

2017-04-29 Thread Hyukjin Kwon
SPARK-20364 describes a bug, but I am unclear that we should call it a regression that blocks a release. It is something working incorrectly (in some cases, in terms of output), but this case appears not to have worked in past releases either. The

Re: Question, Flaky tests: pyspark.sql.tests.ArrowTests tests in Jenkins worker 5(?)

2017-08-06 Thread Hyukjin Kwon
at 2:01 AM, Liang-Chi Hsieh <vii...@gmail.com> > wrote: > >> > >> Maybe a possible fix: > >> https://stackoverflow.com/questions/31495657/ > development-build-of-pandas-giving-importerror-c- > extension-hashtable-not-bui > >> > >>

Question, Flaky tests: pyspark.sql.tests.ArrowTests tests in Jenkins worker 5(?)

2017-08-05 Thread Hyukjin Kwon
Hi all, I am seeing flaky Python tests from time to time and, if I am not mistaken, mostly on amp-jenkins-worker-05: == ERROR: test_filtered_frame (pyspark.sql.tests.ArrowTests)

Re: Welcoming Hyukjin Kwon and Sameer Agarwal as committers

2017-08-07 Thread Hyukjin Kwon
AM, Mridul Muralidharan <mri...@gmail.com> >> wrote: >> >>> Congratulations Hyukjin, Sameer ! >>> >>> Regards, >>> Mridul >>> >>> On Mon, Aug 7, 2017 at 8:53 AM, Matei Zaharia <matei.zaha...@gmail.com> >>> wro

Re: [VOTE] [SPIP] SPARK-18085: Better History Server scalability

2017-08-01 Thread Hyukjin Kwon
+1 Although I am not familiar with this code path, I read the proposal a few times and it makes sense to me. On 1 Aug 2017 7:01 pm, "Denis Bolshakov" wrote: +1 Absolutely agree on SPARK-18085. On 1 August 2017 at 12:18, Sean Owen wrote: > (Direct

Re: Tests failing with run-tests.py SyntaxError

2017-07-28 Thread Hyukjin Kwon
Or maybe in https://github.com/apache/spark/blob/master/dev/run-tests#L23 On 29 Jul 2017 11:16 am, "Hyukjin Kwon" <gurwls...@gmail.com> wrote: I am sorry for saying just based on my wild guess because I have no way to check and take a look into Jenkins but I think we m

Re: Tests failing with run-tests.py SyntaxError

2017-07-28 Thread Hyukjin Kwon
s_to_test) for m in modules_to_test}, sort=True) Bests, Dongjoon. *From: *Hyukjin Kwon <gurwls...@gmail.com> *Date: *Friday, July 28, 2017 at 7:06 AM *To: *Sean Owen <so...@cloudera.com> *Cc: *dev <dev@spark.apache.org> *Subject: *Re: Tests failing with run-tests.py SyntaxError Yes, that

Encouraging tests SparkR, in particular, dapply, gapply and RDD based APIs (SPARK-21093)

2017-06-25 Thread Hyukjin Kwon
Hi all, Recently, there was an issue about a leak in SparkR in https://issues.apache.org/jira/browse/SPARK-21093. It was even worse because R workers crash on CentOS easily. This was fixed in https://github.com/apache/spark/commit/6b3d02285ee0debc73cbcab01b10398a498fbeb8. It was about the very

Re: Crowdsourced triage Scapegoat compiler plugin warnings

2017-06-17 Thread Hyukjin Kwon
Gentle ping to dev for help. I hope this effort is not abandoned. On 25 May 2017 9:41 am, "Josh Rosen" wrote: I'm interested in using the Scapegoat Scala compiler plugin to find potential bugs and performance problems in Spark.

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-13 Thread Hyukjin Kwon
For the test failure on R, I checked: Per https://github.com/apache/spark/tree/v2.2.0-rc4, 1. Windows Server 2012 R2 / R 3.3.1 - passed ( https://ci.appveyor.com/project/spark-test/spark/build/755-r-test-v2.2.0-rc4 ) 2. macOS Sierra 10.12.3 / R 3.4.0 - passed 3. macOS Sierra 10.12.3 / R 3.2.3 -

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-14 Thread Hyukjin Kwon
> From: Nick Pentreath <nick.pentre...@gmail.com> > Sent: Tuesday, June 13, 2017 11:38 PM > Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4) > To: Felix Cheung <felixcheun...@hotmail.com>, Hyukjin Kwon < > gurwls...@gmail.com>, dev <dev@spark.

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-14 Thread Hyukjin Kwon
, "a", function(key, x) { x }, schema(df))) looks like it occasionally throws an error. I will leave it here and probably explain more if a JIRA is opened. This does not look like a regression anyway. 2017-06-14 16:22 GMT+09:00 Hyukjin Kwon <gurwls...@gmail.com>: > > Per http

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-18 Thread Hyukjin Kwon
Is this a regression BTW? I am just curious. On 19 Jun 2017 1:18 pm, "Liang-Chi Hsieh" wrote: -1. When using kyro serializer and partition number is greater than 2000. There seems a NPE issue needed to fix. SPARK-21133

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-14 Thread Hyukjin Kwon
Actually, I opened - https://issues.apache.org/jira/browse/SPARK-21093. 2017-06-14 17:08 GMT+09:00 Hyukjin Kwon <gurwls...@gmail.com>: > For a shorter reproducer ... > > > df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-11 Thread Hyukjin Kwon
+1 (non-binding) 2017-09-12 9:52 GMT+09:00 Yin Huai : > +1 > > On Mon, Sep 11, 2017 at 5:47 PM, Sameer Agarwal > wrote: > >> +1 (non-binding) >> >> On Thu, Sep 7, 2017 at 9:10 PM, Bryan Cutler wrote: >> >>> +1 (non-binding) for

Re: doc patch review

2017-09-21 Thread Hyukjin Kwon
I think it would have been nicer if the JIRA and PR were included in this email. 2017-09-21 19:44 GMT+09:00 Steve Loughran : > I have a doc patch on spark streaming & object store sources which has > been hitting is six-month-unreviewed state this week > > are there any

Re: [VOTE] Spark 2.1.2 (RC4)

2017-10-06 Thread Hyukjin Kwon
1 the release > > Perhaps someone can take a look at the R failures on RHEL just in case > though. > > > On Fri, 6 Oct 2017 at 05:58 vaquar khan <vaquar.k...@gmail.com> wrote: > >> +1 (non binding ) tested on Ubuntu ,all test case are passed. >> >> Re

Re: [VOTE] Spark 2.1.2 (RC4)

2017-10-05 Thread Hyukjin Kwon
+1 too. On 6 Oct 2017 10:49 am, "Reynold Xin" wrote: +1 On Mon, Oct 2, 2017 at 11:24 PM, Holden Karau wrote: > Please vote on releasing the following candidate as Apache Spark version 2 > .1.2. The vote is open until Saturday October 7th at 9:00

Re: Disabling Closed -> Reopened transition for non-committers

2017-10-05 Thread Hyukjin Kwon
It's Closed -> Reopened (not Resolved -> Reopened), and I think we mostly leave JIRAs as Resolved. I support this idea. I don't think this is as unfriendly as it sounds in practice. This case should be quite occasional, I guess. 2017-10-05 20:02 GMT+09:00 Sean Owen : >

Re: Nightly builds for master branch failed

2017-10-05 Thread Hyukjin Kwon
Thanks Shane. 2017-10-06 2:07 GMT+09:00 shane knapp : > ...and we're green: https://amplab.cs.berkeley.edu/jenkins/job/ > spark-master-maven-snapshots/2025/ > > On Thu, Oct 5, 2017 at 9:46 AM, shane knapp wrote: > >> not a problem. :) >> >> On Thu,

Re: Welcoming Saisai (Jerry) Shao as a committer

2017-08-28 Thread Hyukjin Kwon
Congratulations! Very well deserved. 2017-08-29 11:41 GMT+09:00 Liwei Lin : > Congratulations, Jerry! > > Cheers, > Liwei > > On Tue, Aug 29, 2017 at 10:15 AM, 蒋星博 wrote: > >> congs! >> >> Takeshi Yamamuro wrote on Mon, Aug 28, 2017 at 7:11 PM:

Re: [discuss][SQL] Partitioned column type inference proposal

2017-11-14 Thread Hyukjin Kwon
es, the priority from high to low: >> DecimalType, LongType, IntegerType. This is because DecimalType is used as >> big integer when parsing partition column values. >> 4. DoubleType can't be merged with other types, except DoubleType itself. >> 5. when merging TimestampType wit

Re: [VOTE] Spark 2.2.1 (RC2)

2017-11-28 Thread Hyukjin Kwon
+1 2017-11-29 8:18 GMT+09:00 Henry Robinson : > (My vote is non-binding, of course). > > On 28 November 2017 at 14:53, Henry Robinson wrote: > >> +1, tests all pass for me on Ubuntu 16.04. >> >> On 28 November 2017 at 10:36, Herman van Hövell tot Westerflier

Re: [discuss][PySpark] Can we drop support old Pandas (<0.19.2) or what version should we support?

2017-11-14 Thread Hyukjin Kwon
+0 to drop it as I said in the PR. I am seeing it bring a lot of difficulty in getting the cool changes through, and it is slowing them down from getting pushed. My only worry is users who depend on lower Pandas versions (Pandas 0.19.2 seems to have been released less than a year ago. In the similar time, Spark 2.1.0
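The kind of version gating being discussed can be sketched as a plain dotted-version comparison. The threshold `0.19.2` comes from the thread; the helper names below are hypothetical, not PySpark's actual check:

```python
# Hypothetical sketch of a minimum-version gate like the one discussed above.
# Parses dotted version strings into integer tuples and compares them.
def parse_version(v):
    return tuple(int(part) for part in v.split("."))

MINIMUM_PANDAS_VERSION = "0.19.2"

def meets_minimum(installed, minimum=MINIMUM_PANDAS_VERSION):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)

print(meets_minimum("0.19.2"))  # True  (boundary case: exactly the minimum)
print(meets_minimum("0.18.1"))  # False (older than the minimum)
```

Tuple comparison handles the multi-digit components (e.g. `0.19.2` vs `0.2.0`) that a plain string comparison would get wrong.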

[discuss][SQL] Partitioned column type inference proposal

2017-11-14 Thread Hyukjin Kwon
Hi dev, I would like to post a proposal about partitioned column type inference (related with 'spark.sql.sources.partitionColumnTypeInference.enabled' configuration). This thread focuses on the type coercion (finding the common type) in partitioned columns, in particular, when the different form
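To make the proposal concrete, here is a rough, hypothetical sketch of finding a common type across partition directory values. The type names mirror Spark SQL's, but the inference and merging rules here are a simplified illustration of the thread (numeric widening order, DoubleType merging only with itself), not Spark's actual resolution logic:

```python
from functools import reduce

# Simplified, hypothetical illustration of partitioned-column type inference:
# infer a coarse type per raw value, then widen to a common type across values.
def infer_type(value):
    """Infer a coarse SQL-like type name for a raw partition value string."""
    try:
        n = int(value)
        # Integers widen to LongType when they do not fit in 32 bits.
        return "IntegerType" if -2**31 <= n < 2**31 else "LongType"
    except ValueError:
        pass
    try:
        float(value)
        return "DoubleType"
    except ValueError:
        return "StringType"

# Numeric widening order used in this sketch (narrow to wide).
_NUMERIC_ORDER = ["IntegerType", "LongType", "DecimalType"]

def common_type(t1, t2):
    """Find a common type for two inferred types, falling back to StringType."""
    if t1 == t2:
        return t1
    if t1 in _NUMERIC_ORDER and t2 in _NUMERIC_ORDER:
        return max(t1, t2, key=_NUMERIC_ORDER.index)
    # As in the thread: DoubleType merges only with DoubleType itself.
    return "StringType"

types = [infer_type(v) for v in ["1", "9999999999", "3"]]
print(reduce(common_type, types))  # LongType (IntegerType widened by LongType)
```

A usage note: mixing `"1.5"` (DoubleType) with `"1"` (IntegerType) falls back to StringType in this sketch, matching rule 4 quoted in the replies.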

Re: Spark build is failing in amplab Jenkins

2017-11-04 Thread Hyukjin Kwon
I assume it is as it says: Python versions prior to 2.7 are not supported. It looks like this happens on workers 2, 6 and 7, given my observation. On 4 Nov 2017 5:15 pm, "Sean Owen" wrote: Agree, seeing this somewhat regularly on the pull request builder. Do some machines

Re: Build timeout -- continuous-integration/appveyor/pr — AppVeyor build failed

2018-05-13 Thread Hyukjin Kwon
From a very quick look, I believe that's just an occasional network issue in AppVeyor. For example, in this case: Downloading: https://repo.maven.apache.org/maven2/org/scala-lang/scala-compiler/2.11.8/scala-compiler-2.11.8.jar This took 26ish mins and seems further downloading jars look mins much

Re: Running lint-java during PR builds?

2018-05-22 Thread Hyukjin Kwon
lume of test runs on Travis. >>> > >>> > In ASF projects Travis could get significantly >>> > backed up since - if I recall - all of ASF shares one queue. >>> > >>> > At the number of PRs Spark has this could be a big issue. >>> > >>

Re: Running lint-java during PR builds?

2018-05-20 Thread Hyukjin Kwon
I would like to revive this proposal: Travis CI. Shall we give this a try? I think it's worth trying. 2016-11-17 3:50 GMT+08:00 Dongjoon Hyun : > Hi, Marcelo and Ryan. > > That was the main purpose of my proposal about Travis.CI. > IMO, that is the only way to achieve that

Re: Running lint-java during PR builds?

2018-05-21 Thread Hyukjin Kwon
I am going to open an INFRA JIRA if there's no explicit objection in a few days. 2018-05-21 13:09 GMT+08:00 Hyukjin Kwon <gurwls...@gmail.com>: > I would like to revive this proposal. Travis CI. Shall we give this try? I > think it's worth trying it. > > 2016-11-17 3:50 GMT+0

Re: Jenkins availability question

2018-06-16 Thread Hyukjin Kwon
Oops, I just noticed Shane's email. Please ignore this email. On Sat, Jun 16, 2018 at 7:43 PM, Hyukjin Kwon wrote: > Is Jenkins down now? I was about to investigate some issues that happened > specifically within Jenkins. > > I would appreciate it if anyone could roughly confirm when i
