Re: Welcoming Yanbo Liang as a committer

2016-06-03 Thread Dongjoon Hyun
Wow, Congratulations, Yanbo! Dongjoon. On Fri, Jun 3, 2016 at 8:22 PM, Xiao Li wrote: > Congratulations, Yanbo! > > 2016-06-03 19:54 GMT-07:00 Nan Zhu : > >> Congratulations ! >> >> -- >> Nan Zhu >> >> On June 3, 2016 at 10:50:33 PM, Ted Yu

Re: BUILD FAILURE due to...Unable to find configuration file at location dev/scalastyle-config.xml

2016-03-07 Thread Dongjoon Hyun
Or, I hope one of the committers commits both mine (11567) and that soon. It's related to build setting files, and the Jenkins test took over 2 hours. :( Dongjoon. On Mon, Mar 7, 2016 at 11:48 PM, Dongjoon Hyun <dongj...@apache.org> wrote: > Ur, may I include that, too? > > Dongjoon. > >

Re: BUILD FAILURE due to...Unable to find configuration file at location dev/scalastyle-config.xml

2016-03-08 Thread Dongjoon Hyun
Hi, I updated PR https://github.com/apache/spark/pull/11567. But, `lint-java` fails if that file is in the dev folder. (Jenkins fails, too.) So, inevitably, I changed pom.xml instead. Dongjoon. On Mon, Mar 7, 2016 at 11:40 PM, Jacek Laskowski wrote: > Hi, > > At first

Re: BUILD FAILURE due to...Unable to find configuration file at location dev/scalastyle-config.xml

2016-03-07 Thread Dongjoon Hyun
Ur, may I include that, too? Dongjoon. On Mon, Mar 7, 2016 at 11:46 PM, Jacek Laskowski wrote: > Okey...it's building now > properly...https://github.com/apache/spark/pull/11567 + git mv > scalastyle-config.xml dev/ > > How to fix it in the repo? Should I send a pull request

Re: Using Travis for JDK7/8 compilation and lint-java.

2016-05-23 Thread Dongjoon Hyun
simultaneous PR builds may > have conflict. > > On Sun, May 22, 2016 at 9:21 PM, Dongjoon Hyun <dongj...@apache.org> > wrote: > >> Thank you for feedback. Sure, correctly, that's the reason why the >> current SparkPullRequestBuilder do not run `lint-java`. :-) >>

Re: Using Travis for JDK7/8 compilation and lint-java.

2016-05-23 Thread Dongjoon Hyun
Thank you, Shane! I really hope that SparkPullRequestBuilder handle them if possible. Dongjoon. On Mon, May 23, 2016 at 1:24 PM, Dongjoon Hyun <dongj...@apache.org> wrote: > Thank you for your opinion! > > Sure. I know that history and totally agree with all your concerns.

Re: Using Travis for JDK7/8 compilation and lint-java.

2016-05-23 Thread Dongjoon Hyun
set of CI infrastructure to maintain. > > On Mon, May 23, 2016 at 9:43 AM, Dongjoon Hyun <dongj...@apache.org> > wrote: > >> Thank you, Steve and Hyukjin. >> >> And, don't worry, Ted. >> >> Travis launches new VMs for every PR. >> >> A

Re: Using Travis for JDK7/8 compilation and lint-java.

2016-05-23 Thread Dongjoon Hyun
(or Github). :-) If Spark uses Travis CI freely, they might dislike me for the heavy traffic. On Mon, May 23, 2016 at 1:26 PM, Dongjoon Hyun <dongj...@apache.org> wrote: > Thank you, Shane! > > I really hope that SparkPullRequestBuilder handle them if possible. > > Dongjoon. > >

Re: Using Travis for JDK7/8 compilation and lint-java.

2016-05-24 Thread Dongjoon Hyun
Yep. Let's hold on. :) On Tue, May 24, 2016 at 3:45 PM, shane knapp wrote: > > Sure, could you give me the permission for Spark Jira? > > > > Although we haven't decided yet, I can add Travis related section > > (summarizing current configurations and expected VM HW, etc).

Re: Using Travis for JDK7/8 compilation and lint-java.

2016-05-23 Thread Dongjoon Hyun
Thank you, Sean! On Mon, May 23, 2016 at 2:09 PM, Sean Owen wrote: > No, because then none of the Java 8 support can build. Marcelo has a JIRA > for handling that the right way with bootstrap class path config. > > Ideally it can be rolled into Jenkins though there are

Re: Using Travis for JDK7/8 compilation and lint-java.

2016-05-22 Thread Dongjoon Hyun
n install' takes ~30 minutes. > > Maybe not everyone is willing to wait that long. > > On Sun, May 22, 2016 at 1:30 PM, Dongjoon Hyun <dongj...@apache.org> > wrote: > >> Oh, Sure. My bad! >> >> - For Oracle JDK7, mvn -DskipTests install and run `dev/lint-java`.

Using Travis for JDK7/8 compilation and lint-java.

2016-05-22 Thread Dongjoon Hyun
Hi, All. I want to propose the following. - Turn on Travis CI for the Apache Spark PR queue. - Recommend this for contributors, too. Currently, Spark provides a Travis CI configuration file to help contributors check Scala/Java style conformance and JDK7/8 compilation easily during their preparing

Re: Using Travis for JDK7/8 compilation and lint-java.

2016-05-22 Thread Dongjoon Hyun
ted twice: > > - For Oracle JDK7, mvn -DskipTests install and run `dev/lint-java`. > > Did you intend to cover JDK 8 ? > > Cheers > > On Sun, May 22, 2016 at 1:25 PM, Dongjoon Hyun <dongj...@apache.org> > wrote: > >> Hi, All. >> >> I want to propos

Re: Building spark master failed

2016-05-23 Thread Dongjoon Hyun
Hi, That is not the latest. The bug was fixed 5 days ago. Regards, Dongjoon. On Mon, May 23, 2016 at 2:16 AM, Ovidiu-Cristian MARCU < ovidiu-cristian.ma...@inria.fr> wrote: > Hi > > I have the following issue when trying to build the latest spark source > code on master: > >

Re: Using Travis for JDK7/8 compilation and lint-java.

2016-05-24 Thread Dongjoon Hyun
Hi, All. As Sean said, Vanzin made a PR for JDK7 compilation. We can ignore the issue of JDK7 compilation. The remaining issues are the java-linter and maven installation test. To: Michael For the rate limit, Apache Foundation seems to use 30 concurrent according to the INFRA blog.

Re: Using Travis for JDK7/8 compilation and lint-java.

2016-05-24 Thread Dongjoon Hyun
Thank you, Shane. Sure, could you give me the permission for Spark Jira? Although we haven't decided yet, I can add Travis related section (summarizing current configurations and expected VM HW, etc). That will be helpful for further discussions. It's just a Wiki, you can delete the Travis

Re: Internal Deprecation warnings - worth fixing?

2016-07-27 Thread Dongjoon Hyun
+1 for fixing :) Dongjoon. On Wed, Jul 27, 2016 at 12:53 PM, Nick Pentreath wrote: > +1 I don't believe there's any reason for the warnings to still be there > except for available dev time & focus :) > > On Wed, 27 Jul 2016 at 21:35, Jacek Laskowski

Re: drop java 7 support for spark 2.1.x or spark 2.2.x

2016-07-23 Thread Dongjoon Hyun
Hi, All. What about providing an official benchmark result between `Apache Spark on JDK7` and `Apache Spark on JDK8`? I think that is enough for this issue since we cannot drive users. We had better let users choose one of JDK7/JDK8 for their own benefits. Bests, Dongjoon. On Sat, Jul 23, 2016

Re: Spark Homepage

2016-07-13 Thread Dongjoon Hyun
m to be affected so maybe it was a just > bad update for the Spark website? > > On Wed, Jul 13, 2016 at 12:05 PM, Dongjoon Hyun <dongj...@apache.org> > wrote: > >> Hi, All. >> >> Currently, Spark Homepage (http://spark.apache.org/) shows file listing >>

Spark Homepage

2016-07-13 Thread Dongjoon Hyun
Hi, All. Currently, the Spark Homepage (http://spark.apache.org/) shows a file listing (containing md files). Is there any maintenance operation on that? :) Warmly, Dongjoon.

Re: Spark Homepage

2016-07-13 Thread Dongjoon Hyun
07 PM, Holden Karau <hol...@pigscanfly.ca> > wrote: > >> This has also been reported on the user@ by a few people - other apache >> projects (arrow & hadoop) don't seem to be affected so maybe it was a just >> bad update for the Spark website? >> >> On Wed,

Re: What's the meaning of Target Version/s in Spark's JIRA?

2016-06-28 Thread Dongjoon Hyun
Hi, 1.6.2 is just the result of back-porting of that patch. The patch was originally targeted and merged into 2.0.0. Warmly, Dongjoon. On Tue, Jun 28, 2016 at 10:54 AM, Jacek Laskowski wrote: > Hi, > > While reviewing the release notes for 1.6.2 I stumbled upon >

Re: Welcoming Felix Cheung as a committer

2016-08-08 Thread Dongjoon Hyun
Congratulations, Felix! Bests, Dongjoon. On Monday, August 8, 2016, Ted Yu wrote: > Congratulations, Felix. > > On Mon, Aug 8, 2016 at 11:15 AM, Matei Zaharia > wrote: > >> Hi all, >> >> The

Re: welcoming Burak and Holden as committers

2017-01-24 Thread Dongjoon Hyun
Great! Congratulations, Burak and Holden. Bests, Dongjoon. On 2017-01-24 10:29 (-0800), Nicholas Chammas wrote: > 👏 👍 > > Congratulations, Burak and Holden. > > On Tue, Jan 24, 2017 at 1:27 PM Russell Spitzer > wrote: > > >

Re: Weird experience Hive with Spark Transformations

2017-01-17 Thread Dongjoon Hyun
Hi, Chetan. Did you copy your `hive-site.xml` into Spark conf directory? For example, cp /usr/local/hive/conf/hive-site.xml /usr/local/spark/conf If you want to use the existing Hive metastore, you need to provide that information to Spark. Bests, Dongjoon. On 2017-01-16 21:36 (-0800),
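The copy step suggested in the message above can be sketched as a small shell helper. This is only a minimal sketch and not part of the original thread: the function name `copy_hive_site` is hypothetical, and the default directories simply reuse the example paths from the message, which may differ on a given installation.

```shell
# Hypothetical helper (not from the thread): copy an existing hive-site.xml
# into Spark's conf directory so Spark can find the Hive metastore settings.
# The default paths are the example paths quoted in the message above.
copy_hive_site() {
  hive_conf_dir="${1:-/usr/local/hive/conf}"
  spark_conf_dir="${2:-/usr/local/spark/conf}"

  # Fail early if there is nothing to copy.
  if [ ! -f "$hive_conf_dir/hive-site.xml" ]; then
    echo "hive-site.xml not found in $hive_conf_dir" >&2
    return 1
  fi

  # Create the target directory if needed, then copy the file.
  mkdir -p "$spark_conf_dir" && cp "$hive_conf_dir/hive-site.xml" "$spark_conf_dir/"
}
```

Calling `copy_hive_site` with no arguments uses the example paths; passing two directories overrides them.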

Re: GraphX-related "open" issues

2017-01-17 Thread Dongjoon Hyun
Hi, Takeshi. > So, IMO it seems okay to close tickets about "Improvement" and "New Feature" > for now. I'm just wondering about what kind of field value you want to fill in the `Resolution` field for those issues. Maybe, 'Later'? Or, 'Won't Fix'? Bests, Dongjoon.

Re: Support for Hive 2.x

2016-09-02 Thread Dongjoon Hyun
Hi, Rostyslav, After your email, I also tried to search this morning, but I didn't find a proper one. The last related issue is SPARK-8064, `Upgrade Hive to 1.2` https://issues.apache.org/jira/browse/SPARK-8064 If you want, you can file a JIRA issue including your pain points, then you can

Re: Mesos is now a maven module

2016-08-30 Thread Dongjoon Hyun
ower to modify Jenkins jobs now, so I will carefully > take > > a look at them and see if any of the config needs -Pmesos. But yeah I > > thought this should be baked into the script. > > > > On Tue, Aug 30, 2016 at 5:56 PM, Dongjoon Hyun <dongj...@apache.org> >

Re: Mesos is now a maven module

2016-08-30 Thread Dongjoon Hyun
Hi, Michael. It's great news! BTW, I'm wondering if the Jenkins (SparkPullRequestBuilder) knows this new profile, -Pmesos. The PR passed with the following Jenkins build arguments without the `-Pmesos` option. (at the last test) ``` [info] Building Spark (w/Hive 1.2.1) using SBT with these

Re: Mesos is now a maven module

2016-08-30 Thread Dongjoon Hyun
Thank you all for the quick fix! :D Dongjoon. On Tuesday, August 30, 2016, Michael Gummelt wrote: > https://github.com/apache/spark/pull/14885 > > Thanks > > On Tue, Aug 30, 2016 at 11:36 AM, Marcelo Vanzin

Re: [VOTE] Release Apache Spark 2.0.1 (RC2)

2016-09-24 Thread Dongjoon Hyun
+1 (non binding) I compiled and tested on the following two systems. - CentOS 7.2 / Oracle JDK 1.8.0_77 / R 3.3.1 with -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver -Dsparkr - CentOS 7.2 / Open JDK 1.8.0_102 with -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver Bests,

Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-25 Thread Dongjoon Hyun
+1 (non binding) RC3 is compiled and tested on the following two systems, too. All tests passed. * CentOS 7.2 / Oracle JDK 1.8.0_77 / R 3.3.1 with -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver -Dsparkr * CentOS 7.2 / Open JDK 1.8.0_102 with -Pyarn -Phadoop-2.7 -Pkinesis-asl

Re: [VOTE] Release Apache Spark 2.0.1 (RC4)

2016-09-29 Thread Dongjoon Hyun
+1 (non-binding) At this time, I tested RC4 on the following. - CentOS 6.8 (Final) - OpenJDK 1.8.0_101 - Python 2.7.12 /build/mvn -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver -Dpyspark -Dsparkr -DskipTests clean package /build/mvn -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive

Re: welcoming Xiao Li as a committer

2016-10-04 Thread Dongjoon Hyun
Congratulations, Xiao! Bests, Dongjoon. On Monday, October 3, 2016, Jagadeesan As wrote: > Congratulations Xiao Li. > > Cheers > Jagadeesan A S > > > > From:Reynold Xin > > To:

Re: Straw poll: dropping support for things like Scala 2.10

2016-10-26 Thread Dongjoon Hyun
Hi, All. It's great since it's progress. Then, at least, in 2017, Spark 2.2.0 will be out with JDK8 and Scala 2.11/2.12, right? Bests, Dongjoon. - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Re: [VOTE] Release Apache Spark 1.6.3 (RC1)

2016-10-27 Thread Dongjoon Hyun
Hi, All. Last time, RC1 passed the tests with only the timezone testcase failure. Now, it's backported, too. I'm wondering if we have other issues to block releasing Apache Spark 1.6.3. Bests, Dongjoon.

Re: [VOTE] Release Apache Spark 2.0.2 (RC1)

2016-10-27 Thread Dongjoon Hyun
+1 non-binding. Built and tested on CentOS 6.6 / OpenJDK 1.8.0_111. Cheers, Dongjoon.

Re: [VOTE] Release Apache Spark 2.0.2 (RC3)

2016-11-08 Thread Dongjoon Hyun
+1 (non-binding) It's built and tested on CentOS 6.8 / OpenJDK 1.8.0_111 with `-Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver -Psparkr` profile. Cheers! Dongjoon. On 2016-11-08 14:03 (-0800), Michael Armbrust wrote: > +1 > > On Tue, Nov 8, 2016 at 1:17

Is `randomized aggregation test` testsuite stable?

2016-11-10 Thread Dongjoon Hyun
Hi, All. Recently, I observed frequent failures of `randomized aggregation test` of ObjectHashAggregateSuite in SparkPullRequestBuilder. SPARK-17982 https://github.com/apache/spark/pull/15546 (Today) SPARK-18123 https://github.com/apache/spark/pull/15664 (Today) SPARK-18169

Re: Is `randomized aggregation test` testsuite stable?

2016-11-10 Thread Dongjoon Hyun
. > Gonna disable them temporarily for now. > > Sorry for the inconvenience! > > Cheng > > > On 11/10/16 8:48 AM, Dongjoon Hyun wrote: > > Hi, All. > > > > Recently, I observed frequent failures of `randomized aggregation test` of > > Object

Two major versions?

2016-11-27 Thread Dongjoon Hyun
Hi, All. Do we have a release plan for Apache Spark 1.6.4? To my knowledge, the Apache Spark community has been focusing on the latest two versions. There has been no official release of Apache Spark *X.X.4* so far. It's also well-documented on the Apache Spark home page (Versioning policy;

Re: Two major versions?

2016-11-27 Thread Dongjoon Hyun
that impact wide use cases, or security bugs. > > > On Sun, Nov 27, 2016 at 12:49 PM, Dongjoon Hyun <dongj...@apache.org> wrote: > > > Hi, All. > > > > Do we have a release plan of Apache Spark 1.6.4? > > > > Up to my knowledge, Apache

Re: Two major versions?

2016-11-28 Thread Dongjoon Hyun
releases but I think that may also > be too rare to put even informal statements around. Every couple years? > > On Sun, Nov 27, 2016 at 8:49 PM Dongjoon Hyun <dongj...@apache.org> wrote: > > Hi, All. > > Do we have a release plan of Apache Spark 1.6.4? > > Up to my knowled

Re: Running lint-java during PR builds?

2016-11-16 Thread Dongjoon Hyun
/dongjoon-hyun/spark/jobs/176351319 Actually, I've been monitoring the history here. (It's synced every 30 minutes.) https://travis-ci.org/dongjoon-hyun/spark/builds Could we give a chance to this? Bests, Dongjoon. On 2016-11-15 13:40 (-0800), "Shixiong(Ryan) Zhu" <shixi...@databrick

Re: [VOTE] Release Apache Spark 2.0.2 (RC3)

2016-11-11 Thread Dongjoon Hyun
Hi. Now, do we have Apache Spark 2.0.2? :) Bests, Dongjoon. On 2016-11-07 22:09 (-0800), Reynold Xin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.0.2. The vote is open until Thu, Nov 10, 2016 at 22:00 PDT and passes if > a majority

Re: [VOTE] Release Apache Spark 2.0.2 (RC2)

2016-11-02 Thread Dongjoon Hyun
Hi, Sean. The same failure blocks me, too. - SPARK-18189: Fix serialization issue in KeyValueGroupedDataset *** FAILED *** I used `-Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver -Dsparkr` on CentOS 7 / OpenJDK1.8.0_111. Dongjoon. On 2016-11-02 10:44 (-0700), Sean Owen

Re: [VOTE] Release Apache Spark 1.6.3 (RC2)

2016-11-03 Thread Dongjoon Hyun
+1 (non-binding) It's built and tested on CentOS 6.8 / OpenJDK 1.8.0_111, too. Cheers, Dongjoon. On 2016-11-03 14:30 (-0700), Davies Liu wrote: > +1 > > On Wed, Nov 2, 2016 at 5:40 PM, Reynold Xin wrote: > > Please vote on releasing the following

Re: Question about SPARK-11374 (skip.header.line.count)

2016-12-10 Thread Dongjoon Hyun
mpanies have. > > > -- > *From:* Dongjoon Hyun <dongj...@apache.org> > *Sent:* Friday, December 9, 2016 9:42:58 AM > *To:* Dongjin Lee; dev@spark.apache.org > *Subject:* Re: Question about SPARK-11374 (skip.header.line.count) > > Thank you for the opinion, Dongjin! >

Re: Question about SPARK-11374 (skip.header.line.count)

2016-12-11 Thread Dongjoon Hyun
Thank you for the opinion, Mingjie and Liang-Chi. Dongjoon. On Sun, Dec 11, 2016 at 5:42 PM, Liang-Chi Hsieh wrote: > Hi Dongjoon, > > I know some people only use Spark SQL with SQL syntax not Dataset API. So I > think it should be useful to provide a way to do this in SQL. >

Forking or upgrading Apache Parquet in Spark

2016-12-15 Thread Dongjoon Hyun
Hi, All. I made a PR to upgrade Parquet to 1.9.0 for Apache Spark 2.2 in late March. - https://github.com/apache/spark/pull/16281 Currently, there are some important options regarding that. Here is the summary. 1. Forking Parquet 1.8.X and maintaining like Spark Hive. 2. Wait and see for

Re: [VOTE] Apache Spark 2.1.0 (RC5)

2016-12-16 Thread Dongjoon Hyun
RC5 is also tested on CentOS 6.8, OpenJDK 1.8.0_111, R 3.3.2 with profiles `-Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver -Psparkr`. BTW, there still exist five on-going issues in JIRA (with target version 2.1.0). 1. SPARK-16845

Fwd: Question about SPARK-11374 (skip.header.line.count)

2016-12-08 Thread Dongjoon Hyun
+dev I forget to add @user. Dongjoon. -- Forwarded message - From: Dongjoon Hyun <dongj...@apache.org> Date: Thu, Dec 8, 2016 at 16:00 Subject: Question about SPARK-11374 (skip.header.line.count) To: <dev@spark.apache.org> Hi, All. Could you give me

Question about SPARK-11374 (skip.header.line.count)

2016-12-08 Thread Dongjoon Hyun
Hi, All. Could you give me your opinions? There is an old SPARK issue, SPARK-11374, about removing header lines from text files. Currently, Spark supports removing CSV header lines in the following way. ``` scala> spark.read.option("header","true").csv("/data").show +---+---+ | c1| c2| +---+---+

Re: Question about SPARK-11374 (skip.header.line.count)

2016-12-09 Thread Dongjoon Hyun
Thank you for the opinion, Dongjin! On Thu, Dec 8, 2016 at 21:56 Dongjin Lee <dong...@apache.org> wrote: > +1 For this idea. I need it also. > > Regards, > Dongjin > > On Fri, Dec 9, 2016 at 8:59 AM, Dongjoon Hyun <dongj...@apache.org> wrote: > > Hi, All

Re: Parquet patch release

2017-01-06 Thread Dongjoon Hyun
Great! Thank you, Ryan. Bests, Dongjoon. On Fri, Jan 6, 2017 at 15:49 Xiao Li wrote: > Hi, Ryan, > > Really thank you for your help! > > Happy New Year! > > Xiao Li > > 2017-01-06 15:46 GMT-08:00 Ryan Blue : > > Last month, there was interest in

Re: Spark Project build Issues.(Intellij)

2017-06-28 Thread Dongjoon Hyun
Did you follow the guide in `IDE Setup` -> `IntelliJ` section of http://spark.apache.org/developer-tools.html ? Bests, Dongjoon. On Wed, Jun 28, 2017 at 5:13 PM, satyajit vegesna < satyajit.apas...@gmail.com> wrote: > Hi All, > > When i try to build source code of apache spark code from >

Re: Thoughts on release cadence?

2017-07-30 Thread Dongjoon Hyun
+1 Bests, Dongjoon On Sun, Jul 30, 2017 at 02:20 Sean Owen wrote: > The project had traditionally posted some guidance about upcoming > releases. The last release cycle was about 6 months. What about penciling > in December 2017 for 2.3.0?

Re: Welcoming Hyukjin Kwon and Sameer Agarwal as committers

2017-08-07 Thread Dongjoon Hyun
Great! Congratulations, Hyukjin and Sameer! Dongjoon. On Mon, Aug 7, 2017 at 8:55 AM, Bai, Dave wrote: > Congrats, leveled up!=) > > On 8/7/17, 10:53 AM, "Matei Zaharia" wrote: > > >Hi everyone, > > > >The Spark PMC recently voted to add Hyukjin

Re: CHAR implementation?

2017-09-15 Thread Dongjoon Hyun
ORC does, maybe it has > a native CHAR type. > > rb > > On Thu, Sep 14, 2017 at 5:31 PM, Dongjoon Hyun <dongjoon.h...@gmail.com> > wrote: > >> Hi, All. >> >> Currently, Spark shows different behavior when we uses CHAR types. >> >> spark-sql> CR

Re: [VOTE] Spark 2.1.2 (RC1)

2017-09-14 Thread Dongjoon Hyun
Hi, Holden. It's not a problem, but the link of `List of JIRA ... with this filter` seems to be wrong. Bests, Dongjoon. On Thu, Sep 14, 2017 at 10:47 AM, Holden Karau wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.1.2. The vote is

Re: [VOTE] Spark 2.1.2 (RC1)

2017-09-14 Thread Dongjoon Hyun
ne is not actually > included in the filter. It opens on a JIRA that's not included, but the > search results look correct. project = SPARK AND fixVersion = 2.1.2 > > On Thu, Sep 14, 2017 at 9:15 PM Dongjoon Hyun <dongjoon.h...@gmail.com> > wrote: > >> Hi, Holden. >>

CHAR implementation?

2017-09-14 Thread Dongjoon Hyun
Hi, All. Currently, Spark shows different behavior when we use CHAR types. spark-sql> CREATE TABLE t1(a CHAR(3)); spark-sql> CREATE TABLE t2(a CHAR(3)) STORED AS ORC; spark-sql> CREATE TABLE t3(a CHAR(3)) STORED AS PARQUET; spark-sql> INSERT INTO TABLE t1 SELECT 'a '; spark-sql> INSERT INTO

Re: Disabling Closed -> Reopened transition for non-committers

2017-10-04 Thread Dongjoon Hyun
It can stop reopening, but new JIRA issues with duplicate content will be created intentionally instead. Is that policy (privileged reopening) used in other Apache communities for that purpose? On Wed, Oct 4, 2017 at 7:06 PM, Sean Owen wrote: > We have this problem

Re: [VOTE] [SPIP] SPARK-15689: Data Source API V2 read path

2017-09-07 Thread Dongjoon Hyun
+1 (non-binding). On Thu, Sep 7, 2017 at 12:46 PM, 蒋星博 wrote: > +1 > > > Reynold Xin wrote on Thu, Sep 7, 2017 at 12:04 PM: > >> +1 as well >> >> On Thu, Sep 7, 2017 at 9:12 PM, Michael Armbrust >> wrote: >> >>> +1 >>> >>> On Thu, Sep 7,

Re: 2.1.2 maintenance release?

2017-09-07 Thread Dongjoon Hyun
+1! As of today, For 2.1.2, we have 87 commits. (2.1.1 was released 4 months ago) For 2.2.1, we have 95 commits. (2.2.0 was released 2 months ago) Can we have 2.2.1, too? Bests, Dongjoon. On Thu, Sep 7, 2017 at 2:14 AM, Sean Owen wrote: > In a separate conversation

Re: [VOTE] Spark 2.1.2 (RC2)

2017-09-27 Thread Dongjoon Hyun
+1 (non-binding) Bests, Dongjoon. On Wed, Sep 27, 2017 at 7:54 AM, Denny Lee wrote: > +1 (non-binding) > > > On Wed, Sep 27, 2017 at 6:54 AM Sean Owen wrote: > >> +1 >> >> I tested the source release. >> Hashes and signature (your signature) check

Re: [VOTE] Spark 2.1.2 (RC4)

2017-10-03 Thread Dongjoon Hyun
+1 (non-binding) Dongjoon. On Tue, Oct 3, 2017 at 5:13 AM, Herman van Hövell tot Westerflier < hvanhov...@databricks.com> wrote: > +1 > > On Tue, Oct 3, 2017 at 1:32 PM, Sean Owen wrote: > >> +1 same as last RC. Tests pass, sigs and hashes are OK. >> >> On Tue, Oct 3, 2017

Re: Welcoming Saisai (Jerry) Shao as a committer

2017-08-28 Thread Dongjoon Hyun
Great! Congratulations, Jerry! :D Bests, Dongjoon. On Mon, Aug 28, 2017 at 6:30 PM, Ted Yu wrote: > Congratulations, Jerry ! > > On Mon, Aug 28, 2017 at 6:28 PM, Matei Zaharia > wrote: > >> Hi everyone, >> >> The PMC recently voted to add Saisai

Re: Increase Timeout or optimize Spark UT?

2017-08-25 Thread Dongjoon Hyun
BTW, the situation seems to become worse, now we lost two builds. - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/ - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/

Re: [VOTE] Spark 2.2.1 (RC2)

2017-11-28 Thread Dongjoon Hyun
+1 (non-binding) RC2 is tested on CentOS, too. Bests, Dongjoon. On Tue, Nov 28, 2017 at 4:35 PM, Hyukjin Kwon wrote: > +1 > > 2017-11-29 8:18 GMT+09:00 Henry Robinson : > >> (My vote is non-binding, of course). >> >> On 28 November 2017 at 14:53, Henry

Re: SPARK-22267 issue: Spark SQL incorrectly reads ORC file when column order is different

2017-11-14 Thread Dongjoon Hyun
Hi, Mark. That is one of the reasons why I left it out of the previous PR (below) and am focusing on the second approach: use OrcFileFormat with convertMetastoreOrc. https://github.com/apache/spark/pull/19470 [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark schema to read ORC table

Re: Cutting the RC for Spark 2.2.1 release

2017-11-08 Thread Dongjoon Hyun
It's great, Felix! As of today, `branch-2.2` seems to be broken due to SPARK-22211 (Scala UT failure) and SPARK-22417 (Python UT failure). I pinged you at both. Bests, Dongjoon. On Wed, Nov 8, 2017 at 5:51 PM, Holden Karau wrote: > Thanks for stepping up and running the

Apache Spark 2.3 and Apache ORC 1.4 finally

2017-12-05 Thread Dongjoon Hyun
Hi, All. Today, Apache Spark starts to use Apache ORC 1.4 as a `native` ORC implementation. SPARK-20728 Make OrcFileFormat configurable between `sql/hive` and `sql/core`. - https://github.com/apache/spark/commit/326f1d6728a7734c228d8bfaa69442a1c7b92e9b Thank you so much for all your support

Re: Running lint-java during PR builds?

2018-05-21 Thread Dongjoon Hyun
> At the number of PRs Spark has this could be a big issue. > > > > > > > > From: Marcelo Vanzin <van...@cloudera.com> > > Sent: Monday, May 21, 2018 9:08:28 AM > > To: Hyukjin Kwon > > Cc: Dongjoon Hyun; dev > > Subject: Re: Running lint-

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-03 Thread Dongjoon Hyun
+1 Bests, Dongjoon. On Sat, Jun 2, 2018 at 8:09 PM, Denny Lee wrote: > +1 > > On Sat, Jun 2, 2018 at 4:53 PM Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> I'll give that a try, but I'll still have to figure out what to do if >> none of the release builds work with hadoop-aws,

Vectorized ORC Reader in Apache Spark 2.3 with Apache ORC 1.4.1.

2018-01-10 Thread Dongjoon Hyun
Hi, All. Vectorized ORC Reader is now supported in Apache Spark 2.3. https://issues.apache.org/jira/browse/SPARK-16060 It has been a long journey. From now, Spark can read ORC files faster without feature penalty. Thank you for all your support, especially Wenchen Fan. It's done by two

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Dongjoon Hyun
SPARK-23221 is one of the reasons for Kafka-test-suite deadlock issue. For the hang issues, it seems not to be marked as a failure correctly in Apache Spark Jenkins history. On Thu, Jan 25, 2018 at 1:03 PM, Marcelo Vanzin wrote: > On Thu, Jan 25, 2018 at 12:29 PM, Sean

Re: Schema Evolution in Apache Spark

2018-01-12 Thread Dongjoon Hyun
the data format used, i.e. parquet, Avro, ... which > already support changing schema? > > Dongjoon Hyun <dongjoon.h...@gmail.com> wrote on Fri, Jan 12, 2018 at > 02:30: > >> Hi, All. >> >> A data schema can evolve in several ways and Apache Spark 2.3 a

Build timed out for `branch-2.3 (hadoop-2.7)`

2018-01-11 Thread Dongjoon Hyun
Hi, All and Shane. Can we increase the build time for `branch-2.3` during 2.3 RC period? There are two known test issues, but the Jenkins on branch-2.3 with hadoop-2.7 fails with build timeout. So, it's difficult to monitor whether the branch is healthy or not. Build timed out (after 255

Schema Evolution in Apache Spark

2018-01-11 Thread Dongjoon Hyun
Hi, All. A data schema can evolve in several ways and Apache Spark 2.3 already supports the following for file-based data sources like CSV/JSON/ORC/Parquet. 1. Add a column 2. Remove a column 3. Change a column position 4. Change a column type Can we guarantee users some schema evolution

Re: Vectorized ORC Reader in Apache Spark 2.3 with Apache ORC 1.4.1.

2018-01-28 Thread Dongjoon Hyun
Hi > > Thanks for this work. > > Will this affect both: > 1) spark.read.format("orc").load("...") > 2) spark.sql("select ... from my_orc_table_in_hive") > > ? > > > On Jan 10, 2018 at 20:14, Dongjoon Hyun wrote: > > Hi, All. > > > >

`convertMetastoreOrc/Parquet` issue

2018-02-07 Thread Dongjoon Hyun
Hi, All. SPARK-22279 turned on `convertMetastoreOrc` by default for `Feature Parity with Parquet`. Unfortunately, it will be turned back off via https://github.com/apache/spark/pull/20536 in Apache Spark 2.3 RC3 because a well-known

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-19 Thread Dongjoon Hyun
+1. I tested RC4 on CentOS 7.4 / OpenJDK 1.8.0_161 with `-Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver -Psparkr`. Bests, Dongjoon. On Sun, Feb 18, 2018 at 3:22 PM, Denny Lee wrote: > +1 (non-binding) > > Built and tested on macOS and Ubuntu. > > > On

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-19 Thread Dongjoon Hyun
In addition to Hyukjin's `github.io` result, `jekyll` also forwards the search result links correctly. SKIP_SCALADOC=1 SKIP_PYTHONDOC=1 SKIP_RDOC=1 jekyll serve --watch And, connect `http://127.0.0.1:4000`. This will be the same in Apache Spark websites. Bests, Dongjoon. On Mon, Feb 19,

Re: [VOTE] Spark 2.3.0 (RC5)

2018-02-26 Thread Dongjoon Hyun
+1 (non-binding). Bests, Dongjoon. On Mon, Feb 26, 2018 at 9:14 AM, Ryan Blue wrote: > +1 (non-binding) > > On Sat, Feb 24, 2018 at 4:17 PM, Xiao Li wrote: > >> +1 (binding) in Spark SQL, Core and PySpark. >> >> Xiao >> >> 2018-02-24 14:49

Re: [VOTE] SPIP: Standardize SQL logical plans

2018-07-18 Thread Dongjoon Hyun
+1 (non-binding). Bests, Dongjoon. On Wed, Jul 18, 2018 at 11:32 AM Henry Robinson wrote: > +1 (non-binding) > On Wed, Jul 18, 2018 at 9:12 AM Reynold Xin wrote: > >> +1 on this, on the condition that we can come up with a design that will >> remove the existing plans. >> >> >> On Tue, Jul

Re: Branch 2.4 is cut

2018-09-07 Thread Dongjoon Hyun
Thank you, Shane! :D Bests, Dongjoon. On Fri, Sep 7, 2018 at 9:51 AM shane knapp wrote: > i'll try and get to the 2.4 branch stuff today... > >

Re: Branch 2.4 is cut

2018-09-06 Thread Dongjoon Hyun
Great for branch cut and Scala 2.12 build. We also need to add `branch-2.4` to our Jenkins dashboard to prevent any regression. https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/ Bests, Dongjoon. On Thu, Sep 6, 2018 at 6:56 AM Wenchen Fan wrote: > Good news! I'll

Re: Maintenance releases for SPARK-23852?

2018-04-11 Thread Dongjoon Hyun
Great. If we can upgrade the parquet dependency from 1.8.2 to 1.8.3 in Apache Spark 2.3.1, let's upgrade the orc dependency from 1.4.1 to 1.4.3 together. Currently, the patch is merged only into the master branch. 1.4.1 has the following issue. https://issues.apache.org/jira/browse/SPARK-23340

Re: Maintenance releases for SPARK-23852?

2018-04-17 Thread Dongjoon Hyun
t;> Seems like there aren't any objections. I'll pick this thread back up >> when a Parquet maintenance release has happened. >> >> Henry >> >> On 11 April 2018 at 14:00, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote: >> >>> Great. >>> >

Re: Welcome Zhenhua Wang as a Spark committer

2018-04-02 Thread Dongjoon Hyun
Congratulations! Bests, Dongjoon. On Mon, Apr 2, 2018 at 07:57 Cody Koeninger wrote: > Congrats! > > On Mon, Apr 2, 2018 at 12:28 AM, Wenchen Fan wrote: > > Hi all, > > > > The Spark PMC recently added Zhenhua Wang as a committer on the project. > >

Re: [VOTE] Spark 2.2.2 (RC2)

2018-06-28 Thread Dongjoon Hyun
+1 Tested on CentOS 7.4 and Oracle JDK 1.8.0_171. Bests, Dongjoon. On Thu, Jun 28, 2018 at 7:24 AM Takeshi Yamamuro wrote: > +1 > > I run tests on a EC2 m4.2xlarge instance; > [ec2-user]$ java -version > openjdk version "1.8.0_171" > OpenJDK Runtime Environment (build 1.8.0_171-b10) > OpenJDK

Re: Random sampling in tests

2018-10-08 Thread Dongjoon Hyun
Sean's approach looks much better to me (https://github.com/apache/spark/pull/22672). It achieves both contradictory goals simultaneously: keeping all test coverage and reducing the time from 2:31 to 0:24. Since we can remove test coverage anytime, can we proceed with Sean's non-intrusive

Re: [VOTE] SPARK 2.4.0 (RC3)

2018-10-12 Thread Dongjoon Hyun
Hi, Holden. Since that's a performance at 2.4.0, I marked as `Blocker` four days ago. Bests, Dongjoon. On Fri, Oct 12, 2018 at 11:45 AM Holden Karau wrote: > Following up I just wanted to make sure this new blocker that Dongjoon > designated is surfaced - >

Re: [DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

2018-10-16 Thread Dongjoon Hyun
I also agree with Reynold and Xiao. Although I love that new feature, Spark 2.4 branch-cut was made a long time ago. We cannot backport new features at this stage at RC4. In addition, could you split Apache SPARK issue IDs, Ilan? It's confusing during discussion. (1) [SPARK-23257][K8S]

GitHub is out of order

2018-10-21 Thread Dongjoon Hyun
Hi, All. Currently, GitHub is out of order. Apache Spark repo is also affected. Newly filed pull requests to Apache Spark repository seem to disappear repeatedly, too. https://status.github.com/messages Bests, Dongjoon.

Re: [VOTE] SPARK 2.4.0 (RC3)

2018-10-19 Thread Dongjoon Hyun
From the document, should we be more specific with 'Java 8' instead of 'Java 8+' because we don't build (or test) in the community with Java 9 ~ 11. https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc3-docs/_site/building-spark.html > Building Spark using Maven requires Maven 3.3.9 or newer

Re: [VOTE] SPARK 2.4.0 (RC3)

2018-10-19 Thread Dongjoon Hyun
can merge it very soon if we > decide to do it. > > Thanks, > Wenchen > > On Sat, Oct 20, 2018 at 5:27 AM Dongjoon Hyun > wrote: > >> From the document, should we be more specific with 'Java 8' instead of >> 'Java 8+' because we don't build (or test) in the

Re: [VOTE] SPARK 2.4.0 (RC3)

2018-10-13 Thread Dongjoon Hyun
Yes. From my side, it's -1 for RC3. Bests, Dongjoon. On Sat, Oct 13, 2018 at 1:24 PM Holden Karau wrote: > So if it's a blocker would you think this should be a -1? > > On Fri, Oct 12, 2018 at 3:52 PM Dongjoon Hyun > wrote: > >> Hi, Holden. >> >> Since that's

Re: [VOTE] SPARK 2.4.0 (RC4)

2018-10-23 Thread Dongjoon Hyun
BTW, for that integration suite, I saw the related artifacts in the RC4 staging directory. Does Spark 2.4.0 need to start to release these `spark-kubernetes-integration-tests` artifacts? -
