JDK11 QA (SPARK-29194)

2019-09-20 Thread Dongjoon Hyun
Hi, All. As a next step, we started JDK11 QA. https://issues.apache.org/jira/browse/SPARK-29194 This issue mainly focuses on the following areas, but feel free to add any sub-issues which you hit on JDK11 from now. - Documentations - Examples - Performance - Integration Tests

Re: [DISCUSS] Spark 2.5 release

2019-09-20 Thread Dongjoon Hyun
Do you mean you want to have a breaking API change between 3.0 and 3.1? I believe we follow Semantic Versioning ( https://spark.apache.org/versioning-policy.html ). > We just won’t add any breaking changes before 3.1. Bests, Dongjoon. On Fri, Sep 20, 2019 at 11:48 AM Ryan Blue wrote: > I

Re: [DISCUSS] Spark 2.5 release

2019-09-21 Thread Dongjoon Hyun
good >>> experience - especially we are not completely closed the chance to further >>> modify DSv2, and the change could be backward incompatible. >>> >>> If we really want to bring the DSv2 change to 2.x version line to let >>> end users avoid forcing to upg

Re: [DISCUSS] Preferred approach on dealing with SPARK-29322

2019-10-01 Thread Dongjoon Hyun
Thank you for reporting, Jungtaek. Can we try to upgrade it to the newer version first? Since we are at 1.4.2, the newer version is 1.4.3. Bests, Dongjoon. On Tue, Oct 1, 2019 at 9:18 PM Mridul Muralidharan wrote: > Makes more sense to drop support for zstd assuming the fix is not >

Re: [DISCUSS] Preferred approach on dealing with SPARK-29322

2019-10-02 Thread Dongjoon Hyun
ld, so I'll try >>> it out. >>> >>> Before that, I just indicated ZstdOutputStream has a parameter >>> "closeFrameOnFlush" which seems to deal with flush. We let the value as the >>> default value which is "false". Let me pass the value to

Re: [DISCUSS] Spark 2.5 release

2019-09-23 Thread Dongjoon Hyun
wrote: >> >>> +1 for Matei's as well. >>> >>> On Sun, 22 Sep 2019, 14:59 Marco Gaido, wrote: >>> >>>> I agree with Matei too. >>>> >>>> Thanks, >>>> Marco >>>> >>>> Il giorno dom 22

Re: Release Apache Spark 2.4.4 before 3.0.0

2019-07-09 Thread Dongjoon Hyun
n before 3.0, but could. Usually maintenance > releases happen 3-4 months apart and the last one was 2 months ago. If > these are significant issues, sure. It'll probably be August before > it's out anyway. > > On Tue, Jul 9, 2019 at 11:15 AM Dongjoon Hyun > wrote: > > > > Hi,

Re: [VOTE] SPARK 3.0.0-preview (RC2)

2019-11-04 Thread Dongjoon Hyun
Hi, Xingbo. Could you send a vote result email to finalize this vote, please? Bests, Dongjoon. On Fri, Nov 1, 2019 at 2:55 PM Takeshi Yamamuro wrote: > +1, too. > > On Sat, Nov 2, 2019 at 3:36 AM Hyukjin Kwon wrote: > >> +1 >> >> On Fri, 1 Nov 2019, 15:36 Wenchen Fan, wrote: >> >>> The PR

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-01 Thread Dongjoon Hyun
ty of Hadoop 3.2 profile are unknown. The >>>> changes are massive, including Hive execution and a new version of Hive >>>> thriftserver. >>>> >>>> To reduce the risk, I would like to keep the current default version >>>> unchanged. Wh

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-02 Thread Dongjoon Hyun
about the >> implications. >> That said my guess is we're close to the point where we don't need to >> support Hadoop 2.x anyway, so, yeah. >> >> On Mon, Oct 28, 2019 at 2:33 PM Dongjoon Hyun >> wrote: >> > >> > Hi, All. >> >

Removing `CRAN incoming feasibility` check from the main build

2019-11-02 Thread Dongjoon Hyun
Hi, All. CRAN instability seems to be a blocker for our dev process. The following simple check causes consecutive failures in 4 of 9 Jenkins jobs + PR builder. - spark-branch-2.4-test-sbt-hadoop-2.6 - spark-branch-2.4-test-sbt-hadoop-2.7 - spark-master-test-sbt-hadoop-2.7 -

Re: Removing `CRAN incoming feasibility` check from the main build

2019-11-02 Thread Dongjoon Hyun
, Dongjoon. On Sat, Nov 2, 2019 at 7:10 PM Dongjoon Hyun wrote: > Hi, All. > > CRAN instability seems to be a blocker for our dev process. > The following simple check causes consecutive failures in 4 of 9 Jenkins > jobs + PR builder. > > - spark-branch-2.4-test-sbt-hadoop-2.6 >

Re: [VOTE] SPARK 3.0.0-preview (RC1)

2019-10-30 Thread Dongjoon Hyun
Hi, Xingbo. Currently, RC2 tag is pointing RC1 tag. https://github.com/apache/spark/tree/v3.0.0-preview-rc2 Could you cut from the HEAD of master branch? Otherwise, nobody knows what release script you used for RC2. Bests, Dongjoon. On Wed, Oct 30, 2019 at 4:15 PM Xingbo Jiang wrote: > Hi

Re: [DISCUSS] Deprecate Python < 3.6 in Spark 3.0

2019-10-30 Thread Dongjoon Hyun
rom me as well. >>>> >>>> On Tue, Oct 29, 2019 at 5:34 AM, Xiangrui Meng wrote: >>>> >>>>> +1. And we should start testing 3.7 and maybe 3.8 in Jenkins. >>>>> >>>>> On Thu, Oct 24, 2019 at 9:34 AM Dongjoon Hy

Re: [apache/spark] [SPARK-29674][CORE] Update dropwizard metrics to 4.1.x for JDK 9+ (#26332)

2019-10-30 Thread Dongjoon Hyun
The Ganglia module has only 2 files. In addition to dropping, we may choose the following two ways to support it still partially like `kafka-0.8` which Apache Spark supports in Scala 2.11 only. 1. We can stick to `dropwizard 3.x` for JDK8 (by default) and use `dropwizard 4.x` for `hadoop-3.2`

Re: Adding JIRA ID as the prefix for the test case name

2019-11-12 Thread Dongjoon Hyun
Thank you for the suggestion, Hyukjin. Previously, we added Jira IDs for the bug fix PR test cases as Gabor said. For the new features (and improvements), we didn't add them because all test cases in the newly added test suite share the same prefix JIRA ID in that case. It might looks

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-16 Thread Dongjoon Hyun
Thank you for suggestion. Having `hive-2.3` profile sounds good to me because it's orthogonal to Hadoop 3. IIRC, originally, it was proposed in that way, but we put it under `hadoop-3.2` to avoid adding new profiles at that time. And, I'm wondering if you are considering additional pre-built

Re: Spark 2.4.5 release for Parquet and Avro dependency updates?

2019-11-22 Thread Dongjoon Hyun
Hi, Michael. I'm not sure Apache Spark is in the status close to what you want. First, both Apache Spark 3.0.0-preview and Apache Spark 2.4 are using Avro 1.8.2. Also, the `master` and `branch-2.4` branches do. Cutting new releases does not provide you what you want. Do we have a PR on the master

Re: The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-22 Thread Dongjoon Hyun
Thank you, Steve and all. As a conclusion of this thread, we will merge the following PR and move forward. [SPARK-29981][BUILD] Add hive-1.2/2.3 profiles https://github.com/apache/spark/pull/26619 Please leave your comments if you have any concern. And, the following PRs and more will

Re: SQL test failures in PR builder?

2019-12-04 Thread Dongjoon Hyun
Hi, Sean. It seems that there is no failure on your other SQL PR. https://github.com/apache/spark/pull/26748 Does the sequential failure happen only at `NewSparkPullRequestBuilder`? Since `NewSparkPullRequestBuilder` is not the same as `SparkPullRequestBuilder`, there might be a root

Re: Spark 3.0 preview release 2?

2019-12-09 Thread Dongjoon Hyun
Thank you, All. +1 for another `3.0-preview`. Also, thank you Yuming for volunteering for that! Bests, Dongjoon. On Mon, Dec 9, 2019 at 9:39 AM Xiao Li wrote: > When entering the official release candidates, the new features have to be > disabled or even reverted [if the conf is not

Release Apache Spark 2.4.5 and 2.4.6

2019-12-09 Thread Dongjoon Hyun
Hi, All. Along with the discussion on 3.0.0, I'd like to discuss about the next releases on `branch-2.4`. As we know, `branch-2.4` is our LTS branch and also there exists some questions on the release plans. More releases are important not only for the latest K8s version support, but also for

Re: Spark 3.0 preview release 2?

2019-12-10 Thread Dongjoon Hyun
BTW, our Jenkins seems to be behind. 1. For the first item, `Support JDK 11 with Hadoop 2.7`: At least, we need a new Jenkins job `spark-master-test-maven-hadoop-2.7-jdk-11/`. 2. https://issues.apache.org/jira/browse/SPARK-28900 (Test Pyspark, SparkR on JDK 11 with run-tests) 3.

Re: R linter is broken

2019-12-13 Thread Dongjoon Hyun
It seems to fail at installation because the remote repository seems to be changed. Bests, Dongjoon On Fri, Dec 13, 2019 at 07:46 Nicholas Chammas wrote: > The R linter GitHub action seems to be busted > . >

Re: R linter is broken

2019-12-13 Thread Dongjoon Hyun
Please see here for the root cause. - https://github.community/t5/GitHub-Actions/ubuntu-latest-Apt-repository-list-issues/td-p/41122 On Fri, Dec 13, 2019 at 9:11 AM Dongjoon Hyun wrote: > It seems to fail at installation because the remote repository seems to be > changed. >

Re: R linter is broken

2019-12-13 Thread Dongjoon Hyun
Microsoft mirror is recovered now. Bests, Dongjoon. On Fri, Dec 13, 2019 at 9:45 AM Dongjoon Hyun wrote: > Please see here for the root cause. > > - > https://github.community/t5/GitHub-Actions/ubuntu-latest-Apt-repository-list-issues/td-p/41122 > > > On Fri, Dec 13, 201

Re: Release Apache Spark 2.4.5 and 2.4.6

2019-12-11 Thread Dongjoon Hyun
is probably less to fix, so Jan-Feb 2020 for 2.4.5 and >>> something like middle or Q3 2020 for 2.4.6 is a reasonable >>> expectation. It might plausibly be the last 2.4.x release but who >>> knows. >>> >>> On Mon, Dec 9, 2019 at 12:29 PM Dongjoon Hyun >

FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE syntax

2019-12-06 Thread Dongjoon Hyun
Hi, All. I want to share the following change to the community. SPARK-30098 Use default datasource as provider for CREATE TABLE syntax This is merged today and now Spark's `CREATE TABLE` is using Spark's default data sources instead of `hive` provider. This is a good and big improvement for

Re: Packages to release in 3.0.0-preview

2019-10-27 Thread Dongjoon Hyun
JDK)(build 1.8.0_232-b09) OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.232-b09, mixed mode) Bests, Dongjoon. On Sun, Oct 27, 2019 at 1:38 PM Dongjoon Hyun wrote: > It seems not a Hadoop issue, doesn't it? > > What Yuming pointed seems to be `Hive 2.3.6` profile implementation issue

Re: Packages to release in 3.0.0-preview

2019-10-27 Thread Dongjoon Hyun
It seems not a Hadoop issue, doesn't it? What Yuming pointed seems to be `Hive 2.3.6` profile implementation issue which is enabled only when `Hadoop 3.2`. From my side, I'm +1 for publishing jars which depend on `Hadoop 3.2.0 / Hive 2.3.6` jars to Maven since Apache Spark 3.0.0. For the

Apache Spark 3.0 timeline

2019-10-16 Thread Dongjoon Hyun
Hi, All. I saw the following comment from Wenchen in the previous email thread. > Personally I'd like to avoid cutting branch-3.0 right now, otherwise we need to merge PRs into two branches in the following several months. Since 3.0.0-preview seems to be already here for RC, can we update our

branch-3.0 vs branch-3.0-preview (?)

2019-10-15 Thread Dongjoon Hyun
Hi, It seems that we have `branch-3.0-preview` branch. https://github.com/apache/spark/commits/branch-3.0-preview Can we have `branch-3.0` instead of `branch-3.0-preview`? We can tag `v3.0.0-preview` on `branch-3.0` and continue to use for `v3.0.0` later. Bests, Dongjoon.

Re: [DISCUSS] Deprecate Python < 3.6 in Spark 3.0

2019-10-24 Thread Dongjoon Hyun
Thank you for starting the thread. In addition to that, we currently are testing Python 3.6 only in Apache Spark Jenkins environment. Given that Python 3.8 is already out and Apache Spark 3.0.0 RC1 will start next January (https://spark.apache.org/versioning-policy.html), I'm +1 for the

Minimum JDK8 version

2019-10-24 Thread Dongjoon Hyun
Hi, All. Apache Spark 3.x will support both JDK8 and JDK11. I'm wondering if we can have a minimum JDK8 version in Apache Spark 3.0. Specifically, can we start to deprecate JDK8u81 and older at 3.0. Currently, Apache Spark testing infra are testing only with jdk1.8.0_191 and above. Bests,

Re: Minimum JDK8 version

2019-10-24 Thread Dongjoon Hyun
w, any other project announcing the minimum support jdk version? > It seems that hadoop does not. > > On Fri, Oct 25, 2019 at 6:51 AM Sean Owen wrote: > >> Probably, but what is the difference that makes it different to >> support u81 vs later? >> >> On Thu,

Re: Minimum JDK8 version

2019-10-24 Thread Dongjoon Hyun
recent release. > > On Thu, Oct 24, 2019 at 8:56 PM Dongjoon Hyun > wrote: > > > > Thank you for reply, Sean, Shane, Takeshi. > > > > The reason is that there is a PR to aim to add > `-XX:OnOutOfMemoryError="kill -9 %p"` as a default behavior at 3.0.0.

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-10-28 Thread Dongjoon Hyun
trong opinion nor info about the >> implications. >> That said my guess is we're close to the point where we don't need to >> support Hadoop 2.x anyway, so, yeah. >> >> On Mon, Oct 28, 2019 at 2:33 PM Dongjoon Hyun >> wrote: >> > >> > Hi, All.

Re: [build system] intermittent network issues + potential power shutoff over the weekend

2019-10-28 Thread Dongjoon Hyun
Thank you for fixing the worker ENVs, Shane. Bests, Dongjoon. On Mon, Oct 28, 2019 at 10:47 AM Shane Knapp wrote: > i will need to restart jenkins -- the worker's ENV vars got borked when > they came back up. > > this is happening NOW. > > shane > > On Mon, Oct 28, 2019 at 10:37 AM Shane Knapp

Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-10-28 Thread Dongjoon Hyun
Hi, All. There was a discussion on publishing artifacts built with Hadoop 3. But, we are still publishing with Hadoop 2.7.3 and `3.0-preview` will be the same because we didn't change anything yet. Technically, we need to change two places for publishing. 1. Jenkins Snapshot Publishing

Re: [VOTE] SPARK 3.0.0-preview (RC1)

2019-10-29 Thread Dongjoon Hyun
Hi, Xingbo. PySpark seems to fail to build. There is only `sha512`. SparkR_3.0.0-preview.tar.gz SparkR_3.0.0-preview.tar.gz.asc SparkR_3.0.0-preview.tar.gz.sha512 *pyspark-3.0.0.preview.tar.gz.sha512* spark-3.0.0-preview-bin-hadoop2.7.tgz spark-3.0.0-preview-bin-hadoop2.7.tgz.asc

Re: Apache Spark 3.0 timeline

2019-10-16 Thread Dongjoon Hyun
pdated statement about 3.0 release. Clearly a preview is imminent. I > figure we are probably moving to code freeze late in the year, release > early next year? Any better ideas about estimates to publish? They aren't > binding. > > On Wed, Oct 16, 2019, 4:01 PM Dongjoon Hyun > wrot

Re: branch-3.0 vs branch-3.0-preview (?)

2019-10-17 Thread Dongjoon Hyun
ah I figured the merge script would pick it up, which >> >> is a little annoying, but you can still just type branch-2.4. >> >> I think we have to retain the branch though if there are any >> >> cherry-picks, to record the state of the release. >> >>

Re: branch-3.0 vs branch-3.0-preview (?)

2019-10-16 Thread Dongjoon Hyun
new features might still go into Spark 3.0 after preview > release, I guess it might make more sense to have separated branches for > 3.0.0 and 3.0-preview. > > > > However, I'm open to both solutions, if we really want to reuse the > branch to also release Spark 3.0.0, then

Re: [build system] intermittent network issues + potential power shutoff over the weekend

2019-10-25 Thread Dongjoon Hyun
Thank you for notice, Shane. Bests, Dongjoon. On Fri, Oct 25, 2019 at 12:31 PM Shane Knapp wrote: > > 1) our department is having some issues w/their network and other > > services. this means that if you're at the jenkins site, you may > > occasionally get a 503 error. just hit refresh a

Re: Unable to resolve dependency of sbt-mima-plugin since yesterday

2019-10-22 Thread Dongjoon Hyun
Hi, All. This is fixed in master/branch-2.4. Bests, Dongjoon. On Tue, Oct 22, 2019 at 12:19 Sean Owen wrote: > Weird. Let's discuss at https://issues.apache.org/jira/browse/SPARK-29560 > > On Tue, Oct 22, 2019 at 2:06 PM Xingbo Jiang > wrote: > > > > Hi, > > > > Do you have any idea why the

Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-19 Thread Dongjoon Hyun
Cheng, could you elaborate on your criteria, `Hive 2.3 code paths are proven to be stable`? For me, it's difficult to imagine that we can reach any stable situation when we don't use it at all by ourselves. > The Hive 1.2 code paths can only be removed once the Hive 2.3 code paths are proven to

Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-18 Thread Dongjoon Hyun
Hi, All. First of all, I want to put this as a policy issue instead of a technical issue. Also, this is orthogonal from `hadoop` version discussion. Apache Spark community kept (not maintained) the forked Apache Hive 1.2.1 because there has been no other options before. As we see at SPARK-20202,

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-18 Thread Dongjoon Hyun
- > *From:* Steve Loughran > *Sent:* Sunday, November 17, 2019 9:22:09 AM > *To:* Cheng Lian > *Cc:* Sean Owen ; Wenchen Fan ; > Dongjoon Hyun ; dev ; > Yuming Wang > *Subject:* Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0? > > Can I take this moment t

Re: The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-20 Thread Dongjoon Hyun
.x is already full of behavior changes and 'unstable', so I > think this is minor relative to the overall risk question. > > On Wed, Nov 20, 2019 at 12:53 PM Dongjoon Hyun > wrote: > > > > Hi, All. > > > > I'm sending this email because it's important to discu

The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-20 Thread Dongjoon Hyun
Hi, All. I'm sending this email because it's important to discuss this topic narrowly and make a clear conclusion. `The forked Hive 1.2.1 is stable`? It sounds like a myth we created by ignoring the existing bugs. If you want to say the forked Hive 1.2.1 is stabler than XXX, please give us the

Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-20 Thread Dongjoon Hyun
ld then prefer defaulting to Hive 2 in the POM. Am I missing > something about the implication? > > (That fork will stay published forever anyway, that's not an issue per se.) > > On Wed, Nov 20, 2019 at 1:40 AM Dongjoon Hyun > wrote: > > Sean, our published POM

Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-19 Thread Dongjoon Hyun
people understand works > differently. We can accept some behavior changes. > > On Mon, Nov 18, 2019 at 11:11 PM Dongjoon Hyun > wrote: > > > > Hi, All. > > > > First of all, I want to put this as a policy issue instead of a > technical issue. > > Also

Re: Status of Scala 2.13 support

2019-12-02 Thread Dongjoon Hyun
Thank you for sharing the status, Sean. Given the current circumstance, our status and approach sounds realistic to me. +1 for continuing after cutting `branch-3.0`. Bests, Dongjoon. On Sun, Dec 1, 2019 at 10:50 AM Sean Owen wrote: > As you can see, I've been working on Scala 2.13 support.

Re: [DISCUSS] PostgreSQL dialect

2019-11-27 Thread Dongjoon Hyun
+1 Bests, Dongjoon. On Tue, Nov 26, 2019 at 3:52 PM Takeshi Yamamuro wrote: > Yea, +1, that looks pretty reasonable to me. > > Here I'm proposing to hold off the PostgreSQL dialect. Let's remove it > from the codebase before it's too late. Curently we only have 3 features > under PostgreSQL

Re: [VOTE] SPARK 3.0.0-preview (RC2)

2019-11-01 Thread Dongjoon Hyun
+1 for Apache Spark 3.0.0-preview (RC2). Bests, Dongjoon. On Thu, Oct 31, 2019 at 11:36 PM Wenchen Fan wrote: > The PR builder uses Hadoop 2.7 profile, which makes me think that 2.7 is > more stable and we should make releases using 2.7 by default. > > +1 > > On Fri, Nov 1, 2019 at 7:16 AM

Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-19 Thread Dongjoon Hyun
eases, I'm afraid that their visibility is not good > enough for covering such major upgrades. > > On Tue, Nov 19, 2019 at 8:39 AM Dongjoon Hyun > wrote: > >> Thank you for feedback, Hyujkjin and Sean. >> >> I proposed `preview-2` for that purpose but I'm also +1 for

Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-19 Thread Dongjoon Hyun
:11 AM Dongjoon Hyun wrote: > Hi, Cheng. > > This is irrelevant to JDK11 and Hadoop 3. I'm talking about JDK8 world. > If we consider them, it could be the followings. > > +--+-++ > | | Hive 1.2.1 fork

Re: The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-20 Thread Dongjoon Hyun
>> Just to add - hive 1.2 fork is definitely not more stable. We know of a >> few critical bug fixes that we cherry picked into a fork of that fork to >> maintain ourselves. >> >> >> -- >> *From:* Dongjoon Hyun >> *Sent:* Wednesday, November 20, 2019 11:

Migration `Spark QA Compile` Jenkins jobs to GitHub Action

2019-11-19 Thread Dongjoon Hyun
Hi, All. Apache Spark community used the following dashboard as post-hook verifications. https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/ There are six registered jobs. 1. spark-branch-2.4-compile-maven-hadoop-2.6 2. spark-branch-2.4-compile-maven-hadoop-2.7 3.

Re: Migration `Spark QA Compile` Jenkins jobs to GitHub Action

2019-11-19 Thread Dongjoon Hyun
en wrote: > > > > > > I would favor moving whatever we can to Github. It's difficult to > > > modify the Jenkins instances without Shane's valiant help, and over > > > time makes more sense to modernize and integrate it into the project. > > > > >

Re: The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-20 Thread Dongjoon Hyun
nitializing the Hive metastore >>> client. And those dependencies will NOT conflict with the built-in Hive >>> 1.2.1 jars, because the downloaded jars are loaded using an isolated >>> classloader (see here >>> <https://github.com/apache/spark/blob/1febd373ea8

Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-20 Thread Dongjoon Hyun
keep the Hive 1.2.1 fork looks like a feasible approach to me. > Thanks for starting the discussion! > > On Wed, Nov 20, 2019 at 9:46 AM Dongjoon Hyun > wrote: > >> Yes. Right. That's the situation we are hitting and the result I expected. >> We need to change our defau

Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-19 Thread Dongjoon Hyun
n always build Spark with proper profiles >> themselves. >> >> And thanks for clarifying the Hive 2.3.5 issue. I didn't notice that it's >> due to the folder name. >> >> On Tue, Nov 19, 2019 at 11:15 AM Dongjoon Hyun >> wrote: >> >>> BTW, `hive.version.s

Re: Spark 3.0 preview release on-going features discussion

2019-09-20 Thread Dongjoon Hyun
Thank you for the summarization, Xingbo. I also agree with Sean because I don't think those block 3.0.0 preview release. Especially, correctness issues should not be there. Instead, could you summarize what we have as of now for 3.0.0 preview? I believe JDK11 (SPARK-28684) and Hive 2.3.5

Re: [VOTE][SPARK-28885] Follow ANSI store assignment rules in table insertion by default

2019-10-10 Thread Dongjoon Hyun
+1 Bests, Dongjoon On Thu, Oct 10, 2019 at 10:14 Ryan Blue wrote: > +1 > > Thanks for fixing this! > > On Thu, Oct 10, 2019 at 6:30 AM Xiao Li wrote: > >> +1 >> >> On Thu, Oct 10, 2019 at 2:13 AM Hyukjin Kwon wrote: >> >>> +1 (binding) >>> >>> On Thu, Oct 10, 2019 at 5:11 PM, Takeshi Yamamuro

Re: Spark 3.0 preview release feature list and major changes

2019-10-08 Thread Dongjoon Hyun
Thank you for the preparation of 3.0-preview, Xingbo! Bests, Dongjoon. On Tue, Oct 8, 2019 at 2:32 PM Xingbo Jiang wrote: > What's the process to propose a feature to be included in the final Spark >> 3.0 release? >> > > I don't know whether there exists any specific process here, normally

Re: Auto-closing PRs when there are no feedback or response from its author

2019-10-09 Thread Dongjoon Hyun
Thank you for keeping eyes on this difficult issue, Hyukjin. Although we try our best, there exist some corner cases always. For example, 1. Although we close old JIRA issues on EOL-version only, some issues don't have `Affected Versions` field info at all. -

Re: [VOTE] SPARK 3.0.0-preview2 (RC2)

2019-12-21 Thread Dongjoon Hyun
ed binaries. >> Also, I run tests with -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver >> -Pmesos -Pkubernetes -Psparkr >> on java version "1.8.0_181. >> All the things above look fine. >> >> Bests, >> Takeshi >> >> On Thu, Dec 19, 201

Re: [VOTE][RESULT] SPARK 3.0.0-preview2 (RC2)

2019-12-22 Thread Dongjoon Hyun
nouncement once everything is > published. > > +1 (* = binding): > - Sean Owen * > - Dongjoon Hyun * > - Takeshi Yamamuro * > - Wenchen Fan * > > +0: None > > -1: None > > > > > Regards, > Yuming >

Re: [VOTE] SPARK 3.0.0-preview2 (RC2)

2019-12-18 Thread Dongjoon Hyun
+1 I also checked the signatures and docs. And, built and tested with JDK 11.0.5, Hadoop 3.2, Hive 2.3. In addition, the newly added `spark-3.0.0-preview2-bin-hadoop2.7-hive1.2.tgz` distribution looks correct. Thank you Yuming and all. Bests, Dongjoon. On Tue, Dec 17, 2019 at 4:11 PM Sean Owen

Re: [ANNOUNCE] Announcing Apache Spark 2.4.5

2020-02-08 Thread Dongjoon Hyun
There was a typo in one URL. The correct release note URL is here. https://spark.apache.org/releases/spark-release-2-4-5.html On Sat, Feb 8, 2020 at 5:22 PM Dongjoon Hyun wrote: > We are happy to announce the availability of Spark 2.4.5! > > Spark 2.4.5 is a maintenance release c

[ANNOUNCE] Announcing Apache Spark 2.4.5

2020-02-08 Thread Dongjoon Hyun
all community members for contributing to this release. This release would not have been possible without you. Dongjoon Hyun

Re: GitHub action permissions

2020-02-28 Thread Dongjoon Hyun
Hi, Thomas. If you log in with a GitHub account registered as an Apache project member, it will be enough. On some PRs of Apache Spark, can you see 'Squash and merge' button? Bests, Dongjoon On Fri, Feb 28, 2020 at 07:15 Thomas graves wrote: > Does anyone know how the GitHub action permissions

Re: [Proposal] Modification to Spark's Semantic Versioning Policy

2020-03-05 Thread Dongjoon Hyun
Hi, All. There is an ongoing PR from Xiao referencing this email. https://github.com/apache/spark/pull/27821 Bests, Dongjoon. On Fri, Feb 28, 2020 at 11:20 AM Sean Owen wrote: > On Fri, Feb 28, 2020 at 12:03 PM Holden Karau > wrote: > >> 1. Could you estimate how many revert commits are

Re: [Proposal] Modification to Spark's Semantic Versioning Policy

2020-02-28 Thread Dongjoon Hyun
Hi, Matei and Michael. I'm also a big supporter for policy-based project management. Before going further, 1. Could you estimate how many revert commits are required in `branch-3.0` for new rubric? 2. Are you going to revert all removed test cases for the deprecated ones? 3. Does it

Re: 'spark-master-docs' job missing in Jenkins

2020-02-26 Thread Dongjoon Hyun
Instead of adding another Jenkins job, adding GitHub Action job will be a better solution because we can share the long-term workload of maintenance. I'll make a PR for that. Bests, Dongjoon. On Tue, Feb 25, 2020 at 9:10 PM Hyukjin Kwon wrote: > Hm, we should still run this I believe. PR

`Target Version` management on correctness/data-loss Issues

2020-01-26 Thread Dongjoon Hyun
Hi, All. After 2.4.5 RC1 vote failure, I asked your opinions about correctness/dataloss issues (at mailing lists/JIRAs/PRs) in order to collect the current status and public opinion widely in the community to build a consensus on this at this time. Before talking about those issues, please

Re: Block a user from spark-website who repeatedly open the invalid same PR

2020-01-26 Thread Dongjoon Hyun
+1 On Sun, Jan 26, 2020 at 13:22 Shane Knapp wrote: > +1 > > On Sun, Jan 26, 2020 at 10:01 AM Denny Lee wrote: > > > > +1 > > > > On Sun, Jan 26, 2020 at 09:59 Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> > >> +1 > >> > >> I think y'all have shown this person more patience than

Re: `Target Version` management on correctness/data-loss Issues

2020-01-27 Thread Dongjoon Hyun
ould we do something like mail the dev list whenever one of these issues > is tagged if its not going to be back ported to an affected release? > > Tom > On Sunday, January 26, 2020, 11:22:13 PM CST, Dongjoon Hyun < > dongjoon.h...@gmail.com> wrote: > > > Hi, All.

Re: Spark 2.4.5 RC2 Preparation Status

2020-01-29 Thread Dongjoon Hyun
> Are we talking about https://issues.apache.org/jira/browse/SPARK-28344 - > targeted for 2.4.5 but not backported, and a 'correctness' issue? > Simply: who argues this must hold up 2.4.5, and if so what's the status? > > On Wed, Jan 29, 2020 at 11:27 AM Dongjoon Hyun > wrote:

Re: `Target Version` management on correctness/data-loss Issues

2020-01-27 Thread Dongjoon Hyun
ion` explicitly if you think there is any other correctness/dataloss issue which is blocking 2.4.5 RC2. Otherwise, it's very hard for the release manager to notice it from the haystacks of JIRA comments and PR comments. Bests, Dongjoon. On Mon, Jan 27, 2020 at 12:30 PM Dongjoon Hyun wrote:

Re: Spark 2.4.5 RC2 Preparation Status

2020-01-29 Thread Dongjoon Hyun
ntinue with RC2 assuming none of these are > necessary for 2.4.5. > It's not wrong to 'wait' if there are strong feelings about something, > but, if we can't see a reason to expect the situation changes in a week, 2 > weeks, then, why? The release of 2.4.5 nowish doesn't necessarily make the > rel

Re: Spark 2.4.5 RC2 Preparation Status

2020-01-29 Thread Dongjoon Hyun
should be noncontroversial. (Not sure why I didn't > backport originally) > > On Wed, Jan 29, 2020 at 3:27 PM Dongjoon Hyun > wrote: > > > > Thanks, Sean. > > > > If there is no further objection to the mailing list, > > could you remove the `Target Versio

Re: Spark 3.0 and ORC 1.6

2020-01-29 Thread Dongjoon Hyun
Hi, David. Thank you for sharing your opinion. I'm also a supporter for ZStandard. Apache Spark 3.0 starts to take advantage of ZStd a lot. 1) Switch the default codec for MapOutputStatus from GZip to ZStd. 2) Add spark.eventLog.compression.codec to allow ZStd. 3) Use Parquet+ZStd

Apache Spark Docker image repository

2020-02-05 Thread Dongjoon Hyun
Hi, All. From 2020, shall we have an official Docker image repository as an additional distribution channel? I'm considering the following images. - Public binary release (no snapshot image) - Public non-Spark base image (OS + R + Python) (This can be used in GitHub Action Jobs

Re: Spark 3.0 branch cut and code freeze on Jan 31?

2020-02-04 Thread Dongjoon Hyun
d have multiple subtasks and part of >>>> subtasks have been merged and other subtask(s) are in reviewing. In this >>>> case do we allow these subtasks to have more days to get reviewed and >>>> merged later? >>>> >>>> Happy Holiday! >>>>

[VOTE][RESULT] Spark 2.4.5 (RC2)

2020-02-05 Thread Dongjoon Hyun
Hi, All. The vote passes. Thanks to all who helped with this release 2.4.5! I'll follow up later with a release announcement once everything is published. +1 (* = binding): - Dongjoon Hyun * - Wenchen Fan * - Hyukjin Kwon * - Takeshi Yamamuro - Maxim Gekk - Sean Owen * +0: None -1: None Bests

[FYI] `Target Version` on `Improvement`/`New Feature` JIRA issues

2020-02-01 Thread Dongjoon Hyun
Hi, All. From today, we have `branch-3.0` as a tool of `Feature Freeze`. https://github.com/apache/spark/tree/branch-3.0 All open JIRA issues whose type is `Improvement` or `New Feature` and had `3.0.0` as a `Target Version` are changed accordingly first. - Most of them are

Revise the blocker policy

2020-01-31 Thread Dongjoon Hyun
Hi, All. We discussed the correctness/dataloss policies for two weeks. According to our practice, I want to revise our policy in our website explicitly. - Correctness and data loss issues should be considered Blockers + Correctness and data loss issues should be considered Blockers for their

Re: [VOTE] Release Apache Spark 2.4.5 (RC2)

2020-02-03 Thread Dongjoon Hyun
Yes, it does officially since 2.4.0. 2.4.5 is a maintenance release of 2.4.x line and the community didn't support Hadoop 3.x on 'branch-2.4'. We didn't run test at all. Bests, Dongjoon. On Sun, Feb 2, 2020 at 22:58 Ajith shetty wrote: > Is hadoop-3.1 profile supported for this release.? i

Re: new branch-3.0 jenkins job configs are ready to be deployed...

2020-01-31 Thread Dongjoon Hyun
Thank you, Shane. BTW, we need to enable JDK11 unit run on Python and R. (Currently, it's only tested in PRBuilder.) https://issues.apache.org/jira/browse/SPARK-28900 Today, Thomas and I are hitting Python UT failures on JDK11 environment in independent PRs. ERROR [32.750s]:

Re: new branch-3.0 jenkins job configs are ready to be deployed...

2020-01-31 Thread Dongjoon Hyun
/StreamingLogisticRegressionWithSGDTests/test_parameter_accuracy/ Anyway, I'll file a JIRA issue for this Python flakiness. Bests, Dongjoon. On Fri, Jan 31, 2020 at 5:17 PM Dongjoon Hyun wrote: > Thank you, Shane. > > BTW, we need to enable JDK11 unit run on Python and R. (Currently, it's > only tested

Re: Apache Spark Docker image repository

2020-02-07 Thread Dongjoon Hyun
s in the docker images and how useful > it is. People run different os's, different python versions, etc. And like > Sean mentioned how useful really is it other then a few examples. Some > discussions on https://issues.apache.org/jira/browse/SPARK-24655 > > Tom > > > >

Re: [spark-packages.org] Jenkins down

2020-01-24 Thread Dongjoon Hyun
Thank you for working on that, Xiao. BTW, I'm wondering why SPARK-30636 is a blocker for 2.4.5 release? Do you mean `Critical`? Bests, Dongjoon. On Fri, Jan 24, 2020 at 10:20 AM Xiao Li wrote: > Hi, all, > > Because the Jenkins of spark-packages.org is down, new packages or > releases are

Re: [spark-packages.org] Jenkins down

2020-01-24 Thread Dongjoon Hyun
Thank you for updating! On Fri, Jan 24, 2020 at 10:29 AM Xiao Li wrote: > It does not block any Spark release. Reduced the priority to Critical. > > Cheers, > > Xiao > > On Fri, Jan 24, 2020 at 10:24 AM, Dongjoon Hyun wrote: > >> Thank you for working on that, Xiao. >>

Re: [DISCUSS][SPARK-30275] Discussion about whether to add a gitlab-ci.yml file

2020-01-23 Thread Dongjoon Hyun
Hi, Jim. Thank you for the proposal. I understand the request. However, the following key benefit sounds like unofficial snapshot binary releases. > For example, this was used to build a version of spark that included SPARK-28938 which has yet to be released and was necessary for spark-operator

Re: `Target Version` management on correctness/data-loss Issues

2020-01-28 Thread Dongjoon Hyun
e visibility to > those important issues and get people interesting in working on them sooner. > > Tom > > On Monday, January 27, 2020, 02:31:03 PM CST, Dongjoon Hyun < > dongjoon.h...@gmail.com> wrote: > > > Yes. That is what I pointed in `Unfortunately, we didn't

[VOTE] Release Apache Spark 2.4.5 (RC2)

2020-02-02 Thread Dongjoon Hyun
Please vote on releasing the following candidate as Apache Spark version 2.4.5. The vote is open until February 5th 11PM PST and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 2.4.5 [ ] -1 Do not release this package because

Re: [VOTE] Release Apache Spark 2.4.5 (RC2)

2020-02-02 Thread Dongjoon Hyun
Version: v1.14.9-eks-c0eccc Bests, Dongjoon. On Sun, Feb 2, 2020 at 9:30 PM Dongjoon Hyun wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.4.5. > > The vote is open until February 5th 11PM PST and passes if a majority +1 > PMC votes are cast, with

Re: [DISCUSSION] Esoteric Spark function `TRIM/LTRIM/RTRIM`

2020-02-17 Thread Dongjoon Hyun
is no such standard that defines the >>> parameter order of the TRIM function. >>> >>> In the long term, we would also promote the SQL standard TRIM syntax. I >>> don't see much benefit of "fixing" the parameter order that worth to make a >>>
