Re: [build system] intermittent network issues + potential power shutoff over the weekend

2019-10-25 Thread Dongjoon Hyun
Thank you for the notice, Shane. Bests, Dongjoon. On Fri, Oct 25, 2019 at 12:31 PM Shane Knapp wrote: > > 1) our department is having some issues w/their network and other > > services. this means that if you're at the jenkins site, you may > > occasionally get a 503 error. just hit refresh a cou

Re: Packages to release in 3.0.0-preview

2019-10-27 Thread Dongjoon Hyun
It doesn't seem to be a Hadoop issue, does it? What Yuming pointed out seems to be a `Hive 2.3.6` profile implementation issue, which is enabled only with `Hadoop 3.2`. From my side, I'm +1 for publishing jars that depend on the `Hadoop 3.2.0 / Hive 2.3.6` jars to Maven starting with Apache Spark 3.0.0. For the other

Re: Packages to release in 3.0.0-preview

2019-10-27 Thread Dongjoon Hyun
(AdoptOpenJDK)(build 1.8.0_232-b09) OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.232-b09, mixed mode) Bests, Dongjoon. On Sun, Oct 27, 2019 at 1:38 PM Dongjoon Hyun wrote: > It seems not a Hadoop issue, doesn't it? > > What Yuming pointed seems to be `Hive 2.3.6` profile implem

Re: [build system] intermittent network issues + potential power shutoff over the weekend

2019-10-28 Thread Dongjoon Hyun
Thank you for fixing the worker ENVs, Shane. Bests, Dongjoon. On Mon, Oct 28, 2019 at 10:47 AM Shane Knapp wrote: > i will need to restart jenkins -- the worker's ENV vars got borked when > they came back up. > > this is happening NOW. > > shane > > On Mon, Oct 28, 2019 at 10:37 AM Shane Knapp

Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-10-28 Thread Dongjoon Hyun
Hi, All. There was a discussion on publishing artifacts built with Hadoop 3. But we are still publishing with Hadoop 2.7.3, and `3.0-preview` will be the same because we haven't changed anything yet. Technically, we need to change two places for publishing. 1. Jenkins Snapshot Publishing https:/

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-10-28 Thread Dongjoon Hyun
t have a strong opinion nor info about the >> implications. >> That said my guess is we're close to the point where we don't need to >> support Hadoop 2.x anyway, so, yeah. >> >> On Mon, Oct 28, 2019 at 2:33 PM Dongjoon Hyun >> wrote: >> > &g

Re: [VOTE] SPARK 3.0.0-preview (RC1)

2019-10-29 Thread Dongjoon Hyun
Hi, Xingbo. PySpark seems to fail to build. There is only `sha512`. SparkR_3.0.0-preview.tar.gz SparkR_3.0.0-preview.tar.gz.asc SparkR_3.0.0-preview.tar.gz.sha512 *pyspark-3.0.0.preview.tar.gz.sha512* spark-3.0.0-preview-bin-hadoop2.7.tgz spark-3.0.0-preview-bin-hadoop2.7.tgz.asc spark-3.0.0-prev

Re: [DISCUSS] Deprecate Python < 3.6 in Spark 3.0

2019-10-30 Thread Dongjoon Hyun
rom me as well. >>>> >>>> On Tue, Oct 29, 2019 at 5:34 AM, Xiangrui Meng wrote: >>>> >>>>> +1. And we should start testing 3.7 and maybe 3.8 in Jenkins. >>>>> >>>>> On Thu, Oct 24, 2019 at 9:34 AM Dongjoon Hy

Re: [VOTE] SPARK 3.0.0-preview (RC1)

2019-10-30 Thread Dongjoon Hyun
Hi, Xingbo. Currently, the RC2 tag is pointing to the RC1 tag. https://github.com/apache/spark/tree/v3.0.0-preview-rc2 Could you cut it from the HEAD of the master branch? Otherwise, nobody knows what release script you used for RC2. Bests, Dongjoon. On Wed, Oct 30, 2019 at 4:15 PM Xingbo Jiang wrote: > Hi

Re: [apache/spark] [SPARK-29674][CORE] Update dropwizard metrics to 4.1.x for JDK 9+ (#26332)

2019-10-30 Thread Dongjoon Hyun
The Ganglia module has only 2 files. In addition to dropping it, we could choose one of the following two ways to keep supporting it partially, like `kafka-0.8`, which Apache Spark supports only with Scala 2.11. 1. We can stick to `dropwizard 3.x` for JDK8 (by default) and use `dropwizard 4.x` for the `hadoop-3.2` p

Re: [VOTE] SPARK 3.0.0-preview (RC2)

2019-11-01 Thread Dongjoon Hyun
+1 for Apache Spark 3.0.0-preview (RC2). Bests, Dongjoon. On Thu, Oct 31, 2019 at 11:36 PM Wenchen Fan wrote: > The PR builder uses Hadoop 2.7 profile, which makes me think that 2.7 is > more stable and we should make releases using 2.7 by default. > > +1 > > On Fri, Nov 1, 2019 at 7:16 AM Xiao

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-01 Thread Dongjoon Hyun
ility and quality of Hadoop 3.2 profile are unknown. The >>>> changes are massive, including Hive execution and a new version of Hive >>>> thriftserver. >>>> >>>> To reduce the risk, I would like to keep the current default version >>>>

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-02 Thread Dongjoon Hyun
strong opinion nor info about the >> implications. >> That said my guess is we're close to the point where we don't need to >> support Hadoop 2.x anyway, so, yeah. >> >> On Mon, Oct 28, 2019 at 2:33 PM Dongjoon Hyun >> wrote: >> > >>

Removing `CRAN incoming feasibility` check from the main build

2019-11-02 Thread Dongjoon Hyun
Hi, All. CRAN instability seems to be a blocker for our dev process. The following simple check causes consecutive failures in 4 of 9 Jenkins jobs + PR builder. - spark-branch-2.4-test-sbt-hadoop-2.6 - spark-branch-2.4-test-sbt-hadoop-2.7 - spark-master-test-sbt-hadoop-2.7 - spark-master-test-sbt

Re: Removing `CRAN incoming feasibility` check from the main build

2019-11-02 Thread Dongjoon Hyun
, Dongjoon. On Sat, Nov 2, 2019 at 7:10 PM Dongjoon Hyun wrote: > Hi, All. > > CRAN instability seems to be a blocker for our dev process. > The following simple check causes consecutive failures in 4 of 9 Jenkins > jobs + PR builder. > > - spark-branch-2.4-test-sbt-hadoop-2.6 >

Re: [VOTE] SPARK 3.0.0-preview (RC2)

2019-11-04 Thread Dongjoon Hyun
Hi, Xingbo. Could you send a vote result email to finalize this vote, please? Bests, Dongjoon. On Fri, Nov 1, 2019 at 2:55 PM Takeshi Yamamuro wrote: > +1, too. > > On Sat, Nov 2, 2019 at 3:36 AM Hyukjin Kwon wrote: > >> +1 >> >> On Fri, 1 Nov 2019, 15:36 Wenchen Fan, wrote: >> >>> The PR bu

Re: Adding JIRA ID as the prefix for the test case name

2019-11-12 Thread Dongjoon Hyun
Thank you for the suggestion, Hyukjin. Previously, we added JIRA IDs to the bug-fix PR test cases, as Gabor said. For new features (and improvements), we didn't add them because all test cases in the newly added test suite share the same prefix JIRA ID in that case. It might look redundant
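For readers skimming this thread, here is a minimal sketch of the naming convention under discussion, assuming a plain ScalaTest suite; the suite name and JIRA number are made up for illustration.

```scala
// Sketch of the test-name convention discussed in the thread.
// The JIRA number and suite are illustrative only.
import org.scalatest.funsuite.AnyFunSuite

class ExampleSuite extends AnyFunSuite {

  // Bug-fix test case: prefix the test name with the JIRA ID of the fix.
  test("SPARK-12345: handle empty input without throwing") {
    assert(Seq.empty[Int].sum == 0)
  }

  // A new-feature suite often shares a single JIRA ID, so repeating the
  // prefix on every test can look redundant, as noted above.
  test("returns zero for empty input") {
    assert(Seq.empty[Int].sum == 0)
  }
}
```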

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-16 Thread Dongjoon Hyun
Thank you for the suggestion. Having a `hive-2.3` profile sounds good to me because it's orthogonal to Hadoop 3. IIRC, it was originally proposed that way, but we put it under `hadoop-3.2` to avoid adding new profiles at that time. And I'm wondering if you are considering additional pre-built dist

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-18 Thread Dongjoon Hyun
--- > *From:* Steve Loughran > *Sent:* Sunday, November 17, 2019 9:22:09 AM > *To:* Cheng Lian > *Cc:* Sean Owen ; Wenchen Fan ; > Dongjoon Hyun ; dev ; > Yuming Wang > *Subject:* Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0? > > Can I take thi

Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-18 Thread Dongjoon Hyun
Hi, All. First of all, I want to frame this as a policy issue instead of a technical issue. Also, this is orthogonal to the `hadoop` version discussion. The Apache Spark community has kept (but not maintained) the forked Apache Hive 1.2.1 because there were no other options before. As we see at SPARK-20202,

Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-19 Thread Dongjoon Hyun
> Spark 3.0 is already something that people understand works > differently. We can accept some behavior changes. > > On Mon, Nov 18, 2019 at 11:11 PM Dongjoon Hyun > wrote: > > > > Hi, All. > > > > First of all, I want to put this as a policy issue instead o

Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-19 Thread Dongjoon Hyun
> 3.0. For preview releases, I'm afraid that their visibility is not good > enough for covering such major upgrades. > > On Tue, Nov 19, 2019 at 8:39 AM Dongjoon Hyun > wrote: > >> Thank you for feedback, Hyujkjin and Sean. >> >> I proposed `preview-2` for th

Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-19 Thread Dongjoon Hyun
, Nov 19, 2019 at 11:11 AM Dongjoon Hyun wrote: > Hi, Cheng. > > This is irrelevant to JDK11 and Hadoop 3. I'm talking about JDK8 world. > If we consider them, it could be the followings. > > +--+-++ > |

Migration `Spark QA Compile` Jenkins jobs to GitHub Action

2019-11-19 Thread Dongjoon Hyun
Hi, All. The Apache Spark community has used the following dashboard for post-hook verification. https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/ There are six registered jobs. 1. spark-branch-2.4-compile-maven-hadoop-2.6 2. spark-branch-2.4-compile-maven-hadoop-2.7 3.

Re: Migration `Spark QA Compile` Jenkins jobs to GitHub Action

2019-11-19 Thread Dongjoon Hyun
9 at 1:49 PM Sean Owen wrote: > > > > > > I would favor moving whatever we can to Github. It's difficult to > > > modify the Jenkins instances without Shane's valiant help, and over > > > time makes more sense to modernize and integrate it into the proje

Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-19 Thread Dongjoon Hyun
combination, they can always build Spark with proper profiles >> themselves. >> >> And thanks for clarifying the Hive 2.3.5 issue. I didn't notice that it's >> due to the folder name. >> >> On Tue, Nov 19, 2019 at 11:15 AM Dongjoon Hyun >> wrote: >>

Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-19 Thread Dongjoon Hyun
Cheng, could you elaborate on your criteria, `Hive 2.3 code paths are proven to be stable`? For me, it's difficult to imagine that we can reach any stable situation when we don't use it at all ourselves. > The Hive 1.2 code paths can only be removed once the Hive 2.3 code paths are proven to

Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-20 Thread Dongjoon Hyun
ent either way about what's the default. I > too would then prefer defaulting to Hive 2 in the POM. Am I missing > something about the implication? > > (That fork will stay published forever anyway, that's not an issue per se.) > > On Wed, Nov 20, 2019 at 1:40 A

The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-20 Thread Dongjoon Hyun
Hi, All. I'm sending this email because it's important to discuss this topic narrowly and make a clear conclusion. `The forked Hive 1.2.1 is stable`? It sounds like a myth we created by ignoring the existing bugs. If you want to say the forked Hive 1.2.1 is stabler than XXX, please give us the ev

Re: The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-20 Thread Dongjoon Hyun
n't care should probably be nudged to 2.x. > Spark 3.x is already full of behavior changes and 'unstable', so I > think this is minor relative to the overall risk question. > > On Wed, Nov 20, 2019 at 12:53 PM Dongjoon Hyun > wrote: > > > > Hi, All. > &g

Re: The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-20 Thread Dongjoon Hyun
2019 at 11:49 AM Felix Cheung > wrote: > >> Just to add - hive 1.2 fork is definitely not more stable. We know of a >> few critical bug fixes that we cherry picked into a fork of that fork to >> maintain ourselves. >> >> >> -- >> *From:* Dongjoon Hyun >

Re: The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-20 Thread Dongjoon Hyun
> dependencies from Maven at runtime when initializing the Hive metastore >>> client. And those dependencies will NOT conflict with the built-in Hive >>> 1.2.1 jars, because the downloaded jars are loaded using an isolated >>> classloader (see here >>> <htt

Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-20 Thread Dongjoon Hyun
profile to keep the Hive 1.2.1 fork looks like a feasible approach to me. > Thanks for starting the discussion! > > On Wed, Nov 20, 2019 at 9:46 AM Dongjoon Hyun > wrote: > >> Yes. Right. That's the situation we are hitting and the result I expected. >> We need to c

Re: Spark 2.4.5 release for Parquet and Avro dependency updates?

2019-11-22 Thread Dongjoon Hyun
Hi, Michael. I'm not sure Apache Spark is in a state close to what you want. First, both Apache Spark 3.0.0-preview and Apache Spark 2.4 use Avro 1.8.2, and so do the `master` and `branch-2.4` branches. Cutting new releases would not provide what you want. Do we have a PR on the master branc

Re: The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-22 Thread Dongjoon Hyun
Thank you, Steve and all. As a conclusion of this thread, we will merge the following PR and move forward. [SPARK-29981][BUILD] Add hive-1.2/2.3 profiles https://github.com/apache/spark/pull/26619 Please leave your comments if you have any concern. And, the following PRs and more will fo

Re: [DISCUSS] PostgreSQL dialect

2019-11-27 Thread Dongjoon Hyun
+1 Bests, Dongjoon. On Tue, Nov 26, 2019 at 3:52 PM Takeshi Yamamuro wrote: > Yea, +1, that looks pretty reasonable to me. > > Here I'm proposing to hold off the PostgreSQL dialect. Let's remove it > from the codebase before it's too late. Currently we only have 3 features > under PostgreSQL dia

Re: Status of Scala 2.13 support

2019-12-02 Thread Dongjoon Hyun
Thank you for sharing the status, Sean. Given the current circumstances, our status and approach sound realistic to me. +1 for continuing after cutting `branch-3.0`. Bests, Dongjoon. On Sun, Dec 1, 2019 at 10:50 AM Sean Owen wrote: > As you can see, I've been working on Scala 2.13 support. T

Re: SQL test failures in PR builder?

2019-12-04 Thread Dongjoon Hyun
Hi, Sean. It seems that there is no failure on your other SQL PR. https://github.com/apache/spark/pull/26748 Does the sequential failure happen only at `NewSparkPullRequestBuilder`? Since `NewSparkPullRequestBuilder` is not the same as `SparkPullRequestBuilder`, there might be a root cause

FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE syntax

2019-12-06 Thread Dongjoon Hyun
Hi, All. I want to share the following change with the community. SPARK-30098 Use default datasource as provider for CREATE TABLE syntax This was merged today, and now Spark's `CREATE TABLE` uses Spark's default data source instead of the `hive` provider. This is a good and big improvement for
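To illustrate the behavior change, a minimal sketch follows; the table name and session setup are made up, and `enableHiveSupport` assumes Hive classes are on the classpath. The default data source is whatever `spark.sql.sources.default` points to (normally parquet).

```scala
import org.apache.spark.sql.SparkSession

object CreateTableProviderExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("create-table-provider-demo")
      .master("local[*]")
      .enableHiveSupport() // illustrative; requires Hive classes on the classpath
      .getOrCreate()

    // Before SPARK-30098, a CREATE TABLE without a USING clause produced a
    // Hive-format table; after the change it uses the default data source
    // (spark.sql.sources.default, normally parquet).
    spark.sql("CREATE TABLE demo_tbl (id INT, name STRING)")

    // The Provider row of DESCRIBE EXTENDED shows which path was taken.
    spark.sql("DESCRIBE TABLE EXTENDED demo_tbl").show(truncate = false)

    spark.stop()
  }
}
```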

Re: Spark 3.0 preview release 2?

2019-12-09 Thread Dongjoon Hyun
Thank you, All. +1 for another `3.0-preview`. Also, thank you Yuming for volunteering for that! Bests, Dongjoon. On Mon, Dec 9, 2019 at 9:39 AM Xiao Li wrote: > When entering the official release candidates, the new features have to be > disabled or even reverted [if the conf is not availabl

Release Apache Spark 2.4.5 and 2.4.6

2019-12-09 Thread Dongjoon Hyun
Hi, All. Along with the discussion on 3.0.0, I'd like to discuss the next releases on `branch-2.4`. As we know, `branch-2.4` is our LTS branch, and there are some questions about the release plans. More releases are important not only for the latest K8s version support, but also for del

Re: Spark 3.0 preview release 2?

2019-12-10 Thread Dongjoon Hyun
BTW, our Jenkins seems to be behind. 1. For the first item, `Support JDK 11 with Hadoop 2.7`: At least, we need a new Jenkins job `spark-master-test-maven-hadoop-2.7-jdk-11/`. 2. https://issues.apache.org/jira/browse/SPARK-28900 (Test Pyspark, SparkR on JDK 11 with run-tests) 3. https://issues

Re: Release Apache Spark 2.4.5 and 2.4.6

2019-12-11 Thread Dongjoon Hyun
here is probably less to fix, so Jan-Feb 2020 for 2.4.5 and >>> something like middle or Q3 2020 for 2.4.6 is a reasonable >>> expectation. It might plausibly be the last 2.4.x release but who >>> knows. >>> >>> On Mon, Dec 9, 2019 at 12:29 PM Dongjoon Hyun

Re: R linter is broken

2019-12-13 Thread Dongjoon Hyun
It seems to fail at installation because the remote repository seems to have changed. Bests, Dongjoon On Fri, Dec 13, 2019 at 07:46 Nicholas Chammas wrote: > The R linter GitHub action seems to be busted > . > Loo

Re: R linter is broken

2019-12-13 Thread Dongjoon Hyun
Please see here for the root cause. - https://github.community/t5/GitHub-Actions/ubuntu-latest-Apt-repository-list-issues/td-p/41122 On Fri, Dec 13, 2019 at 9:11 AM Dongjoon Hyun wrote: > It seems to fail at installation because the remote repository seems to be > changed. >

Re: R linter is broken

2019-12-13 Thread Dongjoon Hyun
Microsoft mirror is recovered now. Bests, Dongjoon. On Fri, Dec 13, 2019 at 9:45 AM Dongjoon Hyun wrote: > Please see here for the root cause. > > - > https://github.community/t5/GitHub-Actions/ubuntu-latest-Apt-repository-list-issues/td-p/41122 > > > On Fri, Dec 13, 201

Re: [VOTE] SPARK 3.0.0-preview2 (RC2)

2019-12-18 Thread Dongjoon Hyun
+1 I also checked the signatures and docs, and built and tested with JDK 11.0.5, Hadoop 3.2, and Hive 2.3. In addition, the newly added `spark-3.0.0-preview2-bin-hadoop2.7-hive1.2.tgz` distribution looks correct. Thank you, Yuming and all. Bests, Dongjoon. On Tue, Dec 17, 2019 at 4:11 PM Sean Owen

Re: [VOTE] SPARK 3.0.0-preview2 (RC2)

2019-12-21 Thread Dongjoon Hyun
ed binaries. >> Also, I run tests with -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver >> -Pmesos -Pkubernetes -Psparkr >> on java version "1.8.0_181. >> All the things above look fine. >> >> Bests, >> Takeshi >> >> On Thu, Dec 19, 2019 at

Re: [VOTE][RESULT] SPARK 3.0.0-preview2 (RC2)

2019-12-22 Thread Dongjoon Hyun
se announcement once everything is > published. > > +1 (* = binding): > - Sean Owen * > - Dongjoon Hyun * > - Takeshi Yamamuro * > - Wenchen Fan * > > +0: None > > -1: None > > > > > Regards, > Yuming >

Re: Spark 3.0 branch cut and code freeze on Jan 31?

2019-12-24 Thread Dongjoon Hyun
+1 for January 31st. Bests, Dongjoon. On Tue, Dec 24, 2019 at 7:11 AM Xiao Li wrote: > Jan 31 is pretty reasonable. Happy Holidays! > > Xiao > > On Tue, Dec 24, 2019 at 5:52 AM Sean Owen wrote: > >> Yep, always happens. Is earlier realistic, like Jan 15? it's all >> arbitrary but indeed this h

Re: [ANNOUNCE] Announcing Apache Spark 3.0.0-preview2

2019-12-24 Thread Dongjoon Hyun
Indeed! Thank you again, Yuming and all. Bests, Dongjoon. On Tue, Dec 24, 2019 at 13:38 Takeshi Yamamuro wrote: > Great work, Yuming! > > Bests, > Takeshi > > On Wed, Dec 25, 2019 at 6:00 AM Xiao Li wrote: > >> Thank you all. Happy Holidays! >> >> Xiao >> >> On Tue, Dec 24, 2019 at 12:53 PM Y

Release Apache Spark 2.4.5

2020-01-05 Thread Dongjoon Hyun
Hi, All. Happy New Year (2020)! Although we slightly missed the timeline for the 3.0 branch cut last month, it seems that we are keeping the 2.4.x timeline on track. https://spark.apache.org/versioning-policy.html As of today, `branch-2.4` has 154 patches since v2.4.4. $ git log --oneline v2.4.4..HEA

Re: Release Apache Spark 2.4.5

2020-01-06 Thread Dongjoon Hyun
re's release window for this. >>>> >>>> On Mon, Jan 6, 2020 at 12:38 PM Hyukjin Kwon >>>> wrote: >>>> >>>>> Yeah, I think it's nice to have another maintenance release given >>>>> Spark 3.0 timeline. >>&

Re: [DISCUSS] Support year-month and day-time Intervals

2020-01-10 Thread Dongjoon Hyun
Hi, Kent. Thank you for the proposal. Does your proposal need to revert something from the master branch? I'm just asking because it's not clear in the proposal document. Bests, Dongjoon. On Fri, Jan 10, 2020 at 5:31 AM Dr. Kent Yao wrote: > Hi, Devs > > I’d like to propose to add two new int

Re: [DISCUSS] Support year-month and day-time Intervals

2020-01-10 Thread Dongjoon Hyun
es, > 2. `interval` -> CalenderIntervalType support in the parser > > Thanks > > *Kent Yao* > Data Science Center, Hangzhou Research Institute, Netease Corp. > PHONE: (86) 186-5715-3499 > EMAIL: hzyao...@corp.netease.com > > On 01/11/2020 01:57,Dongjoon Hyun >

[VOTE] Release Apache Spark 2.4.5 (RC1)

2020-01-13 Thread Dongjoon Hyun
Please vote on releasing the following candidate as Apache Spark version 2.4.5. The vote is open until January 16th 5AM PST and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 2.4.5 [ ] -1 Do not release this package because ..

Re: [VOTE] Release Apache Spark 2.4.5 (RC1)

2020-01-13 Thread Dongjoon Hyun
Server Version: v1.14.9-eks-c0eccc Bests, Dongjoon. On Mon, Jan 13, 2020 at 4:27 AM Dongjoon Hyun wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.4.5. > > The vote is open until January 16th 5AM PST and passes if a majority +1 > PMC

Re: [VOTE] Release Apache Spark 2.4.5 (RC1)

2020-01-14 Thread Dongjoon Hyun
ndlers'. > [error] Make sure that term eclipse is in your classpath and check for > conflicting dependencies with `-Ylog-classpath`. > [error] A full rebuild may help if 'MetricsSystem.class' was compiled > against an incompatible version of org. > [error] testUtils.sendM

Re: [VOTE] Release Apache Spark 2.4.5 (RC1)

2020-01-15 Thread Dongjoon Hyun
w.dbtsai.com >>>>> PGP Key ID: 42E5B25A8F7A82C1 >>>>> >>>>> On Tue, Jan 14, 2020 at 11:08 AM Sean Owen wrote: >>>>> > >>>>> > Yeah it's something about the env I spun up, but I don't know what. >>

Re: [VOTE] Release Apache Spark 2.4.5 (RC1)

2020-01-16 Thread Dongjoon Hyun
t it fixed a > regression, long lasting one (broken at 2.3.0). The link refers the PR for > 2.4 branch. > > Thanks, > Jungtaek Lim (HeartSaVioR) > > On Thu, Jan 16, 2020 at 12:56 PM Dongjoon Hyun > wrote: > >> Sure. Wenchen and Hyukjin. >> >> I observed

Re: PR lint-scala jobs failing with http error

2020-01-16 Thread Dongjoon Hyun
Hi, Tom and Shane. It looks like an old `sbt` bug. Maven Central seems to have recently started banning `http` access. If you use Maven, it's okay because it goes over `https`. $ build/sbt clean [error] org.apache.maven.model.building.ModelBuildingException: 1 problem was encountered while building the effect

[FYI] SBT Build Failure

2020-01-16 Thread Dongjoon Hyun
Hi, All. As of now, the Apache Spark sbt build is broken by the Maven Central repository policy. - https://stackoverflow.com/questions/59764749/requests-to-http-repo1-maven-org-maven2-return-a-501-https-required-status-an > Effective January 15, 2020, The Central Maven Repository no longer supports
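For anyone hitting this in a local sbt project, the general direction of the workaround is to make dependency resolution go over HTTPS only. The snippet below is a sketch for a standalone build, not the actual patch that went into the Spark build.

```scala
// build.sbt (sketch) -- resolve dependencies over HTTPS only.
// Illustrative for a standalone sbt project; the real Spark fix touched the
// build's own repository definitions.
ThisBuild / externalResolvers := Seq(
  "Maven Central (https)" at "https://repo1.maven.org/maven2/"
)
```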

Re: Spark master build hangs using parallel build option in maven

2020-01-17 Thread Dongjoon Hyun
Hi, Saurabh. It seems that you are hitting https://issues.apache.org/jira/browse/SPARK-26095. And we disabled the parallel build via https://github.com/apache/spark/pull/23061 in 3.0.0. According to the stack trace in the JIRA and the PR description, `maven-shade-plugin` seems to be the root cause. F

Correctness and data loss issues

2020-01-19 Thread Dongjoon Hyun
Hi, All. According to our policy, "Correctness and data loss issues should be considered Blockers". - http://spark.apache.org/contributing.html Since we are close to the branch-3.0 cut, I want to ask your opinions on the following correctness and data loss issues. SPARK-30218 Columns used i

Spark 2.4.5 RC2 Preparation Status

2020-01-20 Thread Dongjoon Hyun
Hi, All. RC2 was scheduled for today, and all RC1 feedback seems to be addressed. However, I'm waiting for another ongoing correctness PR. https://github.com/apache/spark/pull/27233 [SPARK-29701][SQL] Correct behaviours of group analytical queries when empty input given Unlike the other

Re: Adding Maven Central mirror from Google to the build?

2020-01-21 Thread Dongjoon Hyun
+1, I support the following proposal. > this mirror as the primary repo in the build, falling back to Central if needed. Thanks, Dongjoon. On Tue, Jan 21, 2020 at 14:37 Sean Owen wrote: > See https://github.com/apache/spark/pull/27307 for some context. We've > had to add, in at least one
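A rough sketch of the "mirror first, Central as fallback" ordering, expressed here in sbt resolver terms rather than the actual pom.xml change in the PR; the mirror URL is my assumption about the Google-hosted mirror, so treat it as a placeholder.

```scala
// build.sbt (sketch) -- try a Google-hosted Maven Central mirror first,
// then fall back to Maven Central itself. URLs are placeholders.
ThisBuild / externalResolvers := Seq(
  "GCS Maven Central mirror" at "https://maven-central.storage-download.googleapis.com/maven2/",
  "Maven Central" at "https://repo1.maven.org/maven2/"
)
```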

Re: Correctness and data loss issues

2020-01-21 Thread Dongjoon Hyun
://issues.apache.org/jira/browse/SPARK-28344 > > On Mon, Jan 20, 2020 at 2:07 PM Dongjoon Hyun > wrote: > >> Hi, All. >> >> According to our policy, "Correctness and data loss issues should be >> considered Blockers". >> >> - http://

Re: Correctness and data loss issues

2020-01-22 Thread Dongjoon Hyun
been consistent about that it feels like > it can wait for 3.0 but would be good to get others input and I'm not an > expert on SQL standard and what do the other sql engines do in this case. > > Tom > > On Monday, January 20, 2020, 12:07:54 AM CST, Dongjoon Hyun < >

Re: Correctness and data loss issues

2020-01-22 Thread Dongjoon Hyun
solve. The remaining things are the followings: 1. Revisit `3.0.0`-only correctness patches? 2. Set the target version to `2.4.5`? (Specifically, is this feasible in terms of timeline?) Bests, Dongjoon. On Wed, Jan 22, 2020 at 9:43 AM Dongjoon Hyun wrote: > Hi, Tom. > > Then,

Re: [DISCUSS][SPARK-30275] Discussion about whether to add a gitlab-ci.yml file

2020-01-23 Thread Dongjoon Hyun
Hi, Jim. Thank you for the proposal. I understand the request. However, the following key benefit sounds like unofficial snapshot binary releases. > For example, this was used to build a version of spark that included SPARK-28938 which has yet to be released and was necessary for spark-operator t

Re: [spark-packages.org] Jenkins down

2020-01-24 Thread Dongjoon Hyun
Thank you for working on that, Xiao. BTW, I'm wondering why SPARK-30636 is a blocker for the 2.4.5 release. Do you mean `Critical`? Bests, Dongjoon. On Fri, Jan 24, 2020 at 10:20 AM Xiao Li wrote: > Hi, all, > > Because the Jenkins of spark-packages.org is down, new packages or > releases are una

Re: [spark-packages.org] Jenkins down

2020-01-24 Thread Dongjoon Hyun
Thank you for updating! On Fri, Jan 24, 2020 at 10:29 AM Xiao Li wrote: > It does not block any Spark release. Reduced the priority to Critical. > > Cheers, > > Xiao > > On Fri, Jan 24, 2020 at 10:24 AM, Dongjoon Hyun wrote: >> Thank you for working on that, Xiao. >> >

Re: Block a user from spark-website who repeatedly open the invalid same PR

2020-01-26 Thread Dongjoon Hyun
+1 On Sun, Jan 26, 2020 at 13:22 Shane Knapp wrote: > +1 > > On Sun, Jan 26, 2020 at 10:01 AM Denny Lee wrote: > > > > +1 > > > > On Sun, Jan 26, 2020 at 09:59 Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> > >> +1 > >> > >> I think y'all have shown this person more patience than

`Target Version` management on correctness/data-loss Issues

2020-01-26 Thread Dongjoon Hyun
Hi, All. After the 2.4.5 RC1 vote failure, I asked for your opinions on correctness/data-loss issues (on mailing lists/JIRAs/PRs) in order to collect the current status and community opinion widely and build a consensus on this at this time. Before talking about those issues, please remind

Re: `Target Version` management on correctness/data-loss Issues

2020-01-27 Thread Dongjoon Hyun
opinion, so > should we do something like mail the dev list whenever one of these issues > is tagged if its not going to be back ported to an affected release? > > Tom > On Sunday, January 26, 2020, 11:22:13 PM CST, Dongjoon Hyun < > dongjoon.h...@gmail.com> wrote: > &

Re: `Target Version` management on correctness/data-loss Issues

2020-01-27 Thread Dongjoon Hyun
ion` explicitly if you think there is any other correctness/data-loss issue which is blocking 2.4.5 RC2. Otherwise, it's very hard for the release manager to notice it in the haystacks of JIRA comments and PR comments. Bests, Dongjoon. On Mon, Jan 27, 2020 at 12:30 PM Dongjoon Hyun wrote: > Y

Re: `Target Version` management on correctness/data-loss Issues

2020-01-28 Thread Dongjoon Hyun
ht also just bring more visibility to > those important issues and get people interesting in working on them sooner. > > Tom > > On Monday, January 27, 2020, 02:31:03 PM CST, Dongjoon Hyun < > dongjoon.h...@gmail.com> wrote: > > > Yes. That is what I pointed in `Unfortun

Re: Spark 2.4.5 RC2 Preparation Status

2020-01-29 Thread Dongjoon Hyun
for it. > Are we talking about https://issues.apache.org/jira/browse/SPARK-28344 - > targeted for 2.4.5 but not backported, and a 'correctness' issue? > Simply: who argues this must hold up 2.4.5, and if so what's the status? > > On Wed, Jan 29, 2020 at 11:27 AM Don

Re: Spark 2.4.5 RC2 Preparation Status

2020-01-29 Thread Dongjoon Hyun
Anything else? not according to JIRA at least. > > I think it's valid to continue with RC2 assuming none of these are > necessary for 2.4.5. > It's not wrong to 'wait' if there are strong feelings about something, > but, if we can't see a reason to expect the

Re: Spark 2.4.5 RC2 Preparation Status

2020-01-29 Thread Dongjoon Hyun
back-port, as this should be noncontroversial. (Not sure why I didn't > backport originally) > > On Wed, Jan 29, 2020 at 3:27 PM Dongjoon Hyun > wrote: > > > > Thanks, Sean. > > > > If there is no further objection to the mailing list, > > could yo

Re: Spark 3.0 and ORC 1.6

2020-01-29 Thread Dongjoon Hyun
Hi, David. Thank you for sharing your opinion. I'm also a supporter of ZStandard. Apache Spark 3.0 starts to take advantage of ZStd in many places. 1) Switch the default codec for MapOutputStatus from GZip to ZStd. 2) Add spark.eventLog.compression.codec to allow ZStd. 3) Use Parquet+ZStd easil
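As a hedged illustration of item 2: the sketch below enables event logging and asks for ZStd for both event-log and general I/O compression. Only `spark.eventLog.compression.codec` is taken from the list above; the other settings and values are standard Spark configs chosen for this example.

```scala
import org.apache.spark.sql.SparkSession

object ZstdEventLogExample {
  def main(args: Array[String]): Unit = {
    // Illustrative settings only; the event-log directory must already exist.
    val spark = SparkSession.builder()
      .appName("zstd-config-demo")
      .master("local[*]")
      .config("spark.eventLog.enabled", "true")
      .config("spark.eventLog.dir", "/tmp/spark-events")
      .config("spark.eventLog.compress", "true")
      .config("spark.eventLog.compression.codec", "zstd") // item 2 above
      .config("spark.io.compression.codec", "zstd")
      .getOrCreate()

    // Run a trivial job so something is written to the event log.
    spark.range(1000).selectExpr("sum(id)").show()
    spark.stop()
  }
}
```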

Revise the blocker policy

2020-01-31 Thread Dongjoon Hyun
Hi, All. We discussed the correctness/data-loss policies for two weeks. According to our practice, I want to revise our policy on the website explicitly. - Correctness and data loss issues should be considered Blockers + Correctness and data loss issues should be considered Blockers for their targ

Re: new branch-3.0 jenkins job configs are ready to be deployed...

2020-01-31 Thread Dongjoon Hyun
Thank you, Shane. BTW, we need to enable the JDK 11 unit test runs for Python and R. (Currently, they are only tested in the PRBuilder.) https://issues.apache.org/jira/browse/SPARK-28900 Today, Thomas and I are hitting a Python UT failure in the JDK 11 environment in independent PRs. ERROR [32.750s]: test_parameter_acc

Re: new branch-3.0 jenkins job configs are ready to be deployed...

2020-01-31 Thread Dongjoon Hyun
/StreamingLogisticRegressionWithSGDTests/test_parameter_accuracy/ Anyway, I'll file a JIRA issue for this Python flakiness. Bests, Dongjoon. On Fri, Jan 31, 2020 at 5:17 PM Dongjoon Hyun wrote: > Thank you, Shane. > > BTW, we need to enable JDK11 unit run on Python and R. (Currently, it's > on

[FYI] `Target Version` on `Improvement`/`New Feature` JIRA issues

2020-02-01 Thread Dongjoon Hyun
Hi, All. From today, we have `branch-3.0` as a tool for `Feature Freeze`. https://github.com/apache/spark/tree/branch-3.0 All open JIRA issues whose type is `Improvement` or `New Feature` and which had `3.0.0` as a `Target Version` have been changed accordingly first. - Most of them are re-targeted

[VOTE] Release Apache Spark 2.4.5 (RC2)

2020-02-02 Thread Dongjoon Hyun
Please vote on releasing the following candidate as Apache Spark version 2.4.5. The vote is open until February 5th 11PM PST and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 2.4.5 [ ] -1 Do not release this package because .

Re: [VOTE] Release Apache Spark 2.4.5 (RC2)

2020-02-02 Thread Dongjoon Hyun
erver Version: v1.14.9-eks-c0eccc Bests, Dongjoon. On Sun, Feb 2, 2020 at 9:30 PM Dongjoon Hyun wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.4.5. > > The vote is open until February 5th 11PM PST and passes if a majority +1 > PMC votes are cast,

Re: [VOTE] Release Apache Spark 2.4.5 (RC2)

2020-02-03 Thread Dongjoon Hyun
Yes, it does officially since 2.4.0. 2.4.5 is a maintenance release of the 2.4.x line, and the community didn't support Hadoop 3.x on 'branch-2.4'. We didn't run tests at all. Bests, Dongjoon. On Sun, Feb 2, 2020 at 22:58 Ajith shetty wrote: > Is the hadoop-3.1 profile supported for this release? I see

Re: Spark 3.0 branch cut and code freeze on Jan 31?

2020-02-04 Thread Dongjoon Hyun
me feature could have multiple subtasks and part of >>>> subtasks have been merged and other subtask(s) are in reviewing. In this >>>> case do we allow these subtasks to have more days to get reviewed and >>>> merged later? >>>> >>>> Happy Holiday! >

Apache Spark Docker image repository

2020-02-05 Thread Dongjoon Hyun
Hi, All. From 2020, shall we have an official Docker image repository as an additional distribution channel? I'm considering the following images. - Public binary release (no snapshot image) - Public non-Spark base image (OS + R + Python) (This can be used in GitHub Action Jobs an

[VOTE][RESULT] Spark 2.4.5 (RC2)

2020-02-05 Thread Dongjoon Hyun
Hi, All. The vote passes. Thanks to all who helped with this release 2.4.5! I'll follow up later with a release announcement once everything is published. +1 (* = binding): - Dongjoon Hyun * - Wenchen Fan * - Hyukjin Kwon * - Takeshi Yamamuro - Maxim Gekk - Sean Owen * +0: None -1: None

Re: Apache Spark Docker image repository

2020-02-07 Thread Dongjoon Hyun
> Then the discussion comes into what is in the docker images and how useful > it is. People run different os's, different python versions, etc. And like > Sean mentioned, how useful really is it other than a few examples. Some > discussions on https://issues.apache.org/jira/bro

[ANNOUNCE] Announcing Apache Spark 2.4.5

2020-02-08 Thread Dongjoon Hyun
all community members for contributing to this release. This release would not have been possible without you. Dongjoon Hyun

Re: [ANNOUNCE] Announcing Apache Spark 2.4.5

2020-02-08 Thread Dongjoon Hyun
There was a typo in one URL. The correct release note URL is here. https://spark.apache.org/releases/spark-release-2-4-5.html On Sat, Feb 8, 2020 at 5:22 PM Dongjoon Hyun wrote: > We are happy to announce the availability of Spark 2.4.5! > > Spark 2.4.5 is a maintenance release c

Re: Apache Spark Docker image repository

2020-02-10 Thread Dongjoon Hyun
ersion? > If that doesn't look like too much, I think it's fine to give it a shot. > > On Sat, Feb 8, 2020 at 6:51 AM, Dongjoon Hyun wrote: >> Thank you, Sean, Jiaxin, Shane, and Tom, for the feedback. >> >> 1. For legal questions, please see the following three Apach

Re: Apache Spark Docker image repository

2020-02-11 Thread Dongjoon Hyun
d to only publish images at official > releases > > 2) There was some ambiguity about whether or not a container image that > included GPL'ed packages (spark images do) might trip over the GPL "viral > propagation" due to integrating ASL and GPL in a "binary rele

Re: Request to document the direct relationship between other configurations

2020-02-12 Thread Dongjoon Hyun
Thank you for raising the issue, Hyukjin. Based on the current status of the discussion, it seems that we can agree on updating the non-structured configurations and keeping the structured configurations AS-IS. I'm +1 for revisiting the configurations if that is our direction. If there

Re: [DISCUSS] naming policy of Spark configs

2020-02-12 Thread Dongjoon Hyun
Thank you, Wenchen. The new policy looks clear to me. +1 for the explicit policy. So, are we going to revise the existing conf names before the 3.0.0 release? Or will it apply only to new, upcoming configurations from now on? Bests, Dongjoon. On Wed, Feb 12, 2020 at 7:43 AM Wenchen Fan wrote: > Hi all,

[DISCUSSION] Esoteric Spark function `TRIM/LTRIM/RTRIM`

2020-02-14 Thread Dongjoon Hyun
Hi, All. I'm sending this email because the Apache Spark committers had better have a consistent point of view on the upcoming PRs. And a community policy is the way to lead the community members transparently and clearly for the long-term good. First of all, I want to emphasize that, like Apa

Re: [DISCUSSION] Esoteric Spark function `TRIM/LTRIM/RTRIM`

2020-02-14 Thread Dongjoon Hyun
Please note that the context is TRIM/LTRIM/RTRIM with two parameters and the TRIM(trimStr FROM str) syntax. This thread is irrelevant to one-parameter TRIM/LTRIM/RTRIM. On Fri, Feb 14, 2020 at 11:35 AM Dongjoon Hyun wrote: > Hi, All. > > I'm sending this email because the Apache Sp
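For readers unfamiliar with the syntax in question, a small sketch of the SQL-standard `TRIM(trimStr FROM str)` form follows; the session setup is illustrative, and the expected output in the comment is what the standard prescribes.

```scala
import org.apache.spark.sql.SparkSession

object TrimSyntaxExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("trim-syntax-demo")
      .master("local[*]")
      .getOrCreate()

    // SQL-standard form mentioned in the thread: TRIM(trimStr FROM str).
    // Removes 'x' characters from both ends, leaving 'Spark'.
    spark.sql("SELECT TRIM(BOTH 'x' FROM 'xxSparkxx') AS trimmed").show()

    spark.stop()
  }
}
```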
