Thank you all.
I'll try to make a JIRA and a PR for that.
Bests,
Dongjoon.
On Wed, Nov 20, 2019 at 4:08 PM Cheng Lian wrote:
> Sean, thanks for the corner cases you listed. They make a lot of sense.
> Now I'm inclined to have Hive 2.3 as the default version.
>
> Dongjoon, my apologies if I didn't
Sean, thanks for the corner cases you listed. They make a lot of sense. Now
I'm inclined to have Hive 2.3 as the default version.
Dongjoon, my apologies if I didn't make it clear before. What made me
concerned initially was only the following part:
> can we remove the usage of forked `hive` in
Yes. Right. That's the situation we are hitting and the result I expected.
We need to change our default to Hive 2 in the POM.
Dongjoon.
On Wed, Nov 20, 2019 at 5:20 AM Sean Owen wrote:
> Yes, good point. A user would get whatever the POM says without
> profiles enabled so it matters.
>
>
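To make the POM-default point concrete: in Maven, whichever profile is marked active by default is what users get from a plain build with no `-P` flags. A hypothetical sketch (the profile and property names here are assumptions for illustration, not Spark's actual root POM):

```xml
<!-- Hypothetical sketch only; Spark's real profile/property names may differ.
     With this profile active by default, a build with no -P flags resolves
     Apache Hive 2.3.6 instead of the forked 1.2.1 artifacts. -->
<profile>
  <id>hive-2.3</id>
  <activation>
    <activeByDefault>true</activeByDefault>
  </activation>
  <properties>
    <hive.group>org.apache.hive</hive.group>
    <hive.version>2.3.6</hive.version>
  </properties>
</profile>
```

One Maven caveat worth remembering: an `activeByDefault` profile is deactivated as soon as any other profile is activated on the command line, so the effective default can change silently for users who enable unrelated profiles.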
Yes, good point. A user would get whatever the POM says without
profiles enabled so it matters.
Playing it out, an app _should_ compile with the Spark dependency
marked 'provided'. In that case the app that is spark-submit-ted is
agnostic to the Hive dependency as the only one that matters is
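The `provided` scope Sean refers to looks roughly like this in an application's POM (versions are illustrative only):

```xml
<!-- Illustrative only: the app compiles against Spark's API but does not
     package Spark or its transitive Hive client; at runtime, spark-submit
     puts the cluster distribution's jars on the classpath instead. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.12</artifactId>
  <version>3.0.0</version>
  <scope>provided</scope>
</dependency>
```

This is why such an app is agnostic to the Hive version: the only Hive client that matters is the one shipped with the Spark distribution it is submitted to.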
Cheng, could you elaborate on your criteria, `Hive 2.3 code paths are
proven to be stable`?
For me, it's difficult to imagine that we can reach any stable situation if
we don't use it ourselves at all.
> The Hive 1.2 code paths can only be removed once the Hive 2.3 code
paths are proven to
> Should Hadoop 2 + Hive 2 be considered to work on JDK 11?
This seems to be under investigation in Yuming's PR (
https://github.com/apache/spark/pull/26533), if I am not mistaken.
Oh, yes, what I meant by (default) was the default profiles we will use in
Spark.
On Wed, Nov 20, 2019 at 10:14 AM, Sean Owen wrote:
Should Hadoop 2 + Hive 2 be considered to work on JDK 11? I wasn't
sure if 2.7 did, but honestly I've lost track.
Anyway, it doesn't matter much as the JDK doesn't cause another build
permutation. All are built targeting Java 8.
I also don't know if we have to declare one binary release the default.
So, are we able to conclude our plans as below?
1. In Spark 3, we release as below:
- Hadoop 3.2 + Hive 2.3 + JDK8 build that also works on JDK 11
- Hadoop 2.7 + Hive 2.3 + JDK8 build that also works on JDK 11
- Hadoop 2.7 + Hive 1.2.1 (fork) + JDK8 (default)
2. In Spark 3.1, we target:
-
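Assuming Spark's standard distribution script and the `hive-1.2`/`hive-2.3` profile names discussed in this thread, the binary releases above might be produced roughly like this (a sketch, not an exact recipe; flag and profile names are assumptions):

```shell
# Illustrative sketch; exact profile names and script flags may differ.
./dev/make-distribution.sh --name hadoop3.2-hive2.3 --tgz \
  -Phadoop-3.2 -Phive -Phive-2.3 -Phive-thriftserver
./dev/make-distribution.sh --name hadoop2.7-hive2.3 --tgz \
  -Phadoop-2.7 -Phive -Phive-2.3 -Phive-thriftserver
# Default: Hadoop 2.7 with the forked Hive 1.2.1
./dev/make-distribution.sh --name hadoop2.7-hive1.2 --tgz \
  -Phadoop-2.7 -Phive -Phive-1.2 -Phive-thriftserver
```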
Thanks for taking care of this, Dongjoon!
We can target SPARK-20202 to 3.1.0, but I don't think we should do it
immediately after cutting the branch-3.0. The Hive 1.2 code paths can only
be removed once the Hive 2.3 code paths are proven to be stable. If it
turned out to be buggy in Spark 3.1, we
Same idea? Support this combo in 3.0 and then remove Hadoop 2 support
in 3.1 or something? Or at least make them non-default, and not
necessarily publish special builds?
On Tue, Nov 19, 2019 at 4:53 PM Dongjoon Hyun wrote:
> For additional `hadoop-2.7 with Hive 2.3` pre-built distribution, how do
Yes. It does. I meant SPARK-20202.
Thanks. I understand that it can be considered like a Scala version issue.
So, that's the reason why I put this as a policy issue from the beginning.
> First of all, I want to put this as a policy issue instead of a technical
> issue.
From a policy perspective, it's kinda like a Scala version upgrade.
Historically, we only remove support for an older Scala version once the
newer version has proven to be stable for one or more Spark minor versions.
On Tue, Nov 19, 2019 at 2:07 PM Cheng Lian wrote:
> Hmm, what exactly did you mean by "remove the usage
Hmm, what exactly did you mean by "remove the usage of forked `hive` in
Apache Spark 3.0 completely officially"? I thought you wanted to remove the
forked Hive 1.2 dependencies completely, no? As long as we still keep the
Hive 1.2 in Spark 3.0, I'm fine with that. I personally don't have a
BTW, `hive.version.short` is a directory name. We are using 2.3.6 only.
For the directory name, we use '1.2.1' and '2.3.5' because we just delayed
renaming the directories until the 3.0.0 deadline to minimize the diff.
We can replace it immediately if we want right now.
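For context, `hive.version.short` is a Maven property used to select a version-specific directory in the source tree. A rough sketch of how such a property pairing typically looks (the exact layout and plugin wiring are assumptions; the '2.3.5' directory name for Hive 2.3.6 is per this thread):

```xml
<!-- Sketch only: the short version picks a directory such as v1.2.1/ or
     v2.3.5/; per the thread, the v2.3.5 name was kept even for Hive 2.3.6
     to minimize the diff until the directories are renamed. -->
<properties>
  <hive.version>2.3.6</hive.version>
  <hive.version.short>2.3.5</hive.version.short>
</properties>
<!-- e.g. added as an extra source root by a build plugin: -->
<source>${project.basedir}/v${hive.version.short}/src/main/scala</source>
```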
On Tue, Nov 19, 2019 at
Hi, Cheng.
This is independent of JDK 11 and Hadoop 3. I'm talking about the JDK 8 world.
If we consider them, it could be as follows.
+-----------------+-----------------+-------------------+
|                 | Hive 1.2.1 fork | Apache Hive 2.3.6 |
+-----------------+-----------------+-------------------+
Dongjoon, I'm with Hyukjin. There should be at least one Spark 3.x minor
release to stabilize Hive 2.3 code paths before retiring the Hive 1.2
fork. Even today, the Hive 2.3.6 version bundled in Spark 3.0 is still
buggy in terms of JDK 11 support. (BTW, I just found that our root POM is
referring
Thank you for the feedback, Hyukjin and Sean.
I proposed `preview-2` for that purpose, but I'm also +1 for doing that in 3.1
if we can make a decision to eliminate the illegitimate Hive fork reference
immediately after the `branch-3.0` cut.
Sean, I'm referencing Cheng Lian's email for the status of
Just to clarify, as even I have lost the details over time: hadoop-2.7
works with hive-2.3? It isn't tied to hadoop-3.2?
Roughly how much risk is there in using the Hive 1.x fork over Hive
2.x, for end users using Hive via Spark?
I don't have a strong opinion, other than sharing the view that we
I struggled hard to deal with this issue multiple times over a year, and
thankfully we finally decided to use the official version of Hive 2.3.x too
(thank you, Yuming, Alan, and everyone).
I think it is already huge progress that we have started to use the
official version of Hive.
I think we should at
Hi, All.
First of all, I want to put this as a policy issue instead of a technical
issue.
Also, this is orthogonal to the `hadoop` version discussion.
The Apache Spark community kept (not maintained) the forked Apache Hive
1.2.1 because there were no other options before. As we see in
SPARK-20202,