[jira] [Commented] (SPARK-30643) Add support for embedding Hive 3

2020-08-09 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174017#comment-17174017
 ] 

Hyukjin Kwon commented on SPARK-30643:
--

[~Kelvin.FE], you could try creating a PR yourself that upgrades to Hive 3 and
passes all tests. That would be the easiest way to understand the pain points
here.
Also, to clarify: Spark is already able to read Hive 3 tables. This issue is a
different one, about using Hive 3 as Spark's built-in Hive.

> Add support for embedding Hive 3
> --------------------------------
>
>                  Key: SPARK-30643
>                  URL: https://issues.apache.org/jira/browse/SPARK-30643
>              Project: Spark
>           Issue Type: Improvement
>           Components: SQL
>     Affects Versions: 3.0.0
>             Reporter: Igor Dvorzhak
>             Priority: Major
>
> Currently, Spark can be compiled only against Hive 1.2.1 and Hive 2.3;
> compilation fails against Hive 3.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30643) Add support for embedding Hive 3

2020-08-09 Thread Zhaoyang Qin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173999#comment-17173999
 ] 

Zhaoyang Qin commented on SPARK-30643:
--

I don't understand the experts' position. Hive 3 has clearly become popular,
and users are already running into trouble: Spark currently cannot query Hive 3
managed tables. Flink, by contrast, which used to be much weaker here, has begun
to support Hive 3. This is very confusing. It is ridiculous for Spark to be
surpassed by Flink, and in SQL, the very area Spark is proudest of.




[jira] [Commented] (SPARK-30643) Add support for embedding Hive 3

2020-01-29 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026389#comment-17026389
 ] 

Hyukjin Kwon commented on SPARK-30643:
--

Yeah, this upgrade would take a huge effort and a lot of changes. Unless there
are strong reasons to do it, I wouldn't.
We had many specific reasons to upgrade to Hive 2.3 (e.g., Hadoop 3 support,
removing Spark's own Hive fork, etc.). Hive 3 doesn't seem to share those
reasons.




[jira] [Commented] (SPARK-30643) Add support for embedding Hive 3

2020-01-26 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024062#comment-17024062
 ] 

Dongjoon Hyun commented on SPARK-30643:
---

This sounds like a misunderstanding of the role of the embedded Hive. It is only
used to talk to the Hive metastore.
> But if I chose to run Hive 3 and Spark with embedded Hive 2.3, then SparkSQL 
> and Hive queries behavior could differ in some cases.

Everything else (the SQL parser, analyzer, optimizer, and execution engine) is
Spark's own code. So, in general, the choice of embedded Hive 1.2 or 2.3
doesn't make a difference; the exceptions might be Hive bugs. For example,
Spark 3.0.0 will ship with both Hive 1.2 and Hive 2.3 (the default), and all
unit tests pass in both environments with the same results.

I don't think Apache Spark needs to support Hive 1.2, Hive 2.3, and Hive 3.1 in
the Apache Spark 3.x era. Adding 2.3 took too much effort from the Apache Spark
community, so it couldn't happen in Apache Spark 2.x. Maybe we can consider it
for Apache Spark 4.0 if many users are running Hive 3.x stably in production
(not beta).
> I think that majority of reasons that went into support of embedding Hive 2.3 
> will apply to support of embedding Hive 3.





[jira] [Commented] (SPARK-30643) Add support for embedding Hive 3

2020-01-25 Thread Igor Dvorzhak (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023726#comment-17023726
 ] 

Igor Dvorzhak commented on SPARK-30643:
---

Hi, [~dongjoon]. Thank you for fixing up this JIRA.

I think that the majority of reasons that went into supporting an embedded
Hive 2.3 also apply to supporting an embedded Hive 3.

As a user, I would want Spark SQL and Hive queries to behave as similarly as
possible in an installation where I use both Spark and Hive. But
if I chose to run Hive 3 and Spark with embedded Hive 2.3, then SparkSQL and 
Hive queries behavior could differ in some cases.

Personally, I'm interested in the performance and correctness improvements made
to Hive Server, the Driver, and the Metastore client in Hive 3.

AWS EMR 6.0 (currently in beta) uses Hive 3, and I would expect other vendors to
follow suit soon.

I will follow up on the dev list too, thanks!




[jira] [Commented] (SPARK-30643) Add support for embedding Hive 3

2020-01-25 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023699#comment-17023699
 ] 

Dongjoon Hyun commented on SPARK-30643:
---

Hi, [~medb]. Thank you for filing a JIRA. I updated it slightly according to the
guide at https://spark.apache.org/contributing.html.
1. This is not a `Bug`. I changed it to `Improvement`.
2. Please don't set `Target Version`.

Could you explain in more detail why we need this? Spark can already talk to
Hive 3.0 and 3.1 metastores (SPARK-24360, SPARK-27970). And, AFAIK, CDH 6.3
still ships Hive 2.1.1, which suggests Hive 3.x is not yet widely used in
production.
It would be great if you sent this to the dev mailing list to get more feedback.
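The metastore-client support mentioned above is configured through Spark settings rather than by rebuilding Spark. A minimal sketch, assuming a Spark 3.0 release whose documentation lists `3.1.2` as a supported `spark.sql.hive.metastore.version` (check your release's docs). The helper names `hive_metastore_conf` and `to_submit_args` and the URI `thrift://metastore.example.com:9083` are hypothetical, for illustration only. The sketch just assembles `spark-submit --conf` flags; it changes which Hive client Spark uses to reach the metastore, not the built-in Hive that this JIRA is about.

```python
# Sketch: assemble spark-submit flags that point Spark's Hive metastore client
# at an external Hive 3.x metastore. Helper names and the URI are hypothetical.


def hive_metastore_conf(metastore_uri, version="3.1.2", jars="maven"):
    """Return Spark settings for talking to a Hive metastore of `version`."""
    return {
        # Use the Hive catalog (the value enableHiveSupport() sets).
        "spark.sql.catalogImplementation": "hive",
        # Version of the Hive metastore client Spark loads; this is separate
        # from the Hive version compiled into Spark.
        "spark.sql.hive.metastore.version": version,
        # "maven" downloads client jars for that version; a JVM classpath of
        # matching Hive jars can be given instead.
        "spark.sql.hive.metastore.jars": jars,
        # The spark.hadoop. prefix copies this into the Hadoop configuration,
        # where the Hive client reads the metastore URI.
        "spark.hadoop.hive.metastore.uris": metastore_uri,
    }


def to_submit_args(conf):
    """Flatten a settings dict into sorted `--conf key=value` arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    conf = hive_metastore_conf("thrift://metastore.example.com:9083")
    print(" ".join(["spark-submit", *to_submit_args(conf), "app.py"]))
```

Using `maven` for the jars keeps the example short, but it fetches jars at startup; pointing `spark.sql.hive.metastore.jars` at a local classpath of matching Hive jars is the usual alternative.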
