[jira] [Commented] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive

2017-04-19 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975959#comment-15975959
 ] 

Rui Li commented on HIVE-15302:
---

Thanks [~xuefuz] for the suggestions. I'll investigate how to use maven to 
figure out the jars for us.

> Relax the requirement that HoS needs Spark built w/o Hive
> ---------------------------------------------------------
>
>                 Key: HIVE-15302
>                 URL: https://issues.apache.org/jira/browse/HIVE-15302
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rui Li
>            Assignee: Rui Li
>
> This requirement becomes more and more unacceptable as SparkSQL becomes
> widely adopted. Let's use this JIRA to find out how we can relax the
> limitation.





[jira] [Commented] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive

2017-04-18 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973590#comment-15973590
 ] 

Xuefu Zhang commented on HIVE-15302:


[~lirui] I think Marcelo's suggestion makes sense. It's quite tedious and 
error-prone to figure out by hand which classes are needed for HoS. It's better 
if we utilize a maven plugin that figures this out and packs them in an archive 
to be used as spark.yarn.jars. I'm pretty positive that such a plugin exists.

Under the covers, Hive depends on spark-core, which has its own dependencies. 
The plugin should be able to figure out all the dependencies, and it should 
also allow you to exclude anything that you don't want to include.

(As a FYI, spark-core.jar's direct dependencies can be found in its maven pom 
file.)
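
For illustration, a sketch of what such packaging could look like with the 
maven-dependency-plugin, assuming a packaging module that declares 
spark-core/spark-yarn as dependencies; the excluded artifactIds below are 
hypothetical, not something settled in this thread:
{code:xml}
<!-- Hypothetical packaging module for HoS: let maven resolve Spark's runtime
     jars (and their transitive deps) into one folder, minus Hive bits. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <executions>
    <execution>
      <id>copy-spark-jars</id>
      <phase>package</phase>
      <goals>
        <goal>copy-dependencies</goal>
      </goals>
      <configuration>
        <!-- Collect all resolved dependencies into one directory. -->
        <outputDirectory>${project.build.directory}/spark-jars</outputDirectory>
        <!-- Illustrative exclusions: drop Hive-related artifacts. -->
        <excludeArtifactIds>hive-exec,spark-sql_2.11,spark-hive_2.11</excludeArtifactIds>
      </configuration>
    </execution>
  </executions>
</plugin>
{code}
The resulting directory (or an archive built from it) could then back 
spark.yarn.jars or spark.yarn.archive.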



[jira] [Commented] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive

2017-04-16 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970629#comment-15970629
 ] 

Rui Li commented on HIVE-15302:
---

Hi [~vanzin], the goal here is just to figure out what to set for 
spark.yarn.archive and recommend it in our wiki, so that we don't have to 
require that Spark is built w/o Hive.
[~xuefuz], do you think the min set of jars makes sense? It avoids conflicts 
and improves performance as much as possible, but the shortcoming is that the 
set may change as Spark upgrades. Alternatively, we can simply tell users to 
use all the jars except those related to Hive.



[jira] [Commented] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive

2017-04-15 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970231#comment-15970231
 ] 

Marcelo Vanzin commented on HIVE-15302:
---

Livy doesn't figure out what spark.yarn.archive or spark.yarn.jars should be. 
It assumes the user has a valid configuration.

If you're going to manage the list of jars for the user, the best way is to use 
maven, as I said. Have a module that is "Hive's packaging of Spark" and have it 
create a zip with all the needed jars or something, and use that, instead of 
manually figuring out lists of jars.



[jira] [Commented] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive

2017-04-15 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970228#comment-15970228
 ] 

Rui Li commented on HIVE-15302:
---

Thanks [~vanzin] for the suggestions. I'm trying to figure out the minimum set 
of jars required for {{spark.yarn.archive}}. The purpose of doing this is to 
avoid conflicts and potentially improve performance. Could you please explain 
more about how you figured out these jars in your work for Livy? It doesn't 
seem obvious to me.



[jira] [Commented] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive

2017-04-14 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969242#comment-15969242
 ] 

Marcelo Vanzin commented on HIVE-15302:
---

I'm not sure which approach you're trying here, but I'd really discourage you 
from trying to manually figure out the list of needed jars like that... that's 
what maven is for.

I've done something like this for Livy in the past by having a fake 
spark-submit script that does everything that Livy needs:
https://github.com/cloudera/livy/commit/3c314b11777459e10984ab408aaf2cbd47edf6db

The test code in Livy provides the needed classpath 
({{System.getProperty("java.class.path")}}), and it all works out. You could 
even expand on that idea to do this outside of tests too, by adding features to 
the fake spark-submit script.
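
For illustration, a minimal sketch of the fake-script idea (this is not the 
actual Livy script linked above; the SPARK_SUBMIT_CLASSPATH variable is an 
assumption for how the caller would supply the classpath, e.g. a test passing 
{{java.class.path}}):
{code}
#!/bin/sh
# Hypothetical "fake spark-submit": instead of requiring a full Spark install,
# run the SparkSubmit class directly on a caller-supplied classpath.
exec java -cp "$SPARK_SUBMIT_CLASSPATH" org.apache.spark.deploy.SparkSubmit "$@"
{code}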



[jira] [Commented] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive

2017-04-14 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969031#comment-15969031
 ] 

Rui Li commented on HIVE-15302:
---

Following is the identified min set of required jars:
{noformat}
chill-java-0.8.0.jar
chill_2.11-0.8.0.jar
jackson-module-paranamer-2.6.5.jar
jackson-module-scala_2.11-2.6.5.jar
jersey-container-servlet-core-2.22.2.jar
jersey-server-2.22.2.jar
json4s-ast_2.11-3.2.11.jar
kryo-shaded-3.0.3.jar
mesos-0.21.1-shaded-protobuf.jar
minlog-1.3.0.jar
scala-library-2.11.8.jar
scala-xml_2.11-1.0.2.jar
spark-core_2.11-2.0.0.jar
spark-launcher_2.11-2.0.0.jar
spark-network-common_2.11-2.0.0.jar
spark-network-shuffle_2.11-2.0.0.jar
spark-unsafe_2.11-2.0.0.jar
spark-yarn_2.11-2.0.0.jar
xbean-asm5-shaded-4.4.jar
{noformat}
I'll run some more thorough tests with it. Meanwhile, I'd appreciate it if 
anyone can help verify it.
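
For reference, one hedged way to stage such a set (min-jars.txt holding the 
list above and the HDFS path are assumptions, not a tested recipe from this 
thread):
{code}
# Sketch: copy the identified jars out of the Spark install, zip them flat
# (jars at the archive root, not inside a folder), and upload to HDFS.
mkdir -p spark-archive
while read -r jar; do
  cp "$SPARK_HOME/jars/$jar" spark-archive/
done < min-jars.txt
(cd spark-archive && zip ../spark-min.zip ./*.jar)
hdfs dfs -put -f spark-min.zip /user/hive/spark/
# Then: spark.yarn.archive=hdfs:///user/hive/spark/spark-min.zip
{code}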



[jira] [Commented] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive

2016-12-01 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713754#comment-15713754
 ] 

Rui Li commented on HIVE-15302:
---

[~kellyzly], if we use the config to make HoS run against a Spark built with 
Hive, that only works for yarn-cluster mode. But for the other possible 
conflicts, as well as the performance impact you mentioned, we should set the 
config for both yarn-cluster and yarn-client.



[jira] [Commented] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive

2016-12-01 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713661#comment-15713661
 ] 

liyunzhang_intel commented on HIVE-15302:
-

[~lirui]:
{quote}
We only care about yarn-client and yarn-cluster, because spark.yarn.archive and 
spark.yarn.jars are only for Spark on YARN. And yes we have to try different 
cases to identify the needed jars.
{quote}

And before that you commented:
{quote}
To clarify, the method here only works for yarn-cluster mode. For yarn-client, 
the driver runs on client side, and it will suffer conflicts if spark pulls in 
hive libs.
{quote}

So I guess this method only works with yarn-cluster mode. Is my understanding 
right?




[jira] [Commented] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive

2016-12-01 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712522#comment-15712522
 ] 

Marcelo Vanzin commented on HIVE-15302:
---

bq. For #2, I think we still need SPARK_HOME unless we clone a simplified spark 
installation in Hive directory structure, which is not ideal.

You can create your own command line to run the SparkSubmit class with the 
correct classpath and command line arguments. That's basically all that the 
spark-submit script does anyway. Since HoS's use case is a lot simpler, it 
shouldn't be hard to do (you don't need to support all the different 
combinations of arguments that the spark-submit script handles).
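
A minimal sketch of what such a hand-built command line might look like; the 
jar directory variable, the HDFS paths, and the choice of RemoteDriver as the 
application class/jar are illustrative assumptions:
{code}
# Sketch: invoke SparkSubmit directly, bypassing $SPARK_HOME/bin/spark-submit.
# $SPARK_JARS_DIR is a hypothetical directory holding the Spark jars Hive ships.
java -cp "$SPARK_JARS_DIR/*:$HADOOP_CONF_DIR" \
  org.apache.spark.deploy.SparkSubmit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.hive.spark.client.RemoteDriver \
  /path/to/hive-exec.jar
{code}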



[jira] [Commented] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive

2016-12-01 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15711818#comment-15711818
 ] 

Rui Li commented on HIVE-15302:
---

We only care about yarn-client and yarn-cluster, because spark.yarn.archive and 
spark.yarn.jars are only for Spark on YARN. And yes we have to try different 
cases to identify the needed jars.



[jira] [Commented] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive

2016-12-01 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15711352#comment-15711352
 ] 

liyunzhang_intel commented on HIVE-15302:
-

[~lirui]: understood the requirement. My question:
1. How do we get the necessary jars that HoS really depends on in yarn-client, 
yarn-cluster, and other modes? Do we just try different cases and collect the 
full set of necessary jars?




[jira] [Commented] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive

2016-11-30 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15711207#comment-15711207
 ] 

Rui Li commented on HIVE-15302:
---

[~kellyzly], you're right about the idea. But the needed spark jars may not be 
the same as those currently listed in the wiki. Those listed are needed when 
linking spark on the hive side, while spark.yarn.archive and spark.yarn.jars 
are intended for the containers on the YARN side. But I guess the needed jars 
should be quite similar to those for local mode in our current wiki.

bq. because the user has already set spark.yarn.jars, they can directly 
download a spark tarball from the website
I'm not sure what you mean here. We still need the user to have spark installed 
in their cluster, either downloaded or built by themselves. But in some cases 
we can relax the limitation that the spark must be built w/o hive.



[jira] [Commented] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive

2016-11-30 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710922#comment-15710922
 ] 

Xuefu Zhang commented on HIVE-15302:


I think there are two dependencies on Spark from Hive:

  1. Spark runtime classes, which used to be in spark-assembly.jar.
  2. The spark-submit script that is used to submit a spark application for a 
hive session.

For #1, I think spark.yarn.jars or spark.yarn.archive will do.
For #2, I think we still need SPARK_HOME, unless we clone a simplified spark 
installation in Hive's directory structure, which is not ideal.

Thus, SPARK_HOME seems still required. If so, Hive can automatically figure out 
spark.yarn.jars or spark.yarn.archive from SPARK_HOME if it's not already set. 
To speed up file distribution, an admin can point either of these properties to 
an HDFS location, which requires the admin to manually upload the files to HDFS 
beforehand.

As to spark.yarn.archive, I think one needs to zip all the jars, not the folder 
that contains the jars. However, I haven't tried to verify this.
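
For what it's worth, a hedged sketch of that flat-archive idea (unverified, as 
noted above; the paths are assumptions):
{code}
# Sketch of "zip the jars, not the folder": "jar cv0f ... -C dir ." puts each
# jar at the archive root, uncompressed, which YARN can localize directly.
jar cv0f spark-libs.jar -C "$SPARK_HOME/jars/" .
hdfs dfs -put -f spark-libs.jar /user/hive/spark/
# spark.yarn.archive=hdfs:///user/hive/spark/spark-libs.jar
{code}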



[jira] [Commented] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive

2016-11-30 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710852#comment-15710852
 ] 

liyunzhang_intel commented on HIVE-15302:
-

[~lirui]: The idea is good. So the flow of your idea would be the following?
1. The user creates a directory called spark_jars with the needed spark jars 
(listed in the wiki) and uploads the directory to hdfs.
2. The user sets spark.yarn.jars.

Because the user has already set spark.yarn.jars, they can directly download a 
spark tarball from the website (instead of using ./dev/make-distribution.sh 
--name "hadoop2-without-hive" --tgz 
"-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided" to build a tarball).





[jira] [Commented] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive

2016-11-30 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710478#comment-15710478
 ] 

Rui Li commented on HIVE-15302:
---

We depend not only on the Spark jars, but also on the scripts like 
spark-submit. Not sure how to package them into Hive. [~xuefuz], what do you 
think about Marcelo's suggestions?



[jira] [Commented] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive

2016-11-30 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15709208#comment-15709208
 ] 

Marcelo Vanzin commented on HIVE-15302:
---

bq. I plan to find the needed jars from the Spark installed in the cluster

That's kind of what I meant. Wouldn't it be better to just directly depend on 
the parts of Spark that Hive needs, package those with Hive, and not have to 
depend on any cluster deployment of Spark?

Then the user doesn't need to care about a separate Spark installation when he 
wants to run HoS.



[jira] [Commented] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive

2016-11-29 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15707443#comment-15707443
 ] 

Rui Li commented on HIVE-15302:
---

Thanks for your suggestions, Marcelo. I'll use spark.yarn.jars instead.

The "download tarball from somewhere" approach is for the tests, and we have 
HIVE-14735 to move that to maven. What I'm trying to solve here is how to avoid 
the conflicts at runtime (i.e. a user running SQL with HoS). I plan to find the 
needed jars in the Spark installed in the cluster, and upload those jars to 
HDFS. Then we don't have to require that the Spark in the cluster is built w/o 
Hive.



[jira] [Commented] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive

2016-11-29 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15707398#comment-15707398
 ] 

Marcelo Vanzin commented on HIVE-15302:
---

You don't need to use the archive. You can use {{spark.yarn.jars}}, for 
example; that doesn't require an archive. I don't remember the exact 
requirements for the archive; they're covered in Spark's documentation.

The recommended setup is either HDFS, or having the jars on every node and 
using a "local:" URI to tell Spark not to upload anything.
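
For illustration, hedged spark-defaults.conf lines for the two setups (the 
paths are assumptions):
{noformat}
# Jars hosted on HDFS; YARN localizes them for each application.
spark.yarn.jars    hdfs:///user/hive/spark-jars/*.jar
# Or pre-installed at the same path on every node; nothing gets uploaded.
spark.yarn.jars    local:/opt/spark/jars/*.jar
{noformat}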

I'm hoping that you'll be using maven to package the needed Spark dependencies 
instead of the current "download tarball from somewhere" approach.



[jira] [Commented] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive

2016-11-29 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15707362#comment-15707362
 ] 

Rui Li commented on HIVE-15302:
---

To clarify, the method here only works for yarn-cluster mode. For yarn-client, 
the driver runs on client side, and it will suffer conflicts if spark pulls in 
hive libs.



[jira] [Commented] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive

2016-11-29 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15707353#comment-15707353
 ] 

Rui Li commented on HIVE-15302:
---

Yeah, my plan is to put the jars on HDFS. For example, if the user doesn't 
specify spark.yarn.archive or spark.yarn.jars, we can find the needed jars in 
spark.home and upload them to HDFS, under our session's tmp dir.
I'm actually not very clear about the difference between spark.yarn.archive 
and spark.yarn.jars. In my test I just put all the jars in a folder on HDFS, 
pointed spark.yarn.archive to that folder, and it worked. I guess the usage of 
spark.yarn.jars should be similar.



[jira] [Commented] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive

2016-11-29 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15707343#comment-15707343
 ] 

Rui Li commented on HIVE-15302:
---

Hi [~vanzin], the potential conflicts introduced by transitive deps have always 
been there. My understanding is {{spark.yarn.archive}} gives us a chance to 
exclude unneeded jars as much as possible, right?

I have two more questions:
1. How do we archive the spark jars if we use {{spark.yarn.archive}}? It worked 
when I put the jars in a folder, but it didn't work when I tarred or zipped 
that folder.
2. I think the recommended config is to put the archive on HDFS? But if I 
provide a local path, does it require that all the NMs have the same archive on 
their local FS?



[jira] [Commented] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive

2016-11-29 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15705967#comment-15705967
 ] 

Marcelo Vanzin commented on HIVE-15302:
---

As far as I know, HoS would need spark-core, spark-yarn and maybe, if you care 
about it, spark-mesos (if talking about Spark 2.x). Those have some extra 
transitive dependencies (of course) but nothing that depends on Hive libraries.

There's a chance you'll get some interesting conflicts, though, like kryo.



[jira] [Commented] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive

2016-11-29 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15705364#comment-15705364
 ] 

Xuefu Zhang commented on HIVE-15302:


This sounds good to me. If we know the minimum set of jars to use, would it be 
easier to specify them in spark.yarn.jars to avoid the requirement of archiving 
those jars in order to use spark.yarn.archive? Also, can Hive have a mechanism 
to host the jars on HDFS to boost performance? Thanks. 



[jira] [Commented] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive

2016-11-28 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15704529#comment-15704529
 ] 

Rui Li commented on HIVE-15302:
---

Basically HoS only needs the "core" functionalities of spark, so I guess we can 
exclude everything else for YARN. Also pinging [~vanzin], do you think this is 
the right approach? Thanks!



[jira] [Commented] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive

2016-11-28 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15704508#comment-15704508
 ] 

Rui Li commented on HIVE-15302:
---

With Spark 2.0, we can use {{spark.yarn.archive}} or {{spark.yarn.jars}} to 
specify the spark jars needed on the YARN side. Therefore, even if the spark is 
built with hive dependencies, we can exclude such jars (e.g. 
hive-exec-1.2.1.spark2.jar, spark-sql.jar) from {{spark.yarn.archive}}. Then we 
won't have conflicts running HoS. I did some simple tests and it worked. I'll 
do more investigation to verify.
After that, I think we can update our wiki to tell users how to achieve this 
(probably better if we can find the minimum set of required jars). In addition, 
we may also consider setting this automatically on the hive side - since we ask 
the user to set spark.home, it won't be difficult to find the location of the 
jars.
[~xuefuz], any idea on this?
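
For illustration, a minimal sketch of the exclusion step, assuming the Spark 
2.x layout ($SPARK_HOME/jars) and that the Hive-related jars match the name 
patterns below (the patterns and HDFS paths are assumptions, not a tested 
recipe):
{code}
# Sketch: stage everything from the Spark install except Hive-related jars,
# then point spark.yarn.jars (or an archive) at the HDFS copy.
mkdir -p spark-jars-nohive
for jar in "$SPARK_HOME"/jars/*.jar; do
  case "$(basename "$jar")" in
    hive-*|spark-sql*|spark-hive*) ;;  # skip Hive-related jars
    *) cp "$jar" spark-jars-nohive/ ;;
  esac
done
hdfs dfs -put -f spark-jars-nohive /user/hive/
# spark.yarn.jars=hdfs:///user/hive/spark-jars-nohive/*.jar
{code}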

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)