Re: Executing hive query from Spark code

2015-03-02 Thread Ted Yu
Here is a snippet of the dependency tree for the spark-hive module:

[INFO] org.apache.spark:spark-hive_2.10:jar:1.3.0-SNAPSHOT
...
[INFO] +- org.spark-project.hive:hive-metastore:jar:0.13.1a:compile
[INFO] |  +- org.spark-project.hive:hive-shims:jar:0.13.1a:compile
[INFO] |  |  +- org.spark-project.hive.shims:hive-shims-common:jar:0.13.1a:compile
[INFO] |  |  +- org.spark-project.hive.shims:hive-shims-0.20:jar:0.13.1a:runtime
[INFO] |  |  +- org.spark-project.hive.shims:hive-shims-common-secure:jar:0.13.1a:compile
[INFO] |  |  +- org.spark-project.hive.shims:hive-shims-0.20S:jar:0.13.1a:runtime
[INFO] |  |  \- org.spark-project.hive.shims:hive-shims-0.23:jar:0.13.1a:runtime
...
[INFO] +- org.spark-project.hive:hive-exec:jar:0.13.1a:compile
[INFO] |  +- org.spark-project.hive:hive-ant:jar:0.13.1a:compile
[INFO] |  |  \- org.apache.velocity:velocity:jar:1.5:compile
[INFO] |  | \- oro:oro:jar:2.0.8:compile
[INFO] |  +- org.spark-project.hive:hive-common:jar:0.13.1a:compile
...
[INFO] +- org.spark-project.hive:hive-serde:jar:0.13.1a:compile
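
Output like this comes from Maven's dependency plugin. A sketch of the command, assuming the Spark 1.3 source layout where the module lives under sql/hive:

  mvn -Phive -Phive-thriftserver dependency:tree -pl sql/hive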

> is there a way to have the hive support without updating the assembly

I don't think so.
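
That said, for reference, the Hive-enabled assembly is produced by rebuilding with those profiles. A sketch, with the Hadoop profile and version as placeholders you would adjust for your environment:

  mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package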

On Mon, Mar 2, 2015 at 12:37 PM, nitinkak001 nitinkak...@gmail.com wrote:

 I want to run a Hive query inside Spark and use the RDDs generated from it
 inside Spark. I read in the documentation:

 "Hive support is enabled by adding the -Phive and -Phive-thriftserver flags
 to Spark’s build. This command builds a new assembly jar that includes Hive.
 Note that this Hive assembly jar must also be present on all of the worker
 nodes, as they will need access to the Hive serialization and
 deserialization libraries (SerDes) in order to access data stored in Hive."

 I just wanted to know what the -Phive and -Phive-thriftserver flags really
 do, and whether there is a way to get Hive support without updating the
 assembly. Do those flags add a Hive support jar or something?

 The reason I am asking is that I will be using the Cloudera version of
 Spark in the future, and I am not sure how to add Hive support to that
 Spark distribution.










Re: Executing hive query from Spark code

2015-03-02 Thread Felix C
It should work in CDH without having to recompile.

http://eradiating.wordpress.com/2015/02/22/getting-hivecontext-to-work-in-cdh/
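
I haven't reproduced the post here, but the general shape of that approach is to point spark-submit at the Hive configuration and client jars that CDH already ships. A rough sketch, where the paths and jar list are assumptions for a parcel install and would need adjusting to your cluster:

  spark-submit --master yarn-client \
    --files /etc/hive/conf/hive-site.xml \
    --jars /opt/cloudera/parcels/CDH/lib/hive/lib/hive-exec.jar,/opt/cloudera/parcels/CDH/lib/hive/lib/hive-metastore.jar \
    my-app.jar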


RE: Executing hive query from Spark code

2015-03-02 Thread Cheng, Hao
I am not sure how Spark SQL is compiled in CDH, but if the -Phive and
-Phive-thriftserver flags were not specified during the build, it most
likely will not work just by providing the Hive lib jars later on. For
example, does the HiveContext class even exist in the assembly jar?

I am also quite curious about that; any hints would be appreciated.
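
One quick way to check, assuming shell access to a node with the assembly jar (the jar name below is a placeholder):

  jar tf spark-assembly-1.3.0-hadoop2.4.0.jar | grep -i hivecontext

If org/apache/spark/sql/hive/HiveContext.class is absent, the assembly was almost certainly built without the Hive profiles.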


Executing hive query from Spark code

2015-03-02 Thread nitinkak001
I want to run a Hive query inside Spark and use the RDDs generated from it
inside Spark. I read in the documentation:

"Hive support is enabled by adding the -Phive and -Phive-thriftserver flags
to Spark’s build. This command builds a new assembly jar that includes Hive.
Note that this Hive assembly jar must also be present on all of the worker
nodes, as they will need access to the Hive serialization and
deserialization libraries (SerDes) in order to access data stored in Hive."

I just wanted to know what the -Phive and -Phive-thriftserver flags really
do, and whether there is a way to get Hive support without updating the
assembly. Do those flags add a Hive support jar or something?

The reason I am asking is that I will be using the Cloudera version of
Spark in the future, and I am not sure how to add Hive support to that
Spark distribution.
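
For context, the pattern being asked about looks roughly like this. A minimal Scala sketch against the Spark 1.3 API; the table name src and the app name are hypothetical:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.hive.HiveContext

  val sc = new SparkContext(new SparkConf().setAppName("HiveFromSpark"))
  val hiveContext = new HiveContext(sc)
  // Runs the query through Hive; .rdd on the result exposes the underlying RDD[Row]
  val result = hiveContext.sql("SELECT key, value FROM src WHERE key < 10")
  result.rdd.map(_.getInt(0)).collect().foreach(println)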






--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Executing-hive-query-from-Spark-code-tp21880.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org