On 10/14/14 7:31 PM, Neeraj Garg02 wrote:
Hi All,
I’ve downloaded and installed Apache Spark 1.1.0 pre-built for Hadoop
2.4.
Now, I want to test two features of Spark:
1. *YARN deployment*: As per my understanding, I need to modify the
"spark-defaults.conf" file with the settings mentioned at
http://spark.apache.org/docs/1.1.0/running-on-yarn.html#configuration
for example, settings like spark.yarn.applicationMaster.waitTries etc.
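For illustration, a spark-defaults.conf prepared for YARN might contain entries like the following; the property names are from the running-on-yarn configuration page, but the specific values here are placeholders, not recommendations from this thread:

```
spark.master                             yarn-cluster
spark.yarn.applicationMaster.waitTries   10
spark.yarn.queue                         default
spark.executor.memory                    2g
```

Each line is a property name followed by whitespace and its value; spark-submit picks this file up automatically from conf/.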
*In order to launch* a Spark application in yarn-cluster mode, the
following command can be used once the configurations are done:
./bin/spark-submit --class path.to.your.Class --master yarn-cluster [options] <app jar> [app options]
*Is this understanding correct, or please suggest the steps to deploy
Spark on YARN.*
Yes.
2. *Testing Thrift JDBC server connection*: I have a Hadoop 2.4 cluster
set up, and Apache Spark is running on this cluster. In order to test
the JDBC Thrift server, I successfully followed the steps mentioned in
the *"Other SQL Interfaces"* section of the Spark SQL programming guide,
i.e. I can see the beeline prompt and it is connected to the Thrift
server using the given command. Please help me answer the following
queries:
a. Which kinds of queries can I execute from this beeline prompt? Would
these be Spark SQL queries or Hive queries?
You can only use HiveQL under beeline.
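For example, generic HiveQL statements like the following work from the beeline prompt (the table name src is a placeholder, not from this thread):

```sql
-- HiveQL statements usable from beeline against the Thrift server
SHOW TABLES;
CREATE TABLE IF NOT EXISTS src (key INT, value STRING);
SELECT key, COUNT(*) FROM src GROUP BY key;
```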
b. *"Configuration of Hive is done by placing your hive-site.xml file
in conf/."* Right now, I don't have Hive installed as part of the
Hadoop 2.4 cluster. Do I need to install Hive to test the Thrift JDBC
server, or to execute Spark SQL queries from the beeline prompt?
i. In case Hive installation is a prerequisite, is there a need to
rebuild the Spark package? What are the steps for this? Is internet
access required for the rebuild?
The Thrift server is used to interact with existing Hive data, and thus
needs the Hive Metastore to access the Hive catalog. In your case, you
need to build Spark with sbt/sbt -Phive,hadoop-2.4 clean package. But
since you've already started the Thrift server successfully, this step
should already have been done properly.
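If you later want to point Spark SQL at an existing Hive deployment, the usual approach is the hive-site.xml in conf/ mentioned above. A minimal sketch, where the metastore host name is a placeholder (port 9083 is the conventional metastore Thrift port):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Point Spark SQL at an existing remote Hive Metastore
       (metastore-host is a placeholder, not a value from this thread) -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>
```

Without this file, Spark SQL falls back to a local metastore created in the working directory, which is enough for testing.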
c. What else would I need in case I want to connect BI tools to Spark
SQL using the Thrift JDBC/ODBC server? Please share the steps or
pointers to do the same.
You can follow this awesome article authored by Denny Lee:
https://www.concur.com/blog/en-us/connect-tableau-to-sparksql
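As a rough sketch of what any JDBC-based BI tool needs: a HiveServer2-style JDBC URL pointing at the Thrift server. The host, port, and database below are the usual defaults, assumed here rather than taken from this thread:

```python
# Build the JDBC URL a BI tool (or beeline itself) would use to reach
# the Spark SQL Thrift server; host/port/database are assumed defaults.
host = "localhost"
port = 10000          # default listening port of the Thrift JDBC server
database = "default"

jdbc_url = "jdbc:hive2://%s:%d/%s" % (host, port, database)
beeline_cmd = "./bin/beeline -u %s" % jdbc_url

print(jdbc_url)       # jdbc:hive2://localhost:10000/default
print(beeline_cmd)
```

The same URL (with the hive-jdbc driver on the classpath) is what you would paste into a BI tool's generic JDBC connection dialog.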
As I could not find sufficient information on these topics, please help.
Please let me know if more information or explanation is required.
Thanks and Regards,
Neeraj Garg