I am able to connect to MySQL Hive metastore from the client
cluster machine.
-sh-4.1$ mysql --user=hiveuser --password=pass --host=hostname.vip.company.com
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 9417286
Server version: 5.5.12-eb-5.5.12-log MySQL-eb 5.5.12, Revision 3492
Copyright (c) 2000, 2011, Oracle and/or its affiliates. All
rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current
input statement.
mysql> use eBayHDB;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> show tables;
+---------------------------+
| Tables_in_HDB |
+---------------------------+
Regards,
Deepak
On Sat, Mar 28, 2015 at 12:35 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
Yes, I am using yarn-cluster and I did add it via --files. I get a
"No suitable driver found" error.
Please share a spark-submit command that shows the MySQL jar
(containing the driver class used to connect to the Hive MySQL
metastore) being passed.
Even after including it through
--driver-class-path /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar
or (and)
--jars /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar
I keep getting "No suitable driver found for"
Command
========
./bin/spark-submit -v --master yarn-cluster
--driver-class-path
/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar:/apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/guava-11.0.2.jar
--jars
/home/dvasthimal/spark1.3/spark-avro_2.10-1.0.0.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar,/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar
--files $SPARK_HOME/conf/hive-site.xml --num-executors 1
--driver-memory 4g --driver-java-options "-XX:MaxPermSize=2G"
--executor-memory 2g --executor-cores 1 --queue hdmi-express
--class com.ebay.ep.poc.spark.reporting.SparkApp
spark_reporting-1.0-SNAPSHOT.jar startDate=2015-02-16
endDate=2015-02-16
input=/user/dvasthimal/epdatasets/successdetail1/part-r-00000.avro
subcommand=successevents2
output=/user/dvasthimal/epdatasets/successdetail2
Logs
====
Caused by: java.sql.SQLException: No suitable driver found
for jdbc:mysql://hostname:3306/HDB
at java.sql.DriverManager.getConnection(DriverManager.java:596)
at java.sql.DriverManager.getConnection(DriverManager.java:187)
at
com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:361)
at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:416)
... 68 more
...
...
15/03/27 23:56:08 INFO yarn.Client: Uploading resource
file:/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar ->
hdfs://apollo-NN:8020/user/dvasthimal/.sparkStaging/application_1426715280024_119815/mysql-connector-java-5.1.34.jar
...
...
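The "No suitable driver" failure comes from java.sql.DriverManager, which only sees drivers registered from its own classpath; uploading the jar to HDFS staging (as the log above shows) does not by itself put it on the driver JVM's classpath. A minimal standalone sketch of that exact failure mode, with no Spark or MySQL server involved:

```java
import java.sql.DriverManager;
import java.sql.SQLException;

public class DriverLookup {
    // Asks DriverManager for a connection when no MySQL driver is on the
    // classpath -- it fails with the same message as the stack trace above.
    static String tryConnect(String url) {
        try {
            DriverManager.getConnection(url);
            return "connected";
        } catch (SQLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryConnect("jdbc:mysql://hostname:3306/HDB"));
        // prints: No suitable driver found for jdbc:mysql://hostname:3306/HDB
    }
}
```

So the question is not whether the jar was shipped, but whether it is on the classpath of the JVM that calls DriverManager (the ApplicationMaster JVM in yarn-cluster mode).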
-sh-4.1$ jar -tvf ../mysql-connector-java-5.1.34.jar | grep
Driver
61 Fri Oct 17 08:05:36 GMT-07:00 2014
META-INF/services/java.sql.Driver
3396 Fri Oct 17 08:05:22 GMT-07:00 2014
com/mysql/fabric/jdbc/FabricMySQLDriver.class
692 Fri Oct 17 08:05:22 GMT-07:00 2014
com/mysql/jdbc/Driver.class
1562 Fri Oct 17 08:05:20 GMT-07:00 2014
com/mysql/jdbc/NonRegisteringDriver$ConnectionPhantomReference.class
17817 Fri Oct 17 08:05:20 GMT-07:00 2014
com/mysql/jdbc/NonRegisteringDriver.class
690 Fri Oct 17 08:05:24 GMT-07:00 2014
com/mysql/jdbc/NonRegisteringReplicationDriver.class
731 Fri Oct 17 08:05:24 GMT-07:00 2014
com/mysql/jdbc/ReplicationDriver.class
336 Fri Oct 17 08:05:24 GMT-07:00 2014
org/gjt/mm/mysql/Driver.class
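The listing confirms com.mysql.jdbc.Driver is inside the jar, but presence in a shipped jar is not the same as being loadable: Class.forName (which is what registers a JDBC driver with DriverManager) only succeeds if the jar is on the calling JVM's classpath. A small probe, runnable anywhere, that makes the distinction visible (the class names are just the ones under discussion):

```java
public class DriverProbe {
    // Reports whether a class can be loaded by this JVM's classloader.
    // A jar passed via --jars is shipped to executors, but in yarn-cluster
    // mode the *driver* JVM also needs it (e.g. via spark.driver.extraClassPath)
    // before DriverManager can see the MySQL driver.
    static boolean canLoad(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // On a machine without the connector jar this prints false,
        // even though the class demonstrably exists inside the jar.
        System.out.println("com.mysql.jdbc.Driver loadable: "
                + canLoad("com.mysql.jdbc.Driver"));
    }
}
```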
-sh-4.1$ cat conf/hive-site.xml | grep Driver
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC
metastore</description>
-sh-4.1$
--
Deepak
On Sat, Mar 28, 2015 at 1:06 AM, Michael Armbrust <mich...@databricks.com> wrote:
Are you running on yarn?
- If you are running in yarn-client mode, set
HADOOP_CONF_DIR to /etc/hive/conf/ (or the directory
where your hive-site.xml is located).
- If you are running in yarn-cluster mode, the easiest
thing to do is to add --files=/etc/hive/conf/hive-site.xml
(or the path for your hive-site.xml) to your spark-submit
script.
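A quick way to tell whether hive-site.xml actually reached the JVM is to look it up as a classpath resource from inside the application; if it is not visible, Spark silently falls back to a local Derby metastore and no Hive tables appear. A hedged sketch (standalone Java, not tied to Spark; the fallback behaviour described in the comment is the usual symptom, not something this snippet proves):

```java
public class HiveSiteProbe {
    // True if hive-site.xml is visible on this JVM's classpath. In
    // yarn-client mode HADOOP_CONF_DIR puts it there; in yarn-cluster mode
    // --files ships it into the container working directory, which YARN
    // adds to the classpath.
    static boolean hiveSiteOnClasspath() {
        return Thread.currentThread().getContextClassLoader()
                .getResource("hive-site.xml") != null;
    }

    public static void main(String[] args) {
        System.out.println(hiveSiteOnClasspath()
                ? "hive-site.xml found on classpath"
                : "hive-site.xml NOT on classpath");
    }
}
```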
On Fri, Mar 27, 2015 at 5:42 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
I can recreate tables, but what about the data? It looks
like this is an obvious feature that Spark SQL must
have. People will want to transform tons of data
stored in HDFS through Hive from Spark SQL.
The Spark programming guide suggests it's possible:
"Spark SQL also supports reading and writing data
stored in Apache Hive <http://hive.apache.org/>. ...
Configuration of Hive is done by placing your
hive-site.xml file in conf/."
https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#hive-tables
For some reason it's not working.
On Fri, Mar 27, 2015 at 3:35 PM, Arush Kharbanda <ar...@sigmoidanalytics.com> wrote:
It seems Spark SQL accesses some more columns apart
from those created by Hive.
You can always recreate the tables; you would need
to execute the table-creation scripts, but it
would be good to avoid recreation.
On Fri, Mar 27, 2015 at 3:20 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
I did copy hive-site.xml from the Hive
installation into spark-home/conf. It does
have all the metastore connection details:
host, username, password, driver, and others.
Snippet
======
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://host.vip.company.com:3306/HDB</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC
metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
<description>username to use against
metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>some-password</value>
<description>password to use against
metastore database</description>
</property>
<property>
<name>hive.metastore.local</name>
<value>false</value>
<description>controls whether to connect to a
remote metastore server or open a new
metastore server in the Hive Client JVM</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for
the warehouse</description>
</property>
......
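One way to sanity-check which metastore Spark will actually hit is to pull the connection URL out of hive-site.xml programmatically, since a stale or wrong copy in conf/ is a common cause of "table not found". A small sketch; the inline XML string is a stand-in for the snippet above, and the host/DB names are the anonymised placeholders from it:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class HiveSiteUrl {
    // Extracts a named <property> value from Hadoop-style configuration XML.
    static String property(String xml, String wanted) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            String name = p.getElementsByTagName("name")
                    .item(0).getTextContent().trim();
            if (name.equals(wanted)) {
                return p.getElementsByTagName("value")
                        .item(0).getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Inline stand-in for the hive-site.xml fragment quoted above.
        String xml = "<configuration><property>"
                + "<name>javax.jdo.option.ConnectionURL</name>"
                + "<value>jdbc:mysql://host.vip.company.com:3306/HDB</value>"
                + "</property></configuration>";
        System.out.println(property(xml, "javax.jdo.option.ConnectionURL"));
    }
}
```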
When I attempt to read a Hive table, it does
not work: dw_bid does not exist.
I am sure there is a way to read tables
stored in HDFS (Hive) from Spark SQL.
Otherwise, how would anyone do analytics, since
the source tables are always persisted either
directly on HDFS or through Hive?
On Fri, Mar 27, 2015 at 1:15 PM, Arush Kharbanda <ar...@sigmoidanalytics.com> wrote:
Hive and Spark SQL internally use the same
HDFS and Hive metastore; the only thing
you want to change is the processing
engine. You can try copying your
hive-site.xml to
$SPARK_HOME/conf/hive-site.xml (ensure
that the hive-site.xml captures the
metastore connection details).
It's a hack and I haven't tried it, but I have
played around with the metastore and it
should work.
On Fri, Mar 27, 2015 at 12:04 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
I have a few tables that are created in
Hive. I want to transform the data stored
in these Hive tables using Spark SQL.
Is this even possible?
So far I have seen that I can create
new tables using the Spark SQL dialect.
However, when I run "show tables" or
"desc hive_table" it says the table is not found.
I am now wondering: is this support
present in Spark SQL or not?
--
Deepak
--
Sigmoid Analytics
Arush Kharbanda || Technical Teamlead
ar...@sigmoidanalytics.com || www.sigmoidanalytics.com
--
Deepak
--
Deepak