Yes, I am using yarn-cluster, and I did add it via --files. I still get the "No suitable driver found" error.
Please see the spark-submit command below; it shows the MySQL jar containing the driver class used to connect to the Hive MySQL metastore. Even after including it via --driver-class-path /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar and/or --jars /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar, I keep getting "No suitable driver found for ...".

Command
========
./bin/spark-submit -v \
  --master yarn-cluster \
  --driver-class-path /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar:/apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/guava-11.0.2.jar \
  --jars /home/dvasthimal/spark1.3/spark-avro_2.10-1.0.0.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar,/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar \
  --files $SPARK_HOME/conf/hive-site.xml \
  --num-executors 1 \
  --driver-memory 4g \
  --driver-java-options "-XX:MaxPermSize=2G" \
  --executor-memory 2g \
  --executor-cores 1 \
  --queue hdmi-express \
  --class com.ebay.ep.poc.spark.reporting.SparkApp \
  spark_reporting-1.0-SNAPSHOT.jar \
  startDate=2015-02-16 endDate=2015-02-16 \
  input=/user/dvasthimal/epdatasets/successdetail1/part-r-00000.avro \
  subcommand=successevents2 \
  output=/user/dvasthimal/epdatasets/successdetail2

Logs
====
Caused by: java.sql.SQLException: No suitable driver found for jdbc:mysql://hostname:3306/HDB
        at java.sql.DriverManager.getConnection(DriverManager.java:596)
        at java.sql.DriverManager.getConnection(DriverManager.java:187)
        at com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:361)
        at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:416)
        ... 68 more
...
15/03/27 23:56:08 INFO yarn.Client: Uploading resource file:/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar -> hdfs://apollo-NN:8020/user/dvasthimal/.sparkStaging/application_1426715280024_119815/mysql-connector-java-5.1.34.jar
...

-sh-4.1$ jar -tvf ../mysql-connector-java-5.1.34.jar | grep Driver
    61 Fri Oct 17 08:05:36 GMT-07:00 2014 META-INF/services/java.sql.Driver
  3396 Fri Oct 17 08:05:22 GMT-07:00 2014 com/mysql/fabric/jdbc/FabricMySQLDriver.class
   692 Fri Oct 17 08:05:22 GMT-07:00 2014 com/mysql/jdbc/Driver.class
  1562 Fri Oct 17 08:05:20 GMT-07:00 2014 com/mysql/jdbc/NonRegisteringDriver$ConnectionPhantomReference.class
 17817 Fri Oct 17 08:05:20 GMT-07:00 2014 com/mysql/jdbc/NonRegisteringDriver.class
   690 Fri Oct 17 08:05:24 GMT-07:00 2014 com/mysql/jdbc/NonRegisteringReplicationDriver.class
   731 Fri Oct 17 08:05:24 GMT-07:00 2014 com/mysql/jdbc/ReplicationDriver.class
   336 Fri Oct 17 08:05:24 GMT-07:00 2014 org/gjt/mm/mysql/Driver.class

-sh-4.1$ cat conf/hive-site.xml | grep Driver
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
-sh-4.1$

--
Deepak

On Sat, Mar 28, 2015 at 1:06 AM, Michael Armbrust <mich...@databricks.com> wrote:

> Are you running on yarn?
>
> - If you are running in yarn-client mode, set HADOOP_CONF_DIR to
> /etc/hive/conf/ (or the directory where your hive-site.xml is located).
> - If you are running in yarn-cluster mode, the easiest thing to do is to
> add --files=/etc/hive/conf/hive-site.xml (or the path to your
> hive-site.xml) to your spark-submit script.
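[Editor's note] One way to narrow down where the "No suitable driver found" comes from is to check, from inside the driver JVM, whether the MySQL driver class is visible and registered with java.sql.DriverManager at all. A minimal Scala sketch follows; the object name DriverCheck is made up for illustration, and the JDBC URL and credentials are the already-redacted values from the hive-site.xml quoted further down in the thread.

import java.sql.DriverManager
import scala.collection.JavaConverters._

object DriverCheck {
  def main(args: Array[String]): Unit = {
    // 1. Is com.mysql.jdbc.Driver on the driver JVM's classpath? Loading it
    //    also runs its static initializer, which registers it with DriverManager.
    Class.forName("com.mysql.jdbc.Driver")

    // 2. Which drivers has DriverManager actually registered?
    DriverManager.getDrivers.asScala.foreach(d => println("registered: " + d.getClass.getName))

    // 3. Can a raw JDBC connection be opened with the same URL the metastore uses?
    //    (placeholder URL/credentials, taken from the redacted hive-site.xml)
    val conn = DriverManager.getConnection("jdbc:mysql://hostname:3306/HDB", "hiveuser", "some-password")
    println("metastore DB reachable: " + !conn.isClosed)
    conn.close()
  }
}

If step 1 throws ClassNotFoundException, the jar never reached the driver's classpath; in yarn-cluster mode the driver runs on a cluster node, so a --driver-class-path entry like /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar has to exist at that path on that node, not just on the gateway where spark-submit is invoked.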
>
> On Fri, Mar 27, 2015 at 5:42 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>
>> I can recreate tables, but what about the data? This looks like an
>> obvious feature that Spark SQL must have. People will want to transform
>> tons of data stored in HDFS through Hive from Spark SQL.
>>
>> The Spark programming guide suggests it is possible:
>>
>> "Spark SQL also supports reading and writing data stored in Apache Hive
>> <http://hive.apache.org/>. .... Configuration of Hive is done by placing
>> your hive-site.xml file in conf/."
>> https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#hive-tables
>>
>> For some reason it is not working.
>>
>> On Fri, Mar 27, 2015 at 3:35 PM, Arush Kharbanda <ar...@sigmoidanalytics.com> wrote:
>>
>>> It seems Spark SQL accesses some more columns apart from those created by
>>> Hive.
>>>
>>> You can always recreate the tables; you would need to execute the table
>>> creation scripts, but it would be good to avoid recreation.
>>>
>>> On Fri, Mar 27, 2015 at 3:20 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>>>
>>>> I did copy hive-site.xml from the Hive installation into spark-home/conf.
>>>> It does have all the metastore connection details: host, username,
>>>> password, driver and others.
>>>>
>>>> Snippet
>>>> ======
>>>>
>>>> <configuration>
>>>>
>>>> <property>
>>>>   <name>javax.jdo.option.ConnectionURL</name>
>>>>   <value>jdbc:mysql://host.vip.company.com:3306/HDB</value>
>>>> </property>
>>>>
>>>> <property>
>>>>   <name>javax.jdo.option.ConnectionDriverName</name>
>>>>   <value>com.mysql.jdbc.Driver</value>
>>>>   <description>Driver class name for a JDBC metastore</description>
>>>> </property>
>>>>
>>>> <property>
>>>>   <name>javax.jdo.option.ConnectionUserName</name>
>>>>   <value>hiveuser</value>
>>>>   <description>username to use against metastore database</description>
>>>> </property>
>>>>
>>>> <property>
>>>>   <name>javax.jdo.option.ConnectionPassword</name>
>>>>   <value>some-password</value>
>>>>   <description>password to use against metastore database</description>
>>>> </property>
>>>>
>>>> <property>
>>>>   <name>hive.metastore.local</name>
>>>>   <value>false</value>
>>>>   <description>controls whether to connect to remote metastore server
>>>>   or open a new metastore server in Hive Client JVM</description>
>>>> </property>
>>>>
>>>> <property>
>>>>   <name>hive.metastore.warehouse.dir</name>
>>>>   <value>/user/hive/warehouse</value>
>>>>   <description>location of default database for the warehouse</description>
>>>> </property>
>>>>
>>>> ......
>>>>
>>>> When I attempt to read a Hive table, it does not work: dw_bid does not
>>>> exist.
>>>>
>>>> I am sure there is a way to read tables stored in HDFS (Hive) from
>>>> Spark SQL. Otherwise, how would anyone do analytics, since the source
>>>> tables are always persisted either directly on HDFS or through Hive?
>>>>
>>>> On Fri, Mar 27, 2015 at 1:15 PM, Arush Kharbanda <ar...@sigmoidanalytics.com> wrote:
>>>>
>>>>> Hive and Spark SQL internally use HDFS and the Hive metastore; the
>>>>> only thing you want to change is the processing engine. You can try to
>>>>> bring your hive-site.xml to %SPARK_HOME%/conf/hive-site.xml (ensure that
>>>>> the hive-site.xml captures the metastore connection details).
>>>>>
>>>>> It's a hack, I haven't tried it. I have played around with the
>>>>> metastore and it should work.
>>>>>
>>>>> On Fri, Mar 27, 2015 at 12:04 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>>>>>
>>>>>> I have a few tables that are created in Hive.
>>>>>> I want to transform the data stored in these Hive tables using Spark
>>>>>> SQL. Is this even possible?
>>>>>>
>>>>>> So far I have seen that I can create new tables using the Spark SQL
>>>>>> dialect. However, when I run "show tables" or "desc hive_table", it
>>>>>> says the table is not found.
>>>>>>
>>>>>> I am now wondering whether this support is present or not in Spark SQL.
>>>>>>
>>>>>> --
>>>>>> Deepak
>>>>>
>>>>>
>>>>> --
>>>>> Arush Kharbanda || Technical Teamlead
>>>>> ar...@sigmoidanalytics.com || www.sigmoidanalytics.com
>>>>
>>>>
>>>> --
>>>> Deepak
>>>>
>>>
>>> --
>>> Arush Kharbanda || Technical Teamlead
>>> ar...@sigmoidanalytics.com || www.sigmoidanalytics.com
>>
>>
>> --
>> Deepak
>>
>

--
Deepak
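[Editor's note] For reference, a minimal Scala sketch of the pattern this thread is trying to get working: querying an existing Hive table from Spark SQL 1.3 through a HiveContext. The object name HiveTableRead and the LIMIT 10 query are illustrative; dw_bid is the table mentioned earlier in the thread, and hive-site.xml is assumed to be on the classpath (conf/ in yarn-client mode, or shipped with --files in yarn-cluster mode).

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveTableRead {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveTableRead"))
    // HiveContext picks up hive-site.xml from the classpath; without it,
    // Spark falls back to a local Derby metastore and existing tables
    // appear to be missing.
    val hiveContext = new HiveContext(sc)

    // List the tables the metastore knows about; an empty list usually means
    // hive-site.xml was not picked up.
    hiveContext.sql("SHOW TABLES").collect().foreach(println)

    // Query an existing Hive table and transform it with Spark SQL.
    val bids = hiveContext.sql("SELECT * FROM dw_bid LIMIT 10")
    bids.show()
  }
}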