Ah, sorry, my bad... http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-2-0/CDH4-Installation-Guide/cdh4ig_topic_18_4.html

Hello Lian,
Can you share the URL?

    The "mysql" command line doesn't use JDBC to talk to MySQL server,
    so this doesn't verify anything.

    I think this Hive metastore installation guide from Cloudera may
    be helpful. Although this document is for CDH4, the general steps
    are the same, and should help you to figure out the relationships


    I am able to connect to MySQL Hive metastore from the client
    cluster machine.

    -sh-4.1$ mysql --user=hiveuser --password=pass
    --host=hostname.vip.company.com
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 9417286
    Server version: 5.5.12-eb-5.5.12-log MySQL-eb 5.5.12, Revision 3492
    Copyright (c) 2000, 2011, Oracle and/or its affiliates. All
    rights reserved.
    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    Type 'help;' or '\h' for help. Type '\c' to clear the current
    input statement.
    mysql> use eBayHDB;
    Reading table information for completion of table and column names
    You can turn off this feature to get a quicker startup with -A

    Database changed
    mysql> show tables;
    | Tables_in_HDB         |



        Yes, I am using yarn-cluster and I did add it via --files. I get
        a "No suitable driver found" error.

        Please share the spark-submit command that shows mysql jar
        containing driver class used to connect to Hive MySQL meta

        Even after including it through

        OR (AND)
         --jars /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar

        I keep getting "No suitable driver found for"


        ./bin/spark-submit -v --master yarn-cluster
        --files $SPARK_HOME/conf/hive-site.xml  --num-executors 1
        --driver-memory 4g --driver-java-options "-XX:MaxPermSize=2G"
        --executor-memory 2g --executor-cores 1 --queue hdmi-express
        --class com.ebay.ep.poc.spark.reporting.SparkApp
        spark_reporting-1.0-SNAPSHOT.jar startDate=2015-02-16


        Caused by: java.sql.SQLException: No suitable driver found
        for jdbc:mysql://hostname:3306/HDB
        at java.sql.DriverManager.getConnection(DriverManager.java:596)
        at java.sql.DriverManager.getConnection(DriverManager.java:187)
        at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:416)
        ... 68 more
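
One possible reading of the "No suitable driver found" trace above is that java.sql.DriverManager may ignore a driver that is only visible to Spark's own class loader, which is where jars passed through --jars end up. A minimal sketch, assuming that is the cause here (the thread does not confirm it): pass the connector through both --jars and --driver-class-path. The helper name driver_jar_flags is hypothetical, and using the bare file name for --driver-class-path assumes YARN places the jar in the container's working directory.

```shell
# Hypothetical helper: prints the two spark-submit flags for a JDBC connector jar.
# --jars ships the jar to the cluster; --driver-class-path puts it on the
# driver JVM's class path (bare file name, an assumption for yarn-cluster mode).
driver_jar_flags() {
  jar_path="$1"
  printf '%s %s %s %s\n' --jars "$jar_path" \
    --driver-class-path "$(basename "$jar_path")"
}

# Usage (not executed here):
#   ./bin/spark-submit --master yarn-cluster \
#     $(driver_jar_flags /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar) \
#     --files $SPARK_HOME/conf/hive-site.xml ...
```

The helper only prints the flags, so they can be reviewed before being added to the real command.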

        15/03/27 23:56:08 INFO yarn.Client: Uploading resource
        file:/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar ->



        -sh-4.1$ jar -tvf ../mysql-connector-java-5.1.34.jar | grep
            61 Fri Oct 17 08:05:36 GMT-07:00 2014
          3396 Fri Oct 17 08:05:22 GMT-07:00 2014
        *   692 Fri Oct 17 08:05:22 GMT-07:00 2014
          1562 Fri Oct 17 08:05:20 GMT-07:00 2014
         17817 Fri Oct 17 08:05:20 GMT-07:00 2014
           690 Fri Oct 17 08:05:24 GMT-07:00 2014
           731 Fri Oct 17 08:05:24 GMT-07:00 2014
           336 Fri Oct 17 08:05:24 GMT-07:00 2014
        You have new mail in /var/spool/mail/dvasthimal
        -sh-4.1$ cat conf/hive-site.xml | grep Driver
          <description>Driver class name for a JDBC
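
The jar listing above lost its file names, so as a rough sketch, the same check can be written as a small filter over `jar -tf` output. It assumes the driver class is com.mysql.jdbc.Driver (the class shipped in Connector/J 5.1.x); the function name has_mysql_driver is hypothetical.

```shell
# Reads a jar listing on stdin and succeeds only if the
# Connector/J 5.1 driver class com.mysql.jdbc.Driver is listed.
has_mysql_driver() {
  grep -q 'com/mysql/jdbc/Driver\.class'
}

# Usage (not executed here):
#   jar -tf ../mysql-connector-java-5.1.34.jar | has_mysql_driver \
#     && echo "driver class present"
```

Note that this only shows the class is inside the jar. It says nothing about whether the jar is on the class path of the JVM that opens the metastore connection.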

            Are you running on yarn?

             - If you are running in yarn-client mode, set
            HADOOP_CONF_DIR to /etc/hive/conf/ (or the directory
            where your hive-site.xml is located).
             - If you are running in yarn-cluster mode, the easiest
            thing to do is to add --files=/etc/hive/conf/hive-site.xml
            (or the path for your hive-site.xml) to your spark-submit
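
The two cases above can be sketched as a small helper that prints the setting to use for each mode. This is only an illustration of the advice in the message: the function name hive_site_setting is hypothetical, and /etc/hive/conf/ is the example directory from the text, to be replaced with wherever your hive-site.xml lives.

```shell
# Hypothetical helper: prints how to expose hive-site.xml for a given master.
# Second argument is the Hive conf directory (defaults to the example path).
hive_site_setting() {
  mode="$1"
  conf_dir="${2:-/etc/hive/conf/}"
  case "$mode" in
    yarn-client)
      # Environment variable to export before calling spark-submit.
      printf 'HADOOP_CONF_DIR=%s\n' "$conf_dir" ;;
    yarn-cluster)
      # Extra spark-submit argument that ships the file with the job.
      printf '%s\n' "--files=${conf_dir%/}/hive-site.xml" ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1 ;;
  esac
}
```

For example, `hive_site_setting yarn-cluster` prints the --files argument to append to the spark-submit command.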

                I can recreate the tables, but what about the data? This
                looks like an obvious feature that Spark SQL must
                have. People will want to transform tons of data
                stored in HDFS through Hive from Spark SQL.

                The Spark programming guide suggests it's possible.

                Spark SQL also supports reading and writing data
                stored in Apache Hive <http://hive.apache.org/>. ....
                Configuration of Hive is done by placing your
                |hive-site.xml| file in |conf/|.

                For some reason it's not working.

                    It seems Spark SQL accesses some columns in
                    addition to those created by Hive.

                    You can always recreate the tables by executing
                    the table creation scripts, but it would be good
                    to avoid recreation.

                        I did copy hive-site.xml from the Hive
                        installation into spark-home/conf. It does
                        have all the metastore connection details:
                        host, username, password, driver and others.




                        <description>Driver class name for a JDBC

                        <description>username to use against
                        metastore database</description>

                        <description>password to use against
                        metastore database</description>

                        <description>controls whether to connect to
                        remove metastore server or open a new
                        metastore server in Hive Client JVM</description>

                        <description>location of default database for
                        the warehouse</description>


                        When I attempt to read a Hive table, it does
                        not work: dw_bid does not exist.

                        I am sure there is a way to read tables
                        stored in HDFS (Hive) from Spark SQL.
                        Otherwise, how would anyone do analytics, since
                        the source tables are always persisted either
                        directly on HDFS or through Hive?

                            Both Hive and Spark SQL use HDFS and the
                            Hive metastore internally, so the only thing
                            you want to change is the processing
                            engine. You can try to bring your
                            hive-site.xml to
                            that the hive site xml captures the
                            metastore connection details).

                            It's a hack; I haven't tried it. I have
                            played around with the metastore, and it
                            should work.

                                I have a few tables that were created in
                                Hive. I want to transform the data stored
                                in these Hive tables using Spark SQL.
                                Is this even possible?

                                So far I have seen that I can create
                                new tables using the Spark SQL dialect.
                                However, when I run show tables or
                                desc hive_table, it says table not found.

                                I am now wondering whether this support
                                is present in Spark SQL or not.

