I am able to connect to MySQL Hive metastore from the client
cluster machine.
-sh-4.1$ mysql --user=hiveuser --password=pass --host=hostname.vip.company.com
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 9417286
Server version: 5.5.12-eb-5.5.12-log MySQL-eb 5.5.12, Revision 3492
Copyright (c) 2000, 2011, Oracle and/or its affiliates. All
rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current
input statement.
mysql> use eBayHDB;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> show tables;
+---------------------------+
| Tables_in_HDB |
+---------------------------+
Regards,
Deepak
On Sat, Mar 28, 2015 at 12:35 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
Yes, I am using yarn-cluster and I did add it via --files. I get a
"No suitable driver found" error.
Please share a spark-submit command that shows the MySQL jar
(containing the driver class used to connect to the Hive MySQL
metastore) being passed.
Even after including it through
--driver-class-path /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar
or (and)
--jars /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar
I keep getting "No suitable driver found for"
Command
========
./bin/spark-submit -v --master yarn-cluster
--driver-class-path
/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar:/apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/guava-11.0.2.jar
--jars
/home/dvasthimal/spark1.3/spark-avro_2.10-1.0.0.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar,/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar
--files $SPARK_HOME/conf/hive-site.xml --num-executors 1
--driver-memory 4g --driver-java-options "-XX:MaxPermSize=2G"
--executor-memory 2g --executor-cores 1 --queue hdmi-express
--class com.ebay.ep.poc.spark.reporting.SparkApp
spark_reporting-1.0-SNAPSHOT.jar startDate=2015-02-16
endDate=2015-02-16
input=/user/dvasthimal/epdatasets/successdetail1/part-r-00000.avro
subcommand=successevents2
output=/user/dvasthimal/epdatasets/successdetail2
Logs
====
Caused by: java.sql.SQLException: No suitable driver found
for jdbc:mysql://hostname:3306/HDB
at java.sql.DriverManager.getConnection(DriverManager.java:596)
at java.sql.DriverManager.getConnection(DriverManager.java:187)
at
com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:361)
at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:416)
... 68 more
...
...
15/03/27 23:56:08 INFO yarn.Client: Uploading resource
file:/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar ->
hdfs://apollo-NN:8020/user/dvasthimal/.sparkStaging/application_1426715280024_119815/mysql-connector-java-5.1.34.jar
...
...
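The "No suitable driver" failure comes from java.sql.DriverManager, which only sees drivers registered from its own classpath; uploading the jar to HDFS staging (as the log above shows) does not by itself put it on the driver JVM's classpath. A minimal standalone sketch of that exact failure mode, with no Spark or MySQL server involved:

```java
import java.sql.DriverManager;
import java.sql.SQLException;

public class DriverLookup {
    // Asks DriverManager for a connection when no MySQL driver is on the
    // classpath -- it fails with the same message as the stack trace above.
    static String tryConnect(String url) {
        try {
            DriverManager.getConnection(url);
            return "connected";
        } catch (SQLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryConnect("jdbc:mysql://hostname:3306/HDB"));
        // prints: No suitable driver found for jdbc:mysql://hostname:3306/HDB
    }
}
```

So the question is not whether the jar was shipped, but whether it is on the classpath of the JVM that calls DriverManager (the ApplicationMaster JVM in yarn-cluster mode).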
-sh-4.1$ jar -tvf ../mysql-connector-java-5.1.34.jar | grep
Driver
61 Fri Oct 17 08:05:36 GMT-07:00 2014
META-INF/services/java.sql.Driver
3396 Fri Oct 17 08:05:22 GMT-07:00 2014
com/mysql/fabric/jdbc/FabricMySQLDriver.class
692 Fri Oct 17 08:05:22 GMT-07:00 2014
com/mysql/jdbc/Driver.class
1562 Fri Oct 17 08:05:20 GMT-07:00 2014
com/mysql/jdbc/NonRegisteringDriver$ConnectionPhantomReference.class
17817 Fri Oct 17 08:05:20 GMT-07:00 2014
com/mysql/jdbc/NonRegisteringDriver.class
690 Fri Oct 17 08:05:24 GMT-07:00 2014
com/mysql/jdbc/NonRegisteringReplicationDriver.class
731 Fri Oct 17 08:05:24 GMT-07:00 2014
com/mysql/jdbc/ReplicationDriver.class
336 Fri Oct 17 08:05:24 GMT-07:00 2014
org/gjt/mm/mysql/Driver.class
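The listing confirms com.mysql.jdbc.Driver is inside the jar, but presence in a shipped jar is not the same as being loadable: Class.forName (which is what registers a JDBC driver with DriverManager) only succeeds if the jar is on the calling JVM's classpath. A small probe, runnable anywhere, that makes the distinction visible (the class names are just the ones under discussion):

```java
public class DriverProbe {
    // Reports whether a class can be loaded by this JVM's classloader.
    // A jar passed via --jars is shipped to executors, but in yarn-cluster
    // mode the *driver* JVM also needs it (e.g. via spark.driver.extraClassPath)
    // before DriverManager can see the MySQL driver.
    static boolean canLoad(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // On a machine without the connector jar this prints false,
        // even though the class demonstrably exists inside the jar.
        System.out.println("com.mysql.jdbc.Driver loadable: "
                + canLoad("com.mysql.jdbc.Driver"));
    }
}
```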
-sh-4.1$ cat conf/hive-site.xml | grep Driver
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC
metastore</description>
-sh-4.1$
--
Deepak
On Sat, Mar 28, 2015 at 1:06 AM, Michael Armbrust <mich...@databricks.com> wrote:
Are you running on yarn?
- If you are running in yarn-client mode, set
HADOOP_CONF_DIR to /etc/hive/conf/ (or the directory
where your hive-site.xml is located).
- If you are running in yarn-cluster mode, the easiest
thing to do is to add --files=/etc/hive/conf/hive-site.xml
(or the path for your hive-site.xml) to your spark-submit
script.
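A quick way to tell whether hive-site.xml actually reached the JVM is to look it up as a classpath resource from inside the application; if it is not visible, Spark silently falls back to a local Derby metastore and no Hive tables appear. A hedged sketch (standalone Java, not tied to Spark; the fallback behaviour described in the comment is the usual symptom, not something this snippet proves):

```java
public class HiveSiteProbe {
    // True if hive-site.xml is visible on this JVM's classpath. In
    // yarn-client mode HADOOP_CONF_DIR puts it there; in yarn-cluster mode
    // --files ships it into the container working directory, which YARN
    // adds to the classpath.
    static boolean hiveSiteOnClasspath() {
        return Thread.currentThread().getContextClassLoader()
                .getResource("hive-site.xml") != null;
    }

    public static void main(String[] args) {
        System.out.println(hiveSiteOnClasspath()
                ? "hive-site.xml found on classpath"
                : "hive-site.xml NOT on classpath");
    }
}
```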
On Fri, Mar 27, 2015 at 5:42 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
I can recreate tables, but what about the data? It looks
like this is an obvious feature that Spark SQL must
have. People will want to transform tons of data
stored in HDFS through Hive from Spark SQL.
The Spark programming guide suggests it's possible:
"Spark SQL also supports reading and writing data
stored in Apache Hive <http://hive.apache.org/>. ...
Configuration of Hive is done by placing your
hive-site.xml file in conf/."
https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#hive-tables
For some reason it's not working.
On Fri, Mar 27, 2015 at 3:35 PM, Arush Kharbanda <ar...@sigmoidanalytics.com> wrote:
It seems Spark SQL accesses some more columns apart
from those created by Hive.
You can always recreate the tables; you would need
to execute the table-creation scripts, but it
would be good to avoid recreation.
On Fri, Mar 27, 2015 at 3:20 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
I did copy hive-site.xml from the Hive
installation into spark-home/conf. It does
have all the metastore connection details:
host, username, password, driver, and others.
Snippet
======
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://host.vip.company.com:3306/HDB</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC
metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
<description>username to use against
metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>some-password</value>
<description>password to use against
metastore database</description>
</property>
<property>
<name>hive.metastore.local</name>
<value>false</value>
<description>controls whether to connect to a
remote metastore server or open a new
metastore server in the Hive Client JVM</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for
the warehouse</description>
</property>
......
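One way to sanity-check which metastore Spark will actually hit is to pull the connection URL out of hive-site.xml programmatically, since a stale or wrong copy in conf/ is a common cause of "table not found". A small sketch; the inline XML string is a stand-in for the snippet above, and the host/DB names are the anonymised placeholders from it:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class HiveSiteUrl {
    // Extracts a named <property> value from Hadoop-style configuration XML.
    static String property(String xml, String wanted) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            String name = p.getElementsByTagName("name")
                    .item(0).getTextContent().trim();
            if (name.equals(wanted)) {
                return p.getElementsByTagName("value")
                        .item(0).getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Inline stand-in for the hive-site.xml fragment quoted above.
        String xml = "<configuration><property>"
                + "<name>javax.jdo.option.ConnectionURL</name>"
                + "<value>jdbc:mysql://host.vip.company.com:3306/HDB</value>"
                + "</property></configuration>";
        System.out.println(property(xml, "javax.jdo.option.ConnectionURL"));
    }
}
```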
When I attempt to read a Hive table, it does
not work: dw_bid does not exist.
I am sure there is a way to read tables
stored in HDFS (Hive) from Spark SQL.
Otherwise, how would anyone do analytics, since
the source tables are always persisted either
directly on HDFS or through Hive?
On Fri, Mar 27, 2015 at 1:15 PM, Arush Kharbanda <ar...@sigmoidanalytics.com> wrote:
Hive and Spark SQL internally use the same
HDFS and Hive metastore; the only thing
you want to change is the processing
engine. You can try copying your
hive-site.xml to
$SPARK_HOME/conf/hive-site.xml (ensure
that the hive-site.xml captures the
metastore connection details).
It's a hack and I haven't tried it, but I have
played around with the metastore and it
should work.
On Fri, Mar 27, 2015 at 12:04 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
I have a few tables that are created in
Hive. I want to transform the data stored
in these Hive tables using Spark SQL.
Is this even possible?
So far I have seen that I can create
new tables using the Spark SQL dialect.
However, when I run "show tables" or
"desc hive_table" it says the table is not found.
I am now wondering: is this support
present in Spark SQL or not?
--
Deepak
--
Sigmoid Analytics
Arush Kharbanda || Technical Teamlead
ar...@sigmoidanalytics.com || www.sigmoidanalytics.com
--
Deepak
--
Deepak