I think the problem is that in yarn-cluster mode the Spark driver runs inside the application master, so the Hive configuration in hive-site.xml is not visible to the driver. Can you try setting those confs via hiveContext.set(...)? Alternatively, you could copy hive-site.xml to spark/conf on the node running the application master.
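For example, something along these lines might work. This is an untested sketch: it assumes a setConf(key, value) setter is available on the HiveContext, and it reuses the DB2 metastore connection properties from the hive-site.xml quoted below, so adjust the property names and values to whatever your setup actually needs.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveSparkWithConf {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("HiveSparkWithConf"))
    val hiveContext = new HiveContext(sc)

    // In yarn-cluster mode the driver runs inside the application master,
    // which may not see hive-site.xml, so hand the metastore connection
    // settings to the driver-side Hive client explicitly.
    // NOTE: setConf is assumed here; older releases may expose a different setter.
    hiveContext.setConf("javax.jdo.option.ConnectionURL",
      "jdbc:db2://hdtest022.svl.ibm.com:50001/BIDB")
    hiveContext.setConf("javax.jdo.option.ConnectionDriverName",
      "com.ibm.db2.jcc.DB2Driver")

    hiveContext.hql("use ttt")  // switch away from the default database
    hiveContext.hql("SHOW TABLES").collect().foreach(println)
    sc.stop()
  }
}

If that doesn't help, another thing worth trying is shipping hive-site.xml with the job (for instance via spark-submit's --files option) so that it ends up in the application master's working directory.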
On Tue, Aug 12, 2014 at 8:38 PM, Jenny Zhao <linlin200...@gmail.com> wrote:
>
> Hi Yin,
>
> hive-site.xml was copied to spark/conf and is the same as the one under
> $HIVE_HOME/conf.
>
> Through the hive cli, I don't see any problem. But for Spark in yarn-cluster
> mode, I am not able to switch to a database other than the default one;
> in yarn-client mode, it works fine.
>
> Thanks!
>
> Jenny
>
>
> On Tue, Aug 12, 2014 at 12:53 PM, Yin Huai <huaiyin....@gmail.com> wrote:
>
>> Hi Jenny,
>>
>> Have you copied hive-site.xml to the spark/conf directory? If not, can you
>> put it in conf/ and try again?
>>
>> Thanks,
>>
>> Yin
>>
>>
>> On Mon, Aug 11, 2014 at 8:57 PM, Jenny Zhao <linlin200...@gmail.com> wrote:
>>
>>> Thanks Yin!
>>>
>>> Here is my hive-site.xml, which I copied from $HIVE_HOME/conf; I didn't
>>> experience any problem connecting to the metastore through hive, which
>>> uses DB2 as the metastore database.
>>>
>>> <?xml version="1.0"?>
>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>> <!--
>>>    Licensed to the Apache Software Foundation (ASF) under one or more
>>>    contributor license agreements. See the NOTICE file distributed with
>>>    this work for additional information regarding copyright ownership.
>>>    The ASF licenses this file to You under the Apache License, Version 2.0
>>>    (the "License"); you may not use this file except in compliance with
>>>    the License. You may obtain a copy of the License at
>>>
>>>        http://www.apache.org/licenses/LICENSE-2.0
>>>
>>>    Unless required by applicable law or agreed to in writing, software
>>>    distributed under the License is distributed on an "AS IS" BASIS,
>>>    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>>    See the License for the specific language governing permissions and
>>>    limitations under the License.
>>> -->
>>> <configuration>
>>>   <property>
>>>     <name>hive.hwi.listen.port</name>
>>>     <value>9999</value>
>>>   </property>
>>>   <property>
>>>     <name>hive.querylog.location</name>
>>>     <value>/var/ibm/biginsights/hive/query/${user.name}</value>
>>>   </property>
>>>   <property>
>>>     <name>hive.metastore.warehouse.dir</name>
>>>     <value>/biginsights/hive/warehouse</value>
>>>   </property>
>>>   <property>
>>>     <name>hive.hwi.war.file</name>
>>>     <value>lib/hive-hwi-0.12.0.war</value>
>>>   </property>
>>>   <property>
>>>     <name>hive.metastore.metrics.enabled</name>
>>>     <value>true</value>
>>>   </property>
>>>   <property>
>>>     <name>javax.jdo.option.ConnectionURL</name>
>>>     <value>jdbc:db2://hdtest022.svl.ibm.com:50001/BIDB</value>
>>>   </property>
>>>   <property>
>>>     <name>javax.jdo.option.ConnectionDriverName</name>
>>>     <value>com.ibm.db2.jcc.DB2Driver</value>
>>>   </property>
>>>   <property>
>>>     <name>hive.stats.autogather</name>
>>>     <value>false</value>
>>>   </property>
>>>   <property>
>>>     <name>javax.jdo.mapping.Schema</name>
>>>     <value>HIVE</value>
>>>   </property>
>>>   <property>
>>>     <name>javax.jdo.option.ConnectionUserName</name>
>>>     <value>catalog</value>
>>>   </property>
>>>   <property>
>>>     <name>javax.jdo.option.ConnectionPassword</name>
>>>     <value>V2pJNWMxbFlVbWhaZHowOQ==</value>
>>>   </property>
>>>   <property>
>>>     <name>hive.metastore.password.encrypt</name>
>>>     <value>true</value>
>>>   </property>
>>>   <property>
>>>     <name>org.jpox.autoCreateSchema</name>
>>>     <value>true</value>
>>>   </property>
>>>   <property>
>>>     <name>hive.server2.thrift.min.worker.threads</name>
>>>     <value>5</value>
>>>   </property>
>>>   <property>
>>>     <name>hive.server2.thrift.max.worker.threads</name>
>>>     <value>100</value>
>>>   </property>
>>>   <property>
>>>     <name>hive.server2.thrift.port</name>
>>>     <value>10000</value>
>>>   </property>
>>>   <property>
>>>     <name>hive.server2.thrift.bind.host</name>
>>>     <value>hdtest022.svl.ibm.com</value>
>>>   </property>
>>>   <property>
>>>     <name>hive.server2.authentication</name>
>>>     <value>CUSTOM</value>
>>>   </property>
>>>   <property>
>>>     <name>hive.server2.custom.authentication.class</name>
>>>     <value>org.apache.hive.service.auth.WebConsoleAuthenticationProviderImpl</value>
>>>   </property>
>>>   <property>
>>>     <name>hive.server2.enable.impersonation</name>
>>>     <value>true</value>
>>>   </property>
>>>   <property>
>>>     <name>hive.security.webconsole.url</name>
>>>     <value>http://hdtest022.svl.ibm.com:8080</value>
>>>   </property>
>>>   <property>
>>>     <name>hive.security.authorization.enabled</name>
>>>     <value>true</value>
>>>   </property>
>>>   <property>
>>>     <name>hive.security.authorization.createtable.owner.grants</name>
>>>     <value>ALL</value>
>>>   </property>
>>> </configuration>
>>>
>>>
>>> On Mon, Aug 11, 2014 at 4:29 PM, Yin Huai <huaiyin....@gmail.com> wrote:
>>>
>>>> Hi Jenny,
>>>>
>>>> How's your metastore configured for both Hive and Spark SQL? Which
>>>> metastore mode are you using (based on
>>>> https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin)?
>>>>
>>>> Thanks,
>>>>
>>>> Yin
>>>>
>>>>
>>>> On Mon, Aug 11, 2014 at 6:15 PM, Jenny Zhao <linlin200...@gmail.com> wrote:
>>>>
>>>>> You can reproduce this issue with the following steps (assuming you
>>>>> have a Yarn cluster + Hive 12):
>>>>>
>>>>> 1) Using the hive shell, create a database, e.g.: create database ttt
>>>>>
>>>>> 2) Write a simple Spark SQL program:
>>>>>
>>>>> import org.apache.spark.{SparkConf, SparkContext}
>>>>> import org.apache.spark.sql._
>>>>> import org.apache.spark.sql.hive.HiveContext
>>>>>
>>>>> object HiveSpark {
>>>>>   case class Record(key: Int, value: String)
>>>>>
>>>>>   def main(args: Array[String]) {
>>>>>     val sparkConf = new SparkConf().setAppName("HiveSpark")
>>>>>     val sc = new SparkContext(sparkConf)
>>>>>
>>>>>     // A hive context creates an instance of the Hive Metastore in process
>>>>>     val hiveContext = new HiveContext(sc)
>>>>>     import hiveContext._
>>>>>
>>>>>     hql("use ttt")
>>>>>     hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
>>>>>     hql("LOAD DATA INPATH '/user/biadmin/kv1.txt' INTO TABLE src")
>>>>>
>>>>>     // Queries are expressed in HiveQL
>>>>>     println("Result of 'SELECT *': ")
>>>>>     hql("SELECT * FROM src").collect.foreach(println)
>>>>>     sc.stop()
>>>>>   }
>>>>> }
>>>>>
>>>>> 3) Run it in yarn-cluster mode.
>>>>>
>>>>>
>>>>> On Mon, Aug 11, 2014 at 9:44 AM, Cheng Lian <lian.cs....@gmail.com> wrote:
>>>>>
>>>>>> Since you were using hql(...), it's probably not related to the JDBC
>>>>>> driver. But I failed to reproduce this issue locally with a single-node
>>>>>> pseudo-distributed YARN cluster. Would you mind elaborating on the steps
>>>>>> to reproduce this bug? Thanks
>>>>>>
>>>>>>
>>>>>> On Sun, Aug 10, 2014 at 9:36 PM, Cheng Lian <lian.cs....@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Jenny, does this issue only happen when running Spark SQL with
>>>>>>> YARN in your environment?
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Aug 9, 2014 at 3:56 AM, Jenny Zhao <linlin200...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am able to run my hql query in yarn-cluster mode when connecting
>>>>>>>> to the default hive metastore defined in hive-site.xml.
>>>>>>>>
>>>>>>>> However, if I want to switch to a different database, like:
>>>>>>>>
>>>>>>>>     hql("use other-database")
>>>>>>>>
>>>>>>>> it only works in yarn-client mode, but fails in yarn-cluster mode
>>>>>>>> with the following stack:
>>>>>>>>
>>>>>>>> 14/08/08 12:09:11 INFO HiveMetaStore: 0: get_database: tt
>>>>>>>> 14/08/08 12:09:11 INFO audit: ugi=biadmin ip=unknown-ip-addr cmd=get_database: tt
>>>>>>>> 14/08/08 12:09:11 ERROR RetryingHMSHandler: NoSuchObjectException(message:There is no database named tt)
>>>>>>>>     at org.apache.hadoop.hive.metastore.ObjectStore.getMDatabase(ObjectStore.java:431)
>>>>>>>>     at org.apache.hadoop.hive.metastore.ObjectStore.getDatabase(ObjectStore.java:441)
>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:611)
>>>>>>>>     at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124)
>>>>>>>>     at $Proxy15.getDatabase(Unknown Source)
>>>>>>>>     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:628)
>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:611)
>>>>>>>>     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
>>>>>>>>     at $Proxy17.get_database(Unknown Source)
>>>>>>>>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:810)
>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:611)
>>>>>>>>     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
>>>>>>>>     at $Proxy18.getDatabase(Unknown Source)
>>>>>>>>     at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1139)
>>>>>>>>     at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1128)
>>>>>>>>     at org.apache.hadoop.hive.ql.exec.DDLTask.switchDatabase(DDLTask.java:3479)
>>>>>>>>     at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:237)
>>>>>>>>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
>>>>>>>>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
>>>>>>>>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
>>>>>>>>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
>>>>>>>>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020)
>>>>>>>>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
>>>>>>>>     at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:208)
>>>>>>>>     at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:182)
>>>>>>>>     at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:272)
>>>>>>>>     at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:269)
>>>>>>>>     at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:86)
>>>>>>>>     at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:91)
>>>>>>>>     at org.apache.spark.examples.sql.hive.HiveSpark$.main(HiveSpark.scala:35)
>>>>>>>>     at org.apache.spark.examples.sql.hive.HiveSpark.main(HiveSpark.scala)
>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:611)
>>>>>>>>     at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:186)
>>>>>>>>
>>>>>>>> 14/08/08 12:09:11 ERROR DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: Database does not exist: tt
>>>>>>>>     at org.apache.hadoop.hive.ql.exec.DDLTask.switchDatabase(DDLTask.java:3480)
>>>>>>>>     at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:237)
>>>>>>>>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
>>>>>>>>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
>>>>>>>>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
>>>>>>>>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
>>>>>>>>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020)
>>>>>>>>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
>>>>>>>>     at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:208)
>>>>>>>>     at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:182)
>>>>>>>>     at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:272)
>>>>>>>>     at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:269)
>>>>>>>>     at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:86)
>>>>>>>>     at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:91)
>>>>>>>>     at org.apache.spark.examples.sql.hive.HiveSpark$.main(HiveSpark.scala:35)
>>>>>>>>     at org.apache.spark.examples.sql.hive.HiveSpark.main(HiveSpark.scala)
>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:611)
>>>>>>>>     at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:186)
>>>>>>>>
>>>>>>>> Why is that? I'm not sure if this has something to do with the Hive JDBC driver?
>>>>>>>>
>>>>>>>> Thank you!
>>>>>>>>
>>>>>>>> Jenny