Oh... the metastore_db location is not controlled by hive.metastore.warehouse.dir -- one is the location of your metastore DB, the other is the physical location of your stored data. Check out this SO thread: http://stackoverflow.com/questions/13624893/metastore-db-created-wherever-i-run-hive
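If you want metastore_db itself to land in a fixed place (rather than whatever directory you launched spark-shell from), the property to look at is javax.jdo.option.ConnectionURL rather than the warehouse dir. A minimal sketch for hive-site.xml, assuming the default embedded Derby metastore -- the path below is just an illustration:

    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:derby:;databaseName=/path/to/metastore_db;create=true</value>
      <description>JDBC connect string for the embedded Derby metastore (example path)</description>
    </property>

For anything shared between users or machines you'd normally point the ConnectionURL (and driver) at a real database such as MySQL or Postgres instead of Derby.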
On Sat, May 16, 2015 at 9:07 AM, Tamas Jambor <jambo...@gmail.com> wrote:

> Gave it another try - it seems that it picks up the variable and prints
> out the correct value, but it still puts the metastore_db folder in the
> current directory, regardless.
>
> On Sat, May 16, 2015 at 1:13 PM, Tamas Jambor <jambo...@gmail.com> wrote:
>
>> Thank you for the reply.
>>
>> I have tried your experiment; it seems that it does not print the
>> settings out in spark-shell (I'm using 1.3, by the way).
>>
>> Strangely, I have been experimenting with a SQL connection instead, which
>> works after all (still, if I go to spark-shell and try to print out the SQL
>> settings that I put in hive-site.xml, it does not print them).
>>
>> On Fri, May 15, 2015 at 7:22 PM, Yana Kadiyska <yana.kadiy...@gmail.com> wrote:
>>
>>> My point was more about how to verify that properties are picked up from
>>> the hive-site.xml file. You don't really need hive.metastore.uris if
>>> you're not running against an external metastore. I just did an
>>> experiment with warehouse.dir.
>>>
>>> My hive-site.xml looks like this:
>>>
>>> <configuration>
>>>   <property>
>>>     <name>hive.metastore.warehouse.dir</name>
>>>     <value>/home/ykadiysk/Github/warehouse_dir</value>
>>>     <description>location of default database for the warehouse</description>
>>>   </property>
>>> </configuration>
>>>
>>> and spark-shell code:
>>>
>>> scala> val hc = new org.apache.spark.sql.hive.HiveContext(sc)
>>> hc: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@3036c16f
>>>
>>> scala> hc.sql("show tables").collect
>>> 15/05/15 14:12:57 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
>>> 15/05/15 14:12:57 INFO ObjectStore: ObjectStore, initialize called
>>> 15/05/15 14:12:57 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
>>> 15/05/15 14:12:58 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
>>> 15/05/15 14:12:58 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
>>> 15/05/15 14:13:03 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
>>> 15/05/15 14:13:03 INFO ObjectStore: Initialized ObjectStore
>>> 15/05/15 14:13:04 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.12.0-protobuf-2.5
>>> 15/05/15 14:13:05 INFO HiveMetaStore: 0: get_tables: db=default pat=.*
>>> 15/05/15 14:13:05 INFO audit: ugi=ykadiysk ip=unknown-ip-addr cmd=get_tables: db=default pat=.*
>>> 15/05/15 14:13:05 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
>>> 15/05/15 14:13:05 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
>>> res0: Array[org.apache.spark.sql.Row] = Array()
>>>
>>> scala> hc.getConf("hive.metastore.warehouse.dir")
>>> res1: String = /home/ykadiysk/Github/warehouse_dir
>>>
>>> I have not tried an HDFS path, but you should at least be able to verify
>>> that the variable is being read. It might be that your value is read but is
>>> otherwise not liked...
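One more way to take file placement out of the picture -- a sketch, assuming the Spark 1.3 HiveContext API (setConf/getConf write and read the same Hive configuration; the path is just an example): set the property from the shell before the first query and read it back:

    scala> val hc = new org.apache.spark.sql.hive.HiveContext(sc)
    scala> hc.setConf("hive.metastore.warehouse.dir", "/tmp/test_warehouse_dir")  // example path
    scala> hc.getConf("hive.metastore.warehouse.dir")
    res0: String = /tmp/test_warehouse_dir

If the value shows up here but metastore_db still appears in your working directory, that is expected -- warehouse.dir only controls where managed table data is written, not where the Derby metastore files go.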
>>>
>>> On Fri, May 15, 2015 at 2:03 PM, Tamas Jambor <jambo...@gmail.com> wrote:
>>>
>>>> Thanks for the reply. I am trying to use it without a Hive setup
>>>> (Spark standalone), so it prints something like this:
>>>>
>>>> hive_ctx.sql("show tables").collect()
>>>> 15/05/15 17:59:03 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
>>>> 15/05/15 17:59:03 INFO ObjectStore: ObjectStore, initialize called
>>>> 15/05/15 17:59:04 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
>>>> 15/05/15 17:59:04 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
>>>> 15/05/15 17:59:04 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
>>>> 15/05/15 17:59:05 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
>>>> 15/05/15 17:59:08 INFO BlockManagerMasterActor: Registering block manager xxxx:42819 with 3.0 GB RAM, BlockManagerId(2, xxx, 42819)
>>>> 15/05/15 17:59:18 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
>>>> 15/05/15 17:59:18 INFO MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5. Encountered: "@" (64), after : "".
>>>> 15/05/15 17:59:20 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
>>>> 15/05/15 17:59:20 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
>>>> 15/05/15 17:59:28 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
>>>> 15/05/15 17:59:29 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
>>>> 15/05/15 17:59:31 INFO ObjectStore: Initialized ObjectStore
>>>> 15/05/15 17:59:32 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.1aa
>>>> 15/05/15 17:59:33 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-azure-file-system.properties,hadoop-metrics2.properties
>>>> 15/05/15 17:59:33 INFO MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
>>>> 15/05/15 17:59:33 INFO MetricsSystemImpl: azure-file-system metrics system started
>>>> 15/05/15 17:59:33 INFO HiveMetaStore: Added admin role in metastore
>>>> 15/05/15 17:59:34 INFO HiveMetaStore: Added public role in metastore
>>>> 15/05/15 17:59:34 INFO HiveMetaStore: No user is added in admin role, since config is empty
>>>> 15/05/15 17:59:35 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr.
>>>> 15/05/15 17:59:37 INFO HiveMetaStore: 0: get_tables: db=default pat=.*
>>>> 15/05/15 17:59:37 INFO audit: ugi=testuser ip=unknown-ip-addr cmd=get_tables: db=default pat=.*
>>>>
>>>> Not sure what to put in hive.metastore.uris in this case?
>>>>
>>>> On Fri, May 15, 2015 at 2:52 PM, Yana Kadiyska <yana.kadiy...@gmail.com> wrote:
>>>>
>>>>> This should work.
>>>>> Which version of Spark are you using? Here is what I do -- make sure
>>>>> hive-site.xml is in the conf directory of the machine you're running the
>>>>> driver from. Now let's run spark-shell from that machine:
>>>>>
>>>>> scala> val hc = new org.apache.spark.sql.hive.HiveContext(sc)
>>>>> hc: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@6e9f8f26
>>>>>
>>>>> scala> hc.sql("show tables").collect
>>>>> 15/05/15 09:34:17 INFO metastore: Trying to connect to metastore with URI thrift://hostname.com:9083   <-- here should be a value from your hive-site.xml
>>>>> 15/05/15 09:34:17 INFO metastore: Waiting 1 seconds before next connection attempt.
>>>>> 15/05/15 09:34:18 INFO metastore: Connected to metastore.
>>>>> res0: Array[org.apache.spark.sql.Row] = Array([table1,false],
>>>>>
>>>>> scala> hc.getConf("hive.metastore.uris")
>>>>> res13: String = thrift://hostname.com:9083
>>>>>
>>>>> scala> hc.getConf("hive.metastore.warehouse.dir")
>>>>> res14: String = /user/hive/warehouse
>>>>>
>>>>> The first line tells you which metastore it's trying to connect to --
>>>>> this should be the string specified under the hive.metastore.uris property
>>>>> in your hive-site.xml file. I have not mucked with warehouse.dir too much,
>>>>> but I know that the value of the metastore URI is in fact picked up from
>>>>> there, as I regularly point to different systems...
>>>>>
>>>>> On Thu, May 14, 2015 at 6:26 PM, Tamas Jambor <jambo...@gmail.com> wrote:
>>>>>
>>>>>> I have tried to put the hive-site.xml file in the conf/ directory, but
>>>>>> it seems it is not picking it up from there.
>>>>>>
>>>>>> On Thu, May 14, 2015 at 6:50 PM, Michael Armbrust <mich...@databricks.com> wrote:
>>>>>>
>>>>>>> You can configure Spark SQL's Hive interaction by placing a
>>>>>>> hive-site.xml file in the conf/ directory.
>>>>>>>
>>>>>>> On Thu, May 14, 2015 at 10:24 AM, jamborta <jambo...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> Is it possible to set hive.metastore.warehouse.dir, which is created
>>>>>>>> internally by Spark, to a location on an external store (e.g. S3 on AWS
>>>>>>>> or WASB on Azure)?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> --
>>>>>>>> View this message in context:
>>>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/store-hive-metastore-on-persistent-store-tp22891.html
>>>>>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
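PS: on the original question of putting the warehouse on external storage -- once hive-site.xml is actually being picked up, hive.metastore.warehouse.dir is just a Hadoop filesystem URI. A sketch, assuming the relevant filesystem connector (e.g. hadoop-azure for WASB, or the S3 filesystem classes) and credentials are already configured on your cluster; the account/container/bucket names below are made up:

    <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>wasb://mycontainer@myaccount.blob.core.windows.net/hive/warehouse</value>
      <description>example warehouse location on Azure blob storage</description>
    </property>

For S3 it would be something like s3n://my-bucket/hive/warehouse. Note this only moves the table data; the metastore itself (the metastore_db folder, or an external database) is configured separately, as discussed above.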