My point was more about how to verify that properties are picked up from
the hive-site.xml file. You don't really need hive.metastore.uris if you're
not running against an external metastore. I just did an experiment with
warehouse.dir.

My hive-site.xml looks like this:

<configuration>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/home/ykadiysk/Github/warehouse_dir</value>
        <description>location of default database for the
warehouse</description>
    </property>
</configuration>

and spark-shell code:

scala> val hc= new org.apache.spark.sql.hive.HiveContext(sc)
hc: org.apache.spark.sql.hive.HiveContext =
org.apache.spark.sql.hive.HiveContext@3036c16f

scala> hc.sql("show tables").collect
15/05/15 14:12:57 INFO HiveMetaStore: 0: Opening raw store with
implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
15/05/15 14:12:57 INFO ObjectStore: ObjectStore, initialize called
15/05/15 14:12:57 INFO Persistence: Property datanucleus.cache.level2
unknown - will be ignored
15/05/15 14:12:58 WARN Connection: BoneCP specified but not present in
CLASSPATH (or one of dependencies)
15/05/15 14:12:58 WARN Connection: BoneCP specified but not present in
CLASSPATH (or one of dependencies)
15/05/15 14:13:03 INFO ObjectStore: Setting MetaStore object pin
classes with 
hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
15/05/15 14:13:03 INFO ObjectStore: Initialized ObjectStore
15/05/15 14:13:04 WARN ObjectStore: Version information not found in
metastore. hive.metastore.schema.verification is not enabled so
recording the schema version 0.12.0-protobuf-2.5
15/05/15 14:13:05 INFO HiveMetaStore: 0: get_tables: db=default pat=.*
15/05/15 14:13:05 INFO audit: ugi=ykadiysk      ip=unknown-ip-addr
 cmd=get_tables: db=default pat=.*
15/05/15 14:13:05 INFO Datastore: The class
"org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as
"embedded-only" so does not have its own datastore table.
15/05/15 14:13:05 INFO Datastore: The class
"org.apache.hadoop.hive.metastore.model.MOrder" is tagged as
"embedded-only" so does not have its own datastore table.
res0: Array[org.apache.spark.sql.Row] = Array()

scala> hc.getConf("hive.metastore.warehouse.dir")
res1: String = /home/ykadiysk/Github/warehouse_dir
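
As an aside, in a fresh shell you can also set the property
programmatically via setConf before the first query touches the metastore
(initialization looks lazy, so I'd expect a value set early enough to win
over hive-site.xml). A rough sketch -- the path below is made up:

scala> // hypothetical location, substitute your own
scala> hc.setConf("hive.metastore.warehouse.dir", "/tmp/my_warehouse_dir")
scala> hc.getConf("hive.metastore.warehouse.dir")  // should echo the value back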

I have not tried an HDFS path, but you should at least be able to verify
that the variable is being read. It might be that your value is read but
then rejected for some other reason...
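
If you want to check that the directory is actually honored (not just
read), a quick sanity test is to create a throwaway table and look for its
files under the configured location. A rough sketch -- the table name is
made up:

scala> hc.sql("CREATE TABLE IF NOT EXISTS wh_dir_probe (key INT)")
scala> import org.apache.hadoop.fs.{FileSystem, Path}
scala> val wh = hc.getConf("hive.metastore.warehouse.dir")
scala> val fs = FileSystem.get(new java.net.URI(wh), sc.hadoopConfiguration)
scala> fs.exists(new Path(wh, "wh_dir_probe"))  // true if it landed under the warehouse dir

For external storage (the original question), I'd expect the same property
to take an s3n:// or wasb:// URI as well, as long as the matching Hadoop
filesystem jars and credentials are on the classpath -- I have not
verified that myself.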

On Fri, May 15, 2015 at 2:03 PM, Tamas Jambor <jambo...@gmail.com> wrote:

> thanks for the reply. I am trying to use it without a Hive setup
> (spark-standalone), so it prints something like this:
>
> hive_ctx.sql("show tables").collect()
> 15/05/15 17:59:03 INFO HiveMetaStore: 0: Opening raw store with
> implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
> 15/05/15 17:59:03 INFO ObjectStore: ObjectStore, initialize called
> 15/05/15 17:59:04 INFO Persistence: Property datanucleus.cache.level2
> unknown - will be ignored
> 15/05/15 17:59:04 INFO Persistence: Property
> hive.metastore.integral.jdo.pushdown unknown - will be ignored
> 15/05/15 17:59:04 WARN Connection: BoneCP specified but not present in
> CLASSPATH (or one of dependencies)
> 15/05/15 17:59:05 WARN Connection: BoneCP specified but not present in
> CLASSPATH (or one of dependencies)
> 15/05/15 17:59:08 INFO BlockManagerMasterActor: Registering block manager
> xxxx:42819 with 3.0 GB RAM, BlockManagerId(2, xxx, 42819)
>
> 15/05/15 17:59:18 INFO ObjectStore: Setting MetaStore object pin classes
> with
> hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
> 15/05/15 17:59:18 INFO MetaStoreDirectSql: MySQL check failed, assuming we
> are not on mysql: Lexical error at line 1, column 5.  Encountered: "@"
> (64), after : "".
> 15/05/15 17:59:20 INFO Datastore: The class
> "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as
> "embedded-only" so does not have its own datastore table.
> 15/05/15 17:59:20 INFO Datastore: The class
> "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as
> "embedded-only" so does not have its own datastore table.
> 15/05/15 17:59:28 INFO Datastore: The class
> "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as
> "embedded-only" so does not have its own datastore table.
> 15/05/15 17:59:29 INFO Datastore: The class
> "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as
> "embedded-only" so does not have its own datastore table.
> 15/05/15 17:59:31 INFO ObjectStore: Initialized ObjectStore
> 15/05/15 17:59:32 WARN ObjectStore: Version information not found in
> metastore. hive.metastore.schema.verification is not enabled so recording
> the schema version 0.13.1aa
> 15/05/15 17:59:33 WARN MetricsConfig: Cannot locate configuration: tried
> hadoop-metrics2-azure-file-system.properties,hadoop-metrics2.properties
> 15/05/15 17:59:33 INFO MetricsSystemImpl: Scheduled snapshot period at 10
> second(s).
> 15/05/15 17:59:33 INFO MetricsSystemImpl: azure-file-system metrics system
> started
> 15/05/15 17:59:33 INFO HiveMetaStore: Added admin role in metastore
> 15/05/15 17:59:34 INFO HiveMetaStore: Added public role in metastore
> 15/05/15 17:59:34 INFO HiveMetaStore: No user is added in admin role,
> since config is empty
> 15/05/15 17:59:35 INFO SessionState: No Tez session required at this
> point. hive.execution.engine=mr.
> 15/05/15 17:59:37 INFO HiveMetaStore: 0: get_tables: db=default pat=.*
> 15/05/15 17:59:37 INFO audit: ugi=testuser     ip=unknown-ip-addr
>  cmd=get_tables: db=default pat=.*
>
> not sure what to put in hive.metastore.uris in this case?
>
>
> On Fri, May 15, 2015 at 2:52 PM, Yana Kadiyska <yana.kadiy...@gmail.com>
> wrote:
>
>> This should work. Which version of Spark are you using? Here is what I do
>> -- make sure hive-site.xml is in the conf directory of the machine you're
>> running the driver from. Now let's run spark-shell from that machine:
>>
>> scala> val hc= new org.apache.spark.sql.hive.HiveContext(sc)
>> hc: org.apache.spark.sql.hive.HiveContext = 
>> org.apache.spark.sql.hive.HiveContext@6e9f8f26
>>
>> scala> hc.sql("show tables").collect
>> 15/05/15 09:34:17 INFO metastore: Trying to connect to metastore with URI 
>> thrift://hostname.com:9083              <-- here should be a value from your 
>> hive-site.xml
>> 15/05/15 09:34:17 INFO metastore: Waiting 1 seconds before next connection 
>> attempt.
>> 15/05/15 09:34:18 INFO metastore: Connected to metastore.
>> res0: Array[org.apache.spark.sql.Row] = Array([table1,false],
>>
>> scala> hc.getConf("hive.metastore.uris")
>> res13: String = thrift://hostname.com:9083
>>
>> scala> hc.getConf("hive.metastore.warehouse.dir")
>> res14: String = /user/hive/warehouse
>>
>>
>> The first line tells you which metastore it's trying to connect to --
>> this should be the string specified under the hive.metastore.uris
>> property in your hive-site.xml file. I have not mucked with
>> warehouse.dir too much, but I know that the value of the metastore URI
>> is in fact picked up from there, as I regularly point to different
>> systems...
>>
>>
>> On Thu, May 14, 2015 at 6:26 PM, Tamas Jambor <jambo...@gmail.com> wrote:
>>
>>> I have tried to put the hive-site.xml file in the conf/ directory, but
>>> it seems it is not being picked up from there.
>>>
>>>
>>> On Thu, May 14, 2015 at 6:50 PM, Michael Armbrust <
>>> mich...@databricks.com> wrote:
>>>
>>>> You can configure Spark SQL's Hive interaction by placing a
>>>> hive-site.xml file in the conf/ directory.
>>>>
>>>> On Thu, May 14, 2015 at 10:24 AM, jamborta <jambo...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> is it possible to set hive.metastore.warehouse.dir, which is created
>>>>> internally by Spark, to point to external storage (e.g. S3 on AWS or
>>>>> WASB on Azure)?
>>>>>
>>>>> thanks,
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
