I am not sure whether Hive supports changing the metastore settings after it
has been initialized; I guess not. Spark SQL relies entirely on the Hive
Metastore in HiveContext, which is probably why Q1 doesn't work as expected.
BTW, in most cases people configure the metastore settings in hive-site.xml
and do not change them afterwards. Is there any reason you want to change
them at runtime?
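If the goal is only to place individual tables in different directories, one option might be to pass an explicit location per table instead of switching the warehouse directory. The following is a rough sketch for the Spark 1.3 shell, not something verified against RC3. It assumes the shell-provided sqlContext and the table src from your test, and it assumes the "path" data source option is honored by saveAsTable so that the data lands in that directory; the target directory /test/w2/table2 is only an illustration.

```scala
import org.apache.spark.sql.SaveMode

// Assumes the spark-shell's sqlContext (a HiveContext) and the existing table `src`.
val df = sqlContext.sql("SELECT * FROM src")

// Assumption: supplying "path" stores the table data in that directory
// instead of under hive.metastore.warehouse.dir.
df.saveAsTable(
  "table2",
  "org.apache.spark.sql.parquet", // Parquet data source, as seen in the stack trace
  SaveMode.ErrorIfExists,
  Map("path" -> "/test/w2/table2"))
```

If this works in your build, the metastore should record table2 with its own location, so the warehouse setting would not need to change mid-session.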
For Q2, something is probably wrong in the configuration: it seems HDFS is
running in pseudo/single-node mode. Can you double check that? Or can you run
a DDL statement (like creating a table) from the Spark shell with HiveContext?
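To illustrate why I suspect the configuration: a "Wrong FS" error generally means the path's scheme does not match the filesystem the client resolved as its default (in Hadoop, normally taken from fs.defaultFS). Below is a simplified sketch of that kind of check in plain Scala. It is not Hadoop's actual code, it compares only the scheme (the real check also considers the authority), and the helper name sameFileSystem is made up for this example.

```scala
import java.net.URI

// Simplified stand-in for the scheme comparison behind "Wrong FS".
// Hypothetical helper; not part of Hadoop or Spark.
def sameFileSystem(defaultFs: String, path: String): Boolean = {
  val fsScheme = Option(new URI(defaultFs).getScheme)
  val pathScheme = Option(new URI(path).getScheme)
  // A path without a scheme is resolved against the default filesystem,
  // so it is treated as matching.
  pathScheme.forall(s => fsScheme.exists(_.equalsIgnoreCase(s)))
}

val warehouse = "hdfs://server:8020/space/warehouse/table2"
println(sameFileSystem("file:///", warehouse))           // false: the mismatch from the trace
println(sameFileSystem("hdfs://server:8020", warehouse)) // true
println(sameFileSystem("file:///", "/test/w2/table2"))   // true: no scheme on the path
```

So if the Spark shell's Hadoop configuration still points at the local filesystem, an hdfs:// warehouse path would be rejected in this way, which is why checking the configuration picked up by the shell seems worthwhile.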
From: Haopu Wang [mailto:hw...@qilinsoft.com]
Sent: Tuesday, March 10, 2015 6:38 PM
To: user; dev@spark.apache.org
Subject: [SparkSQL] Reuse HiveContext to different Hive warehouse?
I'm using Spark 1.3.0 RC3 build with Hive support.
In the Spark shell, I want to reuse the HiveContext instance with different
warehouse locations. Below are the steps for my test (assume I have loaded a
file into table src).
==
15/03/10 18:22:59 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
scala> sqlContext.sql("SET hive.metastore.warehouse.dir=/test/w")
scala> sqlContext.sql("SELECT * from src").saveAsTable("table1")
scala> sqlContext.sql("SET hive.metastore.warehouse.dir=/test/w2")
scala> sqlContext.sql("SELECT * from src").saveAsTable("table2")
==
After these steps, both tables are stored in /test/w only. I expected table2
to be stored in the /test/w2 folder.
Another question: if I set hive.metastore.warehouse.dir to an HDFS folder,
I cannot use saveAsTable(). Is this by design? The exception stack trace is
below:
==
15/03/10 18:35:28 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/03/10 18:35:28 INFO SparkContext: Created broadcast 0 from broadcast at TableReader.scala:74
java.lang.IllegalArgumentException: Wrong FS: hdfs://server:8020/space/warehouse/table2, expected: file:///
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:643)
    at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:463)
    at org.apache.hadoop.fs.FilterFileSystem.makeQualified(FilterFileSystem.java:118)
    at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:252)
    at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:251)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:251)
    at org.apache.spark.sql.parquet.ParquetRelation2.<init>(newParquet.scala:370)
    at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:96)
    at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:125)
    at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:308)
    at org.apache.spark.sql.hive.execution.CreateMetastoreDataSourceAsSelect.run(commands.scala:217)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:55)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:55)
    at org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:65)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1088)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1088)
    at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:1048)
    at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:998)
    at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:964)
    at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:942)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:20)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
    at $iwC$$iwC$$iwC.<init>(<console>:33)
    at $iwC$$iwC.<init>(<console>:35)
    at $iwC.<init>(<console>:37)
    at <init>(<console>:39)
==
Thank you very much!