Hi all,

It has been a while, but what I did to make it work was to make sure of the 
following:

1. Hive itself is working when you run the Hive CLI and JDBC queries via 
HiveServer2.

2. Make sure you have the hive-site.xml from the Hive configuration above. The 
catch is that you want the hive-site.xml used by the Hive metastore; the ones 
for Hive and HCatalog may be different files. Check the XML properties in each 
candidate and pick the file that has the warehouse property and the JDO setup 
configured (see the sample snippet after this list).

3. Make sure the hive-site.xml from step 2 is included in $SPARK_HOME/conf and 
in your runtime CLASSPATH when you run spark-shell.

4. Use the history server to check the runtime CLASSPATH and its ordering to 
ensure hive-site.xml is included (a quick in-shell check is sketched below).
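For reference, here is a minimal sketch of what the relevant properties in 
that hive-site.xml look like. The host names, port, and MySQL backend are 
placeholder assumptions; substitute whatever your metastore actually uses:

    <configuration>
      <!-- Warehouse location: should be an HDFS path for a shared warehouse -->
      <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>hdfs://namenode:8020/user/hive/warehouse</value>
      </property>
      <!-- JDO connection to the metastore database (MySQL as an example) -->
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://metastore-host:3306/hive</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
      </property>
    </configuration>

If those properties point at a local path and an embedded Derby database 
instead, you have the wrong file.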

HiveContext should then pick up that hive-site.xml and talk to your running 
Hive service.
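A quick way to sanity-check both points from spark-shell (just a sketch; 
"src" stands in for whatever table you created):

    // Should print the URL of the hive-site.xml that was picked up;
    // null means it is not on the driver classpath.
    println(Thread.currentThread.getContextClassLoader.getResource("hive-site.xml"))

    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    // In the output, the table's Location should be an hdfs:// URI,
    // not file:/, once the warehouse property is in effect.
    hiveContext.hql("DESCRIBE FORMATTED src").collect().foreach(println)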

Hope these tips help.

> On Jul 30, 2014, at 22:47, "chenjie" <chenjie2...@gmail.com> wrote:
> 
> Hi, Michael. I have the same problem: my warehouse directory is always
> created locally. I copied the default hive-site.xml into the
> $SPARK_HOME/conf directory on each node. After I executed the code below,
>    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>    hiveContext.hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
>    hiveContext.hql("LOAD DATA LOCAL INPATH '/extdisk2/tools/spark/examples/src/main/resources/kv1.txt' INTO TABLE src")
>    hiveContext.hql("FROM src SELECT key, value").collect()
> 
> I got the exception below:
> java.io.FileNotFoundException: File file:/user/hive/warehouse/src/kv1.txt does not exist
>    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520)
>    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
>    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
>    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
>    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:763)
>    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:106)
>    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
>    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:193)
> 
> In the end, I found that /user/hive/warehouse/src/kv1.txt had been created
> on the node where I started spark-shell.
> 
> The Spark I used is the pre-built Spark 1.0.1 for Hadoop 2.
> 
> Thanks in advance.
> 
> 
> Michael Armbrust wrote
>> The warehouse and the metastore directories are two different things. The
>> metastore holds the schema information about the tables and will by
>> default
>> be a local directory. With javax.jdo.option.ConnectionURL you can
>> configure it to be something like MySQL. The warehouse directory is the
>> default location where the actual contents of the tables are stored. What
>> directory are you seeing created locally?
