Re: HiveContext is creating metastore warehouse locally instead of in hdfs

2014-08-01 Thread chenjie
I used the web ui of spark and could see the conf directory is in CLASSPATH.
One odd thing is that whenever I start spark-shell I get the following
info:
WARN NativeCodeLoader: Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable

At first I thought it was because my Hadoop version is not compatible with the
pre-built Spark: my Hadoop version is 2.4.1 and the pre-built Spark is built
against Hadoop 2.2.0. So I built Spark from source against Hadoop 2.4.1.
However, I still got the info above.

Besides, when I set log4j.rootCategory to DEBUG, I got an exception which
said HADOOP_HOME or hadoop.home.dir are not set, even though I have set
HADOOP_HOME.
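For what it's worth, the NativeCodeLoader warning usually just means the JVM cannot find libhadoop.so; it is harmless, but it can often be silenced by pointing the library path at Hadoop's native libraries. A minimal sketch, assuming a Hadoop install at /opt/hadoop-2.4.1 (a placeholder path, not taken from this thread):

```shell
# Sketch: point the loader at Hadoop's native libraries before starting spark-shell.
# /opt/hadoop-2.4.1 is a placeholder; substitute your real Hadoop home.
export HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop-2.4.1}"
export LD_LIBRARY_PATH="$HADOOP_HOME/lib/native${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "$LD_LIBRARY_PATH"
```

If the warning persists after this, the bundled native libraries may have been built for a different architecture than your platform.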



alee526 wrote
 Could you enable HistoryServer and provide the properties and CLASSPATH
 for the spark-shell? And 'env' command to list your environment variables?
 
 By the way, what does the spark logs says? Enable debug mode to see what's
 going on in spark-shell when it tries to interact and init HiveContext.





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/HiveContext-is-creating-metastore-warehouse-locally-instead-of-in-hdfs-tp10838p11147.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: HiveContext is creating metastore warehouse locally instead of in hdfs

2014-07-31 Thread Andrew Lee
Could you enable HistoryServer and provide the properties and CLASSPATH for the 
spark-shell? And 'env' command to list your environment variables?

By the way, what do the Spark logs say? Enable debug mode to see what's 
going on in spark-shell when it tries to interact with and init HiveContext.
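One way to enable debug mode, assuming the stock Spark 1.x log4j setup — this fragment is a sketch of conf/log4j.properties, not a file from this thread:

```properties
# conf/log4j.properties -- raise root logging to DEBUG so HiveContext's
# metastore/warehouse resolution shows up in the spark-shell output.
log4j.rootCategory=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```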



 On Jul 31, 2014, at 19:09, chenjie chenjie2...@gmail.com wrote:
 
 Hi, Yin and Andrew, thank you for your reply.
 When I create a table in the Hive CLI, it works correctly and the table can be
 found in HDFS. I had forgotten to start hiveserver2 before; I started it today.
 Then I ran the command below:
    spark-shell --master spark://192.168.40.164:7077 --driver-class-path conf/hive-site.xml
 Furthermore, I added the following command:
    hiveContext.hql("SET hive.metastore.warehouse.dir=hdfs://192.168.40.164:8020/user/hive/warehouse")
 But it didn't work for me. I got the same exception as before and found
 the table file in a local directory instead of HDFS.
 
 
 Yin Huai-2 wrote
 Another way is to set hive.metastore.warehouse.dir explicitly to the
 HDFS
 dir storing Hive tables by using SET command. For example:
 
 hiveContext.hql("SET hive.metastore.warehouse.dir=hdfs://localhost:54310/user/hive/warehouse")
 
 
 
 
 On Thu, Jul 31, 2014 at 8:05 AM, Andrew Lee <alee526@...> wrote:
 
 Hi All,
 
 It has been a while, but what I did to make it work was to make sure of the
 following:
 
 1. Hive is working when you run Hive CLI and JDBC via Hiveserver2
 
 2. Make sure you have the hive-site.xml from the above Hive configuration. The
 catch here is that you want the hive-site.xml for the Hive metastore;
 the ones for Hive and HCatalog may be different files. Check the XML
 properties in each file and pick the one that has the warehouse
 property and the JDO setup configured.
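As a sketch of what to look for in step 2 — the hostnames and ports below are placeholders, not values from this thread — the metastore copy of hive-site.xml should carry both of these properties:

```xml
<!-- hive-site.xml (sketch; substitute your own hosts and ports) -->
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>hdfs://namenode:8020/user/hive/warehouse</value>
</property>
<property>
  <!-- the JDO setup: where the metastore database lives -->
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://metastore-host:3306/metastore</value>
</property>
```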
 
 3. Make sure hive-site.xml from step 2 is included in $SPARK_HOME/conf,
 and in your runtime CLASSPATH when you run spark-shell
 
 4. Use the history server to check the runtime CLASSPATH and its ordering, to
 ensure hive-site.xml is included.
 
 HiveContext should pick up the hive-site.xml and talk to your running
 hive
 service.
 
 Hope these tips help.
 
 On Jul 30, 2014, at 22:47, chenjie <chenjie2001@...> wrote:
 
 Hi, Michael. I have the same problem. My warehouse directory is always
 created locally. I copied the default hive-site.xml into the
 $SPARK_HOME/conf directory on each node. After I executed the code below,
    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    hiveContext.hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
    hiveContext.hql("LOAD DATA LOCAL INPATH '/extdisk2/tools/spark/examples/src/main/resources/kv1.txt' INTO TABLE src")
    hiveContext.hql("FROM src SELECT key, value").collect()
 
 I got the exception below:
 java.io.FileNotFoundException: File file:/user/hive/warehouse/src/kv1.txt does not exist
   at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520)
   at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
   at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
   at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:763)
   at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:106)
   at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
   at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:193)
 
 In the end, I found that /user/hive/warehouse/src/kv1.txt was created on the
 node where I started spark-shell.
 
 The Spark I used is the pre-built Spark 1.0.1 for Hadoop 2.
 
 Thanks in advance.
 
 
 
 
 
 --
 View this message in context: 
 http://apache-spark-user-list.1001560.n3.nabble.com/HiveContext-is-creating-metastore-warehouse-locally-instead-of-in-hdfs-tp10838p1.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.


RE: HiveContext is creating metastore warehouse locally instead of in hdfs

2014-07-29 Thread nikroy16
Thanks for the response... hive-site.xml is in the classpath so that doesn't
seem to be the issue.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/HiveContext-is-creating-metastore-warehouse-locally-instead-of-in-hdfs-tp10838p10871.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: HiveContext is creating metastore warehouse locally instead of in hdfs

2014-07-29 Thread Michael Armbrust
The warehouse and the metastore directories are two different things.  The
metastore holds the schema information about the tables and will by default
be a local directory.  With javax.jdo.option.ConnectionURL you can
configure it to be something like MySQL.  The warehouse directory is the
default location where the actual contents of the tables are stored.  What
directory are you seeing created locally?
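To illustrate the distinction — the values below are illustrative, not from this thread — the metastore database location is governed by the JDO connection URL, which defaults to an embedded Derby database in a local directory (metastore_db):

```xml
<!-- hive-site.xml sketch: relocating the metastore database.
     Default is local embedded Derby:
       jdbc:derby:;databaseName=metastore_db;create=true -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://metastore-host:3306/metastore?createDatabaseIfNotExist=true</value>
</property>
```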


On Tue, Jul 29, 2014 at 10:49 AM, nikroy16 nikro...@gmail.com wrote:

 Thanks for the response... hive-site.xml is in the classpath so that
 doesn't
 seem to be the issue.



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/HiveContext-is-creating-metastore-warehouse-locally-instead-of-in-hdfs-tp10838p10871.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.



HiveContext is creating metastore warehouse locally instead of in hdfs

2014-07-28 Thread nikroy16
Hi,

Even though hive.metastore.warehouse.dir in hive-site.xml is set to the
default /user/hive/warehouse and the permissions are correct in HDFS,
HiveContext seems to be creating the metastore warehouse locally instead of
in HDFS. After looking into the Spark code, I found the following in
HiveContext.scala:

  /**
   * SQLConf and HiveConf contracts: when the hive session is first initialized, params in
   * HiveConf will get picked up by the SQLConf.  Additionally, any properties set by
   * set() or a SET command inside hql() or sql() will be set in the SQLConf *as well as*
   * in the HiveConf.
   */
  @transient protected[hive] lazy val hiveconf = new HiveConf(classOf[SessionState])

  @transient protected[hive] lazy val sessionState = {
    val ss = new SessionState(hiveconf)
    set(hiveconf.getAllProperties) // Have SQLConf pick up the initial set of HiveConf.
    ss
  }


It seems as though when a HiveContext is created, it is launched without any
configuration, and hive-site.xml is not used to set properties. It looks like
I can set properties after creation by using the hql() method, but what I am
looking for is for the HiveContext to be initialized according to the
configuration in hive-site.xml at the time of initialization. Any help would
be greatly appreciated!
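The resolution elsewhere in this thread boils down to making hive-site.xml visible on spark-shell's classpath by placing it in $SPARK_HOME/conf. A hedged sketch of that step — every path below is a placeholder, and the generated XML is a stand-in for your real metastore hive-site.xml:

```shell
# Place a hive-site.xml where spark-shell will find it: $SPARK_HOME/conf.
# SPARK_HOME falls back to a scratch dir here purely so the sketch is runnable.
SPARK_HOME="${SPARK_HOME:-$(mktemp -d)}"
mkdir -p "$SPARK_HOME/conf"
# Stand-in for copying your actual metastore hive-site.xml:
cat > "$SPARK_HOME/conf/hive-site.xml" <<'EOF'
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://namenode:8020/user/hive/warehouse</value>
  </property>
</configuration>
EOF
echo "installed $SPARK_HOME/conf/hive-site.xml"
```

In practice you would copy your real metastore hive-site.xml rather than generate one, and/or pass --driver-class-path pointing at the directory containing it when launching spark-shell.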





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/HiveContext-is-creating-metastore-warehouse-locally-instead-of-in-hdfs-tp10838.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.