Re: HiveContext is creating metastore warehouse locally instead of in hdfs
I used the Spark web UI and could see that the conf directory is in the CLASSPATH. One abnormal thing is that when I start spark-shell I always get the following message:

    WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

At first I thought it was because my Hadoop version was not compatible with the pre-built Spark: my Hadoop version is 2.4.1, and the pre-built Spark is built against Hadoop 2.2.0. So I built Spark from source against Hadoop 2.4.1; however, I still got the message above. Besides, when I set log4j.rootCategory to DEBUG, I got an exception saying HADOOP_HOME or hadoop.home.dir are not set, even though I have set HADOOP_HOME.

alee526 wrote:

Could you enable the HistoryServer and provide the properties and CLASSPATH for the spark-shell? And run 'env' to list your environment variables? By the way, what do the Spark logs say? Enable debug mode to see what's going on in spark-shell when it tries to interact with and init the HiveContext.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/HiveContext-is-creating-metastore-warehouse-locally-instead-of-in-hdfs-tp10838p11147.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
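Both symptoms above can be checked from the launching shell before starting spark-shell. A minimal sketch, assuming a hypothetical install path of /opt/hadoop-2.4.1:

```shell
# Hypothetical install path -- substitute your own Hadoop 2.4.1 location.
export HADOOP_HOME=/opt/hadoop-2.4.1

# The JVM derives hadoop.home.dir from HADOOP_HOME, so it must be
# exported in the environment that actually launches spark-shell.
echo "HADOOP_HOME=$HADOOP_HOME"

# The NativeCodeLoader warning is expected whenever no platform-native
# libhadoop is present under $HADOOP_HOME/lib/native.
if [ -e "$HADOOP_HOME/lib/native/libhadoop.so" ]; then
  echo "native-hadoop library found"
else
  echo "native-hadoop library missing: the WARN message is expected"
fi
```

Note that the warning itself is harmless: Spark just falls back to the built-in Java implementations, and it is unrelated to where the warehouse directory is created.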
Re: HiveContext is creating metastore warehouse locally instead of in hdfs
Could you enable the HistoryServer and provide the properties and CLASSPATH for the spark-shell? And run 'env' to list your environment variables? By the way, what do the Spark logs say? Enable debug mode to see what's going on in spark-shell when it tries to interact with and init the HiveContext.

On Jul 31, 2014, at 19:09, chenjie <chenjie2...@gmail.com> wrote:

Hi Yin and Andrew, thank you for your replies. When I create a table in the Hive CLI, it works correctly and the table appears in HDFS. I had forgotten to start hiveserver2 before; I started it today. Then I ran the command below:

    spark-shell --master spark://192.168.40.164:7077 --driver-class-path conf/hive-site.xml

Furthermore, I added the following command:

    hiveContext.hql("SET hive.metastore.warehouse.dir=hdfs://192.168.40.164:8020/user/hive/warehouse")

But that didn't work for me. I got the same exception as before and found the table file in a local directory instead of in HDFS.

Yin Huai-2 wrote:

Another way is to set hive.metastore.warehouse.dir explicitly to the HDFS directory storing Hive tables by using a SET command. For example:

    hiveContext.hql("SET hive.metastore.warehouse.dir=hdfs://localhost:54310/user/hive/warehouse")

On Thu, Jul 31, 2014 at 8:05 AM, Andrew Lee <alee526@...> wrote:

Hi All,
It has been a while, but what I did to make it work was to make sure of the following:
1. Hive is working when you run the Hive CLI and JDBC via hiveserver2.
2. Make sure you have the hive-site.xml from the above Hive configuration. The catch here is that you want the hive-site.xml for the Hive metastore; the files for Hive and for HCatalog may be different. Check the XML properties in that file and pick the one that has the warehouse property and the JDO setup configured.
3. Make sure the hive-site.xml from step 2 is included in $SPARK_HOME/conf, and in your runtime CLASSPATH, when you run spark-shell.
4. Use the history server to check the runtime CLASSPATH and its ordering, to ensure hive-site.xml is included.
HiveContext should pick up the hive-site.xml and talk to your running Hive service. Hope these tips help.

On Jul 30, 2014, at 22:47, chenjie <chenjie2001@...> wrote:

Hi, Michael. I have the same problem: my warehouse directory is always created locally. I copied the default hive-site.xml into the $SPARK_HOME/conf directory on each node. After I executed the code below:

    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    hiveContext.hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
    hiveContext.hql("LOAD DATA LOCAL INPATH '/extdisk2/tools/spark/examples/src/main/resources/kv1.txt' INTO TABLE src")
    hiveContext.hql("FROM src SELECT key, value").collect()

I got the exception below:

    java.io.FileNotFoundException: File file:/user/hive/warehouse/src/kv1.txt does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
        at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
        at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:763)
        at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:106)
        at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:193)

In the end, I found that /user/hive/warehouse/src/kv1.txt had been created on the node where I started spark-shell. The Spark I used is the pre-built Spark 1.0.1 for Hadoop 2. Thanks in advance.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/HiveContext-is-creating-metastore-warehouse-locally-instead-of-in-hdfs-tp10838p1.html
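For reference, here is a sketch of the sequence above with the string arguments quoted as they must be typed in the Scala shell (the archive stripped the quotes), combined with the SET workaround suggested earlier in the thread. The namenode URI is the one from this thread and is only illustrative:

```scala
// spark-shell (Spark 1.0.x) sketch -- `sc` is the shell's SparkContext.
// The namenode host and port below are illustrative; substitute your own.
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

// Point the warehouse at HDFS *before* creating the table: a managed
// table's location is recorded in the metastore when the table is
// created, so a SET issued afterwards does not move an existing table.
hiveContext.hql("SET hive.metastore.warehouse.dir=hdfs://192.168.40.164:8020/user/hive/warehouse")

hiveContext.hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
hiveContext.hql("LOAD DATA LOCAL INPATH '/extdisk2/tools/spark/examples/src/main/resources/kv1.txt' INTO TABLE src")
hiveContext.hql("FROM src SELECT key, value").collect()
```

If src was already created while the warehouse pointed at the local filesystem, DROP TABLE src and recreate it after the SET; otherwise the old file:/ location stays in the metastore, which would explain getting the same exception after the SET.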
RE: HiveContext is creating metastore warehouse locally instead of in hdfs
Thanks for the response... hive-site.xml is in the classpath, so that doesn't seem to be the issue.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/HiveContext-is-creating-metastore-warehouse-locally-instead-of-in-hdfs-tp10838p10871.html
Re: HiveContext is creating metastore warehouse locally instead of in hdfs
The warehouse and the metastore directories are two different things. The metastore holds the schema information about the tables and will by default be a local directory; with javax.jdo.option.ConnectionURL you can configure it to be something like MySQL. The warehouse directory is the default location where the actual contents of the tables are stored. What directory are you seeing created locally?

On Tue, Jul 29, 2014 at 10:49 AM, nikroy16 <nikro...@gmail.com> wrote:

Thanks for the response... hive-site.xml is in the classpath, so that doesn't seem to be the issue.
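To make the metastore/warehouse distinction concrete, here is a sketch of the two relevant hive-site.xml settings; the hostnames, ports, and database name are placeholders, not values from this thread:

```xml
<!-- hive-site.xml sketch; hosts, ports, and DB name are placeholders. -->
<configuration>
  <!-- Metastore: where table *schemas* live. The default is a local,
       embedded Derby database; point it at a shared DB such as MySQL. -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://metastore-host:3306/metastore</value>
  </property>
  <!-- Warehouse: where table *data* lives. Use an explicit hdfs:// URI
       so data is not written to the driver's local filesystem. -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://namenode-host:8020/user/hive/warehouse</value>
  </property>
</configuration>
```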
HiveContext is creating metastore warehouse locally instead of in hdfs
Hi,

Even though hive.metastore.warehouse.dir in hive-site.xml is set to the default /user/hive/warehouse and the permissions are correct in HDFS, HiveContext seems to be creating the metastore warehouse locally instead of in HDFS. After looking into the Spark code, I found the following in HiveContext.scala:

    /**
     * SQLConf and HiveConf contracts: when the hive session is first initialized, params in
     * HiveConf will get picked up by the SQLConf. Additionally, any properties set by
     * set() or a SET command inside hql() or sql() will be set in the SQLConf *as well as*
     * in the HiveConf.
     */
    @transient protected[hive] lazy val hiveconf = new HiveConf(classOf[SessionState])
    @transient protected[hive] lazy val sessionState = {
      val ss = new SessionState(hiveconf)
      set(hiveconf.getAllProperties)  // Have SQLConf pick up the initial set of HiveConf.
      ss
    }

It seems as though when a HiveContext is created, it is launched without any configuration, and hive-site.xml is not used to set properties. It looks like I can set properties after creation by using the hql() method, but what I am looking for is for the HiveContext to be initialized according to the configuration in hive-site.xml at the time of initialization. Any help would be greatly appreciated!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/HiveContext-is-creating-metastore-warehouse-locally-instead-of-in-hdfs-tp10838.html
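One way to see what configuration the HiveContext actually picked up is to issue a SET command with no value, which in Hive echoes the current setting. A sketch, with a placeholder namenode host; whether the echoed row appears in collect() output in your exact Spark build is an assumption worth verifying:

```scala
// spark-shell (Spark 1.0.x) sketch -- `sc` is the shell's SparkContext.
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

// A bare "SET key" echoes the current value, showing whether the
// hive-site.xml on the classpath was read at initialization.
hiveContext.hql("SET hive.metastore.warehouse.dir").collect().foreach(println)

// Per the HiveContext.scala comment quoted above, a SET issued through
// hql() lands in both the SQLConf and the HiveConf after creation.
// The namenode host and port here are placeholders.
hiveContext.hql("SET hive.metastore.warehouse.dir=hdfs://namenode-host:8020/user/hive/warehouse")
```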