I don't know a lot about how pyspark works. Can you possibly try running spark-shell and do the same?
sqlContext.sql("show databases").collect Deenar On 29 October 2015 at 14:18, Zoltan Fedor <zoltan.0.fe...@gmail.com> wrote: > Yes, I am. It was compiled with the following: > > export SPARK_HADOOP_VERSION=2.5.0-cdh5.3.3 > export SPARK_YARN=true > export SPARK_HIVE=true > export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M > -XX:ReservedCodeCacheSize=512m" > mvn -Pyarn -Phadoop-2.5 -Dhadoop.version=2.5.0-cdh5.3.3 -Phive > -Phive-thriftserver -DskipTests clean package > > On Thu, Oct 29, 2015 at 10:16 AM, Deenar Toraskar < > deenar.toras...@gmail.com> wrote: > >> Are you using Spark built with hive ? >> >> # Apache Hadoop 2.4.X with Hive 13 support >> mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver >> -DskipTests clean package >> >> >> On 29 October 2015 at 13:08, Zoltan Fedor <zoltan.0.fe...@gmail.com> >> wrote: >> >>> Hi Deenar, >>> As suggested, I have moved the hive-site.xml from HADOOP_CONF_DIR >>> ($SPARK_HOME/hadoop-conf) to YARN_CONF_DIR ($SPARK_HOME/conf/yarn-conf) and >>> use the below to start pyspark, but the error is the exact same as before. >>> >>> $ HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf >>> YARN_CONF_DIR=$SPARK_HOME/conf/yarn-conf HADOOP_USER_NAME=biapp MASTER=yarn >>> $SPARK_HOME/bin/pyspark --deploy-mode client >>> >>> Python 2.6.6 (r266:84292, Jul 23 2015, 05:13:40) >>> [GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2 >>> Type "help", "copyright", "credits" or "license" for more information. >>> SLF4J: Class path contains multiple SLF4J bindings. >>> SLF4J: Found binding in >>> [jar:file:/usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar!/org/slf4j/impl/StaticLoggerBinder.class] >>> SLF4J: Found binding in >>> [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] >>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an >>> explanation. >>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] >>> 15/10/29 09:06:36 WARN MetricsSystem: Using default name DAGScheduler >>> for source because spark.app.id is not set. >>> 15/10/29 09:06:38 WARN NativeCodeLoader: Unable to load native-hadoop >>> library for your platform... using builtin-java classes where applicable >>> 15/10/29 09:07:03 WARN HiveConf: HiveConf of name hive.metastore.local >>> does not exist >>> Welcome to >>> ____ __ >>> / __/__ ___ _____/ /__ >>> _\ \/ _ \/ _ `/ __/ '_/ >>> /__ / .__/\_,_/_/ /_/\_\ version 1.5.1 >>> /_/ >>> >>> Using Python version 2.6.6 (r266:84292, Jul 23 2015 05:13:40) >>> SparkContext available as sc, HiveContext available as sqlContext. >>> >>> sqlContext2 = HiveContext(sc) >>> >>> sqlContext2 = HiveContext(sc) >>> >>> sqlContext2.sql("show databases").first() >>> 15/10/29 09:07:34 WARN HiveConf: HiveConf of name hive.metastore.local >>> does not exist >>> 15/10/29 09:07:35 WARN ShellBasedUnixGroupsMapping: got exception trying >>> to get groups for user biapp: id: biapp: No such user >>> >>> 15/10/29 09:07:35 WARN UserGroupInformation: No groups available for >>> user biapp >>> Traceback (most recent call last): >>> File "<stdin>", line 1, in <module> >>> File >>> "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py", >>> line 552, in sql >>> return DataFrame(self._ssql_ctx.sql(sqlQuery), self) >>> File >>> "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py", >>> line 660, in _ssql_ctx >>> "build/sbt assembly", e) >>> Exception: ("You must build Spark with Hive. 
>>> On Thu, Oct 29, 2015 at 7:20 AM, Deenar Toraskar
>>> <deenar.toras...@gmail.com> wrote:
>>>
>>>> Hi Zoltan
>>>>
>>>> Add hive-site.xml to your YARN_CONF_DIR, i.e.
>>>> $SPARK_HOME/conf/yarn-conf
>>>>
>>>> Deenar
>>>>
>>>> Think Reactive Ltd
>>>> deenar.toras...@thinkreactive.co.uk
>>>> 07714140812
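
Placing hive-site.xml in a conf directory works because the pyspark and spark-shell launch scripts put $SPARK_HOME/conf, HADOOP_CONF_DIR and YARN_CONF_DIR on the driver's classpath, and Hive's HiveConf loads hive-site.xml from the classpath. A quick sketch, again assuming the pyspark shell above, to confirm the driver JVM can actually see the file:

    # Returns the hive-site.xml the driver JVM would load, or None if the
    # file is not visible on the classpath (in which case Hive silently
    # falls back to a local, empty Derby metastore).
    loader = sc._jvm.Thread.currentThread().getContextClassLoader()
    url = loader.getResource("hive-site.xml")
    print(url.toString() if url is not None
          else "hive-site.xml not on driver classpath")
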
>>>>
>>>> On 28 October 2015 at 14:28, Zoltan Fedor <zoltan.0.fe...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>> We have a shared CDH 5.3.3 cluster and are trying to use Spark 1.5.1
>>>>> on it in YARN client mode with Hive.
>>>>>
>>>>> I have compiled Spark 1.5.1 with SPARK_HIVE=true, but it seems I am
>>>>> not able to make Spark SQL pick up the hive-site.xml when running
>>>>> pyspark.
>>>>>
>>>>> hive-site.xml is located in $SPARK_HOME/hadoop-conf/hive-site.xml and
>>>>> also in $SPARK_HOME/conf/hive-site.xml
>>>>>
>>>>> When I start pyspark with the command below and then run some simple
>>>>> Spark SQL, it fails; it seems it didn't pick up the settings in
>>>>> hive-site.xml.
>>>>>
>>>>> $ HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf \
>>>>>   YARN_CONF_DIR=$SPARK_HOME/yarn-conf \
>>>>>   HADOOP_USER_NAME=biapp MASTER=yarn \
>>>>>   $SPARK_HOME/bin/pyspark --deploy-mode client
>>>>>
>>>>> Python 2.6.6 (r266:84292, Jul 23 2015, 05:13:40)
>>>>> [GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2
>>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> SLF4J: Class path contains multiple SLF4J bindings.
>>>>> SLF4J: Found binding in
>>>>> [jar:file:/usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>> SLF4J: Found binding in
>>>>> [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>>>>> explanation.
>>>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>>>> 15/10/28 10:22:33 WARN MetricsSystem: Using default name DAGScheduler
>>>>> for source because spark.app.id is not set.
>>>>> 15/10/28 10:22:35 WARN NativeCodeLoader: Unable to load native-hadoop
>>>>> library for your platform... using builtin-java classes where applicable
>>>>> 15/10/28 10:22:59 WARN HiveConf: HiveConf of name hive.metastore.local
>>>>> does not exist
>>>>> Welcome to
>>>>>       ____              __
>>>>>      / __/__  ___ _____/ /__
>>>>>     _\ \/ _ \/ _ `/ __/  '_/
>>>>>    /__ / .__/\_,_/_/ /_/\_\   version 1.5.1
>>>>>       /_/
>>>>>
>>>>> Using Python version 2.6.6 (r266:84292, Jul 23 2015 05:13:40)
>>>>> SparkContext available as sc, HiveContext available as sqlContext.
>>>>> >>> sqlContext2 = HiveContext(sc)
>>>>> >>> sqlContext2.sql("show databases").first()
>>>>> 15/10/28 10:23:12 WARN HiveConf: HiveConf of name hive.metastore.local
>>>>> does not exist
>>>>> 15/10/28 10:23:13 WARN ShellBasedUnixGroupsMapping: got exception
>>>>> trying to get groups for user biapp: id: biapp: No such user
>>>>> 15/10/28 10:23:13 WARN UserGroupInformation: No groups available for
>>>>> user biapp
>>>>> Traceback (most recent call last):
>>>>>   File "<stdin>", line 1, in <module>
>>>>>   File "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py",
>>>>>     line 552, in sql
>>>>>     return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
>>>>>   File "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py",
>>>>>     line 660, in _ssql_ctx
>>>>>     "build/sbt assembly", e)
>>>>> Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true'
>>>>> and run build/sbt assembly", Py4JJavaError(u'An error occurred while
>>>>> calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject
>>>>> id=o20))
>>>>>
>>>>> Note the warning "HiveConf of name hive.metastore.local does not
>>>>> exist" above, even though there is a hive.metastore.local property in
>>>>> the hive-site.xml.
>>>>>
>>>>> Any idea how to submit hive-site.xml in yarn client mode?
>>>>>
>>>>> Thanks
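
On the recurring "HiveConf of name hive.metastore.local does not exist" warning: hive.metastore.local was removed in Hive 0.10, so the Hive 1.2.x classes bundled with Spark 1.5 warn about it and ignore it; whether the metastore is local or remote is decided purely by hive.metastore.uris. A minimal sketch of pointing pyspark at the remote metastore explicitly, where the thrift host and port are placeholders to be copied from the cluster's hive-site.xml, and where the setting must be applied before the first metastore call to have any effect:

    from pyspark.sql import HiveContext

    hc = HiveContext(sc)
    # Placeholder URI; use the real value from the cluster's hive-site.xml.
    hc.setConf("hive.metastore.uris", "thrift://metastore-host:9083")
    hc.sql("show databases").show()
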