I don't know a lot about how pyspark works. Can you possibly try running spark-shell and do the same?
sqlContext.sql("show databases").collect Deenar On 29 October 2015 at 14:18, Zoltan Fedor <zoltan.0.fe...@gmail.com> wrote: > Yes, I am. It was compiled with the following: > > export SPARK_HADOOP_VERSION=2.5.0-cdh5.3.3 > export SPARK_YARN=true > export SPARK_HIVE=true > export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M > -XX:ReservedCodeCacheSize=512m" > mvn -Pyarn -Phadoop-2.5 -Dhadoop.version=2.5.0-cdh5.3.3 -Phive > -Phive-thriftserver -DskipTests clean package > > On Thu, Oct 29, 2015 at 10:16 AM, Deenar Toraskar < > deenar.toras...@gmail.com> wrote: > >> Are you using Spark built with hive ? >> >> # Apache Hadoop 2.4.X with Hive 13 support >> mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver >> -DskipTests clean package >> >> >> On 29 October 2015 at 13:08, Zoltan Fedor <zoltan.0.fe...@gmail.com> >> wrote: >> >>> Hi Deenar, >>> As suggested, I have moved the hive-site.xml from HADOOP_CONF_DIR >>> ($SPARK_HOME/hadoop-conf) to YARN_CONF_DIR ($SPARK_HOME/conf/yarn-conf) and >>> use the below to start pyspark, but the error is the exact same as before. >>> >>> $ HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf >>> YARN_CONF_DIR=$SPARK_HOME/conf/yarn-conf HADOOP_USER_NAME=biapp MASTER=yarn >>> $SPARK_HOME/bin/pyspark --deploy-mode client >>> >>> Python 2.6.6 (r266:84292, Jul 23 2015, 05:13:40) >>> [GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2 >>> Type "help", "copyright", "credits" or "license" for more information. >>> SLF4J: Class path contains multiple SLF4J bindings. >>> SLF4J: Found binding in >>> [jar:file:/usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar!/org/slf4j/impl/StaticLoggerBinder.class] >>> SLF4J: Found binding in >>> [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] >>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an >>> explanation. >>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] >>> 15/10/29 09:06:36 WARN MetricsSystem: Using default name DAGScheduler >>> for source because spark.app.id is not set. >>> 15/10/29 09:06:38 WARN NativeCodeLoader: Unable to load native-hadoop >>> library for your platform... using builtin-java classes where applicable >>> 15/10/29 09:07:03 WARN HiveConf: HiveConf of name hive.metastore.local >>> does not exist >>> Welcome to >>> ____ __ >>> / __/__ ___ _____/ /__ >>> _\ \/ _ \/ _ `/ __/ '_/ >>> /__ / .__/\_,_/_/ /_/\_\ version 1.5.1 >>> /_/ >>> >>> Using Python version 2.6.6 (r266:84292, Jul 23 2015 05:13:40) >>> SparkContext available as sc, HiveContext available as sqlContext. >>> >>> sqlContext2 = HiveContext(sc) >>> >>> sqlContext2 = HiveContext(sc) >>> >>> sqlContext2.sql("show databases").first() >>> 15/10/29 09:07:34 WARN HiveConf: HiveConf of name hive.metastore.local >>> does not exist >>> 15/10/29 09:07:35 WARN ShellBasedUnixGroupsMapping: got exception trying >>> to get groups for user biapp: id: biapp: No such user >>> >>> 15/10/29 09:07:35 WARN UserGroupInformation: No groups available for >>> user biapp >>> Traceback (most recent call last): >>> File "<stdin>", line 1, in <module> >>> File >>> "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py", >>> line 552, in sql >>> return DataFrame(self._ssql_ctx.sql(sqlQuery), self) >>> File >>> "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py", >>> line 660, in _ssql_ctx >>> "build/sbt assembly", e) >>> Exception: ("You must build Spark with Hive. 
>>> On Thu, Oct 29, 2015 at 7:20 AM, Deenar Toraskar
>>> <deenar.toras...@gmail.com> wrote:
>>>
>>>> Hi Zoltan
>>>>
>>>> Add hive-site.xml to your YARN_CONF_DIR, i.e.
>>>> $SPARK_HOME/conf/yarn-conf
>>>>
>>>> Deenar
>>>>
>>>> Think Reactive Ltd
>>>> deenar.toras...@thinkreactive.co.uk
>>>> 07714140812
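
Placing hive-site.xml in a conf directory works because the pyspark and spark-shell launch scripts put $SPARK_HOME/conf, HADOOP_CONF_DIR and YARN_CONF_DIR on the driver's classpath, and Hive's HiveConf loads hive-site.xml from the classpath. A quick sketch, again assuming the pyspark shell above, to confirm the driver JVM can actually see the file:

    # Returns the hive-site.xml the driver JVM would load, or None if the
    # file is not visible on the classpath (in which case Hive silently
    # falls back to a local, empty Derby metastore).
    loader = sc._jvm.Thread.currentThread().getContextClassLoader()
    url = loader.getResource("hive-site.xml")
    print(url.toString() if url is not None
          else "hive-site.xml not on driver classpath")
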
>>>>
>>>> On 28 October 2015 at 14:28, Zoltan Fedor <zoltan.0.fe...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>> We have a shared CDH 5.3.3 cluster and are trying to use Spark 1.5.1
>>>>> on it in YARN client mode with Hive.
>>>>>
>>>>> I have compiled Spark 1.5.1 with SPARK_HIVE=true, but it seems I am
>>>>> not able to make Spark SQL pick up the hive-site.xml when running
>>>>> pyspark.
>>>>>
>>>>> hive-site.xml is located in $SPARK_HOME/hadoop-conf/hive-site.xml and
>>>>> also in $SPARK_HOME/conf/hive-site.xml
>>>>>
>>>>> When I start pyspark with the command below and then run some simple
>>>>> Spark SQL, it fails; it seems it didn't pick up the settings in
>>>>> hive-site.xml.
>>>>>
>>>>> $ HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf \
>>>>>   YARN_CONF_DIR=$SPARK_HOME/yarn-conf \
>>>>>   HADOOP_USER_NAME=biapp MASTER=yarn \
>>>>>   $SPARK_HOME/bin/pyspark --deploy-mode client
>>>>>
>>>>> Python 2.6.6 (r266:84292, Jul 23 2015, 05:13:40)
>>>>> [GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2
>>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> SLF4J: Class path contains multiple SLF4J bindings.
>>>>> SLF4J: Found binding in
>>>>> [jar:file:/usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>> SLF4J: Found binding in
>>>>> [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>>>>> explanation.
>>>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>>>> 15/10/28 10:22:33 WARN MetricsSystem: Using default name DAGScheduler
>>>>> for source because spark.app.id is not set.
>>>>> 15/10/28 10:22:35 WARN NativeCodeLoader: Unable to load native-hadoop
>>>>> library for your platform... using builtin-java classes where applicable
>>>>> 15/10/28 10:22:59 WARN HiveConf: HiveConf of name hive.metastore.local
>>>>> does not exist
>>>>> Welcome to
>>>>>       ____              __
>>>>>      / __/__  ___ _____/ /__
>>>>>     _\ \/ _ \/ _ `/ __/  '_/
>>>>>    /__ / .__/\_,_/_/ /_/\_\   version 1.5.1
>>>>>       /_/
>>>>>
>>>>> Using Python version 2.6.6 (r266:84292, Jul 23 2015 05:13:40)
>>>>> SparkContext available as sc, HiveContext available as sqlContext.
>>>>> >>> sqlContext2 = HiveContext(sc)
>>>>> >>> sqlContext2.sql("show databases").first()
>>>>> 15/10/28 10:23:12 WARN HiveConf: HiveConf of name hive.metastore.local
>>>>> does not exist
>>>>> 15/10/28 10:23:13 WARN ShellBasedUnixGroupsMapping: got exception
>>>>> trying to get groups for user biapp: id: biapp: No such user
>>>>> 15/10/28 10:23:13 WARN UserGroupInformation: No groups available for
>>>>> user biapp
>>>>> Traceback (most recent call last):
>>>>>   File "<stdin>", line 1, in <module>
>>>>>   File "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py",
>>>>>     line 552, in sql
>>>>>     return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
>>>>>   File "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py",
>>>>>     line 660, in _ssql_ctx
>>>>>     "build/sbt assembly", e)
>>>>> Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true'
>>>>> and run build/sbt assembly", Py4JJavaError(u'An error occurred while
>>>>> calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject
>>>>> id=o20))
>>>>>
>>>>> Note the warning "HiveConf of name hive.metastore.local does not
>>>>> exist" above, even though there is a hive.metastore.local property in
>>>>> the hive-site.xml.
>>>>>
>>>>> Any idea how to submit hive-site.xml in yarn client mode?
>>>>>
>>>>> Thanks
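
On the recurring "HiveConf of name hive.metastore.local does not exist" warning: hive.metastore.local was removed in Hive 0.10, so the Hive 1.2.x classes bundled with Spark 1.5 warn about it and ignore it; whether the metastore is local or remote is decided purely by hive.metastore.uris. A minimal sketch of pointing pyspark at the remote metastore explicitly, where the thrift host and port are placeholders to be copied from the cluster's hive-site.xml, and where the setting must be applied before the first metastore call to have any effect:

    from pyspark.sql import HiveContext

    hc = HiveContext(sc)
    # Placeholder URI; use the real value from the cluster's hive-site.xml.
    hc.setConf("hive.metastore.uris", "thrift://metastore-host:9083")
    hc.sql("show databases").show()
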