I am assuming you are pointing to the Hadoop/Spark installation on a remote host, right? Could you point your Hadoop conf and Spark directories at the remote machine? Not sure if this works, just suggesting; others may have tried it.
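One untested way to try that is to copy the cluster's client configuration down to the local machine and point HADOOP_CONF_DIR at the copy. This is only a sketch; the host name and paths below are placeholders, not values from this thread:

    # Untested sketch: pull the remote cluster's Hadoop client configs locally
    # so the YARN client on this machine can find the ResourceManager address.
    # "user@gateway-host" and the paths are placeholders.
    mkdir -p ~/remote-cluster-conf
    scp -r user@gateway-host:/etc/hadoop/conf/* ~/remote-cluster-conf/

    # Then, in Zeppelin's conf/zeppelin-env.sh, point at the local copy:
    export HADOOP_CONF_DIR="$HOME/remote-cluster-conf"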
On Wed, Nov 2, 2016 at 9:58 AM, Hyung Sung Shim <hss...@nflabs.com> wrote:

> Hello.
> You don't need to install Hadoop on your machine, but you do need a proper
> version of Spark [0] to use spark-submit. You can then set [1] SPARK_HOME to
> where that Spark lives, set HADOOP_CONF_DIR, and set the master to
> yarn-client for your Spark interpreter in the interpreter menu.
>
> [0] http://spark.apache.org/downloads.html
> [1] http://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/interpreter/spark.html#1-export-spark_home
> http://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/install/spark_cluster_mode.html#4-configure-spark-interpreter-in-zeppelin
>
> Hope this helps.
>
> 2016-11-02 19:06 GMT+09:00 Benoit Hanotte <benoit.h...@gmail.com>:
>
>> I have only set HADOOP_CONF_DIR, as follows (my hadoop conf files are in
>> /usr/local/lib/hadoop/etc/hadoop/, e.g. /usr/local/lib/hadoop/etc/hadoop/yarn-site.xml):
>>
>> #!/bin/bash
>> #
>> # Licensed to the Apache Software Foundation (ASF) under one or more
>> # contributor license agreements. See the NOTICE file distributed with
>> # this work for additional information regarding copyright ownership.
>> # The ASF licenses this file to You under the Apache License, Version 2.0
>> # (the "License"); you may not use this file except in compliance with
>> # the License. You may obtain a copy of the License at
>> #
>> #     http://www.apache.org/licenses/LICENSE-2.0
>> #
>> # Unless required by applicable law or agreed to in writing, software
>> # distributed under the License is distributed on an "AS IS" BASIS,
>> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>> # See the License for the specific language governing permissions and
>> # limitations under the License.
>> #
>>
>> # export JAVA_HOME=
>> # export MASTER=                  # Spark master url. eg. spark://master_addr:7077. Leave empty if you want to use local mode.
>> # export ZEPPELIN_JAVA_OPTS       # Additional jvm options. for example, export ZEPPELIN_JAVA_OPTS="-Dspark.executor.memory=8g -Dspark.cores.max=16"
>> # export ZEPPELIN_MEM             # Zeppelin jvm mem options Default -Xms1024m -Xmx1024m -XX:MaxPermSize=512m
>> # export ZEPPELIN_INTP_MEM        # zeppelin interpreter process jvm mem options. Default -Xms1024m -Xmx1024m -XX:MaxPermSize=512m
>> # export ZEPPELIN_INTP_JAVA_OPTS  # zeppelin interpreter process jvm options.
>> # export ZEPPELIN_SSL_PORT        # ssl port (used when ssl environment variable is set to true)
>>
>> # export ZEPPELIN_LOG_DIR         # Where log files are stored. PWD by default.
>> # export ZEPPELIN_PID_DIR         # The pid files are stored. ${ZEPPELIN_HOME}/run by default.
>> # export ZEPPELIN_WAR_TEMPDIR     # The location of jetty temporary directory.
>> # export ZEPPELIN_NOTEBOOK_DIR    # Where notebook saved
>> # export ZEPPELIN_NOTEBOOK_HOMESCREEN       # Id of notebook to be displayed in homescreen. ex) 2A94M5J1Z
>> # export ZEPPELIN_NOTEBOOK_HOMESCREEN_HIDE  # hide homescreen notebook from list when this value set to "true". default "false"
>> # export ZEPPELIN_NOTEBOOK_S3_BUCKET        # Bucket where notebook saved
>> # export ZEPPELIN_NOTEBOOK_S3_ENDPOINT      # Endpoint of the bucket
>> # export ZEPPELIN_NOTEBOOK_S3_USER          # User in bucket where notebook saved. For example bucket/user/notebook/2A94M5J1Z/note.json
>> # export ZEPPELIN_IDENT_STRING    # A string representing this instance of zeppelin. $USER by default.
>> # export ZEPPELIN_NICENESS        # The scheduling priority for daemons. Defaults to 0.
>> # export ZEPPELIN_INTERPRETER_LOCALREPO     # Local repository for interpreter's additional dependency loading
>> # export ZEPPELIN_NOTEBOOK_STORAGE          # Refers to pluggable notebook storage class, can have two classes simultaneously with a sync between them (e.g. local and remote).
>> # export ZEPPELIN_NOTEBOOK_ONE_WAY_SYNC     # If there are multiple notebook storages, should we treat the first one as the only source of truth?
>>
>> #### Spark interpreter configuration ####
>>
>> ## Use provided spark installation ##
>> ## defining SPARK_HOME makes Zeppelin run spark interpreter process using spark-submit
>> ##
>> # export SPARK_HOME               # (required) When it is defined, load it instead of Zeppelin embedded Spark libraries
>> # export SPARK_SUBMIT_OPTIONS     # (optional) extra options to pass to spark submit. eg) "--driver-memory 512M --executor-memory 1G".
>> # export SPARK_APP_NAME           # (optional) The name of spark application.
>>
>> ## Use embedded spark binaries ##
>> ## without SPARK_HOME defined, Zeppelin still able to run spark interpreter process using embedded spark binaries.
>> ## however, it is not encouraged when you can define SPARK_HOME
>> ##
>> # Options read in YARN client mode
>> export HADOOP_CONF_DIR = /usr/local/lib/hadoop/etc/hadoop/   # yarn-site.xml is located in configuration directory in HADOOP_CONF_DIR.
>> # Pyspark (supported with Spark 1.2.1 and above)
>> # To configure pyspark, you need to set spark distribution's path to 'spark.home' property in Interpreter setting screen in Zeppelin GUI
>> # export PYSPARK_PYTHON           # path to the python command. must be the same path on the driver(Zeppelin) and all workers.
>> # export PYTHONPATH
>>
>> ## Spark interpreter options ##
>> ##
>> # export ZEPPELIN_SPARK_USEHIVECONTEXT   # Use HiveContext instead of SQLContext if set true. true by default.
>> # export ZEPPELIN_SPARK_CONCURRENTSQL    # Execute multiple SQL concurrently if set true. false by default.
>> # export ZEPPELIN_SPARK_IMPORTIMPLICIT   # Import implicits, UDF collection, and sql if set true. true by default.
>> # export ZEPPELIN_SPARK_MAXRESULT        # Max number of Spark SQL result to display. 1000 by default.
>> # export ZEPPELIN_WEBSOCKET_MAX_TEXT_MESSAGE_SIZE   # Size in characters of the maximum text message to be received by websocket. Defaults to 1024000
>>
>>
>> #### HBase interpreter configuration ####
>>
>> ## To connect to HBase running on a cluster, either HBASE_HOME or HBASE_CONF_DIR must be set
>>
>> # export HBASE_HOME=              # (require) Under which HBase scripts and configuration should be
>> # export HBASE_CONF_DIR=          # (optional) Alternatively, configuration directory can be set to point to the directory that has hbase-site.xml
>>
>> #### ZeppelinHub connection configuration ####
>> # export ZEPPELINHUB_API_ADDRESS  # Refers to the address of the ZeppelinHub service in use
>> # export ZEPPELINHUB_API_TOKEN    # Refers to the Zeppelin instance token of the user
>> # export ZEPPELINHUB_USER_KEY     # Optional, when using Zeppelin with authentication.
>>
>>
>> I also tried simply /usr/local/lib/hadoop, and I also created a conf directory within /usr/local/lib/hadoop/etc/hadoop and placed yarn-site.xml in it.
>>
>> Thanks
>>
>> On Wed, Nov 2, 2016 at 10:06 AM, Hyung Sung Shim <hss...@nflabs.com> wrote:
>>
>>> Could you share your zeppelin-env.sh ?
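For reference, the relevant lines of a conf/zeppelin-env.sh aimed at a remote YARN cluster boil down to something like the sketch below. The HADOOP_CONF_DIR path is the one used above; the SPARK_HOME path is only a placeholder. Also note that bash does not allow spaces around '=' in an assignment, so if the spaces in the quoted file above are really in the file and not just an artifact of the email quoting, HADOOP_CONF_DIR is never actually set:

    # Sketch of the relevant conf/zeppelin-env.sh lines for YARN client mode.
    # The HADOOP_CONF_DIR path comes from this thread; SPARK_HOME is a placeholder.

    # No spaces around '=': "export HADOOP_CONF_DIR = /path" is a bash error
    # and leaves the variable unset.
    export HADOOP_CONF_DIR=/usr/local/lib/hadoop/etc/hadoop

    # A local Spark distribution matching the cluster's Spark and Hadoop
    # versions; with SPARK_HOME set, Zeppelin starts the Spark interpreter
    # through this distribution's spark-submit.
    export SPARK_HOME=/usr/local/lib/spark

    # Optional extras passed straight to spark-submit.
    # export SPARK_SUBMIT_OPTIONS="--driver-memory 512M --executor-memory 1G"

The interpreter's master property is then set to yarn-client in the Spark interpreter settings in the Zeppelin UI, as described in the docs linked above.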
>>> On Wed, Nov 2, 2016 at 4:57 PM, Benoit Hanotte <benoit.h...@gmail.com> wrote:
>>>
>>>> Thanks for your reply,
>>>> I have tried setting it in zeppelin-env.sh, but it doesn't work any better.
>>>>
>>>> Thanks
>>>>
>>>> On Wed, Nov 2, 2016 at 2:13 AM, Hyung Sung Shim <hss...@nflabs.com> wrote:
>>>>
>>>> Hello.
>>>> You should set HADOOP_CONF_DIR to /usr/local/lib/hadoop/etc/hadoop/ in conf/zeppelin-env.sh.
>>>> Thanks.
>>>>
>>>> On Wed, Nov 2, 2016 at 5:07 AM, Benoit Hanotte <benoit.h...@gmail.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I'd like to run Zeppelin on my local computer and use it to run Spark executors on a remote YARN cluster, since I can't easily install Zeppelin on the cluster gateway.
>>>>
>>>> I installed the correct Hadoop version (2.6) and compiled Zeppelin (from the master branch) as follows:
>>>>
>>>> mvn clean package -DskipTests -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.5.0 -Pyarn -Pspark-2.0 -Pscala-2.11
>>>>
>>>> I also set HADOOP_HOME_DIR to /usr/local/lib/hadoop, where my Hadoop is installed (I also tried /usr/local/lib/hadoop/etc/hadoop/, where the conf files such as yarn-site.xml are). I set yarn.resourcemanager.hostname to the resource manager of the cluster (I copied the value from the config file on the cluster), but when I run a Spark command it still tries to connect to 0.0.0.0:8032, as one can see in the logs:
>>>>
>>>> INFO [2016-11-01 20:48:26,581] ({pool-2-thread-2} Client.java[handleConnectionFailure]:862) - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
>>>>
>>>> Am I missing something? Are there any additional parameters to set?
>>>>
>>>> Thanks!
>>>>
>>>> Benoit

--
Abhi Basu
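A side note on the 0.0.0.0:8032 retries: 8032 is the default ResourceManager port and 0.0.0.0 is the default yarn.resourcemanager.hostname, so that log line usually means the YARN client never saw the cluster's yarn-site.xml at all (HADOOP_CONF_DIR unset or not on the interpreter's classpath), rather than a wrong hostname value. A quick check from the machine running Zeppelin might look like the sketch below; the path is the one used in this thread:

    # Check that the ResourceManager settings are actually present in the
    # directory HADOOP_CONF_DIR is supposed to point to:
    grep -A 1 'yarn.resourcemanager' /usr/local/lib/hadoop/etc/hadoop/yarn-site.xml

    # Check that the shell Zeppelin is started from really exports the variable
    # (an unset variable, e.g. from a bad assignment, shows up here):
    echo "HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-<not set>}"

Also, zeppelin-env.sh is only read when the Zeppelin daemon starts, so a restart (bin/zeppelin-daemon.sh restart) is needed after changing it.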