Hi!
A couple of days ago I tested Zeppelin on my laptop: Cloudera Hadoop in pseudo-distributed mode with Spark Standalone. I ran into the fasterxml.jackson problem. Eric Charles said that he had had a similar problem and advised removing the jackson-*.jar libraries from the lib folder. So I did. I also fiddled with the parameters in zeppelin-env.sh to make Zeppelin work locally.
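Concretely, the removal was roughly this, with the jars stashed aside rather than deleted (the backup path is just what I happened to use):

mkdir -p /tmp/jackson-backup
mv $ZEPPELIN_HOME/lib/jackson-*.jar /tmp/jackson-backup/

That made it easy to copy them back later.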

On Monday, when I came to work, it became clear that the configuration parameters for a local installation and for a real cluster installation differ greatly. And I got this Thrift Transport Exception. Over two days I rebuilt Zeppelin several times, checked all the parameters, and checked & changed my network. Finally, when I received your letter, I checked the MASTER variable. And I remembered those deleted *.jar files: I thought they might be the missing links in the chain, so I copied them back into the lib folder. And Spark began to work! But Spark SQL doesn't work: DataFrames can't load & write ORC files. I get some HiveContext error related to the metastore_db (Derby). Either Hive itself (which sits on the same edge node as Zeppelin) has its own Derby metastore_db, or I should delete the metastore_db from $ZEPPELIN_HOME/bin. Should I?
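
One thing I plan to try, though it is only my assumption: copy the cluster's hive-site.xml into Zeppelin's conf directory, so the HiveContext talks to the real Hive metastore instead of spinning up its own local Derby metastore_db:

cp /etc/hive/conf/hive-site.xml $ZEPPELIN_HOME/conf/

(/etc/hive/conf is where Cloudera usually keeps it; adjust if yours differs.)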
The code is:
%spark
import org.apache.spark.sql._
// create a HiveContext on top of the existing SparkContext
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

The import succeeds; then I get the error.
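
For reference, the kind of ORC round trip I want to run once the HiveContext comes up; the paths here are just examples:

%spark
// read an ORC file and write it back via the HiveContext
// (Spark 1.5 DataFrame API; ORC needs Hive support, hence HiveContext)
val df = sqlContext.read.format("orc").load("/tmp/example_in.orc")
df.write.format("orc").save("/tmp/example_out.orc")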

On 11/24/2015 07:39 PM, moon soo Lee wrote:
Basically, if SPARK_HOME/bin/spark-shell works, then exporting SPARK_HOME in conf/zeppelin-env.sh and setting the 'master' property in the Interpreter menu on the Zeppelin GUI should be enough to make a successful connection to the Spark standalone cluster.
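For example, a minimal conf/zeppelin-env.sh along those lines (using the path from your earlier mails):

    export SPARK_HOME=/usr/spark

and the 'master' property set to whatever your Spark Master UI reports, e.g. spark://192.168.58.10:7077.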

Do you see any new exception in your log file when you set the 'master' property in the Interpreter menu on the Zeppelin GUI and get the 'Scheduler already terminated' error? If you can share it, that would be helpful.

Zeppelin does not use HiveThriftServer2 and, once it's been built, does not need any dependency other than a JVM to run.


Thanks,
moon

On Tue, Nov 24, 2015 at 11:37 PM Timur Shenkao <t...@timshenkao.su> wrote:

    One more question. What should be installed on the server? What are
    the dependencies of Zeppelin?
    Node.js, npm, bower? Scala?

    On Tue, Nov 24, 2015 at 5:34 PM, Timur Shenkao <t...@timshenkao.su> wrote:

    > I also checked the Spark workers. There are no traces, folders, or
    > logs related to Zeppelin on them.
    > There are Zeppelin logs only on the Spark Master server, where
    > Zeppelin is launched.
    >
    > For example, H2O creates logs on every worker in the folders
    > /usr/spark/work/app-.....-... Is this correct?
    >
    > I also launched the Thrift server via /usr/spark/sbin/start-thriftserver.sh
    > on the Spark Master. Does Zeppelin use
    > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 ?
    >
    > For the terminated scheduler, I got:
    > INFO [2015-11-24 16:26:16,610] ({pool-1-thread-2} SchedulerFactory.java[jobFinished]:138) - Job paragraph_1448346$
    > ERROR [2015-11-24 16:26:17,658] ({Thread-34} JobProgressPoller.java[run]:57) - Can not get or update progress
    > org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException
    >         at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getProgress(RemoteInterpreter.java:302)
    >         at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:110)
    >         at org.apache.zeppelin.notebook.Paragraph.progress(Paragraph.java:174)
    >         at org.apache.zeppelin.scheduler.JobProgressPoller.run(JobProgressPoller.java:54)
    > Caused by: org.apache.thrift.transport.TTransportException
    >         at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    >         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    >         at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    >         at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    >         at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    >         at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    >         at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_getProgress(RemoteInterpret$
    >         at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.getProgress(RemoteInterpreterSer$
    > INFO [2015-11-24 16:26:52,617] ({qtp982007015-52} InterpreterRestApi.java[updateSetting]:104) - Update interprete$
    > INFO [2015-11-24 16:27:56,319] ({qtp982007015-48} InterpreterRestApi.java[restartSetting]:143) - Restart interpre$
    > ERROR [2015-11-24 16:28:09,603] ({qtp982007015-48} NotebookServer.java[runParagraph]:661) - Exception from run
    > java.lang.RuntimeException: Scheduler already terminated
    >         at org.apache.zeppelin.scheduler.RemoteScheduler.submit(RemoteScheduler.java:124)
    >         at org.apache.zeppelin.notebook.Note.run(Note.java:326)
    >         at org.apache.zeppelin.socket.NotebookServer.runParagraph(NotebookServer.java:659)
    >         at org.apache.zeppelin.socket.NotebookServer.onMessage(NotebookServer.java:126)
    >         at org.apache.zeppelin.socket.NotebookSocket.onMessage(NotebookSocket.java:56)
    >         at org.eclipse.jetty.websocket.WebSocketConnectionRFC6455$WSFrameHandler.onFrame(WebSocketConnectionRFC645$
    >         at org.eclipse.jetty.websocket.WebSocketParserRFC6455.parseNext(WebSocketParserRFC6455.java:349)
    >         at org.eclipse.jetty.websocket.WebSocketConnectionRFC6455.handle(WebSocketConnectionRFC6455.java:225)
    >         at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
    >         at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
    >         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    >         at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    >         at java.lang.Thread.run(Thread.java:745)
    > ERROR [2015-11-24 16:28:36,906] ({qtp982007015-50} NotebookServer.java[runParagraph]:661) - Exception from run
    > java.lang.RuntimeException: Scheduler already terminated
    >         at org.apache.zeppelin.scheduler.RemoteScheduler.submit(RemoteScheduler.java:124)
    >         at org.apache.zeppelin.notebook.Note.run(Note.java:326)
    >         at org.apache.zeppelin.socket.NotebookServer.runParagraph(NotebookServer.java:659)
    >         at org.apache.zeppelin.socket.NotebookServer.onMessage(NotebookServer.java:126)
    >         at org.apache.zeppelin.socket.NotebookSocket.onMessage(NotebookSocket.java:56)
    >
    >
    >
    >
    > On Tue, Nov 24, 2015 at 4:50 PM, Timur Shenkao <t...@timshenkao.su> wrote:
    >
    >> Hello!
    >>
    >> There is no Kerberos, no security in my cluster. It's in an internal
    >> network.
    >>
    >> The %hive and %sh interpreters work: I can create tables, drop them,
    >> run pwd, etc. So the problem is in the integration with Spark.
    >>
    >> In /usr/spark/conf/spark-env.sh I set / unset in turn
    >> MASTER=spark://localhost:7077, MASTER=spark://192.168.58.10:7077, and
    >> MASTER=spark://127.0.0.1:7077 on the master node. On the slaves I set /
    >> unset MASTER=spark://192.168.58.10:7077 in different combinations.
    >>
    >> Zeppelin is installed on the same machine as the Spark Master. So, in
    >> zeppelin-env.sh I set / unset MASTER=spark://localhost:7077,
    >> MASTER=spark://192.168.58.10:7077, and MASTER=spark://127.0.0.1:7077.
    >> Yes, I can connect to 192.168.58 and see URL spark://192.168.58:7077 and
    >> REST URL spark://192.168.58:6066 (cluster mode).
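    >>
    >> For the record, the exact line I put in the env files (shell syntax: no
    >> spaces around '=', and 'export' so that child processes inherit it):
    >>
    >>     export MASTER=spark://192.168.58.10:7077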
    >>
    >> Does the TCP type matter? On my laptop, in pseudo-distributed mode, all
    >> connections are IPv4 (tcp), and there are only IPv4 lines in /etc/hosts.
    >> In the cluster, Spark automatically, for unknown reasons, uses IPv6
    >> (tcp6), and there are IPv6 lines in /etc/hosts.
    >> Right now I am trying to make Spark use IPv4.
    >>
    >> I switched Spark to IPv4 via -Djava.net.preferIPv4Stack=true
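    >>
    >> I set the flag in spark-env.sh on the master and the workers. Using
    >> SPARK_DAEMON_JAVA_OPTS is my assumption about the right knob for the
    >> standalone daemons:
    >>
    >>     # in spark-env.sh; assumption: the daemons pick up these JVM options
    >>     export SPARK_DAEMON_JAVA_OPTS="-Djava.net.preferIPv4Stack=true"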
    >>
    >> It seems that Zeppelin uses / answers on the following ports on the
    >> Master server (ps axu | grep zeppelin; then, for each PID,
    >> netstat -natp | grep ...; a one-liner variant is sketched after the list):
    >> 41303
    >> 46971
    >> 59007
    >> 35781
    >> 53637
    >> 34860
    >> 59793
    >> 46971
    >> 50676
    >> 50677
    >>
    >> 44341
    >> 50805
    >> 50803
    >> 50802
    >>
    >> 60886
    >> 43345
    >> 48415
    >> 48417
    >> 10000
    >> 48416
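    >>
    >> The one-liner I mentioned above (a sketch; the pgrep pattern is an
    >> assumption and may need tightening):
    >>
    >>     for pid in $(pgrep -f zeppelin); do netstat -natp 2>/dev/null | grep "$pid/"; done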
    >>
    >> Best regards
    >>
    >> P.S. I put the precise address from the Spark page into zeppelin-env.sh
    >> and into the Spark interpreter configuration in the web UI:
    >> MASTER=spark://192.168.58.10:7077.
    >> Earlier, I got a Java error stacktrace in the web UI. Then I BEGAN to
    >> receive "Scheduler already terminated".
    >>
    >> On Tue, Nov 24, 2015 at 12:56 PM, moon soo Lee <m...@apache.org> wrote:
    >>
    >>> Thanks for sharing the problem.
    >>>
    >>> Based on your log file, it looks like your Spark master address is
    >>> somehow not well configured.
    >>>
    >>> Can you confirm that you have also set the 'master' property in the
    >>> Interpreter menu on the GUI, in the spark section?
    >>>
    >>> If not, you can open the Spark Master UI in your web browser and look
    >>> at the first line, "Spark Master at spark://....". That value should
    >>> go into the 'master' property in the Interpreter menu on the GUI, in
    >>> the spark section.
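    >>>
    >>> For example, if the first line reads "Spark Master at
    >>> spark://192.168.58.10:7077", then the 'master' property should be
    >>> exactly spark://192.168.58.10:7077.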
    >>>
    >>> Hope this helps
    >>>
    >>> Best,
    >>> moon
    >>>
    >>> On Tue, Nov 24, 2015 at 3:07 AM Timur Shenkao <t...@timshenkao.su> wrote:
    >>>
    >>>> Hi!
    >>>>
    >>>> A new error has come: TTransportException.
    >>>> I use CentOS 6.7 + Spark 1.5.2 Standalone + Cloudera Hadoop 5.4.8 on
    >>>> the same cluster. I can't use Mesos or Spark on YARN.
    >>>> I built Zeppelin 0.6.0 like this:
    >>>> mvn clean package -DskipTests -Pspark-1.5 -Phadoop-2.6 -Pyarn -Ppyspark -Pbuild-distr
    >>>>
    >>>> I constantly get errors like:
    >>>> ERROR [2015-11-23 18:14:33,404] ({pool-1-thread-4} Job.java[run]:183) - Job failed
    >>>> org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException
    >>>>     at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:237)
    >>>>
    >>>>
    >>>> or
    >>>>
    >>>> ERROR [2015-11-23 18:07:26,535] ({Thread-11} RemoteInterpreterEventPoller.java[run]:72) - Can't get RemoteInterpreterEvent
    >>>> org.apache.thrift.transport.TTransportException
    >>>>
    >>>> I changed several parameters in zeppelin-env.sh and in the Spark
    >>>> configs. Whatever I do, these errors keep coming. At the same time,
    >>>> when I use local Zeppelin with Hadoop in pseudo-distributed mode +
    >>>> Spark Standalone (Master + workers on the same machine), everything
    >>>> works.
    >>>>
    >>>> What configuration (memory, network, CPU cores) is needed for
    >>>> Zeppelin to work?
    >>>>
    >>>> I launch H2O on this cluster, and it works.
    >>>> Spark Master config:
    >>>> SPARK_MASTER_WEBUI_PORT=18080
    >>>> HADOOP_CONF_DIR=/etc/hadoop/conf
    >>>> SPARK_HOME=/usr/spark
    >>>>
    >>>> Spark Worker config:
    >>>>    export HADOOP_CONF_DIR=/etc/hadoop/conf
    >>>>    export MASTER=spark://192.168.58.10:7077
    >>>>    export SPARK_HOME=/usr/spark
    >>>>
    >>>>    SPARK_WORKER_INSTANCES=1
    >>>>    SPARK_WORKER_CORES=4
    >>>>    SPARK_WORKER_MEMORY=32G
    >>>>
    >>>>
    >>>> I attach the Spark configs + Zeppelin configs & logs for local mode,
    >>>> and the Zeppelin configs & logs from when I defined the IP address of
    >>>> the Spark Master explicitly.
    >>>> Thank you.
    >>>>
    >>>
    >>
    >

