About 'How to increase memory for Zeppelin': this recent thread might help:
http://apache-zeppelin-users-incubating-mailing-list.75479.x6.nabble.com/Can-not-configure-driver-memory-size-td1513.html
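
In short, a minimal sketch (my reading, not a quote from that thread; ZEPPELIN_MEM is the standard knob, and the SPARK_SUBMIT_OPTIONS route assumes SPARK_HOME is set so the interpreter is launched through spark-submit):

# conf/zeppelin-env.sh
export ZEPPELIN_MEM="-Xmx4g"                      # heap for the Zeppelin/interpreter JVMs
export SPARK_SUBMIT_OPTIONS="--driver-memory 4g"  # Spark driver memory

Restart Zeppelin afterwards so the interpreter processes pick the settings up.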


Thanks,
moon

On Wed, Nov 25, 2015 at 11:54 PM Timur Shenkao <t...@timshenkao.su> wrote:

> Hi!
> Finally, Zeppelin works. It required editing /etc/hive/conf/hive-site.xml
> (removing the 's' in 2 parameters), deleting $ZEPPELIN_HOME/bin/metastore_db,
> and reloading Hive Metastore & HiveServer2.
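>
> A sketch of those steps (my reconstruction; the message doesn't name the
> two parameters, and the restart method depends on your distribution):
>
> # 1. in /etc/hive/conf/hive-site.xml, drop the trailing 's' in the two values
> # 2. remove the Derby metastore that Zeppelin created next to its binaries
> rm -rf $ZEPPELIN_HOME/bin/metastore_db
> # 3. restart Hive Metastore and HiveServer2 (e.g. via your cluster manager)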
>
> Conclusion: never create a HiveContext manually in %spark or %pyspark. It
> crashes the injected HiveContext and produces misleading errors such as
> "You must build Spark with Hive. Export 'SPARK_HIVE=true' ..." (the exact
> message appears further down this thread). Use the injected context, as
> sketched below.
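>
> A minimal sketch of the safe pattern (the table name is a placeholder;
> Zeppelin injects sc and sqlContext, so nothing needs to be constructed):
>
> %pyspark
> # use the injected sqlContext; do NOT do sqlContext = HiveContext(sc)
> sqlContext.sql("select count(*) from some_table").show()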
>
> I launched a Spark SQL job like: select count(*) from ...
>
> The data set is 6.5 billion records.
>
> There are no errors on the workers, but Zeppelin failed with the error below
> (the job lasted 1730 seconds):
>
> Py4JJavaError: An error occurred while calling o155.count.
> : java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at org.apache.spark.util.io.ByteArrayChunkOutputStream.allocateNewChunkIfNeeded(ByteArrayChunkOutputStream.scala:66)
>     at org.apache.spark.util.io.ByteArrayChunkOutputStream.write(ByteArrayChunkOutputStream.scala:55)
>     at org.xerial.snappy.SnappyOutputStream.dumpOutput(SnappyOutputStream.java:294)
>     at org.xerial.snappy.SnappyOutputStream.compressInput(SnappyOutputStream.java:306)
>     at org.xerial.snappy.SnappyOutputStream.rawWrite(SnappyOutputStream.java:245)
>     at org.xerial.snappy.SnappyOutputStream.write(SnappyOutputStream.java:107)
>     at org.apache.spark.io.SnappyOutputStreamWrapper.write(CompressionCodec.scala:189)
>     at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1877)
>     at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1848)
>     at java.io.ObjectOutputStream.write(ObjectOutputStream.java:709)
>     at org.apache.hadoop.io.WritableUtils.writeCompressedByteArray(WritableUtils.java:75)
>     at org.apache.hadoop.io.WritableUtils.writeCompressedString(WritableUtils.java:94)
>     at org.apache.hadoop.io.WritableUtils.writeCompressedStringArray(WritableUtils.java:155)
>     at org.apache.hadoop.conf.Configuration.write(Configuration.java:2756)
>     at org.apache.spark.util.SerializableConfiguration$$anonfun$writeObject$1.apply$mcV$sp(SerializableConfiguration.scala:27)
>     at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1160)
>     at org.apache.spark.util.SerializableConfiguration.writeObject(SerializableConfiguration.scala:25)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:483)
>     at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>     at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
>     at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
>     at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:203)
>     at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:102)
>     at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:85)
>     at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
>     at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:63)
> (<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError(u'An error occurred while calling o155.count.\n', JavaObject id=o156), <traceback object at 0x7fec15043b90>)
>
> I attach the logs. I see that the garbage collector killed the Zeppelin job
> on the master server. The job ran on 4 workers with 32 GB RAM each.
>
> My questions are:
> How do I keep Zeppelin from failing?
> How do I increase memory for Zeppelin?
> How can I tell that a job is actually frozen from lack of memory, without
> waiting until GC kills the job? (One idea is sketched below.)
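>
> (For the last question, one idea, not something tried in this thread, and
> it assumes the Spark driver is launched through spark-submit so that
> SPARK_SUBMIT_OPTIONS is honored: turn on driver GC logging and watch for
> back-to-back full GCs that reclaim almost nothing.
>
> # conf/zeppelin-env.sh
> export SPARK_SUBMIT_OPTIONS="--driver-java-options '-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps'"
>
> A driver that spends nearly all its time in full GC is exactly the state
> that precedes the "GC overhead limit exceeded" error above.)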
>
> On Wed, Nov 25, 2015 at 12:56 PM, Timur Shenkao <t...@timshenkao.su> wrote:
>
>> Hi again!
>>
>> Spark works, Hive works, %sh works!
>>
>> But when I try to use %pyspark:
>> %pyspark
>> sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
>> people = sqlContext.read.format("orc").load("peoplePartitioned")
>> people.filter(people.age < 15).select("name").show()
>>
>> this error comes:
>> Traceback (most recent call last):
>>  File "/tmp/zeppelin_pyspark.py", line 178, in <module>
>>    eval(compiledCode)
>>  File "<string>", line 1, in <module>
>>  File "/usr/spark/python/pyspark/sql/context.py", line 632, in read
>>    return DataFrameReader(self)
>>  File "/usr/spark/python/pyspark/sql/readwriter.py", line 49, in __init__
>>    self._jreader = sqlContext._ssql_ctx.read()
>>  File "/usr/spark/python/pyspark/sql/context.py", line 660, in _ssql_ctx
>>    "build/sbt assembly", e)
>> Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and
>> run build/sbt assembly", Py4JJavaError(u'An error occurred while calling
>> None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o56))
>>
>>
>> Is there some specific name for sqlContext in %pyspark?
>>
>> Or should I really rebuild Spark?
>>
>> Best regards.
>>
>>
>> On Tue, Nov 24, 2015 at 10:51 PM, moon soo Lee <m...@apache.org> wrote:
>>
>>> I really appreciate you trying it.
>>>
>>> About HiveContext (sqlContext):
>>> Zeppelin creates a sqlContext and injects it by default,
>>> so you don't need to create one manually.
>>>
>>> If multiple sqlContexts (HiveContexts) are created with Derby as the
>>> metastore, only the first one works; all the others will fail.
>>>
>>> Therefore, it would help to:
>>>  - make sure no unnecessary interpreter processes (ps -ef | grep
>>> RemoteInterpreterServer) remain from a previous Zeppelin execution (see the
>>> sketch below)
>>>  - avoid creating a sqlContext manually
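>>>
>>> For the first point, a quick sketch (the grep pattern is the one above;
>>> killing by PID is an addition, so verify the processes before stopping them):
>>>
>>> # list interpreter JVMs left over from previous Zeppelin runs
>>> ps -ef | grep RemoteInterpreterServer | grep -v grep
>>> # stop any stale ones by PID before relaunching Zeppelin, e.g.:
>>> # kill <PID>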
>>>
>>> Thanks,
>>> moon
>>>
>>> On Wed, Nov 25, 2015 at 3:32 AM tsh <t...@timshenkao.su> wrote:
>>>
>>>> Hi!
>>>> A couple of days ago I tested Zeppelin on my laptop, with Cloudera Hadoop
>>>> in pseudo-distributed mode and Spark Standalone. I ran into a
>>>> fasterxml.jackson problem. Eric Charles said that he had had a similar
>>>> problem and advised removing the jackson-*.jar libraries from the lib
>>>> folder, so I did. I also adjusted the parameters in zeppelin-env.sh to
>>>> make Zeppelin work locally.
>>>>
>>>> On Monday, when I came to work, it became clear that the configuration
>>>> parameters for a local installation and for a real cluster installation
>>>> differ greatly. And I got this TTransportException.
>>>> Over 2 days I rebuilt Zeppelin several times, checked all parameters, and
>>>> checked & changed my network. At last, when I received your letter, I
>>>> checked the MASTER variable. And I remembered those deleted *.jar files;
>>>> I suspected they were links in the chain. I copied them back to the lib
>>>> folder, and Spark began to work!
>>>> But Spark SQL doesn't work: DataFrames can't load & write ORC files. It
>>>> gives some HiveContext error connected to metastore_db (Derby). Either
>>>> Hive itself (which sits on the same edge node as Zeppelin) has its own
>>>> Derby metastore_db, or I should delete metastore_db from
>>>> $ZEPPELIN_HOME/bin. Should I?
>>>> The code is
>>>> %spark
>>>> import org.apache.spark.sql._
>>>> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>>>
>>>> The import succeeds; then I get the error.
>>>>
>>>>
>>>>
>>>>
>>>> On 11/24/2015 07:39 PM, moon soo Lee wrote:
>>>>
>>>> Basically, if SPARK_HOME/bin/spark-shell works, then exporting SPARK_HOME
>>>> in conf/zeppelin-env.sh and setting the 'master' property in the
>>>> Interpreter menu of the Zeppelin GUI should be enough for a successful
>>>> connection to a Spark standalone cluster.
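>>>>
>>>> Concretely, a minimal sketch (the path and the master URL are the ones
>>>> from your own messages; substitute your values):
>>>>
>>>> # conf/zeppelin-env.sh
>>>> export SPARK_HOME=/usr/spark
>>>> # then in the GUI: Interpreter -> spark -> master = spark://192.168.58.10:7077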
>>>>
>>>> Do you see any new exception in your log file when you set the 'master'
>>>> property in the Interpreter menu on the Zeppelin GUI and get the
>>>> 'Scheduler already terminated' error? If you can share it, that would be
>>>> helpful.
>>>>
>>>> Zeppelin does not use HiveThriftServer2 and, once it has been built,
>>>> needs no dependency other than a JVM to run.
>>>>
>>>>
>>>> Thanks,
>>>> moon
>>>>
>>>> On Tue, Nov 24, 2015 at 11:37 PM Timur Shenkao <t...@timshenkao.su> wrote:
>>>>
>>>>> One more question: what should be installed on the server? What are
>>>>> Zeppelin's dependencies?
>>>>> Node.js, npm, bower? Scala?
>>>>>
>>>>> On Tue, Nov 24, 2015 at 5:34 PM, Timur Shenkao <t...@timshenkao.su> wrote:
>>>>>
>>>>> > I also checked the Spark workers. There are no Zeppelin-related
>>>>> > traces, folders, or logs on them. There are Zeppelin logs only on the
>>>>> > Spark Master server, where Zeppelin is launched.
>>>>> >
>>>>> > For example, H2O creates logs on every worker in folders
>>>>> > /usr/spark/work/app-.....-... Is that correct?
>>>>> >
>>>>> > I also launched the Thrift server via /usr/spark/sbin/start-thriftserver.sh
>>>>> > on the Spark Master. Does Zeppelin use
>>>>> > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2?
>>>>> >
>>>>> > For the terminated scheduler, I got:
>>>>> > INFO [2015-11-24 16:26:16,610] ({pool-1-thread-2} SchedulerFactory.java[jobFinished]:138) - Job paragraph_1448346$
>>>>> > ERROR [2015-11-24 16:26:17,658] ({Thread-34} JobProgressPoller.java[run]:57) - Can not get or update progress
>>>>> > org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException
>>>>> >         at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getProgress(RemoteInterpreter.java:302)
>>>>> >         at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:110)
>>>>> >         at org.apache.zeppelin.notebook.Paragraph.progress(Paragraph.java:174)
>>>>> >         at org.apache.zeppelin.scheduler.JobProgressPoller.run(JobProgressPoller.java:54)
>>>>> > Caused by: org.apache.thrift.transport.TTransportException
>>>>> >         at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
>>>>> >         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>>>>> >         at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
>>>>> >         at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
>>>>> >         at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
>>>>> >         at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
>>>>> >         at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_getProgress(RemoteInterpret$
>>>>> >         at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.getProgress(RemoteInterpreterSer$
>>>>> > INFO [2015-11-24 16:26:52,617] ({qtp982007015-52} InterpreterRestApi.java[updateSetting]:104) - Update interprete$
>>>>> > INFO [2015-11-24 16:27:56,319] ({qtp982007015-48} InterpreterRestApi.java[restartSetting]:143) - Restart interpre$
>>>>> > ERROR [2015-11-24 16:28:09,603] ({qtp982007015-48} NotebookServer.java[runParagraph]:661) - Exception from run
>>>>> > java.lang.RuntimeException: Scheduler already terminated
>>>>> >         at org.apache.zeppelin.scheduler.RemoteScheduler.submit(RemoteScheduler.java:124)
>>>>> >         at org.apache.zeppelin.notebook.Note.run(Note.java:326)
>>>>> >         at org.apache.zeppelin.socket.NotebookServer.runParagraph(NotebookServer.java:659)
>>>>> >         at org.apache.zeppelin.socket.NotebookServer.onMessage(NotebookServer.java:126)
>>>>> >         at org.apache.zeppelin.socket.NotebookSocket.onMessage(NotebookSocket.java:56)
>>>>> >         at org.eclipse.jetty.websocket.WebSocketConnectionRFC6455$WSFrameHandler.onFrame(WebSocketConnectionRFC645$
>>>>> >         at org.eclipse.jetty.websocket.WebSocketParserRFC6455.parseNext(WebSocketParserRFC6455.java:349)
>>>>> >         at org.eclipse.jetty.websocket.WebSocketConnectionRFC6455.handle(WebSocketConnectionRFC6455.java:225)
>>>>> >         at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
>>>>> >         at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
>>>>> >         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>>>>> >         at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>>>>> >         at java.lang.Thread.run(Thread.java:745)
>>>>> > ERROR [2015-11-24 16:28:36,906] ({qtp982007015-50} NotebookServer.java[runParagraph]:661) - Exception from run
>>>>> > java.lang.RuntimeException: Scheduler already terminated
>>>>> >         at org.apache.zeppelin.scheduler.RemoteScheduler.submit(RemoteScheduler.java:124)
>>>>> >         at org.apache.zeppelin.notebook.Note.run(Note.java:326)
>>>>> >         at org.apache.zeppelin.socket.NotebookServer.runParagraph(NotebookServer.java:659)
>>>>> >         at org.apache.zeppelin.socket.NotebookServer.onMessage(NotebookServer.java:126)
>>>>> >         at org.apache.zeppelin.socket.NotebookSocket.onMessage(NotebookSocket.java:56)
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Tue, Nov 24, 2015 at 4:50 PM, Timur Shenkao <t...@timshenkao.su> wrote:
>>>>> >
>>>>> >> Hello!
>>>>> >>
>>>>> >> There is no Kerberos, no security in my cluster. It's in an internal
>>>>> >> network.
>>>>> >>
>>>>> >> The %hive and %sh interpreters work: I can create tables, drop, pwd,
>>>>> >> etc. So the problem is in the integration with Spark.
>>>>> >>
>>>>> >> In /usr/spark/conf/spark-env.sh on the master node I set / unset in
>>>>> >> turn MASTER=spark://localhost:7077, MASTER=spark://192.168.58.10:7077,
>>>>> >> and MASTER=spark://127.0.0.1:7077. On the slaves I set / unset
>>>>> >> MASTER=spark://192.168.58.10:7077 in different combinations.
>>>>> >>
>>>>> >> Zeppelin is installed on the same machine as the Spark Master. So, in
>>>>> >> zeppelin-env.sh I set / unset MASTER=spark://localhost:7077,
>>>>> >> MASTER=spark://192.168.58.10:7077, and MASTER=spark://127.0.0.1:7077.
>>>>> >> Yes, I can connect to 192.168.58 and see URL spark://192.168.58:7077,
>>>>> >> REST URL spark://192.168.58:6066 (cluster mode).
>>>>> >>
>>>>> >> Does the TCP type matter? On my laptop, in pseudo-distributed mode,
>>>>> >> all connections are IPv4 (tcp), and there are only IPv4 lines in
>>>>> >> /etc/hosts. In the cluster, Spark automatically, for unknown reasons,
>>>>> >> uses IPv6 (tcp6), and there are IPv6 lines in /etc/hosts.
>>>>> >> Right now, I am trying to make Spark use IPv4.
>>>>> >>
>>>>> >> I switched Spark to IPv4 via -Djava.net.preferIPv4Stack=true
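>>>>> >>
>>>>> >> (One natural place for that flag, as a sketch and not necessarily how
>>>>> >> it was done here, is conf/spark-env.sh on the master and workers:
>>>>> >>
>>>>> >> export SPARK_DAEMON_JAVA_OPTS="-Djava.net.preferIPv4Stack=true"
>>>>> >>
>>>>> >> with spark.driver.extraJavaOptions and spark.executor.extraJavaOptions
>>>>> >> carrying the same flag for the driver and executors.)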
>>>>> >>
>>>>> >> It seems that Zeppelin uses / answers on the following ports on the
>>>>> >> Master server (ps axu | grep zeppelin; then, for each PID,
>>>>> >> netstat -natp | grep ...):
>>>>> >> 41303
>>>>> >> 46971
>>>>> >> 59007
>>>>> >> 35781
>>>>> >> 53637
>>>>> >> 34860
>>>>> >> 59793
>>>>> >> 46971
>>>>> >> 50676
>>>>> >> 50677
>>>>> >>
>>>>> >> 44341
>>>>> >> 50805
>>>>> >> 50803
>>>>> >> 50802
>>>>> >>
>>>>> >> 60886
>>>>> >> 43345
>>>>> >> 48415
>>>>> >> 48417
>>>>> >> 10000
>>>>> >> 48416
>>>>> >>
>>>>> >> Best regards
>>>>> >>
>>>>> >> P.S. I put the precise address from the Spark page into zeppelin-env.sh
>>>>> >> and into the Spark interpreter configuration in the web UI:
>>>>> >> MASTER=spark://192.168.58.10:7077.
>>>>> >> Earlier, I got a Java error stacktrace in the web UI. Now I began to
>>>>> >> receive "Scheduler already terminated".
>>>>> >>
>>>>> >> On Tue, Nov 24, 2015 at 12:56 PM, moon soo Lee <m...@apache.org> wrote:
>>>>> >>
>>>>> >>> Thanks for sharing the problem.
>>>>> >>>
>>>>> >>> Based on your log file, it looks like your Spark master address is
>>>>> >>> somehow not configured correctly.
>>>>> >>>
>>>>> >>> Can you confirm that you have also set the 'master' property in the
>>>>> >>> Interpreter menu on the GUI, in the spark section?
>>>>> >>>
>>>>> >>> If not, you can open the Spark Master UI in your web browser and look
>>>>> >>> at the first line, "Spark Master at spark://....". That value should
>>>>> >>> go in the 'master' property in the Interpreter menu on the GUI, in
>>>>> >>> the spark section.
>>>>> >>>
>>>>> >>> Hope this helps
>>>>> >>>
>>>>> >>> Best,
>>>>> >>> moon
>>>>> >>>
>>>>> >>> On Tue, Nov 24, 2015 at 3:07 AM Timur Shenkao <t...@timshenkao.su> wrote:
>>>>> >>>
>>>>> >>>> Hi!
>>>>> >>>>
>>>>> >>>> A new error comes: TTransportException.
>>>>> >>>> I use CentOS 6.7 + Spark 1.5.2 Standalone + Cloudera Hadoop 5.4.8 on
>>>>> >>>> the same cluster. I can't use Mesos or Spark on YARN.
>>>>> >>>> I built Zeppelin 0.6.0 like this:
>>>>> >>>> mvn clean package -DskipTests -Pspark-1.5 -Phadoop-2.6 -Pyarn
>>>>> >>>> -Ppyspark -Pbuild-distr
>>>>> >>>>
>>>>> >>>> I constantly get errors like
>>>>> >>>> ERROR [2015-11-23 18:14:33,404] ({pool-1-thread-4} Job.java[run]:183) - Job failed
>>>>> >>>> org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException
>>>>> >>>>     at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:237)
>>>>> >>>>
>>>>> >>>> or
>>>>> >>>>
>>>>> >>>> ERROR [2015-11-23 18:07:26,535] ({Thread-11} RemoteInterpreterEventPoller.java[run]:72) - Can't get RemoteInterpreterEvent
>>>>> >>>> org.apache.thrift.transport.TTransportException
>>>>> >>>>
>>>>> >>>> I changed several parameters in zeppelin-env.sh and in the Spark
>>>>> >>>> configs. Whatever I do, these errors come. At the same time, when I
>>>>> >>>> use local Zeppelin with Hadoop in pseudo-distributed mode + Spark
>>>>> >>>> Standalone (Master + workers on the same machine), everything works.
>>>>> >>>>
>>>>> >>>> What configuration (memory, network, CPU cores) is needed for
>>>>> >>>> Zeppelin to work?
>>>>> >>>>
>>>>> >>>> I launch H2O on this cluster. And it works.
>>>>> >>>> Spark Master config:
>>>>> >>>> SPARK_MASTER_WEBUI_PORT=18080
>>>>> >>>> HADOOP_CONF_DIR=/etc/hadoop/conf
>>>>> >>>> SPARK_HOME=/usr/spark
>>>>> >>>>
>>>>> >>>> Spark Worker config:
>>>>> >>>>    export HADOOP_CONF_DIR=/etc/hadoop/conf
>>>>> >>>>    export MASTER=spark://192.168.58.10:7077
>>>>> >>>>    export SPARK_HOME=/usr/spark
>>>>> >>>>
>>>>> >>>>    SPARK_WORKER_INSTANCES=1
>>>>> >>>>    SPARK_WORKER_CORES=4
>>>>> >>>>    SPARK_WORKER_MEMORY=32G
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> I attach the Spark configs, the Zeppelin configs & logs for local
>>>>> >>>> mode, and the Zeppelin configs & logs from when I defined the IP
>>>>> >>>> address of the Spark Master explicitly.
>>>>> >>>> Thank you.
>>>>> >>>>
>>>>> >>>
>>>>> >>
>>>>> >
>>>>>
>>>>
>>>>
>>
>
