About 'How to increase memory for Zeppelin': this recent thread might help:
http://apache-zeppelin-users-incubating-mailing-list.75479.x6.nabble.com/Can-not-configure-driver-memory-size-td1513.html
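For reference, a hedged sketch of the usual memory knobs in conf/zeppelin-env.sh — the Zeppelin server heap and the Spark driver memory are configured separately. The variable names below are assumptions based on 0.6-era templates, so verify them against your own zeppelin-env.sh.template:

```shell
# conf/zeppelin-env.sh -- sketch; variable names assumed from 0.6-era templates.

# Heap for the Zeppelin server process itself:
export ZEPPELIN_MEM="-Xmx4g -XX:MaxPermSize=512m"

# Driver memory for the Spark interpreter (the JVM that actually runs
# %spark / %pyspark paragraphs) is passed through spark-submit:
export SPARK_SUBMIT_OPTIONS="--driver-memory 8g"
```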
Thanks,
moon

On Wed, Nov 25, 2015 at 11:54 PM Timur Shenkao <t...@timshenkao.su> wrote:
> Hi!
> Finally, Zeppelin worked. I had to edit /etc/hive/conf/hive-site.xml
> (remove 's' in 2 parameters), delete $ZEPPELIN_HOME/bin/metastore_db, and
> reload HiveMetastore & HiveServer2.
>
> Conclusion: never ever create HiveContext() in %spark and %pyspark. It
> crashes the HiveContext and gives misleading errors such as "rebuild your
> Spark with ENABLE_HIVE=true".
>
> I launched a Spark SQL job like: select count(*) from ...
>
> The data set is 6.5 billion records.
>
> There are no errors in the workers, but Zeppelin failed with this error
> (it lasted 1730 seconds):
>
> Py4JJavaError: An error occurred while calling o155.count.
> : java.lang.OutOfMemoryError: GC overhead limit exceeded
> at org.apache.spark.util.io.ByteArrayChunkOutputStream.allocateNewChunkIfNeeded(ByteArrayChunkOutputStream.scala:66)
> at org.apache.spark.util.io.ByteArrayChunkOutputStream.write(ByteArrayChunkOutputStream.scala:55)
> at org.xerial.snappy.SnappyOutputStream.dumpOutput(SnappyOutputStream.java:294)
> at org.xerial.snappy.SnappyOutputStream.compressInput(SnappyOutputStream.java:306)
> at org.xerial.snappy.SnappyOutputStream.rawWrite(SnappyOutputStream.java:245)
> at org.xerial.snappy.SnappyOutputStream.write(SnappyOutputStream.java:107)
> at org.apache.spark.io.SnappyOutputStreamWrapper.write(CompressionCodec.scala:189)
> at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1877)
> at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1848)
> at java.io.ObjectOutputStream.write(ObjectOutputStream.java:709)
> at org.apache.hadoop.io.WritableUtils.writeCompressedByteArray(WritableUtils.java:75)
> at org.apache.hadoop.io.WritableUtils.writeCompressedString(WritableUtils.java:94)
> at org.apache.hadoop.io.WritableUtils.writeCompressedStringArray(WritableUtils.java:155)
> at org.apache.hadoop.conf.Configuration.write(Configuration.java:2756)
> at org.apache.spark.util.SerializableConfiguration$$anonfun$writeObject$1.apply$mcV$sp(SerializableConfiguration.scala:27)
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1160)
> at org.apache.spark.util.SerializableConfiguration.writeObject(SerializableConfiguration.scala:25)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
> at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
> at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
> at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
> at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:203)
> at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:102)
> at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:85)
> at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
> at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:63)
> (<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError(u'An error occurred
> while calling o155.count.\n', JavaObject id=o156), <traceback object at
> 0x7fec15043b90>)
>
> I attach the logs. I see that the Garbage Collector squeezed the Zeppelin
> job out on the Master server. The job was run on 4 workers with 32 GB RAM each.
>
> Questions:
> How do I make Zeppelin not fail?
> How do I increase memory for Zeppelin?
> How can I tell that a job is actually frozen because of lack of memory? I
> don't want to wait until the GC forces the job out.
>
> On Wed, Nov 25, 2015 at 12:56 PM, Timur Shenkao <t...@timshenkao.su> wrote:
>
>> Hi again!
>>
>> Spark works, Hive works, %sh works!
>>
>> But when I try to use %pyspark:
>> %pyspark
>> sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
>> people = sqlContext.read.format("orc").load("peoplePartitioned")
>> people.filter(people.age < 15).select("name").show()
>>
>> this error comes:
>> Traceback (most recent call last):
>>   File "/tmp/zeppelin_pyspark.py", line 178, in <module>
>>     eval(compiledCode)
>>   File "<string>", line 1, in <module>
>>   File "/usr/spark/python/pyspark/sql/context.py", line 632, in read
>>     return DataFrameReader(self)
>>   File "/usr/spark/python/pyspark/sql/readwriter.py", line 49, in __init__
>>     self._jreader = sqlContext._ssql_ctx.read()
>>   File "/usr/spark/python/pyspark/sql/context.py", line 660, in _ssql_ctx
>>     "build/sbt assembly", e)
>> Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and
>> run build/sbt assembly", Py4JJavaError(u'An error occurred while calling
>> None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o56))
>>
>> Is there some specific name for sqlContext in %pyspark?
>>
>> Or should I really rebuild Spark?
>>
>> Best regards.
>>
>> On Tue, Nov 24, 2015 at 10:51 PM, moon soo Lee <m...@apache.org> wrote:
>>
>>> I really appreciate you trying this out.
>>>
>>> About HiveContext (sqlContext):
>>> Zeppelin creates a sqlContext and injects it by default,
>>> so you don't need to create it manually.
>>>
>>> If multiple sqlContexts (HiveContexts) are created with Derby
>>> as the metastore, only the first one works; all the others will fail.
>>>
>>> Therefore, it would help to:
>>> - make sure no unnecessary interpreter processes (ps -ef | grep
>>> RemoteInterpreterServer) remain from a previous Zeppelin execution.
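Moon's first suggestion — checking for leftover interpreter processes — can be sketched as a small shell helper. The helper name is made up for illustration; only the RemoteInterpreterServer class name comes from the advice above:

```shell
#!/bin/sh
# Print the PIDs of any RemoteInterpreterServer processes left over from a
# previous Zeppelin run; reads `ps -ef`-style lines from stdin.
# (leftover_interpreters is a hypothetical helper, not a Zeppelin command.)
leftover_interpreters() {
  grep 'RemoteInterpreterServer' | grep -v grep | awk '{print $2}'
}

# Typical use (kill the PIDs it prints before restarting Zeppelin):
#   ps -ef | leftover_interpreters
```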
>>> - try not to create sqlContext manually.
>>>
>>> Thanks,
>>> moon
>>>
>>> On Wed, Nov 25, 2015 at 3:32 AM tsh <t...@timshenkao.su> wrote:
>>>
>>>> Hi!
>>>> A couple of days ago I tested Zeppelin on my laptop, with Cloudera Hadoop in
>>>> pseudo-distributed mode and Spark Standalone. I ran into a
>>>> fasterxml.jackson problem. Eric Charles said that he had had a similar
>>>> problem and advised removing the jackson-*.jar libraries from the lib
>>>> folder, so I did. I also adjusted the parameters in zeppelin-env.sh to make
>>>> Zeppelin work locally.
>>>>
>>>> On Monday, when I got to work, it became clear that the configuration
>>>> parameters for a local installation and a real cluster installation vary
>>>> greatly. And I got this Thrift Transport Exception.
>>>> Over 2 days I rebuilt Zeppelin several times, checked all the parameters,
>>>> and checked & changed my network. At last, when I received your letter, I
>>>> checked the MASTER variable. And I remembered those deleted *.jar files. I
>>>> figured they were links in the chain, so I copied them back to the lib
>>>> folder. And Spark began to work!
>>>> But Spark SQL doesn't work; DataFrames can't load & write ORC files. It
>>>> gives some HiveContext error connected to metastore_db (Derby). Either
>>>> Hive itself (which sits on the same edge node as Zeppelin) has its
>>>> own Derby metastore_db, or I should delete metastore_db from
>>>> $ZEPPELIN_HOME/bin. Should I?
>>>> The code is
>>>> %spark
>>>> import org.apache.spark.sql._
>>>> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>>>
>>>> The import succeeds; then I get the error.
>>>>
>>>> On 11/24/2015 07:39 PM, moon soo Lee wrote:
>>>>
>>>> Basically, if SPARK_HOME/bin/spark-shell works, then exporting SPARK_HOME
>>>> in conf/zeppelin-env.sh and setting the 'master' property in the
>>>> Interpreter menu of the Zeppelin GUI should be enough to make a successful
>>>> connection to a Spark standalone cluster.
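That advice boils down to two settings; a minimal conf/zeppelin-env.sh sketch (the host and port below are the ones used elsewhere in this thread, not defaults — substitute your own master's address):

```shell
# conf/zeppelin-env.sh -- minimal sketch for a Spark standalone cluster.
# Host/port taken from this thread; adjust to the "Spark Master at spark://..."
# line shown on your own Spark Master UI.
export SPARK_HOME=/usr/spark
export MASTER=spark://192.168.58.10:7077
# The same spark://... value should also go into the 'master' property of the
# spark section in the Interpreter menu on the Zeppelin GUI.
```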
>>>>
>>>> Do you see any new exception in your log file when you set the 'master'
>>>> property in the Interpreter menu of the Zeppelin GUI and get the
>>>> 'Scheduler already terminated' error? If you can share it, that would be
>>>> helpful.
>>>>
>>>> Zeppelin does not use HiveThriftServer2, and once it has been built it
>>>> does not need any dependency other than a JVM to run.
>>>>
>>>> Thanks,
>>>> moon
>>>>
>>>> On Tue, Nov 24, 2015 at 11:37 PM Timur Shenkao <t...@timshenkao.su> wrote:
>>>>
>>>>> One more question: what should be installed on the server? What are
>>>>> Zeppelin's dependencies?
>>>>> Node.js, npm, bower? Scala?
>>>>>
>>>>> On Tue, Nov 24, 2015 at 5:34 PM, Timur Shenkao <t...@timshenkao.su> wrote:
>>>>>
>>>>> > I also checked the Spark workers. There are no traces, folders, or logs
>>>>> > related to Zeppelin on them.
>>>>> > There are Zeppelin logs only on the Spark Master server, where
>>>>> > Zeppelin is launched.
>>>>> >
>>>>> > For example, H2O creates logs on every worker in folders
>>>>> > /usr/spark/work/app-.....-... Is that correct?
>>>>> >
>>>>> > I also launched the Thrift server via
>>>>> > /usr/spark/sbin/start-thriftserver.sh on the Spark Master. Does
>>>>> > Zeppelin use org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 ?
>>>>> >
>>>>> > For the terminated scheduler, I got:
>>>>> > INFO [2015-11-24 16:26:16,610] ({pool-1-thread-2} SchedulerFactory.java[jobFinished]:138) - Job paragraph_1448346$
>>>>> > ERROR [2015-11-24 16:26:17,658] ({Thread-34} JobProgressPoller.java[run]:57) - Can not get or update progress
>>>>> > org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException
>>>>> > at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getProgress(RemoteInterpreter.java:302)
>>>>> > at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:110)
>>>>> > at org.apache.zeppelin.notebook.Paragraph.progress(Paragraph.java:174)
>>>>> > at org.apache.zeppelin.scheduler.JobProgressPoller.run(JobProgressPoller.java:54)
>>>>> > Caused by: org.apache.thrift.transport.TTransportException
>>>>> > at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
>>>>> > at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>>>>> > at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
>>>>> > at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
>>>>> > at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
>>>>> > at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
>>>>> > at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_getProgress(RemoteInterpret$
>>>>> > at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.getProgress(RemoteInterpreterSer$
>>>>> > INFO [2015-11-24 16:26:52,617] ({qtp982007015-52} InterpreterRestApi.java[updateSetting]:104) - Update interprete$
>>>>> > INFO [2015-11-24 16:27:56,319] ({qtp982007015-48} InterpreterRestApi.java[restartSetting]:143) - Restart interpre$
>>>>> > ERROR [2015-11-24 16:28:09,603] ({qtp982007015-48} NotebookServer.java[runParagraph]:661) - Exception from run
>>>>> > java.lang.RuntimeException: Scheduler already terminated
>>>>> > at org.apache.zeppelin.scheduler.RemoteScheduler.submit(RemoteScheduler.java:124)
>>>>> > at org.apache.zeppelin.notebook.Note.run(Note.java:326)
>>>>> > at org.apache.zeppelin.socket.NotebookServer.runParagraph(NotebookServer.java:659)
>>>>> > at org.apache.zeppelin.socket.NotebookServer.onMessage(NotebookServer.java:126)
>>>>> > at org.apache.zeppelin.socket.NotebookSocket.onMessage(NotebookSocket.java:56)
>>>>> > at org.eclipse.jetty.websocket.WebSocketConnectionRFC6455$WSFrameHandler.onFrame(WebSocketConnectionRFC645$
>>>>> > at org.eclipse.jetty.websocket.WebSocketParserRFC6455.parseNext(WebSocketParserRFC6455.java:349)
>>>>> > at org.eclipse.jetty.websocket.WebSocketConnectionRFC6455.handle(WebSocketConnectionRFC6455.java:225)
>>>>> > at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
>>>>> > at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
>>>>> > at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>>>>> > at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>>>>> > at java.lang.Thread.run(Thread.java:745)
>>>>> > ERROR [2015-11-24 16:28:36,906] ({qtp982007015-50} NotebookServer.java[runParagraph]:661) - Exception from run
>>>>> > java.lang.RuntimeException: Scheduler already terminated
>>>>> > at org.apache.zeppelin.scheduler.RemoteScheduler.submit(RemoteScheduler.java:124)
>>>>> > at org.apache.zeppelin.notebook.Note.run(Note.java:326)
>>>>> > at org.apache.zeppelin.socket.NotebookServer.runParagraph(NotebookServer.java:659)
>>>>> > at org.apache.zeppelin.socket.NotebookServer.onMessage(NotebookServer.java:126)
>>>>> > at org.apache.zeppelin.socket.NotebookSocket.onMessage(NotebookSocket.java:56)
>>>>> >
>>>>> > On Tue, Nov 24, 2015 at 4:50 PM, Timur Shenkao <t...@timshenkao.su> wrote:
>>>>> >
>>>>> >> Hello!
>>>>> >>
>>>>> >> There is no Kerberos, no security in my cluster. It's on an internal
>>>>> >> network.
>>>>> >>
>>>>> >> The %hive and %sh interpreters work: I can create and drop tables,
>>>>> >> run pwd, etc. So the problem is in the integration with Spark.
>>>>> >>
>>>>> >> In /usr/spark/conf/spark-env.sh on the master node I set / unset in
>>>>> >> turn MASTER=spark://localhost:7077, MASTER=spark://192.168.58.10:7077,
>>>>> >> and MASTER=spark://127.0.0.1:7077. On the slaves I set / unset in turn
>>>>> >> MASTER=spark://192.168.58.10:7077 in different combinations.
>>>>> >>
>>>>> >> Zeppelin is installed on the same machine as the Spark Master. So in
>>>>> >> zeppelin-env.sh I set / unset MASTER=spark://localhost:7077,
>>>>> >> MASTER=spark://192.168.58.10:7077, and MASTER=spark://127.0.0.1:7077.
>>>>> >> Yes, I can connect to 192.168.58 and see
>>>>> >> URL: spark://192.168.58:7077
>>>>> >> REST URL: spark://192.168.58:6066 (cluster mode)
>>>>> >>
>>>>> >> Does the TCP type matter? On my laptop, in pseudo-distributed mode,
>>>>> >> all connections are IPv4 (tcp), and there are only IPv4 lines in
>>>>> >> /etc/hosts. In the cluster, Spark automatically, for unknown reasons,
>>>>> >> uses IPv6 (tcp6). There are IPv6 lines in /etc/hosts.
>>>>> >> Right now, I am trying to make Spark use IPv4.
>>>>> >>
>>>>> >> I switched Spark to IPv4 via -Djava.net.preferIPv4Stack=true
>>>>> >>
>>>>> >> It seems that Zeppelin uses / answers on the following ports on the
>>>>> >> Master server (ps axu | grep zeppelin; then, for each PID,
>>>>> >> netstat -natp | grep ...):
>>>>> >> 41303 46971 59007 35781 53637 34860 59793 46971 50676 50677
>>>>> >> 44341 50805 50803 50802
>>>>> >> 60886 43345 48415 48417 10000 48416
>>>>> >>
>>>>> >> Best regards
>>>>> >>
>>>>> >> P.S. I put the exact address from the Spark page into zeppelin-env.sh
>>>>> >> and into the spark interpreter configuration in the web UI:
>>>>> >> MASTER=spark://192.168.58.10:7077.
>>>>> >> Earlier, I got a Java error stacktrace in the Web UI; now I began to
>>>>> >> receive "Scheduler already terminated".
>>>>> >>
>>>>> >> On Tue, Nov 24, 2015 at 12:56 PM, moon soo Lee <m...@apache.org> wrote:
>>>>> >>
>>>>> >>> Thanks for sharing the problem.
>>>>> >>>
>>>>> >>> Based on your log file, it looks like your spark master address is
>>>>> >>> somehow not well configured.
>>>>> >>>
>>>>> >>> Can you confirm that you have also set the 'master' property in the
>>>>> >>> Interpreter menu on the GUI, in the spark section?
>>>>> >>>
>>>>> >>> If not, you can open the Spark Master UI in your web browser and
>>>>> >>> look at the first line, "Spark Master at spark://....". That value
>>>>> >>> should go into the 'master' property in the Interpreter menu on the
>>>>> >>> GUI, in the spark section.
>>>>> >>>
>>>>> >>> Hope this helps.
>>>>> >>>
>>>>> >>> Best,
>>>>> >>> moon
>>>>> >>>
>>>>> >>> On Tue, Nov 24, 2015 at 3:07 AM Timur Shenkao <t...@timshenkao.su> wrote:
>>>>> >>>
>>>>> >>>> Hi!
>>>>> >>>>
>>>>> >>>> A new error has appeared: TTransportException.
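The IPv4 switch mentioned in this message is typically applied through the JVM-options variables of the respective env files. The variable names below come from the standard spark-env.sh and zeppelin-env.sh templates, but verify them against your own copies:

```shell
# spark-env.sh (applied to the Spark daemons) -- sketch; verify the variable
# name against your spark-env.sh.template:
export SPARK_DAEMON_JAVA_OPTS="-Djava.net.preferIPv4Stack=true"

# zeppelin-env.sh (the Zeppelin server JVM) -- same caveat:
export ZEPPELIN_JAVA_OPTS="-Djava.net.preferIPv4Stack=true"
```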
>>>>> >>>> I use CentOS 6.7 + Spark 1.5.2 Standalone + Cloudera Hadoop 5.4.8
>>>>> >>>> on the same cluster. I can't use Mesos or Spark on YARN.
>>>>> >>>> I built Zeppelin 0.6.0 like this:
>>>>> >>>> mvn clean package -DskipTests -Pspark-1.5 -Phadoop-2.6 -Pyarn -Ppyspark -Pbuild-distr
>>>>> >>>>
>>>>> >>>> I constantly get errors like
>>>>> >>>> ERROR [2015-11-23 18:14:33,404] ({pool-1-thread-4} Job.java[run]:183) - Job failed
>>>>> >>>> org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException
>>>>> >>>> at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:237)
>>>>> >>>>
>>>>> >>>> or
>>>>> >>>>
>>>>> >>>> ERROR [2015-11-23 18:07:26,535] ({Thread-11} RemoteInterpreterEventPoller.java[run]:72) - Can't get RemoteInterpreterEvent
>>>>> >>>> org.apache.thrift.transport.TTransportException
>>>>> >>>>
>>>>> >>>> I changed several parameters in zeppelin-env.sh and in the Spark
>>>>> >>>> configs. Whatever I do, these errors come back. At the same time,
>>>>> >>>> when I use a local Zeppelin with Hadoop in pseudo-distributed mode +
>>>>> >>>> Spark Standalone (Master + workers on the same machine), everything
>>>>> >>>> works.
>>>>> >>>>
>>>>> >>>> What configuration (memory, network, CPU cores) is needed for
>>>>> >>>> Zeppelin to work?
>>>>> >>>>
>>>>> >>>> I run H2O on this cluster, and it works.
>>>>> >>>> Spark Master config:
>>>>> >>>> SPARK_MASTER_WEBUI_PORT=18080
>>>>> >>>> HADOOP_CONF_DIR=/etc/hadoop/conf
>>>>> >>>> SPARK_HOME=/usr/spark
>>>>> >>>>
>>>>> >>>> Spark Worker config:
>>>>> >>>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>>>>> >>>> export MASTER=spark://192.168.58.10:7077
>>>>> >>>> export SPARK_HOME=/usr/spark
>>>>> >>>>
>>>>> >>>> SPARK_WORKER_INSTANCES=1
>>>>> >>>> SPARK_WORKER_CORES=4
>>>>> >>>> SPARK_WORKER_MEMORY=32G
>>>>> >>>>
>>>>> >>>> I attach the Spark configs, plus the Zeppelin configs & logs for
>>>>> >>>> local mode and the Zeppelin configs & logs from when I defined the
>>>>> >>>> Spark Master's IP address explicitly.
>>>>> >>>> Thank you.
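Since several MASTER values were tried by hand throughout this thread, a quick format check on the URL can rule out simple typos before digging into logs. This is a hypothetical helper for manual troubleshooting, not part of Zeppelin or Spark:

```shell
#!/bin/sh
# Rough sanity check for a Spark standalone master URL: spark://host:port.
# (valid_master_url is a made-up helper name; the spark://host:port form is
# the standard Spark standalone master URL.)
valid_master_url() {
  case "$1" in
    spark://*:[0-9]*) return 0 ;;
    *)                return 1 ;;
  esac
}

# Example:
#   valid_master_url "spark://192.168.58.10:7077"   # ok
#   valid_master_url "localhost:7077"               # missing spark:// scheme
```

Note also that in a shell config file such as spark-env.sh or zeppelin-env.sh, the assignment must be written without spaces around '=' (MASTER=spark://..., not MASTER = spark://...), or the shell will reject the line.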