Hi Alex,

Created: https://issues.apache.org/jira/browse/ZEPPELIN-1503
Regards,
Andrey

2016-09-28 1:24 GMT+03:00 Goodman, Alexander (398K) <alexander.good...@jpl.nasa.gov>:

> Hi Andrey,
>
> Hmm. Usually if you wait long enough the notebook should eventually open.
> You are right though, the only other way to fix this that I can think of is
> to edit the note.json file directly and remove the output yourself (you'll
> see it as a really long string contained in the SVG div tag). As far as I
> know though, there isn't an option in the Zeppelin GUI to clear the output
> in a specific note from the main menu, so that would be a nice feature. I
> would consider filing a JIRA issue:
> https://issues.apache.org/jira/browse/ZEPPELIN/
>
> Thanks,
> Alex
>
> On Tue, Sep 27, 2016 at 3:10 PM, Андрей Ривкин <amriv...@gmail.com> wrote:
>
>> Hi Alex,
>>
>> This helped! Great! Thank you!
>>
>> As for the option to hide the Paragraph output, there is a problem. If the
>> SVG is lagging, you can't open the notebook at all. So maybe there is some
>> way to clear all notebook output before opening it?
>> Also, we can change the notebook JSON file directly on disk.
>>
>> Again, thank you for your help.
>>
>> Regards,
>> Andrey
>>
>> 2016-09-28 0:45 GMT+03:00 Goodman, Alexander (398K) <alexander.good...@jpl.nasa.gov>:
>>
>>> Hi Andrey,
>>>
>>> To get rid of the lag the SVG images are causing in your notebook, you
>>> can hide the Paragraph output. Look for this icon in the upper right hand
>>> corner of the paragraph: https://puu.sh/rq0JU/6fa29f2ff9.png
>>>
>>> For your first problem, matplotlib is very inflexible when it comes to
>>> setting the backend. The default backend on most systems is set to Qt4Agg
>>> which is a GUI backend and therefore requires DISPLAY to be set in your
>>> environment (eg through X11). Hence, you should always call
>>> matplotlib.use('Agg') before making calls (AND imports) to any other
>>> plotting functions. In fact it is good practice to do this in your very
>>> first paragraph cell before running all others.
>>> plt.switch_backend() can work in
>>> certain circumstances, but a safe bet is to restart the interpreter through
>>> the Interpreter menu in Zeppelin, then run the paragraphs again. If you
>>> don't want to do this for every notebook, your best bet is to change the
>>> default backend to Agg in your matplotlibrc file. Part of the ongoing
>>> development work for Zeppelin will involve creating a custom matplotlib
>>> backend that is automatically defaulted so users won't have to worry about
>>> this stuff in the future.
>>>
>>> For the second problem, the PR that I linked you to has not been merged,
>>> and only works with the python (not pyspark) interpreter. You'll need to
>>> directly define the show function yourself somewhere in your notebook. Hope
>>> this helps.
>>>
>>> Thanks,
>>> Alex
>>>
>>> On Tue, Sep 27, 2016 at 2:22 PM, Андрей Ривкин <amriv...@gmail.com> wrote:
>>>
>>>> Hi Alex,
>>>>
>>>> Thank you, we will give PNG a try.
>>>>
>>>> Our dataset is very small (for Big Data and Hadoop): only 4 MB. We have
>>>> 40,000 rows x 17 columns. Not so big.
>>>>
>>>> But it seems that 40k dots is too much for my browser. Also, maybe
>>>> Zeppelin should somehow disable such difficult paragraphs and not the
>>>> whole notebook.
>>>> And it's very difficult to change the notebook after this plot was painted.
>>>> Is there any way to clean up all results of a notebook before opening?
>>>>
>>>> If we do import os, then we get:
>>>>
>>>> Traceback (most recent call last):
>>>>   File "/tmp/zeppelin_pyspark-3283164060812521118.py", line 239, in <module>
>>>>     eval(compiledCode)
>>>>   File "<string>", line 1, in <module>
>>>>   File "/opt/anaconda2/lib/python2.7/site-packages/pandas/tools/plotting.py", line 2951, in hist_series
>>>>     plt.figure(figsize=figsize))
>>>>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/pyplot.py", line 527, in figure
>>>>     **kwargs)
>>>>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 46, in new_figure_manager
>>>>     return new_figure_manager_given_figure(num, thisFig)
>>>>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 53, in new_figure_manager_given_figure
>>>>     canvas = FigureCanvasQTAgg(figure)
>>>>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 76, in __init__
>>>>     FigureCanvasQT.__init__(self, figure)
>>>>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4.py", line 68, in __init__
>>>>     _create_qApp()
>>>>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt5.py", line 138, in _create_qApp
>>>>     raise RuntimeError('Invalid DISPLAY variable')
>>>> RuntimeError: Invalid DISPLAY variable
>>>>
>>>> Also trying:
>>>>
>>>> %pyspark
>>>> import matplotlib
>>>> matplotlib.use('Agg')
>>>> import matplotlib.pyplot as plt
>>>> def show(p):
>>>>     z.show(plt, fmt="png")
>>>>
>>>> %pyspark
>>>> raw_data['age'].hist(bins=20,color = 'g')
>>>> plt.xlabel('Age')
>>>> plt.ylabel('Number of people')
>>>> plt.title('Age distribution')
>>>> show(plt)
>>>> plt.close()
>>>>
>>>> Traceback (most recent call last):
>>>>   File "/tmp/zeppelin_pyspark-7759686698822330483.py", line 239, in <module>
>>>>     eval(compiledCode)
>>>>   File "<string>", line 5, in <module>
>>>>   File "<string>", line 5, in show
>>>> TypeError: show() got an unexpected keyword argument 'fmt'
>>>>
>>>> Regards,
>>>> Andrey
>>>>
>>>> 2016-09-27 20:53 GMT+03:00 Goodman, Alexander (398K) <alexander.good...@jpl.nasa.gov>:
>>>>
>>>>> Hi Andrey,
>>>>>
>>>>> These two lines:
>>>>>
>>>>> os.system("export DISPLAY=:0")
>>>>> plt.switch_backend('Agg')
>>>>>
>>>>> should not be necessary since you have already set the backend
>>>>> manually to Agg.
>>>>>
>>>>> More importantly, how large is your dataset? While SVG looks nice, it
>>>>> does not scale well with large datasets. I would suggest you try using PNG
>>>>> images instead. Some code for doing this can be found in this PR:
>>>>> https://github.com/apache/zeppelin/pull/1422. There is work ongoing
>>>>> to improve matplotlib integration with Zeppelin even further than this,
>>>>> but this solution should be sufficient for you right now. If you are still
>>>>> having problems, let us know.
>>>>>
>>>>> Thanks,
>>>>> Alex
>>>>>
>>>>> On Tue, Sep 27, 2016 at 3:27 AM, Андрей Ривкин <amriv...@gmail.com> wrote:
>>>>>
>>>>>> Hi Alex,
>>>>>>
>>>>>> Here is the exported notebook and sample data.
>>>>>>
>>>>>> The notebook is quite heavy (19 MB). Where can I upload it?
>>>>>>
>>>>>> Here is a code sample:
>>>>>>
>>>>>> %pyspark
>>>>>>
>>>>>> import matplotlib
>>>>>> import os
>>>>>>
>>>>>> from pylab import figure, show, rand
>>>>>> from matplotlib.patches import Ellipse
>>>>>> import matplotlib.pyplot as plt
>>>>>> # helper function to display in Zeppelin
>>>>>>
>>>>>> matplotlib.use('Agg')
>>>>>> os.system("export DISPLAY=:0")
>>>>>> plt.switch_backend('Agg')
>>>>>>
>>>>>> import StringIO
>>>>>> def show(p):
>>>>>>     img = StringIO.StringIO()
>>>>>>     p.savefig(img, format='svg')
>>>>>>     img.seek(0)
>>>>>>     print "%html <div style='width:600px'>" + img.buf + "</div>"
>>>>>>
>>>>>> %pyspark
>>>>>> data_1 = data.ix[data['y']==1]
>>>>>> data_0 = data.ix[data['y']==0]
>>>>>> x_1 = data_1['balance'].values
>>>>>> y_1 = data_1['age'].values
>>>>>> x_0 = data_0['balance'].values
>>>>>> y_0 = data_0['age'].values
>>>>>> colors = ['red','green']
>>>>>> plt.figure(figsize=(10, 6))
>>>>>> plt.xlabel('Balance')
>>>>>> plt.ylabel('Age')
>>>>>> plt.title('')
>>>>>> plt.scatter(x_0, y_0, alpha=0.5, color='blue')
>>>>>> plt.scatter(x_1, y_1, alpha=0.5, color='red')  # matplotlib.colors.ListedColormap(colors)
>>>>>> plt.title('Destributions of balance by age and target value')
>>>>>> show(plt)
>>>>>> plt.close()
>>>>>>
>>>>>> Regards,
>>>>>> Andrey
>>>>>>
>>>>>> 2016-09-20 19:20 GMT+03:00 Goodman, Alexander (398K) <alexander.good...@jpl.nasa.gov>:
>>>>>>
>>>>>>> Hi Andrey,
>>>>>>>
>>>>>>> Would you be able to post the code you were using so we can try to
>>>>>>> reproduce your problem, including how you are generating the images
>>>>>>> inline (eg, is your chosen image format png or svg)?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Alex
>>>>>>>
>>>>>>> On Tue, Sep 20, 2016 at 9:12 AM, Андрей Ривкин <amriv...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> We are using Zeppelin 0.6.1 with Spark 1.6.2.
>>>>>>>>
>>>>>>>> We have a very simple demo and a small file.
>>>>>>>> If we just want to
>>>>>>>> calculate something, it's OK.
>>>>>>>> But when we try to visualize using matplotlib, Zeppelin hangs (even the
>>>>>>>> scroll bar) and then disconnects.
>>>>>>>>
>>>>>>>> We are using Chrome.
>>>>>>>>
>>>>>>>> In the logs there is just this:
>>>>>>>>
>>>>>>>> INFO [2016-09-20 18:44:22,574] ({pool-1-thread-10} Paragraph.java[jobRun]:252) - run paragraph 20160920-143431_1028264283 using sql org.apache.zeppelin.interpreter.LazyOpenInterpreter@49b54b66
>>>>>>>> INFO [2016-09-20 18:44:25,114] ({pool-1-thread-10} NotebookServer.java[afterStatusChange]:1150) - Job 20160920-143431_1028264283 is finished
>>>>>>>> INFO [2016-09-20 18:44:25,142] ({pool-1-thread-10} SchedulerFactory.java[jobFinished]:137) - Job paragraph_1474371271306_-871438747 finished by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpretershared_session476562431
>>>>>>>> INFO [2016-09-20 18:47:19,430] ({qtp88558700-14} NotebookServer.java[onClose]:227) - Closed connection to 192.168.110.249 : 53565. (1001) null
>>>>>>>>
>>>>>>>> Always null and 1001 in the end.
>>>>>>>>
>>>>>>>> In Firefox it's sometimes OK. But if there are more than 3 plots, it
>>>>>>>> will hang too.
>>>>>>>>
>>>>>>>> How could we debug this?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Andrey
>>>>>>>
>>>>>>> --
>>>>>>> Alex Goodman
>>>>>>> Data Scientist I
>>>>>>> Science Data Modeling and Computing (398K)
>>>>>>> Jet Propulsion Laboratory
>>>>>>> California Institute of Technology
>>>>>>> Tel: +1-818-354-6012
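
P.S. For anyone who lands on this thread with a notebook that no longer opens: the manual note.json edit suggested at the top can be scripted. What follows is only a rough sketch, not something taken from Zeppelin itself. It assumes each entry in the note's "paragraphs" list keeps its stored output under a "result" key (or "results" in other versions); please check your own note.json before relying on it. The function names strip_paragraph_output and clear_note_file are made up for this example.

```python
import json


def strip_paragraph_output(note):
    """Return a copy of a parsed note with stored paragraph output removed.

    Assumption: output lives under a "result" (or "results") key of each
    paragraph. Verify this against your own note.json first.
    """
    cleaned = dict(note)
    cleaned["paragraphs"] = [
        {key: value for key, value in paragraph.items()
         if key not in ("result", "results")}
        for paragraph in note.get("paragraphs", [])
    ]
    return cleaned


def clear_note_file(path):
    """Rewrite the note.json at `path` in place without paragraph output."""
    with open(path, encoding="utf-8") as handle:
        note = json.load(handle)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(strip_paragraph_output(note), handle, indent=2)
```

Back up the file and stop Zeppelin before calling clear_note_file on it, so the server does not write the old output back. The paragraph code itself is left untouched; only the saved output is dropped, so the plots reappear when the paragraphs are run again.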