Hi Alex,

Created: https://issues.apache.org/jira/browse/ZEPPELIN-1503

Regards,
Andrey

2016-09-28 1:24 GMT+03:00 Goodman, Alexander (398K) <
alexander.good...@jpl.nasa.gov>:

> Hi Andrey,
>
> Hmm. Usually, if you wait long enough, the notebook should eventually open.
> You are right, though: the only other way to fix this that I can think of is
> to edit the note.json file directly and remove the output yourself (you'll
> see it as a really long string contained in the SVG div tag). As far as I
> know, there isn't an option in the Zeppelin GUI to clear the output of a
> specific note from the main menu, so that would be a nice feature. I would
> consider filing a JIRA issue:
> https://issues.apache.org/jira/browse/ZEPPELIN/
>
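A minimal sketch of the manual note.json edit described above, in Python. It assumes each paragraph in the file keeps its stored output under a "result" key (or "results" in later Zeppelin releases); the function names are hypothetical, so check the layout of your own note.json and back the file up before trying it.

```python
import json


def clear_note_output(note):
    """Drop stored output from every paragraph of a parsed note.json dict.

    Assumption: output lives under a "result" (or "results") key on each
    paragraph; inspect your own note.json to confirm before relying on this.
    """
    cleared = 0
    for paragraph in note.get("paragraphs", []):
        for key in ("result", "results"):
            if paragraph.pop(key, None) is not None:
                cleared += 1
    return cleared


def clear_note_file(path):
    """Rewrite the note.json at `path` in place with all outputs removed."""
    with open(path, encoding="utf-8") as handle:
        note = json.load(handle)
    cleared = clear_note_output(note)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(note, handle, indent=2)
    return cleared
```

Calling clear_note_file() on the note's note.json while Zeppelin is stopped should leave the paragraph text intact and drop only the saved output.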
> Thanks,
> Alex
>
> On Tue, Sep 27, 2016 at 3:10 PM, Андрей Ривкин <amriv...@gmail.com> wrote:
>
>> Hi Alex,
>>
>> This helped! Great! Thank you!
>>
>> As for the option to hide the Paragraph output, there is a problem: if the
>> SVG is lagging, you can't open the notebook at all. So maybe there is some
>> way to clear all notebook output before opening it?
>> We could also change the notebook JSON file directly on disk.
>>
>> Again, thank you for your help.
>>
>> Regards,
>> Andrey
>>
>>
>>
>>
>>
>> 2016-09-28 0:45 GMT+03:00 Goodman, Alexander (398K) <
>> alexander.good...@jpl.nasa.gov>:
>>
>>> Hi Andrey,
>>>
>>> To get rid of the lag the SVG images are causing in your notebook, you
>>> can hide the Paragraph output. Look for this icon in the upper right hand
>>> corner of the paragraph: https://puu.sh/rq0JU/6fa29f2ff9.png
>>>
>>> For your first problem, matplotlib is very inflexible when it comes to
>>> setting the backend. The default backend on most systems is Qt4Agg, which
>>> is a GUI backend and therefore requires DISPLAY to be set in your
>>> environment (e.g. through X11). Hence, you should always call
>>> matplotlib.use('Agg') before making calls to (AND imports of) any other
>>> plotting functions. In fact, it is good practice to do this in your very
>>> first paragraph cell, before running any others. plt.switch_backend() can
>>> work in certain circumstances, but a safe bet is to restart the interpreter
>>> through the Interpreter menu in Zeppelin and then run the paragraphs again.
>>> If you don't want to do this for every notebook, your best bet is to change
>>> the default backend to Agg in your matplotlibrc file. Part of the ongoing
>>> development work for Zeppelin will involve creating a custom matplotlib
>>> backend that is set as the default automatically, so users won't have to
>>> worry about this in the future.
>>>
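The ordering described above can be sketched like this. The plotted values are placeholders; the point is only that matplotlib.use('Agg') runs before matplotlib.pyplot (or pylab) is imported, so no DISPLAY is needed.

```python
import io

import matplotlib
matplotlib.use("Agg")  # select the non-GUI backend before pyplot is imported

import matplotlib.pyplot as plt

# Placeholder data, only to show that rendering works without a display.
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

buf = io.BytesIO()
fig.savefig(buf, format="png")  # render off-screen into memory
plt.close(fig)
png_bytes = buf.getvalue()
```

The matplotlibrc equivalent is a single "backend: Agg" line, which makes the matplotlib.use() call unnecessary in each notebook.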
>>> For the second problem, the PR that I linked you to has not been merged
>>> and only works with the python (not pyspark) interpreter. You'll need to
>>> define the show function yourself somewhere in your notebook. Hope this
>>> helps.
>>>
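A show function defined by hand could look roughly like the sketch below. This is not the code from the PR; it is an illustration that assumes Zeppelin renders paragraph output beginning with %html as HTML (as the SVG helper elsewhere in this thread does). The name png_html and the width parameter are hypothetical.

```python
import base64
import io


def png_html(plot, width=600):
    """Return a Zeppelin %html string embedding `plot` as an inline PNG.

    `plot` is anything with a savefig(buffer, format=...) method, for
    example the matplotlib.pyplot module or a Figure object.
    """
    buf = io.BytesIO()
    plot.savefig(buf, format="png")
    # Base64-encode the PNG so it can be embedded as a data URI.
    encoded = base64.b64encode(buf.getvalue()).decode("ascii")
    return ("%html <img src='data:image/png;base64,{0}' "
            "style='width:{1}px'/>").format(encoded, width)
```

In a notebook paragraph this would be used as print(png_html(plt)) after the plotting calls, followed by plt.close().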
>>> Thanks,
>>> Alex
>>>
>>> On Tue, Sep 27, 2016 at 2:22 PM, Андрей Ривкин <amriv...@gmail.com>
>>> wrote:
>>>
>>>> Hi Alex,
>>>>
>>>> Thank you, we will give PNG a try.
>>>>
>>>> Our dataset is very small (for Big Data and Hadoop): only 4 MB. We have
>>>> 40,000 rows x 17 columns, so not that big.
>>>>
>>>> But it seems that 40k dots is too much for my browser. Also, maybe
>>>> Zeppelin should somehow disable such difficult paragraphs rather than the
>>>> whole notebook.
>>>> And it's very difficult to change the notebook after this plot has been
>>>> drawn. Is there any way to clear all results of a notebook before opening
>>>> it?
>>>>
>>>> If we do import os, then we get:
>>>>
>>>> Traceback (most recent call last):
>>>>   File "/tmp/zeppelin_pyspark-3283164060812521118.py", line 239, in <module>
>>>>     eval(compiledCode)
>>>>   File "<string>", line 1, in <module>
>>>>   File "/opt/anaconda2/lib/python2.7/site-packages/pandas/tools/plotting.py", line 2951, in hist_series
>>>>     plt.figure(figsize=figsize))
>>>>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/pyplot.py", line 527, in figure
>>>>     **kwargs)
>>>>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 46, in new_figure_manager
>>>>     return new_figure_manager_given_figure(num, thisFig)
>>>>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 53, in new_figure_manager_given_figure
>>>>     canvas = FigureCanvasQTAgg(figure)
>>>>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 76, in __init__
>>>>     FigureCanvasQT.__init__(self, figure)
>>>>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4.py", line 68, in __init__
>>>>     _create_qApp()
>>>>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt5.py", line 138, in _create_qApp
>>>>     raise RuntimeError('Invalid DISPLAY variable')
>>>> RuntimeError: Invalid DISPLAY variable
>>>>
>>>> We also tried:
>>>>
>>>>
>>>> %pyspark
>>>> import matplotlib
>>>> matplotlib.use('Agg')
>>>> import matplotlib.pyplot as plt
>>>> def show(p):
>>>>     z.show(plt, fmt="png")
>>>>
>>>>
>>>> %pyspark
>>>> raw_data['age'].hist(bins=20,color = 'g')
>>>> plt.xlabel('Age')
>>>> plt.ylabel('Number of people')
>>>> plt.title('Age distribution')
>>>> show(plt)
>>>> plt.close()
>>>>
>>>>
>>>> Traceback (most recent call last):
>>>>   File "/tmp/zeppelin_pyspark-7759686698822330483.py", line 239, in <module>
>>>>     eval(compiledCode)
>>>>   File "<string>", line 5, in <module>
>>>>   File "<string>", line 5, in show
>>>> TypeError: show() got an unexpected keyword argument 'fmt'
>>>>
>>>>
>>>>
>>>> Regards,
>>>> Andrey
>>>>
>>>>
>>>> 2016-09-27 20:53 GMT+03:00 Goodman, Alexander (398K) <
>>>> alexander.good...@jpl.nasa.gov>:
>>>>
>>>>> Hi Andrey,
>>>>>
>>>>> These two lines:
>>>>>
>>>>> os.system("export DISPLAY=:0")
>>>>> plt.switch_backend('Agg')
>>>>>
>>>>> should not be necessary, since you have already set the backend
>>>>> manually to Agg.
>>>>>
>>>>> More importantly, how large is your dataset? While SVG looks nice, it
>>>>> does not scale well with large datasets. I would suggest you try using PNG
>>>>> images instead. Some code for doing this can be found in this PR:
>>>>> https://github.com/apache/zeppelin/pull/1422. There is ongoing work
>>>>> to improve matplotlib integration with Zeppelin even further than this,
>>>>> but this solution should be sufficient for you right now. If you are still
>>>>> having problems, let us know.
>>>>>
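To see how SVG output grows with the number of points, the sketch below renders the same scatter plot to both formats in memory and compares the sizes. The data is synthetic (40,000 random points standing in for the dataset size mentioned in this thread), and rendered_size is a hypothetical helper name.

```python
import io
import random

import matplotlib
matplotlib.use("Agg")  # headless backend, selected before pyplot is imported

import matplotlib.pyplot as plt


def rendered_size(fig, fmt):
    """Return the number of bytes produced by saving `fig` in format `fmt`."""
    buf = io.BytesIO()
    fig.savefig(buf, format=fmt)
    return len(buf.getvalue())


random.seed(0)  # synthetic stand-in data, not the dataset from this thread
n_points = 40000
xs = [random.gauss(0.0, 1.0) for _ in range(n_points)]
ys = [random.gauss(0.0, 1.0) for _ in range(n_points)]

fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(xs, ys, alpha=0.5, color="blue")

# SVG stores one element per marker; PNG stores a fixed-size pixel grid.
svg_bytes = rendered_size(fig, "svg")
png_bytes = rendered_size(fig, "png")
plt.close(fig)

print("SVG: %d bytes, PNG: %d bytes" % (svg_bytes, png_bytes))
```

The SVG size scales with the number of markers, and the browser has to build a DOM node for each one, while the PNG size is bounded by the image dimensions.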
>>>>> Thanks,
>>>>> Alex
>>>>>
>>>>> On Tue, Sep 27, 2016 at 3:27 AM, Андрей Ривкин <amriv...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Alex,
>>>>>>
>>>>>> Here is the exported notebook and sample data.
>>>>>>
>>>>>> The notebook is quite heavy (19 MB). Where can I upload it?
>>>>>>
>>>>>> Here is a code sample:
>>>>>>
>>>>>> %pyspark
>>>>>>
>>>>>> import matplotlib
>>>>>> import os
>>>>>>
>>>>>> from pylab import figure, show, rand
>>>>>> from matplotlib.patches import Ellipse
>>>>>> import matplotlib.pyplot as plt
>>>>>> # helper function to display in Zeppelin
>>>>>>
>>>>>> matplotlib.use('Agg')
>>>>>> os.system("export DISPLAY=:0")
>>>>>> plt.switch_backend('Agg')
>>>>>>
>>>>>> import StringIO
>>>>>> def show(p):
>>>>>>   img = StringIO.StringIO()
>>>>>>   p.savefig(img, format='svg')
>>>>>>   img.seek(0)
>>>>>>   print "%html <div style='width:600px'>" + img.buf + "</div>"
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> %pyspark
>>>>>> data_1 = data.ix[data['y']==1]
>>>>>> data_0 = data.ix[data['y']==0]
>>>>>> x_1 = data_1['balance'].values
>>>>>> y_1 = data_1['age'].values
>>>>>> x_0 = data_0['balance'].values
>>>>>> y_0 = data_0['age'].values
>>>>>> colors = ['red','green']
>>>>>> plt.figure(figsize=(10, 6))
>>>>>> plt.xlabel('Balance')
>>>>>> plt.ylabel('Age')
>>>>>> plt.title('')
>>>>>> plt.scatter(x_0, y_0, alpha=0.5, color='blue')
>>>>>> plt.scatter(x_1, y_1, alpha=0.5, color='red')  # matplotlib.colors.ListedColormap(colors)
>>>>>> plt.title('Distributions of balance by age and target value')
>>>>>> show(plt)
>>>>>> plt.close()
>>>>>>
>>>>>> Regards,
>>>>>> Andrey
>>>>>>
>>>>>> 2016-09-20 19:20 GMT+03:00 Goodman, Alexander (398K) <
>>>>>> alexander.good...@jpl.nasa.gov>:
>>>>>>
>>>>>>> Hi Andrey,
>>>>>>>
>>>>>>> Would you be able to post the code you were using so we can try to
>>>>>>> reproduce your problem, including how you are generating the images
>>>>>>> inline (e.g., is your chosen image format PNG or SVG)?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Alex
>>>>>>>
>>>>>>> On Tue, Sep 20, 2016 at 9:12 AM, Андрей Ривкин <amriv...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> We are using Zeppelin 0.6.1 with Spark 1.6.2.
>>>>>>>>
>>>>>>>> We have a very simple demo and a small file. If we just want to
>>>>>>>> calculate something, it's OK.
>>>>>>>> But when we try to visualize using matplotlib, Zeppelin hangs (even
>>>>>>>> the scroll bar) and then disconnects.
>>>>>>>>
>>>>>>>> We are using Chrome.
>>>>>>>>
>>>>>>>> The logs show only this:
>>>>>>>>
>>>>>>>> INFO [2016-09-20 18:44:22,574] ({pool-1-thread-10} Paragraph.java[jobRun]:252) - run paragraph 20160920-143431_1028264283 using sql org.apache.zeppelin.interpreter.LazyOpenInterpreter@49b54b66
>>>>>>>> INFO [2016-09-20 18:44:25,114] ({pool-1-thread-10} NotebookServer.java[afterStatusChange]:1150) - Job 20160920-143431_1028264283 is finished
>>>>>>>> INFO [2016-09-20 18:44:25,142] ({pool-1-thread-10} SchedulerFactory.java[jobFinished]:137) - Job paragraph_1474371271306_-871438747 finished by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpretershared_session476562431
>>>>>>>> INFO [2016-09-20 18:47:19,430] ({qtp88558700-14} NotebookServer.java[onClose]:227) - Closed connection to 192.168.110.249 : 53565. (1001) null
>>>>>>>>
>>>>>>>> It always ends with 1001 and null.
>>>>>>>>
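One starting point for debugging is the close code in that last log line. In the WebSocket protocol (RFC 6455), 1001 means the peer is "going away", which fits a browser tab that froze, reloaded, or navigated away; the trailing null is presumably an empty close reason, though that is an assumption about how Zeppelin logs it. A small sketch for pulling the code out of such lines follows; the regular expression is modelled only on the onClose entry quoted above, and parse_close is a hypothetical helper name.

```python
import re

# Meanings taken from RFC 6455, section 7.4.1 (subset).
CLOSE_CODES = {
    1000: "normal closure",
    1001: "going away (e.g. the browser tab closed, reloaded, or navigated away)",
    1006: "abnormal closure (connection dropped without a close frame)",
    1009: "message too big",
}

# Assumed line shape, copied from the onClose entry quoted in this thread.
CLOSE_LINE = re.compile(
    r"Closed connection to\s+(?P<host>\S+)\s*:\s*(?P<port>\d+)\.\s*"
    r"\((?P<code>\d+)\)\s*(?P<reason>.*)"
)


def parse_close(line):
    """Return (host, port, code, meaning) for an onClose log line, else None."""
    match = CLOSE_LINE.search(line)
    if match is None:
        return None
    code = int(match.group("code"))
    return (match.group("host"), int(match.group("port")), code,
            CLOSE_CODES.get(code, "unknown"))
```

A 1001 here suggests the disconnect was initiated on the browser side, so the browser's developer console is likely a better place to look than the server log.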
>>>>>>>> In Firefox it's sometimes OK. But if there are more than 3 plots, it
>>>>>>>> will hang too.
>>>>>>>>
>>>>>>>> How could we debug this?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Andrey
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Alex Goodman
>>>>>>> Data Scientist I
>>>>>>> Science Data Modeling and Computing (398K)
>>>>>>> Jet Propulsion Laboratory
>>>>>>> California Institute of Technology
>>>>>>> Tel: +1-818-354-6012
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
>
