Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

2018-08-13 Thread Belousov Maksim Eduardovich
The ability to use Russian and other language-specific letters in note names
is a big advantage of Zeppelin. I would not like to give up this functionality.

I support the idea of the `zpln` file extension.
The folder structure also sounds good.

I am worried about non-Latin symbols in folder and note names. And what about
hieroglyphs?

Apache Zeppelin may be the first application in our company to use Russian
letters in the file system.
I see a lot of risks in using non-Latin symbols, and a lot of issues to resolve
before the new folder structure is stable.





From: Jeff Zhang
Sent: 13 August 2018, 12:50
To: users@zeppelin.apache.org
Subject: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of
[NOTEID]/note.json

>>> Do we need the note id in the file name at all? What’s wrong with just 
>>> note_name.zpln?
The reason I keep the note id is that we currently use noteId to identify a
note, e.g. in both the websocket API and the REST API. It is almost impossible
to remove noteId in the current architecture. If we put the note id only into
the file content of note_name.zpln, then we have to read the note file every
time, and we run into the issues I mentioned above again.
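
A minimal sketch of why keeping the id in the file name avoids opening every
file. It assumes the proposed {note_name}_{note_id}.zpln layout and that the id
is the text after the last underscore; the function names are hypothetical and
this is not Zeppelin's implementation.

```python
import re
from pathlib import PurePosixPath

# Assumed pattern: "<note_name>_<note_id>.zpln"; the id is taken to be the
# text after the last underscore and to contain no underscore itself.
NOTE_FILE = re.compile(r"^(?P<name>.+)_(?P<note_id>[^_/]+)\.zpln$")


def parse_note_path(relative_path):
    """Split 'folder_1/mynote_abcd.zpln' into (folder, name, note_id)."""
    path = PurePosixPath(relative_path)
    match = NOTE_FILE.match(path.name)
    if match is None:
        raise ValueError(f"not a note file: {relative_path}")
    folder = "" if str(path.parent) == "." else str(path.parent)
    return folder, match["name"], match["note_id"]


def build_index(relative_paths):
    """Map note id -> (folder, name) from file names only; no file is opened."""
    index = {}
    for relative_path in relative_paths:
        folder, name, note_id = parse_note_path(relative_path)
        index[note_id] = (folder, name)
    return index
```

With this layout a directory listing is enough to build the note menu and to
resolve a noteId received over the REST or websocket API.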

>>> If the file content is json then why not use note_name.json instead of 
>>> .zpln? That would make it easier for editors to know how to load/highlight 
>>> the file contents.
I do not feel strongly about *.zpln, but one purpose is to help third parties
identify Zeppelin notes properly, e.g. GitHub can identify a Jupyter notebook
(*.ipynb) and render it properly.

>>> Is there any reason for not using real folders or directories for 
>>> organising the notebooks rather than embedding the folder hierarchy in the 
>>> names of the notebooks?  If someone wants to ‘move’ the notebooks to 
>>> another folder they’d have to manually rename all the files/notebooks at 
>>> present.  That’s not very user-friendly.

Actually my proposal is to use real folders. What users see in the Zeppelin
note menu is the actual folder structure of the notes. If they want to move
notebooks to another folder, they can change the folder name just as they
would in the file system.





Partridge, Lucas (GE Aviation) <lucas.partri...@ge.com> wrote on
Mon, 13 Aug 2018 at 4:43 PM:
Hi Jeff,
I have some questions about this proposal (I can’t edit the design doc):


  1.  Do we need the note id in the file name at all? What’s wrong with just 
note_name.zpln?

  2.  If the file content is json then why not use note_name.json instead of 
.zpln? That would make it easier for editors to know how to load/highlight the 
file contents.

  3.  Is there any reason for not using real folders or directories for 
organising the notebooks rather than embedding the folder hierarchy in the 
names of the notebooks?  If someone wants to ‘move’ the notebooks to another 
folder they’d have to manually rename all the files/notebooks at present.  
That’s not very user-friendly.

Thanks, Lucas.
From: Jeff Zhang <zjf...@gmail.com>
Sent: 13 August 2018 09:06
To: users@zeppelin.apache.org
Cc: dev <d...@zeppelin.apache.org>
Subject: EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of 
[NOTEID]/note.json

In that case, zeppelin should fail to create note.
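
A sketch of what "fail to create the note" could look like for names that
cannot be used as a file path. The set of rejected characters is an assumption
(the characters Windows forbids in file names, plus control characters), "/" is
treated as the folder separator, and the function name is hypothetical.

```python
# Hypothetical rule set: the characters Windows forbids in file names,
# plus ASCII control characters. "/" is kept as the folder separator.
# Non-Latin letters are allowed.
INVALID_CHARS = set('<>:"\\|?*')


def validate_note_name(note_name):
    """Return the name unchanged, or raise ValueError if it cannot be a path."""
    for segment in note_name.split("/"):
        if segment.strip() == "":
            raise ValueError("empty folder or note name in: %r" % note_name)
        bad = sorted(c for c in set(segment)
                     if c in INVALID_CHARS or ord(c) < 32)
        if bad:
            raise ValueError("invalid characters %r in: %r" % (bad, note_name))
    return note_name
```

Under these assumptions a Russian note name passes unchanged, while a name
containing "?" or ":" is rejected before any file is written.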

Felix Cheung <felixcheun...@hotmail.com> wrote on Mon, 13 Aug 2018 at 3:47 PM:
Perhaps one concern is users having characters in note name that are invalid 
for file name/file path?



From: Mohit Jaggi <mohitja...@gmail.com>
Sent: Sunday, August 12, 2018 6:02 PM
To: users@zeppelin.apache.org
Cc: dev
Subject: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of 
[NOTEID]/note.json

sounds like a good idea!

On Sun, Aug 12, 2018 at 5:34 PM Jeff Zhang <zjf...@gmail.com> wrote:
Motivation

   The motivation of ZEPPELIN-2619 is to change the notes storage structure.
Previously we stored notes as {noteId}/note.json; we'd like to change that to
{note_name}_{note_id}.zpln. There are several reasons for this change.


  1.  {noteId}/note.json is not scalable. We put all notes in one root folder
in a flat structure. When the Zeppelin server starts, we need to read every
note.json to get the note name and build the note folder structure (because
the note name, which is needed to build the notebook menu, is stored in
note.json). This becomes a nightmare when you have a large number of notes.
  2.  {noteId}/note.json is not maintainable. It is difficult for a
developer/administrator to find a note file based on the note name.
  3.  {noteId}/note.json has no folder structure. Currently Zeppelin has to
build the folder structure internally in memory according to the note names,
which is a big overhead.

New Approach

   As I mentioned above, I propose to change the note storage structure to
{note_name}_{note_id}.zpln.  note_name could contain folders, e.g.
folder_1/mynote_abcd.zpln

This kind of note 

Re: [DISCUSS] Share Data in Zeppelin

2018-07-17 Thread Belousov Maksim Eduardovich
The ability to work with many data sources is one of the reasons we chose
Apache Zeppelin.

For branch-0.7 our ops team wrote a lot of Python functions to import and
export data from different sources (Greenplum, Hive, Oracle), using a Python
DataFrame as middleware.
Our users can upload flat files to Zeppelin via Samba, then load them into the
databases and run queries.

The availability of ResourcePool in 0.8 is a big milestone.
I hope ResourcePool will allow us to smoothly integrate all sources in the
company.
It would be great if interpreters other than spark and python could also get
data from ResourcePool.

Case 2.b is nice.
Now I see that transmitting table data is sufficient.



Regards,
Maxim Belousov


From: Jeff Zhang
Sent: 13 July 2018, 6:00
To: users@zeppelin.apache.org
Cc: dev
Subject: Re: [DISCUSS] Share Data in Zeppelin

Thanks Sanjay, I have fixed the example note.

*Folks, please note:* the example note is just a fake note, it won't work
for now.



Jongyoul Lee wrote on Fri, 13 Jul 2018 at 10:54 AM:

> BTW, we need to consider at design time the case where the result is large.
> In my experience, if we implement this feature, users will use it
> with large data.
>
> On Fri, Jul 13, 2018 at 11:51 AM, Sanjay Dasgupta <
> sanjay.dasgu...@gmail.com> wrote:
>
>> I prefer 2.b also. Could we use (save*Result*AsTable=people) instead?
>>
>> There are a few typos in the example note shared:
>>
>> 1) The line val peopleDF = spark.read.format("zeppelin").load() should
>> mention the table name (possibly as argument to load?)
>> 2) The python line val peopleDF = z.getTable("people").toPandas() should
>> not have the val
>>
>>
>> The z.getTable() method could be a very good tool to judge
>> which use-cases are important in the community. It is easy to implement for
>> the in-memory data case, and could be very useful for many situations where
>> a small amount of data is being transferred across interpreters (like the
>> jdbc -> matplotlib case mentioned).
>>
>> Thanks,
>> Sanjay
>>
>> On Fri, Jul 13, 2018 at 8:07 AM, Jongyoul Lee  wrote:
>>
>>> Yes, it's similar to 2.b.
>>>
>>> Basically, my concern is to handle all kinds of data. But in your case,
>>> it looks like focusing on table data. It's also useful but it would be
>>> better to handle all of the data including table or plain text as well.
>>> WDYT?
>>>
>>> About storage, we could discuss it later.
>>>
>>> On Fri, Jul 13, 2018 at 11:25 AM, Jeff Zhang  wrote:
>>>

 I think your use case is the same as 2.b.  Personally I don't recommend
 using z.get(noteId, paragraphId) to get the shared data, for 2 reasons:
 1. noteId and paragraphId are meaningless, which is not readable.
 2. The note will break if we clone it, as the noteId changes.
 That's why I suggest using a paragraph property to save the paragraph's
 result.
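
 The second reason can be sketched with a toy in-memory pool. Everything here
 is hypothetical (class, method names and ids are made up for illustration);
 it is not the ResourcePool API and not the proposed z.getTable().

```python
# Hypothetical in-memory pool; not Zeppelin's ResourcePool API.
class SharedData:
    def __init__(self):
        self._by_paragraph = {}   # (note_id, paragraph_id) -> rows
        self._by_name = {}        # table name -> rows

    def put_result(self, note_id, paragraph_id, rows, save_as=None):
        self._by_paragraph[(note_id, paragraph_id)] = rows
        if save_as is not None:
            self._by_name[save_as] = rows

    def get(self, note_id, paragraph_id):
        return self._by_paragraph[(note_id, paragraph_id)]

    def get_table(self, name):
        return self._by_name[name]


ORIGINAL_NOTE_ID = "2DABC"   # made-up id of the note before cloning


def run_producer(pool, note_id):
    # The producing paragraph stores its result under the note it runs in.
    rows = [("alice", 30), ("bob", 25)]
    pool.put_result(note_id, "paragraph_1", rows, save_as="people")


def consumer_by_id(pool):
    # The id is hard-coded in the paragraph text, so a clone keeps the old id.
    return pool.get(ORIGINAL_NOTE_ID, "paragraph_1")


def consumer_by_name(pool):
    return pool.get_table("people")
```

 If only the clone (with a new note id) has been run, consumer_by_id raises
 KeyError while consumer_by_name still finds the table.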

 Regarding the intermediate storage, I also thought about it and agree
 that in the long term we should provide such a layer to support large data.
 Currently we put the shared data in memory, which is not a scalable
 solution.  One candidate in my mind is alluxio [1], and regarding the data
 format I think apache arrow [2] is another good option for zeppelin to
 share table data across interpreter processes and different languages. But
 these are all implementation details; I think we can talk about them in
 another thread. In this thread, I think we should focus on the user-facing
 API.


 [1] http://www.alluxio.org/
 [2] https://arrow.apache.org/



 Jongyoul Lee wrote on Fri, 13 Jul 2018 at 10:11 AM:

> I have a bit different idea to share data.
>
> In my case,
>
> It would be very useful to get a paragraph's result as an input of
> other paragraphs.
>
> e.g.
>
> -- Paragraph 1
> %jdbc
> select * from some_table;
>
> -- Paragraph 2
> %spark
> val rdd = z.get("noteId", "paragraphId").parse.makeRddByMyself
> spark.read(table).select
>
> If paragraph 1's result is too big to show on the FE, it would be saved in
> Zeppelin Server in a proper way and passed to SparkInterpreter when
> Paragraph 2 is executed.
>
> Basically, I think we need an intermediate storage to store
> paragraphs' results in order to share them. We can introduce another layer
> or extend NotebookRepo. In some cases, we might change notebook repos as
> well.
>
> JL
>
>
>
> On Fri, Jul 13, 2018 at 10:39 AM, Jeff Zhang  wrote:
>
>> Hi Folks,
>>
>> Recently, there have been several tickets [1][2][3] about sharing data in
>> zeppelin.
>> Zeppelin's goal is to be a unified data analytics platform which can
>> integrate most of the big data tools and help users switch between tools
>> and share data between tools easily. So sharing data is a very critical
>> and killer feature of Zeppelin IMHO.
>>
>> I raise this ticket to 

RE: Partial code lost when multiple people work in same note

2018-07-05 Thread Belousov Maksim Eduardovich
PR 2848 [1] fixed this behavior, but it has not been merged into branch-0.8.
So no released version contains the fix.



1.   https://github.com/apache/zeppelin/pull/2848 - [Zeppelin-3307] - 
Improved shared browsing/editing for the note


Regards,

Maksim Belousov


From: Jeff Zhang [mailto:zjf...@gmail.com]
Sent: Thursday, July 05, 2018 4:17 PM
To: users@zeppelin.apache.org
Subject: Re: Partial code lost when multiple people work in same note


Which version of Zeppelin do you use? And how do you cooperate? Do multiple
people work on the same paragraphs?

Ben Teeuwen <ben.teeu...@booking.com> wrote on Thu, 5 Jul 2018 at 7:24 PM:
Hi,

We're trying out Zeppelin with a bunch of people. As soon as 2 people work in 
the same note on the same machine, code is lost from the chunk someone is 
working in. Quite some colleagues concluded that the cooperation feature, 
initially expected to be one of the killer features, doesn't live up to the 
promise and moved back to Jupyter.

Is this a known issue, and/or have others experienced this? Curious if we've 
set it up erroneously, or whether this is a ticket worthy and needs more 
debugging information.

Ben


RE: mysql jdbc

2018-04-02 Thread Belousov Maksim Eduardovich
Hi Mohit!

Did you follow [1]?
The JAR needs to be put in $ZEPPELIN_HOME/interpreter/jdbc


1.   http://zeppelin.apache.org/docs/0.7.3/interpreter/jdbc.html#mysql
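
One way to check whether the driver class is really present in that folder is
to look inside the JAR files. This is only a sketch: the helper name is made
up, and the directory and class name are whatever you pass in.

```python
import zipfile
from pathlib import Path


def jars_providing(class_name, jar_dir):
    """Return the names of JAR files in jar_dir that contain class_name."""
    # A class a.b.C is stored inside the JAR as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    found = []
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        with zipfile.ZipFile(jar) as archive:
            if entry in archive.namelist():
                found.append(jar.name)
    return found
```

An empty list for the class named in a ClassNotFoundException means no JAR in
that folder provides it. Note that MySQL Connector/J 5.x names its driver
class com.mysql.jdbc.Driver, so the class name configured in the interpreter
may be worth checking as well.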

Regards,

Maksim Belousov


From: Mohit Jaggi [mailto:mohitja...@gmail.com]
Sent: Tuesday, April 03, 2018 12:12 AM
To: users@zeppelin.apache.org
Subject: mysql jdbc

Hi,
I am following the instructions here:
https://zeppelin.apache.org/docs/0.7.0/interpreter/jdbc.html#mysql
But I get this when I try "show databases". Do I need to do something to fetch 
the driver?

%mysql
show databases
java.lang.ClassNotFoundException: org.mysql.jdbc.Driver
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.zeppelin.jdbc.JDBCInterpreter.createConnectionPool(JDBCInterpreter.java:341)
at org.apache.zeppelin.jdbc.JDBCInterpreter.getConnectionFromPool(JDBCInterpreter.java:352)
at org.apache.zeppelin.jdbc.JDBCInterpreter.getConnection(JDBCInterpreter.java:372)
at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:565)
at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:692)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:97)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:498)
at org.apache.zeppelin.scheduler.Job.r


RE: [DISCUSS] Roadmap 0.9 and future

2018-03-13 Thread Belousov Maksim Eduardovich
Hello Moon and users!

Thanks a lot for sharing.

I propose to abandon the roadmap.
The idea of having no roadmap in open source projects comes from the book
"Social Architecture" by Pieter Hintjens. As the book says, in 2007 he founded
the successful and broad-ranging ZeroMQ community. See the chapter
"How ZeroMQ Lost Its Road Map" in [1].

Furthermore, the current item list is more like a "wish list".
I believe the wiki page could be renamed to "Community wish list".


[1] https://hintjens.gitbooks.io/social-architecture/content/chapter5.html



Regards,

Maksim Belousov


From: Jongyoul Lee [mailto:jongy...@gmail.com]
Sent: Thursday, March 08, 2018 3:10 AM
To: users 
Subject: Re: [DISCUSS] Roadmap 0.9 and future

Thanks for sharing.

As an enterprise user, it would help a lot to support Spark more concretely.
And containerization has become a default option for operating a framework.

To provide immediate support for new Spark releases, I suggest separate
release cycles for the zeppelin-* modules and the interpreters. This would
let contributors see more frequent releases of the interpreters they are
using.

Thanks,
JL

On Thu, Mar 8, 2018 at 4:08 AM, moon soo Lee wrote:
I think there were a couple of discussions about the 0.8.0 release in the
community. We have an umbrella issue that tracks the 0.8.0 release [1], and I
see that not many blockers are left. Jeff volunteered to be the release
manager for 0.8.0 and I think he can give a better estimate for the 0.8.0
release.

Thanks,
moon

[1] https://issues.apache.org/jira/browse/ZEPPELIN-2385

On Wed, Mar 7, 2018 at 10:49 AM Paul Brenner wrote:
Thanks for sharing the results of the meeting!

Regarding the point "Need more frequent release”, was there any discussion 
around when 0.8.0 will be officially released? I remember a message a few 
months ago suggesting that it might be soon.

Paul Brenner
DATA SCIENTIST
(217) 390-3033


On Wed, Mar 07, 2018 at 1:21 PM moon soo Lee <m...@apache.org> wrote:
Hi folks,

There was an offline meeting yesterday in Palo Alto with contributors and
users. We shared ideas about the current state of the project, the future
roadmap and wishlists (meeting notes [1]). It was a really inspiring and
exciting time. Let me try to summarize and move this discussion online.

There were many ideas related to interpreters. In particular, there was
consensus that Spark support is one of the biggest strengths of Zeppelin and
that we need to keep improving it to maintain that strength.

  *   Spark
      *   Immediate support of new Spark releases
      *   Ramp up support of current Spark features (e.g. display job
          progress correctly)
      *   Spark Streaming support
      *   Handling Livy timeouts
  *   Other interpreters
      *   Better Hive support (e.g. configuration)
      *   Support for the latest PrestoDB version (pass properties correctly)
  *   Run interpreters in a containerized environment
  *   Let individual users upload custom libraries directly from their own
      machines
  *   Interpreter documentation is not detailed enough

People in the meeting were also excited about ConfInterpreter
(ZEPPELIN-3085 [2]) in the upcoming release, regarding dynamic/inline
configuration of interpreters.

There were ideas in other areas, too, such as:

  *   Separate Admin role and user role
  *   Sidebar with plugin widget
  *   Better integration with emerging framework like Tensorflow/MXNet/Ray
  *   Sharing data
  *   Schedule notebook from external scheduler
Regarding scheduling notebook, Luciano shared his project NotebookTools[3] and 
it made people really excited.

Also, there were inspiring discussions about the community/project: its
current status and how we can make the community/project healthier. Here are
some ideas around the topic:

  *   Need more frequent releases
  *   More attention to code review to speed it up
  *   Publishing the roadmap beforehand to help contribution
  *   'Newbie' and 'low hanging fruit' tags help contribution
  *   Being enterprise friendly is another of Zeppelin's biggest strengths
      (in addition to Spark support) and needs to keep improving.

I probably missed many ideas shared yesterday. Please feel free to add to or
correct the summary. Hope more people in the mailing list join and develop the idea 

Zeppelin use survey

2018-02-16 Thread Belousov Maksim Eduardovich
Hello users!

Apache Zeppelin has wide functionality. It would be good to know how Zeppelin
is used, which features are most popular, and what users wish for.

I prepared a survey with 11 questions [1]. Please fill it in.

After a while I will share the source data and results.
I plan to run the survey every year.

1. https://goo.gl/forms/cnypeaT0lhGfEMld2


Regards,

Maksim Belousov




[DISCUSS] Large teams: guidelines for note folders and permissions

2018-01-30 Thread Belousov Maksim Eduardovich
Hello users!

Apache Zeppelin is very good for collaborative work. Users can easily create
and share notes.
Imagine that 50 analysts work in Zeppelin every day and each analyst creates
a new note every week.
There will then be thousands of notes after a few months, and it would be nice
to organize them.

The second issue: in a large team there are a lot of groups, and some notes
must be private or run-only.
Therefore it would be good to have simple rules for note permissions.
We now have public/private workspaces [1], but I believe that is insufficient.


Twitter's experience is very interesting: they had over 600 users [2].
Please share your experience of folder structure and note permissions in
large teams.



1. 
http://zeppelin.apache.org/docs/0.8.0-SNAPSHOT/setup/security/notebook_authorization.html#separate-notebook-workspaces-public-vs-private
2. https://medium.com/@prasadwagle/zeppelin-at-twitter-d2c800b7b01



Regards,

Maksim Belousov




RE: Whats the best practice limit of Query results?

2018-01-23 Thread Belousov Maksim Eduardovich
Hi Alexander!

PR 2323 [1], "[ZEPPELIN-2411] Improve Table", added UI-grid [2].
UI-grid handles huge amounts of data very well and has nice functionality.


[1] https://github.com/apache/zeppelin/pull/2323
[2] http://ui-grid.info/


Regards,

Maksim Belousov


From: alexander.me...@t-systems.com [mailto:alexander.me...@t-systems.com]
Sent: Tuesday, January 23, 2018 5:37 PM
To: users@zeppelin.apache.org
Subject: Whats the best practice limit of Query results?

Hi guys

We're using Zeppelin to do some analysis of log files (Cloudera Cluster, 
Zeppelin 0.7.1 currently) and we're experiencing that zeppelin tends to get 
really slow when notebooks / queries return large datasets.


* Is there a best practice on what amounts of data / query results
  zeppelin can handle?
* And is there a way to increase the performance?
  o   (This may even be actually browser specific?)

As an example we'd like to be able to save a simple select timestamp, hostname, 
etc.. query, displayed in a table as a csv file. This will work fine, as long 
as the resultset is "small enough". Once a certain size is exceeded, it takes 
veeery long until the "save as" popup window appears (if it appears at all)

We experience the same extremely slow behavior when large resultsets are used
for charts - the notebooks become unusable (too slow, browser becomes
unresponsive)

How are you guys dealing with this?

Thanks in advance
Alex


RE: Use parameter into several paragraph, several interpreter

2018-01-10 Thread Belousov Maksim Eduardovich
Hello, Maxime!

The $$ syntax [1] and note-level dynamic forms [2] are currently available
only in the master branch.

[1] http://zeppelin.apache.org/docs/0.8.0-SNAPSHOT/usage/dynamic_form/intro.html#using-form-templates-scope-note
[2] http://zeppelin.apache.org/docs/0.8.0-SNAPSHOT/usage/dynamic_form/intro.html#creates-programmatically-scope-note


Regards,

Maksim Belousov


From: Maxime Lanciaux [mailto:mlanc...@gmail.com]
Sent: Thursday, January 11, 2018 12:19 AM
To: users@zeppelin.apache.org
Subject: Use parameter into several paragraph, several interpreter

Hello team,

I am trying to use a parameter created dynamically in a python paragraph in
another %jdbc paragraph of the same notebook, but it seems this is not working
for now with Zeppelin 0.7.3 (using z.input or $$).

Can you please give me an example of how it should work?

Thanks for your help
Kind regards

Sent from Mail for Windows 10






RE: [DISCUSS] Change some default settings for avoiding unintended usages

2017-12-21 Thread Belousov Maksim Eduardovich
Authentication by default isn't a big deal; it could be enabled.
It would be nice to use a different default account: guest/guest, for example.


Thanks,

Maksim Belousov

From: Jongyoul Lee [mailto:jongy...@gmail.com]
Sent: Monday, December 18, 2017 6:07 AM
To: users 
Cc: d...@zeppelin.apache.org
Subject: Re: [DISCUSS] Change some default settings for avoiding unintended 
usages

Agreed. Supporting container services must be good and I like this idea, but I 
don't think it's the part of this issue directly. Let's talk about this issue 
with another email.

I want to talk about enabling authentication by default. If it's enabled, we
would have to log in as admin/password1 at the beginning. What do you think?

On Sat, Dec 2, 2017 at 1:57 AM, Felix Cheung wrote:
I’d +1 docker or container support (mesos, dc/os, k8s)

But I think that they are separate things. If users are authenticated and 
interpreter is impersonating each user, the risk of system disruption should be 
low. This is typically how to secure things in a system, through user directory 
(eg LDAP) and access control (normal user can’t sudo and delete everything).

Thought?

_
From: Jeff Zhang >
Sent: Thursday, November 30, 2017 11:51 PM

Subject: Re: [DISCUSS] Change some default settings for avoiding unintended 
usages
To: >
Cc: users >


+1 for running interpreter process in docker container.



Jongyoul Lee wrote on Fri, 1 Dec 2017 at 3:36 PM:
Yes, exactly, this is not only a shell interpreter problem; anyone can run
any script through Python and Scala. Shell is just an example.

Using docker looks good, but it cannot prevent unintended usage of resources
such as coin mining.

On Fri, Dec 1, 2017 at 2:36 PM, Felix Cheung wrote:

> I don’t think that’s limited to the shell interpreter.
>
> You can run any arbitrary program or script from python or Scala (or java)
> as well.
>
> _
> From: Jeff Zhang >
> Sent: Wednesday, November 29, 2017 4:00 PM
> Subject: Re: [DISCUSS] Change some default settings for avoiding
> unintended usages
> To: >
> Cc: users >
>
>
>
> Shell interpreter is a black hole for security, usually we don't recommend
> or allow user to use shell.
>
> We may need to refactor the shell interpreter, running under zeppelin user
> is too dangerous.
>
>
>
>
>
> Jongyoul Lee wrote on Wed, 29 Nov 2017 at 11:44 PM:
>
> > Hi, users and dev,
> >
> > Recently, I've got an issue about the abnormal usage of some
> > interpreters.
> > Zeppelin's users can access the shell through the shell and python
> > interpreters. It means all users can run or execute what they want even
> > if it harms the system. Thus I agree that we need to change some default
> > settings to prevent this kind of abuse. Before we proceed to do it, I
> > want to listen to others' opinions.
> >
> > Feel free to reply to this email
> >
> > Regards,
> > Jongyoul
> >
> > --
> > 이종열, Jongyoul Lee, 李宗烈
> > http://madeng.net
> >
>
>
>


--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net




--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net


RE: Is any limitation of maximum interpreter processes?

2017-10-18 Thread Belousov Maksim Eduardovich
I found out that there is a limitation on the number of schedulers in
SchedulerFactory.java [1]:

"executor = ExecutorFactory.singleton().createOrGet("SchedulerFactory", 100);"

It can be tested as follows:
Set a small number for SchedulerFactory, for example 16.
Run notes with interpreters in isolated mode per user and per note.
Observe pending paragraphs once a dozen or so interpreter processes have
started.

There is no limit on the total number of started interpreter processes, but
there is a limit on schedulers.
A scheduler is created inside each interpreter. If we need a limit, it would
be better to limit the number of interpreter processes.
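
The effect can be illustrated with an analogy in Python. This is only a sketch
of the general behaviour of a fixed-size thread pool whose threads are held by
long-lived jobs; it is not Zeppelin's scheduler code, and the function name is
made up.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def count_started(pool_size, long_running_jobs):
    """Submit jobs that hold their thread; return (started, still_pending)."""
    release = threading.Event()
    started = threading.Semaphore(0)

    def job():
        started.release()     # signal that this job got a thread
        release.wait()        # hold the thread, like a long-lived scheduler

    pool = ThreadPoolExecutor(max_workers=pool_size)
    for _ in range(long_running_jobs):
        pool.submit(job)

    # Count the jobs that actually started; stop when none starts for 0.5 s.
    running = 0
    while started.acquire(timeout=0.5):
        running += 1

    release.set()             # let everything finish so the pool can shut down
    pool.shutdown(wait=True)
    return running, long_running_jobs - running
```

With a pool of 2 and 3 such jobs, the third job stays queued until one of the
first two gives its thread back, which is similar to paragraphs staying in
'Pending' until an interpreter is restarted.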

Is this limitation in schedulers useful?


1. 
https://github.com/apache/zeppelin/blob/master/zeppelin-interpreter/src/main/java/org/apache/zeppelin/scheduler/SchedulerFactory.java


Maksim Belousov

From: Belousov Maksim Eduardovich [mailto:m.belou...@tinkoff.ru]
Sent: Tuesday, October 03, 2017 10:37 AM
To: users@zeppelin.apache.org
Subject: RE: Is any limitation of maximum interpreter processes?

> Which interpreter is pending ?
There comes a time when any paragraph with any interpreter doesn't run and
remains in 'Pending' state.
We use local spark instances in the spark interpreter.

Logs don't contain errors.


Maksim Belousov
Architect
Reporting and Data Marts Department
Data Warehouse and Reporting Division
Tel.: +7 495 648-10-00, ext. 2271

From: Jianfeng (Jeff) Zhang [mailto:jzh...@hortonworks.com]
Sent: Tuesday, October 03, 2017 2:01 AM
To: users@zeppelin.apache.org
Subject: Re: Is any limitation of maximum interpreter processes?


Which interpreter is pending? It is possible that the spark interpreter is
pending due to yarn resource capacity if you run it in yarn client mode.

If it is pending, you can check the log first.



Best Regard,
Jeff Zhang


From: Belousov Maksim Eduardovich <m.belou...@tinkoff.ru>
Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Date: Monday, October 2, 2017 at 9:26 PM
To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Subject: Is any limitation of maximum interpreter processes?

Hello, users!

Our analysts run notes with the following interpreters: markdown, one or two
jdbc, and pyspark. The interpreters are instantiated Per User in isolated
process and Per Note in isolated process.

The analysts complain that sometimes paragraphs aren't processed and stay
in status 'Pending'.
We noticed that it happens when the number of started interpreter processes
is about 90-100.
If an admin restarts one of the popular interpreters (thereby killing some
interpreter processes), the paragraphs become 'Running'.

We can't see any load on the zeppelin server while paragraphs are pending.
RAM is sufficient, iowait ~ 0.
Also, we can't find any parameter for the maximum number of interpreter
processes.


Has anyone of you faced the same problem? How can this problem be solved?


Thanks,

Maksim Belousov




RE: Zeppelin Stops Loading Notes

2017-10-13 Thread Belousov Maksim Eduardovich
Paul, Ben, Fabian,
please share your workload at the time when notes are not loading.

How many interpreters were started at that moment?

You can count all started interpreters from the linux command line with:
ps -ef | grep ZeppelinServer | grep -v grep | awk "NR==1" | awk -F' ' '{print $2}' | xargs ps -f --ppid | wc -l

And the started spark interpreters:
ps -ef | grep ZeppelinServer | grep -v grep | awk "NR==1" | awk -F' ' '{print $2}' | xargs ps -f --ppid | grep spark | wc -l



Maksim Belousov
Architect
Reporting and Data Marts Department
Data Warehouse and Reporting Division
Tel.: +7 495 648-10-00, ext. 2271

From: Paul Brenner [mailto:pbren...@placeiq.com]
Sent: Thursday, October 12, 2017 7:45 PM
To: Geoffrey Cheng ; users@zeppelin.apache.org
Subject: Re: Zeppelin Stops Loading Notes

Does this issue need a Jira ticket? The problem is that I have no idea how to 
reproduce and I’m not sure if there is anything in the logs that is relevant.

Any ideas how we can produce an actionable Jira ticket out of this?

Paul Brenner
DATA SCIENTIST
(217) 390-3033



On Thu, Oct 12, 2017 at 8:31 AM Geoffrey Cheng <geoffrey.ch...@gmail.com> wrote:
We have the same issue. Usually when multiple people are using it, only the
header loads.

We tried but couldn't find a solution, so we restart every single time. In
fact, we have to restart at least daily.

On Oct 12, 2017 2:40 AM, "Fabian Böhnlein" wrote:
Hi Paul, Ben,

we also see this happen regularly. It's more likely to happen when a handful of 
people are using it.

We mostly run one spark interpreter per person. We also don't observe anything 
in the logs. The 'header' that you mentioned is actually still in the cache.

Sometimes it's specific notes that don't load.
Sometimes there's a hanging Spark interpreter, once it's killed notes load 
again.

We're pretty clueless about it.

Any front-end related logs we could enable to find out more?

On Sat, 19 Aug 2017 at 20:19 Ben Vogan wrote:
I have seen Zeppelin get into this state once.  I restarted it without 
investigating the logs however so I don't have anything useful to go on as to 
why.

--Ben

On Sat, Aug 19, 2017 at 8:17 AM, Paul Brenner wrote:
You were correct. We had "export ZEPPELIN_SSL_PORT=false" in our
zeppelin-env.sh. I'm going to comment that out. I suspect it is actually
unrelated to the behavior we are seeing where pages stop loading, though. Has
anyone else seen this happen?

I’ll report back if that happens again after the fix.



Paul Brenner





DATA SCIENTIST

(217) 390-3033  





 
On Fri, Aug 18, 2017 at 6:37 PM moon soo Lee wrote:
Hi,

One of the configuration values in your conf/zeppelin-env.sh or
conf/zeppelin-site.xml seems to be "false" where a number is expected.
 
Do you have any environment variable or property set to "false" for the 
configurations below?
 
ZEPPELIN_PORT, zeppelin.server.port
ZEPPELIN_SSL_PORT, zeppelin.server.ssl.port
ZEPPELIN_INTERPRETER_CONNECT_TIMEOUT, 
zeppelin.interpreter.connect.timeout
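
A quick way to spot such a value among the environment variables listed above
is sketched below. The helper name is made up, only the environment-variable
form of the three settings is checked, and zeppelin-site.xml is not read.

```python
# The three settings named above, in their environment-variable form.
NUMERIC_SETTINGS = (
    "ZEPPELIN_PORT",
    "ZEPPELIN_SSL_PORT",
    "ZEPPELIN_INTERPRETER_CONNECT_TIMEOUT",
)


def non_numeric_settings(env):
    """Return {name: value} for settings that are set but are not integers."""
    problems = {}
    for name in NUMERIC_SETTINGS:
        value = env.get(name)
        if value is None:
            continue          # unset: the default applies
        try:
            int(value)
        except ValueError:
            problems[name] = value
    return problems
```

Calling non_numeric_settings(os.environ) (after import os) in the environment
the server starts from would report a value such as
ZEPPELIN_SSL_PORT=false.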

RE: Is any limitation of maximum interpreter processes?

2017-10-03 Thread Belousov Maksim Eduardovich
> Which interpreter is pending ?
At some point, paragraphs stop running regardless of interpreter and remain in 
the 'Pending' state.
We use local Spark instances in the Spark interpreter.

Logs don't contain errors.


Maksim Belousov
Architect
Reporting and Data Marts Department
Data Warehouse and Reporting Division
Tel.: +7 495 648-10-00, ext. 2271

From: Jianfeng (Jeff) Zhang [mailto:jzh...@hortonworks.com]
Sent: Tuesday, October 03, 2017 2:01 AM
To: users@zeppelin.apache.org
Subject: Re: Is any limitation of maximum interpreter processes?


Which interpreter is pending? The Spark interpreter may be pending because of 
YARN resource capacity if you run it in yarn-client mode.

If it is pending, you can check the log first.



Best Regard,
Jeff Zhang


From: Belousov Maksim Eduardovich <m.belou...@tinkoff.ru>
Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Date: Monday, October 2, 2017 at 9:26 PM
To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Subject: Is any limitation of maximum interpreter processes?

Hello, users!

Our analysts run notes with the following interpreters: markdown, one or two 
jdbc, and pyspark. The interpreters are instantiated Per User in an isolated 
process and Per Note in an isolated process.

The analysts complain that paragraphs sometimes aren't processed and stay in 
'Pending' status.
We noticed that this happens when the number of started interpreter processes 
is about 90-100.
If an admin restarts one of the popular interpreters (thereby killing some 
interpreter processes), the paragraphs become 'Running'.

We can't see any load on the Zeppelin server while paragraphs are pending. RAM 
is sufficient, iowait ~ 0.
We also can't find any parameter that limits the number of interpreter 
processes.


Has any of you faced the same problem? How can it be solved?


Thanks,


Maksim Belousov





Is any limitation of maximum interpreter processes?

2017-10-02 Thread Belousov Maksim Eduardovich
Hello, users!

Our analysts run notes with the following interpreters: markdown, one or two 
jdbc, and pyspark. The interpreters are instantiated Per User in an isolated 
process and Per Note in an isolated process.

The analysts complain that paragraphs sometimes aren't processed and stay in 
'Pending' status.
We noticed that this happens when the number of started interpreter processes 
is about 90-100.
If an admin restarts one of the popular interpreters (thereby killing some 
interpreter processes), the paragraphs become 'Running'.

We can't see any load on the Zeppelin server while paragraphs are pending. RAM 
is sufficient, iowait ~ 0.
We also can't find any parameter that limits the number of interpreter 
processes.
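
To correlate the 'Pending' state with the process count, the number of running interpreter processes can be sampled over time. The sketch below is an illustration only. It assumes that interpreter JVMs can be recognised by the class name `RemoteInterpreterServer` in their command line and that a POSIX `ps` is available; both function names are hypothetical.

```python
import subprocess

# Assumption: each interpreter JVM shows this class name in its command line.
INTERPRETER_MARKER = "RemoteInterpreterServer"


def count_interpreter_processes(ps_output, marker=INTERPRETER_MARKER):
    """Count the command lines in `ps` output that contain the marker."""
    return sum(1 for line in ps_output.splitlines() if marker in line)


def current_ps_output():
    """Return one command line per running process (POSIX `ps -A -o args`)."""
    result = subprocess.run(
        ["ps", "-A", "-o", "args"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Logging `count_interpreter_processes(current_ps_output())` every minute would show whether paragraphs start to hang at a consistent count, such as the 90-100 mentioned above.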


Has any of you faced the same problem? How can it be solved?


Thanks,

Maksim Belousov




Implementing run all paragraphs sequentially

2017-09-28 Thread Belousov Maksim Eduardovich
Hello, users!
At the moment our analysts often mix interpreters in their notes.
For example, they prepare data using %jdbc and then use it in %pyspark. 
They also often use scheduling for regular reporting, so they have to do 
something like `time.sleep()` to wait for the data from %jdbc. This doesn't 
guarantee the result and isn't elegant.
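
As an illustration of why a fixed `time.sleep()` is fragile, the wait can be written as a polling loop with a timeout. This is a minimal sketch and still a workaround, not the sequential run discussed in this thread. The name `wait_until` and the `is_ready` callback are hypothetical; `is_ready` stands for any check that the %jdbc output exists.

```python
import time


def wait_until(is_ready, timeout_s=600.0, interval_s=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll is_ready() until it returns True or timeout_s has elapsed.

    Returns True if the condition was met and False on timeout. The sleep and
    clock parameters exist only so the loop can be exercised without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A %pyspark paragraph could call `wait_until(table_is_loaded)` before reading the data, where `table_is_loaded` is a user-written check. Unlike a fixed sleep, this fails visibly when the data never arrives.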

You can find early attempts to implement sequential running of all paragraphs 
in [1].
We are really interested in the implementation of issue [2] and are ready to 
work on it.
It seems a good idea to discuss the requirements first.
My idea is to introduce a note setting that defines the type of run to use 
(parallel or sequential) and leave "Run all" as the only button that runs all 
the cells in the note. This makes sequential or parallel running a `note 
option` rather than a `run option`.
The option would be controlled by a nearby button, as shown:
[https://lh6.googleusercontent.com/jwnb7xfb0fPbFg1CWPoMSqovu7ecSMv4pJfuP4zdKVZbyAUDwzAT2GJ5EiemXVYrqMW73yklemTpjXNyLRJABpTCoHi6us2ZI_AxWKHwZpBEA7MjpMP0-7Nk8saaJQfIF4yBMPfS]


For new notes the default state would be "Run sequential all"; for old notes, 
"Run parallel for interpreters".
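
The difference between the two modes can be sketched as follows. This is an illustration under an assumption, not a description of Zeppelin's scheduler: "parallel for interpreters" is taken to mean that paragraphs of the same interpreter run in note order while different interpreters run concurrently. The function `plan_run` and the mode names are hypothetical.

```python
def plan_run(paragraphs, mode):
    """Group paragraph ids into queues for a "Run all".

    paragraphs: list of (interpreter, paragraph_id) pairs in note order.
    Returns a list of queues. Queues are assumed to run concurrently, and the
    paragraphs inside one queue run one after another.
    """
    if mode == "sequential":
        # One queue: every paragraph waits for the previous one.
        return [[pid for _, pid in paragraphs]]
    if mode == "parallel":
        # One queue per interpreter, in order of first appearance.
        queues = {}
        for interpreter, pid in paragraphs:
            queues.setdefault(interpreter, []).append(pid)
        return list(queues.values())
    raise ValueError("unknown mode: " + repr(mode))
```

For a note with a %jdbc paragraph followed by a %pyspark paragraph, the "parallel" plan puts them in separate queues, so the %pyspark paragraph can start before the %jdbc data is ready. The "sequential" plan keeps them in one queue.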
We are glad to hear any thoughts.
Thank you.

[1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
[2] https://issues.apache.org/jira/browse/ZEPPELIN-2368



Maksim Belousov




Issues with installation of helium packages on 0.7.2

2017-09-27 Thread Belousov Maksim Eduardovich
Hi, users!


1)  I installed version 0.7.2, then installed the package with "npm i 
ultimate-column-chart-negative-values" and followed the instructions for 0.7.x 
in 
https://stackoverflow.com/questions/44342619/apache-zeppelin-how-to-use-helium-framework-in-apache-zeppelin

When I try to enable ultimate-column-chart-negative-values, I get an error:
[https://lh6.googleusercontent.com/r2x-pC33c59BeSgsZqeTeViam5Y_ov8ExK_7x7akJoiGg4yWbfM-i56K8ZYfFQ-oj7x3aZT6Vw_qSY2JpILtTfrwmWHX-hXUXgDDuIKzP70ld_l9GMfkFinQRAVvgjBeq5lHr4y3]


2)  Also, the commands "npm i ultimate-area-chart" and "npm i 
ultimate-line-chart" fail with the error:
npm http 404 https://registry.npmjs.org/amcharts3-export
npm ERR! TypeError: Object.keys called on non-object

How can these issues be solved?

Thanks,
Maksim Belousov