Re: [DISCUSS] Update Roadmap

Shabeel Syed Tue, 01 Mar 2016 00:52:49 -0800

+1

Hi Tamas,
   Pluggable external visualization is really a GREAT feature to have. I'm
looking forward to this :)


Regards
Shabeel

On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi <tamas.szur...@odigeo.com>
wrote:

> Hey,
>
> Really promising roadmap.
>
> I'd only push for more visualization options. I agree that built-in
> visualization with limited charting options is needed, but I think we also
> need a way to somehow 'inject' external JS visualizations.
>
>
> For scheduling Zeppelin notebooks we use
> https://github.com/airbnb/airflow through the job REST API. It's an
> enterprise-ready and very robust solution right now.
>
>
> *Tamas*
>
> On 1 March 2016 at 09:12, Eran Witkon <eranwit...@gmail.com> wrote:
>
>> One point to clarify: I don't want to suggest Oozie specifically. I want
>> us to think about which features we develop ourselves and for which ones
>> we integrate external (preferably Apache) technology. We don't think about
>> building our own storage services, so why build our own scheduler?
>> Eran
>> On Tue, 1 Mar 2016 at 09:49 moon soo Lee <m...@apache.org> wrote:
>>
>>> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
>>> Now I can see a lot of demand around enterprise-level job scheduling.
>>> Whether external or built-in, I completely agree with having
>>> enterprise-level job scheduling support on the roadmap.
>>> ZEPPELIN-137 <https://issues.apache.org/jira/browse/ZEPPELIN-137> and
>>> ZEPPELIN-531 <https://issues.apache.org/jira/browse/ZEPPELIN-531> are
>>> related issues I can find in our JIRA.
>>>
>>> @Vinayak
>>> Regarding importing notebooks from GitHub, Zeppelin has a pluggable
>>> notebook storage layer (see related package
>>> <https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>).
>>> So, GitHub notebook sync can be implemented easily.
>>>
>>> @Shabeel
>>> Right, we need better memory management to prevent such OOM.
>>> And I think a table is one of the most frequently used ways of displaying
>>> data. So definitely, we'll need more features like filter, sort, etc.
>>> After this roadmap discussion, discussion for the next release will
>>> follow. Then we'll get an idea of when those features will be available.
>>>
>>> @Prasad
>>> Thanks for mentioning HA and DR. They're really important subjects for
>>> enterprise use. Zeppelin will definitely need to address them.
>>> And displaying meta information of notebooks on the top-level page is a
>>> good idea.
>>>
>>> It's really great to hear many opinions and ideas.
>>> And thanks @Rick for sharing a valuable view on the Zeppelin project.
>>>
>>> Thanks,
>>> moon
>>>
>>>
>>> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <rah...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> For one, I know that there is rudimentary scheduling built into
>>>> Zeppelin already (at least I fixed a bug in the test for a scheduling
>>>> feature a few months ago).
>>>> But another point is that Zeppelin should also focus on quality,
>>>> reproducibility and portability.
>>>> Although this doesn't offer exciting new features, it would make
>>>> development much easier.
>>>>
>>>> Cross-platform testability, tests that pass when run sequentially,
>>>> compatibility with Firefox, and many more open issues that make it so much
>>>> harder to enhance Zeppelin and add features should be addressed soon,
>>>> preferably before more features are added. Already Zeppelin is suffering -
>>>> in my opinion - from quite a lot of feature creep, and we should avoid
>>>> putting in the kitchen sink, at the cost of quality and maintainability.
>>>> Instead modularity (ZEPPELIN-533 in particular) should be targeted.
>>>>
>>>> Oozie, in my opinion, is a dead end - it may de-facto still be in use
>>>> on many clusters, but it's not getting the love it needs, and I wouldn't
>>>> bet on it, when it comes to integrating scheduling. Instead, any external
>>>> tool should be able to use the REST-API to trigger executions, if you want
>>>> external scheduling.
>>>>
>>>> So, in conclusion, if we take Moon's list as a list of descending
>>>> priorities, I fully agree, under the condition that code quality is
>>>> included as a subset of enterprise-readiness. Auth* is paramount (Kerberos
>>>> SPNEGO SSO support is what we really want) with user and group rights
>>>> assignment on the notebook level. We probably also need Knox-integration
>>>> (ODP-Members looking at integrating Zeppelin should consider contributing
>>>> this), and integration of something like Spree (
>>>> https://github.com/hammerlab/spree) to be able to profile jobs.
>>>>
>>>> I'm hopeful that soon I can resume contributing some quality-oriented
>>>> code, to drive this "necessary evil" forward ;)
>>>>
>>>> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <
>>>> sourav.mazumde...@gmail.com> wrote:
>>>>
>>>>> I do agree with Vinayak. It need not be coupled with Oozie.
>>>>>
>>>>> Rather, one should be able to call it from any scheduler typically used
>>>>> at the enterprise level. Maybe support for BPML.
>>>>>
>>>>> I believe the existing ability to call/execute a Zeppelin Notebook or
>>>>> a specific paragraph within a notebook using REST API should take care of
>>>>> this requirement to some extent.
>>>>>
>>>>> Regards,
>>>>> Sourav
>>>>>
>>>>> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <
>>>>> vinayakagrawa...@gmail.com> wrote:
>>>>>
>>>>>> @Eran Witkon,
>>>>>> Thanks for the suggestion Eran. I concur with your thought.
>>>>>> If Zeppelin can be integrated with Oozie, that would be wonderful.
>>>>>> Users will also be able to leverage their Oozie skills.
>>>>>> This would be promising for now.
>>>>>> However, in the future Hadoop might not necessarily be installed in
>>>>>> a Spark cluster, and Oozie (since it installs with the Hadoop
>>>>>> distribution) might not be available.
>>>>>> So perhaps we should give some thought to this feature for the
>>>>>> future. Should it depend on Oozie, or should Zeppelin have its own
>>>>>> scheduling?
>>>>>>
>>>>>> As Benjamin has reiterated, Databricks notebook has this as a core
>>>>>> notebook feature.
>>>>>>
>>>>>>
>>>>>> Also, would anybody give any suggestions regarding "sync with github"
>>>>>> feature?
>>>>>> -Exporting notebook to Github
>>>>>> -Importing notebook from Github
>>>>>>
>>>>>> Thanks
>>>>>> Vinayak
>>>>>>
>>>>>>
>>>>>> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <eranwit...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> @Vinayak Agrawal I would suggest adding the ability to connect
>>>>>>> Zeppelin to existing scheduling/workflow tools such as
>>>>>>> https://oozie.apache.org/. This requires better hooks and status
>>>>>>> reporting, but doesn't make Zeppelin an ETL/scheduler tool by itself.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
>>>>>>> vinayakagrawa...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Moon,
>>>>>>>> The new roadmap looks very promising. I am very happy to see
>>>>>>>> security in the list.
>>>>>>>> I have some suggestions regarding Enterprise Ready features:
>>>>>>>>
>>>>>>>> 1. Job Scheduler - Can this be improved?
>>>>>>>> Currently the scheduler can be used with a Cron expression or a
>>>>>>>> pre-set time. But in an enterprise solution, a notebook might be one
>>>>>>>> piece of the workflow. Can we look towards the functionality of
>>>>>>>> scheduling notebooks based on other notebooks finishing their job
>>>>>>>> successfully? This requirement would arise in any ETL workflow, where
>>>>>>>> all the downstream users wait for the ETL notebook to finish
>>>>>>>> successfully. Only after that can other business-oriented notebooks be
>>>>>>>> executed.
>>>>>>>>
>>>>>>>> 2. Importing a notebook - Is there a current requirement or future
>>>>>>>> plan to implement a feature that allows import-notebook-from-github?
>>>>>>>> This would allow users to share notebooks seamlessly.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Vinayak
>>>>>>>>
>>>>>>>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <m...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Zhong Wang,
>>>>>>>>> Right, folder support would be quite useful. Thanks for the
>>>>>>>>> opinion. Hope I can finish the work on pr-190
>>>>>>>>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>>>>>>>>
>>>>>>>>> Sourav,
>>>>>>>>> Regarding concurrent running, Zeppelin doesn't have a limitation on
>>>>>>>>> running paragraphs/queries concurrently. An interpreter can implement
>>>>>>>>> its own scheduling policy. For example, the SparkSQL interpreter and
>>>>>>>>> ShellInterpreter can already run paragraphs/queries concurrently.
>>>>>>>>>
>>>>>>>>> SparkInterpreter is implemented with a FIFO scheduler, considering the
>>>>>>>>> nature of the Scala compiler. That's why users cannot run multiple
>>>>>>>>> paragraphs concurrently when they work with SparkInterpreter.
>>>>>>>>> But as Zhong Wang mentioned, pr-703 enables each notebook to have a
>>>>>>>>> separate Scala compiler, so paragraphs run concurrently as long as
>>>>>>>>> they're in different notebooks.
>>>>>>>>> Thanks for the feedback!
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> moon
>>>>>>>>>
>>>>>>>>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wangzhong....@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Sourav: I think this newly merged PR can help you:
>>>>>>>>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>>>>>>>>
>>>>>>>>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>>>>>>>>>> sourav.mazumde...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Moon,
>>>>>>>>>>>
>>>>>>>>>>> This looks great.
>>>>>>>>>>>
>>>>>>>>>>> My only suggestion would be to include a PR/feature - Support
>>>>>>>>>>> for Running Concurrent paragraphs/queries in Zeppelin.
>>>>>>>>>>>
>>>>>>>>>>> Right now if more than one user tries to run paragraphs in
>>>>>>>>>>> multiple notebooks concurrently through a single Zeppelin instance
>>>>>>>>>>> (and single interpreter instance), the performance is very slow. It
>>>>>>>>>>> is obvious that the queue builds up within the Zeppelin process and
>>>>>>>>>>> interpreter process in that scenario, as the time taken to move the
>>>>>>>>>>> status from start to pending and from pending to running is very high
>>>>>>>>>>> compared to the actual running time of a paragraph.
>>>>>>>>>>>
>>>>>>>>>>> Without this, the multi-tenancy support would be meaningless, as
>>>>>>>>>>> no one can practically use it in a situation where multiple users
>>>>>>>>>>> are trying to connect to the same instance of Zeppelin (and the
>>>>>>>>>>> related interpreter). A possible solution would be to spawn a
>>>>>>>>>>> separate instance of the same interpreter at every notebook/user
>>>>>>>>>>> level.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Sourav
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <m...@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Zeppelin users and developers,
>>>>>>>>>>>>
>>>>>>>>>>>> The roadmap we have published at
>>>>>>>>>>>>
>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>>>>>>>>>> is almost 9 months old, and it doesn't reflect where the
>>>>>>>>>>>> community is going anymore. It's time to update.
>>>>>>>>>>>>
>>>>>>>>>>>> Based on the mailing list, JIRA issues, pull requests, feedback
>>>>>>>>>>>> from users, conferences and meetings, I could summarize the major
>>>>>>>>>>>> interests of users and developers in 7 categories: Enterprise ready,
>>>>>>>>>>>> Usability improvement, Pluggability, Documentation, Backend
>>>>>>>>>>>> integration, Notebook storage, and Visualization.
>>>>>>>>>>>>
>>>>>>>>>>>> And I could list related subjects under each category.
>>>>>>>>>>>>
>>>>>>>>>>>>    - Enterprise ready
>>>>>>>>>>>>       - Authentication
>>>>>>>>>>>>          - Shiro authentication ZEPPELIN-548
>>>>>>>>>>>>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>>>>>>>>>>>       - Authorization
>>>>>>>>>>>>          - Notebook authorization PR-681
>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/681>
>>>>>>>>>>>>       - Security
>>>>>>>>>>>>       - Multi-tenancy
>>>>>>>>>>>>       - Stability
>>>>>>>>>>>>    - Usability Improvement
>>>>>>>>>>>>       - UX improvement
>>>>>>>>>>>>       - Better Table data support
>>>>>>>>>>>>          - Download data as csv, etc PR-725
>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/725>
>>>>>>>>>>>>          , PR-714
>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/714>
>>>>>>>>>>>>          , PR-6
>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/6>
>>>>>>>>>>>>          , PR-89
>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/89>
>>>>>>>>>>>>          - Featureful table data display (pagination, etc)
>>>>>>>>>>>>    - Pluggability ZEPPELIN-533
>>>>>>>>>>>>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>>>>>>>>>>>       - Pluggable visualization
>>>>>>>>>>>>       - Dynamic Interpreter, notebook, visualization loading
>>>>>>>>>>>>       - Repository and registry for pluggable components
>>>>>>>>>>>>    - Improve documentation
>>>>>>>>>>>>       - Improve contents and readability
>>>>>>>>>>>>       - more tutorials, examples
>>>>>>>>>>>>    - Interpreter
>>>>>>>>>>>>       - Generic JDBC Interpreter
>>>>>>>>>>>>       - (spark)R Interpreter
>>>>>>>>>>>>       - Cluster manager for interpreter (Proposal
>>>>>>>>>>>>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>
>>>>>>>>>>>>       )
>>>>>>>>>>>>       - more interpreters
>>>>>>>>>>>>    - Notebook storage
>>>>>>>>>>>>       - Versioning ZEPPELIN-540
>>>>>>>>>>>>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>>>>>>>>>>>       - more notebook storages
>>>>>>>>>>>>    - Visualization
>>>>>>>>>>>>       - More visualizations PR-152
>>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/152>,
>>>>>>>>>>>>       PR-728
>>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/728>,
>>>>>>>>>>>>       PR-336
>>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/336>,
>>>>>>>>>>>>       PR-321
>>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/321>
>>>>>>>>>>>>       - Customize graph (show/hide label, color, etc)
>>>>>>>>>>>>
>>>>>>>>>>>> It will help anyone quickly get the overall interests of the project
>>>>>>>>>>>> and its direction. And based on this roadmap, we can discuss and
>>>>>>>>>>>> re-define the scope of the next release, 0.6.0, and its schedule.
>>>>>>>>>>>>
>>>>>>>>>>>> What do you think? Any feedback would be appreciated.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> moon
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Vinayak Agrawal
>>>>>>>>
>>>>>>>>
>>>>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>>>>> ~Lord Alfred Tennyson
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Vinayak Agrawal
>>>>>> Big Data Analytics
>>>>>> IBM
>>>>>>
>>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>>> ~Lord Alfred Tennyson
>>>>>>
>>>>>
>>>>>
>>>>
>
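
For readers following the scheduling part of this thread: the request to run
a notebook only after its upstream notebooks have finished successfully can
be sketched outside Zeppelin, roughly as below. This is only an illustration
under stated assumptions, not how Zeppelin or any scheduler mentioned here
implements it. The function name run_in_dependency_order and the
run_notebook callback are hypothetical; the callback stands in for whatever
trigger an external tool would use (for example a call to the REST API
discussed above).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, Iterable


def run_in_dependency_order(
    deps: Dict[str, Iterable[str]],
    run_notebook: Callable[[str], bool],
) -> Dict[str, str]:
    """Run each notebook only after all of its upstream notebooks succeeded.

    deps maps a notebook id to the ids it depends on.
    run_notebook is a caller-supplied (hypothetical) trigger that returns
    True on success.
    Returns a status per notebook: "ok", "failed" or "skipped".
    """
    status: Dict[str, str] = {}
    # static_order() yields every notebook after all of its predecessors;
    # it raises graphlib.CycleError if the dependencies contain a cycle.
    for note in TopologicalSorter(deps).static_order():
        upstream = deps.get(note, ())
        if any(status[u] != "ok" for u in upstream):
            status[note] = "skipped"  # an upstream notebook did not succeed
            continue
        status[note] = "ok" if run_notebook(note) else "failed"
    return status


if __name__ == "__main__":
    # Two business notebooks wait for the ETL notebook.
    deps = {"report": ["etl"], "dashboard": ["etl"], "etl": []}
    print(run_in_dependency_order(deps, lambda note: True))
```

Keeping the trigger as a callback means the ordering logic does not depend on
any particular scheduler or endpoint, which matches the point made above that
any external tool should be able to drive executions.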
