Hey, really promising roadmap.
I'd only push for more visualization options. I agree that built-in visualization with a limited set of charting options is needed, but I think we also need some way to 'inject' external JS visualizations.
For scheduling Zeppelin notebooks we use https://github.com/airbnb/airflow through the job REST API. It's an enterprise-ready and very robust solution right now.

*Tamas*

On 1 March 2016 at 09:12, Eran Witkon <eranwit...@gmail.com> wrote:

> One point to clarify: I don't want to suggest Oozie specifically. I want to
> think about which features we develop ourselves and which ones we integrate
> from external, preferably Apache, technology. We don't think about building
> our own storage services, so why build our own scheduler?
> Eran
>
> On Tue, 1 Mar 2016 at 09:49 moon soo Lee <m...@apache.org> wrote:
>
>> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
>> Now I can see a lot of demand for enterprise-level job scheduling.
>> Whether external or built-in, I completely agree with having
>> enterprise-level job scheduling support on the roadmap.
>> ZEPPELIN-137 <https://issues.apache.org/jira/browse/ZEPPELIN-137> and
>> ZEPPELIN-531 <https://issues.apache.org/jira/browse/ZEPPELIN-531> are the
>> related issues I can find in our JIRA.
>>
>> @Vinayak
>> Regarding importing notebooks from GitHub, Zeppelin has a pluggable notebook
>> storage layer (see related package
>> <https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>).
>> So GitHub notebook sync can be implemented easily.
>>
>> @Shabeel
>> Right, we need better memory management to prevent such OOM.
>> And I think a table is one of the most frequently used ways of displaying
>> data. So we'll definitely need more features like filter, sort, etc.
>> After this roadmap discussion, discussion of the next release will
>> follow. Then we'll get an idea of when those features will be available.
>>
>> @Prasad
>> Thanks for mentioning HA and DR.
>> They're really important subjects for enterprise use. Definitely Zeppelin
>> will need to address them.
>> And displaying meta information of notebooks on the top-level page is a good
>> idea.
>>
>> It's really great to hear many opinions and ideas.
>> And thanks @Rick for sharing a valuable view on the Zeppelin project.
>>
>> Thanks,
>> moon
>>
>> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <rah...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> For one, I know that there is rudimentary scheduling built into Zeppelin
>>> already (at least I fixed a bug in the test for a scheduling feature a few
>>> months ago).
>>> But another point is that Zeppelin should also focus on quality,
>>> reproducibility and portability.
>>> Although this doesn't offer exciting new features, it would make
>>> development much easier.
>>>
>>> Cross-platform testability, tests that pass when run sequentially,
>>> compatibility with Firefox, and many more open issues that make it so much
>>> harder to enhance Zeppelin and add features should be addressed soon,
>>> preferably before more features are added. Already Zeppelin is suffering -
>>> in my opinion - from quite a lot of feature creep, and we should avoid
>>> putting in the kitchen sink at the cost of quality and maintainability.
>>> Instead modularity (ZEPPELIN-533 in particular) should be targeted.
>>>
>>> Oozie, in my opinion, is a dead end - it may de facto still be in use on
>>> many clusters, but it's not getting the love it needs, and I wouldn't bet
>>> on it when it comes to integrating scheduling. Instead, any external tool
>>> should be able to use the REST API to trigger executions, if you want
>>> external scheduling.
>>>
>>> So, in conclusion, if we take Moon's list as a list of descending
>>> priorities, I fully agree, under the condition that code quality is
>>> included as a subset of enterprise-readiness.
>>> Auth* is paramount (Kerberos SPNEGO SSO support is what we really want)
>>> with user and group rights assignment on the notebook level. We probably
>>> also need Knox integration (ODP-Members looking at integrating Zeppelin
>>> should consider contributing this), and integration of something like
>>> Spree (https://github.com/hammerlab/spree) to be able to profile jobs.
>>>
>>> I'm hopeful that soon I can resume contributing some quality-oriented
>>> code, to drive this "necessary evil" forward ;)
>>>
>>> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
>>>
>>>> I do agree with Vinayak. It need not be coupled with Oozie.
>>>>
>>>> Rather, one should be able to call it from any scheduler typically used
>>>> at enterprise level. Maybe support for BPML.
>>>>
>>>> I believe the existing ability to call/execute a Zeppelin notebook or a
>>>> specific paragraph within a notebook using the REST API should take care
>>>> of this requirement to some extent.
>>>>
>>>> Regards,
>>>> Sourav
>>>>
>>>> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <vinayakagrawa...@gmail.com> wrote:
>>>>
>>>>> @Eran Witkon,
>>>>> Thanks for the suggestion, Eran. I concur with your thought.
>>>>> If Zeppelin can be integrated with Oozie, that would be wonderful.
>>>>> Users will also be able to leverage their Oozie skills.
>>>>> This would be promising for now.
>>>>> However, in the future Hadoop might not necessarily be installed in a
>>>>> Spark cluster, and Oozie (since it installs with the Hadoop distribution)
>>>>> might not be available.
>>>>> So perhaps we should give some thought to this feature for the future.
>>>>> Should it depend on Oozie or should Zeppelin have its own scheduling?
>>>>>
>>>>> As Benjamin has iterated, Databricks notebook has this as a core
>>>>> notebook feature.
>>>>>
>>>>> Also, would anybody give any suggestions regarding the "sync with github"
>>>>> feature?
>>>>> - Exporting notebook to GitHub
>>>>> - Importing notebook from GitHub
>>>>>
>>>>> Thanks
>>>>> Vinayak
>>>>>
>>>>> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <eranwit...@gmail.com> wrote:
>>>>>
>>>>>> @Vinayak Agrawal I would suggest adding the ability to connect
>>>>>> Zeppelin to existing scheduling/workflow tools such as
>>>>>> https://oozie.apache.org/. This requires better hooks and status
>>>>>> reporting but doesn't make Zeppelin an ETL/scheduler tool by itself.
>>>>>>
>>>>>> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <vinayakagrawa...@gmail.com> wrote:
>>>>>>
>>>>>>> Moon,
>>>>>>> The new roadmap looks very promising. I am very happy to see
>>>>>>> security in the list.
>>>>>>> I have some suggestions regarding Enterprise Ready features:
>>>>>>>
>>>>>>> 1. Job Scheduler - Can this be improved?
>>>>>>> Currently the scheduler can be used with a Cron expression or a
>>>>>>> pre-set time. But in an enterprise solution, a notebook might be one
>>>>>>> piece of the workflow. Can we look towards the functionality of
>>>>>>> scheduling notebooks based on other notebooks finishing their job
>>>>>>> successfully?
>>>>>>> This requirement would arise in any ETL workflow, where all the
>>>>>>> downstream users wait for the ETL notebook to finish successfully. Only
>>>>>>> after that can other business-oriented notebooks be executed.
>>>>>>>
>>>>>>> 2. Importing a notebook - Is there a current requirement or future
>>>>>>> plan to implement a feature that allows import-notebook-from-github?
>>>>>>> This would allow users to share notebooks seamlessly.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Vinayak
>>>>>>>
>>>>>>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <m...@apache.org> wrote:
>>>>>>>
>>>>>>>> Zhong Wang,
>>>>>>>> Right, folder support would be quite useful. Thanks for the
>>>>>>>> opinion.
>>>>>>>> Hope I can finish the work on pr-190
>>>>>>>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>>>>>>>
>>>>>>>> Sourav,
>>>>>>>> Regarding concurrent running, Zeppelin doesn't have a limitation on
>>>>>>>> running paragraphs/queries concurrently. An interpreter can implement
>>>>>>>> its own scheduling policy. For example, the SparkSQL interpreter and
>>>>>>>> ShellInterpreter can already run paragraphs/queries concurrently.
>>>>>>>>
>>>>>>>> SparkInterpreter is implemented with a FIFO scheduler, considering the
>>>>>>>> nature of the Scala compiler. That's why users cannot run multiple
>>>>>>>> paragraphs concurrently when they work with SparkInterpreter.
>>>>>>>> But as Zhong Wang mentioned, pr-703 gives each notebook a separate
>>>>>>>> Scala compiler, so paragraphs run concurrently as long as they're in
>>>>>>>> different notebooks.
>>>>>>>> Thanks for the feedback!
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> moon
>>>>>>>>
>>>>>>>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wangzhong....@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Sourav: I think this newly merged PR can help you
>>>>>>>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>>>>>>>
>>>>>>>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Moon,
>>>>>>>>>>
>>>>>>>>>> This looks great.
>>>>>>>>>>
>>>>>>>>>> My only suggestion would be to include a PR/feature - Support for
>>>>>>>>>> Running Concurrent paragraphs/queries in Zeppelin.
>>>>>>>>>>
>>>>>>>>>> Right now if more than one user tries to run paragraphs in
>>>>>>>>>> multiple notebooks concurrently through a single Zeppelin instance
>>>>>>>>>> (and single interpreter instance) the performance is very slow.
>>>>>>>>>> It is obvious that the queue gets built up within the Zeppelin
>>>>>>>>>> process and interpreter process in that scenario, as the time taken
>>>>>>>>>> to move the status from start to pending and pending to running is
>>>>>>>>>> very high compared to the actual running time of a paragraph.
>>>>>>>>>>
>>>>>>>>>> Without this the multi-tenancy support would be meaningless, as no
>>>>>>>>>> one can practically use it in a situation where multiple users are
>>>>>>>>>> trying to connect to the same instance of Zeppelin (and the related
>>>>>>>>>> interpreter).
>>>>>>>>>> A possible solution would be to spawn a separate instance of the
>>>>>>>>>> same interpreter at every notebook/user level.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Sourav
>>>>>>>>>>
>>>>>>>>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <m...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Zeppelin users and developers,
>>>>>>>>>>>
>>>>>>>>>>> The roadmap we have published at
>>>>>>>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>>>>>>>>> is almost 9 months old, and it doesn't reflect where the
>>>>>>>>>>> community is going anymore. It's time to update.
>>>>>>>>>>>
>>>>>>>>>>> Based on the mailing list, JIRA issues, pull requests, feedback from
>>>>>>>>>>> users, conferences and meetings, I could summarize the major
>>>>>>>>>>> interests of users and developers in 7 categories: Enterprise ready,
>>>>>>>>>>> Usability improvement, Pluggability, Documentation, Backend
>>>>>>>>>>> integration, Notebook storage, and Visualization.
>>>>>>>>>>>
>>>>>>>>>>> And I could list related subjects under each category.
>>>>>>>>>>>
>>>>>>>>>>> - Enterprise ready
>>>>>>>>>>>    - Authentication
>>>>>>>>>>>       - Shiro authentication ZEPPELIN-548 <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>>>>>>>>>>    - Authorization
>>>>>>>>>>>       - Notebook authorization PR-681 <https://github.com/apache/incubator-zeppelin/pull/681>
>>>>>>>>>>>    - Security
>>>>>>>>>>>    - Multi-tenancy
>>>>>>>>>>>    - Stability
>>>>>>>>>>> - Usability Improvement
>>>>>>>>>>>    - UX improvement
>>>>>>>>>>>    - Better Table data support
>>>>>>>>>>>       - Download data as csv, etc
>>>>>>>>>>>         PR-725 <https://github.com/apache/incubator-zeppelin/pull/725>,
>>>>>>>>>>>         PR-714 <https://github.com/apache/incubator-zeppelin/pull/714>,
>>>>>>>>>>>         PR-6 <https://github.com/apache/incubator-zeppelin/pull/6>,
>>>>>>>>>>>         PR-89 <https://github.com/apache/incubator-zeppelin/pull/89>
>>>>>>>>>>>       - Featureful table data display (pagination, etc)
>>>>>>>>>>> - Pluggability ZEPPELIN-533 <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>>>>>>>>>>    - Pluggable visualization
>>>>>>>>>>>    - Dynamic Interpreter, notebook, visualization loading
>>>>>>>>>>>    - Repository and registry for pluggable components
>>>>>>>>>>> - Improve documentation
>>>>>>>>>>>    - Improve contents and readability
>>>>>>>>>>>    - more tutorials, examples
>>>>>>>>>>> - Interpreter
>>>>>>>>>>>    - Generic JDBC Interpreter
>>>>>>>>>>>    - (spark)R Interpreter
>>>>>>>>>>>    - Cluster manager for interpreter (Proposal
>>>>>>>>>>>      <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>)
>>>>>>>>>>>    - more interpreters
>>>>>>>>>>> - Notebook storage
>>>>>>>>>>>    - Versioning ZEPPELIN-540 <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>>>>>>>>>>    - more notebook storages
>>>>>>>>>>> - Visualization
>>>>>>>>>>>    - More visualizations
>>>>>>>>>>>      PR-152 <https://github.com/apache/incubator-zeppelin/pull/152>,
>>>>>>>>>>>      PR-728 <https://github.com/apache/incubator-zeppelin/pull/728>,
>>>>>>>>>>>      PR-336 <https://github.com/apache/incubator-zeppelin/pull/336>,
>>>>>>>>>>>      PR-321 <https://github.com/apache/incubator-zeppelin/pull/321>
>>>>>>>>>>>    - Customize graph (show/hide label, color, etc)
>>>>>>>>>>>
>>>>>>>>>>> It will help anyone quickly get the overall interests and direction
>>>>>>>>>>> of the project. And based on this roadmap, we can discuss and
>>>>>>>>>>> re-define the scope of the next release, 0.6.0, and its schedule.
>>>>>>>>>>>
>>>>>>>>>>> What do you think? Any feedback would be appreciated.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> moon
>>>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Vinayak Agrawal
>>>>>>>
>>>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>>>> ~Lord Alfred Tennyson
>>>>>>>
>>>>>
>>>>> --
>>>>> Vinayak Agrawal
>>>>> Big Data Analytics
>>>>> IBM
>>>>>
>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>> ~Lord Alfred Tennyson
>>>>>
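
For anyone who wants to try the external-scheduler route discussed in this thread (Airflow, cron, or any other tool calling Zeppelin over REST), here is a minimal Python sketch. It is not taken from anyone's setup in this thread. It assumes the notebook job endpoint lives at `/api/notebook/job/<noteId>` as described in Zeppelin's REST API docs, and that the status response can be reduced to a list of paragraph dicts with a `status` field; check both against your Zeppelin version. The helper names (`job_url`, `run_note`, `all_finished`), the host, and the note ID are made up for illustration.

```python
import json
import urllib.request


def job_url(base_url, note_id):
    """Build the notebook job URL (path is an assumption; check your Zeppelin docs)."""
    return "{}/api/notebook/job/{}".format(base_url.rstrip("/"), note_id)


def run_note(base_url, note_id, opener=urllib.request.urlopen):
    """POST to the job URL to ask Zeppelin to run the note; return the parsed JSON reply.

    `opener` is injectable so the function can be exercised without a live server.
    """
    request = urllib.request.Request(
        job_url(base_url, note_id), data=b"", method="POST"
    )
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))


def all_finished(paragraphs):
    """True only if there is at least one paragraph and all report FINISHED.

    The list-of-dicts shape with a "status" key is an assumed, simplified
    view of the job status response.
    """
    return bool(paragraphs) and all(
        p.get("status") == "FINISHED" for p in paragraphs
    )
```

A scheduler task could call `run_note("http://localhost:8080", "<noteId>")` for the ETL note, poll the same URL with GET, and start the downstream notes only once `all_finished(...)` holds for the polled paragraph statuses. That covers the "run after another notebook succeeds" case from this thread without adding anything to Zeppelin itself.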