+1

Hi Tamas,

Pluggable external visualization is really a GREAT feature to have. I'm looking forward to this :)

Regards
Shabeel

On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi <tamas.szur...@odigeo.com> wrote:

> Hey,
>
> Really promising roadmap.
>
> I'd only push for more visualization options. I agree that built-in
> visualization with limited charting options is needed, but I think we also
> need some way to 'inject' external JS visualizations.
>
> For scheduling Zeppelin notebooks we use https://github.com/airbnb/airflow
> through the job REST API. It's an enterprise-ready and very robust solution
> right now.
>
> *Tamas*
>
> On 1 March 2016 at 09:12, Eran Witkon <eranwit...@gmail.com> wrote:
>
>> One point to clarify: I don't want to suggest Oozie specifically. I want
>> to think about which features we develop ourselves and which ones we
>> integrate from external, preferably Apache, technology. We don't think
>> about building our own storage services, so why build our own scheduler?
>> Eran
>>
>> On Tue, 1 Mar 2016 at 09:49 moon soo Lee <m...@apache.org> wrote:
>>
>>> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
>>> Now I can see a lot of demand for enterprise-level job scheduling.
>>> Whether external or built-in, I completely agree with having
>>> enterprise-level job scheduling support on the roadmap.
>>> ZEPPELIN-137 <https://issues.apache.org/jira/browse/ZEPPELIN-137> and
>>> ZEPPELIN-531 <https://issues.apache.org/jira/browse/ZEPPELIN-531> are
>>> the related issues I can find in our JIRA.
>>>
>>> @Vinayak
>>> Regarding importing notebooks from GitHub, Zeppelin has a pluggable
>>> notebook storage layer (see related package
>>> <https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>).
>>> So GitHub notebook sync can be implemented easily.
>>>
>>> @Shabeel
>>> Right, we need better memory management to prevent such OOM.
>>> And I think a table is one of the most frequently used ways of displaying
>>> data. So we'll definitely need more features like filter, sort, etc.
>>> After this roadmap discussion, discussion of the next release will
>>> follow. Then we'll get an idea of when those features will be available.
>>>
>>> @Prasad
>>> Thanks for mentioning HA and DR. They're really important subjects for
>>> enterprise use. Zeppelin will definitely need to address them.
>>> And displaying meta information about notebooks on the top-level page is
>>> a good idea.
>>>
>>> It's really great to hear so many opinions and ideas.
>>> And thanks @Rick for sharing a valuable view of the Zeppelin project.
>>>
>>> Thanks,
>>> moon
>>>
>>> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <rah...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> For one, I know that there is rudimentary scheduling built into
>>>> Zeppelin already (at least I fixed a bug in the test for a scheduling
>>>> feature a few months ago).
>>>> But another point is that Zeppelin should also focus on quality,
>>>> reproducibility and portability.
>>>> Although this doesn't offer exciting new features, it would make
>>>> development much easier.
>>>>
>>>> Cross-platform testability, tests that pass when run sequentially,
>>>> compatibility with Firefox, and many more open issues that make it so
>>>> much harder to enhance Zeppelin and add features should be addressed
>>>> soon, preferably before more features are added. Zeppelin is already
>>>> suffering, in my opinion, from quite a lot of feature creep, and we
>>>> should avoid putting in the kitchen sink at the cost of quality and
>>>> maintainability. Instead, modularity (ZEPPELIN-533 in particular)
>>>> should be targeted.
>>>>
>>>> Oozie, in my opinion, is a dead end. It may de facto still be in use
>>>> on many clusters, but it's not getting the love it needs, and I wouldn't
>>>> bet on it when it comes to integrating scheduling. Instead, any external
>>>> tool should be able to use the REST API to trigger executions, if you
>>>> want external scheduling.
>>>>
>>>> So, in conclusion, if we take Moon's list as a list of descending
>>>> priorities, I fully agree, under the condition that code quality is
>>>> included as a subset of enterprise-readiness. Auth* is paramount
>>>> (Kerberos SPNEGO SSO support is what we really want) with user and group
>>>> rights assignment on the notebook level. We probably also need Knox
>>>> integration (ODP members looking at integrating Zeppelin should consider
>>>> contributing this), and integration of something like Spree
>>>> (https://github.com/hammerlab/spree) to be able to profile jobs.
>>>>
>>>> I'm hopeful that soon I can resume contributing some quality-oriented
>>>> code, to drive this "necessary evil" forward ;)
>>>>
>>>> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
>>>>
>>>>> I do agree with Vinayak. It need not be coupled with Oozie.
>>>>>
>>>>> Rather, one should be able to call it from any scheduler typically
>>>>> used at the enterprise level. Maybe support for BPML.
>>>>>
>>>>> I believe the existing ability to call/execute a Zeppelin notebook or
>>>>> a specific paragraph within a notebook using the REST API should take
>>>>> care of this requirement to some extent.
>>>>>
>>>>> Regards,
>>>>> Sourav
>>>>>
>>>>> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <vinayakagrawa...@gmail.com> wrote:
>>>>>
>>>>>> @Eran Witkon,
>>>>>> Thanks for the suggestion, Eran. I concur with your thought.
>>>>>> If Zeppelin can be integrated with Oozie, that would be wonderful.
>>>>>> Users will also be able to leverage their Oozie skills.
>>>>>> This would be promising for now.
>>>>>> However, in the future Hadoop might not necessarily be installed in a
>>>>>> Spark cluster, and Oozie (since it installs with the Hadoop
>>>>>> distribution) might not be available.
>>>>>> So perhaps we should give some thought to this feature for the
>>>>>> future. Should it depend on Oozie, or should Zeppelin have its own
>>>>>> scheduling?
>>>>>>
>>>>>> As Benjamin has reiterated, the Databricks notebook has this as a core
>>>>>> notebook feature.
>>>>>>
>>>>>> Also, would anybody give any suggestions regarding the "sync with
>>>>>> GitHub" feature?
>>>>>> - Exporting notebook to GitHub
>>>>>> - Importing notebook from GitHub
>>>>>>
>>>>>> Thanks
>>>>>> Vinayak
>>>>>>
>>>>>> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <eranwit...@gmail.com> wrote:
>>>>>>
>>>>>>> @Vinayak Agrawal I would suggest adding the ability to connect
>>>>>>> Zeppelin to existing scheduling/workflow tools such as
>>>>>>> https://oozie.apache.org/. This requires better hooks and status
>>>>>>> reporting but doesn't make Zeppelin an ETL/scheduler tool by itself.
>>>>>>>
>>>>>>> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <vinayakagrawa...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Moon,
>>>>>>>> The new roadmap looks very promising. I am very happy to see
>>>>>>>> security in the list.
>>>>>>>> I have some suggestions regarding Enterprise Ready features:
>>>>>>>>
>>>>>>>> 1. Job Scheduler - Can this be improved?
>>>>>>>> Currently the scheduler can be used with a cron expression or a
>>>>>>>> pre-set time. But in an enterprise solution, a notebook might be one
>>>>>>>> piece of the workflow. Can we look towards the functionality of
>>>>>>>> scheduling notebooks based on other notebooks finishing their jobs
>>>>>>>> successfully?
>>>>>>>> This requirement would arise in any ETL workflow, where all the
>>>>>>>> downstream users wait for the ETL notebook to finish successfully.
>>>>>>>> Only after that can other business-oriented notebooks be executed.
>>>>>>>>
>>>>>>>> 2. Importing a notebook - Is there a current requirement or future
>>>>>>>> plan to implement a feature that allows import-notebook-from-github?
>>>>>>>> This would allow users to share notebooks seamlessly.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Vinayak
>>>>>>>>
>>>>>>>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <m...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Zhong Wang,
>>>>>>>>> Right, folder support would be quite useful. Thanks for the
>>>>>>>>> opinion.
>>>>>>>>> Hope I can finish the work in pr-190
>>>>>>>>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>>>>>>>>
>>>>>>>>> Sourav,
>>>>>>>>> Regarding concurrent running, Zeppelin doesn't have a limitation on
>>>>>>>>> running paragraphs/queries concurrently. An interpreter can
>>>>>>>>> implement its own scheduling policy. For example, the SparkSQL
>>>>>>>>> interpreter and ShellInterpreter can already run paragraphs/queries
>>>>>>>>> concurrently.
>>>>>>>>>
>>>>>>>>> SparkInterpreter is implemented with a FIFO scheduler, considering
>>>>>>>>> the nature of the Scala compiler. That's why users cannot run
>>>>>>>>> multiple paragraphs concurrently when they work with
>>>>>>>>> SparkInterpreter.
>>>>>>>>> But as Zhong Wang mentioned, pr-703 enables each notebook to have a
>>>>>>>>> separate Scala compiler, so paragraphs run concurrently as long as
>>>>>>>>> they're in different notebooks.
>>>>>>>>> Thanks for the feedback!
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> moon
>>>>>>>>>
>>>>>>>>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wangzhong....@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Sourav: I think this newly merged PR can help you
>>>>>>>>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>>>>>>>>
>>>>>>>>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Moon,
>>>>>>>>>>>
>>>>>>>>>>> This looks great.
>>>>>>>>>>>
>>>>>>>>>>> My only suggestion would be to include a PR/feature - Support
>>>>>>>>>>> for Running Concurrent paragraphs/queries in Zeppelin.
>>>>>>>>>>>
>>>>>>>>>>> Right now, if more than one user tries to run paragraphs in
>>>>>>>>>>> multiple notebooks concurrently through a single Zeppelin
>>>>>>>>>>> instance (and a single interpreter instance), the performance is
>>>>>>>>>>> very slow. It is obvious that a queue builds up within the
>>>>>>>>>>> Zeppelin process and the interpreter process in that scenario,
>>>>>>>>>>> as the time taken to move the status from start to pending and
>>>>>>>>>>> from pending to running is very high compared to the actual
>>>>>>>>>>> running time of a paragraph.
>>>>>>>>>>>
>>>>>>>>>>> Without this, multi-tenancy support would be meaningless, as
>>>>>>>>>>> no one can practically use it in a situation where multiple
>>>>>>>>>>> users are trying to connect to the same instance of Zeppelin
>>>>>>>>>>> (and the related interpreter). A possible solution would be to
>>>>>>>>>>> spawn a separate instance of the same interpreter at every
>>>>>>>>>>> notebook/user level.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Sourav
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <m...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Zeppelin users and developers,
>>>>>>>>>>>>
>>>>>>>>>>>> The roadmap we have published at
>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>>>>>>>>>> is almost 9 months old, and it doesn't reflect where the
>>>>>>>>>>>> community is going anymore. It's time to update.
>>>>>>>>>>>>
>>>>>>>>>>>> Based on the mailing list, JIRA issues, pull requests, feedback
>>>>>>>>>>>> from users, conferences and meetings, I could summarize the
>>>>>>>>>>>> major interests of users and developers in 7 categories:
>>>>>>>>>>>> Enterprise ready, Usability improvement, Pluggability,
>>>>>>>>>>>> Documentation, Backend integration, Notebook storage, and
>>>>>>>>>>>> Visualization.
>>>>>>>>>>>>
>>>>>>>>>>>> And I could list related subjects under each category.
>>>>>>>>>>>>
>>>>>>>>>>>> - Enterprise ready
>>>>>>>>>>>>   - Authentication
>>>>>>>>>>>>     - Shiro authentication ZEPPELIN-548
>>>>>>>>>>>>       <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>>>>>>>>>>>   - Authorization
>>>>>>>>>>>>     - Notebook authorization PR-681
>>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/681>
>>>>>>>>>>>>   - Security
>>>>>>>>>>>>   - Multi-tenancy
>>>>>>>>>>>>   - Stability
>>>>>>>>>>>> - Usability Improvement
>>>>>>>>>>>>   - UX improvement
>>>>>>>>>>>>   - Better Table data support
>>>>>>>>>>>>     - Download data as csv, etc
>>>>>>>>>>>>       PR-725 <https://github.com/apache/incubator-zeppelin/pull/725>,
>>>>>>>>>>>>       PR-714 <https://github.com/apache/incubator-zeppelin/pull/714>,
>>>>>>>>>>>>       PR-6 <https://github.com/apache/incubator-zeppelin/pull/6>,
>>>>>>>>>>>>       PR-89 <https://github.com/apache/incubator-zeppelin/pull/89>
>>>>>>>>>>>>     - Featureful table data display (pagination, etc)
>>>>>>>>>>>> - Pluggability ZEPPELIN-533
>>>>>>>>>>>>   <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>>>>>>>>>>>   - Pluggable visualization
>>>>>>>>>>>>   - Dynamic Interpreter, notebook, visualization loading
>>>>>>>>>>>>   - Repository and registry for pluggable components
>>>>>>>>>>>> - Improve documentation
>>>>>>>>>>>>   - Improve contents and readability
>>>>>>>>>>>>   - more tutorials, examples
>>>>>>>>>>>> - Interpreter
>>>>>>>>>>>>   - Generic JDBC Interpreter
>>>>>>>>>>>>   - (spark)R Interpreter
>>>>>>>>>>>>   - Cluster manager for interpreter (Proposal
>>>>>>>>>>>>     <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>)
>>>>>>>>>>>>   - more interpreters
>>>>>>>>>>>> - Notebook storage
>>>>>>>>>>>>   - Versioning ZEPPELIN-540
>>>>>>>>>>>>     <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>>>>>>>>>>>   - more notebook storages
>>>>>>>>>>>> - Visualization
>>>>>>>>>>>>   - More visualizations
>>>>>>>>>>>>     PR-152 <https://github.com/apache/incubator-zeppelin/pull/152>,
>>>>>>>>>>>>     PR-728 <https://github.com/apache/incubator-zeppelin/pull/728>,
>>>>>>>>>>>>     PR-336 <https://github.com/apache/incubator-zeppelin/pull/336>,
>>>>>>>>>>>>     PR-321 <https://github.com/apache/incubator-zeppelin/pull/321>
>>>>>>>>>>>>   - Customize graph (show/hide label, color, etc)
>>>>>>>>>>>>
>>>>>>>>>>>> It will help anyone quickly get the overall interests and
>>>>>>>>>>>> direction of the project. And based on this roadmap, we can
>>>>>>>>>>>> discuss and redefine the scope of the next release, 0.6.0, and
>>>>>>>>>>>> its schedule.
>>>>>>>>>>>>
>>>>>>>>>>>> What do you think? Any feedback would be appreciated.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> moon
>>>>>>>>
>>>>>>>> --
>>>>>>>> Vinayak Agrawal
>>>>>>>>
>>>>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>>>>> ~Lord Alfred Tennyson
>>>>>>
>>>>>> --
>>>>>> Vinayak Agrawal
>>>>>> Big Data Analytics
>>>>>> IBM
>>>>>>
>>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>>> ~Lord Alfred Tennyson
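
As a rough illustration of the external-scheduling approach raised in this thread (an outside scheduler triggering notebook runs through Zeppelin's REST API), here is a minimal Python sketch. It assumes the notebook job endpoint is POST /api/notebook/job/<noteId>, with an optional /<paragraphId> to run a single paragraph, which is how I understand the Zeppelin REST API docs; please check the paths against your Zeppelin version. The base URL, the note ID, and the helper names build_run_request and trigger_run are made up for the example.

```python
import urllib.request


def build_run_request(base_url, note_id, paragraph_id=None):
    """Build (but do not send) a POST request that asks Zeppelin to run a note.

    Assumption: the job endpoint is /api/notebook/job/<noteId>[/<paragraphId>].
    """
    path = f"/api/notebook/job/{note_id}"
    if paragraph_id is not None:
        # Run only one paragraph instead of the whole note.
        path += f"/{paragraph_id}"
    url = base_url.rstrip("/") + path
    # Empty body; the method is set explicitly so the request is a POST.
    return urllib.request.Request(url, data=b"", method="POST")


def trigger_run(base_url, note_id, paragraph_id=None, timeout=30):
    """Send the request and return the HTTP status code.

    An external scheduler (cron, Airflow, Oozie, ...) could call this
    from one of its own tasks.
    """
    request = build_run_request(base_url, note_id, paragraph_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A scheduler task would then call something like trigger_run("http://localhost:8080", "2A94M5J1Z"), where both values are placeholders. Running notebooks only after an upstream notebook has finished, as Vinayak asked, would additionally need the scheduler to poll job status before starting the downstream run; that part is not shown here.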