+1 on @rick. Quality is really important... I am still encountering bugs consistently.

On Tue, Mar 1, 2016 at 10:16 AM, TEJA SRIVASTAV <tejasrivas...@gmail.com> wrote:

> +1 on @rick
>
> On Tue, Mar 1, 2016 at 11:26 PM Benjamin Kim <bbuil...@gmail.com> wrote:
>
>> I see in the Enterprise section that multi-tenancy will be included, will this have user impersonation too? In this way, the user executing will be the user owning the process.
>>
>> On Mar 1, 2016, at 12:51 AM, Shabeel Syed <shabeels...@gmail.com> wrote:
>>
>> +1
>>
>> Hi Tamas,
>> Pluggable external visualization is really a GREAT feature to have.
>> I'm looking forward to this :)
>>
>> Regards
>> Shabeel
>>
>> On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi <tamas.szur...@odigeo.com> wrote:
>>
>>> Hey,
>>>
>>> Really promising roadmap.
>>>
>>> I'd only push more visualization options. I agree built in visualization is needed with limited charting options but I think we also need somehow 'inject' external js visualizations also.
>>>
>>> For scheduling Zeppelin notebooks we use https://github.com/airbnb/airflow through the job rest api. It's an enterprise ready and very robust solution right now.
>>>
>>> *Tamas*
>>>
>>> On 1 March 2016 at 09:12, Eran Witkon <eranwit...@gmail.com> wrote:
>>>
>>>> One point to clarify, I don't want to suggest Oozie in specific, I want to think about which features we develop and which ones we integrate external, preferred Apache, technology? We don't think about building our own storage services so why build our own scheduler?
>>>> Eran
>>>>
>>>> On Tue, 1 Mar 2016 at 09:49 moon soo Lee <m...@apache.org> wrote:
>>>>
>>>>> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
>>>>> Now I can see a lot of demands around enterprise level job scheduling.
>>>>> Either external or built-in, I completely agree having enterprise level job scheduling support on the roadmap.
>>>>> ZEPPELIN-137 <https://issues.apache.org/jira/browse/ZEPPELIN-137>, ZEPPELIN-531 <https://issues.apache.org/jira/browse/ZEPPELIN-531> are related issues I can find in our JIRA.
>>>>>
>>>>> @Vinayak
>>>>> Regarding importing notebooks from GitHub, Zeppelin has a pluggable notebook storage layer (see related package <https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>). So, GitHub notebook sync can be implemented easily.
>>>>>
>>>>> @Shabeel
>>>>> Right, we need better memory management to prevent such OOM.
>>>>> And I think a table is one of the most frequently used ways of displaying data. So definitely, we'll need more features like filter, sort, etc.
>>>>> After this roadmap discussion, discussion for the next release will follow. Then we'll get an idea of when those features will be available.
>>>>>
>>>>> @Prasad
>>>>> Thanks for mentioning HA and DR. They're really important subjects for enterprise use. Definitely Zeppelin will need to address them.
>>>>> And displaying meta information of notebooks on the top level page is a good idea.
>>>>>
>>>>> It's really great to hear many opinions and ideas.
>>>>> And thanks @Rick for sharing a valuable view on the Zeppelin project.
>>>>>
>>>>> Thanks,
>>>>> moon
>>>>>
>>>>> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <rah...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> For one, I know that there is rudimentary scheduling built into Zeppelin already (at least I fixed a bug in the test for a scheduling feature a few months ago).
>>>>>> But another point is that Zeppelin should also focus on quality, reproducibility and portability.
>>>>>> Although this doesn't offer exciting new features, it would make development much easier.
>>>>>>
>>>>>> Cross-platform testability, tests that pass when run sequentially, compatibility with Firefox, and many more open issues that make it so much harder to enhance Zeppelin and add features should be addressed soon, preferably before more features are added. Already Zeppelin is suffering - in my opinion - from quite a lot of feature creep, and we should avoid putting in the kitchen sink, at the cost of quality and maintainability. Instead modularity (ZEPPELIN-533 in particular) should be targeted.
>>>>>>
>>>>>> Oozie, in my opinion, is a dead end - it may de-facto still be in use on many clusters, but it's not getting the love it needs, and I wouldn't bet on it, when it comes to integrating scheduling. Instead, any external tool should be able to use the REST-API to trigger executions, if you want external scheduling.
>>>>>>
>>>>>> So, in conclusion, if we take Moon's list as a list of descending priorities, I fully agree, under the condition that code quality is included as a subset of enterprise-readiness. Auth* is paramount (Kerberos SPNEGO SSO support is what we really want) with user and group rights assignment on the notebook level. We probably also need Knox-integration (ODP-Members looking at integrating Zeppelin should consider contributing this), and integration of something like Spree (https://github.com/hammerlab/spree) to be able to profile jobs.
>>>>>>
>>>>>> I'm hopeful that soon I can resume contributing some quality-oriented code, to drive this "necessary evil" forward ;)
>>>>>>
>>>>>> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
>>>>>>
>>>>>>> I do agree with Vinayak. It need not be coupled with Oozie.
>>>>>>>
>>>>>>> Rather one should be able to call it from any scheduler typically used in enterprise level.
>>>>>>> Maybe support for BPML.
>>>>>>>
>>>>>>> I believe the existing ability to call/execute a Zeppelin Notebook or a specific paragraph within a notebook using REST API should take care of this requirement to some extent.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Sourav
>>>>>>>
>>>>>>> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <vinayakagrawa...@gmail.com> wrote:
>>>>>>>
>>>>>>>> @Eran Witkon,
>>>>>>>> Thanks for the suggestion Eran. I concur with your thought.
>>>>>>>> If Zeppelin can be integrated with Oozie, that would be wonderful. Users will also be able to leverage their Oozie skills.
>>>>>>>> This would be promising for now.
>>>>>>>> However, in the future Hadoop might not necessarily be installed in the Spark cluster, and Oozie (since it installs with the Hadoop distribution) might not be available.
>>>>>>>> So perhaps we should give a thought about this feature for the future. Should it depend on Oozie or should Zeppelin have its own scheduling?
>>>>>>>>
>>>>>>>> As Benjamin has iterated, Databricks notebook has this as a core notebook feature.
>>>>>>>>
>>>>>>>> Also, would anybody give any suggestions regarding "sync with github" feature?
>>>>>>>> - Exporting notebook to Github
>>>>>>>> - Importing notebook from Github
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Vinayak
>>>>>>>>
>>>>>>>> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <eranwit...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> @Vinayak Agrawal I would suggest adding the ability to connect zeppelin to existing scheduling tools\workflow tools such as https://oozie.apache.org/.
>>>>>>>>> This requires better hooks and status reporting but doesn't make Zeppelin an ETL\scheduler tool by itself.
>>>>>>>>>
>>>>>>>>> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <vinayakagrawa...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Moon,
>>>>>>>>>> The new roadmap looks very promising. I am very happy to see security in the list.
>>>>>>>>>> I have some suggestions regarding Enterprise Ready features:
>>>>>>>>>>
>>>>>>>>>> 1. Job Scheduler - Can this be improved?
>>>>>>>>>> Currently the scheduler can be used with a Cron expression or a pre-set time. But in an enterprise solution, a notebook might be one piece of the workflow. Can we look towards the functionality of scheduling notebooks based on other notebooks finishing their job successfully?
>>>>>>>>>> This requirement would arise in any ETL workflow, where all the downstream users wait for the ETL notebook to finish successfully. Only after that, other business oriented notebooks can be executed.
>>>>>>>>>>
>>>>>>>>>> 2. Importing a notebook - Is there a current requirement or future plan to implement a feature that allows import-notebook-from-github?
>>>>>>>>>> This would allow users to share notebooks seamlessly.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Vinayak
>>>>>>>>>>
>>>>>>>>>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <m...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Zhong Wang,
>>>>>>>>>>> Right, Folder support would be quite useful. Thanks for the opinion.
>>>>>>>>>>> Hope I can finish the work pr-190 <https://github.com/apache/incubator-zeppelin/pull/190>.
>>>>>>>>>>>
>>>>>>>>>>> Sourav,
>>>>>>>>>>> Regarding concurrent running, Zeppelin doesn't have a limitation on running paragraphs/queries concurrently. An interpreter can implement its own scheduling policy.
>>>>>>>>>>> For example, SparkSQL interpreter and ShellInterpreter can already run paragraph/query concurrently.
>>>>>>>>>>>
>>>>>>>>>>> SparkInterpreter is implemented with a FIFO scheduler considering the nature of the Scala compiler. That's why users cannot run multiple paragraphs concurrently when they work with SparkInterpreter.
>>>>>>>>>>> But as Zhong Wang mentioned, pr-703 enables each notebook to have a separate Scala compiler, so paragraphs can run concurrently when they're in different notebooks.
>>>>>>>>>>> Thanks for the feedback!
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> moon
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wangzhong....@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Sourav: I think this newly merged PR can help you
>>>>>>>>>>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Moon,
>>>>>>>>>>>>>
>>>>>>>>>>>>> This looks great.
>>>>>>>>>>>>>
>>>>>>>>>>>>> My only suggestion would be to include a PR/feature - Support for Running Concurrent paragraphs/queries in Zeppelin.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Right now if more than one user tries to run paragraphs in multiple notebooks concurrently through a single Zeppelin instance (and single interpreter instance) the performance is very slow. It is obvious that the queue gets built up within the zeppelin process and interpreter process in that scenario as the time taken to move the status from start to pending and pending to running is very high compared to the actual running time of a paragraph.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Without this the multi tenancy support would be meaningless as no one can practically use it in a situation where multiple users are trying to connect to the same instance of Zeppelin (and the related interpreter). A possible solution would be to spawn a separate instance of the same interpreter at every notebook/user level.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Sourav
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <m...@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Zeppelin users and developers,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The roadmap we have published at
>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>>>>>>>>>>>> is almost 9 months old, and it doesn't reflect where the community goes anymore. It's time to update.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Based on mailing list, jira issues, pull requests, feedback from users, conferences and meetings, I could summarize the major interest of users and developers in 7 categories. Enterprise ready, Usability improvement, Pluggability, Documentation, Backend integration, Notebook storage, and Visualization.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> And I could list related subjects under each category.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - Enterprise ready
>>>>>>>>>>>>>>   - Authentication
>>>>>>>>>>>>>>     - Shiro authentication ZEPPELIN-548 <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>>>>>>>>>>>>>   - Authorization
>>>>>>>>>>>>>>     - Notebook authorization PR-681 <https://github.com/apache/incubator-zeppelin/pull/681>
>>>>>>>>>>>>>>   - Security
>>>>>>>>>>>>>>   - Multi-tenancy
>>>>>>>>>>>>>>   - Stability
>>>>>>>>>>>>>> - Usability Improvement
>>>>>>>>>>>>>>   - UX improvement
>>>>>>>>>>>>>>   - Better Table data support
>>>>>>>>>>>>>>     - Download data as csv, etc PR-725 <https://github.com/apache/incubator-zeppelin/pull/725>, PR-714 <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6 <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89 <https://github.com/apache/incubator-zeppelin/pull/89>
>>>>>>>>>>>>>>     - Featureful table data display (pagination, etc)
>>>>>>>>>>>>>> - Pluggability ZEPPELIN-533 <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>>>>>>>>>>>>>   - Pluggable visualization
>>>>>>>>>>>>>>   - Dynamic Interpreter, notebook, visualization loading
>>>>>>>>>>>>>>   - Repository and registry for pluggable components
>>>>>>>>>>>>>> - Improve documentation
>>>>>>>>>>>>>>   - Improve contents and readability
>>>>>>>>>>>>>>   - more tutorials, examples
>>>>>>>>>>>>>> - Interpreter
>>>>>>>>>>>>>>   - Generic JDBC Interpreter
>>>>>>>>>>>>>>   - (spark)R Interpreter
>>>>>>>>>>>>>>   - Cluster manager for interpreter (Proposal <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>)
>>>>>>>>>>>>>>   - more interpreters
>>>>>>>>>>>>>> - Notebook storage
>>>>>>>>>>>>>>   - Versioning ZEPPELIN-540 <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>>>>>>>>>>>>>   - more notebook storages
>>>>>>>>>>>>>> - Visualization
>>>>>>>>>>>>>>   - More visualizations PR-152 <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728 <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336 <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321 <https://github.com/apache/incubator-zeppelin/pull/321>
>>>>>>>>>>>>>>   - Customize graph (show/hide label, color, etc)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It will help anyone quickly get the overall interest of the project and its direction. And based on this roadmap, we can discuss and re-define the next release 0.6.0 scope and its schedule.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What do you think? Any feedback would be appreciated.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> moon
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Vinayak Agrawal
>>>>>>>>>>
>>>>>>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>>>>>>> ~Lord Alfred Tennyson
>>>>>>>>
>>>>>>>> --
>>>>>>>> Vinayak Agrawal
>>>>>>>> Big Data Analytics
>>>>>>>> IBM
>>>>>>>>
>>>>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>>>>> ~Lord Alfred Tennyson