Hi All, I've submitted a design approach for Multi-tenancy and Security for Zeppelin - https://issues.apache.org/jira/browse/ZEPPELIN-773.
I look forward to your reviews and suggestions on the topic.

Thanks,
Rohit.

On Sat, Mar 26, 2016 at 10:04 PM, moon soo Lee <m...@apache.org> wrote:

> There is a discussion thread for Release Policy:
> https://s.apache.org/3JCm
> Please check this thread, too.
>
> Thanks,
> moon
>
> On Thu, Mar 24, 2016 at 12:02 PM Guilherme Silveira <guilhermecgss...@gmail.com> wrote:
>
>> Is there a predefined release interval, let's say 6 months or 1 year, between one version and the next?
>>
>> On Mar 23, 2016 4:10 PM, "Joel Van Veluwen" <joel.vanvelu...@quantium.com.au> wrote:
>>
>>> Hi Nikolay,
>>>
>>> I raised this with MapR, and there don't appear to be plans to add Zeppelin to 5.1:
>>>
>>> https://community.mapr.com/message/40332
>>>
>>> We are deploying it manually and everything is pretty stable, but it will vary depending on your environment.
>>>
>>> Cheers,
>>>
>>> Joel Van Veluwen
>>> *QUANTIUM*
>>>
>>> *From:* Nikolay Voronchikhin [mailto:nvoronchik...@gmail.com]
>>> *Sent:* Tuesday, 22 March 2016 11:39 AM
>>> *To:* users@zeppelin.incubator.apache.org
>>> *Subject:* Re: [DISCUSS] Update Roadmap
>>>
>>> Hi Zeppelin Users and Developers,
>>>
>>> Do you know if MapR will be adding Zeppelin to its roadmap for the next version after MapR 5.1?
>>>
>>> We see that Hue 3.9 provides notebooks for R Shell, Python Shell, PySpark, SparkR, Hive SQL, Impala SQL, and Spark SQL, but no Drill SQL notebook.
>>>
>>> We are looking for an Apache project that focuses on a Drill notebook UI that performs better than the Drill Web Console UI itself.
>>>
>>> Sincerely,
>>>
>>> *Nikolay Voronchikhin*
>>> *Big Data/Data Warehouse/Data Science/Data Platforms Engineer at Cisco*
>>>
>>> On Mon, Mar 21, 2016 at 2:44 PM, rohit choudhary <rconl...@gmail.com> wrote:
>>>
>>> Dear All,
>>>
>>> I think direction setting is important for enterprise readiness. I have a little bit of an overview of Ambari Views, which is very similar in nature to Zeppelin. Please let me explain:
>>>
>>> Hive View - interacts with Hive
>>> Pig View - interacts with Pig
>>> Workflow Designer - interacts with Oozie
>>>
>>> We have a very similar architecture in Zeppelin, where we interact with these systems through Interpreters. The usage will also be similar, as both will interact with Hadoop clusters or, in some cases, Spark with YARN on HDFS.
>>> Our priorities should include:
>>>
>>> - Design & implement for multi-tenancy
>>> - Auditability from a Data/State and Lineage perspective
>>> - Ability to share Notebooks/Data/State across users, preferably through SparkContext sharing
>>> - Security between Zeppelin and the other systems, not limited to Spark through Kerberos. (@Rick +1)
>>>
>>> I will share an initial draft of the thoughts I have in mind in the next couple of days.
>>>
>>> Thanks,
>>> Rohit.
>>>
>>> On Thu, Mar 3, 2016 at 7:54 AM, moon soo Lee <m...@apache.org> wrote:
>>>
>>> Shabeel, thanks for the feedback about the REST API and custom IDs. That might help avoid multiple REST API calls.
>>>
>>> Thanks, everyone, for the valuable feedback. It looks like we're all going in the same direction. I have updated the wiki:
>>>
>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>
>>> Please take a look.
>>>
>>> I'm sure there are many missing details in this roadmap. I must say that something not being on this roadmap doesn't mean the community is not working on it or that it can't be included in Zeppelin. The roadmap represents community interest and overall direction.
>>>
>>> We're not changing the roadmap every day, but that doesn't mean the roadmap is set in stone and will never change. We can improve it continuously.
>>>
>>> Please feel free to fork this mail thread for any further discussion on a specific subject (e.g. job scheduling).
>>>
>>> Thanks,
>>> moon
>>>
>>> On Wed, Mar 2, 2016 at 12:31 AM Shabeel Syed <shabeels...@gmail.com> wrote:
>>>
>>> Also, we need better REST API support for creating and fetching notebooks and paragraphs.
>>>
>>> For example, if I could set a custom-defined notebook ID and paragraph ID, we could avoid multiple REST API calls.
>>>
>>> http://localhost:8080/#/notebook/<notebookid>/paragraph/<paragraphid>?asIframe
>>>
>>> should return an error if the notebook or paragraph does not exist.
>>>
>>> And while creating a notebook or paragraph, I should be able to specify my custom IDs.
>>>
>>> Regards
>>> Shabeel
>>>
>>> On Wed, Mar 2, 2016 at 11:55 AM, Zhong Wang <wangzhong....@gmail.com> wrote:
>>>
>>> +1 on @rick. Quality is really important... I am still encountering bugs consistently.
>>>
>>> On Tue, Mar 1, 2016 at 10:16 AM, TEJA SRIVASTAV <tejasrivas...@gmail.com> wrote:
>>>
>>> +1 on @rick
>>>
>>> On Tue, Mar 1, 2016 at 11:26 PM Benjamin Kim <bbuil...@gmail.com> wrote:
>>>
>>> I see in the Enterprise section that multi-tenancy will be included. Will this have user impersonation too? That way, the user executing will be the user owning the process.
>>>
>>> On Mar 1, 2016, at 12:51 AM, Shabeel Syed <shabeels...@gmail.com> wrote:
>>>
>>> +1
>>>
>>> Hi Tamas,
>>>
>>> Pluggable external visualization is really a GREAT feature to have. I'm looking forward to this :)
>>>
>>> Regards
>>> Shabeel
>>>
>>> On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi <tamas.szur...@odigeo.com> wrote:
>>>
>>> Hey,
>>>
>>> Really promising roadmap.
>>>
>>> I'd only push for more visualization options. I agree that built-in visualization is needed with limited charting options, but I think we also need some way to 'inject' external JS visualizations.
>>>
>>> For scheduling Zeppelin notebooks we use https://github.com/airbnb/airflow through the job REST API. It's an enterprise-ready and very robust solution right now.
>>>
>>> *Tamas*
>>>
>>> On 1 March 2016 at 09:12, Eran Witkon <eranwit...@gmail.com> wrote:
>>>
>>> One point to clarify: I don't want to suggest Oozie specifically. I want us to think about which features we develop ourselves and which ones we integrate from external, preferably Apache, technology. We don't think about building our own storage services, so why build our own scheduler?
>>>
>>> Eran
>>>
>>> On Tue, 1 Mar 2016 at 09:49 moon soo Lee <m...@apache.org> wrote:
>>>
>>> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
>>>
>>> Now I can see a lot of demand around enterprise-level job scheduling. Whether external or built-in, I completely agree with having enterprise-level job scheduling support on the roadmap.
>>>
>>> ZEPPELIN-137 <https://issues.apache.org/jira/browse/ZEPPELIN-137> and ZEPPELIN-531 <https://issues.apache.org/jira/browse/ZEPPELIN-531> are the related issues I can find in our JIRA.
>>>
>>> @Vinayak
>>>
>>> Regarding importing notebooks from GitHub, Zeppelin has a pluggable notebook storage layer (see related package <https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>). So GitHub notebook sync can be implemented easily.
>>>
>>> @Shabeel
>>>
>>> Right, we need better memory management to prevent such OOM.
>>>
>>> And I think the table is one of the most frequently used ways of displaying data. So we'll definitely need more features like filter, sort, etc.
>>>
>>> After this roadmap discussion, discussion of the next release will follow. Then we'll get an idea of when those features will be available.
>>>
>>> @Prasad
>>>
>>> Thanks for mentioning HA and DR. They're really important subjects for enterprise use. Zeppelin will definitely need to address them.
>>>
>>> And displaying meta information about notebooks on the top-level page is a good idea.
>>>
>>> It's really great to hear many opinions and ideas.
>>>
>>> And thanks, @Rick, for sharing a valuable view of the Zeppelin project.
>>>
>>> Thanks,
>>> moon
>>>
>>> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <rah...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> For one, I know that there is rudimentary scheduling built into Zeppelin already (at least I fixed a bug in the test for a scheduling feature a few months ago).
>>>
>>> But another point is that Zeppelin should also focus on quality, reproducibility and portability.
>>>
>>> Although this doesn't offer exciting new features, it would make development much easier.
>>>
>>> Cross-platform testability, tests that pass when run sequentially, compatibility with Firefox, and many more open issues that make it so much harder to enhance Zeppelin and add features should be addressed soon, preferably before more features are added. Zeppelin is already suffering, in my opinion, from quite a lot of feature creep, and we should avoid putting in the kitchen sink at the cost of quality and maintainability. Instead, modularity (ZEPPELIN-533 in particular) should be targeted.
>>>
>>> Oozie, in my opinion, is a dead end. It may de facto still be in use on many clusters, but it's not getting the love it needs, and I wouldn't bet on it when it comes to integrating scheduling. Instead, any external tool should be able to use the REST API to trigger executions, if you want external scheduling.
>>>
>>> So, in conclusion, if we take Moon's list as a list of descending priorities, I fully agree, under the condition that code quality is included as a subset of enterprise-readiness. Auth* is paramount (Kerberos SPNEGO SSO support is what we really want), with user and group rights assignment on the notebook level.
>>> We probably also need Knox integration (ODP members looking at integrating Zeppelin should consider contributing this), and integration of something like Spree (https://github.com/hammerlab/spree) to be able to profile jobs.
>>>
>>> I'm hopeful that soon I can resume contributing some quality-oriented code, to drive this "necessary evil" forward ;)
>>>
>>> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
>>>
>>> I do agree with Vinayak. It need not be coupled with Oozie.
>>>
>>> Rather, one should be able to call it from any scheduler typically used at the enterprise level. Maybe support for BPML.
>>>
>>> I believe the existing ability to call/execute a Zeppelin notebook, or a specific paragraph within a notebook, using the REST API should take care of this requirement to some extent.
>>>
>>> Regards,
>>> Sourav
>>>
>>> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <vinayakagrawa...@gmail.com> wrote:
>>>
>>> @Eran Witkon,
>>>
>>> Thanks for the suggestion, Eran. I concur with your thought.
>>>
>>> If Zeppelin can be integrated with Oozie, that would be wonderful. Users will also be able to leverage their Oozie skills.
>>>
>>> This would be promising for now.
>>>
>>> However, in the future Hadoop might not necessarily be installed in a Spark cluster, and Oozie (since it installs with a Hadoop distribution) might not be available.
>>>
>>> So perhaps we should give some thought to this feature for the future. Should it depend on Oozie, or should Zeppelin have its own scheduling?
>>>
>>> As Benjamin has noted, the Databricks notebook has this as a core notebook feature.
>>>
>>> Also, would anybody give any suggestions regarding the "sync with GitHub" feature?
>>>
>>> - Exporting notebook to GitHub
>>> - Importing notebook from GitHub
>>>
>>> Thanks
>>> Vinayak
>>>
>>> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <eranwit...@gmail.com> wrote:
>>>
>>> @*Vinayak Agrawal* I would suggest adding the ability to connect Zeppelin to existing scheduling/workflow tools such as https://oozie.apache.org/. This requires better hooks and status reporting, but doesn't make Zeppelin an ETL/scheduler tool by itself.
>>>
>>> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <vinayakagrawa...@gmail.com> wrote:
>>>
>>> Moon,
>>>
>>> The new roadmap looks very promising. I am very happy to see security in the list.
>>> I have some suggestions regarding Enterprise Ready features:
>>>
>>> 1. Job Scheduler - Can this be improved?
>>>
>>> Currently the scheduler can be used with a cron expression or a pre-set time. But in an enterprise solution, a notebook might be one piece of a workflow. Can we look towards the functionality of scheduling notebooks based on other notebooks finishing their jobs successfully?
>>>
>>> This requirement would arise in any ETL workflow, where all the downstream users wait for the ETL notebook to finish successfully. Only after that can other business-oriented notebooks be executed.
>>>
>>> 2. Importing a notebook - Is there a current requirement or future plan to implement a feature that allows import-notebook-from-github? This would allow users to share notebooks seamlessly.
>>>
>>> Thanks
>>> Vinayak
>>>
>>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <m...@apache.org> wrote:
>>>
>>> Zhong Wang,
>>>
>>> Right, folder support would be quite useful. Thanks for the opinion.
>>>
>>> I hope I can finish the work on pr-190 <https://github.com/apache/incubator-zeppelin/pull/190>.
>>>
>>> Sourav,
>>>
>>> Regarding concurrent running, Zeppelin itself places no limitation on running paragraphs/queries concurrently. Each interpreter can implement its own scheduling policy. For example, the SparkSQL interpreter and ShellInterpreter can already run paragraphs/queries concurrently.
>>>
>>> SparkInterpreter is implemented with a FIFO scheduler, considering the nature of the Scala compiler. That's why users cannot run multiple paragraphs concurrently when they work with SparkInterpreter.
>>>
>>> But as Zhong Wang mentioned, pr-703 gives each notebook a separate Scala compiler, so paragraphs can run concurrently as long as they're in different notebooks.
>>>
>>> Thanks for the feedback!
>>>
>>> Best,
>>> moon
>>>
>>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wangzhong....@gmail.com> wrote:
>>>
>>> Sourav: I think this newly merged PR can help you:
>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>
>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
>>>
>>> Hi Moon,
>>>
>>> This looks great.
>>>
>>> My only suggestion would be to include a PR/feature: support for running concurrent paragraphs/queries in Zeppelin.
>>>
>>> Right now, if more than one user tries to run paragraphs in multiple notebooks concurrently through a single Zeppelin instance (and a single interpreter instance), performance is very slow. It is evident that a queue builds up within the Zeppelin process and the interpreter process in that scenario, as the time taken to move the status from start to pending and from pending to running is very high compared to the actual running time of a paragraph.
>>>
>>> Without this, multi-tenancy support would be meaningless, as no one can practically use it in a situation where multiple users are trying to connect to the same instance of Zeppelin (and the related interpreter).
>>> A possible solution would be to spawn a separate instance of the same interpreter for each notebook/user.
>>>
>>> Regards,
>>> Sourav
>>>
>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <m...@apache.org> wrote:
>>>
>>> Hi Zeppelin users and developers,
>>>
>>> The roadmap we have published at
>>>
>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>
>>> is almost 9 months old, and it no longer reflects where the community is going. It's time to update it.
>>>
>>> Based on the mailing list, JIRA issues, pull requests, feedback from users, conferences and meetings, I could summarize the major interests of users and developers in 7 categories: Enterprise ready, Usability improvement, Pluggability, Documentation, Backend integration, Notebook storage, and Visualization.
>>>
>>> And I could list related subjects under each category.
>>>
>>> - Enterprise ready
>>>   - Authentication
>>>     - Shiro authentication ZEPPELIN-548 <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>>   - Authorization
>>>     - Notebook authorization PR-681 <https://github.com/apache/incubator-zeppelin/pull/681>
>>>   - Security
>>>   - Multi-tenancy
>>>   - Stability
>>> - Usability Improvement
>>>   - UX improvement
>>>   - Better Table data support
>>>     - Download data as csv, etc PR-725 <https://github.com/apache/incubator-zeppelin/pull/725>, PR-714 <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6 <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89 <https://github.com/apache/incubator-zeppelin/pull/89>
>>>     - Featureful table data display (pagination, etc)
>>> - Pluggability ZEPPELIN-533 <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>>   - Pluggable visualization
>>>   - Dynamic Interpreter, notebook, visualization loading
>>>   - Repository and registry for pluggable
>>>     components
>>> - Improve documentation
>>>   - Improve contents and readability
>>>   - more tutorials, examples
>>> - Interpreter
>>>   - Generic JDBC Interpreter
>>>   - (spark)R Interpreter
>>>   - Cluster manager for interpreter (Proposal <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>)
>>>   - more interpreters
>>> - Notebook storage
>>>   - Versioning ZEPPELIN-540 <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>>   - more notebook storages
>>> - Visualization
>>>   - More visualizations PR-152 <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728 <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336 <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321 <https://github.com/apache/incubator-zeppelin/pull/321>
>>>   - Customize graph (show/hide label, color, etc)
>>>
>>> It will help anyone quickly grasp the overall interests and direction of the project. And based on this roadmap, we can discuss and redefine the scope of the next release, 0.6.0, and its schedule.
>>>
>>> What do you think? Any feedback would be appreciated.
>>>
>>> Thanks,
>>> moon
>>>
>>> --
>>> Vinayak Agrawal
>>> "To Strive, To Seek, To Find and Not to Yield!"
>>> ~Lord Alfred Tennyson
>>>
>>> --
>>> Vinayak Agrawal
>>> Big Data Analytics
>>> IBM
>>> "To Strive, To Seek, To Find and Not to Yield!"
>>> ~Lord Alfred Tennyson
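
A footnote to the scheduling sub-thread (notebooks that should run only after other notebooks have finished successfully, driven by an external tool): the idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Zeppelin or Airflow implement it. The names `execution_order`, `run_workflow`, `run_notebook` and the notebook ids are all hypothetical; `run_notebook` stands in for whatever actually triggers a notebook (for instance, a call to the job REST API mentioned in the thread) and reports success or failure.

```python
from collections import deque
from typing import Callable, Dict, List


def execution_order(depends_on: Dict[str, List[str]]) -> List[str]:
    """Order notebook ids so each one comes after everything it depends on."""
    # Collect every id that appears as a key or as a dependency.
    nodes = set(depends_on)
    for deps in depends_on.values():
        nodes.update(deps)
    # For each notebook, the set of dependencies that have not been ordered yet.
    waiting = {n: set(depends_on.get(n, [])) for n in nodes}
    ready = deque(sorted(n for n in nodes if not waiting[n]))
    order: List[str] = []
    while ready:
        current = ready.popleft()
        order.append(current)
        # Release notebooks whose last outstanding dependency was `current`.
        for n in sorted(nodes):
            if current in waiting[n]:
                waiting[n].discard(current)
                if not waiting[n]:
                    ready.append(n)
    if len(order) != len(nodes):
        raise ValueError("dependency cycle detected")
    return order


def run_workflow(
    depends_on: Dict[str, List[str]],
    run_notebook: Callable[[str], bool],  # hypothetical trigger; True on success
) -> List[str]:
    """Run notebooks in dependency order; skip any whose dependency did not succeed."""
    succeeded: List[str] = []
    for note_id in execution_order(depends_on):
        deps_ok = all(dep in succeeded for dep in depends_on.get(note_id, []))
        if deps_ok and run_notebook(note_id):
            succeeded.append(note_id)
    return succeeded


if __name__ == "__main__":
    # Hypothetical workflow: two reports wait for the ETL notebook.
    workflow = {"report_a": ["etl"], "report_b": ["etl"]}
    print(run_workflow(workflow, lambda note_id: True))
```

Keeping the trigger as a plain callable leaves the choice between a built-in scheduler and an external one open, which is the trade-off discussed above: the ordering logic stays the same either way.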