Re: [DISCUSS] Update Roadmap

rohit choudhary Tue, 29 Mar 2016 05:51:14 -0700

Hi All,

I've submitted a design approach for Multi-tenancy and Security for
Zeppelin - https://issues.apache.org/jira/browse/ZEPPELIN-773.


I look forward to your reviews and suggestions on the topic.

Thanks,
Rohit.

On Sat, Mar 26, 2016 at 10:04 PM, moon soo Lee <m...@apache.org> wrote:

> There is a discussion thread on Release Policy:
> https://s.apache.org/3JCm
> Please check that thread, too.
>
> Thanks,
> moon
>
>
> On Thu, Mar 24, 2016 at 12:02 PM Guilherme Silveira <
> guilhermecgss...@gmail.com> wrote:
>
>> Is there a predefined release interval, let's say 6 months or 1 year,
>> between one version and the next?
>> On 23 Mar 2016 at 4:10 PM, "Joel Van Veluwen" <
>> joel.vanvelu...@quantium.com.au> wrote:
>>
>>> Hi Nikolay,
>>>
>>>
>>>
>>> I raised this with MapR and there don’t appear to be any plans to add
>>> Zeppelin to 5.1.
>>>
>>>
>>>
>>> https://community.mapr.com/message/40332
>>>
>>>
>>>
>>> We are deploying it manually and everything is pretty stable – but it
>>> will vary depending on your environment.
>>>
>>>
>>>
>>> Cheers,
>>>
>>>
>>>
>>> Joel Van Veluwen
>>> *QUANTIUM*
>>>
>>>
>>>
>>> *From:* Nikolay Voronchikhin [mailto:nvoronchik...@gmail.com]
>>> *Sent:* Tuesday, 22 March 2016 11:39 AM
>>> *To:* users@zeppelin.incubator.apache.org
>>> *Subject:* Re: [DISCUSS] Update Roadmap
>>>
>>>
>>>
>>> Hi Zeppelin Users and Developers,
>>>
>>>
>>>
>>> Do you know if MapR will be adding Zeppelin to its roadmap for the next
>>> version after MapR 5.1?
>>>
>>>
>>>
>>> We see in Hue 3.9 that it provides notebooks for R Shell, Python Shell,
>>> PySpark, SparkR, Hive SQL, Impala SQL, and Spark SQL, but no Drill SQL
>>> notebook.
>>>
>>> We are looking for an Apache Project that focuses on a Drill Notebook UI
>>> that performs better than the Drill Web Console UI itself.
>>>
>>>
>>>
>>> Sincerely,
>>>
>>> *Nikolay Voronchikhin*
>>>
>>> *Big Data/Data Warehouse/Data Science/Data Platforms Engineer at Cisco*
>>>
>>> *https://www.linkedin.com/in/nvoronchikhin
>>> <https://www.linkedin.com/in/nvoronchikhin>*
>>>
>>> *E-mail: nvoronchik...@gmail.com <nvoronchik...@gmail.com>*
>>>
>>> *Mobile: 951-288-2778 <951-288-2778>*
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Mar 21, 2016 at 2:44 PM, rohit choudhary <rconl...@gmail.com>
>>> wrote:
>>>
>>> Dear All,
>>>
>>>
>>>
>>> I think direction setting is important for Enterprise readiness. I have
>>> a little bit of an overview of Ambari Views, which is very similar in
>>> nature to Zeppelin. Please let me explain:
>>>
>>>
>>>
>>> Hive View - interacts with Hive
>>>
>>> Pig View - interacts with Pig
>>>
>>> Workflow Designer - interacts with Oozie
>>>
>>>
>>>
>>> We have a very similar architecture in Zeppelin, where we interact with
>>> these systems through Interpreters. The usage will also be similar, as both
>>> will interact with Hadoop clusters or, in some cases, Spark with YARN on
>>> HDFS. Our priorities should include:
>>>
>>>
>>>
>>> - Design & implement for multi-tenancy
>>>
>>> - Auditability from Data/State and Lineage perspective
>>>
>>> - Ability to share Notebooks/Data/State across users, preferably through
>>> SparkContext sharing
>>>
>>> - Security between Zeppelin and the other systems, not limited to Spark
>>> through Kerberos. (@Rick +1)
>>>
>>>
>>>
>>> I will share an initial draft of the thoughts I have in mind, in the
>>> next couple of days.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Rohit.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Mar 3, 2016 at 7:54 AM, moon soo Lee <m...@apache.org> wrote:
>>>
>>> Shabeel, thanks for the feedback about the REST API and custom IDs. That
>>> might help avoid multiple REST API calls.
>>>
>>>
>>>
>>> Thanks, everyone, for the valuable feedback. It looks like we're all heading
>>> in the same direction. I have updated the wiki.
>>>
>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>
>>> Please take a look.
>>>
>>>
>>>
>>> I'm sure there are many missing details in this roadmap. I must say that
>>> something not being on this roadmap doesn't mean the community is not
>>> working on it or that it can't be included in Zeppelin. The roadmap
>>> represents community interest and overall direction.
>>>
>>> We're not changing the roadmap every day, but that doesn't mean the roadmap
>>> is set in stone and can never be changed. We can improve it continuously.
>>>
>>>
>>>
>>> Please feel free to fork this mail thread for any further discussion
>>> on a specific subject (e.g. job scheduling).
>>>
>>>
>>>
>>> Thanks,
>>>
>>> moon
>>>
>>>
>>>
>>> On Wed, Mar 2, 2016 at 12:31 AM Shabeel Syed <shabeels...@gmail.com>
>>> wrote:
>>>
>>> Also, we need better REST API support for creating and fetching
>>> notebooks and paragraphs.
>>>
>>> For example, if I could set a custom-defined notebookid and paragraphid, we
>>> could avoid multiple REST API calls.
>>>
>>>
>>>
>>> http://localhost:8080/#/notebook/<notebookid>/paragraph/<paragraphid>?asIframe
>>>
>>> should return an error if the notebook or paragraph does not exist.
>>>
>>>
>>>
>>> And while creating a notebook or paragraph, I should be able to specify my
>>> custom IDs.
>>>
>>>
>>>
>>> Regards
>>>
>>> Shabeel
>>>
>>>
>>>
>>> On Wed, Mar 2, 2016 at 11:55 AM, Zhong Wang <wangzhong....@gmail.com>
>>> wrote:
>>>
>>> +1 on @rick. Quality is really important... I am still encountering bugs
>>> consistently.
>>>
>>>
>>>
>>> On Tue, Mar 1, 2016 at 10:16 AM, TEJA SRIVASTAV <tejasrivas...@gmail.com>
>>> wrote:
>>>
>>> +1 on @rick
>>>
>>>
>>>
>>> On Tue, Mar 1, 2016 at 11:26 PM Benjamin Kim <bbuil...@gmail.com> wrote:
>>>
>>> I see in the Enterprise section that multi-tenancy will be included.
>>> Will this have user impersonation too? That way, the user executing will
>>> be the user owning the process.
>>>
>>>
>>>
>>> On Mar 1, 2016, at 12:51 AM, Shabeel Syed <shabeels...@gmail.com> wrote:
>>>
>>>
>>>
>>> +1
>>>
>>>
>>>
>>> Hi Tamas,
>>>
>>>    Pluggable external visualization is really a GREAT feature to have.
>>> I'm looking forward to this :)
>>>
>>>
>>>
>>> Regards
>>>
>>> Shabeel
>>>
>>>
>>>
>>> On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi <tamas.szur...@odigeo.com>
>>> wrote:
>>>
>>> Hey,
>>>
>>>
>>>
>>> Really promising roadmap.
>>>
>>>
>>>
>>> I'd only push for more visualization options. I agree that built-in
>>> visualization with limited charting options is needed, but I think we also
>>> need some way to 'inject' external JS visualizations.
>>>
>>>
>>>
>>>
>>>
>>> For scheduling Zeppelin notebooks we use
>>> https://github.com/airbnb/airflow through the job REST API. It's an
>>> enterprise-ready and very robust solution right now.
>>>
>>>
>>>
>>> *Tamas*
>>>
>>>
>>>
>>> On 1 March 2016 at 09:12, Eran Witkon <eranwit...@gmail.com> wrote:
>>>
>>> One point to clarify: I don't want to suggest Oozie specifically. I want
>>> us to think about which features we develop ourselves and which ones we
>>> integrate from external, preferably Apache, technology. We don't think about
>>> building our own storage services, so why build our own scheduler?
>>> Eran
>>>
>>> On Tue, 1 Mar 2016 at 09:49 moon soo Lee <m...@apache.org> wrote:
>>>
>>> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
>>>
>>> Now I can see a lot of demand for enterprise-level job scheduling.
>>> Whether external or built-in, I completely agree with having
>>> enterprise-level job scheduling support on the roadmap.
>>>
>>> ZEPPELIN-137 <https://issues.apache.org/jira/browse/ZEPPELIN-137> and
>>> ZEPPELIN-531 <https://issues.apache.org/jira/browse/ZEPPELIN-531> are
>>> the related issues I can find in our JIRA.
>>>
>>>
>>>
>>> @Vinayak
>>>
>>> Regarding importing notebooks from GitHub, Zeppelin has a pluggable
>>> notebook storage layer (see the related package
>>> <https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>).
>>> So GitHub notebook sync can be implemented easily.
>>>
>>>
>>>
>>> @Shabeel
>>>
>>> Right, we need better memory management to prevent such OOMs.
>>>
>>> And I think a table is one of the most frequently used ways of displaying
>>> data. So we'll definitely need more features like filter, sort, etc.
>>>
>>> After this roadmap discussion, discussion of the next release will
>>> follow. Then we'll get an idea of when those features will be available.
>>>
>>>
>>>
>>> @Prasad
>>>
>>> Thanks for mentioning HA and DR. They're really important subjects for
>>> enterprise use. Zeppelin will definitely need to address them.
>>>
>>> And displaying notebook meta information on the top-level page is a good
>>> idea.
>>>
>>>
>>>
>>> It's really great to hear so many opinions and ideas.
>>>
>>> And thanks, @Rick, for sharing a valuable view of the Zeppelin project.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> moon
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <rah...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> For one, I know that there is rudimentary scheduling built into Zeppelin
>>> already (at least I fixed a bug in the test for a scheduling feature a few
>>> months ago).
>>>
>>> But another point is that Zeppelin should also focus on quality,
>>> reproducibility and portability.
>>>
>>> Although this doesn't offer exciting new features, it would make
>>> development much easier.
>>>
>>> Cross-platform testability, tests that pass when run sequentially,
>>> compatibility with Firefox, and many more open issues that make it so much
>>> harder to enhance Zeppelin and add features should be addressed soon,
>>> preferably before more features are added. Zeppelin is already suffering,
>>> in my opinion, from quite a lot of feature creep, and we should avoid
>>> putting in the kitchen sink at the cost of quality and maintainability.
>>> Instead, modularity (ZEPPELIN-533 in particular) should be targeted.
>>>
>>> Oozie, in my opinion, is a dead end. It may de facto still be in use on
>>> many clusters, but it's not getting the love it needs, and I wouldn't bet
>>> on it when it comes to integrating scheduling. Instead, any external tool
>>> should be able to use the REST API to trigger executions, if you want
>>> external scheduling.
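>>>
>>> As a rough illustration of triggering a run from an external tool, here is
>>> a minimal Python sketch. It assumes Zeppelin exposes a "run all paragraphs"
>>> job endpoint of the form /api/notebook/job/<noteId>; the helper names, the
>>> base URL and the note id are hypothetical, so check the REST API docs of
>>> your Zeppelin version before relying on it.

```python
import urllib.request


def build_run_request(base_url: str, note_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking Zeppelin to run a note."""
    # Assumption: the job endpoint lives at <base>/api/notebook/job/<noteId>.
    url = f"{base_url.rstrip('/')}/api/notebook/job/{note_id}"
    return urllib.request.Request(url, data=b"", method="POST")


def trigger_note(base_url: str, note_id: str, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code of the response."""
    request = build_run_request(base_url, note_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

>>> Any scheduler (cron, Airflow, ...) could then call something like
>>> trigger_note("http://localhost:8080", "<noteId>") as one step of a job.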
>>>
>>> So, in conclusion, if we take Moon's list as a list of descending
>>> priorities, I fully agree, under the condition that code quality is
>>> included as a subset of enterprise-readiness. Auth* is paramount (Kerberos
>>> SPNEGO SSO support is what we really want) with user and group rights
>>> assignment on the notebook level. We probably also need Knox-integration
>>> (ODP-Members looking at integrating Zeppelin should consider contributing
>>> this), and integration of something like Spree (
>>> https://github.com/hammerlab/spree) to be able to profile jobs.
>>>
>>> I'm hopeful that soon I can resume contributing some quality-oriented
>>> code, to drive this "necessary evil" forward ;)
>>>
>>>
>>>
>>> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <
>>> sourav.mazumde...@gmail.com> wrote:
>>>
>>> I do agree with Vinayak. It need not be coupled with Oozie.
>>>
>>> Rather, one should be able to call it from any scheduler typically used
>>> at the enterprise level. Maybe support for BPML.
>>>
>>> I believe the existing ability to call/execute a Zeppelin Notebook or a
>>> specific paragraph within a notebook using REST API should take care of
>>> this requirement to some extent.
>>>
>>> Regards,
>>>
>>> Sourav
>>>
>>>
>>>
>>> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <
>>> vinayakagrawa...@gmail.com> wrote:
>>>
>>> @Eran Witkon,
>>>
>>> Thanks for the suggestion, Eran. I concur with your thought.
>>>
>>> If Zeppelin can be integrated with Oozie, that would be wonderful. Users
>>> will also be able to leverage their Oozie skills.
>>>
>>> This would be promising for now.
>>>
>>> However, in the future Hadoop might not necessarily be installed in a
>>> Spark cluster, and Oozie (since it installs with the Hadoop distribution)
>>> might not be available.
>>>
>>> So perhaps we should give some thought to this feature for the future.
>>> Should it depend on Oozie, or should Zeppelin have its own scheduling?
>>>
>>> As Benjamin has reiterated, the Databricks notebook has this as a core
>>> notebook feature.
>>>
>>>
>>>
>>> Also, does anybody have suggestions regarding a "sync with GitHub"
>>> feature?
>>>
>>> - Exporting a notebook to GitHub
>>>
>>> - Importing a notebook from GitHub
>>>
>>>
>>>
>>> Thanks
>>>
>>> Vinayak
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <eranwit...@gmail.com>
>>> wrote:
>>>
>>> @*Vinayak Agrawal*, I would suggest adding the ability to connect
>>> Zeppelin to existing scheduling/workflow tools such as
>>> https://oozie.apache.org/. This requires better hooks and status
>>> reporting, but doesn't make Zeppelin an ETL/scheduler tool by itself.
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
>>> vinayakagrawa...@gmail.com> wrote:
>>>
>>> Moon,
>>>
>>> The new roadmap looks very promising. I am very happy to see security in
>>> the list.
>>> I have some suggestions regarding Enterprise Ready features:
>>>
>>>
>>> 1. Job Scheduler - Can this be improved?
>>>
>>> Currently the scheduler can be used with a cron expression or a pre-set
>>> time. But in an enterprise solution, a notebook might be one piece of a
>>> workflow. Can we look at scheduling notebooks based on other notebooks
>>> finishing their jobs successfully?
>>>
>>> This requirement would arise in any ETL workflow, where all the
>>> downstream users wait for the ETL notebook to finish successfully. Only
>>> after that can other business-oriented notebooks be executed.
>>>
>>> 2. Importing a notebook - Is there a current requirement or future plan
>>> to implement a feature that allows import-notebook-from-github? This would
>>> allow users to share notebooks seamlessly.
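>>>
>>> The dependency-based scheduling asked for in item 1 could be sketched
>>> roughly as below. This is only an illustration in Python, not an existing
>>> Zeppelin feature: the function name and the run_note hook are hypothetical,
>>> and it simply runs notes in topological order, skipping anything whose
>>> upstream note failed or was itself skipped.

```python
from graphlib import TopologicalSorter


def run_in_dependency_order(dependencies, run_note):
    """Run notes so that each one starts only after its upstream notes succeed.

    dependencies: dict mapping a note id to the set of note ids it depends on.
    run_note: hypothetical callable(note_id) -> bool, True when the run succeeds.
    Returns a dict mapping each note id to "ok", "failed" or "skipped".
    """
    status = {}
    # static_order() yields every note after all of its upstream notes.
    for note in TopologicalSorter(dependencies).static_order():
        upstream = set(dependencies.get(note, ()))
        if any(status[dep] != "ok" for dep in upstream):
            # An upstream note failed or was skipped, so do not run this one.
            status[note] = "skipped"
        elif run_note(note):
            status[note] = "ok"
        else:
            status[note] = "failed"
    return status
```

>>> For example, with {"report": {"etl"}, "dashboard": {"etl"}, "etl": set()},
>>> the "report" and "dashboard" notes only run once "etl" has succeeded.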
>>>
>>> Thanks
>>>
>>> Vinayak
>>>
>>>
>>>
>>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <m...@apache.org> wrote:
>>>
>>> Zhong Wang,
>>>
>>> Right, folder support would be quite useful. Thanks for the opinion.
>>>
>>> I hope I can finish the work on pr-190
>>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>>
>>>
>>>
>>> Sourav,
>>>
>>> Regarding concurrent running, Zeppelin doesn't have any limitation on
>>> running paragraphs/queries concurrently. An interpreter can implement its
>>> own scheduling policy. For example, the SparkSQL interpreter and
>>> ShellInterpreter can already run paragraphs/queries concurrently.
>>>
>>>
>>>
>>> SparkInterpreter is implemented with a FIFO scheduler because of the nature
>>> of the Scala compiler. That's why users cannot run multiple paragraphs
>>> concurrently when they work with SparkInterpreter.
>>>
>>> But as Zhong Wang mentioned, pr-703 gives each notebook a separate Scala
>>> compiler, so paragraphs run concurrently as long as they're in
>>> different notebooks.
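>>>
>>> To make the difference between the two policies concrete, here is a small
>>> Python sketch of "FIFO within a notebook, concurrent across notebooks". It
>>> is not Zeppelin's actual (Java) scheduler code; the class and method names
>>> are hypothetical, and it only models the queueing behaviour described above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PerNoteFifoScheduler:
    """Hypothetical scheduler: FIFO within a note, parallel across notes."""

    def __init__(self):
        self._executors = {}
        self._lock = threading.Lock()

    def submit(self, note_id, paragraph, *args):
        """Queue a paragraph (any callable) for the given note; returns a Future."""
        # One single-worker executor per note: its internal queue keeps the
        # paragraphs of that note in submission (FIFO) order.
        with self._lock:
            executor = self._executors.get(note_id)
            if executor is None:
                executor = ThreadPoolExecutor(max_workers=1)
                self._executors[note_id] = executor
        return executor.submit(paragraph, *args)

    def shutdown(self):
        """Wait for all queued paragraphs to finish, then release the threads."""
        with self._lock:
            executors = list(self._executors.values())
        for executor in executors:
            executor.shutdown(wait=True)
```

>>> A single shared single-worker executor would correspond to the plain FIFO
>>> case, where a long paragraph in one notebook delays every other notebook.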
>>>
>>> Thanks for the feedback!
>>>
>>>
>>>
>>> Best,
>>>
>>> moon
>>>
>>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wangzhong....@gmail.com>
>>> wrote:
>>>
>>> Sourav: I think this newly merged PR can help you
>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>
>>>
>>>
>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>>> sourav.mazumde...@gmail.com> wrote:
>>>
>>> Hi Moon,
>>>
>>> This looks great.
>>>
>>> My only suggestion would be to include a PR/feature - Support for
>>> Running Concurrent paragraphs/queries in Zeppelin.
>>>
>>> Right now if more than one user tries to run paragraphs in multiple
>>> notebooks concurrently through a single Zeppelin instance (and single
>>> interpreter instance) the performance is very slow. It is obvious that the
>>> queue gets built up within the zeppelin process and interpreter process in
>>> that scenario as the time taken to move the status from start to pending
>>> and pending to running is very high compared to the actual running time of
>>> a paragraph.
>>>
>>> Without this, multi-tenancy support would be meaningless, as no one
>>> can practically use it in a situation where multiple users are trying to
>>> connect to the same instance of Zeppelin (and the related interpreter). A
>>> possible solution would be to spawn a separate instance of the same
>>> interpreter at every notebook/user level.
>>>
>>> Regards,
>>>
>>> Sourav
>>>
>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <m...@apache.org> wrote:
>>>
>>> Hi Zeppelin users and developers,
>>>
>>>
>>>
>>> The roadmap we have published at
>>>
>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>
>>> is almost 9 months old, and it no longer reflects where the community is
>>> going. It's time to update it.
>>>
>>>
>>>
>>> Based on the mailing list, JIRA issues, pull requests, feedback from users,
>>> conferences and meetings, I could summarize the major interests of users and
>>> developers in 7 categories: Enterprise ready, Usability improvement,
>>> Pluggability, Documentation, Backend integration, Notebook storage, and
>>> Visualization.
>>>
>>>
>>>
>>> And I could list related subjects under each category.
>>>
>>>
>>>    - Enterprise ready
>>>       - Authentication
>>>          - Shiro authentication ZEPPELIN-548
>>>            <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>>       - Authorization
>>>          - Notebook authorization PR-681
>>>            <https://github.com/apache/incubator-zeppelin/pull/681>
>>>       - Security
>>>       - Multi-tenancy
>>>       - Stability
>>>
>>>    - Usability Improvement
>>>       - UX improvement
>>>       - Better Table data support
>>>          - Download data as csv, etc PR-725
>>>            <https://github.com/apache/incubator-zeppelin/pull/725>, PR-714
>>>            <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6
>>>            <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89
>>>            <https://github.com/apache/incubator-zeppelin/pull/89>
>>>          - Featureful table data display (pagination, etc)
>>>
>>>    - Pluggability ZEPPELIN-533
>>>      <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>>       - Pluggable visualization
>>>       - Dynamic Interpreter, notebook, visualization loading
>>>       - Repository and registry for pluggable components
>>>
>>>    - Improve documentation
>>>       - Improve contents and readability
>>>       - more tutorials, examples
>>>
>>>    - Interpreter
>>>       - Generic JDBC Interpreter
>>>       - (spark)R Interpreter
>>>       - Cluster manager for interpreter (Proposal
>>>         <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>)
>>>       - more interpreters
>>>
>>>    - Notebook storage
>>>       - Versioning ZEPPELIN-540
>>>         <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>>       - more notebook storages
>>>
>>>    - Visualization
>>>       - More visualizations PR-152
>>>         <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>>>         <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>>>         <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>>>         <https://github.com/apache/incubator-zeppelin/pull/321>
>>>       - Customize graph (show/hide label, color, etc)
>>>
>>> It will help anyone quickly get a sense of the project's overall interests
>>> and direction. And based on this roadmap, we can discuss and redefine the
>>> scope of the next release, 0.6.0, and its schedule.
>>>
>>>
>>>
>>> What do you think? Any feedback would be appreciated.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> moon
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Vinayak Agrawal
>>>
>>> Big Data Analytics
>>>
>>> IBM
>>>
>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>
>>> ~Lord Alfred Tennyson
>>
