Re: [DISCUSS] Update Roadmap

moon soo Lee Sat, 26 Mar 2016 09:36:00 -0700

There is a discussion thread on the Release Policy at
https://s.apache.org/3JCm
Please check that thread, too.


Thanks,
moon

On Thu, Mar 24, 2016 at 12:02 PM Guilherme Silveira <
guilhermecgss...@gmail.com> wrote:

> Is there a predefined release interval, let's say 6 months or 1 year,
> between one version and another?
> On Mar 23, 2016 4:10 PM, "Joel Van Veluwen" <
> joel.vanvelu...@quantium.com.au> wrote:
>
>> Hi Nikolay,
>>
>>
>>
>> I raised this with MapR and there don't appear to be any plans to add
>> Zeppelin to 5.1.
>>
>>
>>
>> https://community.mapr.com/message/40332
>>
>>
>>
>> We are deploying it manually and everything is pretty stable – but it
>> will vary depending on your environment.
>>
>>
>>
>> Cheers,
>>
>>
>>
>> Joel Van Veluwen
>> *QUANTIUM*
>>
>>
>> *From:* Nikolay Voronchikhin [mailto:nvoronchik...@gmail.com]
>> *Sent:* Tuesday, 22 March 2016 11:39 AM
>> *To:* users@zeppelin.incubator.apache.org
>> *Subject:* Re: [DISCUSS] Update Roadmap
>>
>>
>>
>> Hi Zeppelin Users and Developers,
>>
>>
>>
>> Do you know if MapR will be adding Zeppelin to its roadmap for the next
>> version after MapR 5.1?
>>
>>
>>
>> We see in Hue 3.9 that it provides notebooks for R Shell, Python Shell,
>> PySpark, SparkR, Hive SQL, Impala SQL, and Spark SQL, but no Drill SQL
>> notebook.
>>
>> We are looking for an Apache Project that focuses on a Drill Notebook UI
>> that performs better than the Drill Web Console UI itself.
>>
>>
>>
>> Sincerely,
>>
>> *Nikolay Voronchikhin*
>>
>> *Big Data/Data Warehouse/Data Science/Data Platforms Engineer at Cisco*
>>
>>
>>
>>
>>
>>
>> On Mon, Mar 21, 2016 at 2:44 PM, rohit choudhary <rconl...@gmail.com>
>> wrote:
>>
>> Dear All,
>>
>>
>>
>> I think direction setting is important for Enterprise readiness. I have a
>> little bit of an overview of Ambari Views, which is very similar in nature
>> to Zeppelin. Please let me explain:
>>
>>
>>
>> Hive View - interacts with Hive
>>
>> Pig View - interacts with Pig
>>
>> Workflow Designer - interacts with Oozie
>>
>>
>>
>> We have a very similar architecture in Zeppelin, where we interact with
>> these systems through Interpreters. The usage will also be similar, as both
>> will interact with Hadoop clusters or, in some cases, Spark with YARN on
>> HDFS. Our priorities should include:
>>
>>
>>
>> - Design & implement for multi-tenancy
>>
>> - Auditability from Data/State and Lineage perspective
>>
>> - Ability to share Notebooks/Data/State across users, preferably through
>> SparkContext sharing
>>
>> - Security between Zeppelin and the other systems, not limited to Spark
>> through Kerberos. (@Rick +1)
>>
>>
>>
>> I will share an initial draft of the thoughts I have in mind, in the next
>> couple of days.
>>
>>
>>
>> Thanks,
>>
>> Rohit.
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Mar 3, 2016 at 7:54 AM, moon soo Lee <m...@apache.org> wrote:
>>
>> Shabeel, thanks for the feedback about the REST API and custom IDs. That
>> might help avoid multiple REST API calls.
>>
>>
>>
>> Thanks, everyone, for the valuable feedback. It looks like we're all going
>> in the same direction. I have updated the wiki.
>>
>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>
>> Please take a look.
>>
>>
>>
>> I'm sure there are many missing details in this roadmap. I must say that
>> something not being on this roadmap doesn't mean the community is not
>> working on it or that it can't be included in Zeppelin. The roadmap
>> represents community interest and overall direction.
>>
>> We're not changing the roadmap every day, but that doesn't mean the roadmap
>> is set in stone and can never be changed. We can improve it continuously.
>>
>>
>>
>> Please feel free to fork this mail thread for any further discussion
>> on a specific subject (e.g. job scheduling).
>>
>>
>>
>> Thanks,
>>
>> moon
>>
>>
>>
>> On Wed, Mar 2, 2016 at 12:31 AM Shabeel Syed <shabeels...@gmail.com>
>> wrote:
>>
>> Also, we need better REST API support for creating and fetching
>> notebooks and paragraphs.
>>
>> For example, if I could set a custom-defined notebookid and paragraphid, we
>> could avoid multiple REST API calls.
>>
>>
>>
>> http://localhost:8080/#/notebook/<notebookid>/paragraph/<paragraphid>?asIframe
>>
>> should return an error if the notebook or paragraph does not exist.
>>
>>
>>
>> And while creating a notebook or paragraph, I should be able to specify my
>> custom IDs.
>>
>>
>>
>> Regards
>>
>> Shabeel
>>
>>
>>
>> On Wed, Mar 2, 2016 at 11:55 AM, Zhong Wang <wangzhong....@gmail.com>
>> wrote:
>>
>> +1 on @rick. Quality is really important... I am still encountering bugs
>> consistently.
>>
>>
>>
>> On Tue, Mar 1, 2016 at 10:16 AM, TEJA SRIVASTAV <tejasrivas...@gmail.com>
>> wrote:
>>
>> +1 on @rick
>>
>>
>>
>> On Tue, Mar 1, 2016 at 11:26 PM Benjamin Kim <bbuil...@gmail.com> wrote:
>>
>> I see in the Enterprise section that multi-tenancy will be included. Will
>> this have user impersonation too? That way, the user executing will be
>> the user owning the process.
>>
>>
>>
>> On Mar 1, 2016, at 12:51 AM, Shabeel Syed <shabeels...@gmail.com> wrote:
>>
>>
>>
>> +1
>>
>>
>>
>> Hi Tamas,
>>
>>    Pluggable external visualization is really a GREAT feature to have.
>> I'm looking forward to this :)
>>
>>
>>
>> Regards
>>
>> Shabeel
>>
>>
>>
>> On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi <tamas.szur...@odigeo.com>
>> wrote:
>>
>> Hey,
>>
>>
>>
>> Really promising roadmap.
>>
>>
>>
>> I'd only push for more visualization options. I agree that built-in
>> visualization with limited charting options is needed, but I think we also
>> need some way to 'inject' external JS visualizations.
>>
>>
>>
>>
>>
>> For scheduling Zeppelin notebooks we use
>> https://github.com/airbnb/airflow through the job REST API. It's an
>> enterprise-ready and very robust solution right now.
>>
>>
>>
>> *Tamas*
>>
>>
>>
>> On 1 March 2016 at 09:12, Eran Witkon <eranwit...@gmail.com> wrote:
>>
>> One point to clarify: I don't want to suggest Oozie specifically. I want
>> us to think about which features we develop ourselves and for which ones we
>> integrate external (preferably Apache) technology. We don't think about
>> building our own storage services, so why build our own scheduler?
>> Eran
>>
>> On Tue, 1 Mar 2016 at 09:49 moon soo Lee <m...@apache.org> wrote:
>>
>> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
>>
>> Now I can see a lot of demand for enterprise-level job scheduling.
>> Whether external or built-in, I completely agree with having
>> enterprise-level job scheduling support on the roadmap.
>>
>> ZEPPELIN-137 <https://issues.apache.org/jira/browse/ZEPPELIN-137> and
>> ZEPPELIN-531 <https://issues.apache.org/jira/browse/ZEPPELIN-531> are
>> the related issues I can find in our JIRA.
>>
>>
>>
>> @Vinayak
>>
>> Regarding importing notebooks from GitHub, Zeppelin has a pluggable notebook
>> storage layer (see related package
>> <https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>).
>> So, GitHub notebook sync can be implemented easily.
>>
>>
>>
>> @Shabeel
>>
>> Right, we need better memory management to prevent such OOM.
>>
>> And I think the table is one of the most frequently used ways of displaying
>> data. So we'll definitely need more features like filter, sort, etc.
>>
>> After this roadmap discussion, discussion of the next release will
>> follow. Then we'll get an idea of when those features will be available.
>>
>>
>>
>> @Prasad
>>
>> Thanks for mentioning HA and DR. They're really important subjects for
>> enterprise use. Zeppelin will definitely need to address them.
>>
>> And displaying meta information about notebooks on the top-level page is a
>> good idea.
>>
>>
>>
>> It's really great to hear many opinions and ideas.
>>
>> And thanks @Rick for sharing a valuable view of the Zeppelin project.
>>
>>
>>
>> Thanks,
>>
>> moon
>>
>>
>>
>>
>>
>> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <rah...@gmail.com> wrote:
>>
>> Hi,
>>
>> For one, I know that there is rudimentary scheduling built into Zeppelin
>> already (at least I fixed a bug in the test for a scheduling feature a few
>> months ago).
>>
>> But another point is that Zeppelin should also focus on quality,
>> reproducibility and portability.
>>
>> Although this doesn't offer exciting new features, it would make
>> development much easier.
>>
>> Cross-platform testability, tests that pass when run sequentially,
>> compatibility with Firefox, and many more open issues that make it so much
>> harder to enhance Zeppelin and add features should be addressed soon,
>> preferably before more features are added. Already Zeppelin is suffering -
>> in my opinion - from quite a lot of feature creep, and we should avoid
>> putting in the kitchen sink, at the cost of quality and maintainability.
>> Instead, modularity (ZEPPELIN-533 in particular) should be targeted.
>>
>> Oozie, in my opinion, is a dead end - it may de-facto still be in use on
>> many clusters, but it's not getting the love it needs, and I wouldn't bet
>> on it, when it comes to integrating scheduling. Instead, any external tool
>> should be able to use the REST-API to trigger executions, if you want
>> external scheduling.
>>
>> So, in conclusion, if we take Moon's list as a list of descending
>> priorities, I fully agree, under the condition that code quality is
>> included as a subset of enterprise-readiness. Auth* is paramount (Kerberos
>> SPNEGO SSO support is what we really want) with user and group rights
>> assignment on the notebook level. We probably also need Knox-integration
>> (ODP-Members looking at integrating Zeppelin should consider contributing
>> this), and integration of something like Spree (
>> https://github.com/hammerlab/spree) to be able to profile jobs.
>>
>> I'm hopeful that soon I can resume contributing some quality-oriented
>> code, to drive this "necessary evil" forward ;)
>>
>>
>>
>> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <
>> sourav.mazumde...@gmail.com> wrote:
>>
>> I do agree with Vinayak. It need not be coupled with Oozie.
>>
>> Rather, one should be able to call it from any scheduler typically used at
>> the enterprise level. Maybe support for BPML.
>>
>> I believe the existing ability to call/execute a Zeppelin Notebook or a
>> specific paragraph within a notebook using REST API should take care of
>> this requirement to some extent.
>>
>> Regards,
>>
>> Sourav
>>
>>
>>
>> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <
>> vinayakagrawa...@gmail.com> wrote:
>>
>> @Eran Witkon,
>>
>> Thanks for the suggestion Eran. I concur with your thought.
>>
>> If Zeppelin can be integrated with Oozie, that would be wonderful. Users
>> will also be able to leverage their Oozie skills.
>>
>> This would be promising for now.
>>
>> However, in the future Hadoop might not necessarily be installed in the
>> Spark cluster, and Oozie (since it installs with the Hadoop distribution)
>> might not be available.
>>
>> So perhaps we should give some thought to this feature for the future.
>> Should it depend on Oozie, or should Zeppelin have its own scheduling?
>>
>> As Benjamin has reiterated, the Databricks notebook has this as a core
>> notebook feature.
>>
>>
>>
>> Also, would anybody give any suggestions regarding "sync with github"
>> feature?
>>
>> -Exporting notebook to Github
>>
>> -Importing notebook from Github
>>
>>
>>
>> Thanks
>>
>> Vinayak
>>
>>
>>
>>
>>
>> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <eranwit...@gmail.com>
>> wrote:
>>
>> @*Vinayak Agrawal *I would suggest adding the ability to connect
>> Zeppelin to existing scheduling/workflow tools such as
>> https://oozie.apache.org/. This requires better hooks and status
>> reporting, but doesn't make Zeppelin an ETL/scheduler tool by itself.
>>
>>
>>
>>
>>
>> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
>> vinayakagrawa...@gmail.com> wrote:
>>
>> Moon,
>>
>> The new roadmap looks very promising. I am very happy to see security in
>> the list.
>> I have some suggestions regarding Enterprise Ready features:
>>
>>
>> 1. Job Scheduler - Can this be improved?
>>
>> Currently the scheduler can be used with a cron expression or a pre-set
>> time. But in an enterprise solution, a notebook might be one piece of the
>> workflow. Can we look towards the functionality of scheduling notebooks
>> based on other notebooks finishing their job successfully?
>>
>> This requirement would arise in any ETL workflow, where all the
>> downstream users wait for the ETL notebook to finish successfully. Only
>> after that, other business oriented notebooks can be executed.
>>
>> 2. Importing a notebook - Is there a current requirement or future plan
>> to implement a feature that allows import-notebook-from-github? This would
>> allow users to share notebooks seamlessly.
>>
>> Thanks
>>
>> Vinayak
>>
>>
>>
>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <m...@apache.org> wrote:
>>
>> Zhong Wang,
>>
>> Right, Folder support would be quite useful. Thanks for the opinion.
>>
>> I hope I can finish the work on pr-190
>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>
>>
>>
>> Sourav,
>>
>> Regarding concurrent running, Zeppelin doesn't have a limitation on running
>> paragraphs/queries concurrently. An interpreter can implement its own
>> scheduling policy. For example, the SparkSQL interpreter and
>> ShellInterpreter can already run paragraphs/queries concurrently.
>>
>>
>>
>> SparkInterpreter is implemented with a FIFO scheduler, considering the
>> nature of the Scala compiler. That's why users cannot run multiple
>> paragraphs concurrently when they work with SparkInterpreter.
>>
>> But as Zhong Wang mentioned, pr-703 enables each notebook to have a
>> separate Scala compiler, so paragraphs run concurrently as long as they're
>> in different notebooks.
>>
>> Thanks for the feedback!
>>
>>
>>
>> Best,
>>
>> moon
>>
>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wangzhong....@gmail.com>
>> wrote:
>>
>> Sourav: I think this newly merged PR can help you
>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>
>>
>>
>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>> sourav.mazumde...@gmail.com> wrote:
>>
>> Hi Moon,
>>
>> This looks great.
>>
>> My only suggestion would be to include a PR/feature - Support for Running
>> Concurrent paragraphs/queries in Zeppelin.
>>
>> Right now if more than one user tries to run paragraphs in multiple
>> notebooks concurrently through a single Zeppelin instance (and single
>> interpreter instance) the performance is very slow. It is obvious that the
>> queue gets built up within the zeppelin process and interpreter process in
>> that scenario as the time taken to move the status from start to pending
>> and pending to running is very high compared to the actual running time of
>> a paragraph.
>>
>> Without this, the multi-tenancy support would be meaningless, as no one can
>> practically use it in a situation where multiple users are trying to
>> connect to the same instance of Zeppelin (and the related interpreter). A
>> possible solution would be to spawn a separate instance of the same
>> interpreter at every notebook/user level.
>>
>> Regards,
>>
>> Sourav
>>
>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <m...@apache.org> wrote:
>>
>> Hi Zeppelin users and developers,
>>
>>
>>
>> The roadmap we have published at
>>
>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>
>> is almost 9 months old, and it no longer reflects where the community is
>> going. It's time to update it.
>>
>>
>>
>> Based on the mailing list, JIRA issues, pull requests, feedback from users,
>> conferences and meetings, I could summarize the major interests of users and
>> developers in 7 categories: Enterprise ready, Usability improvement,
>> Pluggability, Documentation, Backend integration, Notebook storage, and
>> Visualization.
>>
>>
>>
>> And I could list related subjects under each category.
>>
>>
>>    - Enterprise ready
>>       - Authentication
>>          - Shiro authentication ZEPPELIN-548
>>            <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>       - Authorization
>>          - Notebook authorization PR-681
>>            <https://github.com/apache/incubator-zeppelin/pull/681>
>>       - Security
>>       - Multi-tenancy
>>       - Stability
>>    - Usability Improvement
>>       - UX improvement
>>       - Better Table data support
>>          - Download data as csv, etc PR-725
>>            <https://github.com/apache/incubator-zeppelin/pull/725>, PR-714
>>            <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6
>>            <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89
>>            <https://github.com/apache/incubator-zeppelin/pull/89>
>>          - Featureful table data display (pagination, etc)
>>    - Pluggability ZEPPELIN-533
>>      <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>       - Pluggable visualization
>>       - Dynamic Interpreter, notebook, visualization loading
>>       - Repository and registry for pluggable components
>>    - Improve documentation
>>       - Improve contents and readability
>>       - more tutorials, examples
>>    - Interpreter
>>       - Generic JDBC Interpreter
>>       - (spark)R Interpreter
>>       - Cluster manager for interpreter (Proposal
>>         <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>)
>>       - more interpreters
>>    - Notebook storage
>>       - Versioning ZEPPELIN-540
>>         <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>       - more notebook storages
>>    - Visualization
>>       - More visualizations PR-152
>>         <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>>         <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>>         <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>>         <https://github.com/apache/incubator-zeppelin/pull/321>
>>       - Customize graph (show/hide label, color, etc)
>>
>> It will help anyone quickly get a sense of the project's overall interests
>> and direction. And based on this roadmap, we can discuss and re-define the
>> scope of the next release, 0.6.0, and its schedule.
>>
>>
>>
>> What do you think? Any feedback would be appreciated.
>>
>>
>>
>> Thanks,
>>
>> moon
>>
>>
>>
>>
>>
>>
>> --
>>
>> Vinayak Agrawal
>>
>> Big Data Analytics
>>
>> IBM
>>
>> "To Strive, To Seek, To Find and Not to Yield!"
>>
>> ~Lord Alfred Tennyson
>
