RE: [DISCUSS] Update Roadmap

Guilherme Silveira Thu, 24 Mar 2016 12:10:12 -0700

Is there a predefined release interval, let's say 6 months or 1 year,
between one version and another?
On 23 Mar 2016 4:10 PM, "Joel Van Veluwen" <
joel.vanvelu...@quantium.com.au> wrote:


> Hi Nikolay,
>
>
>
> I raised this with MapR and there don’t appear to be any plans to add
> Zeppelin to 5.1.
>
>
>
> https://community.mapr.com/message/40332
>
>
>
> We are deploying it manually and everything is pretty stable – but it will
> vary depending on your environment.
>
>
>
> Cheers,
>
>
>
> Joel Van Veluwen
> *QUANTIUM*
> Level 25, 8 Chifley
> 8-12 Chifley Square
> Sydney NSW 2000
>
> T: +61 2 8224 8981
> M: +61 403 153 265
> F: +61 2 9292 6444
>
> W: quantium.com.au <http://www.quantium.com.au>
>
>
> *From:* Nikolay Voronchikhin [mailto:nvoronchik...@gmail.com]
> *Sent:* Tuesday, 22 March 2016 11:39 AM
> *To:* users@zeppelin.incubator.apache.org
> *Subject:* Re: [DISCUSS] Update Roadmap
>
>
>
> Hi Zeppelin Users and Developers,
>
>
>
> Do you know if MapR will be adding Zeppelin to its roadmap for the next
> version after MapR 5.1?
>
>
>
> We see in Hue 3.9 that it provides notebooks for R Shell, Python Shell,
> PySpark, SparkR, Hive SQL, Impala SQL, and Spark SQL, but no Drill SQL
> notebook.
>
> We are looking for an Apache Project that focuses on a Drill Notebook UI
> that performs better than the Drill Web Console UI itself.
>
>
>
> Sincerely,
>
> *Nikolay Voronchikhin*
>
> *Big Data/Data Warehouse/Data Science/Data Platforms Engineer at Cisco*
>
> *https://www.linkedin.com/in/nvoronchikhin*
>
> *E-mail: nvoronchik...@gmail.com*
>
> *Mobile: 951-288-2778*
>
>
>
>
>
> On Mon, Mar 21, 2016 at 2:44 PM, rohit choudhary <rconl...@gmail.com>
> wrote:
>
> Dear All,
>
>
>
> I think direction setting is important for Enterprise readiness. I have a
> little bit of an overview of Ambari Views, which is very similar in nature
> to Zeppelin. Please let me explain:
>
>
>
> Hive View - interacts with Hive
>
> Pig View - interacts with Pig
>
> Workflow Designer - interacts with Oozie
>
>
>
> We have a very similar architecture in Zeppelin, where we interact with
> these systems through Interpreters. The usage will also be similar, as both
> will interact with Hadoop clusters or, in some cases, Spark with YARN on
> HDFS. Our priorities should include:
>
>
>
> - Design & implement for multi-tenancy
>
> - Auditability from Data/State and Lineage perspective
>
> - Ability to share Notebooks/Data/State across users, preferably through
> SparkContext sharing
>
> - Security between Zeppelin and the other systems, not limited to Spark
> through Kerberos. (@Rick +1)
>
>
>
> I will share an initial draft of the thoughts I have in mind, in the next
> couple of days.
>
>
>
> Thanks,
>
> Rohit.
>
>
>
>
>
>
>
> On Thu, Mar 3, 2016 at 7:54 AM, moon soo Lee <m...@apache.org> wrote:
>
> Shabeel, thanks for the feedback about the REST API and custom IDs. That
> might help avoid multiple REST API calls.
>
>
>
> Thanks everyone for the valuable feedback. Looks like we're all going in
> the same direction. I have updated the wiki.
>
> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>
> Please take a look.
>
>
>
> I'm sure many details are missing from this roadmap. I must say that
> something not being on the roadmap doesn't mean the community is not
> working on it or that it can't be included in Zeppelin. The roadmap
> represents community interest and overall direction.
>
> We're not changing the roadmap every day, but that doesn't mean it is set
> in stone and can never be changed. We can improve it continuously.
>
>
>
> Please feel free to fork this mail thread for any further discussion
> on a specific subject (e.g. job scheduling).
>
>
>
> Thanks,
>
> moon
>
>
>
> On Wed, Mar 2, 2016 at 12:31 AM Shabeel Syed <shabeels...@gmail.com>
> wrote:
>
> Also, we need better REST API support for creating and fetching
> notebooks and paragraphs.
>
> For example, if I could set a custom notebook ID and paragraph ID, we
> could avoid multiple REST API calls.
>
>
>
> http://localhost:8080/#/notebook/<notebookid>/paragraph/<paragraphid>?asIframe
>
> should return an error if the notebook or paragraph does not exist.
>
>
>
> And while creating a notebook or paragraph, I should be able to specify my
> custom IDs.
>
>
>
> Regards
>
> Shabeel
>
>
>
> On Wed, Mar 2, 2016 at 11:55 AM, Zhong Wang <wangzhong....@gmail.com>
> wrote:
>
> +1 on @rick. Quality is really important... I am still encountering bugs
> consistently.
>
>
>
> On Tue, Mar 1, 2016 at 10:16 AM, TEJA SRIVASTAV <tejasrivas...@gmail.com>
> wrote:
>
> +1 on @rick
>
>
>
> On Tue, Mar 1, 2016 at 11:26 PM Benjamin Kim <bbuil...@gmail.com> wrote:
>
> I see in the Enterprise section that multi-tenancy will be included. Will
> this have user impersonation too? That way, the user executing will be
> the user owning the process.
>
>
>
> On Mar 1, 2016, at 12:51 AM, Shabeel Syed <shabeels...@gmail.com> wrote:
>
>
>
> +1
>
>
>
> Hi Tamas,
>
>    Pluggable external visualization is really a GREAT feature to have. I'm
> looking forward to this :)
>
>
>
> Regards
>
> Shabeel
>
>
>
> On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi <tamas.szur...@odigeo.com>
> wrote:
>
> Hey,
>
>
>
> Really promising roadmap.
>
>
>
> I'd only push for more visualization options. I agree that built-in
> visualization with limited charting options is needed, but I think we also
> need some way to 'inject' external JS visualizations.
>
>
>
>
>
> For scheduling Zeppelin notebooks we use
> https://github.com/airbnb/airflow through the job REST API. It's an
> enterprise-ready and very robust solution right now.
>
>
>
> *Tamas*
>
>
>
> On 1 March 2016 at 09:12, Eran Witkon <eranwit...@gmail.com> wrote:
>
> One point to clarify: I don't want to suggest Oozie specifically. I want
> us to think about which features we develop ourselves and which ones we
> integrate from external, preferably Apache, technology. We don't think
> about building our own storage services, so why build our own scheduler?
> Eran
>
> On Tue, 1 Mar 2016 at 09:49 moon soo Lee <m...@apache.org> wrote:
>
> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
>
> Now I can see a lot of demand around enterprise-level job scheduling.
> Whether external or built-in, I completely agree with having
> enterprise-level job scheduling support on the roadmap.
>
> ZEPPELIN-137 <https://issues.apache.org/jira/browse/ZEPPELIN-137> and
> ZEPPELIN-531 <https://issues.apache.org/jira/browse/ZEPPELIN-531> are
> related issues I can find in our JIRA.
>
>
>
> @Vinayak
>
> Regarding importing notebooks from GitHub, Zeppelin has a pluggable
> notebook storage layer (see related package
> <https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>).
> So GitHub notebook sync can be implemented easily.
>
>
>
> @Shabeel
>
> Right, we need better memory management to prevent such OOMs.
>
> And I think tables are one of the most frequently used ways of displaying
> data. So we'll definitely need more features like filter, sort, etc.
>
> After this roadmap discussion, discussion of the next release will
> follow. Then we'll get an idea of when those features will be available.
>
>
>
> @Prasad
>
> Thanks for mentioning HA and DR. They're really important subjects for
> enterprise use. Zeppelin will definitely need to address them.
>
> And displaying meta information of notebooks on the top-level page is a
> good idea.
>
>
>
> It's really great to hear many opinions and ideas.
>
> And thanks @Rick for sharing a valuable view of the Zeppelin project.
>
>
>
> Thanks,
>
> moon
>
>
>
>
>
> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <rah...@gmail.com> wrote:
>
> Hi,
>
> For one, I know that there is rudimentary scheduling built into Zeppelin
> already (at least I fixed a bug in the test for a scheduling feature a few
> months ago).
>
> But another point is that Zeppelin should also focus on quality,
> reproducibility and portability.
>
> Although this doesn't offer exciting new features, it would make
> development much easier.
>
> Cross-platform testability, Tests that pass when run sequentially,
> compatibility with Firefox, and many more open issues that make it so much
> harder to enhance Zeppelin and add features should be addressed soon,
> preferably before more features are added. Already Zeppelin is suffering -
> in my opinion - from quite a lot of feature creep, and we should avoid
> putting in the kitchen sink, at the cost of quality and maintainability.
> Instead modularity (ZEPPELIN-533 in particular) should be targeted.
>
> Oozie, in my opinion, is a dead end - it may de-facto still be in use on
> many clusters, but it's not getting the love it needs, and I wouldn't bet
> on it, when it comes to integrating scheduling. Instead, any external tool
> should be able to use the REST-API to trigger executions, if you want
> external scheduling.
>
> So, in conclusion, if we take Moon's list as a list of descending
> priorities, I fully agree, under the condition that code quality is
> included as a subset of enterprise-readiness. Auth* is paramount (Kerberos
> SPNEGO SSO support is what we really want) with user and group rights
> assignment on the notebook level. We probably also need Knox-integration
> (ODP-Members looking at integrating Zeppelin should consider contributing
> this), and integration of something like Spree (
> https://github.com/hammerlab/spree) to be able to profile jobs.
>
> I'm hopeful that soon I can resume contributing some quality-oriented
> code, to drive this "necessary evil" forward ;)
>
>
>
> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <
> sourav.mazumde...@gmail.com> wrote:
>
> I do agree with Vinayak. It need not be coupled with Oozie.
>
> Rather, one should be able to call it from any scheduler typically used at
> enterprise level. Maybe support for BPML.
>
> I believe the existing ability to call/execute a Zeppelin Notebook or a
> specific paragraph within a notebook using REST API should take care of
> this requirement to some extent.
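As a rough illustration of the external-scheduler approach discussed above, the sketch below builds the job URLs an external tool might call and posts to them. It assumes the job endpoints described in Zeppelin's notebook REST API documentation (`/api/notebook/job/<noteId>` and `/api/notebook/job/<noteId>/<paragraphId>`); the helper names, base URL and IDs are hypothetical placeholders, so check the paths against the docs for your Zeppelin version.

```python
import json
import urllib.request


def job_url(base_url, note_id, paragraph_id=None):
    """Build the job endpoint for a whole note or for a single paragraph.

    Hypothetical helper; the path layout is an assumption taken from the
    Zeppelin notebook REST API docs.
    """
    url = f"{base_url.rstrip('/')}/api/notebook/job/{note_id}"
    if paragraph_id is not None:
        url += f"/{paragraph_id}"
    return url


def trigger(base_url, note_id, paragraph_id=None, timeout=30):
    """POST to the job endpoint and return the decoded JSON response."""
    request = urllib.request.Request(
        job_url(base_url, note_id, paragraph_id), data=b"", method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A cron job, Airflow task or Oozie shell action could then call `trigger("http://localhost:8080", "<noteId>")`, which keeps the scheduling logic outside Zeppelin.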
>
> Regards,
>
> Sourav
>
>
>
> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <
> vinayakagrawa...@gmail.com> wrote:
>
> @Eran Witkon,
>
> Thanks for the suggestion Eran. I concur with your thought.
>
> If Zeppelin can be integrated with Oozie, that would be wonderful. Users
> will also be able to leverage their Oozie skills.
>
> This would be promising for now.
>
> However, in the future Hadoop might not necessarily be installed in a Spark
> cluster, and Oozie (since it installs with the Hadoop distribution) might
> not be available.
>
> So perhaps we should give some thought to this feature for the future.
> Should it depend on Oozie, or should Zeppelin have its own scheduling?
>
> As Benjamin has reiterated, the Databricks notebook has this as a core
> notebook feature.
>
>
>
> Also, does anybody have suggestions regarding a "sync with GitHub"
> feature?
>
> -Exporting notebook to Github
>
> -Importing notebook from Github
>
>
>
> Thanks
>
> Vinayak
>
>
>
>
>
> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <eranwit...@gmail.com> wrote:
>
> @*Vinayak Agrawal*, I would suggest adding the ability to connect Zeppelin
> to existing scheduling\workflow tools such as
> https://oozie.apache.org/. This requires better hooks and status
> reporting, but doesn't turn Zeppelin into an ETL\scheduler tool by itself.
>
>
>
>
>
> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
> vinayakagrawa...@gmail.com> wrote:
>
> Moon,
>
> The new roadmap looks very promising. I am very happy to see security in
> the list.
> I have some suggestions regarding Enterprise Ready features:
>
>
> 1. Job Scheduler - Can this be improved?
>
> Currently the scheduler can be used with a cron expression or a pre-set
> time. But in an enterprise solution, a notebook might be one piece of the
> workflow. Can we look towards scheduling notebooks based on other
> notebooks finishing their jobs successfully?
>
> This requirement would arise in any ETL workflow, where all the downstream
> users wait for the ETL notebook to finish successfully. Only after that,
> other business oriented notebooks can be executed.
>
> 2. Importing a notebook - Is there a current requirement or future plan to
> implement a feature that allows import-notebook-from-github? This would
> allow users to share notebooks seamlessly.
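The dependency-based scheduling asked for in point 1 could be sketched roughly as follows. This is only an illustration in Python, not an existing Zeppelin feature: `run_order`, `run_workflow` and the `run_notebook` callback are hypothetical names, and the callback is assumed to return True when a notebook finishes successfully.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_order(dependencies):
    """Order notebooks so that each one comes after its prerequisites.

    `dependencies` maps a notebook name to the set of notebooks it waits on.
    """
    return list(TopologicalSorter(dependencies).static_order())


def run_workflow(dependencies, run_notebook):
    """Run notebooks in dependency order, skipping anything downstream
    of a failed (or skipped) notebook."""
    blocked = set()  # notebooks that failed or were skipped
    results = {}
    for name in run_order(dependencies):
        if blocked & set(dependencies.get(name, ())):
            blocked.add(name)
            results[name] = "skipped"
            continue
        succeeded = run_notebook(name)
        results[name] = "ok" if succeeded else "failed"
        if not succeeded:
            blocked.add(name)
    return results
```

With `{"report": {"etl"}}` as input, the business notebook `report` only runs once `etl` has reported success, which is the ETL case described above.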
>
> Thanks
>
> Vinayak
>
>
>
> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <m...@apache.org> wrote:
>
> Zhong Wang,
>
> Right, folder support would be quite useful. Thanks for the opinion.
>
> I hope I can finish the work on pr-190
> <https://github.com/apache/incubator-zeppelin/pull/190>.
>
>
>
> Sourav,
>
> Regarding concurrent running, Zeppelin itself doesn't prevent running
> paragraphs/queries concurrently. Each interpreter can implement its own
> scheduling policy. For example, the SparkSQL interpreter and
> ShellInterpreter can already run paragraphs/queries concurrently.
>
>
>
> SparkInterpreter is implemented with a FIFO scheduler because of the
> nature of the Scala compiler. That's why users cannot run multiple
> paragraphs concurrently when they work with SparkInterpreter.
>
> But as Zhong Wang mentioned, pr-703 gives each notebook a separate Scala
> compiler, so paragraphs can run concurrently as long as they're in
> different notebooks.
>
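The scheduling policy described above (FIFO within one notebook, concurrent across notebooks) can be illustrated with a small sketch. This is not Zeppelin's implementation, which is written in Java; the class name and its methods are hypothetical, and the sketch assumes `submit` is called from a single thread.

```python
from concurrent.futures import ThreadPoolExecutor


class PerNotebookScheduler:
    """One single-worker executor per notebook: paragraphs of the same
    notebook run in submission order, different notebooks run in parallel."""

    def __init__(self):
        self._executors = {}

    def submit(self, note_id, paragraph, *args):
        executor = self._executors.get(note_id)
        if executor is None:
            # A single worker thread drains its queue in FIFO order.
            executor = ThreadPoolExecutor(max_workers=1)
            self._executors[note_id] = executor
        return executor.submit(paragraph, *args)

    def shutdown(self):
        # Wait for all queued paragraphs to finish before returning.
        for executor in self._executors.values():
            executor.shutdown(wait=True)
```

Giving each notebook its own worker mirrors the idea of one compiler per notebook: ordering is preserved where shared state requires it, and nothing else has to wait.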
> Thanks for the feedback!
>
>
>
> Best,
>
> moon
>
> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wangzhong....@gmail.com>
> wrote:
>
> Sourav: I think this newly merged PR can help you
> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>
>
>
> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
> sourav.mazumde...@gmail.com> wrote:
>
> Hi Moon,
>
> This looks great.
>
> My only suggestion would be to include a PR/feature - Support for Running
> Concurrent paragraphs/queries in Zeppelin.
>
> Right now if more than one user tries to run paragraphs in multiple
> notebooks concurrently through a single Zeppelin instance (and single
> interpreter instance) the performance is very slow. It is obvious that the
> queue gets built up within the zeppelin process and interpreter process in
> that scenario as the time taken to move the status from start to pending
> and pending to running is very high compared to the actual running time of
> a paragraph.
>
> Without this, multi-tenancy support would be meaningless, as no one can
> practically use it in a situation where multiple users are trying to
> connect to the same instance of Zeppelin (and the related interpreter). A
> possible solution would be to spawn a separate instance of the same
> interpreter at every notebook/user level.
>
> Regards,
>
> Sourav
>
> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <m...@apache.org> wrote:
>
> Hi Zeppelin users and developers,
>
>
>
> The roadmap we have published at
>
> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>
> is almost 9 months old, and it no longer reflects where the community is
> going. It's time to update.
>
>
>
> Based on the mailing list, JIRA issues, pull requests, feedback from users,
> conferences and meetings, I could summarize the major interests of users
> and developers in 7 categories: Enterprise ready, Usability improvement,
> Pluggability, Documentation, Backend integration, Notebook storage, and
> Visualization.
>
>
>
> And I could list related subjects under each category.
>
>
>    - Enterprise ready
>       - Authentication
>          - Shiro authentication ZEPPELIN-548
>            <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>       - Authorization
>          - Notebook authorization PR-681
>            <https://github.com/apache/incubator-zeppelin/pull/681>
>       - Security
>       - Multi-tenancy
>       - Stability
>    - Usability Improvement
>       - UX improvement
>       - Better Table data support
>          - Download data as csv, etc PR-725
>            <https://github.com/apache/incubator-zeppelin/pull/725>, PR-714
>            <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6
>            <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89
>            <https://github.com/apache/incubator-zeppelin/pull/89>
>          - Featureful table data display (pagination, etc)
>    - Pluggability ZEPPELIN-533
>      <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>       - Pluggable visualization
>       - Dynamic Interpreter, notebook, visualization loading
>       - Repository and registry for pluggable components
>    - Improve documentation
>       - Improve contents and readability
>       - more tutorials, examples
>    - Interpreter
>       - Generic JDBC Interpreter
>       - (spark)R Interpreter
>       - Cluster manager for interpreter (Proposal
>         <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>)
>       - more interpreters
>    - Notebook storage
>       - Versioning ZEPPELIN-540
>         <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>       - more notebook storages
>    - Visualization
>       - More visualizations PR-152
>         <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>         <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>         <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>         <https://github.com/apache/incubator-zeppelin/pull/321>
>       - Customize graph (show/hide label, color, etc)
>
> It will help anyone quickly grasp the overall interests of the project and
> its direction. And based on this roadmap, we can discuss and re-define the
> scope of the next release, 0.6.0, and its schedule.
>
>
>
> What do you think? Any feedback would be appreciated.
>
>
>
> Thanks,
>
> moon
>
>
>
>
>
>
> --
>
> Vinayak Agrawal
>
>
>
> "To Strive, To Seek, To Find and Not to Yield!"
>
> ~Lord Alfred Tennyson
>
>
>
>
> --
>
> Vinayak Agrawal
>
> Big Data Analytics
>
> IBM
>
> "To Strive, To Seek, To Find and Not to Yield!"
>
> ~Lord Alfred Tennyson
