Re: [DISCUSS] Update Roadmap

Zhong Wang Tue, 01 Mar 2016 22:26:23 -0800

+1 on @Rick. Quality is really important... I am still consistently
encountering bugs.


On Tue, Mar 1, 2016 at 10:16 AM, TEJA SRIVASTAV <tejasrivas...@gmail.com>
wrote:

> +1 on @Rick
>
> On Tue, Mar 1, 2016 at 11:26 PM Benjamin Kim <bbuil...@gmail.com> wrote:
>
>> I see in the Enterprise section that multi-tenancy will be included. Will
>> this include user impersonation too? That way, the executing user would be
>> the user owning the process.
>>
>> On Mar 1, 2016, at 12:51 AM, Shabeel Syed <shabeels...@gmail.com> wrote:
>>
>> +1
>>
>> Hi Tamas,
>>    Pluggable external visualization is really a GREAT feature to have.
>> I'm looking forward to this :)
>>
>> Regards
>> Shabeel
>>
>> On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi <tamas.szur...@odigeo.com>
>> wrote:
>>
>>> Hey,
>>>
>>> Really promising roadmap.
>>>
>>> I'd only push for more visualization options. I agree that built-in
>>> visualization with a limited set of charting options is needed, but I think
>>> we also need some way to 'inject' external JS visualizations.
>>>
>>>
>>> For scheduling Zeppelin notebooks we use
>>> https://github.com/airbnb/airflow through the job REST API. It's an
>>> enterprise-ready and very robust solution right now.
>>>
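Tamas's setup drives Zeppelin from an external scheduler over the job REST API. A minimal Python sketch of the call such a scheduler task might make is below. It assumes the notebook job endpoint `POST /api/notebook/job/<noteId>` (check the REST API docs for your Zeppelin version); the server URL and note ID are hypothetical placeholders, and the Airflow wiring itself is left out.

```python
import json
import urllib.request

# Hypothetical placeholders: use your own Zeppelin server URL and note ID.
ZEPPELIN_URL = "http://localhost:8080"
NOTE_ID = "2A94M5J1Z"


def build_run_note_request(base_url: str, note_id: str) -> urllib.request.Request:
    """Build (without sending) a POST asking Zeppelin to run every paragraph of a note.

    Assumes the notebook job endpoint POST /api/notebook/job/<noteId>.
    """
    url = f"{base_url.rstrip('/')}/api/notebook/job/{note_id}"
    return urllib.request.Request(url, data=b"", method="POST")


def run_note(base_url: str, note_id: str, timeout: float = 30.0) -> dict:
    """Send the request and return Zeppelin's JSON reply as a dict.

    urlopen raises an HTTPError on a non-2xx status, which lets the calling
    scheduler task fail and be retried.
    """
    request = build_run_note_request(base_url, note_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only show the request a scheduler task would send; call run_note() to send it.
    req = build_run_note_request(ZEPPELIN_URL, NOTE_ID)
    print(req.get_method(), req.full_url)
```

Inside an Airflow DAG, `run_note` could be wrapped in a Python task (for example with Airflow's `PythonOperator`); any other scheduler that can make an HTTP call works the same way.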
>>>
>>> *Tamas*
>>>
>>> On 1 March 2016 at 09:12, Eran Witkon <eranwit...@gmail.com> wrote:
>>>
>>>> One point to clarify: I don't want to suggest Oozie specifically. I want
>>>> us to think about which features we develop ourselves and which ones we
>>>> integrate from external (preferably Apache) technology. We don't think
>>>> about building our own storage services, so why build our own scheduler?
>>>> Eran
>>>> On Tue, 1 Mar 2016 at 09:49 moon soo Lee <m...@apache.org> wrote:
>>>>
>>>>> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
>>>>> Now I can see a lot of demand for enterprise-level job scheduling.
>>>>> Whether external or built-in, I completely agree with having
>>>>> enterprise-level job scheduling support on the roadmap.
>>>>> ZEPPELIN-137 <https://issues.apache.org/jira/browse/ZEPPELIN-137> and
>>>>> ZEPPELIN-531 <https://issues.apache.org/jira/browse/ZEPPELIN-531> are the
>>>>> related issues I can find in our JIRA.
>>>>>
>>>>> @Vinayak
>>>>> Regarding importing notebooks from GitHub, Zeppelin has a pluggable
>>>>> notebook storage layer (see the related package
>>>>> <https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>),
>>>>> so GitHub notebook sync can be implemented easily.
>>>>>
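To illustrate why a pluggable storage layer makes GitHub sync straightforward, here is a small Python sketch of the pattern. It is not Zeppelin's actual storage API (that lives in the Java package linked above); `NotebookStorage`, `InMemoryStorage`, `sync` and the integer `version` field are hypothetical. A GitHub-backed class would implement the same four methods, and the sync step would not change.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Note:
    note_id: str
    content: str
    version: int  # hypothetical revision counter; higher means newer


class NotebookStorage(ABC):
    """Hypothetical pluggable interface: any backend implementing these methods can hold notes."""

    @abstractmethod
    def list_ids(self) -> list[str]: ...

    @abstractmethod
    def get(self, note_id: str) -> Note: ...

    @abstractmethod
    def save(self, note: Note) -> None: ...

    @abstractmethod
    def remove(self, note_id: str) -> None: ...


class InMemoryStorage(NotebookStorage):
    """Stand-in backend; a GitHub-backed class would implement the same methods via git."""

    def __init__(self) -> None:
        self._notes: dict[str, Note] = {}

    def list_ids(self) -> list[str]:
        return sorted(self._notes)

    def get(self, note_id: str) -> Note:
        return self._notes[note_id]

    def save(self, note: Note) -> None:
        self._notes[note.note_id] = note

    def remove(self, note_id: str) -> None:
        self._notes.pop(note_id, None)


def sync(primary: NotebookStorage, secondary: NotebookStorage) -> list[str]:
    """Push notes that are missing or older in `secondary`; return the pushed IDs."""
    pushed = []
    secondary_ids = set(secondary.list_ids())
    for note_id in primary.list_ids():
        note = primary.get(note_id)
        if note_id not in secondary_ids or secondary.get(note_id).version < note.version:
            secondary.save(note)
            pushed.append(note_id)
    return pushed
```

Because `sync` only talks to the abstract interface, exporting to and importing from GitHub would be the same call with the arguments swapped.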
>>>>> @Shabeel
>>>>> Right, we need better memory management to prevent such OOMs.
>>>>> And I think the table is one of the most frequently used ways of
>>>>> displaying data, so we'll definitely need more features like filtering,
>>>>> sorting, etc. After this roadmap discussion, a discussion of the next
>>>>> release will follow, and then we'll get an idea of when those features
>>>>> will be available.
>>>>>
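The filtering, sorting and paging features mentioned here operate on data Zeppelin already produces: interpreters emit tables with the `%table` display prefix, tab-separated columns and newline-separated rows, header first. The Python sketch below shows that logic on the raw text. It is not Zeppelin's front-end code, and `parse_table` and `view` are hypothetical names.

```python
from __future__ import annotations


def parse_table(output: str) -> tuple[list[str], list[list[str]]]:
    """Split '%table' text (tab-separated cells, newline-separated rows, header first)."""
    body = output[len("%table"):] if output.startswith("%table") else output
    lines = [line for line in body.strip(" \n").split("\n") if line]
    header = lines[0].split("\t")
    rows = [line.split("\t") for line in lines[1:]]
    return header, rows


def _sort_key(value: str):
    """In ascending order, numbers sort numerically and before text; text sorts as text."""
    try:
        return (0, float(value), "")
    except ValueError:
        return (1, 0.0, value)


def view(header, rows, *, filter_col=None, contains="", sort_col=None,
         descending=False, page=0, page_size=10):
    """Return one page of rows after an optional substring filter and sort."""
    selected = rows
    if filter_col is not None:
        col = header.index(filter_col)
        selected = [row for row in selected if contains in row[col]]
    if sort_col is not None:
        col = header.index(sort_col)
        selected = sorted(selected, key=lambda row: _sort_key(row[col]), reverse=descending)
    start = page * page_size
    return selected[start:start + page_size]


if __name__ == "__main__":
    header, rows = parse_table("%table name\tsize\nb\t10\na\t9\nc\t100\n")
    print(view(header, rows, sort_col="size"))  # [['a', '9'], ['b', '10'], ['c', '100']]
```

The numeric-aware sort key matters because `%table` cells are plain strings: sorting them lexically would put `"100"` before `"9"`.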
>>>>> @Prasad
>>>>> Thanks for mentioning HA and DR. They're really important subjects for
>>>>> enterprise use, and Zeppelin will definitely need to address them.
>>>>> Displaying notebook metadata on the top-level page is also a good idea.
>>>>>
>>>>> It's really great to hear so many opinions and ideas.
>>>>> And thanks, @Rick, for sharing a valuable view of the Zeppelin project.
>>>>>
>>>>> Thanks,
>>>>> moon
>>>>>
>>>>>
>>>>> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <rah...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> For one, I know that there is rudimentary scheduling built into
>>>>>> Zeppelin already (at least, I fixed a bug in the test for a scheduling
>>>>>> feature a few months ago).
>>>>>> But another point is that Zeppelin should also focus on quality,
>>>>>> reproducibility and portability.
>>>>>> Although this doesn't offer exciting new features, it would make
>>>>>> development much easier.
>>>>>>
>>>>>> Cross-platform testability, tests that pass when run sequentially,
>>>>>> compatibility with Firefox, and the many other open issues that make it
>>>>>> so much harder to enhance Zeppelin and add features should be addressed
>>>>>> soon, preferably before more features are added. In my opinion, Zeppelin
>>>>>> is already suffering from quite a lot of feature creep, and we should
>>>>>> avoid throwing in the kitchen sink at the cost of quality and
>>>>>> maintainability. Instead, modularity (ZEPPELIN-533 in particular) should
>>>>>> be the target.
>>>>>>
>>>>>> Oozie, in my opinion, is a dead end: it may still be in de facto use on
>>>>>> many clusters, but it's not getting the love it needs, and I wouldn't
>>>>>> bet on it when it comes to integrating scheduling. Instead, if you want
>>>>>> external scheduling, any external tool should be able to use the REST
>>>>>> API to trigger executions.
>>>>>>
>>>>>> So, in conclusion, if we take Moon's list as a list of descending
>>>>>> priorities, I fully agree, on the condition that code quality is
>>>>>> included as a subset of enterprise-readiness. Auth* is paramount
>>>>>> (Kerberos SPNEGO SSO support is what we really want), along with user
>>>>>> and group rights assignment at the notebook level. We probably also need
>>>>>> Knox integration (ODP members looking at integrating Zeppelin should
>>>>>> consider contributing this), and integration of something like Spree
>>>>>> (https://github.com/hammerlab/spree) to be able to profile jobs.
>>>>>>
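Rick's point about user and group rights at the notebook level can be made concrete with a small Python sketch. It assumes authentication (for example Kerberos/SPNEGO) has already established the user and their groups, and it uses a hypothetical owner/writer/reader hierarchy that is not necessarily the model implemented in PR-681; `NotePermissions` and `is_allowed` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical hierarchy: each role includes the rights of the roles below it.
REQUIRED_RANK = {"read": 1, "write": 2, "change_permissions": 3}


@dataclass
class NotePermissions:
    """Per-note ACL; each set may contain user names and/or group names."""
    owners: set[str] = field(default_factory=set)
    writers: set[str] = field(default_factory=set)
    readers: set[str] = field(default_factory=set)


def effective_rank(perms: NotePermissions, user: str, groups=()) -> int:
    """Highest role the user holds, directly or through any group (0 = no access)."""
    principals = {user, *groups}
    if principals & perms.owners:
        return 3
    if principals & perms.writers:
        return 2
    if principals & perms.readers:
        return 1
    return 0


def is_allowed(action: str, perms: NotePermissions, user: str, groups=()) -> bool:
    """Check whether the user's effective role is high enough for the action."""
    return effective_rank(perms, user, groups) >= REQUIRED_RANK[action]
```

Granting rights to groups rather than individual users keeps the per-note lists short when teams change.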
>>>>>> I'm hopeful that soon I can resume contributing some quality-oriented
>>>>>> code, to drive this "necessary evil" forward ;)
>>>>>>
>>>>>> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <
>>>>>> sourav.mazumde...@gmail.com> wrote:
>>>>>>
>>>>>>> I do agree with Vinayak. It need not be coupled with Oozie.
>>>>>>>
>>>>>>> Rather, one should be able to call it from any scheduler typically
>>>>>>> used at the enterprise level. Maybe support for BPML.
>>>>>>>
>>>>>>> I believe the existing ability to call/execute a Zeppelin notebook, or
>>>>>>> a specific paragraph within a notebook, using the REST API should take
>>>>>>> care of this requirement to some extent.
>>>>>>>
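Calling Zeppelin from an arbitrary scheduler comes down to two calls: start a paragraph, then poll until it reaches a final state. The Python sketch below assumes the endpoints `POST /api/notebook/job/<noteId>/<paragraphId>` and `GET /api/notebook/job/<noteId>`, and assumes the status reply carries a `body` list of `{"id": ..., "status": ...}` entries; verify both against the REST API docs for your Zeppelin version. The `fetch` parameter is a hypothetical hook that lets the polling loop run without a live server.

```python
import json
import time
import urllib.request

# States after which polling can stop (assumed to match Zeppelin's job states).
TERMINAL_STATES = {"FINISHED", "ERROR", "ABORT"}


def paragraph_run_request(base_url: str, note_id: str, paragraph_id: str) -> urllib.request.Request:
    """Build (without sending) a POST that runs one paragraph of a note."""
    url = f"{base_url.rstrip('/')}/api/notebook/job/{note_id}/{paragraph_id}"
    return urllib.request.Request(url, data=b"", method="POST")


def http_get_json(url: str) -> dict:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_paragraph(base_url, note_id, paragraph_id, fetch=http_get_json,
                       poll_seconds=5.0, max_polls=120) -> str:
    """Poll the note's job status until the paragraph reaches a terminal state; return it."""
    status_url = f"{base_url.rstrip('/')}/api/notebook/job/{note_id}"
    for _ in range(max_polls):
        reply = fetch(status_url)
        for paragraph in reply.get("body", []):
            if paragraph.get("id") == paragraph_id and paragraph.get("status") in TERMINAL_STATES:
                return paragraph["status"]
        time.sleep(poll_seconds)
    raise TimeoutError(f"paragraph {paragraph_id} did not finish after {max_polls} polls")
```

A scheduler task would send `paragraph_run_request(...)` with `urllib.request.urlopen`, then call `wait_for_paragraph(...)` and fail the task unless the result is `"FINISHED"`.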
>>>>>>> Regards,
>>>>>>> Sourav
>>>>>>>
>>>>>>> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <
>>>>>>> vinayakagrawa...@gmail.com> wrote:
>>>>>>>
>>>>>>>> @Eran Witkon,
>>>>>>>> Thanks for the suggestion, Eran. I concur with your thought.
>>>>>>>> If Zeppelin can be integrated with Oozie, that would be wonderful.
>>>>>>>> Users will also be able to leverage their Oozie skills.
>>>>>>>> This would be promising for now.
>>>>>>>> However, in the future Hadoop might not necessarily be installed on a
>>>>>>>> Spark cluster, and Oozie (since it is installed with the Hadoop
>>>>>>>> distribution) might not be available.
>>>>>>>> So perhaps we should give this feature some thought for the future.
>>>>>>>> Should it depend on Oozie, or should Zeppelin have its own scheduling?
>>>>>>>>
>>>>>>>> As Benjamin has reiterated, the Databricks notebook has this as a core
>>>>>>>> notebook feature.
>>>>>>>>
>>>>>>>>
>>>>>>>> Also, could anybody give suggestions regarding a "sync with GitHub"
>>>>>>>> feature?
>>>>>>>> - Exporting notebooks to GitHub
>>>>>>>> - Importing notebooks from GitHub
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Vinayak
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <eranwit...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> @Vinayak Agrawal I would suggest adding the ability to connect
>>>>>>>>> Zeppelin to existing scheduling/workflow tools such as
>>>>>>>>> https://oozie.apache.org/. This requires better hooks and status
>>>>>>>>> reporting, but doesn't make Zeppelin an ETL/scheduler tool by itself.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
>>>>>>>>> vinayakagrawa...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Moon,
>>>>>>>>>> The new roadmap looks very promising. I am very happy to see
>>>>>>>>>> security in the list.
>>>>>>>>>> I have some suggestions regarding Enterprise Ready features:
>>>>>>>>>>
>>>>>>>>>> 1. Job Scheduler: can this be improved?
>>>>>>>>>> Currently the scheduler can be used with a cron expression or a
>>>>>>>>>> preset time. But in an enterprise solution, a notebook might be one
>>>>>>>>>> piece of a workflow. Can we look towards scheduling notebooks based
>>>>>>>>>> on other notebooks finishing their jobs successfully?
>>>>>>>>>> This requirement would arise in any ETL workflow, where all the
>>>>>>>>>> downstream users wait for the ETL notebook to finish successfully.
>>>>>>>>>> Only after that can other business-oriented notebooks be executed.
>>>>>>>>>>
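The dependency-based scheduling described above (downstream notebooks run only after the ETL notebook succeeds) is essentially a topological walk over a DAG. A Python sketch using the standard library's `graphlib` (Python 3.9+) is below; `run_workflow` and the `run_note` callback are hypothetical, and in practice `run_note` might trigger a note over the REST API and report whether it succeeded.

```python
from __future__ import annotations

from collections.abc import Callable
from graphlib import TopologicalSorter


def run_workflow(deps: dict[str, set[str]], run_note: Callable[[str], bool]) -> dict[str, str]:
    """Run notes in dependency order.

    `deps` maps each note to the notes it depends on. A note runs only if every
    upstream note SUCCEEDED; otherwise it is marked SKIPPED, and the skip
    propagates to everything downstream of it. A cycle raises graphlib.CycleError.
    """
    outcome: dict[str, str] = {}
    for note in TopologicalSorter(deps).static_order():
        upstream = deps.get(note, set())
        if all(outcome.get(parent) == "SUCCEEDED" for parent in upstream):
            outcome[note] = "SUCCEEDED" if run_note(note) else "FAILED"
        else:
            outcome[note] = "SKIPPED"
    return outcome


if __name__ == "__main__":
    # Hypothetical note names: two reports depend on the ETL note, a dashboard on both reports.
    deps = {"sales_report": {"etl"}, "churn_model": {"etl"},
            "exec_dashboard": {"sales_report", "churn_model"}}
    print(run_workflow(deps, lambda note: note != "churn_model"))
```

This runs notes one at a time for clarity; `TopologicalSorter` also offers `get_ready()`/`done()` for running independent notes in parallel.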
>>>>>>>>>> 2. Importing a notebook: is there a current requirement or future
>>>>>>>>>> plan to implement a feature that allows importing a notebook from
>>>>>>>>>> GitHub? This would allow users to share notebooks seamlessly.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Vinayak
>>>>>>>>>>
>>>>>>>>>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <m...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Zhong Wang,
>>>>>>>>>>> Right, folder support would be quite useful. Thanks for the
>>>>>>>>>>> opinion.
>>>>>>>>>>>
>>>>>>>>>>> I hope I can finish the work on pr-190
>>>>>>>>>>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Sourav,
>>>>>>>>>>> Regarding concurrent execution, Zeppelin itself doesn't prevent
>>>>>>>>>>> paragraphs/queries from running concurrently. Each interpreter can
>>>>>>>>>>> implement its own scheduling policy. For example, the SparkSQL
>>>>>>>>>>> interpreter and ShellInterpreter can already run paragraphs/queries
>>>>>>>>>>> concurrently.
>>>>>>>>>>>
>>>>>>>>>>> SparkInterpreter is implemented with a FIFO scheduler because of the
>>>>>>>>>>> nature of the Scala compiler. That's why users cannot run multiple
>>>>>>>>>>> paragraphs concurrently when they work with SparkInterpreter.
>>>>>>>>>>> But as Zhong Wang mentioned, pr-703 gives each notebook a separate
>>>>>>>>>>> Scala compiler, so paragraphs can run concurrently as long as they're
>>>>>>>>>>> in different notebooks.
>>>>>>>>>>> Thanks for the feedback!
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> moon
>>>>>>>>>>>
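The difference between a single FIFO interpreter scheduler and per-notebook isolation (the idea behind pr-703, and the per-notebook/user instances Sourav proposes) can be sketched in Python with one single-worker executor per note. This illustrates the scheduling idea only, not Zeppelin's implementation; `PerNoteFifoScheduler` is a hypothetical name, and threads stand in for separate interpreter instances.

```python
from __future__ import annotations

import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


class PerNoteFifoScheduler:
    """One single-worker (FIFO) queue per note.

    Paragraphs of the same note run strictly one after another, as with a FIFO
    interpreter scheduler, while paragraphs of different notes sit in separate
    queues and can run at the same time.
    """

    def __init__(self) -> None:
        self._queues: dict[str, ThreadPoolExecutor] = {}
        self._lock = threading.Lock()

    def _queue_for(self, note_id: str) -> ThreadPoolExecutor:
        # Lazily create the note's queue; the lock keeps creation race-free.
        with self._lock:
            if note_id not in self._queues:
                self._queues[note_id] = ThreadPoolExecutor(
                    max_workers=1, thread_name_prefix=f"note-{note_id}")
            return self._queues[note_id]

    def submit(self, note_id: str, paragraph: Callable[[], Any]) -> Future:
        """Queue a paragraph (any zero-argument callable) behind earlier ones from the same note."""
        return self._queue_for(note_id).submit(paragraph)

    def shutdown(self) -> None:
        """Wait for queued paragraphs to finish, then release all worker threads."""
        with self._lock:
            queues = list(self._queues.values())
            self._queues.clear()
        for queue in queues:
            queue.shutdown(wait=True)
```

Replacing the per-note dictionary with one shared `ThreadPoolExecutor(max_workers=1)` reproduces the plain FIFO behaviour, where every user's paragraphs wait in a single queue.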
>>>>>>>>>>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wangzhong....@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Sourav: I think this newly merged PR can help you:
>>>>>>>>>>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>>>>>>>>>>>> sourav.mazumde...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Moon,
>>>>>>>>>>>>>
>>>>>>>>>>>>> This looks great.
>>>>>>>>>>>>>
>>>>>>>>>>>>> My only suggestion would be to include a PR/feature: support for
>>>>>>>>>>>>> running concurrent paragraphs/queries in Zeppelin.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Right now, if more than one user tries to run paragraphs in multiple
>>>>>>>>>>>>> notebooks concurrently through a single Zeppelin instance (and a
>>>>>>>>>>>>> single interpreter instance), performance is very slow. A queue
>>>>>>>>>>>>> evidently builds up within the Zeppelin process and the interpreter
>>>>>>>>>>>>> process in that scenario, as the time taken to move the status from
>>>>>>>>>>>>> start to pending and from pending to running is very high compared
>>>>>>>>>>>>> to the actual running time of a paragraph.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Without this, multi-tenancy support would be meaningless, as no one
>>>>>>>>>>>>> can practically use it when multiple users are trying to connect to
>>>>>>>>>>>>> the same instance of Zeppelin (and the related interpreter). A
>>>>>>>>>>>>> possible solution would be to spawn a separate instance of the same
>>>>>>>>>>>>> interpreter per notebook/user.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Sourav
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <m...@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Zeppelin users and developers,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The roadmap we have published at
>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>>>>>>>>>>>> is almost 9 months old, and it no longer reflects where the community
>>>>>>>>>>>>>> is going. It's time to update it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Based on the mailing list, JIRA issues, pull requests, feedback from
>>>>>>>>>>>>>> users, conferences and meetings, I could summarize the major
>>>>>>>>>>>>>> interests of users and developers in 7 categories: Enterprise ready,
>>>>>>>>>>>>>> Usability improvement, Pluggability, Documentation, Backend
>>>>>>>>>>>>>> integration, Notebook storage, and Visualization.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> And I could list related subjects under each category.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>    - Enterprise ready
>>>>>>>>>>>>>>       - Authentication
>>>>>>>>>>>>>>          - Shiro authentication ZEPPELIN-548
>>>>>>>>>>>>>>            <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>>>>>>>>>>>>>       - Authorization
>>>>>>>>>>>>>>          - Notebook authorization PR-681
>>>>>>>>>>>>>>            <https://github.com/apache/incubator-zeppelin/pull/681>
>>>>>>>>>>>>>>       - Security
>>>>>>>>>>>>>>       - Multi-tenancy
>>>>>>>>>>>>>>       - Stability
>>>>>>>>>>>>>>    - Usability Improvement
>>>>>>>>>>>>>>       - UX improvement
>>>>>>>>>>>>>>       - Better Table data support
>>>>>>>>>>>>>>          - Download data as CSV, etc. PR-725
>>>>>>>>>>>>>>            <https://github.com/apache/incubator-zeppelin/pull/725>, PR-714
>>>>>>>>>>>>>>            <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6
>>>>>>>>>>>>>>            <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89
>>>>>>>>>>>>>>            <https://github.com/apache/incubator-zeppelin/pull/89>
>>>>>>>>>>>>>>          - Featureful table data display (pagination, etc.)
>>>>>>>>>>>>>>    - Pluggability ZEPPELIN-533
>>>>>>>>>>>>>>      <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>>>>>>>>>>>>>       - Pluggable visualization
>>>>>>>>>>>>>>       - Dynamic Interpreter, notebook, visualization loading
>>>>>>>>>>>>>>       - Repository and registry for pluggable components
>>>>>>>>>>>>>>    - Improve documentation
>>>>>>>>>>>>>>       - Improve contents and readability
>>>>>>>>>>>>>>       - More tutorials, examples
>>>>>>>>>>>>>>    - Interpreter
>>>>>>>>>>>>>>       - Generic JDBC Interpreter
>>>>>>>>>>>>>>       - (spark)R Interpreter
>>>>>>>>>>>>>>       - Cluster manager for interpreter (Proposal
>>>>>>>>>>>>>>         <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>)
>>>>>>>>>>>>>>       - More interpreters
>>>>>>>>>>>>>>    - Notebook storage
>>>>>>>>>>>>>>       - Versioning ZEPPELIN-540
>>>>>>>>>>>>>>         <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>>>>>>>>>>>>>       - More notebook storages
>>>>>>>>>>>>>>    - Visualization
>>>>>>>>>>>>>>       - More visualizations PR-152
>>>>>>>>>>>>>>         <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>>>>>>>>>>>>>>         <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>>>>>>>>>>>>>>         <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>>>>>>>>>>>>>>         <https://github.com/apache/incubator-zeppelin/pull/321>
>>>>>>>>>>>>>>       - Customize graph (show/hide label, color, etc.)
>>>>>>>>>>>>>>
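For the "Download data as CSV" item, the conversion itself is small because `%table` output is already tab-separated text. A Python sketch using only the standard `csv` module is below; `table_to_csv` is a hypothetical helper, not code from the listed PRs, and it writes `\n` line endings (pass `lineterminator="\r\n"` for strict RFC 4180 output).

```python
import csv
import io


def table_to_csv(table_output: str) -> str:
    """Convert '%table' text (tab-separated cells, one row per line) to CSV text.

    Cells containing commas or double quotes are quoted by the csv module.
    """
    body = table_output[len("%table"):] if table_output.startswith("%table") else table_output
    rows = [line.split("\t") for line in body.strip(" \n").split("\n") if line]
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(table_to_csv("%table city\tnote\nParis\tsmall, old\n"), end="")
```

Letting the `csv` module do the quoting avoids the usual bug of naively replacing tabs with commas, which corrupts any cell that itself contains a comma.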
>>>>>>>>>>>>>> It will help anyone quickly get an overall sense of the project's
>>>>>>>>>>>>>> interests and direction. Based on this roadmap, we can discuss and
>>>>>>>>>>>>>> redefine the scope and schedule of the next release, 0.6.0.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What do you think? Any feedback would be appreciated.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> moon
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Vinayak Agrawal
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>>>>>>> ~Lord Alfred Tennyson
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Vinayak Agrawal
>>>>>>>> Big Data Analytics
>>>>>>>> IBM
>>>>>>>>
>>>>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>>>>> ~Lord Alfred Tennyson
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>
>>
>>
