Hi Shaoxuan!

I think adding the web UI improvements makes sense - there is not much open
to discuss there. Will do that.

For the machine learning improvements - that is a pretty big piece and I
think the discussions are still ongoing. I would prefer this to advance a
bit before adding it to the roadmap. The way I proposed the roadmap, it was
meant to reflect the ongoing features where we have consensus on what it
should roughly look like.
We can update the roadmap very soon, once the machine learning discussion
has advanced a bit and has reached the state of a FLIP or so.

What do you think?

Best,
Stephan

On Mon, Feb 18, 2019 at 4:31 PM Shaoxuan Wang <wshaox...@gmail.com> wrote:

> Hi Stephan,
>
> Thanks for summarizing the work & discussions into a roadmap. It really
> helps users understand where Flink is heading. The entire outline
> looks good to me. If appropriate, I would recommend adding two more
> appealing categories to the roadmap.
>
> *Flink ML Enhancement*
>   - Refactor ML pipeline on TableAPI
>   - Python support for TableAPI
>   - Support streaming training & inference.
>   - Seamless integration of DL engines (Tensorflow, PyTorch etc)
>   - ML platform with a group of AI tooling
> Some of this work has already been discussed on the dev mailing list.
> Related JIRA (FLINK-11095) and discussion:
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Embracing-Table-API-in-Flink-ML-td25368.html
> ;
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Python-and-Non-JVM-Language-Support-in-Flink-td25905.html
>
>
> *Flink-Runtime-Web Improvement*
>   - Much of this comes via Blink
>   - Refactor the entire module to use latest Angular (7.x)
>   - Add resource information at three levels including Cluster,
> TaskManager and Job
>   - Add operator-level topology and data flow tracing
>   - Add new metrics to track the back pressure, filter and data skew
>   - Add log association to Job, Vertex and SubTasks
> Related JIRA (FLINK-10705) and discussion:
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Change-underlying-Frontend-Architecture-for-Flink-Web-Dashboard-td24902.html
>
>
> What do you think?
>
> Regards,
> Shaoxuan
>
>
>
> On Wed, Feb 13, 2019 at 7:21 PM Stephan Ewen <se...@apache.org> wrote:
>
>> Hi all!
>>
>> Recently several contributors, committers, and users asked about making
>> it more visible in which way the project is currently going.
>>
>> Users and developers can track the direction by following the discussion
>> threads and JIRA, but due to the mass of discussions and open issues, it is
>> very hard to get a good overall picture.
>> Especially for new users and contributors, it is very hard to get a quick
>> overview of the project direction.
>>
>> To fix this, I suggest adding a brief roadmap summary to the homepage. It
>> is a bit of a commitment to keep that roadmap up to date, but I think the
>> benefit for users justifies it.
>> The Apache Beam project has added such a roadmap [1]
>> <https://beam.apache.org/roadmap/>, which was received very well by the
>> community. I would suggest following a similar structure here.
>>
>> If the community is in favor of this, I would volunteer to write a first
>> version of such a roadmap. The points I would include are below.
>>
>> Best,
>> Stephan
>>
>> [1] https://beam.apache.org/roadmap/
>>
>> ========================================================
>>
>> Disclaimer: Apache Flink is not governed or steered by any one single
>> entity, but by its community and Project Management Committee (PMC). This
>> is not an authoritative roadmap in the sense of a plan with a specific
>> timeline. Instead, we share our vision for the future and the major
>> initiatives that are receiving attention, to give users and contributors an
>> understanding of what they can look forward to.
>>
>> *Future Role of Table API and DataStream API*
>>   - Table API becomes a first-class citizen
>>   - Table API becomes primary API for analytics use cases
>>       * Declarative, automatic optimizations
>>       * No manual control over state and timers
>>   - DataStream API becomes primary API for applications and data pipeline
>> use cases
>>       * Physical, user controls data types, no magic or optimizer
>>       * Explicit control over state and time
>>
>> *Batch Streaming Unification*
>>   - Table API unification (environments) (FLIP-32)
>>   - New unified source interface (FLIP-27)
>>   - Runtime operator unification & code reuse between DataStream / Table
>>   - Extending Table API to make it a convenient API for all analytical use
>> cases (easier mixing in of UDFs)
>>   - Same join operators on bounded/unbounded Table API and DataStream API
>>
>> *Faster Batch (Bounded Streams)*
>>   - Much of this comes via Blink contribution/merging
>>   - Fine-grained Fault Tolerance on bounded data (Table API)
>>   - Batch Scheduling on bounded data (Table API)
>>   - External Shuffle Services Support on bounded streams
>>   - Caching of intermediate results on bounded data (Table API)
>>   - Extending DataStream API to explicitly model bounded streams (API
>> breaking)
>>   - Add fine-grained fault tolerance, scheduling, and caching also to
>> DataStream API
>>
>> *Streaming State Evolution*
>>   - Let all built-in serializers support stable evolution
>>   - First class support for other evolvable formats (Protobuf, Thrift)
>>   - Savepoint input/output format to modify / adjust savepoints
>>
>> *Simpler Event Time Handling*
>>   - Event Time Alignment in Sources
>>   - Simpler out-of-the box support in sources
>>
>> *Checkpointing*
>>   - Consistency of Side Effects: suspend / end with savepoint (FLIP-34)
>>   - Failed checkpoints explicitly aborted on TaskManagers (not only on
>> coordinator)
>>
>> *Automatic scaling (adjusting parallelism)*
>>   - Reactive scaling
>>   - Active scaling policies
>>
>> *Kubernetes Integration*
>>   - Active Kubernetes Integration (Flink actively manages containers)
>>
>> *SQL Ecosystem*
>>   - Extended Metadata Stores / Catalog / Schema Registries support
>>   - DDL support
>>   - Integration with Hive Ecosystem
>>
>> *Simpler Handling of Dependencies*
>>   - Scala in the APIs, but not in the core (hide in separate class loader)
>>   - Hadoop-free by default
>>
>>
