[SURVEY] How do you use ExternallyInducedSource or WithMasterCheckpointHook

2019-10-09 Thread Biao Liu
Hi everyone,

I would like to reach out to the user who uses or is interested in
`ExternallyInducedSource` or `WithMasterCheckpointHook` interfaces.

The background of this survey is that I'm reworking
`CheckpointCoordinator`. I encountered a problem: the semantics of
`MasterTriggerRestoreHook#triggerCheckpoint` [1] might be a bit confusing.
It looks like an asynchronous invocation (the returned value is a future),
but based on the Javadoc, the implementation may be synchronous (possibly
blocking) or asynchronous. That is fine for now; however, it complicates
the optimization, since we have to handle synchronous and asynchronous
invocations at the same time [2].

I want to make the semantics clearer. Here are some options.
1. Keep the method as it is, but emphasize in the Javadoc and release
notes that it should be a non-blocking operation: any heavy IO operation
should be executed asynchronously through the given executor, otherwise
there may be a performance issue. This option keeps compatibility.
2. Change the signature of the method: drop the executor and the
completable future. The method could then block for a while, and we would
execute it in an executor on the caller side. This also makes things
easier; however, it breaks compatibility.
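To illustrate what option 1 asks of implementers, here is a minimal, hypothetical sketch. The interface below is a simplified stand-in for Flink's `MasterTriggerRestoreHook` (the real interface has more methods and a slightly different signature), and `ExternalSystemHook` is purely illustrative:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

public class NonBlockingHookSketch {

    /** Simplified stand-in for MasterTriggerRestoreHook; not Flink's actual interface. */
    interface TriggerHook<T> {
        CompletableFuture<T> triggerCheckpoint(long checkpointId, long timestamp, Executor executor);
    }

    /** Option 1 style: never block the caller; run the (potentially slow) IO on the given executor. */
    static class ExternalSystemHook implements TriggerHook<String> {
        @Override
        public CompletableFuture<String> triggerCheckpoint(long checkpointId, long timestamp, Executor executor) {
            // The heavy work (e.g. asking an external system to snapshot) happens asynchronously.
            return CompletableFuture.supplyAsync(() -> "handle-" + checkpointId, executor);
        }
    }

    static String demo() throws Exception {
        Executor executor = Runnable::run; // inline executor, for the demo only
        return new ExternalSystemHook().triggerCheckpoint(42L, 0L, executor).get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // prints handle-42
    }
}
```

The point of the sketch is that `triggerCheckpoint` itself returns immediately; only the future completes later.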

At this moment, I personally lean toward option 1.
If you depend on these interfaces, please let me know your opinion. Any
feedback is welcome!

[1]
https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/MasterTriggerRestoreHook.java
[2] https://issues.apache.org/jira/browse/FLINK-14344

Thanks,
Biao /'bɪ.aʊ/


Re: [DISCUSS] FLIP-76: Unaligned checkpoints

2019-10-09 Thread Congxian Qiu
Thanks for the FLIP, Arvid.

This is a good improvement for checkpointing under backpressure. Currently,
a job under backpressure can almost never complete a checkpoint, so +1
from my side.

Best,
Congxian


zhijiang wrote on Thu, Oct 10, 2019 at 11:02 AM:

> Thanks for writing up this FLIP, Arvid!
>
> Many users would expect this feature and also +1 from my side.
>
> Best,
> Zhijiang
> --
> From: Piotr Nowojski
> Send Time: Mon, Oct 7, 2019, 10:13
> To: dev
> Subject: Re: [DISCUSS] FLIP-76: Unaligned checkpoints
>
> Hi Arvid,
>
> Thanks for coming up with this FLIP. I think it addresses the issues
> raised in the previous mailing list discussion [2].
>
> For the record: +1 from my side to implement this.
>
> Piotrek
>
> > On 30 Sep 2019, at 14:31, Arvid Heise  wrote:
> >
> > Hi Devs,
> >
> > I would like to start the formal discussion about FLIP-76 [1], which
> > improves the checkpoint latency in systems under backpressure, where a
> > checkpoint can take hours to complete in the worst case. I recommend the
> > thread "checkpointing under backpressure" [2] to get a good idea of why
> > users are not satisfied with the current behavior. The key points:
> >
> >   - Since the checkpoint barrier flows much more slowly through the
> >   back-pressured channels, the other channels and their upstream
> >   operators are effectively blocked during checkpointing.
> >   - The checkpoint barrier takes a long time to reach the sinks, causing
> >   long checkpointing times. A longer checkpointing time in turn means
> >   that the checkpoint will be fairly outdated once done. Since a heavily
> >   utilized pipeline is inherently more fragile, we may run into a
> >   vicious cycle of late checkpoints, crash, recovery to a rather
> >   outdated checkpoint, more back pressure, and even later checkpoints,
> >   which would result in little to no progress in the application.
> >
> > The FLIP proposes "unaligned checkpoints", which improves the current
> > state such that
> >
> >   - Upstream processes can continue to produce data, even if some
> >   operator still waits on a checkpoint barrier on a specific input
> >   channel.
> >   - Checkpointing times are heavily reduced across the execution graph,
> >   even for operators with a single input channel.
> >   - End users will see more progress even in unstable environments, as
> >   more up-to-date checkpoints will avoid too many recomputations.
> >   - Rescaling becomes faster.
> >
> > The key idea is to allow checkpoint barriers to be forwarded to
> > downstream tasks before the synchronous part of the checkpointing has
> > been conducted (see Fig. 1). To that end, we need to store in-flight
> > data as part of the checkpoint, as described in greater detail in this
> > FLIP.
> >
> > Although the basic idea was already sketched in [2], we would like to
> > get broader feedback in this dedicated mail thread.
> >
> > Best,
> >
> > Arvid
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-76%3A+Unaligned+Checkpoints
> > [2]
> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Checkpointing-under-backpressure-td31616.html
>
>
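
The trade-off described above can be shown with a toy model: with aligned checkpoints the operator must see the barrier on every input channel before snapshotting, so a back-pressured channel dictates the checkpoint's duration; with unaligned checkpoints the first barrier triggers the snapshot, and the data still in flight on slower channels is persisted instead. A purely illustrative sketch (not Flink code):

```java
import java.util.List;

public class UnalignedSketch {

    /** Aligned: the operator waits until the barrier has arrived on ALL channels. */
    static int alignedCheckpointPoint(List<Integer> barrierPositions) {
        return barrierPositions.stream().max(Integer::compare).orElse(0);
    }

    /** Unaligned: the FIRST barrier triggers the checkpoint. */
    static int unalignedCheckpointPoint(List<Integer> barrierPositions) {
        return barrierPositions.stream().min(Integer::compare).orElse(0);
    }

    /** Records between the trigger point and each channel's barrier become checkpointed in-flight data. */
    static int persistedInFlightRecords(List<Integer> barrierPositions) {
        int trigger = unalignedCheckpointPoint(barrierPositions);
        return barrierPositions.stream().mapToInt(p -> p - trigger).sum();
    }

    public static void main(String[] args) {
        // Channel 0 is healthy (barrier after 10 records); channel 1 is back-pressured.
        List<Integer> barriers = List.of(10, 1_000_000);
        System.out.println(alignedCheckpointPoint(barriers));   // 1000000
        System.out.println(unalignedCheckpointPoint(barriers)); // 10
        System.out.println(persistedInFlightRecords(barriers)); // 999990
    }
}
```

The model makes the cost explicit: unaligned checkpoints trade checkpoint latency (10 vs 1,000,000 records) for extra state (the 999,990 persisted in-flight records).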


Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

2019-10-09 Thread Xuefu Z
Jark has a good point. However, I think validation logic can be put in
place to restrict modules to one instance per type. Maybe the doc needs to
be more specific on this.

Thanks,
Xuefu
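
For illustration, the validation Xuefu describes could look like the following sketch: a hypothetical module manager that keys loaded modules by kind and rejects a second instance of the same kind. Names and classes here are illustrative, not the FLIP's actual API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ModuleManagerSketch {

    /** Hypothetical module abstraction; the real Module interface is defined in the FLIP. */
    interface Module {
        String kind();
    }

    private final Map<String, Module> modulesByKind = new LinkedHashMap<>();

    /** Enforces one instance per kind, as suggested in the discussion. */
    void loadModule(Module module) {
        if (modulesByKind.putIfAbsent(module.kind(), module) != null) {
            throw new IllegalArgumentException(
                    "A module of kind '" + module.kind() + "' is already loaded");
        }
    }

    void unloadModule(String kind) {
        if (modulesByKind.remove(kind) == null) {
            throw new IllegalArgumentException("No module of kind '" + kind + "' is loaded");
        }
    }

    int size() {
        return modulesByKind.size();
    }

    public static void main(String[] args) {
        ModuleManagerSketch mgr = new ModuleManagerSketch();
        mgr.loadModule(() -> "hive");
        boolean rejected = false;
        try {
            mgr.loadModule(() -> "hive"); // second instance of the same kind is rejected
        } catch (IllegalArgumentException e) {
            rejected = true;
        }
        System.out.println(rejected + " " + mgr.size()); // true 1
    }
}
```

This also sidesteps the class-conflict scenario raised below (two HiveModule versions in one session), since a second "hive" instance can never be registered.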

On Wed, Oct 9, 2019 at 7:41 PM Jark Wu  wrote:

> Thanks for the update, Bowen.
>
> I have some different opinions on the change.
> IIUC, in the previous design, the "name" is also the "id" or "type" used
> to identify which module to load, which means we can only load one
> instance of a module.
> In the new design, the "name" is just an alias for the module instance,
> and the "kind" is used to identify modules, which means we can load
> multiple instances of a module.
> However, what is the "name" or alias used for? Do we need to support
> loading multiple instances of a module? From my point of view, it brings
> more complexity and confusion.
> For example, if we load a "hive121" that uses HiveModule version 1.2.1
> and a "hive234" that uses HiveModule version 2.3.4, how do we solve the
> class conflict problem?
>
> IMO, a module can only be loaded once in a session, so "name" may be
> useless. My proposal is therefore similar to the previous one, but
> changes "name" to "kind".
>
>SQL:
>  LOAD MODULE "kind" [WITH (properties)];
>  UNLOAD MODULE "kind";
> Table:
>  tEnv.loadModule("kind" [, properties]);
>  tEnv.unloadModule("kind");
>
> What do you think?
>
>
> Best,
> Jark
>
>
>
>
>
> On Wed, 9 Oct 2019 at 20:38, Bowen Li  wrote:
>
> > Thanks everyone for your review.
> >
> > After discussing with Timo and Dawid offline, as well as incorporating
> > feedback from Xuefu and Jark on mailing list, I decided to make a few
> > critical changes to the proposal.
> >
> > - renamed the keyword "type" to "kind". The community plans to have the
> > "type" keyword in yaml/descriptors refer to data types exclusively in
> > the near future, and we should cater to that change in our design
> > - allowed specifying names for modules to simplify and unify the module
> > loading/unloading syntax between the programming and SQL APIs. Here are
> > the proposed changes:
> > SQL:
> >  LOAD MODULE "name" WITH ("kind"="xxx" [, (properties)])
> >  UNLOAD MODULE "name";
> > Table:
> >  tEnv.loadModule("name", new Xxx(properties));
> >  tEnv.unloadModule("name");
> >
> > I have completely updated the google doc [1]. Please take another look,
> and
> > let me know if you have any other questions. Thanks!
> >
> > [1]
> >
> >
> https://docs.google.com/document/d/17CPMpMbPDjvM4selUVEfh_tqUK_oV0TODAUA9dfHakc/edit#
> >
> >
> > On Tue, Oct 8, 2019 at 6:26 AM Jark Wu  wrote:
> >
> > > Hi Bowen,
> > >
> > > Thanks for the proposal. I have two thoughts:
> > >
> > > 1) Regarding "loadModule", how about
> > > tableEnv.loadModule("xxx" [, propertiesMap]);
> > > tableEnv.unloadModule("xxx");
> > >
> > > This makes the API similar to SQL. IMO, an instance of Module is not
> > > needed as a parameter and is verbose.
> > > It also makes it easier to load a simple module without any additional
> > > properties, e.g. tEnv.loadModule("GEO"), tEnv.unloadModule("GEO")
> > >
> > > 2) In current design, the module interface only defines function
> > metadata,
> > > but no implementations.
> > > I'm wondering how to call/map the implementations in runtime? Am I
> > missing
> > > something?
> > >
> > > Besides, I left some minor comments in the doc.
> > >
> > > Best,
> > > Jark
> > >
> > >
> > > On Sat, 5 Oct 2019 at 08:42, Xuefu Z  wrote:
> > >
> > > > I agree with Timo that the new table APIs need to be consistent. I'd
> > > > go further: a name (or id) is needed for the module definition in
> > > > the YAML file. In the current design, the name is skipped and "type"
> > > > carries two meanings.
> > > >
> > > > Thanks,
> > > > Xuefu
> > > >
> > > > On Fri, Oct 4, 2019 at 5:24 AM Timo Walther 
> > wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > first, I was also questioning my proposal. But Bowen's proposal of
> > > > > `tEnv.offloadToYaml()` would not work with the current design,
> > > > > because we don't know how to serialize a catalog or module into
> > > > > properties. Currently, there is no converter from instance to
> > > > > properties; it is a one-way conversion. We can add a `toProperties`
> > > > > method to both the Catalog and Module classes in the future to
> > > > > solve this. Making the table environment serializable can be
> > > > > future work.
> > > > >
> > > > > However, I find the current proposal for the TableEnvironment
> methods
> > > is
> > > > > contradicting:
> > > > >
> > > > > tableEnv.loadModule(new Yyy());
> > > > > tableEnv.unloadModule(“Xxx”);
> > > > >
> > > > > The loading is specified programmatically, whereas the unloading
> > > > > requires a string that is not specified in the module itself but
> > > > > is defined in the factory, according to the current design.
> > > > >
> > > > > SQL does it more consistently. There, the name `xxx` is used when
> > > > > loading and 

Re: [SURVEY] Dropping non Credit-based Flow Control

2019-10-09 Thread zhijiang
Thanks for bringing up this survey, Piotr.

Actually, I was already trying to drop the non-credit-based code path in
release-1.9, and now I think it is the proper time to do it, motivated by [3].
The credit-based mode has been the default since Flink 1.5 and has been
verified to be stable and reliable across many versions. At Alibaba we always
use the default credit-based mode in all products.
Dropping it would remove much of the overhead of maintaining the
non-credit-based code path, so +1 from my side.

Best,
Zhijiang
--
From: Piotr Nowojski
Send Time: Wed, Oct 2, 2019, 17:01
To: dev
Subject:[SURVEY] Dropping non Credit-based Flow Control

Hi,

In Flink 1.5 we introduced Credit-based Flow Control [1] in the network
stack. Back then we were aware of its potential downsides [2], and we
decided to keep the old model in the code base (configurable by setting
`taskmanager.network.credit-model: false`). Now that we are about to modify
the internals of the network stack again [3], it might be a good time to
clean up the code and remove the older code paths.

Is anyone still using the non-default, non-credit-based model
(`taskmanager.network.credit-model: false`)? If so, why?

Piotrek

[1] https://flink.apache.org/2019/06/05/flink-network-stack.html
[2] https://flink.apache.org/2019/06/05/flink-network-stack.html#what-do-we-gain-where-is-the-catch
[3] https://lists.apache.org/thread.html/a2b58b7b2b24b9bd4814b2aa51d2fc44b08a919eddbb5b1256be5b6a@%3Cdev.flink.apache.org%3E




[jira] [Created] (FLINK-14359) Create a module called flink-sql-connector-hbase to shade HBase

2019-10-09 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-14359:


 Summary: Create a module called flink-sql-connector-hbase to shade 
HBase
 Key: FLINK-14359
 URL: https://issues.apache.org/jira/browse/FLINK-14359
 Project: Flink
  Issue Type: Bug
  Components: Connectors / HBase
Reporter: Jingsong Lee
 Fix For: 1.10.0


We need to do the same thing for HBase as was done for Kafka and Elasticsearch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)



Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

2019-10-09 Thread Jark Wu
Thanks for the update, Bowen.

I have some different opinions on the change.
IIUC, in the previous design, the "name" is also the "id" or "type" used
to identify which module to load, which means we can only load one
instance of a module.
In the new design, the "name" is just an alias for the module instance,
and the "kind" is used to identify modules, which means we can load
multiple instances of a module.
However, what is the "name" or alias used for? Do we need to support
loading multiple instances of a module? From my point of view, it brings
more complexity and confusion.
For example, if we load a "hive121" that uses HiveModule version 1.2.1
and a "hive234" that uses HiveModule version 2.3.4, how do we solve the
class conflict problem?

IMO, a module can only be loaded once in a session, so "name" may be
useless. My proposal is therefore similar to the previous one, but
changes "name" to "kind".

   SQL:
 LOAD MODULE "kind" [WITH (properties)];
 UNLOAD MODULE "kind";
Table:
 tEnv.loadModule("kind" [, properties]);
 tEnv.unloadModule("kind");

What do you think?


Best,
Jark





On Wed, 9 Oct 2019 at 20:38, Bowen Li  wrote:

> Thanks everyone for your review.
>
> After discussing with Timo and Dawid offline, as well as incorporating
> feedback from Xuefu and Jark on mailing list, I decided to make a few
> critical changes to the proposal.
>
> - renamed the keyword "type" to "kind". The community plans to have the
> "type" keyword in yaml/descriptors refer to data types exclusively in the
> near future, and we should cater to that change in our design
> - allowed specifying names for modules to simplify and unify the module
> loading/unloading syntax between the programming and SQL APIs. Here are
> the proposed changes:
> SQL:
>  LOAD MODULE "name" WITH ("kind"="xxx" [, (properties)])
>  UNLOAD MODULE "name";
> Table:
>  tEnv.loadModule("name", new Xxx(properties));
>  tEnv.unloadModule("name");
>
> I have completely updated the google doc [1]. Please take another look, and
> let me know if you have any other questions. Thanks!
>
> [1]
>
> https://docs.google.com/document/d/17CPMpMbPDjvM4selUVEfh_tqUK_oV0TODAUA9dfHakc/edit#
>
>
> On Tue, Oct 8, 2019 at 6:26 AM Jark Wu  wrote:
>
> > Hi Bowen,
> >
> > Thanks for the proposal. I have two thoughts:
> >
> > 1) Regarding "loadModule", how about
> > tableEnv.loadModule("xxx" [, propertiesMap]);
> > tableEnv.unloadModule("xxx");
> >
> > This makes the API similar to SQL. IMO, an instance of Module is not
> > needed as a parameter and is verbose.
> > It also makes it easier to load a simple module without any additional
> > properties, e.g. tEnv.loadModule("GEO"), tEnv.unloadModule("GEO")
> >
> > 2) In current design, the module interface only defines function
> metadata,
> > but no implementations.
> > I'm wondering how to call/map the implementations in runtime? Am I
> missing
> > something?
> >
> > Besides, I left some minor comments in the doc.
> >
> > Best,
> > Jark
> >
> >
> > On Sat, 5 Oct 2019 at 08:42, Xuefu Z  wrote:
> >
> > > I agree with Timo that the new table APIs need to be consistent. I'd
> > > go further: a name (or id) is needed for the module definition in the
> > > YAML file. In the current design, the name is skipped and "type"
> > > carries two meanings.
> > >
> > > Thanks,
> > > Xuefu
> > >
> > > On Fri, Oct 4, 2019 at 5:24 AM Timo Walther 
> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > first, I was also questioning my proposal. But Bowen's proposal of
> > > > `tEnv.offloadToYaml()` would not work with the current design,
> > > > because we don't know how to serialize a catalog or module into
> > > > properties. Currently, there is no converter from instance to
> > > > properties; it is a one-way conversion. We can add a `toProperties`
> > > > method to both the Catalog and Module classes in the future to
> > > > solve this. Making the table environment serializable can be
> > > > future work.
> > > >
> > > > However, I find the current proposal for the TableEnvironment methods
> > is
> > > > contradicting:
> > > >
> > > > tableEnv.loadModule(new Yyy());
> > > > tableEnv.unloadModule(“Xxx”);
> > > >
> > > > The loading is specified programmatically, whereas the unloading
> > > > requires a string that is not specified in the module itself but is
> > > > defined in the factory, according to the current design.
> > > >
> > > > SQL does it more consistently. There, the name `xxx` is used when
> > > > loading and unloading the module:
> > > >
> > > > LOAD MODULE 'xxx' [WITH ('prop'='myProp', ...)]
> > > > UNLOAD MODULE 'xxx’
> > > >
> > > > How about:
> > > >
> > > > tableEnv.loadModule("xxx", new Yyy());
> > > > tableEnv.unloadModule(“xxx”);
> > > >
> > > > This would be similar to the catalog interfaces. The name is not part
> > of
> > > > the instance itself.
> > > >
> > > > What do you think?
> > > >
> > > > Thanks,
> > > > Timo
> > > >
> > > >
> > > >
> > > >
> > > > On 01.10.19 21:17, 

Re: [DISCUSS] Drop Python 2 support for 1.10

2019-10-09 Thread Dian Fu
Thanks everyone for your replies.

So far all the replies tend toward option 1 (dropping Python 2 support in
1.10); I will continue to listen for any other opinions.

@Jincheng @Hequn, you are right, things become more complicated if dropping
Python 2 support is performed after Python UDFs have been supported. Users
would have to migrate their Python UDFs if they used features that are only
supported in Python 2.

Thanks @Yu for your suggestion. It makes much sense to me and I will do
that. Also CCing the @user and @user-zh MLs in case any users are concerned
about this.

Thanks,
Dian

> On Oct 9, 2019, at 1:14 PM, Yu Li wrote:
> 
> Thanks for bringing this up, Dian.
> 
> Since Python 2.7 support was added in 1.9.0 and it will reach EOL near
> the planned release time for 1.10, I see a good reason to take option 1.
> 
> Please remember to add an explicit release note, and it would be better
> to also send a notification to the user ML about the plan to drop it, in
> case some 1.9.0 users are already using Python 2.7 in their production
> env.
> 
> Best Regards,
> Yu
> 
> 
> On Wed, 9 Oct 2019 at 11:13, Jeff Zhang  wrote:
> 
>> +1
>> 
>> Hequn Cheng wrote on Wed, Oct 9, 2019 at 11:07 AM:
>> 
>>> Hi Dian,
>>> 
>>> +1 to drop Python 2 directly.
>>> 
>>> Just as @jincheng said, things would be more complicated if we are
>>> going to support Python UDFs.
>>> The Python UDFs will introduce a lot of Python dependencies which will
>>> also drop Python 2 support, such as Beam, pandas, pyarrow, etc.
>>> Given this, and that Python 2 will reach EOL on Jan 1, 2020, I think we
>>> can drop Python 2 in Flink as well.
>>> 
>>> As for the two options, I think we can drop it directly in 1.10. The
>>> flink-python module was only introduced in 1.9, so I think it's safe to
>>> drop it now.
>>> We can also benefit from this when we add support for Python UDFs.
>>> 
>>> Best, Hequn
>>> 
>>> 
>>> On Wed, Oct 9, 2019 at 8:40 AM jincheng sun wrote:
>>> 
 Hi Dian,
 
 Thanks for bringing this discussion!
 
 In Flink 1.9 we only added the Python Table API mapping to the Java Table
 API (without Python UDFs); there were no special requirements on the
 Python version, so we added Python 2.7 support. But in Flink 1.10 we add
 Python UDF support, i.e., users will write more Python code in their
 Flink jobs, and there will be more requirements on the features of the
 Python language. So I think it's better to follow the rhythm of the
 official Python releases.
 
 Option 2 is the most conservative and correct approach, but given the
 current situation, where we cooperate with the Beam community and use
 Beam's portability framework for UDF support, we prefer option 1.
 
 Best,
 Jincheng
 
 
 
 Dian Fu wrote on Tue, Oct 8, 2019 at 10:34 PM:
 
> Hi everyone,
> 
> I would like to propose to drop Python 2 support (currently Python 2.7,
> 3.5, 3.6, and 3.7 are all supported in Flink) as it reaches end of life
> on Jan 1, 2020 [1]. A lot of projects [2][3][4] have already announced or
> are planning to drop Python 2 support.
> 
> The benefits of dropping Python 2 support are:
> 1. Maintaining Python 2/3 compatibility is a burden and it makes the code
> complicated, as Python 2 and Python 3 are not compatible.
> 2. Many features are only available in Python 3.x, such as type hints
> [5]. We can only make use of this kind of feature after dropping Python 2
> support.
> 3. flink-python depends on third-party projects, such as Apache Beam (and
> may add more dependencies such as pandas, etc. in the near future); it
> won't be possible to upgrade them to their latest versions once they drop
> Python 2 support.
> 
> Here are the options we have:
> 1. Drop Python 2 support in 1.10:
> As the flink-python module is a new module added in 1.9.0, dropping
> Python 2 support at this early stage seems a good choice for us.
> 2. Deprecate Python 2 in 1.10 and drop its support in 1.11:
> As 1.10 is planned to be released around the beginning of 2020, this is
> also aligned with the official Python 2 support.
> 
> Personally I prefer option 1, as flink-python is a new module and there
> are not many historical reasons to consider.
> 
> Looking forward to your feedback!
> 
> Regards,
> Dian
> 
> [1] https://pythonclock.org/
> [2] https://python3statement.org/
> [3] https://spark.apache.org/news/plan-for-dropping-python-2-support.html
> [4] https://lists.apache.org/thread.html/eba6caa58ea79a7ecbc8560d1c680a366b44c531d96ce5c699d41535@%3Cdev.beam.apache.org%3E

[jira] [Created] (FLINK-14358) [Web UI] configuration tab for jobmanager has improper width for prop key

2019-10-09 Thread Steven Zhen Wu (Jira)
Steven Zhen Wu created FLINK-14358:
--

 Summary: [Web UI] configuration tab for jobmanager has improper 
width for prop key
 Key: FLINK-14358
 URL: https://issues.apache.org/jira/browse/FLINK-14358
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Web Frontend
Affects Versions: 1.9.0
Reporter: Steven Zhen Wu
 Attachments: image-2019-10-09-16-48-30-161.png

!image-2019-10-09-16-48-30-161.png!





Re: Multiple Taskmanagers per node for standalone cluster

2019-10-09 Thread Ethan Li
Thank you very much for the help, Xintong.

Best,
Ethan

> On Oct 7, 2019, at 9:18 PM, Xintong Song  wrote:
> 
> I don't think using ZooKeeper should cause any problem with starting
> multiple TMs per node.
> 
> For a standalone cluster, having one TM per node is usually the easiest
> way. It is easy to configure (simply set the TM resources to the
> available resources of the node), and it is more efficient (less TM
> framework overhead, and no overhead from isolation between the JVMs of
> different TMs). But that doesn't mean Flink cannot have multiple TMs per
> node, and given your requirement of job isolation, I think you should do
> that. You just need to carefully configure the TM number and resources to
> make sure they do not exceed the node's available resources.
> 
> I don't have much experience running standalone clusters with multiple
> TMs per node. We mainly run Flink on YARN in our production, with
> multiple TMs (YARN containers) per node (YARN NM), and it works fine for
> us.
> 
> Thank you~
> Xintong Song
> 
> 
> On Mon, Oct 7, 2019 at 9:05 PM Ethan Li wrote:
> Thank you very much, Xintong, for the information.
> 
> I am currently using ZooKeeper. I guess starting another task manager
> will just work? But I am not sure whether multiple TaskManagers per node
> is a good way to use Flink. Do you have any experience with this? Thanks!
> 
> Best,
> Ethan
> 
> 
>> On Oct 5, 2019, at 3:43 AM, Xintong Song wrote:
>> 
>> To have multiple task managers on the same host, you can put multiple
>> duplicated lines with the target host in '/conf/slaves'. The task
>> managers on the same host will share the same config files on the host.
>> 
>> Thank you~
>> Xintong Song
>> 
>> 
>> On Sat, Oct 5, 2019 at 5:02 AM Ethan Li wrote:
>> Hello,
>> 
>> Does/did anyone try to set up a standalone cluster with multiple
>> TaskManagers per node? We are working on moving to a Flink-on-YARN
>> solution, but before that happens, I am thinking about the following
>> setup to get jobs isolated from each other:
>> 
>> 1) multiple TaskManagers per host
>> 2) 1 task slot per TaskManager
>> 
>> 
>> Currently we have 1 TaskManager per node and many task slots per TM;
>> tasks from different jobs get scheduled into one JVM process, and it's
>> basically impossible to debug. One bad job can kill the whole cluster.
>> 
>> Could you please share whether you have any experience with this and
>> what problems we might run into?
>> 
>> Thank you very much. Really appreciate it.
>> 
>> Best,
>> Ethan
> 
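
For reference, the setup suggested above (duplicated host lines in 'conf/slaves', one slot per TaskManager) might look like the following sketch. The hostnames and memory size are hypothetical, and the exact config keys depend on the Flink version:

```
# conf/slaves -- one line per TaskManager; repeating a host starts multiple TMs on it
worker-1
worker-1
worker-2
worker-2

# conf/flink-conf.yaml -- shared by all TaskManagers on a host
taskmanager.numberOfTaskSlots: 1
taskmanager.heap.size: 4096m
```

Since all TMs on a host read the same flink-conf.yaml, the per-TM resources must be sized so that the sum over all TMs fits within the node.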



Re: [DISCUSS] FLIP-73: Introducing Executors for job submission

2019-10-09 Thread Zili Chen
Hi Kostas & Aljoscha,

I'm drafting a plan for exposing multi-layered clients. It is mainly about
how we distinguish the different layers and which clients we're going to
expose.

Within the scope of FLIP-73, I'd like to ask whether Executor will become
a public interface that downstream project developers can make use of, or
whether it is just an internal concept for unifying job submission.
If it is the latter, I feel the multi-layered client topic is totally
independent of Executor.

Best,
tison.


Thomas Weise wrote on Sat, Oct 5, 2019 at 12:17 AM:

> It might be useful to mention in FLIP-73 that the intention is for
> Executor.execute to be an asynchronous API once it becomes public, and to
> refer to FLIP-74 accordingly.
>
>
> On Fri, Oct 4, 2019 at 2:52 AM Aljoscha Krettek 
> wrote:
>
> > Hi Tison,
> >
> > I agree, for now the async Executor.execute() is an internal detail but
> > during your work for FLIP-74 it will probably also reach the public API.
> >
> > Best,
> > Aljoscha
> >
> > > On 4. Oct 2019, at 11:39, Zili Chen  wrote:
> > >
> > > Hi Aljoscha,
> > >
> > > Now that the scope of this FLIP has been clearly narrowed, the
> > > Executor interface and its discovery look good to me, so I'm glad to
> > > see the vote thread.
> > >
> > > As you said, we should still discuss implementation details, but I
> > > don't think that should block the vote thread, because a vote means
> > > we generally agree on the motivation and overall design.
> > >
> > > As for Executor.execute() being async, it is much better than keeping
> > > the sync/async distinction at this level. But I'd like to note that
> > > it only works internally for now, because the user-facing interface
> > > is still env.execute, which blocks and returns a JobExecutionResult.
> > > I'm afraid several people depend on that result for post-execution
> > > processing, although it doesn't work in the current per-job mode.
> > >
> > > Best,
> > > tison.
> > >
> > >
Aljoscha Krettek wrote on Fri, Oct 4, 2019 at 4:40 PM:
> > >
> > >> Do you all think we could agree on the basic executor primitives and
> > start
> > >> voting on this FLIP? There are still some implementation details but I
> > >> think we can discuss/tackle them when we get to them and the various
> > people
> > >> implementing this should be in close collaboration.
> > >>
> > >> Best,
> > >> Aljoscha
> > >>
> > >>> On 4. Oct 2019, at 10:15, Aljoscha Krettek 
> > wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> I think the end goal is to have only one environment per API, but
> > >>> I think we won't be able to achieve that in the short term because
> > >>> of backwards compatibility. This is most notable with the context
> > >>> environment, preview environments, etc.
> > >>>
> > >>> To keep this FLIP very slim, we can make it only about the
> > >>> executors and executor discovery. Anything else, like job
> > >>> submission semantics or detached mode, can be tackled after this.
> > >>> If we don't focus, I'm afraid this will drag on for quite a while.
> > >>>
> > >>> One thing I would like to propose to make this easier is to change
> > >>> Executor.execute() to return a CompletableFuture and to completely
> > >>> remove the "detached" logic from ClusterClient. That way, the new
> > >>> components make no distinction between "detached" and "attached",
> > >>> but we can still make that distinction in the CLI (via the
> > >>> ContextEnvironment) to support the existing "detached" behaviour of
> > >>> the CLI that users expect. What do you think about this?
> > >>>
> > >>> Best,
> > >>> Aljoscha
> > >>>
> >  On 3. Oct 2019, at 10:03, Zili Chen  wrote:
> > 
> >  Thanks for your explanation, Kostas; it makes the subtasks under
> >  FLIP-73 clear.
> > 
> >  As you described, changes to the Environments are included in this
> >  FLIP. For "each API to have a single Environment", it could be
> >  helpful to describe which APIs we'd like to have after FLIP-73. And
> >  if we keep multiple Environments, shall we keep the way a context
> >  environment is injected for each API?
> > 
> > 
> >  Kostas Kloudas  于2019年10月3日周四 下午1:44写道:
> > 
> > > Hi Tison,
> > >
> > > The changes that this FLIP propose are:
> > > - the introduction of the Executor interface
> > > - the fact that everything in the current state of job submission
> in
> > > Flink can be defined through configuration parameters
> > > - implementation of Executors that do not change any of the
> semantics
> > > of the currently offered "modes" of job submission
> > >
> > > In this, and in the FLIP itself where the
> > > ExecutionEnvironment.execute() method is described, there are
> details
> > > about parts of the
> > > integration with the existing Flink code-base.
> > >
> > > So I am not sure what you mean by making the "integration a
> > > follow-up discussion".
> > >
> 

[jira] [Created] (FLINK-14357) Request for the ability to move checkpoints/savepoints

2019-10-09 Thread Mitch Wasson (Jira)
Mitch Wasson created FLINK-14357:


 Summary: Request for the ability to move checkpoints/savepoints
 Key: FLINK-14357
 URL: https://issues.apache.org/jira/browse/FLINK-14357
 Project: Flink
  Issue Type: Wish
Reporter: Mitch Wasson


This request comes out of working with the state processor APIs.

I was originally trying to have this workflow when using the state processing 
APIs:
 # Download savepoint / checkpoint from s3 to my pc using non-flink tooling 
(e.g. s3 cli)
 # Process the savepoint / checkpoint locally without s3 access

This ended up not being possible.

It seems that the state processing APIs look at the checkpoint / savepoint 
metadata file first. Then they follow fully qualified s3 links from the 
metadata file. So, even though I've downloaded the entire checkpoint / 
savepoint to my pc, s3 access is still required.

This is a specific instance of the problem, which I think can be stated more 
generally as: checkpoints and savepoints can't be moved (easily).

 

It would be great if savepoints / checkpoints could be moved around in a 
self-contained manner and used at a later time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14356) Support some special RowDeserializationSchema and RowSerializationSchema

2019-10-09 Thread jinfeng (Jira)
jinfeng created FLINK-14356:
---

 Summary: Support some special RowDeserializationSchema and 
RowSerializationSchema 
 Key: FLINK-14356
 URL: https://issues.apache.org/jira/browse/FLINK-14356
 Project: Flink
  Issue Type: Improvement
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table 
SQL / API
Reporter: jinfeng


I want to use Flink SQL to write Kafka messages directly to HDFS, without any 
serialization or deserialization of the message payload in between: the raw 
bytes of the message should directly become the first field of the Row. 
However, the current RowSerializationSchema does not support the conversion of 
bytes to VARBINARY. Can we add some special RowSerializationSchema and 
RowDeserializationSchema implementations?
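As a rough illustration of the requested behaviour (using simplified stand-in types, not Flink's actual Row or serialization-schema interfaces), such a schema would simply wrap the untouched message bytes as the row's only field:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Sketch of the requested pass-through behaviour with stand-in types;
 * Flink's real Row / (De)SerializationSchema interfaces are not
 * reproduced here.
 */
public class RawBytesRowSchema {

    /** Minimal stand-in for a row with a single VARBINARY-like field. */
    static final class Row {
        final Object[] fields;
        Row(Object... fields) { this.fields = fields; }
    }

    /** "Deserializes" a Kafka message by wrapping its bytes untouched. */
    static Row deserialize(byte[] message) {
        return new Row((Object) message);
    }

    /** Symmetric "serialization": emits the first field's bytes as-is. */
    static byte[] serialize(Row row) {
        return (byte[]) row.fields[0];
    }

    public static void main(String[] args) {
        byte[] payload = "raw kafka message".getBytes(StandardCharsets.UTF_8);
        Row row = deserialize(payload);
        // Round-trip must leave the payload byte-identical.
        System.out.println(Arrays.equals(payload, serialize(row))); // true
    }
}
```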



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14355) Example code in state processor API docs doesn't compile

2019-10-09 Thread Mitch Wasson (Jira)
Mitch Wasson created FLINK-14355:


 Summary: Example code in state processor API docs doesn't compile
 Key: FLINK-14355
 URL: https://issues.apache.org/jira/browse/FLINK-14355
 Project: Flink
  Issue Type: Bug
Reporter: Mitch Wasson


The example code in this doc page doesn't compile:

[https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/state_processor_api.html]

Here are two instances I found:
 * The "Reading State" Java and Scala examples reference an undefined 
{{stateDescriptor}} variable
 * The "Reading Keyed State" Scala example contains invalid Scala ("{{override def 
open(Configuration parameters)"}})

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14354) Provide interfaces instead of abstract classes in org.apache.flink.state.api.functions

2019-10-09 Thread Mitch Wasson (Jira)
Mitch Wasson created FLINK-14354:


 Summary: Provide interfaces instead of abstract classes in 
org.apache.flink.state.api.functions
 Key: FLINK-14354
 URL: https://issues.apache.org/jira/browse/FLINK-14354
 Project: Flink
  Issue Type: Improvement
Reporter: Mitch Wasson


I've started using the new state processing API in Flink 1.9. Super useful and 
works great for the most part.

However, I think there is opportunity to simplify implementations that use the 
API. My request to enable these simplifications is to provide interfaces 
instead of (or in addition to) abstract classes in 
org.apache.flink.state.api.functions. Then have the state processing API 
require those interfaces.

My use case involves maintaining and processing keyed state. This is 
accomplished with a KeyedProcessFunction:

class BooleanProcess extends KeyedProcessFunction[String, String, String] {

  var bool: ValueState[Boolean] = _

  override def open(parameters: Configuration) {
    bool = getRuntimeContext.getState(
      new ValueStateDescriptor("boolean-state", classOf[Boolean]))
  }

  override def processElement(
      value: String,
      ctx: KeyedProcessFunction[String, String, String]#Context,
      out: Collector[String]): Unit = {
    if (bool.value) {
      out.collect(value)
    } else {
      if (Math.random < 0.005) {
        bool.update(true)
        out.collect(value)
      }
    }
  }
}


 I then use a KeyedStateReaderFunction like this to inspect 
savepoints/checkpoints:

class BooleanProcessStateReader extends KeyedStateReaderFunction[String, String] {

  var bool: ValueState[Boolean] = _

  override def open(parameters: Configuration) {
    bool = getRuntimeContext.getState(
      new ValueStateDescriptor("boolean-state", classOf[Boolean]))
  }

  override def readKey(
      key: String,
      ctx: KeyedStateReaderFunction.Context,
      out: Collector[String]): Unit = {
    out.collect(key)
  }
}

 

Ideally, I would like my KeyedStateReaderFunction to look like this:

class BooleanProcessStateReader extends BooleanProcess
    with KeyedStateReaderFunction[String, String] {

  override def readKey(
      key: String,
      ctx: KeyedStateReaderFunction.Context,
      out: Collector[String]): Unit = {
    out.collect(key)
  }
}

However, this can't be done with the current API due to Java's single 
inheritance and KeyedStateReaderFunction being an abstract class.

The code savings are rather trivial in this example. However, it makes the 
state reader much easier to maintain. It would automatically inherit state and 
lifecycle methods from the class whose state it is inspecting.
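A self-contained sketch of how the interface-based design would enable this reuse (the interfaces below are simplified, hypothetical stand-ins for the Flink classes, not the real API):

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the proposed design with hypothetical interfaces; in the real
 * API these are abstract classes, which is what this request asks to change.
 */
public class InterfaceBasedReader {

    /** Hypothetical interface form of KeyedProcessFunction (simplified). */
    interface KeyedProcess<K, I, O> {
        void processElement(I value, List<O> out);
    }

    /** Hypothetical interface form of KeyedStateReaderFunction. */
    interface KeyedStateReader<K, O> {
        void readKey(K key, List<O> out);
    }

    /** Shared processing logic and state setup live in one class... */
    static class BooleanProcess implements KeyedProcess<String, String, String> {
        public void processElement(String value, List<String> out) {
            out.add(value);
        }
    }

    /** ...and the state reader inherits it, adding only readKey. With
     *  abstract classes, this extends-plus-implements combination is
     *  impossible under Java's single inheritance. */
    static class BooleanProcessStateReader extends BooleanProcess
            implements KeyedStateReader<String, String> {
        public void readKey(String key, List<String> out) {
            out.add(key);
        }
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        new BooleanProcessStateReader().readKey("user-1", out);
        System.out.println(out); // [user-1]
    }
}
```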



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

2019-10-09 Thread Bowen Li
Hi Dawid,

+1 for proposed changes

On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz 
wrote:

> Sorry for a very delayed response.
>
> @Kurt Yes, this is the goal to have a function created like new
> Function(...) also be wrapped into CatalogFunction. This would have to
> be, though, a temporary function, as we cannot represent that as a set of
> properties. Similar to the createTemporaryView(DataStream stream).
>
> As for the ConnectTableDescriptor I agree this is very similar to
> CatalogTable. I am not sure though if we should get rid of it. In the
> end I see it as a builder for a CatalogTable, which is a slightly more
> internal API, but we might revisit that some time in the future if we
> find that it makes more sense.
>
> @All I updated the FLIP page with some more details from the outcome of
> the discussions around FLIP-57. Please take a look. I would like to
> start a vote on this FLIP as soon as the vote on FLIP-57 goes through.
>
> Best,
>
> Dawid
>
>
> On 19/09/2019 09:24, Kurt Young wrote:
> > IIUC it's good to see that both serializable (tables description from
> DDL)
> > and unserializable (tables with a DataStream underneath) tables are treated
> > uniformly as CatalogTable.
> >
> > Can I also assume functions that either come from a function class (from
> > DDL)
> > or function objects (newed by the user) will also be treated uniformly as
> > CatalogFunction?
> >
> > This will greatly simplify and unify current API level concepts and
> design.
> >
> > And it seems only one thing left, how do we deal with
> > ConnectTableDescriptor?
> > It's actually very similar with serializable CatalogTable, both carry
> some
> > text
> > properties which even are the same. Is there any chance we can further
> unify
> > this to CatalogTable?
> >
> > object
> > Best,
> > Kurt
> >
> >
> > On Thu, Sep 19, 2019 at 3:13 PM Jark Wu  wrote:
> >
> >> Thanks Dawid for the design doc.
> >>
> >> In general, I’m +1 to the FLIP.
> >>
> >>
> >> +1 to the single-string and parse way to express object path.
> >>
> >> +1 to deprecate registerTableSink & registerTableSource.
> >> But I would suggest to provide an easy way to register a custom
> >> source/sink before we drop them (this is another story).
> >> Currently, it’s not easy to implement a custom connector descriptor.
> >>
> >> Best,
> >> Jark
> >>
> >>
> >>> 在 2019年9月19日,11:37,Dawid Wysakowicz  写道:
> >>>
> >>> Hi JingsongLee,
> >>> From my understanding they can. Underneath they will be CatalogTables.
> >> The
> >>> difference is the lifetime of the tables. Plus some of the user facing
> >>> interfaces cannot be persisted e.g. datastream. Therefore we must have
> a
> >>> separate methods for that. In the end the temporary tables are held in
> >>> memory as CatalogTables.
> >>> Best,
> >>> Dawid
> >>>
> >>> On Thu, 19 Sep 2019, 10:08 JingsongLee,  >> .invalid>
> >>> wrote:
> >>>
>  Hi dawid:
>  Can temporary tables achieve the same capabilities as catalog table?
>  like statistics: CatalogTableStatistics, CatalogColumnStatistics,
>  PartitionStatistics
>  like partition support: we have added some catalog equivalent
> interfaces
>  on TableSource/TableSink: getPartitions, getPartitionFieldNames
>  Maybe it's not a good idea to add these interfaces to
>  TableSource/TableSink. What do you think?
> 
>  Best,
>  Jingsong Lee
> 
> 
>  --
>  From:Kurt Young 
>  Send Time:2019年9月18日(星期三) 17:54
>  To:dev 
>  Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
>  module
> 
>  Hi all,
> 
>  Sorry to join this party late. Big +1 to this flip, especially for the
>  dropping
>  "registerTableSink & registerTableSource" part. These are indeed
> legacy
>  and we should try to unify them through CatalogTable after we
> introduce
>  the concept of Catalog.
> 
>  From my understanding, what we can registered should all be metadata,
>  TableSource/TableSink should only be the one who is responsible to do
>  the real work, i.e. reading and writing data according to the schema
> and
>  other information like computed column, partition, .e.g.
> 
>  Best,
>  Kurt
> 
> 
>  On Wed, Sep 18, 2019 at 5:14 PM JingsongLee   .invalid>
>  wrote:
> 
> > After some development and thinking, I have a general understanding.
> > +1 to registering a source/sink does not fit into the SQL world.
> > I am OK to have a deprecated registerTemporarySource/Sink to
> compatible
> > with old ways.
> >
> > Best,
> > Jingsong Lee
> >
> >
> > --
> > From:Timo Walther 
> > Send Time:2019年9月17日(星期二) 08:00
> > To:dev 
> > Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
> > module
> >
> > Hi Dawid,
> >
> > thanks for the design 

[jira] [Created] (FLINK-14353) Enable fork-reuse for table-planner

2019-10-09 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-14353:


 Summary: Enable fork-reuse for table-planner
 Key: FLINK-14353
 URL: https://issues.apache.org/jira/browse/FLINK-14353
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner, Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.10.0


Enable fork reuse for table-planner to halve test times.
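For reference, fork reuse is controlled through the Maven Surefire plugin; a rough sketch of the relevant configuration (the exact settings in the table-planner pom may differ):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- number of concurrently forked test JVMs (1C = one per CPU core) -->
    <forkCount>1C</forkCount>
    <!-- reuse each forked JVM across test classes instead of paying
         JVM startup cost once per class -->
    <reuseForks>true</reuseForks>
  </configuration>
</plugin>
```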



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

2019-10-09 Thread Bowen Li
Thanks everyone for your review.

After discussing with Timo and Dawid offline, as well as incorporating
feedback from Xuefu and Jark on mailing list, I decided to make a few
critical changes to the proposal.

- renamed the keyword "type" to "kind". The community plans to have the
"type" keyword in yaml/descriptor refer to data types exclusively in the
near future. We should cater to that change in our design.
- allowed specifying names for modules to simplify and unify module
loading/unloading syntax between programming and SQL. Here're the proposed
changes:
SQL:
 LOAD MODULE "name" WITH ("kind"="xxx" [, (properties)])
 UNLOAD MODULE "name";
Table:
 tEnv.loadModule("name", new Xxx(properties));
 tEnv.unloadModule("name");

I have completely updated the google doc [1]. Please take another look, and
let me know if you have any other questions. Thanks!

[1]
https://docs.google.com/document/d/17CPMpMbPDjvM4selUVEfh_tqUK_oV0TODAUA9dfHakc/edit#


On Tue, Oct 8, 2019 at 6:26 AM Jark Wu  wrote:

> Hi Bowen,
>
> Thanks for the proposal. I have two thoughts:
>
> 1) Regarding to "loadModule", how about
> tableEnv.loadModule("xxx" [, propertiesMap]);
> tableEnv.unloadModule(“xxx”);
>
> This makes the API similar to SQL. IMO, passing an instance of Module as a
> parameter is unnecessary and verbose.
> And this makes it easier to load a simple module without any additional
> properties, e.g. tEnv.loadModule("GEO"), tEnv.unloadModule("GEO")
>
> 2) In the current design, the module interface only defines function metadata,
> but no implementations.
> I'm wondering how to call/map the implementations at runtime? Am I missing
> something?
>
> Besides, I left some minor comments in the doc.
>
> Best,
> Jark
>
>
> On Sat, 5 Oct 2019 at 08:42, Xuefu Z  wrote:
>
> > I agree with Timo that the new table APIs need to be consistent. I'd go
> > further: a name (or id) is needed for module definition in the YAML
> file.
> > In the current design, name is skipped and type has binary meanings.
> >
> > Thanks,
> > Xuefu
> >
> > On Fri, Oct 4, 2019 at 5:24 AM Timo Walther  wrote:
> >
> > > Hi everyone,
> > >
> > > first, I was also questioning my proposal. But Bowen's proposal of
> > > `tEnv.offloadToYaml()` would not work with the current
> design
> > > because we don't know how to serialize a catalog or module into
> > > properties. Currently, there is no converter from instance to
> > > properties. It is a one way conversion. We can add a `toProperties`
> > > method to both Catalog and Module class in the future to solve this.
> > > Solving the table environment serializability can be future work.
> > >
> > > However, I find the current proposal for the TableEnvironment methods
> is
> > > contradicting:
> > >
> > > tableEnv.loadModule(new Yyy());
> > > tableEnv.unloadModule(“Xxx”);
> > >
> > > The loading is specified programmatically whereas the unloading
> requires
> > > a string that is not specified in the module itself. But is defined in
> > > the factory according to the current design.
> > >
> > > SQL does it more consistently. There, the name `xxx` is used when
> > > loading and unloading the module:
> > >
> > > LOAD MODULE 'xxx' [WITH ('prop'='myProp', ...)]
> > > UNLOAD MODULE 'xxx’
> > >
> > > How about:
> > >
> > > tableEnv.loadModule("xxx", new Yyy());
> > > tableEnv.unloadModule(“xxx”);
> > >
> > > This would be similar to the catalog interfaces. The name is not part
> of
> > > the instance itself.
> > >
> > > What do you think?
> > >
> > > Thanks,
> > > Timo
> > >
> > >
> > >
> > >
> > > On 01.10.19 21:17, Bowen Li wrote:
> > > > If something like the yaml file is the way to go and achieve such
> > > > motivation, we would cover that with current design.
> > > >
> > > > On Tue, Oct 1, 2019 at 12:05 Bowen Li  wrote:
> > > >
> > > >> Hi Timo, Dawid,
> > > >>
> > > >> I've added the suggested SQL and related changes to TableEnvironment
> > API
> > > >> and other classes to the google doc. Also removed "USE MODULE" and
> its
> > > >> APIs. Will update FLIP wiki once we have a consensus.
> > > >>
> > > >> W.r.t. descriptor approach, my gut feeling is similar to Dawid's.
> > > Besides,
> > > >> I feel yaml file would be a better solution to persist serializable
> > > state
> > > >> of an environment as the file itself is in serializable format
> > already.
> > > >> Though yaml file only serves SQL CLI at this moment, we may be able
> to
> > > >> extend its reach to Table API and allow users to load/offload a
> > > >> TableEnvironment from/to yaml files, as something like
> > "TableEnvironment
> > > >> tEnv = TableEnvironment.loadFromYaml()" and
> > > >> "tEnv.offloadToYaml()" to restore and persist state, and
> > try
> > > to
> > > >> make yaml file more expressive.
> > > >>
> > > >>
> > > >> On Tue, Oct 1, 2019 at 6:47 AM Dawid Wysakowicz <
> > dwysakow...@apache.org
> > > >
> > > >> wrote:
> > > >>
> > > >>> Hi Timo, Bowen,
> > > >>>
> > > >>> Unfortunately I did not have enough time to go through 

[jira] [Created] (FLINK-14352) Dependencies section in Connect page of Table is broken

2019-10-09 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-14352:


 Summary: Dependencies section in Connect page of Table is broken
 Key: FLINK-14352
 URL: https://issues.apache.org/jira/browse/FLINK-14352
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: Jingsong Lee
 Fix For: 1.10.0


In 
[https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connect.html]

The Dependencies section does not show the dependencies table on master; it 
works correctly in 1.9.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14351) Refactor MetricRegistry delimiter retrieval into separate interface

2019-10-09 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-14351:


 Summary: Refactor MetricRegistry delimiter retrieval into separate 
interface
 Key: FLINK-14351
 URL: https://issues.apache.org/jira/browse/FLINK-14351
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Metrics
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.10.0


The MetricRegistry offers a few methods for retrieving configured delimiters, 
which are used a fair bit during scope operations; however, its other methods 
aren't used in these contexts.

Hence we could reduce access and simplify testing by introducing a dedicated 
interface for these methods that the registry extends.
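A brief sketch of the idea (all names below are hypothetical stand-ins, not the actual Flink classes):

```java
/**
 * Hypothetical sketch of the proposed split: delimiter lookups move into
 * a narrow interface that the full registry extends.
 */
public class DelimiterProviderSketch {

    /** The narrow view handed to scope-formatting code and tests. */
    interface DelimiterProvider {
        char getDelimiter();
    }

    /** The full registry still exposes everything, but scope code can
     *  now depend on (and tests can stub) just DelimiterProvider. */
    static class MetricRegistry implements DelimiterProvider {
        public char getDelimiter() { return '.'; }
        // ...registration and reporter methods omitted...
    }

    public static void main(String[] args) {
        DelimiterProvider provider = new MetricRegistry();
        System.out.println("taskmanager" + provider.getDelimiter() + "cpu");
        // taskmanager.cpu
    }
}
```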



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14350) Introduce dedicated MetricScope

2019-10-09 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-14350:


 Summary: Introduce dedicated MetricScope
 Key: FLINK-14350
 URL: https://issues.apache.org/jira/browse/FLINK-14350
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Metrics
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.10.0


The MetricGroup interface is currently overloaded, being used both for 
registering groups/metrics (typically called from user-functions) and accessing 
scope information (typically called from reporters).

Due to effectively having 2 different audiences with different use-cases it 
makes sense to move the scope-related methods into a separate interface.

This should make it a lot easier to extend these interfaces, as significantly 
less tests have to be adjusted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-67: Global partitions lifecycle

2019-10-09 Thread Chesnay Schepler
Are there any other opinions regarding the naming scheme? 
(local/global, promote)


On 06/09/2019 15:16, Chesnay Schepler wrote:

Hello,

FLIP-36 (interactive programming) 
 
proposes a new programming paradigm where jobs are built incrementally 
by the user.


To support this in an efficient manner I propose to extend partition 
life-cycle to support the notion of /global partitions/, which are 
partitions that can exist beyond the life-time of a job.


These partitions could then be re-used by subsequent jobs in a fairly 
efficient manner, as they don't have to persisted to an external 
storage first and consuming tasks could be scheduled to exploit 
data-locality.


The FLIP outlines the required changes on the JobMaster, TaskExecutor 
and ResourceManager to support this from a life-cycle perspective.


This FLIP does /not/ concern itself with the /usage/ of global 
partitions, including client-side APIs, job-submission, scheduling and 
reading said partitions; these are all follow-ups that will either be 
part of FLIP-36 or spliced out into separate FLIPs.







Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

2019-10-09 Thread Dawid Wysakowicz
Sorry for a very delayed response.

@Kurt Yes, this is the goal to have a function created like new
Function(...) also be wrapped into CatalogFunction. This would have to
be, though, a temporary function, as we cannot represent that as a set of
properties. Similar to the createTemporaryView(DataStream stream).

As for the ConnectTableDescriptor I agree this is very similar to
CatalogTable. I am not sure though if we should get rid of it. In the
end I see it as a builder for a CatalogTable, which is a slightly more
internal API, but we might revisit that some time in the future if we
find that it makes more sense.

@All I updated the FLIP page with some more details from the outcome of
the discussions around FLIP-57. Please take a look. I would like to
start a vote on this FLIP as soon as the vote on FLIP-57 goes through.

Best,

Dawid


On 19/09/2019 09:24, Kurt Young wrote:
> IIUC it's good to see that both serializable (tables description from DDL)
> and unserializable (tables with a DataStream underneath) tables are treated
> uniformly as CatalogTable.
>
> Can I also assume functions that either come from a function class (from
> DDL)
> or function objects (newed by the user) will also be treated uniformly as
> CatalogFunction?
>
> This will greatly simplify and unify current API level concepts and design.
>
> And it seems only one thing left, how do we deal with
> ConnectTableDescriptor?
> It's actually very similar with serializable CatalogTable, both carry some
> text
> properties which even are the same. Is there any chance we can further unify
> this to CatalogTable?
>
> object
> Best,
> Kurt
>
>
> On Thu, Sep 19, 2019 at 3:13 PM Jark Wu  wrote:
>
>> Thanks Dawid for the design doc.
>>
>> In general, I’m +1 to the FLIP.
>>
>>
>> +1 to the single-string and parse way to express object path.
>>
>> +1 to deprecate registerTableSink & registerTableSource.
>> But I would suggest to provide an easy way to register a custom
>> source/sink before we drop them (this is another story).
>> Currently, it’s not easy to implement a custom connector descriptor.
>>
>> Best,
>> Jark
>>
>>
>>> 在 2019年9月19日,11:37,Dawid Wysakowicz  写道:
>>>
>>> Hi JingsongLee,
>>> From my understanding they can. Underneath they will be CatalogTables.
>> The
>>> difference is the lifetime of the tables. Plus some of the user facing
>>> interfaces cannot be persisted e.g. datastream. Therefore we must have a
>>> separate methods for that. In the end the temporary tables are held in
>>> memory as CatalogTables.
>>> Best,
>>> Dawid
>>>
>>> On Thu, 19 Sep 2019, 10:08 JingsongLee, > .invalid>
>>> wrote:
>>>
 Hi dawid:
 Can temporary tables achieve the same capabilities as catalog table?
 like statistics: CatalogTableStatistics, CatalogColumnStatistics,
 PartitionStatistics
 like partition support: we have added some catalog equivalent interfaces
 on TableSource/TableSink: getPartitions, getPartitionFieldNames
 Maybe it's not a good idea to add these interfaces to
 TableSource/TableSink. What do you think?

 Best,
 Jingsong Lee


 --
 From:Kurt Young 
 Send Time:2019年9月18日(星期三) 17:54
 To:dev 
 Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
 module

 Hi all,

 Sorry to join this party late. Big +1 to this flip, especially for the
 dropping
 "registerTableSink & registerTableSource" part. These are indeed legacy
 and we should try to unify them through CatalogTable after we introduce
 the concept of Catalog.

 From my understanding, what we can registered should all be metadata,
 TableSource/TableSink should only be the one who is responsible to do
 the real work, i.e. reading and writing data according to the schema and
 other information like computed column, partition, .e.g.

 Best,
 Kurt


 On Wed, Sep 18, 2019 at 5:14 PM JingsongLee >>> .invalid>
 wrote:

> After some development and thinking, I have a general understanding.
> +1 to registering a source/sink does not fit into the SQL world.
> I am OK to have a deprecated registerTemporarySource/Sink to compatible
> with old ways.
>
> Best,
> Jingsong Lee
>
>
> --
> From:Timo Walther 
> Send Time:2019年9月17日(星期二) 08:00
> To:dev 
> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
> module
>
> Hi Dawid,
>
> thanks for the design document. It fixes big concept gaps due to
> historical reasons with proper support for serializability and catalog
> support in mind.
>
> I would not mind a registerTemporarySource/Sink, but the problem that I
> see is that many people think that this is the recommended way of
> registering a table source/sink which is not true. We should guide
>> users
> 

Re: [DISCUSS] FLIP-67: Global partitions lifecycle

2019-10-09 Thread Chesnay Schepler
While we could argue that it's a new interface so we aren't /technically/ 
changing anything about the ShuffleMaster, I'd assume most people would 
just have the ShuffleMaster implement the new interface and call it a day.


On 09/10/2019 09:57, Chesnay Schepler wrote:
So should we enforce having 2 instances now or defer this to a later 
date?


I'd rather do this early since it changes 2 assumptions that 
ShuffleMaster can currently make:

- every partition release is preceded by a registration of said partition
- the release of partitions may rely on local data

On 04/10/2019 17:10, Till Rohrmann wrote:

Thanks for updating the FLIP.

I think the RM does not need to have access to a full-fledged ShuffleMaster
implementation. Instead it should be enough to give it a leaner interface
which only supports deleting result partitions and listing available global
partitions. This might entail that one will have a ShuffleMaster
implementation running on the Dispatcher and a
GlobalResultPartitionsShuffleMaster implementation running on the RM. 
Long

story short, if we separate the RM from the Dispatcher, then this might
entail that we will have two ShuffleMaster incarnations running in each
process.

Cheers,
Till

On Fri, Oct 4, 2019 at 3:34 PM Chesnay Schepler  
wrote:



I have updated the FLIP.

- consistently use "local"/"global" terminology; this incidentally 
should

make it easier to update the terminology if we decide on other names
- inform RM via heartbeats from TE about available global partitions
- add dedicated method for releasing global partitions
- add dedicated section for required changes to the ShuffleMaster 
(mostly

clarification)
- added some items to the "Rejected Alternatives" section
- updated discussion link


While writing the ShuffleMaster section I noticed the following:

If, at any point, the JM/RM are moved into dedicated processes we 
either

a) have multiple ShuffleMaster instances for the same shuffle service
active
b) require a single ShuffleMaster on the RM, to which JM calls are 
being

forwarded.

Neither of these are without pain-points;
a) introduces additional constraints on ShuffleMaster 
implementations in

that no local state must be kept
b) again forces the JM to regularly be in touch with the RM, and limits
the ShuffleMaster interface to being RPC-friendly.

I'm wondering whether this issue was already an anyone's radar.


On 04/10/2019 14:12, Till Rohrmann wrote:



On Fri, Oct 4, 2019 at 12:37 PM Chesnay Schepler 
wrote:

*Till: In the FLIP you wrote "The set of partitions to release may 
contain local

and/or global partitions; the promotion set must only refer to local
partitions." to describe the `releasePartitions`. I think the JM 
should

never be in the situation to release a global partition. Moreover, I
believe we should have a separate RPC to release global result 
partitions

which might come from the RM.*

We can certainly add a separate RPC method for explicitly releasing 
global partitions.
You are correct that the JM should not be able to release those, 
just like the RM should not be able to release non-global partitions.
*Till: Once the JM has obtained the required slots to run a job, it 
no longer

needs to communicate with the RM. Hence, a lost RM connection won't
interfere with the job. I would like to keep it like this by 
letting the TE
announce global result partitions to the RM and not to introduce 
another

communication roundtrip.

*Agreed, this is a nice property to retain.
*Till: How big do you expect the payload to become?

*I don't know, which is precisely why I want to be cautious about it.
The last time I made a similar assumption I didn't expect anyone to 
have hundreds of thousands of metrics on a single TM, which turned 
out to be wrong.
I wouldn't exclude the possibility of a similar number of 
partitions being hosted on a single TE.



One problem we have to solve with the heartbeat-based approach is 
that partitions may be lost without the TE noticing, due to 
disk-failures or external delete operations.
Currently, for scheduling purposes we rely on information stored in 
the JM, and update said information if a job fails due to a missing 
partition. However, IIRC the JM is informed about with an exception 
that is thrown by the consumer of said partition, not the producer. 
As far as the producing TM is concerned, it is still hosting that 
partition.
This means we have to forward errors for missing partitions from 
the network stack on the producers side to the TE, so that it can 
inform the RM about it.



Yes, I think you are right Chesnay. This would also be a good 
addition for

the local result partitions.

Cheers,
Till


On 02/10/2019 16:21, Till Rohrmann wrote:

Thanks for addressing our comments Chesnay. See some comments inline.

On Wed, Oct 2, 2019 at 4:07 PM Chesnay Schepler 
  wrote:



Thank you for your comments; I've aggregated them a bit and added
comments to each of them.

1) Concept name (proposal: persistent)

Re: [DISCUSS] FLIP-67: Global partitions lifecycle

2019-10-09 Thread Chesnay Schepler

So should we enforce having 2 instances now or defer this to a later date?

I'd rather do this early since it changes 2 assumptions that 
ShuffleMaster can currently make:

- every partition release is preceded by a registration of said partition
- the release of partitions may rely on local data

On 04/10/2019 17:10, Till Rohrmann wrote:

Thanks for updating the FLIP.

I think the RM does not need to have access to a full-fledged ShuffleMaster
implementation. Instead, it should be enough to give it a leaner interface
which only supports deleting result partitions and listing available global
partitions. This might entail that one will have a ShuffleMaster
implementation running on the Dispatcher and a
GlobalResultPartitionsShuffleMaster implementation running on the RM. Long
story short, if we separate the RM from the Dispatcher, then this might
entail that we will have two ShuffleMaster incarnations running in each
process.

Cheers,
Till
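Such a leaner RM-facing view could be a small subset of the shuffle-master functionality. The sketch below is purely illustrative: the interface and method names are invented here and do not correspond to actual Flink API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: a reduced interface for the RM, exposing only the
// operations mentioned above (listing and deleting global result partitions),
// while the full ShuffleMaster keeps living on the Dispatcher/JM side.
interface GlobalResultPartitionsView {
    /** Lists the IDs of all currently available global result partitions. */
    Collection<String> listGlobalPartitions();

    /** Releases (deletes) the given global result partition. */
    CompletableFuture<Void> releaseGlobalPartition(String partitionId);
}

// Trivial in-memory implementation, just to make the sketch concrete.
class InMemoryGlobalPartitionsView implements GlobalResultPartitionsView {
    private final Set<String> global = new HashSet<>();

    void announce(String partitionId) {
        global.add(partitionId);
    }

    @Override
    public Collection<String> listGlobalPartitions() {
        return new ArrayList<>(global);
    }

    @Override
    public CompletableFuture<Void> releaseGlobalPartition(String partitionId) {
        global.remove(partitionId);
        return CompletableFuture.completedFuture(null);
    }
}
```

With such a split, the RM never needs the registration/local-state parts of the shuffle service, which is exactly what makes running two incarnations in separate processes tolerable.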

On Fri, Oct 4, 2019 at 3:34 PM Chesnay Schepler  wrote:


I have updated the FLIP.

- consistently use "local"/"global" terminology; this incidentally should
make it easier to update the terminology if we decide on other names
- inform RM via heartbeats from TE about available global partitions
- add dedicated method for releasing global partitions
- add dedicated section for required changes to the ShuffleMaster (mostly
clarification)
- added some items to the "Rejected Alternatives" section
- updated discussion link


While writing the ShuffleMaster section I noticed the following:

If, at any point, the JM/RM are moved into dedicated processes we either
a) have multiple ShuffleMaster instances for the same shuffle service
active
b) require a single ShuffleMaster on the RM, to which JM calls are being
forwarded.

Neither of these are without pain-points;
a) introduces additional constraints on ShuffleMaster implementations in
that no local state must be kept
b) again forces the JM to regularly be in touch with the RM, and limits
the ShuffleMaster interface to being RPC-friendly.

I'm wondering whether this issue was already on anyone's radar.


On 04/10/2019 14:12, Till Rohrmann wrote:



On Fri, Oct 4, 2019 at 12:37 PM Chesnay Schepler 
wrote:


*Till: In the FLIP you wrote "The set of partitions to release may contain local
and/or global partitions; the promotion set must only refer to local
partitions." to describe the `releasePartitions`. I think the JM should
never be in the situation to release a global partition. Moreover, I
believe we should have a separate RPC to release global result partitions
which might come from the RM.*

We can certainly add a separate RPC method for explicitly releasing global 
partitions.
You are correct that the JM should not be able to release those, just like the 
RM should not be able to release non-global partitions.
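The constraint described here (the JM call handles local releases and promotions, while a separate RPC handles global releases) can be sketched roughly as follows; all class and method names are made up for illustration and do not mirror Flink's actual interfaces:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of TE-side partition bookkeeping under the proposed split.
class PartitionTracker {
    enum Scope { LOCAL, GLOBAL }

    private final Map<String, Scope> partitions = new HashMap<>();

    void register(String id) {
        partitions.put(id, Scope.LOCAL);
    }

    // Called by the JM: may release local partitions and promote local
    // partitions to global, but must never touch global partitions.
    void releaseOrPromote(Set<String> toRelease, Set<String> toPromote) {
        for (String id : toRelease) {
            if (partitions.get(id) == Scope.GLOBAL) {
                throw new IllegalArgumentException(
                        "JM must not release global partition " + id);
            }
            partitions.remove(id);
        }
        for (String id : toPromote) {
            if (partitions.get(id) != Scope.LOCAL) {
                throw new IllegalArgumentException(
                        "promotion set must only refer to local partitions: " + id);
            }
            partitions.put(id, Scope.GLOBAL);
        }
    }

    // Dedicated RPC for the RM: may only release global partitions.
    void releaseGlobal(Set<String> toRelease) {
        for (String id : toRelease) {
            if (partitions.get(id) != Scope.GLOBAL) {
                throw new IllegalArgumentException(
                        "RM must not release non-global partition " + id);
            }
            partitions.remove(id);
        }
    }

    boolean isGlobal(String id) {
        return partitions.get(id) == Scope.GLOBAL;
    }
}
```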
*Till: Once the JM has obtained the required slots to run a job, it no longer
needs to communicate with the RM. Hence, a lost RM connection won't
interfere with the job. I would like to keep it like this by letting the TE
announce global result partitions to the RM and not to introduce another
communication roundtrip.

*Agreed, this is a nice property to retain.
*Till: How big do you expect the payload to become?

*I don't know, which is precisely why I want to be cautious about it.
The last time I made a similar assumption I didn't expect anyone to have 
hundreds of thousands of metrics on a single TM, which turned out to be wrong.
I wouldn't exclude the possibility of a similar number of partitions being 
hosted on a single TE.


One problem we have to solve with the heartbeat-based approach is that 
partitions may be lost without the TE noticing, due to disk-failures or 
external delete operations.
Currently, for scheduling purposes we rely on information stored in the JM, and 
update said information if a job fails due to a missing partition. However, 
IIRC the JM is informed about this via an exception that is thrown by the 
consumer of said partition, not the producer. As far as the producing TM is 
concerned, it is still hosting that partition.
This means we have to forward errors for missing partitions from the network 
stack on the producer's side to the TE, so that it can inform the RM about it.
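One way to surface such silently lost partitions is to re-check their backing storage when assembling the heartbeat payload. The following is a hedged sketch under invented names, not actual Flink code:

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: before building the heartbeat payload, the TE re-checks
// that the partitions it believes it hosts still exist on disk, so that
// partitions lost to disk failures or external deletes get reported to the RM
// instead of lingering in its bookkeeping.
class HeartbeatPayloadBuilder {
    interface PartitionStore {
        boolean exists(String partitionId); // e.g. check the backing files
    }

    private final PartitionStore store;

    HeartbeatPayloadBuilder(PartitionStore store) {
        this.store = store;
    }

    /** Returns the still-existing partitions; adds the rest to 'lost'. */
    Set<String> buildPayload(Set<String> hosted, Set<String> lost) {
        Set<String> alive = new HashSet<>();
        for (String id : hosted) {
            if (store.exists(id)) {
                alive.add(id);
            } else {
                lost.add(id); // to be reported to the RM
            }
        }
        return alive;
    }
}
```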



Yes, I think you are right Chesnay. This would also be a good addition for
the local result partitions.

Cheers,
Till


On 02/10/2019 16:21, Till Rohrmann wrote:

Thanks for addressing our comments Chesnay. See some comments inline.

On Wed, Oct 2, 2019 at 4:07 PM Chesnay Schepler  
 wrote:


Thank you for your comments; I've aggregated them a bit and added
comments to each of them.

1) Concept name (proposal: persistent)

I agree that "global" is rather undescriptive, particularly so since we
never had a notion of "local" partitions.
I'm not a fan of "persistent", as to me it always implies reliable
long-term storage, which as I understand we aren't shooting for here.

I was thinking of "cached" partitions.

To 

Re: [VOTE] Release 1.9.1, release candidate #1

2019-10-09 Thread Jark Wu
+1 from my side.

- checked signatures and hashes
- checked that all POM files point to the same version
- verified that the source archives do not contain any binaries
- built the source release with Scala 2.12 and Scala 2.11 successfully
- manually verified the diff pom files between 1.9.0 and 1.9.1 to check
dependencies, looks good
- started cluster for both Scala 2.11 and 2.12, ran examples, verified web
ui and log output, nothing unexpected
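For reference, the signature and hash checks in such a verification typically boil down to a few standard commands. The file names below follow the usual Flink release layout but are assumptions here; the GPG steps are shown as comments since they require the downloaded RC artifacts:

```shell
# Sketch of a typical source-release verification.

# 1. Verify the GPG signature (requires the downloaded artifacts and KEYS file):
#    gpg --import KEYS
#    gpg --verify flink-1.9.1-src.tgz.asc flink-1.9.1-src.tgz

# 2. Verify the SHA-512 checksum (demonstrated here on a locally created file):
echo "demo content" > artifact.tgz
sha512sum artifact.tgz > artifact.tgz.sha512
sha512sum -c artifact.tgz.sha512

# 3. Check that all POM files declare the same version, e.g.:
#    grep -r --include=pom.xml "<version>1.9.1</version>" . | wc -l
```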

Best,
Jark

On Wed, 9 Oct 2019 at 11:18, Jark Wu  wrote:

> Thanks Jincheng and Till, then let's keep on verifying the RC1.
>
> Best,
> Jark
>
> On Wed, 9 Oct 2019 at 11:00, jincheng sun 
> wrote:
>
>> I think we should only create a new RC if we find blocker issues.
>> We can look forward to the other check results; we will add the fix of
>> FLINK-14315 into 1.9.1 only if we find blockers.
>>
>> Best,
>> Jincheng
>>
>> Till Rohrmann  于2019年10月8日周二 下午8:20写道:
>>
>>> FLINK-14315 has been merged into the release-1.9 branch. I've marked the
>>> fix version of this ticket as 1.9.2. If we should create a new RC, then
>>> we
>>> could include this fix. If this happens, then we need to update the fix
>>> version to 1.9.1.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Tue, Oct 8, 2019 at 1:51 PM Till Rohrmann 
>>> wrote:
>>>
>>> > If people already spent time on verifying the current RC I would also
>>> be
>>> > fine to release the fix for FLINK-14315 with Flink 1.9.2.
>>> >
>>> > I will try to merge the PR as soon as possible. When I close the
>>> ticket, I
>>> > will update the fix version field to 1.9.2.
>>> >
>>> > Cheers,
>>> > Till
>>> >
>>> > On Tue, Oct 8, 2019 at 4:43 AM Jark Wu  wrote:
>>> >
>>> >> Hi Zili,
>>> >>
>>> >> Thanks for reminding me this, because of the Chinese National Day and
>>> >> Flink Forward Europe,
>>> >> we didn't receive any verification on the 1.9.1 RC1. And I guess we
>>> have
>>> >> to extend the voting time after Flink Forward.
>>> >> So I'm fine to have FLINK-14315 and rebuild another RC. What do you
>>> think
>>> >> @Till @Jincheng?
>>> >>
>>> >> I guess FLINK-14315 will be merged soon, as it was approved 4 days ago?
>>> >> Could you help to merge it once it has passed? @Zili Chen
>>> >> 
>>> >>
>>> >> Best,
>>> >> Jark
>>> >>
>>> >> On Tue, 8 Oct 2019 at 09:14, Zili Chen  wrote:
>>> >>
>>> >>> Hi Jark,
>>> >>>
>>> >>> I notice a critical bug[1] is marked resolved in 1.9.1 but given
>>> 1.9.1
>>> >>> has been cut I'd like to throw the issue here so that we're sure
>>> >>> whether or not it is included in 1.9.1.
>>> >>>
>>> >>> Best,
>>> >>> tison.
>>> >>>
>>> >>> [1] https://issues.apache.org/jira/browse/FLINK-14315
>>> >>>
>>> >>>
>>> >>> Jark Wu  于2019年9月30日周一 下午3:25写道:
>>> >>>
>>>   Hi everyone,
>>> 
>>>  Please review and vote on the release candidate #1 for the version
>>>  1.9.1,
>>>  as follows:
>>>  [ ] +1, Approve the release
>>>  [ ] -1, Do not approve the release (please provide specific
>>> comments)
>>> 
>>> 
>>>  The complete staging area is available for your review, which
>>> includes:
>>>  * JIRA release notes [1],
>>>  * the official Apache source release and binary convenience
>>> releases to
>>>  be
>>>  deployed to dist.apache.org [2], which are signed with the key with
>>>  fingerprint E2C45417BED5C104154F341085BACB5AEFAE3202 [3],
>>>  * all artifacts to be deployed to the Maven Central Repository [4],
>>>  * source code tag "release-1.9.1-rc1" [5],
>>>  * website pull request listing the new release and adding
>>> announcement
>>>  blog
>>>  post [6].
>>> 
>>>  The vote will be open for at least 72 hours.
>>>  Please cast your votes before *Oct. 3th 2019, 08:00 UTC*.
>>> 
>>>  It is adopted by majority approval, with at least 3 PMC affirmative
>>>  votes.
>>> 
>>>  Thanks,
>>>  Jark
>>> 
>>>  [1]
>>> 
>>> 
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346003
>>>  [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.9.1-rc1/
>>>  [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>>>  [4]
>>> 
>>> https://repository.apache.org/content/repositories/orgapacheflink-1272/
>>>  [5]
>>> 
>>> 
>>> https://github.com/apache/flink/commit/4d56de81cb692c68a7d1dbfff13087a5079a8252
>>>  [6] https://github.com/apache/flink-web/pull/274
>>> 
>>> >>>
>>>
>>


[jira] [Created] (FLINK-14349) Create a Connector Descriptor for HBase so that user can connect HBase by TableEnvironment#connect

2019-10-09 Thread Zheng Hu (Jira)
Zheng Hu created FLINK-14349:


 Summary: Create a Connector Descriptor for HBase so that user can 
connect HBase by TableEnvironment#connect
 Key: FLINK-14349
 URL: https://issues.apache.org/jira/browse/FLINK-14349
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / HBase
Reporter: Zheng Hu


Will provide a ConnectorDescriptor for HBase, just like the Elasticsearch 
ConnectorDescriptor & Kafka ConnectorDescriptor, which will make it easier 
to connect to HBase in Java or Scala. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-74: Flink JobClient API

2019-10-09 Thread Zili Chen
Given the ongoing Flink Forward Berlin event, I'm going to extend
this vote by a short period, until Oct. 11th (Friday).

Best,
tison.


Zili Chen  于2019年10月7日周一 下午4:15写道:

> Hi all,
>
> I would like to start the vote for FLIP-74[1], which is discussed and
> reached a consensus in the discussion thread[2].
>
> The vote will be open until Oct. 9th (72h starting on Oct. 7th), unless
> there is an objection or not enough votes.
>
> Best,
> tison.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-74%3A+Flink+JobClient+API
> [2]
> https://lists.apache.org/x/thread.html/b2e22a45aeb94a8d06b50c4de078f7b23d9ff08b8226918a1a903768@%3Cdev.flink.apache.org%3E
>