Re: [DISCUSS] Planning Flink 1.12

2020-07-22 Thread Canbin Zheng
Hi All,

Thanks for bringing up this discussion, Robert!
Congratulations on becoming the release managers of 1.12, Dian and Robert!

--
Here are some of my thoughts on the features for native integration with
Kubernetes in Flink 1.12:

1. Support user-specified pod templates
Description:
The current approach of introducing a new configuration option for each
aspect of the pod specification a user might wish to customize is becoming
unwieldy: we have to maintain more and more Kubernetes configuration options
on the Flink side, and users have to learn the gap between the declarative
model used by Kubernetes and the configuration model used by Flink. It would
be a great improvement to allow users to specify pod templates as a central
place for all customization needs of the jobmanager and taskmanager pods.
Benefits:
Users can leverage many of the advanced K8s features that the Flink
community does not support explicitly, such as volume mounting, DNS
configuration, pod affinity/anti-affinity setting, etc.

2. Support running PyFlink on Kubernetes
Description:
Support running PyFlink on Kubernetes, including session clusters and
application clusters.
Benefits:
Running Python applications in a containerized environment.

3. Support built-in init-Container
Description:
We need a built-in init-Container to help solve dependency management
in a containerized environment, especially in the application mode.
Benefits:
Separate the base Flink image from dynamic dependencies.

4. Support accessing secured services via K8s secrets
Description:
Kubernetes Secrets can be used to provide credentials for a Flink
application to access secured services. This helps people who want to use
a user-specified K8s Secret through an environment variable.
Benefits:
Improve user experience.

5. Support configuring the replica count of the JobManager Deployment in
ZooKeeper HA setups
Description:
Make the *replicas* field of the JobManager Deployment configurable in
ZooKeeper HA setups.
Benefits:
Achieve faster failover.

6. Support configuring a limit for the CPU requirement
Description:
Leverage the Kubernetes feature of container CPU requests/limits.
Benefits:
Reduce cost.

Regards,
Canbin Zheng


[jira] [Created] (FLINK-18679) It doesn't support to join two tables containing the same field names in Table API

2020-07-22 Thread Dian Fu (Jira)
Dian Fu created FLINK-18679:
---

 Summary: It doesn't support to join two tables containing the same 
field names in Table API
 Key: FLINK-18679
 URL: https://issues.apache.org/jira/browse/FLINK-18679
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Reporter: Dian Fu


For example, if we have two tables which have the same field named "int1" and 
want to join the two tables on this field:
{code:scala}
val t1 = util.addTable[(Int, Long, String)]('int1, 'long1, 'string1)
val t2 = util.addTable[(Int, Long, String)]('int1, 'long2, 'string2)
{code}
In SQL, we could do it as follows:
{code:sql}
SELECT xxx
FROM t1 JOIN t2 ON t1.int1 = t2.int1
{code}
However, this is not possible in the Table API, since it lacks a way to
specify a column from one particular table. We have to rename the field
names of one table to make sure that all field names are unique before
joining. This is very inconvenient.
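
For illustration, a minimal sketch of the rename workaround in the Scala
Table API (the renamed field name {{int1_r}} is hypothetical, and the
implicit expression conversions of the Scala Table API are assumed to be in
scope):
{code:scala}
// rename the ambiguous field on one side before joining
val t2Renamed = t2.select('int1 as 'int1_r, 'long2, 'string2)

// the join condition can now reference both columns unambiguously
val joined = t1.join(t2Renamed, 'int1 === 'int1_r)
{code}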





Re: [ANNOUNCE] Apache Flink 1.11.1 released

2020-07-22 Thread Congxian Qiu
Thanks Dian for the great work and thanks to everyone who makes this
release possible!

Best,
Congxian




Re: [DISCUSS] Planning Flink 1.12

2020-07-22 Thread Harold.Miao
I'm excited to hear about this feature; it's very highly encouraged.




-- 

Best Regards,
Harold Miao


[jira] [Created] (FLINK-18678) Hive connector fails to create vector orc reader if user specifies incorrect hive version

2020-07-22 Thread Rui Li (Jira)
Rui Li created FLINK-18678:
--

 Summary: Hive connector fails to create vector orc reader if user 
specifies incorrect hive version
 Key: FLINK-18678
 URL: https://issues.apache.org/jira/browse/FLINK-18678
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Hive, Formats (JSON, Avro, Parquet, ORC, 
SequenceFile)
Reporter: Rui Li








Re: [DISCUSS] Planning Flink 1.12

2020-07-22 Thread jincheng sun
Hi All,

Thanks for bringing up this discussion, Robert!
Congratulations on becoming the release managers of 1.12, Dian and Robert!

--
Here are my thoughts on the features for PyFlink in Flink 1.12:

1. Improve the usability of PyFlink
Description:
Improve the usability of PyFlink, such as helping users check the
client and cluster environment, optimizing error messages, improving the
current API type hints, etc.
Benefits:
Improve user experience.

2. PyFlink Table API DSL
Description:
Support the Python Table API Expression DSL. The Expression DSL is already
supported on the Java side (FLIP-55). This task tries to align the Python
Table API with the Java Table API.
Benefits:
The Expression DSL is more user-friendly than string expressions: users
can rely on IDE smart prompts to write expressions, which facilitates
development and increases efficiency.
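
To illustrate the difference on the JVM side, a minimal sketch in the
existing Scala Table API (the proposal would bring an equivalent DSL to
Python; the table and field names are hypothetical):

// string expression: typos are only caught at runtime, no IDE support
table.select("user, amount.sum as total")
// Expression DSL (FLIP-55 style): IDE completion and compile-time checks
table.select('user, 'amount.sum as 'total)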


3. Python DataStream API
Description:
Support DataStream applications written in Python, including stateless
operations (keyBy, connect, union, map, flatMap, filter, etc.) and stateful
operations (RichFunctions, ProcessFunctions, window, join).
Benefits:
   1) By adding the DataStream API to PyFlink, it would provide users with
a more fine-grained configuration API for tasks (such as parallelism and
resource specs) and more complex data processing operations, which users
strongly demand but which SQL and the Table API do not support at the
moment.
   2) For areas that rely little on relational operations, such as AI,
transformations like map and flatMap are preferred by users over the Table
API.

4. Support Pandas UDAF in batch GroupBy aggregation
Description:
Support Pandas UDAF in the batch GroupBy aggregation of the Python Table
API & SQL. Both the input and output of the UDAF are pandas.DataFrame.
Benefits:
   1) A Pandas UDAF can perform more than 10x better than a row-at-a-time
UDAF in certain scenarios.
   2) Users can use the Pandas/NumPy API in the Python UDAF implementation
if the input/output data type is pandas.DataFrame.

5. PyFlink Table API UDAF
   Description:
   Support UDAF for the Python Table API.
   Benefits:
   Aggregations (stateful operations) can also be supported in PyFlink.

6. Support running PyFlink jobs on Kubernetes
Description:
Support running PyFlink jobs on Kubernetes, including dependency
management and so on, just like on YARN and standalone clusters.
Benefits:
Kubernetes is a widely used container orchestration framework which offers
more flexibility in application development and deployment.


Any comments and suggestions are welcome!

Best,
Jincheng



Re: [DISCUSS] Planning Flink 1.12

2020-07-22 Thread Dian Fu
Thanks Robert for bringing up this discussion. This is very important to ensure 
that we have a smooth release process as there are only two months left before 
feature freeze.

It would be good to have a list of the features for 1.12 as soon as possible.
Everyone is welcome to post the features which they think are important and
want in 1.12.

Regards,
Dian



Re: [ANNOUNCE] Apache Flink 1.11.1 released

2020-07-22 Thread Rui Li
Thanks Dian for the great work!


-- 
Best regards!
Rui Li


Re: [ANNOUNCE] Apache Flink 1.11.1 released

2020-07-22 Thread Jingsong Li
Thanks for being the release manager for the 1.11.1 release, Dian.

Best,
Jingsong


-- 
Best, Jingsong Lee


Re: [ANNOUNCE] Apache Flink 1.11.1 released

2020-07-22 Thread Zhijiang
Thanks for being the release manager and the efficient work, Dian!

Best,
Zhijiang





[jira] [Created] (FLINK-18677) ZooKeeperLeaderRetrievalService does not invalidate leader in case of SUSPENDED connection

2020-07-22 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-18677:
-

 Summary: ZooKeeperLeaderRetrievalService does not invalidate 
leader in case of SUSPENDED connection
 Key: FLINK-18677
 URL: https://issues.apache.org/jira/browse/FLINK-18677
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.11.1, 1.10.1, 1.12.0
Reporter: Till Rohrmann
 Fix For: 1.12.0


The {{ZooKeeperLeaderRetrievalService}} does not invalidate the leader if the 
ZooKeeper connection gets SUSPENDED. This means that a {{TaskManager}} won't 
cancel its running tasks even though it might miss a leader change. I think we 
should at least make it configurable whether in such a situation the leader 
listener should be informed about the lost leadership. Otherwise, we might run 
into the situation where an old and a newly recovered instance of a {{Task}} 
can run at the same time.





[jira] [Created] (FLINK-18676) Update version of aws to support use of default constructor of "WebIdentityTokenCredentialsProvider"

2020-07-22 Thread Ravi Bhushan Ratnakar (Jira)
Ravi Bhushan Ratnakar created FLINK-18676:
-

 Summary: Update version of aws to support use of default 
constructor of "WebIdentityTokenCredentialsProvider"
 Key: FLINK-18676
 URL: https://issues.apache.org/jira/browse/FLINK-18676
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / FileSystem
Affects Versions: 1.11.0
Reporter: Ravi Bhushan Ratnakar


*Background:*

I am using Flink 1.11.0 on the Kubernetes platform. To give the
taskmanager/jobmanager access to AWS services, we are using "IAM Roles for
Service Accounts". I have configured the property below in flink-conf.yaml
to use the credential provider.

fs.s3a.aws.credentials.provider: 
com.amazonaws.auth.WebIdentityTokenCredentialsProvider

 

*Issue:*

When the taskmanager/jobmanager starts up, it complains that
"WebIdentityTokenCredentialsProvider" doesn't have a public constructor,
and the container doesn't come up.

 

*Solution:*

Currently the above credentials class is used from "*flink-s3-fs-hadoop*",
which gets the "aws-java-sdk-core" dependency from "*flink-s3-fs-base*". In
"*flink-s3-fs-base*", the aws version
[fs.s3.aws.version|https://github.com/apache/flink/blob/release-1.11.0/flink-filesystems/flink-s3-fs-base/pom.xml#L36]
is 1.11.754. Support for the default constructor of
"WebIdentityTokenCredentialsProvider" is provided from aws version 1.11.788
onward.





[jira] [Created] (FLINK-18675) Checkpoint not maintaining minimum pause duration between checkpoints

2020-07-22 Thread Ravi Bhushan Ratnakar (Jira)
Ravi Bhushan Ratnakar created FLINK-18675:
-

 Summary: Checkpoint not maintaining minimum pause duration between 
checkpoints
 Key: FLINK-18675
 URL: https://issues.apache.org/jira/browse/FLINK-18675
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Checkpointing
Affects Versions: 1.11.0
 Environment: !image.png!
Reporter: Ravi Bhushan Ratnakar
 Attachments: image.png

I am running a streaming job with Flink 1.11.0 using kubernetes infrastructure. 
I have configured checkpoint configuration like below
Interval - 3 minutes
Minimum pause between checkpoints - 3 minutes
Checkpoint timeout - 10 minutes
Checkpointing Mode - Exactly Once
Number of Concurrent Checkpoint - 1
 
Other configs
Time Characteristics - Processing Time
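
For reference, a minimal sketch of this checkpoint configuration in the
Scala DataStream API (a plain illustration of the settings listed above,
not taken from the actual job):
{code:scala}
import org.apache.flink.streaming.api.CheckpointingMode
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.enableCheckpointing(3 * 60 * 1000)  // interval: 3 minutes
val checkpointConf = env.getCheckpointConfig
checkpointConf.setMinPauseBetweenCheckpoints(3 * 60 * 1000)  // minimum pause: 3 minutes
checkpointConf.setCheckpointTimeout(10 * 60 * 1000)          // checkpoint timeout: 10 minutes
checkpointConf.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE)
checkpointConf.setMaxConcurrentCheckpoints(1)                // one concurrent checkpoint
{code}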
 
I am observing an unusual behaviour. *When a checkpoint completes successfully
and its end-to-end duration is almost equal to or greater than the minimum
pause duration, the next checkpoint gets triggered immediately without
maintaining the minimum pause duration*. Kindly notice this behaviour from
checkpoint id 194 onward in the attached screenshot.





Re: [DISCUSS] Planning Flink 1.12

2020-07-22 Thread Prasanna kumar
Hi Flink Dev Team,

Dynamic AutoScaling Based on the incoming data load would be a great
feature.

We should be able to have rules, e.g. if the load increases by 20%,
extra resources should be added; or time-based rules, e.g. during peak
hours the pipeline should scale automatically by 50%.

This will help a lot in cost reduction.

EMR clusters provide a similar feature for Spark-based applications.

Thanks,
Prasanna.



[jira] [Created] (FLINK-18674) Support to bridge Transformation (DataStream) with FLIP-95 interface?

2020-07-22 Thread Jark Wu (Jira)
Jark Wu created FLINK-18674:
---

 Summary: Support to bridge Transformation (DataStream) with 
FLIP-95 interface?
 Key: FLINK-18674
 URL: https://issues.apache.org/jira/browse/FLINK-18674
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Jark Wu


A user complained on the user ML that the old connector logic is hard to
migrate to the FLIP-95 interfaces, because they heavily used DataStream in
the TableSource/TableSink and it is not possible to replace it with the new
interfaces right now.

This issue can be used to collect the user requirements around 
DataStream/Transformation + FLIP-95. We can also evaluate/discuss whether and 
how to support it. 





Re: [DISCUSS] FLIP-129: Refactor Descriptor API to register connector in Table API

2020-07-22 Thread Jark Wu
Hi all,

After some offline discussion with other people, I'm also fine with using
the builder pattern now, even though I still think the `.build()` method is
a little verbose in the user code.

I have updated the FLIP with following changes:

1) use the builder pattern instead of the "new" keyword. In order to avoid
duplicate code and reduce the development burden for connector developers,
I introduced the abstract classes `TableDescriptorBuilder` and
`FormatDescriptorBuilder`.
All the common methods are pre-defined in the base builder classes; all
custom descriptor builders should extend from them.
And we can add more methods to the base builder classes in the future
without changes in the connectors.
2) use Expression instead of SQL expression string for computed column and
watermark strategy
3) use `watermark(rowtime, expr)` as the watermark method.
4) `Schema.column()` accepts `AbstractDataType` instead of `DataType`
5) drop Schema#proctime and Schema#watermarkFor#boundedOutOfOrderTimestamps

A full example will look like this:

tEnv.createTemporaryTable(
    "MyTable",
    KafkaConnector.newBuilder()
        .version("0.11")
        .topic("user_logs")
        .property("bootstrap.servers", "localhost:9092")
        .property("group.id", "test-group")
        .startFromEarliest()
        .sinkPartitionerRoundRobin()
        .format(JsonFormat.newBuilder().ignoreParseErrors(false).build())
        .schema(
            Schema.newBuilder()
                .column("user_id", DataTypes.BIGINT())
                .column("user_name", DataTypes.STRING())
                .column("score", DataTypes.DECIMAL(10, 2))
                .column("log_ts", DataTypes.STRING())
                .column("part_field_0", DataTypes.STRING())
                .column("part_field_1", DataTypes.INT())
                .column("proc", proctime()) // define a processing-time attribute with column name "proc"
                .column("ts", toTimestamp($("log_ts")))
                .watermark("ts", $("ts").minus(lit(3).seconds()))
                .primaryKey("user_id")
                .build())
        .partitionedBy("part_field_0", "part_field_1") // Kafka doesn't support partitioned tables yet; this is just an example for the API
        .build()
);

I hope this resolves all your concerns. Further feedback is welcome!

Updated FLIP:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-129%3A+Refactor+Descriptor+API+to+register+connectors+in+Table+API#FLIP129:RefactorDescriptorAPItoregisterconnectorsinTableAPI-FormatDescriptor

POC:
https://github.com/wuchong/flink/tree/descriptor-POC/flink-table/flink-table-common/src/main/java/org/apache/flink/table/descriptor3

Best,
Jark

On Thu, 16 Jul 2020 at 20:18, Jark Wu  wrote:

> Thank you all for the discussion!
>
> Here are my comments:
>
> 2) I agree we should support Expression as a computed column. But I'm in
> favor of Leonard's point that maybe we can also support SQL string
> expressions as computed columns, because that also keeps aligned with DDL.
> The concern with Expression is that converting an Expression to a SQL
> string, or (de)serializing an Expression, is another topic that is not yet
> clear and may involve lots of work.
> Maybe we can support Expression later if time permits.
>
> 6,7) I still prefer the "new" keyword over the builder. I don't think
> immutability is a strong reason. I care more about usability and experience
> from the users' and devs' perspective.
>   - Users need to type more words if using builder:
> `KafkaConnector.newBuilder()...build()`  vs `new KafkaConnector()...`
>   - It's more difficult for developers to write a descriptor: 2 classes
> (KafkaConnector and KafkaConnectorBuilder) + 5 more methods (builders,
> schema, partitionedBy, like, etc.).
> With the "new" keyword all the common methods are defined by the
> framework.
>   - It's hard to have the same API style for different connectors, because
> the common methods are defined by users. For example, some may have
> `withSchema`, `partitionKey`, `withLike`, etc...
>
> 8) I'm -1 on `ConfigOption`. The ConfigOption is not used on `JsonFormat`,
> but on the generic `Connector#option`. This doesn't work when using format
> options.
>
> new Connector("kafka")
>  .option(JsonOptions.IGNORE_PARSE_ERRORS, true);   // this is wrong,
> because "kafka" requires "json.ignore-parse-errors" as the option key, not
> the "ignore-parse-errors".
>
>
> 
> Hi Timo, regarding having a completely new stack, I have thought about that.
> But I still prefer to refactor the existing stack. Reasons:
> I think it will be more confusing if users see two similar
> stacks, and they may have many problems if using the wrong class.
> For example, we may have two `Schema` and `TableDescriptor` classes. The
> `KafkaConnector` can't be used in legacy `connect()` API,
> the legacy `Kafka` class can't be used in the new `createTemporaryTable()`
> API.
> Besides, the existing API has been 

[jira] [Created] (FLINK-18673) Calling ROW() in a UDF results in UnsupportedOperationException

2020-07-22 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-18673:
-

 Summary: Calling ROW() in a UDF results in 
UnsupportedOperationException
 Key: FLINK-18673
 URL: https://issues.apache.org/jira/browse/FLINK-18673
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Fabian Hueske


Given a UDF {{func}} that accepts a {{ROW(INT, STRING)}} as a parameter, it
cannot be called like this:


{code:sql}
SELECT func(ROW(a, b)) FROM t{code}
while this works:

{code:sql}
SELECT func(r) FROM (SELECT ROW(a, b) AS r FROM t){code}

The exception returned is:
{quote}
org.apache.flink.table.api.ValidationException: SQL validation failed. null
{quote}
with an empty {{UnsupportedOperationException}} as cause.
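
For context, a sketch of what such a UDF could look like in Scala (the class
name and body are hypothetical; the parameter hint follows the type
inference annotations):
{code:scala}
import org.apache.flink.table.annotation.DataTypeHint
import org.apache.flink.table.functions.ScalarFunction
import org.apache.flink.types.Row

class Func extends ScalarFunction {
  // accepts a ROW(INT, STRING) value and returns its string field
  def eval(@DataTypeHint("ROW<a INT, b STRING>") r: Row): String =
    r.getField(1).asInstanceOf[String]
}
{code}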





[jira] [Created] (FLINK-18672) Fix Scala code examples for UDF type inference annotations

2020-07-22 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-18672:
-

 Summary: Fix Scala code examples for UDF type inference annotations
 Key: FLINK-18672
 URL: https://issues.apache.org/jira/browse/FLINK-18672
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Table SQL / API
Reporter: Fabian Hueske


The Scala code examples for the [UDF type inference 
annotations|https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/functions/udfs.html#type-inference]
 are not correct.

For example: the following {{FunctionHint}} annotation 

{code:scala}
@FunctionHint(
  input = Array(@DataTypeHint("INT"), @DataTypeHint("INT")),
  output = @DataTypeHint("INT")
)
{code}


needs to be changed to

{code:scala}
@FunctionHint(
  input = Array(new DataTypeHint("INT"), new DataTypeHint("INT")),
  output = new DataTypeHint("INT")
)
{code}





Re: [REMINDER] Use 'starter' labels for Jira issues where it makes sense

2020-07-22 Thread Yu Li
Thanks for the reminder Andrey. Issues with the 'starter' label are indeed
helpful for new contributors to join the project.

Best Regards,
Yu


On Wed, 22 Jul 2020 at 14:30, Till Rohrmann  wrote:

> Thanks for the reminder Andrey. This is important for helping new
> contributors to find their way around in the Flink project.
>
> Cheers,
> Till
>
> On Mon, Jul 20, 2020 at 5:30 PM Aljoscha Krettek 
> wrote:
>
> > Yes, thanks Andrey! That's a good reminder for everyone. :-)
> >
> > On 20.07.20 16:02, Andrey Zagrebin wrote:
> > > Hi Flink Devs,
> > >
> > > I would like to remind you that we have a 'starter' label [1] to
> annotate
> > > Jira issues which need a contribution and which are not very
> > > complicated for the new contributors. The starter issues can be a good
> > > opportunity for the new contributors who want to learn about Flink but
> do
> > > not know where to start [2].
> > >
> > > When you open a Jira issue, please, pay attention to whether it can be
> a
> > > starter task.
> > > Let's try to help on-boarding new contributors!
> > >
> > > Cheers,
> > > Andrey
> > >
> > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=154995514
> > >   (Labels)
> > > [2]
> > >
> >
> https://flink.apache.org/contributing/contribute-code.html#looking-for-what-to-contribute
> > >
> >
> >
>


[DISCUSS] Planning Flink 1.12

2020-07-22 Thread Robert Metzger
Hi all,

Now that the 1.11 release is out, it is time to plan for the next major
Flink release.

Some items:

   1.

   Dian Fu and I volunteer to be the release managers for Flink 1.12.



   2.

   Timeline: We propose to stick to our approximate 4-month release cycle,
   thus the release should be done by late October. Given that there’s a
   holiday week in China at the beginning of October, I propose to do the
   feature freeze on master by late September.

   3.

   Collecting features: It would be good to have a rough overview of the
   features that will likely be ready to be merged by late September, and that
   we want in the release.
   Based on the discussion, we will update the Roadmap on the Flink website
   again!



   4.

   Test instabilities and blockers: I would like to avoid a situation where
   we have many blocking issues or build instabilities at the time of the
   feature freeze. To achieve that, we will try to check every build
   instability within a week, to decide if it is a blocker (make sure to use
   the “test-stability” label for those tickets!)
   Blocker issues will need to have somebody assigned (responsible) within
   a week, and we want to see progress on all blocker issues (downgrade,
   resolution, a good plan how to proceed if it is more complicated)

   5.

   Quality and stability of new features: In order to have a short feature
   freeze phase, we encourage developers to only merge well-tested and
   documented features. In our experience, the feature freeze works best if
   new features are complete, and the community can focus fully on addressing
   newly found bugs and voting on the release.
   By having a smooth release process, the next merge-window for the next
   release will come sooner.


Let me know what you think about our items, and share which features you
want in Flink 1.12.

Best,

Robert & Dian


Re: [ANNOUNCE] Apache Flink 1.11.1 released

2020-07-22 Thread Konstantin Knauf
Thank you for managing the quick follow up release. I think this was very
important for Table & SQL users.


-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: [ANNOUNCE] Apache Flink 1.11.1 released

2020-07-22 Thread Till Rohrmann
Thanks for being the release manager for the 1.11.1 release, Dian. Thanks a
lot to everyone who contributed to this release.

Cheers,
Till



[jira] [Created] (FLINK-18671) Update upgrade compatibility table (docs/ops/upgrading.md) for 1.11.0

2020-07-22 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-18671:
--

 Summary: Update upgrade compatibility table 
(docs/ops/upgrading.md) for 1.11.0
 Key: FLINK-18671
 URL: https://issues.apache.org/jira/browse/FLINK-18671
 Project: Flink
  Issue Type: Task
  Components: Documentation
Affects Versions: 1.10.0
Reporter: Piotr Nowojski
Assignee: Yu Li
 Fix For: 1.10.0


Update upgrade compatibility table (docs/ops/upgrading.md) for 1.11.0





[jira] [Created] (FLINK-18670) Add (statefun-)docker repos to community page

2020-07-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-18670:


 Summary: Add (statefun-)docker repos to community page
 Key: FLINK-18670
 URL: https://issues.apache.org/jira/browse/FLINK-18670
 Project: Flink
  Issue Type: Improvement
  Components: Project Website
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler








Re: [ANNOUNCE] Apache Flink 1.11.1 released

2020-07-22 Thread Hequn Cheng
Thanks Dian for the great work and thanks to everyone who makes this
release possible!

Best, Hequn



[DISCUSS] Adding Azure Platform Support in DataStream, Table and SQL Connectors

2020-07-22 Thread Israel Ekpo
I have opened the following issues to track new efforts to bring additional
Azure support to the connectors ecosystem in the following areas.

My goal is to add the first two features [FLINK-18562] and [FLINK-18568] to
the existing file system capabilities [1] and then have the other
connectors FLINK-1856[3-7] exist as standalone plugins.

As more users adopt the additional connectors, we could incrementally bring
them into the core code base if necessary with sufficient support.

I am new to the process, so I have a few questions:

Do I need to create a FLIP [2] in order to make these changes to bring the
new capabilities, or are the individual JIRA issues sufficient?

I am thinking about targeting Flink versions 1.10 through 1.12.
For new connectors like this, how many versions can/should this be
integrated into?

Are there any upcoming changes to supported Java and Scala versions that I
need to be aware of?

Any ideas or suggestions you have would be great.

Below is a summary of the JIRA issues that were created to track the effort

Add Support for Azure Data Lake Store Gen 2 in Flink File System
https://issues.apache.org/jira/browse/FLINK-18562

Add Support for Azure Data Lake Store Gen 2 in Streaming File Sink
https://issues.apache.org/jira/browse/FLINK-18568

Add Support for Azure Cosmos DB DataStream Connector
https://issues.apache.org/jira/browse/FLINK-18563

Add Support for Azure Event Hub DataStream Connector
https://issues.apache.org/jira/browse/FLINK-18564

Add Support for Azure Event Grid DataStream Connector
https://issues.apache.org/jira/browse/FLINK-18565

Add Support for Azure Cognitive Search DataStream Connector
https://issues.apache.org/jira/browse/FLINK-18566

Add Support for Azure Cognitive Search Table & SQL Connector
https://issues.apache.org/jira/browse/FLINK-18567


[1] https://github.com/apache/flink/tree/master/flink-filesystems
[2]
https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals


Re: [ANNOUNCE] Apache Flink 1.11.1 released

2020-07-22 Thread Jark Wu
Congratulations! Thanks Dian for the great work and for being the release
manager!

Best,
Jark

On Wed, 22 Jul 2020 at 15:45, Yangze Guo  wrote:

> Congrats!
>
> Thanks Dian Fu for being release manager, and everyone involved!
>
> Best,
> Yangze Guo
>
> On Wed, Jul 22, 2020 at 3:14 PM Wei Zhong  wrote:
> >
> > Congratulations! Thanks Dian for the great work!
> >
> > Best,
> > Wei
> >
> > > On Jul 22, 2020, at 15:09, Leonard Xu  wrote:
> > >
> > > Congratulations!
> > >
> > > Thanks Dian Fu for the great work as release manager, and thanks
> everyone involved!
> > >
> > > Best
> > > Leonard Xu
> > >
> > >> On Jul 22, 2020, at 14:52, Dian Fu  wrote:
> > >>
> > >> The Apache Flink community is very happy to announce the release of
> Apache Flink 1.11.1, which is the first bugfix release for the Apache Flink
> 1.11 series.
> > >>
> > >> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data streaming
> applications.
> > >>
> > >> The release is available for download at:
> > >> https://flink.apache.org/downloads.html
> > >>
> > >> Please check out the release blog post for an overview of the
> improvements for this bugfix release:
> > >> https://flink.apache.org/news/2020/07/21/release-1.11.1.html
> > >>
> > >> The full release notes are available in Jira:
> > >>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12348323
> > >>
> > >> We would like to thank all contributors of the Apache Flink community
> who made this release possible!
> > >>
> > >> Regards,
> > >> Dian
> > >
> >
>


Re: [DISCUSS] FLIP-36 - Support Interactive Programming in Flink Table API

2020-07-22 Thread Kurt Young
Thanks for the reply. I have one more comment about the effect on the
optimizer. Even if you are trying to make the cached table as orthogonal to
the optimizer as possible by introducing a special sink, it is still not
clear why this approach is safe. Maybe you can add a description of the
process from API to JobGraph; otherwise I can't be sure that everyone
reviewing the design doc will have the same picture of it. And I'm also
quite sure some of the existing mechanisms will be affected by this special
sink, e.g. multi-sink optimization.

Best,
Kurt


On Wed, Jul 22, 2020 at 2:31 PM Xuannan Su  wrote:

> Hi Kurt,
>
> Thanks for the comments.
>
> 1. How do you identify the CachedTable?
> For the current design proposed in FLIP-36, we are using the first
> approach you mentioned, where the key of the map is the Cached Table java
> object. I think it is fine that another table representing the same DAG
> cannot be identified and will not use the cached intermediate result,
> because we want to make table caching explicit. As mentioned in the
> FLIP, the cache API will return a Table object. And the user has to use the
> returned Table object to make use of the cached table. The rationale is
> that if the user builds the same DAG from scratch with some
> TableEnvironment instead of using the cached table object, the user
> probably doesn't want to use the cache.
>
> 2. How does the CachedTable affect the optimizer?
> We try to make the logic dealing with the cached table as orthogonal to
> the optimizer as possible. That's why we introduce a special sink when we
> are going to cache a table and a special source when we are going to use a
> cached table. This way, we can let the optimizer do its work, and the
> logic of modifying the job graph can happen in the job graph generator. We
> can recognize the cached node with the special sink and source.
>
> 3. What's the effect of calling TableEnvironment.close()?
> We introduce the close method to prevent leaking of the cached table when
> the user is done with the table environment. Therefore, it makes more sense
> that the table environment, including all of its functionality, should not
> be used after closing. Otherwise, we should rename the close method to
> clearAllCache or something similar.
>
> And thanks for pointing out the use of a non-existent API in the given
> examples. I have updated the examples in the FLIP accordingly.
>
> Best,
> Xuannan
> On Jul 16, 2020, 4:15 PM +0800, Kurt Young , wrote:
> > Hi Xuanna,
> >
> > Thanks for the detailed design doc, it described clearly how the API
> looks
> > and how to interact with Flink runtime.
> > However, the part which relates to SQL's optimizer is kind of blurry. To
> be
> > more precise, I have following questions:
> >
> > 1. How do you identify the CachedTable? I can imagine there would be a map
> > representing the cache; how do you
> > compare the keys of the map? One approach is they will be compared by
> java
> > objects, which is simple but has
> > limited scope. For example, users created another table using some
> > interfaces of TableEnvironment, and the table
> > is exactly the same as the cached one, you won't be able to identify it.
> > Another choice is calculating the "signature" or
> > "diest" of the cached table, which involves string representation of the
> > whole sub tree represented by the cached table.
> > I don't think Flink currently provides such a mechanism around Table
> > though.
> >
> > 2. How does the CachedTable affect the optimizer? Specifically, will you
> > have a dedicated QueryOperation for it, will you have
> > a dedicated logical & physical RelNode for it? And I also don't see a
> > description about how to work with current optimize phases,
> > from Operation to Calcite rel node, and then to Flink's logical and
> > physical node, which will be at last translated to Flink's exec node.
> > There also exist other optimizations such as deadlock breaking, as well
> as
> > sub plan reuse inside the optimizer, I'm not sure whether
> > the logic dealing with cached tables can be orthogonal to all of these.
> > Hence I expect you could have a more detailed description here.
> >
> > 3. What's the effect of calling TableEnvironment.close()? You already
> > explained this would drop all caches this table env has,
> > could you also explain whether other functionality still works for this
> > table env? Like can users still create/drop tables/databases/functions
> > through this table env? What happens to the catalog and all temporary
> > objects of this table env?
> >
> > One minor comment: I noticed you used some not existing API in the
> examples
> > you gave, like table.collect(), which is a little
> > misleading.
> >
> > Best,
> > Kurt
> >
> >
> > On Thu, Jul 9, 2020 at 4:00 PM Xuannan Su  wrote:
> >
> > > Hi folks,
> > >
> > > I'd like to revive the discussion about FLIP-36 Support Interactive
> > > Programming in Flink Table API
> > >
> > >
> 

[jira] [Created] (FLINK-18669) Remove testing-related code from production code in PubSub connector

2020-07-22 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-18669:
--

 Summary: Remove testing-related code from production code in 
PubSub connector
 Key: FLINK-18669
 URL: https://issues.apache.org/jira/browse/FLINK-18669
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Google Cloud PubSub
Reporter: Robert Metzger


(See also: https://github.com/apache/flink/pull/12846#discussion_r457380452)

At least PubSubSink.java contains some testing-related code that could
probably be removed entirely from the production code.
As part of this ticket, we should also check if there are other cases like this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] Apache Flink 1.11.1 released

2020-07-22 Thread Yangze Guo
Congrats!

Thanks Dian Fu for being release manager, and everyone involved!

Best,
Yangze Guo

On Wed, Jul 22, 2020 at 3:14 PM Wei Zhong  wrote:
>
> Congratulations! Thanks Dian for the great work!
>
> Best,
> Wei
>
> > > On Jul 22, 2020, at 15:09, Leonard Xu  wrote:
> >
> > Congratulations!
> >
> > Thanks Dian Fu for the great work as release manager, and thanks everyone 
> > involved!
> >
> > Best
> > Leonard Xu
> >
> >> On Jul 22, 2020, at 14:52, Dian Fu  wrote:
> >>
> >> The Apache Flink community is very happy to announce the release of Apache 
> >> Flink 1.11.1, which is the first bugfix release for the Apache Flink 1.11 
> >> series.
> >>
> >> Apache Flink® is an open-source stream processing framework for 
> >> distributed, high-performing, always-available, and accurate data 
> >> streaming applications.
> >>
> >> The release is available for download at:
> >> https://flink.apache.org/downloads.html
> >>
> >> Please check out the release blog post for an overview of the improvements 
> >> for this bugfix release:
> >> https://flink.apache.org/news/2020/07/21/release-1.11.1.html
> >>
> >> The full release notes are available in Jira:
> >> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12348323
> >>
> >> We would like to thank all contributors of the Apache Flink community who 
> >> made this release possible!
> >>
> >> Regards,
> >> Dian
> >
>


Re: [ANNOUNCE] Apache Flink 1.11.1 released

2020-07-22 Thread Wei Zhong
Congratulations! Thanks Dian for the great work!

Best,
Wei

> On Jul 22, 2020, at 15:09, Leonard Xu  wrote:
> 
> Congratulations!
> 
> Thanks Dian Fu for the great work as release manager, and thanks everyone 
> involved!
> 
> Best
> Leonard Xu
> 
>> On Jul 22, 2020, at 14:52, Dian Fu  wrote:
>> 
>> The Apache Flink community is very happy to announce the release of Apache 
>> Flink 1.11.1, which is the first bugfix release for the Apache Flink 1.11 
>> series.
>> 
>> Apache Flink® is an open-source stream processing framework for distributed, 
>> high-performing, always-available, and accurate data streaming applications.
>> 
>> The release is available for download at:
>> https://flink.apache.org/downloads.html
>> 
>> Please check out the release blog post for an overview of the improvements 
>> for this bugfix release:
>> https://flink.apache.org/news/2020/07/21/release-1.11.1.html
>> 
>> The full release notes are available in Jira:
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12348323
>> 
>> We would like to thank all contributors of the Apache Flink community who 
>> made this release possible!
>> 
>> Regards,
>> Dian
> 



Re: [ANNOUNCE] Apache Flink 1.11.1 released

2020-07-22 Thread Leonard Xu
Congratulations!

Thanks Dian Fu for the great work as release manager, and thanks everyone 
involved!

Best
Leonard Xu

> On Jul 22, 2020, at 14:52, Dian Fu  wrote:
> 
> The Apache Flink community is very happy to announce the release of Apache 
> Flink 1.11.1, which is the first bugfix release for the Apache Flink 1.11 
> series.
> 
> Apache Flink® is an open-source stream processing framework for distributed, 
> high-performing, always-available, and accurate data streaming applications.
> 
> The release is available for download at:
> https://flink.apache.org/downloads.html
> 
> Please check out the release blog post for an overview of the improvements 
> for this bugfix release:
> https://flink.apache.org/news/2020/07/21/release-1.11.1.html
> 
> The full release notes are available in Jira:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12348323
> 
> We would like to thank all contributors of the Apache Flink community who 
> made this release possible!
> 
> Regards,
> Dian



[ANNOUNCE] Apache Flink 1.11.1 released

2020-07-22 Thread Dian Fu
The Apache Flink community is very happy to announce the release of Apache 
Flink 1.11.1, which is the first bugfix release for the Apache Flink 1.11 
series.

Apache Flink® is an open-source stream processing framework for distributed, 
high-performing, always-available, and accurate data streaming applications.

The release is available for download at:
https://flink.apache.org/downloads.html

Please check out the release blog post for an overview of the improvements for 
this bugfix release:
https://flink.apache.org/news/2020/07/21/release-1.11.1.html

The full release notes are available in Jira:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12348323

We would like to thank all contributors of the Apache Flink community who made 
this release possible!

Regards,
Dian

Re: [ANNOUNCE] New PMC member: Piotr Nowojski

2020-07-22 Thread Forward Xu
Congratulations!


Best,

Forward

On Wed, Jul 22, 2020 at 10:45 AM godfrey he  wrote:

> Congratulations!
>
> Best,
> Godfrey
>
> On Tue, Jul 21, 2020 at 10:46 PM Till Rohrmann  wrote:
>
> > Congrats, Piotr!
> >
> > Cheers,
> > Till
> >
> > On Thu, Jul 9, 2020 at 4:15 AM Xingcan Cui  wrote:
> >
> > > Congratulations, Piotr!
> > >
> > > Best, Xingcan
> > >
> > > On Wed, Jul 8, 2020, 21:53 Yang Wang  wrote:
> > >
> > > > Congratulations Piotr!
> > > >
> > > >
> > > > Best,
> > > > Yang
> > > >
> > > > On Wed, Jul 8, 2020 at 10:36 PM Dan Zou  wrote:
> > > >
> > > > > Congratulations!
> > > > >
> > > > > Best,
> > > > > Dan Zou
> > > > >
> > > > > > On Jul 8, 2020, at 5:25 PM, godfrey he  wrote:
> > > > > >
> > > > > > Congratulations
> > > > >
> > > > >
> > > >
> > >
> >
>


[jira] [Created] (FLINK-18668) BytesHashMap#growAndRehash should release newly allocated segments before throwing the exception

2020-07-22 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-18668:
---

 Summary: BytesHashMap#growAndRehash should release newly allocated 
segments before throwing the exception
 Key: FLINK-18668
 URL: https://issues.apache.org/jira/browse/FLINK-18668
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Runtime
Affects Versions: 1.11.0
Reporter: Caizhi Weng


In {{BytesHashMap#growAndRehash}} we have the following code.

{code:java}
List<MemorySegment> newBucketSegments = new ArrayList<>(required);

try {
	int numAllocatedSegments = required - memoryPool.freePages();
	if (numAllocatedSegments > 0) {
		throw new MemoryAllocationException();
	}
	int needNumFromFreeSegments = required - newBucketSegments.size();
	for (int end = needNumFromFreeSegments; end > 0; end--) {
		newBucketSegments.add(memoryPool.nextSegment());
	}

	setBucketVariables(newBucketSegments);
} catch (MemoryAllocationException e) {
	LOG.warn("BytesHashMap can't allocate {} pages, and now used {} pages",
		required, reservedNumBuffers);
	throw new EOFException();
}
{code}

Newly allocated memory segments are temporarily stored in {{newBucketSegments}}
before being handed to the hash table. But if a {{MemoryAllocationException}}
happens, these segments are not returned to the memory pool, causing the
exception stack trace below (a sketch of a possible fix follows the trace).

{code}
java.lang.RuntimeException: 
org.apache.flink.runtime.memory.MemoryAllocationException: Could not allocate 
512 pages
at 
org.apache.flink.table.runtime.util.LazyMemorySegmentPool.nextSegment(LazyMemorySegmentPool.java:84)
 ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at 
org.apache.flink.table.runtime.operators.aggregate.BytesHashMap.growAndRehash(BytesHashMap.java:393)
 ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at 
org.apache.flink.table.runtime.operators.aggregate.BytesHashMap.append(BytesHashMap.java:313)
 ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at HashAggregateWithKeys$360.processElement(Unknown Source) ~[?:?]
at 
org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:161)
 ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at 
org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:178)
 ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at 
org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:153)
 ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at 
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:67)
 ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:345)
 ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxStep(MailboxProcessor.java:191)
 ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:181)
 ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:560)
 ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:530) 
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721) 
[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546) 
[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at java.lang.Thread.run(Thread.java:834) [?:1.8.0_102]
Suppressed: java.lang.RuntimeException: Should return all used memory 
before clean, page used: 2814
at 
org.apache.flink.table.runtime.util.LazyMemorySegmentPool.close(LazyMemorySegmentPool.java:99)
 ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at 
org.apache.flink.table.runtime.operators.aggregate.BytesHashMap.free(BytesHashMap.java:486)
 ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at 
org.apache.flink.table.runtime.operators.aggregate.BytesHashMap.free(BytesHashMap.java:475)
 ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at HashAggregateWithKeys$360.close(Unknown Source) ~[?:?]
at 
org.apache.flink.table.runtime.operators.TableStreamOperator.dispose(TableStreamOperator.java:44)
 ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
at ...
{code}
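
A minimal sketch of one possible fix (assuming the {{returnAll}} method on
{{MemorySegmentPool}}; only the catch block changes):

{code:java}
} catch (MemoryAllocationException e) {
	LOG.warn("BytesHashMap can't allocate {} pages, and now used {} pages",
		required, reservedNumBuffers);
	// Give the already-acquired segments back to the pool before propagating
	// the failure, so that the pool's book-keeping stays consistent on free().
	memoryPool.returnAll(newBucketSegments);
	throw new EOFException();
}
{code}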

Re: thrift support

2020-07-22 Thread Chen Qin
Thanks, Yu, for sharing more background on this.

Jark,

We were able to sync with Yu a bit offline. I think we should reuse the
existing Jira, leave the question of how to reuse code for when we get into
the implementation phase, and continue the discussion here; as a first step,
maybe we can share a Google doc with a detailed list of work items and
options that folks can agree on. Please assign FLINK-11746 to my account.

As Benchao previously pointed out, Flink SQL Thrift support seems likely to
grow beyond a single PR:
- Ser/deser: use Kryo with a customized serializer, or infer POJOs from the
Thrift-generated sources (see the rough sketch after this list).
- TableSchema and type translation: use DDL to match, or use Thrift to infer
the DDL; will nested column pruning work?
- Most online services use either gRPC or Thrift as the service endpoint
definition. Is there a proper way to construct a "table" that interacts
directly with those online services (vs. async I/O)?
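
To make the first item concrete, here is a rough sketch of a Thrift
DeserializationSchema built on the standard libthrift TDeserializer. The
class itself is hypothetical; only the Flink and libthrift APIs used are
real, and the open questions above (protocol choice, error handling, type
integration) are deliberately left simple:

import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.thrift.TBase;
import org.apache.thrift.TDeserializer;
import org.apache.thrift.protocol.TBinaryProtocol;

import java.io.IOException;

// Sketch only: deserializes Thrift binary payloads into the generated class T.
public class ThriftDeserializationSchema<T extends TBase<?, ?>>
        implements DeserializationSchema<T> {

    private final Class<T> thriftClass;
    private transient TDeserializer deserializer; // not serializable, create lazily

    public ThriftDeserializationSchema(Class<T> thriftClass) {
        this.thriftClass = thriftClass;
    }

    @Override
    public T deserialize(byte[] message) throws IOException {
        try {
            if (deserializer == null) {
                deserializer = new TDeserializer(new TBinaryProtocol.Factory());
            }
            T record = thriftClass.newInstance();
            deserializer.deserialize(record, message);
            return record;
        } catch (Exception e) {
            throw new IOException("Failed to deserialize Thrift record", e);
        }
    }

    @Override
    public boolean isEndOfStream(T nextElement) {
        return false;
    }

    @Override
    public TypeInformation<T> getProducedType() {
        // Falls back to generic (Kryo) serialization for the Thrift class.
        return TypeInformation.of(thriftClass);
    }
}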

Thanks,
Chen

On Tue, Jul 21, 2020 at 12:14 PM Yu Yang  wrote:

> Thanks for the discussion. In https://github.com/apache/flink/pull/8067 we
> made an initial attempt at adding thrift-format support in Flink, but
> haven't had time to finish it. Feel free to take it over and make changes.
> I've also linked this discussion thread in
> https://issues.apache.org/jira/browse/FLINK-11746.
>
> Regards,
> -Yu
>
> On Tue, Jul 21, 2020 at 1:14 AM Jark Wu  wrote:
>
> > Thanks Dawid for the link. I have a glance at the PR.
> >
> > I think we can continue the thrift format based on the PR (would be
> better
> > to reach out to the author).
> >
> > Best,
> > Jark
> >
> > On Tue, 21 Jul 2020 at 15:58, Dawid Wysakowicz 
> > wrote:
> >
> > > Hi,
> > >
> > > I've just spotted this PR that might be helpful in the discussion:
> > > https://github.com/apache/flink/pull/8067
> > >
> > > Best,
> > >
> > > Dawid
> > >
> > > On 20/07/2020 04:30, Benchao Li wrote:
> > > > Hi Chen,
> > > >
> > > > Thanks for bringing up this discussion. We have been doing something
> > > > similar internally recently.
> > > >
> > > > Our use case is that many services in our company are built with the
> > > > thrift protocol, and we
> > > > want to support accessing these RPC services natively with Flink SQL.
> > > > Currently, there are two use cases that we aim to support: a thrift
> > > > RPC sink and a thrift RPC
> > > > temporal table (dimension table).
> > > > Our scenario thus requires supporting both (de)serialization with the
> > > > thrift format and accessing
> > > > the thrift RPC service.
> > > >
> > > > On Sun, Jul 19, 2020 at 9:43 AM Jeff Zhang  wrote:
> > > >
> > > >> Hi Chen,
> > > >>
> > > >> Right, this is what I mean. Could you provide more details about the
> > > >> deser/ser work? Giving a concrete example or usage scenario would be
> > > >> helpful.
> > > >>
> > > >>
> > > >>
> > > >>> On Sat, Jul 18, 2020 at 11:09 PM Chen Qin  wrote:
> > > >>
> > > >>> Jeff,
> > > >>>
> > > >>> Are you referring to something like this SPIP?
> > > >>>
> > > >>>
> > > >>
> > >
> >
> https://docs.google.com/document/d/1ug4K5e2okF5Q2Pzi3qJiUILwwqkn0fVQaQ-Q95HEcJQ/edit#heading=h.x97c6tj78zo0
> > > >>> Not at this moment; we are working on the deser/ser work right now.
> > > >>> It would be good to start a discussion and learn whether folks are
> > > >>> working on related areas, and align.
> > > >>>
> > > >>> Chen
> > > >>>
> > > >>> On Sat, Jul 18, 2020 at 6:41 AM Jeff Zhang 
> wrote:
> > > >>>
> > >  Hi Chen,
> > > 
> > >  Are you building something like the Hive thrift server?
> > > 
> > >  On Sat, Jul 18, 2020 at 8:50 AM Chen Qin  wrote:
> > > 
> > > > Hi there,
> > > >
> > > > Here in Pinterest, we utilize thrift end to end in our tech stack. As
> > > > we have been building Flink as a service platform, the team spent time
> > > > working on supporting Flink jobs with thrift format and successfully
> > > > launched a good number of important jobs in Production in H1.
> > > >
> > > > In H2, we are looking at supporting Flink SQL with native Thrift
> > > > support. We have some prototypes already running in development
> > > > settings and plan to move forward on this approach.
> > > >
> > > > In the long run, we thought out-of-the-box thrift format support
> > > > would benefit other folks as well. So the question is whether there
> > > > is already some effort around this space we can sync with?
> > > >
> > > > Chen
> > > > Pinterest Data
> > > >
> > > 
> > >  --
> > >  Best Regards
> > > 
> > >  Jeff Zhang
> > > 
> > > >>
> > > >> --
> > > >> Best Regards
> > > >>
> > > >> Jeff Zhang
> > > >>
> > > >
> > >
> > >
> >
>


Re: [DISCUSS] FLIP-36 - Support Interactive Programming in Flink Table API

2020-07-22 Thread Xuannan Su
Hi Kurt,

Thanks for the comments.

1. How do you identify the CachedTable?
For the current design proposed in FLIP-36, we are using the first approach you 
mentioned, where the key of the map is the Cached Table java object. I think it
is fine that another table representing the same DAG cannot be identified and
will not use the cached intermediate result, because we want to make table
caching explicit. As mentioned in the FLIP, the cache API will return a Table
object. And the user has to use the returned Table object to make use of the 
cached table. The rationale is that if the user builds the same DAG from 
scratch with some TableEnvironment instead of using the cached table object, 
the user probably doesn't want to use the cache.
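
For illustration, here is a minimal usage sketch. The cache() call follows
the API proposed in the FLIP and is not part of any released Flink version;
the query itself is made up:

import static org.apache.flink.table.api.Expressions.$;

// Sketch per FLIP-36: cache() is the proposed API, not a released one.
Table t = tEnv.sqlQuery(
        "SELECT userId, SUM(amount) AS total FROM Orders GROUP BY userId");
Table cached = t.cache();   // marks t to be cached when the job runs
cached.filter($("total").isGreater(100)).execute();  // reuses the cached result
// Rebuilding the same query via tEnv.sqlQuery(...) will NOT hit the cache.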

2. How does the CachedTable affect the optimizer?
We try to make the logic dealing with the cached table as orthogonal to the
optimizer as possible. That's why we introduce a special sink when we are going 
to cache a table and a special source when we are going to use a cached table. 
This way, we can let the optimizer do its work, and the logic of modifying
the job graph can happen in the job graph generator. We can recognize the 
cached node with the special sink and source.

3. What's the effect of calling TableEnvironment.close()?
We introduce the close method to prevent leaking of the cached table when the 
user is done with the table environment. Therefore, it makes more sense that 
the table environment, including all of its functionality, should not be used 
after closing. Otherwise, we should rename the close method to clearAllCache or 
something similar.

And thanks for pointing out the use of a non-existent API in the given
examples. I have updated the examples in the FLIP accordingly.

Best,
Xuannan
On Jul 16, 2020, 4:15 PM +0800, Kurt Young , wrote:
> Hi Xuanna,
>
> Thanks for the detailed design doc, it described clearly how the API looks
> and how to interact with Flink runtime.
> However, the part which relates to SQL's optimizer is kind of blurry. To be
> more precise, I have following questions:
>
> 1. How do you identify the CachedTable? I can imagine there would be a map
> representing the cache; how do you
> compare the keys of the map? One approach is they will be compared by java
> objects, which is simple but has
> limited scope. For example, users created another table using some
> interfaces of TableEnvironment, and the table
> is exactly the same as the cached one, you won't be able to identify it.
> Another choice is calculating the "signature" or
> "diest" of the cached table, which involves string representation of the
> whole sub tree represented by the cached table.
> I don't think Flink currently provides such a mechanism around Table
> though.
>
> 2. How does the CachedTable affect the optimizer? Specifically, will you
> have a dedicated QueryOperation for it, will you have
> a dedicated logical & physical RelNode for it? And I also don't see a
> description about how to work with current optimize phases,
> from Operation to Calcite rel node, and then to Flink's logical and
> physical node, which will be at last translated to Flink's exec node.
> There also exist other optimizations such as deadlock breaking, as well as
> sub plan reuse inside the optimizer, I'm not sure whether
> the logic dealing with cached tables can be orthogonal to all of these.
> Hence I expect you could have a more detailed description here.
>
> 3. What's the effect of calling TableEnvironment.close()? You already
> explained this would drop all caches this table env has,
> could you also explain whether other functionality still works for this table
> env? Like can users still create/drop tables/databases/functions
> through this table env? What happens to the catalog and all temporary
> objects of this table env?
>
> One minor comment: I noticed you used some not existing API in the examples
> you gave, like table.collect(), which is a little
> misleading.
>
> Best,
> Kurt
>
>
> On Thu, Jul 9, 2020 at 4:00 PM Xuannan Su  wrote:
>
> > Hi folks,
> >
> > I'd like to revive the discussion about FLIP-36 Support Interactive
> > Programming in Flink Table API
> >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink
> >
> > The FLIP proposes to add support for interactive programming in Flink
> > Table API. Specifically, it lets users cache the intermediate
> > results(tables) and use them in the later jobs to avoid recomputing the
> > intermediate result(tables).
> >
> > I am looking forward to any opinions and suggestions from the community.
> >
> > Best,
> > Xuannan
> > On May 7, 2020, 5:40 PM +0800, Xuannan Su , wrote:
> > > Hi,
> > >
> > > There are some feedbacks from @Timo and @Kurt in the voting thread for
> > FLIP-36 and I want to share my thoughts here.
> > >
> > > 1. How would the FLIP-36 look like after FLIP-84?
> > > I don't think FLIP-84 will affect FLIP-36 from the public API
> > perspective. 

Re: [REMINDER] Use 'starter' labels for Jira issues where it makes sense

2020-07-22 Thread Till Rohrmann
Thanks for the reminder Andrey. This is important for helping new
contributors to find their way around in the Flink project.

Cheers,
Till

On Mon, Jul 20, 2020 at 5:30 PM Aljoscha Krettek 
wrote:

> Yes, thanks Andrey! That's a good reminder for everyone. :-)
>
> On 20.07.20 16:02, Andrey Zagrebin wrote:
> > Hi Flink Devs,
> >
> > I would like to remind you that we have a 'starter' label [1] to annotate
> > Jira issues which need a contribution and which are not very
> > complicated for new contributors. The starter issues can be a good
> > opportunity for new contributors who want to learn about Flink but do
> > not know where to start [2].
> >
> > When you open a Jira issue, please pay attention to whether it can be a
> > starter task.
> > Let's try to help on-boarding new contributors!
> >
> > Cheers,
> > Andrey
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=154995514
> >   (Labels)
> > [2]
> >
> https://flink.apache.org/contributing/contribute-code.html#looking-for-what-to-contribute
> >
>
>