[jira] [Created] (FLINK-29858) Jdbc reading supports setting multiple queryTemplates

2022-11-02 Thread waywtdcc (Jira)
waywtdcc created FLINK-29858:


 Summary: Jdbc reading supports setting multiple queryTemplates
 Key: FLINK-29858
 URL: https://issues.apache.org/jira/browse/FLINK-29858
 Project: Flink
  Issue Type: New Feature
  Components: Connectors / JDBC
Affects Versions: 1.16.0
Reporter: waywtdcc
 Fix For: 1.17.0


Jdbc reading supports setting multiple queryTemplates. Currently, JDBC reading
only supports a single query template (the {{queryTemplate}} field in
{{JdbcRowDataInputFormat}}), which is sometimes not enough.
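
For illustration, a rough sketch of what the requested API could look like (the builder here stands for a JdbcRowDataInputFormat-style builder, and {{setQueryTemplates}} is purely hypothetical, not existing Flink API):

{code:java}
import java.util.Arrays;

// Today a single template drives the whole read:
builder.setQuery("SELECT id, name FROM orders_2022 WHERE id BETWEEN ? AND ?");

// Hypothetical multi-template variant this issue asks for: templates are read
// in turn, e.g. to union several tables or partitions in one source.
builder.setQueryTemplates(Arrays.asList(
        "SELECT id, name FROM orders_2021 WHERE id BETWEEN ? AND ?",
        "SELECT id, name FROM orders_2022 WHERE id BETWEEN ? AND ?"));
{code}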



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Issue tracking workflow

2022-11-02 Thread Xintong Song
Thanks all for the valuable feedback, opinions and suggestions.

# Option 1.
I know this is the first choice for pretty much everyone. Many people from
the Flink community (including myself) have shared their opinion with
Infra. However, based on the feedback so far, TBH I don't think things
would turn out the way we want. I don't see what else we can do. Does
anyone have more suggestions on this option? Otherwise, we probably have to
scratch it off the list.

# Option 4.
There also seem to be quite a few concerns about using solely GH issues: limited
features (thus the significant changes to the current issue/release
management processes), migration cost, source of truth, etc. I think I'm
also convinced that this may not be a good choice.

# Option 2 & 3.
Between the two options, I'm leaning towards option 2.
- IMO, making it as easy as possible for users to report issues should be a
top priority. Having to wait for a human response for creating an account
does not meet that requirement. That makes a strong objection to option 3
from my side.
- Using GH issues for consumer-facing issues and reflecting the valid ones
back to Jira (either manually by committers or by bot) sounds good to me.
The status (open/closed) and labels should make tracking the issues easier
compared to in mailing lists / slack, in terms of whether an issue has been
taken care of / reflected to Jira / closed as invalid. That does not mean
we should not reflect things from mailing lists / slack to Jira. Ideally,
we leverage every possible channel for collecting user issues / feedback,
while guiding / suggesting users to choose GH issues over the others.
- For new contributors, they still need to request an account from a PMC
member. They can even make that request on GH issues, if they do not mind
posting the email address publicly.
- I would not be worried very much about the privacy issue, if the Jira
account creation is restricted to contributors. Contributors are exposing
their email addresses publicly anyway, in dev@ mailing list and commit
history. I'm also not strongly against creating a dedicated mailing list
though.

Best,

Xintong



On Wed, Nov 2, 2022 at 9:16 PM Chesnay Schepler  wrote:

> Calcite just requested a separate mailing list for users to request a
> JIRA account.
>
>
> I think I'd try going a similar route. While I prefer the openness of
> github issues, they are really limited, and while some things can be
> replicated with labels (like fix versions / components), things like
> release notes can't.
> We'd also lose a central place for collecting issues, since we'd have to
> (?) scope issues per repo.
>
> I wouldn't want to import everything into GH issues (it's just a flawed
> approach in the long-term imo), but on the other hand I don't know if
> the auto linker even works if it has to link to either jira or a GH issue.
>
> Given that we need to change workflows in any case, I think I'd prefer
> sticking to JIRA.
> For reported bugs I'd wager that in most cases we can file the tickets
> ourselves and communicate with users on slack/MLs to gather all the
> information. I'd argue that if we had been more proactive with filing
> tickets for user issues (instead of relying on them to do it) we
> would've addressed several issues way sooner.
>
> Additionally, since either option would be a sort of experiment,
> JIRA is the safer option. We have to change less and there aren't any
> long-term ramifications (like having to re-import GH tickets into JIRA).
>
> On 28/10/2022 16:49, Piotr Nowojski wrote:
> > Hi,
> >
> > I'm afraid of the cost of migrating to GitHub issues only, and of losing
> > many features that we are currently using. That would be very disruptive
> > and annoying. For me, GitHub issues are far worse than Jira.
> >
> > I would strongly prefer Option 1. over the others. Option 4 I would like
> > the least. I would be fine with Option 3, and Option 2 but assuming that
> > Jira would stay the source of truth.
> > For option 2, maybe we could have a bot that would backport/copy user
> > created issues in github to Jira (and link them together)? Discussions
> > could still happen in the github, but we could track all of the issues as
> > we are doing right now. The bot could also sync the other way around (like
> > marking tickets closed, affected/fixed versions, etc.).
> >
> > Best,
> > Piotrek
> >
> > On Thu, 27 Oct 2022 at 07:48, Martijn Visser wrote:
> >
> >> Hi,
> >>
> >> We have to keep in mind that if a user asks for a new Jira account, that
> >> person will need to provide their email address, which means the Flink PMC
> >> is processing personally identifiable information. There needs to be a
> >> careful process for that and, to be honest, I don't think the ASF should
> >> do this from a privacy perspective.
> >>
> >> As an example, the Calcite community decided to create a dedicated, private
> >> list where users can ask for an account to avoid making the email address
> >> public.
> >>
> 

[jira] [Created] (FLINK-29857) Fix HiveServer2Endpoint crash when using Hive3 Beeline

2022-11-02 Thread yuzelin (Jira)
yuzelin created FLINK-29857:
---

 Summary: Fix HiveServer2Endpoint crash when using Hive3 Beeline
 Key: FLINK-29857
 URL: https://issues.apache.org/jira/browse/FLINK-29857
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.16.0
Reporter: yuzelin
 Fix For: 1.16.1






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Flink release retro

2022-11-02 Thread Xingbo Huang
Thanks for starting the discussion, Matthias!

When I was the 1.16 release manager, I checked the dev mailing list to see
what the release managers of the previous two versions did throughout the
release cycle. In release 1.16, after each release sync meeting, we would
send a summary email, which can help capture how the release cycle has
progressed. So we may not really need a dedicated retrospective meeting.
But I think it's a good idea to have a wiki to summarize and review what
can be improved after we finish a release. For example, the release cycle
for 1.16 was also longer than expected. It took almost three months from
the feature freeze to our final official release, a month longer than
expected.

Best,
Xingbo

On Thu, Nov 3, 2022 at 04:56, Jing Ge wrote:

> Hi all,
>
> I figure it is a good idea and +1 for the async retro. More developers will
> learn what the release process looks like, which will give them
> context to engage in future releases. It would be great if the conversation
> could somehow follow the traditional retro pattern, e.g. tagged with
> "Liked, Learned, Lacked, and Longed for".
>
> Best regards,
> Jing
>
> On Wed, Nov 2, 2022 at 11:21 AM Martijn Visser 
> wrote:
>
> > Hi Matthias,
> >
> > I think it's a good idea to capture how this release cycle has
> progressed.
> > I'm not sure that a classical "retrospective" is the best solution, since
> > it would require multiple people in different timezones to attend a
> virtual
> > meeting.
> >
> > So I would +1 an async retrospective, which could cover the questions that
> > you would normally ask during a retrospective, but now via a questionnaire.
> > It probably makes sense to have a proposal of the questions that can be
> > asked, discuss them and then send them out.
> >
> > WDYT?
> >
> > Thanks,
> >
> > Martijn
> >
> > On Wed, Nov 2, 2022 at 9:42 AM Qingsheng Ren  wrote:
> >
> > > Thanks for starting the discussion Matthias!
> > >
> > > I think having a retro after a release cycle would be quite helpful in
> > > standardizing the release procedure, and could also help new release
> > > managers avoid getting stuck on the same issues that happened before. I
> > > prefer the second option: RMs could open a discussion thread on the ML at
> > > the end of the release to collect feedback about the last release cycle
> > > and add it to the release wiki page, which would be quite handy for
> > > future RMs.
> > >
> > > Best,
> > > Qingsheng
> > > Ververica (Alibaba)
> > >
> > > On Mon, Oct 31, 2022 at 11:02 PM Matthias Pohl
> > >  wrote:
> > >
> > > > Hi everyone,
> > > > I want to bring up the idea of having a retrospective on the release
> > from
> > > > the release manager's perspective. The idea would be to collect
> > feedback
> > > on
> > > > what went well and what could be improved for a specific minor
> release.
> > > So
> > > > far, I didn't find anything on that topic. Does the community find
> this
> > > > useful? Or was this already done but not helpful?
> > > >
> > > > I see three options here:
> > > > 1. Having an actual meeting where issues can be discussed and/or
> > > > experiences can be shared between the release managers of the
> previous
> > > > release and the release managers of the next minor release. Of
> course,
> > > this
> > > > could be open to other contributors as well. A summary could be
> > provided
> > > in
> > > > the Flink wiki (the Flink release's wiki article).
> > > > 2. The release manager(s) provide a summary on the Flink release's
> wiki
> > > > article as part of the release process.
> > > > 3. Leave the process as is without any additional retrospective but
> > focus
> > > > on improving the documentation if issues arose during the release.
> > > >
> > > > That might help people who consider contributing to the community
> > through
> > > > supporting the release efforts. Additionally, it might help in
> > > > understanding what went wrong in past releases retroactively (e.g.
> the
> > > > longer release cycle for 1.15).
> > > >
> > > > I'm curious about opinions on that topic.
> > > >
> > > > Best,
> > > > Matthias
> > > >
> > >
> >
>


[jira] [Created] (FLINK-29856) Triggering savepoint does not trigger source operator checkpoint

2022-11-02 Thread Mason Chen (Jira)
Mason Chen created FLINK-29856:
--

 Summary: Triggering savepoint does not trigger source operator 
checkpoint 
 Key: FLINK-29856
 URL: https://issues.apache.org/jira/browse/FLINK-29856
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Checkpointing
Affects Versions: 1.16.0
Reporter: Mason Chen


When I trigger a savepoint with the Flink K8s operator, I verified that two
sources (KafkaSource and MultiClusterKafkaSource) do not invoke snapshotState
or notifyCheckpointComplete. This is easily reproducible in a simple pipeline
(e.g. KafkaSource -> print).
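
A minimal repro sketch along those lines (the topic, bootstrap servers and job name are illustrative assumptions):

{code:java}
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SavepointRepro {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // interval checkpoints log snapshotState as expected...
        env.enableCheckpointing(60_000L);

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setTopics("input-topic")
                .setGroupId("savepoint-repro")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        // ...but a savepoint triggered through the operator reportedly does not
        // reach the source's snapshotState / notifyCheckpointComplete.
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka").print();
        env.execute("savepoint-repro");
    }
}
{code}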

 

However, when a checkpoint occurs via the regular checkpointing interval, I do
see the sources checkpointing properly and the expected logs in the output.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29855) UDF randomly processed input data twice

2022-11-02 Thread Xinyi Yan (Jira)
Xinyi Yan created FLINK-29855:
-

 Summary: UDF randomly processed input data twice 
 Key: FLINK-29855
 URL: https://issues.apache.org/jira/browse/FLINK-29855
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.14.4
Reporter: Xinyi Yan
 Attachments: IntInputUdf.java, SpendReport.java, example.log

 

To reproduce the issue:
 # create a datagen table with a single INT column, id.
 # create a UDF that only mods the input data, with logging statements.
 # create a print table that prints the results.
 # insert data into the print table, applying the UDF to the input id column
from the datagen table.

The logging shows that some of the data has been processed twice, which is not
expected. Processing the same input twice would completely change the behavior
of any UDF with side effects. I also attached the main and UDF classes, as well
as the logging file, for additional info.

 

DDL

 
{code:java}
public static void main(String[] args) throws Exception {
    EnvironmentSettings settings = EnvironmentSettings.newInstance().build();

    TableEnvironment tEnv = TableEnvironment.create(settings);

    tEnv.executeSql("CREATE FUNCTION IntInputUdf AS 'org.apache.flink.playgrounds.spendreport.IntInputUdf'");

    tEnv.executeSql("CREATE TABLE datagenTable (\n" +
            "    id  INT\n" +
            ") WITH (\n" +
            "    'connector' = 'datagen',\n" +
            "    'number-of-rows' = '100',\n" +
            "    'rows-per-second' = '1'\n" +
            ")");

    tEnv.executeSql("CREATE TABLE print_table (\n" +
            "    id_in_bytes  VARBINARY,\n" +
            "    id  INT\n" +
            ") WITH (\n" +
            "    'connector' = 'print'\n" +
            ")");

    tEnv.executeSql("INSERT INTO print_table SELECT * FROM ( SELECT IntInputUdf(`id`) AS `id_in_bytes`, `id` FROM datagenTable ) AS ET WHERE ET.`id_in_bytes` IS NOT NULL");
}
{code}
 

UDF

 
{code:java}
import java.nio.charset.StandardCharsets;

import org.apache.flink.table.annotation.DataTypeHint;
import org.apache.flink.table.functions.ScalarFunction;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class IntInputUdf extends ScalarFunction {

    private static final Logger LOG = LoggerFactory.getLogger(IntInputUdf.class);

    public @DataTypeHint("BYTES") byte[] eval(@DataTypeHint("INT") Integer inputNum) {
        byte[] results = inputNum.toString().getBytes(StandardCharsets.UTF_8);
        if (inputNum % 2 == 0) {
            LOG.info("### ### input bytes {} and num {}.   ### ### DEBUG ### ### duplicated call??? ### DEBUG  ### ### ", results, inputNum);
            return results;
        }
        LOG.info("*** *** input bytes {} and num {}.", results, inputNum);
        return null;
    }
}
{code}
output

 

 
{code:java}
2022-11-02 13:38:56,765 INFO  org.apache.flink.playgrounds.spendreport.IntInputUdf [] - ### ### input bytes [45, 49, 51, 50, 52, 56, 51, 54, 53, 48, 50] and num -1324836502.   ### ### DEBUG ### ### duplicated call??? ### DEBUG  ### ###
2022-11-02 13:38:56,766 INFO  org.apache.flink.playgrounds.spendreport.IntInputUdf [] - ### ### input bytes [45, 49, 51, 50, 52, 56, 51, 54, 53, 48, 50] and num -1324836502.   ### ### DEBUG ### ### duplicated call??? ### DEBUG  ### ###
2022-11-02 13:38:57,761 INFO  org.apache.flink.playgrounds.spendreport.IntInputUdf [] - ### ### input bytes [49, 48, 56, 53, 52, 53, 54, 53, 52, 50] and num 1085456542.   ### ### DEBUG ### ### duplicated call??? ### DEBUG  ### ###
2022-11-02 13:38:57,763 INFO  org.apache.flink.playgrounds.spendreport.IntInputUdf [] - ### ### input bytes [49, 48, 56, 53, 52, 53, 54, 53, 52, 50] and num 1085456542.   ### ### DEBUG ### ### duplicated call??? ### DEBUG  ### ###
2022-11-02 13:38:58,760 INFO  org.apache.flink.playgrounds.spendreport.IntInputUdf [] - ### ### input bytes [49, 53, 48, 54, 51, 49, 49, 57, 53, 52] and num 1506311954.   ### ### DEBUG ### ### duplicated call??? ### DEBUG  ### ###
2022-11-02 13:38:58,761 INFO  org.apache.flink.playgrounds.spendreport.IntInputUdf [] - ### ### input bytes [49, 53, 48, 54, 51, 49, 49, 57, 53, 52] and num 1506311954.   ### ### DEBUG ### ### duplicated call??? ### DEBUG  ### ###
2022-11-02 13:38:59,759 INFO  org.apache.flink.playgrounds.spendreport.IntInputUdf [] - *** *** input bytes [45, 49, 56, 48, 48, 54, 57, 48, 52, 51, 55] and num -1800690437.
2022-11-02 13:39:00,761 INFO  org.apache.flink.playgrounds.spendreport.IntInputUdf [] - *** *** input bytes [49, 52, 50, 56, 56, 55, 55, 52, 56, 51] and num 1428877483.
2022-11-02 13:39:01,761 INFO  org.apache.flink.playgrounds.spendreport.IntInputUdf [] - ### ### input bytes [45, 49, 55, 57, 52, 50, 54, 51, 54, 56, 54] and num -1794263686.   ### ### DEBUG ### ### duplicated call??? ### DEBUG  ### ###
2022-11-02 13:39:01,761 INFO  org.apache.flink.playgrounds.spendreport.IntInputUdf [] - ### ### input bytes [45, 49, 55, 57, 52, 50, 54, 51, 54, 56, 54] and num -1794263686.   ### ### DEBUG ### ### duplicated call??? ### DEBUG  ### ###
2022-11-02 13:39:02,760 INFO

Re: [DISCUSS] Flink release retro

2022-11-02 Thread Jing Ge
Hi all,

I figure it is a good idea and +1 for the async retro. More developers will
learn what the release process looks like, which will give them
context to engage in future releases. It would be great if the conversation
could somehow follow the traditional retro pattern, e.g. tagged with
"Liked, Learned, Lacked, and Longed for".

Best regards,
Jing

On Wed, Nov 2, 2022 at 11:21 AM Martijn Visser 
wrote:

> Hi Matthias,
>
> I think it's a good idea to capture how this release cycle has progressed.
> I'm not sure that a classical "retrospective" is the best solution, since
> it would require multiple people in different timezones to attend a virtual
> meeting.
>
> So I would +1 an async retrospective, which could cover the questions that you
> would normally ask during a retrospective, but now via a questionnaire.
> It probably makes sense to have a proposal of the questions that can be
> asked, discuss them and then send them out.
>
> WDYT?
>
> Thanks,
>
> Martijn
>
> On Wed, Nov 2, 2022 at 9:42 AM Qingsheng Ren  wrote:
>
> > Thanks for starting the discussion Matthias!
> >
> > I think having a retro after a release cycle would be quite helpful in
> > standardizing the release procedure, and could also help new release
> > managers avoid getting stuck on the same issues that happened before. I
> > prefer the second option: RMs could open a discussion thread on the ML at
> > the end of the release to collect feedback about the last release cycle and
> > add it to the release wiki page, which would be quite handy for future
> > RMs.
> >
> > Best,
> > Qingsheng
> > Ververica (Alibaba)
> >
> > On Mon, Oct 31, 2022 at 11:02 PM Matthias Pohl
> >  wrote:
> >
> > > Hi everyone,
> > > I want to bring up the idea of having a retrospective on the release
> from
> > > the release manager's perspective. The idea would be to collect
> feedback
> > on
> > > what went well and what could be improved for a specific minor release.
> > So
> > > far, I didn't find anything on that topic. Does the community find this
> > > useful? Or was this already done but not helpful?
> > >
> > > I see three options here:
> > > 1. Having an actual meeting where issues can be discussed and/or
> > > experiences can be shared between the release managers of the previous
> > > release and the release managers of the next minor release. Of course,
> > this
> > > could be open to other contributors as well. A summary could be
> provided
> > in
> > > the Flink wiki (the Flink release's wiki article).
> > > 2. The release manager(s) provide a summary on the Flink release's wiki
> > > article as part of the release process.
> > > 3. Leave the process as is without any additional retrospective but
> focus
> > > on improving the documentation if issues arose during the release.
> > >
> > > That might help people who consider contributing to the community
> through
> > > supporting the release efforts. Additionally, it might help in
> > > understanding what went wrong in past releases retroactively (e.g. the
> > > longer release cycle for 1.15).
> > >
> > > I'm curious about opinions on that topic.
> > >
> > > Best,
> > > Matthias
> > >
> >
>


[jira] [Created] (FLINK-29854) Make Record Size Flush Strategy Optional for Async Sink

2022-11-02 Thread Danny Cranmer (Jira)
Danny Cranmer created FLINK-29854:
-

 Summary: Make Record Size Flush Strategy Optional for Async Sink
 Key: FLINK-29854
 URL: https://issues.apache.org/jira/browse/FLINK-29854
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Common
Reporter: Danny Cranmer


h3. Background

Currently AsyncSinkWriter supports three mechanisms that trigger a flush to the 
destination:
 * Time based
 * Batch size in bytes
 * Number of records in the batch

For "batch size in bytes" one must implement 
[getSizeInBytes|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/sink/writer/AsyncSinkWriter.java#L202]
 in order for the base to calculate the total batch size. In some cases 
computing the batch size within the AsyncSinkWriter is an expensive operation, 
or not possible. For example, the DynamoDB connector needs to determine the 
serialized size of {{DynamoDbWriteRequest}}.

h3. Scope

Add a feature to make "size in bytes" support optional; this includes:
- Connectors will not be required to implement {{getSizeInBytes}}
- Batches will not be validated for max size
- Records will not be validated for size

The sink implementer can decide if it is appropriate to enable this feature.
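
For context, a simplified sketch of today's contract (the DynamoDbWriteRequest parameter and the serializedSizeOf helper are illustrative assumptions, not the actual connector code):

{code:java}
// Every AsyncSinkWriter subclass must currently provide this; for some
// destinations the only way to answer is to serialize the request entry,
// which is exactly the expensive step this issue wants to make avoidable.
@Override
protected long getSizeInBytes(DynamoDbWriteRequest requestEntry) {
    // hypothetical helper: serializes the request just to measure it
    return serializedSizeOf(requestEntry);
}
{code}

With the proposed feature enabled, a connector could skip implementing this method, and the writer would simply not enforce byte-based flushing or record-size validation.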







--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Release Apache Flink Elasticsearch connector 3.0.0, rc1

2022-11-02 Thread Danny Cranmer
Hey,

It is very exciting to see the first RC for an externalized connector!
Thanks for all the effort setting up the release scripts and processes
Chesnay.

Just to confirm before I start verifying this, will there be an RC2 to bump
the Jackson version?

Danny,

On Wed, Nov 2, 2022 at 6:22 PM Chesnay Schepler  wrote:

> Yeah we should bump that to be closer to the connector version released
> with 1.16.0.
>
> On 02/11/2022 15:53, Sergey Nuyanzin wrote:
> > still checking
> > however there is at least one finding I would like to highlight
> > currently elasticsearch connector depends on jackson-bom 2.13.2.20220328
> > which has 2 CVEs CVE-2022-42003[1] CVE-2022-42004[2] fixed in
> > 2.13.4.20221013 [3]
> > Does it make sense to include it in this version?
> >
> > [1] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-42003
> > [2] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-42004
> > [3]
> >
> https://github.com/FasterXML/jackson-databind/issues/3590#issue-1362567066
> >
> > On Wed, Nov 2, 2022 at 12:01 PM Chesnay Schepler 
> wrote:
> >
> >> Hi everyone,
> >> Please review and vote on the release candidate #1 for the version
> >> 3.0.0, as follows:
> >> [ ] +1, Approve the release
> >> [ ] -1, Do not approve the release (please provide specific comments)
> >>
> >> The complete staging area is available for your review, which includes:
> >> * JIRA release notes [1],
> >> * the official Apache source release to be deployed to dist.apache.org
> >> [2], which are signed with the key with fingerprint C2EED7B111D464BA
> [3],
> >> * all artifacts to be deployed to the Maven Central Repository [4],
> >> * source code tag [5],
> >> * website pull request listing the new release [6].
> >>
> >> The vote will be open for at least 72 hours. It is adopted by majority
> >> approval, with at least 3 PMC affirmative votes.
> >>
> >> Note: This is the first release of an externalized connector, relying on
> >> a new set of scripts. Double-check _everything_.
> >>
> >>Thanks,
> >> Release Manager
> >>
> >> [1] https://issues.apache.org/jira/projects/FLINK/versions/12352291
> >> [2]
> >>
> >>
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-elasticsearch-3.0.0-rc1/
> >> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> >> [4]
> >> https://repository.apache.org/content/repositories/orgapacheflink-1543/
> >> [5]
> >>
> >>
> https://github.com/apache/flink-connector-elasticsearch/releases/tag/v3.0.0-rc1
> >> [6] https://github.com/apache/flink-web/pull/579
> >>
> >>
>
>


Re: [DISCUSS] FLIP-264 Extract BaseCoordinatorContext

2022-11-02 Thread gang ye
Hi Max and Qingsheng,

Thanks for the feedback. The initial motivation for proposing this is to
reduce duplicated code, since the ShuffleCoordinator would need communication
logic similar to SourceCoordinator's to talk with operators. I understand your
concern that OperatorCoordinator is an internal class and that, apart from
SourceCoordinator, nothing else uses it for now.
How about we do what Qingsheng suggested? I can go ahead with the
ShuffleCoordinator implementation without the extraction. Then we will have a
concrete sense of how much code is copied and could be reused. If we feel
that there is still a need to extract, we can revisit the discussion.
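
To make the duplication concrete, here is a rough sketch (all names hypothetical, not the FLIP-264 API) of the statistics round-trip a ShuffleCoordinator needs, mirroring plumbing that SourceCoordinatorContext already implements:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch: the coordinator aggregates per-subtask traffic
    // statistics and broadcasts the global view back to all subtasks.
    class ShuffleStatisticsCoordinator {
        private final Map<Integer, Long> bytesBySubtask = new HashMap<>();

        // called when a subtask sends its periodic local statistics
        void onLocalStatistics(int subtaskId, long localBytes) {
            bytesBySubtask.put(subtaskId, localBytes);
            long globalBytes =
                    bytesBySubtask.values().stream().mapToLong(Long::longValue).sum();
            broadcast(globalBytes);
        }

        // stand-in for the operator-event channel SourceCoordinatorContext provides
        private void broadcast(long globalBytes) {
            System.out.println("global traffic estimate: " + globalBytes + " bytes");
        }
    }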

Thanks
Gang



On Wed, Nov 2, 2022 at 12:21 AM Qingsheng Ren  wrote:

> Thanks Gang and Steven for the FLIP. Actually I share the same concern
> with Piotr and Maximilian.
>
> OperatorCoordinator is marked as @Internal intentionally considering some
> existing issues, like consistency between non-source operator and
> coordinator on checkpoint. I'm wondering if it is useful to expose a public
> context to developers but have the OperatorCoordinator as an internal API.
> If we finally close all issues and decide to expose the operator
> coordinator API, it would be a better chance to include the base context as
> a part of it.
>
> Best,
> Qingsheng
>
> On Tue, Nov 1, 2022 at 8:29 PM Maximilian Michels  wrote:
>
>> Thanks Steven! My confusion stemmed from the lack of context in the FLIP.
>> The first version did not lay out how the refactoring would be used down
>> the line, e.g. by the ShuffleCoordinator. The OperatorCoordinator API is a
>> non-public API and before reading the code, I wasn't even aware how
>> exactly
>> it worked and whether it would be available to regular operators (it was
>> originally intended for sources only).
>>
>> I might seem pedantic here but I believe the purpose of a FLIP should be
>> to
>> describe the *why* behind the changes, not only the changes themselves. A FLIP
>> is not a formality but is a tool to communicate and discuss changes. I
>> think we still haven't laid out the exact reasons why we are factoring out
>> the base. As far as I understand now, we need the base class to deal with
>> concurrent updates in the custom Coordinator from the runtime (sub)tasks.
>> Effectively, we are enforcing an actor model for the processing of the
>> incoming messages such that the OperatorCoordinator can cleanly update its
>> state. However, if there are no actual implementations that make use of
>> the
>> refactoring in Flink itself, I wonder if it would make sense to copy this
>> code to the downstream implementation, e.g. the ShuffleCoordinator. As
>> soon
>> as it is part of Flink, we could of course try to consolidate this code.
>>
>> Considering the *how* of this, there appear to be both methods from
>> SourceCoordinator (e.g. runInEventLoop) as well as
>> SourceCoordinatorContext
>> listed in the FLIP, as well as methods which do not appear anywhere in
>> Flink code, e.g. subTaskReady / subTaskNotReady / sendEventToOperator. It
>> appears that some of this has been extracted from a downstream
>> implementation. It would be great to adjust this, such that it reflects
>> the
>> status quo in Flink.
>>
>> -Max
>>
>> On Fri, Oct 28, 2022 at 5:53 AM Steven Wu  wrote:
>>
>> > Max,
>> >
>> > Thanks a lot for the comments. We should clarify that the shuffle
>> > operator/coordinator is not really part of the Flink sink
>> > function/operator. shuffle operator is a custom operator that can be
>> > inserted right before the Iceberg writer operator. Shuffle operator
>> > calculates the traffic statistics and performs a custom
>> partition/shuffle
>> > (DataStream#partitionCustom) to cluster the data right before they get
>> to
>> > the Iceberg writer operator.
>> >
>> > We are not proposing to introduce a sink coordinator for the sink
>> > interface. Shuffle operator needs the CoordinatorContextBase to
>> > facilitate the communication btw shuffle subtasks and coordinator for
>> > traffic statistics aggregation. The communication part is already
>> > implemented by SourceCoordinatorContext.
>> >
>> > Here are some details about the communication needs.
>> > - subtasks periodically calculate local statistics and send to the
>> > coordinator for global aggregation
>> > - the coordinator sends the globally aggregated statistics to the
>> subtasks
>> > - subtasks use the globally aggregated statistics to guide the
>> > partition/shuffle decision
>> >
>> > Regards,
>> > Steven
>> >
>> > On Thu, Oct 27, 2022 at 5:38 PM Maximilian Michels 
>> wrote:
>> >
>> > > Hi Gang,
>> > >
>> > > Looks much better! I've actually gone through the OperatorCoordinator
>> > code.
>> > > It turns out, any operator already has an OperatorCoordinator
>> assigned.
>> > > Also, any operator can add custom coordinator code. So it looks like
>> you
>> > > won't have to implement any additional runtime logic to add a
>> > > ShuffleCoordinator. However, I'm wondering, why do you 

Re: [VOTE] Release Apache Flink Elasticsearch connector 3.0.0, rc1

2022-11-02 Thread Chesnay Schepler
Yeah we should bump that to be closer to the connector version released 
with 1.16.0.


On 02/11/2022 15:53, Sergey Nuyanzin wrote:

still checking
however there is at least one finding I would like to highlight
currently elasticsearch connector depends on jackson-bom 2.13.2.20220328
which has 2 CVEs CVE-2022-42003[1] CVE-2022-42004[2] fixed in
2.13.4.20221013 [3]
Does it make sense to include it in this version?

[1] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-42003
[2] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-42004
[3]
https://github.com/FasterXML/jackson-databind/issues/3590#issue-1362567066

On Wed, Nov 2, 2022 at 12:01 PM Chesnay Schepler  wrote:


Hi everyone,
Please review and vote on the release candidate #1 for the version
3.0.0, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org
[2], which are signed with the key with fingerprint C2EED7B111D464BA [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag [5],
* website pull request listing the new release [6].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Note: This is the first release of an externalized connector, relying on
a new set of scripts. Double-check _everything_.

   Thanks,
Release Manager

[1] https://issues.apache.org/jira/projects/FLINK/versions/12352291
[2]

https://dist.apache.org/repos/dist/dev/flink/flink-connector-elasticsearch-3.0.0-rc1/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4]
https://repository.apache.org/content/repositories/orgapacheflink-1543/
[5]

https://github.com/apache/flink-connector-elasticsearch/releases/tag/v3.0.0-rc1
[6] https://github.com/apache/flink-web/pull/579






[jira] [Created] (FLINK-29853) Older jackson-databind found in flink-kubernetes-operator-1.2.0-shaded.jar

2022-11-02 Thread James Busche (Jira)
James Busche created FLINK-29853:


 Summary: Older jackson-databind found in 
flink-kubernetes-operator-1.2.0-shaded.jar
 Key: FLINK-29853
 URL: https://issues.apache.org/jira/browse/FLINK-29853
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.2.1
Reporter: James Busche


A Twistlock security scan of the existing 1.2.0 operator as well as the current 
main release shows a high vulnerability with the current jackson-databind 
version.

==
severity: High

cvss: 7.5

riskFactors: Attack complexity: low, Attack vector: network, Has fix, High severity, Recent vulnerability

CVE link: [https://nvd.nist.gov/vuln/detail/CVE-2022-42003]

packageName: com.fasterxml.jackson.core_jackson-databind

packagePath: 
/flink-kubernetes-operator/flink-kubernetes-operator-1.2.0-shaded.jar and/or 
/flink-kubernetes-operator/flink-kubernetes-operator-1.3-SNAPSHOT-shaded.jar

description: In FasterXML jackson-databind before 2.14.0-rc1, resource 
exhaustion can occur because of a lack of a check in primitive value 
deserializers to avoid deep wrapper array nesting, when the 
UNWRAP_SINGLE_VALUE_ARRAYS feature is enabled. Additional fix version in 
2.13.4.1 and 2.12.17.1
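
For context, a minimal sketch of the configuration the CVE concerns (purely illustrative; whether the operator's shaded jar actually enables this feature is not established here):

{code:java}
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JacksonCveContext {
    public static void main(String[] args) {
        // CVE-2022-42003 applies when UNWRAP_SINGLE_VALUE_ARRAYS is enabled:
        // deeply nested wrapper arrays can then exhaust resources.
        ObjectMapper mapper = new ObjectMapper()
                .enable(DeserializationFeature.UNWRAP_SINGLE_VALUE_ARRAYS);
        System.out.println(
                mapper.isEnabled(DeserializationFeature.UNWRAP_SINGLE_VALUE_ARRAYS));
    }
}
{code}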



This is exactly like the older issue 
https://issues.apache.org/jira/browse/FLINK-27654 

I'm going to see if I can fix it myself and create a PR if I'm successful.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[RESULT][VOTE] Drop TypeSerializerConfigSnapshot and savepoint support from Flink versions < 1.8.0

2022-11-02 Thread Piotr Nowojski
Hi everyone,

The proposal [1] has been accepted. I've created a ticket to follow the
actual implementation: https://issues.apache.org/jira/browse/FLINK-29807

Binding votes in favour:
Chesnay
Yuan
Dawid
Gordon
Konstantin
Piotr

Non-binding:
Hangxiang Yu

there were no votes against.

Best,
Piotrek
[1] https://lists.apache.org/thread/x5d0p08pf2wx47njogsgqct0k5rpfrl4


Re: [VOTE] Drop TypeSerializerConfigSnapshot and savepoint support from Flink versions < 1.8.0

2022-11-02 Thread Piotr Nowojski
Thanks for the vote. +1 (binding) from my side as well.

The proposal has been accepted. I've created a ticket to follow the actual
implementation: https://issues.apache.org/jira/browse/FLINK-29807

Binding votes in favour:
Chesnay
Yuan
Dawid
Gordon
Konstantin
Piotr

Non-binding:
Hangxiang Yu

there were no votes against.

Thank you for your votes!

Best,
Piotrek

On Mon, 31 Oct 2022 at 11:39, Chesnay Schepler wrote:

> +1
>
> On 28/10/2022 16:57, Piotr Nowojski wrote:
> > Hi,
> >
> > As discussed on the dev mailing list [0] I would like to start a vote to
> > drop support of older savepoint formats (for Flink versions older than
> > 1.8). You can find the original explanation from the aforementioned dev
> > mailing list thread at the bottom of this message.
> >
> > Draft PR containing the proposed change you can find here:
> > https://github.com/apache/flink/pull/21056
> >
> > Vote will be open at least until Wednesday, November 2nd 18:00 CET.
> >
> > Best,
> > Piotrek
> >
> > [0] https://lists.apache.org/thread/v1q28zg5jhxcqrpq67pyv291nznd3n0w
> >
> > I would like to open a discussion to remove the long deprecated
> > (@PublicEvolving) TypeSerializerConfigSnapshot class [1] and the related
> > code.
> >
> > The motivation behind this move is twofold. One reason is that it
> > complicates our code base unnecessarily and creates confusion on how to
> > actually implement custom serializers. The immediate reason is that I
> > wanted to clean up Flink's configuration stack a bit and refactor the
> > ExecutionConfig class [2]. This refactor would keep the API compatibility
> > of the ExecutionConfig, but it would break savepoint compatibility with
> > snapshots written with some of the old serializers, which had
> > ExecutionConfig as a field and were serialized in the snapshot. This
> issue
> > has been resolved by the introduction of TypeSerializerSnapshot in Flink
> > 1.7 [3], where serializers are no longer part of the snapshot.
> >
> > TypeSerializerConfigSnapshot has been deprecated and no longer used by
> > built-in serializers since Flink 1.8 [4] and [5]. Users were encouraged
> to
> > migrate to TypeSerializerSnapshot since then with their own custom
> > serializers. That has been plenty of time for the migration.
> >
> > This proposal would have the following impact for the users:
> > 1. we would drop support for recovery from savepoints taken with Flink <
> > 1.7.0 for all built in types serializers
> > 2. we would drop support for recovery from savepoints taken with Flink <
> > 1.8.0 for built in kryo serializers
> > 3. we would drop support for recovery from savepoints taken with Flink <
> > 1.17 for custom serializers using deprecated TypeSerializerConfigSnapshot
> >
> > 1. and 2. would have a simple migration path. Users migrating from those
> > old savepoints would have to first start their job using a Flink version from
> > the [1.8, 1.16] range, and take a new savepoint that would be compatible
> > with Flink 1.17.
> > 3. This is a bit more problematic, because users would have to first
> > migrate their own custom serializers to use TypeSerializerSnapshot
> (using a
> > Flink version from the [1.8, 1.16]), take a savepoint, and only then
> > migrate to Flink 1.17. However, users have already had 4 years to migrate,
> > which in my opinion is plenty of time to do so.
> >
> > As a side effect, we could also drop support for some of the legacy
> > metadata serializers from LegacyStateMetaInfoReaders and potentially
> other
> > places that we are keeping for the sake of compatibility with old
> > savepoints.
> >
> > [1]
> >
> https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/common/typeutils/TypeSerializerConfigSnapshot.html
> > [2] https://issues.apache.org/jira/browse/FLINK-29379
> > [3] https://issues.apache.org/jira/browse/FLINK-9377
> > [4] https://issues.apache.org/jira/browse/FLINK-9376
> > [5] https://issues.apache.org/jira/browse/FLINK-11323
> >
>
>


[jira] [Created] (FLINK-29852) The operator is repeatedly displayed on the Flink Web UI

2022-11-02 Thread JasonLee (Jira)
JasonLee created FLINK-29852:


 Summary: The operator is repeatedly displayed on the Flink Web UI
 Key: FLINK-29852
 URL: https://issues.apache.org/jira/browse/FLINK-29852
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Web Frontend
Affects Versions: 1.16.0
 Environment: Flink 1.16.0
Reporter: JasonLee
 Fix For: 1.16.0
 Attachments: image-2022-11-02-23-57-39-387.png

All the operators in the DAG are shown repeatedly

!image-2022-11-02-23-57-39-387.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Release Apache Flink Elasticsearch connector 3.0.0, rc1

2022-11-02 Thread Sergey Nuyanzin
still checking
however there is at least one finding I would like to highlight
currently elasticsearch connector depends on jackson-bom 2.13.2.20220328
which has 2 CVEs CVE-2022-42003[1] CVE-2022-42004[2] fixed in
2.13.4.20221013 [3]
Does it make sense to include it in this version?

[1] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-42003
[2] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-42004
[3]
https://github.com/FasterXML/jackson-databind/issues/3590#issue-1362567066

On Wed, Nov 2, 2022 at 12:01 PM Chesnay Schepler  wrote:

> Hi everyone,
> Please review and vote on the release candidate #1 for the version
> 3.0.0, as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2], which are signed with the key with fingerprint C2EED7B111D464BA [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag [5],
> * website pull request listing the new release [6].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Note: This is the first release of an externalized connector, relying on
> a new set of scripts. Double-check _everything_.
>
>   Thanks,
> Release Manager
>
> [1] https://issues.apache.org/jira/projects/FLINK/versions/12352291
> [2]
>
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-elasticsearch-3.0.0-rc1/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4]
> https://repository.apache.org/content/repositories/orgapacheflink-1543/
> [5]
>
> https://github.com/apache/flink-connector-elasticsearch/releases/tag/v3.0.0-rc1
> [6] https://github.com/apache/flink-web/pull/579
>
>

-- 
Best regards,
Sergey


[jira] [Created] (FLINK-29851) Upgrade to Fabric8 6.x.x and JOSDK 4.x.x

2022-11-02 Thread Gyula Fora (Jira)
Gyula Fora created FLINK-29851:
--

 Summary: Upgrade to Fabric8 6.x.x and JOSDK 4.x.x
 Key: FLINK-29851
 URL: https://issues.apache.org/jira/browse/FLINK-29851
 Project: Flink
  Issue Type: New Feature
  Components: Kubernetes Operator
Reporter: Gyula Fora
Assignee: Gyula Fora
 Fix For: kubernetes-operator-1.3.0


In order to get the latest developments from Fabric8 and the JOSDK, we should
upgrade to the latest versions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29850) Flink Table Store quick start guide does not work

2022-11-02 Thread Alex Sorokoumov (Jira)
Alex Sorokoumov created FLINK-29850:
---

 Summary: Flink Table Store quick start guide does not work
 Key: FLINK-29850
 URL: https://issues.apache.org/jira/browse/FLINK-29850
 Project: Flink
  Issue Type: Bug
  Components: Table Store
Reporter: Alex Sorokoumov


Following the instructions in
https://nightlies.apache.org/flink/flink-table-store-docs-master/docs/try-table-store/quick-start/
leads to empty results in the {{word_count}} table.

Flink version 1.15.2, Flink Table Store version 0.2.1.

{noformat}
Flink SQL> show catalogs;
+-----------------+
|    catalog name |
+-----------------+
| default_catalog |
+-----------------+
1 row in set

Flink SQL> CREATE CATALOG my_catalog WITH (
>   'type'='table-store',
>   'warehouse'='file:/tmp/table_store'
> );
>
[INFO] Execute statement succeed.

Flink SQL> USE CATALOG my_catalog;
[INFO] Execute statement succeed.

Flink SQL> CREATE TABLE word_count (
> word STRING PRIMARY KEY NOT ENFORCED,
> cnt BIGINT
> );
>
[INFO] Execute statement succeed.

Flink SQL> CREATE TEMPORARY TABLE word_table (
> word STRING
> ) WITH (
> 'connector' = 'datagen',
> 'fields.word.length' = '1'
> );
[INFO] Execute statement succeed.

Flink SQL> SET 'execution.checkpointing.interval' = '10 s';
>
[INFO] Session property has been set.

Flink SQL> INSERT INTO word_count SELECT word, COUNT(*) FROM word_table GROUP 
BY word;
>
[INFO] Submitting SQL update statement to the cluster...
[INFO] SQL update statement has been successfully submitted to the cluster:
Job ID: 0c5f22c2ab3e83e1a1f9274818ff675b


Flink SQL> SET 'sql-client.execution.result-mode' = 'tableau';
>
[INFO] Session property has been set.

Flink SQL> RESET 'execution.checkpointing.interval';
>
[INFO] Session property has been reset.

Flink SQL> SET 'execution.runtime-mode' = 'batch';
>
[INFO] Session property has been set.

Flink SQL> SELECT * FROM word_count;
>
Empty set
{noformat}

Flink logs:

{noformat}
flink  | Starting standalonesession as a console application on host flink.
broker | [2022-11-02 14:07:17,045] INFO [Controller id=1] Processing automatic preferred replica leader election (kafka.controller.KafkaController)
broker | [2022-11-02 14:07:17,046] TRACE [Controller id=1] Checking need to trigger auto leader balancing (kafka.controller.KafkaController)
broker | [2022-11-02 14:07:17,050] DEBUG [Controller id=1] Topics not in preferred replica for broker 1 Map() (kafka.controller.KafkaController)
broker | [2022-11-02 14:07:17,051] TRACE [Controller id=1] Leader imbalance ratio for broker 1 is 0.0 (kafka.controller.KafkaController)
flink  | 2022-11-02 14:07:17,745 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] -
flink  | 2022-11-02 14:07:17,752 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] -  Preconfiguration:
flink  | 2022-11-02 14:07:17,753 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] -
flink  |
flink  |
flink  | RESOURCE_PARAMS extraction logs:
flink  | jvm_params: -Xmx1073741824 -Xms1073741824 -XX:MaxMetaspaceSize=268435456
flink  | dynamic_configs: -D jobmanager.memory.off-heap.size=134217728b -D jobmanager.memory.jvm-overhead.min=201326592b -D jobmanager.memory.jvm-metaspace.size=268435456b -D jobmanager.memory.heap.size=1073741824b -D jobmanager.memory.jvm-overhead.max=201326592b
flink  | logs: INFO  [] - Loading configuration property: jobmanager.rpc.address, flink
flink  | INFO  [] - Loading configuration property: jobmanager.rpc.port, 6123
flink  | INFO  [] - Loading configuration property: jobmanager.bind-host, 0.0.0.0
flink  | INFO  [] - Loading configuration property: jobmanager.memory.process.size, 1600m
flink  | INFO  [] - Loading configuration property: taskmanager.bind-host, 0.0.0.0
flink  | INFO  [] - Loading configuration property: taskmanager.memory.process.size, 1728m
flink  | INFO  [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1
flink  | INFO  [] - Loading configuration property: parallelism.default, 1
flink  | INFO  [] - Loading configuration property: jobmanager.execution.failover-strategy, region
flink  | INFO  [] - Loading configuration property: rest.address, 0.0.0.0
flink  | INFO  [] - Loading configuration property: rest.bind-address, 0.0.0.0
flink  | INFO  [] - Loading configuration property: blob.server.port, 6124
flink  | INFO  [] - Loading configuration property: query.server.port, 6125
flink  | INFO  [] - The derived from fraction jvm overhead memory (160.000mb (167772162

Re: [DISCUSS] Issue tracking workflow

2022-11-02 Thread Chesnay Schepler
Calcite just requested a separate mailing list for users to request a 
JIRA account.



I think I'd try going a similar route. While I prefer the openness of 
github issues, they are really limited, and while some things can be 
replicated with labels (like fix versions / components), things like 
release notes can't.
We'd also lose a central place for collecting issues, since we'd have to 
(?) scope issues per repo.


I wouldn't want to import everything into GH issues (it's just a flawed 
approach in the long-term imo), but on the other hand I don't know if 
the auto linker even works if it has to link to either jira or a GH issue.


Given that we need to change workflows in any case, I think I'd prefer 
sticking to JIRA.
For reported bugs I'd wager that in most cases we can file the tickets 
ourselves and communicate with users on slack/MLs to gather all the 
information. I'd argue that if we had been more proactive with filing 
tickets for user issues (instead of relying on them to do it) we 
would've addressed several issues way sooner.


Additionally, since either option would be a sort of experiment,
JIRA is the safer option. We have to change less and there aren't any 
long-term ramifications (like having to re-import GH tickets into JIRA).


On 28/10/2022 16:49, Piotr Nowojski wrote:

Hi,

I'm afraid of the cost of migrating to GitHub issues only, and of losing many
features that we are currently using. That would be very disruptive and
annoying. For me, GitHub issues are far worse than Jira.

I would strongly prefer Option 1. over the others. Option 4 I would like
the least. I would be fine with Option 3, and Option 2 but assuming that
Jira would stay the source of truth.
For option 2, maybe we could have a bot that would backport/copy user
created issues in github to Jira (and link them together)? Discussions
could still happen in the github, but we could track all of the issues as
we are doing right now. The bot could also sync the other way around (like
marking tickets closed, affected/fixed versions, etc.).

Best,
Piotrek

On Thu, 27 Oct 2022 at 07:48, Martijn Visser wrote:


Hi,

We have to keep in mind that if a user asks for a new Jira account, that
person will need to provide their email address, which means the Flink PMC is
processing personally identifiable information. There needs to be a careful
process for that and, to be honest, I don't think the ASF should do this
from a privacy perspective.

As an example, the Calcite community decided to create a dedicated, private
list where users can ask for an account to avoid making the email address
public.

Best regards,

Martijn

On Wed, 26 Oct 2022 at 22:31, Danny Cranmer wrote:
Hello,

I agree with Gyula. My preference is also option 1, and as a fallback
option 3. Handling new user account requests will be manageable, especially
via slack. We could set up a dedicated channel for people to ask for
Jira/wiki access.

Thanks,
Danny

On Wed, 26 Oct 2022, 12:16 Gyula Fóra,  wrote:


Hi!

I would also personally prefer staying with JIRA given the feature set and
the past positive experience with it.
I think the structured nature of JIRA, with flexible components, issue
types, epics, release handling etc., has been a great benefit to the
project; it would be a shame to give some of these up.

If for some reason Option 1 is not possible, I would still prefer Option 3
(requiring new contributors to ask for JIRA access) compared to the
alternatives.

Cheers,
Gyula


On Tue, Oct 25, 2022 at 3:48 PM Robert Metzger 
wrote:


Thank you for starting this discussion Xintong!

I would also prefer option 1.

The ASF Jira is probably one of the largest public Jira instances on the
internet. Most other Jiras are internal within companies, so Atlassian is
probably not putting a lot of effort into automatically detecting and
preventing spam and malicious account creation.
If we want to convince Infra to keep the current sign up process, we
probably need to help them find a solution for the problem.
Maybe we can configure the ASF Jira to rely on GitHub as an identity
provider? I've just proposed that in the discussion on
us...@infra.apache.org, let's see ;)

Best,
Robert


On Tue, Oct 25, 2022 at 2:08 PM Konstantin Knauf 
wrote:


Hi everyone,

while I see some benefits in moving to Github Issues completely, we need to
be aware that Github Issues lacks many features that Jira has. Off the top
of my head:
* there are no issue types
* no priorities
* issues can only be assigned to one milestone
So, you need to work a lot with labels and conventions and basically need
bots or actions to manage those. Agreeing on those processes, setting them
up and getting used to them will be a lot of work for the community.

So, I am also in favor of 1) for now, because I don't really see a good
alternative option.

Cheers,

Konstantin



On Mon, 24 Oct 2022 at 22:27, Matthias Pohl wrote:


I agree that leaving everything as is would be 

Re: [DISCUSS] Release Flink 1.15.3

2022-11-02 Thread Fabian Paul
Thanks for all the replies. @xintong I'll definitely come back to your
offer when facing steps that require PMC rights for the release.

I checked the JIRA and found four blocking/critical issues affecting 1.15.2

- FLINK-29830 
- FLINK-29492 
- FLINK-29315 
- FLINK-29234 

I'll reach out to the ticket owners to get their opinion about the current
status. In case, someone knows of some pending fixes that I haven't
mentioned please let me know.

Best,
Fabian

On Wed, Oct 26, 2022 at 2:01 PM Konstantin Knauf  wrote:

> +1, thanks Fabian.
>
> > On Wed, 26 Oct 2022 at 08:26, Danny Cranmer <dannycran...@apache.org> wrote:
>
> > +1, thanks for driving this Fabian.
> >
> > Danny,
> >
> > On Wed, Oct 26, 2022 at 2:22 AM yuxia 
> wrote:
> >
> > > Thanks for driving this.
> > > +1 for release 1.15.3
> > >
> > > Best regards,
> > > Yuxia
> > >
> > > ----- Original Message -----
> > > From: "Leonard Xu"
> > > To: "dev"
> > > Sent: Tuesday, 25 October 2022, 10:00:47 PM
> > > Subject: Re: [DISCUSS] Release Flink 1.15.3
> > >
> > > Thanks Fabian for driving this.
> > >
> > > +1 to release 1.15.3.
> > >
> > > The bug tickets FLINK-26394 and FLINK-27148 should be fixed as well,
> I’ll
> > > help to address them soon.
> > >
> > > Best,
> > > Leonard Xu
> > >
> > >
> > >
> > > > On 25 Oct 2022, at 8:28 PM, Jing Ge wrote:
> > > >
> > > > +1 The timing is good to have 1.15.3 release. Thanks Fabian for
> > bringing
> > > > this to our attention.
> > > >
> > > > I just checked PRs and didn't find the 1.15 backport of FLINK-29567.
> > > > Please be aware of it.
> > > > Thanks!
> > > >
> > > > Best regards,
> > > > Jing
> > > >
> > > > On Tue, Oct 25, 2022 at 11:44 AM Xintong Song  >
> > > wrote:
> > > >
> > > >> Thanks for bringing this up, Fabian.
> > > >>
> > > >> +1 for creating a 1.15.3 release. I've also seen users requiring
> this
> > > >> version [1].
> > > >>
> > > >> I can help with actions that require a PMC role, if needed.
> > > >>
> > > >> Best,
> > > >>
> > > >> Xintong
> > > >>
> > > >>
> > > >> [1]
> https://lists.apache.org/thread/501q4l1c6gs8hwh433bw3v1y8fs9cw2n
> > > >>
> > > >>
> > > >>
> > > >> On Tue, Oct 25, 2022 at 5:11 PM Fabian Paul 
> wrote:
> > > >>
> > > >>> Hi all,
> > > >>>
> > > >>> I want to start the discussion of creating a new 1.15 patch release
> > > >>> (1.15.3). The last 1.15 release is almost two months old, and since
> > > then,
> > > >>> ~60 tickets have been closed, targeting 1.15.3. It includes
> critical
> > > >>> changes to the sink architecture, including:
> > > >>>
> > > >>> - Reverting the sink metric naming [1]
> > > >>> - Recovery problems for sinks using the GlobalCommitter [2][3][4]
> > > >>>
> > > >>> If the community agrees to create a new patch release, I could
> > > volunteer
> > > >> as
> > > >>> the release manager.
> > > >>>
> > > >>> Best,
> > > >>> Fabian
> > > >>>
> > > >>> [1] https://issues.apache.org/jira/browse/FLINK-29567
> > > >>> [2] https://issues.apache.org/jira/browse/FLINK-29509
> > > >>> [3] https://issues.apache.org/jira/browse/FLINK-29512
> > > >>> [4] https://issues.apache.org/jira/browse/FLINK-29627
> > > >>>
> > > >>
> > >
> >
>
>
> --
> https://twitter.com/snntrable
> https://github.com/knaufk
>


[VOTE] Release Apache Flink Elasticsearch connector 3.0.0, rc1

2022-11-02 Thread Chesnay Schepler

Hi everyone,
Please review and vote on the release candidate #1 for the version 
3.0.0, as follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org 
[2], which are signed with the key with fingerprint C2EED7B111D464BA [3],

* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag [5],
* website pull request listing the new release [6].

The vote will be open for at least 72 hours. It is adopted by majority 
approval, with at least 3 PMC affirmative votes.


Note: This is the first release of an externalized connector, relying on 
a new set of scripts. Double-check _everything_.


 Thanks,
Release Manager

[1] https://issues.apache.org/jira/projects/FLINK/versions/12352291
[2] 
https://dist.apache.org/repos/dist/dev/flink/flink-connector-elasticsearch-3.0.0-rc1/

[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1543/
[5] 
https://github.com/apache/flink-connector-elasticsearch/releases/tag/v3.0.0-rc1

[6] https://github.com/apache/flink-web/pull/579



Re: [DISCUSS] Remove FlinkKafkaConsumer and FlinkKafkaProducer in the master for 1.17 release

2022-11-02 Thread Martijn Visser
Hi David,

I believe that for the DataStream API this is indeed documented [1], but it
might be missed given that there is a lot of documentation and you need to
know that your problem is related to idleness. For the Table API I think this
is never mentioned, so it should definitely be at least documented there.
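
For reference, a sketch of the Table API knob involved (assuming a TableEnvironment named tEnv; table.exec.source.idle-timeout is the relevant option):

    // Sketch: mark Table/SQL sources idle after 1 minute without records.
    tEnv.getConfig().getConfiguration()
            .setString("table.exec.source.idle-timeout", "1 min");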

Thanks,

Martijn

[1]
https://nightlies.apache.org/flink/flink-docs-stable/docs/connectors/datastream/kafka/#idleness

On Wed, Nov 2, 2022 at 11:28 AM David Anderson  wrote:

> >
> > For the partition
> > idleness problem could you elaborate more about it? I assume both
> > FlinkKafkaConsumer and KafkaSource need a WatermarkStrategy to decide
> > whether to mark the partition as idle.
>
>
> As a matter of fact, no, that's not the case -- which is why I mentioned
> it.
>
> The FlinkKafkaConsumer automatically treats all initially empty (or
> non-existent) partitions as idle, while the KafkaSource only does this if
> the WatermarkStrategy specifies that idleness handling is desired by
> configuring withIdleness. This can be a source of confusion for folks
> upgrading to the new connector. It most often shows up in situations where
> the number of Kafka partitions is less than the parallelism of the
> connector, which is a rather common occurrence in development and testing
> environments.
>
> I believe this change in behavior was made deliberately, so as to create a
> more consistent experience across all FLIP-27 connectors. This isn't
> something that needs to be fixed, but does need to be communicated more
> clearly. Unfortunately, the whole idleness mechanism remained significantly
> broken until 1.16 (considering the impact of [1] and [2]), further
> complicating the situation. Because of FLINK-28975 [2], users with
> partitions that are initially empty may have problems with versions before
> 1.15.3 (still unreleased) and 1.16.0. See [3] for an example of this
> confusion.
>
> [1] https://issues.apache.org/jira/browse/FLINK-18934 (idleness didn't
> work
> with connected streams)
> [2] https://issues.apache.org/jira/browse/FLINK-28975 (idle streams could
> never become active again)
> [3]
>
> https://stackoverflow.com/questions/70096166/parallelism-in-flink-kafka-source-causes-nothing-to-execute/70101290#70101290
>
> Best,
> David
>
> On Wed, Nov 2, 2022 at 5:26 AM Qingsheng Ren  wrote:
>
> > Thanks Jing for starting the discussion.
> >
> > +1 for removing FlinkKafkaConsumer, as KafkaSource has evolved for many
> > release cycles and should be stable enough. I have some concerns about
> the
> > new Kafka sink based on sink v2, as sink v2 still has some ongoing work
> in
> > 1.17 (maybe Yun Gao could provide some inputs). Also we found some issues
> > of KafkaSink related to the internal mechanism of sink v2, like
> > FLINK-29492.
> >
> > @David
> > About the ability of DeserializationSchema#isEndOfStream, FLIP-208 is
> > trying to complete this piece of the puzzle, and Hang Ruan (
> > ruanhang1...@gmail.com) plans to work on it in 1.17. For the partition
> > idleness problem could you elaborate more about it? I assume both
> > FlinkKafkaConsumer and KafkaSource need a WatermarkStrategy to decide
> > whether to mark the partition as idle.
> >
> > Best,
> > Qingsheng
> > Ververica (Alibaba)
> >
> > On Thu, Oct 27, 2022 at 8:06 PM Jing Ge  wrote:
> >
> > > Hi Dev,
> > >
> > > I'd like to start a discussion about removing FlinkKafkaConsumer and
> > > FlinkKafkaProducer in 1.17.
> > >
> > > Back in the past, it was originally announced to remove it with Flink
> > 1.15
> > > after Flink 1.14 had been released[1]. And then postponed to the next
> > 1.15
> > > release which meant to remove it with Flink 1.16 but forgot to change
> the
> > > doc[2]. I have created a PRs to fix it. Since the 1.16 release branch
> has
> > > code freeze, it makes sense to, first of all, update the doc to say
> that
> > > FlinkKafkaConsumer will be removed with Flink 1.17 [3][4] and second
> > start
> > > the discussion about removing them with the current master branch i.e.
> > for
> > > the coming 1.17 release. I'm all ears and looking forward to your
> > feedback.
> > > Thanks!
> > >
> > > Best regards,
> > > Jing
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > [1]
> > >
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/kafka/#kafka-sourcefunction
> > > [2]
> > >
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/datastream/kafka/#kafka-sourcefunction
> > > [3] https://github.com/apache/flink/pull/21172
> > > [4] https://github.com/apache/flink/pull/21171
> > >
> >
>


Re: [DISCUSS] Remove FlinkKafkaConsumer and FlinkKafkaProducer in the master for 1.17 release

2022-11-02 Thread David Anderson
>
> For the partition
> idleness problem could you elaborate more about it? I assume both
> FlinkKafkaConsumer and KafkaSource need a WatermarkStrategy to decide
> whether to mark the partition as idle.


As a matter of fact, no, that's not the case -- which is why I mentioned it.

The FlinkKafkaConsumer automatically treats all initially empty (or
non-existent) partitions as idle, while the KafkaSource only does this if
the WatermarkStrategy specifies that idleness handling is desired by
configuring withIdleness. This can be a source of confusion for folks
upgrading to the new connector. It most often shows up in situations where
the number of Kafka partitions is less than the parallelism of the
connector, which is a rather common occurrence in development and testing
environments.
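
For reference, here's a minimal sketch of opting in to idleness handling
with the new KafkaSource (broker, topic, and durations are placeholders):

```java
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class IdleAwareKafkaJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("broker:9092")   // placeholder address
                .setTopics("events")                  // placeholder topic
                .setGroupId("idleness-demo")
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        // Unlike the FlinkKafkaConsumer, the KafkaSource only treats empty
        // partitions as idle if the WatermarkStrategy explicitly opts in:
        WatermarkStrategy<String> watermarks = WatermarkStrategy
                .<String>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                .withIdleness(Duration.ofMinutes(1));

        env.fromSource(source, watermarks, "kafka-source").print();
        env.execute();
    }
}
```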

I believe this change in behavior was made deliberately, so as to create a
more consistent experience across all FLIP-27 connectors. This isn't
something that needs to be fixed, but does need to be communicated more
clearly. Unfortunately, the whole idleness mechanism remained significantly
broken until 1.16 (considering the impact of [1] and [2]), further
complicating the situation. Because of FLINK-28975 [2], users with
partitions that are initially empty may have problems with versions before
1.15.3 (still unreleased) and 1.16.0. See [3] for an example of this
confusion.

[1] https://issues.apache.org/jira/browse/FLINK-18934 (idleness didn't work
with connected streams)
[2] https://issues.apache.org/jira/browse/FLINK-28975 (idle streams could
never become active again)
[3]
https://stackoverflow.com/questions/70096166/parallelism-in-flink-kafka-source-causes-nothing-to-execute/70101290#70101290

Best,
David

On Wed, Nov 2, 2022 at 5:26 AM Qingsheng Ren  wrote:

> Thanks Jing for starting the discussion.
>
> +1 for removing FlinkKafkaConsumer, as KafkaSource has evolved for many
> release cycles and should be stable enough. I have some concerns about the
> new Kafka sink based on sink v2, as sink v2 still has some ongoing work in
> 1.17 (maybe Yun Gao could provide some inputs). Also we found some issues
> of KafkaSink related to the internal mechanism of sink v2, like
> FLINK-29492.
>
> @David
> About the ability of DeserializationSchema#isEndOfStream, FLIP-208 is
> trying to complete this piece of the puzzle, and Hang Ruan (
> ruanhang1...@gmail.com) plans to work on it in 1.17. For the partition
> idleness problem could you elaborate more about it? I assume both
> FlinkKafkaConsumer and KafkaSource need a WatermarkStrategy to decide
> whether to mark the partition as idle.
>
> Best,
> Qingsheng
> Ververica (Alibaba)
>
> On Thu, Oct 27, 2022 at 8:06 PM Jing Ge  wrote:
>
> > Hi Dev,
> >
> > I'd like to start a discussion about removing FlinkKafkaConsumer and
> > FlinkKafkaProducer in 1.17.
> >
> > Back in the past, it was originally announced to remove it with Flink
> 1.15
> > after Flink 1.14 had been released[1]. And then postponed to the next
> 1.15
> > release which meant to remove it with Flink 1.16 but forgot to change the
> > doc[2]. I have created PRs to fix it. Since the 1.16 release branch has
> > code freeze, it makes sense to, first of all, update the doc to say that
> > FlinkKafkaConsumer will be removed with Flink 1.17 [3][4] and second
> start
> > the discussion about removing them with the current master branch i.e.
> for
> > the coming 1.17 release. I'm all ears and looking forward to your
> feedback.
> > Thanks!
> >
> > Best regards,
> > Jing
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > [1]
> >
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/kafka/#kafka-sourcefunction
> > [2]
> >
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/datastream/kafka/#kafka-sourcefunction
> > [3] https://github.com/apache/flink/pull/21172
> > [4] https://github.com/apache/flink/pull/21171
> >
>


Re: [DISCUSS] Flink release retro

2022-11-02 Thread Martijn Visser
Hi Matthias,

I think it's a good idea to capture how this release cycle has progressed.
I'm not sure that a classical "retrospective" is the best solution, since
it would require multiple people in different timezones to attend a virtual
meeting.

So I would +1 an async retrospective, which could cover the questions that
you would normally ask during a retrospective, but via a questionnaire.
It probably makes sense to draft a proposal of the questions to be asked,
discuss them and then send them out.

WDYT?

Thanks,

Martijn

On Wed, Nov 2, 2022 at 9:42 AM Qingsheng Ren  wrote:

> Thanks for starting the discussion Matthias!
>
> I think having a retro after a release cycle would be quite helpful for
> standardizing the release procedure, and it could also prevent new
> release managers from getting stuck on the same issues that happened
> before. I prefer the second option, where RMs open a discussion thread on
> the ML at the end of the release to collect feedback about the last
> release cycle and add it to the release wiki page, which would be quite
> handy for future RMs.
>
> Best,
> Qingsheng
> Ververica (Alibaba)
>
> On Mon, Oct 31, 2022 at 11:02 PM Matthias Pohl
>  wrote:
>
> > Hi everyone,
> > I want to bring up the idea of having a retrospective on the release from
> > the release manager's perspective. The idea would be to collect feedback
> on
> > what went well and what could be improved for a specific minor release.
> So
> > far, I didn't find anything on that topic. Does the community find this
> > useful? Or was this already done but not helpful?
> >
> > I see three options here:
> > 1. Having an actual meeting where issues can be discussed and/or
> > experiences can be shared between the release managers of the previous
> > release and the release managers of the next minor release. Of course,
> this
> > could be open to other contributors as well. A summary could be provided
> in
> > the Flink wiki (the Flink release's wiki article).
> > 2. The release manager(s) provide a summary on the Flink release's wiki
> > article as part of the release process.
> > 3. Leave the process as is without any additional retrospective but focus
> > on improving the documentation if issues arose during the release.
> >
> > That might help people who consider contributing to the community through
> > supporting the release efforts. Additionally, it might help in
> > understanding what went wrong in past releases retroactively (e.g. the
> > longer release cycle for 1.15).
> >
> > I'm curious about opinions on that topic.
> >
> > Best,
> > Matthias
> >
>


[DISCUSS] FLIP-269: Properly Handling the Processing Timers on Job Termination

2022-11-02 Thread Yun Gao
Hi everyone,

I would like to open a discussion[1] on how to properly handle the
processing timers on job termination.

Currently all the processing timers are ignored on job termination. This
behavior is not suitable for some cases, like the WindowOperator. Thus
we'd like to provide more options for how to deal with the pending timers
on job termination, and provide correct semantics on bounded streams for
these scenarios. The FLIP is based on the previous discussion with Piotr
and Divye in [2].

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-269%3A+Properly+Handling+the+Processing+Timers+on+Job+Termination
 

[2] https://issues.apache.org/jira/browse/FLINK-18647 
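
To make the affected pattern concrete, here is a minimal sketch of a
function that registers a processing timer (the types and the timeout are
placeholders); under the current behavior, a timer that is still pending
when a bounded job terminates simply never fires:

```java
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class TimeoutFunction
        extends KeyedProcessFunction<String, String, String> {

    @Override
    public void processElement(String value, Context ctx, Collector<String> out)
            throws Exception {
        // Register a processing timer one minute ahead. Today, if the bounded
        // job reaches end-of-input before then, the timer is silently dropped.
        long fireAt = ctx.timerService().currentProcessingTime() + 60_000L;
        ctx.timerService().registerProcessingTimeTimer(fireAt);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) {
        out.collect("timer fired for key " + ctx.getCurrentKey());
    }
}
```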



[GitHub] [flink-connector-shared-utils] MartijnVisser commented on pull request #1: [FLINK-29472] Add first version of release scripts

2022-11-02 Thread GitBox


MartijnVisser commented on PR #1:
URL: 
https://github.com/apache/flink-connector-shared-utils/pull/1#issuecomment-1299988788

   > Thanks @zentol and @MartijnVisser for driving this, but I saw the `dev` 
mailing list received all PR update notifications from this repo; is some 
setting incorrect?
   
   Yes, we need to add an `.asf.yaml` file to this repo. Will do in a moment :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-shared-utils] leonardBang commented on pull request #1: [FLINK-29472] Add first version of release scripts

2022-11-02 Thread GitBox


leonardBang commented on PR #1:
URL: 
https://github.com/apache/flink-connector-shared-utils/pull/1#issuecomment-1299982431

   Thanks @zentol and @MartijnVisser for driving this, but I saw the `dev` 
mailing list received all PR update notifications from this repo; is some 
setting incorrect?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-shared-utils] zentol commented on a diff in pull request #1: [FLINK-29472] Add first version of release scripts

2022-11-02 Thread GitBox


zentol commented on code in PR #1:
URL: 
https://github.com/apache/flink-connector-shared-utils/pull/1#discussion_r1011501540


##
stage_jars.sh:
##
@@ -0,0 +1,55 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+SCRIPT_DIR=$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)
+
+source "${SCRIPT_DIR}/_init.sh"
+source "${SCRIPT_DIR}/_utils.sh"
+
+###
+
+check_variable_set FLINK_MINOR_VERSION
+
+###
+
+function deploy_staging_jars {
+  cd "${SOURCE_DIR}"
+  mkdir -p "${RELEASE_DIR}"
+
+  project_version=$(get_pom_version)
+  if [[ ${project_version} =~ -SNAPSHOT$ ]]; then
+    echo "Jars should not be created for SNAPSHOT versions. Use 'update_branch_version.sh' first."
+    exit 1
+  fi
+  version=${project_version}-${FLINK_MINOR_VERSION}
+
+  echo "Deploying jars v${version} to repository.apache.org"
+  echo "To revert this step, login to 'https://repository.apache.org' -> 
'Staging repositories' -> Select repository -> 'Drop'"
+
+  clone_dir=$(create_pristine_source "${SOURCE_DIR}" "${RELEASE_DIR}")
+  cd "${clone_dir}"
+  set_pom_version "${version}"
+
+  options="-Prelease,docs-and-source -DskipTests -DretryFailedDeploymentCount=10"
+  ${MVN} clean deploy ${options}

Review Comment:
   This should set the `flink.version` property so we actually compile against 
the correct Flink version.
   May require the user to provide a full Flink version from which we extract 
the minor version.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-shared-utils] zentol commented on a diff in pull request #1: [FLINK-29472] Add first version of release scripts

2022-11-02 Thread GitBox


zentol commented on code in PR #1:
URL: 
https://github.com/apache/flink-connector-shared-utils/pull/1#discussion_r1011501540


##
stage_jars.sh:
##
@@ -0,0 +1,55 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+SCRIPT_DIR=$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)
+
+source "${SCRIPT_DIR}/_init.sh"
+source "${SCRIPT_DIR}/_utils.sh"
+
+###
+
+check_variable_set FLINK_MINOR_VERSION
+
+###
+
+function deploy_staging_jars {
+  cd "${SOURCE_DIR}"
+  mkdir -p "${RELEASE_DIR}"
+
+  project_version=$(get_pom_version)
+  if [[ ${project_version} =~ -SNAPSHOT$ ]]; then
+    echo "Jars should not be created for SNAPSHOT versions. Use 'update_branch_version.sh' first."
+    exit 1
+  fi
+  version=${project_version}-${FLINK_MINOR_VERSION}
+
+  echo "Deploying jars v${version} to repository.apache.org"
+  echo "To revert this step, login to 'https://repository.apache.org' -> 
'Staging repositories' -> Select repository -> 'Drop'"
+
+  clone_dir=$(create_pristine_source "${SOURCE_DIR}" "${RELEASE_DIR}")
+  cd "${clone_dir}"
+  set_pom_version "${version}"
+
+  options="-Prelease,docs-and-source -DskipTests -DretryFailedDeploymentCount=10"
+  ${MVN} clean deploy ${options}

Review Comment:
   This should set the `flink.version` property. May require the user to 
provide a full Flink version from which we extract the minor version.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [DISCUSS] Flink release retro

2022-11-02 Thread Qingsheng Ren
Thanks for starting the discussion Matthias!

I think having a retro after a release cycle would be quite helpful for
standardizing the release procedure, and it could also prevent new release
managers from getting stuck on the same issues that happened before. I
prefer the second option, where RMs open a discussion thread on the ML at
the end of the release to collect feedback about the last release cycle and
add it to the release wiki page, which would be quite handy for future RMs.

Best,
Qingsheng
Ververica (Alibaba)

On Mon, Oct 31, 2022 at 11:02 PM Matthias Pohl
 wrote:

> Hi everyone,
> I want to bring up the idea of having a retrospective on the release from
> the release manager's perspective. The idea would be to collect feedback on
> what went well and what could be improved for a specific minor release. So
> far, I didn't find anything on that topic. Does the community find this
> useful? Or was this already done but not helpful?
>
> I see three options here:
> 1. Having an actual meeting where issues can be discussed and/or
> experiences can be shared between the release managers of the previous
> release and the release managers of the next minor release. Of course, this
> could be open to other contributors as well. A summary could be provided in
> the Flink wiki (the Flink release's wiki article).
> 2. The release manager(s) provide a summary on the Flink release's wiki
> article as part of the release process.
> 3. Leave the process as is without any additional retrospective but focus
> on improving the documentation if issues arose during the release.
>
> That might help people who consider contributing to the community through
> supporting the release efforts. Additionally, it might help in
> understanding what went wrong in past releases retroactively (e.g. the
> longer release cycle for 1.15).
>
> I'm curious about opinions on that topic.
>
> Best,
> Matthias
>


[GitHub] [flink-connector-shared-utils] zentol commented on a diff in pull request #1: [FLINK-29472] Add first version of release scripts

2022-11-02 Thread GitBox


zentol commented on code in PR #1:
URL: 
https://github.com/apache/flink-connector-shared-utils/pull/1#discussion_r1011369044


##
README.md:
##
@@ -1 +1,79 @@
-This repository contains utilities for [Apache 
Flink](https://flink.apache.org/) connectors.
\ No newline at end of file
+This is a collection of release utils for [Apache 
Flink](https://flink.apache.org/) connectors.
+
+# Integration
+
+The scripts assume that they are integrated into a connector repo as a submodule
+under `tools/releasing/`.

Review Comment:
   ```suggestion
   under `tools/releasing/shared`.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-shared-utils] zentol commented on a diff in pull request #1: [FLINK-29472] Add first version of release scripts

2022-11-02 Thread GitBox


zentol commented on code in PR #1:
URL: 
https://github.com/apache/flink-connector-shared-utils/pull/1#discussion_r1011368848


##
_utils.sh:
##
@@ -0,0 +1,59 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+function check_variable_set {
+  variable=$1
+
+  if [ -z "${!variable:-}" ]; then
+  echo "${variable} was not set."
+  exit 1
+  fi
+}
+
+function create_pristine_source {
+  source_dir=$1
+  release_dir=$2
+
+  clone_dir="${release_dir}/tmp-clone"
+  clean_dir="${release_dir}/tmp-clean-clone"
+  # create a temporary git clone to ensure that we have a pristine source release
+  git clone "${source_dir}" "${clone_dir}"
+
+  rsync -a \
+    --exclude ".git" --exclude ".gitignore" --exclude ".gitattributes" --exclude ".gitmodules" --exclude ".github" \
+    --exclude ".idea" --exclude "*.iml" \
+    --exclude ".DS_Store" \
+    --exclude ".asf.yaml" \
+    --exclude "target" --exclude "tools/releasing/shared" \

Review Comment:
   ideally the second exclusion would be more dynamic and figure out on its own 
what the `shared` directory is actually called.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-29849) Event time temporal join on an upsert source may produce incorrect execution plan

2022-11-02 Thread lincoln lee (Jira)
lincoln lee created FLINK-29849:
---

 Summary: Event time temporal join on an upsert source may produce 
incorrect execution plan
 Key: FLINK-29849
 URL: https://issues.apache.org/jira/browse/FLINK-29849
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.15.2, 1.16.0
Reporter: lincoln lee
 Fix For: 1.17.0


In the current implementation, the execution plan is incorrect when doing an 
event time temporal join on an upsert source. There are two problems:
1. For an upsert source, we should not add a ChangelogNormalize node under a 
temporal join input, or it will damage the versions of the versioned table. 
For versioned tables, we use a single-temporal mechanism which relies on 
sequential records of the same key to ensure the valid period of each 
version. If ChangelogNormalize is added, a UB message will be produced based 
on the previous UA or Insert message, with all columns identical, including 
the event time. E.g., for the original upsert input
{code}
+I (key1, '2022-11-02 10:00:00', a1)
+U (key1, '2022-11-02 10:01:03', a2)
{code}

the versioned data should be:
{code}
v1  [~, '2022-11-02 10:00:00')
v2  ['2022-11-02 10:00:00', '2022-11-02 10:01:03')
{code}

after ChangelogNormalize's processing, it will output:
{code}
+I (key1, '2022-11-02 10:00:00', a1)
-U (key1, '2022-11-02 10:00:00', a1)
+U (key1, '2022-11-02 10:01:03', a2)
{code}

versions are incorrect:
{code}
v1  ['2022-11-02 10:00:00', '2022-11-02 10:00:00')  // invalid period
v2  ['2022-11-02 10:00:00', '2022-11-02 10:01:03')
{code}

2. Semantically, a filter cannot be pushed into a temporal join that uses 
event time; otherwise the filter may also corrupt the versioned table.
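
For illustration, a minimal sketch of the kind of query affected ({{orders}} 
and {{rates}} are hypothetical tables; {{rates}} stands for the upsert 
source, e.g. upsert-kafka with a primary key and a watermark):
{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class EventTimeTemporalJoinExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Event time temporal join against the versioned table 'rates'.
        // Injecting a ChangelogNormalize below the join input would turn each
        // +U into a -U/+U pair with identical timestamps, corrupting versions.
        tEnv.executeSql(
                "SELECT o.order_id, o.price * r.rate "
                        + "FROM orders AS o "
                        + "JOIN rates FOR SYSTEM_TIME AS OF o.order_time AS r "
                        + "ON o.currency = r.currency");
    }
}
{code}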




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29848) Port BinaryRowDataUtil to Flink Table Store

2022-11-02 Thread Jane Chan (Jira)
Jane Chan created FLINK-29848:
-

 Summary: Port BinaryRowDataUtil to Flink Table Store
 Key: FLINK-29848
 URL: https://issues.apache.org/jira/browse/FLINK-29848
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Affects Versions: table-store-0.3.0
Reporter: Jane Chan
 Fix For: table-store-0.3.0


Port org.apache.flink.table.data.binary.BinaryRowDataUtil to the Flink Table Store.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29847) Store/cache JarUploadResponseBody

2022-11-02 Thread Daren Wong (Jira)
Daren Wong created FLINK-29847:
--

 Summary: Store/cache JarUploadResponseBody
 Key: FLINK-29847
 URL: https://issues.apache.org/jira/browse/FLINK-29847
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Daren Wong
 Fix For: kubernetes-operator-1.3.0


The Kubernetes operator currently always uploads the JAR, even when the same 
JAR has been uploaded to the JM previously, for example during a rollback 
operation.

To improve performance, we want to cache the JarUploadResponseBody so that on 
the next runJar operation the Kubernetes operator will check the cache and 
reuse the jar if it is the same JAR. This "cache" can be as simple as storing 
the uploaded JarFilePath in the CR Status field; a sketch follows below.

Ref: 
https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java#L188-L199
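
A minimal sketch of the proposed caching, with all names hypothetical (the 
real state would live in the CR Status field rather than a class field):
{code:java}
import java.util.function.Supplier;

/** Hypothetical sketch; none of these names are the actual operator API. */
class JarUploadCache {
    private String cachedJarId; // would be persisted in the CR Status field

    String getOrUpload(Supplier<String> uploadJar) {
        if (cachedJarId == null) {
            // Only hit the JobManager's jar upload endpoint on a cache miss,
            // e.g. skipping the re-upload of the same JAR during a rollback.
            cachedJarId = uploadJar.get();
        }
        return cachedJarId;
    }
}
{code}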



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29846) Upgrade Archunit to 1.0.0

2022-11-02 Thread Martijn Visser (Jira)
Martijn Visser created FLINK-29846:
--

 Summary: Upgrade Archunit to 1.0.0
 Key: FLINK-29846
 URL: https://issues.apache.org/jira/browse/FLINK-29846
 Project: Flink
  Issue Type: Technical Debt
  Components: Build System, Tests
Reporter: Martijn Visser


Flink still uses ArchUnit version 0.22.0. Recently ArchUnit 1.0.0 has been 
released; we should upgrade to this major version and remove all the 
deprecated usages
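
For context, a generic example of the core rule API (illustrative only, not 
one of Flink's actual rules):
{code:java}
import com.tngtech.archunit.core.domain.JavaClasses;
import com.tngtech.archunit.core.importer.ClassFileImporter;

import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.noClasses;

public class ArchUnitExample {
    public static void main(String[] args) {
        // Import the classes under test and check a simple dependency rule.
        JavaClasses classes =
                new ClassFileImporter().importPackages("org.apache.flink");
        noClasses().that().resideInAPackage("..api..")
                .should().dependOnClassesThat().resideInAPackage("..runtime..")
                .check(classes);
    }
}
{code}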



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29845) ThroughputCalculator throws java.lang.IllegalArgumentException: Time should be non negative under very low throughput cluster

2022-11-02 Thread Jingxiao GU (Jira)
Jingxiao GU created FLINK-29845:
---

 Summary: ThroughputCalculator throws 
java.lang.IllegalArgumentException: Time should be non negative under very low 
throughput cluster
 Key: FLINK-29845
 URL: https://issues.apache.org/jira/browse/FLINK-29845
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network, Runtime / Task
Affects Versions: 1.14.6
Reporter: Jingxiao GU


Our team is using Flink 1.14 to process data from Kafka.

It works fine, except when the same job jar with the same arguments is 
deployed in an environment with *very low Kafka source throughput*. The job 
sometimes crashes with the following exception and cannot recover unless we 
restart the TaskManagers, which is unacceptable for a production 
environment.
{code:java}
[2022-10-31T15:33:57.153+08:00] [o.a.f.runtime.taskmanager.Task#cess 
(2/16)#244] - [WARN ] KeyedProcess (2/16)#244 
(b9b54f6445419fc43c4d58fcd95cee82) switched from RUNNING to FAILED with failure 
cause: java.lang.IllegalArgumentException: Time should be non negative
at 
org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:138)
at 
org.apache.flink.runtime.throughput.ThroughputEMA.calculateThroughput(ThroughputEMA.java:44)
at 
org.apache.flink.runtime.throughput.ThroughputCalculator.calculateThroughput(ThroughputCalculator.java:80)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.debloat(StreamTask.java:789)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$null$4(StreamTask.java:781)
at 
org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
at 
org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338)
at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324)
at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:806)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:758)
at 
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
at 
org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:937)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
at java.lang.Thread.run(Thread.java:748)
{code}
After roughly checking the source code, we found that even if buffer 
debloating is disabled 
([https://github.com/apache/flink/blob/release-1.14.6/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java#L427]
 ), the buffer debloater is still scheduled 
([https://github.com/apache/flink/blob/release-1.14.6/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java#L755]
 ), so the {{ThroughputCalculator}} keeps calculating the throughput 
([https://github.com/apache/flink/blob/release-1.14.6/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java#L789]
 ), which causes a division by zero and seems unnecessary.

For now, we work around this by setting 
{{taskmanager.network.memory.buffer-debloat.period: 365d}} to keep the buffer 
debloater from being scheduled frequently and causing the random crash; the 
options involved are shown below.
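
The options involved, set here via the {{Configuration}} API for a local 
environment (in a real deployment they would normally go into 
flink-conf.yaml):
{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class DebloatWorkaround {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Buffer debloating is disabled by default in 1.14, yet the debloater
        // is still scheduled, which triggers the throughput calculation.
        conf.setString("taskmanager.network.memory.buffer-debloat.enabled", "false");
        // Workaround: make the still-scheduled debloater fire only very rarely.
        conf.setString("taskmanager.network.memory.buffer-debloat.period", "365d");

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironment(conf);
        // ... build and execute the job with 'env' as usual.
    }
}
{code}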

P.S. We found a bug with a similar stacktrace, 
https://issues.apache.org/jira/browse/FLINK-25454, which was fixed in 1.14.6.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-connector-shared-utils] zentol commented on a diff in pull request #1: [FLINK-29472] Add first version of release scripts

2022-11-02 Thread GitBox


zentol commented on code in PR #1:
URL: 
https://github.com/apache/flink-connector-shared-utils/pull/1#discussion_r1011297983


##
_init.sh:
##
@@ -0,0 +1,45 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# all scripts should contain this line + source ${SCRIPT_DIR}/_init.sh
+SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
+
+set -o errexit
+set -o nounset
+set -o pipefail
+
+export SHELLOPTS
+
+###
+
+MVN=${MVN:-mvn}
+
+if [ "$(uname)" == "Darwin" ]; then

Review Comment:
   Doesn't hurt to have it I guess :shrug:
   I suppose we just can't be sure this is the case in general.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-shared-utils] zentol commented on a diff in pull request #1: [FLINK-29472] Add first version of release scripts

2022-11-02 Thread GitBox


zentol commented on code in PR #1:
URL: 
https://github.com/apache/flink-connector-shared-utils/pull/1#discussion_r1011301544


##
publish_git_tag.sh:
##
@@ -0,0 +1,46 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+SCRIPT_DIR=$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)
+
+source ${SCRIPT_DIR}/_init.sh
+source ${SCRIPT_DIR}/_utils.sh
+
+###
+
+RC_NUM=${RC_NUM:-none}
+
+###
+
+function create_release_tag {
+  cd "${SOURCE_DIR}"
+
+  version=$(get_pom_version)
+
+  tag=v${version}
+  if [ "$RC_NUM" != "none" ]; then
+tag=${tag}-rc${RC_NUM}
+  fi

Review Comment:
   We could do that, yes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-shared-utils] zentol commented on a diff in pull request #1: [FLINK-29472] Add first version of release scripts

2022-11-02 Thread GitBox


zentol commented on code in PR #1:
URL: 
https://github.com/apache/flink-connector-shared-utils/pull/1#discussion_r1011301075


##
README.md:
##
@@ -1 +1,79 @@
-This repository contains utilities for [Apache 
Flink](https://flink.apache.org/) connectors.
\ No newline at end of file
+This is a collection of release utils for [Apache 
Flink](https://flink.apache.org/) connectors.
+
+# Integration
+
+The scripts assume that they are integrated into a connector repo as a submodule
+under `tools/releasing/`.
+
+# Usage
+
+Some scripts rely on environment variables to be set.  
+These are checked at the start of each script.  
+Any instance of `${some_variable}` in this document refers to an environment 
variable that is used by the respective
+script.
+
+## check_environment.sh
+
+Runs some pre-release checks for the current environment, for example that all 
required programs are available.  
+This should be run once at the start of the release process.
+
+## publish_snapshot_branch.sh
+
+Creates (and pushes!) a new snapshot branch for the current commit.  
+The branch name is automatically determined from the version in the pom.  
+This script should be called when work on a new major/minor version has 
started.
+
+## update_branch_version.sh
+
+Updates the version in the poms of the current branch to `${NEW_VERSION}`.
+
+## stage_source_release.sh
+
+Creates a source release from the current branch and pushes it via `svn`
+to [dist.apache.org](https://dist.apache.org/repos/dist/dev/flink).  
+The version is automatically determined from the version in the pom.  
+The created `svn` directory will contain a `-rc${RC_NUM}` suffix.
+
+## stage_jars.sh
+
+Creates the jars from the current branch and deploys them to 
[repository.apache.org](https://repository.apache.org).  
+The version will be suffixed with `-${FLINK_MINOR_VERSION}` to indicate the 
supported Flink version.  
+If a particular version of a connector supports multiple Flink versions then 
this script should be called multiple
+times.
+
+## publish_git_tag.sh
+
+Creates a release tag for the current branch and pushes it to GitHub.
+The tag will be suffixed with `-rc${RC_NUM}`, if `${RC_NUM}` was set.  
+This script should only be used _after_ the `-SNAPSHOT` version suffix was 
removed via `update_branch_version.sh`.
+
+## update_japicmp_configuration.sh
+
+Sets the japicmp reference version in the pom of the current branch to 
`${NEW_VERSION}`, enables compatibility checks
+for `@PublicEvolving` when used on snapshot branches and clears the list of 
exclusions.  
+This should be called after a release on the associated snapshot branch. If it 
was a minor release it should
+additionally be called on the `main` branch.
+
+# Common workflow
+
+1. run `publish_snapshot_branch.sh`
+2. do some development work on the created snapshot branch
+3. checkout a specific commit to create a release from
+4. run `check_environment.sh`

Review Comment:
   Basically the issue is that creating a snapshot branch will sometimes not be 
done by a release manager.
   For example, after the ES 3.0 release we have a v3.0 branch and a main 
branch on 4.0-SNAPSHOT. v3.1 would be created by _someone_ when we want to make 
some change that requires a new minor version.
   
   So all the other things the check does just aren't required.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [DISCUSS] FLIP-264 Extract BaseCoordinatorContext

2022-11-02 Thread Qingsheng Ren
Thanks Gang and Steven for the FLIP. Actually I share the same concerns as
Piotr and Maximilian.

OperatorCoordinator is intentionally marked as @Internal considering some
existing issues, like consistency between non-source operators and the
coordinator on checkpoints. I'm wondering whether it is useful to expose a
public context to developers while keeping OperatorCoordinator as an
internal API. If we eventually close all those issues and decide to expose
the operator coordinator API, that would be a better opportunity to include
the base context as part of it.

Best,
Qingsheng

On Tue, Nov 1, 2022 at 8:29 PM Maximilian Michels  wrote:

> Thanks Steven! My confusion stemmed from the lack of context in the FLIP.
> The first version did not lay out how the refactoring would be used down
> the line, e.g. by the ShuffleCoordinator. The OperatorCoordinator API is a
> non-public API and before reading the code, I wasn't even aware how exactly
> it worked and whether it would be available to regular operators (it was
> originally intended for sources only).
>
> I might seem pedantic here but I believe the purpose of a FLIP should be to
> describe the *why* behind the changes, not only the changes itself. A FLIP
> is not a formality but is a tool to communicate and discuss changes. I
> think we still haven't laid out the exact reasons why we are factoring out
> the base. As far as I understand now, we need the base class to deal with
> concurrent updates in the custom Coordinator from the runtime (sub)tasks.
> Effectively, we are enforcing an actor model for the processing of the
> incoming messages such that the OperatorCoordinator can cleanly update its
> state. However, if there are no actual implementations that make use of the
> refactoring in Flink itself, I wonder if it would make sense to copy this
> code to the downstream implementation, e.g. the ShuffleCoordinator. As soon
> as it is part of Flink, we could of course try to consolidate this code.
>
> Considering the *how* of this, the FLIP lists both methods from
> SourceCoordinator (e.g. runInEventLoop) and SourceCoordinatorContext, as
> well as methods which do not appear anywhere in the Flink code, e.g.
> subTaskReady / subTaskNotReady / sendEventToOperator. It
> appears that some of this has been extracted from a downstream
> implementation. It would be great to adjust this, such that it reflects the
> status quo in Flink.
>
> -Max
>
> On Fri, Oct 28, 2022 at 5:53 AM Steven Wu  wrote:
>
> > Max,
> >
> > Thanks a lot for the comments. We should clarify that the shuffle
> > operator/coordinator is not really part of the Flink sink
> > function/operator. The shuffle operator is a custom operator that can be
> > inserted right before the Iceberg writer operator. It calculates traffic
> > statistics and performs a custom partition/shuffle
> > (DataStream#partitionCustom) to cluster the data right before it reaches
> > the Iceberg writer operator.
> >
> > We are not proposing to introduce a sink coordinator for the sink
> > interface. The shuffle operator needs the CoordinatorContextBase to
> > facilitate the communication between shuffle subtasks and the coordinator
> > for traffic statistics aggregation. The communication part is already
> > implemented by SourceCoordinatorContext.
> >
> > Here are some details about the communication needs.
> > - subtasks periodically calculate local statistics and send to the
> > coordinator for global aggregation
> > - the coordinator sends the globally aggregated statistics to the
> subtasks
> > - subtasks use the globally aggregated statistics to guide the
> > partition/shuffle decision
> >
> > Regards,
> > Steven
> >
> > On Thu, Oct 27, 2022 at 5:38 PM Maximilian Michels 
> wrote:
> >
> > > Hi Gang,
> > >
> > > Looks much better! I've actually gone through the OperatorCoordinator
> > code.
> > > It turns out, any operator already has an OperatorCoordinator assigned.
> > > Also, any operator can add custom coordinator code. So it looks like
> you
> > > won't have to implement any additional runtime logic to add a
> > > ShuffleCoordinator. However, I'm wondering, why do you specifically
> need
> > to
> > > refactor the SourceCoordinatorContext? You could simply add your own
> > > coordinator code. I'm not sure the sink requirements map to the source
> > > interface so closely that you can reuse the same logic.
> > >
> > > If you can refactor SourceCoordinatorContext in a way that makes it fit
> > > your use case, I have nothing to object here. By the way, another
> example
> > > of an existing OperatorCoordinator is CollectSinkOperatorCoordinator
> > which
> > > is quite trivial but it might be worth evaluating whether you need the
> > full
> > > power of SourceCoordinatorContext which is why I wanted to get more
> > > context.
> > >
> > > -Max
> > >
> > > On Thu, Oct 27, 2022 at 4:15 PM gang ye  wrote:
> > >
> > > > Hi Max,
> > > > I got your concern. Since shuffling support for Flink Iceberg sink is
> > not

[GitHub] [flink-connector-shared-utils] zentol commented on a diff in pull request #1: [FLINK-29472] Add first version of release scripts

2022-11-02 Thread GitBox


zentol commented on code in PR #1:
URL: 
https://github.com/apache/flink-connector-shared-utils/pull/1#discussion_r1011297983


##
_init.sh:
##
@@ -0,0 +1,45 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# all scripts should contain this line + source ${SCRIPT_DIR}/_init.sh
+SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
+
+set -o errexit
+set -o nounset
+set -o pipefail
+
+export SHELLOPTS
+
+###
+
+MVN=${MVN:-mvn}
+
+if [ "$(uname)" == "Darwin" ]; then

Review Comment:
   Doesn't hurt to have it I guess :shrug:



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-shared-utils] zentol commented on a diff in pull request #1: [FLINK-29472] Add first version of release scripts

2022-11-02 Thread GitBox


zentol commented on code in PR #1:
URL: 
https://github.com/apache/flink-connector-shared-utils/pull/1#discussion_r1011296837


##
check_environment.sh:
##
@@ -0,0 +1,67 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+SCRIPT_DIR=$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)
+
+source "${SCRIPT_DIR}/_init.sh"
+
+EXIT_CODE=0
+
+function check_program_available {
+  if program=$(command -v ${1}); then
+printf "\t%-10s%s\n" "${1}" "using ${program}"
+  else
+printf "\t%-10s%s\n" "${1}" "is not available."
+EXIT_CODE=1
+  fi
+}
+
+echo "Checking program availability:"
+check_program_available git
+check_program_available tar
+check_program_available rsync
+check_program_available gpg
+check_program_available perl
+check_program_available sed
+check_program_available svn
+check_program_available ${MVN}

Review Comment:
   This script neither enforces a maven version nor uses fancy features 
that would implicitly require a newer maven version.
   This is all left to the connector.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org