[jira] [Created] (FLINK-29898) ClusterEntrypointTest.testCloseAsyncShouldBeExecutedInShutdownHook fails on CI

2022-11-04 Thread Qingsheng Ren (Jira)
Qingsheng Ren created FLINK-29898:
-

 Summary: 
ClusterEntrypointTest.testCloseAsyncShouldBeExecutedInShutdownHook fails on CI
 Key: FLINK-29898
 URL: https://issues.apache.org/jira/browse/FLINK-29898
 Project: Flink
  Issue Type: Bug
  Components: API / Core
Affects Versions: 1.17.0
Reporter: Qingsheng Ren


[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=42848&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=8345]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29897) PyFlinkStreamUserDefinedFunctionTests.test_table_function fails on CI

2022-11-04 Thread Qingsheng Ren (Jira)
Qingsheng Ren created FLINK-29897:
-

 Summary: PyFlinkStreamUserDefinedFunctionTests.test_table_function 
fails on CI
 Key: FLINK-29897
 URL: https://issues.apache.org/jira/browse/FLINK-29897
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.17.0
Reporter: Qingsheng Ren


Error message:
{code:java}
Nov 04 14:42:54 E   at 
sun.reflect.GeneratedMethodAccessor204.invoke(Unknown Source)
Nov 04 14:42:54 E   at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Nov 04 14:42:54 E   at 
java.lang.reflect.Method.invoke(Method.java:498)
Nov 04 14:42:54 E   at 
org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
Nov 04 14:42:54 E   at 
org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
Nov 04 14:42:54 E   at 
org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)
Nov 04 14:42:54 E   at 
org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
Nov 04 14:42:54 E   at 
org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
Nov 04 14:42:54 E   at 
org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
Nov 04 14:42:54 E   at java.lang.Thread.run(Thread.java:748)
Nov 04 14:42:54 
Nov 04 14:42:54 .tox/py38/lib/python3.8/site-packages/py4j/protocol.py:326: 
Py4JJavaError {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [jira] [Created] (FLINK-28372) Investigate Akka Artery

2022-11-04 Thread Razin Bouzar
Hi all,

Has there been any further work or discussions around this issue? The Netty
security vulnerabilities in 3.10.6 are of some concern.

On Mon, Jul 4, 2022 at 3:45 AM Chesnay Schepler (Jira) 
wrote:

> Chesnay Schepler created FLINK-28372:
> 
>
>  Summary: Investigate Akka Artery
>  Key: FLINK-28372
>  URL: https://issues.apache.org/jira/browse/FLINK-28372
>  Project: Flink
>   Issue Type: Technical Debt
>   Components: Runtime / RPC
> Reporter: Chesnay Schepler
>
>
> Our current Akka setup uses the deprecated netty-based stack. We need to
> eventually migrate to Akka Artery.
>
>
>
> --
> This message was sent by Atlassian Jira
> (v8.20.10#820010)
>


-- 
RAZIN BOUZAR
Senior Engineer - Monitoring Cloud | Salesforce
Mobile: 317-502-8995




Re: [DISCUSS] FLIP-271: Autoscaling

2022-11-04 Thread Őrhidi Mátyás
Thank you Max, Gyula!

This is definitely an exciting one :)

Cheers,
Matyas

On Fri, Nov 4, 2022 at 1:16 PM Gyula Fóra  wrote:

> Hi!
>
> Thank you for the proposal Max! It is great to see this highly desired
> feature finally take shape.
>
> I think we have all the right building blocks to make this successful.
>
> Cheers,
> Gyula
>
> On Fri, Nov 4, 2022 at 7:37 PM Maximilian Michels  wrote:
>
>> Hi,
>>
>> I would like to kick off the discussion on implementing autoscaling for
>> Flink as part of the Flink Kubernetes operator. I've outlined an approach
>> here which I find promising:
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-271%3A+Autoscaling
>>
>> I've been discussing this approach with some of the operator
>> contributors: Gyula, Marton, Matyas, and Thomas (all in CC). We started
>> prototyping an implementation based on the current FLIP design. If that
>> goes well, we would like to contribute this to Flink based on the results
>> of the discussion here.
>>
>> I'm curious to hear your thoughts.
>>
>> -Max
>>
>


[jira] [Created] (FLINK-29896) DynamoDB CI Failing to run checks

2022-11-04 Thread Danny Cranmer (Jira)
Danny Cranmer created FLINK-29896:
-

 Summary: DynamoDB CI Failing to run checks
 Key: FLINK-29896
 URL: https://issues.apache.org/jira/browse/FLINK-29896
 Project: Flink
  Issue Type: Bug
  Components: Connectors / DynamoDB
Reporter: Danny Cranmer
 Fix For: aws-connector-2.0.0


The checks are failing to actually run the build and test step:
- 
https://github.com/apache/flink-connector-aws/actions/runs/3396739520/jobs/5648279513



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-271: Autoscaling

2022-11-04 Thread Gyula Fóra
Hi!

Thank you for the proposal Max! It is great to see this highly desired
feature finally take shape.

I think we have all the right building blocks to make this successful.

Cheers,
Gyula

On Fri, Nov 4, 2022 at 7:37 PM Maximilian Michels  wrote:

> Hi,
>
> I would like to kick off the discussion on implementing autoscaling for
> Flink as part of the Flink Kubernetes operator. I've outlined an approach
> here which I find promising:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-271%3A+Autoscaling
>
> I've been discussing this approach with some of the operator contributors:
> Gyula, Marton, Matyas, and Thomas (all in CC). We started prototyping an
> implementation based on the current FLIP design. If that goes well, we
> would like to contribute this to Flink based on the results of the
> discussion here.
>
> I'm curious to hear your thoughts.
>
> -Max
>


[jira] [Created] (FLINK-29895) Improve code coverage and integration tests for DynamoDB implementation of Async Sink

2022-11-04 Thread Yuri Gusev (Jira)
Yuri Gusev created FLINK-29895:
--

 Summary: Improve code coverage and integration tests for DynamoDB 
implementation of Async Sink
 Key: FLINK-29895
 URL: https://issues.apache.org/jira/browse/FLINK-29895
 Project: Flink
  Issue Type: New Feature
  Components: Connectors / DynamoDB
Reporter: Yuri Gusev
Assignee: Yuri Gusev
 Fix For: aws-connector-2.0.0


h2. Motivation

*User stories:*
 As a Flink user, I’d like to use DynamoDB as a sink for my data pipeline.

*Scope:*
 * Implement an asynchronous sink for DynamoDB by inheriting the AsyncSinkBase 
class. The implementation can for now reside in its own module in 
flink-connectors.
 * Implement an asynchronous sink writer for DynamoDB by extending the 
AsyncSinkWriter. The implementation must deal with failed requests and retry 
them using the {{requeueFailedRequestEntry}} method. If possible, the 
implementation should batch multiple requests (PutRecordsRequestEntry objects) 
to Firehose for increased throughput. The implemented Sink Writer will be used 
by the Sink class that will be created as part of this story.
 * Java / code-level docs.
 * End-to-end testing: add tests that hit a real AWS instance. (How to best 
donate resources to the Flink project to allow this to happen?)

h2. References

More details can be found at 
[https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[DISCUSS] FLIP-271: Autoscaling

2022-11-04 Thread Maximilian Michels
Hi,

I would like to kick off the discussion on implementing autoscaling for
Flink as part of the Flink Kubernetes operator. I've outlined an approach
here which I find promising:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-271%3A+Autoscaling

I've been discussing this approach with some of the operator contributors:
Gyula, Marton, Matyas, and Thomas (all in CC). We started prototyping an
implementation based on the current FLIP design. If that goes well, we
would like to contribute this to Flink based on the results of the
discussion here.

I'm curious to hear your thoughts.

-Max


Re: Stateful Functions with Flink 1.15 and onwards

2022-11-04 Thread Tzu-Li (Gordon) Tai
@Galen The next step is essentially for someone to be the release manager
and drive the release process for StateFun 3.3.0 [1]. The document for the
release process might be slightly outdated in some places, but overall
outlines the process pretty clearly.

As I mentioned earlier, there are quite a few steps in the process that
require committer write access, so it would be easier if a committer can
pick this up as the release manager.
I'll be happy to take this on, but I'll only have availability after next
week. If someone else is willing to take this on earlier it would certainly
be better to get the release out ASAP.

[1]
https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Stateful+Functions+Release

On Fri, Nov 4, 2022 at 5:54 AM Galen Warren  wrote:

> Thanks Gordon.
>
> What is the next step here?
>
> On Thu, Nov 3, 2022 at 1:45 PM Tzu-Li (Gordon) Tai 
> wrote:
>
> > FYI, release-3.3 branch has been cut and is ready for the release process
> > for StateFun 3.3.0:
> > https://github.com/apache/flink-statefun/tree/release-3.3
> >
> > On Tue, Nov 1, 2022 at 10:21 AM Tzu-Li (Gordon) Tai  >
> > wrote:
> >
> >> Btw, I'll assume that we're using this thread to gather consensus for
> >> code-freezing for 3.3.x series of StateFun. I know there hasn't been
> much
> >> activity on the repo, so this is just a formality really :)
> >>
> >> From the commit history, it looks like we're mainly including the below
> >> major changes and bug fixes for 3.3.x:
> >> - Flink upgrade to 1.15.2
> >> - https://issues.apache.org/jira/browse/FLINK-26340
> >> - https://issues.apache.org/jira/browse/FLINK-25866
> >> - https://issues.apache.org/jira/browse/FLINK-25936
> >> - https://issues.apache.org/jira/browse/FLINK-25933
> >>
> >> I'll wait for 24 hours before cutting the release branch for 3.3.x,
> >> unless anyone raises any objections before that.
> >>
> >> Thanks,
> >> Gordon
> >>
> >> On Tue, Nov 1, 2022 at 10:09 AM Galen Warren 
> >> wrote:
> >>
> >>> Thanks Gordon and Filip, I appreciate your help on this one.
> >>>
> >>> On Tue, Nov 1, 2022 at 1:07 PM Tzu-Li (Gordon) Tai <
> tzuli...@apache.org>
> >>> wrote:
> >>>
>  PR for upgrading to Flink 1.15.2 has been merged. Thanks for the
>  efforts,
>  Galen and Filip!
> 
>  We should be ready to kick off a new release for StateFun with the
> Flink
>  version upgrade.
>  I'll cut off a release branch now on apache/flink-statefun for
>  release-3.3.x to move things forward.
>  @Galen, @Filip if you want to, after the release branch is cut, you
>  could
>  probably upgrade the master branch to Flink 1.16.x as well.
> 
>  Afterwards we should decide who is available to drive the actual
> release
>  process for 3.3.0.
>  There's quite a few steps that would require committer write access.
>  Unless someone else is up for this earlier, I'll have some
> availability
>  towards the end of next week to help drive this.
> 
>  Thanks,
>  Gordon
> 
>  On Mon, Oct 31, 2022 at 12:17 PM Galen Warren <
> ga...@cvillewarrens.com>
>  wrote:
> 
>  > Yes, that makes sense.
>  >
>  > PR is here: [FLINK-29814][statefun] Change supported Flink version
> to
>  > 1.15.2 by galenwarren · Pull Request #319 · apache/flink-statefun
>  > (github.com) .
>  >
>  > On Mon, Oct 31, 2022 at 11:35 AM Till Rohrmann <
> trohrm...@apache.org>
>  > wrote:
>  >
>  > > I think there might still be value in supporting 1.15 since not
>  everyone
>  > > upgrades Flink very fast. Hopefully, for Statefun the diff between
>  Flink
>  > > 1.15 and 1.16 boils down to changing the Flink dependencies.
>  > >
>  > > Cheers,
>  > > Till
>  > >
>  > > On Mon, Oct 31, 2022 at 2:06 PM Galen Warren <
>  ga...@cvillewarrens.com>
>  > > wrote:
>  > >
>  > >> Sure thing. One question -- Flink 1.16 was just released a few
>  days ago.
>  > >> Should I support 1.15, or just go straight to 1.16?
>  > >>
>  > >> On Mon, Oct 31, 2022 at 8:49 AM Till Rohrmann <
>  trohrm...@apache.org>
>  > >> wrote:
>  > >>
>  > >>> Hi folks,
>  > >>>
>  > >>> if you can open a PR for supporting Flink 1.15 Galen, then this
>  would
>  > be
>  > >>> awesome. I've assigned you to this ticket. The next thing after
>  merging
>  > >>> this PR would be creating a new StateFun release. Once we have
>  merged
>  > the
>  > >>> PR, let's check who can help with it the fastest.
>  > >>>
>  > >>> Cheers,
>  > >>> Till
>  > >>>
>  > >>> On Mon, Oct 31, 2022 at 1:10 PM Galen Warren <
>  ga...@cvillewarrens.com>
>  > >>> wrote:
>  > >>>
>  >  Yes, I could do that.
>  > 
>  >  On Mon, Oct 31, 2022 at 7:48 AM Filip Karnicki <
>  >  filip.karni...@gmail.com> wrote:
>  > 
>  > 

Re: SQL Gateway and SQL Client

2022-11-04 Thread Jamie Grier
Hi Shengkai,

We're doing more and more Flink development at Confluent these days and we're 
currently trying to bootstrap a prototype that relies on the SQL Client and 
Gateway.  We will be using the components in some of our projects and would 
like to co-develop them with you and the rest of the Flink community.

As of right now it's a pretty big blocker for our upcoming milestone that the 
SQL Client has not yet been modified to talk to the SQL Gateway and we want to 
help with this effort ASAP!  We would be even willing to take over the work if 
it's not yet started but I suspect it already is.

Anyway, rather than start working immediately on the SQL Client and adding 
the new Gateway mode ourselves, we wanted to start a conversation with you and 
see where you're at with things and offer to help.

Do you have anything you can already share so we can start with your work or 
just play around with it?  Like I said, we just want to get started and are 
very able to help in this area.  We see both the SQL Client and Gateway being 
very important for us and have a good team to help develop it.

Let me know if there is a branch you can share, etc.  It would be much 
appreciated!

-Jamie Grier


On 2022/10/28 06:06:49 Shengkai Fang wrote:
> Hi.
> 
> > Is there a possibility for us to get engaged and at least introduce
> initial changes to support authentication/authorization?
> 
> Yes. You can write a FLIP about the design and change. We can discuss this
> in the dev mail. If the FLIP passes, we can develop it together.
> 
> > Another question about persistent Gateway: did you have any specific
> thoughts about it or some draft design?
> 
> We don't have any detailed plan about this. But I know Livy has a similar
> feature.
> 
> Best,
> Shengkai
> 
> 
> Alexey Leonov-Vendrovskiy wrote on Thursday, October 27, 2022 at 15:12:
> 
> > Apologies from the delayed response on my side.
> >
> >  I think the authentication module is not part of our plan in 1.17 because
> >> of the busy work. I think we'll start the design at the end of the
> >> release-1.17.
> >
> >
> > Is there a possibility for us to get engaged and at least introduce
> > initial changes to support authentication/authorization? Specifically,
> > changes in the API and in SQL Client.
> >
> > We expect the following authentication flow:
> >
> > On the SQL gateway we want to be able to use a delegation token.
> > SQL client should be able to supply an API key.
> > The SQL Gateway *would not *be submitting jobs on behalf of the client.
> >
> > Ideally it would be nice to introduce some interfaces in the SQL Gateway
> > that would allow implementing custom authentication and authorization.
> >
> > Another question about persistent Gateway: did you have any specific
> > thoughts about it or some draft design?
> >
> > Thanks,
> > Alexey
> >
> >
> > On Fri, Oct 21, 2022 at 1:13 AM Shengkai Fang  wrote:
> >
> >> Sorry for the late response.
> >>
> >> In the next version(Flink 1.17), we plan to support the SQL Client to
> >> submit the statement to the Flink SQL Gateway. The FLINK-29486
> >>  is the first step to
> >> remove the usage of the `Parser` in the client side, which needs to read
> >> the table schema during the converting sql node to operation. I think the 
> >> authentication
> >> module is not part of our plan in 1.17 because of the busy work. I think
> >> we'll start the design at the end of the release-1.17.
> >> But could you share more details about the requirements of the
> >> authentication?
> >> - Do you use the kerberos or delegation token or password to do the
> >> authentication?
> >> - After the authentication, do you need the sql gateway to submit the
> >> job on behalf of the client?
> >> - ...
> >>
> >> For detailed implementation, I think Hive and Presto are good examples to
> >> dig in.  If you have some thoughts about the authentication module,
> >> please let me know.
> >>
> >> Best,
> >> Shengkai
> >>
> >> Alexey Leonov-Vendrovskiy wrote on Wednesday, October 19, 2022 at 00:37:
> >>
> >>> Thank you for the response, Yuxia!
> >>>
> >>> Shengkai, I would like to learn more about nearest and a bit more
> >>> distant plans about development of the SQL Gateway and the SQL Client.
> >>> Do you have a description of the work planned or maybe can share general
> >>> thoughts about the Authentication module, or Persistent Gateway.
> >>> How can the authentication part be addressed on the SQL Client side?
> >>>
> >>> Regards,
> >>> -Alexey
> >>>
> >>>
> >>> On Wed, Oct 12, 2022 at 11:24 PM yuxia 
> >>> wrote:
> >>>
>  > In what Flink’s release the connection from SQL Client to the Gateway
>  is
>  expected to be added?
>  Flink 1.17
> 
>  > “Authentication module” (2) and “Persistent Gateway” (4) as
>  possible future work. Were there any recent discussions on these
>  subjects?
>  No recent discussions on these subjects, but I think it'll come in
>  Flink 1.17
> 
>  > 

[jira] [Created] (FLINK-29894) Rename flink-connector-dynamodb to flink-connector-aws

2022-11-04 Thread Danny Cranmer (Jira)
Danny Cranmer created FLINK-29894:
-

 Summary: Rename flink-connector-dynamodb to flink-connector-aws
 Key: FLINK-29894
 URL: https://issues.apache.org/jira/browse/FLINK-29894
 Project: Flink
  Issue Type: Sub-task
Reporter: Danny Cranmer


The existing 
{{[flink-connector-dynamodb|https://github.com/apache/flink-connector-dynamodb]}} 
 repository should be renamed to {{flink-connector-aws}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29893) k8s python e2e test failed due to permission error when pulling the Docker image

2022-11-04 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-29893:
-

 Summary: k8s python e2e test failed due to permission error when 
pulling the Docker image
 Key: FLINK-29893
 URL: https://issues.apache.org/jira/browse/FLINK-29893
 Project: Flink
  Issue Type: Improvement
  Components: Test Infrastructure
Affects Versions: 1.17.0
Reporter: Matthias Pohl


[This build 
failed|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=42415&view=logs&j=bea52777-eaf8-5663-8482-18fbc3630e81&t=b2642e3a-5b86-574d-4c8a-f7e2842bfb14&l=11824]
 in {{test_kubernetes_pyflink_application}} due to permission issues when 
pulling the Docker image:

{quote}
Failed to pull image "test_kubernetes_pyflink_application": rpc error: code = 
Unknown desc = Error response from daemon: pull access denied for 
test_kubernetes_pyflink_application, repository does not exist or may require 
'docker login': denied: requested access to the resource is denied
{quote}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29892) flink-conf.yaml does not accept hash (#) in the env.java.opts property

2022-11-04 Thread Sergio Sainz (Jira)
Sergio Sainz created FLINK-29892:


 Summary: flink-conf.yaml does not accept hash (#) in the 
env.java.opts property
 Key: FLINK-29892
 URL: https://issues.apache.org/jira/browse/FLINK-29892
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Kubernetes
Affects Versions: 1.15.2
Reporter: Sergio Sainz


When adding a string with a hash (#) character to env.java.opts in 
flink-conf.yaml, the string will be truncated from the # onwards, even when the 
value is surrounded by single quotes or double quotes.

example:

(in flink-conf.yaml):

env.java.opts: -Djavax.net.ssl.trustStorePassword=my#pwd

 

the value shown on the flink taskmanagers or job managers is :

env.java.opts: -Djavax.net.ssl.trustStorePassword=my
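
For illustration only, here is a minimal sketch of the kind of naive comment 
stripping that would produce exactly this truncation. This is not the actual 
Flink configuration parser; the class and method names are made up:

{code:java}
public class NaiveConfLineParser {

    // Everything from the first '#' on is treated as a comment,
    // even when it appears inside a quoted value.
    static String parseValue(String line) {
        String withoutComment = line.split("#", 2)[0];
        int sep = withoutComment.indexOf(": ");
        return sep < 0 ? null : withoutComment.substring(sep + 2).trim();
    }

    public static void main(String[] args) {
        String line = "env.java.opts: -Djavax.net.ssl.trustStorePassword=my#pwd";
        // Prints "-Djavax.net.ssl.trustStorePassword=my" -- the "#pwd" suffix is lost.
        System.out.println(parseValue(line));
    }
}
{code}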

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29891) The RetryRule tests print some excessive stacktraces that might confuse people investigating actual issues

2022-11-04 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-29891:
-

 Summary: The RetryRule tests print some excessive stacktraces that 
might confuse people investigating actual issues
 Key: FLINK-29891
 URL: https://issues.apache.org/jira/browse/FLINK-29891
 Project: Flink
  Issue Type: Improvement
  Components: Tests
Affects Versions: 1.17.0, 1.15.3, 1.16.1
Reporter: Matthias Pohl
 Attachments: RetryRule.log

The changes introduced with FLINK-29198 cause excessive stacktraces in the CI 
logs. We might want to change that to avoid confusing people going through the 
logs for other errors. (see attached file)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29890) UDFs classloading from JARs in 1.16 is broken

2022-11-04 Thread Alexander Fedulov (Jira)
Alexander Fedulov created FLINK-29890:
-

 Summary: UDFs classloading from JARs in 1.16 is broken
 Key: FLINK-29890
 URL: https://issues.apache.org/jira/browse/FLINK-29890
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Runtime
Affects Versions: 1.16.0
Reporter: Alexander Fedulov


1.16 introduced a lot of changes with respect to classloading in the Table API. 
The way UDFs could previously be loaded from JARs in 1.15 does not work in 1.16 
anymore - it fails with a ClassNotFoundException when UDFs are used at 
runtime. 

Here is a repository with a reproducible example:
[https://github.com/afedulov/udfs-flink-1.16/blob/main/src/test/java/com/example/UDFTest.java]
 
It works as is (Flink 1.15.2) and fails when switching the dependencies to 
1.16.0.

Here are some of the PRs that, I believe, might be related to the issue:
[https://github.com/apache/flink/pull/20001]
[https://github.com/apache/flink/pull/19845]
[https://github.com/apache/flink/pull/20211] (fixes a similar issue introduced 
after classloading changes in 1.16)
 
It is unclear how UDFs can be loaded from JARs in 1.16.
Ideally, this should also be covered by tests and described in the 
documentation.
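
For reference, a minimal sketch of the 1.15-style pattern the description refers 
to. It is not taken from the linked repository; the JAR path and UDF class name 
are illustrative assumptions. The UDF JAR is shipped with the job via 
{{pipeline.jars}} and the function is registered by class name, so the class is 
only available through the user classloader at runtime:

{code:java}
import java.util.Collections;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.PipelineOptions;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class UdfFromJarExample {
    public static void main(String[] args) {
        // Ship the JAR that contains the UDF with the job (path is a placeholder).
        Configuration conf = new Configuration();
        conf.set(PipelineOptions.JARS, Collections.singletonList("file:///path/to/my-udfs.jar"));

        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().withConfiguration(conf).build());

        // Register the UDF by class name; the class only exists inside the shipped JAR.
        tEnv.executeSql("CREATE TEMPORARY SYSTEM FUNCTION my_udf AS 'com.example.MyScalarUdf'");
        tEnv.executeSql("SELECT my_udf('hello')").print();
    }
}
{code}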



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29889) netty-tcnative-static does not bundle tcnative-classes

2022-11-04 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-29889:


 Summary: netty-tcnative-static does not bundle tcnative-classes
 Key: FLINK-29889
 URL: https://issues.apache.org/jira/browse/FLINK-29889
 Project: Flink
  Issue Type: Bug
  Components: BuildSystem / Shaded
Affects Versions: shaded-16.0
Reporter: Chesnay Schepler
 Fix For: shaded-16.1






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-269: Properly Handling the Processing Timers on Job Termination

2022-11-04 Thread Maximilian Michels
Hey Yun,

I wonder whether we need to add a new option when registering timers. Won't
it be sufficient to flush all pending timers on termination but not allow
new ones to be registered?

-Max

On Wed, Nov 2, 2022 at 11:20 AM Yun Gao 
wrote:

> Hi everyone,
> I would like to open a discussion[1] on how to
> properly handle the processing timers on job
> termination.
> Currently all the processing timers would be
> ignored on job termination. This behavior is
> not suitable for some cases like WindowOperator.
> Thus we'd like to provide more options for how
> to deal with the pending times on job termination,
> and provide correct semantics on bounded stream
> for these scenarios. The FLIP is based on the previous
> discussion with Piotr and Divye in [2].
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-269%3A+Properly+Handling+the+Processing+Timers+on+Job+Termination
> <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-269%3A+Properly+Handling+the+Processing+Timers+on+Job+Termination
> >
> [2] https://issues.apache.org/jira/browse/FLINK-18647 <
> https://issues.apache.org/jira/browse/FLINK-18647 >
>
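
For readers following along, here is a minimal sketch (purely illustrative, not 
taken from the FLIP) of the kind of pending processing-time timer the discussion 
is about: on a bounded input the job may terminate before the timer fires, and 
today the buffered count would simply be dropped.

{code:java}
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class CountWithDelayedEmit extends KeyedProcessFunction<String, String, String> {

    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
        Long current = count.value();
        count.update(current == null ? 1L : current + 1);
        // Emit the count one minute later. On a bounded stream this timer can
        // still be pending when the job terminates.
        ctx.timerService().registerProcessingTimeTimer(
                ctx.timerService().currentProcessingTime() + 60_000L);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
        out.collect(ctx.getCurrentKey() + ": " + count.value());
        count.clear();
    }
}
{code}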


Re: Stateful Functions with Flink 1.15 and onwards

2022-11-04 Thread Galen Warren
Thanks Gordon.

What is the next step here?

On Thu, Nov 3, 2022 at 1:45 PM Tzu-Li (Gordon) Tai 
wrote:

> FYI, release-3.3 branch has been cut and is ready for the release process
> for StateFun 3.3.0:
> https://github.com/apache/flink-statefun/tree/release-3.3
>
> On Tue, Nov 1, 2022 at 10:21 AM Tzu-Li (Gordon) Tai 
> wrote:
>
>> Btw, I'll assume that we're using this thread to gather consensus for
>> code-freezing for 3.3.x series of StateFun. I know there hasn't been much
>> activity on the repo, so this is just a formality really :)
>>
>> From the commit history, it looks like we're mainly including the below
>> major changes and bug fixes for 3.3.x:
>> - Flink upgrade to 1.15.2
>> - https://issues.apache.org/jira/browse/FLINK-26340
>> - https://issues.apache.org/jira/browse/FLINK-25866
>> - https://issues.apache.org/jira/browse/FLINK-25936
>> - https://issues.apache.org/jira/browse/FLINK-25933
>>
>> I'll wait for 24 hours before cutting the release branch for 3.3.x,
>> unless anyone raises any objections before that.
>>
>> Thanks,
>> Gordon
>>
>> On Tue, Nov 1, 2022 at 10:09 AM Galen Warren 
>> wrote:
>>
>>> Thanks Gordon and Filip, I appreciate your help on this one.
>>>
>>> On Tue, Nov 1, 2022 at 1:07 PM Tzu-Li (Gordon) Tai 
>>> wrote:
>>>
 PR for upgrading to Flink 1.15.2 has been merged. Thanks for the
 efforts,
 Galen and Filip!

 We should be ready to kick off a new release for StateFun with the Flink
 version upgrade.
 I'll cut off a release branch now on apache/flink-statefun for
 release-3.3.x to move things forward.
 @Galen, @Filip if you want to, after the release branch is cut, you
 could
 probably upgrade the master branch to Flink 1.16.x as well.

 Afterwards we should decide who is available to drive the actual release
 process for 3.3.0.
 There's quite a few steps that would require committer write access.
 Unless someone else is up for this earlier, I'll have some availability
 towards the end of next week to help drive this.

 Thanks,
 Gordon

 On Mon, Oct 31, 2022 at 12:17 PM Galen Warren 
 wrote:

 > Yes, that makes sense.
 >
 > PR is here: [FLINK-29814][statefun] Change supported Flink version to
 > 1.15.2 by galenwarren · Pull Request #319 · apache/flink-statefun
 > (github.com) .
 >
 > On Mon, Oct 31, 2022 at 11:35 AM Till Rohrmann 
 > wrote:
 >
 > > I think there might still be value in supporting 1.15 since not
 everyone
 > > upgrades Flink very fast. Hopefully, for Statefun the diff between
 Flink
 > > 1.15 and 1.16 boils down to changing the Flink dependencies.
 > >
 > > Cheers,
 > > Till
 > >
 > > On Mon, Oct 31, 2022 at 2:06 PM Galen Warren <
 ga...@cvillewarrens.com>
 > > wrote:
 > >
 > >> Sure thing. One question -- Flink 1.16 was just released a few
 days ago.
 > >> Should I support 1.15, or just go straight to 1.16?
 > >>
 > >> On Mon, Oct 31, 2022 at 8:49 AM Till Rohrmann <
 trohrm...@apache.org>
 > >> wrote:
 > >>
 > >>> Hi folks,
 > >>>
 > >>> if you can open a PR for supporting Flink 1.15 Galen, then this
 would
 > be
 > >>> awesome. I've assigned you to this ticket. The next thing after
 merging
 > >>> this PR would be creating a new StateFun release. Once we have
 merged
 > the
 > >>> PR, let's check who can help with it the fastest.
 > >>>
 > >>> Cheers,
 > >>> Till
 > >>>
 > >>> On Mon, Oct 31, 2022 at 1:10 PM Galen Warren <
 ga...@cvillewarrens.com>
 > >>> wrote:
 > >>>
 >  Yes, I could do that.
 > 
 >  On Mon, Oct 31, 2022 at 7:48 AM Filip Karnicki <
 >  filip.karni...@gmail.com> wrote:
 > 
 > > Hi All
 > >
 > > So what's the play here?
 > >
 > > Galen, what do you think about taking this on? Perhaps ++Till
 would
 > > assign this jira to you (with your permission) given he's
 helped me
 > out
 > > with statefun work before
 > > https://issues.apache.org/jira/browse/FLINK-29814
 > >
 > > I can try to move to move statefun to flink 1.16 when it's out
 > >
 > >
 > > Kind regards
 > > Fil
 > >
 > > On Thu, 27 Oct 2022 at 10:02, Filip Karnicki <
 > filip.karni...@gmail.com>
 > > wrote:
 > >
 > >> Hi All
 > >>
 > >> Our use case is that we need to process elements for the same
 key
 > >> sequentially, and this processing involves async operations.
 > >>
 > >> If any part of the processing fails, we store the offending
 and all
 > >> subsequent incoming messages for that key in the state and not
 > process any
 > >> further messages for that key, until a 

Re: ASF Slack

2022-11-04 Thread Martijn Visser
Hi Max,

> I wonder how that has played out since the creation of the Slack
workspace? I haven't been following the Slack communication.

I think it has primarily played a role next to the existing User mailing
list: lots of user questions. There were maybe a handful of conversations
where the result of the conversation was a request to open a Jira, create a
FLIP or open up a discussion on the Dev list.

> There is an invite link that they can use. Maybe not the most
straight-forward process but I think it doesn't stop users from joining.

That's only possible to use for those with an @apache.org e-mail address,
see https://infra.apache.org/slack.html. Anyone else would need to be
invited by a committer, but I wouldn't be in favour of needing to spend
committers' time in adding/inviting members on an ASF Slack instance.

Best regards,

Martijn

On Fri, Nov 4, 2022 at 12:33 PM Maximilian Michels  wrote:

> Hey Martijn, hi Chesnay,
>
> >The big problem of using the ASF Slack instance is that users can join
> any Slack channel, including ones outside of Flink.
>
> That is one of the main motivations for proposing to move to the ASF
> Slack. We can create an unlimited number of "flink-XY" channels in the ASF
> Slack, although one or two are probably all we need. It seems logical that
> we share the Slack workspace, just like the other infrastructure like JIRA,
> mail, Jenkins, web server, etc. I guess I'm just in too many Slack
> workspaces already.
>
> From an ASF standpoint, I think it would be desirable to channel people
> into the main workspace to promote the software foundation. Also, the main
> point of communication should still be the mailing lists. I wonder how that
> has played out since the creation of the Slack workspace? I haven't been
> following the Slack communication.
>
> >Not to mention that non-committers need to be invited by an ASF Slack
> user to be able to join.
>
> There is an invite link that they can use. Maybe not the most
> straight-forward process but I think it doesn't stop users from joining.
>
> Linen is pretty nice, especially for making Slack searchable via search
> engines.
>
> -Max
>
> On Thu, Nov 3, 2022 at 2:44 PM Chesnay Schepler 
> wrote:
>
>> According to Robert, Linen is supposed to show the entire history.
>>
>> On 03/11/2022 14:27, Martijn Visser wrote:
>> > Addition: I'm not sure if Linen actually provides the messages that are
>> > older, but I can't test that since we've only recently integrated it.
>> >
>> > On Thu, Nov 3, 2022 at 2:23 PM Martijn Visser > >
>> > wrote:
>> >
>> >> Hi Max,
>> >>
>> >>  From my experience, most ASF projects actually don't use the official
>> >> Slack, but their own instance. That has happened with Airflow, Iceberg,
>> >> Hudi etc.
>> >> The big problem of using the ASF Slack instance is that users can join
>> any
>> >> Slack channel, including ones outside of Flink. Not to mention that
>> >> non-committers need to be invited by an ASF Slack user to be able to
>> join.
>> >> The problem with the history is mitigated by using the service from
>> Linen
>> >> https://www.linen.dev/s/apache-flink
>> >>
>> >> I prefer the current Slack instance of the ASF one.
>> >>
>> >> Thanks,
>> >>
>> >> Martijn
>> >>
>> >> On Thu, Nov 3, 2022 at 1:14 PM Maximilian Michels 
>> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> A while ago the Flink community decided to create a Slack workspace
>> [1]. I
>> >>> wonder, is there a particular reason why we created our own Slack
>> >>> workspace
>> >>> and do not use the official ASF Slack [2]? It looks like most ASF
>> projects
>> >>> use the official Slack. Using the official Slack makes collaboration
>> with
>> >>> other projects easier but it also comes with additional benefits, e.g.
>> >>> free
>> >>> unlimited history (our current workspace suffers from limited
>> history).
>> >>>
>> >>> Since the vote thread[1] mentioned a re-evaluation of the decision
>> until
>> >>> the end of 2022, I'd like to propose to move our channels to the
>> official
>> >>> ASF Slack workspace. In my eyes, the dedicated workspace is a bit of
>> an
>> >>> overkill and we're better off joining the rest of the ASF community.
>> Let
>> >>> me
>> >>> know what you think.
>> >>>
>> >>> -Max
>> >>>
>> >>> [1] https://lists.apache.org/thread/j2kdh3zx6rrb49mz5n2vb06g2knogxdj
>> >>> [2] https://infra.apache.org/slack.html
>> >>>
>>
>>


[jira] [Created] (FLINK-29888) Improve MutatedConfigurationException for disallowed changes in CheckpointConfig and ExecutionConfig

2022-11-04 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-29888:
--

 Summary: Improve MutatedConfigurationException for disallowed 
changes in CheckpointConfig and ExecutionConfig
 Key: FLINK-29888
 URL: https://issues.apache.org/jira/browse/FLINK-29888
 Project: Flink
  Issue Type: Improvement
  Components: API / DataStream
Affects Versions: 1.17.0
Reporter: Piotr Nowojski
Assignee: Piotr Nowojski
 Fix For: 1.17.0


Currently if {{CheckpointConfig}} or {{ExecutionConfig}} are modified in a 
non-allowed way, the user gets a generic error "Configuration object 
ExecutionConfig changed", without a hint of what has been modified. With 
FLINK-29379 we can improve this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: ASF Slack

2022-11-04 Thread Maximilian Michels
Hey Martijn, hi Chesnay,

>The big problem of using the ASF Slack instance is that users can join any
Slack channel, including ones outside of Flink.

That is one of the main motivations for proposing to move to the ASF Slack.
We can create an unlimited number of "flink-XY" channels in the ASF Slack,
although one or two are probably all we need. It seems logical that we
share the Slack workspace, just like the other infrastructure like JIRA,
mail, Jenkins, web server, etc. I guess I'm just in too many Slack
workspaces already.

>From an ASF standpoint, I think it would be desirable to channel people
into the main workspace to promote the software foundation. Also, the main
point of communication should still be the mailing lists. I wonder how that
has played out since the creation of the Slack workspace? I haven't been
following the Slack communication.

>Not to mention that non-committers need to be invited by an ASF Slack user
to be able to join.

There is an invite link that they can use. Maybe not the most
straight-forward process but I think it doesn't stop users from joining.

Linen is pretty nice, especially for making Slack searchable via search
engines.

-Max

On Thu, Nov 3, 2022 at 2:44 PM Chesnay Schepler  wrote:

> According to Robert, Linen is supposed to show the entire history.
>
> On 03/11/2022 14:27, Martijn Visser wrote:
> > Addition: I'm not sure if Linen actually provides the messages that are
> > older, but I can't test that since we've only recently integrated it.
> >
> > On Thu, Nov 3, 2022 at 2:23 PM Martijn Visser 
> > wrote:
> >
> >> Hi Max,
> >>
> >>  From my experience, most ASF projects actually don't use the official
> >> Slack, but their own instance. That has happened with Airflow, Iceberg,
> >> Hudi etc.
> >> The big problem of using the ASF Slack instance is that users can join
> any
> >> Slack channel, including ones outside of Flink. Not to mention that
> >> non-committers need to be invited by an ASF Slack user to be able to
> join.
> >> The problem with the history is mitigated by using the service from
> Linen
> >> https://www.linen.dev/s/apache-flink
> >>
> >> I prefer the current Slack instance of the ASF one.
> >>
> >> Thanks,
> >>
> >> Martijn
> >>
> >> On Thu, Nov 3, 2022 at 1:14 PM Maximilian Michels 
> wrote:
> >>
> >>> Hi,
> >>>
> >>> A while ago the Flink community decided to create a Slack workspace
> [1]. I
> >>> wonder, is there a particular reason why we created our own Slack
> >>> workspace
> >>> and do not use the official ASF Slack [2]? It looks like most ASF
> projects
> >>> use the official Slack. Using the official Slack makes collaboration
> with
> >>> other projects easier but it also comes with additional benefits, e.g.
> >>> free
> >>> unlimited history (our current workspace suffers from limited history).
> >>>
> >>> Since the vote thread[1] mentioned a re-evaluation of the decision
> until
> >>> the end of 2022, I'd like to propose to move our channels to the
> official
> >>> ASF Slack workspace. In my eyes, the dedicated workspace is a bit of an
> >>> overkill and we're better off joining the rest of the ASF community.
> Let
> >>> me
> >>> know what you think.
> >>>
> >>> -Max
> >>>
> >>> [1] https://lists.apache.org/thread/j2kdh3zx6rrb49mz5n2vb06g2knogxdj
> >>> [2] https://infra.apache.org/slack.html
> >>>
>
>


Re: [DISCUSS]Introduce a time-segment based restart strategy

2022-11-04 Thread Paul Lam
In addition, there’s another viable alternative strategy that could be 
applied with or without the proposed strategy.

We could group the exceptions that occur in an interval by exception
class, so that each distinct exception class within an interval is counted
as one failure.

The upside is that it’s more fine-grained and wouldn’t unnecessarily increase
the retry time when the job fails due to different causes.

Best,
Paul Lam

> On November 4, 2022, at 17:33, Paul Lam wrote:
> 
> Hi Weihua,
> 
> +1 for the new restart strategy you suggested.
> 
> We’re also using failure-rate strategy as the cluster-wide default and 
> faced the same problem, which we solved with a similar approach.
> 
> FYI. We added a freeze period config option to failure-rate strategy. 
> The freeze period would prevent counting further errors after the first
>  failure happens, so that a burst of errors would not exhaust the 
> number of allowed failures. 
> 
> Best,
> Paul Lam
> 
> >> On November 4, 2022, at 16:45, Weihua Hu wrote:
>> 
>> Hi, everyone
>> 
>> I'd like to bring up a discussion about restart strategy. Flink supports 3
>> kinds of restart strategy. These work very well for jobs with specific
>> configs, but for platform users who manage hundreds of jobs, there is no
>> common strategy to use.
>> 
>> Let me explain the reason. We manage a lot of jobs, some are
>> keyby-connected with one region per job, some are rescale-connected with
>> many regions per job, and when using the failure rate restart strategy, we
>> cannot achieve the same control with the same configuration.
>> 
>> For example, if I want the job to fail when there are 3 exceptions within 5
>> minutes, the config would look like this:
>> 
>>> restart-strategy.failure-rate.max-failures-per-interval: 3
>>> 
>>> restart-strategy.failure-rate.failure-rate-interval: 5 min
>>> 
>> For the keyby-connected job, this config works well.
>> 
>> However, for the rescale-connected job, we need to consider the number of
>> regions and the number of slots per TaskManager. If each TM has 3 slots,
>> and these 3 slots run the task of 3 regions, then when one TaskManager
>> crashes, it will trigger 3 regions to fail, and the job will fail because
>> it exceeds the threshold of the restart strategy. To avoid the effect of
>> single TM crashes, I must increase the max-failures-per-interval to 9, but
>> after the change, user task exceptions will be more tolerant than I want.
>> 
>> 
>> Therefore, I want to introduce a new restart strategy based on time
>> periods. A continuous period of time (e.g., 5 minutes) is divided into
>> segments of a specific length (e.g., 1 minute). If an exception occurs
>> within a segment (no matter how many times), it is marked as a failed
>> segment. Similar to failure-rate restart strategy, the job will fail when
>> there are 'm' failed segments in the interval of 'n' .
>> 
>> In this mode, the keyby-connected and rescale-connected jobs can use
>> unified configurations.
>> 
>> This is a user-relevant change, so if you think this is worth to do, maybe
>> I can create a FLIP to describe it in detail.
>> Best,
>> Weihua
> 



[jira] [Created] (FLINK-29887) Lookup cache in JDBC table connector is not one cache per process (i.e. TaskManager) as documented

2022-11-04 Thread linweijiang (Jira)
linweijiang created FLINK-29887:
---

 Summary: Lookup cache in JDBC table connector is not one cache per 
process (i.e. TaskManager) as documented
 Key: FLINK-29887
 URL: https://issues.apache.org/jira/browse/FLINK-29887
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / JDBC
Reporter: linweijiang


Hi, I saw the description "When lookup cache is enabled, each process (i.e. 
TaskManager) will hold a cache" on the website. But when I print out the 
hashCode of the cache in each slot’s thread, I find that they are inconsistent. 
Can you help me explain this? Thanks.

 
{code:java}
// code placeholder
2022-11-04 17:22:53,118 INFO  
org.apache.flink.connector.jdbc.table.JdbcRowDataLookupFunction [Source: 
daily[1] -> Calc[2] -> LookupJoin[3] -> Calc[4] -> ConstraintEnforcer[5] 
(8/8)#0] [] - cache hashCode is: 
org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$LocalManualCache@656ae7d9
2022-11-04 17:22:53,118 INFO  
org.apache.flink.connector.jdbc.table.JdbcRowDataLookupFunction [Source: 
daily[1] -> Calc[2] -> LookupJoin[3] -> Calc[4] -> ConstraintEnforcer[5] 
(6/8)#0] [] - cache hashCode is: 
org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$LocalManualCache@5c3a31c2
2022-11-04 17:22:53,118 INFO  
org.apache.flink.connector.jdbc.table.JdbcRowDataLookupFunction [Source: 
daily[1] -> Calc[2] -> LookupJoin[3] -> Calc[4] -> ConstraintEnforcer[5] 
(5/8)#0] [] - cache hashCode is: 
org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$LocalManualCache@598a856e
2022-11-04 17:22:53,118 INFO  
org.apache.flink.connector.jdbc.table.JdbcRowDataLookupFunction [Source: 
daily[1] -> Calc[2] -> LookupJoin[3] -> Calc[4] -> ConstraintEnforcer[5] 
(7/8)#0] [] - cache hashCode is: 
org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$LocalManualCache@765328ef
2022-11-04 17:22:53,118 INFO  
org.apache.flink.connector.jdbc.table.JdbcRowDataLookupFunction [Source: 
daily[1] -> Calc[2] -> LookupJoin[3] -> Calc[4] -> ConstraintEnforcer[5] 
(3/8)#0] [] - cache hashCode is: 
org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$LocalManualCache@47f36967
2022-11-04 17:22:53,118 INFO  
org.apache.flink.connector.jdbc.table.JdbcRowDataLookupFunction [Source: 
daily[1] -> Calc[2] -> LookupJoin[3] -> Calc[4] -> ConstraintEnforcer[5] 
(1/8)#0] [] - cache hashCode is: 
org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$LocalManualCache@2b2ea2f
2022-11-04 17:22:53,118 INFO  
org.apache.flink.connector.jdbc.table.JdbcRowDataLookupFunction [Source: 
daily[1] -> Calc[2] -> LookupJoin[3] -> Calc[4] -> ConstraintEnforcer[5] 
(4/8)#0] [] - cache hashCode is: 
org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$LocalManualCache@1901ad34
2022-11-04 17:22:53,118 INFO  
org.apache.flink.connector.jdbc.table.JdbcRowDataLookupFunction [Source: 
daily[1] -> Calc[2] -> LookupJoin[3] -> Calc[4] -> ConstraintEnforcer[5] 
(2/8)#0] [] - cache hashCode is: 
org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$LocalManualCache@6c441f09
 {code}
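
For context, the cache being discussed is the lookup cache that the JDBC 
connector builds when the cache options are set on the dimension table. A 
minimal sketch of such a table definition follows (connection settings and names 
are placeholders; the option keys shown are the legacy ones and may differ in 
newer releases):

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class JdbcLookupCacheExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // JDBC dimension table with the lookup cache enabled.
        tEnv.executeSql(
                "CREATE TABLE dim_rates (" +
                "  currency STRING," +
                "  rate DECIMAL(10, 4)," +
                "  PRIMARY KEY (currency) NOT ENFORCED" +
                ") WITH (" +
                "  'connector' = 'jdbc'," +
                "  'url' = 'jdbc:mysql://localhost:3306/mydb'," +
                "  'table-name' = 'rates'," +
                "  'lookup.cache.max-rows' = '5000'," +
                "  'lookup.cache.ttl' = '10min'" +
                ")");
    }
}
{code}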
 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS]Introduce a time-segment based restart strategy

2022-11-04 Thread Paul Lam
Hi Weihua,

+1 for the new restart strategy you suggested.

We’re also using failure-rate strategy as the cluster-wide default and 
faced the same problem, which we solved with a similar approach.

FYI. We added a freeze period config option to failure-rate strategy. 
The freeze period would prevent counting further errors after the first
 failure happens, so that a burst of errors would not exhaust the 
number of allowed failures. 

Best,
Paul Lam

> On November 4, 2022, at 16:45, Weihua Hu wrote:
> 
> Hi, everyone
> 
> I'd like to bring up a discussion about restart strategy. Flink supports 3
> kinds of restart strategy. These work very well for jobs with specific
> configs, but for platform users who manage hundreds of jobs, there is no
> common strategy to use.
> 
> Let me explain the reason. We manage a lot of jobs, some are
> keyby-connected with one region per job, some are rescale-connected with
> many regions per job, and when using the failure rate restart strategy, we
> cannot achieve the same control with the same configuration.
> 
> For example, if I want the job to fail when there are 3 exceptions within 5
> minutes, the config would look like this:
> 
>> restart-strategy.failure-rate.max-failures-per-interval: 3
>> 
>> restart-strategy.failure-rate.failure-rate-interval: 5 min
>> 
> For the keyby-connected job, this config works well.
> 
> However, for the rescale-connected job, we need to consider the number of
> regions and the number of slots per TaskManager. If each TM has 3 slots,
> and these 3 slots run the task of 3 regions, then when one TaskManager
> crashes, it will trigger 3 regions to fail, and the job will fail because
> it exceeds the threshold of the restart strategy. To avoid the effect of
> single TM crashes, I must increase the max-failures-per-interval to 9, but
> after the change, user task exceptions will be more tolerant than I want.
> 
> 
> Therefore, I want to introduce a new restart strategy based on time
> periods. A continuous period of time (e.g., 5 minutes) is divided into
> segments of a specific length (e.g., 1 minute). If an exception occurs
> within a segment (no matter how many times), it is marked as a failed
> segment. Similar to failure-rate restart strategy, the job will fail when
> there are 'm' failed segments in the interval of 'n' .
> 
> In this mode, the keyby-connected and rescale-connected jobs can use
> unified configurations.
> 
> This is a user-relevant change, so if you think this is worth to do, maybe
> I can create a FLIP to describe it in detail.
> Best,
> Weihua



[jira] [Created] (FLINK-29886) networkThroughput 1000,100ms,OpenSSL Benchmark is failing

2022-11-04 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-29886:
--

 Summary: networkThroughput 1000,100ms,OpenSSL Benchmark is failing
 Key: FLINK-29886
 URL: https://issues.apache.org/jira/browse/FLINK-29886
 Project: Flink
  Issue Type: Bug
  Components: Benchmarks, Runtime / Network
Affects Versions: 1.17.0
Reporter: Piotr Nowojski


http://codespeed.dak8s.net:8080/job/flink-master-benchmarks-java8/837/console

{noformat}
04:09:40  java.lang.NoClassDefFoundError: 
org/apache/flink/shaded/netty4/io/netty/internal/tcnative/CertificateCompressionAlgo
04:09:40at 
org.apache.flink.shaded.netty4.io.netty.handler.ssl.OpenSslX509KeyManagerFactory$OpenSslKeyManagerFactorySpi.engineInit(OpenSslX509KeyManagerFactory.java:129)
04:09:40at 
javax.net.ssl.KeyManagerFactory.init(KeyManagerFactory.java:256)
04:09:40at 
org.apache.flink.runtime.net.SSLUtils.getKeyManagerFactory(SSLUtils.java:279)
04:09:40at 
org.apache.flink.runtime.net.SSLUtils.createInternalNettySSLContext(SSLUtils.java:324)
04:09:40at 
org.apache.flink.runtime.net.SSLUtils.createInternalNettySSLContext(SSLUtils.java:303)
04:09:40at 
org.apache.flink.runtime.net.SSLUtils.createInternalClientSSLEngineFactory(SSLUtils.java:119)
04:09:40at 
org.apache.flink.runtime.io.network.netty.NettyConfig.createClientSSLEngineFactory(NettyConfig.java:147)
04:09:40at 
org.apache.flink.runtime.io.network.netty.NettyClient.init(NettyClient.java:115)
04:09:40at 
org.apache.flink.runtime.io.network.netty.NettyConnectionManager.start(NettyConnectionManager.java:87)
04:09:40at 
org.apache.flink.runtime.io.network.NettyShuffleEnvironment.start(NettyShuffleEnvironment.java:349)
04:09:40at 
org.apache.flink.streaming.runtime.io.benchmark.StreamNetworkBenchmarkEnvironment.setUp(StreamNetworkBenchmarkEnvironment.java:133)
04:09:40at 
org.apache.flink.streaming.runtime.io.benchmark.StreamNetworkThroughputBenchmark.setUp(StreamNetworkThroughputBenchmark.java:108)
04:09:40at 
org.apache.flink.benchmark.StreamNetworkThroughputBenchmarkExecutor$MultiEnvironment.setUp(StreamNetworkThroughputBenchmarkExecutor.java:117)
{noformat}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-connector-shared-utils] leonardBang commented on pull request #1: [FLINK-29472] Add first version of release scripts

2022-11-04 Thread GitBox


leonardBang commented on PR #1:
URL: 
https://github.com/apache/flink-connector-shared-utils/pull/1#issuecomment-1303154579

   Happy to see you start working on this, @zentol. Could you take a look at my comments 
here: https://github.com/apache/flink/pull/21227? Sorry to reply under this 
issue, but I've tried pinging you on Slack with no response.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-29885) SqlValidatorException: Column 'currency' is ambiguous

2022-11-04 Thread ZuoYan (Jira)
ZuoYan created FLINK-29885:
--

 Summary: SqlValidatorException: Column 'currency' is ambiguous
 Key: FLINK-29885
 URL: https://issues.apache.org/jira/browse/FLINK-29885
 Project: Flink
  Issue Type: Improvement
  Components: Documentation, Table SQL / API
Affects Versions: 1.16.0
Reporter: ZuoYan


When two tables are joined and both tables have a column with the same name, an 
exception will be thrown if the column in the SELECT clause is not qualified 
with a table name.

Exception message:

Column 'currency' is ambiguous.

!image-2022-09-28-21-00-22-733.png!

!image-2022-09-28-21-00-09-054.png!
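
For illustration, a minimal self-contained sketch of the failing and working 
variants (the table and column names are made up for this example):

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class AmbiguousColumnExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inBatchMode().build());
        tEnv.executeSql(
                "CREATE VIEW orders AS SELECT * FROM (VALUES ('EUR', 10), ('USD', 5)) AS t(currency, amount)");
        tEnv.executeSql(
                "CREATE VIEW rates AS SELECT * FROM (VALUES ('EUR', 1.1), ('USD', 1.0)) AS t(currency, rate)");

        // Fails with "Column 'currency' is ambiguous": both inputs expose a 'currency' column.
        // tEnv.executeSql(
        //         "SELECT currency FROM orders JOIN rates ON orders.currency = rates.currency");

        // Works: the selected column is qualified with its table name.
        tEnv.executeSql(
                "SELECT orders.currency, amount * rate FROM orders "
                        + "JOIN rates ON orders.currency = rates.currency").print();
    }
}
{code}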



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29884) Flaky test failure in finegrained_resource_management/SortMergeResultPartitionTest.testRelease

2022-11-04 Thread Nico Kruber (Jira)
Nico Kruber created FLINK-29884:
---

 Summary: Flaky test failure in 
finegrained_resource_management/SortMergeResultPartitionTest.testRelease
 Key: FLINK-29884
 URL: https://issues.apache.org/jira/browse/FLINK-29884
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination, Runtime / Network, Tests
Affects Versions: 1.17.0
Reporter: Nico Kruber
 Fix For: 1.17.0


{{SortMergeResultPartitionTest.testRelease}} failed with a timeout in the 
finegrained_resource_management tests:
{code:java}
Nov 03 17:28:07 [ERROR] Tests run: 20, Failures: 0, Errors: 1, Skipped: 0, Time 
elapsed: 64.649 s <<< FAILURE! - in 
org.apache.flink.runtime.io.network.partition.SortMergeResultPartitionTest
Nov 03 17:28:07 [ERROR] SortMergeResultPartitionTest.testRelease  Time elapsed: 
60.009 s  <<< ERROR!
Nov 03 17:28:07 org.junit.runners.model.TestTimedOutException: test timed out 
after 60 seconds
Nov 03 17:28:07 at 
org.apache.flink.runtime.io.network.partition.SortMergeResultPartitionTest.testRelease(SortMergeResultPartitionTest.java:374)
Nov 03 17:28:07 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
Nov 03 17:28:07 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Nov 03 17:28:07 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Nov 03 17:28:07 at java.lang.reflect.Method.invoke(Method.java:498)
Nov 03 17:28:07 at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
Nov 03 17:28:07 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
Nov 03 17:28:07 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
Nov 03 17:28:07 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
Nov 03 17:28:07 at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
Nov 03 17:28:07 at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
Nov 03 17:28:07 at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
Nov 03 17:28:07 at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
Nov 03 17:28:07 at 
java.util.concurrent.FutureTask.run(FutureTask.java:266)
Nov 03 17:28:07 at java.lang.Thread.run(Thread.java:748) {code}
[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=42806&view=logs&j=a57e0635-3fad-5b08-57c7-a4142d7d6fa9&t=2ef0effc-1da1-50e5-c2bd-aab434b1c5b7]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[DISCUSS]Introduce a time-segment based restart strategy

2022-11-04 Thread Weihua Hu
Hi, everyone

I'd like to bring up a discussion about restart strategy. Flink supports 3
kinds of restart strategy. These work very well for jobs with specific
configs, but for platform users who manage hundreds of jobs, there is no
common strategy to use.

Let me explain the reason. We manage a lot of jobs, some are
keyby-connected with one region per job, some are rescale-connected with
many regions per job, and when using the failure rate restart strategy, we
cannot achieve the same control with the same configuration.

For example, if I want the job to fail when there are 3 exceptions within 5
minutes, the config would look like this:

> restart-strategy.failure-rate.max-failures-per-interval: 3
>
> restart-strategy.failure-rate.failure-rate-interval: 5 min
>
For the keyby-connected job, this config works well.

However, for the rescale-connected job, we need to consider the number of
regions and the number of slots per TaskManager. If each TM has 3 slots,
and these 3 slots run the task of 3 regions, then when one TaskManager
crashes, it will trigger 3 regions to fail, and the job will fail because
it exceeds the threshold of the restart strategy. To avoid the effect of
single TM crashes, I must increase the max-failures-per-interval to 9, but
after the change, user task exceptions will be more tolerant than I want.


Therefore, I want to introduce a new restart strategy based on time
periods. A continuous period of time (e.g., 5 minutes) is divided into
segments of a specific length (e.g., 1 minute). If an exception occurs
within a segment (no matter how many times), it is marked as a failed
segment. Similar to failure-rate restart strategy, the job will fail when
there are 'm' failed segments in the interval of 'n' .

In this mode, the keyby-connected and rescale-connected jobs can use
unified configurations.

This is a user-relevant change, so if you think this is worth to do, maybe
I can create a FLIP to describe it in detail.
Best,
Weihua
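
To make the proposal a bit more concrete, here is a rough sketch of the 
segment-based counting described above. The class and method names are made up 
for illustration; this is not an actual Flink restart strategy implementation. 
Failures landing in the same segment are collapsed into one, and the job only 
gives up once the configured number of failed segments falls inside the sliding 
interval:

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

public class TimeSegmentFailureCounter {

    private final long segmentLengthMs;   // e.g. 1 minute
    private final long intervalMs;        // e.g. 5 minutes
    private final int maxFailedSegments;  // e.g. 3
    private final Deque<Long> failedSegmentStarts = new ArrayDeque<>();

    public TimeSegmentFailureCounter(long segmentLengthMs, long intervalMs, int maxFailedSegments) {
        this.segmentLengthMs = segmentLengthMs;
        this.intervalMs = intervalMs;
        this.maxFailedSegments = maxFailedSegments;
    }

    /** Records a failure at {@code nowMs} and returns whether the job may still restart. */
    public boolean canRestart(long nowMs) {
        long segmentStart = nowMs - (nowMs % segmentLengthMs);
        // Count at most one failure per segment, no matter how many tasks failed in it.
        if (failedSegmentStarts.isEmpty() || failedSegmentStarts.peekLast() != segmentStart) {
            failedSegmentStarts.addLast(segmentStart);
        }
        // Drop segments that have slid out of the observation interval.
        while (!failedSegmentStarts.isEmpty()
                && failedSegmentStarts.peekFirst() < nowMs - intervalMs) {
            failedSegmentStarts.removeFirst();
        }
        return failedSegmentStarts.size() < maxFailedSegments;
    }
}
{code}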


[jira] [Created] (FLINK-29883) Benchmarks are failing

2022-11-04 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-29883:


 Summary: Benchmarks are failing
 Key: FLINK-29883
 URL: https://issues.apache.org/jira/browse/FLINK-29883
 Project: Flink
  Issue Type: Bug
  Components: Benchmarks
Affects Versions: 1.17.0
Reporter: Chesnay Schepler


A Slack message reports a failure of the scripts:

{code:java}
Failed build 837 of flink-master-benchmarks-java8 (Open): 
hudson.AbortException: script returned exit code 1
{code}

I can't look further into it since the link requires credentials.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29882) LargeDataITCase is not stable

2022-11-04 Thread Jane Chan (Jira)
Jane Chan created FLINK-29882:
-

 Summary: LargeDataITCase is not stable
 Key: FLINK-29882
 URL: https://issues.apache.org/jira/browse/FLINK-29882
 Project: Flink
  Issue Type: Bug
  Components: Table Store
Affects Versions: table-store-0.3.0
Reporter: Jane Chan
 Fix For: table-store-0.3.0


https://github.com/apache/flink-table-store/actions/runs/3391781964/jobs/5637271002



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29881) When fetching results in SQL Gateway, the result using the OpenAPI client is different from using the REST API

2022-11-04 Thread yiwei93 (Jira)
yiwei93 created FLINK-29881:
---

 Summary: When fetching results in SQL Gateway, the result using the OpenAPI 
client is different from using the REST API
 Key: FLINK-29881
 URL: https://issues.apache.org/jira/browse/FLINK-29881
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Gateway
Affects Versions: 1.16.0
Reporter: yiwei93


Using the REST API, fetch the result from 
{code:java}
  
http://hermes02:8083/v1/sessions/9a8fcf37-73e5-43ca-bcc3-d44d8b71a24c/operations/b40085c1-a2c5-42f4-80e7-0971c5ef9710/result/0{code}
the result is 
{code:java}
{
  "results": {
    "columns": [
      {
        "name": "localtimestamp",
        "logicalType": {
          "type": "TIMESTAMP_WITHOUT_TIME_ZONE",
          "nullable": false,
          "precision": 3
        },
        "comment": null
      }
    ],
    "data": [
      {
        "kind": "INSERT",
        "fields": [
          "2022-11-04T11:41:40.036"
        ]
      }
    ]
  },
  "resultType": "PAYLOAD",
  "nextResultUri": 
"/v1/sessions/9a8fcf37-73e5-43ca-bcc3-d44d8b71a24c/operations/b40085c1-a2c5-42f4-80e7-0971c5ef9710/result/1"
}{code}
Using the OpenAPI client to fetch, the code is 
{code:java}
ApiClient client = new ApiClient();
client.setHost("hermes02");
client.setPort(8083);
client.setScheme("http");
defaultApi = new DefaultApi(client);

OpenSessionRequestBody openSessionRequestBody = new OpenSessionRequestBody();
OpenSessionResponseBody openSessionResponseBody = 
defaultApi.openSession(openSessionRequestBody);

SessionHandle sessionHandle = new 
SessionHandle().identifier(UUID.fromString(openSessionResponseBody.getSessionHandle()));

ExecuteStatementRequestBody executeStatementRequestBody = new 
ExecuteStatementRequestBody().statement("select localtimestamp");
ExecuteStatementResponseBody executeStatementResponseBody = 
defaultApi.executeStatement(sessionHandle.getIdentifier(), 
executeStatementRequestBody);

FetchResultsResponseBody fetchResultsResponseBody = 
defaultApi.fetchResults(sessionHandle.getIdentifier(), 
UUID.fromString(executeStatementResponseBody.getOperationHandle()), 0L);{code}
the result is 
{code:java}
class FetchResultsResponseBody {
    results: class ResultSet {
        resultType: null
        nextToken: null
        resultSchema: null
        data: []
    }
    resultType: NOT_READY
    nextResultUri: 
/v1/sessions/9a8fcf37-73e5-43ca-bcc3-d44d8b71a24c/operations/b40085c1-a2c5-42f4-80e7-0971c5ef9710/result/0
}{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29880) Hive sink supports merge files in batch mode

2022-11-04 Thread luoyuxia (Jira)
luoyuxia created FLINK-29880:


 Summary: Hive sink supports merge files in batch mode
 Key: FLINK-29880
 URL: https://issues.apache.org/jira/browse/FLINK-29880
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Hive
Affects Versions: 1.16.0
Reporter: luoyuxia
 Fix For: 1.17.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29879) Introduce operators for files merging

2022-11-04 Thread luoyuxia (Jira)
luoyuxia created FLINK-29879:


 Summary: Introduce operators for files merging
 Key: FLINK-29879
 URL: https://issues.apache.org/jira/browse/FLINK-29879
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / FileSystem
Affects Versions: 1.16.0
Reporter: luoyuxia
 Fix For: 1.17.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)