Re: Re: Weekly 1.15 Sync tomorrow cancelled

2022-03-22 Thread Yu Li
Thanks for the update Yun, and thanks all for the efforts! Look forward to
the first RC.

Best Regards,
Yu


On Tue, 22 Mar 2022 at 15:28, Yun Gao  wrote:

> Hi Yu,
>
> Currently we still have some ongoing blocker issues tracked under [1].
>
> We are tightly tracking the progress of these issues and we'll start
> creating the RC0 immediately after these blockers are solved, hopefully
> within this week.
>
> Best,
> Yun Gao
>
>
> [1]
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=505=detail=FLINK-26779
>
>
>
> --Original Mail --
> *Sender:*Yu Li 
> *Send Date:*Tue Mar 22 15:25:17 2022
> *Recipients:*dev 
> *Subject:*Re: Weekly 1.15 Sync tomorrow cancelled
>
>> Thanks for the note Joe. Could you share with us the status of RC0
>> preparation and when it's planned to be created? Many thanks.
>>
>> Best Regards,
>> Yu
>>
>>
>> On Tue, 22 Mar 2022 at 04:00, Johannes Moser  wrote:
>>
>> > Hello,
>> >
>>
>> > As all relevant issues are assigned and worked on, there’s no need to sync
>> > tomorrow.
>> >
>> > Therefore we decided to cancel the sync.
>> >
>> > Best,
>> > Joe
>>
>


Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL

2022-03-22 Thread Peter Huang
Hi Ron,

Thanks for reviving the discussion of this work. The design looks good. One
small typo in the FLIP: it is currently marked as released in 1.16.


Best Regards
Peter Huang


On Tue, Mar 22, 2022 at 10:58 PM Mang Zhang  wrote:

> Hi Yuxia,
>
>
> Thanks for your reply. Your reminder is very important!
>
>
> Since we download the file locally, we should remember to clean it up when
> the Flink client exits.
>
> --
>
> Best regards,
> Mang Zhang
>
>
>
>
>
> At 2022-03-23 10:02:26, "罗宇侠(莫辞)"
>  wrote:
> >Hi Ron, thanks for starting this discussion; some Spark/Hive users will
> benefit from it. The FLIP looks good to me. I just have two minor questions:
> >1. For the syntax explanation, I see it's "Create  function as
> identifier". I think the word "identifier" may not be
> self-descriptive, for actually it's not a random name but the name of the
> class that provides the implementation for the function to be created.
> >Maybe it would be clearer to use "class_name" instead of "identifier" just
> like what Hive[1]/Spark[2] do.
> >
> >2. >> If the resource used is a remote resource, it will first download
> the resource to a local temporary directory, which will be generated using a
> UUID, and then register the local path to the user class loader.
> >For the above explanation in the FLIP, it seems that for such a statement set,
> >""
> >Create  function as org.apache.udf1 using jar 'hdfs://myudfs.jar';
> >Create  function as org.apache.udf2 using jar 'hdfs://myudfs.jar';
> >""
> >it'll download the resource 'hdfs://myudfs.jar' twice. So is it
> possible to provide some cache mechanism so that we won't need to download /
> store it twice?
> >
> >
> >Best regards,
> >Yuxia
> >[1] https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl
> >[2]
> https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-ddl-create-function.html
> >--
> >From: Mang Zhang
> >Date: 2022-03-22 11:35:24
> >To:
> >Subject: Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> >
> >Hi Ron, thank you so much for this suggestion; it is very good.
> >In our company, using custom UDFs is very inconvenient: the code needs
> to be packaged into the job jar, and users cannot simply refer to an
> existing udf jar or pass in the jar reference in the startup command.
> >If we implement this feature, users can focus on their own business
> development.
> >I can also contribute if needed.
> >--
> >
> >Best regards,
> >Mang Zhang
> >
> >
> >
> >
> >
> >At 2022-03-21 14:57:32, "刘大龙"  wrote:
> >>Hi, everyone
> >>
> >>
> >>
> >>
> >>I would like to open a discussion on supporting advanced Function DDL.
> This proposal is a continuation of FLIP-79, in which Flink Function DDL is
> defined. Until now it is only partially released, as the Flink function DDL
> with user-defined resources has not been clearly discussed and implemented. It
> is an important feature to support registering UDFs with custom jar resources,
> so users can use UDFs more easily without having to put jars under the
> classpath in advance.
> >>
> >>Looking forward to your feedback.
> >>
> >>
> >>
> >>
> >>[1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL
> >>
> >>
> >>
> >>
> >>Best,
> >>
> >>Ron
> >>
> >>
> >
>


Re:回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL

2022-03-22 Thread Mang Zhang
Hi Yuxia,


Thanks for your reply. Your reminder is very important!


Since we download the file locally, we should remember to clean it up when the
Flink client exits.

--

Best regards,
Mang Zhang





At 2022-03-23 10:02:26, "罗宇侠(莫辞)"  
wrote:
>Hi Ron, thanks for starting this discussion; some Spark/Hive users will benefit
>from it. The FLIP looks good to me. I just have two minor questions:
>1. For the syntax explanation, I see it's "Create  function as identifier".
>I think the word "identifier" may not be self-descriptive, for actually it's
>not a random name but the name of the class that provides the implementation
>for the function to be created.
>Maybe it would be clearer to use "class_name" instead of "identifier" just like
>what Hive[1]/Spark[2] do.
>
>2. >> If the resource used is a remote resource, it will first download the
>resource to a local temporary directory, which will be generated using a UUID,
>and then register the local path to the user class loader.
>For the above explanation in the FLIP, it seems that for such a statement set,
>""
>Create  function as org.apache.udf1 using jar 'hdfs://myudfs.jar';
>Create  function as org.apache.udf2 using jar 'hdfs://myudfs.jar';
>""
>it'll download the resource 'hdfs://myudfs.jar' twice. So is it possible
>to provide some cache mechanism so that we won't need to download / store it
>twice?
>
>
>Best regards,
>Yuxia
>[1] https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl
>[2]
>https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-ddl-create-function.html
>--
>From: Mang Zhang
>Date: 2022-03-22 11:35:24
>To:
>Subject: Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
>
>Hi Ron, thank you so much for this suggestion; it is very good.
>In our company, using custom UDFs is very inconvenient: the code needs to be
>packaged into the job jar, and users cannot simply refer to an existing udf
>jar or pass in the jar reference in the startup command.
>If we implement this feature, users can focus on their own business
>development.
>I can also contribute if needed.
>--
>
>Best regards,
>Mang Zhang
>
>
>
>
>
>At 2022-03-21 14:57:32, "刘大龙"  wrote:
>>Hi, everyone
>>
>>
>>
>>
>>I would like to open a discussion on supporting advanced Function DDL. This
>>proposal is a continuation of FLIP-79, in which Flink Function DDL is defined.
>>Until now it is only partially released, as the Flink function DDL with
>>user-defined resources has not been clearly discussed and implemented. It is an
>>important feature to support registering UDFs with custom jar resources, so
>>users can use UDFs more easily without having to put jars under the classpath
>>in advance.
>>
>>Looking forward to your feedback.
>>
>>
>>
>>
>>[1] 
>>https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL
>>
>>
>>
>>
>>Best,
>>
>>Ron
>>
>>
>


[jira] [Created] (FLINK-26814) Solve the problem that "should only have one jar" is reported as an error when the application submits an operation in the yarn mode

2022-03-22 Thread zhangjingcun (Jira)
zhangjingcun created FLINK-26814:


 Summary: Solve the problem that "should only have one jar" is 
reported as an error when the application submits an operation in the yarn mode
 Key: FLINK-26814
 URL: https://issues.apache.org/jira/browse/FLINK-26814
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN
Affects Versions: 1.14.4
Reporter: zhangjingcun


Solve the problem that "should only have one jar" is reported as an error when 
the application submits an operation in the yarn mode



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL

2022-03-22 Thread 罗宇侠(莫辞)
Hi Ron, thanks for starting this discussion; some Spark/Hive users will benefit
from it. The FLIP looks good to me. I just have two minor questions:
1. For the syntax explanation, I see it's "Create  function as identifier".
I think the word "identifier" may not be self-descriptive, for actually it's
not a random name but the name of the class that provides the implementation
for the function to be created.
Maybe it would be clearer to use "class_name" instead of "identifier" just like
what Hive[1]/Spark[2] do.

2. >> If the resource used is a remote resource, it will first download the
resource to a local temporary directory, which will be generated using a UUID,
and then register the local path to the user class loader.
For the above explanation in the FLIP, it seems that for such a statement set,
""
Create  function as org.apache.udf1 using jar 'hdfs://myudfs.jar';
Create  function as org.apache.udf2 using jar 'hdfs://myudfs.jar';
""
it'll download the resource 'hdfs://myudfs.jar' twice. So is it possible
to provide some cache mechanism so that we won't need to download / store it
twice?

Best regards,
Yuxia
[1] https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl
[2]
https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-ddl-create-function.html
--
From: Mang Zhang
Date: 2022-03-22 11:35:24
To:
Subject: Re:[DISCUSS] FLIP-214 Support Advanced Function DDL

Hi Ron, thank you so much for this suggestion; it is very good.
In our company, using custom UDFs is very inconvenient: the code needs to be
packaged into the job jar, and users cannot simply refer to an existing udf
jar or pass in the jar reference in the startup command.
If we implement this feature, users can focus on their own business development.
I can also contribute if needed.
--

Best regards,
Mang Zhang





At 2022-03-21 14:57:32, "刘大龙"  wrote:
>Hi, everyone
>
>
>
>
>I would like to open a discussion on supporting advanced Function DDL. This
>proposal is a continuation of FLIP-79, in which Flink Function DDL is defined.
>Until now it is only partially released, as the Flink function DDL with
>user-defined resources has not been clearly discussed and implemented. It is an
>important feature to support registering UDFs with custom jar resources, so
>users can use UDFs more easily without having to put jars under the classpath
>in advance.
>
>Looking forward to your feedback.
>
>
>
>
>[1] 
>https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL
>
>
>
>
>Best,
>
>Ron
>
>
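The jar-download cache Yuxia asks about could be sketched roughly as follows. This is an illustrative sketch only, not Flink code: the class name `JarResourceCache` and the injected `fetch` callable are assumptions, standing in for whatever filesystem transfer the client actually uses. It keys downloads by the remote URI, so registering udf1 and udf2 from the same jar triggers only one transfer, and it also honors Mang Zhang's point about cleaning up on client exit.

```python
import shutil
import tempfile
from pathlib import Path

class JarResourceCache:
    """Download each remote jar URI at most once per client session."""

    def __init__(self, fetch):
        # `fetch(uri, dest_path)` performs the actual transfer; it is
        # injected so the transport (HDFS, HTTP, ...) stays out of this sketch.
        self._fetch = fetch
        # One temporary directory per client session, cleaned up on exit
        # (per Mang Zhang's reminder above).
        self._tmp_dir = Path(tempfile.mkdtemp(prefix="udf-jars-"))
        self._local_paths = {}   # remote URI -> local Path
        self.download_count = 0

    def get_local_path(self, remote_uri):
        # Cache hit: a second CREATE FUNCTION using the same jar reuses
        # the first download instead of fetching the resource twice.
        if remote_uri not in self._local_paths:
            local = self._tmp_dir / Path(remote_uri).name
            self._fetch(remote_uri, local)
            self.download_count += 1
            self._local_paths[remote_uri] = local
        return self._local_paths[remote_uri]

    def close(self):
        # Clean up the downloaded jars when the Flink client exits.
        shutil.rmtree(self._tmp_dir, ignore_errors=True)
```

With such a cache, both statements in Yuxia's example would resolve 'hdfs://myudfs.jar' to the same local path after a single download.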



[jira] [Created] (FLINK-26813) Supports ADD/MODIFY column/watermark/constraint syntax parse for ALTER TABLE

2022-03-22 Thread dalongliu (Jira)
dalongliu created FLINK-26813:
-

 Summary: Supports ADD/MODIFY column/watermark/constraint syntax 
parse for ALTER TABLE
 Key: FLINK-26813
 URL: https://issues.apache.org/jira/browse/FLINK-26813
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / API
Affects Versions: 1.16.0
Reporter: dalongliu
 Fix For: 1.16.0






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26812) Flink native k8s integration should support owner reference

2022-03-22 Thread Thomas Weise (Jira)
Thomas Weise created FLINK-26812:


 Summary: Flink native k8s integration should support owner 
reference
 Key: FLINK-26812
 URL: https://issues.apache.org/jira/browse/FLINK-26812
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / Kubernetes
Affects Versions: 1.14.4
Reporter: Thomas Weise


Currently the JobManager deployment and other resources created by the
integration do not support the owner reference, allowing for the possibility of
them becoming orphaned when managed by a higher-level entity like the CR of
the k8s operator. Any top-level resource created should optionally have an
owner reference to ensure safe cleanup when the managing entity gets removed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26811) Document CRD upgrade process

2022-03-22 Thread Thomas Weise (Jira)
Thomas Weise created FLINK-26811:


 Summary: Document CRD upgrade process
 Key: FLINK-26811
 URL: https://issues.apache.org/jira/browse/FLINK-26811
 Project: Flink
  Issue Type: Sub-task
Reporter: Thomas Weise


We need to document how to update the CRD with a newer version. During 
development, we delete the old CRD and create it from scratch. In an 
environment with existing deployments that isn't possible, as deleting the CRD 
would wipe out all existing CRs.

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26810) The local time zone does not take effect when the dynamic index uses a field of type timestamp_ltz

2022-03-22 Thread jinfeng (Jira)
jinfeng created FLINK-26810:
---

 Summary: The local time zone does not take effect when the dynamic 
index uses a field of type timestamp_ltz
 Key: FLINK-26810
 URL: https://issues.apache.org/jira/browse/FLINK-26810
 Project: Flink
  Issue Type: Bug
  Components: Connectors / ElasticSearch
Reporter: jinfeng
 Attachments: 截屏2022-03-23 上午12.48.02.png

When using a TIMESTAMP_WITH_LOCAL_TIMEZONE field to generate a dynamic index,
it will always use the UTC timezone.

 

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26809) ChangelogStorageMetricsTest.testAttemptsPerUpload failed

2022-03-22 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-26809:
-

 Summary: ChangelogStorageMetricsTest.testAttemptsPerUpload failed
 Key: FLINK-26809
 URL: https://issues.apache.org/jira/browse/FLINK-26809
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Metrics, Runtime / State Backends
Affects Versions: 1.15.0, 1.16.0
Reporter: Matthias Pohl


[This 
build|https://dev.azure.com/mapohl/flink/_build/results?buildId=901=logs=f3dc9b18-b77a-55c1-591e-264c46fe44d1=2d3cd81e-1c37-5c31-0ee4-f5d5cdb9324d=24226]
 failed due to a failure in {{ChangelogStorageMetricsTest.testAttemptsPerUpload}}:

{code}
Mar 22 12:23:09 [ERROR] Failures: 
Mar 22 12:23:09 [ERROR]   ChangelogStorageMetricsTest.testAttemptsPerUpload:208 
Mar 22 12:23:09 expected: 3L
Mar 22 12:23:09  but was: 0L
{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26808) [flink v1.14.2] Submit jobs via REST API not working after set web.submit.enable: false

2022-03-22 Thread Jira
Luís Manuel Azevedo Costa created FLINK-26808:
-

 Summary: [flink v1.14.2] Submit jobs via REST API not working 
after set web.submit.enable: false
 Key: FLINK-26808
 URL: https://issues.apache.org/jira/browse/FLINK-26808
 Project: Flink
  Issue Type: Bug
  Components: Client / Job Submission
Affects Versions: 1.14.2
Reporter: Luís Manuel Azevedo Costa


Greetings,

I am using Flink version 1.14.2, and after changing web.submit.enable to false,
job submission via the REST API is no longer working.
The app that uses Flink receives a 404 with "Not found: /jars/upload".
Looking into the
[documentation|https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/config/],
I saw that web.upload.dir is only used if {{web.submit.enable}} is true; if
not, JOB_MANAGER_WEB_TMPDIR_KEY will be used.
Doing a curl to /jars it returns:
{code:java}
curl -v http://localhost:8081/jars

HTTP/1.1 404 Not Found
{"errors":["Unable to load requested file /jars."]} {code}
Found this issue related to option web.submit.enable 
https://issues.apache.org/jira/browse/FLINK-13799

Could you please let me know if this is an issue that you are already aware of?
Thanks in advance

Best regards,
Luís Costa

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[ANNOUNCE] Call for Presentations is open for Flink Forward San Francisco 2022 in-person!

2022-03-22 Thread Timo Walther

Hi everyone,

We’re very excited to announce our Call for Presentations for Flink 
Forward San Francisco 2022! If you have an inspiring Apache Flink use 
case, real-world application, or best practice, Flink Forward is the 
platform for you to share your experiences.


https://www.flink-forward.org/sf-2022/call-for-presentations

After a couple of years in a virtual format, we're excited to announce 
that this year's event will be held in-person at the Hyatt Regency, 
August 1-3, filled with 2 days of training and 1 day of conference (August 
3rd).


We look forward to receiving your submissions on:
- Flink Use Cases
- Flink Operations
- Technology Deep Dives
- Ecosystem
- Community

Why speak at Flink Forward San Francisco 2022?
- Expand your network and raise your profile in the Apache Flink community
- Share your experiences and engage with the audience with Q&A following 
your session

- Your talk will be promoted to the Flink community
- Commemorate Flink Forward with us with the return of Flink Fest at our 
in-person event!

- Get exclusive Flink Forward speaker swag ;-)

Our program committee can’t wait to review your talk ideas. Be sure to 
submit your talk by May 2, 11:59 pm PDT!


See you there!

Timo Walther
Program Committee Chair

PS: Regarding Covid-19 regulations, we are following the CDC guidelines 
closely. As we near closer to the event, we will update our policy 
accordingly.


[jira] [Created] (FLINK-26807) The batch job not work well with Operator

2022-03-22 Thread Aitozi (Jira)
Aitozi created FLINK-26807:
--

 Summary: The batch job not work well with Operator
 Key: FLINK-26807
 URL: https://issues.apache.org/jira/browse/FLINK-26807
 Project: Flink
  Issue Type: Sub-task
  Components: Kubernetes Operator
Reporter: Aitozi


When I test a batch job or a finite streaming job, the flinkdep will become an 
orphaned resource and keep listing the job, because the 
JobManagerDeploymentStatus will not be synced again.

I think we should sync the global terminated status from the application job 
and do the cleanup work for the flinkdep resource.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26806) Add deployment guide for Argo CD

2022-03-22 Thread Xin Hao (Jira)
Xin Hao created FLINK-26806:
---

 Summary: Add deployment guide for Argo CD
 Key: FLINK-26806
 URL: https://issues.apache.org/jira/browse/FLINK-26806
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Xin Hao


I have deployed the Flink Operator in my cluster with Argo CD.

But there will be an error caused by `CRD too big`.

This is not a bug caused by the operator itself; it is the same as this issue: 
https://github.com/prometheus-operator/prometheus-operator/issues/4439

But maybe it's worth adding some guidelines in the document?

We need to tell the users to use Argo CD a bit differently from the other 
applications.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: proposal to allow configuration of Statefun exponential backoff

2022-03-22 Thread Galen Warren
No rush, but I just wanted to make sure this didn't fall through the
cracks. Thanks!

On Fri, Mar 18, 2022 at 4:43 PM Galen Warren 
wrote:

> Currently, when the execution of a stateful function fails, the call is
> retried using an exponential backoff scheme -- the initial delay is 10ms
> (original sync client) or 2ms (async Netty client), and it doubles on every
> subsequent failure, without limit, until the call succeeds or its budgeted
> time to complete expires.
>
> It would be useful to be able to configure this, i.e. to supply an
> initial retry delay interval and to supply a max retry interval (i.e. not
> to be unbounded as it is now).
>
> Would you be interested in a contribution along these lines? I've
> implemented this locally for the Netty client; honestly, I don't need it
> for the older, synchronous client and would be fine just doing it for the
> newer, async one, but I could also implement it for the sync client as well.
>
> Please let me know, thanks.
>
>
>
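The scheme Galen describes — an initial delay that doubles on every failure, optionally capped by a configured maximum — can be sketched as follows. This is an illustrative sketch, not Statefun's implementation; the function name and parameters are invented, and the 20ms cap is just an example value for the proposed configuration knob.

```python
import itertools

def backoff_delays_ms(initial_ms, max_ms=None):
    """Yield retry delays: initial_ms, doubling on every subsequent
    failure, optionally capped at max_ms (the proposed new setting)."""
    delay = initial_ms
    while True:
        yield delay if max_ms is None else min(delay, max_ms)
        delay *= 2

# Async Netty client today: 2ms initial delay, unbounded doubling.
print(list(itertools.islice(backoff_delays_ms(2), 6)))      # [2, 4, 8, 16, 32, 64]
# Proposed: same start, but capped at a configured 20ms maximum.
print(list(itertools.islice(backoff_delays_ms(2, 20), 6)))  # [2, 4, 8, 16, 20, 20]
```

The budgeted-time expiry mentioned above would sit outside this generator, cancelling the retry loop when the call's overall deadline passes.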


[DISCUSS] Structure of the Flink Documentation (Languages & APIs)

2022-03-22 Thread Konstantin Knauf
Hi everyone,

I would like to discuss a particular aspect of our documentation: the
top-level structure with respect to languages and APIs. The current
structure is inconsistent and the direction is unclear to me, which makes
it hard for me to contribute gradual improvements.

Currently, the Python documentation has its own independent branch in the
documentation [1]. In the rest of the documentation, Python is sometimes
included like in this Table API page [2] and sometimes ignored like on the
project setup pages [3]. Scala and Java on the other hand are always
documented in parallel next to each other in tabs.

The way I see it, most parts (application development, connectors, getting
started, project setup) of our documentation have two primary dimensions:
API (DataStream, Table API), Language (Python, Java, Scala)

In addition, there is SQL, for which the language is only a minor factor
(UDFs), but which generally requires a different structure (different
audience, different tools). On the other hand, SQL and Table API have some
conceptual overlap, whereas I doubt these concepts are of big interest
to SQL users. So, to me SQL should be treated separately in any case with
links to the Table API documentation for some concepts.

I think, in general, both approaches can work:


*Option 1: "Language Tabs"*
Application Development
> DataStream API  (Java, Scala, Python)
> Table API (Java, Scala, Python)
> SQL


*Option 2: "Language First" *
Java Development Guide
> Getting Started
> DataStream API
> Table API
Python Development Guide
> Getting Started
> Datastream API
> Table API
SQL Development Guide

I don't have a strong opinion on this, but tend towards "Language First".

* First, I assume, users actually first decide on their language/tools of
choice and then move on to the API.

* Second, most of the Flink Documentation currently is using a "Language
Tabs" approach, but this might become obsolete in the long-term anyway as
we move more and more in a Scala-free direction.

For the connectors, I think, there is a good argument for "Language & API
Embedded", because documenting every connector for each API and language
separately would result in a lot of duplication. Here, I would go one step
further than what we have right now and target

Connectors
-> Kafka (All APIs incl. SQL, All Languages)
-> Kinesis (same)
-> ...

This also results in a quick overview for users about which connectors
exist and plays well with our plan of externalizing connectors.

For completeness & scope of the discussion: there are two outdated FLIPs on
documentation (42, 60), which both have not been implemented, are partially
contradicting each other and are generally out-of-date. I specifically
don't intend to add another FLIP to this graveyard, but still hope to reach a
consensus on the high-level direction.

What do you think?

Cheers,

Konstantin

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


[jira] [Created] (FLINK-26805) Managed table breaks legacy connector without 'connector.type'

2022-03-22 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-26805:


 Summary: Managed table breaks legacy connector without 
'connector.type'
 Key: FLINK-26805
 URL: https://issues.apache.org/jira/browse/FLINK-26805
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Reporter: Jingsong Lee
Assignee: Jingsong Lee
 Fix For: 1.15.0


{code:java}
CREATE TABLE T (a INT) WITH ('type'='legacy');
INSERT INTO T VALUES (1); {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] Support the session job management in kubernetes operator

2022-03-22 Thread Aitozi
> We should check there's no running Flink job before deleting a session
FlinkDeployment

If we have to prevent stopping the session cluster before all session jobs
are done, I think we should avoid deleting the session deployment
by returning DeleteControl#noFinalizerRemoval()[1] in cleanup, and then
schedule the reconcile to check and delete the session cluster once there
is no session job instance left.


[1]:
https://github.com/java-operator-sdk/java-operator-sdk/blob/b91221bb54af19761a617bf18eef381e8ceb3b4c/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/reconciler/Reconciler.java#L14
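In rough pseudo-Python terms (this is not the java-operator-sdk API — the field and function names here are invented for illustration), the deferred-deletion flow described above looks like:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DeleteControl:
    remove_finalizer: bool
    reschedule_after_s: Optional[int] = None

def cleanup_session_deployment(running_session_jobs: int) -> DeleteControl:
    """Cleanup hook for a session FlinkDeployment being deleted."""
    if running_session_jobs > 0:
        # Keep the finalizer so the session FlinkDeployment is not actually
        # removed yet; reconcile again later to re-check the session jobs.
        return DeleteControl(remove_finalizer=False, reschedule_after_s=10)
    # No session jobs left: allow Kubernetes to delete the session cluster.
    return DeleteControl(remove_finalizer=True)
```

The key point is that the finalizer is only released once the job count reaches zero, which gives users the double confirmation Yang asks for below.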




Yang Wang  wrote on Tue, Mar 22, 2022 at 18:48:

> The relationship between the session deployment and the Flink jobs looks
> good to me except for the session deployment deletion.
>
> I strongly suggest not setting the owner reference of the FlinkSessionJob to
> the session FlinkDeployment.
> Otherwise, it will be a disaster if the session FlinkDeployment is deleted
> accidentally and there are many running jobs.
> We should check there's no running Flink job before deleting a session
> FlinkDeployment. And this will force the users to have a double
> confirmation.
>
> Best,
> Yang
>
>
> Aitozi  wrote on Tue, Mar 22, 2022 at 17:49:
>
> > Hi Thomas:
> >
> > Thanks for your valuable question. Let’s make the relationship between
> > the session deployment and the jobs clearer.
> >
> > IMO, the session deployment and jobs interact in these situations:
> >
> > - Create the session job. Then the FlinkSessionJobController will wait for
> > the session cluster to be ready and then submit the job. The lookup key is
> > the namespace and clusterId.
> >
> > - Delete the session job. Then it will cancel the current session job.
> >
> > - Delete the session deployment. It will have to delete the session jobs
> > first; we could set the owner reference of the FlinkSessionJob to let
> > Kubernetes trigger the cleanup of session jobs before removing the session
> > deployment.
> >
> > - Upgrade the session deployment. This is a critical part, because it
> > will affect all the session jobs. We should suspend the jobs first and
> > then upgrade the session cluster. So I tend to validate that all the jobs
> > are suspended and then perform the session cluster upgrade. After the
> > upgrade, change the session jobs back to running manually.
> >
> > What do you think about this? If there is no objection, I will clarify it
> > in the FLIP doc.
> >
> >
> > Besides, sorry for the rough vote and discussion process. It's my first
> > time driving this, I will keep that in mind next time :)
> > Best,
> > Aitozi.
> >
> > Yang Wang  wrote on Tue, Mar 22, 2022 at 10:11:
> >
> > > I think the session cluster could not be deleted unless all the running
> > > jobs have finished or cancelled. I agree this should be clarified in
> the
> > > FLIP.
> > >
> > > Best,
> > > Yang
> > >
> > > Thomas Weise  wrote on Tue, Mar 22, 2022 at 09:26:
> > >
> > > > Hi Aitozi,
> > > >
> > > > Thanks for the proposal. Can you please clarify in the FLIP the
> > > > relationship between the session deployment and the jobs that depend
> on
> > > it?
> > > > Will, for example, the operator ensure that the individual jobs are
> > > > deleted when the underlying cluster is deleted?
> > > >
> > > > Side note: When the discussion thread started 5 days ago and a FLIP
> > vote
> > > > was started 2 days later and there is also a weekend included, then
> > this
> > > is
> > > > probably on the short side for broader feedback.
> > > >
> > > > Thanks,
> > > > Thomas
> > > >
> > > >
> > > > On Fri, Mar 18, 2022 at 4:01 AM Yang Wang 
> > wrote:
> > > >
> > > > > Great work. Since we are introducing a new public API, it deserves
> a
> > > > FLIP.
> > > > > And the FLIP will help the later contributors catch up soon.
> > > > >
> > > > > Best,
> > > > > Yang
> > > > >
> > > > > Gyula Fóra  wrote on Fri, Mar 18, 2022 at 18:11:
> > > > >
> > > > > > Thank Aitozi, a FLIP might be an overkill at this point but no
> harm
> > > in
> > > > > > voting on it anyways :)
> > > > > >
> > > > > > Looks good!
> > > > > >
> > > > > > Gyula
> > > > > >
> > > > > > On Fri, Mar 18, 2022 at 10:25 AM Aitozi 
> > > wrote:
> > > > > >
> > > > > > > Hi Guys:
> > > > > > >
> > > > > > > FYI, I have integrated your comments and drawn the
> > > FLIP-215[1], I
> > > > > > will
> > > > > > > create another thread to vote for it.
> > > > > > >
> > > > > > > [1]:
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-215%3A+Introduce+FlinkSessionJob+CRD+in+the+kubernetes+operator
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > Aitozi.
> > > > > > >
> > > > > > >
> > > > > > > Aitozi  wrote on Thu, Mar 17, 2022 at 11:16:
> > > > > > >
> > > > > > > > Hi Biao Geng:
> > > > > > > >
> > > > > > > >Thanks for your feedback, I'm +1 to go with option#2.
> It's a
> > > > good
> > > > > > > > point that
> > > > > > > >
> > > > > > > > we should improve the error message debugging for the session
> 

RE: EXT: Re: Flink cassandra connector performance issue

2022-03-22 Thread Ghiya, Jay (GE Healthcare)
Absolutely, understood! We shall do so.

-Original Message-
From: Chesnay Schepler  
Sent: 22 March 2022 15:52
To: dev@flink.apache.org; Martijn Visser ; Marco 
Zühlke 
Cc: R, Aromal (GE Healthcare, consultant) ; Nellimarla, Aswini 
(GE Healthcare) 
Subject: EXT: Re: Flink cassandra connector performance issue

WARNING: This email originated from outside of GE. Please validate the sender's 
email address before clicking on links or attachments as they may not be safe.

I'd be quite curious what the analysis that the mapper isn't re-used is 
based on.
A given CassandraPojoSink instance has been re-using the same mapper since it 
was added in 1.1:
https://github.com/apache/flink/blob/release-1.1/flink-streaming-connectors/flink-connector-cassandra/src/main/java/org/apache/flink/streaming/connectors/cassandra/CassandraPojoSink.java

On 22/03/2022 09:22, Martijn Visser wrote:
> Hi Jay,
>
> Thanks for reaching out! As just mentioned in the ticket, the 
> Cassandra connector hasn't been actively maintained in the last couple 
> months/years.
> It would be great if there would be more contributors who could help 
> out with this. See also my previous request [1] on the Dev mailing 
> list, which also contains an overview of the current issues with the 
> connector.
>
> I'm also including @Marco Zühlke  who previously 
> volunteered to help out with the connector. It would be great if you 
> could all help out :)
>
> Best regards,
>
> Martijn Visser
> https://twitter.com/MartijnVisser82
>
> [1] https://lists.apache.org/thread/1qokt5tp8dcp58dmshbwjc43ssbm1vvk
>
>
> On Tue, 22 Mar 2022 at 07:33, Ghiya, Jay (GE Healthcare) 
> 
> wrote:
>
>> Hi @dev@flink.apache.org,
>>
>> Greetings from gehc.
>>
>> This is regarding flink Cassandra connector implementation that could 
>> be causing the performance issue that we are facing.
>>
>> The summary of the error is
>>
>> "Insertions into scylla might be suffering. Expect performance 
>> problems unless this is resolved."
>>
>> Upon doing initial analysis we figured out: "the flink cassandra 
>> connector is not keeping an instance of the mapping manager that is used to 
>> convert a pojo to a cassandra row. Ideally the mapping manager should 
>> have the same lifetime as the cluster and session objects, which are also 
>> created once when the driver is initialized"
>>
>> Reference:
>> https://stackoverflow.com/questions/59203418/cassandra-java-driver-wa
>> rning
>>
>> Can we take a look at this? Also, can we help fix this? @R, Aromal 
>> (GE Healthcare, consultant) is our lead dev on this.
>>
>> Here is the jira issue on the same -
>> https://issues.apache.org/jira/browse/FLINK-26793
>>
>> -Jay
>>
>>
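The lifetime pattern both sides of this thread are discussing — create the mapper once alongside the session, then reuse it per record — can be illustrated generically. This is not the DataStax driver or Flink's CassandraPojoSink; `MappingManager` and the sink class here are simplified stand-ins, with an instance counter to make the "created once" property observable.

```python
class MappingManager:
    """Stand-in for the driver's pojo<->row mapper factory."""
    instances_created = 0

    def __init__(self, session):
        MappingManager.instances_created += 1
        self.session = session

    def mapper(self, pojo_cls):
        # Stand-in for the real pojo -> Cassandra row conversion.
        return lambda pojo: ("row", pojo)

class PojoSinkSketch:
    def open(self, session):
        # Created once, alongside the session, when the sink is initialized...
        self.mapping_manager = MappingManager(session)

    def invoke(self, record):
        # ...and reused for every record, never rebuilt per invocation.
        return self.mapping_manager.mapper(type(record))(record)

sink = PojoSinkSketch()
sink.open(session="session")
for record in range(1000):
    sink.invoke(record)
print(MappingManager.instances_created)  # 1
```

If the counter were incremented once per record instead of once per sink, that would be the per-write rebuild the reporters suspect; Chesnay's point is that the real CassandraPojoSink already follows the create-once pattern.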



[jira] [Created] (FLINK-26804) Operator e2e tests sporadically fail: DEPLOYED_NOT_READY

2022-03-22 Thread Jira
Márton Balassi created FLINK-26804:
--

 Summary: Operator e2e tests sporadically fail: DEPLOYED_NOT_READY
 Key: FLINK-26804
 URL: https://issues.apache.org/jira/browse/FLINK-26804
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Reporter: Márton Balassi
Assignee: Márton Balassi


I managed to introduce a sporadic failure scenario for the e2e tests via my 
solution of FLINK-26715. Since the operator only checks on the job every couple 
of seconds, the job might still be observed as being in the DEPLOYED_NOT_READY 
state even after successfully completing checkpoints.

{code:bash}
Run ls e2e-tests/test_*.sh | while read script_test;do \
Running e2e-tests/test_kubernetes_application_ha.sh
persistentvolumeclaim/flink-example-statemachine created
Error from server (InternalError): error when creating 
"e2e-tests/data/cr.yaml": Internal error occurred: failed calling webhook 
"vflinkdeployments.flink.apache.org": failed to call webhook: Post 
"https://flink-operator-webhook-service.default.svc:443/validate?timeout=10s": 
dial tcp 10.106.63.26:443: connect: connection refused
Command: kubectl apply -f e2e-tests/data/cr.yaml failed. Retrying...
flinkdeployment.flink.apache.org/flink-example-statemachine created
persistentvolumeclaim/flink-example-statemachine unchanged
Error from server (NotFound): deployments.apps "flink-example-statemachine" not 
found
Command: kubectl get deploy/flink-example-statemachine failed. Retrying...
NAME READY   UP-TO-DATE   AVAILABLE   AGE
flink-example-statemachine   0/1 10   1s
deployment.apps/flink-example-statemachine condition met
Waiting for jobmanager pod flink-example-statemachine-7fcf55c88b-h5r7r ready.
pod/flink-example-statemachine-7fcf55c88b-h5r7r condition met
Waiting for log "Rest endpoint listening at"...
Log "Rest endpoint listening at" shows up.
Waiting for log "Completed checkpoint 
[0-[9](https://github.com/apache/flink-kubernetes-operator/runs/5640468148?check_suite_focus=true#step:9:9)]+
 for job"...
Log "Completed checkpoint [0-9]+ for job" shows up.
Successfully verified that 
flinkdep/flink-example-statemachine.status.jobManagerDeploymentStatus is in 
READY state.
Successfully verified that 
flinkdep/flink-example-statemachine.status.jobStatus.state is in RUNNING state.
Kill the flink-example-statemachine-7fcf55c88b-h5r7r
Defaulted container "flink-main-container" out of: flink-main-container, 
artifacts-fetcher (init)
Waiting for log "Restoring job  from 
Checkpoint"...
Log "Restoring job  from Checkpoint" shows up.
Waiting for log "Completed checkpoint [0-9]+ for job"...
Log "Completed checkpoint [0-9]+ for job" shows up.
Status verification for 
flinkdep/flink-example-statemachine.status.jobManagerDeploymentStatus failed. 
It is DEPLOYED_NOT_READY instead of READY.
Debugging failed e2e test:
{code}




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] Support the session job management in kubernetes operator

2022-03-22 Thread Yang Wang
The relationship between the session deployment and the Flink jobs looks
good to me except for the session deployment deletion.

I strongly suggest not setting the ownerReference of the FlinkSessionJob to
the session FlinkDeployment.
Otherwise, it will be a disaster if the session FlinkDeployment is deleted
accidentally and there are many running jobs.
We should check there's no running Flink job before deleting a session
FlinkDeployment. And this will force the users to have a double
confirmation.

Best,
Yang


Aitozi  于2022年3月22日周二 17:49写道:

> Hi Thomas:
>
> Thanks for your valuable question. Let's make the relationship between
> the session deployment and the jobs clearer.
>
> IMO, the session deployment and jobs interact in these situations:
>
> - Create the session job. Then the FlinkSessionJobController will wait for
> the session cluster to become ready and then submit the job. The lookup key
> is namespace plus clusterId.
>
> - Delete the session job. Then it will cancel the current session job.
>
> - Delete the session deployment. It will have to delete the session jobs
> first; we could set the ownerReference of the FlinkSessionJob to let
> Kubernetes trigger cleanup of the session jobs before removing the session
> deployment.
>
> - Upgrade the session deployment. This is a critical part, because it
> will affect all the session jobs. We should suspend the jobs first and then
> upgrade the session cluster. So I tend to validate that all the jobs are
> suspended before performing the session cluster upgrade. After the upgrade,
> the session jobs can be switched back to running manually.
>
> What do you think about this? If there is no objection, I will clarify it
> in the FLIP doc.
>
>
> Besides, sorry for the rough vote and discussion process. It's my first
> time driving this, I will keep that in mind next time :)
> Best,
> Aitozi.
>
> Yang Wang  于2022年3月22日周二 10:11写道:
>
> > I think the session cluster could not be deleted unless all the running
> > jobs have finished or cancelled. I agree this should be clarified in the
> > FLIP.
> >
> > Best,
> > Yang
> >
> > Thomas Weise  于2022年3月22日周二 09:26写道:
> >
> > > Hi Aitozi,
> > >
> > > Thanks for the proposal. Can you please clarify in the FLIP the
> > > relationship between the session deployment and the jobs that depend on
> > it?
> > > Will, for example, the operator ensure that the individual jobs are
> > > deleted when the underlying cluster is deleted?
> > >
> > > Side note: When the discussion thread started 5 days ago and a FLIP
> vote
> > > was started 2 days later and there is also a weekend included, then
> this
> > is
> > > probably on the short side for broader feedback.
> > >
> > > Thanks,
> > > Thomas
> > >
> > >
> > > On Fri, Mar 18, 2022 at 4:01 AM Yang Wang 
> wrote:
> > >
> > > > Great work. Since we are introducing a new public API, it deserves a
> > > FLIP.
> > > > And the FLIP will help the later contributors catch up soon.
> > > >
> > > > Best,
> > > > Yang
> > > >
> > > > Gyula Fóra  于2022年3月18日周五 18:11写道:
> > > >
> > > > > Thank Aitozi, a FLIP might be an overkill at this point but no harm
> > in
> > > > > voting on it anyways :)
> > > > >
> > > > > Looks good!
> > > > >
> > > > > Gyula
> > > > >
> > > > > On Fri, Mar 18, 2022 at 10:25 AM Aitozi 
> > wrote:
> > > > >
> > > > > > Hi Guys:
> > > > > >
> > > > > > FYI, I have integrated your comments and drawn the
> > FLIP-215[1], I
> > > > > will
> > > > > > create another thread to vote for it.
> > > > > >
> > > > > > [1]:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-215%3A+Introduce+FlinkSessionJob+CRD+in+the+kubernetes+operator
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Aitozi.
> > > > > >
> > > > > >
> > > > > > Aitozi  于2022年3月17日周四 11:16写道:
> > > > > >
> > > > > > > Hi Biao Geng:
> > > > > > >
> > > > > > >Thanks for your feedback, I'm +1 to go with option#2. It's a
> > > good
> > > > > > > point that
> > > > > > >
> > > > > > > we should improve the error message debugging for the session
> > job,
> > > I
> > > > > > > think
> > > > > > >
> > > > > > > it can be a follow up work as an improvement after we support
> the
> > > > > session
> > > > > > > job operation.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > Aitozi.
> > > > > > >
> > > > > > >
> > > > > > > Geng Biao  于2022年3月17日周四 10:55写道:
> > > > > > >
> > > > > > >> Thanks Aitozi for the work!
> > > > > > >>
> > > > > > >> I lean to option#2 of using JarRunHeaders with uber job jar as
> > > well.
> > > > > As
> > > > > > >> Yang said, the user defined dependencies may be better
> supported
> > > in
> > > > > > >> upstream flink.
> > > > > > >> A follow-up thought: I think we should care the  potential
> > > influence
> > > > > on
> > > > > > >> user experiences: as the job graph is generated in JM, when
> the
> > > > > > generation
> > > > > > >> fails due to some issues in the main() method, we should do
> some
> > > 

Re: Flink cassandra connector performance issue

2022-03-22 Thread Chesnay Schepler
I'd be quite curious about what the analysis that the mapper isn't re-used 
is based on.
A given CassandraPojoSink instance has been re-using the same mapper 
since it was added in 1.1:

https://github.com/apache/flink/blob/release-1.1/flink-streaming-connectors/flink-connector-cassandra/src/main/java/org/apache/flink/streaming/connectors/cassandra/CassandraPojoSink.java
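The lifecycle Chesnay describes — build the expensive mapper once, alongside the session, and reuse it for every record — can be illustrated without the Cassandra driver. In this sketch, `Session` and `MappingManager` are hypothetical stand-ins for the driver's classes, and the open/invoke structure only mirrors the connector's RichSinkFunction lifecycle; it is not the actual connector code:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class Main {
    // Hypothetical stand-ins for the driver's Session and MappingManager.
    static class Session {}
    static class MappingManager {
        MappingManager(Session s) { CONSTRUCTIONS.incrementAndGet(); }
    }

    static final AtomicInteger CONSTRUCTIONS = new AtomicInteger();

    private Session session;
    private MappingManager mappingManager;

    // Mirrors RichSinkFunction#open: runs once, so the mapper is built once.
    void open() {
        session = new Session();
        mappingManager = new MappingManager(session);
    }

    // Mirrors SinkFunction#invoke: runs per record, reusing the cached mapper
    // instead of constructing a new one.
    void invoke(Object record) {
        // Real driver usage would be along the lines of:
        // mappingManager.mapper(record.getClass()).saveAsync(record)
    }

    public static void main(String[] args) {
        Main sink = new Main();
        sink.open();
        for (int i = 0; i < 1000; i++) {
            sink.invoke(new Object());
        }
        System.out.println("mapper constructed " + CONSTRUCTIONS.get()
                + " time(s) for 1000 records");
    }
}
```

If the warning from the original report appears anyway, the cause is more likely elsewhere (e.g. many sink instances each opening their own session) than per-record mapper construction.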

On 22/03/2022 09:22, Martijn Visser wrote:

Hi Jay,

Thanks for reaching out! As just mentioned in the ticket, the Cassandra
connector hasn't been actively maintained in the last couple months/years.
It would be great if more contributors could help out with this. See also
my previous request [1] on the Dev mailing list, which
also contains an overview of the current issues with the connector.

I'm also including @Marco Zühlke  who previously
volunteered to help out with the connector. It would be great if you could
all help out :)

Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82

[1] https://lists.apache.org/thread/1qokt5tp8dcp58dmshbwjc43ssbm1vvk


On Tue, 22 Mar 2022 at 07:33, Ghiya, Jay (GE Healthcare) 
wrote:


Hi @dev@flink.apache.org,

Greetings from gehc.

This is regarding flink Cassandra connector implementation that could be
causing the performance issue that we are facing.

The summary of the error is

"Insertions into scylla might be suffering. Expect performance problems
unless this is resolved."

Upon doing initial analysis we figured out - "flink cassandra connector is
not keeping instance of mapping manager that is used to convert a pojo to
cassandra row. Ideally the mapping manager should have the same life time
as cluster and session objects which are also created once when the driver
is initialized"

Reference:
https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning

Can we take a look at this? Also, can we help fix this? @R, Aromal (GE
Healthcare, consultant) is our lead dev on this.

Here is the jira issue on the same -
https://issues.apache.org/jira/browse/FLINK-26793

-Jay






[jira] [Created] (FLINK-26803) Merge small ChannelState file for Unaligned Checkpoint

2022-03-22 Thread fanrui (Jira)
fanrui created FLINK-26803:
--

 Summary: Merge small ChannelState file for Unaligned Checkpoint
 Key: FLINK-26803
 URL: https://issues.apache.org/jira/browse/FLINK-26803
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Checkpointing, Runtime / Network
Reporter: fanrui


When taking an unaligned checkpoint, the number of ChannelState files is 
TaskNumber * subtaskNumber. For a high-parallelism job this produces too many 
small files, which puts a high load on the HDFS NameNode.

 

In our production environment, a job writes more than 50K small files for each 
unaligned checkpoint. Could we merge these files before writing them to the 
FileSystem? We could configure the maximum number of files each TM may write in 
a single unaligned checkpoint.
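The merge idea can be sketched as appending each subtask's channel state into one shared file while recording a per-subtask (offset, length) entry, so restore can still address each slice individually. This is an illustrative sketch of the bookkeeping only — not the actual checkpoint stream format or any proposed API:

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Main {
    static byte[] mergedFile;                            // one physical file instead of many
    static final List<long[]> index = new ArrayList<>(); // {offset, length} per subtask

    // Append every subtask's state into a single "file", remembering offsets.
    static void merge(List<byte[]> channelStates) {
        index.clear();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] state : channelStates) {
            index.add(new long[] {out.size(), state.length});
            out.write(state, 0, state.length);
        }
        mergedFile = out.toByteArray();
    }

    // Restore one subtask's state from its (offset, length) slice.
    static byte[] restore(int subtask) {
        long[] e = index.get(subtask);
        return Arrays.copyOfRange(mergedFile, (int) e[0], (int) (e[0] + e[1]));
    }

    public static void main(String[] args) {
        // Pretend each byte[] is one subtask's channel state that today
        // would become its own small file on HDFS.
        merge(Arrays.asList(
                "subtask-0-state".getBytes(),
                "subtask-1-state".getBytes(),
                "subtask-2-state".getBytes()));
        System.out.println(new String(restore(1))); // prints "subtask-1-state"
    }
}
```

With this layout, N subtask states per TM collapse into one NameNode entry instead of N, at the cost of coupling the lifecycles of the merged handles.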



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [ANNOUNCE] release-1.15 branch cut

2022-03-22 Thread Piotr Nowojski
Thanks for fixing this!

Best,
Piotrek

pon., 21 mar 2022 o 18:21 Yun Gao  napisał(a):

> Hi Piotr,
>
> Many thanks for the information, and sorry for missing this!
>
> I have now updated the flink.version property of the benchmark repository
> and also added this step to the checklist in the release documentation.
>
> Best,
> Yun Gao
>
>
> --
> From:Piotr Nowojski 
> Send Time:2022 Mar. 21 (Mon.) 19:29
> To:dev 
> Cc:Yun Gao 
> Subject:Re: [ANNOUNCE] release-1.15 branch cut
>
> Hey Yun Gao,
>
> I think you have missed the step of updating the flink-benchmarks
> repository after forking the release-1.15 branch.
>
> https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release
> Prepare for the release -> Major release -> Flink Benchmarks repository
>
> Best,
> Piotrek
>
> wt., 15 mar 2022 o 11:02 Chesnay Schepler  napisał(a):
>
> > Correction: The fixVersion should not contain 1.15.0 and 1.16.0 at the
> > same time, but only 1.15.0.
> >
> > On 14/03/2022 18:35, Yun Gao wrote:
> > > Hi devs,
> > >
> > > The release-1.15 branch has been forked out from the master branch, at
> > the
> > > commit 9c77f13e587d7cb52469cac934c6aaec28dcd17d. Also, the version
> > > on the master branch has been upgraded to 1.16-SNAPSHOT.
> > >
> > >   From now on, for PRs that should be presented in 1.15.0, please make
> > sure:
> > > - Merge the PR into both master and release-1.15 branches
> > > - The JIRA ticket should be closed with the correct fix-versions
> (1.15.0
> > & 1.16.0).
> > >
> > >   We are now creating the rc0, which is expected to be delivered soon.
> > >
> > >   Thank you~
> > > Joe, Till & Yun Gao
> >
> >
> >
>


Re: [DISCUSS] Support the session job management in kubernetes operator

2022-03-22 Thread Aitozi
Hi Thomas:

Thanks for your valuable question. Let's make the relationship between
the session deployment and the jobs clearer.

IMO, the session deployment and jobs interact in these situations:

- Create the session job. Then the FlinkSessionJobController will wait for
the session cluster to become ready and then submit the job. The lookup key
is namespace plus clusterId.

- Delete the session job. Then it will cancel the current session job.

- Delete the session deployment. It will have to delete the session jobs
first; we could set the ownerReference of the FlinkSessionJob to let
Kubernetes trigger cleanup of the session jobs before removing the session
deployment.

- Upgrade the session deployment. This is a critical part, because it
will affect all the session jobs. We should suspend the jobs first and then
upgrade the session cluster. So I tend to validate that all the jobs are
suspended before performing the session cluster upgrade. After the upgrade,
the session jobs can be switched back to running manually.

What do you think about this? If there is no objection, I will clarify it
in the FLIP doc.
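For concreteness, a FlinkSessionJob resource along the lines discussed in this thread could look as follows. The field names here are illustrative assumptions, not the final FLIP-215 schema:

```yaml
# Hypothetical FlinkSessionJob custom resource (field names are assumptions).
apiVersion: flink.apache.org/v1beta1
kind: FlinkSessionJob
metadata:
  name: my-session-job
  namespace: default
spec:
  # Look-up key pointing at the session FlinkDeployment in the same namespace.
  deploymentName: my-session-cluster
  job:
    jarURI: https://example.com/my-job.jar   # placeholder jar location
    parallelism: 2
    upgradeMode: savepoint
```

The operator would watch these resources, wait for the referenced session cluster to become ready, and submit the jar to it — which is also why deleting the session FlinkDeployment needs to account for any FlinkSessionJobs still pointing at it.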


Besides, sorry for the rough vote and discussion process. It's my first
time driving this, I will keep that in mind next time :)
Best,
Aitozi.

Yang Wang  于2022年3月22日周二 10:11写道:

> I think the session cluster could not be deleted unless all the running
> jobs have finished or cancelled. I agree this should be clarified in the
> FLIP.
>
> Best,
> Yang
>
> Thomas Weise  于2022年3月22日周二 09:26写道:
>
> > Hi Aitozi,
> >
> > Thanks for the proposal. Can you please clarify in the FLIP the
> > relationship between the session deployment and the jobs that depend on
> it?
> > Will, for example, the operator ensure that the individual jobs are
> > deleted when the underlying cluster is deleted?
> >
> > Side note: When the discussion thread started 5 days ago and a FLIP vote
> > was started 2 days later and there is also a weekend included, then this
> is
> > probably on the short side for broader feedback.
> >
> > Thanks,
> > Thomas
> >
> >
> > On Fri, Mar 18, 2022 at 4:01 AM Yang Wang  wrote:
> >
> > > Great work. Since we are introducing a new public API, it deserves a
> > FLIP.
> > > And the FLIP will help the later contributors catch up soon.
> > >
> > > Best,
> > > Yang
> > >
> > > Gyula Fóra  于2022年3月18日周五 18:11写道:
> > >
> > > > Thank Aitozi, a FLIP might be an overkill at this point but no harm
> in
> > > > voting on it anyways :)
> > > >
> > > > Looks good!
> > > >
> > > > Gyula
> > > >
> > > > On Fri, Mar 18, 2022 at 10:25 AM Aitozi 
> wrote:
> > > >
> > > > > Hi Guys:
> > > > >
> > > > > FYI, I have integrated your comments and drawn the
> FLIP-215[1], I
> > > > will
> > > > > create another thread to vote for it.
> > > > >
> > > > > [1]:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-215%3A+Introduce+FlinkSessionJob+CRD+in+the+kubernetes+operator
> > > > >
> > > > > Best,
> > > > >
> > > > > Aitozi.
> > > > >
> > > > >
> > > > > Aitozi  于2022年3月17日周四 11:16写道:
> > > > >
> > > > > > Hi Biao Geng:
> > > > > >
> > > > > >Thanks for your feedback, I'm +1 to go with option#2. It's a
> > good
> > > > > > point that
> > > > > >
> > > > > > we should improve the error message debugging for the session
> job,
> > I
> > > > > > think
> > > > > >
> > > > > > it can be a follow up work as an improvement after we support the
> > > > session
> > > > > > job operation.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Aitozi.
> > > > > >
> > > > > >
> > > > > > Geng Biao  于2022年3月17日周四 10:55写道:
> > > > > >
> > > > > >> Thanks Aitozi for the work!
> > > > > >>
> > > > > >> I lean to option#2 of using JarRunHeaders with uber job jar as
> > well.
> > > > As
> > > > > >> Yang said, the user defined dependencies may be better supported
> > in
> > > > > >> upstream flink.
> > > > > >> A follow-up thought: I think we should care the  potential
> > influence
> > > > on
> > > > > >> user experiences: as the job graph is generated in JM, when the
> > > > > generation
> > > > > >> fails due to some issues in the main() method, we should do some
> > > work
> > > > on
> > > > > >> showing such error messages in this proposal or the later k8s
> > > operator
> > > > > >> implementation.  Reason for this question is that if users
> submit
> > > many
> > > > > jobs
> > > > > >> to one same session cluster, it may be not easy for them to find
> > > > > relevant
> > > > > >> error logs about main() method of a specific job. The
> FLINK-25715
> > > > could
> > > > > >> help us later.
> > > > > >>
> > > > > >>
> > > > > >> Best,
> > > > > >> Biao Geng
> > > > > >>
> > > > > >>
> > > > > >> 发件人: Aitozi 
> > > > > >> 日期: 星期三, 2022年3月16日 下午5:19
> > > > > >> 收件人: dev@flink.apache.org 
> > > > > >> 主题: Re: [DISCUSS] Support the session job management in
> kubernetes
> > > > > >> operator
> > > > > >> Hi Yang Wang
> > > > > >> Thanks for your feedback, Provide the local and http
> > > 

[jira] [Created] (FLINK-26802) StreamPhysicalOverAggregate doesn't support consuming update and delete changes which is produced by node Deduplicate

2022-03-22 Thread zhiyuan (Jira)
zhiyuan created FLINK-26802:
---

 Summary: StreamPhysicalOverAggregate doesn't support consuming 
update and delete changes which is produced by node Deduplicate
 Key: FLINK-26802
 URL: https://issues.apache.org/jira/browse/FLINK-26802
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.14.4
 Environment: flink version 1.14

flink sql  job
Reporter: zhiyuan
 Attachments: image-2022-03-22-17-28-17-759.png

Problem description:

Top-N is used to deduplicate the input, and another layer of Top-N queries is 
nested on top of it.

tran_tm is defined as the watermark column.

SQL:

!image-2022-03-22-17-28-17-759.png!

I tested it and found that using proctime worked, but using rowtime had syntax 
problems.

 

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26801) LogisticRegressionTest.testGetModelData» Runtime Failed to fetch next res...

2022-03-22 Thread Zhipeng Zhang (Jira)
Zhipeng Zhang created FLINK-26801:
-

 Summary: LogisticRegressionTest.testGetModelData» Runtime Failed 
to fetch next res...
 Key: FLINK-26801
 URL: https://issues.apache.org/jira/browse/FLINK-26801
 Project: Flink
  Issue Type: Bug
  Components: Library / Machine Learning
Affects Versions: ml-2.0.0
Reporter: Zhipeng Zhang


The flink-ml run fails at the following case [1]:

```
Error:  Tests run: 8, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 71.967 
s <<< FAILURE! - in org.apache.flink.ml.classification.LogisticRegressionTest 
[27997|https://github.com/apache/flink-ml/runs/5638224415?check_suite_focus=true#step:4:27997]Error:
  testGetModelData Time elapsed: 3.221 s <<< ERROR! 
[27998|https://github.com/apache/flink-ml/runs/5638224415?check_suite_focus=true#step:4:27998]java.lang.RuntimeException:
 Failed to fetch next result 
[27999|https://github.com/apache/flink-ml/runs/5638224415?check_suite_focus=true#step:4:27999]
 at 
org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:109)
 
[28000|https://github.com/apache/flink-ml/runs/5638224415?check_suite_focus=true#step:4:28000]
 at 
org.apache.flink.streaming.api.operators.collect.CollectResultIterator.next(CollectResultIterator.java:88)
 
[28001|https://github.com/apache/flink-ml/runs/5638224415?check_suite_focus=true#step:4:28001]
 at 
org.apache.flink.ml.classification.LogisticRegressionTest.testGetModelData(LogisticRegressionTest.java:251)
 
[28002|https://github.com/apache/flink-ml/runs/5638224415?check_suite_focus=true#step:4:28002]
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
[28003|https://github.com/apache/flink-ml/runs/5638224415?check_suite_focus=true#step:4:28003]
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
[28004|https://github.com/apache/flink-ml/runs/5638224415?check_suite_focus=true#step:4:28004]
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 
[28005|https://github.com/apache/flink-ml/runs/5638224415?check_suite_focus=true#step:4:28005]
 at java.lang.reflect.Method.invoke(Method.java:498) 
[28006|https://github.com/apache/flink-ml/runs/5638224415?check_suite_focus=true#step:4:28006]
 at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
 
[28007|https://github.com/apache/flink-ml/runs/5638224415?check_suite_focus=true#step:4:28007]
 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 
[28008|https://github.com/apache/flink-ml/runs/5638224415?check_suite_focus=true#step:4:28008]
 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
 
[28009|https://github.com/apache/flink-ml/runs/5638224415?check_suite_focus=true#step:4:28009]
 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
 
[28010|https://github.com/apache/flink-ml/runs/5638224415?check_suite_focus=true#step:4:28010]
 at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
[28011|https://github.com/apache/flink-ml/runs/5638224415?check_suite_focus=true#step:4:28011]
 at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) 
[28012|https://github.com/apache/flink-ml/runs/5638224415?check_suite_focus=true#step:4:28012]
 at org.junit.rules.RunRules.evaluate(RunRules.java:20) 
[28013|https://github.com/apache/flink-ml/runs/5638224415?check_suite_focus=true#step:4:28013]
 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) 
[28014|https://github.com/apache/flink-ml/runs/5638224415?check_suite_focus=true#step:4:28014]
 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
 
[28015|https://github.com/apache/flink-ml/runs/5638224415?check_suite_focus=true#step:4:28015]
 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
 
[28016|https://github.com/apache/flink-ml/runs/5638224415?check_suite_focus=true#step:4:28016]
 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) 
[28017|https://github.com/apache/flink-ml/runs/5638224415?check_suite_focus=true#step:4:28017]
 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) 
[28018|https://github.com/apache/flink-ml/runs/5638224415?check_suite_focus=true#step:4:28018]
 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) 
[28019|https://github.com/apache/flink-ml/runs/5638224415?check_suite_focus=true#step:4:28019]
 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) 
[28020|https://github.com/apache/flink-ml/runs/5638224415?check_suite_focus=true#step:4:28020]
 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) 
[28021|https://github.com/apache/flink-ml/runs/5638224415?check_suite_focus=true#step:4:28021]
 at 

Re: Flink cassandra connector performance issue

2022-03-22 Thread Martijn Visser
Hi Jay,

Thanks for reaching out! As just mentioned in the ticket, the Cassandra
connector hasn't been actively maintained in the last couple months/years.
It would be great if more contributors could help out with this. See also
my previous request [1] on the Dev mailing list, which
also contains an overview of the current issues with the connector.

I'm also including @Marco Zühlke  who previously
volunteered to help out with the connector. It would be great if you could
all help out :)

Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82

[1] https://lists.apache.org/thread/1qokt5tp8dcp58dmshbwjc43ssbm1vvk


On Tue, 22 Mar 2022 at 07:33, Ghiya, Jay (GE Healthcare) 
wrote:

> Hi @dev@flink.apache.org,
>
> Greetings from gehc.
>
> This is regarding flink Cassandra connector implementation that could be
> causing the performance issue that we are facing.
>
> The summary of the error is
>
> "Insertions into scylla might be suffering. Expect performance problems
> unless this is resolved."
>
> Upon doing initial analysis we figured out - "flink cassandra connector is
> not keeping instance of mapping manager that is used to convert a pojo to
> cassandra row. Ideally the mapping manager should have the same life time
> as cluster and session objects which are also created once when the driver
> is initialized"
>
> Reference:
> https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning
>
> Can we take a look at this? Also, can we help fix this? @R, Aromal (GE
> Healthcare, consultant) is our lead dev on this.
>
> Here is the jira issue on the same -
> https://issues.apache.org/jira/browse/FLINK-26793
>
> -Jay
>
>


[jira] [Created] (FLINK-26800) write small data file using share write buffer manager

2022-03-22 Thread YufeiLiu (Jira)
YufeiLiu created FLINK-26800:


 Summary: write small data file using share write buffer manager
 Key: FLINK-26800
 URL: https://issues.apache.org/jira/browse/FLINK-26800
 Project: Flink
  Issue Type: Bug
  Components: Runtime / State Backends
Affects Versions: 1.12.7
Reporter: YufeiLiu


When the config {{state.backend.rocksdb.memory.fixed-per-slot}} is set, all 
RocksDB instances in the same slot share one WriteBufferManager.

I hit an extreme case: two RocksDB instances share a WriteBufferManager of size 
32M. If rocksdb-1 writes (32*0.9)M of data and then receives no more data for a 
while, no flush is triggered. When rocksdb-2 then starts writing, it triggers a 
flush for every single record and writes many small files, spending a lot of 
time on background compaction. rocksdb-2 only flushes its own CF data, so it 
won't recover until rocksdb-1 flushes the data in its memtable.

I can disable the managed-memory option to avoid this case, but then I can't 
limit the memory usage.

Maybe we could add a tracker that monitors the memory usage of all RocksDB 
instances and triggers a forced flush when necessary?


[1] RocksDB Write Buffer Manager 
https://github.com/facebook/rocksdb/wiki/Write-Buffer-Manager
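The starvation effect described above can be modeled with a toy simulation. Note the flush rule here — "the currently writing DB flushes its own memtable once total usage crosses 90% of the shared limit" — is an assumed simplification for illustration; RocksDB's actual WriteBufferManager trigger conditions differ in detail:

```java
public class Main {
    // Toy model of two RocksDB instances sharing one WriteBufferManager.
    // Assumed rule (illustration only): the writing DB flushes its own
    // memtable once total usage crosses 90% of the shared limit.
    static int simulateDb2Flushes(long limit, long db1Resident,
                                  int writes, long recordSize) {
        long db2Memtable = 0;
        int flushes = 0;
        for (int i = 0; i < writes; i++) {
            db2Memtable += recordSize;
            if (db1Resident + db2Memtable > limit * 0.9) {
                flushes++;
                db2Memtable = 0; // db2 flushes its tiny memtable; db1's data stays resident
            }
        }
        return flushes;
    }

    public static void main(String[] args) {
        long limit = 32L << 20; // 32M shared WriteBufferManager
        // rocksdb-1 already holds (32 * 0.9)M of unflushed data, then goes idle,
        // so every small write by rocksdb-2 crosses the threshold.
        int flushes = simulateDb2Flushes(limit, (long) (limit * 0.9), 1000, 100);
        System.out.println("db2 flushed " + flushes + " times for 1000 small writes");
    }
}
```

Under this model db2 flushes on every single write, which matches the many-small-files symptom: the shared budget is exhausted by db1's idle memtable, and per-DB flushing can never reclaim it.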



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26799) StateChangeFormat#read not seek to offset correctly

2022-03-22 Thread Feifan Wang (Jira)
Feifan Wang created FLINK-26799:
---

 Summary: StateChangeFormat#read not seek to offset correctly
 Key: FLINK-26799
 URL: https://issues.apache.org/jira/browse/FLINK-26799
 Project: Flink
  Issue Type: Bug
  Components: Runtime / State Backends
Reporter: Feifan Wang


StateChangeFormat#read must seek to the offset before reading; the current 
implementation is as follows:

 
{code:java}
FSDataInputStream stream = handle.openInputStream();
DataInputViewStreamWrapper input = wrap(stream);
if (stream.getPos() != offset) {
LOG.debug("seek from {} to {}", stream.getPos(), offset);
input.skipBytesToRead((int) offset);
}{code}
But the if condition is incorrect: stream.getPos() returns the position of the 
underlying stream, which differs from the position of input.

By the way, because the stream is wrapped in a BufferedInputStream, the position 
of the underlying stream is always at n*bufferSize or at the end of the file.

Actually, input is always at position 0 at the beginning, so I think we can 
seek to the offset directly.
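The mismatch is easy to reproduce with plain JDK streams: reading a single byte through a BufferedInputStream pulls a whole buffer from the underlying stream, so the underlying position races ahead of the view's logical position. This sketch only demonstrates the buffering effect — it is not the changelog backend's code:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;

public class Main {
    // Returns the underlying stream's position after ONE logical byte is read
    // through a BufferedInputStream with the given buffer size.
    static int underlyingPosAfterOneByte(int bufferSize) {
        final int[] pos = {0};
        ByteArrayInputStream raw = new ByteArrayInputStream(new byte[8192]) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                int n = super.read(b, off, len);
                if (n > 0) pos[0] += n; // track what FSDataInputStream#getPos would report
                return n;
            }
        };
        try (BufferedInputStream input = new BufferedInputStream(raw, bufferSize)) {
            input.read(); // the view has logically consumed exactly 1 byte
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return pos[0];
    }

    public static void main(String[] args) {
        // The underlying stream already sits a full buffer ahead of the view,
        // so comparing its position against the target offset is misleading.
        System.out.println("underlying position after 1-byte read: "
                + underlyingPosAfterOneByte(4096));
    }
}
```

So `stream.getPos() != offset` can be false (or true) for reasons unrelated to how many bytes the wrapper has actually delivered, which is why skipping on the wrapper unconditionally is the safer fix.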

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26798) JobMaster.testJobFailureWhenTaskExecutorHeartbeatTimeout failed due to missing Execution

2022-03-22 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-26798:
-

 Summary: JobMaster.testJobFailureWhenTaskExecutorHeartbeatTimeout 
failed due to missing Execution
 Key: FLINK-26798
 URL: https://issues.apache.org/jira/browse/FLINK-26798
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.15.0, 1.16.0
Reporter: Matthias Pohl


[This 
build|https://dev.azure.com/mapohl/flink/_build/results?buildId=897=logs=cc649950-03e9-5fae-8326-2f1ad744b536=a9a20597-291c-5240-9913-a731d46d6dd1=8399]
 failed due to an {{ExecutionGraphException}} indicating that an expected 
{{Execution}} wasn't around:
{code}
[...]
Caused by: org.apache.flink.util.FlinkException: Execution 
48dbc880c8225256b8bc112ea36e9082 is unexpectedly no longer running on task 
executor bbad15fcb93d4b2b4f80fe2c35e03e6d.
at 
org.apache.flink.runtime.jobmaster.JobMaster$1.onMissingDeploymentsOf(JobMaster.java:250)
 ~[classes/:?]
... 35 more
{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26797) ZKCheckpointIDCounterMultiServersTest.testRecoveredAfterConnectionLoss failed on azure

2022-03-22 Thread Yun Gao (Jira)
Yun Gao created FLINK-26797:
---

 Summary: 
ZKCheckpointIDCounterMultiServersTest.testRecoveredAfterConnectionLoss failed 
on azure
 Key: FLINK-26797
 URL: https://issues.apache.org/jira/browse/FLINK-26797
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.16.0
Reporter: Yun Gao



{code:java}
Mar 21 07:39:44 [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time 
elapsed: 23.662 s <<< FAILURE! - in 
org.apache.flink.runtime.checkpoint.ZKCheckpointIDCounterMultiServersTest
Mar 21 07:39:44 [ERROR] 
org.apache.flink.runtime.checkpoint.ZKCheckpointIDCounterMultiServersTest.testRecoveredAfterConnectionLoss
  Time elapsed: 23.639 s  <<< FAILURE!
Mar 21 07:39:44 java.lang.AssertionError: ZooKeeperCheckpointIDCounter doesn't properly work after reconnected.
Mar 21 07:39:44 Expected: is <2L>
Mar 21 07:39:44  but: was <3L>
Mar 21 07:39:44 at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
Mar 21 07:39:44 at org.junit.Assert.assertThat(Assert.java:964)
Mar 21 07:39:44 at org.apache.flink.runtime.checkpoint.ZKCheckpointIDCounterMultiServersTest.testRecoveredAfterConnectionLoss(ZKCheckpointIDCounterMultiServersTest.java:86)
Mar 21 07:39:44 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Mar 21 07:39:44 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Mar 21 07:39:44 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Mar 21 07:39:44 at java.lang.reflect.Method.invoke(Method.java:498)
Mar 21 07:39:44 at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
Mar 21 07:39:44 at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
Mar 21 07:39:44 at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
Mar 21 07:39:44 at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
Mar 21 07:39:44 at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
Mar 21 07:39:44 at org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
Mar 21 07:39:44 at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
Mar 21 07:39:44 at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
Mar 21 07:39:44 at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
Mar 21 07:39:44 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
Mar 21 07:39:44 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
Mar 21 07:39:44 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
Mar 21 07:39:44 at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
Mar 21 07:39:44 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
Mar 21 07:39:44 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
Mar 21 07:39:44 at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
Mar 21 07:39:44 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
Mar 21 07:39:44 at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
Mar 21 07:39:44 at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
Mar 21 07:39:44 at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
Mar 21 07:39:44 at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
Mar 21 07:39:44 at org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42)
Mar 21 07:39:44 at org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80)
Mar 21 07:39:44 at org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:72)
Mar 21 07:39:44 at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:107)

{code}

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=33448=logs=a57e0635-3fad-5b08-57c7-a4142d7d6fa9=2ef0effc-1da1-50e5-c2bd-aab434b1c5b7=9173



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: Re: Weekly 1.15 Sync tomorrow cancelled

2022-03-22 Thread Yun Gao
Hi Yu,

Currently we still have some ongoing blocker issues tracked under [1].

We are tightly tracking the progress of these issues, and we'll start creating
the RC0 immediately after these blockers are solved, hopefully within this week.

Best,
Yun Gao


[1] 
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=505=detail=FLINK-26779




 --Original Mail --
Sender:Yu Li 
Send Date:Tue Mar 22 15:25:17 2022
Recipients:dev 
Subject:Re: Weekly 1.15 Sync tomorrow cancelled
Thanks for the note Joe. Could you share with us the status of RC0
preparation and when it's planned to be created? Many thanks.

Best Regards,
Yu


On Tue, 22 Mar 2022 at 04:00, Johannes Moser  wrote:

> Hello,
>
> As all relevant issues are assigned and worked on, there’s no need to sync
> tomorrow.
>
> Therefore, we decided to cancel the sync.
>
> Best,
> Joe


Re: Weekly 1.15 Sync tomorrow cancelled

2022-03-22 Thread Yu Li
Thanks for the note Joe. Could you share with us the status of RC0
preparation and when it's planned to be created? Many thanks.

Best Regards,
Yu


On Tue, 22 Mar 2022 at 04:00, Johannes Moser  wrote:

> Hello,
>
> As all relevant issues are assigned and worked on, there’s no need to sync
> tomorrow.
>
> Therefore, we decided to cancel the sync.
>
> Best,
> Joe


[jira] [Created] (FLINK-26796) TaskManagerProcessFailureStreamingRecoveryITCase.testTaskManagerProcessFailure failed on azure

2022-03-22 Thread Yun Gao (Jira)
Yun Gao created FLINK-26796:
---

 Summary: 
TaskManagerProcessFailureStreamingRecoveryITCase.testTaskManagerProcessFailure 
failed on azure
 Key: FLINK-26796
 URL: https://issues.apache.org/jira/browse/FLINK-26796
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.15.0
Reporter: Yun Gao



{code:java}
2022-03-21T17:04:48.7388935Z Mar 21 17:04:48 [ERROR] TaskManagerProcessFailureStreamingRecoveryITCase>AbstractTaskManagerProcessFailureRecoveryTest.testTaskManagerProcessFailure:201 The program encountered a RestClientException : [org.apache.flink.runtime.rest.handler.RestHandlerException: org.apache.flink.runtime.messages.FlinkJobNotFoundException: Could not find Flink job (e5d2bef6fe1da660c2da45ef89c9acdb)
2022-03-21T17:04:48.7391533Z Mar 21 17:04:48 at org.apache.flink.runtime.rest.handler.job.JobExecutionResultHandler.propagateException(JobExecutionResultHandler.java:94)
2022-03-21T17:04:48.7393502Z Mar 21 17:04:48 at org.apache.flink.runtime.rest.handler.job.JobExecutionResultHandler.lambda$handleRequest$1(JobExecutionResultHandler.java:84)
2022-03-21T17:04:48.7394967Z Mar 21 17:04:48 at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:884)
2022-03-21T17:04:48.7396569Z Mar 21 17:04:48 at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:866)
2022-03-21T17:04:48.7397895Z Mar 21 17:04:48 at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
2022-03-21T17:04:48.7399156Z Mar 21 17:04:48 at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
2022-03-21T17:04:48.7401447Z Mar 21 17:04:48 at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:252)
2022-03-21T17:04:48.7402979Z Mar 21 17:04:48 at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
2022-03-21T17:04:48.7403835Z Mar 21 17:04:48 at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
2022-03-21T17:04:48.7404613Z Mar 21 17:04:48 at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
2022-03-21T17:04:48.7405337Z Mar 21 17:04:48 at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
2022-03-21T17:04:48.7406351Z Mar 21 17:04:48 at org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1387)
2022-03-21T17:04:48.7407088Z Mar 21 17:04:48 at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$null$1(ClassLoadingUtils.java:93)
2022-03-21T17:04:48.7408037Z Mar 21 17:04:48 at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
2022-03-21T17:04:48.7408971Z Mar 21 17:04:48 at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
2022-03-21T17:04:48.7409816Z Mar 21 17:04:48 at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
2022-03-21T17:04:48.7410621Z Mar 21 17:04:48 at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
2022-03-21T17:04:48.7411616Z Mar 21 17:04:48 at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
2022-03-21T17:04:48.7412463Z Mar 21 17:04:48 at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
2022-03-21T17:04:48.7413295Z Mar 21 17:04:48 at org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$1.onComplete(AkkaFutureUtils.java:45)
2022-03-21T17:04:48.7413957Z Mar 21 17:04:48 at akka.dispatch.OnComplete.internal(Future.scala:299)
2022-03-21T17:04:48.7414636Z Mar 21 17:04:48 at akka.dispatch.OnComplete.internal(Future.scala:297)
2022-03-21T17:04:48.7415230Z Mar 21 17:04:48 at akka.dispatch.japi$CallbackBridge.apply(Future.scala:224)
2022-03-21T17:04:48.7416123Z Mar 21 17:04:48 at akka.dispatch.japi$CallbackBridge.apply(Future.scala:221)
2022-03-21T17:04:48.7416843Z Mar 21 17:04:48 at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
2022-03-21T17:04:48.7417586Z Mar 21 17:04:48 at org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$DirectExecutionContext.execute(AkkaFutureUtils.java:65)
2022-03-21T17:04:48.7418360Z Mar 21 17:04:48 at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
2022-03-21T17:04:48.7419058Z Mar 21 17:04:48 at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284)
2022-03-21T17:04:48.7419820Z Mar 21 17:04:48 at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284)
2022-03-21T17:04:48.7420561Z Mar 21 17:04:48 at

[jira] [Created] (FLINK-26795) Fix the CI not fail fast after build failed

2022-03-22 Thread Aitozi (Jira)
Aitozi created FLINK-26795:
--

 Summary: Fix the CI not failing fast after a build failure
 Key: FLINK-26795
 URL: https://issues.apache.org/jira/browse/FLINK-26795
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Reporter: Aitozi


The piping command ignores the real exit code, so a failed Maven build is
silently skipped. See
[here|https://stackoverflow.com/questions/6871859/piping-command-output-to-tee-but-also-save-exit-code-of-command]
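A minimal sketch of the problem and the fix (assuming the CI step has the shape `mvn ... | tee build.log`, which is not shown in the issue): either enable `pipefail` or read `PIPESTATUS`.

```shell
#!/usr/bin/env bash
# Without pipefail, a pipeline's status is that of the LAST command (tee),
# so a failing build step is reported as success.
false | tee /dev/null             # stand-in for a failing `mvn` invocation
echo "without pipefail: exit=$?"  # prints exit=0 -- the failure is masked

set -o pipefail                   # fail the pipeline if any stage fails
false | tee /dev/null
echo "with pipefail: exit=$?"     # prints exit=1 -- the failure is surfaced
```

An alternative when `pipefail` cannot be used is checking `${PIPESTATUS[0]}` (bash-specific) right after the pipeline.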

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26794) ChangelogRescalingITCase.test failed on azure due to java.nio.file.NoSuchFileException

2022-03-22 Thread Yun Gao (Jira)
Yun Gao created FLINK-26794:
---

 Summary: ChangelogRescalingITCase.test failed on azure due to 
java.nio.file.NoSuchFileException
 Key: FLINK-26794
 URL: https://issues.apache.org/jira/browse/FLINK-26794
 Project: Flink
  Issue Type: Bug
  Components: Runtime / State Backends
Affects Versions: 1.16.0
Reporter: Yun Gao



{code:java}
Mar 21 17:33:56 [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 14.589 s <<< FAILURE! - in org.apache.flink.test.state.ChangelogRescalingITCase
Mar 21 17:33:56 [ERROR] ChangelogRescalingITCase.test  Time elapsed: 8.392 s  <<< ERROR!
Mar 21 17:33:56 java.io.UncheckedIOException: java.nio.file.NoSuchFileException: /tmp/junit4908969673123504454/junit6297505939941694356/d832f597d0b0414695fa746ffc400bb2/chk-43
Mar 21 17:33:56 at java.nio.file.FileTreeIterator.fetchNextIfNeeded(FileTreeIterator.java:88)
Mar 21 17:33:56 at java.nio.file.FileTreeIterator.hasNext(FileTreeIterator.java:104)
Mar 21 17:33:56 at java.util.Iterator.forEachRemaining(Iterator.java:115)
Mar 21 17:33:56 at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
Mar 21 17:33:56 at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
Mar 21 17:33:56 at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
Mar 21 17:33:56 at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
Mar 21 17:33:56 at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
Mar 21 17:33:56 at java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:546)
Mar 21 17:33:56 at java.util.stream.ReferencePipeline.max(ReferencePipeline.java:582)
Mar 21 17:33:56 at org.apache.flink.test.util.TestUtils.getMostRecentCompletedCheckpointMaybe(TestUtils.java:114)
Mar 21 17:33:56 at org.apache.flink.test.state.ChangelogRescalingITCase.checkpointAndCancel(ChangelogRescalingITCase.java:333)
Mar 21 17:33:56 at org.apache.flink.test.state.ChangelogRescalingITCase.test(ChangelogRescalingITCase.java:156)
Mar 21 17:33:56 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Mar 21 17:33:56 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Mar 21 17:33:56 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Mar 21 17:33:56 at java.lang.reflect.Method.invoke(Method.java:498)
Mar 21 17:33:56 at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
Mar 21 17:33:56 at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
Mar 21 17:33:56 at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
Mar 21 17:33:56 at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
Mar 21 17:33:56 at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
Mar 21 17:33:56 at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
Mar 21 17:33:56 at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
Mar 21 17:33:56 at org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
Mar 21 17:33:56 at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
Mar 21 17:33:56 at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
Mar 21 17:33:56 at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
Mar 21 17:33:56 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
Mar 21 17:33:56 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
Mar 21 17:33:56 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
Mar 21 17:33:56 at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
Mar 21 17:33:56 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
Mar 21 17:33:56 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
Mar 21 17:33:56 at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
Mar 21 17:33:56 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)

{code}
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=33515=logs=5c8e7682-d68f-54d1-16a2-a09310218a49=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba=5643



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26793) Flink Cassandra connector performance issue

2022-03-22 Thread Jay Ghiya (Jira)
Jay Ghiya created FLINK-26793:
-

 Summary: Flink Cassandra connector performance issue 
 Key: FLINK-26793
 URL: https://issues.apache.org/jira/browse/FLINK-26793
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Cassandra
Affects Versions: 1.14.4
Reporter: Jay Ghiya


A warning is observed during long runs of the Flink job, stating: "Insertions into
scylla might be suffering. Expect performance problems unless this is resolved."
Upon initial analysis: "the Flink Cassandra connector is not keeping an instance of
the MappingManager that is used to convert a POJO to a Cassandra row. Ideally the
MappingManager should have the same lifetime as the Cluster and Session objects,
which are also created once when the driver is initialized."
Reference:
https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning
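As a hedged sketch of the suggested direction (the class below is hypothetical and not part of the connector; `MappingManager` would be the DataStax driver type mentioned above), the session-scoped object can be created once per session and reused, mirroring the Cluster/Session lifetime:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper: caches one expensive, session-scoped object (e.g. the
// driver's MappingManager) per session, instead of rebuilding it per record.
final class SessionScopedCache<S, M> {
    private final Map<S, M> cache = new ConcurrentHashMap<>();
    private final Function<S, M> factory;

    SessionScopedCache(Function<S, M> factory) {
        this.factory = factory;
    }

    // computeIfAbsent invokes the factory at most once per session key,
    // so every later lookup for the same session reuses the same instance.
    M forSession(S session) {
        return cache.computeIfAbsent(session, factory);
    }
}
```

In the connector this would roughly mean constructing the MappingManager next to the Session (e.g. in `open()`) and reusing it for every element, rather than recreating it per `invoke()` call.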




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Flink cassandra connector performance issue

2022-03-22 Thread Ghiya, Jay (GE Healthcare)
Hi @dev@flink.apache.org,

Greetings from gehc.

This is regarding the Flink Cassandra connector implementation, which could be
causing the performance issue that we are facing.

The summary of the error is:

"Insertions into scylla might be suffering. Expect performance problems unless
this is resolved."

Upon doing initial analysis we figured out: "the Flink Cassandra connector is not
keeping an instance of the MappingManager that is used to convert a POJO to a
Cassandra row. Ideally the MappingManager should have the same lifetime as the
Cluster and Session objects, which are also created once when the driver is
initialized."

Reference:
https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning

Can we take a look at this? Also, can we help fix this? @R, Aromal (GE
Healthcare, consultant) is our lead dev on this.

Here is the jira issue on the same - 
https://issues.apache.org/jira/browse/FLINK-26793

-Jay