[jira] [Closed] (FLINK-30081) Local executor can not accept different jvm-overhead.min/max values

2024-04-16 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-30081.

Fix Version/s: 1.20.0
   Resolution: Fixed

master (1.20): 9cc5ab9caf368ef336599e7d48f679c8c9750f49

> Local executor can not accept different jvm-overhead.min/max values
> ---
>
> Key: FLINK-30081
> URL: https://issues.apache.org/jira/browse/FLINK-30081
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Configuration
>Affects Versions: 1.16.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Major
>  Labels: auto-deprioritized-major, pull-request-available
> Fix For: 1.20.0
>
>
> In local executor, it's not possible to set different values for 
> {{taskmanager.memory.jvm-overhead.max}} and 
> {{{}taskmanager.memory.jvm-overhead.min{}}}. The same problem for 
> {{taskmanager.memory.network.max}} and {{{}taskmanager.memory.network.min{}}}.
> Sample code to reproduce:
> {code:java}
> Configuration conf = new Configuration();
> conf.setString(TaskManagerOptions.JVM_OVERHEAD_MIN.key(), "1GB");
> conf.setString(TaskManagerOptions.JVM_OVERHEAD_MAX.key(), "2GB");
> StreamExecutionEnvironment.createLocalEnvironment(conf)
> .fromElements("Hello", "World")
> .executeAndCollect()
> .forEachRemaining(System.out::println); {code}
> The failing exception is something like:
> {code:java}
> Exception in thread "main" java.lang.IllegalArgumentException
>   at org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:122)
>   at 
> org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.calculateTotalProcessMemoryFromComponents(TaskExecutorResourceUtils.java:182)
>   at 
> org.apache.flink.runtime.taskexecutor.TaskExecutorMemoryConfiguration.create(TaskExecutorMemoryConfiguration.java:119)
> {code}
> I think the problem was that we expect the max and min to equal, but local 
> executor did not reset them correctly?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35089) Two input AbstractStreamOperator may throw NPE when receiving RecordAttributes

2024-04-16 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reassigned FLINK-35089:


Assignee: Xuannan Su

> Two input AbstractStreamOperator may throw NPE when receiving RecordAttributes
> --
>
> Key: FLINK-35089
> URL: https://issues.apache.org/jira/browse/FLINK-35089
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Task
>Affects Versions: 1.19.0
>Reporter: Xuannan Su
>Assignee: Xuannan Su
>Priority: Major
>  Labels: pull-request-available
>
> Currently the `lastRecordAttributes1` and `lastRecordAttributes2` in the 
> `AbstractStreamOperator` are transient. The two fields will be null when it 
> is deserialized in TaskManager, which may cause an NPE.
> To fix it, we propose to make the RecordAttributes serializable and these 
> fields non-transient.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34589) FineGrainedSlotManager doesn't handle errors in the resource reconcilliation step

2024-03-06 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824248#comment-17824248
 ] 

Xintong Song commented on FLINK-34589:
--

[~mapohl], thanks for reaching out.

Have you already seen errors thrown from the reconciliation? Or did you just 
notice the absence of a safety net without observing any errors so far? I'm 
just trying to understand what possible errors there might be.

My understanding is that, ideally, we expect no exceptions to be thrown from 
the reconciliation. If exceptions do occur, there might be possibilities we 
are not yet aware of. In that case, I'd be in favor of failing eagerly so that 
we don't ignore the problem. Thus, I'd be in favor of option 1.

But it's not a strong preference, and I'd also be fine with option 2.
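
To illustrate the two options, here is a minimal, self-contained sketch (plain 
Java; reconcileResources() and the delay are placeholders, not the actual 
SlotManager internals):

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ReconciliationErrorHandlingSketch {

    public static void main(String[] args) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleWithFixedDelay(
                () -> {
                    try {
                        reconcileResources(); // stand-in for the periodic reconciliation step
                    } catch (Throwable t) {
                        // Option 1: fail eagerly so unexpected errors are not silently ignored,
                        // e.g. by handing t to a fatal error handler.

                        // Option 2: log the failure and let the periodic task keep running.
                        System.err.println("Error during resource reconciliation: " + t);
                    }
                },
                0L,
                30L,
                TimeUnit.SECONDS);
    }

    private static void reconcileResources() {
        // hypothetical stand-in for the SlotManager's reconciliation logic
    }
}
{code}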

> FineGrainedSlotManager doesn't handle errors in the resource reconcilliation 
> step
> -
>
> Key: FLINK-34589
> URL: https://issues.apache.org/jira/browse/FLINK-34589
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Priority: Major
>
> I noticed during my work on FLINK-34427 that the reconcilliation is scheduled 
> periodically when starting the {{SlotManager}}. But it doesn't handle errors 
> in this step. I see two options here:
> 1. Fail fatally because such an error might indicate a major issue with the 
> RM backend.
> 2. Log the failure and continue the scheduled task even in case of an error.
> My understanding is that we're just not able to recreate TaskManagers which 
> should be a transient issue and could be resolved in the backend (YARN, k8s). 
> That's why I would lean towards option 2.
> [~xtsong] WDYT?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34526) Actively disconnect the killed TM in RM to reduce restart time

2024-02-28 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821640#comment-17821640
 ] 

Xintong Song commented on FLINK-34526:
--

This makes sense to me.

Just to provide some background on this: Flink's JM, RM and TM coordinate 
based on the assumption that the ground truth of how TM resources are 
allocated to JMs lies with the TMs. Therefore, to avoid inconsistency, the RM 
does not tell the JM which resource (slot) from which TM is (de)allocated for 
it, nor does it tell the JM to connect to or disconnect from a TM.

However, IMHO this might be a bit overprotective. If a TM is known for sure to 
be terminated, I don't see any problem in notifying the relevant JMs about it 
earlier.
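
To make the idea concrete, a minimal sketch (assuming the existing 
JobMasterGateway#disconnectTaskManager RPC; jobMastersUsing() and the 
surrounding wiring are hypothetical, not the actual ResourceManager code):

{code:java}
// Called when the RM learns for sure that a worker/TM has terminated,
// e.g. from the YARN/K8s notification, instead of waiting for the JM's
// heartbeat to the TM to time out.
void onWorkerTerminated(ResourceID resourceId, String diagnostics) {
    Exception cause =
            new FlinkException(
                    "TaskManager " + resourceId + " is terminated: " + diagnostics);
    // Hypothetical helper returning the JobMasters that hold slots on this TM.
    for (JobMasterGateway jobMaster : jobMastersUsing(resourceId)) {
        jobMaster.disconnectTaskManager(resourceId, cause);
    }
}
{code}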

> Actively disconnect the killed TM in RM to reduce restart time
> --
>
> Key: FLINK-34526
> URL: https://issues.apache.org/jira/browse/FLINK-34526
> Project: Flink
>  Issue Type: Sub-task
>Reporter: junzhong qin
>Assignee: junzhong qin
>Priority: Not a Priority
> Attachments: image-2024-02-27-15-50-25-071.png, 
> image-2024-02-27-15-50-39-337.png
>
>
> In our test case, the pipeline is:
> !image-2024-02-27-15-50-25-071.png!
>  # parallelism = 100
>  # taskmanager.numberOfTaskSlots = 2
>  # disable checkpoint
> h3. Phenomenon
> When the worker was killed at 2024-02-27 15:10:13,691
> {code:java}
> 2024-02-27 15:10:13,691 INFO  
> org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - 
> Worker container_e2472_1706081484717_60538_01_50 is terminated. 
> Diagnostics: Container container_e2472_1706081484717_60538_01_50 marked 
> as failed. Exit code:137. Diagnostics:[2024-02-27 15:10:12.720]Container 
> killed on request. Exit code is 137[2024-02-27 15:10:12.763]Container exited 
> with a non-zero exit code 137. [2024-02-27 15:10:12.839]Killed by external 
> signal {code}
> It took about 20 seconds to restart the job.
> {code:java}
> 2024-02-27 15:10:30,749 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: 
> datagen_source[1] -> Sink: print_sink[2] (70/100) 
> (2a1d06e6610bc499475fa6e647f8cac9_d3f21cabc6fe0fdf76c8be915bdb22a2_69_0) 
> switched from RUNNING to FAILED on 
> container_e2472_1706081484717_60538_01_50 @ xxx 
> (dataPort=38597).org.apache.flink.runtime.jobmaster.JobMasterException: 
> TaskManager with id container_e2472_1706081484717_60538_01_50(xxx:5454) 
> is no longer reachable.
>     at 
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.notifyTargetUnreachable(JobMaster.java:1515)
>     at 
> org.apache.flink.runtime.heartbeat.DefaultHeartbeatMonitor.reportHeartbeatRpcFailure(DefaultHeartbeatMonitor.java:126)
>     at 
> org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl.runIfHeartbeatMonitorExists(HeartbeatManagerImpl.java:275)
>     at 
> org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl.reportHeartbeatTargetUnreachable(HeartbeatManagerImpl.java:267)
>     at 
> org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl.handleHeartbeatRpcFailure(HeartbeatManagerImpl.java:262)
>     at 
> org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl.lambda$handleHeartbeatRpc$0(HeartbeatManagerImpl.java:248)//
>  Deploy and run task
> 2024-02-27 15:10:32,426 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: 
> datagen_source[1] -> Sink: print_sink[2] (70/100) 
> (2a1d06e6610bc499475fa6e647f8cac9_d3f21cabc6fe0fdf76c8be915bdb22a2_69_1) 
> switched from DEPLOYING to INITIALIZING.2024-02-27 15:10:32,427 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: 
> datagen_source[1] -> Sink: print_sink[2] (69/100) 
> (2a1d06e6610bc499475fa6e647f8cac9_d3f21cabc6fe0fdf76c8be915bdb22a2_68_1) 
> switched from DEPLOYING to INITIALIZING.2024-02-27 15:10:33,347 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: 
> datagen_source[1] -> Sink: print_sink[2] (70/100) 
> (2a1d06e6610bc499475fa6e647f8cac9_d3f21cabc6fe0fdf76c8be915bdb22a2_69_1) 
> switched from INITIALIZING to RUNNING.2024-02-27 15:10:33,421 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: 
> datagen_source[1] -> Sink: print_sink[2] (69/100) 
> (2a1d06e6610bc499475fa6e647f8cac9_d3f21cabc6fe0fdf76c8be915bdb22a2_68_1) 
> switched from INITIALIZING to RUNNING. {code}
>  
> h3. Reason
> When the RM received the message the TM was killed, the JobMaster still kept 
> the connection with the killed TM.  And the JobMaster found the TM is no 
> longer reachable after about 17 seconds.
> h3. Solution:
> We can reduce the restart time by disConnectTaskManager actively in 
> ResourceManager
> {code:java}
> // class ResourceManager
> protected Optional closeTaskManagerConnection(
> final ResourceID 

[jira] [Commented] (FLINK-34528) Disconnect TM in JM when TM was killed to further reduce the job restart time

2024-02-28 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821631#comment-17821631
 ] 

Xintong Song commented on FLINK-34528:
--

I'm not sure about introducing an option for this.

A big concern for the Flink project is that it has too many configuration 
options, many of which are too detailed and require users to be extremely 
familiar with Flink internals in order to use them. The community is working 
hard to reduce such no-one-knows-how-to-use knobs, rather than introducing 
more of them.

Pursuing ultimate performance must not come at the price of pushing every 
single decision onto the users. The engine itself should learn to decide its 
behavior smartly, or we should admit that there's no good way to achieve such 
improvements.

In this particular case, if users want to detect a lost TM faster, they can 
simply decrease the heartbeat timeout.
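
For example (the values below are only illustrative; the defaults are 
heartbeat.interval=10000 ms and heartbeat.timeout=50000 ms):

{code:java}
Configuration conf = new Configuration();
conf.setString(HeartbeatManagerOptions.HEARTBEAT_INTERVAL.key(), "5000");
conf.setString(HeartbeatManagerOptions.HEARTBEAT_TIMEOUT.key(), "15000");
{code}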

> Disconnect TM in JM when TM was killed to further reduce the job restart time
> -
>
> Key: FLINK-34528
> URL: https://issues.apache.org/jira/browse/FLINK-34528
> Project: Flink
>  Issue Type: Sub-task
>Reporter: junzhong qin
>Assignee: junzhong qin
>Priority: Not a Priority
> Attachments: image-2024-02-27-16-35-04-464.png
>
>
> In https://issues.apache.org/jira/browse/FLINK-34526 we disconnect the killed 
> TM in RM. But in the following scenario, we can further reduce the restart 
> time.
> h3. Phenomenon
> In the test case, the pipeline looks like:
> !image-2024-02-27-16-35-04-464.png!
> The Source: Custom Source generates strings, and the job keyBy the strings to 
> Sink: Unnamed.
>  # parallelism = 100
>  # taskmanager.numberOfTaskSlots = 2
>  # disable checkpoint
> The worker was killed at 
> {code:java}
> 2024-02-27 16:41:49,982 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - Sink: 
> Unnamed (6/100) 
> (2f1c7b22098a273f5471e3e8f794e1d3_0a448493b4782967b150582570326227_5_0) 
> switched from RUNNING to FAILED on 
> container_e2472_1705993319725_62292_01_46 @ xxx 
> (dataPort=38827).org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
>  Connection unexpectedly closed by remote task manager 
> 'xxx/10.169.18.138:35983 [ 
> container_e2472_1705993319725_62292_01_10(xxx:5454) ] '. This might 
> indicate that the remote task manager was lost.at 
> org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.channelInactive(CreditBasedPartitionRequestClientHandler.java:134)
> at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
> at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
> at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
> at 
> org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)
> at 
> org.apache.flink.runtime.io.network.netty.NettyMessageClientDecoderDelegate.channelInactive(NettyMessageClientDecoderDelegate.java:94)
> at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
> at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
> at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
> at 
> org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
> at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
> at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
> at 
> org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
> at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:813)
> at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
> at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
> at 
> 

[jira] [Closed] (FLINK-34371) FLIP-331: Support EndOfStreamTrigger and isOutputOnlyAfterEndOfStream operator attribute to optimize task deployment

2024-02-28 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-34371.

Fix Version/s: 1.20.0
   Resolution: Done

master (1.20): 94b5f031a785d16077d870fe9e009d168077430b

> FLIP-331: Support EndOfStreamTrigger and isOutputOnlyAfterEndOfStream 
> operator attribute to optimize task deployment
> 
>
> Key: FLINK-34371
> URL: https://issues.apache.org/jira/browse/FLINK-34371
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing, Runtime / Task
>Reporter: Yunfeng Zhou
>Assignee: Yunfeng Zhou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> This is an umbrella ticket for FLIP-331.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34371) FLIP-331: Support EndOfStreamTrigger and isOutputOnlyAfterEndOfStream operator attribute to optimize task deployment

2024-02-28 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reassigned FLINK-34371:


Assignee: Yunfeng Zhou

> FLIP-331: Support EndOfStreamTrigger and isOutputOnlyAfterEndOfStream 
> operator attribute to optimize task deployment
> 
>
> Key: FLINK-34371
> URL: https://issues.apache.org/jira/browse/FLINK-34371
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing, Runtime / Task
>Reporter: Yunfeng Zhou
>Assignee: Yunfeng Zhou
>Priority: Major
>  Labels: pull-request-available
>
> This is an umbrella ticket for FLIP-331.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-33728) Do not rewatch when KubernetesResourceManagerDriver watch fail

2024-02-21 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819433#comment-17819433
 ] 

Xintong Song edited comment on FLINK-33728 at 2/22/24 1:21 AM:
---

master (1.20): e7e31a99d6f93d4dadda21fbd1ebee079fe2418e


was (Author: xintongsong):
master (1.19): e7e31a99d6f93d4dadda21fbd1ebee079fe2418e

> Do not rewatch when KubernetesResourceManagerDriver watch fail
> --
>
> Key: FLINK-33728
> URL: https://issues.apache.org/jira/browse/FLINK-33728
> Project: Flink
>  Issue Type: New Feature
>  Components: Deployment / Kubernetes
>Reporter: xiaogang zhou
>Assignee: xiaogang zhou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> I met massive production problem when kubernetes ETCD slow responding happen. 
> After Kube recoverd after 1 hour, Thousands of Flink jobs using 
> kubernetesResourceManagerDriver rewatched when recieving 
> ResourceVersionTooOld,  which caused great pressure on API Server and made 
> API server failed again... 
>  
> I am not sure is it necessary to
> getResourceEventHandler().onError(throwable)
> in  PodCallbackHandlerImpl# handleError method?
>  
> We can just neglect the disconnection of watching process. and try to rewatch 
> once new requestResource called. And we can leverage on the akka heartbeat 
> timeout to discover the TM failure, just like YARN mode do.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33728) Do not rewatch when KubernetesResourceManagerDriver watch fail

2024-02-21 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-33728.

Fix Version/s: 1.19.0
   Resolution: Done

master (1.19): e7e31a99d6f93d4dadda21fbd1ebee079fe2418e

> Do not rewatch when KubernetesResourceManagerDriver watch fail
> --
>
> Key: FLINK-33728
> URL: https://issues.apache.org/jira/browse/FLINK-33728
> Project: Flink
>  Issue Type: New Feature
>  Components: Deployment / Kubernetes
>Reporter: xiaogang zhou
>Assignee: xiaogang zhou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> I met massive production problem when kubernetes ETCD slow responding happen. 
> After Kube recoverd after 1 hour, Thousands of Flink jobs using 
> kubernetesResourceManagerDriver rewatched when recieving 
> ResourceVersionTooOld,  which caused great pressure on API Server and made 
> API server failed again... 
>  
> I am not sure is it necessary to
> getResourceEventHandler().onError(throwable)
> in  PodCallbackHandlerImpl# handleError method?
>  
> We can just neglect the disconnection of watching process. and try to rewatch 
> once new requestResource called. And we can leverage on the akka heartbeat 
> timeout to discover the TM failure, just like YARN mode do.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33728) Do not rewatch when KubernetesResourceManagerDriver watch fail

2024-02-21 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song updated FLINK-33728:
-
Fix Version/s: 1.20.0
   (was: 1.19.0)

> Do not rewatch when KubernetesResourceManagerDriver watch fail
> --
>
> Key: FLINK-33728
> URL: https://issues.apache.org/jira/browse/FLINK-33728
> Project: Flink
>  Issue Type: New Feature
>  Components: Deployment / Kubernetes
>Reporter: xiaogang zhou
>Assignee: xiaogang zhou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> I met massive production problem when kubernetes ETCD slow responding happen. 
> After Kube recoverd after 1 hour, Thousands of Flink jobs using 
> kubernetesResourceManagerDriver rewatched when recieving 
> ResourceVersionTooOld,  which caused great pressure on API Server and made 
> API server failed again... 
>  
> I am not sure is it necessary to
> getResourceEventHandler().onError(throwable)
> in  PodCallbackHandlerImpl# handleError method?
>  
> We can just neglect the disconnection of watching process. and try to rewatch 
> once new requestResource called. And we can leverage on the akka heartbeat 
> timeout to discover the TM failure, just like YARN mode do.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-30081) Local executor can not accept different jvm-overhead.min/max values

2024-02-18 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song updated FLINK-30081:
-
Priority: Major  (was: Minor)

> Local executor can not accept different jvm-overhead.min/max values
> ---
>
> Key: FLINK-30081
> URL: https://issues.apache.org/jira/browse/FLINK-30081
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Configuration
>Affects Versions: 1.16.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Major
>  Labels: auto-deprioritized-major, pull-request-available
>
> In local executor, it's not possible to set different values for 
> {{taskmanager.memory.jvm-overhead.max}} and 
> {{{}taskmanager.memory.jvm-overhead.min{}}}. The same problem for 
> {{taskmanager.memory.network.max}} and {{{}taskmanager.memory.network.min{}}}.
> Sample code to reproduce:
> {code:java}
> Configuration conf = new Configuration();
> conf.setString(TaskManagerOptions.JVM_OVERHEAD_MIN.key(), "1GB");
> conf.setString(TaskManagerOptions.JVM_OVERHEAD_MAX.key(), "2GB");
> StreamExecutionEnvironment.createLocalEnvironment(conf)
> .fromElements("Hello", "World")
> .executeAndCollect()
> .forEachRemaining(System.out::println); {code}
> The failing exception is something like:
> {code:java}
> Exception in thread "main" java.lang.IllegalArgumentException
>   at org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:122)
>   at 
> org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.calculateTotalProcessMemoryFromComponents(TaskExecutorResourceUtils.java:182)
>   at 
> org.apache.flink.runtime.taskexecutor.TaskExecutorMemoryConfiguration.create(TaskExecutorMemoryConfiguration.java:119)
> {code}
> I think the problem was that we expect the max and min to equal, but local 
> executor did not reset them correctly?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-30081) Local executor can not accept different jvm-overhead.min/max values

2024-02-18 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reassigned FLINK-30081:


Assignee: Mingliang Liu

> Local executor can not accept different jvm-overhead.min/max values
> ---
>
> Key: FLINK-30081
> URL: https://issues.apache.org/jira/browse/FLINK-30081
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Configuration
>Affects Versions: 1.16.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available
>
> In local executor, it's not possible to set different values for 
> {{taskmanager.memory.jvm-overhead.max}} and 
> {{{}taskmanager.memory.jvm-overhead.min{}}}. The same problem for 
> {{taskmanager.memory.network.max}} and {{{}taskmanager.memory.network.min{}}}.
> Sample code to reproduce:
> {code:java}
> Configuration conf = new Configuration();
> conf.setString(TaskManagerOptions.JVM_OVERHEAD_MIN.key(), "1GB");
> conf.setString(TaskManagerOptions.JVM_OVERHEAD_MAX.key(), "2GB");
> StreamExecutionEnvironment.createLocalEnvironment(conf)
> .fromElements("Hello", "World")
> .executeAndCollect()
> .forEachRemaining(System.out::println); {code}
> The failing exception is something like:
> {code:java}
> Exception in thread "main" java.lang.IllegalArgumentException
>   at org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:122)
>   at 
> org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.calculateTotalProcessMemoryFromComponents(TaskExecutorResourceUtils.java:182)
>   at 
> org.apache.flink.runtime.taskexecutor.TaskExecutorMemoryConfiguration.create(TaskExecutorMemoryConfiguration.java:119)
> {code}
> I think the problem was that we expect the max and min to equal, but local 
> executor did not reset them correctly?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34402) Class loading conflicts when using PowerMock in ITcase.

2024-02-07 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-34402.

Resolution: Not A Bug

Thanks for your understanding.

Closing the ticket as "Not A Bug".

>  Class loading conflicts when using PowerMock in ITcase.
> 
>
> Key: FLINK-34402
> URL: https://issues.apache.org/jira/browse/FLINK-34402
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: yisha zhou
>Assignee: yisha zhou
>Priority: Major
>
> Currently when no user jars exist, system classLoader will be used to load 
> classes as default. However, if we use powerMock to create some ITCases, the 
> framework will utilize JavassistMockClassLoader to load classes.  Forcing the 
> use of the system classLoader can lead to class loading conflict issue.
> Therefore we should use Thread.currentThread().getContextClassLoader() 
> instead of 
> ClassLoader.getSystemClassLoader() here.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34402) Class loading conflicts when using PowerMock in ITcase.

2024-02-07 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815206#comment-17815206
 ] 

Xintong Song commented on FLINK-34402:
--

 > Returning to the change, do you have any concern about the replacement?

Replacing `ClassLoader.getSystemClassLoader()` with 
`Thread.currentThread().getContextClassLoader()` implies the assumption that 
`BlobLibraryCacheManager` is always called from a thread whose context 
classloader is the system classloader, which is implicit and fragile. If 
`BlobLibraryCacheManager` is later called from another thread, the assumption 
can easily be overlooked and broken, leading to unpredictable behavior. This 
may not be absolutely unaffordable, but compared to what we gain from the 
change, I'd rather not apply it.
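
As a small self-contained illustration of why the context classloader is 
fragile (plain Java, unrelated to Flink internals):

{code:java}
import java.net.URL;
import java.net.URLClassLoader;

public class ContextClassLoaderSketch {

    public static void main(String[] args) {
        // In the main thread, the context classloader usually equals the system one.
        System.out.println(
                Thread.currentThread().getContextClassLoader()
                        == ClassLoader.getSystemClassLoader());

        // But the result depends entirely on which thread runs the code: a thread
        // with an overridden context classloader sees something else, while
        // ClassLoader.getSystemClassLoader() stays the same everywhere.
        Thread t =
                new Thread(
                        () -> System.out.println(
                                Thread.currentThread().getContextClassLoader()));
        t.setContextClassLoader(new URLClassLoader(new URL[0]));
        t.start();
    }
}
{code}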

>  Class loading conflicts when using PowerMock in ITcase.
> 
>
> Key: FLINK-34402
> URL: https://issues.apache.org/jira/browse/FLINK-34402
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: yisha zhou
>Assignee: yisha zhou
>Priority: Major
>
> Currently when no user jars exist, system classLoader will be used to load 
> classes as default. However, if we use powerMock to create some ITCases, the 
> framework will utilize JavassistMockClassLoader to load classes.  Forcing the 
> use of the system classLoader can lead to class loading conflict issue.
> Therefore we should use Thread.currentThread().getContextClassLoader() 
> instead of 
> ClassLoader.getSystemClassLoader() here.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34402) Class loading conflicts when using PowerMock in ITcase.

2024-02-06 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815142#comment-17815142
 ] 

Xintong Song commented on FLINK-34402:
--

Thanks for the detailed information.

I think this should not be considered a bug. The reported issue doesn't really 
affect any real production use case, and Flink is not designed to be executed 
with a PowerMockRunner or JavassistMockClassLoader.

I'm also not familiar with approaches for checking how many times an API is 
called and verifying query outputs, other than implementing them manually. But 
if the cases are only for internal usage in your company, you don't really 
need to follow the community code-style guides.

My suggestion would be to apply the proposed changes only to your internal 
fork. WDYT?

>  Class loading conflicts when using PowerMock in ITcase.
> 
>
> Key: FLINK-34402
> URL: https://issues.apache.org/jira/browse/FLINK-34402
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: yisha zhou
>Assignee: yisha zhou
>Priority: Major
> Fix For: 1.19.0
>
>
> Currently when no user jars exist, system classLoader will be used to load 
> classes as default. However, if we use powerMock to create some ITCases, the 
> framework will utilize JavassistMockClassLoader to load classes.  Forcing the 
> use of the system classLoader can lead to class loading conflict issue.
> Therefore we should use Thread.currentThread().getContextClassLoader() 
> instead of 
> ClassLoader.getSystemClassLoader() here.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34402) Class loading conflicts when using PowerMock in ITcase.

2024-02-06 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815096#comment-17815096
 ] 

Xintong Song commented on FLINK-34402:
--

Hi [~nilerzhou],

The description of this issue is a bit unclear to me. Could you provide a bit 
more information?
- In which ITCase did you run into the problem? If it's an ITCase that does 
not yet exist and that you are planning to add, it would be helpful to also 
provide the code so others can reproduce the issue.
- Where exactly are you suggesting replacing 
`ClassLoader.getSystemClassLoader()` with 
`Thread.currentThread().getContextClassLoader()`?

BTW, using Mockito for testing is discouraged. See the Code Style and Quality 
Guide for more details. 
https://flink.apache.org/how-to-contribute/code-style-and-quality-common/#avoid-mockito---use-reusable-test-implementations

>  Class loading conflicts when using PowerMock in ITcase.
> 
>
> Key: FLINK-34402
> URL: https://issues.apache.org/jira/browse/FLINK-34402
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: yisha zhou
>Assignee: yisha zhou
>Priority: Major
> Fix For: 1.19.0
>
>
> Currently when no user jars exist, system classLoader will be used to load 
> classes as default. However, if we use powerMock to create some ITCases, the 
> framework will utilize JavassistMockClassLoader to load classes.  Forcing the 
> use of the system classLoader can lead to class loading conflict issue.
> Therefore we should use Thread.currentThread().getContextClassLoader() 
> instead of 
> ClassLoader.getSystemClassLoader() here.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34237) MongoDB connector compile failed with Flink 1.19-SNAPSHOT

2024-01-26 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-34237.

Resolution: Fixed

master (1.19): d0829ba3b162c24f2655b35258c9a8dc61cdba67

> MongoDB connector compile failed with Flink 1.19-SNAPSHOT
> -
>
> Key: FLINK-34237
> URL: https://issues.apache.org/jira/browse/FLINK-34237
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, Connectors / MongoDB
>Reporter: Leonard Xu
>Assignee: Wencong Liu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> {code:java}
> Error:  Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.0:compile 
> (default-compile) on project flink-connector-mongodb: Compilation failure
> 134Error:  
> /home/runner/work/flink-connector-mongodb/flink-connector-mongodb/flink-connector-mongodb/src/main/java/org/apache/flink/connector/mongodb/source/reader/MongoSourceReaderContext.java:[35,8]
>  org.apache.flink.connector.mongodb.source.reader.MongoSourceReaderContext is 
> not abstract and does not override abstract method getTaskInfo() in 
> org.apache.flink.api.connector.source.SourceReaderContext
> 135{code}
> [https://github.com/apache/flink-connector-mongodb/actions/runs/7657281844/job/20867604084]
> This is related to FLINK-33905
> One point: As 
> [FLIP-382|https://cwiki.apache.org/confluence/display/FLINK/FLIP-382%3A+Unify+the+Provision+of+Diverse+Metadata+for+Context-like+APIs]
>  is accepted,  all connectors who implement SourceReaderContext (i.e 
> MongoSourceReaderContext) should implement new introduced methods ` 
> getTaskInfo()` if they want to compile/work with Flink 1.19.
> Another point: The FLIP-382 didn't mentioned the connector backward 
> compatibility well, maybe we need to rethink the section. As I just have a 
> rough look at the FLIP, maybe [~xtsong] and [~Wencong Liu] could comment 
> under this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34237) MongoDB connector compile failed with Flink 1.19-SNAPSHOT

2024-01-25 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song updated FLINK-34237:
-
Fix Version/s: 1.19.0
   (was: mongodb-1.1.0)

> MongoDB connector compile failed with Flink 1.19-SNAPSHOT
> -
>
> Key: FLINK-34237
> URL: https://issues.apache.org/jira/browse/FLINK-34237
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, Connectors / MongoDB
>Reporter: Leonard Xu
>Assignee: Wencong Liu
>Priority: Blocker
> Fix For: 1.19.0
>
>
> {code:java}
> Error:  Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.0:compile 
> (default-compile) on project flink-connector-mongodb: Compilation failure
> 134Error:  
> /home/runner/work/flink-connector-mongodb/flink-connector-mongodb/flink-connector-mongodb/src/main/java/org/apache/flink/connector/mongodb/source/reader/MongoSourceReaderContext.java:[35,8]
>  org.apache.flink.connector.mongodb.source.reader.MongoSourceReaderContext is 
> not abstract and does not override abstract method getTaskInfo() in 
> org.apache.flink.api.connector.source.SourceReaderContext
> 135{code}
> [https://github.com/apache/flink-connector-mongodb/actions/runs/7657281844/job/20867604084]
> This is related to FLINK-33905
> One point: As 
> [FLIP-382|https://cwiki.apache.org/confluence/display/FLINK/FLIP-382%3A+Unify+the+Provision+of+Diverse+Metadata+for+Context-like+APIs]
>  is accepted,  all connectors who implement SourceReaderContext (i.e 
> MongoSourceReaderContext) should implement new introduced methods ` 
> getTaskInfo()` if they want to compile/work with Flink 1.19. As I just have a 
> rough look at the FLIP, maybe [~xtsong] and [~Wencong Liu] could comment 
> under this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34237) MongoDB connector compile failed with Flink 1.19-SNAPSHOT

2024-01-25 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811082#comment-17811082
 ] 

Xintong Song commented on FLINK-34237:
--

Thanks for reporting. This is indeed an unintended breaking change and a 
blocker for Flink 1.19.

We thought `SourceReaderContext` was only called by the various connectors and 
were not aware that it is also implemented by connectors. FLIP-382 is for 
clean-up purposes and does not introduce any new feature. Even in Flink 2.0, I 
think we should not require all connectors to change their code only for such 
clean-up purposes. So let's simply revert the changes to this interface.

[~Wencong Liu], could you please help fix this?

> MongoDB connector compile failed with Flink 1.19-SNAPSHOT
> -
>
> Key: FLINK-34237
> URL: https://issues.apache.org/jira/browse/FLINK-34237
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, Connectors / MongoDB
>Reporter: Leonard Xu
>Priority: Blocker
> Fix For: mongodb-1.1.0
>
>
> {code:java}
> Error:  Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.0:compile 
> (default-compile) on project flink-connector-mongodb: Compilation failure
> 134Error:  
> /home/runner/work/flink-connector-mongodb/flink-connector-mongodb/flink-connector-mongodb/src/main/java/org/apache/flink/connector/mongodb/source/reader/MongoSourceReaderContext.java:[35,8]
>  org.apache.flink.connector.mongodb.source.reader.MongoSourceReaderContext is 
> not abstract and does not override abstract method getTaskInfo() in 
> org.apache.flink.api.connector.source.SourceReaderContext
> 135{code}
> [https://github.com/apache/flink-connector-mongodb/actions/runs/7657281844/job/20867604084]
> This is related to FLINK-33905
> One point: As 
> [FLIP-382|https://cwiki.apache.org/confluence/display/FLINK/FLIP-382%3A+Unify+the+Provision+of+Diverse+Metadata+for+Context-like+APIs]
>  is accepted,  all connectors who implement SourceReaderContext (i.e 
> MongoSourceReaderContext) should implement new introduced methods ` 
> getTaskInfo()` if they want to compile/work with Flink 1.19. As I just have a 
> rough look at the FLIP, maybe [~xtsong] and [~Wencong Liu] could comment 
> under this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34237) MongoDB connector compile failed with Flink 1.19-SNAPSHOT

2024-01-25 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reassigned FLINK-34237:


Assignee: Wencong Liu

> MongoDB connector compile failed with Flink 1.19-SNAPSHOT
> -
>
> Key: FLINK-34237
> URL: https://issues.apache.org/jira/browse/FLINK-34237
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, Connectors / MongoDB
>Reporter: Leonard Xu
>Assignee: Wencong Liu
>Priority: Blocker
> Fix For: mongodb-1.1.0
>
>
> {code:java}
> Error:  Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.0:compile 
> (default-compile) on project flink-connector-mongodb: Compilation failure
> 134Error:  
> /home/runner/work/flink-connector-mongodb/flink-connector-mongodb/flink-connector-mongodb/src/main/java/org/apache/flink/connector/mongodb/source/reader/MongoSourceReaderContext.java:[35,8]
>  org.apache.flink.connector.mongodb.source.reader.MongoSourceReaderContext is 
> not abstract and does not override abstract method getTaskInfo() in 
> org.apache.flink.api.connector.source.SourceReaderContext
> 135{code}
> [https://github.com/apache/flink-connector-mongodb/actions/runs/7657281844/job/20867604084]
> This is related to FLINK-33905
> One point: As 
> [FLIP-382|https://cwiki.apache.org/confluence/display/FLINK/FLIP-382%3A+Unify+the+Provision+of+Diverse+Metadata+for+Context-like+APIs]
>  is accepted,  all connectors who implement SourceReaderContext (i.e 
> MongoSourceReaderContext) should implement new introduced methods ` 
> getTaskInfo()` if they want to compile/work with Flink 1.19. As I just have a 
> rough look at the FLIP, maybe [~xtsong] and [~Wencong Liu] could comment 
> under this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34084) Deprecate unused configuration in BinaryInput/OutputFormat and FileInput/OutputFormat

2024-01-17 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-34084.

Resolution: Done

master (1.19): 4559b851b22d6ffa197aa311adbea15b21a43c66

> Deprecate unused configuration in BinaryInput/OutputFormat and 
> FileInput/OutputFormat
> -
>
> Key: FLINK-34084
> URL: https://issues.apache.org/jira/browse/FLINK-34084
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration
>Reporter: Xuannan Su
>Assignee: Xuannan Su
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Update FileInputFormat.java, FileOutputFormat.java, BinaryInputFormat.java, 
> and BinaryOutputFormat.java to deprecate unused string configuration keys.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33728) do not rewatch when KubernetesResourceManagerDriver watch fail

2024-01-14 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17806641#comment-17806641
 ] 

Xintong Song commented on FLINK-33728:
--

Sure, thanks for volunteering to work on this. I've assigned the ticket to 
you. Please go ahead.

> do not rewatch when KubernetesResourceManagerDriver watch fail
> --
>
> Key: FLINK-33728
> URL: https://issues.apache.org/jira/browse/FLINK-33728
> Project: Flink
>  Issue Type: New Feature
>  Components: Deployment / Kubernetes
>Reporter: xiaogang zhou
>Assignee: xiaogang zhou
>Priority: Major
>  Labels: pull-request-available
>
> I met massive production problem when kubernetes ETCD slow responding happen. 
> After Kube recoverd after 1 hour, Thousands of Flink jobs using 
> kubernetesResourceManagerDriver rewatched when recieving 
> ResourceVersionTooOld,  which caused great pressure on API Server and made 
> API server failed again... 
>  
> I am not sure is it necessary to
> getResourceEventHandler().onError(throwable)
> in  PodCallbackHandlerImpl# handleError method?
>  
> We can just neglect the disconnection of watching process. and try to rewatch 
> once new requestResource called. And we can leverage on the akka heartbeat 
> timeout to discover the TM failure, just like YARN mode do.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33728) do not rewatch when KubernetesResourceManagerDriver watch fail

2024-01-14 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reassigned FLINK-33728:


Assignee: xiaogang zhou

> do not rewatch when KubernetesResourceManagerDriver watch fail
> --
>
> Key: FLINK-33728
> URL: https://issues.apache.org/jira/browse/FLINK-33728
> Project: Flink
>  Issue Type: New Feature
>  Components: Deployment / Kubernetes
>Reporter: xiaogang zhou
>Assignee: xiaogang zhou
>Priority: Major
>  Labels: pull-request-available
>
> I met massive production problem when kubernetes ETCD slow responding happen. 
> After Kube recoverd after 1 hour, Thousands of Flink jobs using 
> kubernetesResourceManagerDriver rewatched when recieving 
> ResourceVersionTooOld,  which caused great pressure on API Server and made 
> API server failed again... 
>  
> I am not sure is it necessary to
> getResourceEventHandler().onError(throwable)
> in  PodCallbackHandlerImpl# handleError method?
>  
> We can just neglect the disconnection of watching process. and try to rewatch 
> once new requestResource called. And we can leverage on the akka heartbeat 
> timeout to discover the TM failure, just like YARN mode do.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-32978) Deprecate RichFunction#open(Configuration parameters)

2024-01-14 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-32978.

Resolution: Fixed

Breaking changes reverted.
master (1.19): 1d6150f386d9c9ec61f4ab30853b915de7712047

> Deprecate RichFunction#open(Configuration parameters)
> -
>
> Key: FLINK-32978
> URL: https://issues.apache.org/jira/browse/FLINK-32978
> Project: Flink
>  Issue Type: Technical Debt
>  Components: API / Core
>Affects Versions: 1.19.0
>Reporter: Wencong Liu
>Assignee: Wencong Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> The 
> [FLIP-344|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263425231]
>  has decided that the parameter in RichFunction#open will be removed in the 
> next major version. We should deprecate it now and remove it in Flink 2.0. 
> The removal will be tracked in 
> [FLINK-6912|https://issues.apache.org/jira/browse/FLINK-6912].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33728) do not rewatch when KubernetesResourceManagerDriver watch fail

2024-01-14 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17806586#comment-17806586
 ] 

Xintong Song commented on FLINK-33728:
--

Sorry for the late reply; I was distracted by some other work last week.

I think you are right that the JM will kill itself if the re-watch does not 
succeed. In most cases it is expected that the client tries to re-watch 
immediately after seeing a ResourceVersionTooOld exception. However, if the 
first re-watch attempt fails, the JM should not kill itself immediately, but 
may retry with some backoff interval.

cc [~wangyang0918]
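
A rough sketch of the retry-with-backoff idea (method and constant names are 
hypothetical, not the actual KubernetesResourceManagerDriver code):

{code:java}
private static final int MAX_REWATCH_ATTEMPTS = 5;
private static final long INITIAL_BACKOFF_MS = 1_000L;
private static final long MAX_BACKOFF_MS = 60_000L;

void rewatchWithBackoff(ScheduledExecutorService executor, int attempt) {
    try {
        watchPods(); // hypothetical: re-create the pod watch
    } catch (Exception e) {
        if (attempt >= MAX_REWATCH_ATTEMPTS) {
            // Only give up (and let the JM fail) after several attempts.
            onFatalError(e);
        } else {
            // Exponential backoff, capped at MAX_BACKOFF_MS.
            long backoffMs = Math.min(INITIAL_BACKOFF_MS << attempt, MAX_BACKOFF_MS);
            executor.schedule(
                    () -> rewatchWithBackoff(executor, attempt + 1),
                    backoffMs,
                    TimeUnit.MILLISECONDS);
        }
    }
}
{code}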

> do not rewatch when KubernetesResourceManagerDriver watch fail
> --
>
> Key: FLINK-33728
> URL: https://issues.apache.org/jira/browse/FLINK-33728
> Project: Flink
>  Issue Type: New Feature
>  Components: Deployment / Kubernetes
>Reporter: xiaogang zhou
>Priority: Major
>  Labels: pull-request-available
>
> I met massive production problem when kubernetes ETCD slow responding happen. 
> After Kube recoverd after 1 hour, Thousands of Flink jobs using 
> kubernetesResourceManagerDriver rewatched when recieving 
> ResourceVersionTooOld,  which caused great pressure on API Server and made 
> API server failed again... 
>  
> I am not sure is it necessary to
> getResourceEventHandler().onError(throwable)
> in  PodCallbackHandlerImpl# handleError method?
>  
> We can just neglect the disconnection of watching process. and try to rewatch 
> once new requestResource called. And we can leverage on the akka heartbeat 
> timeout to discover the TM failure, just like YARN mode do.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33905) FLIP-382: Unify the Provision of Diverse Metadata for Context-like APIs

2024-01-10 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-33905.

Resolution: Done

master (1.19): 06b46a9cbf0d8fa987bbde570510f75a7558f54d

> FLIP-382: Unify the Provision of Diverse Metadata for Context-like APIs
> ---
>
> Key: FLINK-33905
> URL: https://issues.apache.org/jira/browse/FLINK-33905
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Core
>Affects Versions: 1.19.0
>Reporter: Wencong Liu
>Assignee: Wencong Liu
>Priority: Major
>  Labels: pull-request-available
>
> This ticket is proposed for 
> [FLIP-382|https://cwiki.apache.org/confluence/display/FLINK/FLIP-382%3A+Unify+the+Provision+of+Diverse+Metadata+for+Context-like+APIs].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33938) Correct implicit coercions in relational operators to adopt typescript 5.0

2024-01-08 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-33938.

Resolution: Fixed

- master (1.19): e07545e458bd22099244a353ac29477ca3a13811
- release-1.18: 12463fbad39edc17af687c1421bba4623f924083
- release-1.17: ce62d45477447537088930ec116f7e18a2743166

> Correct implicit coercions in relational operators to adopt typescript 5.0
> --
>
> Key: FLINK-33938
> URL: https://issues.apache.org/jira/browse/FLINK-33938
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Affects Versions: 1.19.0
>Reporter: Ao Yuchen
>Assignee: Ao Yuchen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0, 1.17.3, 1.18.2
>
>
> Since TypeScript 5.0, there is a break change that implicit coercions in 
> relational operators are forbidden [1].
> So that the following code in 
> flink-runtime-web/web-dashboard/src/app/components/humanize-date.pipe.ts get 
> error:
> {code:java}
> public transform(
>   value: number | string | Date,
>   ...
> ): string | null | undefined {
>   if (value == null || value === '' || value !== value || value < 0) {
> return '-';
>   } 
>   ...
> }{code}
> The correctness improvement is availble in here 
> [2][.|https://github.com/microsoft/TypeScript/pull/52048.]
> I think we should optimize this type of code for better compatibility.
>  
> [1] 
> [https://devblogs.microsoft.com/typescript/announcing-typescript-5-0/#forbidden-implicit-coercions-in-relational-operators]
> [2] 
> [https://github.com/microsoft/TypeScript/pull/52048|https://github.com/microsoft/TypeScript/pull/52048.]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (FLINK-32978) Deprecate RichFunction#open(Configuration parameters)

2024-01-08 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reopened FLINK-32978:
--

Reopening the ticket to fix the unexpected breaking changes.

> Deprecate RichFunction#open(Configuration parameters)
> -
>
> Key: FLINK-32978
> URL: https://issues.apache.org/jira/browse/FLINK-32978
> Project: Flink
>  Issue Type: Technical Debt
>  Components: API / Core
>Affects Versions: 1.19.0
>Reporter: Wencong Liu
>Assignee: Wencong Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> The 
> [FLIP-344|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263425231]
>  has decided that the parameter in RichFunction#open will be removed in the 
> next major version. We should deprecate it now and remove it in Flink 2.0. 
> The removal will be tracked in 
> [FLINK-6912|https://issues.apache.org/jira/browse/FLINK-6912].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32978) Deprecate RichFunction#open(Configuration parameters)

2024-01-08 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17804566#comment-17804566
 ] 

Xintong Song commented on FLINK-32978:
--

Thanks for reporting this, [~Sergey Nuyanzin].

You're right, this is indeed a problem.

I think the interface change itself is non-breaking. The problem is that we 
also migrated the built-in implementations from the old interface to the new 
one. That is fine for internal classes, but it becomes a breaking change for 
@Public / @PublicEvolving classes, which might be overridden by user code.

[~Wencong Liu], could you please look into this?
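
A simplified illustration of the breaking pattern, assuming RichFunction's new 
default open(OpenContext) delegates to the deprecated open(Configuration) 
(class names below are made up, not the actual Flink classes):

{code:java}
// After the migration, a built-in @PublicEvolving function overrides the new
// method only:
public class BuiltInFunction extends AbstractRichFunction {
    @Override
    public void open(OpenContext openContext) throws Exception {
        // framework setup; no longer delegates to open(Configuration)
    }
}

// A user subclass written against the old API still compiles, but its
// open(Configuration) override is silently never called anymore, because the
// delegating default has been overridden away in the parent class.
public class UserFunction extends BuiltInFunction {
    @Override
    public void open(Configuration parameters) throws Exception {
        // user setup, now skipped at runtime
    }
}
{code}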

> Deprecate RichFunction#open(Configuration parameters)
> -
>
> Key: FLINK-32978
> URL: https://issues.apache.org/jira/browse/FLINK-32978
> Project: Flink
>  Issue Type: Technical Debt
>  Components: API / Core
>Affects Versions: 1.19.0
>Reporter: Wencong Liu
>Assignee: Wencong Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> The 
> [FLIP-344|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263425231]
>  has decided that the parameter in RichFunction#open will be removed in the 
> next major version. We should deprecate it now and remove it in Flink 2.0. 
> The removal will be tracked in 
> [FLINK-6912|https://issues.apache.org/jira/browse/FLINK-6912].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33728) do not rewatch when KubernetesResourceManagerDriver watch fail

2024-01-08 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17804564#comment-17804564
 ] 

Xintong Song commented on FLINK-33728:
--

Thanks for pulling me in.

I'm also concerned about relying solely on the heartbeat timeout to detect pod 
failure. In addition to the cleanup issue, it can also delay the detection of 
pod failure in many cases.

IIUC, the problem we are trying to solve here is to avoid massive numbers of 
Flink jobs trying to re-create watches at the same time. That doesn't 
necessarily lead to the proposed solution.
1. I think this is not a problem of individual Flink jobs, but a problem of a 
K8s cluster that runs massive Flink workloads. Ideally, such problems, i.e. 
how to better deal with the massive workloads, should be solved on the K8s 
cluster side. However, I don't have the expertise to come up with a 
cluster-side solution.
2. If 1) is not feasible, I think we can introduce a random backoff. Users may 
configure a max backoff time (default 0), and Flink randomly picks a time no 
greater than the max before re-creating the watch. Ideally, that would spread 
the pressure on the API server over a longer and configurable period.

WDYT?
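
For 2), a rough sketch of the random backoff (the option and surrounding 
methods are hypothetical):

{code:java}
// Max backoff configured by the user; 0 (the default) keeps today's behavior
// of re-creating the watch immediately.
long maxBackoffMs = watchRecreationMaxBackoff.toMillis();
long delayMs =
        maxBackoffMs == 0
                ? 0L
                : ThreadLocalRandom.current().nextLong(maxBackoffMs + 1);
// Spread the re-watch calls of many jobs over the configured window.
scheduledExecutor.schedule(this::recreatePodsWatch, delayMs, TimeUnit.MILLISECONDS);
{code}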

> do not rewatch when KubernetesResourceManagerDriver watch fail
> --
>
> Key: FLINK-33728
> URL: https://issues.apache.org/jira/browse/FLINK-33728
> Project: Flink
>  Issue Type: New Feature
>  Components: Deployment / Kubernetes
>Reporter: xiaogang zhou
>Priority: Major
>  Labels: pull-request-available
>
> I met massive production problem when kubernetes ETCD slow responding happen. 
> After Kube recoverd after 1 hour, Thousands of Flink jobs using 
> kubernetesResourceManagerDriver rewatched when recieving 
> ResourceVersionTooOld,  which caused great pressure on API Server and made 
> API server failed again... 
>  
> I am not sure is it necessary to
> getResourceEventHandler().onError(throwable)
> in  PodCallbackHandlerImpl# handleError method?
>  
> We can just neglect the disconnection of watching process. and try to rewatch 
> once new requestResource called. And we can leverage on the akka heartbeat 
> timeout to discover the TM failure, just like YARN mode do.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33939) Make husky in runtime-web no longer affect git global hooks

2024-01-03 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song updated FLINK-33939:
-
Affects Version/s: 1.9.0

> Make husky in runtime-web no longer affect git global hooks
> ---
>
> Key: FLINK-33939
> URL: https://issues.apache.org/jira/browse/FLINK-33939
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Affects Versions: 1.9.0
>Reporter: Jason TANG
>Assignee: Jason TANG
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0, 1.17.3, 1.18.2
>
>
> Since runtime-web relies on husky to ensure that front-end code changes are 
> detected before `git commit`, husky modifies the global git hooks 
> (core.hooksPath) so that core.hooksPath won't take effect if it's configured 
> globally, I thought it would be a good idea to make the front-end code 
> detection a optional command execution, which ensures that the globally 
> configured hooks are executed correctly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33939) Make husky in runtime-web no longer affect git global hooks

2024-01-03 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-33939.

Resolution: Fixed

- master (1.19): 8e8e2d649e98071bc15ee768cf65da3e96f255b4
- release-1.18: 8342ac7f1e9b52438a3f26bf96b2102c7145dcab
- release-1.17: 32af5b368569f6bbd7bff45a7be5ea17d8e59c65

> Make husky in runtime-web no longer affect git global hooks
> ---
>
> Key: FLINK-33939
> URL: https://issues.apache.org/jira/browse/FLINK-33939
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Reporter: Jason TANG
>Assignee: Jason TANG
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0, 1.17.3, 1.18.2
>
>
> Since runtime-web relies on husky to ensure that front-end code changes are 
> detected before `git commit`, husky modifies the global git hooks 
> (core.hooksPath), so a globally configured core.hooksPath won't take effect. I 
> think it would be a good idea to make the front-end code detection an optional 
> command execution, which ensures that the globally configured hooks are 
> executed correctly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33939) Make husky in runtime-web no longer affect git global hooks

2024-01-03 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song updated FLINK-33939:
-
Issue Type: Bug  (was: Improvement)

> Make husky in runtime-web no longer affect git global hooks
> ---
>
> Key: FLINK-33939
> URL: https://issues.apache.org/jira/browse/FLINK-33939
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Reporter: Jason TANG
>Assignee: Jason TANG
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0, 1.17.3, 1.18.2
>
>
> Since runtime-web relies on husky to ensure that front-end code changes are 
> detected before `git commit`, husky modifies the global git hooks 
> (core.hooksPath), so a globally configured core.hooksPath won't take effect. I 
> think it would be a good idea to make the front-end code detection an optional 
> command execution, which ensures that the globally configured hooks are 
> executed correctly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33939) Make husky in runtime-web no longer affect git global hooks

2024-01-03 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song updated FLINK-33939:
-
Fix Version/s: 1.19.0
   1.17.3
   1.18.2

> Make husky in runtime-web no longer affect git global hooks
> ---
>
> Key: FLINK-33939
> URL: https://issues.apache.org/jira/browse/FLINK-33939
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Reporter: Jason TANG
>Assignee: Jason TANG
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0, 1.17.3, 1.18.2
>
>
> Since runtime-web relies on husky to ensure that front-end code changes are 
> detected before `git commit`, husky modifies the global git hooks 
> (core.hooksPath), so a globally configured core.hooksPath won't take effect. I 
> think it would be a good idea to make the front-end code detection an optional 
> command execution, which ensures that the globally configured hooks are 
> executed correctly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33939) Make husky in runtime-web no longer affect git global hooks

2024-01-03 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song updated FLINK-33939:
-
Priority: Major  (was: Minor)

> Make husky in runtime-web no longer affect git global hooks
> ---
>
> Key: FLINK-33939
> URL: https://issues.apache.org/jira/browse/FLINK-33939
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Reporter: Jason TANG
>Assignee: Jason TANG
>Priority: Major
>  Labels: pull-request-available
>
> Since runtime-web relies on husky to ensure that front-end code changes are 
> detected before `git commit`, husky modifies the global git hooks 
> (core.hooksPath), so a globally configured core.hooksPath won't take effect. I 
> think it would be a good idea to make the front-end code detection an optional 
> command execution, which ensures that the globally configured hooks are 
> executed correctly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33962) Chaining-agnostic OperatorID generation for improved state compatibility on parallelism change

2024-01-02 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801976#comment-17801976
 ] 

Xintong Song commented on FLINK-33962:
--

Thanks for reaching out, [~Zhanghao Chen].

Just some quick responses, I would need to look a bit more into the related 
components before giving further comments.

Based on your description, in general I think it makes sense to make operator 
id generation independent from chaining. However, as you have already 
mentioned, this is a breaking change that may result in state incompatibility. 
Therefore, I think it deserves a FLIP discussion and an official vote.
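
For illustration only, a conceptual sketch of what a chaining-agnostic hash 
could look like (this is not the actual {{StreamGraphHasherV2}} code, just the 
rough shape of the idea described below):

{code:java}
// Conceptual sketch: derive the operator hash only from the node's position in
// the deterministic traversal and the hashes of its inputs, deliberately
// ignoring which downstream operators happen to be chained.
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.List;

public final class ChainingAgnosticHasherSketch {

    public static byte[] hashNode(int traverseId, List<byte[]> inputHashes) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        // node-local property: the deterministic traversal id
        digest.update(Integer.toString(traverseId).getBytes(StandardCharsets.UTF_8));
        // input hashes still contribute, so the topology remains part of the id
        for (byte[] inputHash : inputHashes) {
            digest.update(inputHash);
        }
        // no contribution from chained outputs, so chaining changes don't change the id
        return digest.digest();
    }
}
{code}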

> Chaining-agnostic OperatorID generation for improved state compatibility on 
> parallelism change
> --
>
> Key: FLINK-33962
> URL: https://issues.apache.org/jira/browse/FLINK-33962
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Core
>Reporter: Zhanghao Chen
>Priority: Major
>
> *Background*
> Flink restores operator state from snapshots based on matching the 
> operatorIDs. Since Flink 1.2, {{StreamGraphHasherV2}} is used for operatorID 
> generation when no user-set uid exists. The generated OperatorID is 
> deterministic with respect to:
>  * node-local properties (the traverse ID in the BFS for the stream graph)
>  * chained output nodes
>  * input nodes hashes
> *Problem*
> The chaining behavior will affect state compatibility, as the generation of 
> the OperatorID of an Op is dependent on its chained output nodes. For 
> example, a simple source->sink DAG with source and sink chained together is 
> state imcompatible with an otherwise identical DAG with source and sink 
> unchained (either because the parallelisms of the two ops are changed to be 
> unequal or chaining is disabled). This greatly limits the flexibility to 
> perform chain-breaking/joining for performance tuning.
> *Proposal*
> Introduce {{StreamGraphHasherV3}} that is agnostic to the chaining behavior 
> of operators, which effectively just removes L227-235 of 
> [flink/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/graph/StreamGraphHasherV2.java
>  at master · apache/flink 
> (github.com)|https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/graph/StreamGraphHasherV2.java].
>  
> This will not hurt the determinism of the ID generation across job 
> submission as long as the stream graph topology doesn't change, and since new 
> versions of Flink have already adopted pure operator-level state recovery, 
> this will not break state recovery across job submission as long as both 
> submissions use the same hasher.
> This will, however, break cross-version state compatibility. So we can 
> introduce a new option to enable using HasherV3 in v1.19 and consider making 
> it the default hasher in v2.0.
> Looking forward to suggestions on this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33938) Correct implicit coercions in relational operators to adopt typescript 5.0

2023-12-26 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reassigned FLINK-33938:


Assignee: Ao Yuchen

> Correct implicit coercions in relational operators to adopt typescript 5.0
> --
>
> Key: FLINK-33938
> URL: https://issues.apache.org/jira/browse/FLINK-33938
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Affects Versions: 1.19.0
>Reporter: Ao Yuchen
>Assignee: Ao Yuchen
>Priority: Major
> Fix For: 1.19.0, 1.17.3, 1.18.2
>
>
> Since TypeScript 5.0, there is a breaking change: implicit coercions in 
> relational operators are forbidden [1].
> As a result, the following code in 
> flink-runtime-web/web-dashboard/src/app/components/humanize-date.pipe.ts gets an 
> error:
> {code:java}
> public transform(
>   value: number | string | Date,
>   ...
> ): string | null | undefined {
>   if (value == null || value === '' || value !== value || value < 0) {
> return '-';
>   } 
>   ...
> }{code}
> The correctness improvement is available here [2].
> I think we should optimize this type of code for better compatibility.
>  
> [1] 
> [https://devblogs.microsoft.com/typescript/announcing-typescript-5-0/#forbidden-implicit-coercions-in-relational-operators]
> [2] 
> [https://github.com/microsoft/TypeScript/pull/52048|https://github.com/microsoft/TypeScript/pull/52048.]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33939) Make husky in runtime-web no longer affect git global hooks

2023-12-25 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800423#comment-17800423
 ] 

Xintong Song commented on FLINK-33939:
--

Sounds good to me. Thanks for reporting and volunteering on this, 
[~simplejason]. You are assigned. Please go ahead.

> Make husky in runtime-web no longer affect git global hooks
> ---
>
> Key: FLINK-33939
> URL: https://issues.apache.org/jira/browse/FLINK-33939
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Reporter: Jason TANG
>Assignee: Jason TANG
>Priority: Minor
>
> Since runtime-web relies on husky to ensure that front-end code changes are 
> detected before `git commit`, husky modifies the global git hooks 
> (core.hooksPath), so a globally configured core.hooksPath won't take effect. I 
> think it would be a good idea to make the front-end code detection an optional 
> command execution, which ensures that the globally configured hooks are 
> executed correctly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33939) Make husky in runtime-web no longer affect git global hooks

2023-12-25 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reassigned FLINK-33939:


Assignee: Jason TANG

> Make husky in runtime-web no longer affect git global hooks
> ---
>
> Key: FLINK-33939
> URL: https://issues.apache.org/jira/browse/FLINK-33939
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Reporter: Jason TANG
>Assignee: Jason TANG
>Priority: Minor
>
> Since runtime-web relies on husky to ensure that front-end code changes are 
> detected before `git commit`, husky modifies the global git hooks 
> (core.hooksPath), so a globally configured core.hooksPath won't take effect. I 
> think it would be a good idea to make the front-end code detection an optional 
> command execution, which ensures that the globally configured hooks are 
> executed correctly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33810) Propagate RecordAttributes that contains isProcessingBacklog status

2023-12-19 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-33810.

Fix Version/s: 1.19.0
   Resolution: Done

master (1.19): f6bbf1cf364a3b4d04e6b6ddc522bad6431b43c4

> Propagate RecordAttributes that contains isProcessingBacklog status
> ---
>
> Key: FLINK-33810
> URL: https://issues.apache.org/jira/browse/FLINK-33810
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Task
>Reporter: Xuannan Su
>Assignee: Xuannan Su
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33810) Propagate RecordAttributes that contains isProcessingBacklog status

2023-12-19 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reassigned FLINK-33810:


Assignee: Xuannan Su

> Propagate RecordAttributes that contains isProcessingBacklog status
> ---
>
> Key: FLINK-33810
> URL: https://issues.apache.org/jira/browse/FLINK-33810
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Task
>Reporter: Xuannan Su
>Assignee: Xuannan Su
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31335) using sql-gateway to submit job to yarn cluster, sql-gateway don't support kerberos

2023-11-28 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song updated FLINK-31335:
-
Fix Version/s: 1.17.3
   (was: 1.17.2)

> using sql-gateway to submit job to yarn cluster, sql-gateway don't support 
> kerberos
> ---
>
> Key: FLINK-31335
> URL: https://issues.apache.org/jira/browse/FLINK-31335
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Gateway
>Affects Versions: 1.16.0
>Reporter: felixzh
>Assignee: felixzh
>Priority: Major
>  Labels: pull-request-available, stale-assigned
> Fix For: 1.17.3
>
>
> When submitting a job to a YARN cluster, sql-gateway doesn't support Kerberos.
> 1. yarn-per-job mode
> -Dexecution.target=yarn-per-job
> 2. yarn-session mode
> -Dexecution.target=yarn-session -Dyarn.application.id=yarnSessionAppID(eg: 
> application_1677479737242_0052)
> sql-gateway needs to use the SecurityUtils modules.
> By default it should use flink-conf.yaml (security.kerberos.login.principal and 
> security.kerberos.login.keytab), and it should also support 
> -Dsecurity.kerberos.login.keytab and -Dsecurity.kerberos.login.principal (e.g. 
> for modes 1/2 above)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33041) Add an introduction about how to migrate DataSet API to DataStream

2023-11-02 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-33041.

Fix Version/s: 1.18.1
 Assignee: Wencong Liu
   Resolution: Done

- master (1.19): 9fcea6e61a99c673205baf21a1159647099fdf67
- release-1.18: c0866243ce7283e26544472368b860991463a9f8

> Add an introduction about how to migrate DataSet API to DataStream
> --
>
> Key: FLINK-33041
> URL: https://issues.apache.org/jira/browse/FLINK-33041
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.18.0
>Reporter: Wencong Liu
>Assignee: Wencong Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0, 1.18.1
>
>
> The DataSet API has been formally deprecated and will no longer receive 
> active maintenance and support. It will be removed in the Flink 2.0 version. 
> Flink users are recommended to migrate from the DataSet API to the DataStream 
> API, Table API and SQL for their data processing requirements.
> Most of the DataSet operators can be implemented using the DataStream API. 
> However, we believe it would be beneficial to have an introductory article on 
> the Flink website that guides users in migrating their DataSet jobs to 
> DataStream.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33204) Add description for missing options in the all jobmanager/taskmanager options section in document

2023-10-09 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-33204.

Resolution: Fixed

master (1.19): 011b6b44074bad6b5f6db416f77a15c83a47ccc2

> Add description for missing options in the all jobmanager/taskmanager options 
> section in document
> -
>
> Key: FLINK-33204
> URL: https://issues.apache.org/jira/browse/FLINK-33204
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / Configuration
>Affects Versions: 1.17.0, 1.18.0
>Reporter: Zhanghao Chen
>Assignee: Zhanghao Chen
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> There are 4 options which are excluded from the all jobmanager/taskmanager 
> options section in the configuration document:
>  # taskmanager.bind-host
>  # taskmanager.rpc.bind-port
>  # jobmanager.bind-host
>  # jobmanager.rpc.bind-port
> We should add them to the document under the all  jobmanager/taskmanager 
> options section for completeness.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33204) Add description for missing options in the all jobmanager/taskmanager options section in document

2023-10-09 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reassigned FLINK-33204:


Assignee: Zhanghao Chen

> Add description for missing options in the all jobmanager/taskmanager options 
> section in document
> -
>
> Key: FLINK-33204
> URL: https://issues.apache.org/jira/browse/FLINK-33204
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / Configuration
>Affects Versions: 1.17.0, 1.18.0
>Reporter: Zhanghao Chen
>Assignee: Zhanghao Chen
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> There are 4 options which are excluded from the all jobmanager/taskmanager 
> options section in the configuration document:
>  # taskmanager.bind-host
>  # taskmanager.rpc.bind-port
>  # jobmanager.bind-host
>  # jobmanager.rpc.bind-port
> We should add them to the document under the all  jobmanager/taskmanager 
> options section for completeness.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-32978) Deprecate RichFunction#open(Configuration parameters)

2023-09-11 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-32978.

Resolution: Done

> Deprecate RichFunction#open(Configuration parameters)
> -
>
> Key: FLINK-32978
> URL: https://issues.apache.org/jira/browse/FLINK-32978
> Project: Flink
>  Issue Type: Technical Debt
>  Components: API / Core
>Affects Versions: 1.19.0
>Reporter: Wencong Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> The 
> [FLIP-344|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263425231]
>  has decided that the parameter in RichFunction#open will be removed in the 
> next major version. We should deprecate it now and remove it in Flink 2.0. 
> The removal will be tracked in 
> [FLINK-6912|https://issues.apache.org/jira/browse/FLINK-6912].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32978) Deprecate RichFunction#open(Configuration parameters)

2023-09-11 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763999#comment-17763999
 ] 

Xintong Song commented on FLINK-32978:
--

master (1.19): e9353319ad625baa5b2c20fa709ab5b23f83c0f4

> Deprecate RichFunction#open(Configuration parameters)
> -
>
> Key: FLINK-32978
> URL: https://issues.apache.org/jira/browse/FLINK-32978
> Project: Flink
>  Issue Type: Technical Debt
>  Components: API / Core
>Affects Versions: 1.19.0
>Reporter: Wencong Liu
>Assignee: Wencong Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> The 
> [FLIP-344|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263425231]
>  has decided that the parameter in RichFunction#open will be removed in the 
> next major version. We should deprecate it now and remove it in Flink 2.0. 
> The removal will be tracked in 
> [FLINK-6912|https://issues.apache.org/jira/browse/FLINK-6912].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-32978) Deprecate RichFunction#open(Configuration parameters)

2023-09-11 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reassigned FLINK-32978:


Assignee: Wencong Liu

> Deprecate RichFunction#open(Configuration parameters)
> -
>
> Key: FLINK-32978
> URL: https://issues.apache.org/jira/browse/FLINK-32978
> Project: Flink
>  Issue Type: Technical Debt
>  Components: API / Core
>Affects Versions: 1.19.0
>Reporter: Wencong Liu
>Assignee: Wencong Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> The 
> [FLIP-344|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263425231]
>  has decided that the parameter in RichFunction#open will be removed in the 
> next major version. We should deprecate it now and remove it in Flink 2.0. 
> The removal will be tracked in 
> [FLINK-6912|https://issues.apache.org/jira/browse/FLINK-6912].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-32870) Reading multiple small buffers by reading and slicing one large buffer for tiered storage

2023-09-11 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-32870.

Fix Version/s: 1.18.0
   Resolution: Fixed

- master(1.19): ade583cf80478ac60e9b83fc0a97f36bc2b26f1c
- release-1.18: d100ab65367fe0b3d74a9901bcaaa26049c996b0

> Reading multiple small buffers by reading and slicing one large buffer for 
> tiered storage
> -
>
> Key: FLINK-32870
> URL: https://issues.apache.org/jira/browse/FLINK-32870
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Affects Versions: 1.18.0
>Reporter: Yuxin Tan
>Assignee: Yuxin Tan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> Currently, when the file reader of tiered storage loads data from the disk 
> file, it reads data at buffer granularity. Before compression, each buffer is 
> 32K by default. After compression, the size becomes smaller (possibly less than 
> 5K), which is pretty small for the network buffer and the file IO. 
> We should read multiple small buffers by reading and slicing one large buffer 
> to decrease the buffer competition and the file IO, leading to better 
> performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-32979) Deprecate WindowAssigner#getDefaultTrigger(StreamExecutionEnvironment env)

2023-09-11 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-32979.

  Assignee: Wencong Liu
Resolution: Done

master (1.19): 831cb8eff022db5543052c96716518c862c2

> Deprecate WindowAssigner#getDefaultTrigger(StreamExecutionEnvironment env)
> --
>
> Key: FLINK-32979
> URL: https://issues.apache.org/jira/browse/FLINK-32979
> Project: Flink
>  Issue Type: Technical Debt
>  Components: API / Core
>Affects Versions: 1.19.0
>Reporter: Wencong Liu
>Assignee: Wencong Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> The 
> [FLIP-343|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263425229]
>  has decided that the parameter in WindowAssigner#getDefaultTrigger() will be 
> removed in the next major version. We should deprecate it now and remove it 
> in Flink 2.0. The removal will be tracked in 
> [FLINK-4675|https://issues.apache.org/jira/browse/FLINK-4675].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-32817) Supports running jar file names with Spaces

2023-08-31 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-32817.

Fix Version/s: 1.19.0
   Resolution: Fixed

master (1.19): 1d1247d4ae6d4313f7d952c4b2d66351314c9432

> Supports running jar file names with Spaces
> ---
>
> Key: FLINK-32817
> URL: https://issues.apache.org/jira/browse/FLINK-32817
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.14.0
>Reporter: Xianxun Ye
>Assignee: Xianxun Ye
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> When submitting a flink jar to a yarn cluster, if the jar filename has spaces 
> in it, the task will not be able to successfully parse the file path in 
> `YarnLocalResourceDescriptor`, and the following exception will occur in 
> JobManager.
> The Flink jar file name is: StreamSQLExample 2.jar
> {code:java}
> bin/flink run -d -m yarn-cluster -p 1 -c 
> org.apache.flink.table.examples.java.basics.StreamSQLExample 
> StreamSQLExample\ 2.jar {code}
> {code:java}
> 2023-08-09 18:54:31,787 WARN  
> org.apache.flink.runtime.extension.resourcemanager.NeActiveResourceManager [] 
> - Failed requesting worker with resource spec WorkerResourceSpec 
> {cpuCores=1.0, taskHeapSize=220.160mb (230854450 bytes), taskOffHeapSize=0 
> bytes, networkMemSize=158.720mb (166429984 bytes), managedMemSize=952.320mb 
> (998579934 bytes), numSlots=1}, current pending count: 0
> java.util.concurrent.CompletionException: 
> org.apache.flink.util.FlinkException: Error to parse 
> YarnLocalResourceDescriptor from 
> YarnLocalResourceDescriptor{key=StreamSQLExample 2.jar, 
> path=hdfs://***/.flink/application_1586413220781_33151/StreamSQLExample 
> 2.jar, size=7937, modificationTime=1691578403748, visibility=APPLICATION, 
> type=FILE}
>     at 
> org.apache.flink.util.concurrent.FutureUtils.lambda$supplyAsync$21(FutureUtils.java:1052)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>  ~[?:1.8.0_152]
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[?:1.8.0_152]
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_152]
>     at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_152]
> Caused by: org.apache.flink.util.FlinkException: Error to parse 
> YarnLocalResourceDescriptor from 
> YarnLocalResourceDescriptor{key=StreamSQLExample 2.jar, 
> path=hdfs://sloth-jd-pub/user/sloth/.flink/application_1586413220781_33151/StreamSQLExample
>  2.jar, size=7937, modificationTime=1691578403748, visibility=APPLICATION, 
> type=FILE}
>     at 
> org.apache.flink.yarn.YarnLocalResourceDescriptor.fromString(YarnLocalResourceDescriptor.java:112)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at 
> org.apache.flink.yarn.Utils.decodeYarnLocalResourceDescriptorListFromString(Utils.java:600)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at org.apache.flink.yarn.Utils.createTaskExecutorContext(Utils.java:491) 
> ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at 
> org.apache.flink.yarn.YarnResourceManagerDriver.createTaskExecutorLaunchContext(YarnResourceManagerDriver.java:452)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at 
> org.apache.flink.yarn.YarnResourceManagerDriver.lambda$startTaskExecutorInContainerAsync$1(YarnResourceManagerDriver.java:383)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at 
> org.apache.flink.util.concurrent.FutureUtils.lambda$supplyAsync$21(FutureUtils.java:1050)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     ... 4 more{code}
> From what I understand, the HDFS cluster allows for file names with spaces, 
> as well as S3.
>  
> I think we could replace the `LOCAL_RESOURCE_DESC_FORMAT` with 
> {code:java}
> // code placeholder
> private static final Pattern LOCAL_RESOURCE_DESC_FORMAT =
> Pattern.compile(
> "YarnLocalResourceDescriptor\\{"
> + "key=([\\S\\x20]+), path=([\\S\\x20]+), 
> size=([\\d]+), modificationTime=([\\d]+), visibility=(\\S+), type=(\\S+)}"); 
> {code}
> add '\x20' to only match the spaces



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32817) Supports running jar file names with Spaces

2023-08-27 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759443#comment-17759443
 ] 

Xintong Song commented on FLINK-32817:
--

Thanks, [~yesorno]. You are assigned.

I notice you already opened a PR, but there are some CI failures. Please let me 
know when they are resolved and you are ready for a review.

> Supports running jar file names with Spaces
> ---
>
> Key: FLINK-32817
> URL: https://issues.apache.org/jira/browse/FLINK-32817
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.14.0
>Reporter: Xianxun Ye
>Assignee: Xianxun Ye
>Priority: Major
>  Labels: pull-request-available
>
> When submitting a flink jar to a yarn cluster, if the jar filename has spaces 
> in it, the task will not be able to successfully parse the file path in 
> `YarnLocalResourceDescriptor`, and the following exception will occur in 
> JobManager.
> The Flink jar file name is: StreamSQLExample 2.jar
> {code:java}
> bin/flink run -d -m yarn-cluster -p 1 -c 
> org.apache.flink.table.examples.java.basics.StreamSQLExample 
> StreamSQLExample\ 2.jar {code}
> {code:java}
> 2023-08-09 18:54:31,787 WARN  
> org.apache.flink.runtime.extension.resourcemanager.NeActiveResourceManager [] 
> - Failed requesting worker with resource spec WorkerResourceSpec 
> {cpuCores=1.0, taskHeapSize=220.160mb (230854450 bytes), taskOffHeapSize=0 
> bytes, networkMemSize=158.720mb (166429984 bytes), managedMemSize=952.320mb 
> (998579934 bytes), numSlots=1}, current pending count: 0
> java.util.concurrent.CompletionException: 
> org.apache.flink.util.FlinkException: Error to parse 
> YarnLocalResourceDescriptor from 
> YarnLocalResourceDescriptor{key=StreamSQLExample 2.jar, 
> path=hdfs://***/.flink/application_1586413220781_33151/StreamSQLExample 
> 2.jar, size=7937, modificationTime=1691578403748, visibility=APPLICATION, 
> type=FILE}
>     at 
> org.apache.flink.util.concurrent.FutureUtils.lambda$supplyAsync$21(FutureUtils.java:1052)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>  ~[?:1.8.0_152]
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[?:1.8.0_152]
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_152]
>     at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_152]
> Caused by: org.apache.flink.util.FlinkException: Error to parse 
> YarnLocalResourceDescriptor from 
> YarnLocalResourceDescriptor{key=StreamSQLExample 2.jar, 
> path=hdfs://sloth-jd-pub/user/sloth/.flink/application_1586413220781_33151/StreamSQLExample
>  2.jar, size=7937, modificationTime=1691578403748, visibility=APPLICATION, 
> type=FILE}
>     at 
> org.apache.flink.yarn.YarnLocalResourceDescriptor.fromString(YarnLocalResourceDescriptor.java:112)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at 
> org.apache.flink.yarn.Utils.decodeYarnLocalResourceDescriptorListFromString(Utils.java:600)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at org.apache.flink.yarn.Utils.createTaskExecutorContext(Utils.java:491) 
> ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at 
> org.apache.flink.yarn.YarnResourceManagerDriver.createTaskExecutorLaunchContext(YarnResourceManagerDriver.java:452)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at 
> org.apache.flink.yarn.YarnResourceManagerDriver.lambda$startTaskExecutorInContainerAsync$1(YarnResourceManagerDriver.java:383)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at 
> org.apache.flink.util.concurrent.FutureUtils.lambda$supplyAsync$21(FutureUtils.java:1050)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     ... 4 more{code}
> From what I understand, the HDFS cluster allows for file names with spaces, 
> as well as S3.
>  
> I think we could replace the `LOCAL_RESOURCE_DESC_FORMAT` with 
> {code:java}
> // code placeholder
> private static final Pattern LOCAL_RESOURCE_DESC_FORMAT =
> Pattern.compile(
> "YarnLocalResourceDescriptor\\{"
> + "key=([\\S\\x20]+), path=([\\S\\x20]+), 
> size=([\\d]+), modificationTime=([\\d]+), visibility=(\\S+), type=(\\S+)}"); 
> {code}
> add '\x20' to only match the spaces



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-32817) Supports running jar file names with Spaces

2023-08-27 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reassigned FLINK-32817:


Assignee: Xianxun Ye

> Supports running jar file names with Spaces
> ---
>
> Key: FLINK-32817
> URL: https://issues.apache.org/jira/browse/FLINK-32817
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.14.0
>Reporter: Xianxun Ye
>Assignee: Xianxun Ye
>Priority: Major
>  Labels: pull-request-available
>
> When submitting a flink jar to a yarn cluster, if the jar filename has spaces 
> in it, the task will not be able to successfully parse the file path in 
> `YarnLocalResourceDescriptor`, and the following exception will occur in 
> JobManager.
> The Flink jar file name is: StreamSQLExample 2.jar
> {code:java}
> bin/flink run -d -m yarn-cluster -p 1 -c 
> org.apache.flink.table.examples.java.basics.StreamSQLExample 
> StreamSQLExample\ 2.jar {code}
> {code:java}
> 2023-08-09 18:54:31,787 WARN  
> org.apache.flink.runtime.extension.resourcemanager.NeActiveResourceManager [] 
> - Failed requesting worker with resource spec WorkerResourceSpec 
> {cpuCores=1.0, taskHeapSize=220.160mb (230854450 bytes), taskOffHeapSize=0 
> bytes, networkMemSize=158.720mb (166429984 bytes), managedMemSize=952.320mb 
> (998579934 bytes), numSlots=1}, current pending count: 0
> java.util.concurrent.CompletionException: 
> org.apache.flink.util.FlinkException: Error to parse 
> YarnLocalResourceDescriptor from 
> YarnLocalResourceDescriptor{key=StreamSQLExample 2.jar, 
> path=hdfs://***/.flink/application_1586413220781_33151/StreamSQLExample 
> 2.jar, size=7937, modificationTime=1691578403748, visibility=APPLICATION, 
> type=FILE}
>     at 
> org.apache.flink.util.concurrent.FutureUtils.lambda$supplyAsync$21(FutureUtils.java:1052)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>  ~[?:1.8.0_152]
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[?:1.8.0_152]
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_152]
>     at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_152]
> Caused by: org.apache.flink.util.FlinkException: Error to parse 
> YarnLocalResourceDescriptor from 
> YarnLocalResourceDescriptor{key=StreamSQLExample 2.jar, 
> path=hdfs://sloth-jd-pub/user/sloth/.flink/application_1586413220781_33151/StreamSQLExample
>  2.jar, size=7937, modificationTime=1691578403748, visibility=APPLICATION, 
> type=FILE}
>     at 
> org.apache.flink.yarn.YarnLocalResourceDescriptor.fromString(YarnLocalResourceDescriptor.java:112)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at 
> org.apache.flink.yarn.Utils.decodeYarnLocalResourceDescriptorListFromString(Utils.java:600)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at org.apache.flink.yarn.Utils.createTaskExecutorContext(Utils.java:491) 
> ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at 
> org.apache.flink.yarn.YarnResourceManagerDriver.createTaskExecutorLaunchContext(YarnResourceManagerDriver.java:452)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at 
> org.apache.flink.yarn.YarnResourceManagerDriver.lambda$startTaskExecutorInContainerAsync$1(YarnResourceManagerDriver.java:383)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at 
> org.apache.flink.util.concurrent.FutureUtils.lambda$supplyAsync$21(FutureUtils.java:1050)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     ... 4 more{code}
> From what I understand, the HDFS cluster allows for file names with spaces, 
> as well as S3.
>  
> I think we could replace the `LOCAL_RESOURCE_DESC_FORMAT` with 
> {code:java}
> // code placeholder
> private static final Pattern LOCAL_RESOURCE_DESC_FORMAT =
> Pattern.compile(
> "YarnLocalResourceDescriptor\\{"
> + "key=([\\S\\x20]+), path=([\\S\\x20]+), 
> size=([\\d]+), modificationTime=([\\d]+), visibility=(\\S+), type=(\\S+)}"); 
> {code}
> add '\x20' to only match the spaces



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32817) Supports running jar file names with Spaces

2023-08-22 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757328#comment-17757328
 ] 

Xintong Song commented on FLINK-32817:
--

cc [~wangyang0918] as the original contributor

> Supports running jar file names with Spaces
> ---
>
> Key: FLINK-32817
> URL: https://issues.apache.org/jira/browse/FLINK-32817
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.14.0
>Reporter: Xianxun Ye
>Priority: Major
>
> When submitting a flink jar to a yarn cluster, if the jar filename has spaces 
> in it, the task will not be able to successfully parse the file path in 
> `YarnLocalResourceDescriptor`, and the following exception will occur in 
> JobManager.
> The Flink jar file name is: StreamSQLExample 2.jar
> {code:java}
> bin/flink run -d -m yarn-cluster -p 1 -c 
> org.apache.flink.table.examples.java.basics.StreamSQLExample 
> StreamSQLExample\ 2.jar {code}
> {code:java}
> 2023-08-09 18:54:31,787 WARN  
> org.apache.flink.runtime.extension.resourcemanager.NeActiveResourceManager [] 
> - Failed requesting worker with resource spec WorkerResourceSpec 
> {cpuCores=1.0, taskHeapSize=220.160mb (230854450 bytes), taskOffHeapSize=0 
> bytes, networkMemSize=158.720mb (166429984 bytes), managedMemSize=952.320mb 
> (998579934 bytes), numSlots=1}, current pending count: 0
> java.util.concurrent.CompletionException: 
> org.apache.flink.util.FlinkException: Error to parse 
> YarnLocalResourceDescriptor from 
> YarnLocalResourceDescriptor{key=StreamSQLExample 2.jar, 
> path=hdfs://***/.flink/application_1586413220781_33151/StreamSQLExample 
> 2.jar, size=7937, modificationTime=1691578403748, visibility=APPLICATION, 
> type=FILE}
>     at 
> org.apache.flink.util.concurrent.FutureUtils.lambda$supplyAsync$21(FutureUtils.java:1052)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>  ~[?:1.8.0_152]
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[?:1.8.0_152]
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_152]
>     at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_152]
> Caused by: org.apache.flink.util.FlinkException: Error to parse 
> YarnLocalResourceDescriptor from 
> YarnLocalResourceDescriptor{key=StreamSQLExample 2.jar, 
> path=hdfs://sloth-jd-pub/user/sloth/.flink/application_1586413220781_33151/StreamSQLExample
>  2.jar, size=7937, modificationTime=1691578403748, visibility=APPLICATION, 
> type=FILE}
>     at 
> org.apache.flink.yarn.YarnLocalResourceDescriptor.fromString(YarnLocalResourceDescriptor.java:112)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at 
> org.apache.flink.yarn.Utils.decodeYarnLocalResourceDescriptorListFromString(Utils.java:600)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at org.apache.flink.yarn.Utils.createTaskExecutorContext(Utils.java:491) 
> ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at 
> org.apache.flink.yarn.YarnResourceManagerDriver.createTaskExecutorLaunchContext(YarnResourceManagerDriver.java:452)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at 
> org.apache.flink.yarn.YarnResourceManagerDriver.lambda$startTaskExecutorInContainerAsync$1(YarnResourceManagerDriver.java:383)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at 
> org.apache.flink.util.concurrent.FutureUtils.lambda$supplyAsync$21(FutureUtils.java:1050)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     ... 4 more{code}
> From what I understand, the HDFS cluster allows for file names with spaces, 
> as well as S3.
>  
> I think we could replace the `LOCAL_RESOURCE_DESC_FORMAT` with 
> {code:java}
> // code placeholder
> private static final Pattern LOCAL_RESOURCE_DESC_FORMAT =
> Pattern.compile(
> "YarnLocalResourceDescriptor\\{"
> + "key=([\\S\\x20]+), path=([\\S\\x20]+), 
> size=([\\d]+), modificationTime=([\\d]+), visibility=(\\S+), type=(\\S+)}"); 
> {code}
> add '\x20' to only match the spaces



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32817) Supports running jar file names with Spaces

2023-08-22 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757326#comment-17757326
 ] 

Xintong Song commented on FLINK-32817:
--

Thanks for reporting and volunteering to fix this, [~yesorno].

I think this is a valid problem. Actually, I think it's more serious than not 
supporting filenames with spaces.

IIUC, what we need here is serialization and deserialization of 
`YarnLocalResourceDescriptor` so that we can pass it via environment variables. 
However, the pattern-matching based approach is problematic when there are 
user-provided strings. E.g., what if the jar name contains "key=("? I know the 
chance is very small, but it is still unsafe.

I think it might be worth switching to a proper JSON serializer. Flink uses 
Jackson for JSON serialization and deserialization. You may take a look at 
{{JacksonMapperFactory}} and the places where it is used.
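
To illustrate the direction, a minimal sketch with plain Jackson (the POJO and 
its fields are made up for the example; an actual change would serialize the 
real {{YarnLocalResourceDescriptor}} and obtain the mapper from 
{{JacksonMapperFactory}}):

{code:java}
// Sketch of JSON (de)serialization for a descriptor-like POJO.
// Field names are illustrative only.
import com.fasterxml.jackson.databind.ObjectMapper;

public class LocalResourceDescriptorJsonExample {

    public static class Descriptor {
        public String key;
        public String path;
        public long size;

        public Descriptor() {} // default constructor required by Jackson
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        Descriptor d = new Descriptor();
        d.key = "StreamSQLExample 2.jar"; // spaces or even "key=(" are no problem in JSON
        d.path = "hdfs:///tmp/StreamSQLExample 2.jar";
        d.size = 7937L;

        String encoded = mapper.writeValueAsString(d); // safe to pass via an env variable
        Descriptor decoded = mapper.readValue(encoded, Descriptor.class);
        System.out.println(decoded.key + " -> " + decoded.path);
    }
}
{code}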

WDYT?

> Supports running jar file names with Spaces
> ---
>
> Key: FLINK-32817
> URL: https://issues.apache.org/jira/browse/FLINK-32817
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.14.0
>Reporter: Xianxun Ye
>Priority: Major
>
> When submitting a flink jar to a yarn cluster, if the jar filename has spaces 
> in it, the task will not be able to successfully parse the file path in 
> `YarnLocalResourceDescriptor`, and the following exception will occur in 
> JobManager.
> The Flink jar file name is: StreamSQLExample 2.jar
> {code:java}
> bin/flink run -d -m yarn-cluster -p 1 -c 
> org.apache.flink.table.examples.java.basics.StreamSQLExample 
> StreamSQLExample\ 2.jar {code}
> {code:java}
> 2023-08-09 18:54:31,787 WARN  
> org.apache.flink.runtime.extension.resourcemanager.NeActiveResourceManager [] 
> - Failed requesting worker with resource spec WorkerResourceSpec 
> {cpuCores=1.0, taskHeapSize=220.160mb (230854450 bytes), taskOffHeapSize=0 
> bytes, networkMemSize=158.720mb (166429984 bytes), managedMemSize=952.320mb 
> (998579934 bytes), numSlots=1}, current pending count: 0
> java.util.concurrent.CompletionException: 
> org.apache.flink.util.FlinkException: Error to parse 
> YarnLocalResourceDescriptor from 
> YarnLocalResourceDescriptor{key=StreamSQLExample 2.jar, 
> path=hdfs://***/.flink/application_1586413220781_33151/StreamSQLExample 
> 2.jar, size=7937, modificationTime=1691578403748, visibility=APPLICATION, 
> type=FILE}
>     at 
> org.apache.flink.util.concurrent.FutureUtils.lambda$supplyAsync$21(FutureUtils.java:1052)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>  ~[?:1.8.0_152]
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[?:1.8.0_152]
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_152]
>     at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_152]
> Caused by: org.apache.flink.util.FlinkException: Error to parse 
> YarnLocalResourceDescriptor from 
> YarnLocalResourceDescriptor{key=StreamSQLExample 2.jar, 
> path=hdfs://sloth-jd-pub/user/sloth/.flink/application_1586413220781_33151/StreamSQLExample
>  2.jar, size=7937, modificationTime=1691578403748, visibility=APPLICATION, 
> type=FILE}
>     at 
> org.apache.flink.yarn.YarnLocalResourceDescriptor.fromString(YarnLocalResourceDescriptor.java:112)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at 
> org.apache.flink.yarn.Utils.decodeYarnLocalResourceDescriptorListFromString(Utils.java:600)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at org.apache.flink.yarn.Utils.createTaskExecutorContext(Utils.java:491) 
> ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at 
> org.apache.flink.yarn.YarnResourceManagerDriver.createTaskExecutorLaunchContext(YarnResourceManagerDriver.java:452)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at 
> org.apache.flink.yarn.YarnResourceManagerDriver.lambda$startTaskExecutorInContainerAsync$1(YarnResourceManagerDriver.java:383)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     at 
> org.apache.flink.util.concurrent.FutureUtils.lambda$supplyAsync$21(FutureUtils.java:1050)
>  ~[flink-dist_2.12-1.14.0.jar:1.14.0]
>     ... 4 more{code}
> From what I understand, the HDFS cluster allows for file names with spaces, 
> as well as S3.
>  
> I think we could replace the `LOCAL_RESOURCE_DESC_FORMAT` with 
> {code:java}
> // code placeholder
> private static final Pattern LOCAL_RESOURCE_DESC_FORMAT =
> Pattern.compile(
> "YarnLocalResourceDescriptor\\{"
> + "key=([\\S\\x20]+), path=([\\S\\x20]+), 
> size=([\\d]+), modificationTime=([\\d]+), visibility=(\\S+), type=(\\S+)}"); 
> {code}
> add '\x20' to only match the spaces



--
This message was sent by Atlassian 

[jira] [Closed] (FLINK-32820) ParameterTool is mistakenly marked as deprecated

2023-08-12 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-32820.

Fix Version/s: 1.18.0
   Resolution: Fixed

master (1.18): fc2b5d8f53a41695117f6eaf4c798cc183cf1e36

> ParameterTool is mistakenly marked as deprecated
> 
>
> Key: FLINK-32820
> URL: https://issues.apache.org/jira/browse/FLINK-32820
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataSet, API / DataStream
>Affects Versions: 1.18.0
>Reporter: Zhanghao Chen
>Assignee: Zhanghao Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> ParameterTool and AbstractParameterTool in package flink-java are mistakenly 
> marked as deprecated in [FLINK-32558] Properly deprecate DataSet API - ASF 
> JIRA (apache.org). They are widely used for handling application parameters 
> and are also listed in the Flink user doc: 
> [https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/application_parameters/.|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/application_parameters/]
>  Also, they are not directly related to Dataset API.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-32821) Streaming examples failed to execute due to error in packaging

2023-08-09 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reassigned FLINK-32821:


Assignee: Zhanghao Chen

> Streaming examples failed to execute due to error in packaging
> --
>
> Key: FLINK-32821
> URL: https://issues.apache.org/jira/browse/FLINK-32821
> Project: Flink
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 1.18.0
>Reporter: Zhanghao Chen
>Assignee: Zhanghao Chen
>Priority: Major
>
> 5 out of the 7 streaming examples failed to run:
>  * Iteration & SessionWindowing & SocketWindowWordCount & WindowJoin failed 
> to run due to java.lang.NoClassDefFoundError: 
> org/apache/flink/streaming/examples/utils/ParameterTool
>  * TopSpeedWindowing failed to run due to: Caused by: 
> java.lang.ClassNotFoundException: 
> org.apache.flink.connector.datagen.source.GeneratorFunction
> The NoClassDefFoundError with ParameterTool is introduced by [FLINK-32558] 
> Properly deprecate DataSet API - ASF JIRA (apache.org), and we'd better 
> resolve [FLINK-32820] ParameterTool is mistakenly marked as deprecated - ASF 
> JIRA (apache.org) first, before we come up with a fix for this problem.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32820) ParameterTool is mistakenly marked as deprecated

2023-08-09 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752567#comment-17752567
 ] 

Xintong Song commented on FLINK-32820:
--

And thanks for volunteering. You are assigned. Please go ahead.

> ParameterTool is mistakenly marked as deprecated
> 
>
> Key: FLINK-32820
> URL: https://issues.apache.org/jira/browse/FLINK-32820
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataSet, API / DataStream
>Affects Versions: 1.18.0
>Reporter: Zhanghao Chen
>Assignee: Zhanghao Chen
>Priority: Major
>
> ParameterTool and AbstractParameterTool in package flink-java are mistakenly 
> marked as deprecated in [FLINK-32558] Properly deprecate DataSet API - ASF 
> JIRA (apache.org). They are widely used for handling application parameters 
> and are also listed in the Flink user doc: 
> [https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/application_parameters/.|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/application_parameters/]
>  Also, they are not directly related to Dataset API.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-32820) ParameterTool is mistakenly marked as deprecated

2023-08-09 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reassigned FLINK-32820:


Assignee: Zhanghao Chen

> ParameterTool is mistakenly marked as deprecated
> 
>
> Key: FLINK-32820
> URL: https://issues.apache.org/jira/browse/FLINK-32820
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataSet, API / DataStream
>Affects Versions: 1.18.0
>Reporter: Zhanghao Chen
>Assignee: Zhanghao Chen
>Priority: Major
>
> ParameterTool and AbstractParameterTool in package flink-java are mistakenly 
> marked as deprecated in [FLINK-32558] Properly deprecate DataSet API - ASF 
> JIRA (apache.org). They are widely used for handling application parameters 
> and are also listed in the Flink user doc: 
> [https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/application_parameters/.|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/application_parameters/]
>  Also, they are not directly related to Dataset API.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32820) ParameterTool is mistakenly marked as deprecated

2023-08-09 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752566#comment-17752566
 ] 

Xintong Song commented on FLINK-32820:
--

[~Zhanghao Chen], thanks for reporting this. I wasn't aware that ParameterTool 
is mentioned in documentation.

ParameterTool is currently in the flink-java module, which is likely to be 
removed together with the DataSet API in the future. That's why we tried to 
deprecate it. And because many DataStream examples and end-to-end tests still 
depend on it, we introduced the other ParameterTool in flink-examples-streaming, 
in a different package to avoid conflicts.

If it should not be removed, it needs to be moved to another module (maybe 
flink-streaming-java / flink-core), probably in the same package so users need 
not re-import it as long as the module is included as a dependency. However, 
moving it to another module may introduce breaking changes to DataSet users, 
thus it should only happen in a major version bump.

So I guess for now reverting the ParameterTool related changes from FLINK-32558 
should be good enough.

> ParameterTool is mistakenly marked as deprecated
> 
>
> Key: FLINK-32820
> URL: https://issues.apache.org/jira/browse/FLINK-32820
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataSet, API / DataStream
>Affects Versions: 1.18.0
>Reporter: Zhanghao Chen
>Priority: Major
>
> ParameterTool and AbstractParameterTool in package flink-java are mistakenly 
> marked as deprecated in [FLINK-32558] Properly deprecate DataSet API - ASF 
> JIRA (apache.org). They are widely used for handling application parameters 
> and are also listed in the Flink user doc: 
> [https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/application_parameters/.|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/application_parameters/]
>  Also, they are not directly related to Dataset API.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-32741) Remove DataSet related descriptions in doc

2023-08-07 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-32741.

Resolution: Done

master (1.18): dcdca1e460369d43391a7632717a85ea59d556b7

> Remove DataSet related descriptions in doc
> --
>
> Key: FLINK-32741
> URL: https://issues.apache.org/jira/browse/FLINK-32741
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Documentation
>Affects Versions: 2.0.0
>Reporter: Wencong Liu
>Assignee: Junyao Huang
>Priority: Major
>  Labels: 2.0-related, pull-request-available
> Fix For: 1.18.0
>
>
> Since all DataSet APIs will be deprecated in [FLINK-32558] and we don't 
> recommend that developers use the DataSet API, the DataSet-related 
> descriptions should be removed from the doc after [FLINK-32558].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-32557) API deprecations in Flink 1.18

2023-08-04 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-32557.

Resolution: Done

> API deprecations in Flink 1.18
> --
>
> Key: FLINK-32557
> URL: https://issues.apache.org/jira/browse/FLINK-32557
> Project: Flink
>  Issue Type: Technical Debt
>Reporter: Xintong Song
>Priority: Major
> Fix For: 1.18.0
>
>
> As discussed in [1], we are deprecating multiple APIs in release 1.18, in 
> order to completely remove them in release 2.0.
> The listed APIs arguably should have been deprecated already, i.e., they 
> already have replacements (or won't need any), but somehow are not yet.
> [1] https://lists.apache.org/thread/3dw4f8frlg8hzlv324ql7n2755bzs9hy



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-32558) Properly deprecate DataSet API

2023-08-04 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-32558.

Release Note: DataSet API is formally deprecated, and will be removed in 
the next major release.
  Resolution: Done

master (1.18): aa98c18d2ba975479fcfa4930b0139fa575d303e

> Properly deprecate DataSet API
> --
>
> Key: FLINK-32558
> URL: https://issues.apache.org/jira/browse/FLINK-32558
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / DataSet
>Reporter: Xintong Song
>Assignee: Wencong Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> DataSet API is described as "legacy", "soft deprecated" in user documentation 
> [1]. The required tasks for formally deprecating / removing it, according to 
> FLIP-131 [2], are all completed.
> This task includes marking all related API classes as `@Deprecated` and updating 
> the user documentation.
> [1] 
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/dataset/overview/
> [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-32656) Deprecate ManagedTable related APIs

2023-08-04 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-32656.

Release Note: ManagedTable related APIs are deprecated and will be removed 
in a future major release.
  Resolution: Done

master (1.18): 34729c8891448b8f0a96dbbc12603b44a6e130c5

> Deprecate ManagedTable related APIs
> ---
>
> Key: FLINK-32656
> URL: https://issues.apache.org/jira/browse/FLINK-32656
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Affects Versions: 1.18.0
>Reporter: Jane Chan
>Assignee: Jane Chan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> Please refer to [FLIP-346: Deprecate ManagedTable related 
> APIs|https://cwiki.apache.org/confluence/display/FLINK/FLIP-346%3A+Deprecate+ManagedTable+related+APIs]
>  for more details.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32741) Remove DataSet related descriptions in doc

2023-08-03 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17750921#comment-17750921
 ] 

Xintong Song commented on FLINK-32741:
--

Thanks for volunteering, [~pegasas]. You are assigned. Please go ahead.

FYI, the DataSet documentation should be removed before we ship the 1.18 
release, which means we only have ~2-3 weeks to work on this (including PR review).

> Remove DataSet related descriptions in doc
> --
>
> Key: FLINK-32741
> URL: https://issues.apache.org/jira/browse/FLINK-32741
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Documentation
>Affects Versions: 2.0.0
>Reporter: Wencong Liu
>Assignee: Junyao Huang
>Priority: Major
>  Labels: 2.0-related
> Fix For: 1.18.0
>
>
> Since all DataSet APIs will be deprecated in [FLINK-32558] and we don't 
> recommend that developers use the DataSet API, the descriptions of DataSet should 
> be removed from the doc after [FLINK-32558].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-32741) Remove DataSet related descriptions in doc

2023-08-03 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reassigned FLINK-32741:


Assignee: Junyao Huang

> Remove DataSet related descriptions in doc
> --
>
> Key: FLINK-32741
> URL: https://issues.apache.org/jira/browse/FLINK-32741
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Documentation
>Affects Versions: 2.0.0
>Reporter: Wencong Liu
>Assignee: Junyao Huang
>Priority: Major
>  Labels: 2.0-related
> Fix For: 2.0.0
>
>
> Since all DataSet APIs will be deprecated in [FLINK-32558] and we don't 
> recommend that developers use the DataSet API, the descriptions of DataSet should 
> be removed from the doc after [FLINK-32558].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32741) Remove DataSet related descriptions in doc

2023-08-03 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song updated FLINK-32741:
-
Fix Version/s: 1.18.0
   (was: 2.0.0)

> Remove DataSet related descriptions in doc
> --
>
> Key: FLINK-32741
> URL: https://issues.apache.org/jira/browse/FLINK-32741
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Documentation
>Affects Versions: 2.0.0
>Reporter: Wencong Liu
>Assignee: Junyao Huang
>Priority: Major
>  Labels: 2.0-related
> Fix For: 1.18.0
>
>
> Since all DataSet APIs will be deprecated in [FLINK-32558] and we don't 
> recommend that developers use the DataSet API, the descriptions of DataSet should 
> be removed from the doc after [FLINK-32558].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-32656) Deprecate ManagedTable related APIs

2023-08-03 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reassigned FLINK-32656:


Assignee: Jane Chan

> Deprecate ManagedTable related APIs
> ---
>
> Key: FLINK-32656
> URL: https://issues.apache.org/jira/browse/FLINK-32656
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Affects Versions: 1.18.0
>Reporter: Jane Chan
>Assignee: Jane Chan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> Please refer to [FLIP-346: Deprecate ManagedTable related 
> APIs|https://cwiki.apache.org/confluence/display/FLINK/FLIP-346%3A+Deprecate+ManagedTable+related+APIs]
>  for more details.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-3957) Breaking changes for Flink 2.0

2023-08-03 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song updated FLINK-3957:

Labels: 2.0-related auto-deprioritized-major  (was: 
auto-deprioritized-major)

> Breaking changes for Flink 2.0
> --
>
> Key: FLINK-3957
> URL: https://issues.apache.org/jira/browse/FLINK-3957
> Project: Flink
>  Issue Type: New Feature
>  Components: API / DataSet, API / DataStream, Build System
>Affects Versions: 1.0.0
>Reporter: Robert Metzger
>Priority: Minor
>  Labels: 2.0-related, auto-deprioritized-major
> Fix For: 2.0.0
>
>
> From time to time, we find APIs in Flink (1.x.y) marked as stable, even 
> though we would like to change them at some point.
> This JIRA is to track all planned breaking API changes.
> I would suggest adding subtasks to this one.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-32560) Properly deprecate all Scala APIs

2023-08-02 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-32560.

Resolution: Done

master (1.18): 255d83087c6a6432270e6886ffdcf85dae00c241

> Properly deprecate all Scala APIs
> -
>
> Key: FLINK-32560
> URL: https://issues.apache.org/jira/browse/FLINK-32560
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Scala
>Reporter: Xintong Song
>Assignee: Ryan Skraba
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> We agreed to drop Scala API support in FLIP-265 [1], and have tried to 
> deprecate the Scala APIs in FLINK-29740. Also, both the user documentation and the roadmap [2] 
> show that Scala API support is deprecated. However, none of the APIs in 
> `flink-streaming-scala` are annotated with `@Deprecated` atm, and only 
> `ExecutionEnvironment` and `package` are marked `@Deprecated` in 
> `flink-scala`.
> [1] 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-265+Deprecate+and+remove+Scala+API+support
> [2] https://flink.apache.org/roadmap/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-32559) Deprecate Queryable State

2023-08-02 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-32559.

Release Note: The Queryable State feature is formally deprecated. It will 
be removed in future major version bumps.
  Resolution: Done

master (1.18): 8db8119138b492e16969fd363577efa082102538

> Deprecate Queryable State
> -
>
> Key: FLINK-32559
> URL: https://issues.apache.org/jira/browse/FLINK-32559
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Queryable State
>Reporter: Xintong Song
>Assignee: Xintong Song
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> Queryable State is described as approaching end-of-life in the roadmap [1], 
> but is neither deprecated in code nor in the user documentation [2]. There are 
> also more negative opinions than positive ones in the discussion about 
> rescuing it [3].
> [1] https://flink.apache.org/roadmap/
> [2] 
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/datastream/fault-tolerance/queryable_state/
> [3] https://lists.apache.org/thread/9hmwcjb3q5c24pk3qshjvybfqk62v17m



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-32675) Add doc for the tiered storage of hybrid shuffle

2023-08-02 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-32675.

Fix Version/s: 1.18.0
   Resolution: Done

master (1.18): b18bde9bb44e94bc2fb7f84c49367c6988aa6dfe

> Add doc for the tiered storage of hybrid shuffle
> 
>
> Key: FLINK-32675
> URL: https://issues.apache.org/jira/browse/FLINK-32675
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, Runtime / Network
>Affects Versions: 1.18.0
>Reporter: Yuxin Tan
>Assignee: Yuxin Tan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> The new Hybrid Shuffle mode supporting remote storage 
> (https://issues.apache.org/jira/browse/FLINK-31634) has finished, we should 
> also update the Flink doc of Hybrid Shuffle.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32560) Properly deprecate all Scala APIs

2023-07-24 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746725#comment-17746725
 ] 

Xintong Song commented on FLINK-32560:
--

Thanks, [~rskraba]. Your proposal sounds good to me. The purpose of this PR is 
indeed to get more attention on the deprecation of the Scala APIs, rather than 
legalizing their removal in 2.0.

> Properly deprecate all Scala APIs
> -
>
> Key: FLINK-32560
> URL: https://issues.apache.org/jira/browse/FLINK-32560
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Scala
>Reporter: Xintong Song
>Assignee: Ryan Skraba
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> We agreed to drop Scala API support in FLIP-265 [1], and have tried to 
> deprecate the Scala APIs in FLINK-29740. Also, both the user documentation and the roadmap [2] 
> show that Scala API support is deprecated. However, none of the APIs in 
> `flink-streaming-scala` are annotated with `@Deprecated` atm, and only 
> `ExecutionEnvironment` and `package` are marked `@Deprecated` in 
> `flink-scala`.
> [1] 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-265+Deprecate+and+remove+Scala+API+support
> [2] https://flink.apache.org/roadmap/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32560) Properly deprecate all Scala APIs

2023-07-24 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746312#comment-17746312
 ] 

Xintong Song commented on FLINK-32560:
--

[~twalthr],

Crossposting your comment here in order to keep the discussions at one place.
{quote}TBH I'm questioning this PR. We only deprecated the entry points to the 
API because we wanted to keep the API usable while it still exists. Nobody 
wants to develop in an API where every single method is deprecated. I'm fine 
maybe with class-level deprecations if we need more of those. But spamming the 
codebase with deprecations while everyone knows that the entire API will be 
removed is not very helpful. It will also spam CI/CD logs with hundreds of 
warnings.
{quote}
I see your point, but I tend to think about this differently. Technically speaking, 
yes, by deprecating the entry points we implicitly send the message that all 
Scala APIs are deprecated. However, that requires users to carefully read the 
deprecation message, which, in my experience, many users don't. IMHO, 
trying to catch the attention of careless users is probably more important 
than not disturbing the users who insist on working with the deprecated Scala API.
{quote}I'm fine maybe with class-level deprecations if we need more of those.
{quote}
+1. There are many methods annotated because they have different API stability 
from the containing class. E.g., the class is annotated as `@Public` while some 
of its methods are annotated as `@PublicEvolving`. For such methods, I think 
it's good enough to mark deprecation at the class level.
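
To illustrate the class-level approach, here is a minimal sketch (the class and method names are hypothetical stand-ins, not actual Scala API classes): marking the containing class {{@Deprecated}} already surfaces a deprecation warning wherever the class is referenced, so methods whose stability annotation differs from the class do not need their own {{@Deprecated}}.
{code:java}
import org.apache.flink.annotation.Public;
import org.apache.flink.annotation.PublicEvolving;

/** Hypothetical Scala-API entry point, deprecated as a whole at the class level. */
@Deprecated
@Public
public class ScalaStreamEnvironment {

    /** Same stability as the class; covered by the class-level @Deprecated. */
    public void execute(String jobName) {
        // submit the job; body elided
    }

    /** Different stability than the class, but still no method-level @Deprecated needed. */
    @PublicEvolving
    public void setRuntimeMode(String mode) {
        // evolving API surface; body elided
    }
}
{code}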

> Properly deprecate all Scala APIs
> -
>
> Key: FLINK-32560
> URL: https://issues.apache.org/jira/browse/FLINK-32560
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Scala
>Reporter: Xintong Song
>Assignee: Ryan Skraba
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> We agreed to drop Scala API support in FLIP-265 [1], and have tried to 
> deprecate the Scala APIs in FLINK-29740. Also, both the user documentation and the roadmap [2] 
> show that Scala API support is deprecated. However, none of the APIs in 
> `flink-streaming-scala` are annotated with `@Deprecated` atm, and only 
> `ExecutionEnvironment` and `package` are marked `@Deprecated` in 
> `flink-scala`.
> [1] 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-265+Deprecate+and+remove+Scala+API+support
> [2] https://flink.apache.org/roadmap/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-32559) Deprecate Queryable State

2023-07-20 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reassigned FLINK-32559:


Assignee: Xintong Song

> Deprecate Queryable State
> -
>
> Key: FLINK-32559
> URL: https://issues.apache.org/jira/browse/FLINK-32559
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Queryable State
>Reporter: Xintong Song
>Assignee: Xintong Song
>Priority: Blocker
> Fix For: 1.18.0
>
>
> Queryable State is described as approaching end-of-life in the roadmap [1], 
> but is neither deprecated in code nor in the user documentation [2]. There are 
> also more negative opinions than positive ones in the discussion about 
> rescuing it [3].
> [1] https://flink.apache.org/roadmap/
> [2] 
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/datastream/fault-tolerance/queryable_state/
> [3] https://lists.apache.org/thread/9hmwcjb3q5c24pk3qshjvybfqk62v17m



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-32559) Deprecate Queryable State

2023-07-20 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reassigned FLINK-32559:


Assignee: (was: Xintong Song)

> Deprecate Queryable State
> -
>
> Key: FLINK-32559
> URL: https://issues.apache.org/jira/browse/FLINK-32559
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Queryable State
>Reporter: Xintong Song
>Priority: Blocker
> Fix For: 1.18.0
>
>
> Queryable State is described as approaching end-of-life in the roadmap [1], 
> but is neither deprecated in code nor in the user documentation [2]. There are 
> also more negative opinions than positive ones in the discussion about 
> rescuing it [3].
> [1] https://flink.apache.org/roadmap/
> [2] 
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/datastream/fault-tolerance/queryable_state/
> [3] https://lists.apache.org/thread/9hmwcjb3q5c24pk3qshjvybfqk62v17m



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-32559) Deprecate Queryable State

2023-07-19 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reassigned FLINK-32559:


Assignee: Xintong Song

> Deprecate Queryable State
> -
>
> Key: FLINK-32559
> URL: https://issues.apache.org/jira/browse/FLINK-32559
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Queryable State
>Reporter: Xintong Song
>Assignee: Xintong Song
>Priority: Blocker
> Fix For: 1.18.0
>
>
> Queryable State is described as approaching end-of-life in the roadmap [1], 
> but is neither deprecated in code nor in the user documentation [2]. There are 
> also more negative opinions than positive ones in the discussion about 
> rescuing it [3].
> [1] https://flink.apache.org/roadmap/
> [2] 
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/datastream/fault-tolerance/queryable_state/
> [3] https://lists.apache.org/thread/9hmwcjb3q5c24pk3qshjvybfqk62v17m



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-17755) Support side-output of expiring states with TTL.

2023-07-17 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song updated FLINK-17755:
-
Fix Version/s: (was: 2.0.0)
   (was: 1.18.0)

> Support side-output of expiring states with TTL.
> 
>
> Key: FLINK-17755
> URL: https://issues.apache.org/jira/browse/FLINK-17755
> Project: Flink
>  Issue Type: New Feature
>  Components: API / State Processor
>Reporter: Roey Shem Tov
>Priority: Minor
>  Labels: auto-deprioritized-major
>
> When we set a StateTTLConfig on a StateDescriptor, a record that has 
> expired is deleted from the StateBackend.
> I want to suggest a new feature: that we can get the expiring records as side 
> output, to process them rather than just delete them.
> For example, if we have a ListState that has TTL enabled, we can get the 
> expiring records in the list as a side output.
> What do you think?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-23620) Introduce proper YAML parsing to Flink's configuration

2023-07-17 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-23620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song updated FLINK-23620:
-
Fix Version/s: 2.0.0

> Introduce proper YAML parsing to Flink's configuration
> --
>
> Key: FLINK-23620
> URL: https://issues.apache.org/jira/browse/FLINK-23620
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Configuration
>Reporter: Mika Naylor
>Assignee: Junrui Li
>Priority: Minor
>  Labels: pull-request-available, stale-assigned
> Fix For: 2.0.0
>
>
> At the moment, the YAML parsing for Flink's configuration file 
> ({{conf/flink-conf.yaml}}) is pretty basic. It only supports basic key-value 
> pairs, such as:
> {code:java}
> a.b.c: a value
> a.b.d: another value{code}
> It also accepts some invalid YAML syntax, such as:
> {code:java}
> a: b: value{code}
>  
> Introducing proper YAML parsing to the configuration component would let 
> Flink users use features such as nested keys:
> {code:java}
> a:
>   b:
> c: a value
> d: another value{code}
> as well as make it easier to integrate configuration tools/languages that 
> compile to YAML, such as Dhall.
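
For illustration, a rough sketch (assuming SnakeYAML as the parser; the class and method names below are made up for the example) of how nested YAML keys could be flattened into the dotted keys that Flink's configuration uses today:
{code:java}
import java.util.HashMap;
import java.util.Map;
import org.yaml.snakeyaml.Yaml;

public class NestedYamlFlattener {

    // Recursively turns {a: {b: {c: v}}} into {"a.b.c" -> "v"}.
    @SuppressWarnings("unchecked")
    static void flatten(String prefix, Map<String, Object> node, Map<String, String> out) {
        for (Map.Entry<String, Object> e : node.entrySet()) {
            String key = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                flatten(key, (Map<String, Object>) e.getValue(), out);
            } else {
                out.put(key, String.valueOf(e.getValue()));
            }
        }
    }

    public static void main(String[] args) {
        String yaml = "a:\n  b:\n    c: a value\n    d: another value\n";
        Map<String, Object> root = new Yaml().load(yaml);
        Map<String, String> flat = new HashMap<>();
        flatten("", root, flat);
        // prints the flattened entries, e.g. "a.b.c: a value" and "a.b.d: another value"
        flat.forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
{code}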



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-27299) flink parsing parameter bug

2023-07-17 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-27299.

Resolution: Duplicate

Closing the ticket as it should be subsumed by FLINK-23620.

> flink parsing parameter bug
> ---
>
> Key: FLINK-27299
> URL: https://issues.apache.org/jira/browse/FLINK-27299
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Configuration
>Affects Versions: 1.11.6, 1.13.5, 1.14.2, 1.13.6, 1.14.3, 1.14.4
>Reporter: Huajie Wang
>Assignee: Huajie Wang
>Priority: Minor
>  Labels: easyfix, pull-request-available, stale-assigned
> Fix For: 2.0.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When I run a Flink job and specify a parameter that contains a "#" sign, 
> parsing fails.
> e.g: flink run com.myJob --sink.password db@123#123 
> Only the content in front of "#" is parsed. After reading the source code, it 
> turns out that parameters are truncated at "#" in the 
> loadYAMLResource method of GlobalConfiguration. This part needs to be improved.
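
For illustration, a simplified sketch (not the actual {{GlobalConfiguration.loadYAMLResource}} code) of how treating everything after '#' as a comment truncates such a value:
{code:java}
public class CommentStrippingExample {

    // Naive YAML-style line parsing: drop everything after the first '#', then split "key: value".
    static String parseValue(String line) {
        int hash = line.indexOf('#');
        String effective = hash >= 0 ? line.substring(0, hash) : line;
        String[] kv = effective.split(":", 2);
        return kv.length == 2 ? kv[1].trim() : null;
    }

    public static void main(String[] args) {
        // "db@123#123" is cut down to "db@123"
        System.out.println(parseValue("sink.password: db@123#123"));
    }
}
{code}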



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-28046) Annotate SourceFunction as deprecated

2023-07-17 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17743640#comment-17743640
 ] 

Xintong Song commented on FLINK-28046:
--

Per the opinions from this thread [1], it seems we need to revert this ticket?

WDYT? [~afedulov][~Weijie Guo][~leonard]

[1] https://lists.apache.org/thread/734zhkvs59w2o4d1rsnozr1bfqlr6rgm

> Annotate SourceFunction as deprecated
> -
>
> Key: FLINK-28046
> URL: https://issues.apache.org/jira/browse/FLINK-28046
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / DataStream
>Affects Versions: 1.15.3
>Reporter: Alexander Fedulov
>Assignee: Alexander Fedulov
>Priority: Major
>  Labels: pull-request-available, stale-assigned
> Fix For: 1.18.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32560) Properly deprecate all Scala APIs

2023-07-11 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741973#comment-17741973
 ] 

Xintong Song commented on FLINK-32560:
--

The help is very much welcomed. Thanks [~rskraba] and [~Sergey Nuyanzin].

> Properly deprecate all Scala APIs
> -
>
> Key: FLINK-32560
> URL: https://issues.apache.org/jira/browse/FLINK-32560
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Scala
>Reporter: Xintong Song
>Assignee: Ryan Skraba
>Priority: Blocker
> Fix For: 1.18.0
>
>
> We agreed to drop Scala API support in FLIP-265 [1], and have tried to 
> deprecate the Scala APIs in FLINK-29740. Also, both the user documentation and the roadmap [2] 
> show that Scala API support is deprecated. However, none of the APIs in 
> `flink-streaming-scala` are annotated with `@Deprecated` atm, and only 
> `ExecutionEnvironment` and `package` are marked `@Deprecated` in 
> `flink-scala`.
> [1] 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-265+Deprecate+and+remove+Scala+API+support
> [2] https://flink.apache.org/roadmap/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32557) API deprecations in Flink 1.18

2023-07-07 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740947#comment-17740947
 ] 

Xintong Song commented on FLINK-32557:
--

{{SourceFunction}} is already annotated as `@Deprecated` in FLINK-28045.

> API deprecations in Flink 1.18
> --
>
> Key: FLINK-32557
> URL: https://issues.apache.org/jira/browse/FLINK-32557
> Project: Flink
>  Issue Type: Technical Debt
>Reporter: Xintong Song
>Priority: Blocker
> Fix For: 1.18.0
>
>
> As discussed in [1], we are deprecating multiple APIs in release 1.18, in 
> order to completely remove them in release 2.0.
> The listed APIs possibly should have been deprecated already, i.e., already 
> (or won't) have replacements, but are somehow not yet.
> [1] https://lists.apache.org/thread/3dw4f8frlg8hzlv324ql7n2755bzs9hy



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32560) Properly deprecate all Scala APIs

2023-07-07 Thread Xintong Song (Jira)
Xintong Song created FLINK-32560:


 Summary: Properly deprecate all Scala APIs
 Key: FLINK-32560
 URL: https://issues.apache.org/jira/browse/FLINK-32560
 Project: Flink
  Issue Type: Sub-task
  Components: API / Scala
Reporter: Xintong Song
 Fix For: 1.18.0


We agreed to drop Scala API support in FLIP-265 [1], and have tried to 
deprecate the Scala APIs in FLINK-29740. Also, both the user documentation and the roadmap [2] 
show that Scala API support is deprecated. However, none of the APIs in 
`flink-streaming-scala` are annotated with `@Deprecated` atm, and only 
`ExecutionEnvironment` and `package` are marked `@Deprecated` in `flink-scala`.

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-265+Deprecate+and+remove+Scala+API+support
[2] https://flink.apache.org/roadmap/




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32559) Deprecate Queryable State

2023-07-07 Thread Xintong Song (Jira)
Xintong Song created FLINK-32559:


 Summary: Deprecate Queryable State
 Key: FLINK-32559
 URL: https://issues.apache.org/jira/browse/FLINK-32559
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Queryable State
Reporter: Xintong Song
 Fix For: 1.18.0


Queryable State is described as approaching end-of-life in the roadmap [1], but 
is neither deprecated in code nor in the user documentation [2]. There are also 
more negative opinions than positive ones in the discussion about rescuing it 
[3].

[1] https://flink.apache.org/roadmap/
[2] 
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/datastream/fault-tolerance/queryable_state/
[3] https://lists.apache.org/thread/9hmwcjb3q5c24pk3qshjvybfqk62v17m



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32558) Properly deprecate DataSet API

2023-07-07 Thread Xintong Song (Jira)
Xintong Song created FLINK-32558:


 Summary: Properly deprecate DataSet API
 Key: FLINK-32558
 URL: https://issues.apache.org/jira/browse/FLINK-32558
 Project: Flink
  Issue Type: Sub-task
  Components: API / DataSet
Reporter: Xintong Song
 Fix For: 1.18.0


DataSet API is described as "legacy", "soft deprecated" in user documentation 
[1]. The required tasks for formally deprecating / removing it, according to 
FLIP-131 [2], are all completed.

This task includes marking all related API classes as `@Deprecated` and updating 
the user documentation.

[1] 
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/dataset/overview/
[2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-32558) Properly deprecate DataSet API

2023-07-07 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reassigned FLINK-32558:


Assignee: Wencong Liu

> Properly deprecate DataSet API
> --
>
> Key: FLINK-32558
> URL: https://issues.apache.org/jira/browse/FLINK-32558
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / DataSet
>Reporter: Xintong Song
>Assignee: Wencong Liu
>Priority: Blocker
> Fix For: 1.18.0
>
>
> DataSet API is described as "legacy", "soft deprecated" in user documentation 
> [1]. The required tasks for formally deprecating / removing it, according to 
> FLIP-131 [2], are all completed.
> This task includes marking all related API classes as `@Deprecated` and updating 
> the user documentation.
> [1] 
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/dataset/overview/
> [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32557) API deprecations in Flink 1.18

2023-07-07 Thread Xintong Song (Jira)
Xintong Song created FLINK-32557:


 Summary: API deprecations in Flink 1.18
 Key: FLINK-32557
 URL: https://issues.apache.org/jira/browse/FLINK-32557
 Project: Flink
  Issue Type: Technical Debt
Reporter: Xintong Song
 Fix For: 1.18.0


As discussed in [1], we are deprecating multiple APIs in release 1.18, in order 
to completely remove them in release 2.0.

The listed APIs possibly should have been deprecated already, i.e., already (or 
won't) have replacements, but are somehow not yet.

[1] https://lists.apache.org/thread/3dw4f8frlg8hzlv324ql7n2755bzs9hy



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-29913) Shared state would be discarded by mistake when maxConcurrentCheckpoint>1

2023-05-24 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song updated FLINK-29913:
-
Fix Version/s: 1.16.3
   (was: 1.16.2)

> Shared state would be discarded by mistake when maxConcurrentCheckpoint>1
> -
>
> Key: FLINK-29913
> URL: https://issues.apache.org/jira/browse/FLINK-29913
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.15.0, 1.16.0, 1.17.0
>Reporter: Yanfei Lei
>Assignee: Feifan Wang
>Priority: Major
> Fix For: 1.16.3, 1.17.2
>
>
> When maxConcurrentCheckpoint>1, the shared state of the incremental RocksDB state 
> backend would be discarded by registering a handle with the same name. See 
> [https://github.com/apache/flink/pull/21050#discussion_r1011061072]
> cc [~roman] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-31639) Introduce tiered store memory manager

2023-05-23 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-31639.

Fix Version/s: 1.18.0
   Resolution: Done

master (1.18): c6d7747eaef166fb7577de55cb2943fa5408d54e

> Introduce tiered store memory manager
> -
>
> Key: FLINK-31639
> URL: https://issues.apache.org/jira/browse/FLINK-31639
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Affects Versions: 1.18.0
>Reporter: Yuxin Tan
>Assignee: Yuxin Tan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-31638) Downstream supports reading buffers from tiered store

2023-05-14 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-31638.

Fix Version/s: 1.18.0
   Resolution: Done

master (1.18): 7d9027dbb3ae551a86c385f985e5fe9af2cbdbac

> Downstream supports reading buffers from tiered store
> -
>
> Key: FLINK-31638
> URL: https://issues.apache.org/jira/browse/FLINK-31638
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Affects Versions: 1.18.0
>Reporter: Yuxin Tan
>Assignee: Wencong Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31635) Support writing records to the new tiered store architecture

2023-05-10 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17721577#comment-17721577
 ] 

Xintong Song commented on FLINK-31635:
--

master (1.18): 80a924309ce910715c4079c7e52e9d560318bd38

> Support writing records to the new tiered store architecture
> 
>
> Key: FLINK-31635
> URL: https://issues.apache.org/jira/browse/FLINK-31635
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Affects Versions: 1.18.0
>Reporter: Yuxin Tan
>Assignee: Yuxin Tan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-31635) Support writing records to the new tiered store architecture

2023-05-10 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-31635.

Fix Version/s: 1.18.0
   Resolution: Done

> Support writing records to the new tiered store architecture
> 
>
> Key: FLINK-31635
> URL: https://issues.apache.org/jira/browse/FLINK-31635
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Affects Versions: 1.18.0
>Reporter: Yuxin Tan
>Assignee: Yuxin Tan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-31984) Savepoint on S3 should be relocatable if entropy injection is not effective

2023-05-08 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-31984.

Resolution: Fixed

release-1.17: 9d31efe7e4ec358a64f79ca5e8ff516cabea0cce
release-1.16: 68a79a05155d4a43395ad51589d9a2cae740bea5

> Savepoint on S3 should be relocatable if entropy injection is not effective
> ---
>
> Key: FLINK-31984
> URL: https://issues.apache.org/jira/browse/FLINK-31984
> Project: Flink
>  Issue Type: Improvement
>  Components: FileSystems, Runtime / Checkpointing
>Affects Versions: 1.16.1
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.2, 1.18.0, 1.17.1
>
>
> We have a limitation that if we create savepoints with entropy injection, 
> they are not relocatable 
> (https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/savepoints/#triggering-savepoints).
> FLINK-25952 improves the check by verifying both that the FileSystem extends 
> {{EntropyInjectingFileSystem}} and that 
> {{FlinkS3FileSystem#getEntropyInjectionKey}} does not return null. We can 
> improve this further by checking that the checkpoint path indeed uses the 
> entropy injection key. Without that, the savepoint is not relocatable even if 
> the {{state.savepoints.dir}} does not contain the entropy.
> In our setting, we enable entropy injection by setting {{s3.entropy.key}} to 
>  {{\__ENTROPY_KEY\__}} and use the entropy key in the checkpoint path (e.g. 
> {{s3://mybucket/checkpoints/__ENTROPY_KEY__/myapp}}). However, in the 
> savepoint path, we don't use the entropy key (e.g. 
> {{s3://mybucket/savepoints/myapp}}) because we want the savepoint to be 
> relocatable. But the current logic still generates a non-relocatable savepoint 
> path just because the entropy injection key is non-null.
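
For reference, a minimal sketch of the setup described above (bucket name and paths are placeholders): the entropy key appears only in the checkpoint directory, while the savepoint directory is expected to stay relocatable.
{code:java}
// org.apache.flink.configuration.Configuration
Configuration conf = new Configuration();
// entropy injection key used by the S3 filesystem
conf.setString("s3.entropy.key", "__ENTROPY_KEY__");
// checkpoint path contains the entropy key, so checkpoints get entropy-injected paths
conf.setString("state.checkpoints.dir", "s3://mybucket/checkpoints/__ENTROPY_KEY__/myapp");
// savepoint path deliberately has no entropy key, so savepoints should remain relocatable
conf.setString("state.savepoints.dir", "s3://mybucket/savepoints/myapp");
{code}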



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-31448) Use FineGrainedSlotManager as the default SlotManager

2023-05-08 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-31448.

Fix Version/s: 1.18.0
   Resolution: Done

master (1.18): 3df65910025ccba93d75b3a885ef5d0b67becd17

> Use FineGrainedSlotManager as the default SlotManager
> -
>
> Key: FLINK-31448
> URL: https://issues.apache.org/jira/browse/FLINK-31448
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Weihua Hu
>Assignee: Weihua Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-31443) FineGrainedSlotManager maintain some redundant task managers

2023-05-05 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-31443.

Fix Version/s: 1.18.0
   Resolution: Done

master (1.18): 00b1d4cf88022d06ec99fe3f229e1a0100fe14ee

> FineGrainedSlotManager maintain some redundant task managers
> 
>
> Key: FLINK-31443
> URL: https://issues.apache.org/jira/browse/FLINK-31443
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Weihua Hu
>Assignee: Weihua Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> implementation of 
> [FLINK-18625|https://issues.apache.org/jira/browse/FLINK-18625] in 
> FineGrainedSlotManager.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-31974) JobManager crashes after KubernetesClientException exception with FatalExitExceptionHandler

2023-05-05 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719744#comment-17719744
 ] 

Xintong Song edited comment on FLINK-31974 at 5/5/23 8:45 AM:
--

Thanks all for the explanation and patience. It seems there's a common 
tendency towards the retry-by-default approach. 

I also consulted a few colleagues from our Kubernetes team about this. They 
also share the opinion that there might be more error types that can be 
resolved by retrying than a whitelist could possibly handle. The only concern 
they mentioned is that retrying indefinitely may make the Kubernetes API Server 
harder to recover from outages in some specific cases, which I believe can be 
addressed with backoff and guardrails as [~mbalassi] mentioned.

I'd respect the opinion of the majority, withdraw my proposal, and +1 for 
[~gyfora]'s proposal.


was (Author: xintongsong):
Thanks all for the explanation and patience. It seems there's a common 
tendency towards the retry-by-default approach. 

I also consulted a few colleagues from our Kubernetes team about this. They 
also share the opinion that there might be more error types that can be 
resolved by retrying than a whitelist could possibly handle. The only concern 
they mentioned is that retrying indefinitely may make the Kubernetes API Server 
harder to recover from outages, which I believe can be addressed with backoff 
and guardrails as [~mbalassi] mentioned.

I'd respect the opinion of the majority, withdraw my proposal, and +1 for 
[~gyfora]'s proposal.

> JobManager crashes after KubernetesClientException exception with 
> FatalExitExceptionHandler
> ---
>
> Key: FLINK-31974
> URL: https://issues.apache.org/jira/browse/FLINK-31974
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.17.0
>Reporter: Sergio Sainz
>Assignee: Weijie Guo
>Priority: Major
>
> When the resource quota limit is reached, the JobManager will throw
>  
> org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException:
>  Failure executing: POST at: 
> https://10.96.0.1/api/v1/namespaces/my-namespace/pods. Message: 
> Forbidden!Configured service account doesn't have access. Service account may 
> have been revoked. pods "my-namespace-flink-cluster-taskmanager-1-2" is 
> forbidden: exceeded quota: my-namespace-resource-quota, requested: 
> limits.cpu=3, used: limits.cpu=12100m, limited: limits.cpu=13.
>  
> In {*}1.16.1 , this is handled gracefully{*}:
> {code}
> 2023-04-28 22:07:24,631 WARN  
> org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - 
> Failed requesting worker with resource spec WorkerResourceSpec 
> \{cpuCores=1.0, taskHeapSize=25.600mb (26843542 bytes), taskOffHeapSize=0 
> bytes, networkMemSize=64.000mb (67108864 bytes), managedMemSize=230.400mb 
> (241591914 bytes), numSlots=4}, current pending count: 0
> java.util.concurrent.CompletionException: 
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: 
> POST at: https://10.96.0.1/api/v1/namespaces/my-namespace/pods. Message: 
> Forbidden!Configured service account doesn't have access. Service account may 
> have been revoked. pods "my-namespace-flink-cluster-taskmanager-1-138" is 
> forbidden: exceeded quota: my-namespace-resource-quota, requested: 
> limits.cpu=3, used: limits.cpu=12100m, limited: limits.cpu=13.
>         at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown 
> Source) ~[?:?]
>         at java.util.concurrent.CompletableFuture.completeThrowable(Unknown 
> Source) ~[?:?]
>         at java.util.concurrent.CompletableFuture$AsyncRun.run(Unknown 
> Source) ~[?:?]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
> ~[?:?]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
> ~[?:?]
>         at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure 
> executing: POST at: https://10.96.0.1/api/v1/namespaces/my-namespace/pods. 
> Message: Forbidden!Configured service account doesn't have access. Service 
> account may have been revoked. pods 
> "my-namespace-flink-cluster-taskmanager-1-138" is forbidden: exceeded quota: 
> my-namespace-resource-quota, requested: limits.cpu=3, used: 
> limits.cpu=12100m, limited: limits.cpu=13.
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:684)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:664)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> 

[jira] [Commented] (FLINK-31974) JobManager crashes after KubernetesClientException exception with FatalExitExceptionHandler

2023-05-05 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719744#comment-17719744
 ] 

Xintong Song commented on FLINK-31974:
--

Thanks all for the explanation and patience. It seems there's a common 
tendency towards the retry-by-default approach. 

I also consulted a few colleagues from our Kubernetes team about this. They 
also share the opinion that there might be more error types that can be 
resolved by retrying than a whitelist could possibly handle. The only concern 
they mentioned is that retrying indefinitely may make the Kubernetes API Server 
harder to recover from outages, which I believe can be addressed with backoff 
and guardrails as [~mbalassi] mentioned.

I'd respect the opinion of the majority, withdraw my proposal, and +1 for 
[~gyfora]'s proposal.
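
As a rough illustration of the "backoff and guardrails" idea (a sketch only; the names below are invented for the example and are not the actual Flink or operator implementation), retries could be capped and spaced out with exponential backoff:
{code:java}
import java.time.Duration;

public class BackoffRetryExample {

    // Guardrails: an upper bound on the number of attempts and on the delay between attempts.
    static final int MAX_ATTEMPTS = 10;
    static final Duration INITIAL_BACKOFF = Duration.ofSeconds(1);
    static final Duration MAX_BACKOFF = Duration.ofMinutes(1);

    /** Retries the given action with exponential backoff until it succeeds or the attempt budget is spent. */
    static void runWithBackoff(Runnable requestPod) throws InterruptedException {
        Duration backoff = INITIAL_BACKOFF;
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                requestPod.run();
                return;
            } catch (RuntimeException e) {
                if (attempt == MAX_ATTEMPTS) {
                    throw e; // budget exhausted, give up instead of retrying forever
                }
                Thread.sleep(backoff.toMillis());
                Duration doubled = backoff.multipliedBy(2);
                backoff = doubled.compareTo(MAX_BACKOFF) > 0 ? MAX_BACKOFF : doubled;
            }
        }
    }
}
{code}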

> JobManager crashes after KubernetesClientException exception with 
> FatalExitExceptionHandler
> ---
>
> Key: FLINK-31974
> URL: https://issues.apache.org/jira/browse/FLINK-31974
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.17.0
>Reporter: Sergio Sainz
>Assignee: Weijie Guo
>Priority: Major
>
> When the resource quota limit is reached, the JobManager will throw
>  
> org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException:
>  Failure executing: POST at: 
> https://10.96.0.1/api/v1/namespaces/my-namespace/pods. Message: 
> Forbidden!Configured service account doesn't have access. Service account may 
> have been revoked. pods "my-namespace-flink-cluster-taskmanager-1-2" is 
> forbidden: exceeded quota: my-namespace-resource-quota, requested: 
> limits.cpu=3, used: limits.cpu=12100m, limited: limits.cpu=13.
>  
> In {*}1.16.1 , this is handled gracefully{*}:
> {code}
> 2023-04-28 22:07:24,631 WARN  
> org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - 
> Failed requesting worker with resource spec WorkerResourceSpec 
> \{cpuCores=1.0, taskHeapSize=25.600mb (26843542 bytes), taskOffHeapSize=0 
> bytes, networkMemSize=64.000mb (67108864 bytes), managedMemSize=230.400mb 
> (241591914 bytes), numSlots=4}, current pending count: 0
> java.util.concurrent.CompletionException: 
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: 
> POST at: https://10.96.0.1/api/v1/namespaces/my-namespace/pods. Message: 
> Forbidden!Configured service account doesn't have access. Service account may 
> have been revoked. pods "my-namespace-flink-cluster-taskmanager-1-138" is 
> forbidden: exceeded quota: my-namespace-resource-quota, requested: 
> limits.cpu=3, used: limits.cpu=12100m, limited: limits.cpu=13.
>         at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown 
> Source) ~[?:?]
>         at java.util.concurrent.CompletableFuture.completeThrowable(Unknown 
> Source) ~[?:?]
>         at java.util.concurrent.CompletableFuture$AsyncRun.run(Unknown 
> Source) ~[?:?]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
> ~[?:?]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
> ~[?:?]
>         at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure 
> executing: POST at: https://10.96.0.1/api/v1/namespaces/my-namespace/pods. 
> Message: Forbidden!Configured service account doesn't have access. Service 
> account may have been revoked. pods 
> "my-namespace-flink-cluster-taskmanager-1-138" is forbidden: exceeded quota: 
> my-namespace-resource-quota, requested: limits.cpu=3, used: 
> limits.cpu=12100m, limited: limits.cpu=13.
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:684)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:664)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:613)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:558)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:521)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:308)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:644)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> 

[jira] [Commented] (FLINK-31974) JobManager crashes after KubernetesClientException exception with FatalExitExceptionHandler

2023-05-04 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719580#comment-17719580
 ] 

Xintong Song commented on FLINK-31974:
--

cc [~wangyang0918]

> JobManager crashes after KubernetesClientException exception with 
> FatalExitExceptionHandler
> ---
>
> Key: FLINK-31974
> URL: https://issues.apache.org/jira/browse/FLINK-31974
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.17.0
>Reporter: Sergio Sainz
>Assignee: Weijie Guo
>Priority: Major
>
> When the resource quota limit is reached, the JobManager will throw
>  
> org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException:
>  Failure executing: POST at: 
> https://10.96.0.1/api/v1/namespaces/my-namespace/pods. Message: 
> Forbidden!Configured service account doesn't have access. Service account may 
> have been revoked. pods "my-namespace-flink-cluster-taskmanager-1-2" is 
> forbidden: exceeded quota: my-namespace-resource-quota, requested: 
> limits.cpu=3, used: limits.cpu=12100m, limited: limits.cpu=13.
>  
> In {*}1.16.1 , this is handled gracefully{*}:
> {code}
> 2023-04-28 22:07:24,631 WARN  
> org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - 
> Failed requesting worker with resource spec WorkerResourceSpec 
> \{cpuCores=1.0, taskHeapSize=25.600mb (26843542 bytes), taskOffHeapSize=0 
> bytes, networkMemSize=64.000mb (67108864 bytes), managedMemSize=230.400mb 
> (241591914 bytes), numSlots=4}, current pending count: 0
> java.util.concurrent.CompletionException: 
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: 
> POST at: https://10.96.0.1/api/v1/namespaces/my-namespace/pods. Message: 
> Forbidden!Configured service account doesn't have access. Service account may 
> have been revoked. pods "my-namespace-flink-cluster-taskmanager-1-138" is 
> forbidden: exceeded quota: my-namespace-resource-quota, requested: 
> limits.cpu=3, used: limits.cpu=12100m, limited: limits.cpu=13.
>         at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown 
> Source) ~[?:?]
>         at java.util.concurrent.CompletableFuture.completeThrowable(Unknown 
> Source) ~[?:?]
>         at java.util.concurrent.CompletableFuture$AsyncRun.run(Unknown 
> Source) ~[?:?]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
> ~[?:?]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
> ~[?:?]
>         at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure 
> executing: POST at: https://10.96.0.1/api/v1/namespaces/my-namespace/pods. 
> Message: Forbidden!Configured service account doesn't have access. Service 
> account may have been revoked. pods 
> "my-namespace-flink-cluster-taskmanager-1-138" is forbidden: exceeded quota: 
> my-namespace-resource-quota, requested: limits.cpu=3, used: 
> limits.cpu=12100m, limited: limits.cpu=13.
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:684)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:664)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:613)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:558)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:521)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:308)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:644)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:83)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> io.fabric8.kubernetes.client.dsl.base.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:61)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         at 
> org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.lambda$createTaskManagerPod$1(Fabric8FlinkKubeClient.java:163)
>  ~[flink-dist-1.16.1.jar:1.16.1]
>         ... 4 more
> {code}
> But , {*}in Flink 1.17.0 , Job Manager crashes{*}:
> {code}
> 2023-04-28 20:50:50,534 ERROR org.apache.flink.util.FatalExitExceptionHandler 
>              [] - FATAL: Thread 

[jira] [Commented] (FLINK-31974) JobManager crashes after KubernetesClientException exception with FatalExitExceptionHandler

2023-05-04 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719331#comment-17719331
 ] 

Xintong Song commented on FLINK-31974:
--

[~gyfora],

bq. Flink treats only very few errors fatal. IO errors, connector (source/sink 
) errors etc all cause job restarts and in many cases "Flink cannot recover 
from by itself". You actually expect the error to be temporary and hopefully 
not get it after the restart. So I think it would be generally inconsistent 
with the current error handling behaviour if resource manager errors would 
simply let the job die fatally and not retry in the same way.

I think the difference here is that IO errors and connector errors 
affect the job but not the Flink cluster / deployment. Thinking of a session 
cluster, we should not fail the cluster for an error from a single job. But for 
the resource manager interacting with the Kubernetes API server, this is cluster-level 
behavior, and conceptually we don't distinguish resources for individual jobs 
until the slots are allocated. Moreover, it's possible that multiple jobs share 
the same resource (pod). One could argue that in application mode the cluster / 
deployment is equivalent to the job. However, the cluster mode (session / 
application) is transparent to the resource manager.

 bq. Flink jobs/clusters should be resilient and keep retrying in case of 
errors and should not give up especially for streaming workloads.

This is different from the feedback that I get from our production environment. But I can 
understand if that's what some users want. So I guess it may be worth a 
configuration option, as you suggested.

[~mbalassi],

+1 to what you said about the specific case. I think there's a consensus on 
reaching quota limit should not be treated as fatal errors.

> JobManager crashes after KubernetesClientException exception with 
> FatalExitExceptionHandler
> ---
>
> Key: FLINK-31974
> URL: https://issues.apache.org/jira/browse/FLINK-31974
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.17.0
>Reporter: Sergio Sainz
>Assignee: Weijie Guo
>Priority: Major
>
> When the resource quota limit is reached, the JobManager will throw
>  
> org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException:
>  Failure executing: POST at: 
> https://10.96.0.1/api/v1/namespaces/my-namespace/pods. Message: 
> Forbidden!Configured service account doesn't have access. Service account may 
> have been revoked. pods "my-namespace-flink-cluster-taskmanager-1-2" is 
> forbidden: exceeded quota: my-namespace-resource-quota, requested: 
> limits.cpu=3, used: limits.cpu=12100m, limited: limits.cpu=13.
>  
> In {*}1.16.1 , this is handled gracefully{*}:
> {code}
> 2023-04-28 22:07:24,631 WARN  
> org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - 
> Failed requesting worker with resource spec WorkerResourceSpec 
> \{cpuCores=1.0, taskHeapSize=25.600mb (26843542 bytes), taskOffHeapSize=0 
> bytes, networkMemSize=64.000mb (67108864 bytes), managedMemSize=230.400mb 
> (241591914 bytes), numSlots=4}, current pending count: 0
> java.util.concurrent.CompletionException: 
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: 
> POST at: https://10.96.0.1/api/v1/namespaces/my-namespace/pods. Message: 
> Forbidden!Configured service account doesn't have access. Service account may 
> have been revoked. pods "my-namespace-flink-cluster-taskmanager-1-138" is 
> forbidden: exceeded quota: my-namespace-resource-quota, requested: 
> limits.cpu=3, used: limits.cpu=12100m, limited: limits.cpu=13.
>         at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown 
> Source) ~[?:?]
>         at java.util.concurrent.CompletableFuture.completeThrowable(Unknown 
> Source) ~[?:?]
>         at java.util.concurrent.CompletableFuture$AsyncRun.run(Unknown 
> Source) ~[?:?]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
> ~[?:?]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
> ~[?:?]
>         at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure 
> executing: POST at: https://10.96.0.1/api/v1/namespaces/my-namespace/pods. 
> Message: Forbidden!Configured service account doesn't have access. Service 
> account may have been revoked. pods 
> "my-namespace-flink-cluster-taskmanager-1-138" is forbidden: exceeded quota: 
> my-namespace-resource-quota, requested: limits.cpu=3, used: 
> limits.cpu=12100m, limited: limits.cpu=13.
>         at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:684)
>  
