[jira] [Commented] (FLINK-33053) Watcher leak in Zookeeper HA mode

2023-09-07 Thread Zili Chen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762965#comment-17762965
 ] 

Zili Chen commented on FLINK-33053:
---

The log seems to be trimmed. I saw:

2023-09-08 11:09:03,738 DEBUG 
org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.WatcherRemovalManager
 [] - Removing watcher for path: 
/flink/flink-native-test-117/leader/7db5c7316828f598234677e2169e7b0f/connection_info

So the TM has issued the watcher removal request.

> Watcher leak in Zookeeper HA mode
> -
>
> Key: FLINK-33053
> URL: https://issues.apache.org/jira/browse/FLINK-33053
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.0, 1.17.1
>Reporter: Yangze Guo
>Priority: Critical
> Attachments: 26.dump.zip, 26.log
>
>
> We observe a watcher leak in our OLAP stress test when enabling Zookeeper HA 
> mode. The TM's watches on the JobMaster leader are not removed after the job 
> finishes.
> Here is how we reproduce this issue:
>  - Start a session cluster and enable Zookeeper HA mode.
>  - Continuously and concurrently submit short queries, e.g. WordCount, to the 
> cluster.
>  - Run echo -n wchp | nc \{zk host} \{zk port} to get the current watches.
> We can see a lot of watches on 
> /flink/\{cluster_name}/leader/\{job_id}/connection_info.
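The wchp check in the reproduction steps can be scripted. A minimal sketch, using illustrative sample wchp output rather than a live cluster (a real run would pipe `echo -n wchp | nc <zk host> <zk port>`, with `wchp` whitelisted via ZooKeeper's `4lw.commands.whitelist`):

```shell
# Illustrative wchp output; paths and session ids are made up for this sketch.
wchp_output='/flink/cluster-a/leader/0000000000000000000000000000001/connection_info
  0x100000000000001
/flink/cluster-a/leader/0000000000000000000000000000002/connection_info
  0x100000000000002'

# Count watches on connection_info znodes. A count that keeps growing after
# jobs finish is the leak symptom described in this issue.
leak_count=$(printf '%s\n' "$wchp_output" | grep -c '/connection_info$')
echo "$leak_count"
```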



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33053) Watcher leak in Zookeeper HA mode

2023-09-07 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762962#comment-17762962
 ] 

Matthias Pohl commented on FLINK-33053:
---

Thanks for sharing this. I guess the thread dump (which you can generate by 
sending {{kill -3 <pid>}} to the TaskManager process) would be more helpful 
than the heap dump. Or am I missing something? The thread dump might tell us 
more about which threads are still alive even after the JobMaster is closed.
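The suggested thread dump can be triggered from the shell. A hedged sketch: `TaskManagerRunner` is the typical jps name for a standalone TaskManager JVM, but it may differ per deployment; SIGQUIT (kill -3) makes the JVM print the dump to its stdout (e.g. the taskmanager .out file) without killing it:

```shell
# Locate the TaskManager JVM (name is an assumption; adjust for your setup)
TM_PID=$(jps 2>/dev/null | awk '/TaskManagerRunner/ {print $1}')
if [ -n "$TM_PID" ]; then
  # SIGQUIT asks the JVM to print a full thread dump; the process keeps running
  kill -3 "$TM_PID"
  status=signalled
else
  status=not_found
fi
echo "$status"
```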

> Watcher leak in Zookeeper HA mode
> -
>
> Key: FLINK-33053
> URL: https://issues.apache.org/jira/browse/FLINK-33053
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.0, 1.17.1
>Reporter: Yangze Guo
>Priority: Critical
> Attachments: 26.dump.zip, 26.log
>
>
> We observe a watcher leak in our OLAP stress test when enabling Zookeeper HA 
> mode. The TM's watches on the JobMaster leader are not removed after the job 
> finishes.
> Here is how we reproduce this issue:
>  - Start a session cluster and enable Zookeeper HA mode.
>  - Continuously and concurrently submit short queries, e.g. WordCount, to the 
> cluster.
>  - Run echo -n wchp | nc \{zk host} \{zk port} to get the current watches.
> We can see a lot of watches on 
> /flink/\{cluster_name}/leader/\{job_id}/connection_info.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-33053) Watcher leak in Zookeeper HA mode

2023-09-07 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762962#comment-17762962
 ] 

Matthias Pohl edited comment on FLINK-33053 at 9/8/23 5:51 AM:
---

Thanks for sharing this. I guess the thread dump (which you can generate by 
sending {{kill -3 <pid>}} to the TaskManager process) would be more 
helpful than the heap dump. Or am I missing something? The thread dump might 
tell us more about which threads are still alive even after the JobMaster is 
closed.


was (Author: mapohl):
Thanks for sharing this. I guess, the thread dump (which you can generate by 
sending {{kill -3 }} to the TaskManager process) would be more helpful 
than the heap dump. Or am I missing something. The thread dump might tell us 
more of what threads are still available even after the JobMaster is closed.

> Watcher leak in Zookeeper HA mode
> -
>
> Key: FLINK-33053
> URL: https://issues.apache.org/jira/browse/FLINK-33053
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.0, 1.17.1
>Reporter: Yangze Guo
>Priority: Critical
> Attachments: 26.dump.zip, 26.log
>
>
> We observe a watcher leak in our OLAP stress test when enabling Zookeeper HA 
> mode. The TM's watches on the JobMaster leader are not removed after the job 
> finishes.
> Here is how we reproduce this issue:
>  - Start a session cluster and enable Zookeeper HA mode.
>  - Continuously and concurrently submit short queries, e.g. WordCount, to the 
> cluster.
>  - Run echo -n wchp | nc \{zk host} \{zk port} to get the current watches.
> We can see a lot of watches on 
> /flink/\{cluster_name}/leader/\{job_id}/connection_info.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] JunRuiLee commented on pull request #23181: W.I.P only used to run ci.

2023-09-07 Thread via GitHub


JunRuiLee commented on PR #23181:
URL: https://github.com/apache/flink/pull/23181#issuecomment-1711096171

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (FLINK-32620) Migrate DiscardingSink to sinkv2

2023-09-07 Thread Weijie Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weijie Guo closed FLINK-32620.
--
Resolution: Fixed

master (1.19) via c4a76e2f006d834771092a8fac96bfc63361e20b.

> Migrate DiscardingSink to sinkv2
> 
>
> Key: FLINK-32620
> URL: https://issues.apache.org/jira/browse/FLINK-32620
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / Common
>Affects Versions: 1.19.0
>Reporter: Weijie Guo
>Assignee: Weijie Guo
>Priority: Major
>  Labels: pull-request-available, stale-assigned
> Fix For: 1.19.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32620) Migrate DiscardingSink to sinkv2

2023-09-07 Thread Weijie Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weijie Guo updated FLINK-32620:
---
Fix Version/s: 1.19.0

> Migrate DiscardingSink to sinkv2
> 
>
> Key: FLINK-32620
> URL: https://issues.apache.org/jira/browse/FLINK-32620
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / Common
>Affects Versions: 1.19.0
>Reporter: Weijie Guo
>Assignee: Weijie Guo
>Priority: Major
>  Labels: pull-request-available, stale-assigned
> Fix For: 1.19.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] flinkbot commented on pull request #23378: [FLINK-33055][doc]Correct the error value about 'state.backend.type' in the document

2023-09-07 Thread via GitHub


flinkbot commented on PR #23378:
URL: https://github.com/apache/flink/pull/23378#issuecomment-1711059626

   
   ## CI report:
   
   * 144ab76379d0058918c30cc5e0830f65f7f53d09 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] ljz2051 opened a new pull request, #23378: [FLINK-33055][doc]Correct the error value about 'state.backend.type' in the document

2023-09-07 Thread via GitHub


ljz2051 opened a new pull request, #23378:
URL: https://github.com/apache/flink/pull/23378

   ## What is the purpose of the change
   
   This pull request corrects the erroneous value for 'state.backend.type' in the 
document (config.md).
   
   ## Brief change log
   - Update the description about 'state.backend.type' in the document 
(config.md).
   
   ## Verifying this change
   
   This change is trivial documentation work without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency):  no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive):  no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] flinkbot commented on pull request #23377: [FLINK-33064][table-planner] Improve the error message when the lookup source is used as the scan source

2023-09-07 Thread via GitHub


flinkbot commented on PR #23377:
URL: https://github.com/apache/flink/pull/23377#issuecomment-1711053515

   
   ## CI report:
   
   * acfd24c22b8b16fd8d2f3d12b89ec3759efac90f UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Resolved] (FLINK-30217) Use ListState#update() to replace clear + add mode.

2023-09-07 Thread Hangxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hangxiang Yu resolved FLINK-30217.
--
Fix Version/s: 1.19.0
   Resolution: Fixed

merged 20c0acf60d137cd613914503f1b40e7b2adb86c1 into master

> Use ListState#update() to replace clear + add mode.
> ---
>
> Key: FLINK-30217
> URL: https://issues.apache.org/jira/browse/FLINK-30217
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Reporter: xljtswf
>Assignee: Zakelly Lan
>Priority: Major
> Fix For: 1.19.0
>
>
> When using ListState, I found that we often need to clear the current state and 
> then add new values. This is especially common in 
> CheckpointedFunction#snapshotState, and it is slower than just using 
> ListState#update().
> Suppose we want to update the ListState to contain value1, value2, value3.
> With the current implementation, we first call ListState#clear(); this updates 
> the state 1 time.
> Then we add value1, value2, value3 to the state.
> If we use the heap state backend, we need to search the stateTable 3 times and 
> add 3 values to the list.
> This happens in memory and is not too bad.
> If we use RocksDB, then we will call backend.db.merge() 3 times.
> In total, we will update the state 4 times.
> The more values to be added, the more times we will update the state.
> Whereas if we use ListState#update(), we just need to update the state 1 
> time. I think this can save a lot of time.
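The operation counting in the description can be sketched in plain Java. This is an illustrative model only, not Flink's actual state backend code: each method bumps a counter standing in for one backend write (for RocksDB, each add() models one db.merge() call):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of a list state: counts backend writes to show why update()
// touches the state once while clear() + N x add() touches it N + 1 times.
public class ListStateSketch {
    int backendOps = 0;                         // one increment = one backend write
    private final List<String> state = new ArrayList<>();

    void clear() { backendOps++; state.clear(); }
    void add(String v) { backendOps++; state.add(v); }          // models db.merge()
    void update(List<String> vs) { backendOps++; state.clear(); state.addAll(vs); }

    public static void main(String[] args) {
        ListStateSketch clearThenAdd = new ListStateSketch();
        clearThenAdd.clear();
        clearThenAdd.add("value1");
        clearThenAdd.add("value2");
        clearThenAdd.add("value3");

        ListStateSketch singleUpdate = new ListStateSketch();
        singleUpdate.update(Arrays.asList("value1", "value2", "value3"));

        // prints "4 vs 1"
        System.out.println(clearThenAdd.backendOps + " vs " + singleUpdate.backendOps);
    }
}
```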



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] JunRuiLee commented on pull request #23181: W.I.P only used to run ci.

2023-09-07 Thread via GitHub


JunRuiLee commented on PR #23181:
URL: https://github.com/apache/flink/pull/23181#issuecomment-1711052496

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-33055) Correct the error value about 'state.backend.type' in the document

2023-09-07 Thread Hangxiang Yu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762950#comment-17762950
 ] 

Hangxiang Yu commented on FLINK-33055:
--

[~lijinzhong] Thanks for volunteering, already assigned to you, please go ahead.

> Correct the error value about 'state.backend.type' in the document
> --
>
> Key: FLINK-33055
> URL: https://issues.apache.org/jira/browse/FLINK-33055
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, Runtime / State Backends
>Reporter: Hangxiang Yu
>Assignee: Jinzhong Li
>Priority: Minor
>
>  
> {code:java}
> state.backend.type: The state backend to use. This defines the data structure 
> mechanism for taking snapshots. Common values are filesystem or rocksdb{code}
> filesystem should be replaced with hashmap after FLINK-16444.
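For reference, the corrected configuration would look like the sketch below (a hedged flink-conf.yaml fragment; after FLINK-16444 the documented values are hashmap for the heap backend and rocksdb for the RocksDB backend):

```yaml
# flink-conf.yaml: choose the state backend explicitly.
state.backend.type: hashmap
# or, for the RocksDB backend:
# state.backend.type: rocksdb
```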



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33055) Correct the error value about 'state.backend.type' in the document

2023-09-07 Thread Hangxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hangxiang Yu reassigned FLINK-33055:


Assignee: Jinzhong Li

> Correct the error value about 'state.backend.type' in the document
> --
>
> Key: FLINK-33055
> URL: https://issues.apache.org/jira/browse/FLINK-33055
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, Runtime / State Backends
>Reporter: Hangxiang Yu
>Assignee: Jinzhong Li
>Priority: Minor
>
>  
> {code:java}
> state.backend.type: The state backend to use. This defines the data structure 
> mechanism for taking snapshots. Common values are filesystem or rocksdb{code}
> filesystem should be replaced with hashmap after FLINK-16444.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] masteryhx closed pull request #23344: [FLINK-30217] Replace ListState#clear along with following #add/addAll with a ListState#update

2023-09-07 Thread via GitHub


masteryhx closed pull request #23344: [FLINK-30217] Replace ListState#clear 
along with following #add/addAll with a ListState#update
URL: https://github.com/apache/flink/pull/23344


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] swuferhong opened a new pull request, #23377: [FLINK-33064][table-planner] Improve the error message when the lookup source is used as the scan source

2023-09-07 Thread via GitHub


swuferhong opened a new pull request, #23377:
URL: https://github.com/apache/flink/pull/23377

   
   
   
   
   ## What is the purpose of the change
   
   Improve the error message when the lookup source is used as the scan source. 
Currently, if we use a source that only implements `LookupTableSource` but not 
`ScanTableSource` as a scan source, the planner cannot produce a proper plan and 
reports `Cannot generate a valid execution plan for the given query`, which can 
be improved.
   
   
   ## Brief change log
   - Optimize the error message.
   - Add a test to cover it.
   
   
   ## Verifying this change
   - Added a test covering the improved error message.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature?  no
 - If yes, how is the feature documented? no docs
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-33055) Correct the error value about 'state.backend.type' in the document

2023-09-07 Thread Jinzhong Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762948#comment-17762948
 ] 

Jinzhong Li commented on FLINK-33055:
-

Hi [~masteryhx], I would like to take this trivial work.

> Correct the error value about 'state.backend.type' in the document
> --
>
> Key: FLINK-33055
> URL: https://issues.apache.org/jira/browse/FLINK-33055
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, Runtime / State Backends
>Reporter: Hangxiang Yu
>Priority: Minor
>
>  
> {code:java}
> state.backend.type: The state backend to use. This defines the data structure 
> mechanism for taking snapshots. Common values are filesystem or rocksdb{code}
> filesystem should be replaced with hashmap after FLINK-16444.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] JunRuiLee commented on pull request #23181: W.I.P only used to run ci.

2023-09-07 Thread via GitHub


JunRuiLee commented on PR #23181:
URL: https://github.com/apache/flink/pull/23181#issuecomment-1711047440

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-33064) Improve the error message when the lookup source is used as the scan source

2023-09-07 Thread Yunhong Zheng (Jira)
Yunhong Zheng created FLINK-33064:
-

 Summary: Improve the error message when the lookup source is used 
as the scan source
 Key: FLINK-33064
 URL: https://issues.apache.org/jira/browse/FLINK-33064
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Affects Versions: 1.18.0
Reporter: Yunhong Zheng
 Fix For: 1.19.0


Improve the error message when the lookup source is used as the scan source. 
Currently, if we use a source which only implements LookupTableSource but not 
ScanTableSource as a scan source, the planner cannot produce a proper plan and 
reports 'Cannot generate a valid execution plan for the given query', which can 
be improved.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-15922) Show "Warn - received late message for checkpoint" only when checkpoint actually expired

2023-09-07 Thread Hangxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hangxiang Yu resolved FLINK-15922.
--
Fix Version/s: 1.19.0
   Resolution: Fixed

merged 80a8250176fb2b1d9809fcd47df6097fd53999f6 into master

> Show "Warn - received late message for checkpoint" only when checkpoint 
> actually expired
> 
>
> Key: FLINK-15922
> URL: https://issues.apache.org/jira/browse/FLINK-15922
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.10.0
>Reporter: Stephan Ewen
>Assignee: Zakelly Lan
>Priority: Not a Priority
>  Labels: auto-deprioritized-major, beginner, beginner-friendly, 
> usability
> Fix For: 1.19.0
>
>
> The message "Warn - received late message for checkpoint" is shown frequently 
> in the logs, also when a checkpoint was purposefully canceled.
> In those cases, this message is unhelpful and misleading.
> We should log this only when the checkpoint is actually expired.
> Meaning that when receiving the message, we check if we have an expired 
> checkpoint for that ID. If yes, we log that message, if not, we simply drop 
> the message.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] masteryhx closed pull request #23364: [FLINK-15922] Only remember expired checkpoints and log received late messages of those in CheckpointCoordinator

2023-09-07 Thread via GitHub


masteryhx closed pull request #23364: [FLINK-15922] Only remember expired 
checkpoints and log received late messages of those in CheckpointCoordinator
URL: https://github.com/apache/flink/pull/23364


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] xintongsong commented on a diff in pull request #23255: [FLINK-32870][network] Tiered storage supports reading multiple small buffers by reading and slicing one large buffer

2023-09-07 Thread via GitHub


xintongsong commented on code in PR #23255:
URL: https://github.com/apache/flink/pull/23255#discussion_r1319297747


##
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/tiered/file/PartitionFileReader.java:
##
@@ -78,4 +87,79 @@ long getPriority(
 
 /** Release the {@link PartitionFileReader}. */
 void release();
+
+/** A {@link PartialBuffer} is a part slice of a larger buffer. */
+class PartialBuffer extends CompositeBuffer {
+
+public PartialBuffer(BufferHeader bufferHeader) {
+super(bufferHeader);
+}
+}
+
+/**
+ * A wrapper class of the reading buffer result, including the read 
buffers, the hint of
+ * continue reading, and the read progress, etc.
+ */
+class ReadBufferResult {
+
+/** The read buffers. */
+private final List readBuffers;
+
+/**
+ * A hint to determine whether the caller should continue reading the 
following buffers.
+ * Note that this hint is merely a recommendation and not obligatory. 
Following the hint
+ * while reading buffers may improve performance.
+ */
+private final boolean shouldContinueReadHint;

Review Comment:
   ```suggestion
   private final boolean continuousReadSuggested;
   ```



##
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/tiered/file/PartitionFileReader.java:
##
@@ -78,4 +87,79 @@ long getPriority(
 
 /** Release the {@link PartitionFileReader}. */
 void release();
+
+/** A {@link PartialBuffer} is a part slice of a larger buffer. */
+class PartialBuffer extends CompositeBuffer {
+
+public PartialBuffer(BufferHeader bufferHeader) {
+super(bufferHeader);
+}
+}

Review Comment:
   Why not use `CompositeBuffer` directly?



##
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/tiered/tier/disk/DiskIOScheduler.java:
##
@@ -396,13 +417,46 @@ private void prepareForScheduling() {
 if (nextSegmentId < 0) {
 updateSegmentId();
 }
+if (nextSegmentId < 0) {
+priority = Long.MAX_VALUE;
+return;
+}
 priority =
-nextSegmentId < 0
-? Long.MAX_VALUE
+readProgress != null
+? readProgress.getCurrentReadOffset()

Review Comment:
   This implies the io scheduler understands that priority is file offset.



##
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/tiered/file/PartitionFileReader.java:
##
@@ -78,4 +87,79 @@ long getPriority(
 
 /** Release the {@link PartitionFileReader}. */
 void release();
+
+/** A {@link PartialBuffer} is a part slice of a larger buffer. */
+class PartialBuffer extends CompositeBuffer {
+
+public PartialBuffer(BufferHeader bufferHeader) {
+super(bufferHeader);
+}
+}
+
+/**
+ * A wrapper class of the reading buffer result, including the read 
buffers, the hint of
+ * continue reading, and the read progress, etc.
+ */
+class ReadBufferResult {
+
+/** The read buffers. */
+private final List readBuffers;
+
+/**
+ * A hint to determine whether the caller should continue reading the 
following buffers.
+ * Note that this hint is merely a recommendation and not obligatory. 
Following the hint
+ * while reading buffers may improve performance.
+ */
+private final boolean shouldContinueReadHint;
+
+/** The read progress state. */
+private final ReadProgress readProgress;
+
+public ReadBufferResult(
+List readBuffers,
+boolean shouldContinueReadHint,
+ReadProgress readProgress) {
+this.readBuffers = readBuffers;
+this.shouldContinueReadHint = shouldContinueReadHint;
+this.readProgress = readProgress;
+}
+
+public List getReadBuffers() {
+return readBuffers;
+}
+
+public boolean shouldContinueReadHint() {
+return shouldContinueReadHint;
+}
+
+public ReadProgress getReadProgress() {
+return readProgress;
+}
+}
+
+/** The {@link ReadProgress} mainly includes current reading offset, end 
of read offset, etc. */
+class ReadProgress {
+
+/**
+ * The current read file offset. Note the offset does not contain the 
length of the partial
+ * buffer, because the partial buffer may be dropped at anytime.
+ */
+private final long currentReadOffset;

Review Comment:
   ```suggestion
   private final long currentBufferOffset;
   ```



##

[GitHub] [flink] flinkbot commented on pull request #23376: [FLINK-14032][state]Make the cache size of RocksDBPriorityQueueSetFactory configurable

2023-09-07 Thread via GitHub


flinkbot commented on PR #23376:
URL: https://github.com/apache/flink/pull/23376#issuecomment-1711040369

   
   ## CI report:
   
   * df6392d251e75003327ada3ba5d728ec98674ac6 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (FLINK-33063) udaf with user defined pojo object throw error while generate record equaliser

2023-09-07 Thread lincoln lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lincoln lee reassigned FLINK-33063:
---

Assignee: Yunhong Zheng

> udaf with user defined pojo object throw error while generate record equaliser
> --
>
> Key: FLINK-33063
> URL: https://issues.apache.org/jira/browse/FLINK-33063
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.18.0
>Reporter: Yunhong Zheng
>Assignee: Yunhong Zheng
>Priority: Major
> Fix For: 1.18.0
>
>
> A UDAF with a user-defined POJO object throws an error while generating the 
> record equaliser: 
> When a user creates a UDAF whose record contains a user-defined complex POJO 
> object (like List or Map), the codegen 
> will throw an error while generating the record equaliser. The error is:
> {code:java}
> A method named "compareTo" is not declared in any enclosing class nor any 
> subtype, nor through a static import.{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33063) udaf with user defined pojo object throw error while generate record equaliser

2023-09-07 Thread lincoln lee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762946#comment-17762946
 ] 

lincoln lee commented on FLINK-33063:
-

[~337361...@qq.com] assigned to you :)

> udaf with user defined pojo object throw error while generate record equaliser
> --
>
> Key: FLINK-33063
> URL: https://issues.apache.org/jira/browse/FLINK-33063
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.18.0
>Reporter: Yunhong Zheng
>Assignee: Yunhong Zheng
>Priority: Major
> Fix For: 1.18.0
>
>
> A UDAF with a user-defined POJO object throws an error while generating the 
> record equaliser: 
> When a user creates a UDAF whose record contains a user-defined complex POJO 
> object (like List or Map), the codegen 
> will throw an error while generating the record equaliser. The error is:
> {code:java}
> A method named "compareTo" is not declared in any enclosing class nor any 
> subtype, nor through a static import.{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] ljz2051 opened a new pull request, #23376: [FLINK-14032]Make the cache size of RocksDBPriorityQueueSetFactory configurable

2023-09-07 Thread via GitHub


ljz2051 opened a new pull request, #23376:
URL: https://github.com/apache/flink/pull/23376

   ## What is the purpose of the change
   
   This pull request make the cache size of RocksDBPriorityQueueSetFactory 
configurable.
   
   ## Brief change log
   - Pass the configurableOptions of EmbeddedRocksDBStateBackend into 
RocksDBKeyedStateBackendBuilder,  then create RocksDBPriorityQueueSetFactory 
using the configured cache size
   
   
   ## Verifying this change
   
   This change is a trivial work without any test coverage.
   
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency):  no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`:  no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature?  no
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] JunRuiLee commented on pull request #23181: W.I.P only used to run ci.

2023-09-07 Thread via GitHub


JunRuiLee commented on PR #23181:
URL: https://github.com/apache/flink/pull/23181#issuecomment-1711032791

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] JunRuiLee commented on pull request #23181: W.I.P only used to run ci.

2023-09-07 Thread via GitHub


JunRuiLee commented on PR #23181:
URL: https://github.com/apache/flink/pull/23181#issuecomment-1711029547

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] JunRuiLee commented on pull request #23181: W.I.P only used to run ci.

2023-09-07 Thread via GitHub


JunRuiLee commented on PR #23181:
URL: https://github.com/apache/flink/pull/23181#issuecomment-1711027731

   @flinkbot run azure





[jira] [Commented] (FLINK-33053) Watcher leak in Zookeeper HA mode

2023-09-07 Thread Yangze Guo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762936#comment-17762936
 ] 

Yangze Guo commented on FLINK-33053:


I reproduced the issue and filtered out some logs: [^26.dump.zip] [^26.log]. This 
TM still watches the leader of job 7db5c7316828f598234677e2169e7b0f. 

{{/flink/flink-native-test-117/leader/7db5c7316828f598234677e2169e7b0f/connection_info}}
{{        0x6400013b748a4420}}

 

[~mapohl] [~tison] 
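As a side note, the wchp output above is a znode path line followed by indented session IDs, so leaked watches can be tallied per path with a small parser. This is a plain-Java sketch under that format assumption, not Flink code; the sample input in main is hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Counts watches per znode path from "echo -n wchp | nc host port" output. */
public class WatchCounter {

    // wchp prints a path line, then one indented line per watching session.
    public static Map<String, Integer> countWatches(String wchpOutput) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String currentPath = null;
        for (String line : wchpOutput.split("\n")) {
            if (line.isEmpty()) {
                continue;
            }
            if (!line.startsWith("\t") && !line.startsWith(" ")) {
                currentPath = line.trim();           // a new znode path
                counts.putIfAbsent(currentPath, 0);
            } else if (currentPath != null) {
                counts.merge(currentPath, 1, Integer::sum); // one session watch
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String sample =
            "/flink/flink-native-test-117/leader/7db5c7316828f598234677e2169e7b0f/connection_info\n"
          + "\t0x6400013b748a4420\n"
          + "\t0x6400013b748a4421\n";
        System.out.println(countWatches(sample));
    }
}
```

Running this repeatedly while submitting jobs would show whether the per-path counts keep growing.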

> Watcher leak in Zookeeper HA mode
> -
>
> Key: FLINK-33053
> URL: https://issues.apache.org/jira/browse/FLINK-33053
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.0, 1.17.1
>Reporter: Yangze Guo
>Priority: Critical
> Attachments: 26.dump.zip, 26.log
>
>
> We observe a watcher leak in our OLAP stress test when enabling Zookeeper HA 
> mode. The TM's watches on the JobMaster leader have not been stopped after the 
> job finished.
> Here is how we reproduce this issue:
>  - Start a session cluster and enable Zookeeper HA mode.
>  - Continuously and concurrently submit short queries, e.g. WordCount, to the 
> cluster.
>  - Run echo -n wchp | nc \{zk host} \{zk port} to get the current watches.
> We can see a lot of watches on 
> /flink/\{cluster_name}/leader/\{job_id}/connection_info.





[jira] [Updated] (FLINK-33053) Watcher leak in Zookeeper HA mode

2023-09-07 Thread Yangze Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yangze Guo updated FLINK-33053:
---
Attachment: 26.dump.zip

> Watcher leak in Zookeeper HA mode
> -
>
> Key: FLINK-33053
> URL: https://issues.apache.org/jira/browse/FLINK-33053
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.0, 1.17.1
>Reporter: Yangze Guo
>Priority: Critical
> Attachments: 26.dump.zip, 26.log
>
>
> We observe a watcher leak in our OLAP stress test when enabling Zookeeper HA 
> mode. The TM's watches on the JobMaster leader have not been stopped after the 
> job finished.
> Here is how we reproduce this issue:
>  - Start a session cluster and enable Zookeeper HA mode.
>  - Continuously and concurrently submit short queries, e.g. WordCount, to the 
> cluster.
>  - Run echo -n wchp | nc \{zk host} \{zk port} to get the current watches.
> We can see a lot of watches on 
> /flink/\{cluster_name}/leader/\{job_id}/connection_info.





[jira] [Updated] (FLINK-33053) Watcher leak in Zookeeper HA mode

2023-09-07 Thread Yangze Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yangze Guo updated FLINK-33053:
---
Attachment: 26.log

> Watcher leak in Zookeeper HA mode
> -
>
> Key: FLINK-33053
> URL: https://issues.apache.org/jira/browse/FLINK-33053
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.0, 1.17.1
>Reporter: Yangze Guo
>Priority: Critical
> Attachments: 26.log
>
>
> We observe a watcher leak in our OLAP stress test when enabling Zookeeper HA 
> mode. The TM's watches on the JobMaster leader have not been stopped after the 
> job finished.
> Here is how we reproduce this issue:
>  - Start a session cluster and enable Zookeeper HA mode.
>  - Continuously and concurrently submit short queries, e.g. WordCount, to the 
> cluster.
>  - Run echo -n wchp | nc \{zk host} \{zk port} to get the current watches.
> We can see a lot of watches on 
> /flink/\{cluster_name}/leader/\{job_id}/connection_info.





[GitHub] [flink] flinkbot commented on pull request #23375: Dev 0908 1101

2023-09-07 Thread via GitHub


flinkbot commented on PR #23375:
URL: https://github.com/apache/flink/pull/23375#issuecomment-1711016903

   
   ## CI report:
   
   * ac37fad2d9e939bd2dd3397de07f677b0931eaad UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   





[GitHub] [flink] WencongLiu opened a new pull request, #23375: Dev 0908 1101

2023-09-07 Thread via GitHub


WencongLiu opened a new pull request, #23375:
URL: https://github.com/apache/flink/pull/23375

   
   
   ## What is the purpose of the change
   
   *(For example: This pull request makes task deployment go through the blob 
server, rather than through RPC. That way we avoid re-transferring them on each 
deployment (during recovery).)*
   
   
   ## Brief change log
   
   *(for example:)*
 - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
 - *Deployments RPC transmits only the blob storage reference*
 - *TaskManagers retrieve the TaskInfo from the blob cache*
   
   
   ## Verifying this change
   
   Please make sure both new and modified tests in this PR follows the 
conventions defined in our code quality guide: 
https://flink.apache.org/contributing/code-style-and-quality-common.html#testing
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
 - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
 - *Extended integration test for recovery after master (JobManager) 
failure*
 - *Added test that validates that TaskInfo is transferred only once across 
recoveries*
 - *Manually verified the change by running a 4 node cluster with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
 - The serializers: (yes / no / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
 - The S3 file system connector: (yes / no / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   





[GitHub] [flink] mayuehappy commented on pull request #23169: [FLINK-31238] [WIP] Use IngestDB to speed up Rocksdb rescaling recovery

2023-09-07 Thread via GitHub


mayuehappy commented on PR #23169:
URL: https://github.com/apache/flink/pull/23169#issuecomment-1711009736

   > @mayuehappy I noticed that one required PR against RocksDB 
([facebook/rocksdb#11646](https://github.com/facebook/rocksdb/pull/11646)) is 
still not merged and the conversation is stalling. Do you think it's ok to ping 
Adam to take another look?
   
   
   
   
   @StefanRRichter Thanks for the reminder, I've pinged him and cbi42 under the 
PR. 
   BTW, is it possible that we merge this part of the JNI code into frocksdb 
before they merge into rocksdb?





[GitHub] [flink] mayuehappy commented on pull request #23169: [FLINK-31238] [WIP] Use IngestDB to speed up Rocksdb rescaling recovery

2023-09-07 Thread via GitHub


mayuehappy commented on PR #23169:
URL: https://github.com/apache/flink/pull/23169#issuecomment-1711007885

   > It looks like we will need to bump RocksDB from 6.20.x to 8.5.x
   > 
   > > 8.5.0-ververica-1.0-db-SNASHOT
   > 
   > @curcur or @Myasuka , do you know if there are any known hurdles that 
would prevent us from upgrading RocksDB atm?
   
   
   
   > @mayuehappy thanks for the draft PR. General idea of the PR looks good to 
me, we should still polish it and avoid code duplication. I currently cannot 
test the PR because the frocksDB dependency is missing. How do you want to 
proceed with this?
   @StefanRRichter 
   I’m not sure how the Flink community tested Frocksdb-jni before it was 
released. Is there a maven repository for testing where I can deploy my 
test-frocksdb-jar? @Myasuka 





[jira] [Commented] (FLINK-33063) udaf with user defined pojo object throw error while generate record equaliser

2023-09-07 Thread Yunhong Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762932#comment-17762932
 ] 

Yunhong Zheng commented on FLINK-33063:
---

Hi [~lincoln.86xy], can you assign this issue to me? Thanks!

> udaf with user defined pojo object throw error while generate record equaliser
> --
>
> Key: FLINK-33063
> URL: https://issues.apache.org/jira/browse/FLINK-33063
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.18.0
>Reporter: Yunhong Zheng
>Priority: Major
> Fix For: 1.18.0
>
>
> A UDAF with a user-defined POJO object throws an error while generating the 
> record equaliser: 
> When a user creates a UDAF whose record contains a user-defined complex POJO 
> object (like a List or Map), the codegen 
> throws an error while generating the record equaliser:
> {code:java}
> A method named "compareTo" is not declared in any enclosing class nor any 
> subtype, nor through a static import.{code}





[GitHub] [flink] flinkbot commented on pull request #23374: [FLINK-33063][table-runtime] Fix udaf with complex user defined pojo object throw error while generate record equaliser

2023-09-07 Thread via GitHub


flinkbot commented on PR #23374:
URL: https://github.com/apache/flink/pull/23374#issuecomment-1711004891

   
   ## CI report:
   
   * 52afe17fa1e4ac66fa3eb72bb9fc0d0159e2a631 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   





[GitHub] [flink] Zakelly commented on pull request #23364: [FLINK-15922] Only remember expired checkpoints and log received late messages of those in CheckpointCoordinator

2023-09-07 Thread via GitHub


Zakelly commented on PR #23364:
URL: https://github.com/apache/flink/pull/23364#issuecomment-1711002566

   @masteryhx Thanks for your comments. I changed my PR accordingly.





[GitHub] [flink] swuferhong opened a new pull request, #23374: [FLINK-33063][table-planner] Fix udaf with complex user defined pojo object throw error while generate record equaliser

2023-09-07 Thread via GitHub


swuferhong opened a new pull request, #23374:
URL: https://github.com/apache/flink/pull/23374

   
   
   
   
   ## What is the purpose of the change
   
   When a user creates a UDAF whose record contains a user-defined complex POJO 
object (like a List or Map), the codegen 
will throw an error while generating the record equaliser:
   `A method named "compareTo" is not declared in any enclosing class nor any 
subtype, nor through a static import.`
   
   
   ## Brief change log
   - Add a test to cover the user defined pojo object.
   
   ## Verifying this change
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: (yes / no / don't know)
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented?  no docs





[jira] [Created] (FLINK-33063) udaf with user defined pojo object throw error while generate record equaliser

2023-09-07 Thread Yunhong Zheng (Jira)
Yunhong Zheng created FLINK-33063:
-

 Summary: udaf with user defined pojo object throw error while 
generate record equaliser
 Key: FLINK-33063
 URL: https://issues.apache.org/jira/browse/FLINK-33063
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Runtime
Affects Versions: 1.18.0
Reporter: Yunhong Zheng
 Fix For: 1.18.0


A UDAF with a user-defined POJO object throws an error while generating the 
record equaliser: 

When a user creates a UDAF whose record contains a user-defined complex POJO 
object (like a List or Map), the codegen will 
throw an error while generating the record equaliser:
{code:java}
A method named "compareTo" is not declared in any enclosing class nor any 
subtype, nor through a static import.{code}
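For context, an equaliser for a record with collection-typed fields needs equals-based comparison, since List and Map are not Comparable and calling compareTo on them cannot compile. A hand-written sketch with a hypothetical POJO (not the actual generated code):

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical POJO mirroring the reported case: fields typed List/Map. */
public class UserPojo {
    public List<Integer> scores;
    public Map<String, String> tags;

    public UserPojo(List<Integer> scores, Map<String, String> tags) {
        this.scores = scores;
        this.tags = tags;
    }

    /**
     * Field-by-field equality via Objects.equals, which works for any type;
     * generating a compareTo call here fails because List/Map do not declare it.
     */
    public static boolean recordsEqual(UserPojo a, UserPojo b) {
        return Objects.equals(a.scores, b.scores)
            && Objects.equals(a.tags, b.tags);
    }

    public static void main(String[] args) {
        UserPojo x = new UserPojo(List.of(1, 2), Map.of("k", "v"));
        UserPojo y = new UserPojo(List.of(1, 2), Map.of("k", "v"));
        System.out.println(recordsEqual(x, y)); // prints true
    }
}
```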





[GitHub] [flink] WencongLiu commented on pull request #23362: [FLINK-33041][docs] Add an article to guide users to migrate their DataSet jobs to DataStream

2023-09-07 Thread via GitHub


WencongLiu commented on PR #23362:
URL: https://github.com/apache/flink/pull/23362#issuecomment-1710989624

   @flinkbot run azure





[jira] [Commented] (FLINK-32962) Failure to install python dependencies from requirements file

2023-09-07 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762927#comment-17762927
 ] 

Dian Fu commented on FLINK-32962:
-

+1 to backport to Flink 1.17 and 1.18, especially Flink 1.18.

> Failure to install python dependencies from requirements file
> -
>
> Key: FLINK-32962
> URL: https://issues.apache.org/jira/browse/FLINK-32962
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.15.4, 1.16.2, 1.18.0, 1.17.1
>Reporter: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
>
> We have encountered an issue where Flink fails to install Python dependencies 
> from a requirements file if the Python environment contains a setuptools 
> version of 67.5.0 or above.
> The Flink job fails with the following error:
> {code:java}
> py4j.protocol.Py4JJavaError: An error occurred while calling o118.await.
> : java.util.concurrent.ExecutionException: 
> org.apache.flink.table.api.TableException: Failed to wait job finish
> at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> at 
> org.apache.flink.table.api.internal.TableResultImpl.awaitInternal(TableResultImpl.java:118)
> at 
> org.apache.flink.table.api.internal.TableResultImpl.await(TableResultImpl.java:81)
> ...
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.flink.client.program.ProgramInvocationException: Job failed 
> (JobID: 2ca4026944022ac4537c503464d4c47f)
> at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> at 
> org.apache.flink.table.api.internal.InsertResultProvider.hasNext(InsertResultProvider.java:83)
> ... 6 more
> Caused by: org.apache.flink.client.program.ProgramInvocationException: Job 
> failed (JobID: 2ca4026944022ac4537c503464d4c47f)
> at 
> org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:130)
> at 
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
> at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
> at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
> ...
> Caused by: java.io.IOException: java.io.IOException: Failed to execute the 
> command: /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install 
> --ignore-installed -r 
> /var/folders/rb/q_3h54_94b57gz_qkbp593vwgn/T/tm_localhost:63904-4178d3/blobStorage/job_2ca4026944022ac4537c503464d4c47f/blob_p-feb04bed919e628d98b1ef085111482f05bf43a1-1f5c3fda7cbdd6842e950e04d9d807c5
>  --install-option 
> --prefix=/var/folders/rb/q_3h54_94b57gz_qkbp593vwgn/T/python-dist-c17912db-5aba-439f-9e39-69cb8c957bec/python-requirements
> output:
> Usage:
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
> <requirement specifier> [package-index-options] ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] -r 
> <requirements file> [package-index-options] ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
> [-e] <vcs project url> ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
> [-e] <local project path> ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
> <archive url/path> ...
> no such option: --install-option
> at 
> org.apache.flink.python.util.PythonEnvironmentManagerUtils.pipInstallRequirements(PythonEnvironmentManagerUtils.java:144)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.installRequirements(AbstractPythonEnvironmentManager.java:215)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.lambda$open$0(AbstractPythonEnvironmentManager.java:126)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager$PythonEnvResources.createResource(AbstractPythonEnvironmentManager.java:435)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager$PythonEnvResources.getOrAllocateSharedResource(AbstractPythonEnvironmentManager.java:402)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.open(AbstractPythonEnvironmentManager.java:114)
> at 
> org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:238)
> at 
> org.apache.flink.streaming.api.operators.python.process.AbstractExternalPythonFunctionOperator.open(AbstractExternalPythonFunctionOperator.java:57)
> at 
> 
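The failure above comes from passing --prefix through pip's --install-option flag, which newer pip rejects (the "no such option" error in the trace); pip also accepts --prefix directly. A hedged sketch of assembling the command either way; this is illustrative only, not Flink's actual PythonEnvironmentManagerUtils logic:

```java
import java.util.ArrayList;
import java.util.List;

/** Builds the pip invocation, avoiding the removed --install-option flag. */
public class PipCommand {

    public static List<String> build(String python, String requirementsFile,
                                     String prefixDir, boolean useInstallOption) {
        List<String> cmd = new ArrayList<>(List.of(
            python, "-m", "pip", "install", "--ignore-installed", "-r", requirementsFile));
        if (useInstallOption) {
            // Old form: rejected by newer pip with "no such option: --install-option".
            cmd.add("--install-option");
            cmd.add("--prefix=" + prefixDir);
        } else {
            // pip supports --prefix as a direct install option.
            cmd.add("--prefix");
            cmd.add(prefixDir);
        }
        return cmd;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", build("python3", "req.txt", "/tmp/pyreq", false)));
    }
}
```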

[jira] [Assigned] (FLINK-32962) Failure to install python dependencies from requirements file

2023-09-07 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu reassigned FLINK-32962:
---

Assignee: Aleksandr Pilipenko

> Failure to install python dependencies from requirements file
> -
>
> Key: FLINK-32962
> URL: https://issues.apache.org/jira/browse/FLINK-32962
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.15.4, 1.16.2, 1.18.0, 1.17.1
>Reporter: Aleksandr Pilipenko
>Assignee: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
>

[GitHub] [flink] 1996fanrui commented on pull request #23372: [FLINK-33042][flamegraph] Allow trigger flamegraph when task is initializing

2023-09-07 Thread via GitHub


1996fanrui commented on PR #23372:
URL: https://github.com/apache/flink/pull/23372#issuecomment-1710983769

   cc @RocMarshal 





[jira] [Assigned] (FLINK-33061) Translate failure-enricher documentation to Chinese

2023-09-07 Thread Weihua Hu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weihua Hu reassigned FLINK-33061:
-

Assignee: Matt Wang

> Translate failure-enricher documentation to Chinese
> ---
>
> Key: FLINK-33061
> URL: https://issues.apache.org/jira/browse/FLINK-33061
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Matt Wang
>Priority: Not a Priority
>






[jira] [Assigned] (FLINK-31895) End-to-end integration tests for failure labels

2023-09-07 Thread Weihua Hu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weihua Hu reassigned FLINK-31895:
-

Assignee: Matt Wang

> End-to-end integration tests for failure labels
> ---
>
> Key: FLINK-31895
> URL: https://issues.apache.org/jira/browse/FLINK-31895
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Matt Wang
>Priority: Major
>






[jira] [Created] (FLINK-33062) Deserialization creates multiple instances of case objects in Scala 2.13

2023-09-07 Thread SmedbergM (Jira)
SmedbergM created FLINK-33062:
-

 Summary: Deserialization creates multiple instances of case 
objects in Scala 2.13
 Key: FLINK-33062
 URL: https://issues.apache.org/jira/browse/FLINK-33062
 Project: Flink
  Issue Type: Bug
  Components: API / Core
Affects Versions: 1.17.1, 1.15.4
 Environment: Scala 2.13.12

Flink 1.17.1 and 1.15.4 running inside IntelliJ

Flink 1.15.4 running in AWS Managed Flink
Reporter: SmedbergM


See [https://github.com/SmedbergM/mwe-flink-2-13-deserialization] for a minimal 
working example.

When running a Flink job with Scala 2.13, deserialized objects whose fields are 
case objects have those case objects re-instantiated. Thus any code that relies 
on reference equality (such as methods of `scala.Option`) will break.

I suspect that this is due to Kryo deserialization not being singleton-aware 
for case objects, but I haven't been able to drill in and catch this in the act.

Here are relevant lines of my application log:

 
{code:java}
17:37:13.224 [jobmanager-io-thread-1] INFO  o.a.f.r.c.CheckpointCoordinator -- 
No checkpoint found during restore.
17:37:13.531 [parse-book -> Sink: log-book (2/2)#0] WARN  
smedbergm.mwe.BookSink$ -- Book.isbn: None with identityHashCode 1043314405 
reports isEmpty false; true None is 2019204827
17:37:13.531 [parse-book -> Sink: log-book (1/2)#0] INFO  
smedbergm.mwe.BookSink$ -- Winkler, Scott: Terraform In Action (ISBN 
978-1-61729-689-5) $49.99
17:37:13.534 [parse-book -> Sink: log-book (2/2)#0] WARN  
smedbergm.mwe.BookSink$ -- Book.isbn: None with identityHashCode 465138693 
reports isEmpty false; true None is 2019204827
17:37:13.534 [parse-book -> Sink: log-book (1/2)#0] INFO  
smedbergm.mwe.BookSink$ -- Čukić, Ivan: Functional Programming in C++ (ISBN 
978-1-61729-381-8) $49.99
17:37:13.538 [flink-akka.actor.default-dispatcher-8] INFO  
o.a.f.r.c.CheckpointCoordinator -- Stopping checkpoint coordinator for job 
eed1c049790ac5f38664ddfd6b049282. {code}
I know that https://issues.apache.org/jira/browse/FLINK-13414 (support for 
Scala 2.13) is still listed as in-progress, but there is no warning in the docs 
that using 2.13 might not be stable. (This particular error does not occur on 
Scala 2.12, in this case because Option.isEmpty was re-implemented in 2.13; 
however, I suspect that multiple deserialization may be occurring already in 
2.12.)
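For illustration, Java serialization keeps singletons intact because deserialization is routed through a readResolve hook; a serializer that skips that hook (as the suspected Kryo path may do) returns a fresh instance, breaking reference equality. A plain-Java sketch of the pattern, using a hypothetical class rather than the Scala compiler's generated code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** A singleton that survives Java serialization via readResolve. */
public class SingletonDemo implements Serializable {
    public static final SingletonDemo INSTANCE = new SingletonDemo();

    private SingletonDemo() {}

    // Invoked by ObjectInputStream after reading the object; routes the
    // freshly deserialized copy back to the canonical instance.
    private Object readResolve() {
        return INSTANCE;
    }

    /** Serializes and deserializes INSTANCE in memory. */
    public static SingletonDemo roundTrip() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(INSTANCE);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return (SingletonDemo) in.readObject();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip() == INSTANCE); // prints true
    }
}
```

Without the readResolve hook, the round trip would yield a distinct instance, which matches the differing identityHashCode values in the log above.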

 

 





[jira] [Updated] (FLINK-20628) Port RabbitMQ Sources to FLIP-27 API

2023-09-07 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-20628:
---
Labels: auto-deprioritized-major pull-request-available stale-assigned  
(was: auto-deprioritized-major pull-request-available)

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help 
the community manage its development. I see this issue is assigned but has not 
received an update in 30 days, so it has been labeled "stale-assigned".
If you are still working on the issue, please remove the label and add a 
comment updating the community on your progress.  If this issue is waiting on 
feedback, please consider this a reminder to the committer/reviewer. Flink is a 
very active project, and so we appreciate your patience.
If you are no longer working on the issue, please unassign yourself so someone 
else may work on it.


> Port RabbitMQ Sources to FLIP-27 API
> 
>
> Key: FLINK-20628
> URL: https://issues.apache.org/jira/browse/FLINK-20628
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors/ RabbitMQ
>Reporter: Jan Westphal
>Assignee: RocMarshal
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available, 
> stale-assigned
>
> *Structure*
> The new RabbitMQ Source will have three components:
>  * RabbitMQ enumerator that receives one RabbitMQ Channel Config.
>  * RabbitMQ splits contain the RabbitMQ Channel Config
>  * RabbitMQ Readers which subscribe to the same RabbitMQ channel and receive 
> the messages (automatically load balanced by RabbitMQ).
> *Checkpointing Enumerators*
> The enumerator only needs to checkpoint the RabbitMQ channel config since the 
> continuous discovery of new unread/unhandled messages is taken care of by the 
> subscribed RabbitMQ readers and RabbitMQ itself.
> *Checkpointing Readers*
> The new RabbitMQ Source needs to ensure that every reader can be checkpointed.
> Since RabbitMQ is non-persistent and cannot be read by offset, a combined 
> usage of checkpoints and message acknowledgments is necessary. Until a 
> received message is checkpointed by a reader, it stays in an unacknowledged 
> state. As soon as the checkpoint is created, the messages covered by that 
> checkpoint can be acknowledged against RabbitMQ and only then will they be 
> deleted. Messages need to be acknowledged one by one, as they are handled 
> by each SourceReader individually.
> When deserializing messages, we will make use of the implementation in the 
> existing RabbitMQ Source.
> *Message Delivery Guarantees* 
> Unacknowledged messages of a reader will be redelivered by RabbitMQ 
> automatically to other consumers of the same channel if the reader goes down.
>  
> This Source will only support at-least-once, as this is the default 
> RabbitMQ behavior; anything else would require changes to RabbitMQ itself 
> or would impair the idea of parallelizing SourceReaders.
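The acknowledge-on-checkpoint scheme described above can be sketched as the following per-reader bookkeeping. This is a minimal, self-contained sketch with made-up class and method names, not the connector's actual implementation; in Flink the completion hook corresponds to the `CheckpointListener#notifyCheckpointComplete` callback.

``` java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of per-reader acknowledgment bookkeeping; names are illustrative,
 *  not the actual connector classes. */
public class AckBookkeeping {
    // delivery tags received since the last checkpoint, not yet acknowledged
    private final List<Long> pendingTags = new ArrayList<>();
    // delivery tags frozen per checkpoint id, awaiting checkpoint completion
    private final Map<Long, List<Long>> tagsByCheckpoint = new HashMap<>();

    public void onMessage(long deliveryTag) {
        pendingTags.add(deliveryTag);
    }

    /** On a checkpoint barrier: freeze the current batch under the checkpoint id. */
    public void snapshotState(long checkpointId) {
        tagsByCheckpoint.put(checkpointId, new ArrayList<>(pendingTags));
        pendingTags.clear();
    }

    /** On checkpoint completion: return the tags that may now be acked one by one. */
    public List<Long> notifyCheckpointComplete(long checkpointId) {
        List<Long> toAck = tagsByCheckpoint.remove(checkpointId);
        return toAck == null ? Collections.emptyList() : toAck;
    }
}
```

A reader would call basicAck for each returned tag only after the checkpoint completes, so messages that were never checkpointed stay unacknowledged and get redelivered.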



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32509) avoid using skip in InputStreamFSInputWrapper.seek

2023-09-07 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-32509:
---
  Labels: auto-deprioritized-major pull-request-available  (was: 
pull-request-available stale-major)
Priority: Minor  (was: Major)

This issue was labeled "stale-major" 7 days ago and has not received any 
updates so it is being deprioritized. If this ticket is actually Major, please 
raise the priority and ask a committer to assign you the issue or revive the 
public discussion.


> avoid using skip in InputStreamFSInputWrapper.seek
> --
>
> Key: FLINK-32509
> URL: https://issues.apache.org/jira/browse/FLINK-32509
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core
>Affects Versions: 1.18.0
>Reporter: Libin Qin
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available
>
> Unlike read, the skip method of InputStream does not return -1 at EOF.
> The Javadoc of InputStream says: "The skip method may, for a variety of 
> reasons, end up skipping over some smaller number of bytes, possibly 0."
> For FileInputStream, skipping any number of bytes past the end of the file 
> is allowed.
> So the seek method of InputStreamFSInputWrapper can enter an infinite loop 
> when the desired position exceeds the end of the file.
>  
> I reproduced with following case
>  
> {code:java}
> byte[] bytes = "flink".getBytes();
> try (InputStream inputStream = new ByteArrayInputStream(bytes)) {
>     InputStreamFSInputWrapper wrapper = new InputStreamFSInputWrapper(inputStream);
>     wrapper.seek(20);  // never returns: skip() keeps returning 0 past EOF
> } {code}
> I found a commons-io issue that discusses the same problem with skip:
> https://issues.apache.org/jira/browse/IO-203
>  
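For illustration, here is a self-contained sketch of a seek loop that cannot spin forever: when skip makes no progress, it probes with read(), which reliably reports EOF as -1. Class and method names are made up; this is not Flink's actual fix.

``` java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Sketch: a seek built on skip() that detects EOF instead of looping forever. */
public class SafeSeek {
    public static void seek(InputStream in, long desired) throws IOException {
        long remaining = desired;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() >= 0) {  // skip made no progress: probe one byte
                remaining -= 1;
            } else {
                throw new EOFException("seek(" + desired + ") past end of stream");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new ByteArrayInputStream("flink".getBytes())) {
            seek(in, 20);
        } catch (EOFException expected) {
            // with the buggy skip-only loop this call would never return
            System.out.println("EOF detected instead of an infinite loop");
        }
    }
}
```

The key point is that read() is the only call in the InputStream contract that is guaranteed to signal EOF, so any skip-based loop needs it as a fallback.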



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-12869) Add yarn acls capability to flink containers

2023-09-07 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-12869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-12869:
---
Labels: pull-request-available stale-assigned  (was: pull-request-available)

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help 
the community manage its development. I see this issue is assigned but has not 
received an update in 30 days, so it has been labeled "stale-assigned".
If you are still working on the issue, please remove the label and add a 
comment updating the community on your progress.  If this issue is waiting on 
feedback, please consider this a reminder to the committer/reviewer. Flink is a 
very active project, and so we appreciate your patience.
If you are no longer working on the issue, please unassign yourself so someone 
else may work on it.


> Add yarn acls capability to flink containers
> 
>
> Key: FLINK-12869
> URL: https://issues.apache.org/jira/browse/FLINK-12869
> Project: Flink
>  Issue Type: New Feature
>  Components: Deployment / YARN
>Reporter: Nicolas Fraison
>Assignee: Archit Goyal
>Priority: Minor
>  Labels: pull-request-available, stale-assigned
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> YARN provides an application ACL mechanism to grant specific rights to 
> users other than the one running the job (view logs through the 
> ResourceManager/job history, kill the application).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31238) Use IngestDB to speed up Rocksdb rescaling recovery

2023-09-07 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-31238:
---
Labels: pull-request-available stale-assigned  (was: pull-request-available)

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help 
the community manage its development. I see this issue is assigned but has not 
received an update in 30 days, so it has been labeled "stale-assigned".
If you are still working on the issue, please remove the label and add a 
comment updating the community on your progress.  If this issue is waiting on 
feedback, please consider this a reminder to the committer/reviewer. Flink is a 
very active project, and so we appreciate your patience.
If you are no longer working on the issue, please unassign yourself so someone 
else may work on it.


> Use IngestDB to speed up Rocksdb rescaling recovery 
> 
>
> Key: FLINK-31238
> URL: https://issues.apache.org/jira/browse/FLINK-31238
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.16.1
>Reporter: Yue Ma
>Assignee: Yue Ma
>Priority: Major
>  Labels: pull-request-available, stale-assigned
> Attachments: image-2023-02-27-16-41-18-552.png, 
> image-2023-02-27-16-57-18-435.png, image-2023-03-07-14-27-10-260.png, 
> image-2023-03-09-15-23-30-581.png, image-2023-03-09-15-26-12-314.png, 
> image-2023-03-09-15-28-32-363.png, image-2023-03-09-15-41-03-074.png, 
> image-2023-03-09-15-41-08-379.png, image-2023-03-09-15-45-56-081.png, 
> image-2023-03-09-15-46-01-176.png, image-2023-03-09-15-50-04-281.png, 
> image-2023-03-29-15-25-21-868.png, image-2023-07-17-14-37-38-864.png, 
> image-2023-07-17-14-38-56-946.png, image-2023-07-22-14-16-31-856.png, 
> image-2023-07-22-14-19-01-390.png, image-2023-08-08-21-32-43-783.png, 
> image-2023-08-08-21-34-39-008.png, image-2023-08-08-21-39-39-135.png, 
> screenshot-1.png
>
>
> (The detailed design is in this document
> [https://docs.google.com/document/d/10MNVytTsyiDLZQSR89kDkVdmK_YjbM6jh0teerfDFfI|https://docs.google.com/document/d/10MNVytTsyiDLZQSR89kDkVdmK_YjbM6jh0teerfDFfI])
> There have been many discussions and optimizations in the community around 
> RocksDB rescaling and recovery:
> https://issues.apache.org/jira/browse/FLINK-17971
> https://issues.apache.org/jira/browse/FLINK-8845
> https://issues.apache.org/jira/browse/FLINK-21321
> We hope to discuss some of our explorations under this ticket.
> Rescaling and recovery in RocksDB essentially require two steps:
>  # Insert the valid keyGroup data of the new task.
>  # Delete the invalid data in the old stateHandle.
> The current approach to data writing is to designate the main DB first and 
> then insert data using writeBatch. In addition, deleteRange is currently 
> used to speed up the ClipDB. But in our production environment, we found 
> that rescaling is still very slow, especially when the state of a single 
> Task is large. 
>  
> We hope that the previous sst files can be reused directly when restoring 
> state, instead of re-traversing the data. So we made some attempts to 
> optimize this in our internal versions of Flink and FRocksDB.
>  
> We added two APIs, *ClipDb* and *IngestDb*, in FRocksDB. 
>  * ClipDB is used to clip the data of a DB. Unlike db.DeleteRange and 
> db.Delete, no DeleteValue or RangeTombstone is generated for parts beyond 
> the key range. We iterate over the FileMetaData of the DB and process each 
> sst file. There are three cases: 
> If all the keys of a file are within the required range, we keep the sst 
> file and do nothing. 
> If all the keys of the sst file are outside the specified range, we delete 
> the file directly. 
> If we only need part of the sst file, we rewrite the required keys into a 
> new sst file.
> All sst file changes are placed in a VersionEdit, and the current version 
> will LogAndApply this edit to ensure that the changes take effect.
>  * IngestDb is used to directly ingest all sst files of one DB into another 
> DB. It must be strictly ensured that the keys of the two DBs do not 
> overlap, which is easy to guarantee in the Flink scenario. Hard links are 
> used when ingesting files, so it is very fast. At the same time, the file 
> numbers of the main DB are incremented sequentially, and the SequenceNumber 
> of the main DB is updated to the larger SequenceNumber of the two DBs.
> When IngestDb and ClipDb are supported, the state restoration logic is as 
> follows:
>  * Open the first StateHandle as the main DB and pause compaction.
>  * Clip the main DB according to the KeyGroup range of the Task with ClipDB

[jira] [Updated] (FLINK-27402) Unexpected boolean expression simplification for AND expression

2023-09-07 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-27402:
---
Labels: pull-request-available stale-assigned  (was: pull-request-available)

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help 
the community manage its development. I see this issue is assigned but has not 
received an update in 30 days, so it has been labeled "stale-assigned".
If you are still working on the issue, please remove the label and add a 
comment updating the community on your progress.  If this issue is waiting on 
feedback, please consider this a reminder to the committer/reviewer. Flink is a 
very active project, and so we appreciate your patience.
If you are no longer working on the issue, please unassign yourself so someone 
else may work on it.


> Unexpected boolean expression simplification  for AND expression 
> -
>
> Key: FLINK-27402
> URL: https://issues.apache.org/jira/browse/FLINK-27402
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Reporter: luoyuxia
>Assignee: Yunhong Zheng
>Priority: Major
>  Labels: pull-request-available, stale-assigned
>
> Flink supports comparison between string and boolean, so the following SQL 
> works fine:
>  
> {code:java}
> create table t (c1 int, c2 string);
> select * from t where c2 = true;
> {code}
> But the following SQL throws an exception:
>  
> {code:java}
> select * from t where c1 = 1 and c2 = true; {code}
> The reason is that Flink will try to simplify BOOLEAN expressions if 
> possible in "c1 = 1 and c2 = true".
> So "c2 = true" is simplified to "c2" by the following code in Flink:
>  
> {code:java}
> // RexSimplify#simplifyAnd2ForUnknownAsFalse
> // Simplify BOOLEAN expressions if possible
> while (term.getKind() == SqlKind.EQUALS) {
>     RexCall call = (RexCall) term;
>     if (call.getOperands().get(0).isAlwaysTrue()) {
>         term = call.getOperands().get(1);
>         terms.set(i, term);
>         continue;
>     } else if (call.getOperands().get(1).isAlwaysTrue()) {
>         term = call.getOperands().get(0);
>         terms.set(i, term);
>         continue;
>     }
>     break;
> } {code}
> So the expression is reduced to "c1 = 1 and c2". But AND requires both 
> sides to be boolean expressions, and c2 is not a boolean expression since 
> it is actually a string.
> Then the exception "Boolean expression type expected" is thrown.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] flinkbot commented on pull request #23373: [FLINK-30025][doc] improve the option description with more precise content

2023-09-07 Thread via GitHub


flinkbot commented on PR #23373:
URL: https://github.com/apache/flink/pull/23373#issuecomment-1710829165

   
   ## CI report:
   
   * 6251199a537a74a4f122b1d6139380b7da89de3c UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] JingGe opened a new pull request, #23373: [FLINK-30025][doc] improve the option description with more precise content

2023-09-07 Thread via GitHub


JingGe opened a new pull request, #23373:
URL: https://github.com/apache/flink/pull/23373

   
   ## What is the purpose of the change
   
   improve the option description with more precise content
   
   
   ## Brief change log
   
 -  SqlClientOptions
 -  TableConfigOptions
   
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't 
know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-32906) Release Testing: Verify FLIP-279 Unified the max display column width for SqlClient and Table APi in both Streaming and Batch execMode

2023-09-07 Thread Jing Ge (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762879#comment-17762879
 ] 

Jing Ge commented on FLINK-32906:
-

[~liyubin117] "e.g." means "for example"; it does not need to contain all 
cases. Nevertheless, I will improve the description with more precise content.

> Release Testing: Verify FLIP-279 Unified the max display column width for 
> SqlClient and Table APi in both Streaming and Batch execMode
> --
>
> Key: FLINK-32906
> URL: https://issues.apache.org/jira/browse/FLINK-32906
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Jing Ge
>Assignee: Yubin Li
>Priority: Major
> Fix For: 1.18.0
>
> Attachments: image-2023-08-22-18-54-45-122.png, 
> image-2023-08-22-18-54-57-699.png, image-2023-08-22-18-58-11-241.png, 
> image-2023-08-22-18-58-20-509.png, image-2023-08-22-19-04-49-344.png, 
> image-2023-08-22-19-06-49-335.png, image-2023-08-23-10-39-10-225.png, 
> image-2023-08-23-10-39-26-602.png, image-2023-08-23-10-41-30-913.png, 
> image-2023-08-23-10-41-42-513.png, image-2023-08-30-17-09-32-728.png, 
> screenshot-1.png
>
>
> more info could be found at 
> FLIP-279 
> [https://cwiki.apache.org/confluence/display/FLINK/FLIP-279+Unified+the+max+display+column+width+for+SqlClient+and+Table+APi+in+both+Streaming+and+Batch+execMode]
>  
> Tests:
> Both configs can be set to a new value, and the following behavior is 
> expected: 
> || || ||sql-client.display.max-column-width (default 30)||table.display.max-column-width (default 30)||
> |sqlclient|Streaming|Text longer than the value is truncated and replaced with “...”|Text longer than the value is truncated and replaced with “...”|
> |sqlclient|Batch|Text longer than the value is truncated and replaced with “...”|Text longer than the value is truncated and replaced with “...”|
> |Table API|Streaming|No effect; table.display.max-column-width with the default value 30 is used|Text longer than the value is truncated and replaced with “...”|
> |Table API|Batch|No effect; table.display.max-column-width with the default value 30 is used|Text longer than the value is truncated and replaced with “...”|
>  
> Please note that this task offers a backward-compatible solution and 
> deprecates sql-client.display.max-column-width. Once 
> sql-client.display.max-column-width is used, the client falls back to the 
> old scenario where table.display.max-column-width did not exist, and any 
> changes to table.display.max-column-width will not take effect. You should 
> test either with table.display.max-column-width only or with 
> sql-client.display.max-column-width only, but not both back and forth.
>  
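For reference, the unified option can be exercised from the SQL client like this (a sketch; the value 40 and the sample literal are illustrative):

```
-- prefer the new unified option; do not mix it with the deprecated
-- sql-client.display.max-column-width in the same session
SET 'table.display.max-column-width' = '40';
SELECT 'a string that is longer than forty characters in total' AS c;
-- cells wider than 40 characters are rendered truncated with a trailing "..."
```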



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] pgaref commented on a diff in pull request #22468: [FLINK-31889] Add documentation for implementing/loading enrichers

2023-09-07 Thread via GitHub


pgaref commented on code in PR #22468:
URL: https://github.com/apache/flink/pull/22468#discussion_r1318975346


##
docs/content/docs/deployment/advanced/failure_enrichers.md:
##
@@ -0,0 +1,106 @@
+---
+title: "Failure Enrichers"
+nav-title: failure-enrichers
+nav-parent_id: advanced
+nav-pos: 3
+---
+
+
+## Custom failure enrichers
+Flink provides a pluggable interface for users to register their custom logic 
and enrich failures with extra metadata labels (string key-value pairs).
+This enables users to implement their own failure enrichment plugins to 
categorize job failures, expose custom metrics, or make calls to external 
notification systems.
+
+FailureEnrichers are triggered every time an exception is reported at runtime 
by the JobManager.
+Every FailureEnricher may asynchronously return labels associated with the 
failure that are then exposed via the JobManager's REST API (e.g., a 
'type:System' label implying the failure is categorized as a system error).
+
+
+### Implement a plugin for your custom enricher
+
+To implement a custom FailureEnricher plugin, you need to:
+
+- Add your own FailureEnricher by implementing the {{< gh_link 
file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricher.java"
 name="FailureEnricher" >}} interface.
+
+- Add your own FailureEnricherFactory by implementing the {{< gh_link 
file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricherFactory.java"
 name="FailureEnricherFactory" >}} interface.
+
+- Add a service entry. Create a file 
`META-INF/services/org.apache.flink.core.failure.FailureEnricherFactory` which 
contains the class name of your failure enricher factory class (see [Java 
Service 
Loader](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/ServiceLoader.html)
 docs for more details).
+
+
+Then, create a jar which includes your `FailureEnricher`, 
`FailureEnricherFactory`, `META-INF/services/` and all external dependencies.
+Make a directory in `plugins/` of your Flink distribution with an arbitrary 
name, e.g. "failure-enrichment", and put the jar into this directory.
+See [Flink Plugin]({% link deployment/filesystems/plugins.md %}) for more 
details.
+
+{{< hint warning >}}
+Note that every FailureEnricher should have defined a set of {{< gh_link 
file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricher.java"
 name="output keys" >}} that may be associated with values. This set of keys 
has to be unique across enrichers otherwise values associated with them may be 
ignored.
+{{< /hint >}}
+
+FailureEnricherFactory example:
+
+``` java
+public class TestFailureEnricherFactory implements FailureEnricherFactory {
+
+   @Override
+   public FailureEnricher createFailureEnricher(Configuration conf) {
+return new CustomEnricher();
+   }
+}
+```
+
+FailureEnricher example:
+
+``` java
+public class CustomEnricher implements FailureEnricher {
+    private final Set<String> outputKeys;
+
+    public CustomEnricher() {
+        this.outputKeys = Collections.singleton("labelKey");
+    }
+
+    @Override
+    public Set<String> getOutputKeys() {
+        return outputKeys;
+    }
+
+    @Override
+    public CompletableFuture<Map<String, String>> processFailure(
+            Throwable cause, Context context) {
+        return CompletableFuture.completedFuture(
+                Collections.singletonMap("labelKey", "labelValue"));
+    }
+}
+```
+
+### Configuration
+
+The JobManager loads FailureEnricher plugins at startup. To make sure your 
FailureEnrichers are loaded all class names should be defined as part of 
[jobmanager.failure-enrichers configuration]({{< ref 
"docs/deployment/config#jobmanager-failure-enrichers" >}}).
+  If this configuration is empty, NO enrichers will be started. Example:
+```
+jobmanager.failure-enrichers = 
org.apache.flink.test.plugin.jar.failure.CustomEnricher

Review Comment:
   So what you are implying here is that, by filtering at the Factory level, we 
could avoid instantiating unwanted enrichers that could potentially be setting 
up resources? 
   
   Happy to add that as a follow-up ticket and follow an approach similar to 
the MetricsReporter setup.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] pgaref commented on a diff in pull request #22468: [FLINK-31889] Add documentation for implementing/loading enrichers

2023-09-07 Thread via GitHub


pgaref commented on code in PR #22468:
URL: https://github.com/apache/flink/pull/22468#discussion_r1318968366


##
docs/content.zh/docs/deployment/advanced/failure_enrichers.md:
##
@@ -0,0 +1,106 @@
+---
+title: "Failure Enrichers"
+nav-title: failure-enrichers
+nav-parent_id: advanced
+nav-pos: 3
+---
+
+
+## Custom failure enrichers
+Flink provides a pluggable interface for users to register their custom logic 
and enrich failures with extra metadata labels (string key-value pairs).
+This enables users to implement their own failure enrichment plugins to 
categorize job failures, expose custom metrics, or make calls to external 
notification systems.

Review Comment:
   Like setup/close methods? The main functionality is enriching/labeling 
exceptions, but that would be a useful addition indeed. Let's see what users do 
with it first :) 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] pgaref commented on a diff in pull request #22468: [FLINK-31889] Add documentation for implementing/loading enrichers

2023-09-07 Thread via GitHub


pgaref commented on code in PR #22468:
URL: https://github.com/apache/flink/pull/22468#discussion_r1318966093


##
docs/content/docs/deployment/advanced/failure_enrichers.md:
##
@@ -0,0 +1,106 @@
+---
+title: "Failure Enrichers"
+nav-title: failure-enrichers
+nav-parent_id: advanced
+nav-pos: 3
+---
+
+
+## Custom failure enrichers
+Flink provides a pluggable interface for users to register their custom logic 
and enrich failures with extra metadata labels (string key-value pairs).
+This enables users to implement their own failure enrichment plugins to 
categorize job failures, expose custom metrics, or make calls to external 
notification systems.
+
+FailureEnrichers are triggered every time an exception is reported at runtime 
by the JobManager.
+Every FailureEnricher may asynchronously return labels associated with the 
failure that are then exposed via the JobManager's REST API (e.g., a 
'type:System' label implying the failure is categorized as a system error).
+
+
+### Implement a plugin for your custom enricher
+
+To implement a custom FailureEnricher plugin, you need to:
+
+- Add your own FailureEnricher by implementing the {{< gh_link 
file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricher.java"
 name="FailureEnricher" >}} interface.
+
+- Add your own FailureEnricherFactory by implementing the {{< gh_link 
file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricherFactory.java"
 name="FailureEnricherFactory" >}} interface.
+
+- Add a service entry. Create a file 
`META-INF/services/org.apache.flink.core.failure.FailureEnricherFactory` which 
contains the class name of your failure enricher factory class (see [Java 
Service 
Loader](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/ServiceLoader.html)
 docs for more details).
+
+
+Then, create a jar which includes your `FailureEnricher`, 
`FailureEnricherFactory`, `META-INF/services/` and all external dependencies.
+Make a directory in `plugins/` of your Flink distribution with an arbitrary 
name, e.g. "failure-enrichment", and put the jar into this directory.
+See [Flink Plugin]({% link deployment/filesystems/plugins.md %}) for more 
details.
+
+{{< hint warning >}}
+Note that every FailureEnricher should have defined a set of {{< gh_link 
file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricher.java"
 name="output keys" >}} that may be associated with values. This set of keys 
has to be unique across enrichers otherwise values associated with them may be 
ignored.
+{{< /hint >}}
+
+FailureEnricherFactory example:
+
+``` java
+public class TestFailureEnricherFactory implements FailureEnricherFactory {
+
+   @Override
+   public FailureEnricher createFailureEnricher(Configuration conf) {
+return new CustomEnricher();
+   }
+}
+```
+
+FailureEnricher example:
+
+``` java
+public class CustomEnricher implements FailureEnricher {
+    private final Set<String> outputKeys;
+
+    public CustomEnricher() {
+        this.outputKeys = Collections.singleton("labelKey");
+    }
+
+    @Override
+    public Set<String> getOutputKeys() {
+        return outputKeys;
+    }
+
+    @Override
+    public CompletableFuture<Map<String, String>> processFailure(
+            Throwable cause, Context context) {
+        return CompletableFuture.completedFuture(
+                Collections.singletonMap("labelKey", "labelValue"));
+    }
+}
+```
+
+### Configuration
+
+The JobManager loads FailureEnricher plugins at startup. To make sure your 
FailureEnrichers are loaded all class names should be defined as part of 
[jobmanager.failure-enrichers configuration]({{< ref 
"docs/deployment/config#jobmanager-failure-enrichers" >}}).
+  If this configuration is empty, NO enrichers will be started. Example:
+```
+jobmanager.failure-enrichers = 
org.apache.flink.test.plugin.jar.failure.CustomEnricher

Review Comment:
   Yes, this is the first version of the feature -- if many enrichers are 
loaded, we could easily change this to what metric reporters do, but it looked 
like overkill for now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] pgaref commented on a diff in pull request #22468: [FLINK-31889] Add documentation for implementing/loading enrichers

2023-09-07 Thread via GitHub


pgaref commented on code in PR #22468:
URL: https://github.com/apache/flink/pull/22468#discussion_r1318964767


##
docs/content/docs/deployment/advanced/failure_enrichers.md:
##
@@ -0,0 +1,112 @@
+---
+title: "Failure Enrichers"
+nav-title: failure-enrichers
+nav-parent_id: advanced
+nav-pos: 3
+---
+
+
+## Custom failure enrichers
+Flink provides a pluggable interface for users to register their custom logic 
and enrich failures with extra metadata labels (string kv pairs).
+The goal is to enable developers implement their own failure enrichment 
plugins to categorize job failures, expose custom metrics, make calls to 
external notification systems, and more.
+
+FailureEnrichers are triggered every time an exception is reported at runtime 
by the JobManager.
+Every FailureEnricher may asynchronously return labels associated with the 
failure that are then exposed via the JobManager's Rest interface (e.g., a 
'type:System' label implying the failure is categorized as a system error).
+
+
+### Implement a plugin for your custom enricher
+
+To implement a custom FailureEnricher plugin, you need to:
+
+- Add your own FailureEnricher by implementing the {{< gh_link 
file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricher.java"
 name="FailureEnricher" >}} interface.
+
+- Add your own FailureEnricherFactory by implementing the {{< gh_link 
file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricherFactory.java"
 name="FailureEnricherFactory" >}} interface.
+
+- Add a service entry. Create a file 
`META-INF/services/org.apache.flink.core.failure.FailureEnricherFactory` which 
contains the class name of your failure enricher factory class (see [Java 
Service 
Loader](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/ServiceLoader.html)
 docs for more details).
+
+
+Then, create a jar which includes your `FailureEnricher`, 
`FailureEnricherFactory`, `META-INF/services/` and all the external 
dependencies.
+Make a directory in `plugins/` of your Flink distribution with an arbitrary 
name, e.g. "failure-enrichment", and put the jar into this directory.
+See [Flink Plugin]({% link deployment/filesystems/plugins.md %}) for more 
details.
+
+{{< hint warning >}}
+Note that every FailureEnricher should have defined a set of {{< gh_link 
file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricher.java"
 name="output keys" >}} that may be associated with values. This set of keys 
has to be unique across enrichers otherwise may be ignored.
+{{< /hint >}}
+
+FailureEnricherFactory example:
+
+``` java
+public class TestFailureEnricherFactory implements FailureEnricherFactory {
+
+   @Override
+   public FailureEnricher createFailureEnricher(Configuration conf) {
+return new CustomEnricher();
+   }
+}
+```
+
+FailureEnricher example:
+
+``` java
+public class CustomEnricher implements FailureEnricher {
+    private final Set<String> outputKeys;
+
+    public CustomEnricher() {
+        this.outputKeys = Collections.singleton("labelKey");
+    }
+
+    @Override
+    public Set<String> getOutputKeys() {
+        return outputKeys;
+    }
+
+    @Override
+    public CompletableFuture<Map<String, String>> processFailure(
+            Throwable cause, Context context) {
+        return CompletableFuture.completedFuture(
+                Collections.singletonMap("labelKey", "labelValue"));
+    }
+}
+```
+
+### Configuration
+
+JobManager loads FailureEnricher plugins at startup. To make sure your 
FailureEnrichers are loaded:
+* All class names should be defined as part of {{< javadoc 
file="org/apache/flink/configuration/JobManagerOptions.html#FAILURE_ENRICHERS_LIST"
 name="jobmanager.failure-enrichers">}} configuration.
+  If this configuration is empty, NO enrichers will be started. Example:
+```
+jobmanager.failure-enrichers = 
org.apache.flink.test.plugin.jar.failure.CustomEnricher

Review Comment:
   `org.apache.flink.test.plugin.jar.failure` is the package? Not sure what you 
are referring to here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
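As a concrete sketch of the service-entry and packaging steps quoted above (the `com.example` factory name and the jar name are invented for illustration; only the service file name is fixed by Flink), the service file and the resulting distribution layout would look like:

```
# File: META-INF/services/org.apache.flink.core.failure.FailureEnricherFactory
# Contains one fully-qualified factory class name per line:
com.example.flink.MyFailureEnricherFactory

# Resulting layout inside the Flink distribution:
plugins/
  failure-enrichment/
    my-failure-enricher.jar   # bundles the enricher, factory, META-INF/services/ and dependencies
```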



[GitHub] [flink] pgaref commented on a diff in pull request #22468: [FLINK-31889] Add documentation for implementing/loading enrichers

2023-09-07 Thread via GitHub


pgaref commented on code in PR #22468:
URL: https://github.com/apache/flink/pull/22468#discussion_r1318963910


##
docs/content/docs/deployment/advanced/failure_enrichers.md:
##
@@ -0,0 +1,112 @@
+---
+title: "Failure Enrichers"
+nav-title: failure-enrichers
+nav-parent_id: advanced
+nav-pos: 3
+---
+
+
+## Custom failure enrichers
+Flink provides a pluggable interface for users to register their custom logic and enrich failures with extra metadata labels (string key-value pairs).
+The goal is to enable developers to implement their own failure enrichment plugins to categorize job failures, expose custom metrics, make calls to external notification systems, and more.
+
+FailureEnrichers are triggered every time an exception is reported at runtime 
by the JobManager.
+Every FailureEnricher may asynchronously return labels associated with the 
failure that are then exposed via the JobManager's Rest interface (e.g., a 
'type:System' label implying the failure is categorized as a system error).
+
+
+### Implement a plugin for your custom enricher
+
+To implement a custom FailureEnricher plugin, you need to:
+
+- Add your own FailureEnricher by implementing the {{< gh_link 
file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricher.java"
 name="FailureEnricher" >}} interface.
+
+- Add your own FailureEnricherFactory by implementing the {{< gh_link 
file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricherFactory.java"
 name="FailureEnricherFactory" >}} interface.
+
+- Add a service entry. Create a file 
`META-INF/services/org.apache.flink.core.failure.FailureEnricherFactory` which 
contains the class name of your failure enricher factory class (see [Java 
Service 
Loader](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/ServiceLoader.html)
 docs for more details).
+
+
+Then, create a jar which includes your `FailureEnricher`, 
`FailureEnricherFactory`, `META-INF/services/` and all the external 
dependencies.
+Make a directory in `plugins/` of your Flink distribution with an arbitrary 
name, e.g. "failure-enrichment", and put the jar into this directory.
+See [Flink Plugin]({% link deployment/filesystems/plugins.md %}) for more 
details.
+
+{{< hint warning >}}
+Note that every FailureEnricher should define a set of {{< gh_link file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricher.java" name="output keys" >}} that may be associated with values. This set of keys has to be unique across enrichers, otherwise the values associated with them may be ignored.

Review Comment:
   When enrichers output keys overlap, only the first one is loaded and the 
rest are dropped with an Error message -- that was the meaning of `This set of 
keys has to be unique across enrichers otherwise may be ignored.` 
   Rephrasing
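The first-one-wins semantics described in this comment can be sketched in isolation as follows (an illustration of the stated behavior, not Flink's actual code; the class and method names are invented):

``` java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Enrichers are visited in order; any enricher whose declared output keys
// overlap an already-claimed key is dropped entirely (Flink logs an error
// message in that case), rather than being partially merged.
class EnricherDedup {
    static List<String> keepNonClashing(Map<String, Set<String>> enricherToKeys) {
        Set<String> claimed = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : enricherToKeys.entrySet()) {
            if (Collections.disjoint(claimed, e.getValue())) {
                claimed.addAll(e.getValue());
                kept.add(e.getKey());
            }
            // else: dropped wholesale; none of its keys are registered
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> enrichers = new LinkedHashMap<>();
        enrichers.put("CustomEnricher", Set.of("labelKey"));
        enrichers.put("OtherEnricher", Set.of("labelKey", "owner")); // clashes on "labelKey"
        System.out.println(keepNonClashing(enrichers)); // prints [CustomEnricher]
    }
}
```

Note that the second enricher is dropped even though only one of its two keys clashes, which is the behavior the review thread flags as surprising.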






[GitHub] [flink] venkata91 commented on pull request #23313: [FLINK-20767][table planner] Support filter push down on nested fields

2023-09-07 Thread via GitHub


venkata91 commented on PR #23313:
URL: https://github.com/apache/flink/pull/23313#issuecomment-171041

   Also tagging @slinkydeveloper





[GitHub] [flink] venkata91 commented on pull request #23313: [FLINK-20767][table planner] Support filter push down on nested fields

2023-09-07 Thread via GitHub


venkata91 commented on PR #23313:
URL: https://github.com/apache/flink/pull/23313#issuecomment-1710555055

   @wuchong @swuferhong @JingsongLi @becketqin please review





[GitHub] [flink] zentol commented on a diff in pull request #22468: [FLINK-31889] Add documentation for implementing/loading enrichers

2023-09-07 Thread via GitHub


zentol commented on code in PR #22468:
URL: https://github.com/apache/flink/pull/22468#discussion_r1318921748


##
docs/content/docs/deployment/advanced/failure_enrichers.md:
##
@@ -0,0 +1,112 @@
+---
+title: "Failure Enrichers"
+nav-title: failure-enrichers
+nav-parent_id: advanced
+nav-pos: 3
+---
+
+
+## Custom failure enrichers
+Flink provides a pluggable interface for users to register their custom logic 
and enrich failures with extra metadata labels (string kv pairs).
+The goal is to enable developers implement their own failure enrichment 
plugins to categorize job failures, expose custom metrics, make calls to 
external notification systems, and more.
+
+FailureEnrichers are triggered every time an exception is reported at runtime 
by the JobManager.
+Every FailureEnricher may asynchronously return labels associated with the 
failure that are then exposed via the JobManager's Rest interface (e.g., a 
'type:System' label implying the failure is categorized as a system error).
+
+
+### Implement a plugin for your custom enricher
+
+To implement a custom FailureEnricher plugin, you need to:
+
+- Add your own FailureEnricher by implementing the {{< gh_link 
file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricher.java"
 name="FailureEnricher" >}} interface.
+
+- Add your own FailureEnricherFactory by implementing the {{< gh_link 
file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricherFactory.java"
 name="FailureEnricherFactory" >}} interface.
+
+- Add a service entry. Create a file 
`META-INF/services/org.apache.flink.core.failure.FailureEnricherFactory` which 
contains the class name of your failure enricher factory class (see [Java 
Service 
Loader](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/ServiceLoader.html)
 docs for more details).
+
+
+Then, create a jar which includes your `FailureEnricher`, 
`FailureEnricherFactory`, `META-INF/services/` and all the external 
dependencies.
+Make a directory in `plugins/` of your Flink distribution with an arbitrary 
name, e.g. "failure-enrichment", and put the jar into this directory.
+See [Flink Plugin]({% link deployment/filesystems/plugins.md %}) for more 
details.
+
+{{< hint warning >}}
+Note that every FailureEnricher should define a set of {{< gh_link file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricher.java" name="output keys" >}} that may be associated with values. This set of keys has to be unique across enrichers, otherwise the values associated with them may be ignored.

Review Comment:
   oh wow all enrichers that clash in any way are just completely dropped. That 
doesn't seem to match the documented behavior.






[GitHub] [flink] zentol commented on a diff in pull request #22468: [FLINK-31889] Add documentation for implementing/loading enrichers

2023-09-07 Thread via GitHub


zentol commented on code in PR #22468:
URL: https://github.com/apache/flink/pull/22468#discussion_r1318909983


##
docs/content/docs/deployment/advanced/failure_enrichers.md:
##
@@ -0,0 +1,106 @@
+---
+title: "Failure Enrichers"
+nav-title: failure-enrichers
+nav-parent_id: advanced
+nav-pos: 3
+---
+
+
+## Custom failure enrichers
+Flink provides a pluggable interface for users to register their custom logic 
and enrich failures with extra metadata labels (string key-value pairs).
+This enables users to implement their own failure enrichment plugins to 
categorize job failures, expose custom metrics, or make calls to external 
notification systems.
+
+FailureEnrichers are triggered every time an exception is reported at runtime 
by the JobManager.
+Every FailureEnricher may asynchronously return labels associated with the 
failure that are then exposed via the JobManager's REST API (e.g., a 
'type:System' label implying the failure is categorized as a system error).
+
+
+### Implement a plugin for your custom enricher
+
+To implement a custom FailureEnricher plugin, you need to:
+
+- Add your own FailureEnricher by implementing the {{< gh_link 
file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricher.java"
 name="FailureEnricher" >}} interface.
+
+- Add your own FailureEnricherFactory by implementing the {{< gh_link 
file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricherFactory.java"
 name="FailureEnricherFactory" >}} interface.
+
+- Add a service entry. Create a file 
`META-INF/services/org.apache.flink.core.failure.FailureEnricherFactory` which 
contains the class name of your failure enricher factory class (see [Java 
Service 
Loader](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/ServiceLoader.html)
 docs for more details).
+
+
+Then, create a jar which includes your `FailureEnricher`, 
`FailureEnricherFactory`, `META-INF/services/` and all external dependencies.
+Make a directory in `plugins/` of your Flink distribution with an arbitrary 
name, e.g. "failure-enrichment", and put the jar into this directory.
+See [Flink Plugin]({% link deployment/filesystems/plugins.md %}) for more 
details.
+
+{{< hint warning >}}
+Note that every FailureEnricher should define a set of {{< gh_link file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricher.java" name="output keys" >}} that may be associated with values. This set of keys has to be unique across enrichers, otherwise values associated with them may be ignored.
+{{< /hint >}}
+
+FailureEnricherFactory example:
+
+``` java
+public class TestFailureEnricherFactory implements FailureEnricherFactory {
+
+    @Override
+    public FailureEnricher createFailureEnricher(Configuration conf) {
+        return new CustomEnricher();
+    }
+}
+```
+
+FailureEnricher example:
+
+``` java
+public class CustomEnricher implements FailureEnricher {
+    private final Set<String> outputKeys;
+
+    public CustomEnricher() {
+        this.outputKeys = Collections.singleton("labelKey");
+    }
+
+    @Override
+    public Set<String> getOutputKeys() {
+        return outputKeys;
+    }
+
+    @Override
+    public CompletableFuture<Map<String, String>> processFailure(
+            Throwable cause, Context context) {
+        return CompletableFuture.completedFuture(
+                Collections.singletonMap("labelKey", "labelValue"));
+    }
+}
+```
+
+### Configuration
+
+The JobManager loads FailureEnricher plugins at startup. To make sure your FailureEnrichers are loaded, all class names should be defined as part of [jobmanager.failure-enrichers configuration]({{< ref "docs/deployment/config#jobmanager-failure-enrichers" >}}).
+  If this configuration is empty, NO enrichers will be started. Example:
+```
+jobmanager.failure-enrichers = 
org.apache.flink.test.plugin.jar.failure.CustomEnricher

Review Comment:
   Oh my it does work like that :(
   
   Better hope the enrichers don't setup any resources in their constructor :/
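One defensive pattern for the concern raised here, hedged since `FailureEnricher` itself exposes no lifecycle hooks, is to create expensive resources lazily on first use rather than in the constructor. The class below only mimics the enricher's shape for illustration; it does not implement the real `org.apache.flink.core.failure` interface:

``` java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Sketch: the (possibly expensive) resource is created on the first failure,
// not in the constructor, so an enricher instantiated by its factory but
// dropped at startup never opens the resource at all.
class LazyResourceEnricher {
    private volatile AutoCloseable client; // stands in for e.g. an HTTP client

    private AutoCloseable client() {
        if (client == null) {
            synchronized (this) {
                if (client == null) {
                    client = () -> {}; // real code would open the connection here
                }
            }
        }
        return client;
    }

    boolean resourceInitialized() {
        return client != null;
    }

    CompletableFuture<Map<String, String>> processFailure(Throwable cause) {
        client(); // resource comes into existence only when a failure is processed
        return CompletableFuture.completedFuture(
                Collections.singletonMap("type", "System"));
    }
}
```

The double-checked locking keeps initialization thread-safe if several failures are processed concurrently; a dropped enricher pays only the cost of an empty object.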






[GitHub] [flink] zentol commented on a diff in pull request #22468: [FLINK-31889] Add documentation for implementing/loading enrichers

2023-09-07 Thread via GitHub


zentol commented on code in PR #22468:
URL: https://github.com/apache/flink/pull/22468#discussion_r1318907314


##
docs/content.zh/docs/deployment/advanced/failure_enrichers.md:
##
@@ -0,0 +1,106 @@
+---
+title: "Failure Enrichers"
+nav-title: failure-enrichers
+nav-parent_id: advanced
+nav-pos: 3
+---
+
+
+## Custom failure enrichers
+Flink provides a pluggable interface for users to register their custom logic 
and enrich failures with extra metadata labels (string key-value pairs).
+This enables users to implement their own failure enrichment plugins to 
categorize job failures, expose custom metrics, or make calls to external 
notification systems.

Review Comment:
   side-note: bit surprised that something that is advertised as being able to 
talk to external systems has no lifecycle hooks for managing resources (like a 
http client).






[GitHub] [flink] zentol commented on a diff in pull request #22468: [FLINK-31889] Add documentation for implementing/loading enrichers

2023-09-07 Thread via GitHub


zentol commented on code in PR #22468:
URL: https://github.com/apache/flink/pull/22468#discussion_r1318902851


##
docs/content/docs/deployment/advanced/failure_enrichers.md:
##
@@ -0,0 +1,106 @@
+---
+title: "Failure Enrichers"
+nav-title: failure-enrichers
+nav-parent_id: advanced
+nav-pos: 3
+---
+
+
+## Custom failure enrichers
+Flink provides a pluggable interface for users to register their custom logic 
and enrich failures with extra metadata labels (string key-value pairs).
+This enables users to implement their own failure enrichment plugins to 
categorize job failures, expose custom metrics, or make calls to external 
notification systems.
+
+FailureEnrichers are triggered every time an exception is reported at runtime 
by the JobManager.
+Every FailureEnricher may asynchronously return labels associated with the 
failure that are then exposed via the JobManager's REST API (e.g., a 
'type:System' label implying the failure is categorized as a system error).
+
+
+### Implement a plugin for your custom enricher
+
+To implement a custom FailureEnricher plugin, you need to:
+
+- Add your own FailureEnricher by implementing the {{< gh_link 
file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricher.java"
 name="FailureEnricher" >}} interface.
+
+- Add your own FailureEnricherFactory by implementing the {{< gh_link 
file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricherFactory.java"
 name="FailureEnricherFactory" >}} interface.
+
+- Add a service entry. Create a file 
`META-INF/services/org.apache.flink.core.failure.FailureEnricherFactory` which 
contains the class name of your failure enricher factory class (see [Java 
Service 
Loader](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/ServiceLoader.html)
 docs for more details).
+
+
+Then, create a jar which includes your `FailureEnricher`, 
`FailureEnricherFactory`, `META-INF/services/` and all external dependencies.
+Make a directory in `plugins/` of your Flink distribution with an arbitrary 
name, e.g. "failure-enrichment", and put the jar into this directory.
+See [Flink Plugin]({% link deployment/filesystems/plugins.md %}) for more 
details.
+
+{{< hint warning >}}
+Note that every FailureEnricher should define a set of {{< gh_link file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricher.java" name="output keys" >}} that may be associated with values. This set of keys has to be unique across enrichers, otherwise values associated with them may be ignored.
+{{< /hint >}}
+
+FailureEnricherFactory example:
+
+``` java
+public class TestFailureEnricherFactory implements FailureEnricherFactory {
+
+    @Override
+    public FailureEnricher createFailureEnricher(Configuration conf) {
+        return new CustomEnricher();
+    }
+}
+```
+
+FailureEnricher example:
+
+``` java
+public class CustomEnricher implements FailureEnricher {
+    private final Set<String> outputKeys;
+
+    public CustomEnricher() {
+        this.outputKeys = Collections.singleton("labelKey");
+    }
+
+    @Override
+    public Set<String> getOutputKeys() {
+        return outputKeys;
+    }
+
+    @Override
+    public CompletableFuture<Map<String, String>> processFailure(
+            Throwable cause, Context context) {
+        return CompletableFuture.completedFuture(
+                Collections.singletonMap("labelKey", "labelValue"));
+    }
+}
+```
+
+### Configuration
+
+The JobManager loads FailureEnricher plugins at startup. To make sure your FailureEnrichers are loaded, all class names should be defined as part of [jobmanager.failure-enrichers configuration]({{< ref "docs/deployment/config#jobmanager-failure-enrichers" >}}).
+  If this configuration is empty, NO enrichers will be started. Example:
+```
+jobmanager.failure-enrichers = 
org.apache.flink.test.plugin.jar.failure.CustomEnricher

Review Comment:
   side-note: If multiple enrichers are used this option can become a bit 
cumbersome for users, and a structure like we use for metrics reporters 
could've solved that (and paved a way to make them configurable!)
   
   ```
   jobmanager.failure-enrichers.enricher1.class = 
org.apache.flink.test.plugin.jar.failure.CustomEnricher
   ```



##
docs/content/docs/deployment/advanced/failure_enrichers.md:
##
@@ -0,0 +1,106 @@
+---
+title: "Failure Enrichers"
+nav-title: failure-enrichers
+nav-parent_id: advanced
+nav-pos: 3
+---
+
+
+## Custom failure enrichers
+Flink provides a pluggable interface for users to register their custom logic 
and enrich failures with extra metadata labels (string key-value pairs).
+This enables users to implement their own failure enrichment plugins to 
categorize job failures, expose custom metrics, or make calls to external 
notification systems.
+
+FailureEnrichers are triggered every time an exception is reported at runtime 
by the JobManager.
+Every FailureEnricher may asynchronously return labels associated with the failure that are then exposed via the JobManager's REST API (e.g., a 'type:System' label implying the failure is categorized as a system error).

[GitHub] [flink] zentol commented on a diff in pull request #22468: [FLINK-31889] Add documentation for implementing/loading enrichers

2023-09-07 Thread via GitHub


zentol commented on code in PR #22468:
URL: https://github.com/apache/flink/pull/22468#discussion_r1318898982


##
docs/content/docs/deployment/advanced/failure_enrichers.md:
##
@@ -0,0 +1,112 @@
+---
+title: "Failure Enrichers"
+nav-title: failure-enrichers
+nav-parent_id: advanced
+nav-pos: 3
+---
+
+
+## Custom failure enrichers
+Flink provides a pluggable interface for users to register their custom logic and enrich failures with extra metadata labels (string key-value pairs).
+The goal is to enable developers to implement their own failure enrichment plugins to categorize job failures, expose custom metrics, make calls to external notification systems, and more.
+
+FailureEnrichers are triggered every time an exception is reported at runtime 
by the JobManager.
+Every FailureEnricher may asynchronously return labels associated with the 
failure that are then exposed via the JobManager's Rest interface (e.g., a 
'type:System' label implying the failure is categorized as a system error).
+
+
+### Implement a plugin for your custom enricher
+
+To implement a custom FailureEnricher plugin, you need to:
+
+- Add your own FailureEnricher by implementing the {{< gh_link 
file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricher.java"
 name="FailureEnricher" >}} interface.
+
+- Add your own FailureEnricherFactory by implementing the {{< gh_link 
file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricherFactory.java"
 name="FailureEnricherFactory" >}} interface.
+
+- Add a service entry. Create a file 
`META-INF/services/org.apache.flink.core.failure.FailureEnricherFactory` which 
contains the class name of your failure enricher factory class (see [Java 
Service 
Loader](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/ServiceLoader.html)
 docs for more details).
+
+
+Then, create a jar which includes your `FailureEnricher`, 
`FailureEnricherFactory`, `META-INF/services/` and all the external 
dependencies.
+Make a directory in `plugins/` of your Flink distribution with an arbitrary 
name, e.g. "failure-enrichment", and put the jar into this directory.
+See [Flink Plugin]({% link deployment/filesystems/plugins.md %}) for more 
details.
+
+{{< hint warning >}}
+Note that every FailureEnricher should define a set of {{< gh_link file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricher.java" name="output keys" >}} that may be associated with values. This set of keys has to be unique across enrichers, otherwise the values associated with them may be ignored.
+{{< /hint >}}
+
+FailureEnricherFactory example:
+
+``` java
+public class TestFailureEnricherFactory implements FailureEnricherFactory {
+
+    @Override
+    public FailureEnricher createFailureEnricher(Configuration conf) {
+        return new CustomEnricher();
+    }
+}
+```
+
+FailureEnricher example:
+
+``` java
+public class CustomEnricher implements FailureEnricher {
+    private final Set<String> outputKeys;
+
+    public CustomEnricher() {
+        this.outputKeys = Collections.singleton("labelKey");
+    }
+
+    @Override
+    public Set<String> getOutputKeys() {
+        return outputKeys;
+    }
+
+    @Override
+    public CompletableFuture<Map<String, String>> processFailure(
+            Throwable cause, Context context) {
+        return CompletableFuture.completedFuture(
+                Collections.singletonMap("labelKey", "labelValue"));
+    }
+}
+```
+
+### Configuration
+
+The JobManager loads FailureEnricher plugins at startup. To make sure your FailureEnrichers are loaded:
+* All class names should be defined as part of {{< javadoc 
file="org/apache/flink/configuration/JobManagerOptions.html#FAILURE_ENRICHERS_LIST"
 name="jobmanager.failure-enrichers">}} configuration.
+  If this configuration is empty, NO enrichers will be started. Example:
+```
+jobmanager.failure-enrichers = 
org.apache.flink.test.plugin.jar.failure.CustomEnricher

Review Comment:
   Did you miss this one?






[GitHub] [flink] zentol commented on a diff in pull request #22468: [FLINK-31889] Add documentation for implementing/loading enrichers

2023-09-07 Thread via GitHub


zentol commented on code in PR #22468:
URL: https://github.com/apache/flink/pull/22468#discussion_r1318898608


##
docs/content/docs/deployment/advanced/failure_enrichers.md:
##
@@ -0,0 +1,112 @@
+---
+title: "Failure Enrichers"
+nav-title: failure-enrichers
+nav-parent_id: advanced
+nav-pos: 3
+---
+
+
+## Custom failure enrichers
+Flink provides a pluggable interface for users to register their custom logic and enrich failures with extra metadata labels (string key-value pairs).
+The goal is to enable developers to implement their own failure enrichment plugins to categorize job failures, expose custom metrics, make calls to external notification systems, and more.
+
+FailureEnrichers are triggered every time an exception is reported at runtime 
by the JobManager.
+Every FailureEnricher may asynchronously return labels associated with the 
failure that are then exposed via the JobManager's Rest interface (e.g., a 
'type:System' label implying the failure is categorized as a system error).
+
+
+### Implement a plugin for your custom enricher
+
+To implement a custom FailureEnricher plugin, you need to:
+
+- Add your own FailureEnricher by implementing the {{< gh_link 
file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricher.java"
 name="FailureEnricher" >}} interface.
+
+- Add your own FailureEnricherFactory by implementing the {{< gh_link 
file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricherFactory.java"
 name="FailureEnricherFactory" >}} interface.
+
+- Add a service entry. Create a file 
`META-INF/services/org.apache.flink.core.failure.FailureEnricherFactory` which 
contains the class name of your failure enricher factory class (see [Java 
Service 
Loader](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/ServiceLoader.html)
 docs for more details).
+
+
+Then, create a jar which includes your `FailureEnricher`, 
`FailureEnricherFactory`, `META-INF/services/` and all the external 
dependencies.
+Make a directory in `plugins/` of your Flink distribution with an arbitrary 
name, e.g. "failure-enrichment", and put the jar into this directory.
+See [Flink Plugin]({% link deployment/filesystems/plugins.md %}) for more 
details.
+
+{{< hint warning >}}
+Note that every FailureEnricher should define a set of {{< gh_link file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricher.java" name="output keys" >}} that may be associated with values. This set of keys has to be unique across enrichers, otherwise the values associated with them may be ignored.

Review Comment:
   I don't understand what the implications "ignoring values" is. In practice 
all that happens if that one enricher would override the label of another one, 
right? Can we just write that?






[jira] [Commented] (FLINK-32962) Failure to install python dependencies from requirements file

2023-09-07 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762807#comment-17762807
 ] 

Hong Liang Teoh commented on FLINK-32962:
-

[~a.pilipenko]  should we also backport this to Flink 1.17 and 1.18?

> Failure to install python dependencies from requirements file
> -
>
> Key: FLINK-32962
> URL: https://issues.apache.org/jira/browse/FLINK-32962
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.15.4, 1.16.2, 1.18.0, 1.17.1
>Reporter: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
>
> We have encountered an issue when Flink fails to install python dependencies 
> from requirements file if python environment contains setuptools dependency 
> version 67.5.0 or above.
> Flink job fails with following error:
> {code:java}
> py4j.protocol.Py4JJavaError: An error occurred while calling o118.await.
> : java.util.concurrent.ExecutionException: 
> org.apache.flink.table.api.TableException: Failed to wait job finish
> at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> at 
> org.apache.flink.table.api.internal.TableResultImpl.awaitInternal(TableResultImpl.java:118)
> at 
> org.apache.flink.table.api.internal.TableResultImpl.await(TableResultImpl.java:81)
> ...
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.flink.client.program.ProgramInvocationException: Job failed 
> (JobID: 2ca4026944022ac4537c503464d4c47f)
> at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> at 
> org.apache.flink.table.api.internal.InsertResultProvider.hasNext(InsertResultProvider.java:83)
> ... 6 more
> Caused by: org.apache.flink.client.program.ProgramInvocationException: Job 
> failed (JobID: 2ca4026944022ac4537c503464d4c47f)
> at 
> org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:130)
> at 
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
> at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
> at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
> ...
> Caused by: java.io.IOException: java.io.IOException: Failed to execute the 
> command: /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install 
> --ignore-installed -r 
> /var/folders/rb/q_3h54_94b57gz_qkbp593vwgn/T/tm_localhost:63904-4178d3/blobStorage/job_2ca4026944022ac4537c503464d4c47f/blob_p-feb04bed919e628d98b1ef085111482f05bf43a1-1f5c3fda7cbdd6842e950e04d9d807c5
>  --install-option 
> --prefix=/var/folders/rb/q_3h54_94b57gz_qkbp593vwgn/T/python-dist-c17912db-5aba-439f-9e39-69cb8c957bec/python-requirements
> output:
> Usage:
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
>  [package-index-options] ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] -r 
>  [package-index-options] ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
> [-e]  ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
> [-e]  ...
>   /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] 
>  ...
> no such option: --install-option
> at 
> org.apache.flink.python.util.PythonEnvironmentManagerUtils.pipInstallRequirements(PythonEnvironmentManagerUtils.java:144)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.installRequirements(AbstractPythonEnvironmentManager.java:215)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.lambda$open$0(AbstractPythonEnvironmentManager.java:126)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager$PythonEnvResources.createResource(AbstractPythonEnvironmentManager.java:435)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager$PythonEnvResources.getOrAllocateSharedResource(AbstractPythonEnvironmentManager.java:402)
> at 
> org.apache.flink.python.env.AbstractPythonEnvironmentManager.open(AbstractPythonEnvironmentManager.java:114)
> at 
> org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:238)
> at 
> org.apache.flink.streaming.api.operators.python.process.AbstractExternalPythonFunctionOperator.open(AbstractExternalPythonFunctionOperator.java:57)
> at 
> 

[jira] [Commented] (FLINK-32962) Failure to install python dependencies from requirements file

2023-09-07 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762805#comment-17762805
 ] 

Hong Liang Teoh commented on FLINK-32962:
-

merged commit 
[{{1193e17}}|https://github.com/apache/flink/commit/1193e178f6c478bf493231fc53c74f03158c0ca9]
 into apache:master
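For context, the failing command above embedded pip's `--install-option` flag, which newer pip releases removed; passing `--prefix` directly is the supported replacement. Below is a minimal sketch (a hypothetical `PipCommandBuilder` helper, not Flink's actual `PythonEnvironmentManagerUtils` code) of assembling the install command without the removed flag:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: build the pip invocation without the removed
// --install-option flag, passing --prefix directly instead.
class PipCommandBuilder {
    static List<String> buildInstallCommand(String python, String requirements, String prefix) {
        List<String> args = new ArrayList<>();
        args.add(python);
        args.add("-m");
        args.add("pip");
        args.add("install");
        args.add("--ignore-installed");
        args.add("-r");
        args.add(requirements);
        // Modern pip accepts --prefix as a regular install option, so there is
        // no need to forward it through the removed --install-option flag.
        args.add("--prefix=" + prefix);
        return args;
    }

    public static void main(String[] argv) {
        System.out.println(buildInstallCommand("python3", "requirements.txt", "/tmp/py-req"));
    }
}
```

With this shape of command, pip versions that dropped `--install-option` no longer fail with "no such option".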


[GitHub] [flink] hlteoh37 merged pull request #23297: [FLINK-32962][python] Remove pip version check on installing dependencies.

2023-09-07 Thread via GitHub


hlteoh37 merged PR #23297:
URL: https://github.com/apache/flink/pull/23297


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] pgaref commented on pull request #22468: FLINK-31889: Add documentation for implementing/loading enrichers

2023-09-07 Thread via GitHub


pgaref commented on PR #22468:
URL: https://github.com/apache/flink/pull/22468#issuecomment-1710421376

   > This needs a copy in the chinese version of the docs. (that may be in 
english; translation is a follow-up)
   
   Thanks for the review @zentol !
   I addressed your comments and also created a copy in the Chinese version of the 
docs, which will be translated as part of 
https://issues.apache.org/jira/browse/FLINK-33061 by @wangzzu 





[GitHub] [flink] flinkbot commented on pull request #23372: [FLINK-33042][flamegraph] Allow trigger flamegraph when task is initializing

2023-09-07 Thread via GitHub


flinkbot commented on PR #23372:
URL: https://github.com/apache/flink/pull/23372#issuecomment-1710406803

   
   ## CI report:
   
   * 5a37a649a983c2f23d4de33ccac525bd9ead16bb UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   





[GitHub] [flink] hlteoh37 commented on pull request #23297: [FLINK-32962][python] Remove pip version check on installing dependencies.

2023-09-07 Thread via GitHub


hlteoh37 commented on PR #23297:
URL: https://github.com/apache/flink/pull/23297#issuecomment-1710397554

   @JingGe @HuangXingBo  Any thoughts on this?





[GitHub] [flink] 1996fanrui opened a new pull request, #23372: [FLINK-33042][flamegraph] Allow trigger flamegraph when task is initializing

2023-09-07 Thread via GitHub


1996fanrui opened a new pull request, #23372:
URL: https://github.com/apache/flink/pull/23372

   ## What is the purpose of the change
   
   Currently, the flame graph can only be triggered while a task is running.
   
   After [FLINK-17012](https://issues.apache.org/jira/browse/FLINK-17012) and 
[FLINK-22215](https://issues.apache.org/jira/browse/FLINK-22215), Flink split 
the former running state into running and initializing. We should also allow 
triggering the flame graph while a task is initializing, e.g. when 
initialization is very slow and we need to troubleshoot it.
   
   
   ## Brief change log
   
   [FLINK-33042][flamegraph] Allow trigger flamegraph when task is initializing
   
   
   ## Verifying this change
   
   Added `@Parameter public ExecutionState executionState;` to 
`VertexThreadInfoTrackerTest`, so all tests run with both the running and 
initializing states.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
   





[jira] [Commented] (FLINK-33018) GCP Pubsub PubSubConsumingTest.testStoppingConnectorWhenDeserializationSchemaIndicatesEndOfStream failed

2023-09-07 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762793#comment-17762793
 ] 

Ryan Skraba commented on FLINK-33018:
-

I think I have a fix -- it looks like occasionally (every 10K runs or so) we 
hit the explicit cancel() written in this specific test before we hit the 
implicit cancel() triggered by the end-of-stream message being discovered. I'm 
letting this run for a while before submitting a PR. This test also slows down 
over consecutive runs, which might indicate a leak, but I'm hoping that is not 
the case.
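The race described above can be modeled deterministically. Here is a minimal single-threaded sketch (a hypothetical `CancelRaceSketch`, not the actual PubSub consumer code) showing how an explicit cancel that fires before the end-of-stream marker is consumed truncates the collected output:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-threaded model of the race: an external cancel()
// that fires before the end-of-stream marker is consumed truncates the output.
class CancelRaceSketch {
    static List<String> consume(List<String> messages, String endOfStream, int cancelAfter) {
        List<String> out = new ArrayList<>();
        for (String msg : messages) {
            if (out.size() == cancelAfter) {   // explicit cancel() wins the race
                break;
            }
            out.add(msg);
            if (msg.equals(endOfStream)) {     // implicit cancel() via end-of-stream
                break;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> msgs = List.of("1", "2", "3");
        // End-of-stream is reached first: all three elements arrive.
        System.out.println(consume(msgs, "3", Integer.MAX_VALUE));
        // Explicit cancel fires after two elements: the rare failing case.
        System.out.println(consume(msgs, "3", 2));
    }
}
```

In the flaky runs, the second scenario is the one producing `["1", "2"]` instead of `["1", "2", "3"]`.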

> GCP Pubsub 
> PubSubConsumingTest.testStoppingConnectorWhenDeserializationSchemaIndicatesEndOfStream
>  failed
> 
>
> Key: FLINK-33018
> URL: https://issues.apache.org/jira/browse/FLINK-33018
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Google Cloud PubSub
>Affects Versions: gcp-pubsub-3.0.2
>Reporter: Martijn Visser
>Priority: Blocker
>
> https://github.com/apache/flink-connector-gcp-pubsub/actions/runs/6061318336/job/16446392844#step:13:507
> {code:java}
> [INFO] 
> [INFO] Results:
> [INFO] 
> Error:  Failures: 
> Error:
> PubSubConsumingTest.testStoppingConnectorWhenDeserializationSchemaIndicatesEndOfStream:119
>  
> expected: ["1", "2", "3"]
>  but was: ["1", "2"]
> [INFO] 
> Error:  Tests run: 30, Failures: 1, Errors: 0, Skipped: 0
> [INFO] 
> [INFO] 
> 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-15736) Support Java 17 (LTS)

2023-09-07 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-15736:
--
Release Note: Apache Flink was made ready to compile and run with Java 17 
(LTS). This feature is still in beta mode. Issues should be reported in Flink's 
bug tracker.  (was: Apache Flink was made ready to compile and run with Java 17 
(LTS).)

> Support Java 17 (LTS)
> -
>
> Key: FLINK-15736
> URL: https://issues.apache.org/jira/browse/FLINK-15736
> Project: Flink
>  Issue Type: New Feature
>  Components: Build System
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: auto-deprioritized-major, pull-request-available, 
> stale-assigned
> Fix For: 1.18.0
>
>
> Long-term issue for preparing Flink for Java 17.





[jira] [Commented] (FLINK-33061) Translate failure-enricher documentation to Chinese

2023-09-07 Thread Panagiotis Garefalakis (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762782#comment-17762782
 ] 

Panagiotis Garefalakis commented on FLINK-33061:


[~wangm92]  feel free to take over

> Translate failure-enricher documentation to Chinese
> ---
>
> Key: FLINK-33061
> URL: https://issues.apache.org/jira/browse/FLINK-33061
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Priority: Not a Priority
>






[jira] [Updated] (FLINK-33061) Translate failure-enricher documentation to Chinese

2023-09-07 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated FLINK-33061:
---
Summary: Translate failure-enricher documentation to Chinese  (was: 
Translate failure-enricher documentation)







[jira] [Created] (FLINK-33061) Translate failure-enricher documentation

2023-09-07 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created FLINK-33061:
--

 Summary: Translate failure-enricher documentation
 Key: FLINK-33061
 URL: https://issues.apache.org/jira/browse/FLINK-33061
 Project: Flink
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis








[jira] [Commented] (FLINK-33018) GCP Pubsub PubSubConsumingTest.testStoppingConnectorWhenDeserializationSchemaIndicatesEndOfStream failed

2023-09-07 Thread Jayadeep Jayaraman (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762765#comment-17762765
 ] 

Jayadeep Jayaraman commented on FLINK-33018:


I ran it 23K times yesterday and didn't face any issue, hence I stopped the 
test :). Good that you were able to reproduce the error!






[jira] [Assigned] (FLINK-30649) Shutting down MiniCluster times out

2023-09-07 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl reassigned FLINK-30649:
-

Assignee: (was: Matthias Pohl)

> Shutting down MiniCluster times out
> ---
>
> Key: FLINK-30649
> URL: https://issues.apache.org/jira/browse/FLINK-30649
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes, Test Infrastructure
>Affects Versions: 1.17.0
>Reporter: Matthias Pohl
>Priority: Critical
>  Labels: stale-assigned, starter, test-stability
>
> {{Run kubernetes session test (default input)}} failed with a timeout.
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44748=logs=bea52777-eaf8-5663-8482-18fbc3630e81=b2642e3a-5b86-574d-4c8a-f7e2842bfb14=6317]
> It appears that there was some issue with shutting down the pods of the 
> MiniCluster:
> {code:java}
> 2023-01-12T08:22:13.1388597Z timed out waiting for the condition on 
> pods/flink-native-k8s-session-1-7dc9976688-gq788
> 2023-01-12T08:22:13.1390040Z timed out waiting for the condition on 
> pods/flink-native-k8s-session-1-taskmanager-1-1 {code}





[jira] [Commented] (FLINK-31889) Add documentation for implementing/loading enrichers

2023-09-07 Thread Matt Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762746#comment-17762746
 ] 

Matt Wang commented on FLINK-31889:
---

[~pgaref] if you create a ticket for the Chinese documentation, you can assign 
it to me; I will translate it

> Add documentation for implementing/loading enrichers
> 
>
> Key: FLINK-31889
> URL: https://issues.apache.org/jira/browse/FLINK-31889
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>
> Describe how enrichers can be implemented and loaded to Flink as part of 
> documentation





[jira] [Updated] (FLINK-24943) SequenceNumber class is not POJO type

2023-09-07 Thread Danny Cranmer (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Cranmer updated FLINK-24943:
--
Fix Version/s: (was: 1.15.5)
   (was: 1.16.3)

> SequenceNumber class is not POJO type
> -
>
> Key: FLINK-24943
> URL: https://issues.apache.org/jira/browse/FLINK-24943
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kinesis
>Affects Versions: 1.12.5, 1.13.3, 1.14.4, 1.15.0
>Reporter: Alexander Egorov
>Assignee: Alexander Egorov
>Priority: Minor
>  Labels: pull-request-available, stale-assigned
> Fix For: aws-connector-4.2.0
>
>
> The SequenceNumber class is currently part of the "Kinesis-Stream-Shard-State", 
> but it does not follow the requirements of a POJO-compatible type. Because of 
> that, we are getting a warning like this:
> {{TypeExtractor - class 
> software.amazon.kinesis.connectors.flink.model.SequenceNumber does not 
> contain a setter for field sequenceNumber}}
> While the warning itself, or the inability to use an optimal serializer for 
> such a small state, is not the problem, this warning prevents us from 
> disabling Generic Types via {{disableGenericTypes()}}
> So, the problem is similar to 
> https://issues.apache.org/jira/browse/FLINK-15904





[jira] [Commented] (FLINK-24943) SequenceNumber class is not POJO type

2023-09-07 Thread Danny Cranmer (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762736#comment-17762736
 ] 

Danny Cranmer commented on FLINK-24943:
---

[~Cyberness] can you please recreate this PR against 
https://github.com/apache/flink-connector-aws?






[jira] [Commented] (FLINK-24943) SequenceNumber class is not POJO type

2023-09-07 Thread Danny Cranmer (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762731#comment-17762731
 ] 

Danny Cranmer commented on FLINK-24943:
---

[~martijnvisser] we will pick this up. Thanks for the ping.






[jira] [Commented] (FLINK-33018) GCP Pubsub PubSubConsumingTest.testStoppingConnectorWhenDeserializationSchemaIndicatesEndOfStream failed

2023-09-07 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762728#comment-17762728
 ] 

Ryan Skraba commented on FLINK-33018:
-

Hey, I can reproduce this -- all you need to do is run this test about 12,000 
times ;)  (using IntelliJ repeat until fail).

{code}
Connected to the target VM, address: '127.0.0.1:44879', transport: 'socket'

org.opentest4j.AssertionFailedError: 
expected: ["1", "2", "3"]
 but was: ["1", "2"]
Expected :["1", "2", "3"]
Actual   :["1", "2"]
{code}

I'm taking a look.






[jira] [Commented] (FLINK-31895) End-to-end integration tests for failure labels

2023-09-07 Thread Panagiotis Garefalakis (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762718#comment-17762718
 ] 

Panagiotis Garefalakis commented on FLINK-31895:


Fine by me [~wangm92]  

> End-to-end integration tests for failure labels
> ---
>
> Key: FLINK-31895
> URL: https://issues.apache.org/jira/browse/FLINK-31895
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Priority: Major
>






[jira] [Commented] (FLINK-31895) End-to-end integration tests for failure labels

2023-09-07 Thread Matt Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762715#comment-17762715
 ] 

Matt Wang commented on FLINK-31895:
---

[~pgaref] I would like to take this ticket if possible







[GitHub] [flink] wangyang0918 commented on a diff in pull request #23164: [FLINK- 32775]Add parent dir of files to classpath using yarn.provided.lib.dirs

2023-09-07 Thread via GitHub


wangyang0918 commented on code in PR #23164:
URL: https://github.com/apache/flink/pull/23164#discussion_r1318475489


##
flink-yarn/src/main/java/org/apache/flink/yarn/YarnApplicationFileUploader.java:
##
@@ -360,11 +362,27 @@ List registerProvidedLocalResources() {
 envShipResourceList.add(descriptor);
 
 if (!isFlinkDistJar(filePath.getName()) && 
!isPlugin(filePath)) {
-classPaths.add(fileName);
+URI parentDirectoryUri = new 
Path(fileName).getParent().toUri();
+String relativeParentDirectory =

Review Comment:
   ```
   
-Dyarn.provided.lib.dirs="hdfs://myhdfs/tmp/flink-1.18-SNAPSHOT/lib;hdfs://myhdfs/tmp/flink-1.18-SNAPSHOT/plugins;hdfs://myhdfs/tmp/flink-1.18-SNAPSHOT/bin"
   ```
   
   It seems that we already do the relativization in 
`getAllFilesInProvidedLibDirs`, so the input `providedLibDirs` is already a 
relative path (e.g. `bin/config.sh`, `bin/find-flink-home.sh`).
   
   In addition, adding a non-jar file to the classpath does not bring much 
benefit. I suggest only adding the parent directory to the classpath; the YARN 
ship files have the same behavior.
   
   Then we could have simpler code here.
   ```
   if (fileName.endsWith("jar")) {
   resourcesJar.add(fileName);
   } else {
   resourcesDir.add(new Path(fileName).getParent().toString());
   }
   ```






[jira] [Commented] (FLINK-33000) SqlGatewayServiceITCase should utilize TestExecutorExtension instead of using a ThreadFactory

2023-09-07 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762707#comment-17762707
 ] 

Matthias Pohl commented on FLINK-33000:
---

Hi [~ade_hddu], thanks for offering your help. What are the things that are 
still unclear to you?

> SqlGatewayServiceITCase should utilize TestExecutorExtension instead of using 
> a ThreadFactory
> -
>
> Key: FLINK-33000
> URL: https://issues.apache.org/jira/browse/FLINK-33000
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Gateway, Tests
>Affects Versions: 1.16.2, 1.18.0, 1.17.1, 1.19.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: starter
>
> {{SqlGatewayServiceITCase}} uses an {{ExecutorThreadFactory}} for its 
> asynchronous operations. Instead, one should use {{TestExecutorExtension}} to 
> ensure proper cleanup of threads.
> We might also want to remove the {{AbstractTestBase}} parent class, because 
> that uses JUnit 4 whereas {{SqlGatewayServiceITCase}} is already based on 
> JUnit 5.





[jira] [Created] (FLINK-33060) Fix the javadoc of ListState.update/addAll about not allowing null value

2023-09-07 Thread Zakelly Lan (Jira)
Zakelly Lan created FLINK-33060:
---

 Summary: Fix the javadoc of ListState.update/addAll about not 
allowing null value
 Key: FLINK-33060
 URL: https://issues.apache.org/jira/browse/FLINK-33060
 Project: Flink
  Issue Type: Bug
Reporter: Zakelly Lan


After FLINK-8411, ListState.update/add/addAll no longer allow a null value to 
be passed in, while the javadoc still says "If null is passed in, the state 
value will remain unchanged". This should be fixed.
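A minimal sketch of the behavior the javadoc should describe (a hypothetical stand-in class, not Flink's actual {{ListState}} implementation): {{update()}} rejects null instead of leaving the state unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the documented-vs-actual behavior: after FLINK-8411,
// update() rejects null rather than leaving the state unchanged.
class ListStateSketch {
    private List<String> state = new ArrayList<>();

    // Replaces the state; values must not be null (contrary to the old javadoc).
    void update(List<String> values) {
        if (values == null) {
            throw new NullPointerException("List of values to add cannot be null.");
        }
        state = new ArrayList<>(values);
    }

    List<String> get() {
        return state;
    }

    public static void main(String[] args) {
        ListStateSketch s = new ListStateSketch();
        s.update(List.of("a", "b"));
        System.out.println(s.get());
    }
}
```

The fixed javadoc should therefore state that passing null throws an exception, matching the runtime behavior.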





[jira] [Assigned] (FLINK-33060) Fix the javadoc of ListState.update/addAll about not allowing null value

2023-09-07 Thread Zakelly Lan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zakelly Lan reassigned FLINK-33060:
---

Assignee: Zakelly Lan




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33053) Watcher leak in Zookeeper HA mode

2023-09-07 Thread Yangze Guo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762700#comment-17762700
 ] 

Yangze Guo commented on FLINK-33053:


Thanks for the pointer [~mapohl]. I'll try to get a debug log and heap dump.

> Watcher leak in Zookeeper HA mode
> -
>
> Key: FLINK-33053
> URL: https://issues.apache.org/jira/browse/FLINK-33053
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.0, 1.17.1
>Reporter: Yangze Guo
>Priority: Critical
>
> We observed a watcher leak in our OLAP stress test when enabling Zookeeper HA 
> mode: the TM's watches on the JobMaster leader are not removed after the job 
> finishes.
> Here is how we reproduce this issue:
>  - Start a session cluster and enable Zookeeper HA mode.
>  - Continuously and concurrently submit short queries, e.g. WordCount to the 
> cluster.
>  - echo -n wchp | nc \{zk host} \{zk port} to get current watches.
> We can see a lot of watches on 
> /flink/\{cluster_name}/leader/\{job_id}/connection_info.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] flinkbot commented on pull request #23371: [FLINK-5865][state] Unwrapping exceptions and throw the original ones in state interfaces

2023-09-07 Thread via GitHub


flinkbot commented on PR #23371:
URL: https://github.com/apache/flink/pull/23371#issuecomment-1709932341

   
   ## CI report:
   
   * 4198ccb50d2dc4f18ed2ce1b3d57b2b04e96de32 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] Zakelly opened a new pull request, #23371: [FLINK-5865][state] Unwrapping exceptions and throw the original ones in state interfaces

2023-09-07 Thread via GitHub


Zakelly opened a new pull request, #23371:
URL: https://github.com/apache/flink/pull/23371

   ## What is the purpose of the change
   
   Some implementations of the state interfaces wrap the original exception and 
rethrow it as a FlinkRuntimeException. Since all of these interfaces already 
declare `throws Exception`, wrapping exceptions into FlinkRuntimeException is 
unnecessary and makes the logs less intuitive for users.
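A minimal sketch of the before/after pattern this cleanup targets (illustrative only; class and method names are hypothetical, not the actual Flink code):

```java
// Illustrative sketch of the exception-unwrapping cleanup; not actual Flink code.
final class StateAccessSketch {

    static class BackendException extends Exception {
        BackendException(String msg) { super(msg); }
    }

    // Before: the original exception is wrapped, so users see an extra
    // RuntimeException layer in the logs.
    static Object getWrapped(boolean fail) {
        try {
            return readFromBackend(fail);
        } catch (Exception e) {
            throw new RuntimeException("Error while getting state", e); // extra layer
        }
    }

    // After: the interface already declares `throws Exception`,
    // so the original exception can propagate as-is.
    static Object getUnwrapped(boolean fail) throws Exception {
        return readFromBackend(fail);
    }

    private static Object readFromBackend(boolean fail) throws BackendException {
        if (fail) {
            throw new BackendException("backend failure");
        }
        return "value";
    }
}
```

With the unwrapped variant, the exception a user sees in the log is the original one rather than a generic runtime wrapper around it.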
   
   ## Brief change log
   
   - Go through the implementations of the state interfaces and unwrap the 
try...catch blocks on a best-effort basis.
   
   ## Verifying this change
   
   This change is a code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (**yes** / no 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't 
know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] zentol commented on a diff in pull request #22468: FLINK-31889: Add documentation for implementing/loading enrichers

2023-09-07 Thread via GitHub


zentol commented on code in PR #22468:
URL: https://github.com/apache/flink/pull/22468#discussion_r1318386858


##
docs/content/docs/deployment/advanced/failure_enrichers.md:
##
@@ -0,0 +1,112 @@
+---
+title: "Failure Enrichers"
+nav-title: failure-enrichers
+nav-parent_id: advanced
+nav-pos: 3
+---
+
+
+## Custom failure enrichers
+Flink provides a pluggable interface for users to register their custom logic 
and enrich failures with extra metadata labels (string kv pairs).
+The goal is to enable developers implement their own failure enrichment 
plugins to categorize job failures, expose custom metrics, make calls to 
external notification systems, and more.
+
+FailureEnrichers are triggered every time an exception is reported at runtime 
by the JobManager.
+Every FailureEnricher may asynchronously return labels associated with the 
failure that are then exposed via the JobManager's Rest interface (e.g., a 
'type:System' label implying the failure is categorized as a system error).
+
+
+### Implement a plugin for your custom enricher
+
+To implement a custom FailureEnricher plugin, you need to:
+
+- Add your own FailureEnricher by implementing the {{< gh_link 
file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricher.java"
 name="FailureEnricher" >}} interface.
+
+- Add your own FailureEnricherFactory by implementing the {{< gh_link 
file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricherFactory.java"
 name="FailureEnricherFactory" >}} interface.
+
+- Add a service entry. Create a file 
`META-INF/services/org.apache.flink.core.failure.FailureEnricherFactory` which 
contains the class name of your failure enricher factory class (see [Java 
Service 
Loader](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/ServiceLoader.html)
 docs for more details).
+
+
+Then, create a jar which includes your `FailureEnricher`, 
`FailureEnricherFactory`, `META-INF/services/` and all the external 
dependencies.
+Make a directory in `plugins/` of your Flink distribution with an arbitrary 
name, e.g. "failure-enrichment", and put the jar into this directory.
+See [Flink Plugin]({% link deployment/filesystems/plugins.md %}) for more 
details.
+
+{{< hint warning >}}
+Note that every FailureEnricher should have defined a set of {{< gh_link 
file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricher.java"
 name="output keys" >}} that may be associated with values. This set of keys 
has to be unique across enrichers otherwise may be ignored.

Review Comment:
   > otherwise may be ignored
   
   Otherwise _what_ may be ignored?



##
docs/content/docs/deployment/advanced/failure_enrichers.md:
##
@@ -0,0 +1,112 @@
+---
+title: "Failure Enrichers"
+nav-title: failure-enrichers
+nav-parent_id: advanced
+nav-pos: 3
+---
+
+
+## Custom failure enrichers
+Flink provides a pluggable interface for users to register their custom logic 
and enrich failures with extra metadata labels (string kv pairs).

Review Comment:
   ```suggestion
   Flink provides a pluggable interface for users to register their custom 
logic and enrich failures with extra metadata labels (string key-value pairs).
   ```



##
docs/content/docs/deployment/advanced/failure_enrichers.md:
##
@@ -0,0 +1,112 @@
+---
+title: "Failure Enrichers"
+nav-title: failure-enrichers
+nav-parent_id: advanced
+nav-pos: 3
+---
+
+
+## Custom failure enrichers
+Flink provides a pluggable interface for users to register their custom logic 
and enrich failures with extra metadata labels (string kv pairs).
+The goal is to enable developers implement their own failure enrichment 
plugins to categorize job failures, expose custom metrics, make calls to 
external notification systems, and more.
+
+FailureEnrichers are triggered every time an exception is reported at runtime 
by the JobManager.
+Every FailureEnricher may asynchronously return labels associated with the 
failure that are then exposed via the JobManager's Rest interface (e.g., a 
'type:System' label implying the failure is categorized as a system error).
+
+
+### Implement a plugin for your custom enricher
+
+To implement a custom FailureEnricher plugin, you need to:
+
+- Add your own FailureEnricher by implementing the {{< gh_link 
file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricher.java"
 name="FailureEnricher" >}} interface.
+
+- Add your own FailureEnricherFactory by implementing the {{< gh_link 
file="/flink-core/src/main/java/org/apache/flink/core/failure/FailureEnricherFactory.java"
 name="FailureEnricherFactory" >}} interface.
+
+- Add a service entry. Create a file 
`META-INF/services/org.apache.flink.core.failure.FailureEnricherFactory` which 
contains the class name of your failure enricher factory class (see [Java 
Service 
Loader](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/ServiceLoader.html)
 docs for more details).
+
+
+Then, 

[jira] [Commented] (FLINK-30314) Unable to read all records from compressed delimited format file

2023-09-07 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762689#comment-17762689
 ] 

Etienne Chauchot commented on FLINK-30314:
--

This ticket will be fixed by 
[this|https://issues.apache.org/jira/browse/FLINK-33059] feature.

> Unable to read all records from compressed delimited format file
> 
>
> Key: FLINK-30314
> URL: https://issues.apache.org/jira/browse/FLINK-30314
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.16.0, 1.15.2, 1.17.1
>Reporter: Dmitry Yaraev
>Assignee: Etienne Chauchot
>Priority: Major
> Attachments: input.json, input.json.gz, input.json.zip
>
>
> I am reading gzipped JSON line-delimited files in the batch mode using 
> [FileSystem 
> Connector|https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/table/filesystem/].
>  For reading the files a new table is created with the following 
> configuration:
> {code:sql}
> CREATE TEMPORARY TABLE `my_database`.`my_table` (
>   `my_field1` BIGINT,
>   `my_field2` INT,
>   `my_field3` VARCHAR(2147483647)
> ) WITH (
>   'connector' = 'filesystem',
>   'path' = 'path-to-input-dir',
>   'format' = 'json',
>   'json.ignore-parse-errors' = 'false',
>   'json.fail-on-missing-field' = 'true'
> ) {code}
> In the input directory I have two files: input-0.json.gz and 
> input-1.json.gz. As the filenames suggest, the files are compressed with 
> GZIP. Each of the files contains 10 records. The issue is that only 2 
> records from each file are read (4 in total). If decompressed versions of the 
> same data files are used, all 20 records are read.
> As far as I understand, that problem may be related to the fact that split 
> length, which is used when the files are read, is in fact the length of a 
> compressed file. So files are closed before all records are read from them 
> because read position of the decompressed file stream exceeds split length.
> Probably, it makes sense to add a flag to {{{}FSDataInputStream{}}}, so we 
> could identify whether the file is compressed or not. The flag can be set to 
> true in {{InputStreamFSInputWrapper}} because it is used for wrapping 
> compressed file streams. With such a flag it would be possible to distinguish 
> non-splittable compressed files and rely only on the end of the stream.
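The split-length mismatch described above is easy to demonstrate outside Flink. A self-contained sketch (plain Java, no Flink APIs; the record format merely mimics the ticket's JSON lines): the compressed byte count is smaller than the decompressed payload, so a reader that stops after "split length" decompressed bytes truncates the record stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Sketch: a gzipped file's on-disk length (the "split length") is smaller than
// its decompressed payload, so using it as a read limit drops records.
final class CompressedSplitSketch {

    /** Writes n newline-delimited JSON records into an in-memory gzip stream. */
    static byte[] gzipLines(int n) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            for (int i = 0; i < n; i++) {
                gz.write(("{\"my_field1\": " + i + "}\n").getBytes(StandardCharsets.UTF_8));
            }
        }
        return bos.toByteArray();
    }

    /**
     * Counts newline-delimited records, reading at most `limit` decompressed
     * bytes (limit < 0 means read to the end of the stream).
     */
    static int countRecords(byte[] compressed, long limit) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            int records = 0;
            long read = 0;
            int b;
            while ((limit < 0 || read < limit) && (b = != -1) {
                read++;
                if (b == '\n') {
                    records++;
                }
            }
            return records;
        }
    }
}
```

Counting with {{limit = compressed.length}} yields fewer than the 10 records actually present, which mirrors the behavior reported in this ticket; reading to the end of the stream yields all of them.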



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33059) Support transparent compression for file-connector for all file input formats

2023-09-07 Thread Etienne Chauchot (Jira)
Etienne Chauchot created FLINK-33059:


 Summary: Support transparent compression for file-connector for 
all file input formats
 Key: FLINK-33059
 URL: https://issues.apache.org/jira/browse/FLINK-33059
 Project: Flink
  Issue Type: Technical Debt
  Components: Connectors / FileSystem
Reporter: Etienne Chauchot
Assignee: Etienne Chauchot


Delimited file input formats (contrary to binary input formats, etc.) do not 
support compression via the existing decorator, because the split length is 
determined by the compressed file length, which leads to 
[this|https://issues.apache.org/jira/browse/FLINK-30314] bug. We should force 
reading the whole file split (as is done for binary input formats) on 
compressed files. Parallelism is still done at the file level (as now).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

