[jira] [Commented] (FLINK-33940) Update the auto-derivation rule of max parallelism for enlarged upscaling space

2023-12-25 Thread Rui Fan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800435#comment-17800435
 ] 

Rui Fan commented on FLINK-33940:
-

Thanks [~gyfora] for the reminder! Having many divisors is useful for avoiding 
data skew, so 840 makes sense to me. (y)

> Update the auto-derivation rule of max parallelism for enlarged upscaling 
> space
> ---
>
> Key: FLINK-33940
> URL: https://issues.apache.org/jira/browse/FLINK-33940
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Core
>Reporter: Zhanghao Chen
>Assignee: Zhanghao Chen
>Priority: Major
>
> *Background*
> The choice of the max parallelism of a stateful operator is important, as it 
> limits the upper bound of the parallelism of the operator while also adding 
> extra overhead when set too large. Currently, the max parallelism 
> of an operator is either fixed to a value specified by the API core / pipeline 
> option or auto-derived with the following rule:
> {{min(max(roundUpToPowerOfTwo(operatorParallelism * 1.5), 128), 32767)}}
> *Problem*
> Recently, the elasticity of Flink jobs has become more and more valued by 
> users. The current auto-derived max parallelism was introduced a long time 
> ago and only allows the operator parallelism to be roughly doubled, which is 
> not desirable for elasticity. Setting a max parallelism manually may not be 
> desirable either: users may not have sufficient expertise to select a good 
> max-parallelism value.
> *Proposal*
> Update the auto-derivation rule of max parallelism to derive a larger max 
> parallelism for a better elasticity experience out of the box. A candidate is 
> as follows:
> {{min(max(roundUpToPowerOfTwo(operatorParallelism * {*}5{*}), {*}1024{*}), 
> 32767)}}
> Looking forward to your opinions on this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] Draft [flink]

2023-12-25 Thread via GitHub


flinkbot commented on PR #23993:
URL: https://github.com/apache/flink/pull/23993#issuecomment-1869336037

   
   ## CI report:
   
   * f4137f701ba8790ef75aefdca3c0d54e4fb7e61a UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-33940) Update the auto-derivation rule of max parallelism for enlarged upscaling space

2023-12-25 Thread Gyula Fora (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800433#comment-17800433
 ] 

Gyula Fora commented on FLINK-33940:


Consider 840 instead of 1024: 840 has 32 divisors compared to the 11 divisors 
of 1024, and it's a smaller number. It's a much better max-parallelism setting 
in my opinion.

Please see my previous comment regarding highly composite numbers.
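To make the divisor comparison concrete, here is a small sketch (the helper name `count_divisors` is mine, not Flink code) that counts divisors by trial division:

```python
def count_divisors(n: int) -> int:
    """Count the positive divisors of n by trial division up to sqrt(n)."""
    count = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            # d and n // d are both divisors; they coincide when d*d == n
            count += 1 if d * d == n else 2
        d += 1
    return count

# 840 = 2^3 * 3 * 5 * 7 has (3+1)(1+1)(1+1)(1+1) = 32 divisors,
# while 1024 = 2^10 has only 11, so 840 divides evenly for far more
# parallelism values despite being the smaller number.
print(count_divisors(840), count_divisors(1024))  # 32 11
```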

 



Re: [PR] Draft [flink]

2023-12-25 Thread via GitHub


JunRuiLee commented on PR #23993:
URL: https://github.com/apache/flink/pull/23993#issuecomment-1869335533

   @flinkbot run azure





[PR] Draft [flink]

2023-12-25 Thread via GitHub


JunRuiLee opened a new pull request, #23993:
URL: https://github.com/apache/flink/pull/23993

   Draft.
   Do not review this pr.





[jira] [Commented] (FLINK-33940) Update the auto-derivation rule of max parallelism for enlarged upscaling space

2023-12-25 Thread Rui Fan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800430#comment-17800430
 ] 

Rui Fan commented on FLINK-33940:
-

Thanks [~Zhanghao Chen] for driving this improvement and for the ping! :)

1024 as the min value of max parallelism makes sense to me, and our internal 
Flink version also uses 1024 instead of 128. It's fine for most jobs.

IIUC, when the parallelism of a job is very small (1 or 2) and the max 
parallelism is 1024, one subtask will have 1024 keyGroups. From the 
state-backend side, too many key groups may affect performance. (This is my 
concern about changing the default in the Flink community.)

Note: this performance drop may be insignificant in a real production 
environment.
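To illustrate the key-group concern, here is a sketch of how a contiguous key-group range is assigned to each subtask. It is modeled after Flink's KeyGroupRangeAssignment; treat the exact rounding as an assumption, not the authoritative implementation:

```python
def key_group_range(max_parallelism: int, parallelism: int,
                    operator_index: int) -> tuple[int, int]:
    """Inclusive key-group range owned by one subtask, modeled after
    Flink's KeyGroupRangeAssignment.computeKeyGroupRangeForOperatorIndex."""
    start = (operator_index * max_parallelism + parallelism - 1) // parallelism
    end = ((operator_index + 1) * max_parallelism - 1) // parallelism
    return start, end

# With parallelism 1 and max parallelism 1024, the single subtask
# owns all 1024 key groups, which is the state-backend concern above.
start, end = key_group_range(1024, 1, 0)
print(end - start + 1)  # 1024
```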



[jira] [Assigned] (FLINK-33940) Update the auto-derivation rule of max parallelism for enlarged upscaling space

2023-12-25 Thread Rui Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan reassigned FLINK-33940:
---

Assignee: Zhanghao Chen



Re: [PR] [FLINK-33939][runtime-web] Remove the husky dependency to make sure the global hooks work as expected [flink]

2023-12-25 Thread via GitHub


flinkbot commented on PR #23992:
URL: https://github.com/apache/flink/pull/23992#issuecomment-1869325717

   
   ## CI report:
   
   * 3f87d0950bae646a728bcd5b9c3e95dbda68 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   





[jira] [Assigned] (FLINK-33941) Use field index instead of field name about window time column

2023-12-25 Thread Jane Chan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jane Chan reassigned FLINK-33941:
-

Assignee: xuyang

> Use field index instead of field name about window time column
> --
>
> Key: FLINK-33941
> URL: https://issues.apache.org/jira/browse/FLINK-33941
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Table SQL / Planner
>Reporter: xuyang
>Assignee: xuyang
>Priority: Minor
>  Labels: pull-request-available
>
> In some exec nodes like StreamExecGroupWindowAggregate and some rules like 
> BatchPhysicalWindowAggregateRule, the planner uses "AggregateUtil#timeFieldIndex" 
> to access the actual time field index instead of using the time field index 
> in LogicalWindow#timeAttribute directly. However, it would be more formal to 
> use the field index instead of the column name.





[jira] [Commented] (FLINK-33941) Use field index instead of field name about window time column

2023-12-25 Thread Jane Chan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800427#comment-17800427
 ] 

Jane Chan commented on FLINK-33941:
---

Thanks, [~xuyangzhong], for reporting this issue. You're right that we should 
rely on the field index rather than field-name mapping.



[PR] [FLINK-33939][runtime-web] Remove the husky dependency to make sure the global hooks work as expected [flink]

2023-12-25 Thread via GitHub


simplejason opened a new pull request, #23992:
URL: https://github.com/apache/flink/pull/23992

   **_WIP: IN DISCUSSION_**
   
   
   ## What is the purpose of the change
   
   Remove husky dependencies to ensure global hooks work as expected
   
   ## Brief change log
   
   - remove dependency `husky` of `runtime-web`
   
   
   ## Verifying this change
   
   Please make sure both new and modified tests in this PR follow the 
conventions defined in our code quality guide: 
https://flink.apache.org/contributing/code-style-and-quality-common.html#testing
   
   *(Please pick either of the following options)*
   
   - [x] This change is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
 - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
 - *Extended integration test for recovery after master (JobManager) 
failure*
 - *Added test that validates that TaskInfo is transferred only once across 
recoveries*
 - *Manually verified the change by running a 4 node cluster with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   





[jira] [Updated] (FLINK-33939) Make husky in runtime-web no longer affect git global hooks

2023-12-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-33939:
---
Labels: pull-request-available  (was: )

> Make husky in runtime-web no longer affect git global hooks
> ---
>
> Key: FLINK-33939
> URL: https://issues.apache.org/jira/browse/FLINK-33939
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Reporter: Jason TANG
>Assignee: Jason TANG
>Priority: Minor
>  Labels: pull-request-available
>
> Since runtime-web relies on husky to ensure that front-end code changes are 
> checked before `git commit`, husky modifies the global git hooks 
> (core.hooksPath), so a globally configured core.hooksPath won't take effect. 
> I think it would be a good idea to make the front-end code check an optional 
> command execution, which ensures that the globally configured hooks are 
> executed correctly.





Re: [PR] [FLINK-32073][checkpoint] Implement file merging in snapshot [flink]

2023-12-25 Thread via GitHub


AlexYinHan commented on PR #23514:
URL: https://github.com/apache/flink/pull/23514#issuecomment-1869314576

   @curcur The CI failure has been addressed. (The file-merging checkpoint was 
unexpectedly enabled.) Would you please take another look?





Re: [PR] [FLINK-33941][table-planner] use field index instead of field name about window time column [flink]

2023-12-25 Thread via GitHub


flinkbot commented on PR #23991:
URL: https://github.com/apache/flink/pull/23991#issuecomment-1869311476

   
   ## CI report:
   
   * 35919f76384f970d54fd6a19f7f7e8cc90fa891d UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   





[jira] [Commented] (FLINK-33939) Make husky in runtime-web no longer affect git global hooks

2023-12-25 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800423#comment-17800423
 ] 

Xintong Song commented on FLINK-33939:
--

Sounds good to me. Thanks for reporting and volunteering on this, 
[~simplejason]. You are assigned. Please go ahead.



[jira] [Assigned] (FLINK-33939) Make husky in runtime-web no longer affect git global hooks

2023-12-25 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reassigned FLINK-33939:


Assignee: Jason TANG



[jira] [Updated] (FLINK-33941) Use field index instead of field name about window time column

2023-12-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-33941:
---
Labels: pull-request-available  (was: )



[PR] [FLINK-33941][table-planner] use field index instead of field name about window time column [flink]

2023-12-25 Thread via GitHub


xuyangzhong opened a new pull request, #23991:
URL: https://github.com/apache/flink/pull/23991

   ## What is the purpose of the change
   
   In some exec nodes like StreamExecGroupWindowAggregate and some rules like 
BatchPhysicalWindowAggregateRule, the planner uses "AggregateUtil#timeFieldIndex" 
to access the actual time field index instead of using the time field index in 
LogicalWindow#timeAttribute directly. However, it would be more formal to use 
the field index instead of the column name.
   
   ## Brief change log
   
 - *Deprecate AggregateUtil#timeFieldIndex*
 - *Update the code that previously used "AggregateUtil.timeFieldIndex(input.getRowType, 
call.builder(), window.timeAttribute)" to use 
"window.timeAttribute.getFieldIndex" instead*
 - *Fix the rule that didn't update the index in the window while 
optimizing the plan*
   
   ## Verifying this change
   
   All existent tests can cover this fix.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented? no
   





[jira] [Commented] (FLINK-33940) Update the auto-derivation rule of max parallelism for enlarged upscaling space

2023-12-25 Thread Zhanghao Chen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800420#comment-17800420
 ] 

Zhanghao Chen commented on FLINK-33940:
---

Here I'm proposing {{roundUpToPowerOfTwo(operatorParallelism * 5)}}, so we 
would allow at least a 5-fold increase of parallelism.
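As a concrete comparison, here is a small sketch of the current rule and the candidate rule (function names are mine; I believe Flink's actual derivation lives in KeyGroupRangeAssignment, so treat this as an illustration only):

```python
def round_up_to_power_of_two(x: float) -> int:
    """Smallest power of two >= x (for x >= 1)."""
    p = 1
    while p < x:
        p *= 2
    return p

def derive_max_parallelism(parallelism: int, factor: float, floor: int,
                           cap: int = 32767) -> int:
    # min(max(roundUpToPowerOfTwo(parallelism * factor), floor), cap)
    return min(max(round_up_to_power_of_two(parallelism * factor), floor), cap)

# e.g. parallelism 100: current rule (1.5x, floor 128) gives 256,
# candidate rule (5x, floor 1024) gives 1024.
for p in (1, 100, 1000):
    current = derive_max_parallelism(p, 1.5, 128)   # existing rule
    proposed = derive_max_parallelism(p, 5, 1024)   # candidate rule
    print(p, current, proposed)
```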



[jira] [Commented] (FLINK-33940) Update the auto-derivation rule of max parallelism for enlarged upscaling space

2023-12-25 Thread Gyula Fora (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800416#comment-17800416
 ] 

Gyula Fora commented on FLINK-33940:


If we are going to change this, powers of 2 are not really good, as they only 
allow you to double the parallelism.

It's better to consider factorial or highly composite numbers 
([https://mathworld.wolfram.com/HighlyCompositeNumber.html]) for practical use.
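A quick brute-force sketch (my own helpers, not Flink code) of the highly composite numbers below 1024 shows why 840 stands out among candidate defaults:

```python
def count_divisors(n: int) -> int:
    """Count the positive divisors of n by trial division up to sqrt(n)."""
    count = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            count += 1 if d * d == n else 2
        d += 1
    return count

def highly_composite_up_to(limit: int) -> list[tuple[int, int]]:
    """(n, divisor count) for each n with more divisors than every smaller n."""
    best, records = 0, []
    for n in range(1, limit + 1):
        d = count_divisors(n)
        if d > best:
            best = d
            records.append((n, d))
    return records

# 840, with 32 divisors, is the largest highly composite number below 1024.
print(highly_composite_up_to(1024)[-1])  # (840, 32)
```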



[jira] [Created] (FLINK-33941) Use field index instead of field name about window time column

2023-12-25 Thread xuyang (Jira)
xuyang created FLINK-33941:
--

 Summary: Use field index instead of field name about window time 
column
 Key: FLINK-33941
 URL: https://issues.apache.org/jira/browse/FLINK-33941
 Project: Flink
  Issue Type: Technical Debt
  Components: Table SQL / Planner
Reporter: xuyang


In some exec nodes like StreamExecGroupWindowAggregate and some rules like 
BatchPhysicalWindowAggregateRule, planner uses "AggregateUtil#timeFieldIndex" 
to access the actual time field index, instead of using the time field index in 
LogicalWindow#timeAttribute directly. However, it would be more formal to use 
the field index instead of the column field.





[jira] [Commented] (FLINK-22793) HybridSource Table Implementation

2023-12-25 Thread waywtdcc (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800414#comment-17800414
 ] 

waywtdcc commented on FLINK-22793:
--

How is this going?

> HybridSource Table Implementation
> -
>
> Key: FLINK-22793
> URL: https://issues.apache.org/jira/browse/FLINK-22793
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / HybridSource
>Reporter: Nicholas Jiang
>Assignee: Ran Tao
>Priority: Major
>






Re: [PR] [FLINK-23633][FLIP-150][connector/common] HybridSource: Support dynamic stop position in FileSource [flink]

2023-12-25 Thread via GitHub


waywtdcc commented on PR #20289:
URL: https://github.com/apache/flink/pull/20289#issuecomment-1869298738

   How is this going?





[jira] [Updated] (FLINK-33940) Update the auto-derivation rule of max parallelism for enlarged upscaling space

2023-12-25 Thread Zhanghao Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhanghao Chen updated FLINK-33940:
--
Description: 
*Background*

The choice of the max parallelism of a stateful operator is important, as it 
limits the upper bound of the parallelism of the operator while also adding 
extra overhead when set too large. Currently, the max parallelism of an 
operator is either fixed to a value specified by the API core / pipeline option 
or auto-derived with the following rule:

{{min(max(roundUpToPowerOfTwo(operatorParallelism * 1.5), 128), 32767)}}

*Problem*

Recently, the elasticity of Flink jobs has become more and more valued by 
users. The current auto-derived max parallelism was introduced a long time ago 
and only allows the operator parallelism to be roughly doubled, which is not 
desirable for elasticity. Setting a max parallelism manually may not be 
desirable either: users may not have sufficient expertise to select a good 
max-parallelism value.

*Proposal*

Update the auto-derivation rule of max parallelism to derive a larger max 
parallelism for a better elasticity experience out of the box. A candidate is 
as follows:

{{min(max(roundUpToPowerOfTwo(operatorParallelism * {*}5{*}), {*}1024{*}), 
32767)}}

Looking forward to your opinions on this.

  was:
*Background*

The choice of the max parallelism of a stateful operator is important, as it 
limits the upper bound of the parallelism of the operator while also adding 
extra overhead when set too large. Currently, the max parallelism of an 
operator is either fixed to a value specified by the API core / pipeline option 
or auto-derived with the following rule:

`min(max(roundUpToPowerOfTwo(operatorParallelism * 1.5), 128), 32767)`

*Problem*

Recently, the elasticity of Flink jobs has become more and more valued by 
users. The current auto-derived max parallelism was introduced a long time ago 
and only allows the operator parallelism to be roughly doubled, which is not 
desirable for elasticity. Setting a max parallelism manually may not be 
desirable either: users may not have sufficient expertise to select a good 
max-parallelism value.

*Proposal*

Update the auto-derivation rule of max parallelism to derive a larger max 
parallelism for a better elasticity experience out of the box. A candidate is 
as follows:

`min(max(roundUpToPowerOfTwo(operatorParallelism * {*}5{*}), {*}1024{*}), 
32767)`

Looking forward to your opinions on this.


> Update the auto-derivation rule of max parallelism for enlarged upscaling 
> space
> ---
>
> Key: FLINK-33940
> URL: https://issues.apache.org/jira/browse/FLINK-33940
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Core
>Reporter: Zhanghao Chen
>Priority: Major
>
> *Background*
> The choice of the max parallelism of a stateful operator is important, as it 
> limits the upper bound of the operator's parallelism while also adding extra 
> overhead when set too large. Currently, the max parallelism of an operator is 
> either fixed to a value specified via the API or pipeline options, or 
> auto-derived with the following rule:
> {{min(max(roundUpToPowerOfTwo(operatorParallelism * 1.5), 128), 32767)}}
> *Problem*
> Recently, the elasticity of Flink jobs is becoming more and more valued by 
> users. The current auto-derived max parallelism was introduced a long time 
> ago and only allows the operator parallelism to be roughly doubled, which is 
> not desired for elasticity. Setting a max parallelism manually may not be 
> desired either: users may not have sufficient expertise to select a good 
> max-parallelism value.
> *Proposal*
> Update the auto-derivation rule of max parallelism to derive a larger max 
> parallelism for a better elasticity experience out of the box. A candidate is 
> as follows:
> {{min(max(roundUpToPowerOfTwo(operatorParallelism * {*}5{*}), {*}1024{*}), 
> 32767)}}
> Looking forward to your opinions on this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33940) Update the auto-derivation rule of max parallelism for enlarged upscaling space

2023-12-25 Thread Zhanghao Chen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800413#comment-17800413
 ] 

Zhanghao Chen commented on FLINK-33940:
---

cc [~mxm] [~fanrui] Looking forward to the opinions from you, active 
contributors on autoscaling

> Update the auto-derivation rule of max parallelism for enlarged upscaling 
> space
> ---
>
> Key: FLINK-33940
> URL: https://issues.apache.org/jira/browse/FLINK-33940
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Core
>Reporter: Zhanghao Chen
>Priority: Major
>
> *Background*
> The choice of the max parallelism of a stateful operator is important, as it 
> limits the upper bound of the operator's parallelism while also adding extra 
> overhead when set too large. Currently, the max parallelism of an operator is 
> either fixed to a value specified via the API or pipeline options, or 
> auto-derived with the following rule:
> `min(max(roundUpToPowerOfTwo(operatorParallelism * 1.5), 128), 32767)`
> *Problem*
> Recently, the elasticity of Flink jobs is becoming more and more valued by 
> users. The current auto-derived max parallelism was introduced a long time 
> ago and only allows the operator parallelism to be roughly doubled, which is 
> not desired for elasticity. Setting a max parallelism manually may not be 
> desired either: users may not have sufficient expertise to select a good 
> max-parallelism value.
> *Proposal*
> Update the auto-derivation rule of max parallelism to derive a larger max 
> parallelism for a better elasticity experience out of the box. A candidate is 
> as follows:
> `min(max(roundUpToPowerOfTwo(operatorParallelism * {*}5{*}), {*}1024{*}), 
> 32767)`
> Looking forward to your opinions on this.





[jira] [Commented] (FLINK-33856) Add metrics to monitor the interaction performance between task and external storage system in the process of checkpoint making

2023-12-25 Thread Jufang He (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800412#comment-17800412
 ] 

Jufang He commented on FLINK-33856:
---

[~masteryhx] [~Weijie Guo] I have created a PR for this issue: 
https://github.com/apache/flink/pull/23989. Could you help take a look?

> Add metrics to monitor the interaction performance between task and external 
> storage system in the process of checkpoint making
> ---
>
> Key: FLINK-33856
> URL: https://issues.apache.org/jira/browse/FLINK-33856
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.18.0
>Reporter: Jufang He
>Assignee: Jufang He
>Priority: Major
>  Labels: pull-request-available
>
> When Flink makes a checkpoint, the interaction performance with the external 
> file system has a great impact on the overall time consumed. Therefore, it 
> is easy to observe bottlenecks by adding performance metrics where the 
> task interacts with the external file storage system. These include: the 
> file write rate, the latency of writing the file, and the latency of closing 
> the file.
> Adding the above metrics on the Flink side has the following advantages: it is 
> convenient to compare E2E checkpoint durations across different tasks, and 
> there is no need to distinguish the type of external storage system, since the 
> metrics can be collected uniformly in FsCheckpointStreamFactory.
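The single-choke-point idea above can be illustrated with a small timing wrapper (purely illustrative; the actual change would live in Flink's Java code around FsCheckpointStreamFactory, and all names here are made up for the sketch):

```typescript
// Illustrative timing wrapper: one choke point records bytes written,
// cumulative write latency, and close latency, from which a write rate
// and a close-latency metric can be derived.
interface WriteMetrics {
  totalBytes: number;
  totalWriteMillis: number;
  closeMillis: number;
}

class TimedOutputStream {
  metrics: WriteMetrics = { totalBytes: 0, totalWriteMillis: 0, closeMillis: 0 };

  constructor(
    private sink: (chunk: Uint8Array) => void, // the real stream's write
    private closer: () => void = () => {}      // the real stream's close
  ) {}

  write(chunk: Uint8Array): void {
    const start = Date.now();
    this.sink(chunk); // delegate to the wrapped stream
    this.metrics.totalWriteMillis += Date.now() - start;
    this.metrics.totalBytes += chunk.length;
  }

  close(): void {
    const start = Date.now();
    this.closer();
    this.metrics.closeMillis = Date.now() - start;
  }
}

// Example: wrap an in-memory sink and inspect the collected metrics.
const chunks: Uint8Array[] = [];
const stream = new TimedOutputStream(c => chunks.push(c));
stream.write(new Uint8Array(16));
stream.write(new Uint8Array(8));
stream.close();
console.log(stream.metrics.totalBytes); // 24
```

Because the wrapper sits at the point every checkpoint write passes through, it needs no knowledge of the concrete storage system behind the sink.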





[jira] [Updated] (FLINK-33940) Update the auto-derivation rule of max parallelism for enlarged upscaling space

2023-12-25 Thread Zhanghao Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhanghao Chen updated FLINK-33940:
--
Summary: Update the auto-derivation rule of max parallelism for enlarged 
upscaling space  (was: Update Update the auto-derivation rule of max 
parallelism for enlarged upscaling space)

> Update the auto-derivation rule of max parallelism for enlarged upscaling 
> space
> ---
>
> Key: FLINK-33940
> URL: https://issues.apache.org/jira/browse/FLINK-33940
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Core
>Reporter: Zhanghao Chen
>Priority: Major
>
> *Background*
> The choice of the max parallelism of a stateful operator is important, as it 
> limits the upper bound of the operator's parallelism while also adding extra 
> overhead when set too large. Currently, the max parallelism of an operator is 
> either fixed to a value specified via the API or pipeline options, or 
> auto-derived with the following rule:
> `min(max(roundUpToPowerOfTwo(operatorParallelism * 1.5), 128), 32767)`
> *Problem*
> Recently, the elasticity of Flink jobs is becoming more and more valued by 
> users. The current auto-derived max parallelism was introduced a long time 
> ago and only allows the operator parallelism to be roughly doubled, which is 
> not desired for elasticity. Setting a max parallelism manually may not be 
> desired either: users may not have sufficient expertise to select a good 
> max-parallelism value.
> *Proposal*
> Update the auto-derivation rule of max parallelism to derive a larger max 
> parallelism for a better elasticity experience out of the box. A candidate is 
> as follows:
> `min(max(roundUpToPowerOfTwo(operatorParallelism * {*}5{*}), {*}1024{*}), 
> 32767)`
> Looking forward to your opinions on this.





[jira] [Created] (FLINK-33940) Update Update the auto-derivation rule of max parallelism for enlarged upscaling space

2023-12-25 Thread Zhanghao Chen (Jira)
Zhanghao Chen created FLINK-33940:
-

 Summary: Update Update the auto-derivation rule of max parallelism 
for enlarged upscaling space
 Key: FLINK-33940
 URL: https://issues.apache.org/jira/browse/FLINK-33940
 Project: Flink
  Issue Type: Improvement
  Components: API / Core
Reporter: Zhanghao Chen


*Background*

The choice of the max parallelism of a stateful operator is important, as it 
limits the upper bound of the operator's parallelism while also adding extra 
overhead when set too large. Currently, the max parallelism of an operator is 
either fixed to a value specified via the API or pipeline options, or 
auto-derived with the following rule:

`min(max(roundUpToPowerOfTwo(operatorParallelism * 1.5), 128), 32767)`

*Problem*

Recently, the elasticity of Flink jobs is becoming more and more valued by 
users. The current auto-derived max parallelism was introduced a long time ago 
and only allows the operator parallelism to be roughly doubled, which is not 
desired for elasticity. Setting a max parallelism manually may not be desired 
either: users may not have sufficient expertise to select a good 
max-parallelism value.

*Proposal*

Update the auto-derivation rule of max parallelism to derive a larger max 
parallelism for a better elasticity experience out of the box. A candidate is as 
follows:

`min(max(roundUpToPowerOfTwo(operatorParallelism * {*}5{*}), {*}1024{*}), 
32767)`

Looking forward to your opinions on this.





[jira] [Commented] (FLINK-33939) Make husky in runtime-web no longer affect git global hooks

2023-12-25 Thread Wencong Liu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800411#comment-17800411
 ] 

Wencong Liu commented on FLINK-33939:
-

Thanks for raising this issue!  I completely agree with your proposal to make 
front-end code detection an optional command execution in our use of husky with 
runtime-web. By doing this, we can preserve the functionality of any globally 
configured git hooks.

> Make husky in runtime-web no longer affect git global hooks
> ---
>
> Key: FLINK-33939
> URL: https://issues.apache.org/jira/browse/FLINK-33939
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Reporter: Jason TANG
>Priority: Minor
>
> Since runtime-web relies on husky to ensure that front-end code changes are 
> checked before `git commit`, husky overrides the global git hooks setting 
> (core.hooksPath), so a globally configured core.hooksPath won't take effect. 
> I thought it would be a good idea to make the front-end code check an 
> optional command execution, which ensures that the globally configured hooks 
> are executed correctly.





[jira] [Updated] (FLINK-33938) Correct implicit coercions in relational operators to adopt typescript 5.0

2023-12-25 Thread Ao Yuchen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ao Yuchen updated FLINK-33938:
--
Description: 
Since TypeScript 5.0, there is a breaking change: implicit coercions in 
relational operators are forbidden [1].

As a result, the following code in 
flink-runtime-web/web-dashboard/src/app/components/humanize-date.pipe.ts gets an 
error:
{code:java}
public transform(
  value: number | string | Date,
  ...
): string | null | undefined {
  if (value == null || value === '' || value !== value || value < 0) {
return '-';
  } 
  ...
}{code}
The correctness improvement is available here [2].

I think we should optimize this type of code for better compatibility.

 

[1] 
[https://devblogs.microsoft.com/typescript/announcing-typescript-5-0/#forbidden-implicit-coercions-in-relational-operators]

[2] 
[https://github.com/microsoft/TypeScript/pull/52048]
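One possible remedy (an illustrative sketch, not necessarily the fix adopted in the Web UI; the guard below is extracted from the pipe's logic, and the function name is made up) is to convert the value to a number explicitly before the relational comparison, so TypeScript 5.0 no longer sees an implicit coercion:

```typescript
// Sketch: explicit conversion avoids the implicit coercion that
// TypeScript 5.0 rejects in `value < 0` when value may be a Date or string.
function humanizeGuard(value: number | string | Date | null | undefined): string | null {
  if (value == null || value === '') {
    return '-';
  }
  // Convert explicitly: Date -> epoch millis, string -> number.
  const n = value instanceof Date ? value.getTime() : Number(value);
  // Number.isNaN(n) replaces the original `value !== value` NaN self-check.
  if (Number.isNaN(n) || n < 0) {
    return '-';
  }
  return null; // fall through to the real formatting logic
}
```

The comparison now happens between two `number` operands only, which satisfies the stricter 5.0 rule while preserving the original null/empty/NaN/negative handling.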

  was:
Since TypeScript 5.0, there is a break change that implicit coercions in 
relational operators are forbidden 
([https://devblogs.microsoft.com/typescript/announcing-typescript-5-0/#forbidden-implicit-coercions-in-relational-operators]).

So that the following code in 
flink-runtime-web/web-dashboard/src/app/components/humanize-date.pipe.ts get 
error:
{code:java}
public transform(
  value: number | string | Date,
  ...
): string | null | undefined {
  if (value == null || value === '' || value !== value || value < 0) {
return '-';
  } 
  ...
}{code}
The correctness improvement is availble in 
[https://github.com/microsoft/TypeScript/pull/52048.]

I think we should optimize this type of code for better compatibility.


> Correct implicit coercions in relational operators to adopt typescript 5.0
> --
>
> Key: FLINK-33938
> URL: https://issues.apache.org/jira/browse/FLINK-33938
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Affects Versions: 1.19.0
>Reporter: Ao Yuchen
>Priority: Major
> Fix For: 1.19.0, 1.17.3, 1.18.2
>
>
> Since TypeScript 5.0, there is a breaking change: implicit coercions in 
> relational operators are forbidden [1].
> As a result, the following code in 
> flink-runtime-web/web-dashboard/src/app/components/humanize-date.pipe.ts gets 
> an error:
> {code:java}
> public transform(
>   value: number | string | Date,
>   ...
> ): string | null | undefined {
>   if (value == null || value === '' || value !== value || value < 0) {
> return '-';
>   } 
>   ...
> }{code}
> The correctness improvement is available here [2].
> I think we should optimize this type of code for better compatibility.
>  
> [1] 
> [https://devblogs.microsoft.com/typescript/announcing-typescript-5-0/#forbidden-implicit-coercions-in-relational-operators]
> [2] 
> [https://github.com/microsoft/TypeScript/pull/52048]





[jira] [Commented] (FLINK-33939) Make husky in runtime-web no longer affect git global hooks

2023-12-25 Thread Jason TANG (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800409#comment-17800409
 ] 

Jason TANG commented on FLINK-33939:


Feedback is welcome; I can submit a PR to remove the husky dependency on the 
front end. :)

> Make husky in runtime-web no longer affect git global hooks
> ---
>
> Key: FLINK-33939
> URL: https://issues.apache.org/jira/browse/FLINK-33939
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Reporter: Jason TANG
>Priority: Minor
>
> Since runtime-web relies on husky to ensure that front-end code changes are 
> checked before `git commit`, husky overrides the global git hooks setting 
> (core.hooksPath), so a globally configured core.hooksPath won't take effect. 
> I thought it would be a good idea to make the front-end code check an 
> optional command execution, which ensures that the globally configured hooks 
> are executed correctly.





[jira] [Created] (FLINK-33939) Make husky in runtime-web no longer affect git global hooks

2023-12-25 Thread Jason TANG (Jira)
Jason TANG created FLINK-33939:
--

 Summary: Make husky in runtime-web no longer affect git global 
hooks
 Key: FLINK-33939
 URL: https://issues.apache.org/jira/browse/FLINK-33939
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Web Frontend
Reporter: Jason TANG


Since runtime-web relies on husky to ensure that front-end code changes are 
checked before `git commit`, husky overrides the global git hooks setting 
(core.hooksPath), so a globally configured core.hooksPath won't take effect. 
I thought it would be a good idea to make the front-end code check an optional 
command execution, which ensures that the globally configured hooks are 
executed correctly.





Re: [PR] [FLINK-25537] [JUnit5 Migration] Module: flink-core package api-common [flink]

2023-12-25 Thread via GitHub


GOODBOY008 commented on PR #23960:
URL: https://github.com/apache/flink/pull/23960#issuecomment-1869278941

   @flinkbot run azure





[jira] [Commented] (FLINK-30535) Introduce TTL state based benchmarks

2023-12-25 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800407#comment-17800407
 ] 

Yun Tang commented on FLINK-30535:
--

Some work done on Flink side:
master: 0c82f8af859a4f463a07f5dfb35648970c1c3425

> Introduce TTL state based benchmarks
> 
>
> Key: FLINK-30535
> URL: https://issues.apache.org/jira/browse/FLINK-30535
> Project: Flink
>  Issue Type: New Feature
>  Components: Benchmarks
>Reporter: Yun Tang
>Assignee: Zakelly Lan
>Priority: Major
>  Labels: pull-request-available
>
> This ticket is inspired by https://issues.apache.org/jira/browse/FLINK-30088, 
> which aims to optimize TTL state performance. I think it would be useful 
> to introduce state benchmarks based on TTL, as Flink has some overhead to 
> support TTL.





Re: [PR] [FLINK-30535][Test] Customize TtlTimeProvider in state benchmarks [flink]

2023-12-25 Thread via GitHub


Myasuka merged PR #23985:
URL: https://github.com/apache/flink/pull/23985





[jira] [Updated] (FLINK-33938) Correct implicit coercions in relational operators to adopt typescript 5.0

2023-12-25 Thread Ao Yuchen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ao Yuchen updated FLINK-33938:
--
Description: 
Since TypeScript 5.0, there is a breaking change: implicit coercions in 
relational operators are forbidden 
([https://devblogs.microsoft.com/typescript/announcing-typescript-5-0/#forbidden-implicit-coercions-in-relational-operators]).

As a result, the following code in 
flink-runtime-web/web-dashboard/src/app/components/humanize-date.pipe.ts gets an 
error:
{code:java}
public transform(
  value: number | string | Date,
  ...
): string | null | undefined {
  if (value == null || value === '' || value !== value || value < 0) {
return '-';
  } 
  ...
}{code}
The correctness improvement is available in 
[https://github.com/microsoft/TypeScript/pull/52048].

I think we should optimize this type of code for better compatibility.

  was:
Since TypeScript 5.0, implicit coercions in relational operators are forbidden 
([https://devblogs.microsoft.com/typescript/announcing-typescript-5-0/#forbidden-implicit-coercions-in-relational-operators]).

So that the following code in 
flink-runtime-web/web-dashboard/src/app/components/humanize-date.pipe.ts get 
error:
{code:java}
public transform(
  value: number | string | Date,
  ...
): string | null | undefined {
  if (value == null || value === '' || value !== value || value < 0) {
return '-';
  } 
  ...
}{code}
The correctness improvement is availble in 
[https://github.com/microsoft/TypeScript/pull/52048.]

I think we should optimize this type of code for better compatibility.


> Correct implicit coercions in relational operators to adopt typescript 5.0
> --
>
> Key: FLINK-33938
> URL: https://issues.apache.org/jira/browse/FLINK-33938
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Affects Versions: 1.19.0
>Reporter: Ao Yuchen
>Priority: Major
> Fix For: 1.19.0, 1.17.3, 1.18.2
>
>
> Since TypeScript 5.0, there is a breaking change: implicit coercions in 
> relational operators are forbidden 
> ([https://devblogs.microsoft.com/typescript/announcing-typescript-5-0/#forbidden-implicit-coercions-in-relational-operators]).
> As a result, the following code in 
> flink-runtime-web/web-dashboard/src/app/components/humanize-date.pipe.ts gets 
> an error:
> {code:java}
> public transform(
>   value: number | string | Date,
>   ...
> ): string | null | undefined {
>   if (value == null || value === '' || value !== value || value < 0) {
> return '-';
>   } 
>   ...
> }{code}
> The correctness improvement is available in 
> [https://github.com/microsoft/TypeScript/pull/52048].
> I think we should optimize this type of code for better compatibility.





[jira] [Updated] (FLINK-33938) Correct implicit coercions in relational operators to adopt typescript 5.0

2023-12-25 Thread Ao Yuchen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ao Yuchen updated FLINK-33938:
--
Description: 
Since TypeScript 5.0, implicit coercions in relational operators are forbidden 
([https://devblogs.microsoft.com/typescript/announcing-typescript-5-0/#forbidden-implicit-coercions-in-relational-operators]).

As a result, the following code in 
flink-runtime-web/web-dashboard/src/app/components/humanize-date.pipe.ts gets an 
error:
{code:java}
public transform(
  value: number | string | Date,
  ...
): string | null | undefined {
  if (value == null || value === '' || value !== value || value < 0) {
return '-';
  } 
  ...
}{code}
The correctness improvement is available in 
[https://github.com/microsoft/TypeScript/pull/52048].

I think we should optimize this type of code for better compatibility.

  was:
Since TypeScript 5.0, implicit coercions in relations operators are forbidden 
([https://devblogs.microsoft.com/typescript/announcing-typescript-5-0/#forbidden-implicit-coercions-in-relational-operators]).

So that the following code in 
flink-runtime-web/web-dashboard/src/app/components/humanize-date.pipe.ts get 
error:
{code:java}
public transform(
  value: number | string | Date,
  ...
): string | null | undefined {
  if (value == null || value === '' || value !== value || value < 0) {
return '-';
  } 
  ...
}{code}
The correctness improvement is availble in 
[https://github.com/microsoft/TypeScript/pull/52048.]

I think we should optimize this type of code for better compatibility.


> Correct implicit coercions in relational operators to adopt typescript 5.0
> --
>
> Key: FLINK-33938
> URL: https://issues.apache.org/jira/browse/FLINK-33938
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Affects Versions: 1.19.0
>Reporter: Ao Yuchen
>Priority: Major
> Fix For: 1.19.0, 1.17.3, 1.18.2
>
>
> Since TypeScript 5.0, implicit coercions in relational operators are 
> forbidden 
> ([https://devblogs.microsoft.com/typescript/announcing-typescript-5-0/#forbidden-implicit-coercions-in-relational-operators]).
> As a result, the following code in 
> flink-runtime-web/web-dashboard/src/app/components/humanize-date.pipe.ts gets 
> an error:
> {code:java}
> public transform(
>   value: number | string | Date,
>   ...
> ): string | null | undefined {
>   if (value == null || value === '' || value !== value || value < 0) {
> return '-';
>   } 
>   ...
> }{code}
> The correctness improvement is available in 
> [https://github.com/microsoft/TypeScript/pull/52048].
> I think we should optimize this type of code for better compatibility.





[jira] [Updated] (FLINK-33938) Correct implicit coercions in relational operators to adopt typescript 5.0

2023-12-25 Thread Ao Yuchen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ao Yuchen updated FLINK-33938:
--
Summary: Correct implicit coercions in relational operators to adopt 
typescript 5.0  (was: Update Web UI to adopt typescript 5.0)

> Correct implicit coercions in relational operators to adopt typescript 5.0
> --
>
> Key: FLINK-33938
> URL: https://issues.apache.org/jira/browse/FLINK-33938
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Affects Versions: 1.19.0
>Reporter: Ao Yuchen
>Priority: Major
> Fix For: 1.19.0, 1.17.3, 1.18.2
>
>
> Since TypeScript 5.0, implicit coercions in relations operators are forbidden 
> ([https://devblogs.microsoft.com/typescript/announcing-typescript-5-0/#forbidden-implicit-coercions-in-relational-operators]).
> As a result, the following code in 
> flink-runtime-web/web-dashboard/src/app/components/humanize-date.pipe.ts gets 
> an error:
> {code:java}
> public transform(
>   value: number | string | Date,
>   ...
> ): string | null | undefined {
>   if (value == null || value === '' || value !== value || value < 0) {
> return '-';
>   } 
>   ...
> }{code}
> The correctness improvement is available in 
> [https://github.com/microsoft/TypeScript/pull/52048].
> I think we should optimize this type of code for better compatibility.





[jira] [Updated] (FLINK-33938) Update Web UI to adopt typescript 5.0

2023-12-25 Thread Ao Yuchen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ao Yuchen updated FLINK-33938:
--
Description: 
Since TypeScript 5.0, implicit coercions in relations operators are forbidden 
([https://devblogs.microsoft.com/typescript/announcing-typescript-5-0/#forbidden-implicit-coercions-in-relational-operators]).

As a result, the following code in 
flink-runtime-web/web-dashboard/src/app/components/humanize-date.pipe.ts gets an 
error:
{code:java}
public transform(
  value: number | string | Date,
  ...
): string | null | undefined {
  if (value == null || value === '' || value !== value || value < 0) {
return '-';
  } 
  ...
}{code}
The correctness improvement is available in 
[https://github.com/microsoft/TypeScript/pull/52048].

I think we should optimize this type of code for better compatibility.

  was:
Since TypeScript 5.x, implicit coercions in relations operators are forbidden 
([https://devblogs.microsoft.com/typescript/announcing-typescript-5-0/#forbidden-implicit-coercions-in-relational-operators]).

So that the following code in 
flink-runtime-web/web-dashboard/src/app/components/humanize-date.pipe.ts get 
error:
{code:java}
public transform(
  value: number | string | Date,
  ...
): string | null | undefined {
  if (value == null || value === '' || value !== value || value < 0) {
return '-';
  } 
  ...
}{code}
The correctness improvement is availble in 
[https://github.com/microsoft/TypeScript/pull/52048.]

I think we should optimize this type of code for better compatibility.


> Update Web UI to adopt typescript 5.0
> -
>
> Key: FLINK-33938
> URL: https://issues.apache.org/jira/browse/FLINK-33938
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Affects Versions: 1.19.0
>Reporter: Ao Yuchen
>Priority: Major
> Fix For: 1.19.0, 1.17.3, 1.18.2
>
>
> Since TypeScript 5.0, implicit coercions in relations operators are forbidden 
> ([https://devblogs.microsoft.com/typescript/announcing-typescript-5-0/#forbidden-implicit-coercions-in-relational-operators]).
> As a result, the following code in 
> flink-runtime-web/web-dashboard/src/app/components/humanize-date.pipe.ts gets 
> an error:
> {code:java}
> public transform(
>   value: number | string | Date,
>   ...
> ): string | null | undefined {
>   if (value == null || value === '' || value !== value || value < 0) {
> return '-';
>   } 
>   ...
> }{code}
> The correctness improvement is available in 
> [https://github.com/microsoft/TypeScript/pull/52048].
> I think we should optimize this type of code for better compatibility.





[jira] [Updated] (FLINK-33938) Update Web UI to adopt typescript 5.0

2023-12-25 Thread Ao Yuchen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ao Yuchen updated FLINK-33938:
--
Summary: Update Web UI to adopt typescript 5.0  (was: Update Web UI to 
adopt typescript 5.x )

> Update Web UI to adopt typescript 5.0
> -
>
> Key: FLINK-33938
> URL: https://issues.apache.org/jira/browse/FLINK-33938
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Affects Versions: 1.19.0
>Reporter: Ao Yuchen
>Priority: Major
> Fix For: 1.19.0, 1.17.3, 1.18.2
>
>
> Since TypeScript 5.x, implicit coercions in relations operators are forbidden 
> ([https://devblogs.microsoft.com/typescript/announcing-typescript-5-0/#forbidden-implicit-coercions-in-relational-operators]).
> As a result, the following code in 
> flink-runtime-web/web-dashboard/src/app/components/humanize-date.pipe.ts gets 
> an error:
> {code:java}
> public transform(
>   value: number | string | Date,
>   ...
> ): string | null | undefined {
>   if (value == null || value === '' || value !== value || value < 0) {
> return '-';
>   } 
>   ...
> }{code}
> The correctness improvement is available in 
> [https://github.com/microsoft/TypeScript/pull/52048].
> I think we should optimize this type of code for better compatibility.





[jira] [Updated] (FLINK-33736) Update default value of exponential-delay.max-backoff and exponential-delay.backoff-multiplier

2023-12-25 Thread Rui Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan updated FLINK-33736:

Description: 
Update default value of exponential-delay.max-backoff from 5min to 1min.

Update default value of exponential-delay.backoff-multiplier from 2.0 to 1.5.
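Assuming the usual exponential-backoff semantics (delay_n = min(initialBackoff * multiplier^n, max-backoff); the 1-second initial backoff used here is an illustrative assumption, not part of this change), the effect of the new defaults can be sketched as:

```typescript
// Sketch: restart delays under exponential backoff with a cap.
function restartDelaysMs(
  initialMs: number,
  multiplier: number,
  maxBackoffMs: number,
  attempts: number
): number[] {
  const delays: number[] = [];
  let d = initialMs;
  for (let i = 0; i < attempts; i++) {
    delays.push(Math.min(Math.round(d), maxBackoffMs));
    d *= multiplier; // exponential growth until the cap is reached
  }
  return delays;
}

// Old defaults: multiplier 2.0, max-backoff 5 min (300000 ms).
console.log(restartDelaysMs(1000, 2.0, 300000, 10));
// New defaults: multiplier 1.5, max-backoff 1 min (60000 ms):
// delays grow more gently and cap out much earlier.
console.log(restartDelaysMs(1000, 1.5, 60000, 10));
```

With the new defaults, consecutive restart delays grow by only 50% per attempt and never exceed one minute, so a recovering job retries much sooner after a transient failure.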

  was:
Update default value of exponential-delay.max-backoff from 5min to 1min.

Update default value of exponential-delay.backoff-multiplier from 2.0 to 1.2.


> Update default value of exponential-delay.max-backoff and 
> exponential-delay.backoff-multiplier
> --
>
> Key: FLINK-33736
> URL: https://issues.apache.org/jira/browse/FLINK-33736
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Update default value of exponential-delay.max-backoff from 5min to 1min.
> Update default value of exponential-delay.backoff-multiplier from 2.0 to 1.5.





[jira] [Commented] (FLINK-33736) Update default value of exponential-delay.max-backoff and exponential-delay.backoff-multiplier

2023-12-25 Thread Rui Fan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800403#comment-17800403
 ] 

Rui Fan commented on FLINK-33736:
-

Merged to master (1.19) via d9e8b2a7be9c516bf497d4a68c39ac7f12e3b293

> Update default value of exponential-delay.max-backoff and 
> exponential-delay.backoff-multiplier
> --
>
> Key: FLINK-33736
> URL: https://issues.apache.org/jira/browse/FLINK-33736
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Update default value of exponential-delay.max-backoff from 5min to 1min.
> Update default value of exponential-delay.backoff-multiplier from 2.0 to 1.2.





[jira] [Resolved] (FLINK-33736) Update default value of exponential-delay.max-backoff and exponential-delay.backoff-multiplier

2023-12-25 Thread Rui Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan resolved FLINK-33736.
-
Resolution: Fixed

> Update default value of exponential-delay.max-backoff and 
> exponential-delay.backoff-multiplier
> --
>
> Key: FLINK-33736
> URL: https://issues.apache.org/jira/browse/FLINK-33736
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Update default value of exponential-delay.max-backoff from 5min to 1min.
> Update default value of exponential-delay.backoff-multiplier from 2.0 to 1.2.





Re: [PR] [FLINK-33736][Scheduler] Update default value of exponential-delay.max-backoff and exponential-delay.backoff-multiplier [flink]

2023-12-25 Thread via GitHub


1996fanrui merged PR #23911:
URL: https://github.com/apache/flink/pull/23911


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33736][Scheduler] Update default value of exponential-delay.max-backoff and exponential-delay.backoff-multiplier [flink]

2023-12-25 Thread via GitHub


1996fanrui commented on PR #23911:
URL: https://github.com/apache/flink/pull/23911#issuecomment-1869214268

   Thanks @RocMarshal for the quick review! Merging~





[jira] [Created] (FLINK-33938) Update Web UI to adopt typescript 5.x

2023-12-25 Thread Ao Yuchen (Jira)
Ao Yuchen created FLINK-33938:
-

 Summary: Update Web UI to adopt typescript 5.x 
 Key: FLINK-33938
 URL: https://issues.apache.org/jira/browse/FLINK-33938
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Web Frontend
Affects Versions: 1.19.0
Reporter: Ao Yuchen
 Fix For: 1.19.0, 1.17.3, 1.18.2


Since TypeScript 5.x, implicit coercions in relational operators are forbidden 
([https://devblogs.microsoft.com/typescript/announcing-typescript-5-0/#forbidden-implicit-coercions-in-relational-operators]).

As a result, the following code in 
flink-runtime-web/web-dashboard/src/app/components/humanize-date.pipe.ts produces 
a compile error:
{code:java}
public transform(
  value: number | string | Date,
  ...
): string | null | undefined {
  if (value == null || value === '' || value !== value || value < 0) {
return '-';
  } 
  ...
}{code}
The correctness improvement is available in 
[https://github.com/microsoft/TypeScript/pull/52048].

I think we should optimize this type of code for better compatibility.
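One possible fix is sketched below: convert the union-typed value to a number explicitly so the relational comparison is number-to-number, which TypeScript 5.x accepts. The function name and exact shape are illustrative assumptions, not the actual pipe code.

```typescript
// Hedged sketch of an explicit-conversion variant of the guard in
// humanize-date.pipe.ts. The name isInvalidDateInput is hypothetical.
function isInvalidDateInput(value: number | string | Date | null | undefined): boolean {
  if (value == null || value === '') {
    return true;
  }
  // Convert explicitly so `< 0` compares number to number (allowed in TS 5.x).
  const millis = value instanceof Date ? value.getTime() : Number(value);
  // Number.isNaN(millis) replaces the original `value !== value` NaN check.
  return Number.isNaN(millis) || millis < 0;
}
```

`Number.isNaN` on the converted value keeps the original self-inequality NaN check while avoiding the forbidden implicit coercion.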





Re: [PR] [FLINK-33743][runtime] Support consuming multiple subpartitions on a single channel [flink]

2023-12-25 Thread via GitHub


xintongsong commented on code in PR #23927:
URL: https://github.com/apache/flink/pull/23927#discussion_r1436062729


##
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/ResultSubpartitionIndexRange.java:
##
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.io.network.partition;
+
+import org.apache.flink.runtime.executiongraph.IndexRange;
+
+import java.util.Iterator;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+
+/**
+ * A {@link ResultSubpartitionIndexSet} represented as a range of indexes. The 
range is inclusive.
+ */
+public class ResultSubpartitionIndexRange implements 
ResultSubpartitionIndexSet {

Review Comment:
   How is this different from `org.apache.flink.runtime.executiongraph.IndexRange`? Can we 
extract a common abstraction from these two?



##
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/tiered/storage/TieredStorageConsumerClient.java:
##
@@ -52,33 +55,39 @@ public class TieredStorageConsumerClient {
 Map>>
 currentConsumerAgentAndSegmentIds = new HashMap<>();
 
+private final List tieredStorageConsumerSpecs;
+
 public TieredStorageConsumerClient(
 List tierFactories,
 List tieredStorageConsumerSpecs,
 TieredStorageNettyService nettyService) {
 this.tierFactories = tierFactories;
 this.nettyService = nettyService;
 this.tierConsumerAgents = 
createTierConsumerAgents(tieredStorageConsumerSpecs);
+this.tieredStorageConsumerSpecs = tieredStorageConsumerSpecs;
 }
 
 public void start() {
 for (TierConsumerAgent tierConsumerAgent : tierConsumerAgents) {
 tierConsumerAgent.start();
+for (TieredStorageConsumerSpec spec : tieredStorageConsumerSpecs) {
+tierConsumerAgent.notifyRequiredSegmentId(
+spec.getPartitionId(), spec.getSubpartitionId(), 0);
+}

Review Comment:
   Why notify for all sub-partitions at start-up?



##
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/tiered/netty/NettyConnectionReader.java:
##
@@ -19,18 +19,26 @@
 package org.apache.flink.runtime.io.network.partition.hybrid.tiered.netty;
 
 import org.apache.flink.runtime.io.network.buffer.Buffer;
+import 
org.apache.flink.runtime.io.network.partition.hybrid.tiered.common.TieredStorageSubpartitionId;
 import 
org.apache.flink.runtime.io.network.partition.hybrid.tiered.tier.TierConsumerAgent;
 
 import java.util.Optional;
 
 /** {@link NettyConnectionReader} is used by {@link TierConsumerAgent} to read 
buffer from netty. */
 public interface NettyConnectionReader {
+/**
+ * Notify the upstream the id of required segment that should be sent to 
netty connection.
+ *
+ * @param subpartitionId The id of the corresponding subpartition.
+ * @param segmentId The id of required segment.
+ */
+void notifyRequiredSegmentId(TieredStorageSubpartitionId subpartitionId, 
int segmentId);

Review Comment:
   Could you remind me why we need to separate the notify from readBuffer? 
It feels like this notify interface is used for multiple purposes.



##
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/ResultSubpartitionIndexSet.java:
##
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing 

Re: [PR] [FLINK-31192][connectors/dataGen] Fix dataGen takes too long to initi… [flink]

2023-12-25 Thread via GitHub


xuzhiwen1255 commented on PR #22010:
URL: https://github.com/apache/flink/pull/22010#issuecomment-1869194665

   @XComp I'm sorry that I haven't had time to work on this problem recently. 
I now plan to continue with this issue. Do you have 
time to take a look?





Re: [PR] [FLINK-31650][metrics][rest] Remove transient metrics for subtasks in terminal state [flink]

2023-12-25 Thread via GitHub


wanglijie95 commented on PR #23447:
URL: https://github.com/apache/flink/pull/23447#issuecomment-1869194011

   Thanks @X-czh, and thanks to @JunRuiLee for the review. I will close this 
issue.





[jira] [Closed] (FLINK-31650) Incorrect busyMsTimePerSecond metric value for FINISHED task

2023-12-25 Thread Lijie Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lijie Wang closed FLINK-31650.
--
Fix Version/s: 1.19.0
   1.18.1
   1.17.3
   Resolution: Fixed

> Incorrect busyMsTimePerSecond metric value for FINISHED task
> 
>
> Key: FLINK-31650
> URL: https://issues.apache.org/jira/browse/FLINK-31650
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Metrics, Runtime / REST
>Affects Versions: 1.17.0, 1.16.1, 1.15.4
>Reporter: Lijie Wang
>Assignee: Zhanghao Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0, 1.18.1, 1.17.3
>
> Attachments: busyMsTimePerSecond.png
>
>
> As shown in the figure below, the busyMsTimePerSecond of the FINISHED task is 
> 100%, which is obviously unreasonable.
> !busyMsTimePerSecond.png|width=1048,height=432!





[jira] [Commented] (FLINK-31650) Incorrect busyMsTimePerSecond metric value for FINISHED task

2023-12-25 Thread Lijie Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800398#comment-17800398
 ] 

Lijie Wang commented on FLINK-31650:


Fixed via:
master (1.19): dd028282e8ab19c0d1cd05fad02b63bbda6c1358
release-1.18: ff1ef789eba612d008424d5fc28fe5905f96fb9c
release-1.17: 024e970e27781a9fd1925d6c5938efbb364c6462

> Incorrect busyMsTimePerSecond metric value for FINISHED task
> 
>
> Key: FLINK-31650
> URL: https://issues.apache.org/jira/browse/FLINK-31650
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Metrics, Runtime / REST
>Affects Versions: 1.17.0, 1.16.1, 1.15.4
>Reporter: Lijie Wang
>Assignee: Zhanghao Chen
>Priority: Major
>  Labels: pull-request-available
> Attachments: busyMsTimePerSecond.png
>
>
> As shown in the figure below, the busyMsTimePerSecond of the FINISHED task is 
> 100%, which is obviously unreasonable.
> !busyMsTimePerSecond.png|width=1048,height=432!





Re: [PR] [FLINK-31650][metrics][rest][BP-1.17] Remove transient metrics for subtasks in terminal state [flink]

2023-12-25 Thread via GitHub


wanglijie95 merged PR #23988:
URL: https://github.com/apache/flink/pull/23988





Re: [PR] [FLINK-33743][runtime] Support consuming multiple subpartitions on a single channel [flink]

2023-12-25 Thread via GitHub


TanYuxin-tyx commented on PR #23927:
URL: https://github.com/apache/flink/pull/23927#issuecomment-1869187119

   @yunfengzhou-hub Thanks for the update. I have no more comments on the 
change.





Re: [PR] [FLINK-33795] Add a new config to forbid autoscale execution in the configured excluded periods [flink-kubernetes-operator]

2023-12-25 Thread via GitHub


flashJd commented on code in PR #728:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/728#discussion_r1436178955


##
flink-autoscaler/src/main/java/org/apache/flink/autoscaler/config/AutoScalerOptions.java:
##
@@ -72,6 +72,21 @@ private static ConfigOptions.OptionBuilder 
autoScalerConfig(String key) {
 .withDescription(
 "Stabilization period in which no new scaling will 
be executed");
 
+public static final ConfigOption> EXCLUDED_PERIODS =
+autoScalerConfig("excluded.periods")
+.stringType()
+.asList()
+.defaultValues()

Review Comment:
   Fixed. Could you approve the workflow again? Thanks.






[jira] [Created] (FLINK-33937) Translate "Stateful Stream Processing" page into Chinese

2023-12-25 Thread Yongping Li (Jira)
Yongping Li created FLINK-33937:
---

 Summary: Translate "Stateful Stream Processing" page into Chinese
 Key: FLINK-33937
 URL: https://issues.apache.org/jira/browse/FLINK-33937
 Project: Flink
  Issue Type: Improvement
  Components: chinese-translation
Affects Versions: 1.8.0
Reporter: Yongping Li
 Fix For: 1.8.4
 Attachments: image-2023-12-26-08-54-14-041.png

The page is located at 
_"docs/content.zh/docs/concepts/stateful-stream-processing.md"_

_!image-2023-12-26-08-54-14-041.png!_





[jira] [Updated] (FLINK-33936) The aggregation of mini-batches should output the result even if the result is the same as before when TTL is configured.

2023-12-25 Thread Feng Jin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Jin updated FLINK-33936:
-
Description: 
Currently, if mini-batch is enabled and the aggregated result is the same as 
the previous output, the new aggregation result will not be sent downstream. 
This causes downstream nodes to not receive updated data, and if a TTL is set 
for states, the TTL of the downstream state will not be refreshed either.

The specific logic is as follows.

https://github.com/apache/flink/blob/a18c0cd3f0cdfd7e0acb53283f40cd2033a86472/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/operators/aggregate/MiniBatchGroupAggFunction.java#L224

{code:java}
if (!equaliser.equals(prevAggValue, newAggValue)) {
// new row is not same with prev row
if (generateUpdateBefore) {
// prepare UPDATE_BEFORE message for previous row
resultRow
.replace(currentKey, prevAggValue)
.setRowKind(RowKind.UPDATE_BEFORE);
out.collect(resultRow);
}
// prepare UPDATE_AFTER message for new row
resultRow.replace(currentKey, 
newAggValue).setRowKind(RowKind.UPDATE_AFTER);
out.collect(resultRow);
}
// new row is same with prev row, no need to output
{code}



When mini-batch is not enabled, the new result will still be sent if a TTL is 
set, even if the aggregation result is the same as the last one.

https://github.com/apache/flink/blob/e9353319ad625baa5b2c20fa709ab5b23f83c0f4/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/operators/aggregate/GroupAggFunction.java#L170

{code:java}

if (stateRetentionTime <= 0 && equaliser.equals(prevAggValue, 
newAggValue)) {
// newRow is the same as before and state cleaning is not 
enabled.
// We do not emit retraction and acc message.
// If state cleaning is enabled, we have to emit messages 
to prevent too early
// state eviction of downstream operators.
return;
} else {
// retract previous result
if (generateUpdateBefore) {
// prepare UPDATE_BEFORE message for previous row
resultRow
.replace(currentKey, prevAggValue)
.setRowKind(RowKind.UPDATE_BEFORE);
out.collect(resultRow);
}
// prepare UPDATE_AFTER message for new row
resultRow.replace(currentKey, 
newAggValue).setRowKind(RowKind.UPDATE_AFTER);
}
{code}


Therefore, considering TTL scenarios, I believe that when mini-batch 
aggregation is enabled, new results should also be emitted when the aggregated 
result is the same as the previous one.

  was:
Currently, if mini-batch is enabled and the aggregated result is the same as 
the previous output, the new aggregation result will not be sent downstream. 
This causes downstream nodes to not receive updated data, and if a TTL is set 
for states, the TTL of the downstream state will not be refreshed either.

The specific logic is as follows.

https://github.com/apache/flink/blob/a18c0cd3f0cdfd7e0acb53283f40cd2033a86472/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/operators/aggregate/MiniBatchGroupAggFunction.java#L224

{code:java}
if (!equaliser.equals(prevAggValue, newAggValue)) {
// new row is not same with prev row
if (generateUpdateBefore) {
// prepare UPDATE_BEFORE message for previous row
resultRow
.replace(currentKey, prevAggValue)
.setRowKind(RowKind.UPDATE_BEFORE);
out.collect(resultRow);
}
// prepare UPDATE_AFTER message for new row
resultRow.replace(currentKey, 
newAggValue).setRowKind(RowKind.UPDATE_AFTER);
out.collect(resultRow);
}
// new row is same with prev row, no need to output
{code}



When mini-batch is not enabled, the new result will still be sent if a TTL is 
set, even if the aggregation result is the same as the last one.


[jira] [Updated] (FLINK-33936) The aggregation of mini-batches should output the result even if the result is the same as before when TTL is configured.

2023-12-25 Thread Feng Jin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Jin updated FLINK-33936:
-
Summary: The aggregation of mini-batches should output the result even if 
the result is the same as before when TTL is configured.  (was: Mini-batch agg 
should output the result when the result is the same as the last if TTL is set.)

> The aggregation of mini-batches should output the result even if the result 
> is the same as before when TTL is configured.
> -
>
> Key: FLINK-33936
> URL: https://issues.apache.org/jira/browse/FLINK-33936
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.18.0
>Reporter: Feng Jin
>Priority: Major
>
> Currently, if mini-batch is enabled and the aggregated result is the same as 
> the previous output, the new aggregation result will not be sent downstream. 
> This causes downstream nodes to not receive updated data, and if a TTL is set 
> for states, the TTL of the downstream state will not be refreshed either.
> The specific logic is as follows.
> https://github.com/apache/flink/blob/a18c0cd3f0cdfd7e0acb53283f40cd2033a86472/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/operators/aggregate/MiniBatchGroupAggFunction.java#L224
> {code:java}
> if (!equaliser.equals(prevAggValue, newAggValue)) {
> // new row is not same with prev row
> if (generateUpdateBefore) {
> // prepare UPDATE_BEFORE message for previous row
> resultRow
> .replace(currentKey, prevAggValue)
> .setRowKind(RowKind.UPDATE_BEFORE);
> out.collect(resultRow);
> }
> // prepare UPDATE_AFTER message for new row
> resultRow.replace(currentKey, 
> newAggValue).setRowKind(RowKind.UPDATE_AFTER);
> out.collect(resultRow);
> }
> // new row is same with prev row, no need to output
> {code}
> When mini-batch is not enabled, the new result will still be sent if a TTL is 
> set, even if the aggregation result is the same as the last one.
> https://github.com/apache/flink/blob/e9353319ad625baa5b2c20fa709ab5b23f83c0f4/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/operators/aggregate/GroupAggFunction.java#L170
> {code:java}
> if (stateRetentionTime <= 0 && equaliser.equals(prevAggValue, 
> newAggValue)) {
> // newRow is the same as before and state cleaning is not 
> enabled.
> // We do not emit retraction and acc message.
> // If state cleaning is enabled, we have to emit messages 
> to prevent too early
> // state eviction of downstream operators.
> return;
> } else {
> // retract previous result
> if (generateUpdateBefore) {
> // prepare UPDATE_BEFORE message for previous row
> resultRow
> .replace(currentKey, prevAggValue)
> .setRowKind(RowKind.UPDATE_BEFORE);
> out.collect(resultRow);
> }
> // prepare UPDATE_AFTER message for new row
> resultRow.replace(currentKey, 
> newAggValue).setRowKind(RowKind.UPDATE_AFTER);
> }
> {code}
> Therefore, considering TTL scenarios, I believe that when mini-batch 
> aggregation is enabled, new results should also be emitted when the 
> aggregated result is the same as the previous one.





[jira] [Updated] (FLINK-33936) Mini-batch agg should output the result when the result is the same as the last if TTL is set.

2023-12-25 Thread Feng Jin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Jin updated FLINK-33936:
-
Summary: Mini-batch agg should output the result when the result is the same 
as the last if TTL is set.  (was: Mini-batch should output the result when the 
result is the same as the last if TTL is set.)

> Mini-batch agg should output the result when the result is the same as the 
> last if TTL is set.
> -
>
> Key: FLINK-33936
> URL: https://issues.apache.org/jira/browse/FLINK-33936
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.18.0
>Reporter: Feng Jin
>Priority: Major
>
> Currently, if mini-batch is enabled and the aggregated result is the same as 
> the previous output, the new aggregation result will not be sent downstream. 
> This causes downstream nodes to not receive updated data, and if a TTL is set 
> for states, the TTL of the downstream state will not be refreshed either.
> The specific logic is as follows.
> https://github.com/apache/flink/blob/a18c0cd3f0cdfd7e0acb53283f40cd2033a86472/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/operators/aggregate/MiniBatchGroupAggFunction.java#L224
> {code:java}
> if (!equaliser.equals(prevAggValue, newAggValue)) {
> // new row is not same with prev row
> if (generateUpdateBefore) {
> // prepare UPDATE_BEFORE message for previous row
> resultRow
> .replace(currentKey, prevAggValue)
> .setRowKind(RowKind.UPDATE_BEFORE);
> out.collect(resultRow);
> }
> // prepare UPDATE_AFTER message for new row
> resultRow.replace(currentKey, 
> newAggValue).setRowKind(RowKind.UPDATE_AFTER);
> out.collect(resultRow);
> }
> // new row is same with prev row, no need to output
> {code}
> When mini-batch is not enabled, the new result will still be sent if a TTL is 
> set, even if the aggregation result is the same as the last one.
> https://github.com/apache/flink/blob/e9353319ad625baa5b2c20fa709ab5b23f83c0f4/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/operators/aggregate/GroupAggFunction.java#L170
> {code:java}
> if (stateRetentionTime <= 0 && equaliser.equals(prevAggValue, 
> newAggValue)) {
> // newRow is the same as before and state cleaning is not 
> enabled.
> // We do not emit retraction and acc message.
> // If state cleaning is enabled, we have to emit messages 
> to prevent too early
> // state eviction of downstream operators.
> return;
> } else {
> // retract previous result
> if (generateUpdateBefore) {
> // prepare UPDATE_BEFORE message for previous row
> resultRow
> .replace(currentKey, prevAggValue)
> .setRowKind(RowKind.UPDATE_BEFORE);
> out.collect(resultRow);
> }
> // prepare UPDATE_AFTER message for new row
> resultRow.replace(currentKey, 
> newAggValue).setRowKind(RowKind.UPDATE_AFTER);
> }
> {code}
> Therefore, considering TTL scenarios, I believe that when mini-batch 
> aggregation is enabled, new results should also be emitted when the 
> aggregated result is the same as the previous one.





[jira] [Updated] (FLINK-33936) Mini-batch should output the result when the result is the same as the last if TTL is set.

2023-12-25 Thread Feng Jin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Jin updated FLINK-33936:
-
Description: 
Currently, if mini-batch is enabled and the aggregated result is the same as 
the previous output, the new aggregation result will not be sent downstream. 
This causes downstream nodes to not receive updated data, and if a TTL is set 
for states, the TTL of the downstream state will not be refreshed either.

The specific logic is as follows.

https://github.com/apache/flink/blob/a18c0cd3f0cdfd7e0acb53283f40cd2033a86472/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/operators/aggregate/MiniBatchGroupAggFunction.java#L224

{code:java}
if (!equaliser.equals(prevAggValue, newAggValue)) {
// new row is not same with prev row
if (generateUpdateBefore) {
// prepare UPDATE_BEFORE message for previous row
resultRow
.replace(currentKey, prevAggValue)
.setRowKind(RowKind.UPDATE_BEFORE);
out.collect(resultRow);
}
// prepare UPDATE_AFTER message for new row
resultRow.replace(currentKey, 
newAggValue).setRowKind(RowKind.UPDATE_AFTER);
out.collect(resultRow);
}
// new row is same with prev row, no need to output
{code}



When mini-batch is not enabled, the new result will still be sent if a TTL is 
set, even if the aggregation result is the same as the last one.

https://github.com/apache/flink/blob/e9353319ad625baa5b2c20fa709ab5b23f83c0f4/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/operators/aggregate/GroupAggFunction.java#L170

{code:java}

if (stateRetentionTime <= 0 && equaliser.equals(prevAggValue, 
newAggValue)) {
// newRow is the same as before and state cleaning is not 
enabled.
// We do not emit retraction and acc message.
// If state cleaning is enabled, we have to emit messages 
to prevent too early
// state eviction of downstream operators.
return;
} else {
// retract previous result
if (generateUpdateBefore) {
// prepare UPDATE_BEFORE message for previous row
resultRow
.replace(currentKey, prevAggValue)
.setRowKind(RowKind.UPDATE_BEFORE);
out.collect(resultRow);
}
// prepare UPDATE_AFTER message for new row
resultRow.replace(currentKey, 
newAggValue).setRowKind(RowKind.UPDATE_AFTER);
}
{code}


Therefore, considering TTL scenarios, I believe that when mini-batch 
aggregation is enabled, new results should also be emitted when the aggregated 
result is the same as the previous one.

  was:
Currently, if mini-batch is enabled and the aggregated result is the same as 
the previous output, the new aggregation result will not be sent downstream. 
This causes downstream nodes to not receive updated data, and if a TTL is set 
for states, the TTL of the downstream state will not be refreshed either.

The specific logic is as follows.

https://github.com/hackergin/flink/blob/a18c0cd3f0cdfd7e0acb53283f40cd2033a86472/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/operators/aggregate/MiniBatchGroupAggFunction.java#L224

{code:java}
if (!equaliser.equals(prevAggValue, newAggValue)) {
// new row is not same with prev row
if (generateUpdateBefore) {
// prepare UPDATE_BEFORE message for previous row
resultRow
.replace(currentKey, prevAggValue)
.setRowKind(RowKind.UPDATE_BEFORE);
out.collect(resultRow);
}
// prepare UPDATE_AFTER message for new row
resultRow.replace(currentKey, 
newAggValue).setRowKind(RowKind.UPDATE_AFTER);
out.collect(resultRow);
}
// new row is same with prev row, no need to output
{code}



When mini-batch is not enabled, the new result will still be sent if a TTL is 
set, even if the aggregation result is the same as the last one.


[jira] [Updated] (FLINK-33936) Mini-batch should output the result when the result is the same as the last if TTL is set.

2023-12-25 Thread Feng Jin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Jin updated FLINK-33936:
-
Description: 
Currently, if mini-batch is enabled and the aggregated result is the same as 
the previous output, the new aggregation result will not be sent downstream. 
This causes downstream nodes to not receive updated data, and if a TTL is set 
for states, the TTL of the downstream state will not be refreshed either.

The specific logic is as follows.

https://github.com/hackergin/flink/blob/a18c0cd3f0cdfd7e0acb53283f40cd2033a86472/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/operators/aggregate/MiniBatchGroupAggFunction.java#L224

{code:java}
if (!equaliser.equals(prevAggValue, newAggValue)) {
// new row is not same with prev row
if (generateUpdateBefore) {
// prepare UPDATE_BEFORE message for previous row
resultRow
.replace(currentKey, prevAggValue)
.setRowKind(RowKind.UPDATE_BEFORE);
out.collect(resultRow);
}
// prepare UPDATE_AFTER message for new row
resultRow.replace(currentKey, 
newAggValue).setRowKind(RowKind.UPDATE_AFTER);
out.collect(resultRow);
}
// new row is same with prev row, no need to output
{code}



When mini-batch is not enabled, the new result will still be sent if a TTL is 
set, even if the aggregation result is the same as the last one.

https://github.com/hackergin/flink/blob/e9353319ad625baa5b2c20fa709ab5b23f83c0f4/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/operators/aggregate/GroupAggFunction.java#L170

{code:java}

if (stateRetentionTime <= 0 && equaliser.equals(prevAggValue, 
newAggValue)) {
// newRow is the same as before and state cleaning is not 
enabled.
// We do not emit retraction and acc message.
// If state cleaning is enabled, we have to emit messages 
to prevent too early
// state eviction of downstream operators.
return;
} else {
// retract previous result
if (generateUpdateBefore) {
// prepare UPDATE_BEFORE message for previous row
resultRow
.replace(currentKey, prevAggValue)
.setRowKind(RowKind.UPDATE_BEFORE);
out.collect(resultRow);
}
// prepare UPDATE_AFTER message for new row
resultRow.replace(currentKey, 
newAggValue).setRowKind(RowKind.UPDATE_AFTER);
}
{code}


Therefore, considering TTL scenarios, I believe that when mini-batch 
aggregation is enabled, new results should also be emitted even when the 
aggregated result is the same as the previous one.
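To make the proposed behavior concrete, here is a minimal, self-contained sketch of the emit decision (the helper name and signature are illustrative, not Flink API): with state retention enabled, an unchanged result must still be emitted so that downstream TTL timers are refreshed, mirroring the condition shown in GroupAggFunction above.

```java
// Hypothetical helper summarizing the emit decision discussed above; not Flink code.
public class MiniBatchEmitSketch {

    /**
     * @param stateRetentionTime TTL in milliseconds; a value <= 0 means state cleaning is disabled
     * @param resultUnchanged    whether newAggValue equals prevAggValue
     * @return true if UPDATE_BEFORE/UPDATE_AFTER messages should be emitted downstream
     */
    public static boolean shouldEmit(long stateRetentionTime, boolean resultUnchanged) {
        // Suppress the output only when the result is unchanged AND no TTL is set.
        // With a TTL, equal results must still flow downstream to refresh state timers.
        return !(stateRetentionTime <= 0 && resultUnchanged);
    }

    public static void main(String[] args) {
        // Unchanged result, no TTL: safe to skip the output.
        System.out.println(shouldEmit(0L, true));      // prints "false"
        // Unchanged result, TTL enabled: must emit to refresh downstream TTL.
        System.out.println(shouldEmit(60_000L, true)); // prints "true"
        // Changed result: always emit.
        System.out.println(shouldEmit(0L, false));     // prints "true"
    }
}
```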


[jira] [Created] (FLINK-33936) Mini-batch should output the result when the result is same as last if TTL is setted.

2023-12-25 Thread Feng Jin (Jira)
Feng Jin created FLINK-33936:


 Summary: Mini-batch should output the result when the result is 
same as last if TTL is setted.
 Key: FLINK-33936
 URL: https://issues.apache.org/jira/browse/FLINK-33936
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Runtime
Affects Versions: 1.18.0
Reporter: Feng Jin


If mini-batch is currently enabled and the aggregated result is the same as the 
previous output, the aggregation result will not be sent downstream. This 
causes downstream nodes to not receive updated data, and if a TTL is set for 
state, the TTL of the downstream state will not be refreshed either.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-33784][table] Support Configuring CatalogStoreFactory via StreamExecutionEnvironment [flink]

2023-12-25 Thread via GitHub


flinkbot commented on PR #23990:
URL: https://github.com/apache/flink/pull/23990#issuecomment-1869049218

   
   ## CI report:
   
   * da80787fb7b6e345b94278cbd74ce53a0bc6df6b UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-33784) CatalogStoreFactory can not be configured via StreamExecutionEnvironment

2023-12-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-33784:
---
Labels: pull-request-available  (was: )

> CatalogStoreFactory can not be configured via StreamExecutionEnvironment
> 
>
> Key: FLINK-33784
> URL: https://issues.apache.org/jira/browse/FLINK-33784
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Reporter: Timo Walther
>Assignee: Feng Jin
>Priority: Major
>  Labels: pull-request-available
>
> The logic in TableEnvironment.create() has well-defined ordering which allows 
> to configure most settings via StreamExecutionEnvironment and 
> flink-conf.yaml. The discovery of CatalogStoreFactory should be postponed 
> until the final configuration is merged.





[PR] [FLINK-33784][table] Support Configuring CatalogStoreFactory via StreamExecutionEnvironment [flink]

2023-12-25 Thread via GitHub


hackergin opened a new pull request, #23990:
URL: https://github.com/apache/flink/pull/23990

   ## What is the purpose of the change
   
   The CatalogStore is currently initialized before the flink-conf.yaml and 
StreamTableEnvironment configurations are merged, which causes the configuration 
in flink-conf.yaml to not take effect. We should move the initialization of the 
CatalogStore to after the configuration merging.
   
   
   ## Brief change log
   
   * Postponed the discovery of CatalogStoreFactory until the final 
configuration is merged.
   * Used map-type options to read the CatalogStore configuration.
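As a rough illustration of the ordering fix (the method names are illustrative, and `table.catalog-store.kind` with default `generic_in_memory` is assumed to be the relevant option), factory discovery should read only the merged configuration, so that later layers can still influence it:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "merge first, discover later"; not Flink's actual classes.
public class ConfigMergeSketch {

    // Merge configuration layers; entries from the later layer win.
    public static Map<String, String> merge(Map<String, String> base, Map<String, String> override) {
        Map<String, String> merged = new HashMap<>(base);
        merged.putAll(override);
        return merged;
    }

    // Discovery must run against the *merged* configuration, not an early layer.
    public static String discoverCatalogStoreFactory(Map<String, String> mergedConf) {
        return mergedConf.getOrDefault("table.catalog-store.kind", "generic_in_memory");
    }

    public static void main(String[] args) {
        Map<String, String> flinkConf = Map.of("table.catalog-store.kind", "file");
        Map<String, String> envSettings = Map.of(); // nothing overridden at this layer
        // Because discovery happens after merging, the flink-conf.yaml setting takes effect.
        System.out.println(discoverCatalogStoreFactory(merge(flinkConf, envSettings))); // prints "file"
    }
}
```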
   
   
   ## Verifying this change
   
   Added a unit test case in org.apache.flink.table.api.EnvironmentTest: adds 
CatalogStore configuration in EnvironmentSettings to verify that the discovery 
logic of the CatalogStore works properly and that other environments can share 
the catalog.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   





Re: [PR] [FLINK-33795] Add a new config to forbid autoscale execution in the configured excluded periods [flink-kubernetes-operator]

2023-12-25 Thread via GitHub


1996fanrui commented on code in PR #728:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/728#discussion_r1436102422


##
flink-autoscaler/src/main/java/org/apache/flink/autoscaler/config/AutoScalerOptions.java:
##
@@ -72,6 +72,21 @@ private static ConfigOptions.OptionBuilder 
autoScalerConfig(String key) {
 .withDescription(
 "Stabilization period in which no new scaling will 
be executed");
 
+public static final ConfigOption<List<String>> EXCLUDED_PERIODS =
+autoScalerConfig("excluded.periods")
+.stringType()
+.asList()
+.defaultValues()

Review Comment:
   ```suggestion
   .noDefaultValue()
   ```






Re: [PR] [FLINK-32073][checkpoint] Implement file merging in snapshot [flink]

2023-12-25 Thread via GitHub


AlexYinHan commented on PR #23514:
URL: https://github.com/apache/flink/pull/23514#issuecomment-1868986046

   @flinkbot  run azure





Re: [PR] [FLINK-33935][checkpoint] Improve the default value logic related to `execution.checkpointing.tolerable-failed-checkpoints` and `state.backend.type` [flink]

2023-12-25 Thread via GitHub


1996fanrui commented on PR #23987:
URL: https://github.com/apache/flink/pull/23987#issuecomment-1868981257

   @flinkbot run azure





Re: [PR] [FLINK-33856][checkpointing] Add metrics for interaction performance with external storage system in the process of checkpoint making [flink]

2023-12-25 Thread via GitHub


flinkbot commented on PR #23989:
URL: https://github.com/apache/flink/pull/23989#issuecomment-1868979605

   
   ## CI report:
   
   * 1e679c2a9834caa0dc21a99a96c5f121d987ff67 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   





Re: [PR] [FLINK-31650][metrics][rest][BP-1.17] Remove transient metrics for subtasks in terminal state [flink]

2023-12-25 Thread via GitHub


X-czh commented on PR #23988:
URL: https://github.com/apache/flink/pull/23988#issuecomment-1868978897

   @wanglijie95 Would you mind taking a look at the BP?





[jira] [Updated] (FLINK-33856) Add metrics to monitor the interaction performance between task and external storage system in the process of checkpoint making

2023-12-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-33856:
---
Labels: pull-request-available  (was: )

> Add metrics to monitor the interaction performance between task and external 
> storage system in the process of checkpoint making
> ---
>
> Key: FLINK-33856
> URL: https://issues.apache.org/jira/browse/FLINK-33856
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.18.0
>Reporter: Jufang He
>Assignee: Jufang He
>Priority: Major
>  Labels: pull-request-available
>
> When Flink makes a checkpoint, the interaction performance with the external 
> file system has a great impact on the overall checkpoint duration. Therefore, 
> it is easy to observe bottlenecks by adding performance metrics where the task 
> interacts with the external file storage system. These include: the file write 
> rate, the latency of writing the file, and the latency of closing the file.
> Adding the above metrics on the Flink side has the following advantages: it is 
> convenient to measure the E2E checkpoint duration of different tasks, and there 
> is no need to distinguish the type of external storage system, since the 
> metrics can be collected uniformly in the FsCheckpointStreamFactory.
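A minimal sketch of this kind of instrumentation, using an in-memory stream and illustrative names (this is not the actual patch, just the shape of the proposed metrics):

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch: record bytes written, cumulative write latency, and
// close latency around a checkpoint output stream.
public class CheckpointStreamMetricsSketch {
    private final ByteArrayOutputStream delegate = new ByteArrayOutputStream();
    private long bytesWritten;
    private long writeNanos;
    private long closeNanos;

    public void write(byte[] data) {
        long start = System.nanoTime();
        delegate.write(data, 0, data.length); // in-memory stand-in for the file system write
        writeNanos += System.nanoTime() - start;
        bytesWritten += data.length;
    }

    public void close() {
        long start = System.nanoTime();
        // A real file-system stream would flush and close its handle here.
        closeNanos = System.nanoTime() - start;
    }

    public long getBytesWritten() { return bytesWritten; }
    public long getWriteNanos() { return writeNanos; }
    public long getCloseNanos() { return closeNanos; }

    // Write rate in bytes per second, guarded against division by zero.
    public double writeRateBytesPerSecond() {
        return writeNanos == 0 ? 0.0 : bytesWritten / (writeNanos / 1e9);
    }

    public static void main(String[] args) {
        CheckpointStreamMetricsSketch s = new CheckpointStreamMetricsSketch();
        s.write(new byte[] {1, 2, 3});
        s.close();
        System.out.println("bytes=" + s.getBytesWritten()); // prints "bytes=3"
    }
}
```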





[PR] [FLINK-33856][checkpointing] Add metrics for interaction performance with external storage system in the process of checkpoint making [flink]

2023-12-25 Thread via GitHub


hejufang opened a new pull request, #23989:
URL: https://github.com/apache/flink/pull/23989

   …with external storage system in the process of checkpoint making
   
   
   
   ## What is the purpose of the change
   
   *(For example: This pull request makes task deployment go through the blob 
server, rather than through RPC. That way we avoid re-transferring them on each 
deployment (during recovery).)*
   
   
   ## Brief change log
   
   *(for example:)*
 - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
 - *Deployments RPC transmits only the blob storage reference*
 - *TaskManagers retrieve the TaskInfo from the blob cache*
   
   
   ## Verifying this change
   
   Please make sure both new and modified tests in this PR follows the 
conventions defined in our code quality guide: 
https://flink.apache.org/contributing/code-style-and-quality-common.html#testing
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
 - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
 - *Extended integration test for recovery after master (JobManager) 
failure*
 - *Added test that validates that TaskInfo is transferred only once across 
recoveries*
 - *Manually verified the change by running a 4 node cluster with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
 - The serializers: (yes / no / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
 - The S3 file system connector: (yes / no / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   





Re: [PR] [FLINK-30535][Test] Customize TtlTimeProvider in state benchmarks [flink]

2023-12-25 Thread via GitHub


Zakelly commented on code in PR #23985:
URL: https://github.com/apache/flink/pull/23985#discussion_r1436084300


##
flink-test-utils-parent/flink-test-utils/src/test/java/org/apache/flink/state/benchmark/StateBackendBenchmarkUtils.java:
##
@@ -216,7 +225,7 @@ private static HeapKeyedStateBackend 
createHeapKeyedStateBackend(File root
 return backendBuilder.build();
 }
 
-private static File prepareDirectory(String prefix, File parentDir) throws 
IOException {
+public static File prepareDirectory(String prefix, File parentDir) throws 
IOException {

Review Comment:
   Like I said in description:
   
   > Also by the way, the StateBackendBenchmarkUtils#prepareDirectory is made 
public for the de-duplication of code in RescalingBenchmarkBase in state 
benchmarks.
   
   You may see the [`RescalingBenchmarkBase#prepareDirectory` 
](https://github.com/apache/flink-benchmarks/blob/e0d922cb30c3cee827cec05981befae28ec4daa3/src/main/java/org/apache/flink/state/benchmark/RescalingBenchmarkBase.java#L66).
  I want to do some code abstraction there.






Re: [PR] [FLINK-31650][metrics][rest][BP-1.17] Remove transient metrics for subtasks in terminal state [flink]

2023-12-25 Thread via GitHub


flinkbot commented on PR #23988:
URL: https://github.com/apache/flink/pull/23988#issuecomment-1868970696

   
   ## CI report:
   
   * 4b320c2f3649172236331fa5ba4bef3650e8fe2c UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   





Re: [PR] [FLINK-32738][formats] PROTOBUF format supports projection push down [flink]

2023-12-25 Thread via GitHub


zhougit86 commented on PR #23323:
URL: https://github.com/apache/flink/pull/23323#issuecomment-1868970682

   > @zhougit86 Thank you for your patience and contribution. I will complete 
the review within the next two weeks and provide comments.
   
   @ljw-hit just a reminder to review, thx  





[PR] [FLINK-31650][metrics][rest] Remove transient metrics for subtasks in terminal state [flink]

2023-12-25 Thread via GitHub


X-czh opened a new pull request, #23988:
URL: https://github.com/apache/flink/pull/23988

   
   
   ## What is the purpose of the change
   
   Cherry picked from commit dd028282e8ab19c0d1cd05fad02b63bbda6c1358 (MR: 
https://github.com/apache/flink/pull/23447)
   
   ## Brief change log
   
   Removes transient metrics (idle/busy/backpressured time) for terminal 
subtasks in `MetricStore`.
   
   ## Verifying this change
   
   - Added UT to test that the busy time metric is removed for terminal 
subtasks.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't 
know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)
   





Re: [PR] [FLINK-31650][metrics][rest] Remove transient metrics for subtasks in terminal state [flink]

2023-12-25 Thread via GitHub


X-czh commented on PR #23447:
URL: https://github.com/apache/flink/pull/23447#issuecomment-1868963641

   OK, I'll prepare it tonight





[jira] [Commented] (FLINK-33927) Cleanup usage of deprecated ExecutionConfigOptions#(TABLE_EXEC_SHUFFLE_MODE, TABLE_EXEC_LEGACY_TRANSFORMATION_UIDS)

2023-12-25 Thread Yubin Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800308#comment-17800308
 ] 

Yubin Li commented on FLINK-33927:
--

[~qingyue] Looking forward to your review :)

> Cleanup usage of deprecated ExecutionConfigOptions#(TABLE_EXEC_SHUFFLE_MODE, 
> TABLE_EXEC_LEGACY_TRANSFORMATION_UIDS)
> ---
>
> Key: FLINK-33927
> URL: https://issues.apache.org/jira/browse/FLINK-33927
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Affects Versions: 1.19.0
>Reporter: Yubin Li
>Priority: Major
>  Labels: pull-request-available
>
> |org.apache.flink.table.api.config.ExecutionConfigOptions#TABLE_EXEC_SHUFFLE_MODE|
> |org.apache.flink.table.api.config.ExecutionConfigOptions#TABLE_EXEC_LEGACY_TRANSFORMATION_UIDS|





[jira] [Commented] (FLINK-33921) Cleanup usage of deprecated IdleStateRetentionTime related method in org.apache.flink.table.api.TableConfig

2023-12-25 Thread Jane Chan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800307#comment-17800307
 ] 

Jane Chan commented on FLINK-33921:
---

[~liyubin117] Assigned to you.

> Cleanup usage of deprecated IdleStateRetentionTime related method in 
> org.apache.flink.table.api.TableConfig
> ---
>
> Key: FLINK-33921
> URL: https://issues.apache.org/jira/browse/FLINK-33921
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Affects Versions: 1.19.0
>Reporter: Yubin Li
>Assignee: Yubin Li
>Priority: Major
>  Labels: pull-request-available
>
> getMinIdleStateRetentionTime()
> getMaxIdleStateRetentionTime()





[jira] [Assigned] (FLINK-33921) Cleanup usage of deprecated IdleStateRetentionTime related method in org.apache.flink.table.api.TableConfig

2023-12-25 Thread Jane Chan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jane Chan reassigned FLINK-33921:
-

Assignee: Yubin Li

> Cleanup usage of deprecated IdleStateRetentionTime related method in 
> org.apache.flink.table.api.TableConfig
> ---
>
> Key: FLINK-33921
> URL: https://issues.apache.org/jira/browse/FLINK-33921
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Affects Versions: 1.19.0
>Reporter: Yubin Li
>Assignee: Yubin Li
>Priority: Major
>  Labels: pull-request-available
>
> getMinIdleStateRetentionTime()
> getMaxIdleStateRetentionTime()





Re: [PR] [FLINK-31650][metrics][rest] Remove transient metrics for subtasks in terminal state [flink]

2023-12-25 Thread via GitHub


wanglijie95 commented on PR #23447:
URL: https://github.com/apache/flink/pull/23447#issuecomment-1868951155

   Hi @X-czh, kindly remind again, we need a PR for release-1.17 ~





Re: [PR] [FLINK-30535][Test] Customize TtlTimeProvider in state benchmarks [flink]

2023-12-25 Thread via GitHub


Myasuka commented on code in PR #23985:
URL: https://github.com/apache/flink/pull/23985#discussion_r1436070483


##
flink-test-utils-parent/flink-test-utils/src/test/java/org/apache/flink/state/benchmark/StateBackendBenchmarkUtils.java:
##
@@ -216,7 +225,7 @@ private static HeapKeyedStateBackend 
createHeapKeyedStateBackend(File root
 return backendBuilder.build();
 }
 
-private static File prepareDirectory(String prefix, File parentDir) throws 
IOException {
+public static File prepareDirectory(String prefix, File parentDir) throws 
IOException {

Review Comment:
Why we need to make this method public?






[jira] [Commented] (FLINK-33817) Allow ReadDefaultValues = False for non primitive types on Proto3

2023-12-25 Thread Benchao Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800304#comment-17800304
 ] 

Benchao Li commented on FLINK-33817:


[~maosuhan] What do you think of this? I have an impression that this was a 
limitation of proto3, so is it no longer present in later versions of proto3?

> Allow ReadDefaultValues = False for non primitive types on Proto3
> -
>
> Key: FLINK-33817
> URL: https://issues.apache.org/jira/browse/FLINK-33817
> Project: Flink
>  Issue Type: Improvement
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.18.0
>Reporter: Sai Sharath Dandi
>Priority: Major
>
> *Background*
>  
> The current Protobuf format 
> [implementation|https://github.com/apache/flink/blob/c3e2d163a637dca5f49522721109161bd7ebb723/flink-formats/flink-protobuf/src/main/java/org/apache/flink/formats/protobuf/deserialize/ProtoToRowConverter.java]
>  always sets ReadDefaultValues=False when using Proto3 version. This can 
> cause severe performance degradation for large Protobuf schemas with OneOf 
> fields as the entire generated code needs to be executed during 
> deserialization even when certain fields are not present in the data to be 
> deserialized and all the subsequent nested Fields can be skipped. Proto3 
> supports hasXXX() methods for checking field presence for non primitive types 
> since Proto version 
> [3.15|https://github.com/protocolbuffers/protobuf/releases/tag/v3.15.0]. In 
> the internal performance benchmarks in our company, we've seen almost 10x 
> difference in performance for one of our real production use cases when 
> ReadDefaultValues=False was allowed with the proto3 version. The exact 
> performance difference depends on the schema complexity and data payload, 
> but we should allow users to set readDefaultValue=False in general.
>  
> *Solution*
>  
> Support using ReadDefaultValues=False when using Proto3 version. We need to 
> be careful to check for field presence only on non-primitive types if 
> ReadDefaultValues is false and version used is Proto3





[jira] [Updated] (FLINK-33934) Flink SQL Source use raw format maybe lead to data lost

2023-12-25 Thread Cai Liuyang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cai Liuyang updated FLINK-33934:

Description: 
In our product we encountered a case that led to data loss. The job info: 
   1. A Flink SQL job reads data from a message queue (our internal MQ) and 
writes to Hive (only the value field is selected; no metadata fields).
   2. The source table uses the raw format.
 
However, if we select the value field and a metadata field at the same time, 
the data loss does not appear.
 
After reviewing the code, we found that the cause is the object reuse of the 
raw format (see 
[RawFormatDeserializationSchema|https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/main/java/org/apache/flink/formats/raw/RawFormatDeserializationSchema.java#L62]).
 Why object reuse leads to this problem (taking Kafka as an example):
    1. RawFormatDeserializationSchema is used in the fetcher thread of the 
SourceOperator; the fetcher thread reads and deserializes data from the Kafka 
partition, then puts the data into the ElementQueue (see [SourceOperator FetcherTask 
|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/fetcher/FetchTask.java#L64]).
    2. The SourceOperator's main thread pulls data from the ElementQueue (which 
is shared with the fetcher thread) and processes it (see [SourceOperator main 
thread|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/SourceReaderBase.java#L188]).
    3. RawFormatDeserializationSchema's deserialize function always returns 
the same object (see [reuse rowData 
object|https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/main/java/org/apache/flink/formats/raw/RawFormatDeserializationSchema.java#L62]).
    4. So, if the ElementQueue still holds elements that have not been 
consumed, the fetcher thread can change the fields of the reused RowData that 
RawFormatDeserializationSchema::deserialize returned, which leads to data loss.
 
The reason that selecting the value field and a metadata field together does 
not hit the data loss is:
   if a metadata field is selected, a new RowData object is returned (see code: 
[DynamicKafkaDeserializationSchema deserialize with metadata field 
|https://github.com/apache/flink-connector-kafka/blob/main/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/table/DynamicKafkaDeserializationSchema.java#L249]),
 while if only the value field is selected, the RowData object returned by the 
format's DeserializationSchema is reused (see code 
[DynamicKafkaDeserializationSchema deserialize only with value 
field|https://github.com/apache/flink-connector-kafka/blob/main/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/table/DynamicKafkaDeserializationSchema.java#L113]).
 
To solve this problem, I think we should remove the object reuse in 
RawFormatDeserializationSchema.
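The aliasing hazard described above can be reproduced deterministically in a few lines (a single-threaded sketch with illustrative names, not Flink classes): because the deserializer returns the same mutable object every time, every queued element aliases it, and earlier elements are silently overwritten before the consumer reads them.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Minimal sketch of the object-reuse data-loss hazard; not Flink code.
public class ObjectReuseSketch {
    static class ReusedRecord { String value; }

    static final ReusedRecord REUSED = new ReusedRecord();

    // Mimics RawFormatDeserializationSchema#deserialize returning the same object.
    static ReusedRecord deserialize(String raw) {
        REUSED.value = raw;
        return REUSED;
    }

    public static void main(String[] args) {
        Queue<ReusedRecord> elementQueue = new ArrayDeque<>();
        // The fetcher thread enqueues two records before the main thread drains them.
        elementQueue.add(deserialize("a"));
        elementQueue.add(deserialize("b"));
        // Both queue entries alias the same reused object: record "a" is lost.
        System.out.println(elementQueue.poll().value); // prints "b"
        System.out.println(elementQueue.poll().value); // prints "b"
    }
}
```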


[jira] [Comment Edited] (FLINK-33934) Flink SQL Source use raw format maybe lead to data lost

2023-12-25 Thread Cai Liuyang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800273#comment-17800273
 ] 

Cai Liuyang edited comment on FLINK-33934 at 12/25/23 11:41 AM:


[~hackergin] Yeah, turning off object reuse for every DeserializationSchema 
used by the SourceOperator is the simplest way to avoid this data loss problem.

-- update --

after reviewing the Kafka source code: Kafka deserializes the ConsumerRecord into 
the real object in the SourceOperator main thread, so it will not encounter this 
problem, see code: [KafkaRecordEmitter 
code|https://github.com/apache/flink-connector-kafka/blob/main/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/reader/KafkaRecordEmitter.java#L53C43-L53C43]
 

 

So, this problem only appears in cases where 
DeserializationSchema::deserialize is called in the fetcher thread
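The safe pattern described above can be sketched with hypothetical stand-in classes (not Flink's actual `KafkaRecordEmitter` API): the fetcher side only queues the raw record bytes, and the reused object lives entirely on the consuming side, so it never crosses threads.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch: deserialization happens in the record emitter on the main thread,
// so a reused mutable row is confined to one thread and one record at a time.
public class EmitterSideDeserialization {

    static final class Row { String value; }

    // One reused row is safe here: only the main thread touches it, and each
    // row is fully processed before the next emit() call overwrites it.
    static final class ReusingEmitter {
        private final Row reused = new Row();
        Row emit(byte[] rawRecord) {
            reused.value = new String(rawRecord);
            return reused;
        }
    }

    static String processAll(String... messages) {
        Queue<byte[]> elementQueue = new ArrayDeque<>();
        for (String msg : messages) {
            elementQueue.add(msg.getBytes()); // fetcher side: raw bytes only
        }
        ReusingEmitter emitter = new ReusingEmitter();
        StringBuilder processed = new StringBuilder();
        while (!elementQueue.isEmpty()) {
            // main-thread side: deserialize and immediately process each record
            processed.append(emitter.emit(elementQueue.poll()).value);
        }
        return processed.toString();
    }

    public static void main(String[] args) {
        System.out.println(processAll("a", "b", "c")); // prints "abc"
    }
}
```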

 


was (Author: cailiuyang):
[~hackergin] Yeah, turning off object reuse for every DeserializationSchema 
used by the SourceOperator is the simplest way to avoid this data loss problem.

> Flink SQL Source use raw format maybe lead to data lost
> ---
>
> Key: FLINK-33934
> URL: https://issues.apache.org/jira/browse/FLINK-33934
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table 
> SQL / Runtime
>Reporter: Cai Liuyang
>Priority: Major
>
> In our product we encountered a case that leads to data loss. The job info: 
>    1. a Flink SQL job that reads data from a message queue and writes to Hive 
> (it only selects the value field, not the metadata field)
>    2. the format of the source table is the raw format
>  
> But if we select the value field and the metadata field at the same time, the 
> data loss does not appear.
>  
> After reviewing the code, we found that the reason is the object reuse of 
> the raw format (see code 
> [RawFormatDeserializationSchema|https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/main/java/org/apache/flink/formats/raw/RawFormatDeserializationSchema.java#L62]), 
>  why object reuse leads to this problem is explained below (taking Kafka as 
> an example):
>     1. RawFormatDeserializationSchema is used in the fetcher thread of the 
> SourceOperator; the fetcher thread reads and deserializes data from a Kafka 
> partition, then puts the data into the ElementQueue (see code [SourceOperator 
> FetcherTask 
> |https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/fetcher/FetchTask.java#L64])
>     2. The SourceOperator's main thread pulls data from the ElementQueue 
> (which is shared with the fetcher thread) and processes it (see code 
> [SourceOperator main 
> thread|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/SourceReaderBase.java#L188])
>     3. For RawFormatDeserializationSchema, its deserialize function returns 
> the same object ([reuse rowData 
> object|https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/main/java/org/apache/flink/formats/raw/RawFormatDeserializationSchema.java#L62])
>     4. So, if the ElementQueue holds elements that have not yet been 
> consumed, the fetcher thread can change the fields of the reused RowData that 
> RawFormatDeserializationSchema::deserialize returned, which leads to data 
> loss.
>  
> The reason that selecting the value and metadata fields at the same time 
> does not encounter data loss is:
>    if we select the metadata field, a new RowData object is returned, see 
> code: [DynamicKafkaDeserializationSchema deserialize with metadata field 
> |https://github.com/apache/flink-connector-kafka/blob/main/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/table/DynamicKafkaDeserializationSchema.java#L249]
>  and if we only select the value field, it reuses the RowData object that the 
> formatDeserializationSchema returned, see code 
> [DynamicKafkaDeserializationSchema deserialize only with value 
> field|https://github.com/apache/flink-connector-kafka/blob/main/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/table/DynamicKafkaDeserializationSchema.java#L113]
>  
> To solve this problem, I think we should remove the object reuse from 
> RawFormatDeserializationSchema.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33611) Support Large Protobuf Schemas

2023-12-25 Thread Benchao Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800303#comment-17800303
 ] 

Benchao Li commented on FLINK-33611:


[~dsaisharath] Sure, I'll review the PR. I may respond slowly since I'm 
participating in the community in my spare time.

> Support Large Protobuf Schemas
> --
>
> Key: FLINK-33611
> URL: https://issues.apache.org/jira/browse/FLINK-33611
> Project: Flink
>  Issue Type: Improvement
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.18.0
>Reporter: Sai Sharath Dandi
>Assignee: Sai Sharath Dandi
>Priority: Major
>  Labels: pull-request-available
>
> h3. Background
> Flink serializes and deserializes protobuf-format data by calling the decode 
> or encode method in GeneratedProtoToRow_XXX.java, generated by codegen, to 
> parse byte[] data into Protobuf Java objects. FLINK-32650 introduced the 
> ability to split the generated code to improve performance for large 
> Protobuf schemas. However, this is still not sufficient for some larger 
> protobuf schemas, as the generated code exceeds the Java constant pool 
> size [limit|https://en.wikipedia.org/wiki/Java_class_file#The_constant_pool] 
> and we see errors like "Too many constants" when trying to compile the 
> generated code. 
> *Solution*
> Since we already have the code-splitting functionality, the main proposal 
> here is to reuse the variable names across different split method scopes. 
> This will greatly reduce the constant pool size. One more optimization is 
> to split the last code segment only when its size exceeds the split 
> threshold. Currently, the last segment of the generated code is always 
> split, which can lead to too many split methods and thus exceed the 
> constant pool size limit.
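A toy illustration of the name-reuse idea (these are hypothetical methods, not the actual generated decode code): each distinct identifier costs a CONSTANT_Utf8 entry in the class file's constant pool when local-variable debug info is kept, and local names are method-scoped, so every split method can share the same few short names instead of minting a unique name per field.

```java
// Sketch: two split methods legally reuse the same local variable names
// ("f0", "f1"), so a class with thousands of split methods needs only a
// handful of name constants instead of one per field.
public class SplitMethodNameReuse {

    static int decodeSplit0(byte[] in) {
        int f0 = in[0]; // name "f0" reused in every split method
        int f1 = in[1];
        return f0 + f1;
    }

    static int decodeSplit1(byte[] in) {
        int f0 = in[2]; // same names, different method scope: legal in Java
        int f1 = in[3];
        return f0 + f1;
    }

    // Top-level decode delegates to the split methods, as the split codegen does.
    static int decode(byte[] in) {
        return decodeSplit0(in) + decodeSplit1(in);
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] {1, 2, 3, 4})); // prints 10
    }
}
```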



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32880) Redundant taskManagers should always be fulfilled in FineGrainedSlotManager

2023-12-25 Thread Weihua Hu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800294#comment-17800294
 ] 

Weihua Hu commented on FLINK-32880:
---

[~mapohl] Sorry for missing the merge information; I will remember it next 
time. Thanks for the reminder.

> Redundant taskManagers should always be fulfilled in FineGrainedSlotManager
> ---
>
> Key: FLINK-32880
> URL: https://issues.apache.org/jira/browse/FLINK-32880
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.18.0
>Reporter: xiangyu feng
>Assignee: xiangyu feng
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> Currently, if you are using FineGrainedSlotManager, when a redundant 
> TaskManager exits abnormally, no replacement TaskManager is provisioned 
> during the periodic check in FineGrainedSlotManager.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [doc-fix] - changed preference to settings for setting pycharm for pyflink ide setting section [flink]

2023-12-25 Thread via GitHub


Myasuka closed pull request #23903: [doc-fix] - changed preference to settings 
for setting pycharm for pyflink ide setting section
URL: https://github.com/apache/flink/pull/23903


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33891] Remove the obsolete SingleJobGraphStore [flink]

2023-12-25 Thread via GitHub


huwh merged PR #23958:
URL: https://github.com/apache/flink/pull/23958


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Resolved] (FLINK-33891) Remove the obsolete SingleJobGraphStore

2023-12-25 Thread Weihua Hu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weihua Hu resolved FLINK-33891.
---
Fix Version/s: 1.19.0
   Resolution: Fixed

resolved in master: c80897642152690ef23978e2090be17e60a58039

> Remove the obsolete SingleJobGraphStore
> ---
>
> Key: FLINK-33891
> URL: https://issues.apache.org/jira/browse/FLINK-33891
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / Coordination
>Reporter: Zhanghao Chen
>Assignee: Zhanghao Chen
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> SingleJobGraphStore was introduced a long time ago in FLIP-6. It is only used 
> in a test case in DefaultDispatcherRunnerITCase#
> leaderChange_withBlockingJobManagerTermination_doesNotAffectNewLeader. We can 
> replace it with TestingJobGraphStore there and then safely remove the class. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33891) Remove the obsolete SingleJobGraphStore

2023-12-25 Thread Weihua Hu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weihua Hu reassigned FLINK-33891:
-

Assignee: Zhanghao Chen

> Remove the obsolete SingleJobGraphStore
> ---
>
> Key: FLINK-33891
> URL: https://issues.apache.org/jira/browse/FLINK-33891
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / Coordination
>Reporter: Zhanghao Chen
>Assignee: Zhanghao Chen
>Priority: Minor
>  Labels: pull-request-available
>
> SingleJobGraphStore was introduced a long time ago in FLIP-6. It is only used 
> in a test case in DefaultDispatcherRunnerITCase#
> leaderChange_withBlockingJobManagerTermination_doesNotAffectNewLeader. We can 
> replace it with TestingJobGraphStore there and then safely remove the class. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-33935][checkpoint] Improve the default value logic related to `execution.checkpointing.tolerable-failed-checkpoints` and `state.backend.type` [flink]

2023-12-25 Thread via GitHub


1996fanrui commented on PR #23987:
URL: https://github.com/apache/flink/pull/23987#issuecomment-1868918339

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33879] Avoids the potential hang of Hybrid Shuffle during redistribution [flink]

2023-12-25 Thread via GitHub


jiangxin369 commented on code in PR #23957:
URL: https://github.com/apache/flink/pull/23957#discussion_r1436042917


##
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/tiered/storage/TieredStorageMemorySpec.java:
##
@@ -30,9 +30,21 @@ public class TieredStorageMemorySpec {
 /** The number of guaranteed buffers of this memory owner. */
 private final int numGuaranteedBuffers;
 
+/**
+ * Whether the buffers of this owner can definitely be recycled, even if 
the downstream does not
+ * consume them promptly.
+ */
+private final boolean definitelyRecycled;

Review Comment:
   IMO, these are two distinct concepts. For instance, a `BufferAccumulator` 
should be `DefinitelyRecycled`, but not contain 
`bufferReclaimRequestListeners`, and currently, there is no evidence to prove 
that a `DefinitelyRecycled Tier` must contain `bufferReclaimRequestListeners`. 
Therefore, I don't think we need to couple these two concepts.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33879] Avoids the potential hang of Hybrid Shuffle during redistribution [flink]

2023-12-25 Thread via GitHub


jiangxin369 commented on code in PR #23957:
URL: https://github.com/apache/flink/pull/23957#discussion_r1436041141


##
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/tiered/storage/TieredStorageMemoryManagerImpl.java:
##
@@ -244,6 +282,39 @@ public void release() {
 ExceptionUtils.rethrow(e);
 }
 }
+while (!bufferQueue.isEmpty()) {
+MemorySegment segment = bufferQueue.poll();
+bufferPool.recycle(segment);
+numRequestedBuffers.decrementAndGet();
+}
+}
+
+@VisibleForTesting

Review Comment:
   The method is for testing the behavior of requesting and releasing buffers 
when there is a cached `bufferQueue` in the `TieredStorageMemoryManagerImpl`. 
Without this method, I cannot determine whether the behavior is logical, such 
as whether a Buffer is requested from the pool or from the internal 
`bufferQueue`.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33792] Generate the same code for the same logic [flink]

2023-12-25 Thread via GitHub


libenchao commented on PR #23984:
URL: https://github.com/apache/flink/pull/23984#issuecomment-1868890131

   @zoudan The PR looks good to me, except that the CI is failing, could you 
fix that?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-33934) Flink SQL Source use raw format maybe lead to data lost

2023-12-25 Thread Cai Liuyang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800273#comment-17800273
 ] 

Cai Liuyang commented on FLINK-33934:
-

[~hackergin] Yeah, turning off object reuse for every DeserializationSchema 
used by the SourceOperator is the simplest way to avoid this data loss problem.

> Flink SQL Source use raw format maybe lead to data lost
> ---
>
> Key: FLINK-33934
> URL: https://issues.apache.org/jira/browse/FLINK-33934
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table 
> SQL / Runtime
>Reporter: Cai Liuyang
>Priority: Major
>
> In our product we encountered a case that leads to data loss. The job info: 
>    1. a Flink SQL job that reads data from a message queue and writes to Hive 
> (it only selects the value field, not the metadata field)
>    2. the format of the source table is the raw format
>  
> But if we select the value field and the metadata field at the same time, the 
> data loss does not appear.
>  
> After reviewing the code, we found that the reason is the object reuse of 
> the raw format (see code 
> [RawFormatDeserializationSchema|https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/main/java/org/apache/flink/formats/raw/RawFormatDeserializationSchema.java#L62]), 
>  why object reuse leads to this problem is explained below (taking Kafka as 
> an example):
>     1. RawFormatDeserializationSchema is used in the fetcher thread of the 
> SourceOperator; the fetcher thread reads and deserializes data from a Kafka 
> partition, then puts the data into the ElementQueue (see code [SourceOperator 
> FetcherTask 
> |https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/fetcher/FetchTask.java#L64])
>     2. The SourceOperator's main thread pulls data from the ElementQueue 
> (which is shared with the fetcher thread) and processes it (see code 
> [SourceOperator main 
> thread|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/SourceReaderBase.java#L188])
>     3. For RawFormatDeserializationSchema, its deserialize function returns 
> the same object ([reuse rowData 
> object|https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/main/java/org/apache/flink/formats/raw/RawFormatDeserializationSchema.java#L62])
>     4. So, if the ElementQueue holds elements that have not yet been 
> consumed, the fetcher thread can change the fields of the reused RowData that 
> RawFormatDeserializationSchema::deserialize returned, which leads to data 
> loss.
>  
> The reason that selecting the value and metadata fields at the same time 
> does not encounter data loss is:
>    if we select the metadata field, a new RowData object is returned, see 
> code: [DynamicKafkaDeserializationSchema deserialize with metadata field 
> |https://github.com/apache/flink-connector-kafka/blob/main/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/table/DynamicKafkaDeserializationSchema.java#L249]
>  and if we only select the value field, it reuses the RowData object that the 
> formatDeserializationSchema returned, see code 
> [DynamicKafkaDeserializationSchema deserialize only with value 
> field|https://github.com/apache/flink-connector-kafka/blob/main/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/table/DynamicKafkaDeserializationSchema.java#L113]
>  
> To solve this problem, I think we should remove the object reuse from 
> RawFormatDeserializationSchema.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33921) Cleanup usage of deprecated IdleStateRetentionTime related method in org.apache.flink.table.api.TableConfig

2023-12-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-33921:
---
Labels: pull-request-available  (was: )

> Cleanup usage of deprecated IdleStateRetentionTime related method in 
> org.apache.flink.table.api.TableConfig
> ---
>
> Key: FLINK-33921
> URL: https://issues.apache.org/jira/browse/FLINK-33921
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Affects Versions: 1.19.0
>Reporter: Yubin Li
>Priority: Major
>  Labels: pull-request-available
>
> getMinIdleStateRetentionTime()
> getMaxIdleStateRetentionTime()



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-33921][table] Cleanup usage of deprecated IdleStateRetentionTime related method in org.apache.flink.table.api.TableConfig [flink]

2023-12-25 Thread via GitHub


liyubin117 commented on PR #23980:
URL: https://github.com/apache/flink/pull/23980#issuecomment-1868876743

   @LadyForest Hi, CI has passed, PTAL, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Comment Edited] (FLINK-33932) Support retry mechanism for rocksdb uploader

2023-12-25 Thread xiangyu feng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800258#comment-17800258
 ] 

xiangyu feng edited comment on FLINK-33932 at 12/25/23 8:55 AM:


Hi [~dianer17], thanks for creating this. IMHO, this retry is very useful for 
improving the checkpoint success rate. We can introduce a default fixed-delay 
retry mechanism without adding any configuration, so we don't need a FLIP for 
this Jira. 

 

I have implemented a PoC in this PR: 
[https://github.com/apache/flink/pull/23986], WDYT?
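A minimal sketch of what a default fixed-delay retry could look like (hypothetical names and defaults; not the API of the linked PR or of Flink's RocksDBStateUploader): retry a flaky upload a bounded number of times with a constant pause, rethrowing the last failure when the attempts run out.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a fixed-delay retry wrapper for an upload step.
public class FixedDelayRetry {

    interface Upload<T> {
        T run() throws Exception;
    }

    // Run the upload up to maxAttempts times, sleeping delayMillis between
    // attempts; wrap the last failure if every attempt fails.
    static <T> T retry(Upload<T> upload, int maxAttempts, long delayMillis) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return upload.run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException("interrupted during retry", e);
            } catch (Exception e) {
                last = e; // remember the latest failure
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis); // fixed delay between attempts
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException("interrupted during retry", ie);
                }
            }
        }
        throw new RuntimeException(
                "upload failed after " + maxAttempts + " attempts", last);
    }

    // Simulated upload that fails twice with an IOException, then succeeds.
    static String flakyUpload(AtomicInteger calls) throws IOException {
        if (calls.incrementAndGet() < 3) {
            throw new IOException("Could not flush to file");
        }
        return "state-handle";
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        String handle = retry(() -> flakyUpload(calls), 5, 10L);
        System.out.println(handle + " after " + calls.get() + " attempts");
        // prints "state-handle after 3 attempts"
    }
}
```

A fixed delay keeps the behavior predictable without new configuration; an exponential backoff would be a natural follow-up if transient HDFS errors cluster.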


was (Author: JIRAUSER301129):
Hi [~dianer17], thanks for creating this. IMHO, this retry is very useful for 
improving the checkpoint success rate. We can introduce a default fixed-delay 
retry mechanism without adding any configuration, so we don't need a FLIP for 
this Jira. 

 

I have implemented a poc in this pr: 
[https://github.com/apache/flink/pull/23986,] WDYT?

> Support retry mechanism for rocksdb uploader
> 
>
> Key: FLINK-33932
> URL: https://issues.apache.org/jira/browse/FLINK-33932
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Reporter: Guojun Li
>Priority: Major
>  Labels: pull-request-available
>
> Rocksdb uploader will throw exception and decline the current checkpoint if 
> there are errors when uploading to remote file system like hdfs.
> The exception is as below:
> 2023-12-19 08:46:00,197 INFO  
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Decline 
> checkpoint 2 by task 
> 5b008c2c048fa8534d648455e69f9497_fbcce2f96483f8f15e87dc6c9afd372f_183_0 of 
> job a025f19e at 
> application-6f1c6e3d-1702480803995-5093022-taskmanager-1-1 @ 
> fdbd:dc61:1a:101:0:0:0:36 (dataPort=38789).
> org.apache.flink.util.SerializedThrowable: 
> org.apache.flink.runtime.checkpoint.CheckpointException: Asynchronous task 
> checkpoint failed.
>     at 
> org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.handleExecutionException(AsyncCheckpointRunnable.java:320)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     at 
> org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:155)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  ~[?:?]
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  ~[?:?]
>     at java.lang.Thread.run(Thread.java:829) [?:?]
> Caused by: org.apache.flink.util.SerializedThrowable: java.lang.Exception: 
> Could not materialize checkpoint 2 for operator GlobalGroupAggregate[132] -> 
> Calc[133] (184/500)#0.
>     at 
> org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.handleExecutionException(AsyncCheckpointRunnable.java:298)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     ... 4 more
> Caused by: org.apache.flink.util.SerializedThrowable: 
> java.util.concurrent.ExecutionException: java.io.IOException: Could not flush 
> to file and close the file system output stream to 
> hdfs://xxx/shared/97329cbd-fb90-45ca-92de-edf52d6e6fc0 in order to obtain the 
> stream state handle
>     at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:?]
>     at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?]
>     at 
> org.apache.flink.util.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:544)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     at 
> org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.(OperatorSnapshotFinalizer.java:54)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     at 
> org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.finalizeNonFinishedSnapshots(AsyncCheckpointRunnable.java:191)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     at 
> org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:124)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     ... 3 more
> Caused by: org.apache.flink.util.SerializedThrowable: java.io.IOException: 
> Could not flush to file and close the file system output stream to 
> hdfs://xxx/shared/97329cbd-fb90-45ca-92de-edf52d6e6fc0 in order to obtain the 
> stream state handle
>     at 
> org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.closeAndGetHandle(FsCheckpointStreamFactory.java:516)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     at 
> org.apache.flink.contrib.streaming.state.RocksDBStateUploader.uploadLocalFileToCheckpointFs(RocksDBStateUploader.java:157)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     at 
> 

[jira] [Commented] (FLINK-33934) Flink SQL Source use raw format maybe lead to data lost

2023-12-25 Thread Feng Jin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800267#comment-17800267
 ] 

Feng Jin commented on FLINK-33934:
--

[~cailiuyang] Does this mean that we should turn off object reuse for all 
DeserializationSchemas? 

> Flink SQL Source use raw format maybe lead to data lost
> ---
>
> Key: FLINK-33934
> URL: https://issues.apache.org/jira/browse/FLINK-33934
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table 
> SQL / Runtime
>Reporter: Cai Liuyang
>Priority: Major
>
> In our product we encountered a case that leads to data loss. The job info: 
>    1. a Flink SQL job that reads data from a message queue and writes to Hive 
> (it only selects the value field, not the metadata field)
>    2. the format of the source table is the raw format
>  
> But if we select the value field and the metadata field at the same time, the 
> data loss does not appear.
>  
> After reviewing the code, we found that the reason is the object reuse of 
> the raw format (see code 
> [RawFormatDeserializationSchema|https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/main/java/org/apache/flink/formats/raw/RawFormatDeserializationSchema.java#L62]), 
>  why object reuse leads to this problem is explained below (taking Kafka as 
> an example):
>     1. RawFormatDeserializationSchema is used in the fetcher thread of the 
> SourceOperator; the fetcher thread reads and deserializes data from a Kafka 
> partition, then puts the data into the ElementQueue (see code [SourceOperator 
> FetcherTask 
> |https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/fetcher/FetchTask.java#L64])
>     2. The SourceOperator's main thread pulls data from the ElementQueue 
> (which is shared with the fetcher thread) and processes it (see code 
> [SourceOperator main 
> thread|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/SourceReaderBase.java#L188])
>     3. For RawFormatDeserializationSchema, its deserialize function returns 
> the same object ([reuse rowData 
> object|https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/main/java/org/apache/flink/formats/raw/RawFormatDeserializationSchema.java#L62])
>     4. So, if the ElementQueue holds elements that have not yet been 
> consumed, the fetcher thread can change the fields of the reused RowData that 
> RawFormatDeserializationSchema::deserialize returned, which leads to data 
> loss.
>  
> The reason that selecting the value and metadata fields at the same time 
> does not encounter data loss is:
>    if we select the metadata field, a new RowData object is returned, see 
> code: [DynamicKafkaDeserializationSchema deserialize with metadata field 
> |https://github.com/apache/flink-connector-kafka/blob/main/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/table/DynamicKafkaDeserializationSchema.java#L249]
>  and if we only select the value field, it reuses the RowData object that the 
> formatDeserializationSchema returned, see code 
> [DynamicKafkaDeserializationSchema deserialize only with value 
> field|https://github.com/apache/flink-connector-kafka/blob/main/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/table/DynamicKafkaDeserializationSchema.java#L113]
>  
> To solve this problem, I think we should remove the object reuse from 
> RawFormatDeserializationSchema.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-33935][checkpoint] Improve the default value logic related to `execution.checkpointing.tolerable-failed-checkpoints` and `state.backend.type` [flink]

2023-12-25 Thread via GitHub


flinkbot commented on PR #23987:
URL: https://github.com/apache/flink/pull/23987#issuecomment-1868837454

   
   ## CI report:
   
   * dd1f509ca1feb399f4e7e05b7c3127a0c76c1361 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


