[jira] [Comment Edited] (FLINK-35124) Connector Release Fails to run Checkstyle

2024-04-18 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838572#comment-17838572
 ] 

Etienne Chauchot edited comment on FLINK-35124 at 4/18/24 9:45 AM:
---

Ok, fair enough to put back the _ci_, _maven_ and _releasing_ dirs in the 
pristine source and exclude only _shared_ (because it refers to an external 
repo). But did you find the reason why the suppressions.xml path ends up being 
/tools/maven/suppressions.xml and not tools/maven/suppressions.xml?

 

 


was (Author: echauchot):
Ok, fair enough to put back the _ci_, _maven_ and _releasing_ dirs in the 
pristine source. But did you find the reason why the suppressions.xml path ends 
up being /tools/maven/suppressions.xml and not tools/maven/suppressions.xml?

 

 

> Connector Release Fails to run Checkstyle
> -
>
> Key: FLINK-35124
> URL: https://issues.apache.org/jira/browse/FLINK-35124
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Reporter: Danny Cranmer
>Assignee: Danny Cranmer
>Priority: Major
>  Labels: pull-request-available
>
> During a release of the AWS connectors the build was failing at the 
> \{{./tools/releasing/shared/stage_jars.sh}} step due to a checkstyle error.
>  
> {code:java}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-checkstyle-plugin:3.1.2:check (validate) on 
> project flink-connector-aws: Failed during checkstyle execution: Unable to 
> find suppressions file at location: /tools/maven/suppressions.xml: Could not 
> find resource '/tools/maven/suppressions.xml'. -> [Help 1] {code}
>  
> Looks like it is caused by this 
> [https://github.com/apache/flink-connector-shared-utils/commit/a75b89ee3f8c9a03e97ead2d0bd9d5b7bb02b51a]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35124) Connector Release Fails to run Checkstyle

2024-04-18 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838572#comment-17838572
 ] 

Etienne Chauchot commented on FLINK-35124:
--

Ok, fair enough to put back the _ci_, _maven_ and _releasing_ dirs in the 
pristine source. But did you find the reason why the suppressions.xml path ends 
up being /tools/maven/suppressions.xml and not tools/maven/suppressions.xml?

 

 

> Connector Release Fails to run Checkstyle
> -
>
> Key: FLINK-35124
> URL: https://issues.apache.org/jira/browse/FLINK-35124
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Reporter: Danny Cranmer
>Assignee: Danny Cranmer
>Priority: Major
>  Labels: pull-request-available
>
> During a release of the AWS connectors the build was failing at the 
> \{{./tools/releasing/shared/stage_jars.sh}} step due to a checkstyle error.
>  
> {code:java}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-checkstyle-plugin:3.1.2:check (validate) on 
> project flink-connector-aws: Failed during checkstyle execution: Unable to 
> find suppressions file at location: /tools/maven/suppressions.xml: Could not 
> find resource '/tools/maven/suppressions.xml'. -> [Help 1] {code}
>  
> Looks like it is caused by this 
> [https://github.com/apache/flink-connector-shared-utils/commit/a75b89ee3f8c9a03e97ead2d0bd9d5b7bb02b51a]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35124) Connector Release Fails to run Checkstyle

2024-04-17 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838277#comment-17838277
 ] 

Etienne Chauchot commented on FLINK-35124:
--

It was failing on Cassandra before the _utils.sh change; at the time I did a 
quick workaround by copying the suppressions.xml file to /tools.

> Connector Release Fails to run Checkstyle
> -
>
> Key: FLINK-35124
> URL: https://issues.apache.org/jira/browse/FLINK-35124
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Reporter: Danny Cranmer
>Priority: Major
>
> During a release of the AWS connectors the build was failing at the 
> \{{./tools/releasing/shared/stage_jars.sh}} step due to a checkstyle error.
>  
> {code:java}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-checkstyle-plugin:3.1.2:check (validate) on 
> project flink-connector-aws: Failed during checkstyle execution: Unable to 
> find suppressions file at location: /tools/maven/suppressions.xml: Could not 
> find resource '/tools/maven/suppressions.xml'. -> [Help 1] {code}
>  
> Looks like it is caused by this 
> [https://github.com/apache/flink-connector-shared-utils/commit/a75b89ee3f8c9a03e97ead2d0bd9d5b7bb02b51a]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-35124) Connector Release Fails to run Checkstyle

2024-04-17 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838277#comment-17838277
 ] 

Etienne Chauchot edited comment on FLINK-35124 at 4/17/24 4:21 PM:
---

It was failing on Cassandra before the _utils.sh change; at the time I did a 
quick workaround by copying the suppressions.xml file to /tools.

I think it was failing for the other connectors as well.


was (Author: echauchot):
It was failing on Cassandra before the _utils.sh change; at the time I did a 
quick workaround by copying the suppressions.xml file to /tools.

> Connector Release Fails to run Checkstyle
> -
>
> Key: FLINK-35124
> URL: https://issues.apache.org/jira/browse/FLINK-35124
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Reporter: Danny Cranmer
>Priority: Major
>
> During a release of the AWS connectors the build was failing at the 
> \{{./tools/releasing/shared/stage_jars.sh}} step due to a checkstyle error.
>  
> {code:java}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-checkstyle-plugin:3.1.2:check (validate) on 
> project flink-connector-aws: Failed during checkstyle execution: Unable to 
> find suppressions file at location: /tools/maven/suppressions.xml: Could not 
> find resource '/tools/maven/suppressions.xml'. -> [Help 1] {code}
>  
> Looks like it is caused by this 
> [https://github.com/apache/flink-connector-shared-utils/commit/a75b89ee3f8c9a03e97ead2d0bd9d5b7bb02b51a]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35124) Connector Release Fails to run Checkstyle

2024-04-17 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838198#comment-17838198
 ] 

Etienne Chauchot commented on FLINK-35124:
--

Found it! The suppressions.xml issue you mentioned is not related to the change 
in the _utils.sh script 
([https://github.com/apache/flink-connector-shared-utils/commit/a75b89ee3f8c9a03e97ead2d0bd9d5b7bb02b51a]) 
at all. It is a bug in the configuration. Take a close look, there is an extra 
/ in the path:

{code:java}
Unable to find suppressions file at location: /tools/maven/suppressions.xml
{code}

 
It should refer to the *tools* directory relative to the current dir, not to 
the *tools* directory at the filesystem root */*.
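
As a small illustration of why that leading slash matters (the ls commands 
below are only a stand-in for how the checkstyle plugin resolves the location; 
run from a connector repository root):

{code:bash}
# Illustration only: a location starting with "/" is absolute and is resolved
# from the filesystem root, so it misses the file that lives inside the repository.
ls /tools/maven/suppressions.xml   # fails unless /tools exists at the filesystem root
# Without the leading "/", the location is resolved relative to the current directory,
# which is where the suppressions file really is.
ls tools/maven/suppressions.xml    # works when run from the repository root
{code}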

 

> Connector Release Fails to run Checkstyle
> -
>
> Key: FLINK-35124
> URL: https://issues.apache.org/jira/browse/FLINK-35124
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Reporter: Danny Cranmer
>Priority: Major
>
> During a release of the AWS connectors the build was failing at the 
> \{{./tools/releasing/shared/stage_jars.sh}} step due to a checkstyle error.
>  
> {code:java}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-checkstyle-plugin:3.1.2:check (validate) on 
> project flink-connector-aws: Failed during checkstyle execution: Unable to 
> find suppressions file at location: /tools/maven/suppressions.xml: Could not 
> find resource '/tools/maven/suppressions.xml'. -> [Help 1] {code}
>  
> Looks like it is caused by this 
> [https://github.com/apache/flink-connector-shared-utils/commit/a75b89ee3f8c9a03e97ead2d0bd9d5b7bb02b51a]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-35124) Connector Release Fails to run Checkstyle

2024-04-17 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838195#comment-17838195
 ] 

Etienne Chauchot edited comment on FLINK-35124 at 4/17/24 1:42 PM:
---

To be more precise, I meant: before, the create_pristine_source function had 
_--exclude "tools/releasing/shared"_, which was meant to keep the submodule 
release utils scripts out of the source release. But it led to having the 
_tools_ directory in the release. In the case of regular connectors it 
contained the ci and maven subdirs, and in the case of connector-parent the 
tools directory ended up empty. So it was better to remove the whole tools 
directory from the release.

 

I'm currently checking the Cassandra connector for the checkstyle error you 
mentioned. I'll get back to you in that thread.
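
A minimal sketch of that effect, assuming an rsync-style exclude (the real 
create_pristine_source may copy differently; directory names are illustrative):

{code:bash}
# Illustration only: excluding just the submodule mount point keeps its parent dirs,
# so the pristine source still contains tools/ (with ci/ and maven/ for regular
# connectors, or an empty tools/ for connector-parent).
rsync -a --exclude 'tools/releasing/shared' connector-repo/ pristine-with-tools/
# Excluding the whole tools/ directory drops it from the pristine source entirely.
rsync -a --exclude 'tools' connector-repo/ pristine-without-tools/
{code}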

 


was (Author: echauchot):
To be more precise, I meant: before function create_pristine_source had 
_--exclude "tools/releasing/shared"_ which was to avoid having the submodule 
release utils scripts inside the source release. But it lead to having the 
_tools_ directory in the release. In the case of regular connectors it was 
containing ci and maven subdirs and in case of connector-parent, the tools 
directory ended up empty.

> Connector Release Fails to run Checkstyle
> -
>
> Key: FLINK-35124
> URL: https://issues.apache.org/jira/browse/FLINK-35124
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Reporter: Danny Cranmer
>Priority: Major
>
> During a release of the AWS connectors the build was failing at the 
> \{{./tools/releasing/shared/stage_jars.sh}} step due to a checkstyle error.
>  
> {code:java}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-checkstyle-plugin:3.1.2:check (validate) on 
> project flink-connector-aws: Failed during checkstyle execution: Unable to 
> find suppressions file at location: /tools/maven/suppressions.xml: Could not 
> find resource '/tools/maven/suppressions.xml'. -> [Help 1] {code}
>  
> Looks like it is caused by this 
> [https://github.com/apache/flink-connector-shared-utils/commit/a75b89ee3f8c9a03e97ead2d0bd9d5b7bb02b51a]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35124) Connector Release Fails to run Checkstyle

2024-04-17 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838195#comment-17838195
 ] 

Etienne Chauchot commented on FLINK-35124:
--

To be more precise, I meant: before, the create_pristine_source function had 
_--exclude "tools/releasing/shared"_, which was meant to keep the submodule 
release utils scripts out of the source release. But it led to having the 
_tools_ directory in the release. In the case of regular connectors it 
contained the ci and maven subdirs, and in the case of connector-parent the 
tools directory ended up empty.

> Connector Release Fails to run Checkstyle
> -
>
> Key: FLINK-35124
> URL: https://issues.apache.org/jira/browse/FLINK-35124
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Reporter: Danny Cranmer
>Priority: Major
>
> During a release of the AWS connectors the build was failing at the 
> \{{./tools/releasing/shared/stage_jars.sh}} step due to a checkstyle error.
>  
> {code:java}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-checkstyle-plugin:3.1.2:check (validate) on 
> project flink-connector-aws: Failed during checkstyle execution: Unable to 
> find suppressions file at location: /tools/maven/suppressions.xml: Could not 
> find resource '/tools/maven/suppressions.xml'. -> [Help 1] {code}
>  
> Looks like it is caused by this 
> [https://github.com/apache/flink-connector-shared-utils/commit/a75b89ee3f8c9a03e97ead2d0bd9d5b7bb02b51a]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35124) Connector Release Fails to run Checkstyle

2024-04-17 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838045#comment-17838045
 ] 

Etienne Chauchot commented on FLINK-35124:
--

[~dannycranmer] yes, it was because it led to an empty tools directory in the 
source release. I'll take a look at the suppressions.xml issue.

> Connector Release Fails to run Checkstyle
> -
>
> Key: FLINK-35124
> URL: https://issues.apache.org/jira/browse/FLINK-35124
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Reporter: Danny Cranmer
>Priority: Major
>
> During a release of the AWS connectors the build was failing at the 
> \{{./tools/releasing/shared/stage_jars.sh}} step due to a checkstyle error.
>  
> {code:java}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-checkstyle-plugin:3.1.2:check (validate) on 
> project flink-connector-aws: Failed during checkstyle execution: Unable to 
> find suppressions file at location: /tools/maven/suppressions.xml: Could not 
> find resource '/tools/maven/suppressions.xml'. -> [Help 1] {code}
>  
> Looks like it is caused by this 
> [https://github.com/apache/flink-connector-shared-utils/commit/a75b89ee3f8c9a03e97ead2d0bd9d5b7bb02b51a]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-35035) Reduce job pause time when cluster resources are expanded in adaptive mode

2024-04-12 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836515#comment-17836515
 ] 

Etienne Chauchot edited comment on FLINK-35035 at 4/12/24 9:10 AM:
---

I think [~dmvk] has started to think about that. He might already have 
suggestions for improving the overall rescale timeout.


was (Author: echauchot):
I think [~dmvk] has started to think about that. 

> Reduce job pause time when cluster resources are expanded in adaptive mode
> --
>
> Key: FLINK-35035
> URL: https://issues.apache.org/jira/browse/FLINK-35035
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Task
>Affects Versions: 1.19.0
>Reporter: yuanfenghu
>Priority: Minor
>
> When 'jobmanager.scheduler = adaptive', job graph changes triggered by 
> cluster expansion will cause long-term task stagnation. We should reduce this 
> impact.
> As an example:
> I have a jobgraph: [v1 (maxp=10, minp=1)] -> [v2 (maxp=10, minp=1)]
> When my cluster has 5 slots, the job will be executed as [v1 p5] -> [v2 p5]
> When I add slots, job graph changes are triggered by
> org.apache.flink.runtime.scheduler.adaptive.ResourceListener#onNewResourcesAvailable.
> However, the five new slots I added were not discovered at the same time (for 
> convenience, I assume that a taskmanager has one slot), because no matter 
> what environment we add them in, we cannot guarantee that the new slots will 
> be added all at once, so onNewResourcesAvailable is triggered repeatedly.
> If each new slot arrives after a certain interval, the jobgraph will keep 
> changing during this period. What I hope for is a stabilization time for the 
> cluster resources: jobgraph changes would only be triggered after the number 
> of cluster slots has been stable for a certain period of time, avoiding this 
> situation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35035) Reduce job pause time when cluster resources are expanded in adaptive mode

2024-04-12 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836515#comment-17836515
 ] 

Etienne Chauchot commented on FLINK-35035:
--

I think [~dmvk] has started to think about that. 

> Reduce job pause time when cluster resources are expanded in adaptive mode
> --
>
> Key: FLINK-35035
> URL: https://issues.apache.org/jira/browse/FLINK-35035
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Task
>Affects Versions: 1.19.0
>Reporter: yuanfenghu
>Priority: Minor
>
> When 'jobmanager.scheduler = adaptive', job graph changes triggered by 
> cluster expansion will cause long-term task stagnation. We should reduce this 
> impact.
> As an example:
> I have a jobgraph: [v1 (maxp=10, minp=1)] -> [v2 (maxp=10, minp=1)]
> When my cluster has 5 slots, the job will be executed as [v1 p5] -> [v2 p5]
> When I add slots, job graph changes are triggered by
> org.apache.flink.runtime.scheduler.adaptive.ResourceListener#onNewResourcesAvailable.
> However, the five new slots I added were not discovered at the same time (for 
> convenience, I assume that a taskmanager has one slot), because no matter 
> what environment we add them in, we cannot guarantee that the new slots will 
> be added all at once, so onNewResourcesAvailable is triggered repeatedly.
> If each new slot arrives after a certain interval, the jobgraph will keep 
> changing during this period. What I hope for is a stabilization time for the 
> cluster resources: jobgraph changes would only be triggered after the number 
> of cluster slots has been stable for a certain period of time, avoiding this 
> situation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35035) Reduce job pause time when cluster resources are expanded in adaptive mode

2024-04-12 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836507#comment-17836507
 ] 

Etienne Chauchot commented on FLINK-35035:
--

> if min-parallelism-increase=1 Then my job may trigger the scaling process 
> twice when I change the number of slots from 10 to 12

It is not a slot-per-slot rescale; there will be only one rescale in these cases:
 * if the TM comes with 2 slots at once
 * if the second slot comes during the stabilization timeout

That being said, I know there is an ongoing discussion in the community about 
decreasing the overall timeouts during rescale.

> Reduce job pause time when cluster resources are expanded in adaptive mode
> --
>
> Key: FLINK-35035
> URL: https://issues.apache.org/jira/browse/FLINK-35035
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Task
>Affects Versions: 1.19.0
>Reporter: yuanfenghu
>Priority: Minor
>
> When 'jobmanager.scheduler = adaptive', job graph changes triggered by 
> cluster expansion will cause long-term task stagnation. We should reduce this 
> impact.
> As an example:
> I have a jobgraph: [v1 (maxp=10, minp=1)] -> [v2 (maxp=10, minp=1)]
> When my cluster has 5 slots, the job will be executed as [v1 p5] -> [v2 p5]
> When I add slots, job graph changes are triggered by
> org.apache.flink.runtime.scheduler.adaptive.ResourceListener#onNewResourcesAvailable.
> However, the five new slots I added were not discovered at the same time (for 
> convenience, I assume that a taskmanager has one slot), because no matter 
> what environment we add them in, we cannot guarantee that the new slots will 
> be added all at once, so onNewResourcesAvailable is triggered repeatedly.
> If each new slot arrives after a certain interval, the jobgraph will keep 
> changing during this period. What I hope for is a stabilization time for the 
> cluster resources: jobgraph changes would only be triggered after the number 
> of cluster slots has been stable for a certain period of time, avoiding this 
> situation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35035) Reduce job pause time when cluster resources are expanded in adaptive mode

2024-04-12 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836462#comment-17836462
 ] 

Etienne Chauchot commented on FLINK-35035:
--

With the adaptive scheduler, the JobMaster declares the resources needed with a 
min and a max. The only difference with reactive mode is that the max is +INF. 
Here we are talking about declaring the min resources needed. So unless there 
is something I missed, I'm not sure reactive mode is relevant here.

If I understand correctly, what you want in the end is to use whatever new 
slots arrive in the cluster with a minimal waiting period. So why not just 
leave the default min-parallelism-increase=1, leave the default 
scaling-interval.max unset, and change the default scaling-interval.min from 
30s to 0s?

The only thing is that you will have more frequent rescales (each time a slot 
is added to the cluster), minus the slots that are added during the 
stabilization period, which do not lead to a rescale.
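
A sketch of that suggestion as configuration, assuming the full option keys are 
the jobmanager.adaptive-scheduler.* ones (written as a shell snippet only for 
illustration):

{code:bash}
# Illustration only: keep min-parallelism-increase at its default of 1, leave
# scaling-interval.max unset, and lower scaling-interval.min from the 30s default to 0s.
cat >> conf/flink-conf.yaml <<'EOF'
jobmanager.scheduler: adaptive
jobmanager.adaptive-scheduler.min-parallelism-increase: 1
jobmanager.adaptive-scheduler.scaling-interval.min: 0s
EOF
{code}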

> Reduce job pause time when cluster resources are expanded in adaptive mode
> --
>
> Key: FLINK-35035
> URL: https://issues.apache.org/jira/browse/FLINK-35035
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Task
>Affects Versions: 1.19.0
>Reporter: yuanfenghu
>Priority: Minor
>
> When 'jobmanager.scheduler = adaptive', job graph changes triggered by 
> cluster expansion will cause long-term task stagnation. We should reduce this 
> impact.
> As an example:
> I have a jobgraph: [v1 (maxp=10, minp=1)] -> [v2 (maxp=10, minp=1)]
> When my cluster has 5 slots, the job will be executed as [v1 p5] -> [v2 p5]
> When I add slots, job graph changes are triggered by
> org.apache.flink.runtime.scheduler.adaptive.ResourceListener#onNewResourcesAvailable.
> However, the five new slots I added were not discovered at the same time (for 
> convenience, I assume that a taskmanager has one slot), because no matter 
> what environment we add them in, we cannot guarantee that the new slots will 
> be added all at once, so onNewResourcesAvailable is triggered repeatedly.
> If each new slot arrives after a certain interval, the jobgraph will keep 
> changing during this period. What I hope for is a stabilization time for the 
> cluster resources: jobgraph changes would only be triggered after the number 
> of cluster slots has been stable for a certain period of time, avoiding this 
> situation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35035) Reduce job pause time when cluster resources are expanded in adaptive mode

2024-04-10 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835677#comment-17835677
 ] 

Etienne Chauchot commented on FLINK-35035:
--

> It needs to wait until scaling-interval.max is reached before triggering

Yes, or wait until the 5 slots are there.

> When the resources change, it will be judged whether the current resources 
> fully meet the parallelism requirements of the job. If they do, rescheduling 
> will be triggered directly. If they do not, it will be rescheduled after 
> scaling-interval.max

That is already how things work when you set min-parallelism-increase and 
scaling-interval.max.

> Reduce job pause time when cluster resources are expanded in adaptive mode
> --
>
> Key: FLINK-35035
> URL: https://issues.apache.org/jira/browse/FLINK-35035
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Task
>Affects Versions: 1.19.0
>Reporter: yuanfenghu
>Priority: Minor
>
> When 'jobmanager.scheduler = adaptive', job graph changes triggered by 
> cluster expansion will cause long-term task stagnation. We should reduce this 
> impact.
> As an example:
> I have a jobgraph: [v1 (maxp=10, minp=1)] -> [v2 (maxp=10, minp=1)]
> When my cluster has 5 slots, the job will be executed as [v1 p5] -> [v2 p5]
> When I add slots, job graph changes are triggered by
> org.apache.flink.runtime.scheduler.adaptive.ResourceListener#onNewResourcesAvailable.
> However, the five new slots I added were not discovered at the same time (for 
> convenience, I assume that a taskmanager has one slot), because no matter 
> what environment we add them in, we cannot guarantee that the new slots will 
> be added all at once, so onNewResourcesAvailable is triggered repeatedly.
> If each new slot arrives after a certain interval, the jobgraph will keep 
> changing during this period. What I hope for is a stabilization time for the 
> cluster resources: jobgraph changes would only be triggered after the number 
> of cluster slots has been stable for a certain period of time, avoiding this 
> situation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35035) Reduce job pause time when cluster resources are expanded in adaptive mode

2024-04-09 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835434#comment-17835434
 ] 

Etienne Chauchot commented on FLINK-35035:
--

The FLINK-21883 cooldown period was mainly designed to avoid too-frequent 
rescales. Here is how it works when new slots become available:
 - Flink rescales immediately only if the last rescale was done more than 
scaling-interval.min (default 30s) ago.
 - Otherwise it schedules a rescale at the (now + scaling-interval.min) point 
in time.
The rescale itself is done like this:
 - if the minimum scaling requirements are met (AdaptiveScheduler#shouldRescale 
defaults to a minimum of 1 slot added), the job is restarted with the new 
parallelism
 - if the minimum scaling requirements are not met:
 -- if the last rescale was done more than scaling-interval.max ago (disabled 
by default), a rescale is forced.
 -- otherwise, a forced rescale is scheduled in scaling-interval.max

So in your case of slots arriving gradually during the resource stabilization 
timeout, leading to a rescale with only a portion of the ideal number of slots, 
what I see is that you can either:
1. increase the stabilization timeout, hoping you'll get all the slots during 
that time
2. set min-parallelism-increase to 5 instead of the default 1 and set 
scaling-interval.max. That way the first slot additions will not trigger a 
rescale; the rescale will only be issued when the 5th slot arrives, and you 
will still get a safety forced rescale scheduled no matter what (as long as 
the parallelism has changed) after scaling-interval.max.
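
For option 2, a hedged configuration sketch (same assumed option keys as above; 
the 1h value for scaling-interval.max is only an example figure):

{code:bash}
# Illustration only: wait for 5 additional slots before rescaling, but force a rescale
# after scaling-interval.max even if fewer slots arrived in the meantime.
cat >> conf/flink-conf.yaml <<'EOF'
jobmanager.scheduler: adaptive
jobmanager.adaptive-scheduler.min-parallelism-increase: 5
jobmanager.adaptive-scheduler.scaling-interval.max: 1h
EOF
{code}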

> Reduce job pause time when cluster resources are expanded in adaptive mode
> --
>
> Key: FLINK-35035
> URL: https://issues.apache.org/jira/browse/FLINK-35035
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Task
>Affects Versions: 1.19.0
>Reporter: yuanfenghu
>Priority: Minor
>
> When 'jobmanager.scheduler = adaptive', job graph changes triggered by 
> cluster expansion will cause long-term task stagnation. We should reduce this 
> impact.
> As an example:
> I have a jobgraph: [v1 (maxp=10, minp=1)] -> [v2 (maxp=10, minp=1)]
> When my cluster has 5 slots, the job will be executed as [v1 p5] -> [v2 p5]
> When I add slots, job graph changes are triggered by
> org.apache.flink.runtime.scheduler.adaptive.ResourceListener#onNewResourcesAvailable.
> However, the five new slots I added were not discovered at the same time (for 
> convenience, I assume that a taskmanager has one slot), because no matter 
> what environment we add them in, we cannot guarantee that the new slots will 
> be added all at once, so onNewResourcesAvailable is triggered repeatedly.
> If each new slot arrives after a certain interval, the jobgraph will keep 
> changing during this period. What I hope for is a stabilization time for the 
> cluster resources: jobgraph changes would only be triggered after the number 
> of cluster slots has been stable for a certain period of time, avoiding this 
> situation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-32353) Update Cassandra connector archunit violations with Flink 1.18 rules

2024-03-05 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved FLINK-32353.
--
Fix Version/s: cassandra-3.2.0
   Resolution: Fixed

Even if this change is mostly a CI change, it is part of the overall Cassandra 
compatibility with Flink 1.18, which will be released with the next Cassandra 
connector version. So targeting cassandra-connector-3.2.0.

> Update Cassandra connector archunit violations with Flink 1.18 rules
> 
>
> Key: FLINK-32353
> URL: https://issues.apache.org/jira/browse/FLINK-32353
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System / CI, Connectors / Cassandra
>Reporter: Martijn Visser
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available, stale-assigned
> Fix For: cassandra-3.2.0
>
>
> The current Cassandra connector in {{main}} fails when testing against Flink 
> 1.18-SNAPSHOT
> {code:java}
> Error:  Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.1 s 
> <<< FAILURE! - in org.apache.flink.architecture.rules.ITCaseRules
> Error:  ITCaseRules.ITCASE_USE_MINICLUSTER  Time elapsed: 0.025 s  <<< 
> FAILURE!
> java.lang.AssertionError: 
> Architecture Violation [Priority: MEDIUM] - Rule 'ITCASE tests should use a 
> MiniCluster resource or extension' was violated (1 times):
> org.apache.flink.streaming.connectors.cassandra.CassandraConnectorITCase does 
> not satisfy: only one of the following predicates match:
> * reside in a package 'org.apache.flink.runtime.*' and contain any fields 
> that are static, final, and of type InternalMiniClusterExtension and 
> annotated with @RegisterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and contain any 
> fields that are static, final, and of type MiniClusterExtension and annotated 
> with @RegisterExtension or are , and of type MiniClusterTestEnvironment and 
> annotated with @TestEnv
> * reside in a package 'org.apache.flink.runtime.*' and is annotated with 
> @ExtendWith with class InternalMiniClusterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and is annotated 
> with @ExtendWith with class MiniClusterExtension
>  or contain any fields that are public, static, and of type 
> MiniClusterWithClientResource and final and annotated with @ClassRule or 
> contain any fields that is of type MiniClusterWithClientResource and public 
> and final and not static and annotated with @Rule
> {code}
> https://github.com/apache/flink-connector-cassandra/actions/runs/5276835802/jobs/9544092571#step:13:811



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32353) Update Cassandra connector archunit violations with Flink 1.18 rules

2024-03-05 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated FLINK-32353:
-
Component/s: Build System / CI

> Update Cassandra connector archunit violations with Flink 1.18 rules
> 
>
> Key: FLINK-32353
> URL: https://issues.apache.org/jira/browse/FLINK-32353
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System / CI, Connectors / Cassandra
>Reporter: Martijn Visser
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available, stale-assigned
>
> The current Cassandra connector in {{main}} fails when testing against Flink 
> 1.18-SNAPSHOT
> {code:java}
> Error:  Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.1 s 
> <<< FAILURE! - in org.apache.flink.architecture.rules.ITCaseRules
> Error:  ITCaseRules.ITCASE_USE_MINICLUSTER  Time elapsed: 0.025 s  <<< 
> FAILURE!
> java.lang.AssertionError: 
> Architecture Violation [Priority: MEDIUM] - Rule 'ITCASE tests should use a 
> MiniCluster resource or extension' was violated (1 times):
> org.apache.flink.streaming.connectors.cassandra.CassandraConnectorITCase does 
> not satisfy: only one of the following predicates match:
> * reside in a package 'org.apache.flink.runtime.*' and contain any fields 
> that are static, final, and of type InternalMiniClusterExtension and 
> annotated with @RegisterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and contain any 
> fields that are static, final, and of type MiniClusterExtension and annotated 
> with @RegisterExtension or are , and of type MiniClusterTestEnvironment and 
> annotated with @TestEnv
> * reside in a package 'org.apache.flink.runtime.*' and is annotated with 
> @ExtendWith with class InternalMiniClusterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and is annotated 
> with @ExtendWith with class MiniClusterExtension
>  or contain any fields that are public, static, and of type 
> MiniClusterWithClientResource and final and annotated with @ClassRule or 
> contain any fields that is of type MiniClusterWithClientResource and public 
> and final and not static and annotated with @Rule
> {code}
> https://github.com/apache/flink-connector-cassandra/actions/runs/5276835802/jobs/9544092571#step:13:811



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32353) Update Cassandra connector archunit violations with Flink 1.18 rules

2024-03-05 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823625#comment-17823625
 ] 

Etienne Chauchot commented on FLINK-32353:
--

main: 88818685d195d9ab91b6c4ff31e91d00bc7858c9

> Update Cassandra connector archunit violations with Flink 1.18 rules
> 
>
> Key: FLINK-32353
> URL: https://issues.apache.org/jira/browse/FLINK-32353
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Reporter: Martijn Visser
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available, stale-assigned
>
> The current Cassandra connector in {{main}} fails when testing against Flink 
> 1.18-SNAPSHOT
> {code:java}
> Error:  Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.1 s 
> <<< FAILURE! - in org.apache.flink.architecture.rules.ITCaseRules
> Error:  ITCaseRules.ITCASE_USE_MINICLUSTER  Time elapsed: 0.025 s  <<< 
> FAILURE!
> java.lang.AssertionError: 
> Architecture Violation [Priority: MEDIUM] - Rule 'ITCASE tests should use a 
> MiniCluster resource or extension' was violated (1 times):
> org.apache.flink.streaming.connectors.cassandra.CassandraConnectorITCase does 
> not satisfy: only one of the following predicates match:
> * reside in a package 'org.apache.flink.runtime.*' and contain any fields 
> that are static, final, and of type InternalMiniClusterExtension and 
> annotated with @RegisterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and contain any 
> fields that are static, final, and of type MiniClusterExtension and annotated 
> with @RegisterExtension or are , and of type MiniClusterTestEnvironment and 
> annotated with @TestEnv
> * reside in a package 'org.apache.flink.runtime.*' and is annotated with 
> @ExtendWith with class InternalMiniClusterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and is annotated 
> with @ExtendWith with class MiniClusterExtension
>  or contain any fields that are public, static, and of type 
> MiniClusterWithClientResource and final and annotated with @ClassRule or 
> contain any fields that is of type MiniClusterWithClientResource and public 
> and final and not static and annotated with @Rule
> {code}
> https://github.com/apache/flink-connector-cassandra/actions/runs/5276835802/jobs/9544092571#step:13:811



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-34136) Execute archunit tests only with Flink version that connectors were built against

2024-03-01 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved FLINK-34136.
--
Resolution: Fixed

> Execute archunit tests only with Flink version that connectors were built 
> against
> -
>
> Key: FLINK-34136
> URL: https://issues.apache.org/jira/browse/FLINK-34136
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System / CI, Connectors / Parent
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> As part of [this 
> discussion|https://lists.apache.org/thread/pr0g812olzpgz21d9oodhc46db9jpxo3], 
> the need for connectors to specify the main Flink version that a connector 
> supports has arisen.
> This CI variable will allow configuring the build and tests differently 
> depending on this version. This parameter would be optional.
> The first use case is to run archunit tests only on the main supported 
> version, as discussed in the above thread.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-34137) Update CI to test archunit configuration

2024-03-01 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved FLINK-34137.
--
Resolution: Fixed

> Update CI to test archunit configuration
> 
>
> Key: FLINK-34137
> URL: https://issues.apache.org/jira/browse/FLINK-34137
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI, Connectors / Parent
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available
>
> Update CI to test skipping archunit tests on non-main Flink versions. Test on 
> submodules both with and without archunit tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32563) Add additionalExcludes property to add exclusions to surefire tests

2024-02-28 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated FLINK-32563:
-
Fix Version/s: connector-parent-1.1.0

> Add additionalExcludes property to add exclusions to surefire tests
> ---
>
> Key: FLINK-32563
> URL: https://issues.apache.org/jira/browse/FLINK-32563
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Parent
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available
> Fix For: connector-parent-1.1.0
>
>
> Add an optional property to add exclusions to surefire tests (among other 
> things for skipping archunit tests)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32563) Add additionalExcludes property to add exclusions to surefire tests

2024-02-28 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated FLINK-32563:
-
Fix Version/s: (was: connector-parent-1.1.0)

> Add additionalExcludes property to add exclusions to surefire tests
> ---
>
> Key: FLINK-32563
> URL: https://issues.apache.org/jira/browse/FLINK-32563
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Parent
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available
>
> Add an optional property to add exclusions to surefire tests (among other 
> things for skipping archunit tests)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-34363) Connectors release utils should allow to not specify flink version in stage_jars.sh

2024-02-27 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved FLINK-34363.
--
Resolution: Fixed

> Connectors release utils should allow to not specify flink version in 
> stage_jars.sh
> ---
>
> Key: FLINK-34363
> URL: https://issues.apache.org/jira/browse/FLINK-34363
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Parent, Release System
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available
>
> For the connector-parent release, the Flink version is not needed. The 
> stage_jars.sh script should allow specifying only ${project_version} and not 
> ${project_version}-${flink_minor_version}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-34364) Fix release utils mount point to match the release doc and scripts

2024-02-14 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved FLINK-34364.
--
Resolution: Fixed

fixed release utils integration

> Fix release utils mount point to match the release doc and scripts
> --
>
> Key: FLINK-34364
> URL: https://issues.apache.org/jira/browse/FLINK-34364
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Parent, Release System
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available
>
> parent_pom branch refers to an incorrect mount point tools/*release*/shared 
> instead of tools/*releasing*/shared for the release_utils. 
> _tools/releasing_/shared is the one used in the release scripts and in the 
> release docs



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34364) Fix release utils mount point to match the release doc and scripts

2024-02-14 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817321#comment-17817321
 ] 

Etienne Chauchot commented on FLINK-34364:
--

parent_pom: c806d46ef06c8e46d28a6a8b5db3f5104cfe53bc

> Fix release utils mount point to match the release doc and scripts
> --
>
> Key: FLINK-34364
> URL: https://issues.apache.org/jira/browse/FLINK-34364
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Parent, Release System
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available
>
> parent_pom branch refers to an incorrect mount point tools/*release*/shared 
> instead of tools/*releasing*/shared for the release_utils. 
> _tools/releasing_/shared is the one used in the release scripts and in the 
> release docs



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-34364) Fix release utils mount point to match the release doc and scripts

2024-02-13 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816981#comment-17816981
 ] 

Etienne Chauchot edited comment on FLINK-34364 at 2/13/24 12:21 PM:


Notably, the source release script was excluding tools/releasing/shared but not 
tools/release/shared. This is why tools/release/shared was in the source 
release.

And by the way, I noticed that all the connector source releases contained an 
empty tools/releasing directory, because only tools/releasing/shared is 
excluded in the source release script and not the whole tools/releasing 
directory. It seems a bit messy to me, so I think we should fix that in the 
release scripts later on for the next connector releases.
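
A minimal sketch of the mismatch, assuming an rsync-style exclude pattern 
(directory names are the ones from this comment):

{code:bash}
# Illustration only: the parent_pom branch mounted the shared utils at
# tools/release/shared, while the source release script excluded
# tools/releasing/shared, so the mounted directory was not matched.
rsync -a --exclude 'tools/releasing/shared' repo/ source-release/
ls source-release/tools/release/shared   # still present: the exclude pattern does not match
{code}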


was (Author: echauchot):
Notably, the source release script was excluding tools/releasing/shared but not 
tools/release/shared. This is why tools/release/shared was in the source 
release.

And by the way, I noticed that all the connector source releases contained an 
empty tools/releasing directory, because only tools/releasing/shared is 
excluded in the source release script. It seems a bit messy to me, so I think 
we should fix that in the release scripts later on for the next connector 
releases.

> Fix release utils mount point to match the release doc and scripts
> --
>
> Key: FLINK-34364
> URL: https://issues.apache.org/jira/browse/FLINK-34364
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Parent, Release System
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available
>
> parent_pom branch refers to an incorrect mount point tools/*release*/shared 
> instead of tools/*releasing*/shared for the release_utils. 
> _tools/releasing_/shared is the one used in the release scripts and in the 
> release docs



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34364) Fix release utils mount point to match the release doc and scripts

2024-02-13 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816981#comment-17816981
 ] 

Etienne Chauchot commented on FLINK-34364:
--

Notably, the source release script was excluding tools/releasing/shared but not 
tools/release/shared. This is why tools/release/shared was in the source 
release.

And by the way, I noticed that all the connector source releases contained an 
empty tools/releasing directory, because only tools/releasing/shared is 
excluded in the source release script. It seems a bit messy to me, so I think 
we should fix that in the release scripts later on for the next connector 
releases.

> Fix release utils mount point to match the release doc and scripts
> --
>
> Key: FLINK-34364
> URL: https://issues.apache.org/jira/browse/FLINK-34364
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Parent, Release System
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available
>
> parent_pom branch refers to an incorrect mount point tools/*release*/shared 
> instead of tools/*releasing*/shared for the release_utils. 
> _tools/releasing_/shared is the one used in the release scripts and in the 
> release docs



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34364) Fix release utils mount point to match the release doc and scripts

2024-02-13 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated FLINK-34364:
-
Description: parent_pom branch refers to an incorrect mount point 
tools/*release*/shared instead of tools/*releasing*/shared for the 
release_utils. _tools/releasing_/shared is the one used in the release scripts 
and in the release docs  (was: parent_pom branch refers to an incorrect mount 
point tools/*release*/shared instead of tools/*releasing*/shared. 
_tools/releasing_/shared is the one used in the release scripts and in the 
release docs)

> Fix release utils mount point to match the release doc and scripts
> --
>
> Key: FLINK-34364
> URL: https://issues.apache.org/jira/browse/FLINK-34364
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Parent, Release System
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> parent_pom branch refers to an incorrect mount point tools/*release*/shared 
> instead of tools/*releasing*/shared for the release_utils. 
> _tools/releasing_/shared is the one used in the release scripts and in the 
> release docs



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34364) Fix release utils mount point to match the release doc and scripts

2024-02-13 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated FLINK-34364:
-
Description: parent_pom branch refers to an incorrect mount point 
tools/*release*/shared instead of tools/*releasing*/shared. 
_tools/releasing_/shared is the one used in the release scripts and in the 
release docs  (was: This directory is the mount point of the release utils 
repository and should be excluded from the source release.)

> Fix release utils mount point to match the release doc and scripts
> --
>
> Key: FLINK-34364
> URL: https://issues.apache.org/jira/browse/FLINK-34364
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Parent, Release System
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> parent_pom branch refers to an incorrect mount point tools/*release*/shared 
> instead of tools/*releasing*/shared. _tools/releasing_/shared is the one used 
> in the release scripts and in the release docs



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] (FLINK-34364) Fix release utils mount point to match the release doc and scripts

2024-02-13 Thread Etienne Chauchot (Jira)


[ https://issues.apache.org/jira/browse/FLINK-34364 ]


Etienne Chauchot deleted comment on FLINK-34364:
--

was (Author: echauchot):
parent_pom branch refers to an incorrect mount point tools/*release*/shared 
instead of tools/*releasing*/shared. _tools/releasing_/shared is the one used 
in the release scripts and in the release docs

> Fix release utils mount point to match the release doc and scripts
> --
>
> Key: FLINK-34364
> URL: https://issues.apache.org/jira/browse/FLINK-34364
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Parent, Release System
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> This directory is the mount point of the release utils repository and should 
> be excluded from the source release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-34364) Fix release utils mount point to match the release doc and scripts

2024-02-13 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816560#comment-17816560
 ] 

Etienne Chauchot edited comment on FLINK-34364 at 2/13/24 12:11 PM:


parent_pom branch refers to an incorrect mount point tools/*release*/shared 
instead of tools/*releasing*/shared. _tools/releasing_/shared is the one used 
in the release scripts and in the release docs


was (Author: echauchot):
This was an incorrect mount point tools/*release* instead of tools/*releasing*. 
_tools/releasing_ is already excluded in the source release script And 
_tools/releasing_ is the path referred to in the docs

> Fix release utils mount point to match the release doc and scripts
> --
>
> Key: FLINK-34364
> URL: https://issues.apache.org/jira/browse/FLINK-34364
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Parent, Release System
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> This directory is the mount point of the release utils repository and should 
> be excluded from the source release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-34364) Fix release utils mount point to match the release doc and scripts

2024-02-12 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816560#comment-17816560
 ] 

Etienne Chauchot edited comment on FLINK-34364 at 2/12/24 1:27 PM:
---

This was an incorrect mount point tools/*release* instead of tools/*releasing*. 
_tools/releasing_ is already excluded in the source release script, and 
_tools/releasing_ is the path referred to in the docs.


was (Author: echauchot):
This was an incorrect mount point tools/*release* instead of tools/*releasing*. 
tools/releasing is already excluded in the source release script And 
tools/releasing is the path referred to in the docs

> Fix release utils mount point to match the release doc and scripts
> --
>
> Key: FLINK-34364
> URL: https://issues.apache.org/jira/browse/FLINK-34364
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Parent, Release System
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> This directory is the mount point of the release utils repository and should 
> be excluded from the source release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34364) Update flink-connector-parent-1.1.0-rc1 source release to exclude tools/release directory

2024-02-12 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated FLINK-34364:
-
Issue Type: Bug  (was: Improvement)

> Update flink-connector-parent-1.1.0-rc1 source release to exclude 
> tools/release directory
> -
>
> Key: FLINK-34364
> URL: https://issues.apache.org/jira/browse/FLINK-34364
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Parent, Release System
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> This directory is the mount point of the release utils repository and should 
> be excluded from the source release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34364) Fix release utils mount point to match the release doc and scripts

2024-02-12 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated FLINK-34364:
-
Summary: Fix release utils mount point to match the release doc and scripts 
 (was: Update flink-connector-parent-1.1.0-rc1 source release to exclude 
tools/release directory)

> Fix release utils mount point to match the release doc and scripts
> --
>
> Key: FLINK-34364
> URL: https://issues.apache.org/jira/browse/FLINK-34364
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Parent, Release System
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> This directory is the mount point of the release utils repository and should 
> be excluded from the source release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-34364) Update flink-connector-parent-1.1.0-rc1 source release to exclude tools/release directory

2024-02-12 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816560#comment-17816560
 ] 

Etienne Chauchot edited comment on FLINK-34364 at 2/12/24 1:25 PM:
---

This was an incorrect mount point tools/*release* instead of tools/*releasing*. 
tools/releasing is already excluded in the source release script, and 
tools/releasing is the path referred to in the docs.


was (Author: echauchot):
This was an incorrect mount point tools/*release* instead of tools/*releasing*. 
tools/releasing is already excluded in the source release script. Just need to 
re-create the source release and to republish it

> Update flink-connector-parent-1.1.0-rc1 source release to exclude 
> tools/release directory
> -
>
> Key: FLINK-34364
> URL: https://issues.apache.org/jira/browse/FLINK-34364
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Parent, Release System
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> This directory is the mount point of the release utils repository and should 
> be excluded from the source release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-34364) Update flink-connector-parent-1.1.0-rc1 source release to exclude tools/release directory

2024-02-12 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816560#comment-17816560
 ] 

Etienne Chauchot edited comment on FLINK-34364 at 2/12/24 10:35 AM:


This was an incorrect mount point tools/*release* instead of tools/*releasing*. 
tools/releasing is already excluded in the source release script. Just need to 
re-create the source release and to republish it


was (Author: echauchot):
This was an incorrect mount point tools/*release* instead of tools/*releasing*. 
tools/releasing is already excluded in the source release script. Just need to 
re-create the source release and to republish it, no need to update the release 
scripts.

> Update flink-connector-parent-1.1.0-rc1 source release to exclude 
> tools/release directory
> -
>
> Key: FLINK-34364
> URL: https://issues.apache.org/jira/browse/FLINK-34364
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Parent, Release System
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> This directory is the mount point of the release utils repository and should 
> be excluded from the source release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34364) Update flink-connector-parent-1.1.0-rc1 source release to exclude tools/release directory

2024-02-12 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816560#comment-17816560
 ] 

Etienne Chauchot commented on FLINK-34364:
--

The mount point was incorrectly set to tools/*release* instead of 
tools/*releasing*. tools/releasing is already excluded in the source release 
script. We just need to re-create the source release and republish it; there is 
no need to update the release scripts.

> Update flink-connector-parent-1.1.0-rc1 source release to exclude 
> tools/release directory
> -
>
> Key: FLINK-34364
> URL: https://issues.apache.org/jira/browse/FLINK-34364
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Parent, Release System
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> This directory is the mount point of the release utils repository and should 
> be excluded from the source release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34364) Update flink-connector-parent-1.1.0-rc1 source release to exclude tools/release directory

2024-02-12 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot reassigned FLINK-34364:


Assignee: Etienne Chauchot

> Update flink-connector-parent-1.1.0-rc1 source release to exclude 
> tools/release directory
> -
>
> Key: FLINK-34364
> URL: https://issues.apache.org/jira/browse/FLINK-34364
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Parent, Release System
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> This directory is the mount point of the release utils repository and should 
> be excluded from the source release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34364) Update flink-connector-parent-1.1.0-rc1 source release to exclude tools/release directory

2024-02-12 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated FLINK-34364:
-
Summary: Update flink-connector-parent-1.1.0-rc1 source release to exclude 
tools/release directory  (was: stage_source_release.sh should exclude 
tools/release directory from the source release)

> Update flink-connector-parent-1.1.0-rc1 source release to exclude 
> tools/release directory
> -
>
> Key: FLINK-34364
> URL: https://issues.apache.org/jira/browse/FLINK-34364
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Parent, Release System
>Reporter: Etienne Chauchot
>Priority: Major
>
> This directory is the mount point of the release utils repository and should 
> be excluded from the source release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34364) stage_source_release.sh should exclude tools/release directory from the source release

2024-02-05 Thread Etienne Chauchot (Jira)
Etienne Chauchot created FLINK-34364:


 Summary: stage_source_release.sh should exclude tools/release 
directory from the source release
 Key: FLINK-34364
 URL: https://issues.apache.org/jira/browse/FLINK-34364
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Parent, Release System
Reporter: Etienne Chauchot


This directory is the mount point of the release utils repository and should be 
excluded from the source release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34363) Connectors release utils should allow to not specify flink version in stage_jars.sh

2024-02-05 Thread Etienne Chauchot (Jira)
Etienne Chauchot created FLINK-34363:


 Summary: Connectors release utils should allow to not specify 
flink version in stage_jars.sh
 Key: FLINK-34363
 URL: https://issues.apache.org/jira/browse/FLINK-34363
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Parent, Release System
Reporter: Etienne Chauchot
Assignee: Etienne Chauchot


For the flink-connector-parent release, the Flink version is not needed. The 
stage_jars.sh script should allow specifying only ${project_version} and not 
${project_version}-${flink_minor_version}.
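
A minimal sketch of how the version suffix could be made optional (the variable 
names follow the description above; the actual logic would live in 
stage_jars.sh):
{code:bash}
# Sketch only, not the actual stage_jars.sh.
# Append the Flink minor version to the staged version string only when it is set.
version_suffix="${project_version}"
if [ -n "${flink_minor_version:-}" ]; then
  version_suffix="${project_version}-${flink_minor_version}"
fi
echo "Staging jars for version ${version_suffix}"
{code}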



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33776) Allow to specify optional profile for connectors

2024-01-17 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated FLINK-33776:
-
Component/s: Build System / CI

> Allow to specify optional profile for connectors
> 
>
> Key: FLINK-33776
> URL: https://issues.apache.org/jira/browse/FLINK-33776
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System / CI, Connectors / Parent
>Reporter: Sergey Nuyanzin
>Assignee: Sergey Nuyanzin
>Priority: Major
>  Labels: pull-request-available
> Fix For: connector-parent-1.1.0
>
>
> The issue is that sometimes a connector should be tested against several 
> versions of its sinks/sources, e.g. the Hive connector should be tested 
> against Hive 2 and Hive 3, and the Opensearch connector against Opensearch 1 
> and 2.
> One way to do that is to use Maven profiles.
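
As an illustration of the profile-based approach mentioned in the description, 
two CI legs could run the same build with different optional profiles (the 
profile names below are hypothetical; each connector would define its own):
{code:bash}
# Sketch: exercise the connector against two backend versions via Maven profiles.
mvn clean verify -Phive2
mvn clean verify -Phive3
{code}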



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33776) Allow to specify optional profile for connectors

2024-01-17 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated FLINK-33776:
-
Issue Type: Improvement  (was: Bug)

> Allow to specify optional profile for connectors
> 
>
> Key: FLINK-33776
> URL: https://issues.apache.org/jira/browse/FLINK-33776
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Parent
>Reporter: Sergey Nuyanzin
>Assignee: Sergey Nuyanzin
>Priority: Major
>  Labels: pull-request-available
> Fix For: connector-parent-1.1.0
>
>
> The issue is that sometimes a connector should be tested against several 
> versions of its sinks/sources, e.g. the Hive connector should be tested 
> against Hive 2 and Hive 3, and the Opensearch connector against Opensearch 1 
> and 2.
> One way to do that is to use Maven profiles.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33776) Allow to specify optional profile for connectors

2024-01-17 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated FLINK-33776:
-
Fix Version/s: (was: connector-parent-1.1.0)

> Allow to specify optional profile for connectors
> 
>
> Key: FLINK-33776
> URL: https://issues.apache.org/jira/browse/FLINK-33776
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System / CI, Connectors / Parent
>Reporter: Sergey Nuyanzin
>Assignee: Sergey Nuyanzin
>Priority: Major
>  Labels: pull-request-available
>
> The issue is that sometimes a connector should be tested against several 
> versions of its sinks/sources, e.g. the Hive connector should be tested 
> against Hive 2 and Hive 3, and the Opensearch connector against Opensearch 1 
> and 2.
> One way to do that is to use Maven profiles.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-33776) Allow to specify optional profile for connectors

2024-01-17 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17807803#comment-17807803
 ] 

Etienne Chauchot edited comment on FLINK-33776 at 1/17/24 5:01 PM:
---

[~Sergey Nuyanzin] I'm releasing flink-connector-parent and reviewing the 
release notes. I think this ticket should be classified differently. 
I did the following:
- set type = Improvement instead of Bug, because this ticket adds a new ability
- removed the unrelated link to PR 23910
- since this ticket only touches the CI and not the connector-parent pom, but 
is still related to connectors, I kept Build System / CI + Connectors / Parent 
as components but removed the connector-parent fix version.

Feel free to change if you disagree.



was (Author: echauchot):
[~Sergey Nuyanzin] I'm releasing flink-connector-parent and I'm reviewing the 
release notes. I think this ticket should be classified differently. 
I did:
- type = improvement instead of bug because this ticket is adding a new ability
- remove unrelated link to PR 23910
- this ticket touches only the CI and not the connector-parent pom. as it is 
related to connectors still I'd put Build/CI + connector/parent as components 
but remove the connector-parent fix version.
Feel free to change if you disagree.


> Allow to specify optional profile for connectors
> 
>
> Key: FLINK-33776
> URL: https://issues.apache.org/jira/browse/FLINK-33776
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Parent
>Reporter: Sergey Nuyanzin
>Assignee: Sergey Nuyanzin
>Priority: Major
>  Labels: pull-request-available
> Fix For: connector-parent-1.1.0
>
>
> The issue is that sometimes a connector should be tested against several 
> versions of its sinks/sources, e.g. the Hive connector should be tested 
> against Hive 2 and Hive 3, and the Opensearch connector against Opensearch 1 
> and 2.
> One way to do that is to use Maven profiles.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-33776) Allow to specify optional profile for connectors

2024-01-17 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17807803#comment-17807803
 ] 

Etienne Chauchot edited comment on FLINK-33776 at 1/17/24 5:01 PM:
---

[~Sergey Nuyanzin] I'm releasing flink-connector-parent and reviewing the 
release notes. I think this ticket should be classified differently. 
I did the following:
- set type = Improvement instead of Bug, because this ticket adds a new ability
- removed the unrelated link to PR 23910
- since this ticket only touches the CI and not the connector-parent pom, but 
is still related to connectors, I kept Build System / CI + Connectors / Parent 
as components but removed the connector-parent fix version.
Feel free to change if you disagree.



was (Author: echauchot):
[~Sergey Nuyanzin] I'm releasing flink-connector-parent and I'm reviewing the 
release notes. I think this ticket should be classified differently. 
I'd do:
- type = improvement instead of bug 
- remove unrelated link to PR 23910
- this ticket touches only the CI and not the connector-parent pom. as it is 
related to connectors still I'd put Build/CI + connector/parent as components 
but remove the fix version.

WDYT ?


> Allow to specify optional profile for connectors
> 
>
> Key: FLINK-33776
> URL: https://issues.apache.org/jira/browse/FLINK-33776
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Parent
>Reporter: Sergey Nuyanzin
>Assignee: Sergey Nuyanzin
>Priority: Major
>  Labels: pull-request-available
> Fix For: connector-parent-1.1.0
>
>
> The issue is that sometimes a connector should be tested against several 
> versions of its sinks/sources, e.g. the Hive connector should be tested 
> against Hive 2 and Hive 3, and the Opensearch connector against Opensearch 1 
> and 2.
> One way to do that is to use Maven profiles.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-33776) Allow to specify optional profile for connectors

2024-01-17 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17807803#comment-17807803
 ] 

Etienne Chauchot edited comment on FLINK-33776 at 1/17/24 4:39 PM:
---

[~Sergey Nuyanzin] I'm releasing flink-connector-parent and reviewing the 
release notes. I think this ticket should be classified differently. 
I'd do the following:
- type = Improvement instead of Bug 
- remove the unrelated link to PR 23910
- since this ticket only touches the CI and not the connector-parent pom, but 
is still related to connectors, I'd put Build System / CI + Connectors / Parent 
as components but remove the fix version.

WDYT?



was (Author: echauchot):
[~Sergey Nuyanzin] I'm releasing flink-connector-parent and I'm reviewing the 
release notes. I think this ticket should be classified differently. 
I'd do:
- type = improvement instead of bug 
- remove unrelated link to PR 23910
- this tickets touches only the CI and not the connector-parent pom. as it is 
related to connectors still I'd put Build/CI + connector/parent as component 
but remove the fix version.

WDYT ?


> Allow to specify optional profile for connectors
> 
>
> Key: FLINK-33776
> URL: https://issues.apache.org/jira/browse/FLINK-33776
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Parent
>Reporter: Sergey Nuyanzin
>Assignee: Sergey Nuyanzin
>Priority: Major
>  Labels: pull-request-available
> Fix For: connector-parent-1.1.0
>
>
> The issue is that sometimes a connector should be tested against several 
> versions of its sinks/sources, e.g. the Hive connector should be tested 
> against Hive 2 and Hive 3, and the Opensearch connector against Opensearch 1 
> and 2.
> One way to do that is to use Maven profiles.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33776) Allow to specify optional profile for connectors

2024-01-17 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17807803#comment-17807803
 ] 

Etienne Chauchot commented on FLINK-33776:
--

[~Sergey Nuyanzin] I'm releasing flink-connector-parent and reviewing the 
release notes. I think this ticket should be classified differently. 
I'd do the following:
- type = Improvement instead of Bug 
- remove the unrelated link to PR 23910
- since this ticket only touches the CI and not the connector-parent pom, but 
is still related to connectors, I'd put Build System / CI + Connectors / Parent 
as components but remove the fix version.

WDYT?


> Allow to specify optional profile for connectors
> 
>
> Key: FLINK-33776
> URL: https://issues.apache.org/jira/browse/FLINK-33776
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Parent
>Reporter: Sergey Nuyanzin
>Assignee: Sergey Nuyanzin
>Priority: Major
>  Labels: pull-request-available
> Fix For: connector-parent-1.1.0
>
>
> The issue is that sometimes a connector should be tested against several 
> versions of its sinks/sources, e.g. the Hive connector should be tested 
> against Hive 2 and Hive 3, and the Opensearch connector against Opensearch 1 
> and 2.
> One way to do that is to use Maven profiles.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34136) Execute archunit tests only with Flink version that connectors were built against

2024-01-17 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated FLINK-34136:
-
Component/s: Connectors / Parent

> Execute archunit tests only with Flink version that connectors were built 
> against
> -
>
> Key: FLINK-34136
> URL: https://issues.apache.org/jira/browse/FLINK-34136
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System / CI, Connectors / Parent
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> As part of [this 
> discussion|https://lists.apache.org/thread/pr0g812olzpgz21d9oodhc46db9jpxo3], 
> the need for connectors to specify the main Flink version that a connector 
> supports has arisen. 
> Such a CI variable will allow configuring the build and tests differently 
> depending on this version. This parameter would be optional.
> The first use case is to run archunit tests only on the main supported 
> version, as discussed in the above thread.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34137) Update CI to test archunit configuration

2024-01-17 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated FLINK-34137:
-
Component/s: Connectors / Parent

> Update CI to test archunit configuration
> 
>
> Key: FLINK-34137
> URL: https://issues.apache.org/jira/browse/FLINK-34137
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI, Connectors / Parent
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> Update CI to test skipping archunit tests on non-main Flink versions. Test on 
> submodules both with and without archunit tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34137) Update CI to test archunit configuration

2024-01-17 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated FLINK-34137:
-
Component/s: Build System / CI

> Update CI to test archunit configuration
> 
>
> Key: FLINK-34137
> URL: https://issues.apache.org/jira/browse/FLINK-34137
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> Update CI to test skipping archunit tests on non-main Flink versions. Test on 
> submodules both with and without archunit tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-32563) Add additionalExcludes property to add exclusions to surefire tests

2024-01-17 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved FLINK-32563.
--
Fix Version/s: connector-parent-1.1.0
   Resolution: Fixed

> Add additionalExcludes property to add exclusions to surefire tests
> ---
>
> Key: FLINK-32563
> URL: https://issues.apache.org/jira/browse/FLINK-32563
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Parent
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available
> Fix For: connector-parent-1.1.0
>
>
> Add an optional property to add exclusions to surefire tests (among other 
> things for skipping archunit tests)
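
A possible invocation, assuming the property is exposed as a regular Maven user 
property and wired into surefire's excludes by flink-connector-parent (the 
exclusion pattern below is only an example):
{code:bash}
# Sketch only -- the exact wiring of additionalExcludes lives in the
# flink-connector-parent pom; the pattern below is illustrative.
mvn clean verify -DadditionalExcludes='**/*ArchitectureTest*'
{code}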



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34137) Update CI to test archunit configuration

2024-01-17 Thread Etienne Chauchot (Jira)
Etienne Chauchot created FLINK-34137:


 Summary: Update CI to test archunit configuration
 Key: FLINK-34137
 URL: https://issues.apache.org/jira/browse/FLINK-34137
 Project: Flink
  Issue Type: Sub-task
Reporter: Etienne Chauchot
Assignee: Etienne Chauchot


Update CI to test skipping archunit tests on non-main Flink versions. Test on 
submodules both with and without archunit tests.





--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32563) Add additionalExcludes property to add exclusions to surefire tests

2024-01-17 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated FLINK-32563:
-
Description: Add an optional property to add exclusions to surefire tests 
(among other things for skipping archunit tests)  (was: As part of [this 
discussion|https://lists.apache.org/thread/pr0g812olzpgz21d9oodhc46db9jpxo3] , 
the need for connectors to specify the main flink version that a connector 
supports has arisen. 

This CI variable will allow to configure the build and tests differently 
depending on this version. This parameter would be optional.

The first use case is to run archunit tests only on the main supported version 
as discussed in the above thread.)

> Add additionalExcludes property to add exclusions to surefire tests
> ---
>
> Key: FLINK-32563
> URL: https://issues.apache.org/jira/browse/FLINK-32563
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Parent
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available
>
> Add an optional property to add exclusions to surefire tests (among other 
> things for skipping archunit tests)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32563) Add additionalExcludes property to add exclusions to surefire tests

2024-01-17 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated FLINK-32563:
-
Component/s: Connectors / Parent
 (was: Build System / CI)

> Add additionalExcludes property to add exclusions to surefire tests
> ---
>
> Key: FLINK-32563
> URL: https://issues.apache.org/jira/browse/FLINK-32563
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Parent
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available
>
> As part of [this 
> discussion|https://lists.apache.org/thread/pr0g812olzpgz21d9oodhc46db9jpxo3], 
> the need for connectors to specify the main Flink version that a connector 
> supports has arisen. 
> Such a CI variable will allow configuring the build and tests differently 
> depending on this version. This parameter would be optional.
> The first use case is to run archunit tests only on the main supported 
> version, as discussed in the above thread.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32563) Add additionalExcludes property to add exclusions to surefire tests

2024-01-17 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated FLINK-32563:
-
Summary: Add additionalExcludes property to add exclusions to surefire 
tests  (was: execute archunit tests only with Flink version that connectors 
were built against)

> Add additionalExcludes property to add exclusions to surefire tests
> ---
>
> Key: FLINK-32563
> URL: https://issues.apache.org/jira/browse/FLINK-32563
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available
>
> As part of [this 
> discussion|https://lists.apache.org/thread/pr0g812olzpgz21d9oodhc46db9jpxo3], 
> the need for connectors to specify the main Flink version that a connector 
> supports has arisen. 
> Such a CI variable will allow configuring the build and tests differently 
> depending on this version. This parameter would be optional.
> The first use case is to run archunit tests only on the main supported 
> version, as discussed in the above thread.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32563) execute archunit tests only with Flink version that connectors were built against

2024-01-17 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated FLINK-32563:
-
Parent: FLINK-34136
Issue Type: Sub-task  (was: Technical Debt)

> execute archunit tests only with Flink version that connectors were built 
> against
> -
>
> Key: FLINK-32563
> URL: https://issues.apache.org/jira/browse/FLINK-32563
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available
>
> As part of [this 
> discussion|https://lists.apache.org/thread/pr0g812olzpgz21d9oodhc46db9jpxo3], 
> the need for connectors to specify the main Flink version that a connector 
> supports has arisen. 
> Such a CI variable will allow configuring the build and tests differently 
> depending on this version. This parameter would be optional.
> The first use case is to run archunit tests only on the main supported 
> version, as discussed in the above thread.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34136) Execute archunit tests only with Flink version that connectors were built against

2024-01-17 Thread Etienne Chauchot (Jira)
Etienne Chauchot created FLINK-34136:


 Summary: Execute archunit tests only with Flink version that 
connectors were built against
 Key: FLINK-34136
 URL: https://issues.apache.org/jira/browse/FLINK-34136
 Project: Flink
  Issue Type: Improvement
  Components: Build System / CI
Reporter: Etienne Chauchot
Assignee: Etienne Chauchot


As part of [this 
discussion|https://lists.apache.org/thread/pr0g812olzpgz21d9oodhc46db9jpxo3], 
the need for connectors to specify the main Flink version that a connector 
supports has arisen. 

Such a CI variable will allow configuring the build and tests differently 
depending on this version. This parameter would be optional.

The first use case is to run archunit tests only on the main supported version, 
as discussed in the above thread.
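
A rough sketch of what such a conditional CI step could look like (the variable 
and profile names are illustrative only, not the actual workflow syntax of 
flink-connector-shared-utils):
{code:bash}
# Hypothetical CI step: run archunit tests only against the main supported Flink version.
if [ "${FLINK_VERSION}" = "${FLINK_MAIN_VERSION}" ]; then
  # main supported version: full build including archunit tests
  mvn clean verify
else
  # other Flink versions of the nightly matrix: skip the archunit tests
  mvn clean verify -P skip-archunit
fi
{code}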



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32894) flink-connector-parent should use maven-shade-plugin 3.3.0+ for Java 17

2024-01-17 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated FLINK-32894:
-
Issue Type: Technical Debt  (was: Bug)

> flink-connector-parent should use maven-shade-plugin 3.3.0+ for Java 17
> ---
>
> Key: FLINK-32894
> URL: https://issues.apache.org/jira/browse/FLINK-32894
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / Parent
>Affects Versions: connector-parent-1.0.0
>Reporter: Qingsheng Ren
>Assignee: Qingsheng Ren
>Priority: Major
>  Labels: pull-request-available
> Fix For: connector-parent-1.1.0
>
>
> When I tried to compile {{flink-sql-connector-kafka}} with Java 17 using the 
> profiles {{{}-Pjava17 -Pjava17-target{}}}:
>  
> {code:java}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-shade-plugin:3.2.4:shade (shade-flink) on 
> project flink-sql-connector-kafka: Error creating shaded jar: Problem shading 
> JAR 
> flink-connectors/flink-connector-kafka/flink-connector-kafka/target/flink-connector-kafka-3.1-SNAPSHOT.jar
>  entry 
> org/apache/flink/streaming/connectors/kafka/FlinkKafkaProducerBase.class: 
> java.lang.IllegalArgumentException: Unsupported class file major version 61 
> {code}
> {{maven-shade-plugin}} supports Java 17 starting from 3.3.0 (see MSHADE-407). 
> We need to set the version of {{maven-shade-plugin}} to at least 3.3.0 for 
> the {{java17}} profile in the {{flink-connector-parent}} pom.
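
A quick diagnostic sketch to check which maven-shade-plugin version the java17 
profiles actually resolve to in a connector module (it must be at least 3.3.0 
to shade Java 17 class files):
{code:bash}
# Inspect the effective POM with the Java 17 profiles active and look for the
# maven-shade-plugin version that ends up being used.
mvn -Pjava17 -Pjava17-target help:effective-pom | grep -A 3 'maven-shade-plugin'
{code}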



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-30314) Unable to read all records from compressed delimited file input format

2023-11-10 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved FLINK-30314.
--
Fix Version/s: 1.19.0
   Resolution: Fixed

Fixed by [FLINK-33059|https://issues.apache.org/jira/browse/FLINK-33059]

> Unable to read all records from compressed delimited file input format
> --
>
> Key: FLINK-30314
> URL: https://issues.apache.org/jira/browse/FLINK-30314
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.16.0, 1.15.2, 1.17.1
>Reporter: Dmitry Yaraev
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
> Attachments: input.json, input.json.gz, input.json.zip
>
>
> I am reading gzipped JSON line-delimited files in the batch mode using 
> [FileSystem 
> Connector|https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/table/filesystem/].
>  For reading the files a new table is created with the following 
> configuration:
> {code:sql}
> CREATE TEMPORARY TABLE `my_database`.`my_table` (
>   `my_field1` BIGINT,
>   `my_field2` INT,
>   `my_field3` VARCHAR(2147483647)
> ) WITH (
>   'connector' = 'filesystem',
>   'path' = 'path-to-input-dir',
>   'format' = 'json',
>   'json.ignore-parse-errors' = 'false',
>   'json.fail-on-missing-field' = 'true'
> ) {code}
> In the input directory I have two files: input-0.json.gz and 
> input-1.json.gz. As the filenames suggest, the files are compressed 
> with GZIP. Each of the files contains 10 records. The issue is that only 2 
> records from each file are read (4 in total). If decompressed versions of the 
> same data files are used, all 20 records are read.
> As far as I understand, that problem may be related to the fact that split 
> length, which is used when the files are read, is in fact the length of a 
> compressed file. So files are closed before all records are read from them 
> because read position of the decompressed file stream exceeds split length.
> Probably, it makes sense to add a flag to {{{}FSDataInputStream{}}}, so we 
> could identify whether the file is compressed or not. The flag can be set to true in 
> {{InputStreamFSInputWrapper}} because it is used for wrapping compressed file 
> streams. With such a flag it could be possible to differentiate 
> non-splittable compressed files and only rely on the end of the stream.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-33059) Support transparent compression for file-connector for all file input formats

2023-11-10 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved FLINK-33059.
--
Fix Version/s: 1.19.0
   Resolution: Fixed

master: 51252638fcb855a82da9983b3dfaa3b89754523e

> Support transparent compression for file-connector for all file input formats
> -
>
> Key: FLINK-33059
> URL: https://issues.apache.org/jira/browse/FLINK-33059
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / FileSystem
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Some FileInputFormats don't use FileInputFormat#createSplits (which would 
> detect that the file is non-splittable and deal with reading boundaries 
> correctly); instead they all create splits manually from FileSourceSplit. If 
> input files are compressed, the split length is determined by the compressed 
> file length, leading to [this|https://issues.apache.org/jira/browse/FLINK-30314] 
> bug. We should force reading the whole file split (as is done for binary 
> input formats) on compressed files. Parallelism is still done at the file 
> level (as now).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32563) execute archunit tests only with Flink version that connectors were built against

2023-10-25 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated FLINK-32563:
-
Summary: execute archunit tests only with Flink version that connectors 
were built against  (was: execute sanity checks only with Flink version that 
connectors were built against)

> execute archunit tests only with Flink version that connectors were built 
> against
> -
>
> Key: FLINK-32563
> URL: https://issues.apache.org/jira/browse/FLINK-32563
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System / CI
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available
>
> As part of [this 
> discussion|https://lists.apache.org/thread/pr0g812olzpgz21d9oodhc46db9jpxo3], 
> the need for connectors to specify the main Flink version that a connector 
> supports has arisen. 
> Such a CI variable will allow configuring the build and tests differently 
> depending on this version. This parameter would be optional.
> The first use case is to run archunit tests only on the main supported 
> version, as discussed in the above thread.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-32563) execute sanity checks only with Flink version that connectors were built against

2023-10-11 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774092#comment-17774092
 ] 

Etienne Chauchot edited comment on FLINK-32563 at 10/11/23 2:31 PM:


I'm implementing this: link the {{dependency-convergence}} and the 
{{archunit}} tests together inside a sanity-check group, enable these checks 
only on the connector's main Flink version (the one the connector is built 
against), and disable them for all other versions, including snapshots.


was (Author: echauchot):
link together the {{dependency-convergence}} and the {{archunit}} tests inside 
a sanity-check group. Enable these checks only on connector's main flink 
version (the one the connector is built against) and disable them for all other 
versions including snapshots

> execute sanity checks only with Flink version that connectors were built 
> against
> 
>
> Key: FLINK-32563
> URL: https://issues.apache.org/jira/browse/FLINK-32563
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System / CI
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available
>
> As part of [this 
> discussion|https://lists.apache.org/thread/pr0g812olzpgz21d9oodhc46db9jpxo3], 
> the need for connectors to specify the main Flink version that a connector 
> supports has arisen. 
> Such a CI variable will allow configuring the build and tests differently 
> depending on this version. This parameter would be optional.
> The first use case is to run archunit tests only on the main supported 
> version, as discussed in the above thread.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32563) execute sanity checks only with Flink version that connectors were built against

2023-10-11 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774092#comment-17774092
 ] 

Etienne Chauchot commented on FLINK-32563:
--

Link the {{dependency-convergence}} and the {{archunit}} tests together inside 
a sanity-check group. Enable these checks only on the connector's main Flink 
version (the one the connector is built against) and disable them for all other 
versions, including snapshots.

> execute sanity checks only with Flink version that connectors were built 
> against
> 
>
> Key: FLINK-32563
> URL: https://issues.apache.org/jira/browse/FLINK-32563
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System / CI
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available
>
> As part of [this 
> discussion|https://lists.apache.org/thread/pr0g812olzpgz21d9oodhc46db9jpxo3], 
> the need for connectors to specify the main Flink version that a connector 
> supports has arisen. 
> Such a CI variable will allow configuring the build and tests differently 
> depending on this version. This parameter would be optional.
> The first use case is to run archunit tests only on the main supported 
> version, as discussed in the above thread.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32563) execute sanity checks only with Flink version that connectors were built against

2023-10-09 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated FLINK-32563:
-
Summary: execute sanity checks only with Flink version that connectors were 
built against  (was: Allow connectors CI to specify the main supported Flink 
version)

> execute sanity checks only with Flink version that connectors were built 
> against
> 
>
> Key: FLINK-32563
> URL: https://issues.apache.org/jira/browse/FLINK-32563
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System / CI
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> As part of [this 
> discussion|https://lists.apache.org/thread/pr0g812olzpgz21d9oodhc46db9jpxo3], 
> the need for connectors to specify the main Flink version that a connector 
> supports has arisen. 
> Such a CI variable will allow configuring the build and tests differently 
> depending on this version. This parameter would be optional.
> The first use case is to run archunit tests only on the main supported 
> version, as discussed in the above thread.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33104) Nightly run for Flink Kafka connector fails due to architecture tests failing

2023-10-06 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17772516#comment-17772516
 ] 

Etienne Chauchot commented on FLINK-33104:
--

[~tzulitai] yes, this is exactly what I wanted to do for Cassandra, but it 
requires a change in the GitHub Actions scripts and/or in the main build, as 
proposed [here|https://issues.apache.org/jira/browse/FLINK-32563], and I 
received no feedback. Among the 3 changes proposed in that ticket, I prefer the 
one that links the dependency-convergence tests and the archunit tests into a 
single Maven profile (like the architecture tests). I'll do this change in a PR.

> Nightly run for Flink Kafka connector fails due to architecture tests failing
> -
>
> Key: FLINK-33104
> URL: https://issues.apache.org/jira/browse/FLINK-33104
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: kafka-3.1.0
>Reporter: Martijn Visser
>Priority: Blocker
>
> {code:java}
> 2023-09-17T00:29:07.1675694Z [WARNING] Tests run: 18, Failures: 0, Errors: 0, 
> Skipped: 9, Time elapsed: 308.532 s - in 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerMigrationTest
> 2023-09-17T00:29:07.5171608Z [INFO] 
> 2023-09-17T00:29:07.5172360Z [INFO] Results:
> 2023-09-17T00:29:07.5172773Z [INFO] 
> 2023-09-17T00:29:07.5173139Z [ERROR] Failures: 
> 2023-09-17T00:29:07.5174181Z [ERROR]   Architecture Violation [Priority: 
> MEDIUM] - Rule 'ITCASE tests should use a MiniCluster resource or extension' 
> was violated (13 times):
> 2023-09-17T00:29:07.5176050Z 
> org.apache.flink.connector.kafka.sink.FlinkKafkaInternalProducerITCase does 
> not satisfy: only one of the following predicates match:
> 2023-09-17T00:29:07.5177452Z * reside in a package 
> 'org.apache.flink.runtime.*' and contain any fields that are static, final, 
> and of type InternalMiniClusterExtension and annotated with @RegisterExtension
> 2023-09-17T00:29:07.5179831Z * reside outside of package 
> 'org.apache.flink.runtime.*' and contain any fields that are static, final, 
> and of type MiniClusterExtension and annotated with @RegisterExtension or are 
> , and of type MiniClusterTestEnvironment and annotated with @TestEnv
> 2023-09-17T00:29:07.5181277Z * reside in a package 
> 'org.apache.flink.runtime.*' and is annotated with @ExtendWith with class 
> InternalMiniClusterExtension
> 2023-09-17T00:29:07.5182154Z * reside outside of package 
> 'org.apache.flink.runtime.*' and is annotated with @ExtendWith with class 
> MiniClusterExtension
> 2023-09-17T00:29:07.5182951Z  or contain any fields that are public, static, 
> and of type MiniClusterWithClientResource and final and annotated with 
> @ClassRule or contain any fields that is of type 
> MiniClusterWithClientResource and public and final and not static and 
> annotated with @Rule
> 2023-09-17T00:29:07.5183906Z 
> org.apache.flink.connector.kafka.sink.KafkaSinkITCase does not satisfy: only 
> one of the following predicates match:
> 2023-09-17T00:29:07.5184769Z * reside in a package 
> 'org.apache.flink.runtime.*' and contain any fields that are static, final, 
> and of type InternalMiniClusterExtension and annotated with @RegisterExtension
> 2023-09-17T00:29:07.5185812Z * reside outside of package 
> 'org.apache.flink.runtime.*' and contain any fields that are static, final, 
> and of type MiniClusterExtension and annotated with @RegisterExtension or are 
> , and of type MiniClusterTestEnvironment and annotated with @TestEnv
> 2023-09-17T00:29:07.5186880Z * reside in a package 
> 'org.apache.flink.runtime.*' and is annotated with @ExtendWith with class 
> InternalMiniClusterExtension
> 2023-09-17T00:29:07.5187929Z * reside outside of package 
> 'org.apache.flink.runtime.*' and is annotated with @ExtendWith with class 
> MiniClusterExtension
> 2023-09-17T00:29:07.5189073Z  or contain any fields that are public, static, 
> and of type MiniClusterWithClientResource and final and annotated with 
> @ClassRule or contain any fields that is of type 
> MiniClusterWithClientResource and public and final and not static and 
> annotated with @Rule
> 2023-09-17T00:29:07.5190076Z 
> org.apache.flink.connector.kafka.sink.KafkaTransactionLogITCase does not 
> satisfy: only one of the following predicates match:
> 2023-09-17T00:29:07.5190946Z * reside in a package 
> 'org.apache.flink.runtime.*' and contain any fields that are static, final, 
> and of type InternalMiniClusterExtension and annotated with @RegisterExtension
> 2023-09-17T00:29:07.5191983Z * reside outside of package 
> 'org.apache.flink.runtime.*' and contain any fields that are static, final, 
> and of type MiniClusterExtension and annotated with @RegisterExtension or are 
> , and of type MiniClusterTestEnvironment and annotated with @TestEnv
> 

[jira] [Comment Edited] (FLINK-33104) Nightly run for Flink Kafka connector fails due to architecture tests failing

2023-10-06 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770338#comment-17770338
 ] 

Etienne Chauchot edited comment on FLINK-33104 at 10/6/23 10:27 AM:


[~martijnvisser] actually it was never fixed in Cassandra (see FLINK-32353). 
The problem is that when archunit rules change we need to update the archunit 
violation store, and there is only a single violation store. As the nightly 
build tests the connector against several versions of Flink, there will be 
failures if not all of these versions have the same archunit rules. To fix this 
we need to skip archunit tests on Flink versions that are not the main one (the 
one the connector is built against), so that the single archunit violation 
store contains the violations for the main Flink version. I proposed some 
changes to the GitHub Actions script 
[here|https://issues.apache.org/jira/browse/FLINK-32563], for which I was 
waiting for your feedback.


was (Author: echauchot):
[~martijnvisser] actually it was never fixed in cassandra (see FLINK-32353): 
the problem is that when archunit rules change we need to update the archunit 
violation store. And there is only a single violation store. As the nightly 
tests the connector against several versions of flink, there will be failures 
if not all these versions have the same archunit rules. To fix this problem we 
need to skip archunit tests on Flink versions that are not the main one (the 
one the connector is built against) so that the single arunit violation store 
contains the violations for the main flink version. I proposed some change in 
the github action script here  for which I was waiting for your feedback.

> Nightly run for Flink Kafka connector fails due to architecture tests failing
> -
>
> Key: FLINK-33104
> URL: https://issues.apache.org/jira/browse/FLINK-33104
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: kafka-3.1.0
>Reporter: Martijn Visser
>Priority: Blocker
>
> {code:java}
> 2023-09-17T00:29:07.1675694Z [WARNING] Tests run: 18, Failures: 0, Errors: 0, 
> Skipped: 9, Time elapsed: 308.532 s - in 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerMigrationTest
> 2023-09-17T00:29:07.5171608Z [INFO] 
> 2023-09-17T00:29:07.5172360Z [INFO] Results:
> 2023-09-17T00:29:07.5172773Z [INFO] 
> 2023-09-17T00:29:07.5173139Z [ERROR] Failures: 
> 2023-09-17T00:29:07.5174181Z [ERROR]   Architecture Violation [Priority: 
> MEDIUM] - Rule 'ITCASE tests should use a MiniCluster resource or extension' 
> was violated (13 times):
> 2023-09-17T00:29:07.5176050Z 
> org.apache.flink.connector.kafka.sink.FlinkKafkaInternalProducerITCase does 
> not satisfy: only one of the following predicates match:
> 2023-09-17T00:29:07.5177452Z * reside in a package 
> 'org.apache.flink.runtime.*' and contain any fields that are static, final, 
> and of type InternalMiniClusterExtension and annotated with @RegisterExtension
> 2023-09-17T00:29:07.5179831Z * reside outside of package 
> 'org.apache.flink.runtime.*' and contain any fields that are static, final, 
> and of type MiniClusterExtension and annotated with @RegisterExtension or are 
> , and of type MiniClusterTestEnvironment and annotated with @TestEnv
> 2023-09-17T00:29:07.5181277Z * reside in a package 
> 'org.apache.flink.runtime.*' and is annotated with @ExtendWith with class 
> InternalMiniClusterExtension
> 2023-09-17T00:29:07.5182154Z * reside outside of package 
> 'org.apache.flink.runtime.*' and is annotated with @ExtendWith with class 
> MiniClusterExtension
> 2023-09-17T00:29:07.5182951Z  or contain any fields that are public, static, 
> and of type MiniClusterWithClientResource and final and annotated with 
> @ClassRule or contain any fields that is of type 
> MiniClusterWithClientResource and public and final and not static and 
> annotated with @Rule
> 2023-09-17T00:29:07.5183906Z 
> org.apache.flink.connector.kafka.sink.KafkaSinkITCase does not satisfy: only 
> one of the following predicates match:
> 2023-09-17T00:29:07.5184769Z * reside in a package 
> 'org.apache.flink.runtime.*' and contain any fields that are static, final, 
> and of type InternalMiniClusterExtension and annotated with @RegisterExtension
> 2023-09-17T00:29:07.5185812Z * reside outside of package 
> 'org.apache.flink.runtime.*' and contain any fields that are static, final, 
> and of type MiniClusterExtension and annotated with @RegisterExtension or are 
> , and of type MiniClusterTestEnvironment and annotated with @TestEnv
> 2023-09-17T00:29:07.5186880Z * reside in a package 
> 'org.apache.flink.runtime.*' and is annotated with @ExtendWith with class 
> InternalMiniClusterExtension
> 2023-09-17T00:29:07.5187929Z * reside 

[jira] [Commented] (FLINK-33104) Nightly run for Flink Kafka connector fails due to architecture tests failing

2023-09-29 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770338#comment-17770338
 ] 

Etienne Chauchot commented on FLINK-33104:
--

[~martijnvisser] actually it was never fixed in Cassandra (see FLINK-32353). 
The problem is that when archunit rules change we need to update the archunit 
violation store, and there is only a single violation store. As the nightly 
build tests the connector against several versions of Flink, there will be 
failures if not all of these versions have the same archunit rules. To fix this 
we need to skip archunit tests on Flink versions that are not the main one (the 
one the connector is built against), so that the single archunit violation 
store contains the violations for the main Flink version. I proposed some 
changes to the GitHub Actions script here, for which I was waiting for your feedback.

> Nightly run for Flink Kafka connector fails due to architecture tests failing
> -
>
> Key: FLINK-33104
> URL: https://issues.apache.org/jira/browse/FLINK-33104
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: kafka-3.1.0
>Reporter: Martijn Visser
>Priority: Blocker
>
> {code:java}
> 2023-09-17T00:29:07.1675694Z [WARNING] Tests run: 18, Failures: 0, Errors: 0, 
> Skipped: 9, Time elapsed: 308.532 s - in 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerMigrationTest
> 2023-09-17T00:29:07.5171608Z [INFO] 
> 2023-09-17T00:29:07.5172360Z [INFO] Results:
> 2023-09-17T00:29:07.5172773Z [INFO] 
> 2023-09-17T00:29:07.5173139Z [ERROR] Failures: 
> 2023-09-17T00:29:07.5174181Z [ERROR]   Architecture Violation [Priority: 
> MEDIUM] - Rule 'ITCASE tests should use a MiniCluster resource or extension' 
> was violated (13 times):
> 2023-09-17T00:29:07.5176050Z 
> org.apache.flink.connector.kafka.sink.FlinkKafkaInternalProducerITCase does 
> not satisfy: only one of the following predicates match:
> 2023-09-17T00:29:07.5177452Z * reside in a package 
> 'org.apache.flink.runtime.*' and contain any fields that are static, final, 
> and of type InternalMiniClusterExtension and annotated with @RegisterExtension
> 2023-09-17T00:29:07.5179831Z * reside outside of package 
> 'org.apache.flink.runtime.*' and contain any fields that are static, final, 
> and of type MiniClusterExtension and annotated with @RegisterExtension or are 
> , and of type MiniClusterTestEnvironment and annotated with @TestEnv
> 2023-09-17T00:29:07.5181277Z * reside in a package 
> 'org.apache.flink.runtime.*' and is annotated with @ExtendWith with class 
> InternalMiniClusterExtension
> 2023-09-17T00:29:07.5182154Z * reside outside of package 
> 'org.apache.flink.runtime.*' and is annotated with @ExtendWith with class 
> MiniClusterExtension
> 2023-09-17T00:29:07.5182951Z  or contain any fields that are public, static, 
> and of type MiniClusterWithClientResource and final and annotated with 
> @ClassRule or contain any fields that is of type 
> MiniClusterWithClientResource and public and final and not static and 
> annotated with @Rule
> 2023-09-17T00:29:07.5183906Z 
> org.apache.flink.connector.kafka.sink.KafkaSinkITCase does not satisfy: only 
> one of the following predicates match:
> 2023-09-17T00:29:07.5184769Z * reside in a package 
> 'org.apache.flink.runtime.*' and contain any fields that are static, final, 
> and of type InternalMiniClusterExtension and annotated with @RegisterExtension
> 2023-09-17T00:29:07.5185812Z * reside outside of package 
> 'org.apache.flink.runtime.*' and contain any fields that are static, final, 
> and of type MiniClusterExtension and annotated with @RegisterExtension or are 
> , and of type MiniClusterTestEnvironment and annotated with @TestEnv
> 2023-09-17T00:29:07.5186880Z * reside in a package 
> 'org.apache.flink.runtime.*' and is annotated with @ExtendWith with class 
> InternalMiniClusterExtension
> 2023-09-17T00:29:07.5187929Z * reside outside of package 
> 'org.apache.flink.runtime.*' and is annotated with @ExtendWith with class 
> MiniClusterExtension
> 2023-09-17T00:29:07.5189073Z  or contain any fields that are public, static, 
> and of type MiniClusterWithClientResource and final and annotated with 
> @ClassRule or contain any fields that is of type 
> MiniClusterWithClientResource and public and final and not static and 
> annotated with @Rule
> 2023-09-17T00:29:07.5190076Z 
> org.apache.flink.connector.kafka.sink.KafkaTransactionLogITCase does not 
> satisfy: only one of the following predicates match:
> 2023-09-17T00:29:07.5190946Z * reside in a package 
> 'org.apache.flink.runtime.*' and contain any fields that are static, final, 
> and of type InternalMiniClusterExtension and annotated with @RegisterExtension
> 2023-09-17T00:29:07.5191983Z * reside 

[jira] (FLINK-30314) Unable to read all records from compressed delimited file input format

2023-09-27 Thread Etienne Chauchot (Jira)


[ https://issues.apache.org/jira/browse/FLINK-30314 ]


Etienne Chauchot deleted comment on FLINK-30314:
--

was (Author: echauchot):
https://github.com/apache/flink/pull/23443

> Unable to read all records from compressed delimited file input format
> --
>
> Key: FLINK-30314
> URL: https://issues.apache.org/jira/browse/FLINK-30314
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.16.0, 1.15.2, 1.17.1
>Reporter: Dmitry Yaraev
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available
> Attachments: input.json, input.json.gz, input.json.zip
>
>
> I am reading gzipped JSON line-delimited files in the batch mode using 
> [FileSystem 
> Connector|https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/table/filesystem/].
>  For reading the files a new table is created with the following 
> configuration:
> {code:sql}
> CREATE TEMPORARY TABLE `my_database`.`my_table` (
>   `my_field1` BIGINT,
>   `my_field2` INT,
>   `my_field3` VARCHAR(2147483647)
> ) WITH (
>   'connector' = 'filesystem',
>   'path' = 'path-to-input-dir',
>   'format' = 'json',
>   'json.ignore-parse-errors' = 'false',
>   'json.fail-on-missing-field' = 'true'
> ) {code}
> In the input directory I have two files: input-0.json.gz and 
> input-1.json.gz. As the filenames suggest, the files are compressed 
> with GZIP. Each of the files contains 10 records. The issue is that only 2 
> records from each file are read (4 in total). If decompressed versions of the 
> same data files are used, all 20 records are read.
> As far as I understand, that problem may be related to the fact that split 
> length, which is used when the files are read, is in fact the length of a 
> compressed file. So files are closed before all records are read from them 
> because read position of the decompressed file stream exceeds split length.
> Probably, it makes sense to add a flag to {{{}FSDataInputStream{}}}, so we 
> could identify whether the file is compressed or not. The flag can be set to true in 
> {{InputStreamFSInputWrapper}} because it is used for wrapping compressed file 
> streams. With such a flag it could be possible to differentiate 
> non-splittable compressed files and only rely on the end of the stream.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33059) Support transparent compression for file-connector for all file input formats

2023-09-20 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated FLINK-33059:
-
Description: Some FileInputFormats don't use FileInputFormat#createSplits 
(which would detect that the file is non-splittable and deal with reading 
boundaries correctly); instead they all create splits manually from 
FileSourceSplit. If input files are compressed, the split length is determined 
by the compressed file length, leading to 
[this|https://issues.apache.org/jira/browse/FLINK-30314] bug. We should force 
reading the whole file split (as is done for binary input formats) on 
compressed files. Parallelism is still done at the file level (as now)  (was: 
Delimited file input formats (contrary to binary input formats etc.) do not 
support compression via the existing decorator because the split length is 
determined by the compressed file length, leading to 
[this|https://issues.apache.org/jira/browse/FLINK-30314] bug. We should force 
reading the whole file split (like it is done for binary input formats) on 
compressed files. Parallelism is still done at the file level (as now))

> Support transparent compression for file-connector for all file input formats
> -
>
> Key: FLINK-33059
> URL: https://issues.apache.org/jira/browse/FLINK-33059
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / FileSystem
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> Some FileInputFormats don't use FileInputFormat#createSplits (which would 
> detect that the file is non-splittable and deal with reading boundaries 
> correctly); instead they all create splits manually from FileSourceSplit. If 
> input files are compressed, the split length is determined by the compressed 
> file length, leading to [this|https://issues.apache.org/jira/browse/FLINK-30314] 
> bug. We should force reading the whole file split (as is done for binary 
> input formats) on compressed files. Parallelism is still done at the file 
> level (as now).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30314) Unable to read all records from compressed delimited file input format

2023-09-20 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17767099#comment-17767099
 ] 

Etienne Chauchot commented on FLINK-30314:
--

https://github.com/apache/flink/pull/23443

> Unable to read all records from compressed delimited file input format
> --
>
> Key: FLINK-30314
> URL: https://issues.apache.org/jira/browse/FLINK-30314
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.16.0, 1.15.2, 1.17.1
>Reporter: Dmitry Yaraev
>Assignee: Etienne Chauchot
>Priority: Major
> Attachments: input.json, input.json.gz, input.json.zip
>
>
> I am reading gzipped JSON line-delimited files in the batch mode using 
> [FileSystem 
> Connector|https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/table/filesystem/].
>  For reading the files a new table is created with the following 
> configuration:
> {code:sql}
> CREATE TEMPORARY TABLE `my_database`.`my_table` (
>   `my_field1` BIGINT,
>   `my_field2` INT,
>   `my_field3` VARCHAR(2147483647)
> ) WITH (
>   'connector' = 'filesystem',
>   'path' = 'path-to-input-dir',
>   'format' = 'json',
>   'json.ignore-parse-errors' = 'false',
>   'json.fail-on-missing-field' = 'true'
> ) {code}
> In the input directory I have two files: input-0.json.gz and 
> input-1.json.gz. As it comes from the filenames, the files are compressed 
> with GZIP. Each of the files contains 10 records. The issue is that only 2 
> records from each file are read (4 in total). If decompressed versions of the 
> same data files are used, all 20 records are read.
> As far as I understand, that problem may be related to the fact that split 
> length, which is used when the files are read, is in fact the length of a 
> compressed file. So files are closed before all records are read from them 
> because read position of the decompressed file stream exceeds split length.
> Probably, it makes sense to add a flag to {{{}FSDataInputStream{}}}, so we 
> could identify if the file compressed or not. The flag can be set to true in 
> {{InputStreamFSInputWrapper}} because it is used for wrapping compressed file 
> streams. With such a flag it could be possible to differentiate 
> non-splittable compressed files and only rely on the end of the stream.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-30314) Unable to read all records from compressed delimited file input format

2023-09-18 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762651#comment-17762651
 ] 

Etienne Chauchot edited comment on FLINK-30314 at 9/18/23 8:23 AM:
---

Made the ticket description more general (the problem is with the delimited input 
format being splittable, but not when it is compressed). Added tests on newer 
Flink versions and reduced the module scope to just the file-connector (the 
problem is on the file-connector in general, which is used for the Table API but 
also for other APIs).

The problem is with DelimitedInputFormat: none of its subclasses call 
FileInputFormat#createSplits (which would detect that the file is non-splittable 
and deal with reading boundaries correctly); they all use the FileSource in 
org.apache.flink.connector.file.src, which creates its own splits.


was (Author: echauchot):
Made the ticket description more general (the problem is with delimited input 
format being splittable but not when it is compressed). Added tests on newer 
flink versions and reduced the module scope to just file-connector (the problem 
is on the file-connector in ganeral that is use for table but also for other 
APIs).

The problem is with DelimitedInputFormat as none of its subclasses call 
FileInputFormat#createSplits (that would detect that the file is non-splittable 
and deal with reading boundaries correctly), they all use FileSource dans 
org.apache.flink.connector.file.src that creates its own splits

> Unable to read all records from compressed delimited file input format
> --
>
> Key: FLINK-30314
> URL: https://issues.apache.org/jira/browse/FLINK-30314
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.16.0, 1.15.2, 1.17.1
>Reporter: Dmitry Yaraev
>Assignee: Etienne Chauchot
>Priority: Major
> Attachments: input.json, input.json.gz, input.json.zip
>
>
> I am reading gzipped JSON line-delimited files in the batch mode using 
> [FileSystem 
> Connector|https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/table/filesystem/].
>  For reading the files a new table is created with the following 
> configuration:
> {code:sql}
> CREATE TEMPORARY TABLE `my_database`.`my_table` (
>   `my_field1` BIGINT,
>   `my_field2` INT,
>   `my_field3` VARCHAR(2147483647)
> ) WITH (
>   'connector' = 'filesystem',
>   'path' = 'path-to-input-dir',
>   'format' = 'json',
>   'json.ignore-parse-errors' = 'false',
>   'json.fail-on-missing-field' = 'true'
> ) {code}
> In the input directory I have two files: input-0.json.gz and 
> input-1.json.gz. As it comes from the filenames, the files are compressed 
> with GZIP. Each of the files contains 10 records. The issue is that only 2 
> records from each file are read (4 in total). If decompressed versions of the 
> same data files are used, all 20 records are read.
> As far as I understand, that problem may be related to the fact that split 
> length, which is used when the files are read, is in fact the length of a 
> compressed file. So files are closed before all records are read from them 
> because read position of the decompressed file stream exceeds split length.
> Probably, it makes sense to add a flag to {{{}FSDataInputStream{}}}, so we 
> could identify if the file compressed or not. The flag can be set to true in 
> {{InputStreamFSInputWrapper}} because it is used for wrapping compressed file 
> streams. With such a flag it could be possible to differentiate 
> non-splittable compressed files and only rely on the end of the stream.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-30314) Unable to read all records from compressed delimited file input format

2023-09-18 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762651#comment-17762651
 ] 

Etienne Chauchot edited comment on FLINK-30314 at 9/18/23 8:22 AM:
---

Made the ticket description more general (the problem is with the delimited input 
format being splittable, but not when it is compressed). Added tests on newer 
Flink versions and reduced the module scope to just the file-connector (the 
problem is on the file-connector in general, which is used for the Table API but 
also for other APIs).

The problem is with DelimitedInputFormat: none of its subclasses call 
FileInputFormat#createSplits (which would detect that the file is non-splittable 
and deal with reading boundaries correctly); they all use the FileSource in 
org.apache.flink.connector.file.src, which creates its own splits.


was (Author: echauchot):
Made the ticket description more general (the problem is with delimited input 
format being splittable but not when it is compressed). Added tests on newer 
flink versions and reduced the module scope to just file-connector (the problem 
is on the file-connector in ganeral that is use for table but also for other 
APIs).

The problem is with DelimitedInputFormat as none of its subclasses call 
FileInputFormat#createSplits (that would detect that the file is non-splittable 
and deal with reading boundaries correctly), then all use FileSource dans 
org.apache.flink.connector.file.src that creates its own splits

> Unable to read all records from compressed delimited file input format
> --
>
> Key: FLINK-30314
> URL: https://issues.apache.org/jira/browse/FLINK-30314
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.16.0, 1.15.2, 1.17.1
>Reporter: Dmitry Yaraev
>Assignee: Etienne Chauchot
>Priority: Major
> Attachments: input.json, input.json.gz, input.json.zip
>
>
> I am reading gzipped JSON line-delimited files in the batch mode using 
> [FileSystem 
> Connector|https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/table/filesystem/].
>  For reading the files a new table is created with the following 
> configuration:
> {code:sql}
> CREATE TEMPORARY TABLE `my_database`.`my_table` (
>   `my_field1` BIGINT,
>   `my_field2` INT,
>   `my_field3` VARCHAR(2147483647)
> ) WITH (
>   'connector' = 'filesystem',
>   'path' = 'path-to-input-dir',
>   'format' = 'json',
>   'json.ignore-parse-errors' = 'false',
>   'json.fail-on-missing-field' = 'true'
> ) {code}
> In the input directory I have two files: input-0.json.gz and 
> input-1.json.gz. As it comes from the filenames, the files are compressed 
> with GZIP. Each of the files contains 10 records. The issue is that only 2 
> records from each file are read (4 in total). If decompressed versions of the 
> same data files are used, all 20 records are read.
> As far as I understand, that problem may be related to the fact that split 
> length, which is used when the files are read, is in fact the length of a 
> compressed file. So files are closed before all records are read from them 
> because read position of the decompressed file stream exceeds split length.
> Probably, it makes sense to add a flag to {{{}FSDataInputStream{}}}, so we 
> could identify if the file compressed or not. The flag can be set to true in 
> {{InputStreamFSInputWrapper}} because it is used for wrapping compressed file 
> streams. With such a flag it could be possible to differentiate 
> non-splittable compressed files and only rely on the end of the stream.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-30314) Unable to read all records from compressed delimited file input format

2023-09-18 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated FLINK-30314:
-
Summary: Unable to read all records from compressed delimited file input 
format  (was: Unable to read all records from compressed delimited file format)

> Unable to read all records from compressed delimited file input format
> --
>
> Key: FLINK-30314
> URL: https://issues.apache.org/jira/browse/FLINK-30314
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.16.0, 1.15.2, 1.17.1
>Reporter: Dmitry Yaraev
>Assignee: Etienne Chauchot
>Priority: Major
> Attachments: input.json, input.json.gz, input.json.zip
>
>
> I am reading gzipped JSON line-delimited files in the batch mode using 
> [FileSystem 
> Connector|https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/table/filesystem/].
>  For reading the files a new table is created with the following 
> configuration:
> {code:sql}
> CREATE TEMPORARY TABLE `my_database`.`my_table` (
>   `my_field1` BIGINT,
>   `my_field2` INT,
>   `my_field3` VARCHAR(2147483647)
> ) WITH (
>   'connector' = 'filesystem',
>   'path' = 'path-to-input-dir',
>   'format' = 'json',
>   'json.ignore-parse-errors' = 'false',
>   'json.fail-on-missing-field' = 'true'
> ) {code}
> In the input directory I have two files: input-0.json.gz and 
> input-1.json.gz. As it comes from the filenames, the files are compressed 
> with GZIP. Each of the files contains 10 records. The issue is that only 2 
> records from each file are read (4 in total). If decompressed versions of the 
> same data files are used, all 20 records are read.
> As far as I understand, that problem may be related to the fact that split 
> length, which is used when the files are read, is in fact the length of a 
> compressed file. So files are closed before all records are read from them 
> because read position of the decompressed file stream exceeds split length.
> Probably, it makes sense to add a flag to {{{}FSDataInputStream{}}}, so we 
> could identify if the file compressed or not. The flag can be set to true in 
> {{InputStreamFSInputWrapper}} because it is used for wrapping compressed file 
> streams. With such a flag it could be possible to differentiate 
> non-splittable compressed files and only rely on the end of the stream.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-30314) Unable to read all records from compressed delimited file format

2023-09-18 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762651#comment-17762651
 ] 

Etienne Chauchot edited comment on FLINK-30314 at 9/18/23 8:21 AM:
---

Made the ticket description more general (the problem is with the delimited input 
format being splittable, but not when it is compressed). Added tests on newer 
Flink versions and reduced the module scope to just the file-connector (the 
problem is on the file-connector in general, which is used for the Table API but 
also for other APIs).

The problem is with DelimitedInputFormat: none of its subclasses call 
FileInputFormat#createSplits (which would detect that the file is non-splittable 
and deal with reading boundaries correctly); they all use the FileSource in 
org.apache.flink.connector.file.src, which creates its own splits.


was (Author: echauchot):
Made the ticket description more general (the problem is with delimited input 
format being splittable but not when it is compressed). Added tests on newer 
flink versions and reduced the module scope to just file-connector (the problem 
is on the file-connector in ganeral that is use for table but also for other 
APIs)

> Unable to read all records from compressed delimited file format
> 
>
> Key: FLINK-30314
> URL: https://issues.apache.org/jira/browse/FLINK-30314
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.16.0, 1.15.2, 1.17.1
>Reporter: Dmitry Yaraev
>Assignee: Etienne Chauchot
>Priority: Major
> Attachments: input.json, input.json.gz, input.json.zip
>
>
> I am reading gzipped JSON line-delimited files in the batch mode using 
> [FileSystem 
> Connector|https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/table/filesystem/].
>  For reading the files a new table is created with the following 
> configuration:
> {code:sql}
> CREATE TEMPORARY TABLE `my_database`.`my_table` (
>   `my_field1` BIGINT,
>   `my_field2` INT,
>   `my_field3` VARCHAR(2147483647)
> ) WITH (
>   'connector' = 'filesystem',
>   'path' = 'path-to-input-dir',
>   'format' = 'json',
>   'json.ignore-parse-errors' = 'false',
>   'json.fail-on-missing-field' = 'true'
> ) {code}
> In the input directory I have two files: input-0.json.gz and 
> input-1.json.gz. As it comes from the filenames, the files are compressed 
> with GZIP. Each of the files contains 10 records. The issue is that only 2 
> records from each file are read (4 in total). If decompressed versions of the 
> same data files are used, all 20 records are read.
> As far as I understand, that problem may be related to the fact that split 
> length, which is used when the files are read, is in fact the length of a 
> compressed file. So files are closed before all records are read from them 
> because read position of the decompressed file stream exceeds split length.
> Probably, it makes sense to add a flag to {{{}FSDataInputStream{}}}, so we 
> could identify if the file compressed or not. The flag can be set to true in 
> {{InputStreamFSInputWrapper}} because it is used for wrapping compressed file 
> streams. With such a flag it could be possible to differentiate 
> non-splittable compressed files and only rely on the end of the stream.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-30314) Unable to read all records from compressed delimited file format

2023-09-18 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated FLINK-30314:
-
Summary: Unable to read all records from compressed delimited file format  
(was: Unable to read all records from compressed delimited format file)

> Unable to read all records from compressed delimited file format
> 
>
> Key: FLINK-30314
> URL: https://issues.apache.org/jira/browse/FLINK-30314
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.16.0, 1.15.2, 1.17.1
>Reporter: Dmitry Yaraev
>Assignee: Etienne Chauchot
>Priority: Major
> Attachments: input.json, input.json.gz, input.json.zip
>
>
> I am reading gzipped JSON line-delimited files in the batch mode using 
> [FileSystem 
> Connector|https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/table/filesystem/].
>  For reading the files a new table is created with the following 
> configuration:
> {code:sql}
> CREATE TEMPORARY TABLE `my_database`.`my_table` (
>   `my_field1` BIGINT,
>   `my_field2` INT,
>   `my_field3` VARCHAR(2147483647)
> ) WITH (
>   'connector' = 'filesystem',
>   'path' = 'path-to-input-dir',
>   'format' = 'json',
>   'json.ignore-parse-errors' = 'false',
>   'json.fail-on-missing-field' = 'true'
> ) {code}
> In the input directory I have two files: input-0.json.gz and 
> input-1.json.gz. As it comes from the filenames, the files are compressed 
> with GZIP. Each of the files contains 10 records. The issue is that only 2 
> records from each file are read (4 in total). If decompressed versions of the 
> same data files are used, all 20 records are read.
> As far as I understand, that problem may be related to the fact that split 
> length, which is used when the files are read, is in fact the length of a 
> compressed file. So files are closed before all records are read from them 
> because read position of the decompressed file stream exceeds split length.
> Probably, it makes sense to add a flag to {{{}FSDataInputStream{}}}, so we 
> could identify if the file compressed or not. The flag can be set to true in 
> {{InputStreamFSInputWrapper}} because it is used for wrapping compressed file 
> streams. With such a flag it could be possible to differentiate 
> non-splittable compressed files and only rely on the end of the stream.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-30314) Unable to read all records from compressed delimited format file

2023-09-18 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762651#comment-17762651
 ] 

Etienne Chauchot edited comment on FLINK-30314 at 9/18/23 8:12 AM:
---

Made the ticket description more general (the problem is with the delimited input 
format being splittable, but not when it is compressed). Added tests on newer 
Flink versions and reduced the module scope to just the file-connector (the 
problem is on the file-connector in general, which is used for the Table API but 
also for other APIs).


was (Author: echauchot):
Made the ticket description more generale (the problem is with delimited input 
format being splittable but not when it is compressed). Added tests on newer 
flink versions and reduced the module scope to just file-connector (the problem 
is on the file-connector in ganeral that is use for table but also for other 
APIs)

> Unable to read all records from compressed delimited format file
> 
>
> Key: FLINK-30314
> URL: https://issues.apache.org/jira/browse/FLINK-30314
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.16.0, 1.15.2, 1.17.1
>Reporter: Dmitry Yaraev
>Assignee: Etienne Chauchot
>Priority: Major
> Attachments: input.json, input.json.gz, input.json.zip
>
>
> I am reading gzipped JSON line-delimited files in the batch mode using 
> [FileSystem 
> Connector|https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/table/filesystem/].
>  For reading the files a new table is created with the following 
> configuration:
> {code:sql}
> CREATE TEMPORARY TABLE `my_database`.`my_table` (
>   `my_field1` BIGINT,
>   `my_field2` INT,
>   `my_field3` VARCHAR(2147483647)
> ) WITH (
>   'connector' = 'filesystem',
>   'path' = 'path-to-input-dir',
>   'format' = 'json',
>   'json.ignore-parse-errors' = 'false',
>   'json.fail-on-missing-field' = 'true'
> ) {code}
> In the input directory I have two files: input-0.json.gz and 
> input-1.json.gz. As it comes from the filenames, the files are compressed 
> with GZIP. Each of the files contains 10 records. The issue is that only 2 
> records from each file are read (4 in total). If decompressed versions of the 
> same data files are used, all 20 records are read.
> As far as I understand, that problem may be related to the fact that split 
> length, which is used when the files are read, is in fact the length of a 
> compressed file. So files are closed before all records are read from them 
> because read position of the decompressed file stream exceeds split length.
> Probably, it makes sense to add a flag to {{{}FSDataInputStream{}}}, so we 
> could identify if the file compressed or not. The flag can be set to true in 
> {{InputStreamFSInputWrapper}} because it is used for wrapping compressed file 
> streams. With such a flag it could be possible to differentiate 
> non-splittable compressed files and only rely on the end of the stream.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30314) Unable to read all records from compressed delimited format file

2023-09-07 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762689#comment-17762689
 ] 

Etienne Chauchot commented on FLINK-30314:
--

Ticket will be fixed with 
[this|https://issues.apache.org/jira/browse/FLINK-33059] feature

> Unable to read all records from compressed delimited format file
> 
>
> Key: FLINK-30314
> URL: https://issues.apache.org/jira/browse/FLINK-30314
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.16.0, 1.15.2, 1.17.1
>Reporter: Dmitry Yaraev
>Assignee: Etienne Chauchot
>Priority: Major
> Attachments: input.json, input.json.gz, input.json.zip
>
>
> I am reading gzipped JSON line-delimited files in the batch mode using 
> [FileSystem 
> Connector|https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/table/filesystem/].
>  For reading the files a new table is created with the following 
> configuration:
> {code:sql}
> CREATE TEMPORARY TABLE `my_database`.`my_table` (
>   `my_field1` BIGINT,
>   `my_field2` INT,
>   `my_field3` VARCHAR(2147483647)
> ) WITH (
>   'connector' = 'filesystem',
>   'path' = 'path-to-input-dir',
>   'format' = 'json',
>   'json.ignore-parse-errors' = 'false',
>   'json.fail-on-missing-field' = 'true'
> ) {code}
> In the input directory I have two files: input-0.json.gz and 
> input-1.json.gz. As it comes from the filenames, the files are compressed 
> with GZIP. Each of the files contains 10 records. The issue is that only 2 
> records from each file are read (4 in total). If decompressed versions of the 
> same data files are used, all 20 records are read.
> As far as I understand, that problem may be related to the fact that split 
> length, which is used when the files are read, is in fact the length of a 
> compressed file. So files are closed before all records are read from them 
> because read position of the decompressed file stream exceeds split length.
> Probably, it makes sense to add a flag to {{{}FSDataInputStream{}}}, so we 
> could identify if the file compressed or not. The flag can be set to true in 
> {{InputStreamFSInputWrapper}} because it is used for wrapping compressed file 
> streams. With such a flag it could be possible to differentiate 
> non-splittable compressed files and only rely on the end of the stream.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33059) Support transparent compression for file-connector for all file input formats

2023-09-07 Thread Etienne Chauchot (Jira)
Etienne Chauchot created FLINK-33059:


 Summary: Support transparent compression for file-connector for 
all file input formats
 Key: FLINK-33059
 URL: https://issues.apache.org/jira/browse/FLINK-33059
 Project: Flink
  Issue Type: Technical Debt
  Components: Connectors / FileSystem
Reporter: Etienne Chauchot
Assignee: Etienne Chauchot


Delimited file input formats (contrary to binary input formats etc.) do not 
support compression via the existing decorator because the split length is 
determined by the compressed file length, leading to 
[this|https://issues.apache.org/jira/browse/FLINK-30314] bug. We should force 
reading the whole file split (like it is done for binary input formats) on 
compressed files. Parallelism is still done at the file level (as now).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30314) Unable to read all records from compressed delimited format file

2023-09-07 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762651#comment-17762651
 ] 

Etienne Chauchot commented on FLINK-30314:
--

Made the ticket description more general (the problem is with the delimited input 
format being splittable, but not when it is compressed). Added tests on newer 
Flink versions and reduced the module scope to just the file-connector (the 
problem is on the file-connector in general, which is used for the Table API but 
also for other APIs).

> Unable to read all records from compressed delimited format file
> 
>
> Key: FLINK-30314
> URL: https://issues.apache.org/jira/browse/FLINK-30314
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.16.0, 1.15.2, 1.17.1
>Reporter: Dmitry Yaraev
>Assignee: Etienne Chauchot
>Priority: Major
> Attachments: input.json, input.json.gz, input.json.zip
>
>
> I am reading gzipped JSON line-delimited files in the batch mode using 
> [FileSystem 
> Connector|https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/table/filesystem/].
>  For reading the files a new table is created with the following 
> configuration:
> {code:sql}
> CREATE TEMPORARY TABLE `my_database`.`my_table` (
>   `my_field1` BIGINT,
>   `my_field2` INT,
>   `my_field3` VARCHAR(2147483647)
> ) WITH (
>   'connector' = 'filesystem',
>   'path' = 'path-to-input-dir',
>   'format' = 'json',
>   'json.ignore-parse-errors' = 'false',
>   'json.fail-on-missing-field' = 'true'
> ) {code}
> In the input directory I have two files: input-0.json.gz and 
> input-1.json.gz. As it comes from the filenames, the files are compressed 
> with GZIP. Each of the files contains 10 records. The issue is that only 2 
> records from each file are read (4 in total). If decompressed versions of the 
> same data files are used, all 20 records are read.
> As far as I understand, that problem may be related to the fact that split 
> length, which is used when the files are read, is in fact the length of a 
> compressed file. So files are closed before all records are read from them 
> because read position of the decompressed file stream exceeds split length.
> Probably, it makes sense to add a flag to {{{}FSDataInputStream{}}}, so we 
> could identify if the file compressed or not. The flag can be set to true in 
> {{InputStreamFSInputWrapper}} because it is used for wrapping compressed file 
> streams. With such a flag it could be possible to differentiate 
> non-splittable compressed files and only rely on the end of the stream.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-30314) Unable to read all records from compressed delimited format file

2023-09-07 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated FLINK-30314:
-
  Component/s: (was: API / Core)
   (was: Table SQL / API)
Affects Version/s: 1.17.1
  Summary: Unable to read all records from compressed delimited 
format file  (was: Unable to read all records from compressed line-delimited 
JSON files using Table API)

> Unable to read all records from compressed delimited format file
> 
>
> Key: FLINK-30314
> URL: https://issues.apache.org/jira/browse/FLINK-30314
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.16.0, 1.15.2, 1.17.1
>Reporter: Dmitry Yaraev
>Assignee: Etienne Chauchot
>Priority: Major
> Attachments: input.json, input.json.gz, input.json.zip
>
>
> I am reading gzipped JSON line-delimited files in the batch mode using 
> [FileSystem 
> Connector|https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/table/filesystem/].
>  For reading the files a new table is created with the following 
> configuration:
> {code:sql}
> CREATE TEMPORARY TABLE `my_database`.`my_table` (
>   `my_field1` BIGINT,
>   `my_field2` INT,
>   `my_field3` VARCHAR(2147483647)
> ) WITH (
>   'connector' = 'filesystem',
>   'path' = 'path-to-input-dir',
>   'format' = 'json',
>   'json.ignore-parse-errors' = 'false',
>   'json.fail-on-missing-field' = 'true'
> ) {code}
> In the input directory I have two files: input-0.json.gz and 
> input-1.json.gz. As it comes from the filenames, the files are compressed 
> with GZIP. Each of the files contains 10 records. The issue is that only 2 
> records from each file are read (4 in total). If decompressed versions of the 
> same data files are used, all 20 records are read.
> As far as I understand, that problem may be related to the fact that split 
> length, which is used when the files are read, is in fact the length of a 
> compressed file. So files are closed before all records are read from them 
> because read position of the decompressed file stream exceeds split length.
> Probably, it makes sense to add a flag to {{{}FSDataInputStream{}}}, so we 
> could identify if the file compressed or not. The flag can be set to true in 
> {{InputStreamFSInputWrapper}} because it is used for wrapping compressed file 
> streams. With such a flag it could be possible to differentiate 
> non-splittable compressed files and only rely on the end of the stream.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33041) Add an introduction about how to migrate DataSet API to DataStream

2023-09-07 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762613#comment-17762613
 ] 

Etienne Chauchot commented on FLINK-33041:
--

[~Wencong Liu] yes, your article is way more comprehensive as it targets 
completeness. I pointed to the article just in case it was useful for adding some 
content (what you are proposing here). I could take a look at the PR, but as 
I'm quite busy at the moment, I don't want to incur delays. So don't wait for 
me: if I have time within a reasonable delay I'll comment; if not, another 
reviewer can merge.

> Add an introduction about how to migrate DataSet API to DataStream
> --
>
> Key: FLINK-33041
> URL: https://issues.apache.org/jira/browse/FLINK-33041
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.18.0
>Reporter: Wencong Liu
>Priority: Major
> Fix For: 1.18.0
>
>
> The DataSet API has been formally deprecated and will no longer receive 
> active maintenance and support. It will be removed in the Flink 2.0 version. 
> Flink users are recommended to migrate from the DataSet API to the DataStream 
> API, Table API and SQL for their data processing requirements.
> Most of the DataSet operators can be implemented using the DataStream API. 
> However, we believe it would be beneficial to have an introductory article on 
> the Flink website that guides users in migrating their DataSet jobs to 
> DataStream.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-30314) Unable to read all records from compressed line-delimited JSON files using Table API

2023-09-05 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762056#comment-17762056
 ] 

Etienne Chauchot edited comment on FLINK-30314 at 9/5/23 11:58 AM:
---

[~dyaraev] Sorry for resuming this late. I have investigated the code. The 
enumerator creates a split per file, which is fine; then the reader reads the 
file as a single split. The compression is transparent to the user through a 
decorator that comes into play just before the actual reading process. What 
happens is what I thought: the boundaries of the split are evaluated based on 
the size in bytes of the *compressed* file, therefore stopping the reading 
before the end of the uncompressed file.

I'll look for a good way to fix it so that it will be fixed for all file-based 
readers, no matter the format.


was (Author: echauchot):
[~dyaraev] Sorry for resuming this late. I have investigated the code. The 
enumerator creates a split per file which is fine then the reader reads the 
file as a single split. The compression is transparent to the user through a 
decorator that comes in play just before the actual reading process. What 
happens is what I thought, the boundaries of the split are evaluated based on 
the size in bytes of the *compressed* file therefore stopping the reading 
before the end of the uncompressed file. 

I'll look at a good way to fix it so that it is fixed for all file based 
readers no matter the format.

> Unable to read all records from compressed line-delimited JSON files using 
> Table API
> 
>
> Key: FLINK-30314
> URL: https://issues.apache.org/jira/browse/FLINK-30314
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, Connectors / FileSystem, Table SQL / API
>Affects Versions: 1.16.0, 1.15.2
>Reporter: Dmitry Yaraev
>Assignee: Etienne Chauchot
>Priority: Major
> Attachments: input.json, input.json.gz, input.json.zip
>
>
> I am reading gzipped JSON line-delimited files in the batch mode using 
> [FileSystem 
> Connector|https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/table/filesystem/].
>  For reading the files a new table is created with the following 
> configuration:
> {code:sql}
> CREATE TEMPORARY TABLE `my_database`.`my_table` (
>   `my_field1` BIGINT,
>   `my_field2` INT,
>   `my_field3` VARCHAR(2147483647)
> ) WITH (
>   'connector' = 'filesystem',
>   'path' = 'path-to-input-dir',
>   'format' = 'json',
>   'json.ignore-parse-errors' = 'false',
>   'json.fail-on-missing-field' = 'true'
> ) {code}
> In the input directory I have two files: input-0.json.gz and 
> input-1.json.gz. As it comes from the filenames, the files are compressed 
> with GZIP. Each of the files contains 10 records. The issue is that only 2 
> records from each file are read (4 in total). If decompressed versions of the 
> same data files are used, all 20 records are read.
> As far as I understand, that problem may be related to the fact that split 
> length, which is used when the files are read, is in fact the length of a 
> compressed file. So files are closed before all records are read from them 
> because read position of the decompressed file stream exceeds split length.
> Probably, it makes sense to add a flag to {{{}FSDataInputStream{}}}, so we 
> could identify if the file compressed or not. The flag can be set to true in 
> {{InputStreamFSInputWrapper}} because it is used for wrapping compressed file 
> streams. With such a flag it could be possible to differentiate 
> non-splittable compressed files and only rely on the end of the stream.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-30314) Unable to read all records from compressed line-delimited JSON files using Table API

2023-09-05 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762056#comment-17762056
 ] 

Etienne Chauchot edited comment on FLINK-30314 at 9/5/23 11:50 AM:
---

[~dyaraev] Sorry for resuming this late. I have investigated the code. The 
enumerator creates a split per file, which is fine; then the reader reads the 
file as a single split. The compression is transparent to the user through a 
decorator that comes into play just before the actual reading process. What 
happens is what I thought: the boundaries of the split are evaluated based on 
the size in bytes of the *compressed* file, therefore stopping the reading 
before the end of the uncompressed file.

I'll look for a good way to fix it so that it is fixed for all file-based 
readers, no matter the format.


was (Author: echauchot):
[~dyaraev] Sorry for resuming this late. I have investigated the code. The 
enumerator creates a split per file which is fine then the reader reads the 
file as a single split. The compression is transparent to the user through a 
decorator that comes in play just before the actual reading process. What 
happens is what I thought, the boundaries of the split are evaluated based on 
the size in bytes of the *compressed* file therefore stopping the reading 
before the end of the uncompressed file.

> Unable to read all records from compressed line-delimited JSON files using 
> Table API
> 
>
> Key: FLINK-30314
> URL: https://issues.apache.org/jira/browse/FLINK-30314
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, Connectors / FileSystem, Table SQL / API
>Affects Versions: 1.16.0, 1.15.2
>Reporter: Dmitry Yaraev
>Assignee: Etienne Chauchot
>Priority: Major
> Attachments: input.json, input.json.gz, input.json.zip
>
>
> I am reading gzipped JSON line-delimited files in the batch mode using 
> [FileSystem 
> Connector|https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/table/filesystem/].
>  For reading the files a new table is created with the following 
> configuration:
> {code:sql}
> CREATE TEMPORARY TABLE `my_database`.`my_table` (
>   `my_field1` BIGINT,
>   `my_field2` INT,
>   `my_field3` VARCHAR(2147483647)
> ) WITH (
>   'connector' = 'filesystem',
>   'path' = 'path-to-input-dir',
>   'format' = 'json',
>   'json.ignore-parse-errors' = 'false',
>   'json.fail-on-missing-field' = 'true'
> ) {code}
> In the input directory I have two files: input-0.json.gz and 
> input-1.json.gz. As it comes from the filenames, the files are compressed 
> with GZIP. Each of the files contains 10 records. The issue is that only 2 
> records from each file are read (4 in total). If decompressed versions of the 
> same data files are used, all 20 records are read.
> As far as I understand, that problem may be related to the fact that split 
> length, which is used when the files are read, is in fact the length of a 
> compressed file. So files are closed before all records are read from them 
> because read position of the decompressed file stream exceeds split length.
> Probably, it makes sense to add a flag to {{{}FSDataInputStream{}}}, so we 
> could identify if the file compressed or not. The flag can be set to true in 
> {{InputStreamFSInputWrapper}} because it is used for wrapping compressed file 
> streams. With such a flag it could be possible to differentiate 
> non-splittable compressed files and only rely on the end of the stream.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30314) Unable to read all records from compressed line-delimited JSON files using Table API

2023-09-05 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762056#comment-17762056
 ] 

Etienne Chauchot commented on FLINK-30314:
--

[~dyaraev] Sorry for resuming this late. I have investigated the code. The 
enumerator creates a split per file, which is fine; then the reader reads the 
file as a single split. The compression is transparent to the user through a 
decorator that comes into play just before the actual reading process. What 
happens is what I thought: the boundaries of the split are evaluated based on 
the size in bytes of the *compressed* file, therefore stopping the reading 
before the end of the uncompressed file.

> Unable to read all records from compressed line-delimited JSON files using 
> Table API
> 
>
> Key: FLINK-30314
> URL: https://issues.apache.org/jira/browse/FLINK-30314
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, Connectors / FileSystem, Table SQL / API
>Affects Versions: 1.16.0, 1.15.2
>Reporter: Dmitry Yaraev
>Assignee: Etienne Chauchot
>Priority: Major
> Attachments: input.json, input.json.gz, input.json.zip
>
>
> I am reading gzipped JSON line-delimited files in the batch mode using 
> [FileSystem 
> Connector|https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/table/filesystem/].
>  For reading the files a new table is created with the following 
> configuration:
> {code:sql}
> CREATE TEMPORARY TABLE `my_database`.`my_table` (
>   `my_field1` BIGINT,
>   `my_field2` INT,
>   `my_field3` VARCHAR(2147483647)
> ) WITH (
>   'connector' = 'filesystem',
>   'path' = 'path-to-input-dir',
>   'format' = 'json',
>   'json.ignore-parse-errors' = 'false',
>   'json.fail-on-missing-field' = 'true'
> ) {code}
> In the input directory I have two files: input-0.json.gz and 
> input-1.json.gz. As it comes from the filenames, the files are compressed 
> with GZIP. Each of the files contains 10 records. The issue is that only 2 
> records from each file are read (4 in total). If decompressed versions of the 
> same data files are used, all 20 records are read.
> As far as I understand, that problem may be related to the fact that split 
> length, which is used when the files are read, is in fact the length of a 
> compressed file. So files are closed before all records are read from them 
> because read position of the decompressed file stream exceeds split length.
> Probably, it makes sense to add a flag to {{{}FSDataInputStream{}}}, so we 
> could identify if the file compressed or not. The flag can be set to true in 
> {{InputStreamFSInputWrapper}} because it is used for wrapping compressed file 
> streams. With such a flag it could be possible to differentiate 
> non-splittable compressed files and only rely on the end of the stream.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-29563) SourceTestSuiteBase#testSourceMetrics enters an infinite waiting loop in case the number of records counter is wrong

2023-08-16 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot closed FLINK-29563.

Resolution: Won't Fix

> SourceTestSuiteBase#testSourceMetrics enters an infinite waiting loop in case 
> the number of records counter is wrong
> 
>
> Key: FLINK-29563
> URL: https://issues.apache.org/jira/browse/FLINK-29563
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available, stale-assigned
>
> The call to _CommonTestUtils#waitUntilCondition_ (1) makes the test wait for 
> the condition _Precision.equals(allRecordSize, sumNumRecordsIn)_. In case the 
> reported number of records is incorrect, the waiting loop never ends.
> [1] 
> https://github.com/apache/flink/blob/a6092b1176d15a7af32a7eb19f59cdfeab172034/flink-test-utils-parent/flink-connector-test-utils/src/main/java/org/apache/flink/connector/testframe/testsuites/SourceTestSuiteBase.java#L451
>  
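
Just to illustrate what a fix could have looked like had it been pursued (the ticket is closed as Won't Fix): a bounded wait that fails the test instead of hanging when the counter never reaches the expected value. Names are hypothetical and this is not the actual CommonTestUtils API:

{code:java}
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical bounded wait; not the actual CommonTestUtils#waitUntilCondition. */
public final class BoundedWaitSketch {

    static void waitUntilConditionOrFail(BooleanSupplier condition, Duration timeout)
            throws InterruptedException {
        long deadlineNanos = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() > deadlineNanos) {
                // Fail loudly instead of looping forever on a wrong records counter.
                throw new AssertionError("Condition not met within " + timeout);
            }
            Thread.sleep(100);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // A condition that can never become true, mimicking an incorrect metric value.
        waitUntilConditionOrFail(() -> false, Duration.ofSeconds(1));
    }
}
{code}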



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-31749) The Using Hadoop OutputFormats example is not avaliable for DataStream

2023-08-16 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved FLINK-31749.
--
Resolution: Fixed

> The Using Hadoop OutputFormats example is not avaliable for DataStream
> --
>
> Key: FLINK-31749
> URL: https://issues.apache.org/jira/browse/FLINK-31749
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.17.0, 1.15.4
>Reporter: junzhong qin
>Assignee: Etienne Chauchot
>Priority: Not a Priority
>  Labels: pull-request-available, stale-assigned
>
> The following example shows how to use Hadoop’s {{TextOutputFormat}} from the 
> doc: 
> [https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/connectors/datastream/formats/hadoop/#using-hadoop-outputformats]
>  . But the DataStream has no {{output()}}.
> {code:java}
> // Obtain the result we want to emit
> DataStream<Tuple2<Text, IntWritable>> hadoopResult = [...]
> // Set up the Hadoop TextOutputFormat.
> HadoopOutputFormat<Text, IntWritable> hadoopOF =
>   // create the Flink wrapper.
>   new HadoopOutputFormat<Text, IntWritable>(
> // set the Hadoop OutputFormat and specify the job.
> new TextOutputFormat<Text, IntWritable>(), job
>   );
> hadoopOF.getConfiguration().set("mapreduce.output.textoutputformat.separator",
>  " ");
> TextOutputFormat.setOutputPath(job, new Path(outputPath));
> // Emit data using the Hadoop TextOutputFormat.
> hadoopResult.output(hadoopOF); {code}
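
For what it's worth, one way to adapt the quoted snippet to DataStream is to hand the Hadoop wrapper to DataStream#writeUsingOutputFormat, since HadoopOutputFormat implements Flink's OutputFormat. This is only a sketch assuming flink-hadoop-compatibility and the Hadoop client are on the classpath (input elements and the output path are made up), and it is not necessarily the fix that ended up in the documentation:

{code:java}
import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class HadoopOutputFormatOnDataStream {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Tiny in-line input instead of a real pipeline, just to make the example runnable.
        DataStream<Tuple2<Text, IntWritable>> hadoopResult =
                env.fromElements(Tuple2.of(new Text("flink"), new IntWritable(1)));

        Job job = Job.getInstance();
        HadoopOutputFormat<Text, IntWritable> hadoopOF =
                new HadoopOutputFormat<>(new TextOutputFormat<Text, IntWritable>(), job);
        hadoopOF.getConfiguration().set("mapreduce.output.textoutputformat.separator", " ");
        TextOutputFormat.setOutputPath(job, new Path("/tmp/hadoop-output")); // made-up output path

        // DataStream has no output(); writeUsingOutputFormat (deprecated in recent versions
        // in favor of dedicated sinks) accepts any Flink OutputFormat, which HadoopOutputFormat is.
        hadoopResult.writeUsingOutputFormat(hadoopOF);

        env.execute("hadoop-output-format-on-datastream");
    }
}
{code}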



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32563) Allow connectors CI to specify the main supported Flink version

2023-08-11 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753211#comment-17753211
 ] 

Etienne Chauchot commented on FLINK-32563:
--

[~chesnay] [~martijnvisser] WDYT ?

> Allow connectors CI to specify the main supported Flink version
> ---
>
> Key: FLINK-32563
> URL: https://issues.apache.org/jira/browse/FLINK-32563
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System / CI
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> As part of [this 
> discussion|https://lists.apache.org/thread/pr0g812olzpgz21d9oodhc46db9jpxo3] 
> , the need for connectors to specify the main flink version that a connector 
> supports has arisen. 
> This CI variable will allow to configure the build and tests differently 
> depending on this version. This parameter would be optional.
> The first use case is to run archunit tests only on the main supported 
> version as discussed in the above thread.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32563) Allow connectors CI to specify the main supported Flink version

2023-07-12 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17742446#comment-17742446
 ] 

Etienne Chauchot commented on FLINK-32563:
--

Also, the other option (proposed privately by [~chesnay]) is to integrate the 
archunit tests into the dependency-convergence Maven profile. That would make it 
possible to have similar sanity checks in the same profile and to reuse what is 
already in place for the run_dependency_convergence GitHub input variable.

> Allow connectors CI to specify the main supported Flink version
> ---
>
> Key: FLINK-32563
> URL: https://issues.apache.org/jira/browse/FLINK-32563
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System / CI
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> As part of [this 
> discussion|https://lists.apache.org/thread/pr0g812olzpgz21d9oodhc46db9jpxo3] 
> , the need for connectors to specify the main flink version that a connector 
> supports has arisen. 
> This CI variable will allow to configure the build and tests differently 
> depending on this version. This parameter would be optional.
> The first use case is to run archunit tests only on the main supported 
> version as discussed in the above thread.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32563) Allow connectors CI to specify the main supported Flink version

2023-07-11 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17742101#comment-17742101
 ] 

Etienne Chauchot commented on FLINK-32563:
--

[~martijnvisser] I had in mind to be slightly more coercive with connector 
authors so that they run CI tests on the last 2 versions and specify which of the 
2 is the main supported one (for running archunit but also other things). I was 
thinking of something like this (in _testing.yml style):
{code:java}
jobs:
  compile_and_test:
strategy:
  matrix:
include:
  - flink: 1.16.2
main_version: false
  - flink: 1.17.1
main_version: true
uses: ./.github/workflows/ci.yml
with:
  connector_branch: ci_utils
  flink_version: ${{ matrix.flink }}
  main_flink_version: ${{ matrix.main_version }}
{code}
{code:java}
inputs:
main_flink_version:
description: "Is the input Flink version, the main version that the 
connector supports."
required: false  // to avoid break the existing connectors
type: boolean
default: false
{code}
Do you prefer something like this ?
{code:java}
jobs:
  enable-archunit-tests:
uses: ./.github/workflows/ci.yml
with:
  flink_version: 1.17.1
  connector_branch: ci_utils
  run_archunit_tests: true
{code}
{code:java}
inputs:
  run_archunit_tests:
description: "Whether to run the archunit tests"
required: false # to avoid breaking the existing connectors
type: boolean
default: false 
{code}

> Allow connectors CI to specify the main supported Flink version
> ---
>
> Key: FLINK-32563
> URL: https://issues.apache.org/jira/browse/FLINK-32563
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System / CI
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> As part of [this 
> discussion|https://lists.apache.org/thread/pr0g812olzpgz21d9oodhc46db9jpxo3] 
> , the need for connectors to specify the main flink version that a connector 
> supports has arisen. 
> This CI variable will allow to configure the build and tests differently 
> depending on this version. This parameter would be optional.
> The first use case is to run archunit tests only on the main supported 
> version as discussed in the above thread.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-32563) Allow connectors CI to specify the main supported Flink version

2023-07-07 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741045#comment-17741045
 ] 

Etienne Chauchot edited comment on FLINK-32563 at 7/7/23 3:14 PM:
--

[~martijnvisser]

> From the discussion thread I read it as that you want to be able to skip 
> archunit for specific versions?

Yes, disable it for all the versions except the main supported by the connector

> See 
> [https://github.com/apache/flink-connector-shared-utils/blob/ci_utils/.github/workflows/_testing.yml#L25-L28]

I did not know about the _testing.yml workflow. What is the link with 
connector/push_pr.yml? I see both of them call ci.yml@ci_utils. Also, both run the 
ci.yml workflow on various Flink versions, but only push_pr.yml allows the 
connector author to specify which Flink versions to test the connector against.


was (Author: echauchot):
[~martijnvisser]

> From the discussion thread I read it as that you want to be able to skip 
> archunit for specific versions?

Yes, disable it for all the versions except the main supported by the connector

> See 
> [https://github.com/apache/flink-connector-shared-utils/blob/ci_utils/.github/workflows/_testing.yml#L25-L28]

I did not know {_}{{_}}testing.yml{{_}} workflow. What is the link with 
connector/push_pr.yml ? I see both of them call{_} ci.yml@ci_utils. Also both 
run ci.yml workflow on various flink versions but only push_pr.yml allows the 
connector author to specify which flink versions to test his connector against.

> Allow connectors CI to specify the main supported Flink version
> ---
>
> Key: FLINK-32563
> URL: https://issues.apache.org/jira/browse/FLINK-32563
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System / CI
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> As part of [this 
> discussion|https://lists.apache.org/thread/pr0g812olzpgz21d9oodhc46db9jpxo3] 
> , the need for connectors to specify the main flink version that a connector 
> supports has arisen. 
> This CI variable will allow to configure the build and tests differently 
> depending on this version. This parameter would be optional.
> The first use case is to run archunit tests only on the main supported 
> version as discussed in the above thread.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-32563) Allow connectors CI to specify the main supported Flink version

2023-07-07 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741045#comment-17741045
 ] 

Etienne Chauchot edited comment on FLINK-32563 at 7/7/23 3:13 PM:
--

[~martijnvisser]

> From the discussion thread I read it as that you want to be able to skip 
> archunit for specific versions?

Yes, disable it for all the versions except the main supported by the connector

> See 
> [https://github.com/apache/flink-connector-shared-utils/blob/ci_utils/.github/workflows/_testing.yml#L25-L28]

I did not know {_}{{_}}testing.yml{{_}} workflow. What is the link with 
connector/push_pr.yml ? I see both of them call{_} ci.yml@ci_utils. Also both 
run ci.yml workflow on various flink versions but only push_pr.yml allows the 
connector author to specify which flink versions to test his connector against.


was (Author: echauchot):
[~martijnvisser]

> From the discussion thread I read it as that you want to be able to skip 
> archunit for specific versions?

Yes, disable it for all the versions except the main supported by the connector

> See 
> [https://github.com/apache/flink-connector-shared-utils/blob/ci_utils/.github/workflows/_testing.yml#L25-L28]

I did not know _{_}testing.yml{_} workflow. What is the link with 
connector/push_pr.yml ? I see both of them call {_}ci.yml@ci_utils{_}. Also 
both run ci.yml workflow on various flink versions but only push_pr.yml allows 
the connector author to specify which flink versions to test his connector 
against.

> Allow connectors CI to specify the main supported Flink version
> ---
>
> Key: FLINK-32563
> URL: https://issues.apache.org/jira/browse/FLINK-32563
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System / CI
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> As part of [this 
> discussion|https://lists.apache.org/thread/pr0g812olzpgz21d9oodhc46db9jpxo3] 
> , the need for connectors to specify the main flink version that a connector 
> supports has arisen. 
> This CI variable will allow to configure the build and tests differently 
> depending on this version. This parameter would be optional.
> The first use case is to run archunit tests only on the main supported 
> version as discussed in the above thread.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-32563) Allow connectors CI to specify the main supported Flink version

2023-07-07 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741045#comment-17741045
 ] 

Etienne Chauchot edited comment on FLINK-32563 at 7/7/23 1:49 PM:
--

[~martijnvisser]

> From the discussion thread I read it as that you want to be able to skip 
> archunit for specific versions?

Yes, disable it for all versions except the main one supported by the connector.

> See 
> [https://github.com/apache/flink-connector-shared-utils/blob/ci_utils/.github/workflows/_testing.yml#L25-L28]

I did not know about the _testing.yml workflow. What is the link with 
connector/push_pr.yml? I see that both of them call ci.yml@ci_utils. Both also 
run the ci.yml workflow on various Flink versions, but only push_pr.yml allows 
the connector author to specify which Flink versions to test their connector 
against.


was (Author: echauchot):
[~martijnvisser]

> From the discussion thread I read it as that you want to be able to skip 
> archunit for specific versions?

Yes, disable it for all versions except the main one supported by the connector.

> See 
> [https://github.com/apache/flink-connector-shared-utils/blob/ci_utils/.github/workflows/_testing.yml#L25-L28]

I did not know about the _testing.yml workflow. What is the link with 
connector/push_pr.yml? I see that both of them call ci.yml@ci_utils. Both also 
run this workflow on various Flink versions, but only push_pr.yml allows the 
connector author to specify the Flink version to test their connector against.

> Allow connectors CI to specify the main supported Flink version
> ---
>
> Key: FLINK-32563
> URL: https://issues.apache.org/jira/browse/FLINK-32563
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System / CI
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> As part of [this 
> discussion|https://lists.apache.org/thread/pr0g812olzpgz21d9oodhc46db9jpxo3] 
> , the need for connectors to specify the main flink version that a connector 
> supports has arisen. 
> This CI variable will allow the build and tests to be configured differently 
> depending on this version. This parameter would be optional.
> The first use case is to run archunit tests only on the main supported 
> version as discussed in the above thread.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-32563) Allow connectors CI to specify the main supported Flink version

2023-07-07 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741045#comment-17741045
 ] 

Etienne Chauchot edited comment on FLINK-32563 at 7/7/23 1:48 PM:
--

[~martijnvisser]

> From the discussion thread I read it as that you want to be able to skip 
> archunit for specific versions?

Yes, disable it for all versions except the main one supported by the connector.

> See 
> [https://github.com/apache/flink-connector-shared-utils/blob/ci_utils/.github/workflows/_testing.yml#L25-L28]

I did not know about the _testing.yml workflow. What is the link with 
connector/push_pr.yml? I see that both of them call ci.yml@ci_utils. Both also 
run this workflow on various Flink versions, but only push_pr.yml allows the 
connector author to specify the Flink version to test their connector against.


was (Author: echauchot):
[~martijnvisser]

> From the discussion thread I read it as that you want to be able to skip 
> archunit for specific versions?

Yes, disable it for all versions except the main one supported by the connector.

> See 
> https://github.com/apache/flink-connector-shared-utils/blob/ci_utils/.github/workflows/_testing.yml#L25-L28

I did not know about the _testing.yml workflow. What is the link with 
connector/push_pr.yml? I see that both of them call ci.yml@ci_utils. Both also 
run this workflow on various Flink versions, but only push_pr.yml allows the 
connector author to specify the Flink version to test their connector against.

> Allow connectors CI to specify the main supported Flink version
> ---
>
> Key: FLINK-32563
> URL: https://issues.apache.org/jira/browse/FLINK-32563
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System / CI
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> As part of [this 
> discussion|https://lists.apache.org/thread/pr0g812olzpgz21d9oodhc46db9jpxo3] 
> , the need for connectors to specify the main flink version that a connector 
> supports has arisen. 
> This CI variable will allow the build and tests to be configured differently 
> depending on this version. This parameter would be optional.
> The first use case is to run archunit tests only on the main supported 
> version as discussed in the above thread.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-32563) Allow connectors CI to specify the main supported Flink version

2023-07-07 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741045#comment-17741045
 ] 

Etienne Chauchot edited comment on FLINK-32563 at 7/7/23 1:47 PM:
--

[~martijnvisser]

> From the discussion thread I read it as that you want to be able to skip 
> archunit for specific versions?

Yes, disable it for all versions except the main one supported by the connector.

> See 
> https://github.com/apache/flink-connector-shared-utils/blob/ci_utils/.github/workflows/_testing.yml#L25-L28

I did not know about the _testing.yml workflow. What is the link with 
connector/push_pr.yml? I see that both of them call ci.yml@ci_utils. Both also 
run this workflow on various Flink versions, but only push_pr.yml allows the 
connector author to specify the Flink version to test their connector against.


was (Author: echauchot):
[~martijnvisser]

> From the discussion thread I read it as that you want to be able to skip 
> archunit for specific versions?

Yes, disable it for all versions except the main one supported by the connector.

> See 
> https://github.com/apache/flink-connector-shared-utils/blob/ci_utils/.github/workflows/_testing.yml#L25-L28

I did not know "_testing.yml" workflow. What is the link with 
connector/push_pr.yml ? I see both of them call _ci.yml@ci_utils_. Also both 
run this workflow on various flink versions but only push_pr.yml allows the 
connector author to specify flink version to test his connector against.

> Allow connectors CI to specify the main supported Flink version
> ---
>
> Key: FLINK-32563
> URL: https://issues.apache.org/jira/browse/FLINK-32563
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System / CI
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> As part of [this 
> discussion|https://lists.apache.org/thread/pr0g812olzpgz21d9oodhc46db9jpxo3] 
> , the need for connectors to specify the main flink version that a connector 
> supports has arisen. 
> This CI variable will allow the build and tests to be configured differently 
> depending on this version. This parameter would be optional.
> The first use case is to run archunit tests only on the main supported 
> version as discussed in the above thread.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-32563) Allow connectors CI to specify the main supported Flink version

2023-07-07 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741045#comment-17741045
 ] 

Etienne Chauchot edited comment on FLINK-32563 at 7/7/23 1:47 PM:
--

[~martijnvisser]

> From the discussion thread I read it as that you want to be able to skip 
> archunit for specific versions?

Yes, disable it for all versions except the main one supported by the connector.

> See 
> https://github.com/apache/flink-connector-shared-utils/blob/ci_utils/.github/workflows/_testing.yml#L25-L28

I did not know "_testing.yml" workflow. What is the link with 
connector/push_pr.yml ? I see both of them call _ci.yml@ci_utils_. Also both 
run this workflow on various flink versions but only push_pr.yml allows the 
connector author to specify flink version to test his connector against.


was (Author: echauchot):
[~martijnvisser]

> From the discussion thread I read it as that you want to be able to skip 
> archunit for specific versions?

Yes, disable it for all versions except the main one supported by the connector.

> See 
> https://github.com/apache/flink-connector-shared-utils/blob/ci_utils/.github/workflows/_testing.yml#L25-L28

I did not know about the _testing.yml workflow. What is the link with 
connector/push_pr.yml? I see that both of them call ci.yml@ci_utils. Both also 
run this workflow on various Flink versions, but only push_pr.yml allows the 
connector author to specify the Flink version to test their connector against.

> Allow connectors CI to specify the main supported Flink version
> ---
>
> Key: FLINK-32563
> URL: https://issues.apache.org/jira/browse/FLINK-32563
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System / CI
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> As part of [this 
> discussion|https://lists.apache.org/thread/pr0g812olzpgz21d9oodhc46db9jpxo3] 
> , the need for connectors to specify the main flink version that a connector 
> supports has arisen. 
> This CI variable will allow the build and tests to be configured differently 
> depending on this version. This parameter would be optional.
> The first use case is to run archunit tests only on the main supported 
> version as discussed in the above thread.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32563) Allow connectors CI to specify the main supported Flink version

2023-07-07 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741045#comment-17741045
 ] 

Etienne Chauchot commented on FLINK-32563:
--

[~martijnvisser]

> From the discussion thread I read it as that you want to be able to skip 
> archunit for specific versions?

Yes, disable it for all versions except the main one supported by the connector.

> See 
> https://github.com/apache/flink-connector-shared-utils/blob/ci_utils/.github/workflows/_testing.yml#L25-L28

I did not know about the _testing.yml workflow. What is the link with 
connector/push_pr.yml? I see that both of them call ci.yml@ci_utils. Both also 
run this workflow on various Flink versions, but only push_pr.yml allows the 
connector author to specify the Flink version to test their connector against.

> Allow connectors CI to specify the main supported Flink version
> ---
>
> Key: FLINK-32563
> URL: https://issues.apache.org/jira/browse/FLINK-32563
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System / CI
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> As part of [this 
> discussion|https://lists.apache.org/thread/pr0g812olzpgz21d9oodhc46db9jpxo3] 
> , the need for connectors to specify the main flink version that a connector 
> supports has arisen. 
> This CI variable will allow the build and tests to be configured differently 
> depending on this version. This parameter would be optional.
> The first use case is to run archunit tests only on the main supported 
> version as discussed in the above thread.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32563) Allow connectors CI to specify the main supported Flink version

2023-07-07 Thread Etienne Chauchot (Jira)
Etienne Chauchot created FLINK-32563:


 Summary: Allow connectors CI to specify the main supported Flink 
version
 Key: FLINK-32563
 URL: https://issues.apache.org/jira/browse/FLINK-32563
 Project: Flink
  Issue Type: Technical Debt
  Components: Build System / CI
Reporter: Etienne Chauchot
Assignee: Etienne Chauchot


As part of [this 
discussion|https://lists.apache.org/thread/pr0g812olzpgz21d9oodhc46db9jpxo3] , 
the need for connectors to specify the main flink version that a connector 
supports has arisen. 

This CI variable will allow the build and tests to be configured differently 
depending on this version. This parameter would be optional.

The first use case is to run archunit tests only on the main supported version 
as discussed in the above thread.
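
To make the idea concrete, here is a rough sketch of how the optional variable 
could gate the archunit tests in the shared workflow. The input and property 
names (flink_main_version, archunit-tests.skip) are purely illustrative 
placeholders, not the final implementation.

{code:yaml}
# Hypothetical sketch of the shared ci.yml: run archunit tests only when the
# Flink version under test equals the connector's declared main supported
# version. The input and Maven property names are illustrative placeholders.
jobs:
  compile_and_test:
    runs-on: ubuntu-latest
    steps:
      - name: Compile and test
        run: |
          if [ "${{ inputs.flink_version }}" = "${{ inputs.flink_main_version }}" ]; then
            mvn clean verify
          else
            mvn clean verify -Darchunit-tests.skip=true
          fi
{code}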



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32353) Update Cassandra connector archunit violations with Flink 1.18 rules

2023-07-06 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated FLINK-32353:
-
Summary: Update Cassandra connector archunit violations with Flink 1.18 
rules  (was: Make Cassandra connector tests compatible with archunit rules)

> Update Cassandra connector archunit violations with Flink 1.18 rules
> 
>
> Key: FLINK-32353
> URL: https://issues.apache.org/jira/browse/FLINK-32353
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Reporter: Martijn Visser
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available
>
> The current Cassandra connector in {{main}} fails when testing against Flink 
> 1.18-SNAPSHOT
> {code:java}
> Error:  Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.1 s 
> <<< FAILURE! - in org.apache.flink.architecture.rules.ITCaseRules
> Error:  ITCaseRules.ITCASE_USE_MINICLUSTER  Time elapsed: 0.025 s  <<< 
> FAILURE!
> java.lang.AssertionError: 
> Architecture Violation [Priority: MEDIUM] - Rule 'ITCASE tests should use a 
> MiniCluster resource or extension' was violated (1 times):
> org.apache.flink.streaming.connectors.cassandra.CassandraConnectorITCase does 
> not satisfy: only one of the following predicates match:
> * reside in a package 'org.apache.flink.runtime.*' and contain any fields 
> that are static, final, and of type InternalMiniClusterExtension and 
> annotated with @RegisterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and contain any 
> fields that are static, final, and of type MiniClusterExtension and annotated 
> with @RegisterExtension or are , and of type MiniClusterTestEnvironment and 
> annotated with @TestEnv
> * reside in a package 'org.apache.flink.runtime.*' and is annotated with 
> @ExtendWith with class InternalMiniClusterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and is annotated 
> with @ExtendWith with class MiniClusterExtension
>  or contain any fields that are public, static, and of type 
> MiniClusterWithClientResource and final and annotated with @ClassRule or 
> contain any fields that is of type MiniClusterWithClientResource and public 
> and final and not static and annotated with @Rule
> {code}
> https://github.com/apache/flink-connector-cassandra/actions/runs/5276835802/jobs/9544092571#step:13:811
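
For illustration, here is a minimal test skeleton that satisfies one of the 
patterns accepted by the ITCASE_USE_MINICLUSTER rule outside of 
org.apache.flink.runtime: a static, final MiniClusterExtension field annotated 
with @RegisterExtension. This is only a sketch, not the actual 
CassandraConnectorITCase setup, and the package/class names are hypothetical.

{code:java}
// Hypothetical ITCase, shown only to illustrate the pattern the rule accepts.
package org.apache.flink.connector.example;

import org.apache.flink.runtime.testutils.MiniClusterResourceConfiguration;
import org.apache.flink.test.junit5.MiniClusterExtension;

import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.RegisterExtension;

class ExampleConnectorITCase {

    // A static, final MiniClusterExtension registered with @RegisterExtension
    // is one of the predicates accepted by ITCASE_USE_MINICLUSTER.
    @RegisterExtension
    static final MiniClusterExtension MINI_CLUSTER =
            new MiniClusterExtension(
                    new MiniClusterResourceConfiguration.Builder()
                            .setNumberTaskManagers(1)
                            .setNumberSlotsPerTaskManager(2)
                            .build());

    @Test
    void testJobRunsAgainstMiniCluster() {
        // The test body would submit a job against the shared MiniCluster.
    }
}
{code}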



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-32353) Make Cassandra connector tests compatible with archunit rules

2023-07-05 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740251#comment-17740251
 ] 

Etienne Chauchot edited comment on FLINK-32353 at 7/5/23 3:05 PM:
--

The MiniCluster rule has been fixed in [Flink 
1.18|https://github.com/apache/flink/pull/22399/]:
 * CassandraConnectorITCase does not use or need a MiniCluster, so it is still 
a "violation", but the rule message has changed.
 * the "violation" for CassandraSourceITCase is now gone since the rule was 
fixed.

To get rid of this failure when testing against Flink 1.18, we need to update 
the archunit violation store. But as discussed in [this 
email|https://lists.apache.org/thread/pr0g812olzpgz21d9oodhc46db9jpxo3], I'll 
update the violation store only once Flink 1.18 is the main supported version 
for the Cassandra connector, likely shortly after that Flink release is out.
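
For reference, refreezing the violation store with ArchUnit's freeze mechanism 
is typically a matter of temporarily flipping a couple of flags in 
archunit.properties and re-running the tests against the target Flink version. 
A sketch follows; the store path is an assumption about this connector's 
layout.

{code}
# archunit.properties - sketch of refreezing the frozen violation store.
# The store path below is an assumption about this connector's layout.
freeze.store.default.path=archunit-violations
# allow existing violation files to be rewritten while refreezing
freeze.store.default.allowStoreUpdate=true
# record the current violations as the new baseline; revert to false afterwards
freeze.refreeze=true
{code}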


was (Author: echauchot):
Regarding [this 
email|https://lists.apache.org/thread/pr0g812olzpgz21d9oodhc46db9jpxo3], I'll 
add the "normal" archunit violations that comply with 1.18 once this version is 
the main supported version for the Cassandra connector.

> Make Cassandra connector tests compatible with archunit rules
> -
>
> Key: FLINK-32353
> URL: https://issues.apache.org/jira/browse/FLINK-32353
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Reporter: Martijn Visser
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available
>
> The current Cassandra connector in {{main}} fails when testing against Flink 
> 1.18-SNAPSHOT
> {code:java}
> Error:  Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.1 s 
> <<< FAILURE! - in org.apache.flink.architecture.rules.ITCaseRules
> Error:  ITCaseRules.ITCASE_USE_MINICLUSTER  Time elapsed: 0.025 s  <<< 
> FAILURE!
> java.lang.AssertionError: 
> Architecture Violation [Priority: MEDIUM] - Rule 'ITCASE tests should use a 
> MiniCluster resource or extension' was violated (1 times):
> org.apache.flink.streaming.connectors.cassandra.CassandraConnectorITCase does 
> not satisfy: only one of the following predicates match:
> * reside in a package 'org.apache.flink.runtime.*' and contain any fields 
> that are static, final, and of type InternalMiniClusterExtension and 
> annotated with @RegisterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and contain any 
> fields that are static, final, and of type MiniClusterExtension and annotated 
> with @RegisterExtension or are , and of type MiniClusterTestEnvironment and 
> annotated with @TestEnv
> * reside in a package 'org.apache.flink.runtime.*' and is annotated with 
> @ExtendWith with class InternalMiniClusterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and is annotated 
> with @ExtendWith with class MiniClusterExtension
>  or contain any fields that are public, static, and of type 
> MiniClusterWithClientResource and final and annotated with @ClassRule or 
> contain any fields that is of type MiniClusterWithClientResource and public 
> and final and not static and annotated with @Rule
> {code}
> https://github.com/apache/flink-connector-cassandra/actions/runs/5276835802/jobs/9544092571#step:13:811



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32353) Make Cassandra connector tests compatible with archunit rules

2023-07-05 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740251#comment-17740251
 ] 

Etienne Chauchot commented on FLINK-32353:
--

Regarding [this 
email|https://lists.apache.org/thread/pr0g812olzpgz21d9oodhc46db9jpxo3], I'll 
add the "normal" archunit violations that comply with 1.18 once this version is 
the main supported version for the Cassandra connector.

> Make Cassandra connector tests compatible with archunit rules
> -
>
> Key: FLINK-32353
> URL: https://issues.apache.org/jira/browse/FLINK-32353
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Reporter: Martijn Visser
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available
>
> The current Cassandra connector in {{main}} fails when testing against Flink 
> 1.18-SNAPSHOT
> {code:java}
> Error:  Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.1 s 
> <<< FAILURE! - in org.apache.flink.architecture.rules.ITCaseRules
> Error:  ITCaseRules.ITCASE_USE_MINICLUSTER  Time elapsed: 0.025 s  <<< 
> FAILURE!
> java.lang.AssertionError: 
> Architecture Violation [Priority: MEDIUM] - Rule 'ITCASE tests should use a 
> MiniCluster resource or extension' was violated (1 times):
> org.apache.flink.streaming.connectors.cassandra.CassandraConnectorITCase does 
> not satisfy: only one of the following predicates match:
> * reside in a package 'org.apache.flink.runtime.*' and contain any fields 
> that are static, final, and of type InternalMiniClusterExtension and 
> annotated with @RegisterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and contain any 
> fields that are static, final, and of type MiniClusterExtension and annotated 
> with @RegisterExtension or are , and of type MiniClusterTestEnvironment and 
> annotated with @TestEnv
> * reside in a package 'org.apache.flink.runtime.*' and is annotated with 
> @ExtendWith with class InternalMiniClusterExtension
> * reside outside of package 'org.apache.flink.runtime.*' and is annotated 
> with @ExtendWith with class MiniClusterExtension
>  or contain any fields that are public, static, and of type 
> MiniClusterWithClientResource and final and annotated with @ClassRule or 
> contain any fields that is of type MiniClusterWithClientResource and public 
> and final and not static and annotated with @Rule
> {code}
> https://github.com/apache/flink-connector-cassandra/actions/runs/5276835802/jobs/9544092571#step:13:811



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

