[jira] [Commented] (FLINK-35500) DynamoDb Table API Sink fails to delete elements due to key not found

2024-05-31 Thread Ahmed Hamdy (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851180#comment-17851180
 ] 

Ahmed Hamdy commented on FLINK-35500:
-

Thanks for clarifying

> DynamoDb Table API Sink fails to delete elements due to key not found
> -
>
> Key: FLINK-35500
> URL: https://issues.apache.org/jira/browse/FLINK-35500
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / DynamoDB
>Affects Versions: aws-connector-4.0.0, aws-connector-4.1.0, 
> aws-connector-4.2.0
>Reporter: Ahmed Hamdy
>Priority: Major
> Fix For: aws-connector-4.4.0
>
>
> h2. Description
> When DynamoDbSink is used with CDC sources, it fails to process {{DELETE}} 
> records and throws 
> {quote}org.apache.flink.connector.dynamodb.shaded.software.amazon.awssdk.services.dynamodb.model.DynamoDbException:
>  The provided key element does not match the schema{quote}
> This is due to {{DynamoDbSinkWriter}} passing the whole DynamoDb Item as key 
> instead of the constructed primary Key[1].
> Note: The issue is reported in user mailing list[2]
> h2. Steps to Reproduce
> (1) Create a new DynamoDB table in AWS.  Command line:
> aws dynamodb create-table \
>   --table-name orders \
>   --attribute-definitions AttributeName=userId,AttributeType=S \
>   --key-schema AttributeName=userId,KeyType=HASH \
>   --billing-mode PAY_PER_REQUEST
> (2) Create an input file in Debezium-JSON format with the following rows to 
> start:
> {"op": "c", "after": {"orderId": 1, "userId": "a", "price": 5}}
> {"op": "c", "after": {"orderId": 2, "userId": "b", "price": 7}}
> {"op": "c", "after": {"orderId": 3, "userId": "c", "price": 9}}
> {"op": "c", "after": {"orderId": 4, "userId": "a", "price": 11}}
> (3) Start the Flink SQL Client, and run the following, substituting in the 
> proper local paths for the Dynamo Connector JAR file and for this local 
> sample input file:
> ADD JAR '/Users/robg/Downloads/flink-sql-connector-dynamodb-4.2.0-1.18.jar';
> SET 'execution.runtime-mode' = 'streaming';
> SET 'sql-client.execution.result-mode' = 'changelog';
> CREATE TABLE Orders_CDC(
>   orderId BIGINT,
>   price float,
>   userId STRING
>  ) WITH (
>'connector' = 'filesystem',
>'path' = '/path/to/input_file.jsonl',
>'format' = 'debezium-json'
>  );
> CREATE TABLE Orders_Dynamo (
>   orderId BIGINT,
>   price float,
>   userId STRING,
>   PRIMARY KEY (userId) NOT ENFORCED
> ) PARTITIONED BY ( userId )
> WITH (
>   'connector' = 'dynamodb',
>   'table-name' = 'orders',
>   'aws.region' = 'us-east-1'
> );
> INSERT INTO Orders_Dynamo SELECT * FROM Orders_CDC ;
> (4) At this point, we will see that things currently all work properly, and 
> these 4 rows are inserted properly to Dynamo, because they are "Insert" 
> operations.   So far, so good!
> (5) Now, add the following row to the input file.  This represents a deletion 
> in Debezium format, which should then cause a Deletion on the corresponding 
> DynamoDB table:
> {"op": "d", "before": {"orderId": 3, "userId": "c", "price": 9}}
> (6) Re-Run the SQL statement:
> INSERT INTO Orders_Dynamo SELECT * FROM Orders_CDC ;
> h3. References
> 1-https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws/flink-connector-dynamodb/src/main/java/org/apache/flink/connector/dynamodb/sink/DynamoDbSinkWriter.java#L267
> 2- https://lists.apache.org/thread/ysvctpvn6n9kn0qlf5b24gxchfg64ylf



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-19303) Disable WAL in RocksDB recovery

2024-05-31 Thread Roman Khachatryan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851172#comment-17851172
 ] 

Roman Khachatryan commented on FLINK-19303:
---

I think this was fixed in FLINK-32326 (cc: [~srichter] )

Should we close this ticket as a duplicate?

> Disable WAL in RocksDB recovery
> ---
>
> Key: FLINK-19303
> URL: https://issues.apache.org/jira/browse/FLINK-19303
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Reporter: Juha Mynttinen
>Assignee: Juha Mynttinen
>Priority: Minor
>
> During recovery of {{RocksDBStateBackend}} the recovery mechanism puts the 
> key-value pairs into the local RocksDB instance(s). To speed up the process, the 
> recovery process uses the RocksDB write batch mechanism. [RocksDB 
> WAL|https://github.com/facebook/rocksdb/wiki/Write-Ahead-Log]  is enabled 
> during this process.
> During normal operations, i.e. when the state backend has been recovered and 
> the Flink application is running (on the RocksDB state backend), WAL is disabled.
> The recovery process doesn't need WAL. In fact, recovery should be much 
> faster without WAL. Thus, WAL should be disabled in the recovery process.
> AFAIK the last thing that was done with WAL during recovery was an attempt to 
> remove it. Later that removal was reverted because it caused stability issues 
> (https://issues.apache.org/jira/browse/FLINK-8922).
> Unfortunately, the root cause of why disabling WAL causes a segfault during 
> recovery is unknown. After all, WAL is not used during normal operations.
> A potential explanation is some kind of bug in the RocksDB write batch when 
> using WAL. It is possible that later RocksDB versions have fixes/workarounds 
> for the issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35501) Use common thread pools when transferring RocksDB state files

2024-05-31 Thread Roman Khachatryan (Jira)
Roman Khachatryan created FLINK-35501:
-

 Summary: Use common thread pools when transferring RocksDB state 
files
 Key: FLINK-35501
 URL: https://issues.apache.org/jira/browse/FLINK-35501
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / State Backends
Affects Versions: 1.20.0
Reporter: Roman Khachatryan
Assignee: Roman Khachatryan
 Fix For: 1.20.0


Currently, each RocksDB state backend creates an executor backed by its own 
thread pool.

This makes it difficult to control the total number of threads per TM, because a 
TM might have at least one task per slot and, theoretically, many state backends 
per task (because of chaining).

Additionally, using a common thread pool makes it possible to indirectly control 
the load on the underlying DFS (e.g. the total number of requests to S3 from a 
TM).
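A minimal sketch of the idea, with illustrative names only (not the actual Flink 
internals): a single bounded executor per TM process that every backend instance 
submits its transfers to:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public final class SharedTransferExecutor {

    // One pool per TM process; the thread count caps the number of concurrent
    // requests hitting the DFS (e.g. S3), regardless of how many state backends
    // exist. The property name is illustrative, not an actual Flink option.
    private static final ExecutorService SHARED =
            Executors.newFixedThreadPool(
                    Integer.getInteger("state.transfer.num-threads", 4));

    /** Any state backend instance submits its up/downloads to the shared pool. */
    public static <T> List<Future<T>> transferAll(List<Callable<T>> transfers) {
        List<Future<T>> futures = new ArrayList<>();
        for (Callable<T> transfer : transfers) {
            futures.add(SHARED.submit(transfer));
        }
        return futures;
    }

    private SharedTransferExecutor() {}
}
{code}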



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35500) DynamoDb Table API Sink fails to delete elements due to key not found

2024-05-31 Thread Aleksandr Pilipenko (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851162#comment-17851162
 ] 

Aleksandr Pilipenko commented on FLINK-35500:
-

{quote}
Yes, I agree. However, it seems that by design the DynamoDbWriter is responsible 
for overriding partition keys propagated from the Table API DynamicSinkFactory; 
I believe it would be confusing to split the responsibility between the element 
converter and the writer.
{quote}
True; however, overwriteByPartitionKeys is used for request deduplication and is 
not strictly required to contain only the table keys.
Regardless, I believe that the ElementConverter[1] should distinguish between PUT 
and DELETE requests while populating the item field of the DynamoDbWriteRequest 
object; currently only the type differs between these two cases.
To do that, the converter will need information about the table keys in order to 
include only the required fields.

[1] - 
https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws/flink-connector-dynamodb/src/main/java/org/apache/flink/connector/dynamodb/table/RowDataElementConverter.java#L48-L72
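As an illustration of that direction, a minimal sketch: the helper class and the 
primaryKeyNames parameter are hypothetical and would have to be supplied by the 
Table API sink factory, and the DynamoDbWriteRequest builder usage is assumed to 
match the connector's current API:

{code:java}
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

import org.apache.flink.connector.dynamodb.sink.DynamoDbWriteRequest;
import org.apache.flink.connector.dynamodb.sink.DynamoDbWriteRequestType;

import software.amazon.awssdk.services.dynamodb.model.AttributeValue;

/** Sketch only: keeps the full item for PUTs but trims to key attributes for DELETEs. */
public final class KeyAwareRequestBuilder {

    public static DynamoDbWriteRequest toWriteRequest(
            Map<String, AttributeValue> item,
            Set<String> primaryKeyNames, // hypothetical: passed in from the sink factory
            boolean isDelete) {
        if (!isDelete) {
            return DynamoDbWriteRequest.builder()
                    .setType(DynamoDbWriteRequestType.PUT)
                    .setItem(item)
                    .build();
        }
        // For deletes, include only the partition/sort key attributes.
        Map<String, AttributeValue> keyOnly =
                item.entrySet().stream()
                        .filter(e -> primaryKeyNames.contains(e.getKey()))
                        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
        return DynamoDbWriteRequest.builder()
                .setType(DynamoDbWriteRequestType.DELETE)
                .setItem(keyOnly)
                .build();
    }

    private KeyAwareRequestBuilder() {}
}
{code}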

> DynamoDb Table API Sink fails to delete elements due to key not found
> -
>
> Key: FLINK-35500
> URL: https://issues.apache.org/jira/browse/FLINK-35500
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / DynamoDB
>Affects Versions: aws-connector-4.0.0, aws-connector-4.1.0, 
> aws-connector-4.2.0
>Reporter: Ahmed Hamdy
>Priority: Major
> Fix For: aws-connector-4.4.0
>
>
> h2. Description
> When DynamoDbSink is used with CDC sources, it fails to process {{DELETE}} 
> records and throws 
> {quote}org.apache.flink.connector.dynamodb.shaded.software.amazon.awssdk.services.dynamodb.model.DynamoDbException:
>  The provided key element does not match the schema{quote}
> This is due to {{DynamoDbSinkWriter}} passing the whole DynamoDb Item as key 
> instead of the constructed primary Key[1].
> Note: The issue is reported in user mailing list[2]
> h2. Steps to Reproduce
> (1) Create a new DynamoDB table in AWS.  Command line:
> aws dynamodb create-table \
>   --table-name orders \
>   --attribute-definitions AttributeName=userId,AttributeType=S \
>   --key-schema AttributeName=userId,KeyType=HASH \
>   --billing-mode PAY_PER_REQUEST
> (2) Create an input file in Debezium-JSON format with the following rows to 
> start:
> {"op": "c", "after": {"orderId": 1, "userId": "a", "price": 5}}
> {"op": "c", "after": {"orderId": 2, "userId": "b", "price": 7}}
> {"op": "c", "after": {"orderId": 3, "userId": "c", "price": 9}}
> {"op": "c", "after": {"orderId": 4, "userId": "a", "price": 11}}
> (3) Start the Flink SQL Client, and run the following, substituting in the 
> proper local paths for the Dynamo Connector JAR file and for this local 
> sample input file:
> ADD JAR '/Users/robg/Downloads/flink-sql-connector-dynamodb-4.2.0-1.18.jar';
> SET 'execution.runtime-mode' = 'streaming';
> SET 'sql-client.execution.result-mode' = 'changelog';
> CREATE TABLE Orders_CDC(
>   orderId BIGINT,
>   price float,
>   userId STRING
>  ) WITH (
>'connector' = 'filesystem',
>'path' = '/path/to/input_file.jsonl',
>'format' = 'debezium-json'
>  );
> CREATE TABLE Orders_Dynamo (
>   orderId BIGINT,
>   price float,
>   userId STRING,
>   PRIMARY KEY (userId) NOT ENFORCED
> ) PARTITIONED BY ( userId )
> WITH (
>   'connector' = 'dynamodb',
>   'table-name' = 'orders',
>   'aws.region' = 'us-east-1'
> );
> INSERT INTO Orders_Dynamo SELECT * FROM Orders_CDC ;
> (4) At this point, we will see that things currently all work properly, and 
> these 4 rows are inserted properly to Dynamo, because they are "Insert" 
> operations.   So far, so good!
> (5) Now, add the following row to the input file.  This represents a deletion 
> in Debezium format, which should then cause a Deletion on the corresponding 
> DynamoDB table:
> {"op": "d", "before": {"orderId": 3, "userId": "c", "price": 9}}
> (6) Re-Run the SQL statement:
> INSERT INTO Orders_Dynamo SELECT * FROM Orders_CDC ;
> h3. References
> 1-https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws/flink-connector-dynamodb/src/main/java/org/apache/flink/connector/dynamodb/sink/DynamoDbSinkWriter.java#L267
> 2- https://lists.apache.org/thread/ysvctpvn6n9kn0qlf5b24gxchfg64ylf



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-35282) PyFlink Support for Apache Beam > 2.49

2024-05-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-35282:
---
Labels: pull-request-available  (was: )

> PyFlink Support for Apache Beam > 2.49
> --
>
> Key: FLINK-35282
> URL: https://issues.apache.org/jira/browse/FLINK-35282
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Reporter: APA
>Priority: Major
>  Labels: pull-request-available
>
> From what I see, PyFlink still requires Apache Beam >= 2.43.0 
> and <= 2.49.0, which subsequently results in a requirement of PyArrow <= 
> 12.0.0. That keeps us exposed to 
> [https://nvd.nist.gov/vuln/detail/CVE-2023-47248]
> I'm not familiar enough with the PyFlink code base to understand why 
> Apache Beam's upper dependency limit can't be lifted. Among all the existing 
> issues, I haven't seen one addressing this, so I created this one. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35500) DynamoDb Table API Sink fails to delete elements due to key not found

2024-05-31 Thread Ahmed Hamdy (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851156#comment-17851156
 ] 

Ahmed Hamdy commented on FLINK-35500:
-

[~a.pilipenko] Thanks for chiming in,
{quote}Looking at this issue, it appears that the issue is in the Table API 
element converter implementation.

RowDataElementConverter[1] does not have information about the DDB table 
partition and sort keys or the primary keys of the table on the Flink side.
{quote}
Yes, I agree. However, it seems that by design the {{DynamoDbWriter}} is 
responsible for overriding partition keys propagated from the Table API 
DynamicSinkFactory; I believe it would be confusing to split the responsibility 
between the element converter and the writer.

 

{quote}This is less of an issue on the DataStream API side, since the default 
element converter only supports insert (PUT) operations[2] and the user can use 
the correct mapping in a custom element converter implementation.{quote}

I am not sure I follow: even if the element converter did set the primary key 
tag in the element, we still use the complete item as the key in delete requests 
from the writer[2]. Is this the intended behaviour?

1-[https://github.com/apache/flink-connector-aws/blob/c688a8545ac1001c8450e8c9c5fe8bbafa13aeba/flink-connector-aws/flink-connector-dynamodb/src/main/java/org/apache/flink/connector/dynamodb/sink/DynamoDbSinkWriter.java#L174]

2-[https://github.com/apache/flink-connector-aws/blob/c688a8545ac1001c8450e8c9c5fe8bbafa13aeba/flink-connector-aws/flink-connector-dynamodb/src/main/java/org/apache/flink/connector/dynamodb/sink/DynamoDbSinkWriter.java#L267]
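To illustrate the question on the AWS SDK side, a standalone sketch (not the 
connector code) using the orders table from the reproduction steps below, showing 
the whole-item key that DynamoDB rejects versus a key-only delete:

{code:java}
import java.util.Map;

import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.DeleteRequest;
import software.amazon.awssdk.services.dynamodb.model.WriteRequest;

public class DeleteKeyExample {
    public static void main(String[] args) {
        // Full item as produced by the converter (orders table from the repro steps).
        Map<String, AttributeValue> fullItem = Map.of(
                "userId", AttributeValue.builder().s("c").build(),
                "orderId", AttributeValue.builder().n("3").build(),
                "price", AttributeValue.builder().n("9").build());

        // Effectively what is sent today: the whole item used as the key.
        // DynamoDB rejects this with "The provided key element does not match the schema".
        WriteRequest failing = WriteRequest.builder()
                .deleteRequest(DeleteRequest.builder().key(fullItem).build())
                .build();

        // What a DeleteRequest needs: only the table's key attributes (here, userId).
        Map<String, AttributeValue> keyOnly =
                Map.of("userId", AttributeValue.builder().s("c").build());
        WriteRequest working = WriteRequest.builder()
                .deleteRequest(DeleteRequest.builder().key(keyOnly).build())
                .build();

        System.out.println(failing);
        System.out.println(working);
    }
}
{code}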

> DynamoDb Table API Sink fails to delete elements due to key not found
> -
>
> Key: FLINK-35500
> URL: https://issues.apache.org/jira/browse/FLINK-35500
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / DynamoDB
>Affects Versions: aws-connector-4.0.0, aws-connector-4.1.0, 
> aws-connector-4.2.0
>Reporter: Ahmed Hamdy
>Priority: Major
> Fix For: aws-connector-4.4.0
>
>
> h2. Description
> When DynamoDbSink is used with CDC sources, it fails to process {{DELETE}} 
> records and throws 
> {quote}org.apache.flink.connector.dynamodb.shaded.software.amazon.awssdk.services.dynamodb.model.DynamoDbException:
>  The provided key element does not match the schema{quote}
> This is due to {{DynamoDbSinkWriter}} passing the whole DynamoDb Item as key 
> instead of the constructed primary Key[1].
> Note: The issue is reported in user mailing list[2]
> h2. Steps to Reproduce
> (1) Create a new DynamoDB table in AWS.  Command line:
> aws dynamodb create-table \
>   --table-name orders \
>   --attribute-definitions AttributeName=userId,AttributeType=S \
>   --key-schema AttributeName=userId,KeyType=HASH \
>   --billing-mode PAY_PER_REQUEST
> (2) Create an input file in Debezium-JSON format with the following rows to 
> start:
> {"op": "c", "after": {"orderId": 1, "userId": "a", "price": 5}}
> {"op": "c", "after": {"orderId": 2, "userId": "b", "price": 7}}
> {"op": "c", "after": {"orderId": 3, "userId": "c", "price": 9}}
> {"op": "c", "after": {"orderId": 4, "userId": "a", "price": 11}}
> (3) Start the Flink SQL Client, and run the following, substituting in the 
> proper local paths for the Dynamo Connector JAR file and for this local 
> sample input file:
> ADD JAR '/Users/robg/Downloads/flink-sql-connector-dynamodb-4.2.0-1.18.jar';
> SET 'execution.runtime-mode' = 'streaming';
> SET 'sql-client.execution.result-mode' = 'changelog';
> CREATE TABLE Orders_CDC(
>   orderId BIGINT,
>   price float,
>   userId STRING
>  ) WITH (
>'connector' = 'filesystem',
>'path' = '/path/to/input_file.jsonl',
>'format' = 'debezium-json'
>  );
> CREATE TABLE Orders_Dynamo (
>   orderId BIGINT,
>   price float,
>   userId STRING,
>   PRIMARY KEY (userId) NOT ENFORCED
> ) PARTITIONED BY ( userId )
> WITH (
>   'connector' = 'dynamodb',
>   'table-name' = 'orders',
>   'aws.region' = 'us-east-1'
> );
> INSERT INTO Orders_Dynamo SELECT * FROM Orders_CDC ;
> (4) At this point, we will see that things currently all work properly, and 
> these 4 rows are inserted properly to Dynamo, because they are "Insert" 
> operations.   So far, so good!
> (5) Now, add the following row to the input file.  This represents a deletion 

[jira] [Closed] (FLINK-35483) BatchJobRecoveryTest related to JM failover produced no output for 900 second

2024-05-31 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-35483.
---
Fix Version/s: 1.20.0
   Resolution: Fixed

f3a3f926c6c6c931bb7ccc52e823d70cfd8aadf5

> BatchJobRecoveryTest related to JM failover produced no output for 900 second
> -
>
> Key: FLINK-35483
> URL: https://issues.apache.org/jira/browse/FLINK-35483
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI
>Affects Versions: 1.20.0
>Reporter: Weijie Guo
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> testRecoverFromJMFailover
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59919=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=24c3384f-1bcb-57b3-224f-51bf973bbee8=9476



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-35500) DynamoDb Table API Sink fails to delete elements due to key not found

2024-05-31 Thread Aleksandr Pilipenko (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851131#comment-17851131
 ] 

Aleksandr Pilipenko edited comment on FLINK-35500 at 5/31/24 3:21 PM:
--

Looking at this issue, it appears that the issue is in the Table API element 
converter implementation.

RowDataElementConverter[1] does not have information about the DDB table 
partition and sort keys or the primary keys of the table on the Flink side.

This is less of an issue on the DataStream API side, since the default element 
converter only supports insert (PUT) operations[2] and the user can use the 
correct mapping in a custom element converter implementation.

 

One option to resolve this is to obtain information on the table primary keys in 
the DynamicSinkFactory and provide it to the element converter used by the Table 
API sink implementation. The assumption here is that the partition and sort keys 
of the DDB table are mirrored as primary keys on the Flink SQL side.

 

[1] - 
[https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws/flink-connector-dynamodb/src/main/java/org/apache/flink/connector/dynamodb/table/RowDataElementConverter.java]

[2] - 
[https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws/flink-connector-dynamodb/src/main/java/org/apache/flink/connector/dynamodb/sink/DynamoDbBeanElementConverter.java#L58-L64]


was (Author: a.pilipenko):
Looking at this issue, it appears that the issue is in the Table API element 
converter implementation.

RowDataElementConverter[1] does not have information about the DDB table 
partition and sort keys or the primary keys of the table on the Flink side.

This is less of an issue on the DataStream API side, since the default element 
converter only supports insert (PUT) operations[2] and the user can use the 
correct logic in a custom element converter implementation.

 

One option to resolve this is to obtain information on the table primary keys in 
the DynamicSinkFactory and provide it to the element converter used by the Table 
API sink implementation. The assumption here is that the partition and sort keys 
of the DDB table are mirrored as primary keys on the Flink SQL side.

 

[1] - 
[https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws/flink-connector-dynamodb/src/main/java/org/apache/flink/connector/dynamodb/table/RowDataElementConverter.java]

[2] - 
https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws/flink-connector-dynamodb/src/main/java/org/apache/flink/connector/dynamodb/sink/DynamoDbBeanElementConverter.java#L58-L64

> DynamoDb Table API Sink fails to delete elements due to key not found
> -
>
> Key: FLINK-35500
> URL: https://issues.apache.org/jira/browse/FLINK-35500
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / DynamoDB
>Affects Versions: aws-connector-4.0.0, aws-connector-4.1.0, 
> aws-connector-4.2.0
>Reporter: Ahmed Hamdy
>Priority: Major
> Fix For: aws-connector-4.4.0
>
>
> h2. Description
> When DynamoDbSink is used with CDC sources, it fails to process {{DELETE}} 
> records and throws 
> {quote}org.apache.flink.connector.dynamodb.shaded.software.amazon.awssdk.services.dynamodb.model.DynamoDbException:
>  The provided key element does not match the schema{quote}
> This is due to {{DynamoDbSinkWriter}} passing the whole DynamoDb Item as key 
> instead of the constructed primary Key[1].
> Note: The issue is reported in user mailing list[2]
> h2. Steps to Reproduce
> (1) Create a new DynamoDB table in AWS.  Command line:
> aws dynamodb create-table \
>   --table-name orders \
>   --attribute-definitions AttributeName=userId,AttributeType=S \
>   --key-schema AttributeName=userId,KeyType=HASH \
>   --billing-mode PAY_PER_REQUEST
> (2) Create an input file in Debezium-JSON format with the following rows to 
> start:
> {"op": "c", "after": {"orderId": 1, "userId": "a", "price": 5}}
> {"op": "c", "after": {"orderId": 2, "userId": "b", "price": 7}}
> {"op": "c", "after": {"orderId": 3, "userId": "c", "price": 9}}
> {"op": "c", "after": {"orderId": 4, "userId": "a", "price": 11}}
> (3) Start the Flink SQL Client, and run the following, substituting in the 
> proper local paths for the Dynamo Connector JAR file and for this local 
> sample input file:
> ADD JAR '/Users/robg/Downloads/flink-sql-connector-dynamodb-4.2.0-1.18.jar';
> SET 'execution.runtime-mode' = 'streaming';
> SET 'sql-client.execution.result-m

[jira] [Commented] (FLINK-35500) DynamoDb Table API Sink fails to delete elements due to key not found

2024-05-31 Thread Aleksandr Pilipenko (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851131#comment-17851131
 ] 

Aleksandr Pilipenko commented on FLINK-35500:
-

Looking at this issue, it appears that the issue is in the Table API element 
converter implementation.

RowDataElementConverter[1] does not have information about the DDB table 
partition and sort keys or the primary keys of the table on the Flink side.

This is less of an issue on the DataStream API side, since the default element 
converter only supports insert (PUT) operations[2] and the user can use the 
correct logic in a custom element converter implementation.

 

One option to resolve this is to obtain information on the table primary keys in 
the DynamicSinkFactory and provide it to the element converter used by the Table 
API sink implementation. The assumption here is that the partition and sort keys 
of the DDB table are mirrored as primary keys on the Flink SQL side.

 

[1] - 
[https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws/flink-connector-dynamodb/src/main/java/org/apache/flink/connector/dynamodb/table/RowDataElementConverter.java]

[2] - 
https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws/flink-connector-dynamodb/src/main/java/org/apache/flink/connector/dynamodb/sink/DynamoDbBeanElementConverter.java#L58-L64
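A rough sketch of that option, using the standard Table API calls to read the 
declared primary key; the converter wiring in the trailing comment is 
hypothetical:

{code:java}
import java.util.Collections;
import java.util.List;

import org.apache.flink.table.catalog.ResolvedSchema;
import org.apache.flink.table.catalog.UniqueConstraint;
import org.apache.flink.table.factories.DynamicTableFactory;

/** Sketch only: reading the Flink SQL primary key so it can be handed to the converter. */
public final class PrimaryKeyExtraction {

    public static List<String> primaryKeyColumns(DynamicTableFactory.Context context) {
        ResolvedSchema schema = context.getCatalogTable().getResolvedSchema();
        return schema.getPrimaryKey()
                .map(UniqueConstraint::getColumns)
                .orElse(Collections.emptyList());
    }

    // Hypothetical wiring inside createDynamicTableSink:
    //   List<String> keys = primaryKeyColumns(context);
    //   new RowDataElementConverter(physicalDataType, new HashSet<>(keys));

    private PrimaryKeyExtraction() {}
}
{code}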

> DynamoDb Table API Sink fails to delete elements due to key not found
> -
>
> Key: FLINK-35500
> URL: https://issues.apache.org/jira/browse/FLINK-35500
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / DynamoDB
>Affects Versions: aws-connector-4.0.0, aws-connector-4.1.0, 
> aws-connector-4.2.0
>Reporter: Ahmed Hamdy
>Priority: Major
> Fix For: aws-connector-4.4.0
>
>
> h2. Description
> When DynamoDbSink is used with CDC sources, it fails to process {{DELETE}} 
> records and throws 
> {quote}org.apache.flink.connector.dynamodb.shaded.software.amazon.awssdk.services.dynamodb.model.DynamoDbException:
>  The provided key element does not match the schema{quote}
> This is due to {{DynamoDbSinkWriter}} passing the whole DynamoDb Item as key 
> instead of the constructed primary Key[1].
> Note: The issue is reported in user mailing list[2]
> h2. Steps to Reproduce
> (1) Create a new DynamoDB table in AWS.  Command line:
> aws dynamodb create-table \
>   --table-name orders \
>   --attribute-definitions AttributeName=userId,AttributeType=S \
>   --key-schema AttributeName=userId,KeyType=HASH \
>   --billing-mode PAY_PER_REQUEST
> (2) Create an input file in Debezium-JSON format with the following rows to 
> start:
> {"op": "c", "after": {"orderId": 1, "userId": "a", "price": 5}}
> {"op": "c", "after": {"orderId": 2, "userId": "b", "price": 7}}
> {"op": "c", "after": {"orderId": 3, "userId": "c", "price": 9}}
> {"op": "c", "after": {"orderId": 4, "userId": "a", "price": 11}}
> (3) Start the Flink SQL Client, and run the following, substituting in the 
> proper local paths for the Dynamo Connector JAR file and for this local 
> sample input file:
> ADD JAR '/Users/robg/Downloads/flink-sql-connector-dynamodb-4.2.0-1.18.jar';
> SET 'execution.runtime-mode' = 'streaming';
> SET 'sql-client.execution.result-mode' = 'changelog';
> CREATE TABLE Orders_CDC(
>   orderId BIGINT,
>   price float,
>   userId STRING
>  ) WITH (
>'connector' = 'filesystem',
>'path' = '/path/to/input_file.jsonl',
>'format' = 'debezium-json'
>  );
> CREATE TABLE Orders_Dynamo (
>   orderId BIGINT,
>   price float,
>   userId STRING,
>   PRIMARY KEY (userId) NOT ENFORCED
> ) PARTITIONED BY ( userId )
> WITH (
>   'connector' = 'dynamodb',
>   'table-name' = 'orders',
>   'aws.region' = 'us-east-1'
> );
> INSERT INTO Orders_Dynamo SELECT * FROM Orders_CDC ;
> (4) At this point, we will see that things currently all work properly, and 
> these 4 rows are inserted properly to Dynamo, because they are "Insert" 
> operations.   So far, so good!
> (5) Now, add the following row to the input file.  This represents a deletion 
> in Debezium format, which should then cause a Deletion on the corresponding 
> DynamoDB table:
> {"op": "d", "before": {"orderId": 3, "userId": "c", "price": 9}}
> (6) Re-Run the SQL statement:
> INSERT INTO Orders_Dynamo SELECT * FROM Orders_CDC ;
> h3. References
> 1-https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws/flink-connector-dynamodb/src/main/java/org/apache/flink/connector/dynamodb/sink/DynamoDbSinkWriter.java#L267
> 2- https://lists.apache.org/thread/ysvctpvn6n9kn0qlf5b24gxchfg64ylf



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-35500) DynamoDb Table API Sink fails to delete elements due to key not found

2024-05-31 Thread Ahmed Hamdy (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hamdy updated FLINK-35500:

Summary: DynamoDb Table API Sink fails to delete elements due to key not 
found  (was: DynamoDb SinkWriter fails to delete elements due to key not found)

> DynamoDb Table API Sink fails to delete elements due to key not found
> -
>
> Key: FLINK-35500
> URL: https://issues.apache.org/jira/browse/FLINK-35500
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / DynamoDB
>Affects Versions: aws-connector-4.0.0, aws-connector-4.1.0, 
> aws-connector-4.2.0
>Reporter: Ahmed Hamdy
>Priority: Major
> Fix For: aws-connector-4.4.0
>
>
> h2. Description
> When DynamoDbSink is used with CDC sources, it fails to process {{DELETE}} 
> records and throws 
> {quote}org.apache.flink.connector.dynamodb.shaded.software.amazon.awssdk.services.dynamodb.model.DynamoDbException:
>  The provided key element does not match the schema{quote}
> This is due to {{DynamoDbSinkWriter}} passing the whole DynamoDb Item as key 
> instead of the constructed primary Key[1].
> Note: The issue is reported in user mailing list[2]
> h2. Steps to Reproduce
> (1) Create a new DynamoDB table in AWS.  Command line:
> aws dynamodb create-table \
>   --table-name orders \
>   --attribute-definitions AttributeName=userId,AttributeType=S \
>   --key-schema AttributeName=userId,KeyType=HASH \
>   --billing-mode PAY_PER_REQUEST
> (2) Create an input file in Debezium-JSON format with the following rows to 
> start:
> {"op": "c", "after": {"orderId": 1, "userId": "a", "price": 5}}
> {"op": "c", "after": {"orderId": 2, "userId": "b", "price": 7}}
> {"op": "c", "after": {"orderId": 3, "userId": "c", "price": 9}}
> {"op": "c", "after": {"orderId": 4, "userId": "a", "price": 11}}
> (3) Start the Flink SQL Client, and run the following, substituting in the 
> proper local paths for the Dynamo Connector JAR file and for this local 
> sample input file:
> ADD JAR '/Users/robg/Downloads/flink-sql-connector-dynamodb-4.2.0-1.18.jar';
> SET 'execution.runtime-mode' = 'streaming';
> SET 'sql-client.execution.result-mode' = 'changelog';
> CREATE TABLE Orders_CDC(
>   orderId BIGINT,
>   price float,
>   userId STRING
>  ) WITH (
>'connector' = 'filesystem',
>'path' = '/path/to/input_file.jsonl',
>'format' = 'debezium-json'
>  );
> CREATE TABLE Orders_Dynamo (
>   orderId BIGINT,
>   price float,
>   userId STRING,
>   PRIMARY KEY (userId) NOT ENFORCED
> ) PARTITIONED BY ( userId )
> WITH (
>   'connector' = 'dynamodb',
>   'table-name' = 'orders',
>   'aws.region' = 'us-east-1'
> );
> INSERT INTO Orders_Dynamo SELECT * FROM Orders_CDC ;
> (4) At this point, we will see that things currently all work properly, and 
> these 4 rows are inserted properly to Dynamo, because they are "Insert" 
> operations.   So far, so good!
> (5) Now, add the following row to the input file.  This represents a deletion 
> in Debezium format, which should then cause a Deletion on the corresponding 
> DynamoDB table:
> {"op": "d", "before": {"orderId": 3, "userId": "c", "price": 9}}
> (6) Re-Run the SQL statement:
> INSERT INTO Orders_Dynamo SELECT * FROM Orders_CDC ;
> h3. References
> 1-https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws/flink-connector-dynamodb/src/main/java/org/apache/flink/connector/dynamodb/sink/DynamoDbSinkWriter.java#L267
> 2- https://lists.apache.org/thread/ysvctpvn6n9kn0qlf5b24gxchfg64ylf



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35500) DynamoDb SinkWriter fails to delete elements due to key not found

2024-05-31 Thread Ahmed Hamdy (Jira)
Ahmed Hamdy created FLINK-35500:
---

 Summary: DynamoDb SinkWriter fails to delete elements due to key 
not found
 Key: FLINK-35500
 URL: https://issues.apache.org/jira/browse/FLINK-35500
 Project: Flink
  Issue Type: Bug
  Components: Connectors / DynamoDB
Affects Versions: aws-connector-4.2.0, aws-connector-4.1.0, 
aws-connector-4.0.0
Reporter: Ahmed Hamdy
 Fix For: aws-connector-4.4.0


h2. Description
When DynamoDbSink is used with CDC sources, it fails to process {{DELETE}} 
records and throws 
{quote}org.apache.flink.connector.dynamodb.shaded.software.amazon.awssdk.services.dynamodb.model.DynamoDbException:
 The provided key element does not match the schema{quote}

This is due to {{DynamoDbSinkWriter}} passing the whole DynamoDb item as the key 
instead of the constructed primary key[1].

Note: This issue was reported on the user mailing list[2].

h2. Steps to Reproduce

(1) Create a new DynamoDB table in AWS.  Command line:
aws dynamodb create-table \
  --table-name orders \
  --attribute-definitions AttributeName=userId,AttributeType=S \
  --key-schema AttributeName=userId,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST

(2) Create an input file in Debezium-JSON format with the following rows to 
start:
{"op": "c", "after": {"orderId": 1, "userId": "a", "price": 5}}
{"op": "c", "after": {"orderId": 2, "userId": "b", "price": 7}}
{"op": "c", "after": {"orderId": 3, "userId": "c", "price": 9}}
{"op": "c", "after": {"orderId": 4, "userId": "a", "price": 11}}

(3) Start the Flink SQL Client, and run the following, substituting in the 
proper local paths for the Dynamo Connector JAR file and for this local sample 
input file:

ADD JAR '/Users/robg/Downloads/flink-sql-connector-dynamodb-4.2.0-1.18.jar';
SET 'execution.runtime-mode' = 'streaming';
SET 'sql-client.execution.result-mode' = 'changelog';

CREATE TABLE Orders_CDC(
  orderId BIGINT,
  price float,
  userId STRING
 ) WITH (
   'connector' = 'filesystem',
   'path' = '/path/to/input_file.jsonl',
   'format' = 'debezium-json'
 );

CREATE TABLE Orders_Dynamo (
  orderId BIGINT,
  price float,
  userId STRING,
  PRIMARY KEY (userId) NOT ENFORCED
) PARTITIONED BY ( userId )
WITH (
  'connector' = 'dynamodb',
  'table-name' = 'orders',
  'aws.region' = 'us-east-1'
);

INSERT INTO Orders_Dynamo SELECT * FROM Orders_CDC ;

(4) At this point, we will see that things currently all work properly, and 
these 4 rows are inserted properly to Dynamo, because they are "Insert" 
operations.   So far, so good!

(5) Now, add the following row to the input file.  This represents a deletion 
in Debezium format, which should then cause a Deletion on the corresponding 
DynamoDB table:
{"op": "d", "before": {"orderId": 3, "userId": "c", "price": 9}}

(6) Re-Run the SQL statement:
INSERT INTO Orders_Dynamo SELECT * FROM Orders_CDC ;

h3. References
1-https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws/flink-connector-dynamodb/src/main/java/org/apache/flink/connector/dynamodb/sink/DynamoDbSinkWriter.java#L267
2- https://lists.apache.org/thread/ysvctpvn6n9kn0qlf5b24gxchfg64ylf



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-35192) Kubernetes operator oom

2024-05-31 Thread Gabor Somogyi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Somogyi closed FLINK-35192.
-

> Kubernetes operator oom
> ---
>
> Key: FLINK-35192
> URL: https://issues.apache.org/jira/browse/FLINK-35192
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.6.1
> Environment: jdk: openjdk11
> operator version: 1.6.1
>Reporter: chenyuzhi
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-1.9.0
>
> Attachments: image-2024-04-22-15-47-49-455.png, 
> image-2024-04-22-15-52-51-600.png, image-2024-04-22-15-58-23-269.png, 
> image-2024-04-22-15-58-42-850.png, image-2024-04-30-16-47-07-289.png, 
> image-2024-04-30-17-11-24-974.png, image-2024-04-30-20-38-25-195.png, 
> image-2024-04-30-20-39-05-109.png, image-2024-04-30-20-39-34-396.png, 
> image-2024-04-30-20-41-51-660.png, image-2024-04-30-20-43-20-125.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>
> The Kubernetes operator docker process was killed by the kernel due to out of 
> memory (at 2024-04-03 18:16)
>  !image-2024-04-22-15-47-49-455.png! 
> Metrics:
> The pod memory (RSS) has been increasing slowly over the past 7 days:
>  !screenshot-1.png! 
> However, the JVM memory metrics of the operator do not show an obvious anomaly:
>  !image-2024-04-22-15-58-23-269.png! 
>  !image-2024-04-22-15-58-42-850.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-35192) Kubernetes operator oom

2024-05-31 Thread Gabor Somogyi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Somogyi resolved FLINK-35192.
---
Resolution: Fixed

> Kubernetes operator oom
> ---
>
> Key: FLINK-35192
> URL: https://issues.apache.org/jira/browse/FLINK-35192
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.6.1
> Environment: jdk: openjdk11
> operator version: 1.6.1
>Reporter: chenyuzhi
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-1.9.0
>
> Attachments: image-2024-04-22-15-47-49-455.png, 
> image-2024-04-22-15-52-51-600.png, image-2024-04-22-15-58-23-269.png, 
> image-2024-04-22-15-58-42-850.png, image-2024-04-30-16-47-07-289.png, 
> image-2024-04-30-17-11-24-974.png, image-2024-04-30-20-38-25-195.png, 
> image-2024-04-30-20-39-05-109.png, image-2024-04-30-20-39-34-396.png, 
> image-2024-04-30-20-41-51-660.png, image-2024-04-30-20-43-20-125.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>
> The Kubernetes operator docker process was killed by the kernel due to out of 
> memory (at 2024-04-03 18:16)
>  !image-2024-04-22-15-47-49-455.png! 
> Metrics:
> The pod memory (RSS) has been increasing slowly over the past 7 days:
>  !screenshot-1.png! 
> However, the JVM memory metrics of the operator do not show an obvious anomaly:
>  !image-2024-04-22-15-58-23-269.png! 
>  !image-2024-04-22-15-58-42-850.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35483) BatchJobRecoveryTest related to JM failover produced no output for 900 second

2024-05-31 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1785#comment-1785
 ] 

Ryan Skraba commented on FLINK-35483:
-

* 1.20 test_cron_azure core 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59984=logs=77a9d8e1-d610-59b3-fc2a-4766541e0e33=125e07e7-8de0-5c6c-a541-a567415af3ef=10736
* 1.20 test_cron_hadoop313 core 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59984=logs=d89de3df-4600-5585-dadc-9bbc9a5e661c=be5a4b15-4b23-56b1-7582-795f58a645a2=10358
* 1.20 test_cron_jdk11 core 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59984=logs=f0ac5c25-1168-55a5-07ff-0e88223afed9=50bf7a25-bdc4-5e56-5478-c7b4511dde53=16309
* 1.20 test_cron_jdk17 core 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59984=logs=675bf62c-8558-587e-2555-dcad13acefb5=5878eed3-cc1e-5b12-1ed0-9e7139ce0992=35600
* 1.20 test_cron_jdk21 core 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59984=logs=d06b80b4-9e88-5d40-12a2-18072cf60528=609ecd5a-3f6e-5d0c-2239-2096b155a4d0=27143

> BatchJobRecoveryTest related to JM failover produced no output for 900 second
> -
>
> Key: FLINK-35483
> URL: https://issues.apache.org/jira/browse/FLINK-35483
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI
>Affects Versions: 1.20.0
>Reporter: Weijie Guo
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>
> testRecoverFromJMFailover
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59919=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=24c3384f-1bcb-57b3-224f-51bf973bbee8=9476



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35254) build_wheels_on_macos failed

2024-05-31 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851110#comment-17851110
 ] 

Ryan Skraba commented on FLINK-35254:
-

* 1.18 build_wheels_on_macos 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59986=logs=f73b5736-8355-5390-ec71-4dfdec0ce6c5=90f7230e-bf5a-531b-8566-ad48d3e03bbb=204

> build_wheels_on_macos failed
> 
>
> Key: FLINK-35254
> URL: https://issues.apache.org/jira/browse/FLINK-35254
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI
>Affects Versions: 1.20.0
>Reporter: Weijie Guo
>Priority: Major
>
> {code:java}
>  ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If 
> you have updated the package versions, please update the hashes. Otherwise, 
> examine the package contents carefully; someone may have tampered with them.
>   unknown package:
>   Expected sha256 
> f12932e5a6feb5c58192209af1d2607d488cb1d404fbc038ac12ada60327fa34
>Got
> 1c61bf307881167fe169de79c02f46d16fc5cd35781e02a40bf1f13671cdc22c
>   
>   [end of output]
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59219=logs=f73b5736-8355-5390-ec71-4dfdec0ce6c5=90f7230e-bf5a-531b-8566-ad48d3e03bbb=288



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34273) git fetch fails

2024-05-31 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851109#comment-17851109
 ] 

Ryan Skraba commented on FLINK-34273:
-

* 1.18 test_ci connect 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59986=logs=2e8cb2f7-b2d3-5c62-9c05-cd756d33a819=e46af1f3-6e88-5f8c-8976-a244d665959a=1200

> git fetch fails
> ---
>
> Key: FLINK-34273
> URL: https://issues.apache.org/jira/browse/FLINK-34273
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI, Test Infrastructure
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: test-stability
>
> We've seen multiple {{git fetch}} failures. I assume this to be an 
> infrastructure issue. This Jira issue is for documentation purposes.
> {code:java}
> error: RPC failed; curl 18 transfer closed with outstanding read data 
> remaining
> error: 5211 bytes of body are still expected
> fetch-pack: unexpected disconnect while reading sideband packet
> fatal: early EOF
> fatal: fetch-pack: invalid index-pack output {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57080=logs=0e7be18f-84f2-53f0-a32d-4a5e4a174679=5d6dc3d3-393d-5111-3a40-c6a5a36202e6=667



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35483) BatchJobRecoveryTest related to JM failover produced no output for 900 second

2024-05-31 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851093#comment-17851093
 ] 

Ryan Skraba commented on FLINK-35483:
-

There were a bunch of these happening in GitHub actions over the last few days:
* 1.20 Default (Java 8) / Test (module: core) 
https://github.com/apache/flink/actions/runs/9315015519/job/25642330745#step:10:9437
* 1.20 Default (Java 8) / Test (module: core) 
https://github.com/apache/flink/actions/runs/9314750194/job/25642307976#step:10:10001
* 1.20 Default (Java 8) / Test (module: core) 
https://github.com/apache/flink/actions/runs/9314653645/job/25640759437#step:10:9994
* 1.20 Java 8 / Test (module: core) 
https://github.com/apache/flink/actions/runs/9311892945/job/25632036362#step:10:9445
* 1.20 Java 11 / Test (module: core) 
https://github.com/apache/flink/actions/runs/9311892945/job/25632060016#step:10:21861
* 1.20 Java 17 / Test (module: core) 
https://github.com/apache/flink/actions/runs/9311892945/job/25632066577#step:10:17919
* 1.20 Java 21 / Test (module: core) 
https://github.com/apache/flink/actions/runs/9311892945/job/25632044018#step:10:26883
* 1.20 Hadoop 3.1.3 / Test (module: core) 
https://github.com/apache/flink/actions/runs/9311892945/job/25632095313#step:10:10097
* 1.20 AdaptiveScheduler / Test (module: core) 
https://github.com/apache/flink/actions/runs/9311892945/job/25632037248#step:10:9465
* 1.20 Default (Java 8) / Test (module: core) 
https://github.com/apache/flink/actions/runs/9304086727/job/25608398592#step:10:10041
* 1.20 Default (Java 8) / Test (module: core) 
https://github.com/apache/flink/actions/runs/9301074793/job/25598821104#step:10:9997
* 1.20 Default (Java 8) / Test (module: core) 
https://github.com/apache/flink/actions/runs/9297918541/job/25589186429#step:10:10047
* 1.20 Java 8 / Test (module: core) 
https://github.com/apache/flink/actions/runs/9295906525/job/25583826518#step:10:9461
* 1.20 Java 11 / Test (module: core) 
https://github.com/apache/flink/actions/runs/9295906525/job/25583843903#step:10:10266
* 1.20 Java 17 / Test (module: core) 
https://github.com/apache/flink/actions/runs/9295906525/job/25583846258#step:10:9639
* 1.20 Java 21 / Test (module: core) 
https://github.com/apache/flink/commit/89c89d8522f779986f3f6f163d803e5d5f11ec62/checks/25583832153/logs
* 1.20 Hadoop 3.1.3 / Test (module: core) 
https://github.com/apache/flink/actions/runs/9295906525/job/25583978541#step:10:9502
* 1.20 AdaptiveScheduler / Test (module: core) 
https://github.com/apache/flink/actions/runs/9295906525/job/25583820574#step:10:9445
* 1.20 Default (Java 8) / Test (module: core) 
https://github.com/apache/flink/actions/runs/9288555905/job/25560930733#step:10:1
* 1.20 Default (Java 8) / Test (module: core) 
https://github.com/apache/flink/actions/runs/9286437552/job/25553690409#step:10:10182
* 1.20 Default (Java 8) / Test (module: core) 
https://github.com/apache/flink/actions/runs/9283663961/job/25545853323#step:10:9451
* 1.20 Java 8 / Test (module: core) 
https://github.com/apache/flink/actions/runs/9279196848/job/25531630521#step:10:9445
* 1.20 Java 11 / Test (module: core) 
https://github.com/apache/flink/actions/runs/9279196848/job/25531656817#step:10:11015
* 1.20 Java 17 / Test (module: core) 
https://github.com/apache/flink/actions/runs/9279196848/job/25531646405#step:10:10394
* 1.20 Java 21 / Test (module: core) 
https://github.com/apache/flink/actions/runs/9279196848/job/25531643205#step:10:25928
* 1.20 Hadoop 3.1.3 / Test (module: core) 
https://github.com/apache/flink/actions/runs/9279196848/job/25531688535#step:10:10073
* 1.20 Adaptive Scheduler / Test (module: core) 
https://github.com/apache/flink/actions/runs/9279196848/job/25531631759#step:10:10013
* 1.20 Default (Java 8) / Test (module: core) 
https://github.com/apache/flink/actions/runs/9277267623/job/25526325719#step:10:10023
* 1.20 Default (Java 8) / Test (module: core) 
https://github.com/apache/flink/actions/runs/9275522134/job/25520828361#step:10:10012
* 1.20 Default (Java 8) / Test (module: core) 
https://github.com/apache/flink/actions/runs/9265488039/job/25489065093#step:10:10041
* 1.20 Default (Java 8) / Test (module: core) 
https://github.com/apache/flink/actions/runs/9262951119/job/25480817370#step:10:10032

> BatchJobRecoveryTest related to JM failover produced no output for 900 second
> -
>
> Key: FLINK-35483
> URL: https://issues.apache.org/jira/browse/FLINK-35483
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI
>Affects Versions: 1.20.0
>Reporter: Weijie Guo
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>
> testRecoverFromJMFailover
> https://dev.azure.com/apache-flink/apache-flink

[jira] [Commented] (FLINK-35095) ExecutionEnvironmentImplTest.testFromSource failure on GitHub CI

2024-05-31 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851090#comment-17851090
 ] 

Ryan Skraba commented on FLINK-35095:
-

* 1.20 Default (Java 8) / Test (module: misc) 
https://github.com/apache/flink/actions/runs/9314653645/job/25640760952#step:10:22281
* 1.20 Java 11 / Test (module: misc) 
[https://github.com/apache/flink/actions/runs/9295906525/job/25583844788#step:10:22256]

The same symptom and error message occur on 
{{ExecutionEnvironmentImplTest.testAddWrapSource}}.

> ExecutionEnvironmentImplTest.testFromSource failure on GitHub CI
> 
>
> Key: FLINK-35095
> URL: https://issues.apache.org/jira/browse/FLINK-35095
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.20.0
>Reporter: Ryan Skraba
>Priority: Critical
>  Labels: test-stability
>
> 1.20 Java 17: Test (module: misc) 
> https://github.com/apache/flink/actions/runs/8655935935/job/23735920630#step:10:3
> {code}
> Error: 02:29:05 02:29:05.708 [ERROR] Tests run: 5, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 0.360 s <<< FAILURE! -- in 
> org.apache.flink.datastream.impl.ExecutionEnvironmentImplTest
> Error: 02:29:05 02:29:05.708 [ERROR] 
> org.apache.flink.datastream.impl.ExecutionEnvironmentImplTest.testFromSource 
> -- Time elapsed: 0.131 s <<< FAILURE!
> Apr 12 02:29:05 java.lang.AssertionError: 
> Apr 12 02:29:05 
> Apr 12 02:29:05 Expecting actual:
> Apr 12 02:29:05   [47]
> Apr 12 02:29:05 to contain exactly (and in same order):
> Apr 12 02:29:05   [48]
> Apr 12 02:29:05 but some elements were not found:
> Apr 12 02:29:05   [48]
> Apr 12 02:29:05 and others were not expected:
> Apr 12 02:29:05   [47]
> Apr 12 02:29:05 
> Apr 12 02:29:05   at 
> org.apache.flink.datastream.impl.ExecutionEnvironmentImplTest.testFromSource(ExecutionEnvironmentImplTest.java:97)
> Apr 12 02:29:05   at 
> java.base/java.lang.reflect.Method.invoke(Method.java:568)
> Apr 12 02:29:05   at 
> java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:194)
> Apr 12 02:29:05   at 
> java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
> Apr 12 02:29:05   at 
> java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
> Apr 12 02:29:05   at 
> java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
> Apr 12 02:29:05   at 
> java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
> Apr 12 02:29:05   at 
> java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
> Apr 12 02:29:05 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35339) Compilation timeout while building flink-dist

2024-05-31 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851092#comment-17851092
 ] 

Ryan Skraba commented on FLINK-35339:
-

* 1.20 Default (Java 8) / Test (module: python) 
https://github.com/apache/flink/actions/runs/9315015519/job/25642331343#step:10:14876

> Compilation timeout while building flink-dist
> -
>
> Key: FLINK-35339
> URL: https://issues.apache.org/jira/browse/FLINK-35339
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.19.1
>Reporter: Ryan Skraba
>Priority: Critical
>  Labels: test-stability
>
> * 1.19 Java 17 / Test (module: python) 
> https://github.com/apache/flink/actions/runs/9040330904/job/24844527283#step:10:14325
> The CI pipeline fails with:
> {code}
> May 11 02:44:25 Process exited with EXIT CODE: 143.
> May 11 02:44:25 Trying to KILL watchdog (49546).
> May 11 02:44:25 
> ==
> May 11 02:44:25 Compilation failure detected, skipping test execution.
> May 11 02:44:25 
> ==
> {code}
> It looks like this is due to a failed network connection while building 
> src/assemblies/bin.xml :
> {code}
> May 11 02:44:25java.lang.Thread.State: RUNNABLE
> May 11 02:44:25   at sun.nio.ch.Net.connect0(java.base@17.0.7/Native 
> Method)
> May 11 02:44:25   at sun.nio.ch.Net.connect(java.base@17.0.7/Net.java:579)
> May 11 02:44:25   at sun.nio.ch.Net.connect(java.base@17.0.7/Net.java:568)
> May 11 02:44:25   at 
> sun.nio.ch.NioSocketImpl.connect(java.base@17.0.7/NioSocketImpl.java:588)
> May 11 02:44:25   at 
> java.net.SocksSocketImpl.connect(java.base@17.0.7/SocksSocketImpl.java:327)
> May 11 02:44:25   at 
> java.net.Socket.connect(java.base@17.0.7/Socket.java:633)
> May 11 02:44:25   at 
> org.apache.maven.wagon.providers.http.httpclient.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:368)
> May 11 02:44:25   at 
> org.apache.maven.wagon.providers.http.httpclient.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
> May 11 02:44:25   at 
> org.apache.maven.wagon.providers.http.httpclient.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)
> May 11 02:44:25   at 
> org.apache.maven.wagon.providers.http.httpclient.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
> May 11 02:44:25   at 
> org.apache.maven.wagon.providers.http.httpclient.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
> May 11 02:44:25   at 
> org.apache.maven.wagon.providers.http.httpclient.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
> May 11 02:44:25   at 
> org.apache.maven.wagon.providers.http.httpclient.impl.execchain.RetryExec.execute(RetryExec.java:89)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34464) actions/cache@v4 times out

2024-05-31 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851087#comment-17851087
 ] 

Ryan Skraba commented on FLINK-34464:
-

* 1.20 Default (Java 8) / Test packaging/licensing 
https://github.com/apache/flink/actions/runs/9265488039/job/25489064468#step:4:21641

> actions/cache@v4 times out
> --
>
> Key: FLINK-34464
> URL: https://issues.apache.org/jira/browse/FLINK-34464
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI, Test Infrastructure
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: github-actions, test-stability
>
> [https://github.com/apache/flink/actions/runs/7953599167/job/21710058433#step:4:125]
> Pulling the docker image stalled. This should be a temporary issue:
> {code:java}
> /usr/bin/docker exec  
> 601a5a6e68acf3ba38940ec7a07e08d7c57e763ca0364070124f71bc2f708bc3 sh -c "cat 
> /etc/*release | grep ^ID"
> 120Received 260046848 of 1429155280 (18.2%), 248.0 MBs/sec
> 121Received 545259520 of 1429155280 (38.2%), 260.0 MBs/sec
> [...]
> Received 914358272 of 1429155280 (64.0%), 0.0 MBs/sec
> 21645Received 914358272 of 1429155280 (64.0%), 0.0 MBs/sec
> 21646Error: The operation was canceled. {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34428) WindowAggregateITCase#testEventTimeHopWindow_GroupingSets times out

2024-05-31 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851086#comment-17851086
 ] 

Ryan Skraba commented on FLINK-34428:
-

* 1.18 AdaptiveScheduler / Test (module: table) 
[https://github.com/apache/flink/actions/runs/9295906824/job/25583865489#step:10:17171]

This is the same symptom but occurs on the following tests:
* WindowAggregateITCase.testEventTimeTumbleWindow_Cube
* WindowAggregateITCase.testRelaxFormProctimeCascadeWindowAgg

> WindowAggregateITCase#testEventTimeHopWindow_GroupingSets times out
> ---
>
> Key: FLINK-34428
> URL: https://issues.apache.org/jira/browse/FLINK-34428
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Affects Versions: 1.18.1
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: github-actions, test-stability
>
> https://github.com/apache/flink/actions/runs/7866453368/job/21460921339#step:10:15127
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x7f1770cb7000 nid=0x4ad4d waiting on 
> condition [0x7f17711f6000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xab48e3a0> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
>   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2131)
>   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2099)
>   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2077)
>   at 
> org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:876)
>   at 
> org.apache.flink.table.planner.runtime.stream.sql.WindowAggregateITCase.testTumbleWindowWithoutOutputWindowColumns(WindowAggregateITCase.scala:477)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [...]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34404) GroupWindowAggregateProcTimeRestoreTest#testRestore times out

2024-05-31 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851085#comment-17851085
 ] 

Ryan Skraba commented on FLINK-34404:
-

* 1.19 Hadoop 3.1.3 / Test (module: table) 
https://github.com/apache/flink/actions/runs/9295906585/job/25583895678#step:10:11733
* 1.20 Default (Java 8) / Test (module: table) 
https://github.com/apache/flink/actions/runs/9275522134/job/25520829167#step:10:12142

> GroupWindowAggregateProcTimeRestoreTest#testRestore times out
> -
>
> Key: FLINK-34404
> URL: https://issues.apache.org/jira/browse/FLINK-34404
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.19.0, 1.20.0
>Reporter: Matthias Pohl
>Assignee: Alan Sheinberg
>Priority: Critical
>  Labels: test-stability
> Attachments: FLINK-34404.failure.log, FLINK-34404.success.log
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57357=logs=32715a4c-21b8-59a3-4171-744e5ab107eb=ff64056b-5320-5afe-c22c-6fa339e59586=11603
> {code}
> Feb 07 02:17:40 "ForkJoinPool-74-worker-1" #382 daemon prio=5 os_prio=0 
> cpu=282.22ms elapsed=961.78s tid=0x7f880a485c00 nid=0x6745 waiting on 
> condition  [0x7f878a6f9000]
> Feb 07 02:17:40java.lang.Thread.State: WAITING (parking)
> Feb 07 02:17:40   at 
> jdk.internal.misc.Unsafe.park(java.base@17.0.7/Native Method)
> Feb 07 02:17:40   - parking to wait for  <0xff73d060> (a 
> java.util.concurrent.CompletableFuture$Signaller)
> Feb 07 02:17:40   at 
> java.util.concurrent.locks.LockSupport.park(java.base@17.0.7/LockSupport.java:211)
> Feb 07 02:17:40   at 
> java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.7/CompletableFuture.java:1864)
> Feb 07 02:17:40   at 
> java.util.concurrent.ForkJoinPool.compensatedBlock(java.base@17.0.7/ForkJoinPool.java:3449)
> Feb 07 02:17:40   at 
> java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.7/ForkJoinPool.java:3432)
> Feb 07 02:17:40   at 
> java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.7/CompletableFuture.java:1898)
> Feb 07 02:17:40   at 
> java.util.concurrent.CompletableFuture.get(java.base@17.0.7/CompletableFuture.java:2072)
> Feb 07 02:17:40   at 
> org.apache.flink.table.planner.plan.nodes.exec.testutils.RestoreTestBase.testRestore(RestoreTestBase.java:292)
> Feb 07 02:17:40   at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@17.0.7/Native 
> Method)
> Feb 07 02:17:40   at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@17.0.7/NativeMethodAccessorImpl.java:77)
> Feb 07 02:17:40   at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@17.0.7/DelegatingMethodAccessorImpl.java:43)
> Feb 07 02:17:40   at 
> java.lang.reflect.Method.invoke(java.base@17.0.7/Method.java:568)
> Feb 07 02:17:40   at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
> [...]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-28440) EventTimeWindowCheckpointingITCase failed with restore

2024-05-31 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851083#comment-17851083
 ] 

Ryan Skraba commented on FLINK-28440:
-

* 1.18 Default (Java 8) / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9314898957/job/25640372228#step:10:8136

> EventTimeWindowCheckpointingITCase failed with restore
> --
>
> Key: FLINK-28440
> URL: https://issues.apache.org/jira/browse/FLINK-28440
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / State Backends
>Affects Versions: 1.16.0, 1.17.0, 1.18.0, 1.19.0
>Reporter: Huang Xingbo
>Assignee: Yanfei Lei
>Priority: Critical
>  Labels: auto-deprioritized-critical, pull-request-available, 
> stale-assigned, test-stability
> Fix For: 1.20.0
>
> Attachments: image-2023-02-01-00-51-54-506.png, 
> image-2023-02-01-01-10-01-521.png, image-2023-02-01-01-19-12-182.png, 
> image-2023-02-01-16-47-23-756.png, image-2023-02-01-16-57-43-889.png, 
> image-2023-02-02-10-52-56-599.png, image-2023-02-03-10-09-07-586.png, 
> image-2023-02-03-12-03-16-155.png, image-2023-02-03-12-03-56-614.png
>
>
> {code:java}
> Caused by: java.lang.Exception: Exception while creating 
> StreamOperatorStateContext.
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:256)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:268)
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:722)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:698)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:665)
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:935)
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:904)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:728)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_0a448493b4782967b150582570326227_(2/4) from 
> any of the 1 provided restore options.
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:353)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:165)
>   ... 11 more
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
> /tmp/junit1835099326935900400/junit1113650082510421526/52ee65b7-033f-4429-8ddd-adbe85e27ced
>  (No such file or directory)
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.advance(StateChangelogHandleStreamHandleReader.java:87)
>   at 
> org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.hasNext(StateChangelogHandleStreamHandleReader.java:69)
>   at 
> org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.readBackendHandle(ChangelogBackendRestoreOperation.java:96)
>   at 
> org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.restore(ChangelogBackendRestoreOperation.java:75)
>   at 
> org.apache.flink.state.changelog.ChangelogStateBackend.restore(ChangelogStateBackend.java:92)
>   at 
> org.apache.flink.state.changelog.AbstractChangelogStateBackend.createKeyedStateBackend(AbstractChangelogStateBackend.java:136)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:336)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:16

[jira] [Commented] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP

2024-05-31 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851084#comment-17851084
 ] 

Ryan Skraba commented on FLINK-33186:
-

* 1.20 Default (Java 8) / Test (module: test) 
https://github.com/apache/flink/actions/runs/9304086727/job/25608401253#step:10:8187
* 1.20 Java 11 / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9295906525/job/25583844657#step:10:7960

>  CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished 
> fails on AZP
> -
>
> Key: FLINK-33186
> URL: https://issues.apache.org/jira/browse/FLINK-33186
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.19.0, 1.18.1
>Reporter: Sergey Nuyanzin
>Assignee: Jiang Xin
>Priority: Critical
>  Labels: test-stability
>
> This build 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53509=logs=baf26b34-3c6a-54e8-f93f-cf269b32f802=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9=8762
> fails as
> {noformat}
> Sep 28 01:23:43 Caused by: 
> org.apache.flink.runtime.checkpoint.CheckpointException: Task local 
> checkpoint failure.
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.PendingCheckpoint.abort(PendingCheckpoint.java:550)
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2248)
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2235)
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.lambda$null$9(CheckpointCoordinator.java:817)
> Sep 28 01:23:43   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> Sep 28 01:23:43   at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> Sep 28 01:23:43   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> Sep 28 01:23:43   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> Sep 28 01:23:43   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> Sep 28 01:23:43   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> Sep 28 01:23:43   at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35499) EventTimeWindowCheckpointingITCase times out due to Checkpoint expired before completing

2024-05-31 Thread Ryan Skraba (Jira)
Ryan Skraba created FLINK-35499:
---

 Summary: EventTimeWindowCheckpointingITCase times out due to 
Checkpoint expired before completing
 Key: FLINK-35499
 URL: https://issues.apache.org/jira/browse/FLINK-35499
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.20.0
Reporter: Ryan Skraba


* 1.20 AdaptiveScheduler / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9311892945/job/25632037990#step:10:8702
* 1.20 Default (Java 8) / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9275522134/job/25520829730#step:10:8264

Looking into the logs, we see that the following error occurs:
{code:java}

Test testTumblingTimeWindow[statebackend type =ROCKSDB_INCREMENTAL, 
buffersPerChannel = 
2](org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase) is 
running.

<...>
20:24:23,562 [Checkpoint Timer] INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - Triggering 
checkpoint 22 (type=CheckpointType{name='Checkpoint', 
sharingFilesStrategy=FORWARD_BACKWARD}) @ 1716927863562 for job 
15d0a663cb415b09b9a68ccc40640c6d.
20:24:23,609 [jobmanager-io-thread-2] INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - Completed 
checkpoint 22 for job 15d0a663cb415b09b9a68ccc40640c6d (2349132 bytes, 
checkpointDuration=43 ms, finalizationTime=4 ms).
20:24:23,610 [Checkpoint Timer] INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - Triggering 
checkpoint 23 (type=CheckpointType{name='Checkpoint', 
sharingFilesStrategy=FORWARD_BACKWARD}) @ 1716927863610 for job 
15d0a663cb415b09b9a68ccc40640c6d.
20:24:23,620 [jobmanager-io-thread-2] WARN  
org.apache.flink.runtime.jobmaster.JobMaster [] - Error while 
processing AcknowledgeCheckpoint message
java.lang.IllegalStateException: Attempt to reference unknown state: 
a9a90973-4ee5-384f-acef-58a7c7560920
at 
org.apache.flink.util.Preconditions.checkState(Preconditions.java:193) 
~[flink-core-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
at 
org.apache.flink.runtime.state.SharedStateRegistryImpl.registerReference(SharedStateRegistryImpl.java:97)
 ~[flink-runtime-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
at 
org.apache.flink.runtime.state.SharedStateRegistry.registerReference(SharedStateRegistry.java:53)
 ~[flink-runtime-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
at 
org.apache.flink.runtime.state.IncrementalRemoteKeyedStateHandle.registerSharedStates(IncrementalRemoteKeyedStateHandle.java:289)
 ~[flink-runtime-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
at 
org.apache.flink.runtime.checkpoint.OperatorSubtaskState.registerSharedState(OperatorSubtaskState.java:243)
 ~[flink-runtime-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
at 
org.apache.flink.runtime.checkpoint.OperatorSubtaskState.registerSharedStates(OperatorSubtaskState.java:226)
 ~[flink-runtime-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
at 
org.apache.flink.runtime.checkpoint.TaskStateSnapshot.registerSharedStates(TaskStateSnapshot.java:193)
 ~[flink-runtime-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveAcknowledgeMessage(CheckpointCoordinator.java:1245)
 ~[flink-runtime-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
at 
org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$acknowledgeCheckpoint$2(ExecutionGraphHandler.java:109)
 ~[flink-runtime-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
at 
org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$processCheckpointCoordinatorMessage$4(ExecutionGraphHandler.java:139)
 ~[flink-runtime-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
at 
org.apache.flink.util.MdcUtils.lambda$wrapRunnable$1(MdcUtils.java:64) 
~[flink-core-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_392]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_392]
at java.lang.Thread.run(Thread.java:750) [?:1.8.0_392]
20:24:23,663 [Source: Custom Source (1/1)#1] INFO  
org.apache.flink.runtime.taskmanager.Task[] - Source: 
Custom Source (1/1)#1 
(bc4de0d149fba0ca825771ff7eeae08d_bc764cd8ddf7a0cff126f51c16239658_0_1) 
switched from RUNNING to FINISHED.
20:24:23,663 [Source: Custom Source (1/1)#1] INFO  
org.apache.flink.runtime.taskmanager.Task[] - Freeing task 
resources for Source: Custom Source (1/1)#1 
(bc4de0d149fba0ca825771ff7eeae08d_bc764cd8ddf7a0cff126f51c16239658_0_1).
20:24:23,663 [flink-pekko.actor.default-dispatcher-8] INFO  
org.apache.flink.runtime.taskexecutor.TaskExecutor   [] - 
Un-registering task and sending final execution state FINISHED to JobManager 
fo

[jira] [Commented] (FLINK-35192) Kubernetes operator oom

2024-05-31 Thread chenyuzhi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851055#comment-17851055
 ] 

chenyuzhi commented on FLINK-35192:
---

I've picked up the relevant PR and have been running it in my local environment for 
more than a week without any further OOM issues. Thus I think this Jira has been 
solved!
 
[~gaborgsomogyi] 
 
cc [~gyfora] [~bgeng777] 

> Kubernetes operator oom
> ---
>
> Key: FLINK-35192
> URL: https://issues.apache.org/jira/browse/FLINK-35192
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.6.1
> Environment: jdk: openjdk11
> operator version: 1.6.1
>Reporter: chenyuzhi
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-1.9.0
>
> Attachments: image-2024-04-22-15-47-49-455.png, 
> image-2024-04-22-15-52-51-600.png, image-2024-04-22-15-58-23-269.png, 
> image-2024-04-22-15-58-42-850.png, image-2024-04-30-16-47-07-289.png, 
> image-2024-04-30-17-11-24-974.png, image-2024-04-30-20-38-25-195.png, 
> image-2024-04-30-20-39-05-109.png, image-2024-04-30-20-39-34-396.png, 
> image-2024-04-30-20-41-51-660.png, image-2024-04-30-20-43-20-125.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>
> The Kubernetes operator docker process was killed by the kernel due to out of 
> memory (at 2024.04.03 18:16)
>  !image-2024-04-22-15-47-49-455.png! 
> Metrics:
> The pod memory (RSS) has been increasing slowly over the past 7 days:
>  !screenshot-1.png! 
> However, the JVM memory metrics of the operator show no obvious anomaly:
>  !image-2024-04-22-15-58-23-269.png! 
>  !image-2024-04-22-15-58-42-850.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-35498) Unexpected argument name conflict error when do extract method params from udf

2024-05-31 Thread lincoln lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lincoln lee updated FLINK-35498:

Description: 
Follow the steps to reproduce the error:

test case:
{code:java}
util.addTemporarySystemFunction("myudf", new TestXyz)
util.tableEnv.explainSql("select myudf(f1, f2) from t")
{code}
 

udf: TestXyz 
{code:java}
public class TestXyz extends ScalarFunction {
public String eval(String s1, String s2) {

// will not fail if initialization is added
String localV1;

if (s1 == null) {
if (s2 != null) {
localV1 = s2;
} else {
localV1 = s2 + s1;
}
} else {
if ("xx".equals(s2)) {
localV1 = s1.length() >= s2.length() ? s1 : s2;
} else {
localV1 = s1;
}
}
if (s1 == null) {
return s2 + localV1;
}
if (s2 == null) {
return s1;
}
return s1.length() >= s2.length() ? s1 + localV1 : s2;
}
}
{code}
 

error stack:
{code:java}
Caused by: org.apache.flink.table.api.ValidationException: Unable to extract a 
type inference from method:
public java.lang.String 
org.apache.flink.table.planner.runtime.utils.TestXyz.eval(java.lang.String,java.lang.String)
    at 
org.apache.flink.table.types.extraction.ExtractionUtils.extractionError(ExtractionUtils.java:362)
    at 
org.apache.flink.table.types.extraction.BaseMappingExtractor.extractResultMappings(BaseMappingExtractor.java:154)
    at 
org.apache.flink.table.types.extraction.BaseMappingExtractor.extractOutputMapping(BaseMappingExtractor.java:100)
    ... 53 more
Caused by: org.apache.flink.table.api.ValidationException: Argument name 
conflict, there are at least two argument names that are the same.
    at 
org.apache.flink.table.types.extraction.ExtractionUtils.extractionError(ExtractionUtils.java:362)
    at 
org.apache.flink.table.types.extraction.ExtractionUtils.extractionError(ExtractionUtils.java:357)
    at 
org.apache.flink.table.types.extraction.FunctionSignatureTemplate.of(FunctionSignatureTemplate.java:73)
    at 
org.apache.flink.table.types.extraction.BaseMappingExtractor.lambda$createParameterSignatureExtraction$9(BaseMappingExtractor.java:381)
    at 
org.apache.flink.table.types.extraction.BaseMappingExtractor.putExtractedResultMappings(BaseMappingExtractor.java:298)
    at 
org.apache.flink.table.types.extraction.BaseMappingExtractor.collectMethodMappings(BaseMappingExtractor.java:244)
    at 
org.apache.flink.table.types.extraction.BaseMappingExtractor.extractResultMappings(BaseMappingExtractor.java:137)
    ... 54 more

{code}
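
For illustration, a minimal sketch of the workaround hinted at by the code comment above (the rest of the eval logic is abbreviated; this is only the reproduction case with the local variable initialized, not the actual fix):
{code:java}
import org.apache.flink.table.functions.ScalarFunction;

public class TestXyz extends ScalarFunction {
    public String eval(String s1, String s2) {
        // Initializing the local variable up front avoids the
        // "Argument name conflict" error during type extraction.
        String localV1 = null;
        // ... the remaining branching logic stays as in the reproduction case ...
        return localV1;
    }
}
{code}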

  was:
Follow the steps to reproduce the error:

test case:

{code}
util.addTemporarySystemFunction("myudf", new TestXyz)
util.tableEnv.explainSql("select myudf(f1, f2) from t")
{code}

 

udf: TestXyz 

{code}
public class TestXyz extends ScalarFunction {
public String eval(String s1, String s2) {
String localV1;

if (s1 == null) {
if (s2 != null) {
localV1 = s2;
} else {
localV1 = s2 + s1;
}
} else {
if ("xx".equals(s2)) {
localV1 = s1.length() >= s2.length() ? s1 : s2;
} else {
localV1 = s1;
}
}
if (s1 == null) {
return s2 + localV1;
}
if (s2 == null) {
return s1;
}
return s1.length() >= s2.length() ? s1 + localV1 : s2;
}
}
{code}

 

error stack:

{code}

Caused by: org.apache.flink.table.api.ValidationException: Unable to extract a 
type inference from method:
public java.lang.String 
org.apache.flink.table.planner.runtime.utils.TestXyz.eval(java.lang.String,java.lang.String)
    at 
org.apache.flink.table.types.extraction.ExtractionUtils.extractionError(ExtractionUtils.java:362)
    at 
org.apache.flink.table.types.extraction.BaseMappingExtractor.extractResultMappings(BaseMappingExtractor.java:154)
    at 
org.apache.flink.table.types.extraction.BaseMappingExtractor.extractOutputMapping(BaseMappingExtractor.java:100)
    ... 53 more
Caused by: org.apache.flink.table.api.ValidationException: Argument name 
conflict, there are at least two argument names that are the same.
    at 
org.apache.flink.table.types.extraction.ExtractionUtils.extractionError(ExtractionUtils.java:362)
    at 
org.apache.flink.table.types.extraction.ExtractionUtils.extractionError(ExtractionUtils.java:357)
    at 
org.apache.flink.table.types.extraction.FunctionSignatureTemplate.of(FunctionSignatureTemplate.java:73)
    at 
org.apache.flink.table.types.extraction.BaseMappingExtractor.lambda$createParameterSignatureExtraction$9(BaseMappingExtractor.java:381)
    at 
org.apache.flink.table.types.extraction.BaseMappingExtractor.putExtractedResultMappings(BaseMappingExtractor.java:298)
    at 
org.apache.flink.table.types.extraction.BaseMappingExtractor.collectMethodMappings(BaseMappingExtractor.java:244)
    at 
org.apache.flink.table.types.extraction.BaseMappingExtractor.extractResultMappings(BaseMappingExtractor.java:137)
    ... 54 more

{code}


> Unexpected argument name conflict error when do extract method params from udf
> -

[jira] [Created] (FLINK-35498) Unexpected argument name conflict error when do extract method params from udf

2024-05-31 Thread lincoln lee (Jira)
lincoln lee created FLINK-35498:
---

 Summary: Unexpected argument name conflict error when do extract 
method params from udf
 Key: FLINK-35498
 URL: https://issues.apache.org/jira/browse/FLINK-35498
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.19.0, 1.20.0
Reporter: lincoln lee
Assignee: xuyang


Follow the steps to reproduce the error:

test case:

{code}
util.addTemporarySystemFunction("myudf", new TestXyz)
util.tableEnv.explainSql("select myudf(f1, f2) from t")
{code}

 

udf: TestXyz 

{code}
public class TestXyz extends ScalarFunction {
public String eval(String s1, String s2) {
String localV1;

if (s1 == null) {
if (s2 != null) {
localV1 = s2;
} else {
localV1 = s2 + s1;
}
} else {
if ("xx".equals(s2)) {
localV1 = s1.length() >= s2.length() ? s1 : s2;
} else {
localV1 = s1;
}
}
if (s1 == null) {
return s2 + localV1;
}
if (s2 == null) {
return s1;
}
return s1.length() >= s2.length() ? s1 + localV1 : s2;
}
}
{code}

 

error stack:

{code}

Caused by: org.apache.flink.table.api.ValidationException: Unable to extract a 
type inference from method:
public java.lang.String 
org.apache.flink.table.planner.runtime.utils.TestXyz.eval(java.lang.String,java.lang.String)
    at 
org.apache.flink.table.types.extraction.ExtractionUtils.extractionError(ExtractionUtils.java:362)
    at 
org.apache.flink.table.types.extraction.BaseMappingExtractor.extractResultMappings(BaseMappingExtractor.java:154)
    at 
org.apache.flink.table.types.extraction.BaseMappingExtractor.extractOutputMapping(BaseMappingExtractor.java:100)
    ... 53 more
Caused by: org.apache.flink.table.api.ValidationException: Argument name 
conflict, there are at least two argument names that are the same.
    at 
org.apache.flink.table.types.extraction.ExtractionUtils.extractionError(ExtractionUtils.java:362)
    at 
org.apache.flink.table.types.extraction.ExtractionUtils.extractionError(ExtractionUtils.java:357)
    at 
org.apache.flink.table.types.extraction.FunctionSignatureTemplate.of(FunctionSignatureTemplate.java:73)
    at 
org.apache.flink.table.types.extraction.BaseMappingExtractor.lambda$createParameterSignatureExtraction$9(BaseMappingExtractor.java:381)
    at 
org.apache.flink.table.types.extraction.BaseMappingExtractor.putExtractedResultMappings(BaseMappingExtractor.java:298)
    at 
org.apache.flink.table.types.extraction.BaseMappingExtractor.collectMethodMappings(BaseMappingExtractor.java:244)
    at 
org.apache.flink.table.types.extraction.BaseMappingExtractor.extractResultMappings(BaseMappingExtractor.java:137)
    ... 54 more

{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-35333) JdbcXaSinkTestBase fails in weekly Flink JDBC Connector tests

2024-05-31 Thread Sergey Nuyanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Nuyanzin resolved FLINK-35333.
-
Fix Version/s: jdbc-3.2.0
   Resolution: Fixed

> JdbcXaSinkTestBase fails in weekly Flink JDBC Connector tests
> -
>
> Key: FLINK-35333
> URL: https://issues.apache.org/jira/browse/FLINK-35333
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / JDBC
>Affects Versions: jdbc-3.2.0
>Reporter: Martijn Visser
>Assignee: João Boto
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: jdbc-3.2.0
>
>
> https://github.com/apache/flink-connector-jdbc/actions/runs/9047366679/job/24859224407#step:15:147
> {code:java}
> Error:  Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.0:testCompile 
> (default-testCompile) on project flink-connector-jdbc: Compilation failure
> Error:  
> /home/runner/work/flink-connector-jdbc/flink-connector-jdbc/flink-connector-jdbc/src/test/java/org/apache/flink/connector/jdbc/xa/JdbcXaSinkTestBase.java:[164,37]
>   is not 
> abstract and does not override abstract method getTaskInfo() in 
> org.apache.flink.api.common.functions.RuntimeContext
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35333) JdbcXaSinkTestBase fails in weekly Flink JDBC Connector tests

2024-05-31 Thread Sergey Nuyanzin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851038#comment-17851038
 ] 

Sergey Nuyanzin commented on FLINK-35333:
-

The weekly run passed:
https://github.com/apache/flink-connector-jdbc/actions/runs/9316249855

> JdbcXaSinkTestBase fails in weekly Flink JDBC Connector tests
> -
>
> Key: FLINK-35333
> URL: https://issues.apache.org/jira/browse/FLINK-35333
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / JDBC
>Affects Versions: jdbc-3.2.0
>Reporter: Martijn Visser
>Assignee: João Boto
>Priority: Blocker
>  Labels: pull-request-available, test-stability
>
> https://github.com/apache/flink-connector-jdbc/actions/runs/9047366679/job/24859224407#step:15:147
> {code:java}
> Error:  Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.0:testCompile 
> (default-testCompile) on project flink-connector-jdbc: Compilation failure
> Error:  
> /home/runner/work/flink-connector-jdbc/flink-connector-jdbc/flink-connector-jdbc/src/test/java/org/apache/flink/connector/jdbc/xa/JdbcXaSinkTestBase.java:[164,37]
>   is not 
> abstract and does not override abstract method getTaskInfo() in 
> org.apache.flink.api.common.functions.RuntimeContext
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-35418) EventTimeWindowCheckpointingITCase fails with an NPE

2024-05-31 Thread Ryan Skraba (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Skraba closed FLINK-35418.
---
Resolution: Duplicate

> EventTimeWindowCheckpointingITCase fails with an NPE
> 
>
> Key: FLINK-35418
> URL: https://issues.apache.org/jira/browse/FLINK-35418
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.20.0
>Reporter: Ryan Skraba
>Priority: Critical
>  Labels: test-stability
>
> * 1.20 Default (Java 8) / Test (module: tests) 
> [https://github.com/apache/flink/actions/runs/9185169193/job/25258948607#step:10:8106]
> It looks like it's possible for PhysicalFile to generate a 
> NullPointerException while a checkpoint is being aborted:
> {code}
> May 22 04:35:18 Starting 
> org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase#testTumblingTimeWindow[statebackend
>  type =ROCKSDB_INCREMENTAL_ZK, buffersPerChannel = 2].
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>   at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
>   at 
> org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
>   at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
>   at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>   at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>   at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>   at 
> org.apache.flink.runtime.rpc.pekko.PekkoInvocationHandler.lambda$invokeRpc$1(PekkoInvocationHandler.java:268)
>   at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>   at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>   at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>   at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>   at 
> org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1287)
>   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$null$1(ClassLoadingUtils.java:93)
>   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
>   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
>   at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>   at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>   at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>   at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>   at 
> org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$1.onComplete(ScalaFutureUtils.java:47)
>   at org.apache.pekko.dispatch.OnComplete.internal(Future.scala:310)
>   at org.apache.pekko.dispatch.OnComplete.internal(Future.scala:307)
>   at org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:234)
>   at org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:231)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
>   at 
> org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$DirectExecutionContext.execute(ScalaFutureUtils.java:65)
>   at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:288)
>   at org.apache.pekko.pattern.PromiseActorRef.$bang(AskSupport.scala:629)
>   at 
> org.apache.pekko.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:34)
>   at 
> org.apache.pekko.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:33)
>   at scala.concurrent.Future.$anonfun$andThen$1(Future.scala:536)
>   at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
>   at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
>   at sca

[jira] [Updated] (FLINK-35475) Introduce isInternalSorterSupport to OperatorAttributes

2024-05-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-35475:
---
Labels: pull-request-available  (was: )

> Introduce isInternalSorterSupport to OperatorAttributes
> ---
>
> Key: FLINK-35475
> URL: https://issues.apache.org/jira/browse/FLINK-35475
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Task
>Reporter: Xuannan Su
>Priority: Major
>  Labels: pull-request-available
>
> Introduce isInternalSorterSupport to OperatorAttributes to notify Flink 
> whether the operator will sort the data internally in batch mode or during 
> backlog.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35496) The annotations of the new JDBC connector should be changed to non-Public/non-PublicEvolving

2024-05-31 Thread Rui Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan reassigned FLINK-35496:
---

Assignee: João Boto

> The annotations of the new JDBC connector should be changed to 
> non-Public/non-PublicEvolving
> 
>
> Key: FLINK-35496
> URL: https://issues.apache.org/jira/browse/FLINK-35496
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / JDBC
>Reporter: RocMarshal
>Assignee: João Boto
>Priority: Major
>  Labels: pull-request-available
>
> In general, we use the {{Experimental}} annotation instead of {{PublicEvolving}} 
> or {{Public}} for new features or new APIs. But {{JdbcSink}} and 
> {{JdbcSource}} (merged) were marked as {{PublicEvolving}} in the first version. 
> [~fanrui] commented on this in the original PR[1].
> [1] [https://github.com/apache/flink-connector-jdbc/pull/2#discussion_r1621857589]
> CC [~eskabetxe] [~Sergey Nuyanzin] [~fanrui] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35496) The annotations of the new JDBC connector should be changed to non-Public/non-PublicEvolving

2024-05-31 Thread Rui Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan reassigned FLINK-35496:
---

Assignee: (was: RocMarshal)

> The annotations of the new JDBC connector should be changed to 
> non-Public/non-PublicEvolving
> 
>
> Key: FLINK-35496
> URL: https://issues.apache.org/jira/browse/FLINK-35496
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / JDBC
>Reporter: RocMarshal
>Priority: Major
>  Labels: pull-request-available
>
> In general, we use the {{Experimental}} annotation instead of {{PublicEvolving}} 
> or {{Public}} for new features or new APIs. But {{JdbcSink}} and 
> {{JdbcSource}} (merged) were marked as {{PublicEvolving}} in the first version. 
> [~fanrui] commented on this in the original PR[1].
> [1] [https://github.com/apache/flink-connector-jdbc/pull/2#discussion_r1621857589]
> CC [~eskabetxe] [~Sergey Nuyanzin] [~fanrui] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35333) JdbcXaSinkTestBase fails in weekly Flink JDBC Connector tests

2024-05-31 Thread Sergey Nuyanzin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851014#comment-17851014
 ] 

Sergey Nuyanzin commented on FLINK-35333:
-

I've started the weekly job to see whether it passes or not:
https://github.com/apache/flink-connector-jdbc/actions/runs/9315890964

In case it passes, we can close the issue as fixed.

> JdbcXaSinkTestBase fails in weekly Flink JDBC Connector tests
> -
>
> Key: FLINK-35333
> URL: https://issues.apache.org/jira/browse/FLINK-35333
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / JDBC
>Affects Versions: jdbc-3.2.0
>Reporter: Martijn Visser
>Assignee: João Boto
>Priority: Blocker
>  Labels: pull-request-available, test-stability
>
> https://github.com/apache/flink-connector-jdbc/actions/runs/9047366679/job/24859224407#step:15:147
> {code:java}
> Error:  Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.0:testCompile 
> (default-testCompile) on project flink-connector-jdbc: Compilation failure
> Error:  
> /home/runner/work/flink-connector-jdbc/flink-connector-jdbc/flink-connector-jdbc/src/test/java/org/apache/flink/connector/jdbc/xa/JdbcXaSinkTestBase.java:[164,37]
>   is not 
> abstract and does not override abstract method getTaskInfo() in 
> org.apache.flink.api.common.functions.RuntimeContext
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35333) JdbcXaSinkTestBase fails in weekly Flink JDBC Connector tests

2024-05-31 Thread Sergey Nuyanzin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851011#comment-17851011
 ] 

Sergey Nuyanzin commented on FLINK-35333:
-

Thanks for working on it [~eskabetxe]
Merged as 
[c80f95c5461464b5cf7613602e1bbd097ff418d8|https://github.com/apache/flink-connector-jdbc/commit/c80f95c5461464b5cf7613602e1bbd097ff418d8]

> JdbcXaSinkTestBase fails in weekly Flink JDBC Connector tests
> -
>
> Key: FLINK-35333
> URL: https://issues.apache.org/jira/browse/FLINK-35333
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / JDBC
>Affects Versions: jdbc-3.2.0
>Reporter: Martijn Visser
>Assignee: João Boto
>Priority: Blocker
>  Labels: pull-request-available, test-stability
>
> https://github.com/apache/flink-connector-jdbc/actions/runs/9047366679/job/24859224407#step:15:147
> {code:java}
> Error:  Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.0:testCompile 
> (default-testCompile) on project flink-connector-jdbc: Compilation failure
> Error:  
> /home/runner/work/flink-connector-jdbc/flink-connector-jdbc/flink-connector-jdbc/src/test/java/org/apache/flink/connector/jdbc/xa/JdbcXaSinkTestBase.java:[164,37]
>   is not 
> abstract and does not override abstract method getTaskInfo() in 
> org.apache.flink.api.common.functions.RuntimeContext
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35333) JdbcXaSinkTestBase fails in weekly Flink JDBC Connector tests

2024-05-31 Thread Sergey Nuyanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Nuyanzin reassigned FLINK-35333:
---

Assignee: João Boto

> JdbcXaSinkTestBase fails in weekly Flink JDBC Connector tests
> -
>
> Key: FLINK-35333
> URL: https://issues.apache.org/jira/browse/FLINK-35333
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / JDBC
>Affects Versions: jdbc-3.2.0
>Reporter: Martijn Visser
>Assignee: João Boto
>Priority: Blocker
>  Labels: pull-request-available, test-stability
>
> https://github.com/apache/flink-connector-jdbc/actions/runs/9047366679/job/24859224407#step:15:147
> {code:java}
> Error:  Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.0:testCompile 
> (default-testCompile) on project flink-connector-jdbc: Compilation failure
> Error:  
> /home/runner/work/flink-connector-jdbc/flink-connector-jdbc/flink-connector-jdbc/src/test/java/org/apache/flink/connector/jdbc/xa/JdbcXaSinkTestBase.java:[164,37]
>   is not 
> abstract and does not override abstract method getTaskInfo() in 
> org.apache.flink.api.common.functions.RuntimeContext
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32081) Compatibility between file-merging on and off across job runs

2024-05-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-32081:
---
Labels: pull-request-available  (was: )

> Compatibility between file-merging on and off across job runs
> -
>
> Key: FLINK-32081
> URL: https://issues.apache.org/jira/browse/FLINK-32081
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Checkpointing, Runtime / State Backends
>Affects Versions: 1.18.0
>Reporter: Zakelly Lan
>Assignee: Jinzhong Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35497) The wrong enum value was used to get month in timestampDiff

2024-05-31 Thread haishui (Jira)
haishui created FLINK-35497:
---

 Summary: The wrong enum value was used to get month in 
timestampDiff
 Key: FLINK-35497
 URL: https://issues.apache.org/jira/browse/FLINK-35497
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Affects Versions: cdc-3.1.0
Reporter: haishui


In 
[SystemFunctionUtils#timestampDiff|https://github.com/apache/flink-cdc/blob/master/flink-cdc-runtime/src/main/java/org/apache/flink/cdc/runtime/functions/SystemFunctionUtils.java#L125]:
 

{code:java}
case "MONTH":
return to.get(Calendar.YEAR) * 12
+ to.get(Calendar.MONDAY)
- (from.get(Calendar.YEAR) * 12 + from.get(Calendar.MONDAY)); {code}
Calendar.MONDAY can be replaced with Calendar.MONTH.

This does not affect the calculation results, because Calendar.MONDAY = 
Calendar.MONTH = 2.
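
For clarity, a sketch of the corrected branch (assuming java.util.Calendar and the surrounding switch statement stay unchanged):
{code:java}
case "MONTH":
    // Use Calendar.MONTH (the month field) rather than Calendar.MONDAY (a weekday constant).
    return to.get(Calendar.YEAR) * 12
            + to.get(Calendar.MONTH)
            - (from.get(Calendar.YEAR) * 12 + from.get(Calendar.MONTH));
{code}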

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-35496) The annotations of the new JDBC connector should be changed to non-Public/non-PublicEvolving

2024-05-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-35496:
---
Labels: pull-request-available  (was: )

> The annotations of the new JDBC connector should be changed to 
> non-Public/non-PublicEvolving
> 
>
> Key: FLINK-35496
> URL: https://issues.apache.org/jira/browse/FLINK-35496
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / JDBC
>Reporter: RocMarshal
>Assignee: RocMarshal
>Priority: Major
>  Labels: pull-request-available
>
> In general, we use the {{Experimental}} annotation instead of {{PublicEvolving}} 
> or {{Public}} for new features or new APIs. But {{JdbcSink}} and 
> {{JdbcSource}} (merged) were marked as {{PublicEvolving}} in the first version. 
> [~fanrui] commented on this in the original PR[1].
> [1] [https://github.com/apache/flink-connector-jdbc/pull/2#discussion_r1621857589]
> CC [~eskabetxe] [~Sergey Nuyanzin] [~fanrui] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35494) Reorganize sources

2024-05-31 Thread Leonard Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Xu reassigned FLINK-35494:
--

Assignee: João Boto

> Reorganize sources
> --
>
> Key: FLINK-35494
> URL: https://issues.apache.org/jira/browse/FLINK-35494
> Project: Flink
>  Issue Type: Sub-task
>Reporter: João Boto
>Assignee: João Boto
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35370) Create a temp module to test backward compatibility

2024-05-31 Thread Leonard Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Xu reassigned FLINK-35370:
--

Assignee: João Boto

> Create a temp module to test backward compatibility
> ---
>
> Key: FLINK-35370
> URL: https://issues.apache.org/jira/browse/FLINK-35370
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / JDBC
>Reporter: João Boto
>Assignee: João Boto
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35368) Reorganize table code

2024-05-31 Thread Leonard Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Xu reassigned FLINK-35368:
--

Assignee: João Boto

> Reorganize table code
> -
>
> Key: FLINK-35368
> URL: https://issues.apache.org/jira/browse/FLINK-35368
> Project: Flink
>  Issue Type: Sub-task
>Reporter: João Boto
>Assignee: João Boto
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35367) Reorganize sinks

2024-05-31 Thread Leonard Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Xu reassigned FLINK-35367:
--

Assignee: João Boto

> Reorganize sinks
> 
>
> Key: FLINK-35367
> URL: https://issues.apache.org/jira/browse/FLINK-35367
> Project: Flink
>  Issue Type: Sub-task
>Reporter: João Boto
>Assignee: João Boto
>Priority: Major
>
> Reorganize datastream sink and source



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35364) Create core module and move code

2024-05-31 Thread Leonard Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Xu reassigned FLINK-35364:
--

Assignee: João Boto

> Create core module and move code
> 
>
> Key: FLINK-35364
> URL: https://issues.apache.org/jira/browse/FLINK-35364
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / JDBC
>Reporter: João Boto
>Assignee: João Boto
>Priority: Major
>
> * create core module
> * move all code to this new module as is
> * transform flink-connector-jdbc into a shaded module



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35366) Create all database modules

2024-05-31 Thread Leonard Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Xu reassigned FLINK-35366:
--

Assignee: João Boto

> Create all database modules
> ---
>
> Key: FLINK-35366
> URL: https://issues.apache.org/jira/browse/FLINK-35366
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / JDBC
>Reporter: João Boto
>Assignee: João Boto
>Priority: Major
>
> Create all database modules and move related code there



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35363) FLIP-449: Reorganization of flink-connector-jdbc

2024-05-31 Thread Leonard Xu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851001#comment-17851001
 ] 

Leonard Xu commented on FLINK-35363:


Sure!

> FLIP-449: Reorganization of flink-connector-jdbc
> 
>
> Key: FLINK-35363
> URL: https://issues.apache.org/jira/browse/FLINK-35363
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / JDBC
>Reporter: João Boto
>Assignee: João Boto
>Priority: Major
>  Labels: pull-request-available
>
> Described in: 
> [FLIP-449|https://cwiki.apache.org/confluence/display/FLINK/FLIP-449%3A+Reorganization+of+flink-connector-jdbc]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35365) Reorganize catalog and dialect code

2024-05-31 Thread Leonard Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Xu reassigned FLINK-35365:
--

Assignee: João Boto

> Reorganize catalog and dialect code
> ---
>
> Key: FLINK-35365
> URL: https://issues.apache.org/jira/browse/FLINK-35365
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / JDBC
>Reporter: João Boto
>Assignee: João Boto
>Priority: Major
>
> Reorganize code for catalog and dialect



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35496) The annotations of the new JDBC connector should be changed to non-Public/non-PublicEvolving

2024-05-31 Thread Rui Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan reassigned FLINK-35496:
---

Assignee: RocMarshal

> The annotations of the new JDBC connector should be changed to 
> non-Public/non-PublicEvolving
> 
>
> Key: FLINK-35496
> URL: https://issues.apache.org/jira/browse/FLINK-35496
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / JDBC
>Reporter: RocMarshal
>Assignee: RocMarshal
>Priority: Major
>
> In general, we use the {{Experimental}} annotation instead of {{PublicEvolving}} 
> or {{Public}} for new features or new APIs. But {{JdbcSink}} and 
> {{JdbcSource}} (merged) were marked as {{PublicEvolving}} in the first version. 
> [~fanrui] commented on this in the original PR[1].
> [1] [https://github.com/apache/flink-connector-jdbc/pull/2#discussion_r1621857589]
> CC [~eskabetxe] [~Sergey Nuyanzin] [~fanrui] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35496) The annotations of the new JDBC connector should be changed to non-Public/non-PublicEvolving

2024-05-31 Thread RocMarshal (Jira)
RocMarshal created FLINK-35496:
--

 Summary: The annotations of the new JDBC connector should be 
changed to non-Public/non-PublicEvolving
 Key: FLINK-35496
 URL: https://issues.apache.org/jira/browse/FLINK-35496
 Project: Flink
  Issue Type: Technical Debt
  Components: Connectors / JDBC
Reporter: RocMarshal


In general, we use the {{Experimental}} annotation instead of {{PublicEvolving}} 
or {{Public}} for new features or new APIs. But {{JdbcSink}} and 
{{JdbcSource}} (merged) were marked as {{PublicEvolving}} in the first version. 
[~fanrui] commented on this in the original PR[1].
[1] [https://github.com/apache/flink-connector-jdbc/pull/2#discussion_r1621857589]

CC [~eskabetxe] [~Sergey Nuyanzin] [~fanrui] 
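
For illustration, a hedged sketch of what the proposed change could look like on the new sink class (signature abbreviated; the exact class shape and placement are assumptions, not the final patch):
{code:java}
import org.apache.flink.annotation.Experimental;

/** Sketch only: annotate the new API as Experimental instead of PublicEvolving. */
@Experimental
public class JdbcSink {
    // ... existing implementation unchanged ...
}
{code}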



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35363) FLIP-449: Reorganization of flink-connector-jdbc

2024-05-31 Thread Jira


[ 
https://issues.apache.org/jira/browse/FLINK-35363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850967#comment-17850967
 ] 

João Boto commented on FLINK-35363:
---

Thanks [~leonard]!
Could you please also assign the sub-tasks?

> FLIP-449: Reorganization of flink-connector-jdbc
> 
>
> Key: FLINK-35363
> URL: https://issues.apache.org/jira/browse/FLINK-35363
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / JDBC
>Reporter: João Boto
>Assignee: João Boto
>Priority: Major
>  Labels: pull-request-available
>
> Described in: 
> [FLIP-449|https://cwiki.apache.org/confluence/display/FLINK/FLIP-449%3A+Reorganization+of+flink-connector-jdbc]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-30719) flink-runtime-web failed due to a corrupted nodejs dependency

2024-05-31 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850965#comment-17850965
 ] 

Piotr Nowojski edited comment on FLINK-30719 at 5/31/24 8:11 AM:
-

{code:java}
14:47:52.892 [ERROR] Failed to execute goal 
com.github.eirslett:frontend-maven-plugin:1.11.0:install-node-and-npm (install 
node and npm) on project flink-runtime-web: Could not extract the Node archive: 
Could not extract archive: 
'/__w/3/.m2/repository/com/github/eirslett/node/16.13.2/node-16.13.2-linux-x64.tar.gz':
 EOFException -> [Help 1]
1
{code}
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59951=logs=52b61abe-a3cc-5bde-cc35-1bbe89bb7df5=54421a62-0c80-5aad-3319-094ff69180bb



was (Author: pnowojski):
{noformat}
14:47:52.892 [ERROR] Failed to execute goal 
com.github.eirslett:frontend-maven-plugin:1.11.0:install-node-and-npm (install 
node and npm) on project flink-runtime-web: Could not extract the Node archive: 
Could not extract archive: 
'/__w/3/.m2/repository/com/github/eirslett/node/16.13.2/node-16.13.2-linux-x64.tar.gz':
 EOFException -> [Help 1]
1
{noformat}
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59951=logs=52b61abe-a3cc-5bde-cc35-1bbe89bb7df5=54421a62-0c80-5aad-3319-094ff69180bb


> flink-runtime-web failed due to a corrupted nodejs dependency
> -
>
> Key: FLINK-30719
> URL: https://issues.apache.org/jira/browse/FLINK-30719
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend, Test Infrastructure, Tests
>Affects Versions: 1.16.0, 1.17.0, 1.18.0
>Reporter: Matthias Pohl
>Assignee: Sergey Nuyanzin
>Priority: Critical
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44954=logs=52b61abe-a3cc-5bde-cc35-1bbe89bb7df5=54421a62-0c80-5aad-3319-094ff69180bb=12550
> The build failed due to a corrupted nodejs dependency:
> {code}
> [ERROR] The archive file 
> /__w/1/.m2/repository/com/github/eirslett/node/16.13.2/node-16.13.2-linux-x64.tar.gz
>  is corrupted and will be deleted. Please try the build again.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30719) flink-runtime-web failed due to a corrupted nodejs dependency

2024-05-31 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850965#comment-17850965
 ] 

Piotr Nowojski commented on FLINK-30719:


{noformat}
14:47:52.892 [ERROR] Failed to execute goal 
com.github.eirslett:frontend-maven-plugin:1.11.0:install-node-and-npm (install 
node and npm) on project flink-runtime-web: Could not extract the Node archive: 
Could not extract archive: 
'/__w/3/.m2/repository/com/github/eirslett/node/16.13.2/node-16.13.2-linux-x64.tar.gz':
 EOFException -> [Help 1]
1
{noformat}
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59951=logs=52b61abe-a3cc-5bde-cc35-1bbe89bb7df5=54421a62-0c80-5aad-3319-094ff69180bb


> flink-runtime-web failed due to a corrupted nodejs dependency
> -
>
> Key: FLINK-30719
> URL: https://issues.apache.org/jira/browse/FLINK-30719
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend, Test Infrastructure, Tests
>Affects Versions: 1.16.0, 1.17.0, 1.18.0
>Reporter: Matthias Pohl
>Assignee: Sergey Nuyanzin
>Priority: Critical
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44954=logs=52b61abe-a3cc-5bde-cc35-1bbe89bb7df5=54421a62-0c80-5aad-3319-094ff69180bb=12550
> The build failed due to a corrupted nodejs dependency:
> {code}
> [ERROR] The archive file 
> /__w/1/.m2/repository/com/github/eirslett/node/16.13.2/node-16.13.2-linux-x64.tar.gz
>  is corrupted and will be deleted. Please try the build again.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-32142) Apple Silicon Support: Unable to Build Flink Project due to "Bad CPU Type" Error

2024-05-31 Thread Piotr Nowojski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski closed FLINK-32142.
--
Resolution: Fixed

merged commit d075c9f into apache:master

> Apple Silicon Support: Unable to Build Flink Project due to "Bad CPU Type" 
> Error
> 
>
> Key: FLINK-32142
> URL: https://issues.apache.org/jira/browse/FLINK-32142
> Project: Flink
>  Issue Type: Bug
>  Components: Build System, Runtime / Web Frontend
>Affects Versions: 1.15.0, 1.15.1, 1.16.0, 1.17.0, 1.15.2, 1.15.3, 1.16.1, 
> 1.15.4, 1.16.2, 1.18.0, 1.17.1, 1.15.5, 1.16.3, 1.17.2
> Environment: Apple Silicon architecture (M2 Pro)
> macOS Ventura (Version 13.3.1)
>Reporter: Elphas Toringepi
>Assignee: Elphas Toringepi
>Priority: Major
>  Labels: pull-request-available, stale-assigned
> Fix For: 1.20.0
>
>
> Attempting to build the Flink project on Apple Silicon architecture results 
> in an error related to the execution of the frontend-maven-plugin.
> The error message indicates that the plugin fails to run 
> "flink/flink-runtime-web/web-dashboard/node/node" program due to a "Bad CPU 
> type in executable" error.
> {code:java}
> [ERROR] Failed to execute goal 
> com.github.eirslett:frontend-maven-plugin:1.11.0:npm (npm install) on project 
> flink-runtime-web: Failed to run task: 'npm ci --cache-max=0 --no-save 
> ${npm.proxy}' failed. java.io.IOException: Cannot run program 
> "flink/flink-runtime-web/web-dashboard/node/node" (in directory 
> "flink/flink-runtime-web/web-dashboard"): error=86, Bad CPU type in 
> executable{code}
> Steps to Reproduce:
>  # Clone the Flink project repository. 
>  # Attempt to build the project on an Apple Silicon device.
>  # Observe the error message mentioned above.
> {code:java}
> git clone https://github.com/apache/flink.git
> cd flink
> ./mvnw clean package -DskipTests
> {code}
> Proposed Solution
> Upgrade frontend-maven-plugin from version 1.11.0 to the latest version, 
> 1.12.1.
> frontend-maven-plugin version 1.11.0 downloads x64 binaries 
> node-v16.13.2-darwin-x64.tar.gz instead of the arm64 binaries.
> Support for arm64 has been available for frontend-maven-plugin  since version 
> 2. [https://github.com/eirslett/frontend-maven-plugin/pull/970]
> {code:java}
> [DEBUG] Executing command line 
> [/Users/elphas/src/flink/flink-runtime-web/web-dashboard/node/node, 
> --version] [INFO] Installing node version v16.13.2 [DEBUG] Creating temporary 
> directory /flink/flink-runtime-web/web-dashboard/node/tmp [INFO] Unpacking 
> ~/.m2/repository/com/github/eirslett/node/16.13.2/node-16.13.2-darwin-x64.tar.gz
>  into flink/flink-runtime-web/web-dashboard/node/tmp{code}
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] (FLINK-28260) flink-runtime-web fails to execute "npm ci" on Apple Silicon (arm64) without rosetta

2024-05-31 Thread Piotr Nowojski (Jira)


[ https://issues.apache.org/jira/browse/FLINK-28260 ]


Piotr Nowojski deleted comment on FLINK-28260:


was (Author: pnowojski):
merged commit d075c9f into apache:master

> flink-runtime-web fails to execute "npm ci" on Apple Silicon (arm64) without 
> rosetta
> 
>
> Key: FLINK-28260
> URL: https://issues.apache.org/jira/browse/FLINK-28260
> Project: Flink
>  Issue Type: Bug
>  Components: Build System, Runtime / Web Frontend
>Reporter: Robert Metzger
>Priority: Major
>
> Flink 1.16-SNAPSHOT fails to build in the flink-runtime-web project because 
> we are using an outdated frontend-maven-plugin (v 1.11.3).
> This is the error:
> {code}
> [ERROR] Failed to execute goal 
> com.github.eirslett:frontend-maven-plugin:1.11.3:npm (npm install) on project 
> flink-runtime-web: Failed to run task: 'npm ci --cache-max=0 --no-save 
> ${npm.proxy}' failed. java.io.IOException: Cannot run program 
> "/Users/rmetzger/Projects/flink/flink-runtime-web/web-dashboard/node/node" 
> (in directory 
> "/Users/rmetzger/Projects/flink/flink-runtime-web/web-dashboard"): error=86, 
> Bad CPU type in executable -> [Help 1]
> {code}
> With the latest frontend-maven-plugin (v. 1.12.1) the build passes, because that 
> version downloads the proper arm64 npm binaries. However, frontend-maven-plugin 
> 1.12.1 requires Maven 3.6.0, which is too high for Flink (the highest Maven 
> version supported by Flink is 3.2.5).
> The best workaround is using Rosetta on M1 Macs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-28260) flink-runtime-web fails to execute "npm ci" on Apple Silicon (arm64) without rosetta

2024-05-31 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850963#comment-17850963
 ] 

Piotr Nowojski commented on FLINK-28260:


merged commit d075c9f into apache:master

> flink-runtime-web fails to execute "npm ci" on Apple Silicon (arm64) without 
> rosetta
> 
>
> Key: FLINK-28260
> URL: https://issues.apache.org/jira/browse/FLINK-28260
> Project: Flink
>  Issue Type: Bug
>  Components: Build System, Runtime / Web Frontend
>Reporter: Robert Metzger
>Priority: Major
>
> Flink 1.16-SNAPSHOT fails to build in the flink-runtime-web project because 
> we are using an outdated frontend-maven-plugin (v 1.11.3).
> This is the error:
> {code}
> [ERROR] Failed to execute goal 
> com.github.eirslett:frontend-maven-plugin:1.11.3:npm (npm install) on project 
> flink-runtime-web: Failed to run task: 'npm ci --cache-max=0 --no-save 
> ${npm.proxy}' failed. java.io.IOException: Cannot run program 
> "/Users/rmetzger/Projects/flink/flink-runtime-web/web-dashboard/node/node" 
> (in directory 
> "/Users/rmetzger/Projects/flink/flink-runtime-web/web-dashboard"): error=86, 
> Bad CPU type in executable -> [Help 1]
> {code}
> With the latest frontend-maven-plugin (v. 1.12.1) the build passes, because that 
> version downloads the proper arm64 npm binaries. However, frontend-maven-plugin 
> 1.12.1 requires Maven 3.6.0, which is too high for Flink (the highest Maven 
> version supported by Flink is 3.2.5).
> The best workaround is using Rosetta on M1 Macs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-35351) Restore from unaligned checkpoints with a custom partitioner fails.

2024-05-31 Thread Piotr Nowojski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski closed FLINK-35351.
--
Fix Version/s: 1.18.2
   1.20.0
   1.19.1
   Resolution: Fixed

merged to master as ce0b61f376b and ce0b61f376b^
merged to release-1.19 as 551f4ae7dde and 551f4ae7dde^
merged to release-1.18 as 70f775e7ba1 and 70f775e7ba1^
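
For readers following along, a minimal sketch of the job shape described in the 
reproduction steps below (assumed wiring with a modulo partitioner and a discarding 
sink, not the original test code; only the Source/Sink parallelism and unaligned 
checkpoints match the report):

{code:java}
import org.apache.flink.api.common.functions.Partitioner;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.DiscardingSink;

public class CustomPartitionerRepro {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(1000);
        env.getCheckpointConfig().enableUnalignedCheckpoints();

        // First run: Source[2] -> custom partitioner -> Sink[3], take a checkpoint.
        // Second run: restore from that checkpoint with the source parallelism changed to 1.
        env.fromSequence(0, 1_000_000)
                .setParallelism(2)
                .partitionCustom(
                        (Partitioner<Long>) (key, numPartitions) -> (int) (key % numPartitions),
                        value -> value)
                .addSink(new DiscardingSink<>())
                .setParallelism(3);

        env.execute("custom-partitioner-unaligned-checkpoint-repro");
    }
}
{code}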

> Restore from unaligned checkpoints with a custom partitioner fails.
> ---
>
> Key: FLINK-35351
> URL: https://issues.apache.org/jira/browse/FLINK-35351
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Reporter: Dmitriy Linevich
>Assignee: Dmitriy Linevich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.2, 1.20.0, 1.19.1
>
>
> We encountered a problem when using a custom partitioner with unaligned 
> checkpoints. The bug reproduces under the following steps:
>  # Run a job with graph: Source[2]->Sink[3], the custom partitioner applied 
> after the Source task.
>  # Make a checkpoint.
>  # Restore from the checkpoint with a different source parallelism: 
> Source[1]->Sink[3].
>  # An exception is thrown.
> This issue does not occur when restoring with the same parallelism or when 
> changing the Sink parallelism. The exception only occurs when the parallelism 
> of the Source is changed while the Sink parallelism remains the same.
> See the exception below and the test code at the end. 
> {code:java}
> [db13789c52b80aad852c53a0afa26247] Task [Sink: sink (3/3)#0] WARN  Sink: sink 
> (3/3)#0 
> (be1d158c2e77fc9ed9e3e5d9a8431dc2_0a448493b4782967b150582570326227_2_0) 
> switched from RUNNING to FAILED with failure cause:
> java.io.IOException: Can't get next record for channel 
> InputChannelInfo{gateIdx=0, inputChannelIdx=0}
>     at 
> org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:106)
>  ~[classes/:?]
>     at 
> org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
>  ~[classes/:?]
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:600)
>  ~[classes/:?]
>     at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231)
>  ~[classes/:?]
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:930)
>  ~[classes/:?]
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:879)
>  ~[classes/:?]
>     at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:960)
>  ~[classes/:?]
>     at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:939) 
> [classes/:?]
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:753) 
> [classes/:?]
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:568) 
> [classes/:?]
>     at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
> Caused by: java.io.IOException: Corrupt stream, found tag: -1
>     at 
> org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:222)
>  ~[classes/:?]
>     at 
> org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:44)
>  ~[classes/:?]
>     at 
> org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:53)
>  ~[classes/:?]
>     at 
> org.apache.flink.runtime.io.network.api.serialization.NonSpanningWrapper.readInto(NonSpanningWrapper.java:337)
>  ~[classes/:?]
>     at 
> org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.readNonSpanningRecord(SpillingAdaptiveSpanningRecordDeserializer.java:128)
>  ~[classes/:?]
>     at 
> org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.readNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:103)
>  ~[classes/:?]
>     at 
> org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:93)
>  ~[classes/:?]
>     at 
> org.apache.flink.streaming.runtime.io.recovery.DemultiplexingRecordDeserializer$VirtualChannel.getNextRecord(DemultiplexingRecordDeserializer.java:79)
>  ~[classes/:?]
>     at 
> org.apache.flink.streaming.runtime.io.

[jira] [Commented] (FLINK-35489) Metaspace size can be too little after autotuning change memory setting

2024-05-31 Thread Nicolas Fraison (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850953#comment-17850953
 ] 

Nicolas Fraison commented on FLINK-35489:
-

[~mxm] what do you think about assigning memory to METASPACE_MEMORY before 
HEAP_MEMORY (switching these lines: 
[https://github.com/apache/flink-kubernetes-operator/blob/main/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/tuning/MemoryTuning.java#L128-L131]), 
and also ensuring that the computed METASPACE_MEMORY is never bigger than the one 
assigned by default (from the config taskmanager.memory.jvm-metaspace.size)?

It looks to me that this space should not grow with load changes, and the default 
must be big enough for the TaskManager to run fine. The autotuning should only 
scale this memory space down depending on usage, no?
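
A rough sketch of the capping idea above (assuming the tuned value is available as a 
MemorySize and that TaskManagerOptions.JVM_METASPACE is the relevant option; this is 
not the actual MemoryTuning code):

{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.MemorySize;
import org.apache.flink.configuration.TaskManagerOptions;

public class MetaspaceCap {
    /** Never let the tuned metaspace exceed the user-configured default. */
    public static MemorySize capMetaspace(Configuration userConfig, MemorySize tunedMetaspace) {
        MemorySize configuredDefault = userConfig.get(TaskManagerOptions.JVM_METASPACE);
        return tunedMetaspace.compareTo(configuredDefault) > 0 ? configuredDefault : tunedMetaspace;
    }
}
{code}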

> Metaspace size can be too little after autotuning change memory setting
> ---
>
> Key: FLINK-35489
> URL: https://issues.apache.org/jira/browse/FLINK-35489
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Affects Versions: 1.8.0
>Reporter: Nicolas Fraison
>Priority: Major
>
> We have enabled the autotuning feature on one of our Flink jobs with the below 
> config
> {code:java}
> # Autoscaler configuration
> job.autoscaler.enabled: "true"
> job.autoscaler.stabilization.interval: 1m
> job.autoscaler.metrics.window: 10m
> job.autoscaler.target.utilization: "0.8"
> job.autoscaler.target.utilization.boundary: "0.1"
> job.autoscaler.restart.time: 2m
> job.autoscaler.catch-up.duration: 10m
> job.autoscaler.memory.tuning.enabled: true
> job.autoscaler.memory.tuning.overhead: 0.5
> job.autoscaler.memory.tuning.maximize-managed-memory: true{code}
> During a scale down, the autotuning decided to give all the memory to the JVM 
> (scaling the heap by 2), setting taskmanager.memory.managed.size to 0b.
> Here is the config that was computed by the autotuning for a TM running on a 
> 4GB pod:
> {code:java}
> taskmanager.memory.network.max: 4063232b
> taskmanager.memory.network.min: 4063232b
> taskmanager.memory.jvm-overhead.max: 433791712b
> taskmanager.memory.task.heap.size: 3699934605b
> taskmanager.memory.framework.off-heap.size: 134217728b
> taskmanager.memory.jvm-metaspace.size: 22960020b
> taskmanager.memory.framework.heap.size: "0 bytes"
> taskmanager.memory.flink.size: 3838215565b
> taskmanager.memory.managed.size: 0b {code}
> This has led to issues starting the TM because we rely on a javaagent that 
> performs some memory allocation outside of the JVM (it relies on some C bindings).
> Tuning the overhead or disabling scale-down-compensation.enabled could have 
> helped for that particular event, but this can lead to other issues, as it could 
> lead to too little heap size being computed.
> It would be interesting to be able to set a minimum memory.managed.size to be 
> taken into account by the autotuning.
> What do you think about this? Do you think that some other specific config 
> should have been applied to avoid this issue?
>  
> Edit: see this comment that leads to the metaspace issue: 
> https://issues.apache.org/jira/browse/FLINK-35489?focusedCommentId=17850694=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17850694



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-35215) The performance of serializerKryo and serializerKryoWithoutRegistration are regressed

2024-05-30 Thread Rui Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan closed FLINK-35215.
---
Resolution: Fixed

> The performance of serializerKryo and serializerKryoWithoutRegistration are 
> regressed
> -
>
> Key: FLINK-35215
> URL: https://issues.apache.org/jira/browse/FLINK-35215
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System
>Affects Versions: 1.20.0
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.20.0
>
> Attachments: image-2024-04-25-14-57-55-231.png, 
> image-2024-04-25-15-00-32-410.png, image-2024-05-31-11-22-49-226.png
>
>
> The performance of serializerKryo and serializerKryoWithoutRegistration are 
> regressed[1][2], I checked recent commits, and found FLINK-34954 changed 
> related logic.
>  
> [1] 
> http://flink-speed.xyz/timeline/#/?exe=1,6,12=serializerKryo=on=on=off=3=200
> [2] 
> http://flink-speed.xyz/timeline/#/?exe=1,6,12=serializerKryoWithoutRegistration=on=on=off=3=200
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35215) The performance of serializerKryo and serializerKryoWithoutRegistration are regressed

2024-05-30 Thread Rui Fan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850904#comment-17850904
 ] 

Rui Fan commented on FLINK-35215:
-

The performance seems to have already recovered, so let me close this JIRA.

 

!image-2024-05-31-11-22-49-226.png|width=1121,height=389!

> The performance of serializerKryo and serializerKryoWithoutRegistration are 
> regressed
> -
>
> Key: FLINK-35215
> URL: https://issues.apache.org/jira/browse/FLINK-35215
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System
>Affects Versions: 1.20.0
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.20.0
>
> Attachments: image-2024-04-25-14-57-55-231.png, 
> image-2024-04-25-15-00-32-410.png, image-2024-05-31-11-22-49-226.png
>
>
> The performance of serializerKryo and serializerKryoWithoutRegistration are 
> regressed[1][2], I checked recent commits, and found FLINK-34954 changed 
> related logic.
>  
> [1] 
> http://flink-speed.xyz/timeline/#/?exe=1,6,12=serializerKryo=on=on=off=3=200
> [2] 
> http://flink-speed.xyz/timeline/#/?exe=1,6,12=serializerKryoWithoutRegistration=on=on=off=3=200
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-35215) The performance of serializerKryo and serializerKryoWithoutRegistration are regressed

2024-05-30 Thread Rui Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan updated FLINK-35215:

Description: 
The performance of serializerKryo and serializerKryoWithoutRegistration has 
regressed[1][2]. I checked recent commits and found that FLINK-34954 changed the 
related logic.

 

[1] 
http://flink-speed.xyz/timeline/#/?exe=1,6,12=serializerKryo=on=on=off=3=200

[2] 
http://flink-speed.xyz/timeline/#/?exe=1,6,12=serializerKryoWithoutRegistration=on=on=off=3=200

 

 

  was:
The performance of serializerKryo and serializerKryoWithoutRegistration are 
regressed[1][2], I checked recent commits, and found FLINK-34954 changed 
related logic.

 

[1] 
[http://flink-speed.xyz/timeline/#/?exe=1,6,12=serializerKryo=on=on=off=3=50]

[2] 
http://flink-speed.xyz/timeline/#/?exe=1,6,12=serializerKryoWithoutRegistration=on=on=off=3=50

 

 


> The performance of serializerKryo and serializerKryoWithoutRegistration are 
> regressed
> -
>
> Key: FLINK-35215
> URL: https://issues.apache.org/jira/browse/FLINK-35215
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System
>Affects Versions: 1.20.0
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.20.0
>
> Attachments: image-2024-04-25-14-57-55-231.png, 
> image-2024-04-25-15-00-32-410.png, image-2024-05-31-11-22-49-226.png
>
>
> The performance of serializerKryo and serializerKryoWithoutRegistration are 
> regressed[1][2], I checked recent commits, and found FLINK-34954 changed 
> related logic.
>  
> [1] 
> http://flink-speed.xyz/timeline/#/?exe=1,6,12=serializerKryo=on=on=off=3=200
> [2] 
> http://flink-speed.xyz/timeline/#/?exe=1,6,12=serializerKryoWithoutRegistration=on=on=off=3=200
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-35215) The performance of serializerKryo and serializerKryoWithoutRegistration are regressed

2024-05-30 Thread Rui Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan updated FLINK-35215:

Attachment: image-2024-05-31-11-22-49-226.png

> The performance of serializerKryo and serializerKryoWithoutRegistration are 
> regressed
> -
>
> Key: FLINK-35215
> URL: https://issues.apache.org/jira/browse/FLINK-35215
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System
>Affects Versions: 1.20.0
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.20.0
>
> Attachments: image-2024-04-25-14-57-55-231.png, 
> image-2024-04-25-15-00-32-410.png, image-2024-05-31-11-22-49-226.png
>
>
> The performance of serializerKryo and serializerKryoWithoutRegistration are 
> regressed[1][2], I checked recent commits, and found FLINK-34954 changed 
> related logic.
>  
> [1] 
> http://flink-speed.xyz/timeline/#/?exe=1,6,12=serializerKryo=on=on=off=3=200
> [2] 
> http://flink-speed.xyz/timeline/#/?exe=1,6,12=serializerKryoWithoutRegistration=on=on=off=3=200
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35295) Improve jdbc connection pool initialization failure message

2024-05-30 Thread Jiabao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850903#comment-17850903
 ] 

Jiabao Sun commented on FLINK-35295:


release-3.1: e18e7a2523ac1ea59471e5714eb60f544e9f4a04

> Improve jdbc connection pool initialization failure message
> ---
>
> Key: FLINK-35295
> URL: https://issues.apache.org/jira/browse/FLINK-35295
> Project: Flink
>  Issue Type: Improvement
>  Components: Flink CDC
>Affects Versions: cdc-3.1.0
>Reporter: Xiao Huang
>Assignee: Xiao Huang
>Priority: Major
>  Labels: pull-request-available
> Fix For: cdc-3.2.0, cdc-3.1.1
>
>
> As described in ticket title.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-35295) Improve jdbc connection pool initialization failure message

2024-05-30 Thread Jiabao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiabao Sun updated FLINK-35295:
---
Fix Version/s: cdc-3.1.1

> Improve jdbc connection pool initialization failure message
> ---
>
> Key: FLINK-35295
> URL: https://issues.apache.org/jira/browse/FLINK-35295
> Project: Flink
>  Issue Type: Improvement
>  Components: Flink CDC
>Affects Versions: cdc-3.1.0
>Reporter: Xiao Huang
>Assignee: Xiao Huang
>Priority: Major
>  Labels: pull-request-available
> Fix For: cdc-3.2.0, cdc-3.1.1
>
>
> As described in ticket title.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35495) The native metrics for column family are not reported

2024-05-30 Thread Yanfei Lei (Jira)
Yanfei Lei created FLINK-35495:
--

 Summary: The native metrics for column family are not reported
 Key: FLINK-35495
 URL: https://issues.apache.org/jira/browse/FLINK-35495
 Project: Flink
  Issue Type: Sub-task
Reporter: Yanfei Lei






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35363) FLIP-449: Reorganization of flink-connector-jdbc

2024-05-30 Thread Leonard Xu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850901#comment-17850901
 ] 

Leonard Xu commented on FLINK-35363:


Hey [~eskabetxe], assigned to you.

> FLIP-449: Reorganization of flink-connector-jdbc
> 
>
> Key: FLINK-35363
> URL: https://issues.apache.org/jira/browse/FLINK-35363
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / JDBC
>Reporter: João Boto
>Assignee: João Boto
>Priority: Major
>  Labels: pull-request-available
>
> Described in: 
> [FLIP-449|https://cwiki.apache.org/confluence/display/FLINK/FLIP-449%3A+Reorganization+of+flink-connector-jdbc]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35363) FLIP-449: Reorganization of flink-connector-jdbc

2024-05-30 Thread Leonard Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Xu reassigned FLINK-35363:
--

Assignee: João Boto

> FLIP-449: Reorganization of flink-connector-jdbc
> 
>
> Key: FLINK-35363
> URL: https://issues.apache.org/jira/browse/FLINK-35363
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / JDBC
>Reporter: João Boto
>Assignee: João Boto
>Priority: Major
>  Labels: pull-request-available
>
> Described in: 
> [FLIP-449|https://cwiki.apache.org/confluence/display/FLINK/FLIP-449%3A+Reorganization+of+flink-connector-jdbc]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35484) Flink document file had removed but website can access

2024-05-30 Thread Leonard Xu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850900#comment-17850900
 ] 

Leonard Xu commented on FLINK-35484:


thanks [~gongzhongqiang] for confirming, cool!

> Flink document file had removed but website can access
> --
>
> Key: FLINK-35484
> URL: https://issues.apache.org/jira/browse/FLINK-35484
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.19.0
>Reporter: Zhongqiang Gong
>Assignee: Zhongqiang Gong
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> The Flink 1.18 documentation removed the DataSet documentation: issue link 
> https://issues.apache.org/jira/browse/FLINK-32741.
> But I can still access the link: 
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/dataset/formats/avro/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35483) BatchJobRecoveryTest related to JM failover produced no output for 900 second

2024-05-30 Thread Weijie Guo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850894#comment-17850894
 ] 

Weijie Guo commented on FLINK-35483:


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59980=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=24c3384f-1bcb-57b3-224f-51bf973bbee8=10345

> BatchJobRecoveryTest related to JM failover produced no output for 900 second
> -
>
> Key: FLINK-35483
> URL: https://issues.apache.org/jira/browse/FLINK-35483
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI
>Affects Versions: 1.20.0
>Reporter: Weijie Guo
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>
> testRecoverFromJMFailover
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59919=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=24c3384f-1bcb-57b3-224f-51bf973bbee8=9476



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33463) Support the implementation of dynamic source tables based on the new source

2024-05-30 Thread Rui Fan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850892#comment-17850892
 ] 

Rui Fan commented on FLINK-33463:
-

After discussing with [~RocMarshal], it's better to support the JDBC dynamic 
source tables based on the new source once we consider the new JdbcSource stable, 
so let's revert the commit first and finish the Jira in the future.

> Support the implementation of dynamic source tables based on the new source
> ---
>
> Key: FLINK-33463
> URL: https://issues.apache.org/jira/browse/FLINK-33463
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / JDBC
>Reporter: RocMarshal
>Assignee: RocMarshal
>Priority: Major
>  Labels: pull-request-available
> Fix For: jdbc-3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-35484) Flink document file had removed but website can access

2024-05-30 Thread Zhongqiang Gong (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850890#comment-17850890
 ] 

Zhongqiang Gong edited comment on FLINK-35484 at 5/31/24 1:31 AM:
--

[~leonard] After the scheduled documentation CI build, I checked:
* The link: 
https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/dataset/formats/avro/
 is 404 not found 
* The documentation website is functioning properly.

All results are as expected.


was (Author: JIRAUSER301076):
[~leonard] After build document ci scheduled,I checked:
* The link: 
https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/dataset/formats/avro/
 is 404 not foud 
* Document website is functioning properly.



> Flink document file had removed but website can access
> --
>
> Key: FLINK-35484
> URL: https://issues.apache.org/jira/browse/FLINK-35484
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.19.0
>Reporter: Zhongqiang Gong
>Assignee: Zhongqiang Gong
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> The Flink 1.18 documentation removed the DataSet documentation: issue link 
> https://issues.apache.org/jira/browse/FLINK-32741.
> But I can still access the link: 
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/dataset/formats/avro/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-35484) Flink document file had removed but website can access

2024-05-30 Thread Zhongqiang Gong (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850890#comment-17850890
 ] 

Zhongqiang Gong edited comment on FLINK-35484 at 5/31/24 1:31 AM:
--

[~leonard] After the scheduled documentation CI build, I checked:
* The link: 
https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/dataset/formats/avro/
 is 404 not found 
* The documentation website is functioning properly.




was (Author: JIRAUSER301076):
After build document ci scheduled,I checked:
* The link: 
https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/dataset/formats/avro/
 is 404 not foud 
* Document website is functioning properly.



> Flink document file had removed but website can access
> --
>
> Key: FLINK-35484
> URL: https://issues.apache.org/jira/browse/FLINK-35484
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.19.0
>Reporter: Zhongqiang Gong
>Assignee: Zhongqiang Gong
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> The Flink 1.18 documentation removed the DataSet documentation: issue link 
> https://issues.apache.org/jira/browse/FLINK-32741.
> But I can still access the link: 
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/dataset/formats/avro/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35484) Flink document file had removed but website can access

2024-05-30 Thread Zhongqiang Gong (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850890#comment-17850890
 ] 

Zhongqiang Gong commented on FLINK-35484:
-

After the scheduled documentation CI build, I checked:
* The link: 
https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/dataset/formats/avro/
 is 404 not found 
* The documentation website is functioning properly.



> Flink document file had removed but website can access
> --
>
> Key: FLINK-35484
> URL: https://issues.apache.org/jira/browse/FLINK-35484
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.19.0
>Reporter: Zhongqiang Gong
>Assignee: Zhongqiang Gong
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> The Flink 1.18 documentation removed the DataSet documentation: issue link 
> https://issues.apache.org/jira/browse/FLINK-32741.
> But I can still access the link: 
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/dataset/formats/avro/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-34023) Expose Kinesis client retry config in sink

2024-05-30 Thread Brad Atcheson (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850841#comment-17850841
 ] 

Brad Atcheson edited comment on FLINK-34023 at 5/30/24 7:28 PM:


[Here's|https://github.com/brada/flink-connector-aws/commit/aa4fce309e94f979aa1f3ef48d5d27fa17f55759]
 a possible solution that adds 3 new config properties:
 - aws.dynamodb.client.retry-policy.num-retries
 - aws.firehose.client.retry-policy.num-retries
 - aws.kinesis.client.retry-policy.num-retries

That seemed like the simplest possible approach. Exposing the complete AWS SDK 
retry policy would require many more parameters, including [backoff 
strategy|https://sdk.amazonaws.com/java/api/2.0.0/software/amazon/awssdk/core/retry/backoff/BackoffStrategy.html],
 base delay, max backoff time etc. Since num-retries is the most important 
parameter, would it be acceptable to expose just that one for now?

The `mvn clean verify` and `mvn clean package` commands fail on that branch, 
but they also fail on the origin main branch for what appears to be issues 
unrelated to my commit:
{noformat}
[ERROR] Failures:
[ERROR]   RowDataToAttributeValueConverterTest.testFloat:208
Expecting map:
  {"key"=AttributeValue(N=1.2345679E17)}
to contain entries:
  ["key"=AttributeValue(N=1.23456791E17)]
but the following map entries had different values:
  ["key"=AttributeValue(N=1.2345679E17) (expected: 
AttributeValue(N=1.23456791E17))]{noformat}
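
As an illustration only (this is not existing connector code, and the property name 
below is the proposal from this comment rather than a released Flink option), the 
num-retries value could be wired into the AWS SDK v2 client roughly like this:

{code:java}
import java.util.Properties;
import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
import software.amazon.awssdk.core.retry.RetryPolicy;
import software.amazon.awssdk.services.kinesis.KinesisAsyncClient;

public class KinesisRetryExample {
    public static KinesisAsyncClient buildClient(Properties kinesisClientProperties) {
        // Proposed property; falls back to 3 retries if it is not set.
        int numRetries = Integer.parseInt(
                kinesisClientProperties.getProperty("aws.kinesis.client.retry-policy.num-retries", "3"));

        // Plug the retry count into the SDK's RetryPolicy via the client override configuration.
        return KinesisAsyncClient.builder()
                .overrideConfiguration(ClientOverrideConfiguration.builder()
                        .retryPolicy(RetryPolicy.builder().numRetries(numRetries).build())
                        .build())
                .build();
    }
}
{code}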
 

 


was (Author: JIRAUSER303710):
[Here's|[https://github.com/brada/flink-connector-aws/commit/aa4fce309e94f979aa1f3ef48d5d27fa17f55759]]
 a possible solution that adds 3 new config properties:
 - aws.dynamodb.client.retry-policy.num-retries
 - aws.firehose.client.retry-policy.num-retries
 - aws.kinesis.client.retry-policy.num-retries

That seemed like the simplest possible approach. Exposing the complete AWS SDK 
retry policy would require many more parameters, including [backoff 
strategy|[https://sdk.amazonaws.com/java/api/2.0.0/software/amazon/awssdk/core/retry/backoff/BackoffStrategy.html]],
 base delay, max backoff time etc. Since num-retries is the most important 
parameter, would it be acceptable to expose just that one for now?

The `mvn clean verify` and `mvm clean package` commands fail on that branch, 
but they also fail on the origin main branch for what appears to be issues 
unrelated to my commit:
```
[ERROR] Failures:
[ERROR]   RowDataToAttributeValueConverterTest.testFloat:208
Expecting map:
  \{"key"=AttributeValue(N=1.2345679E17)}
to contain entries:
  ["key"=AttributeValue(N=1.23456791E17)]
but the following map entries had different values:
  ["key"=AttributeValue(N=1.2345679E17) (expected: 
AttributeValue(N=1.23456791E17))]
``` 

> Expose Kinesis client retry config in sink
> --
>
>     Key: FLINK-34023
> URL: https://issues.apache.org/jira/browse/FLINK-34023
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Kinesis
>Reporter: Brad Atcheson
>Priority: Major
>
> The consumer side exposes client retry configuration like 
> [flink.shard.getrecords.maxretries|https://nightlies.apache.org/flink/flink-docs-stable/api/java/org/apache/flink/streaming/connectors/kinesis/config/ConsumerConfigConstants.html#SHARD_GETRECORDS_RETRIES]
>  but the producer side lacks similar config for PutRecords.
> The KinesisStreamsSinkWriter constructor calls 
> {code}
> this.httpClient = 
> AWSGeneralUtil.createAsyncHttpClient(kinesisClientProperties);
> this.kinesisClient = buildClient(kinesisClientProperties, this.httpClient);
> {code}
> But those methods only refer to these values (aside from 
> endpoint/region/creds) in the kinesisClientProperties:
> * aws.http-client.max-concurrency
> * aws.http-client.read-timeout
> * aws.trust.all.certificates
> * aws.http.protocol.version
> Without control over retry, users can observe exceptions like {code}Request 
> attempt 2 failure: Unable to execute HTTP request: connection timed out after 
> 2000 ms: kinesis.us-west-2.amazonaws.com{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-34023) Expose Kinesis client retry config in sink

2024-05-30 Thread Brad Atcheson (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850841#comment-17850841
 ] 

Brad Atcheson edited comment on FLINK-34023 at 5/30/24 7:28 PM:


[Here's|https://github.com/brada/flink-connector-aws/commit/aa4fce309e94f979aa1f3ef48d5d27fa17f55759]
 a possible solution that adds 3 new config properties:
 - aws.dynamodb.client.retry-policy.num-retries
 - aws.firehose.client.retry-policy.num-retries
 - aws.kinesis.client.retry-policy.num-retries

That seemed like the simplest possible approach. Exposing the complete AWS SDK 
retry policy would require many more parameters, including [backoff 
strategy|https://sdk.amazonaws.com/java/api/2.0.0/software/amazon/awssdk/core/retry/backoff/BackoffStrategy.html],
 base delay, max backoff time etc. Since num-retries is the most important 
parameter, would it be acceptable to expose just that one for now?

The `mvn clean verify` and `mvm clean package` commands fail on that branch, 
but they also fail on the origin main branch for what appears to be issues 
unrelated to my commit:
{noformat}
[ERROR] Failures:
[ERROR]   RowDataToAttributeValueConverterTest.testFloat:208
Expecting map:
  {"key"=AttributeValue(N=1.2345679E17)}
to contain entries:
  ["key"=AttributeValue(N=1.23456791E17)]
but the following map entries had different values:
  ["key"=AttributeValue(N=1.2345679E17) (expected: 
AttributeValue(N=1.23456791E17))]{noformat}


was (Author: JIRAUSER303710):
[Here's|[https://github.com/brada/flink-connector-aws/commit/aa4fce309e94f979aa1f3ef48d5d27fa17f55759]]
 a possible solution that adds 3 new config properties:
 - aws.dynamodb.client.retry-policy.num-retries
 - aws.firehose.client.retry-policy.num-retries
 - aws.kinesis.client.retry-policy.num-retries

That seemed like the simplest possible approach. Exposing the complete AWS SDK 
retry policy would require many more parameters, including [backoff 
strategy|[https://sdk.amazonaws.com/java/api/2.0.0/software/amazon/awssdk/core/retry/backoff/BackoffStrategy.html]],
 base delay, max backoff time etc. Since num-retries is the most important 
parameter, would it be acceptable to expose just that one for now?

The `mvn clean verify` and `mvm clean package` commands fail on that branch, 
but they also fail on the origin main branch for what appears to be issues 
unrelated to my commit:
{noformat}
[ERROR] Failures:
[ERROR]   RowDataToAttributeValueConverterTest.testFloat:208
Expecting map:
  {"key"=AttributeValue(N=1.2345679E17)}
to contain entries:
  ["key"=AttributeValue(N=1.23456791E17)]
but the following map entries had different values:
  ["key"=AttributeValue(N=1.2345679E17) (expected: 
AttributeValue(N=1.23456791E17))]{noformat}
 

 

> Expose Kinesis client retry config in sink
> --
>
>     Key: FLINK-34023
> URL: https://issues.apache.org/jira/browse/FLINK-34023
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Kinesis
>Reporter: Brad Atcheson
>Priority: Major
>
> The consumer side exposes client retry configuration like 
> [flink.shard.getrecords.maxretries|https://nightlies.apache.org/flink/flink-docs-stable/api/java/org/apache/flink/streaming/connectors/kinesis/config/ConsumerConfigConstants.html#SHARD_GETRECORDS_RETRIES]
>  but the producer side lacks similar config for PutRecords.
> The KinesisStreamsSinkWriter constructor calls 
> {code}
> this.httpClient = 
> AWSGeneralUtil.createAsyncHttpClient(kinesisClientProperties);
> this.kinesisClient = buildClient(kinesisClientProperties, this.httpClient);
> {code}
> But those methods only refer to these values (aside from 
> endpoint/region/creds) in the kinesisClientProperties:
> * aws.http-client.max-concurrency
> * aws.http-client.read-timeout
> * aws.trust.all.certificates
> * aws.http.protocol.version
> Without control over retry, users can observe exceptions like {code}Request 
> attempt 2 failure: Unable to execute HTTP request: connection timed out after 
> 2000 ms: kinesis.us-west-2.amazonaws.com{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-34023) Expose Kinesis client retry config in sink

2024-05-30 Thread Brad Atcheson (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850841#comment-17850841
 ] 

Brad Atcheson edited comment on FLINK-34023 at 5/30/24 7:26 PM:


[Here's|[https://github.com/brada/flink-connector-aws/commit/aa4fce309e94f979aa1f3ef48d5d27fa17f55759]]
 a possible solution that adds 3 new config properties:
 - aws.dynamodb.client.retry-policy.num-retries
 - aws.firehose.client.retry-policy.num-retries
 - aws.kinesis.client.retry-policy.num-retries

That seemed like the simplest possible approach. Exposing the complete AWS SDK 
retry policy would require many more parameters, including [backoff 
strategy|[https://sdk.amazonaws.com/java/api/2.0.0/software/amazon/awssdk/core/retry/backoff/BackoffStrategy.html]],
 base delay, max backoff time etc. Since num-retries is the most important 
parameter, would it be acceptable to expose just that one for now?

The `mvn clean verify` and `mvm clean package` commands fail on that branch, 
but they also fail on the origin main branch for what appears to be issues 
unrelated to my commit:
```
[ERROR] Failures:
[ERROR]   RowDataToAttributeValueConverterTest.testFloat:208
Expecting map:
  \{"key"=AttributeValue(N=1.2345679E17)}
to contain entries:
  ["key"=AttributeValue(N=1.23456791E17)]
but the following map entries had different values:
  ["key"=AttributeValue(N=1.2345679E17) (expected: 
AttributeValue(N=1.23456791E17))]
``` 


was (Author: JIRAUSER303710):
[Here's](https://github.com/brada/flink-connector-aws/commit/aa4fce309e94f979aa1f3ef48d5d27fa17f55759)
 a possible solution that adds 3 new config properties:
- aws.dynamodb.client.retry-policy.num-retries
- aws.firehose.client.retry-policy.num-retries
- aws.kinesis.client.retry-policy.num-retries

That seemed like the simplest possible approach. Exposing the complete AWS SDK 
retry policy would require many more parameters, including [backoff 
strategy](https://sdk.amazonaws.com/java/api/2.0.0/software/amazon/awssdk/core/retry/backoff/BackoffStrategy.html),
 base delay, max backoff time etc. Since num-retries is the most important 
parameter, would it be acceptable to expose just that one for now?

The `mvn clean verify` and `mvm clean package` commands fail on that branch, 
but they also fail on the origin main branch for what appears to be issues 
unrelated to my commit:
```
[ERROR] Failures:
[ERROR]   RowDataToAttributeValueConverterTest.testFloat:208
Expecting map:
  \{"key"=AttributeValue(N=1.2345679E17)}
to contain entries:
  ["key"=AttributeValue(N=1.23456791E17)]
but the following map entries had different values:
  ["key"=AttributeValue(N=1.2345679E17) (expected: 
AttributeValue(N=1.23456791E17))]
``` 

> Expose Kinesis client retry config in sink
> --
>
>     Key: FLINK-34023
> URL: https://issues.apache.org/jira/browse/FLINK-34023
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Kinesis
>Reporter: Brad Atcheson
>Priority: Major
>
> The consumer side exposes client retry configuration like 
> [flink.shard.getrecords.maxretries|https://nightlies.apache.org/flink/flink-docs-stable/api/java/org/apache/flink/streaming/connectors/kinesis/config/ConsumerConfigConstants.html#SHARD_GETRECORDS_RETRIES]
>  but the producer side lacks similar config for PutRecords.
> The KinesisStreamsSinkWriter constructor calls 
> {code}
> this.httpClient = 
> AWSGeneralUtil.createAsyncHttpClient(kinesisClientProperties);
> this.kinesisClient = buildClient(kinesisClientProperties, this.httpClient);
> {code}
> But those methods only refer to these values (aside from 
> endpoint/region/creds) in the kinesisClientProperties:
> * aws.http-client.max-concurrency
> * aws.http-client.read-timeout
> * aws.trust.all.certificates
> * aws.http.protocol.version
> Without control over retry, users can observe exceptions like {code}Request 
> attempt 2 failure: Unable to execute HTTP request: connection timed out after 
> 2000 ms: kinesis.us-west-2.amazonaws.com{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34023) Expose Kinesis client retry config in sink

2024-05-30 Thread Brad Atcheson (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850841#comment-17850841
 ] 

Brad Atcheson commented on FLINK-34023:
---

[Here's](https://github.com/brada/flink-connector-aws/commit/aa4fce309e94f979aa1f3ef48d5d27fa17f55759)
 a possible solution that adds 3 new config properties:
- aws.dynamodb.client.retry-policy.num-retries
- aws.firehose.client.retry-policy.num-retries
- aws.kinesis.client.retry-policy.num-retries

That seemed like the simplest possible approach. Exposing the complete AWS SDK 
retry policy would require many more parameters, including [backoff 
strategy](https://sdk.amazonaws.com/java/api/2.0.0/software/amazon/awssdk/core/retry/backoff/BackoffStrategy.html),
 base delay, max backoff time etc. Since num-retries is the most important 
parameter, would it be acceptable to expose just that one for now?

The `mvn clean verify` and `mvm clean package` commands fail on that branch, 
but they also fail on the origin main branch for what appears to be issues 
unrelated to my commit:
```
[ERROR] Failures:
[ERROR]   RowDataToAttributeValueConverterTest.testFloat:208
Expecting map:
  \{"key"=AttributeValue(N=1.2345679E17)}
to contain entries:
  ["key"=AttributeValue(N=1.23456791E17)]
but the following map entries had different values:
  ["key"=AttributeValue(N=1.2345679E17) (expected: 
AttributeValue(N=1.23456791E17))]
``` 

> Expose Kinesis client retry config in sink
> --
>
> Key: FLINK-34023
>     URL: https://issues.apache.org/jira/browse/FLINK-34023
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Kinesis
>Reporter: Brad Atcheson
>Priority: Major
>
> The consumer side exposes client retry configuration like 
> [flink.shard.getrecords.maxretries|https://nightlies.apache.org/flink/flink-docs-stable/api/java/org/apache/flink/streaming/connectors/kinesis/config/ConsumerConfigConstants.html#SHARD_GETRECORDS_RETRIES]
>  but the producer side lacks similar config for PutRecords.
> The KinesisStreamsSinkWriter constructor calls 
> {code}
> this.httpClient = 
> AWSGeneralUtil.createAsyncHttpClient(kinesisClientProperties);
> this.kinesisClient = buildClient(kinesisClientProperties, this.httpClient);
> {code}
> But those methods only refer to these values (aside from 
> endpoint/region/creds) in the kinesisClientProperties:
> * aws.http-client.max-concurrency
> * aws.http-client.read-timeout
> * aws.trust.all.certificates
> * aws.http.protocol.version
> Without control over retry, users can observe exceptions like {code}Request 
> attempt 2 failure: Unable to execute HTTP request: connection timed out after 
> 2000 ms: kinesis.us-west-2.amazonaws.com{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35494) Reorganize sources

2024-05-30 Thread Jira
João Boto created FLINK-35494:
-

 Summary: Reorganize sources
 Key: FLINK-35494
 URL: https://issues.apache.org/jira/browse/FLINK-35494
 Project: Flink
  Issue Type: Sub-task
Reporter: João Boto






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35399) Add documents for batch job master failure recovery

2024-05-30 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850739#comment-17850739
 ] 

Zhu Zhu commented on FLINK-35399:
-

56cd9607713d0da874dcc54c4cf6d5b3b52b1050 refined the doc a bit.

> Add documents for batch job master failure recovery
> ---
>
> Key: FLINK-35399
> URL: https://issues.apache.org/jira/browse/FLINK-35399
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Zhu Zhu
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32384) Remove deprecated configuration keys which violate YAML spec

2024-05-30 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850734#comment-17850734
 ] 

Zhu Zhu commented on FLINK-32384:
-

Thanks for volunteering to contribute to Flink, [~kartikeypant].
However, this is a breaking change. Therefore, we cannot do it until Flink 1.20 
is released and the release cycle of Flink 2.0 has started.
You are welcome to take this task if you are free at that moment.
Before that, you can take a look at other tickets.

> Remove deprecated configuration keys which violate YAML spec
> 
>
> Key: FLINK-32384
> URL: https://issues.apache.org/jira/browse/FLINK-32384
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration
>Reporter: Zhu Zhu
>Priority: Major
>  Labels: 2.0-related
> Fix For: 2.0.0
>
>
> In FLINK-29372, keys that violate the YAML spec were renamed to a valid form and 
> the old names were deprecated.
> In Flink 2.0 we should remove these deprecated keys. This will prevent users 
> from (unintentionally) creating an invalid YAML flink-conf.yaml.
> Then, with the work of FLINK-23620, we can remove the non-standard YAML 
> parsing logic and enforce standard YAML validation in CI.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34746) Switching to the Apache CDN for Dockerfile

2024-05-30 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850732#comment-17850732
 ] 

Hong Liang Teoh commented on FLINK-34746:
-

merged commit 
[{{7f63237}}|https://github.com/apache/flink-docker/commit/7f63237615138615826f2820ca54ff2054514fca]
 into apache:dev-1.19

merged commit 
[{{804c9f3}}|https://github.com/apache/flink-docker/commit/804c9f3bb6772751d09252b6d15e8a1aac4ca055]
 into apache:dev-1.18

merged commit 
[{{0ac313e}}|https://github.com/apache/flink-docker/commit/0ac313e39fda6c49778fbdf15f4b5d827476253a]
 into apache:dev-1.17

> Switching to the Apache CDN for Dockerfile
> --
>
> Key: FLINK-34746
> URL: https://issues.apache.org/jira/browse/FLINK-34746
> Project: Flink
>  Issue Type: Improvement
>  Components: flink-docker
>Reporter: lincoln lee
>Assignee: Martijn Visser
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.2, 1.20.0, 1.19.1
>
>
> While publishing the official image, we received some comments
> about switching to the Apache CDN.
>  
> See
> https://github.com/docker-library/official-images/pull/16114
> https://github.com/docker-library/official-images/pull/16430
>  
> Reason for switching: [https://apache.org/history/mirror-history.html] (also 
> [https://www.apache.org/dyn/closer.cgi] and [https://www.apache.org/mirrors])



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34379) table.optimizer.dynamic-filtering.enabled lead to OutOfMemoryError

2024-05-30 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850730#comment-17850730
 ] 

Hong Liang Teoh commented on FLINK-34379:
-

Closing Jira as the patch has been completed for the master branch as well as the 
1.17, 1.18, and 1.19 branches.
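
For reference, the workaround mentioned in the issue description below (setting 
table.optimizer.dynamic-filtering.enabled to false before creating the batch 
TableEnvironment) looks roughly like this; a hedged sketch, not code from the patch:

{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class DisableDynamicFiltering {
    public static TableEnvironment createBatchEnv() {
        Configuration configuration = new Configuration();
        // Work around the planner OOM by turning dynamic filtering off.
        configuration.setString("table.optimizer.dynamic-filtering.enabled", "false");

        return TableEnvironment.create(
                EnvironmentSettings.newInstance()
                        .withConfiguration(configuration)
                        .inBatchMode()
                        .build());
    }
}
{code}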

> table.optimizer.dynamic-filtering.enabled lead to OutOfMemoryError
> --
>
> Key: FLINK-34379
> URL: https://issues.apache.org/jira/browse/FLINK-34379
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.17.2, 1.18.1
> Environment: 1.17.1
>Reporter: zhu
>Assignee: Jeyhun Karimov
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.17.3, 1.18.2, 1.20.0, 1.19.1
>
>
> When using batch computing, I union all about 50 tables and then join another 
> table. When compiling the execution plan, 
> an OutOfMemoryError: Java heap space is thrown, which was not a problem in 
> 1.15.2. However, both 1.17.2 and 1.18.1 throw the same error, and this causes 
> the jobmanager to restart. Currently, it has been found that this is caused by 
> table.optimizer.dynamic-filtering.enabled, which defaults to true. When I set 
> table.optimizer.dynamic-filtering.enabled to false, it can be compiled and 
> executed normally.
> code
> TableEnvironment.create(EnvironmentSettings.newInstance()
> .withConfiguration(configuration)
> .inBatchMode().build())
> sql=select att,filename,'table0' as mo_name from table0 UNION All select 
> att,filename,'table1' as mo_name from table1 UNION All select 
> att,filename,'table2' as mo_name from table2 UNION All select 
> att,filename,'table3' as mo_name from table3 UNION All select 
> att,filename,'table4' as mo_name from table4 UNION All select 
> att,filename,'table5' as mo_name from table5 UNION All select 
> att,filename,'table6' as mo_name from table6 UNION All select 
> att,filename,'table7' as mo_name from table7 UNION All select 
> att,filename,'table8' as mo_name from table8 UNION All select 
> att,filename,'table9' as mo_name from table9 UNION All select 
> att,filename,'table10' as mo_name from table10 UNION All select 
> att,filename,'table11' as mo_name from table11 UNION All select 
> att,filename,'table12' as mo_name from table12 UNION All select 
> att,filename,'table13' as mo_name from table13 UNION All select 
> att,filename,'table14' as mo_name from table14 UNION All select 
> att,filename,'table15' as mo_name from table15 UNION All select 
> att,filename,'table16' as mo_name from table16 UNION All select 
> att,filename,'table17' as mo_name from table17 UNION All select 
> att,filename,'table18' as mo_name from table18 UNION All select 
> att,filename,'table19' as mo_name from table19 UNION All select 
> att,filename,'table20' as mo_name from table20 UNION All select 
> att,filename,'table21' as mo_name from table21 UNION All select 
> att,filename,'table22' as mo_name from table22 UNION All select 
> att,filename,'table23' as mo_name from table23 UNION All select 
> att,filename,'table24' as mo_name from table24 UNION All select 
> att,filename,'table25' as mo_name from table25 UNION All select 
> att,filename,'table26' as mo_name from table26 UNION All select 
> att,filename,'table27' as mo_name from table27 UNION All select 
> att,filename,'table28' as mo_name from table28 UNION All select 
> att,filename,'table29' as mo_name from table29 UNION All select 
> att,filename,'table30' as mo_name from table30 UNION All select 
> att,filename,'table31' as mo_name from table31 UNION All select 
> att,filename,'table32' as mo_name from table32 UNION All select 
> att,filename,'table33' as mo_name from table33 UNION All select 
> att,filename,'table34' as mo_name from table34 UNION All select 
> att,filename,'table35' as mo_name from table35 UNION All select 
> att,filename,'table36' as mo_name from table36 UNION All select 
> att,filename,'table37' as mo_name from table37 UNION All select 
> att,filename,'table38' as mo_name from table38 UNION All select 
> att,filename,'table39' as mo_name from table39 UNION All select 
> att,filename,'table40' as mo_name from table40 UNION All select 
> att,filename,'table41' as mo_name from table41 UNION All select 
> att,filename,'table42' as mo_name from table42 UNION All select 
> att,filename,'table43' as mo_name from table43 UNION All select 
> att,filename,'table44' as mo_name from table44 UNION All select 
> att,filename,'table45' as mo_name from table45 UNION All select 
> att,filename,'table46' as mo_name from table46 UNION All select 
> att

[jira] [Resolved] (FLINK-35358) Breaking change when loading artifacts

2024-05-30 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh resolved FLINK-35358.
-
Resolution: Fixed

> Breaking change when loading artifacts
> --
>
> Key: FLINK-35358
> URL: https://issues.apache.org/jira/browse/FLINK-35358
> Project: Flink
>  Issue Type: Bug
>  Components: Client / Job Submission, flink-docker
>Affects Versions: 1.19.0
>Reporter: Rasmus Thygesen
>Priority: Not a Priority
>  Labels: pull-request-available
> Fix For: 1.19.1
>
>
> We have been using the following code snippet in our Dockerfiles for running 
> a Flink job in application mode
>  
> {code:java}
> FROM flink:1.18.1-scala_2.12-java17
> COPY --from=build /app/target/my-job*.jar 
> /opt/flink/usrlib/artifacts/my-job.jar
> USER flink {code}
>  
> This has been working since at least around Flink 1.14, but the 1.19 update 
> has broken our Dockerfiles. The fix is to put the jar file one directory level 
> higher, so the code snippet becomes
>  
> {code:java}
> FROM flink:1.18.1-scala_2.12-java17
> COPY --from=build /app/target/my-job*.jar /opt/flink/usrlib/my-job.jar
> USER flink  {code}
>  
> We have not spent too much time looking into what the cause is, but we get 
> the stack trace
>  
> {code:java}
> myjob-jobmanager-1   | org.apache.flink.util.FlinkException: Could not load 
> the provided entrypoint class.
> myjob-jobmanager-1   |     at 
> org.apache.flink.client.program.DefaultPackagedProgramRetriever.getPackagedProgram(DefaultPackagedProgramRetriever.java:230)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.getPackagedProgram(StandaloneApplicationClusterEntryPoint.java:149)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.lambda$main$0(StandaloneApplicationClusterEntryPoint.java:90)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.main(StandaloneApplicationClusterEntryPoint.java:89)
>  [flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   | Caused by: 
> org.apache.flink.client.program.ProgramInvocationException: The program's 
> entry point class 'my.company.job.MyJob' was not found in the jar file.
> myjob-jobmanager-1   |     at 
> org.apache.flink.client.program.PackagedProgram.loadMainClass(PackagedProgram.java:481)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.client.program.PackagedProgram.(PackagedProgram.java:153)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.client.program.PackagedProgram.(PackagedProgram.java:65)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.client.program.PackagedProgram$Builder.build(PackagedProgram.java:691)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.client.program.DefaultPackagedProgramRetriever.getPackagedProgram(DefaultPackagedProgramRetriever.java:228)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     ... 4 more
> myjob-jobmanager-1   | Caused by: java.lang.ClassNotFoundException: 
> my.company.job.MyJob
> myjob-jobmanager-1   |     at java.net.URLClassLoader.findClass(Unknown 
> Source) ~[?:?]
> myjob-jobmanager-1   |     at java.lang.ClassLoader.loadClass(Unknown Source) 
> ~[?:?]
> myjob-jobmanager-1   |     at 
> org.apache.flink.util.FlinkUserCodeClassLoader.loadClassWithoutExceptionHandling(FlinkUserCodeClassLoader.java:67)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.util.ChildFirstClassLoader.loadClassWithoutExceptionHandling(ChildFirstClassLoader.java:74)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:51)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at java.lang.ClassLoader.loadClass(Unknown Source) 
> ~[?:?]
> myjob-jobmanager-1   |     at 
> org.apache.flink.util.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.loadClass(FlinkUserCodeClassLoaders.java:197)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 

[jira] [Closed] (FLINK-34379) table.optimizer.dynamic-filtering.enabled lead to OutOfMemoryError

2024-05-30 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh closed FLINK-34379.
---
Resolution: Fixed

> table.optimizer.dynamic-filtering.enabled lead to OutOfMemoryError
> --
>
> Key: FLINK-34379
> URL: https://issues.apache.org/jira/browse/FLINK-34379
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.17.2, 1.18.1
> Environment: 1.17.1
>Reporter: zhu
>Assignee: Jeyhun Karimov
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.17.3, 1.18.2, 1.20.0, 1.19.1
>
>
> When using batch computing, I UNION ALL about 50 tables and then join the 
> result with another table. When compiling the execution plan, an 
> OutOfMemoryError: Java heap space is thrown, which was not a problem in 
> 1.15.2. However, both 1.17.2 and 1.18.1 throw the same error, which causes 
> the jobmanager to restart. Currently, it has been found that this is caused by 
> table.optimizer.dynamic-filtering.enabled, which defaults to true. When I set 
> table.optimizer.dynamic-filtering.enabled to false, the plan can be compiled and 
> executed normally (see the sketch after the query below).
> Code:
> {code:java}
> TableEnvironment.create(EnvironmentSettings.newInstance()
> .withConfiguration(configuration)
> .inBatchMode().build())
> {code}
> sql=select att,filename,'table0' as mo_name from table0 UNION All select 
> att,filename,'table1' as mo_name from table1 UNION All select 
> att,filename,'table2' as mo_name from table2 UNION All select 
> att,filename,'table3' as mo_name from table3 UNION All select 
> att,filename,'table4' as mo_name from table4 UNION All select 
> att,filename,'table5' as mo_name from table5 UNION All select 
> att,filename,'table6' as mo_name from table6 UNION All select 
> att,filename,'table7' as mo_name from table7 UNION All select 
> att,filename,'table8' as mo_name from table8 UNION All select 
> att,filename,'table9' as mo_name from table9 UNION All select 
> att,filename,'table10' as mo_name from table10 UNION All select 
> att,filename,'table11' as mo_name from table11 UNION All select 
> att,filename,'table12' as mo_name from table12 UNION All select 
> att,filename,'table13' as mo_name from table13 UNION All select 
> att,filename,'table14' as mo_name from table14 UNION All select 
> att,filename,'table15' as mo_name from table15 UNION All select 
> att,filename,'table16' as mo_name from table16 UNION All select 
> att,filename,'table17' as mo_name from table17 UNION All select 
> att,filename,'table18' as mo_name from table18 UNION All select 
> att,filename,'table19' as mo_name from table19 UNION All select 
> att,filename,'table20' as mo_name from table20 UNION All select 
> att,filename,'table21' as mo_name from table21 UNION All select 
> att,filename,'table22' as mo_name from table22 UNION All select 
> att,filename,'table23' as mo_name from table23 UNION All select 
> att,filename,'table24' as mo_name from table24 UNION All select 
> att,filename,'table25' as mo_name from table25 UNION All select 
> att,filename,'table26' as mo_name from table26 UNION All select 
> att,filename,'table27' as mo_name from table27 UNION All select 
> att,filename,'table28' as mo_name from table28 UNION All select 
> att,filename,'table29' as mo_name from table29 UNION All select 
> att,filename,'table30' as mo_name from table30 UNION All select 
> att,filename,'table31' as mo_name from table31 UNION All select 
> att,filename,'table32' as mo_name from table32 UNION All select 
> att,filename,'table33' as mo_name from table33 UNION All select 
> att,filename,'table34' as mo_name from table34 UNION All select 
> att,filename,'table35' as mo_name from table35 UNION All select 
> att,filename,'table36' as mo_name from table36 UNION All select 
> att,filename,'table37' as mo_name from table37 UNION All select 
> att,filename,'table38' as mo_name from table38 UNION All select 
> att,filename,'table39' as mo_name from table39 UNION All select 
> att,filename,'table40' as mo_name from table40 UNION All select 
> att,filename,'table41' as mo_name from table41 UNION All select 
> att,filename,'table42' as mo_name from table42 UNION All select 
> att,filename,'table43' as mo_name from table43 UNION All select 
> att,filename,'table44' as mo_name from table44 UNION All select 
> att,filename,'table45' as mo_name from table45 UNION All select 
> att,filename,'table46' as mo_name from table46 UNION All select 
> att,filename,'table47' as mo_name from table47 UNION All select 
> att,filename,'table48' as mo_name from table48
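
For reference, a minimal sketch of the workaround described above, i.e. creating the batch TableEnvironment with dynamic filtering disabled. The option key is the one named in the report; the class name and the rest of the setup are illustrative, not taken from the reporter's job.
{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class DisableDynamicFilteringSketch {
    public static void main(String[] args) {
        Configuration configuration = new Configuration();
        // Work around the planner OOM by turning off dynamic filtering (default: true).
        configuration.setString("table.optimizer.dynamic-filtering.enabled", "false");

        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance()
                        .withConfiguration(configuration)
                        .inBatchMode()
                        .build());

        // The large UNION ALL / join query from the report would be submitted here.
        // tEnv.executeSql("SELECT ...");
    }
}
{code}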

[jira] [Updated] (FLINK-35367) Reorganize sinks

2024-05-30 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FLINK-35367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

João Boto updated FLINK-35367:
--
Summary: Reorganize sinks  (was: Reorganize datastream sink and source)

> Reorganize sinks
> 
>
> Key: FLINK-35367
> URL: https://issues.apache.org/jira/browse/FLINK-35367
> Project: Flink
>  Issue Type: Sub-task
>Reporter: João Boto
>Priority: Major
>
> Reorganize datastream sink and source



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-35358) Breaking change when loading artifacts

2024-05-30 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh updated FLINK-35358:

Fix Version/s: 1.20.0

> Breaking change when loading artifacts
> --
>
> Key: FLINK-35358
> URL: https://issues.apache.org/jira/browse/FLINK-35358
> Project: Flink
>  Issue Type: Bug
>  Components: Client / Job Submission, flink-docker
>Affects Versions: 1.19.0
>Reporter: Rasmus Thygesen
>Priority: Not a Priority
>  Labels: pull-request-available
> Fix For: 1.20.0, 1.19.1
>
>
> We have been using the following code snippet in our Dockerfiles for running 
> a Flink job in application mode
>  
> {code:java}
> FROM flink:1.18.1-scala_2.12-java17
> COPY --from=build /app/target/my-job*.jar 
> /opt/flink/usrlib/artifacts/my-job.jar
> USER flink {code}
>  
> Which has been working since at least around Flink 1.14, but the 1.19 update 
> has broken our Dockerfiles. The fix is to put the jar file a step further out 
> so the code snippet becomes
>  
> {code:java}
> FROM flink:1.18.1-scala_2.12-java17
> COPY --from=build /app/target/my-job*.jar /opt/flink/usrlib/my-job.jar
> USER flink  {code}
>  
> We have not spent too much time looking into what the cause is, but we get 
> the stack trace
>  
> {code:java}
> myjob-jobmanager-1   | org.apache.flink.util.FlinkException: Could not load 
> the provided entrypoint class.
> myjob-jobmanager-1   |     at 
> org.apache.flink.client.program.DefaultPackagedProgramRetriever.getPackagedProgram(DefaultPackagedProgramRetriever.java:230)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.getPackagedProgram(StandaloneApplicationClusterEntryPoint.java:149)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.lambda$main$0(StandaloneApplicationClusterEntryPoint.java:90)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.main(StandaloneApplicationClusterEntryPoint.java:89)
>  [flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   | Caused by: 
> org.apache.flink.client.program.ProgramInvocationException: The program's 
> entry point class 'my.company.job.MyJob' was not found in the jar file.
> myjob-jobmanager-1   |     at 
> org.apache.flink.client.program.PackagedProgram.loadMainClass(PackagedProgram.java:481)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.client.program.PackagedProgram.(PackagedProgram.java:153)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.client.program.PackagedProgram.(PackagedProgram.java:65)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.client.program.PackagedProgram$Builder.build(PackagedProgram.java:691)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.client.program.DefaultPackagedProgramRetriever.getPackagedProgram(DefaultPackagedProgramRetriever.java:228)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     ... 4 more
> myjob-jobmanager-1   | Caused by: java.lang.ClassNotFoundException: 
> my.company.job.MyJob
> myjob-jobmanager-1   |     at java.net.URLClassLoader.findClass(Unknown 
> Source) ~[?:?]
> myjob-jobmanager-1   |     at java.lang.ClassLoader.loadClass(Unknown Source) 
> ~[?:?]
> myjob-jobmanager-1   |     at 
> org.apache.flink.util.FlinkUserCodeClassLoader.loadClassWithoutExceptionHandling(FlinkUserCodeClassLoader.java:67)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.util.ChildFirstClassLoader.loadClassWithoutExceptionHandling(ChildFirstClassLoader.java:74)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:51)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at java.lang.ClassLoader.loadClass(Unknown Source) 
> ~[?:?]
> myjob-jobmanager-1   |     at 
> org.apache.flink.util.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.loadClass(FlinkUserCodeClassLoaders.java:197)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   | 

[jira] [Commented] (FLINK-35358) Breaking change when loading artifacts

2024-05-30 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850729#comment-17850729
 ] 

Hong Liang Teoh commented on FLINK-35358:
-

merged commit 
[{{90a71a1}}|https://github.com/apache/flink/commit/90a71a124771697a0b8b2c2bbc520856d6ae9e25]
 into   apache:release-1.19

> Breaking change when loading artifacts
> --
>
> Key: FLINK-35358
> URL: https://issues.apache.org/jira/browse/FLINK-35358
> Project: Flink
>  Issue Type: Bug
>  Components: Client / Job Submission, flink-docker
>Affects Versions: 1.19.0
>Reporter: Rasmus Thygesen
>Priority: Not a Priority
>  Labels: pull-request-available
> Fix For: 1.19.1
>
>
> We have been using the following code snippet in our Dockerfiles for running 
> a Flink job in application mode
>  
> {code:java}
> FROM flink:1.18.1-scala_2.12-java17
> COPY --from=build /app/target/my-job*.jar 
> /opt/flink/usrlib/artifacts/my-job.jar
> USER flink {code}
>  
> Which has been working since at least around Flink 1.14, but the 1.19 update 
> has broken our Dockerfiles. The fix is to put the jar file a step further out 
> so the code snippet becomes
>  
> {code:java}
> FROM flink:1.18.1-scala_2.12-java17
> COPY --from=build /app/target/my-job*.jar /opt/flink/usrlib/my-job.jar
> USER flink  {code}
>  
> We have not spent too much time looking into what the cause is, but we get 
> the stack trace
>  
> {code:java}
> myjob-jobmanager-1   | org.apache.flink.util.FlinkException: Could not load 
> the provided entrypoint class.
> myjob-jobmanager-1   |     at 
> org.apache.flink.client.program.DefaultPackagedProgramRetriever.getPackagedProgram(DefaultPackagedProgramRetriever.java:230)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.getPackagedProgram(StandaloneApplicationClusterEntryPoint.java:149)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.lambda$main$0(StandaloneApplicationClusterEntryPoint.java:90)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.main(StandaloneApplicationClusterEntryPoint.java:89)
>  [flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   | Caused by: 
> org.apache.flink.client.program.ProgramInvocationException: The program's 
> entry point class 'my.company.job.MyJob' was not found in the jar file.
> myjob-jobmanager-1   |     at 
> org.apache.flink.client.program.PackagedProgram.loadMainClass(PackagedProgram.java:481)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.client.program.PackagedProgram.(PackagedProgram.java:153)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.client.program.PackagedProgram.(PackagedProgram.java:65)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.client.program.PackagedProgram$Builder.build(PackagedProgram.java:691)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.client.program.DefaultPackagedProgramRetriever.getPackagedProgram(DefaultPackagedProgramRetriever.java:228)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     ... 4 more
> myjob-jobmanager-1   | Caused by: java.lang.ClassNotFoundException: 
> my.company.job.MyJob
> myjob-jobmanager-1   |     at java.net.URLClassLoader.findClass(Unknown 
> Source) ~[?:?]
> myjob-jobmanager-1   |     at java.lang.ClassLoader.loadClass(Unknown Source) 
> ~[?:?]
> myjob-jobmanager-1   |     at 
> org.apache.flink.util.FlinkUserCodeClassLoader.loadClassWithoutExceptionHandling(FlinkUserCodeClassLoader.java:67)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.util.ChildFirstClassLoader.loadClassWithoutExceptionHandling(ChildFirstClassLoader.java:74)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:51)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at java.lang.ClassLoader.loadClass(Unknown Source) 
> ~[?:?]
> myjob-jobmanager-1   |     at 
> org.apache.flink.util.FlinkUserCodeClassLoader

[jira] [Commented] (FLINK-35358) Breaking change when loading artifacts

2024-05-30 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850727#comment-17850727
 ] 

Hong Liang Teoh commented on FLINK-35358:
-

 merged commit 
[{{853989b}}|https://github.com/apache/flink/commit/853989bd862c31e0c74cd5a584177dc401c5a3d4]
 into   apache:master
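
For anyone triaging the underlying ClassNotFoundException quoted below, a small stand-alone check can confirm whether the entry point class is actually resolvable from the job jar. The jar path and class name are the illustrative ones from the report, not real values.
{code:java}
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;

public class EntrypointCheck {
    public static void main(String[] args) throws Exception {
        URL jar = Paths.get("/opt/flink/usrlib/my-job.jar").toUri().toURL();
        try (URLClassLoader loader =
                new URLClassLoader(new URL[] {jar}, EntrypointCheck.class.getClassLoader())) {
            // Throws ClassNotFoundException, as in the stack trace below, if the class
            // is missing from the jar or the jar is not where the deployment expects it.
            Class<?> main = Class.forName("my.company.job.MyJob", false, loader);
            System.out.println("Entry point found: " + main.getName());
        }
    }
}
{code}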

> Breaking change when loading artifacts
> --
>
> Key: FLINK-35358
> URL: https://issues.apache.org/jira/browse/FLINK-35358
> Project: Flink
>  Issue Type: Bug
>  Components: Client / Job Submission, flink-docker
>Affects Versions: 1.19.0
>Reporter: Rasmus Thygesen
>Priority: Not a Priority
>  Labels: pull-request-available
> Fix For: 1.19.1
>
>
> We have been using the following code snippet in our Dockerfiles for running 
> a Flink job in application mode
>  
> {code:java}
> FROM flink:1.18.1-scala_2.12-java17
> COPY --from=build /app/target/my-job*.jar 
> /opt/flink/usrlib/artifacts/my-job.jar
> USER flink {code}
>  
> Which has been working since at least around Flink 1.14, but the 1.19 update 
> has broken our Dockerfiles. The fix is to put the jar file a step further out 
> so the code snippet becomes
>  
> {code:java}
> FROM flink:1.18.1-scala_2.12-java17
> COPY --from=build /app/target/my-job*.jar /opt/flink/usrlib/my-job.jar
> USER flink  {code}
>  
> We have not spent too much time looking into what the cause is, but we get 
> the stack trace
>  
> {code:java}
> myjob-jobmanager-1   | org.apache.flink.util.FlinkException: Could not load 
> the provided entrypoint class.
> myjob-jobmanager-1   |     at 
> org.apache.flink.client.program.DefaultPackagedProgramRetriever.getPackagedProgram(DefaultPackagedProgramRetriever.java:230)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.getPackagedProgram(StandaloneApplicationClusterEntryPoint.java:149)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.lambda$main$0(StandaloneApplicationClusterEntryPoint.java:90)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.main(StandaloneApplicationClusterEntryPoint.java:89)
>  [flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   | Caused by: 
> org.apache.flink.client.program.ProgramInvocationException: The program's 
> entry point class 'my.company.job.MyJob' was not found in the jar file.
> myjob-jobmanager-1   |     at 
> org.apache.flink.client.program.PackagedProgram.loadMainClass(PackagedProgram.java:481)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.client.program.PackagedProgram.(PackagedProgram.java:153)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.client.program.PackagedProgram.(PackagedProgram.java:65)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.client.program.PackagedProgram$Builder.build(PackagedProgram.java:691)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.client.program.DefaultPackagedProgramRetriever.getPackagedProgram(DefaultPackagedProgramRetriever.java:228)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     ... 4 more
> myjob-jobmanager-1   | Caused by: java.lang.ClassNotFoundException: 
> my.company.job.MyJob
> myjob-jobmanager-1   |     at java.net.URLClassLoader.findClass(Unknown 
> Source) ~[?:?]
> myjob-jobmanager-1   |     at java.lang.ClassLoader.loadClass(Unknown Source) 
> ~[?:?]
> myjob-jobmanager-1   |     at 
> org.apache.flink.util.FlinkUserCodeClassLoader.loadClassWithoutExceptionHandling(FlinkUserCodeClassLoader.java:67)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.util.ChildFirstClassLoader.loadClassWithoutExceptionHandling(ChildFirstClassLoader.java:74)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at 
> org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:51)
>  ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1   |     at java.lang.ClassLoader.loadClass(Unknown Source) 
> ~[?:?]
> myjob-jobmanager-1   |     at 
> org.apache.flink.util.FlinkUserCodeClassLoader

[jira] [Comment Edited] (FLINK-33001) KafkaSource in batch mode failing with exception if topic partition is empty

2024-05-30 Thread Naci Simsek (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850678#comment-17850678
 ] 

Naci Simsek edited comment on FLINK-33001 at 5/30/24 1:07 PM:
--

Just been able to try with *flink-connector-kafka 3.1.0-1.18* on a *Flink 
1.18.1* session cluster with a batch job, and the exception is not there 
anymore. The job finishes gracefully, without setting any specific log level 
for any subclasses. I have the following log setting:
{code:java}
rootLogger.level = TRACE {code}
Therefore, the issue seems to be fixed.

However, there is another issue when setting a specific offset to start reading 
from, which causes an offset reset and an infinite loop of FetchTasks, so that 
the batch job keeps running until an event is received into the topic, but of 
course that should be the content of another ticket.
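
As a reference for that follow-up, a minimal sketch (topic, partition and offset values are illustrative) of pinning the starting offset explicitly while making the out-of-range fallback deliberate, using the connector's public builder API:
{code:java}
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.kafka.clients.consumer.OffsetResetStrategy;
import org.apache.kafka.common.TopicPartition;

public class BoundedKafkaReadSketch {
    public static KafkaSource<String> build() {
        Map<TopicPartition, Long> startingOffsets = new HashMap<>();
        startingOffsets.put(new TopicPartition("test_topic", 0), 42L); // illustrative offset

        return KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setTopics("test_topic")
                .setValueOnlyDeserializer(new SimpleStringSchema())
                // If offset 42 is out of range, fall back to EARLIEST explicitly instead of
                // relying on the consumer's default reset behaviour.
                .setStartingOffsets(OffsetsInitializer.offsets(startingOffsets, OffsetResetStrategy.EARLIEST))
                .setBounded(OffsetsInitializer.latest())
                .build();
    }
}
{code}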


was (Author: JIRAUSER302646):
Just be able to try with the *flink-connector-kafka 3.1.0-1.18* on *Flink 
1.18.1* session cluster with a batch job, and the exception is not there 
anymore. The job finishes gracefully, without setting any specific log level 
for any subclasses. I have the following log setting:
{code:java}
rootLogger.level = TRACE {code}
Therefore, the issue seems to be fixed.

However, there is another issue when setting a specific offset to start reading 
from, which causes an offset reset and endless loop of FetchTasks, so that the 
batch job keeps running, till there is an event received into the topic, but of 
course this should be a content of another ticket.

> KafkaSource in batch mode failing with exception if topic partition is empty
> 
>
> Key: FLINK-33001
> URL: https://issues.apache.org/jira/browse/FLINK-33001
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.12.7, 1.14.6, 1.17.1
>Reporter: Abdul
>Priority: Major
>
> If the Kafka topic is empty in Batch mode, there is an exception while 
> processing it. This bug was supposedly fixed but unfortunately, the exception 
> still occurs. The original bug was reported as this 
> https://issues.apache.org/jira/browse/FLINK-27041
> We tried to backport it but it still doesn't work. 
>  * The problem will occur in case of the DEBUG level of logger for class 
> org.apache.flink.connector.kafka.source.reader.KafkaPartitionSplitReader
>  * The same problems will occur in other versions of Flink, at least in the 
> 1.15 release branch and tag release-1.15.4
>  * The same problem also occurs in Flink 1.17.1 and 1.14
>  
> The minimal code to produce this is 
>  
> {code:java}
>   final StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment();
>   env.setRuntimeMode(RuntimeExecutionMode.BATCH);
>   KafkaSource<String> kafkaSource = KafkaSource
>   .<String>builder()
>   .setBootstrapServers("localhost:9092")
>   .setTopics("test_topic")
>   .setValueOnlyDeserializer(new 
> SimpleStringSchema())
>   .setBounded(OffsetsInitializer.latest())
>   .build();
>   DataStream<String> stream = env.fromSource(
>   kafkaSource,
>   WatermarkStrategy.noWatermarks(),
>   "Kafka Source"
>   );
>   stream.print();
>   env.execute("Flink KafkaSource test job"); {code}
> This produces exception: 
> {code:java}
> Caused by: java.lang.RuntimeException: One or more fetchers have encountered 
> exception    at 
> org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManager.checkErrors(SplitFetcherManager.java:199)
>     at 
> org.apache.flink.connector.base.source.reader.SourceReaderBase.getNextFetch(SourceReaderBase.java:154)
>     at 
> org.apache.flink.connector.base.source.reader.SourceReaderBase.pollNext(SourceReaderBase.java:116)
>     at 
> org.apache.flink.streaming.api.operators.SourceOperator.emitNext(SourceOperator.java:275)
>     at 
> org.apache.flink.streaming.runtime.io.StreamTaskSourceInput.emitNext(StreamTaskSourceInput.java:67)
>     at 
> org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:398)
>

[jira] [Created] (FLINK-35493) Make max history age and count configurable for FlinkStateSnapshot resources

2024-05-30 Thread Mate Czagany (Jira)
Mate Czagany created FLINK-35493:


 Summary: Make max history age and count configurable for 
FlinkStateSnapshot resources
 Key: FLINK-35493
 URL: https://issues.apache.org/jira/browse/FLINK-35493
 Project: Flink
  Issue Type: Sub-task
  Components: Kubernetes Operator
Reporter: Mate Czagany






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35492) Add metrics for FlinkStateSnapshot resources

2024-05-30 Thread Mate Czagany (Jira)
Mate Czagany created FLINK-35492:


 Summary: Add metrics for FlinkStateSnapshot resources
 Key: FLINK-35492
 URL: https://issues.apache.org/jira/browse/FLINK-35492
 Project: Flink
  Issue Type: Sub-task
  Components: Kubernetes Operator
Reporter: Mate Czagany






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30719) flink-runtime-web failed due to a corrupted nodejs dependency

2024-05-30 Thread Weijie Guo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850700#comment-17850700
 ] 

Weijie Guo commented on FLINK-30719:


{code:java}
The archive file 
/__w/1/.m2/repository/com/github/eirslett/node/16.13.2/node-16.13.2-linux-x64.tar.gz
 is corrupted and will be deleted. Please try the build again.
{code}

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59964=logs=52b61abe-a3cc-5bde-cc35-1bbe89bb7df5=54421a62-0c80-5aad-3319-094ff69180bb=9778


> flink-runtime-web failed due to a corrupted nodejs dependency
> -
>
> Key: FLINK-30719
> URL: https://issues.apache.org/jira/browse/FLINK-30719
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend, Test Infrastructure, Tests
>Affects Versions: 1.16.0, 1.17.0, 1.18.0
>Reporter: Matthias Pohl
>Assignee: Sergey Nuyanzin
>Priority: Critical
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44954=logs=52b61abe-a3cc-5bde-cc35-1bbe89bb7df5=54421a62-0c80-5aad-3319-094ff69180bb=12550
> The build failed due to a corrupted nodejs dependency:
> {code}
> [ERROR] The archive file 
> /__w/1/.m2/repository/com/github/eirslett/node/16.13.2/node-16.13.2-linux-x64.tar.gz
>  is corrupted and will be deleted. Please try the build again.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-35489) Metaspace size can be too little after autotuning change memory setting

2024-05-30 Thread Nicolas Fraison (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850694#comment-17850694
 ] 

Nicolas Fraison edited comment on FLINK-35489 at 5/30/24 12:07 PM:
---

Thanks [~fanrui] for the feedback, it helped me realise that my analysis was wrong.

The issue we are facing is the JVM crashing after the 
[autotuning|https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/autotuning/]
 changed some memory config:
{code:java}
Starting kubernetes-taskmanager as a console application on host 
flink-kafka-job-apache-right-taskmanager-1-1.
Exception in thread "main" *** java.lang.instrument ASSERTION FAILED ***: 
"result" with message agent load/premain call failed at 
src/java.instrument/share/native/libinstrument/JPLISAgent.c line: 422
FATAL ERROR in native method: processing of -javaagent failed, processJavaStart 
failed
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, 
Vv=VM code, C=native code)
V  [libjvm.so+0x78dee4]  jni_FatalError+0x70
V  [libjvm.so+0x88df00]  JvmtiExport::post_vm_initialized()+0x240
V  [libjvm.so+0xc353fc]  Threads::create_vm(JavaVMInitArgs*, bool*)+0x7ac
V  [libjvm.so+0x79c05c]  JNI_CreateJavaVM+0x7c
C  [libjli.so+0x3b2c]  JavaMain+0x7c
C  [libjli.so+0x7fdc]  ThreadJavaMain+0xc
C  [libpthread.so.0+0x7624]  start_thread+0x184 {code}
Seeing this big increase of heap (from 1.5 to more than 3GB) and the fact that 
memory.managed.size was shrunk to 0b made me think that it was linked to 
missing off-heap memory.

But you are right that jvm-overhead already reserves some memory for the off-heap 
(and we indeed have around 400 MB with that config).

So looking back at the new config, I have identified the issue, which is the 
jvm-metaspace having been shrunk to 22MB while it was previously set at 256MB.
I have done a test increasing this parameter and the TM is now able to start.

For the metaspace size computation I can see the autotuning computing 
METASPACE_MEMORY_USED=1.41521584E8, which seems to be an appropriate metaspace 
sizing.

But due to the memBudget management it ends up assigning only 22MB to the 
metaspace ([it first allocates the remaining memory to the heap, then the new 
remainder to metaspace, and finally to managed 
memory|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/tuning/MemoryTuning.java#L130])
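
To make the effect of that ordering concrete, here is a simplified, illustration-only walk-through using the numbers from this ticket. It is not the operator's MemoryTuning code, just the described order (heap first, then metaspace, then managed memory), and the 4 GiB pod size is an assumption:
{code:java}
public class MemBudgetOrderSketch {
    public static void main(String[] args) {
        long pod              = 4_294_967_296L; // 4 GiB TM pod (assumed)
        long jvmOverhead      =   433_791_712L; // taskmanager.memory.jvm-overhead.max
        long heap             = 3_699_934_605L; // heap is sized first
        long frameworkOffHeap =   134_217_728L;
        long network          =     4_063_232L;
        long metaspaceUsed    =   141_521_584L; // METASPACE_MEMORY_USED measured by autotuning

        long remaining = pod - jvmOverhead - heap - frameworkOffHeap - network;
        long metaspace = Math.min(metaspaceUsed, remaining); // capped by whatever the heap left over
        long managed   = remaining - metaspace;              // whatever is left after metaspace

        System.out.printf("remaining=%d, metaspace=%d (~22 MB), managed=%d%n",
                remaining, metaspace, managed);
        // Prints roughly metaspace=22960019 and managed=0, matching the generated
        // taskmanager.memory.jvm-metaspace.size / taskmanager.memory.managed.size above.
    }
}
{code}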

 


was (Author: JIRAUSER299678):
Thks [~fanrui] for the feedback it help me realise that my analysis was wrong.

The issue we are facing ifs the JVM crashing after the 
[autotuning|https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/autotuning/]
 change some memory config:

 
{code:java}
Starting kubernetes-taskmanager as a console application on host 
flink-kafka-job-apache-right-taskmanager-1-1.
Exception in thread "main" *** java.lang.instrument ASSERTION FAILED ***: 
"result" with message agent load/premain call failed at 
src/java.instrument/share/native/libinstrument/JPLISAgent.c line: 422
FATAL ERROR in native method: processing of -javaagent failed, processJavaStart 
failed
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, 
Vv=VM code, C=native code)
V  [libjvm.so+0x78dee4]  jni_FatalError+0x70
V  [libjvm.so+0x88df00]  JvmtiExport::post_vm_initialized()+0x240
V  [libjvm.so+0xc353fc]  Threads::create_vm(JavaVMInitArgs*, bool*)+0x7ac
V  [libjvm.so+0x79c05c]  JNI_CreateJavaVM+0x7c
C  [libjli.so+0x3b2c]  JavaMain+0x7c
C  [libjli.so+0x7fdc]  ThreadJavaMain+0xc
C  [libpthread.so.0+0x7624]  start_thread+0x184 {code}
Seeing this big increase of HEAP (from 1.5 to more than 3GB and the fact that 
the memory.managed.size was shrink to 0b make me thing that it was linked to 
missing off heap.

But you are right that jvm-overhead already reserved some memory for the off 
heap (and we indeed have around 400 MB with that config)

So looking back to the new config I've identified the issue which is on the 
jvm-metaspace having been shrink to 22MB while it was set at 256MB.
I've done a test increasing this parameter and the TM is now able to start.

For the meta space computation size I can see the autotuning computing 
METASPACE_MEMORY_USED=1.41521584E8 which seems to be appropriate metaspace 
sizing.

But due to the the memBudget management it ends up setting only 22MB to the 
metaspace ([first allocate remaining memory to the heap and then this new 
remaining to metaspace and finally to managed 
memory|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/tuning/MemoryTuning.java#L130])

 

> Metaspace size can be too little after autotuning change memory setting
> ---
>
>  

[jira] [Updated] (FLINK-35489) Metaspace size can be too little after autotuning change memory setting

2024-05-30 Thread Nicolas Fraison (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Fraison updated FLINK-35489:

Description: 
We have enabled the autotuning feature on one of our Flink jobs with the below config
{code:java}
# Autoscaler configuration
job.autoscaler.enabled: "true"
job.autoscaler.stabilization.interval: 1m
job.autoscaler.metrics.window: 10m
job.autoscaler.target.utilization: "0.8"
job.autoscaler.target.utilization.boundary: "0.1"
job.autoscaler.restart.time: 2m
job.autoscaler.catch-up.duration: 10m
job.autoscaler.memory.tuning.enabled: true
job.autoscaler.memory.tuning.overhead: 0.5
job.autoscaler.memory.tuning.maximize-managed-memory: true{code}
During a scale down the autotuning decided to give all the memory to the JVM 
(heap being scaled by 2), setting taskmanager.memory.managed.size to 0b.
Here is the config that was computed by the autotuning for a TM running on a 4GB 
pod:
{code:java}
taskmanager.memory.network.max: 4063232b
taskmanager.memory.network.min: 4063232b
taskmanager.memory.jvm-overhead.max: 433791712b
taskmanager.memory.task.heap.size: 3699934605b
taskmanager.memory.framework.off-heap.size: 134217728b
taskmanager.memory.jvm-metaspace.size: 22960020b
taskmanager.memory.framework.heap.size: "0 bytes"
taskmanager.memory.flink.size: 3838215565b
taskmanager.memory.managed.size: 0b {code}
This has led to issues starting the TM because we rely on a javaagent that 
performs memory allocation outside of the JVM (it relies on C bindings).

Tuning the overhead or disabling scale-down-compensation.enabled could have 
helped for that particular event, but this can lead to other issues as it could 
lead to too little heap size being computed.

It would be interesting to be able to set a min memory.managed.size to be taken 
into account by the autotuning.
What do you think about this? Do you think that some other specific config 
should have been applied to avoid this issue?

Edit: see this comment that leads to the metaspace issue: 
https://issues.apache.org/jira/browse/FLINK-35489?focusedCommentId=17850694=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17850694

  was:
We have enable the autotuning feature on one of our flink job with below config
{code:java}
# Autoscaler configuration
job.autoscaler.enabled: "true"
job.autoscaler.stabilization.interval: 1m
job.autoscaler.metrics.window: 10m
job.autoscaler.target.utilization: "0.8"
job.autoscaler.target.utilization.boundary: "0.1"
job.autoscaler.restart.time: 2m
job.autoscaler.catch-up.duration: 10m
job.autoscaler.memory.tuning.enabled: true
job.autoscaler.memory.tuning.overhead: 0.5
job.autoscaler.memory.tuning.maximize-managed-memory: true{code}
During a scale down the autotuning decided to give all the memory to to JVM 
(having heap being scale by 2) settting taskmanager.memory.managed.size to 0b.
Here is the config that was compute by the autotuning for a TM running on a 4GB 
pod:
{code:java}
taskmanager.memory.network.max: 4063232b
taskmanager.memory.network.min: 4063232b
taskmanager.memory.jvm-overhead.max: 433791712b
taskmanager.memory.task.heap.size: 3699934605b
taskmanager.memory.framework.off-heap.size: 134217728b
taskmanager.memory.jvm-metaspace.size: 22960020b
taskmanager.memory.framework.heap.size: "0 bytes"
taskmanager.memory.flink.size: 3838215565b
taskmanager.memory.managed.size: 0b {code}
This has lead to some issue starting the TM because we are relying on some 
javaagent performing some memory allocation outside of the JVM (rely on some C 
bindings).

Tuning the overhead or disabling the scale-down-compensation.enabled could have 
helped for that particular event but this can leads to other issue as it could 
leads to too little HEAP size being computed.

It would be interesting to be able to set a min memory.managed.size to be taken 
in account by the autotuning.
What do you think about this? Do you think that some other specific config 
should have been applied to avoid this issue?

 

Edit


> Metaspace size can be too little after autotuning change memory setting
> ---
>
> Key: FLINK-35489
> URL: https://issues.apache.org/jira/browse/FLINK-35489
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Affects Versions: 1.8.0
>Reporter: Nicolas Fraison
>Priority: Major
>
> We have enable the autotuning feature on one of our flink job with below 
> config
> {code:java}
> # Autoscaler configuration
> job.autoscaler.enabled: "true"
> job.autoscaler.stabilization

[jira] [Updated] (FLINK-35489) Metaspace size can be too little after autotuning change memory setting

2024-05-30 Thread Nicolas Fraison (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Fraison updated FLINK-35489:

Description: 
We have enable the autotuning feature on one of our flink job with below config
{code:java}
# Autoscaler configuration
job.autoscaler.enabled: "true"
job.autoscaler.stabilization.interval: 1m
job.autoscaler.metrics.window: 10m
job.autoscaler.target.utilization: "0.8"
job.autoscaler.target.utilization.boundary: "0.1"
job.autoscaler.restart.time: 2m
job.autoscaler.catch-up.duration: 10m
job.autoscaler.memory.tuning.enabled: true
job.autoscaler.memory.tuning.overhead: 0.5
job.autoscaler.memory.tuning.maximize-managed-memory: true{code}
During a scale down the autotuning decided to give all the memory to to JVM 
(having heap being scale by 2) settting taskmanager.memory.managed.size to 0b.
Here is the config that was compute by the autotuning for a TM running on a 4GB 
pod:
{code:java}
taskmanager.memory.network.max: 4063232b
taskmanager.memory.network.min: 4063232b
taskmanager.memory.jvm-overhead.max: 433791712b
taskmanager.memory.task.heap.size: 3699934605b
taskmanager.memory.framework.off-heap.size: 134217728b
taskmanager.memory.jvm-metaspace.size: 22960020b
taskmanager.memory.framework.heap.size: "0 bytes"
taskmanager.memory.flink.size: 3838215565b
taskmanager.memory.managed.size: 0b {code}
This has lead to some issue starting the TM because we are relying on some 
javaagent performing some memory allocation outside of the JVM (rely on some C 
bindings).

Tuning the overhead or disabling the scale-down-compensation.enabled could have 
helped for that particular event but this can leads to other issue as it could 
leads to too little HEAP size being computed.

It would be interesting to be able to set a min memory.managed.size to be taken 
in account by the autotuning.
What do you think about this? Do you think that some other specific config 
should have been applied to avoid this issue?

 

Edit

  was:
We have enable the autotuning feature on one of our flink job with below config
{code:java}
# Autoscaler configuration
job.autoscaler.enabled: "true"
job.autoscaler.stabilization.interval: 1m
job.autoscaler.metrics.window: 10m
job.autoscaler.target.utilization: "0.8"
job.autoscaler.target.utilization.boundary: "0.1"
job.autoscaler.restart.time: 2m
job.autoscaler.catch-up.duration: 10m
job.autoscaler.memory.tuning.enabled: true
job.autoscaler.memory.tuning.overhead: 0.5
job.autoscaler.memory.tuning.maximize-managed-memory: true{code}
During a scale down the autotuning decided to give all the memory to to JVM 
(having heap being scale by 2) settting taskmanager.memory.managed.size to 0b.
Here is the config that was compute by the autotuning for a TM running on a 4GB 
pod:
{code:java}
taskmanager.memory.network.max: 4063232b
taskmanager.memory.network.min: 4063232b
taskmanager.memory.jvm-overhead.max: 433791712b
taskmanager.memory.task.heap.size: 3699934605b
taskmanager.memory.framework.off-heap.size: 134217728b
taskmanager.memory.jvm-metaspace.size: 22960020b
taskmanager.memory.framework.heap.size: "0 bytes"
taskmanager.memory.flink.size: 3838215565b
taskmanager.memory.managed.size: 0b {code}
This has lead to some issue starting the TM because we are relying on some 
javaagent performing some memory allocation outside of the JVM (rely on some C 
bindings).

Tuning the overhead or disabling the scale-down-compensation.enabled could have 
helped for that particular event but this can leads to other issue as it could 
leads to too little HEAP size being computed.

It would be interesting to be able to set a min memory.managed.size to be taken 
in account by the autotuning.
What do you think about this? Do you think that some other specific config 
should have been applied to avoid this issue?


> Metaspace size can be too little after autotuning change memory setting
> ---
>
> Key: FLINK-35489
> URL: https://issues.apache.org/jira/browse/FLINK-35489
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Affects Versions: 1.8.0
>Reporter: Nicolas Fraison
>Priority: Major
>
> We have enable the autotuning feature on one of our flink job with below 
> config
> {code:java}
> # Autoscaler configuration
> job.autoscaler.enabled: "true"
> job.autoscaler.stabilization.interval: 1m
> job.autoscaler.metrics.window: 10m
> job.autoscaler.target.utilization: "0.8"
> job.autoscaler.target.utilization.boundary: "0.1"
> job.autoscaler.restart.time: 2m
> job.autoscaler.catch-up.

[jira] [Updated] (FLINK-35489) Metaspace size can be too little after autotuning change memory setting

2024-05-30 Thread Nicolas Fraison (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Fraison updated FLINK-35489:

Summary: Metaspace size can be too little after autotuning change memory 
setting  (was: Add capability to set min taskmanager.memory.managed.size when 
enabling autotuning)

> Metaspace size can be too little after autotuning change memory setting
> ---
>
> Key: FLINK-35489
> URL: https://issues.apache.org/jira/browse/FLINK-35489
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Affects Versions: 1.8.0
>Reporter: Nicolas Fraison
>Priority: Major
>
> We have enable the autotuning feature on one of our flink job with below 
> config
> {code:java}
> # Autoscaler configuration
> job.autoscaler.enabled: "true"
> job.autoscaler.stabilization.interval: 1m
> job.autoscaler.metrics.window: 10m
> job.autoscaler.target.utilization: "0.8"
> job.autoscaler.target.utilization.boundary: "0.1"
> job.autoscaler.restart.time: 2m
> job.autoscaler.catch-up.duration: 10m
> job.autoscaler.memory.tuning.enabled: true
> job.autoscaler.memory.tuning.overhead: 0.5
> job.autoscaler.memory.tuning.maximize-managed-memory: true{code}
> During a scale down the autotuning decided to give all the memory to to JVM 
> (having heap being scale by 2) settting taskmanager.memory.managed.size to 0b.
> Here is the config that was compute by the autotuning for a TM running on a 
> 4GB pod:
> {code:java}
> taskmanager.memory.network.max: 4063232b
> taskmanager.memory.network.min: 4063232b
> taskmanager.memory.jvm-overhead.max: 433791712b
> taskmanager.memory.task.heap.size: 3699934605b
> taskmanager.memory.framework.off-heap.size: 134217728b
> taskmanager.memory.jvm-metaspace.size: 22960020b
> taskmanager.memory.framework.heap.size: "0 bytes"
> taskmanager.memory.flink.size: 3838215565b
> taskmanager.memory.managed.size: 0b {code}
> This has lead to some issue starting the TM because we are relying on some 
> javaagent performing some memory allocation outside of the JVM (rely on some 
> C bindings).
> Tuning the overhead or disabling the scale-down-compensation.enabled could 
> have helped for that particular event but this can leads to other issue as it 
> could leads to too little HEAP size being computed.
> It would be interesting to be able to set a min memory.managed.size to be 
> taken in account by the autotuning.
> What do you think about this? Do you think that some other specific config 
> should have been applied to avoid this issue?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35489) Add capability to set min taskmanager.memory.managed.size when enabling autotuning

2024-05-30 Thread Nicolas Fraison (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850694#comment-17850694
 ] 

Nicolas Fraison commented on FLINK-35489:
-

Thks [~fanrui] for the feedback it help me realise that my analysis was wrong.

The issue we are facing ifs the JVM crashing after the 
[autotuning|https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/autotuning/]
 change some memory config:

 
{code:java}
Starting kubernetes-taskmanager as a console application on host 
flink-kafka-job-apache-right-taskmanager-1-1.
Exception in thread "main" *** java.lang.instrument ASSERTION FAILED ***: 
"result" with message agent load/premain call failed at 
src/java.instrument/share/native/libinstrument/JPLISAgent.c line: 422
FATAL ERROR in native method: processing of -javaagent failed, processJavaStart 
failed
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, 
Vv=VM code, C=native code)
V  [libjvm.so+0x78dee4]  jni_FatalError+0x70
V  [libjvm.so+0x88df00]  JvmtiExport::post_vm_initialized()+0x240
V  [libjvm.so+0xc353fc]  Threads::create_vm(JavaVMInitArgs*, bool*)+0x7ac
V  [libjvm.so+0x79c05c]  JNI_CreateJavaVM+0x7c
C  [libjli.so+0x3b2c]  JavaMain+0x7c
C  [libjli.so+0x7fdc]  ThreadJavaMain+0xc
C  [libpthread.so.0+0x7624]  start_thread+0x184 {code}
Seeing this big increase of HEAP (from 1.5 to more than 3GB and the fact that 
the memory.managed.size was shrink to 0b make me thing that it was linked to 
missing off heap.

But you are right that jvm-overhead already reserved some memory for the off 
heap (and we indeed have around 400 MB with that config)

So looking back to the new config I've identified the issue which is on the 
jvm-metaspace having been shrink to 22MB while it was set at 256MB.
I've done a test increasing this parameter and the TM is now able to start.

For the meta space computation size I can see the autotuning computing 
METASPACE_MEMORY_USED=1.41521584E8 which seems to be appropriate metaspace 
sizing.

But due to the the memBudget management it ends up setting only 22MB to the 
metaspace ([first allocate remaining memory to the heap and then this new 
remaining to metaspace and finally to managed 
memory|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/tuning/MemoryTuning.java#L130])

 

> Add capability to set min taskmanager.memory.managed.size when enabling 
> autotuning
> --
>
> Key: FLINK-35489
>     URL: https://issues.apache.org/jira/browse/FLINK-35489
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Affects Versions: 1.8.0
>Reporter: Nicolas Fraison
>Priority: Major
>
> We have enable the autotuning feature on one of our flink job with below 
> config
> {code:java}
> # Autoscaler configuration
> job.autoscaler.enabled: "true"
> job.autoscaler.stabilization.interval: 1m
> job.autoscaler.metrics.window: 10m
> job.autoscaler.target.utilization: "0.8"
> job.autoscaler.target.utilization.boundary: "0.1"
> job.autoscaler.restart.time: 2m
> job.autoscaler.catch-up.duration: 10m
> job.autoscaler.memory.tuning.enabled: true
> job.autoscaler.memory.tuning.overhead: 0.5
> job.autoscaler.memory.tuning.maximize-managed-memory: true{code}
> During a scale down the autotuning decided to give all the memory to to JVM 
> (having heap being scale by 2) settting taskmanager.memory.managed.size to 0b.
> Here is the config that was compute by the autotuning for a TM running on a 
> 4GB pod:
> {code:java}
> taskmanager.memory.network.max: 4063232b
> taskmanager.memory.network.min: 4063232b
> taskmanager.memory.jvm-overhead.max: 433791712b
> taskmanager.memory.task.heap.size: 3699934605b
> taskmanager.memory.framework.off-heap.size: 134217728b
> taskmanager.memory.jvm-metaspace.size: 22960020b
> taskmanager.memory.framework.heap.size: "0 bytes"
> taskmanager.memory.flink.size: 3838215565b
> taskmanager.memory.managed.size: 0b {code}
> This has lead to some issue starting the TM because we are relying on some 
> javaagent performing some memory allocation outside of the JVM (rely on some 
> C bindings).
> Tuning the overhead or disabling the scale-down-compensation.enabled could 
> have helped for that particular event but this can leads to other issue as it 
> could leads to too little HEAP size being computed.
> It would be interesting to be able to set a min memory.managed.size to be 
> taken in account by the autotuning.
> What do you think about this? Do you think that some other specific config 
> should have been applied to avoid this issue?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33001) KafkaSource in batch mode failing with exception if topic partition is empty

2024-05-30 Thread Naci Simsek (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850678#comment-17850678
 ] 

Naci Simsek commented on FLINK-33001:
-

Just be able to try with the *flink-connector-kafka 3.1.0-1.18* on *Flink 
1.18.1* session cluster with a batch job, and the exception is not there 
anymore. The job finishes gracefully, without setting any specific log level 
for any subclasses. I have the following log setting:
{code:java}
rootLogger.level = TRACE {code}
Therefore, the issue seems to be fixed.

 

However, there is another issue when setting a specific offset to start reading 
from, which causes an offset reset and endless loop of 
FetchTasks, so that the batch job keeps running, till there is an event 
received into the topic, but of course this should be a content of another 
ticket.

> KafkaSource in batch mode failing with exception if topic partition is empty
> 
>
> Key: FLINK-33001
> URL: https://issues.apache.org/jira/browse/FLINK-33001
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.12.7, 1.14.6, 1.17.1
>Reporter: Abdul
>Priority: Major
>
> If the Kafka topic is empty in Batch mode, there is an exception while 
> processing it. This bug was supposedly fixed but unfortunately, the exception 
> still occurs. The original bug was reported as this 
> https://issues.apache.org/jira/browse/FLINK-27041
> We tried to backport it but it still doesn't work. 
>  * The problem will occur in case of the DEBUG level of logger for class 
> org.apache.flink.connector.kafka.source.reader.KafkaPartitionSplitReader
>  * The same problems will occur in other versions of Flink, at least in the 
> 1.15 release branch and tag release-1.15.4
>  * The same problem also occurs in Flink 1.17.1 and 1.14
>  
> The minimal code to produce this is 
>  
> {code:java}
>   final StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment();
>   env.setRuntimeMode(RuntimeExecutionMode.BATCH);
>   KafkaSource kafkaSource = KafkaSource
>   .builder()
>   .setBootstrapServers("localhost:9092")
>   .setTopics("test_topic")
>   .setValueOnlyDeserializer(new 
> SimpleStringSchema())
>   .setBounded(OffsetsInitializer.latest())
>   .build();
>   DataStream stream = env.fromSource(
>   kafkaSource,
>   WatermarkStrategy.noWatermarks(),
>   "Kafka Source"
>   );
>   stream.print();
>   env.execute("Flink KafkaSource test job"); {code}
> This produces exception: 
> {code:java}
> Caused by: java.lang.RuntimeException: One or more fetchers have encountered 
> exception    at 
> org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManager.checkErrors(SplitFetcherManager.java:199)
>     at 
> org.apache.flink.connector.base.source.reader.SourceReaderBase.getNextFetch(SourceReaderBase.java:154)
>     at 
> org.apache.flink.connector.base.source.reader.SourceReaderBase.pollNext(SourceReaderBase.java:116)
>     at 
> org.apache.flink.streaming.api.operators.SourceOperator.emitNext(SourceOperator.java:275)
>     at 
> org.apache.flink.streaming.runtime.io.StreamTaskSourceInput.emitNext(StreamTaskSourceInput.java:67)
>     at 
> org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:398)
>     at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:191)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:619)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:583)
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:758) 
>    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:573)    at 
> java.lang.Thread.run(Thread.java:748)Caused by: java.lang.RuntimeException: 
> SplitFetcher thread 0 received unexpected exception while polling the records 
>    at 
> org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:146)
>   

[jira] [Comment Edited] (FLINK-33001) KafkaSource in batch mode failing with exception if topic partition is empty

2024-05-30 Thread Naci Simsek (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850678#comment-17850678
 ] 

Naci Simsek edited comment on FLINK-33001 at 5/30/24 10:41 AM:
---

Just be able to try with the *flink-connector-kafka 3.1.0-1.18* on *Flink 
1.18.1* session cluster with a batch job, and the exception is not there 
anymore. The job finishes gracefully, without setting any specific log level 
for any subclasses. I have the following log setting:
{code:java}
rootLogger.level = TRACE {code}
Therefore, the issue seems to be fixed.

However, there is another issue when setting a specific offset to start reading 
from, which causes an offset reset and endless loop of FetchTasks, so that the 
batch job keeps running, till there is an event received into the topic, but of 
course this should be a content of another ticket.


was (Author: JIRAUSER302646):
Just be able to try with the *flink-connector-kafka 3.1.0-1.18* on *Flink 
1.18.1* session cluster with a batch job, and the exception is not there 
anymore. The job finishes gracefully, without setting any specific log level 
for any subclasses. I have the following log setting:
{code:java}
rootLogger.level = TRACE {code}
Therefore, the issue seems to be fixed.

However, there is another issue when setting a specific offset to start reading 
from, which causes an offset reset and endless loop of 
FetchTasks, so that the batch job keeps running, till there is an event 
received into the topic, but of course this should be a content of another 
ticket.

> KafkaSource in batch mode failing with exception if topic partition is empty
> 
>
> Key: FLINK-33001
> URL: https://issues.apache.org/jira/browse/FLINK-33001
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.12.7, 1.14.6, 1.17.1
>Reporter: Abdul
>Priority: Major
>
> If the Kafka topic is empty in Batch mode, there is an exception while 
> processing it. This bug was supposedly fixed but unfortunately, the exception 
> still occurs. The original bug was reported as this 
> https://issues.apache.org/jira/browse/FLINK-27041
> We tried to backport it but it still doesn't work. 
>  * The problem will occur in case of the DEBUG level of logger for class 
> org.apache.flink.connector.kafka.source.reader.KafkaPartitionSplitReader
>  * The same problems will occur in other versions of Flink, at least in the 
> 1.15 release branch and tag release-1.15.4
>  * The same problem also occurs in Flink 1.17.1 and 1.14
>  
> The minimal code to produce this is 
>  
> {code:java}
>   final StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment();
>   env.setRuntimeMode(RuntimeExecutionMode.BATCH);
>   KafkaSource kafkaSource = KafkaSource
>   .builder()
>   .setBootstrapServers("localhost:9092")
>   .setTopics("test_topic")
>   .setValueOnlyDeserializer(new 
> SimpleStringSchema())
>   .setBounded(OffsetsInitializer.latest())
>   .build();
>   DataStream stream = env.fromSource(
>   kafkaSource,
>   WatermarkStrategy.noWatermarks(),
>   "Kafka Source"
>   );
>   stream.print();
>   env.execute("Flink KafkaSource test job"); {code}
> This produces exception: 
> {code:java}
> Caused by: java.lang.RuntimeException: One or more fetchers have encountered exception
>     at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManager.checkErrors(SplitFetcherManager.java:199)
>     at org.apache.flink.connector.base.source.reader.SourceReaderBase.getNextFetch(SourceReaderBase.java:154)
>     at org.apache.flink.connector.base.source.reader.SourceReaderBase.pollNext(SourceReaderBase.java:116)
>     at org.apache.flink.streaming.api.operators.SourceOperator.emitNext(SourceOperator.java:275)
>     at org.apache.flink.streaming.runtime.io.StreamTaskSourceInput.emitNext(StreamTaskSourceInput.java:67)
>     at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:398)
>
