[jira] [Commented] (FLINK-31811) Unsupported complex data type for Flink SQL

2023-04-17 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712937#comment-17712937
 ] 

Krzysztof Chmielewski commented on FLINK-31811:
---

Hi [~jirawech.s]
??Could you share me reproducible code???

The code is attached to the issue I've created: 
https://issues.apache.org/jira/browse/FLINK-31197

> Unsupported complex data type for Flink SQL
> ---
>
> Key: FLINK-31811
> URL: https://issues.apache.org/jira/browse/FLINK-31811
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.16.1
>Reporter: jirawech.s
>Priority: Major
>
> I found this issue when I tried to write data to the local filesystem using Flink 
> SQL.
> {code:java}
> 19:51:32,966 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph     
>   [] - compact-operator (1/4) 
> (4f2a09b638c786f74262c675d248afd9_80fe6c4f32f605d447b391cdb16cc1ff_0_4) 
> switched from RUNNING to FAILED on 69ed2306-371b-4bfc-a98e-bf75fb41748f @ 
> localhost (dataPort=-1).
> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>     at java.util.ArrayList.rangeCheck(ArrayList.java:659) ~[?:1.8.0_301]
>     at java.util.ArrayList.get(ArrayList.java:435) ~[?:1.8.0_301]
>     at org.apache.parquet.schema.GroupType.getType(GroupType.java:216) 
> ~[parquet-column-1.12.2.jar:1.12.2]
>     at 
> org.apache.flink.formats.parquet.vector.ParquetSplitReaderUtil.createWritableColumnVector(ParquetSplitReaderUtil.java:523)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.vector.ParquetSplitReaderUtil.createWritableColumnVector(ParquetSplitReaderUtil.java:503)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createWritableVectors(ParquetVectorizedInputFormat.java:281)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createReaderBatch(ParquetVectorizedInputFormat.java:270)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createPoolOfBatches(ParquetVectorizedInputFormat.java:260)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>      {code}
> What I tried to do is write a complex data type to a Parquet file.
> Here is the schema of the sink table. The problematic data type is 
> ARRAY>
> {code:java}
> CREATE TEMPORARY TABLE local_table (
>  `user_id` STRING, `order_id` STRING, `amount` INT, `restaurant_id` STRING, 
> `experiment` ARRAY>, `dt` STRING
> ) PARTITIONED BY (`dt`) WITH (
>   'connector'='filesystem',
>   'path'='file:///tmp/test_hadoop_write',
>   'format'='parquet',
>   'auto-compaction'='true',
>   'sink.partition-commit.policy.kind'='success-file'
> ) {code}
> PS. It used to work in Flink version 1.15.1.
>  
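Not part of the original ticket: below is a minimal, hedged sketch of how the sink table above could be exercised from Java. The element type of `experiment` is not preserved in this archive, so the sketch assumes ARRAY<MAP<STRING, STRING>> purely for illustration (matching the "Arrays of Maps" wording in the comments); the class name, sample values, and the checkpointing note are assumptions, not the reporter's attached code.

{code:java}
// Hedged reproducer sketch (not the reporter's attached code). The element type
// ARRAY<MAP<STRING, STRING>> is an assumption; the exact type is not preserved here.
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ParquetArrayOfMapsSinkSketch {
    public static void main(String[] args) throws Exception {
        // In a real streaming job, checkpointing must be enabled for the
        // filesystem sink to commit files; omitted here for brevity.
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        tEnv.executeSql(
                "CREATE TEMPORARY TABLE local_table ("
                        + " `user_id` STRING, `order_id` STRING, `amount` INT,"
                        + " `restaurant_id` STRING,"
                        + " `experiment` ARRAY<MAP<STRING, STRING>>," // assumed element type
                        + " `dt` STRING"
                        + ") PARTITIONED BY (`dt`) WITH ("
                        + " 'connector'='filesystem',"
                        + " 'path'='file:///tmp/test_hadoop_write',"
                        + " 'format'='parquet',"
                        + " 'auto-compaction'='true',"
                        + " 'sink.partition-commit.policy.kind'='success-file')");

        // Write a single row with the nested column; with auto-compaction enabled
        // the compact operator reads the written file back, which is where the
        // IndexOutOfBoundsException quoted above was reported.
        tEnv.executeSql(
                "INSERT INTO local_table"
                        + " SELECT 'u1', 'o1', 10, 'r1',"
                        + " ARRAY[MAP['variant', 'a']], '2023-04-13'")
                .await();
    }
}
{code}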



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-31811) Unsupported complex data type for Flink SQL

2023-04-14 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712410#comment-17712410
 ] 

Krzysztof Chmielewski edited comment on FLINK-31811 at 4/14/23 2:18 PM:


[~jirawech.s] I've pasted a wrong ticket number and already edited my previous 
comment, sorry.

I was talking about this one: https://issues.apache.org/jira/browse/FLINK-31197, 
which is about the Parquet writer.


was (Author: kristoffsc):
[~jirawech.s] I've pasted a wrong ticket number, already edited my previous 
comment.

I was talking about this one https://issues.apache.org/jira/browse/FLINK-31197 
which is about parquet writer.
Sorry.

> Unsupported complex data type for Flink SQL
> ---
>
> Key: FLINK-31811
> URL: https://issues.apache.org/jira/browse/FLINK-31811
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.16.1
>Reporter: jirawech.s
>Priority: Major
> Fix For: 1.16.2
>
>
> I found this issue when I tried to write data to the local filesystem using Flink 
> SQL.
> {code:java}
> 19:51:32,966 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph     
>   [] - compact-operator (1/4) 
> (4f2a09b638c786f74262c675d248afd9_80fe6c4f32f605d447b391cdb16cc1ff_0_4) 
> switched from RUNNING to FAILED on 69ed2306-371b-4bfc-a98e-bf75fb41748f @ 
> localhost (dataPort=-1).
> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>     at java.util.ArrayList.rangeCheck(ArrayList.java:659) ~[?:1.8.0_301]
>     at java.util.ArrayList.get(ArrayList.java:435) ~[?:1.8.0_301]
>     at org.apache.parquet.schema.GroupType.getType(GroupType.java:216) 
> ~[parquet-column-1.12.2.jar:1.12.2]
>     at 
> org.apache.flink.formats.parquet.vector.ParquetSplitReaderUtil.createWritableColumnVector(ParquetSplitReaderUtil.java:523)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.vector.ParquetSplitReaderUtil.createWritableColumnVector(ParquetSplitReaderUtil.java:503)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createWritableVectors(ParquetVectorizedInputFormat.java:281)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createReaderBatch(ParquetVectorizedInputFormat.java:270)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createPoolOfBatches(ParquetVectorizedInputFormat.java:260)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>      {code}
> What I tried to do is write a complex data type to a Parquet file.
> Here is the schema of the sink table. The problematic data type is 
> ARRAY>
> {code:java}
> CREATE TEMPORARY TABLE local_table (
>  `user_id` STRING, `order_id` STRING, `amount` INT, `restaurant_id` STRING, 
> `experiment` ARRAY>, `dt` STRING
> ) PARTITIONED BY (`dt`) WITH (
>   'connector'='filesystem',
>   'path'='file:///tmp/test_hadoop_write',
>   'format'='parquet',
>   'auto-compaction'='true',
>   'sink.partition-commit.policy.kind'='success-file'
> ) {code}
> PS. It used to work in Flink version 1.15.1.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-31811) Unsupported complex data type for Flink SQL

2023-04-14 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712390#comment-17712390
 ] 

Krzysztof Chmielewski edited comment on FLINK-31811 at 4/14/23 2:18 PM:


I think this might be a duplicate of 
https://issues.apache.org/jira/browse/FLINK-31197 that manifests in the SQL API.

P.S.
Are you sure that this was working in 1.15.1?
I know that writing "simple" complex types like Arrays of Integers or Strings 
was working, but I'm not sure about this one -> Arrays of Maps.


was (Author: kristoffsc):
-I think this might be a duplicate of 
https://issues.apache.org/jira/browse/FLINK-31197 that manifests in the SQL API.

P.S.
Are you sure that this was working in 1.15.1?
I know that writing "simple" complex types like Arrays of Integers or Strings 
was working, but I'm not sure about this one -> Arrays of Maps.

> Unsupported complex data type for Flink SQL
> ---
>
> Key: FLINK-31811
> URL: https://issues.apache.org/jira/browse/FLINK-31811
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.16.1
>Reporter: jirawech.s
>Priority: Major
> Fix For: 1.16.2
>
>
> I found this issue when I tried to write data to the local filesystem using Flink 
> SQL.
> {code:java}
> 19:51:32,966 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph     
>   [] - compact-operator (1/4) 
> (4f2a09b638c786f74262c675d248afd9_80fe6c4f32f605d447b391cdb16cc1ff_0_4) 
> switched from RUNNING to FAILED on 69ed2306-371b-4bfc-a98e-bf75fb41748f @ 
> localhost (dataPort=-1).
> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>     at java.util.ArrayList.rangeCheck(ArrayList.java:659) ~[?:1.8.0_301]
>     at java.util.ArrayList.get(ArrayList.java:435) ~[?:1.8.0_301]
>     at org.apache.parquet.schema.GroupType.getType(GroupType.java:216) 
> ~[parquet-column-1.12.2.jar:1.12.2]
>     at 
> org.apache.flink.formats.parquet.vector.ParquetSplitReaderUtil.createWritableColumnVector(ParquetSplitReaderUtil.java:523)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.vector.ParquetSplitReaderUtil.createWritableColumnVector(ParquetSplitReaderUtil.java:503)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createWritableVectors(ParquetVectorizedInputFormat.java:281)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createReaderBatch(ParquetVectorizedInputFormat.java:270)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createPoolOfBatches(ParquetVectorizedInputFormat.java:260)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>      {code}
> What I tried to do is write a complex data type to a Parquet file.
> Here is the schema of the sink table. The problematic data type is 
> ARRAY>
> {code:java}
> CREATE TEMPORARY TABLE local_table (
>  `user_id` STRING, `order_id` STRING, `amount` INT, `restaurant_id` STRING, 
> `experiment` ARRAY>, `dt` STRING
> ) PARTITIONED BY (`dt`) WITH (
>   'connector'='filesystem',
>   'path'='file:///tmp/test_hadoop_write',
>   'format'='parquet',
>   'auto-compaction'='true',
>   'sink.partition-commit.policy.kind'='success-file'
> ) {code}
> PS. It used to work in Flink version 1.15.1.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-31811) Unsupported complex data type for Flink SQL

2023-04-14 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712410#comment-17712410
 ] 

Krzysztof Chmielewski edited comment on FLINK-31811 at 4/14/23 2:17 PM:


[~jirawech.s] I've pasted a wrong ticket number, already edited my previous 
comment.

I was talking about this one: https://issues.apache.org/jira/browse/FLINK-31197, 
which is about the Parquet writer.
Sorry.


was (Author: kristoffsc):
[~jirawech.s] I've pasted a wrong ticket number, already edidt my previous 
comment.

I was talking about this one https://issues.apache.org/jira/browse/FLINK-31197 
which is about parquet writer.
Sorry.

> Unsupported complex data type for Flink SQL
> ---
>
> Key: FLINK-31811
> URL: https://issues.apache.org/jira/browse/FLINK-31811
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.16.1
>Reporter: jirawech.s
>Priority: Major
> Fix For: 1.16.2
>
>
> I found this issue when I tried to write data to the local filesystem using Flink 
> SQL.
> {code:java}
> 19:51:32,966 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph     
>   [] - compact-operator (1/4) 
> (4f2a09b638c786f74262c675d248afd9_80fe6c4f32f605d447b391cdb16cc1ff_0_4) 
> switched from RUNNING to FAILED on 69ed2306-371b-4bfc-a98e-bf75fb41748f @ 
> localhost (dataPort=-1).
> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>     at java.util.ArrayList.rangeCheck(ArrayList.java:659) ~[?:1.8.0_301]
>     at java.util.ArrayList.get(ArrayList.java:435) ~[?:1.8.0_301]
>     at org.apache.parquet.schema.GroupType.getType(GroupType.java:216) 
> ~[parquet-column-1.12.2.jar:1.12.2]
>     at 
> org.apache.flink.formats.parquet.vector.ParquetSplitReaderUtil.createWritableColumnVector(ParquetSplitReaderUtil.java:523)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.vector.ParquetSplitReaderUtil.createWritableColumnVector(ParquetSplitReaderUtil.java:503)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createWritableVectors(ParquetVectorizedInputFormat.java:281)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createReaderBatch(ParquetVectorizedInputFormat.java:270)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createPoolOfBatches(ParquetVectorizedInputFormat.java:260)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>      {code}
> What I tried to do is write a complex data type to a Parquet file.
> Here is the schema of the sink table. The problematic data type is 
> ARRAY>
> {code:java}
> CREATE TEMPORARY TABLE local_table (
>  `user_id` STRING, `order_id` STRING, `amount` INT, `restaurant_id` STRING, 
> `experiment` ARRAY>, `dt` STRING
> ) PARTITIONED BY (`dt`) WITH (
>   'connector'='filesystem',
>   'path'='file:///tmp/test_hadoop_write',
>   'format'='parquet',
>   'auto-compaction'='true',
>   'sink.partition-commit.policy.kind'='success-file'
> ) {code}
> PS. It used to work in Flink version 1.15.1.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31811) Unsupported complex data type for Flink SQL

2023-04-14 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712410#comment-17712410
 ] 

Krzysztof Chmielewski commented on FLINK-31811:
---

[~jirawech.s] I've pasted a wrong ticket number and already edited my previous 
comment.

I was talking about this one: https://issues.apache.org/jira/browse/FLINK-31197, 
which is about the Parquet writer.
Sorry.

> Unsupported complex data type for Flink SQL
> ---
>
> Key: FLINK-31811
> URL: https://issues.apache.org/jira/browse/FLINK-31811
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.16.1
>Reporter: jirawech.s
>Priority: Major
> Fix For: 1.16.2
>
>
> I found this issue when I tried to write data to the local filesystem using Flink 
> SQL.
> {code:java}
> 19:51:32,966 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph     
>   [] - compact-operator (1/4) 
> (4f2a09b638c786f74262c675d248afd9_80fe6c4f32f605d447b391cdb16cc1ff_0_4) 
> switched from RUNNING to FAILED on 69ed2306-371b-4bfc-a98e-bf75fb41748f @ 
> localhost (dataPort=-1).
> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>     at java.util.ArrayList.rangeCheck(ArrayList.java:659) ~[?:1.8.0_301]
>     at java.util.ArrayList.get(ArrayList.java:435) ~[?:1.8.0_301]
>     at org.apache.parquet.schema.GroupType.getType(GroupType.java:216) 
> ~[parquet-column-1.12.2.jar:1.12.2]
>     at 
> org.apache.flink.formats.parquet.vector.ParquetSplitReaderUtil.createWritableColumnVector(ParquetSplitReaderUtil.java:523)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.vector.ParquetSplitReaderUtil.createWritableColumnVector(ParquetSplitReaderUtil.java:503)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createWritableVectors(ParquetVectorizedInputFormat.java:281)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createReaderBatch(ParquetVectorizedInputFormat.java:270)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createPoolOfBatches(ParquetVectorizedInputFormat.java:260)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>      {code}
> What I tried to do is write a complex data type to a Parquet file.
> Here is the schema of the sink table. The problematic data type is 
> ARRAY>
> {code:java}
> CREATE TEMPORARY TABLE local_table (
>  `user_id` STRING, `order_id` STRING, `amount` INT, `restaurant_id` STRING, 
> `experiment` ARRAY>, `dt` STRING
> ) PARTITIONED BY (`dt`) WITH (
>   'connector'='filesystem',
>   'path'='file:///tmp/test_hadoop_write',
>   'format'='parquet',
>   'auto-compaction'='true',
>   'sink.partition-commit.policy.kind'='success-file'
> ) {code}
> PS. It used to work in Flink version 1.15.1.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-31811) Unsupported complex data type for Flink SQL

2023-04-14 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712390#comment-17712390
 ] 

Krzysztof Chmielewski edited comment on FLINK-31811 at 4/14/23 2:16 PM:


-I think this might be a duplicate of 
https://issues.apache.org/jira/browse/FLINK-31197 that manifests in the SQL API.

P.S.
Are you sure that this was working in 1.15.1?
I know that writing "simple" complex types like Arrays of Integers or Strings 
was working, but I'm not sure about this one -> Arrays of Maps.


was (Author: kristoffsc):
I think this might be a duplicate of 
-https://issues.apache.org/jira/browse/FLINK-31202- 
https://issues.apache.org/jira/browse/FLINK-31197 that manifests in the SQL API.

P.S.
Are you sure that this was working in 1.15.1?
I know that writing "simple" complex types like Arrays of Integers or Strings 
was working, but I'm not sure about this one -> Arrays of Maps.

> Unsupported complex data type for Flink SQL
> ---
>
> Key: FLINK-31811
> URL: https://issues.apache.org/jira/browse/FLINK-31811
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.16.1
>Reporter: jirawech.s
>Priority: Major
> Fix For: 1.16.2
>
>
> I found this issue when I tried to write data to the local filesystem using Flink 
> SQL.
> {code:java}
> 19:51:32,966 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph     
>   [] - compact-operator (1/4) 
> (4f2a09b638c786f74262c675d248afd9_80fe6c4f32f605d447b391cdb16cc1ff_0_4) 
> switched from RUNNING to FAILED on 69ed2306-371b-4bfc-a98e-bf75fb41748f @ 
> localhost (dataPort=-1).
> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>     at java.util.ArrayList.rangeCheck(ArrayList.java:659) ~[?:1.8.0_301]
>     at java.util.ArrayList.get(ArrayList.java:435) ~[?:1.8.0_301]
>     at org.apache.parquet.schema.GroupType.getType(GroupType.java:216) 
> ~[parquet-column-1.12.2.jar:1.12.2]
>     at 
> org.apache.flink.formats.parquet.vector.ParquetSplitReaderUtil.createWritableColumnVector(ParquetSplitReaderUtil.java:523)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.vector.ParquetSplitReaderUtil.createWritableColumnVector(ParquetSplitReaderUtil.java:503)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createWritableVectors(ParquetVectorizedInputFormat.java:281)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createReaderBatch(ParquetVectorizedInputFormat.java:270)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createPoolOfBatches(ParquetVectorizedInputFormat.java:260)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>      {code}
> What I tried to do is write a complex data type to a Parquet file.
> Here is the schema of the sink table. The problematic data type is 
> ARRAY>
> {code:java}
> CREATE TEMPORARY TABLE local_table (
>  `user_id` STRING, `order_id` STRING, `amount` INT, `restaurant_id` STRING, 
> `experiment` ARRAY>, `dt` STRING
> ) PARTITIONED BY (`dt`) WITH (
>   'connector'='filesystem',
>   'path'='file:///tmp/test_hadoop_write',
>   'format'='parquet',
>   'auto-compaction'='true',
>   'sink.partition-commit.policy.kind'='success-file'
> ) {code}
> PS. It used to work in Flink version 1.15.1.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-31811) Unsupported complex data type for Flink SQL

2023-04-14 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712390#comment-17712390
 ] 

Krzysztof Chmielewski edited comment on FLINK-31811 at 4/14/23 2:15 PM:


I think this might be a duplicate of 
-https://issues.apache.org/jira/browse/FLINK-31202- 
https://issues.apache.org/jira/browse/FLINK-31197 that manifests in the SQL API.

P.S.
Are you sure that this was working in 1.15.1?
I know that writing "simple" complex types like Arrays of Integers or Strings 
was working, but I'm not sure about this one -> Arrays of Maps.


was (Author: kristoffsc):
I think this might be a duplicate of 
https://issues.apache.org/jira/browse/FLINK-31202 that manifests in the SQL API.

P.S.
Are you sure that this was working in 1.15.1?
I know that writing "simple" complex types like Arrays of Integers or Strings 
was working, but I'm not sure about this one -> Arrays of Maps.

> Unsupported complex data type for Flink SQL
> ---
>
> Key: FLINK-31811
> URL: https://issues.apache.org/jira/browse/FLINK-31811
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.16.1
>Reporter: jirawech.s
>Priority: Major
> Fix For: 1.16.2
>
>
> I found this issue when I tried to write data to the local filesystem using Flink 
> SQL.
> {code:java}
> 19:51:32,966 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph     
>   [] - compact-operator (1/4) 
> (4f2a09b638c786f74262c675d248afd9_80fe6c4f32f605d447b391cdb16cc1ff_0_4) 
> switched from RUNNING to FAILED on 69ed2306-371b-4bfc-a98e-bf75fb41748f @ 
> localhost (dataPort=-1).
> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>     at java.util.ArrayList.rangeCheck(ArrayList.java:659) ~[?:1.8.0_301]
>     at java.util.ArrayList.get(ArrayList.java:435) ~[?:1.8.0_301]
>     at org.apache.parquet.schema.GroupType.getType(GroupType.java:216) 
> ~[parquet-column-1.12.2.jar:1.12.2]
>     at 
> org.apache.flink.formats.parquet.vector.ParquetSplitReaderUtil.createWritableColumnVector(ParquetSplitReaderUtil.java:523)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.vector.ParquetSplitReaderUtil.createWritableColumnVector(ParquetSplitReaderUtil.java:503)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createWritableVectors(ParquetVectorizedInputFormat.java:281)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createReaderBatch(ParquetVectorizedInputFormat.java:270)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createPoolOfBatches(ParquetVectorizedInputFormat.java:260)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>      {code}
> What I tried to do is write a complex data type to a Parquet file.
> Here is the schema of the sink table. The problematic data type is 
> ARRAY>
> {code:java}
> CREATE TEMPORARY TABLE local_table (
>  `user_id` STRING, `order_id` STRING, `amount` INT, `restaurant_id` STRING, 
> `experiment` ARRAY>, `dt` STRING
> ) PARTITIONED BY (`dt`) WITH (
>   'connector'='filesystem',
>   'path'='file:///tmp/test_hadoop_write',
>   'format'='parquet',
>   'auto-compaction'='true',
>   'sink.partition-commit.policy.kind'='success-file'
> ) {code}
> PS. It used to work in Flink version 1.15.1.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31811) Unsupported complex data type for Flink SQL

2023-04-14 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712390#comment-17712390
 ] 

Krzysztof Chmielewski commented on FLINK-31811:
---

I think this might be a duplicate of 
https://issues.apache.org/jira/browse/FLINK-31202 that manifests in the SQL API.

P.S.
Are you sure that this was working in 1.15.1?
I know that writing "simple" complex types like Arrays of Integers or Strings 
was working, but I'm not sure about this one -> Arrays of Maps.

> Unsupported complex data type for Flink SQL
> ---
>
> Key: FLINK-31811
> URL: https://issues.apache.org/jira/browse/FLINK-31811
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.16.1
>Reporter: jirawech.s
>Priority: Major
> Fix For: 1.16.2
>
>
> I found this issue when I tried to write data to the local filesystem using Flink 
> SQL.
> {code:java}
> 19:51:32,966 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph     
>   [] - compact-operator (1/4) 
> (4f2a09b638c786f74262c675d248afd9_80fe6c4f32f605d447b391cdb16cc1ff_0_4) 
> switched from RUNNING to FAILED on 69ed2306-371b-4bfc-a98e-bf75fb41748f @ 
> localhost (dataPort=-1).
> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>     at java.util.ArrayList.rangeCheck(ArrayList.java:659) ~[?:1.8.0_301]
>     at java.util.ArrayList.get(ArrayList.java:435) ~[?:1.8.0_301]
>     at org.apache.parquet.schema.GroupType.getType(GroupType.java:216) 
> ~[parquet-column-1.12.2.jar:1.12.2]
>     at 
> org.apache.flink.formats.parquet.vector.ParquetSplitReaderUtil.createWritableColumnVector(ParquetSplitReaderUtil.java:523)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.vector.ParquetSplitReaderUtil.createWritableColumnVector(ParquetSplitReaderUtil.java:503)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createWritableVectors(ParquetVectorizedInputFormat.java:281)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createReaderBatch(ParquetVectorizedInputFormat.java:270)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>     at 
> org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createPoolOfBatches(ParquetVectorizedInputFormat.java:260)
>  ~[flink-parquet-1.16.1.jar:1.16.1]
>      {code}
> What I tried to do is write a complex data type to a Parquet file.
> Here is the schema of the sink table. The problematic data type is 
> ARRAY>
> {code:java}
> CREATE TEMPORARY TABLE local_table (
>  `user_id` STRING, `order_id` STRING, `amount` INT, `restaurant_id` STRING, 
> `experiment` ARRAY>, `dt` STRING
> ) PARTITIONED BY (`dt`) WITH (
>   'connector'='filesystem',
>   'path'='file:///tmp/test_hadoop_write',
>   'format'='parquet',
>   'auto-compaction'='true',
>   'sink.partition-commit.policy.kind'='success-file'
> ) {code}
> PS. It used to work in Flink version 1.15.1.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-26051) one sql has row_number =1 and the subsequent SQL has "case when" and "where" statement result Exception : The window can only be ordered in ASCENDING mode

2023-04-04 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700264#comment-17700264
 ] 

Krzysztof Chmielewski edited comment on FLINK-26051 at 4/4/23 11:00 AM:


Hi [~qingyue],
Is the problem that compCnt is wrongly calculated as zero for this query/rule, 
or that the zero value is wrongly handled later in the code?

With compCnt + 1 it seems that you add a bias value to every computation cost, 
so I'm not surprised that so many plans have changed and tests are failing.


was (Author: kristoffsc):
hi [~qingyue]
I wonder, isn't the issue that for this query the compCnt (CPU cost) is 
wrongly calculated as zero?
In other words, is the issue that compCnt is wrongly calculated as zero for this 
query/rule, or that the zero value is wrongly handled later in the code?

With compCnt + 1 it seems that you add a bias value to every computation cost, 
so I'm not surprised that so many plans have changed and tests are failing.

> one sql has row_number =1 and the subsequent SQL has "case when" and "where" 
> statement result Exception : The window can only be ordered in ASCENDING mode
> --
>
> Key: FLINK-26051
> URL: https://issues.apache.org/jira/browse/FLINK-26051
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.12.2, 1.14.4
>Reporter: chuncheng wu
>Assignee: Jane Chan
>Priority: Major
> Attachments: image-2022-02-10-20-13-14-424.png, 
> image-2022-02-11-11-18-20-594.png, image-2022-06-17-21-28-54-886.png
>
>
> hello,
>    I have 2 SQLs. One SQL (sql0) is "select xx from (ROW_NUMBER statement) 
> where rn=1" and the other one (sql1) is "select ${fields} 
> from result where ${filter_conditions}". The fields quoted in sql1 
> include one "case when" field. The two SQLs work well separately, but if they 
> are combined it results in the exception below. It happens when the 
> logical plan is turned into the physical plan:
>  
> {code:java}
> org.apache.flink.table.api.TableException: The window can only be ordered in 
> ASCENDING mode.
>     at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecOverAggregate.translateToPlanInternal(StreamExecOverAggregate.scala:98)
>     at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecOverAggregate.translateToPlanInternal(StreamExecOverAggregate.scala:52)
>     at 
> org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:59)
>     at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecOverAggregateBase.translateToPlan(StreamExecOverAggregateBase.scala:42)
>     at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:54)
>     at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:39)
>     at 
> org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:59)
>     at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalcBase.translateToPlan(StreamExecCalcBase.scala:38)
>     at 
> org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:66)
>     at 
> org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:65)
>     at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>     at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>     at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>     at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>     at 
> org.apache.flink.table.planner.delegation.StreamPlanner.translateToPlan(StreamPlanner.scala:65)
>     at 
> org.apache.flink.table.planner.delegation.StreamPlanner.explain(StreamPlanner.scala:103)
>     at 
> org.apache.flink.table.planner.delegation.StreamPlanner.explain(StreamPlanner.scala:42)
>     at 
> org.apache.flink.table.api.internal.TableEnvironmentImpl.explainInternal(TableEnvironmentImpl.java:630)
>     at 
> org.apache.flink.table.api.internal.TableImpl.explain(TableImpl.java:582)
>     at 
> com.meituan.grocery.data.flink.test.BugTest.testRowNumber(BugTest.java:69)
>   
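Not part of the thread: a minimal sketch of the query shape described above, a ROW_NUMBER deduplication feeding a projection with a CASE WHEN field and a WHERE filter. The table, column names and the datagen source are hypothetical stand-ins for the reporter's setup (they are not taken from BugTest.java), so this may or may not hit the exact planner error.

{code:java}
// Hedged sketch; all identifiers are hypothetical, not from the reporter's code.
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class RowNumberCaseWhenSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        tEnv.executeSql(
                "CREATE TEMPORARY TABLE src (id STRING, category STRING, ts TIMESTAMP(3))"
                        + " WITH ('connector'='datagen')");

        // sql0: "select xx from (ROW_NUMBER statement) where rn=1"
        tEnv.executeSql(
                "CREATE TEMPORARY VIEW result AS"
                        + " SELECT id, category, ts FROM ("
                        + "   SELECT *, ROW_NUMBER() OVER ("
                        + "     PARTITION BY id ORDER BY ts DESC) AS rn FROM src)"
                        + " WHERE rn = 1");

        // sql1: a projection with a CASE WHEN field plus a WHERE filter on top of sql0.
        // Explaining (or executing) the combined query is where the
        // "The window can only be ordered in ASCENDING mode" exception was reported.
        System.out.println(
                tEnv.explainSql(
                        "SELECT id,"
                                + " CASE WHEN category = 'A' THEN 1 ELSE 0 END AS is_a"
                                + " FROM result WHERE category <> 'B'"));
    }
}
{code}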

[jira] [Commented] (FLINK-26051) one sql has row_number =1 and the subsequent SQL has "case when" and "where" statement result Exception : The window can only be ordered in ASCENDING mode

2023-03-14 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700264#comment-17700264
 ] 

Krzysztof Chmielewski commented on FLINK-26051:
---

hi [~qingyue]
I wonder, isn't the issue that for this query the compCnt (CPU cost) is 
wrongly calculated as zero?
In other words, is the issue that compCnt is wrongly calculated as zero for this 
query/rule, or that the zero value is wrongly handled later in the code?

With compCnt + 1 it seems that you add a bias value to every computation cost, 
so I'm not surprised that so many plans have changed and tests are failing.

> one sql has row_number =1 and the subsequent SQL has "case when" and "where" 
> statement result Exception : The window can only be ordered in ASCENDING mode
> --
>
> Key: FLINK-26051
> URL: https://issues.apache.org/jira/browse/FLINK-26051
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.12.2, 1.14.4
>Reporter: chuncheng wu
>Assignee: Jane Chan
>Priority: Major
> Attachments: image-2022-02-10-20-13-14-424.png, 
> image-2022-02-11-11-18-20-594.png, image-2022-06-17-21-28-54-886.png
>
>
> hello,
>    I have 2 SQLs. One SQL (sql0) is "select xx from (ROW_NUMBER statement) 
> where rn=1" and the other one (sql1) is "select ${fields} 
> from result where ${filter_conditions}". The fields quoted in sql1 
> include one "case when" field. The two SQLs work well separately, but if they 
> are combined it results in the exception below. It happens when the 
> logical plan is turned into the physical plan:
>  
> {code:java}
> org.apache.flink.table.api.TableException: The window can only be ordered in 
> ASCENDING mode.
>     at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecOverAggregate.translateToPlanInternal(StreamExecOverAggregate.scala:98)
>     at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecOverAggregate.translateToPlanInternal(StreamExecOverAggregate.scala:52)
>     at 
> org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:59)
>     at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecOverAggregateBase.translateToPlan(StreamExecOverAggregateBase.scala:42)
>     at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:54)
>     at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:39)
>     at 
> org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:59)
>     at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalcBase.translateToPlan(StreamExecCalcBase.scala:38)
>     at 
> org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:66)
>     at 
> org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:65)
>     at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>     at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>     at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>     at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>     at 
> org.apache.flink.table.planner.delegation.StreamPlanner.translateToPlan(StreamPlanner.scala:65)
>     at 
> org.apache.flink.table.planner.delegation.StreamPlanner.explain(StreamPlanner.scala:103)
>     at 
> org.apache.flink.table.planner.delegation.StreamPlanner.explain(StreamPlanner.scala:42)
>     at 
> org.apache.flink.table.api.internal.TableEnvironmentImpl.explainInternal(TableEnvironmentImpl.java:630)
>     at 
> org.apache.flink.table.api.internal.TableImpl.explain(TableImpl.java:582)
>     at 
> com.meituan.grocery.data.flink.test.BugTest.testRowNumber(BugTest.java:69)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>     at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>     at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>     at 
> 

[jira] [Commented] (FLINK-26051) one sql has row_number =1 and the subsequent SQL has "case when" and "where" statement result Exception : The window can only be ordered in ASCENDING mode

2023-03-13 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699685#comment-17699685
 ] 

Krzysztof Chmielewski commented on FLINK-26051:
---

Hi [~qingyue],
I would like to help with solving this issue.

Could you tell me where we are regarding this one?
Reading the comments, I'm not sure whether you still have the branch/solution 
that was proposed by [~zhangbinzaifendou], or whether you have something new.

I would like to work on this one, but I would need some help getting started. 
I read your comment about `CommonCalc cannot produce right CPU cost for 
computations`. I see that the switch conditions there are really straightforward. 
Do you have an idea what might be missing there?



> one sql has row_number =1 and the subsequent SQL has "case when" and "where" 
> statement result Exception : The window can only be ordered in ASCENDING mode
> --
>
> Key: FLINK-26051
> URL: https://issues.apache.org/jira/browse/FLINK-26051
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.12.2, 1.14.4
>Reporter: chuncheng wu
>Assignee: Jane Chan
>Priority: Major
> Attachments: image-2022-02-10-20-13-14-424.png, 
> image-2022-02-11-11-18-20-594.png, image-2022-06-17-21-28-54-886.png
>
>
> hello,
>    I have 2 SQLs. One SQL (sql0) is "select xx from (ROW_NUMBER statement) 
> where rn=1" and the other one (sql1) is "select ${fields} 
> from result where ${filter_conditions}". The fields quoted in sql1 
> include one "case when" field. The two SQLs work well separately, but if they 
> are combined it results in the exception below. It happens when the 
> logical plan is turned into the physical plan:
>  
> {code:java}
> org.apache.flink.table.api.TableException: The window can only be ordered in 
> ASCENDING mode.
>     at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecOverAggregate.translateToPlanInternal(StreamExecOverAggregate.scala:98)
>     at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecOverAggregate.translateToPlanInternal(StreamExecOverAggregate.scala:52)
>     at 
> org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:59)
>     at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecOverAggregateBase.translateToPlan(StreamExecOverAggregateBase.scala:42)
>     at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:54)
>     at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:39)
>     at 
> org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:59)
>     at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalcBase.translateToPlan(StreamExecCalcBase.scala:38)
>     at 
> org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:66)
>     at 
> org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:65)
>     at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>     at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>     at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>     at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>     at 
> org.apache.flink.table.planner.delegation.StreamPlanner.translateToPlan(StreamPlanner.scala:65)
>     at 
> org.apache.flink.table.planner.delegation.StreamPlanner.explain(StreamPlanner.scala:103)
>     at 
> org.apache.flink.table.planner.delegation.StreamPlanner.explain(StreamPlanner.scala:42)
>     at 
> org.apache.flink.table.api.internal.TableEnvironmentImpl.explainInternal(TableEnvironmentImpl.java:630)
>     at 
> org.apache.flink.table.api.internal.TableImpl.explain(TableImpl.java:582)
>     at 
> com.meituan.grocery.data.flink.test.BugTest.testRowNumber(BugTest.java:69)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>     at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>     at 
> 

[jira] [Updated] (FLINK-31202) Add support for reading Parquet files containing Arrays with complex types.

2023-02-23 Thread Krzysztof Chmielewski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krzysztof Chmielewski updated FLINK-31202:
--
Attachment: ParquetSourceArrayOfArraysIssue.java
ParquetSourceArrayOfRowIssue.java

> Add support for reading Parquet files containing Arrays with complex types.
> ---
>
> Key: FLINK-31202
> URL: https://issues.apache.org/jira/browse/FLINK-31202
> Project: Flink
>  Issue Type: New Feature
>Affects Versions: 1.16.0, 1.17.0, 1.16.1, 1.16.2, 1.17.1
>Reporter: Krzysztof Chmielewski
>Priority: Major
> Attachments: ParquetSourceArrayOfArraysIssue.java, 
> ParquetSourceArrayOfRowIssue.java, arrayOfArrayOfInts.snappy.parquet, 
> arrayOfrows.snappy.parquet
>
>
> Reading complex types from Parquet is possible since Flink 1.16, after 
> implementing https://issues.apache.org/jira/browse/FLINK-24614.
> However, this implementation lacks support for reading complex nested types 
> such as:
> * Array
> * Array
> * Array
> This ticket is about adding support for reading the below types from Parquet 
> format files.
> Currently, when trying to read a Parquet file containing a column with such a 
> type, the below exception is thrown:
> {code:java}
> Caused by: java.lang.RuntimeException: Unsupported type in the list: ROW<`f1` 
> INT>
>   at 
> org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readPrimitiveTypedRow(ArrayColumnReader.java:175)
>   at 
> org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.fetchNextValue(ArrayColumnReader.java:113)
>   at 
> org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readToVector(ArrayColumnReader.java:81)
> {code}
> OR:
> {code:java}
> Caused by: java.lang.RuntimeException: Unsupported type in the list: 
> ARRAY
>   at 
> org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readPrimitiveTypedRow(ArrayColumnReader.java:175)
>   at 
> org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.fetchNextValue(ArrayColumnReader.java:113)
>   at 
> org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readToVector(ArrayColumnReader.java:81)
> {code}
> Parquet files and reproducer code are attached to the ticket.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31202) Add support for reading Parquet files containing Arrays with complex types.

2023-02-23 Thread Krzysztof Chmielewski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krzysztof Chmielewski updated FLINK-31202:
--
Attachment: arrayOfArrayOfInts.snappy.parquet
arrayOfrows.snappy.parquet

> Add support for reading Parquet files containing Arrays with complex types.
> ---
>
> Key: FLINK-31202
> URL: https://issues.apache.org/jira/browse/FLINK-31202
> Project: Flink
>  Issue Type: New Feature
>Affects Versions: 1.16.0, 1.17.0, 1.16.1, 1.16.2, 1.17.1
>Reporter: Krzysztof Chmielewski
>Priority: Major
> Attachments: ParquetSourceArrayOfArraysIssue.java, 
> ParquetSourceArrayOfRowIssue.java, arrayOfArrayOfInts.snappy.parquet, 
> arrayOfrows.snappy.parquet
>
>
> Reading complex types from Parquet is possible since Flink 1.16, after 
> implementing https://issues.apache.org/jira/browse/FLINK-24614.
> However, this implementation lacks support for reading complex nested types 
> such as:
> * Array
> * Array
> * Array
> This ticket is about adding support for reading the below types from Parquet 
> format files.
> Currently, when trying to read a Parquet file containing a column with such a 
> type, the below exception is thrown:
> {code:java}
> Caused by: java.lang.RuntimeException: Unsupported type in the list: ROW<`f1` 
> INT>
>   at 
> org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readPrimitiveTypedRow(ArrayColumnReader.java:175)
>   at 
> org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.fetchNextValue(ArrayColumnReader.java:113)
>   at 
> org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readToVector(ArrayColumnReader.java:81)
> {code}
> OR:
> {code:java}
> Caused by: java.lang.RuntimeException: Unsupported type in the list: 
> ARRAY
>   at 
> org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readPrimitiveTypedRow(ArrayColumnReader.java:175)
>   at 
> org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.fetchNextValue(ArrayColumnReader.java:113)
>   at 
> org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readToVector(ArrayColumnReader.java:81)
> {code}
> Parquet files and reproducer code are attached to the ticket.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31202) Add support for reading Parquet files containing Arrays with complex types.

2023-02-23 Thread Krzysztof Chmielewski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krzysztof Chmielewski updated FLINK-31202:
--
Description: 
Reading complex types from Parquet is possible since Flink 1.16, after 
implementing https://issues.apache.org/jira/browse/FLINK-24614.

However, this implementation lacks support for reading complex nested types such 
as:
* Array
* Array
* Array

This ticket is about adding support for reading the below types from Parquet format 
files.

Currently, when trying to read a Parquet file containing a column with such a type, 
the below exception is thrown:


{code:java}
Caused by: java.lang.RuntimeException: Unsupported type in the list: ROW<`f1` 
INT>
at 
org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readPrimitiveTypedRow(ArrayColumnReader.java:175)
at 
org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.fetchNextValue(ArrayColumnReader.java:113)
at 
org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readToVector(ArrayColumnReader.java:81)
{code}

OR:


{code:java}
Caused by: java.lang.RuntimeException: Unsupported type in the list: ARRAY
at 
org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readPrimitiveTypedRow(ArrayColumnReader.java:175)
at 
org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.fetchNextValue(ArrayColumnReader.java:113)
at 
org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readToVector(ArrayColumnReader.java:81)

{code}

Parquet files and reproducer code are attached to the ticket.
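For illustration only (this is not one of the attached reproducers): a hedged sketch of the read path via the SQL filesystem source. The schema mirrors the ROW<`f1` INT> element type from the first exception above; the file path is a placeholder for one of the attached Parquet files.

{code:java}
// Hedged sketch; the path is a placeholder, and the schema is assumed from the
// ROW<`f1` INT> element type in the exception quoted in the description.
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ParquetNestedArrayReadSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // A column declared as an array of rows; reading it is what triggers
        // "Unsupported type in the list: ROW<`f1` INT>" in ArrayColumnReader.
        tEnv.executeSql(
                "CREATE TEMPORARY TABLE parquet_src ("
                        + " `nested` ARRAY<ROW<`f1` INT>>"
                        + ") WITH ("
                        + " 'connector'='filesystem',"
                        + " 'path'='file:///tmp/arrayOfrows.snappy.parquet',"
                        + " 'format'='parquet')");

        tEnv.executeSql("SELECT * FROM parquet_src").print();
    }
}
{code}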

  was:
Reading complex types from Parquet is possible since Flink 1.16, after 
implementing https://issues.apache.org/jira/browse/FLINK-24614.

However, this implementation lacks support for reading complex nested types such 
as:
* Array
* Array
* Array

This ticket is about adding support for reading the below types from Parquet format 
files.

Currently, when trying to read a Parquet file containing a column with such a type, 
the below exception is thrown:


{code:java}
Caused by: java.lang.RuntimeException: Unsupported type in the list: ROW<`f1` 
INT>
at 
org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readPrimitiveTypedRow(ArrayColumnReader.java:175)
at 
org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.fetchNextValue(ArrayColumnReader.java:113)
at 
org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readToVector(ArrayColumnReader.java:81)
{code}

OR:


{code:java}
Caused by: java.lang.RuntimeException: Unsupported type in the list: ARRAY
at 
org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readPrimitiveTypedRow(ArrayColumnReader.java:175)
at 
org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.fetchNextValue(ArrayColumnReader.java:113)
at 
org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readToVector(ArrayColumnReader.java:81)

{code}




> Add support for reading Parquet files containing Arrays with complex types.
> ---
>
> Key: FLINK-31202
> URL: https://issues.apache.org/jira/browse/FLINK-31202
> Project: Flink
>  Issue Type: New Feature
>Affects Versions: 1.16.0, 1.17.0, 1.16.1, 1.16.2, 1.17.1
>Reporter: Krzysztof Chmielewski
>Priority: Major
>
> Reading complex types from Parquet is possible since Flink 1.16, after 
> implementing https://issues.apache.org/jira/browse/FLINK-24614.
> However, this implementation lacks support for reading complex nested types 
> such as:
> * Array
> * Array
> * Array
> This ticket is about adding support for reading the below types from Parquet 
> format files.
> Currently, when trying to read a Parquet file containing a column with such a 
> type, the below exception is thrown:
> {code:java}
> Caused by: java.lang.RuntimeException: Unsupported type in the list: ROW<`f1` 
> INT>
>   at 
> org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readPrimitiveTypedRow(ArrayColumnReader.java:175)
>   at 
> org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.fetchNextValue(ArrayColumnReader.java:113)
>   at 
> org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readToVector(ArrayColumnReader.java:81)
> {code}
> OR:
> {code:java}
> Caused by: java.lang.RuntimeException: Unsupported type in the list: 
> ARRAY
>   at 
> org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readPrimitiveTypedRow(ArrayColumnReader.java:175)
>   at 
> org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.fetchNextValue(ArrayColumnReader.java:113)
>   at 
> org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readToVector(ArrayColumnReader.java:81)
> {code}
> Parquet files and reproducer code are attached to the 

[jira] [Created] (FLINK-31202) Add support for reading Parquet files containing Arrays with complex types.

2023-02-23 Thread Krzysztof Chmielewski (Jira)
Krzysztof Chmielewski created FLINK-31202:
-

 Summary: Add support for reading Parquet files containing Arrays 
with complex types.
 Key: FLINK-31202
 URL: https://issues.apache.org/jira/browse/FLINK-31202
 Project: Flink
  Issue Type: New Feature
Affects Versions: 1.16.1, 1.16.0, 1.17.0, 1.16.2, 1.17.1
Reporter: Krzysztof Chmielewski


Reading complex types from Parquet is possible since Flink 1.16, after 
implementing https://issues.apache.org/jira/browse/FLINK-24614.

However, this implementation lacks support for reading complex nested types such 
as:
* Array
* Array
* Array

This ticket is about adding support for reading the below types from Parquet format 
files.

Currently, when trying to read a Parquet file containing a column with such a type, 
the below exception is thrown:


{code:java}
Caused by: java.lang.RuntimeException: Unsupported type in the list: ROW<`f1` 
INT>
at 
org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readPrimitiveTypedRow(ArrayColumnReader.java:175)
at 
org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.fetchNextValue(ArrayColumnReader.java:113)
at 
org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readToVector(ArrayColumnReader.java:81)
{code}

OR:


{code:java}
Caused by: java.lang.RuntimeException: Unsupported type in the list: ARRAY
at 
org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readPrimitiveTypedRow(ArrayColumnReader.java:175)
at 
org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.fetchNextValue(ArrayColumnReader.java:113)
at 
org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readToVector(ArrayColumnReader.java:81)

{code}





--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31197) Unable to write Parquet files containing Arrays with complex types.

2023-02-23 Thread Krzysztof Chmielewski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krzysztof Chmielewski updated FLINK-31197:
--
Summary: Unable to write Parquet files containing Arrays with complex 
types.  (was: Exception while writing Parqeut files containing Arrays with 
complex types.)

> Unable to write Parquet files containing Arrays with complex types.
> ---
>
> Key: FLINK-31197
> URL: https://issues.apache.org/jira/browse/FLINK-31197
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem, Formats (JSON, Avro, Parquet, 
> ORC, SequenceFile)
>Affects Versions: 1.15.0, 1.15.1, 1.16.0, 1.17.0, 1.15.2, 1.15.3, 1.16.1, 
> 1.15.4, 1.16.2, 1.17.1, 1.15.5
>Reporter: Krzysztof Chmielewski
>Priority: Major
> Attachments: ParquetSinkArrayOfArraysIssue.java
>
>
> After https://issues.apache.org/jira/browse/FLINK-17782 it should be possible 
> to write complex types with the File sink using the Parquet format. 
> However, it turns out that it is still impossible to write types such as:
> * Array
> * Array
> * Array 
> When trying to write a Parquet row with such types, the below exception is 
> thrown:
> {code:java}
> Caused by: java.lang.RuntimeException: 
> org.apache.parquet.io.ParquetEncodingException: empty fields are illegal, the 
> field should be ommited completely instead
>   at 
> org.apache.flink.formats.parquet.row.ParquetRowDataBuilder$ParquetWriteSupport.write(ParquetRowDataBuilder.java:91)
>   at 
> org.apache.flink.formats.parquet.row.ParquetRowDataBuilder$ParquetWriteSupport.write(ParquetRowDataBuilder.java:71)
>   at 
> org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
>   at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
>   at 
> org.apache.flink.formats.parquet.ParquetBulkWriter.addElement(ParquetBulkWriter.java:52)
>   at 
> org.apache.flink.streaming.api.functions.sink.filesystem.BulkPartWriter.write(BulkPartWriter.java:51)
>   at 
> org.apache.flink.connector.file.sink.writer.FileWriterBucket.write(FileWriterBucket.java:191)
> {code}
> The exception is misleading, not showing the real problem. 
> The reason why those complex types are still not working is that during 
> development of https://issues.apache.org/jira/browse/FLINK-17782
> the code paths for those types were left without implementation: no 
> UnsupportedOperationException, nothing, simply empty methods. In 
> https://github.com/apache/flink/blob/release-1.16.1/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/row/ParquetRowDataWriter.java
> you will see 
> {code:java}
> @Override
> public void write(ArrayData arrayData, int ordinal) {}
> {code}
> for MapWriter, ArrayWriter and RowWriter.
> I see two problems here:
> 1. Writing those three types is still not possible.
> 2. Flink is throwing an exception that gives no hint about the real issue 
> here. It could throw "Unsupported operation" for now. Maybe this should be 
> an item for a different ticket?
> The code to reproduce this issue is attached to the ticket. It tries to write 
> a single row with one column of type Array> to a Parquet file.
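As a side note, not part of the ticket: below is a hedged sketch of the kind of fail-fast guard the description suggests ("It could throw 'Unsupported operation' for now"). The class name is made up and this is not the actual ParquetRowDataWriter code; it only illustrates the suggestion of surfacing the limitation instead of leaving an empty method.

{code:java}
// Illustration only, NOT the actual Flink ParquetRowDataWriter code.
import org.apache.flink.table.data.ArrayData;

class FailFastArrayWriter {
    public void write(ArrayData arrayData, int ordinal) {
        // Surface the limitation explicitly instead of silently writing nothing,
        // which currently leads to the misleading "empty fields are illegal" error.
        throw new UnsupportedOperationException(
                "Writing ARRAY columns with complex element types is not supported yet"
                        + " (see FLINK-31197).");
    }
}
{code}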



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31197) Exception while writing Parqeut files containing Arrays with complex types.

2023-02-23 Thread Krzysztof Chmielewski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krzysztof Chmielewski updated FLINK-31197:
--
Component/s: Connectors / FileSystem
 Formats (JSON, Avro, Parquet, ORC, SequenceFile)

> Exception while writing Parqeut files containing Arrays with complex types.
> ---
>
> Key: FLINK-31197
> URL: https://issues.apache.org/jira/browse/FLINK-31197
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem, Formats (JSON, Avro, Parquet, 
> ORC, SequenceFile)
>Affects Versions: 1.15.0, 1.15.1, 1.16.0, 1.17.0, 1.15.2, 1.15.3, 1.16.1, 
> 1.15.4, 1.16.2, 1.17.1, 1.15.5
>Reporter: Krzysztof Chmielewski
>Priority: Major
> Attachments: ParquetSinkArrayOfArraysIssue.java
>
>
> After https://issues.apache.org/jira/browse/FLINK-17782 it should be possible 
> to write complex types with the File sink using the Parquet format. 
> However, it turns out that it is still impossible to write types such as:
> * Array
> * Array
> * Array
> When trying to write a Parquet row with such types, the below exception is 
> thrown:
> {code:java}
> Caused by: java.lang.RuntimeException: 
> org.apache.parquet.io.ParquetEncodingException: empty fields are illegal, the 
> field should be ommited completely instead
>   at 
> org.apache.flink.formats.parquet.row.ParquetRowDataBuilder$ParquetWriteSupport.write(ParquetRowDataBuilder.java:91)
>   at 
> org.apache.flink.formats.parquet.row.ParquetRowDataBuilder$ParquetWriteSupport.write(ParquetRowDataBuilder.java:71)
>   at 
> org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
>   at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
>   at 
> org.apache.flink.formats.parquet.ParquetBulkWriter.addElement(ParquetBulkWriter.java:52)
>   at 
> org.apache.flink.streaming.api.functions.sink.filesystem.BulkPartWriter.write(BulkPartWriter.java:51)
>   at 
> org.apache.flink.connector.file.sink.writer.FileWriterBucket.write(FileWriterBucket.java:191)
> {code}
> The exception is misleading and does not show the real problem. 
> The reason why those complex types are still not working is that during 
> development of https://issues.apache.org/jira/browse/FLINK-17782
> the code paths for those types were left without implementation: no 
> UnsupportedOperationException, nothing, simply empty methods. In 
> https://github.com/apache/flink/blob/release-1.16.1/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/row/ParquetRowDataWriter.java
> you will see 
> {code:java}
> @Override
> public void write(ArrayData arrayData, int ordinal) {}
> {code}
> for MapWriter, ArrayWriter and RowWriter.
> I see two problems here:
> 1. Writing those three types is still not possible.
> 2. Flink is throwing an exception that gives no hint about the real issue 
> here. It could throw "Unsupported operation" for now. Maybe this should be an 
> item for a different ticket?
> The code to reproduce this issue is attached to the ticket. It tries to write 
> a single row with one column of type Array> to a Parquet file.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31197) Exception while writing Parquet files containing Arrays with complex types.

2023-02-23 Thread Krzysztof Chmielewski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krzysztof Chmielewski updated FLINK-31197:
--
Description: 
After https://issues.apache.org/jira/browse/FLINK-17782 it should be possible 
to write complex types with the File sink using the Parquet format. 

However, it turns out that it is still impossible to write types such as:
* Array
* Array
* Array

When trying to write a Parquet row with such types, the below exception is 
thrown:
{code:java}
Caused by: java.lang.RuntimeException: 
org.apache.parquet.io.ParquetEncodingException: empty fields are illegal, the 
field should be ommited completely instead
at 
org.apache.flink.formats.parquet.row.ParquetRowDataBuilder$ParquetWriteSupport.write(ParquetRowDataBuilder.java:91)
at 
org.apache.flink.formats.parquet.row.ParquetRowDataBuilder$ParquetWriteSupport.write(ParquetRowDataBuilder.java:71)
at 
org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
at 
org.apache.flink.formats.parquet.ParquetBulkWriter.addElement(ParquetBulkWriter.java:52)
at 
org.apache.flink.streaming.api.functions.sink.filesystem.BulkPartWriter.write(BulkPartWriter.java:51)
at 
org.apache.flink.connector.file.sink.writer.FileWriterBucket.write(FileWriterBucket.java:191)

{code}


The exception is misleading and does not show the real problem. 
The reason why those complex types are still not working is that during 
development of https://issues.apache.org/jira/browse/FLINK-17782
the code paths for those types were left without implementation: no 
UnsupportedOperationException, nothing, simply empty methods. In 
https://github.com/apache/flink/blob/release-1.16.1/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/row/ParquetRowDataWriter.java
you will see 
{code:java}
@Override
public void write(ArrayData arrayData, int ordinal) {}
{code}

for MapWriter, ArrayWriter and RowWriter.

I see two problems here:
1. Writing those three types is still not possible.
2. Flink is throwing an exception that gives no hint about the real issue here. 
It could throw "Unsupported operation" for now (see the sketch below). Maybe this 
should be an item for a different ticket?


The code to reproduce this issue is attached to the ticket. It tries to write 
a single row with one column of type Array> to a Parquet file.
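
For illustration only, a minimal sketch of what the currently empty writer methods could do instead of silently writing nothing; the class name below is made up for the example and this is not the actual ParquetRowDataWriter code:
{code:java}
// Illustrative sketch, not a Flink patch: fail fast with a clear message instead of
// leaving the method body empty.
import org.apache.flink.table.data.ArrayData;

class ComplexArrayElementWriterSketch {

    // Same signature as the empty method quoted above.
    public void write(ArrayData arrayData, int ordinal) {
        throw new UnsupportedOperationException(
                "Writing ARRAY elements of type ARRAY, MAP or ROW to Parquet is not implemented yet "
                        + "(see FLINK-31197).");
    }
}
{code}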

  was:
After https://issues.apache.org/jira/browse/FLINK-17782 it should be possible 
to write complex types with the File sink using the Parquet format. 

However, it turns out that it is still impossible to write types such as:
Array
Array
Array

When trying to write a Parquet row with such types, the below exception is 
thrown:
{code:java}
Caused by: java.lang.RuntimeException: 
org.apache.parquet.io.ParquetEncodingException: empty fields are illegal, the 
field should be ommited completely instead
at 
org.apache.flink.formats.parquet.row.ParquetRowDataBuilder$ParquetWriteSupport.write(ParquetRowDataBuilder.java:91)
at 
org.apache.flink.formats.parquet.row.ParquetRowDataBuilder$ParquetWriteSupport.write(ParquetRowDataBuilder.java:71)
at 
org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
at 
org.apache.flink.formats.parquet.ParquetBulkWriter.addElement(ParquetBulkWriter.java:52)
at 
org.apache.flink.streaming.api.functions.sink.filesystem.BulkPartWriter.write(BulkPartWriter.java:51)
at 
org.apache.flink.connector.file.sink.writer.FileWriterBucket.write(FileWriterBucket.java:191)

{code}


The exception is misleading and does not show the real problem. 
The reason why those complex types are still not working is that during 
development of https://issues.apache.org/jira/browse/FLINK-17782
the code paths for those types were left without implementation: no 
UnsupportedOperationException, nothing, simply empty methods. In 
https://github.com/apache/flink/blob/release-1.16.1/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/row/ParquetRowDataWriter.java
you will see 
{code:java}
@Override
public void write(ArrayData arrayData, int ordinal) {}
{code}

for MapWriter, ArrayWriter and RowWriter.

I see two problems here:
1. Writing those three types is still not possible.
2. Flink is throwing an exception that gives no hint about the real issue here. 
It could throw "Unsupported operation" for now. Maybe this should be an item for a 
different ticket?


The code to reproduce this issue is attached to the ticket. It tries to write 
a single row with one column of type Array> to a Parquet file.


> Exception while writing Parquet files containing Arrays with complex types.
> ---
>
>

[jira] [Updated] (FLINK-31197) Exception while writing Parquet files containing Arrays with complex types.

2023-02-23 Thread Krzysztof Chmielewski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krzysztof Chmielewski updated FLINK-31197:
--
Description: 
After https://issues.apache.org/jira/browse/FLINK-17782 it should be possible 
to write complex types with the File sink using the Parquet format. 

However, it turns out that it is still impossible to write types such as:
Array
Array
Array

When trying to write a Parquet row with such types, the below exception is 
thrown:
{code:java}
Caused by: java.lang.RuntimeException: 
org.apache.parquet.io.ParquetEncodingException: empty fields are illegal, the 
field should be ommited completely instead
at 
org.apache.flink.formats.parquet.row.ParquetRowDataBuilder$ParquetWriteSupport.write(ParquetRowDataBuilder.java:91)
at 
org.apache.flink.formats.parquet.row.ParquetRowDataBuilder$ParquetWriteSupport.write(ParquetRowDataBuilder.java:71)
at 
org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
at 
org.apache.flink.formats.parquet.ParquetBulkWriter.addElement(ParquetBulkWriter.java:52)
at 
org.apache.flink.streaming.api.functions.sink.filesystem.BulkPartWriter.write(BulkPartWriter.java:51)
at 
org.apache.flink.connector.file.sink.writer.FileWriterBucket.write(FileWriterBucket.java:191)

{code}


The exception is misleading and does not show the real problem. 
The reason why those complex types are still not working is that during 
development of https://issues.apache.org/jira/browse/FLINK-17782
the code paths for those types were left without implementation: no 
UnsupportedOperationException, nothing, simply empty methods. In 
https://github.com/apache/flink/blob/release-1.16.1/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/row/ParquetRowDataWriter.java
you will see 
{code:java}
@Override
public void write(ArrayData arrayData, int ordinal) {}
{code}

for MapWriter, ArrayWriter and RowWriter.

I see two problems here:
1. Writing those three types is still not possible.
2. Flink is throwing an exception that gives no hint about the real issue here. 
It could throw "Unsupported operation" for now. Maybe this should be an item for a 
different ticket?


The code to reproduce this issue is attached to the ticket. It tries to write 
a single row with one column of type Array> to a Parquet file.

  was:
After https://issues.apache.org/jira/browse/FLINK-17782 it should be possible 
to write complex types with the File sink using the Parquet format. 

However, it turns out that it is still impossible to write types such as:
Array
Array
Array

When trying to write a Parquet row with such types, the below exception is 
thrown:
{code:java}
Caused by: java.lang.RuntimeException: 
org.apache.parquet.io.ParquetEncodingException: empty fields are illegal, the 
field should be ommited completely instead
at 
org.apache.flink.formats.parquet.row.ParquetRowDataBuilder$ParquetWriteSupport.write(ParquetRowDataBuilder.java:91)
at 
org.apache.flink.formats.parquet.row.ParquetRowDataBuilder$ParquetWriteSupport.write(ParquetRowDataBuilder.java:71)
at 
org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
at 
org.apache.flink.formats.parquet.ParquetBulkWriter.addElement(ParquetBulkWriter.java:52)
at 
org.apache.flink.streaming.api.functions.sink.filesystem.BulkPartWriter.write(BulkPartWriter.java:51)
at 
org.apache.flink.connector.file.sink.writer.FileWriterBucket.write(FileWriterBucket.java:191)

{code}


The exception is misleading and does not show the real problem. 
The reason why those complex types are still not working is that during 
development of https://issues.apache.org/jira/browse/FLINK-17782
the code paths for those types were left without implementation: no 
UnsupportedOperationException, nothing, simply empty methods. In 
https://github.com/apache/flink/blob/release-1.16.1/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/row/ParquetRowDataWriter.java
you will see 
{code:java}
@Override
public void write(ArrayData arrayData, int ordinal) {}
{code}

for MapWriter, ArrayWriter and RowWriter.

I see two problems here:
1. Writing those three types is still not possible.
2. Flink is throwing an exception that gives no hint about the real issue here. 
It could throw "Unsupported operation" for now. Maybe this should be an item for a 
different ticket?


The code to reproduce this issue is attached to the ticket. It tries to write 
a single row with one column of type Array> to a Parquet file.


> Exception while writing Parquet files containing Arrays with complex types.
> ---
>
> 

[jira] [Created] (FLINK-31197) Exception while writing Parquet files containing Arrays with complex types.

2023-02-23 Thread Krzysztof Chmielewski (Jira)
Krzysztof Chmielewski created FLINK-31197:
-

 Summary: Exception while writing Parquet files containing Arrays 
with complex types.
 Key: FLINK-31197
 URL: https://issues.apache.org/jira/browse/FLINK-31197
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.16.1, 1.15.3, 1.15.2, 1.16.0, 1.15.1, 1.15.0, 1.17.0, 
1.15.4, 1.16.2, 1.17.1, 1.15.5
Reporter: Krzysztof Chmielewski
 Attachments: ParquetSinkArrayOfArraysIssue.java

After https://issues.apache.org/jira/browse/FLINK-17782 it should be possible 
to write complex types with the File sink using the Parquet format. 

However, it turns out that it is still impossible to write types such as:
Array
Array
Array

When trying to write a Parquet row with such types, the below exception is 
thrown:
{code:java}
Caused by: java.lang.RuntimeException: 
org.apache.parquet.io.ParquetEncodingException: empty fields are illegal, the 
field should be ommited completely instead
at 
org.apache.flink.formats.parquet.row.ParquetRowDataBuilder$ParquetWriteSupport.write(ParquetRowDataBuilder.java:91)
at 
org.apache.flink.formats.parquet.row.ParquetRowDataBuilder$ParquetWriteSupport.write(ParquetRowDataBuilder.java:71)
at 
org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
at 
org.apache.flink.formats.parquet.ParquetBulkWriter.addElement(ParquetBulkWriter.java:52)
at 
org.apache.flink.streaming.api.functions.sink.filesystem.BulkPartWriter.write(BulkPartWriter.java:51)
at 
org.apache.flink.connector.file.sink.writer.FileWriterBucket.write(FileWriterBucket.java:191)

{code}


The exception is misleading and does not show the real problem. 
The reason why those complex types are still not working is that during 
development of https://issues.apache.org/jira/browse/FLINK-17782
the code paths for those types were left without implementation: no 
UnsupportedOperationException, nothing, simply empty methods. In 
https://github.com/apache/flink/blob/release-1.16.1/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/row/ParquetRowDataWriter.java
you will see 
{code:java}
@Override
public void write(ArrayData arrayData, int ordinal) {}
{code}

for MapWriter, ArrayWriter and RowWriter.

I see two problems here:
1. Writing those three types is still not possible.
2. Flink is throwing an exception that gives no hint about the real issue here. 
It could throw "Unsupported operation" for now. Maybe this should be an item for a 
different ticket?


The code to reproduce this issue is attached to the ticket. It tries to write 
a single row with one column of type Array> to a Parquet file.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-31021) JavaCodeSplitter doesn't split static method properly

2023-02-11 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687449#comment-17687449
 ] 

Krzysztof Chmielewski edited comment on FLINK-31021 at 2/11/23 2:10 PM:


OK, so I verified, and it seems that this is NOT a regression caused by my recent 
change to the Code Splitter. 
It seems that splitting static methods was never supported. I've checked on 
the 1.15 branch.

I think the question we should ask here is whether this is in fact a bug and whether 
we need to make the Code Splitter handle static methods. Quoting [~TsReaper] from 
https://github.com/apache/flink/pull/21393#pullrequestreview-1273870828
{code:java}
Our code splitter is not a universal solution. It only works for Flink 
generated code under several restrictions.
{code}

Having said that, [~xccui], is there a Flink SQL query that makes the Planner 
generate Java code with static methods? If so, could you provide one?

If this is in fact a needed feature, I can work on it, since I've recently made 
bigger changes to the code splitter and it would be fairly easy for me to add this.


[~TsReaper] What do you think?
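
For context, a minimal, self-contained sketch of what a split that keeps the static modifier could look like; the names (LogData, decodeImpl, the return-value holder) mirror the report, but this is only my illustration of the idea, not the splitter's actual output:
{code:java}
public class SplitStaticMethodSketch {

    /** Stand-in for foo.bar.LogData from the report. */
    public static class LogData {
        final String payload;
        LogData(String payload) { this.payload = payload; }
    }

    // The holder for the extracted method's "return value" must be static as well,
    // otherwise the static entry method cannot reach it.
    private static String decodeReturnValue0;

    // The entry method keeps its original static signature after the split.
    public static String decode(LogData message) {
        decodeImpl(message);
        return decodeReturnValue0;
    }

    // The extracted body must also stay static; dropping the modifier is exactly what
    // triggers "Instance method ... cannot be invoked in static context".
    private static void decodeImpl(LogData message) {
        decodeReturnValue0 = "decoded:" + message.payload;
    }

    public static void main(String[] args) {
        System.out.println(decode(new LogData("x")));
    }
}
{code}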



was (Author: kristoffsc):
OK, so I verified, and it seems that this is NOT a regression caused by my recent 
change to the Code Splitter. 
It seems that splitting static methods was never supported. I've checked on 
the 1.15 branch.

I think the question we should ask here is whether this is in fact a bug and whether 
we need to make the Code Splitter handle static methods. Quoting [~TsReaper] from 
https://github.com/apache/flink/pull/21393#pullrequestreview-1273870828
{code:java}
Our code splitter is not a universal solution. It only works for Flink 
generated code under several restrictions.
{code}

[~xccui] is there any Flink SQL query that makes the Planner generate Java code 
with static methods?
If so, could you provide one?

If this is in fact a needed feature, I can work on it, since I've recently made 
bigger changes to the code splitter and it would be fairly easy for me to add this.


[~TsReaper] What do you think?


> JavaCodeSplitter doesn't split static method properly
> -
>
> Key: FLINK-31021
> URL: https://issues.apache.org/jira/browse/FLINK-31021
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.14.4, 1.15.3, 1.16.1
>Reporter: Xingcan Cui
>Priority: Minor
>
> The exception while compiling the generated source
> {code:java}
> cause=org.codehaus.commons.compiler.CompileException: Line 3383, Column 90: 
> Instance method "default void 
> org.apache.flink.formats.protobuf.deserialize.GeneratedProtoToRow_655d75db1cf943838f5500013edfba82.decodeImpl(foo.bar.LogData)"
>  cannot be invoked in static context,{code}
> The original method header 
> {code:java}
> public static RowData decode(foo.bar.LogData message){{code}
> The code after split
>  
> {code:java}
> Line 3383: public static RowData decode(foo.bar.LogData message){ 
> decodeImpl(message); return decodeReturnValue$0; } 
> Line 3384:
> Line 3385: void decodeImpl(foo.bar.LogData message) {{code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31021) JavaCodeSplitter doesn't split static method properly

2023-02-11 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687449#comment-17687449
 ] 

Krzysztof Chmielewski commented on FLINK-31021:
---

OK, so I verified, and it seems that this is NOT a regression caused by my recent 
change to the Code Splitter. 
It seems that splitting static methods was never supported. I've checked on 
the 1.15 branch.

I think the question we should ask here is whether this is in fact a bug and whether 
we need to make the Code Splitter handle static methods. Quoting [~TsReaper] from 
https://github.com/apache/flink/pull/21393#pullrequestreview-1273870828
{code:java}
Our code splitter is not a universal solution. It only works for Flink 
generated code under several restrictions.
{code}

[~xccui] is there any Flink SQL query that makes the Planner generate Java code 
with static methods?
If so, could you provide one?

If this is in fact a needed feature, I can work on it, since I've recently made 
bigger changes to the code splitter and it would be fairly easy for me to add this.


[~TsReaper] What do you think?


> JavaCodeSplitter doesn't split static method properly
> -
>
> Key: FLINK-31021
> URL: https://issues.apache.org/jira/browse/FLINK-31021
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.14.4, 1.15.3, 1.16.1
>Reporter: Xingcan Cui
>Priority: Minor
>
> The exception while compiling the generated source
> {code:java}
> cause=org.codehaus.commons.compiler.CompileException: Line 3383, Column 90: 
> Instance method "default void 
> org.apache.flink.formats.protobuf.deserialize.GeneratedProtoToRow_655d75db1cf943838f5500013edfba82.decodeImpl(foo.bar.LogData)"
>  cannot be invoked in static context,{code}
> The original method header 
> {code:java}
> public static RowData decode(foo.bar.LogData message){{code}
> The code after split
>  
> {code:java}
> Line 3383: public static RowData decode(foo.bar.LogData message){ 
> decodeImpl(message); return decodeReturnValue$0; } 
> Line 3384:
> Line 3385: void decodeImpl(foo.bar.LogData message) {{code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31021) JavaCodeSplitter doesn't split static method properly

2023-02-10 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687368#comment-17687368
 ] 

Krzysztof Chmielewski commented on FLINK-31021:
---

Hi, I have a few questions.
1. Could you provide the full body of the original decode method? 
2. Do you have a SQL query that reproduces the problem?

3. You marked the affected versions as 1.16.1 and below. Did you in fact see this on 
those versions or on the current master? I'm asking because recently there was a change 
in the code splitter merged to master (1.17) and the 1.16 release branch that is not 
included in 1.16.1, so I'm wondering whether this is a regression or something new.

Let me know,
Cheers.

> JavaCodeSplitter doesn't split static method properly
> -
>
> Key: FLINK-31021
> URL: https://issues.apache.org/jira/browse/FLINK-31021
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.14.4, 1.15.3, 1.16.1
>Reporter: Xingcan Cui
>Priority: Minor
>
> The exception while compiling the generated source
> {code:java}
> cause=org.codehaus.commons.compiler.CompileException: Line 3383, Column 90: 
> Instance method "default void 
> org.apache.flink.formats.protobuf.deserialize.GeneratedProtoToRow_655d75db1cf943838f5500013edfba82.decodeImpl(foo.bar.LogData)"
>  cannot be invoked in static context,{code}
> The original method header 
> {code:java}
> public static RowData decode(foo.bar.LogData message){{code}
> The code after split
>  
> {code:java}
> Line 3383: public static RowData decode(foo.bar.LogData message){ 
> decodeImpl(message); return decodeReturnValue$0; } 
> Line 3384:
> Line 3385: void decodeImpl(foo.bar.LogData message) {{code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-31018) SQL Client -j option does not load user jars to classpath.

2023-02-10 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687218#comment-17687218
 ] 

Krzysztof Chmielewski edited comment on FLINK-31018 at 2/10/23 5:39 PM:


[~martijnvisser] yes, it seems that this is the case.

I've used DynamicTableFactory.Context#getClassLoader instead of 
Thread.currentThread().getContextClassLoader(), as suggested in one of the 
comments, and it seems that the problem disappeared. 
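
For reference, a minimal sketch of the workaround, assuming a user-provided factory interface similar to the one named in the error message; the class and interface names below are illustrative, not the connector's actual code:
{code:java}
// Sketch only: discover a user-provided factory with the classloader handed over by the
// planner in DynamicTableFactory.Context, rather than the thread context classloader,
// so that classes from jars registered via '-j' or ADD JAR are visible.
import org.apache.flink.table.factories.DynamicTableFactory;
import org.apache.flink.table.factories.Factory;
import org.apache.flink.table.factories.FactoryUtil;

public class ClassLoaderWorkaroundSketch {

    /** Illustrative stand-in for a factory such as LookupQueryCreatorFactory. */
    public interface QueryCreatorFactory extends Factory {}

    public static QueryCreatorFactory discover(DynamicTableFactory.Context context, String identifier) {
        return FactoryUtil.discoverFactory(context.getClassLoader(), QueryCreatorFactory.class, identifier);
    }
}
{code}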

Thanks,
The ticket can be closed.


was (Author: kristoffsc):
[~martijnvisser] yes, it seems that this is the case.

I've used `DynamicTableFactory.Context#getClassLoader` instead of 
`Thread.currentThread().getContextClassLoader()`, as suggested in one of the 
comments, and it seems that the problem disappeared. 

Thanks,
The ticket can be closed.

> SQL Client -j option does not load user jars to classpath.
> --
>
> Key: FLINK-31018
> URL: https://issues.apache.org/jira/browse/FLINK-31018
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Client
>Affects Versions: 1.17.0, 1.16.1
>Reporter: Krzysztof Chmielewski
>Priority: Minor
> Attachments: image-2023-02-10-15-53-39-330.png, 
> image-2023-02-10-15-54-32-537.png, image-2023-02-10-16-05-12-407.png
>
>
> The SQL Client '-j' option does not load custom jars to the classpath as it did, for 
> example, in Flink 1.15.
> As a result, the Flink 1.16 SQL Client is not able to discover classes through 
> Flink's factory discovery mechanism, throwing an error like:
> {code:java}
> [ERROR] Could not execute SQL statement. Reason:
> org.apache.flink.table.api.ValidationException: Could not find any factories 
> that implement 'com.getindata.connectors.http.LookupQueryCreatorFactory' in 
> the classpath.
> {code}
> The same Jar and sample job are working fine with Flink 1.15.
> Flink 1.15.2
> ./bin/sql-client.sh -j flink-http-connector-0.9.0.jar
>  !image-2023-02-10-15-53-39-330.png! 
> Flink 1.16.1
> ./bin/sql-client.sh -j flink-http-connector-0.9.0.jar
>  !image-2023-02-10-15-54-32-537.png! 
> The ADD JAR command does not solve the "Could not find any factories" issue although 
> the jar seems to be added:
>  !image-2023-02-10-16-05-12-407.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-31018) SQL Client -j option does not load user jars to classpath.

2023-02-10 Thread Krzysztof Chmielewski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krzysztof Chmielewski closed FLINK-31018.
-
Resolution: Not A Bug

> SQL Client -j option does not load user jars to classpath.
> --
>
> Key: FLINK-31018
> URL: https://issues.apache.org/jira/browse/FLINK-31018
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Client
>Affects Versions: 1.17.0, 1.16.1
>Reporter: Krzysztof Chmielewski
>Priority: Minor
> Attachments: image-2023-02-10-15-53-39-330.png, 
> image-2023-02-10-15-54-32-537.png, image-2023-02-10-16-05-12-407.png
>
>
> The SQL Client '-j' option does not load custom jars to the classpath as it did, for 
> example, in Flink 1.15.
> As a result, the Flink 1.16 SQL Client is not able to discover classes through 
> Flink's factory discovery mechanism, throwing an error like:
> {code:java}
> [ERROR] Could not execute SQL statement. Reason:
> org.apache.flink.table.api.ValidationException: Could not find any factories 
> that implement 'com.getindata.connectors.http.LookupQueryCreatorFactory' in 
> the classpath.
> {code}
> The same Jar and sample job are working fine with Flink 1.15.
> Flink 1.15.2
> ./bin/sql-client.sh -j flink-http-connector-0.9.0.jar
>  !image-2023-02-10-15-53-39-330.png! 
> Flink 1.16.1
> ./bin/sql-client.sh -j flink-http-connector-0.9.0.jar
>  !image-2023-02-10-15-54-32-537.png! 
> The ADD JAR command does not solve the "Could not find any factories" issue although 
> the jar seems to be added:
>  !image-2023-02-10-16-05-12-407.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31018) SQL Client -j option does not load user jars to classpath.

2023-02-10 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687218#comment-17687218
 ] 

Krzysztof Chmielewski commented on FLINK-31018:
---

[~martijnvisser] yes, it seems that this is the case.

I've used `DynamicTableFactory.Context#getClassLoader` instead of 
`Thread.currentThread().getContextClassLoader()`, as suggested in one of the 
comments, and it seems that the problem disappeared. 

Thanks,
The ticket can be closed.

> SQL Client -j option does not load user jars to classpath.
> --
>
> Key: FLINK-31018
> URL: https://issues.apache.org/jira/browse/FLINK-31018
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Client
>Affects Versions: 1.17.0, 1.16.1
>Reporter: Krzysztof Chmielewski
>Priority: Minor
> Attachments: image-2023-02-10-15-53-39-330.png, 
> image-2023-02-10-15-54-32-537.png, image-2023-02-10-16-05-12-407.png
>
>
> The SQL Client '-j' option does not load custom jars to the classpath as it did, for 
> example, in Flink 1.15.
> As a result, the Flink 1.16 SQL Client is not able to discover classes through 
> Flink's factory discovery mechanism, throwing an error like:
> {code:java}
> [ERROR] Could not execute SQL statement. Reason:
> org.apache.flink.table.api.ValidationException: Could not find any factories 
> that implement 'com.getindata.connectors.http.LookupQueryCreatorFactory' in 
> the classpath.
> {code}
> The same Jar and sample job are working fine with Flink 1.15.
> Flink 1.15.2
> ./bin/sql-client.sh -j flink-http-connector-0.9.0.jar
>  !image-2023-02-10-15-53-39-330.png! 
> Flink 1.16.1
> ./bin/sql-client.sh -j flink-http-connector-0.9.0.jar
>  !image-2023-02-10-15-54-32-537.png! 
> The ADD JAR command does not solve the "Could not find any factories" issue although 
> the jar seems to be added:
>  !image-2023-02-10-16-05-12-407.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31018) SQL Client -j option does not load user jars to classpath.

2023-02-10 Thread Krzysztof Chmielewski (Jira)
Krzysztof Chmielewski created FLINK-31018:
-

 Summary: SQL Client -j option does not load user jars to classpath.
 Key: FLINK-31018
 URL: https://issues.apache.org/jira/browse/FLINK-31018
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Client
Affects Versions: 1.16.1, 1.17.0
Reporter: Krzysztof Chmielewski
 Attachments: image-2023-02-10-15-53-39-330.png, 
image-2023-02-10-15-54-32-537.png, image-2023-02-10-16-05-12-407.png

The SQL Client '-j' option does not load custom jars to the classpath as it did, for 
example, in Flink 1.15.
As a result, the Flink 1.16 SQL Client is not able to discover classes through 
Flink's factory discovery mechanism, throwing an error like:

{code:java}
[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.table.api.ValidationException: Could not find any factories 
that implement 'com.getindata.connectors.http.LookupQueryCreatorFactory' in the 
classpath.
{code}

The same Jar and sample job are working fine with Flink 1.15.

Flink 1.15.2
./bin/sql-client.sh -j flink-http-connector-0.9.0.jar
 !image-2023-02-10-15-53-39-330.png! 

Flink 1.16.1
./bin/sql-client.sh -j flink-http-connector-0.9.0.jar
 !image-2023-02-10-15-54-32-537.png! 

The ADD JAR command does not solve the "Could not find any factories" issue although 
the jar seems to be added:
 !image-2023-02-10-16-05-12-407.png! 




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-23016) Job client must be a Coordination Request Gateway when submit a job on web ui

2023-02-10 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687090#comment-17687090
 ] 

Krzysztof Chmielewski commented on FLINK-23016:
---

FYI, 
I got the same error on Flink 1.16.1.

> Job client must be a Coordination Request Gateway when submit a job on web ui 
> --
>
> Key: FLINK-23016
> URL: https://issues.apache.org/jira/browse/FLINK-23016
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Affects Versions: 1.13.1
> Environment: flink: 1.13.1
> flink-cdc: com.alibaba.ververica:flink-connector-postgres-cdc:1.4.0
> jdk:1.8
>Reporter: wen qi
>Priority: Not a Priority
>  Labels: auto-deprioritized-critical, auto-deprioritized-major, 
> auto-deprioritized-minor
> Attachments: WechatIMG10.png, WechatIMG11.png, WechatIMG8.png
>
>
> I used Flink CDC to collect data and the Table API to transform the data and 
> write it to another table.
> That all works when I run the code in the IDE and submit the job jar via the CLI, 
> but not via the web UI.
> When I use StreamTableEnvironment.from('table-path').execute(), it fails! 
> Please check my attachments; it seems to be a web UI bug? 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2023-02-07 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17684496#comment-17684496
 ] 

Krzysztof Chmielewski edited comment on FLINK-27246 at 2/7/23 11:20 AM:


master commit: af9a1128f728c691b896bc9c591e9be1327601c4 (included in 1.17 
branch)

1.16 backport PR https://github.com/apache/flink/pull/21860 (contains bugfix 
https://github.com/apache/flink/pull/21871)


was (Author: kristoffsc):
master commit: af9a1128f728c691b896bc9c591e9be1327601c4 (included in 1.17 
branch)

1.16 PR https://github.com/apache/flink/pull/21860 (contains bugfix 
https://github.com/apache/flink/pull/21871)

> Code of method 
> "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V"
>  of class "HashAggregateWithKeys$9211" grows beyond 64 KB
> -
>
> Key: FLINK-27246
> URL: https://issues.apache.org/jira/browse/FLINK-27246
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.14.3, 1.15.3, 1.16.1
>Reporter: Maciej Bryński
>Assignee: Krzysztof Chmielewski
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0, 1.16.2
>
> Attachments: endInput_falseFilter9123_split9704.txt
>
>
> I think this bug should get fixed in 
> https://issues.apache.org/jira/browse/FLINK-23007
> Unfortunately I spotted it on Flink 1.14.3
> {code}
> java.lang.RuntimeException: Could not instantiate generated class 
> 'HashAggregateWithKeys$9211'
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:85)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.operators.CodeGenOperatorFactory.createStreamOperator(CodeGenOperatorFactory.java:40)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:81)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:198)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.(RegularOperatorChain.java:63)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:666)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: org.apache.flink.util.FlinkRuntimeException: 
> org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:76)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:102)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:83)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   ... 11 more
> Caused by: 
> org.apache.flink.shaded.guava30.com.google.common.util.concurrent.UncheckedExecutionException:
>  org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2051)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache.get(LocalCache.java:3962)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4859)
>  

[jira] [Comment Edited] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2023-02-07 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17684496#comment-17684496
 ] 

Krzysztof Chmielewski edited comment on FLINK-27246 at 2/7/23 11:19 AM:


master commit: af9a1128f728c691b896bc9c591e9be1327601c4 (included in 1.17 
branch)

1.16 PR https://github.com/apache/flink/pull/21860 (contains bugfix 
https://github.com/apache/flink/pull/21871)


was (Author: kristoffsc):
master merge commit: af9a1128f728c691b896bc9c591e9be1327601c4 (included in 1.17 
branch)

1.16 PR https://github.com/apache/flink/pull/21860 (contains bugfix 
https://github.com/apache/flink/pull/21871)

> Code of method 
> "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V"
>  of class "HashAggregateWithKeys$9211" grows beyond 64 KB
> -
>
> Key: FLINK-27246
> URL: https://issues.apache.org/jira/browse/FLINK-27246
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.14.3, 1.15.3, 1.16.1
>Reporter: Maciej Bryński
>Assignee: Krzysztof Chmielewski
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0, 1.16.2
>
> Attachments: endInput_falseFilter9123_split9704.txt
>
>
> I think this bug should get fixed in 
> https://issues.apache.org/jira/browse/FLINK-23007
> Unfortunately I spotted it on Flink 1.14.3
> {code}
> java.lang.RuntimeException: Could not instantiate generated class 
> 'HashAggregateWithKeys$9211'
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:85)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.operators.CodeGenOperatorFactory.createStreamOperator(CodeGenOperatorFactory.java:40)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:81)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:198)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.(RegularOperatorChain.java:63)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:666)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: org.apache.flink.util.FlinkRuntimeException: 
> org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:76)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:102)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:83)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   ... 11 more
> Caused by: 
> org.apache.flink.shaded.guava30.com.google.common.util.concurrent.UncheckedExecutionException:
>  org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2051)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache.get(LocalCache.java:3962)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4859)
>  

[jira] [Comment Edited] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2023-02-07 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17684496#comment-17684496
 ] 

Krzysztof Chmielewski edited comment on FLINK-27246 at 2/7/23 11:15 AM:


master merge commit: af9a1128f728c691b896bc9c591e9be1327601c4 (included in 1.17 
branch)

1.16 PR https://github.com/apache/flink/pull/21860 (contains bugfix 
https://github.com/apache/flink/pull/21871)


was (Author: kristoffsc):
master merge commit: af9a1128f728c691b896bc9c591e9be1327601c4 (included in 1.17 
branch)

1.16 PR https://github.com/apache/flink/pull/21860

> Code of method 
> "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V"
>  of class "HashAggregateWithKeys$9211" grows beyond 64 KB
> -
>
> Key: FLINK-27246
> URL: https://issues.apache.org/jira/browse/FLINK-27246
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.14.3, 1.15.3, 1.16.1
>Reporter: Maciej Bryński
>Assignee: Krzysztof Chmielewski
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0, 1.16.2
>
> Attachments: endInput_falseFilter9123_split9704.txt
>
>
> I think this bug should get fixed in 
> https://issues.apache.org/jira/browse/FLINK-23007
> Unfortunately I spotted it on Flink 1.14.3
> {code}
> java.lang.RuntimeException: Could not instantiate generated class 
> 'HashAggregateWithKeys$9211'
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:85)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.operators.CodeGenOperatorFactory.createStreamOperator(CodeGenOperatorFactory.java:40)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:81)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:198)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.(RegularOperatorChain.java:63)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:666)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: org.apache.flink.util.FlinkRuntimeException: 
> org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:76)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:102)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:83)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   ... 11 more
> Caused by: 
> org.apache.flink.shaded.guava30.com.google.common.util.concurrent.UncheckedExecutionException:
>  org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2051)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache.get(LocalCache.java:3962)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4859)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> 

[jira] [Comment Edited] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2023-02-07 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17684496#comment-17684496
 ] 

Krzysztof Chmielewski edited comment on FLINK-27246 at 2/7/23 11:12 AM:


master merge commit: af9a1128f728c691b896bc9c591e9be1327601c4 (included in 1.17 
branch)

1.16 PR https://github.com/apache/flink/pull/21860


was (Author: kristoffsc):
master merge commit: af9a1128f728c691b896bc9c591e9be1327601c4

1.16 PR https://github.com/apache/flink/pull/21860

> Code of method 
> "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V"
>  of class "HashAggregateWithKeys$9211" grows beyond 64 KB
> -
>
> Key: FLINK-27246
> URL: https://issues.apache.org/jira/browse/FLINK-27246
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.14.3, 1.15.3, 1.16.1
>Reporter: Maciej Bryński
>Assignee: Krzysztof Chmielewski
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0, 1.16.2
>
> Attachments: endInput_falseFilter9123_split9704.txt
>
>
> I think this bug should get fixed in 
> https://issues.apache.org/jira/browse/FLINK-23007
> Unfortunately I spotted it on Flink 1.14.3
> {code}
> java.lang.RuntimeException: Could not instantiate generated class 
> 'HashAggregateWithKeys$9211'
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:85)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.operators.CodeGenOperatorFactory.createStreamOperator(CodeGenOperatorFactory.java:40)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:81)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:198)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.(RegularOperatorChain.java:63)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:666)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: org.apache.flink.util.FlinkRuntimeException: 
> org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:76)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:102)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:83)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   ... 11 more
> Caused by: 
> org.apache.flink.shaded.guava30.com.google.common.util.concurrent.UncheckedExecutionException:
>  org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2051)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache.get(LocalCache.java:3962)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4859)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:74)
>  

[jira] [Commented] (FLINK-30927) Several tests started generate output with two non-abstract methods have the same parameter types, declaring type and return type

2023-02-07 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685220#comment-17685220
 ] 

Krzysztof Chmielewski commented on FLINK-30927:
---

master: 96a296db723575d64857482a1278744e4c41201f

PR for 1.17 - https://github.com/apache/flink/pull/21879

> Several tests started generate output with two non-abstract methods  have the 
> same parameter types, declaring type and return type
> --
>
> Key: FLINK-30927
> URL: https://issues.apache.org/jira/browse/FLINK-30927
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.17.0, 1.16.2
>Reporter: Sergey Nuyanzin
>Assignee: Krzysztof Chmielewski
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0, 1.16.2
>
>
> e.g. 
> org.apache.flink.table.planner.runtime.stream.sql.MatchRecognizeITCase#testUserDefinedFunctions
>  
> it seems that during code splitting it starts generating some methods with the same 
> signature
>  
> {noformat}
> org.codehaus.janino.InternalCompilerException: Compiling 
> "MatchRecognizePatternProcessFunction$77": Two non-abstract methods "default 
> void MatchRecognizePatternProcessFunction$77.processMatch_0(java.util.Map, 
> org.apache.flink.cep.functions.PatternProcessFunction$Context, 
> org.apache.flink.util.Collector) throws java.lang.Exception" have the same 
> parameter types, declaring type and return type
> {noformat}
>  
> Probably could be a side effect of 
> https://issues.apache.org/jira/browse/FLINK-27246



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] (FLINK-30927) Several tests started generate output with two non-abstract methods have the same parameter types, declaring type and return type

2023-02-07 Thread Krzysztof Chmielewski (Jira)


[ https://issues.apache.org/jira/browse/FLINK-30927 ]


Krzysztof Chmielewski deleted comment on FLINK-30927:
---

was (Author: kristoffsc):
master: 96a296db723575d64857482a1278744e4c41201f

> Several tests started generate output with two non-abstract methods  have the 
> same parameter types, declaring type and return type
> --
>
> Key: FLINK-30927
> URL: https://issues.apache.org/jira/browse/FLINK-30927
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.17.0, 1.16.2
>Reporter: Sergey Nuyanzin
>Assignee: Krzysztof Chmielewski
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0, 1.16.2
>
>
> e.g. 
> org.apache.flink.table.planner.runtime.stream.sql.MatchRecognizeITCase#testUserDefinedFunctions
>  
> it seems that during code splitting it starts generating some methods with the same 
> signature
>  
> {noformat}
> org.codehaus.janino.InternalCompilerException: Compiling 
> "MatchRecognizePatternProcessFunction$77": Two non-abstract methods "default 
> void MatchRecognizePatternProcessFunction$77.processMatch_0(java.util.Map, 
> org.apache.flink.cep.functions.PatternProcessFunction$Context, 
> org.apache.flink.util.Collector) throws java.lang.Exception" have the same 
> parameter types, declaring type and return type
> {noformat}
>  
> Probably could be a side effect of 
> https://issues.apache.org/jira/browse/FLINK-27246



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30927) Several tests started generate output with two non-abstract methods have the same parameter types, declaring type and return type

2023-02-07 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685215#comment-17685215
 ] 

Krzysztof Chmielewski commented on FLINK-30927:
---

master: 96a296db723575d64857482a1278744e4c41201f

> Several tests started generate output with two non-abstract methods  have the 
> same parameter types, declaring type and return type
> --
>
> Key: FLINK-30927
> URL: https://issues.apache.org/jira/browse/FLINK-30927
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.17.0, 1.16.2
>Reporter: Sergey Nuyanzin
>Assignee: Krzysztof Chmielewski
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0, 1.16.2
>
>
> e.g. 
> org.apache.flink.table.planner.runtime.stream.sql.MatchRecognizeITCase#testUserDefinedFunctions
>  
> it seems that during code splitting it starts generating some methods with the same 
> signature
>  
> {noformat}
> org.codehaus.janino.InternalCompilerException: Compiling 
> "MatchRecognizePatternProcessFunction$77": Two non-abstract methods "default 
> void MatchRecognizePatternProcessFunction$77.processMatch_0(java.util.Map, 
> org.apache.flink.cep.functions.PatternProcessFunction$Context, 
> org.apache.flink.util.Collector) throws java.lang.Exception" have the same 
> parameter types, declaring type and return type
> {noformat}
>  
> Probably could be a side effect of 
> https://issues.apache.org/jira/browse/FLINK-27246



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30927) Several tests started generate output with two non-abstract methods have the same parameter types, declaring type and return type

2023-02-07 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685175#comment-17685175
 ] 

Krzysztof Chmielewski commented on FLINK-30927:
---

The PR needs to be merged to the 1.17 branch as well. 

> Several tests started generate output with two non-abstract methods  have the 
> same parameter types, declaring type and return type
> --
>
> Key: FLINK-30927
> URL: https://issues.apache.org/jira/browse/FLINK-30927
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Reporter: Sergey Nuyanzin
>Assignee: Krzysztof Chmielewski
>Priority: Major
>  Labels: pull-request-available
>
> e.g. 
> org.apache.flink.table.planner.runtime.stream.sql.MatchRecognizeITCase#testUserDefinedFunctions
>  
> it seems during code splitter it starts generating some methods with same 
> signature
>  
> {noformat}
> org.codehaus.janino.InternalCompilerException: Compiling 
> "MatchRecognizePatternProcessFunction$77": Two non-abstract methods "default 
> void MatchRecognizePatternProcessFunction$77.processMatch_0(java.util.Map, 
> org.apache.flink.cep.functions.PatternProcessFunction$Context, 
> org.apache.flink.util.Collector) throws java.lang.Exception" have the same 
> parameter types, declaring type and return type
> {noformat}
>  
> Probably could be a side effect of 
> https://issues.apache.org/jira/browse/FLINK-27246



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30927) Several tests started generate output with two non-abstract methods have the same parameter types, declaring type and return type

2023-02-06 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17684943#comment-17684943
 ] 

Krzysztof Chmielewski commented on FLINK-30927:
---

OK,
The CI build is green for the provided PR, and I don't see any 
`InternalCompilerException ... Two non-abstract methods` exceptions in 
table_ci_table or in other tests from flink-table-planner.

I would appreciate a review of this small bug-fix PR, and sorry for any 
inconvenience caused by this.

> Several tests started generate output with two non-abstract methods  have the 
> same parameter types, declaring type and return type
> --
>
> Key: FLINK-30927
> URL: https://issues.apache.org/jira/browse/FLINK-30927
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Reporter: Sergey Nuyanzin
>Assignee: Krzysztof Chmielewski
>Priority: Major
>  Labels: pull-request-available
>
> e.g. 
> org.apache.flink.table.planner.runtime.stream.sql.MatchRecognizeITCase#testUserDefinedFunctions
>  
> it seems during code splitter it starts generating some methods with same 
> signature
>  
> {noformat}
> org.codehaus.janino.InternalCompilerException: Compiling 
> "MatchRecognizePatternProcessFunction$77": Two non-abstract methods "default 
> void MatchRecognizePatternProcessFunction$77.processMatch_0(java.util.Map, 
> org.apache.flink.cep.functions.PatternProcessFunction$Context, 
> org.apache.flink.util.Collector) throws java.lang.Exception" have the same 
> parameter types, declaring type and return type
> {noformat}
>  
> Probably could be a side effect of 
> https://issues.apache.org/jira/browse/FLINK-27246



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-30927) Several tests started generate output with two non-abstract methods have the same parameter types, declaring type and return type

2023-02-06 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17684726#comment-17684726
 ] 

Krzysztof Chmielewski edited comment on FLINK-30927 at 2/6/23 4:31 PM:
---

The PR provided above fixes the reported issue. 

However, the CI build was not failing due to this problem. The reason it was not 
failing is that the code splitter has a safety net: whenever the rewritten code fails 
compilation, Flink falls back to the original code and prints the failing class into 
the logs. That is how the problem was spotted. 
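
For context, the fallback works roughly like this (a minimal sketch with hypothetical names and helpers, not the actual Flink classes):

{code:java}
// Hedged illustration of the safety-net idea; hypothetical names, not the actual Flink code.
import java.util.function.Consumer;
import java.util.function.Predicate;

final class SafetyNetSketch {

    /**
     * Tries the rewritten (split) code first; if it does not compile,
     * logs the failing class and falls back to the original generated code,
     * so the job keeps running and the problem is only visible in the logs.
     */
    static String chooseCode(
            String originalCode,
            String rewrittenCode,
            Predicate<String> compiles,    // assumed helper: true if the code compiles
            Consumer<String> logFailure) { // assumed helper: logs the failing class source
        if (compiles.test(rewrittenCode)) {
            return rewrittenCode;
        }
        logFailure.accept(rewrittenCode);
        return originalCode;
    }
}
{code}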

Maybe it would be worth adding an enhancement so that this kind of issue would in fact 
fail the build? A separate issue?


was (Author: kristoffsc):
Provided PR above is fixing reported issue. 

However CI build was not failing due to this problem. The reason why it was not 
failing is that code splitter has a safety net, that whenever rewritten code 
fails the compilation, Flink tries to use original code + print failing class 
into the logs. That is how the problem was spotted. 

Maybe it would worth to add an enhancement such this issue would in fact failed 
the build? A separate issue?

> Several tests started generate output with two non-abstract methods  have the 
> same parameter types, declaring type and return type
> --
>
> Key: FLINK-30927
> URL: https://issues.apache.org/jira/browse/FLINK-30927
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Reporter: Sergey Nuyanzin
>Assignee: Krzysztof Chmielewski
>Priority: Major
>  Labels: pull-request-available
>
> e.g. 
> org.apache.flink.table.planner.runtime.stream.sql.MatchRecognizeITCase#testUserDefinedFunctions
>  
> it seems during code splitter it starts generating some methods with same 
> signature
>  
> {noformat}
> org.codehaus.janino.InternalCompilerException: Compiling 
> "MatchRecognizePatternProcessFunction$77": Two non-abstract methods "default 
> void MatchRecognizePatternProcessFunction$77.processMatch_0(java.util.Map, 
> org.apache.flink.cep.functions.PatternProcessFunction$Context, 
> org.apache.flink.util.Collector) throws java.lang.Exception" have the same 
> parameter types, declaring type and return type
> {noformat}
>  
> Probably could be a side effect of 
> https://issues.apache.org/jira/browse/FLINK-27246



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30927) Several tests started generate output with two non-abstract methods have the same parameter types, declaring type and return type

2023-02-06 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17684726#comment-17684726
 ] 

Krzysztof Chmielewski commented on FLINK-30927:
---

The PR provided above fixes the reported issue. 

However, the CI build was not failing due to this problem. The reason it was not 
failing is that the code splitter has a safety net: whenever the rewritten code fails 
compilation, Flink falls back to the original code and prints the failing class into 
the logs. That is how the problem was spotted. 

Maybe it would be worth adding an enhancement so that this kind of issue would in fact 
fail the build? A separate issue?

> Several tests started generate output with two non-abstract methods  have the 
> same parameter types, declaring type and return type
> --
>
> Key: FLINK-30927
> URL: https://issues.apache.org/jira/browse/FLINK-30927
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Reporter: Sergey Nuyanzin
>Assignee: Krzysztof Chmielewski
>Priority: Major
>  Labels: pull-request-available
>
> e.g. 
> org.apache.flink.table.planner.runtime.stream.sql.MatchRecognizeITCase#testUserDefinedFunctions
>  
> it seems during code splitter it starts generating some methods with same 
> signature
>  
> {noformat}
> org.codehaus.janino.InternalCompilerException: Compiling 
> "MatchRecognizePatternProcessFunction$77": Two non-abstract methods "default 
> void MatchRecognizePatternProcessFunction$77.processMatch_0(java.util.Map, 
> org.apache.flink.cep.functions.PatternProcessFunction$Context, 
> org.apache.flink.util.Collector) throws java.lang.Exception" have the same 
> parameter types, declaring type and return type
> {noformat}
>  
> Probably could be a side effect of 
> https://issues.apache.org/jira/browse/FLINK-27246



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30927) Several tests started generate output with two non-abstract methods have the same parameter types, declaring type and return type

2023-02-06 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17684706#comment-17684706
 ] 

Krzysztof Chmielewski commented on FLINK-30927:
---

Pull request available
https://github.com/apache/flink/pull/21871

> Several tests started generate output with two non-abstract methods  have the 
> same parameter types, declaring type and return type
> --
>
> Key: FLINK-30927
> URL: https://issues.apache.org/jira/browse/FLINK-30927
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Reporter: Sergey Nuyanzin
>Assignee: Krzysztof Chmielewski
>Priority: Major
>  Labels: pull-request-available
>
> e.g. 
> org.apache.flink.table.planner.runtime.stream.sql.MatchRecognizeITCase#testUserDefinedFunctions
>  
> it seems during code splitter it starts generating some methods with same 
> signature
>  
> {noformat}
> org.codehaus.janino.InternalCompilerException: Compiling 
> "MatchRecognizePatternProcessFunction$77": Two non-abstract methods "default 
> void MatchRecognizePatternProcessFunction$77.processMatch_0(java.util.Map, 
> org.apache.flink.cep.functions.PatternProcessFunction$Context, 
> org.apache.flink.util.Collector) throws java.lang.Exception" have the same 
> parameter types, declaring type and return type
> {noformat}
>  
> Probably could be a side effect of 
> https://issues.apache.org/jira/browse/FLINK-27246



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30927) Several tests started generate output with two non-abstract methods have the same parameter types, declaring type and return type

2023-02-06 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17684687#comment-17684687
 ] 

Krzysztof Chmielewski commented on FLINK-30927:
---

I already have a fix for this and will provide a PR shortly.
It's caused by https://github.com/apache/flink/pull/21393.

Could someone assign this ticket to me?

> Several tests started generate output with two non-abstract methods  have the 
> same parameter types, declaring type and return type
> --
>
> Key: FLINK-30927
> URL: https://issues.apache.org/jira/browse/FLINK-30927
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Reporter: Sergey Nuyanzin
>Priority: Major
>
> e.g. 
> org.apache.flink.table.planner.runtime.stream.sql.MatchRecognizeITCase#testUserDefinedFunctions
>  
> it seems during code splitter it starts generating some methods with same 
> signature
>  
> {noformat}
> org.codehaus.janino.InternalCompilerException: Compiling 
> "MatchRecognizePatternProcessFunction$77": Two non-abstract methods "default 
> void MatchRecognizePatternProcessFunction$77.processMatch_0(java.util.Map, 
> org.apache.flink.cep.functions.PatternProcessFunction$Context, 
> org.apache.flink.util.Collector) throws java.lang.Exception" have the same 
> parameter types, declaring type and return type
> {noformat}
>  
> Probably could be a side effect of 
> https://issues.apache.org/jira/browse/FLINK-27246



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2023-02-06 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17684496#comment-17684496
 ] 

Krzysztof Chmielewski edited comment on FLINK-27246 at 2/6/23 8:48 AM:
---

master merge commit: af9a1128f728c691b896bc9c591e9be1327601c4

1.16 PR https://github.com/apache/flink/pull/21860


was (Author: kristoffsc):
master merge commit: af9a1128f728c691b896bc9c591e9be1327601c4

Preparing backport to 1.16

> Code of method 
> "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V"
>  of class "HashAggregateWithKeys$9211" grows beyond 64 KB
> -
>
> Key: FLINK-27246
> URL: https://issues.apache.org/jira/browse/FLINK-27246
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.14.3, 1.16.0, 1.15.3
>Reporter: Maciej Bryński
>Assignee: Krzysztof Chmielewski
>Priority: Major
>  Labels: pull-request-available
> Attachments: endInput_falseFilter9123_split9704.txt
>
>
> I think this bug should get fixed in 
> https://issues.apache.org/jira/browse/FLINK-23007
> Unfortunately I spotted it on Flink 1.14.3
> {code}
> java.lang.RuntimeException: Could not instantiate generated class 
> 'HashAggregateWithKeys$9211'
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:85)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.operators.CodeGenOperatorFactory.createStreamOperator(CodeGenOperatorFactory.java:40)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:81)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:198)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.(RegularOperatorChain.java:63)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:666)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: org.apache.flink.util.FlinkRuntimeException: 
> org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:76)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:102)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:83)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   ... 11 more
> Caused by: 
> org.apache.flink.shaded.guava30.com.google.common.util.concurrent.UncheckedExecutionException:
>  org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2051)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache.get(LocalCache.java:3962)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4859)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:74)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> 

[jira] [Commented] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2023-02-06 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17684496#comment-17684496
 ] 

Krzysztof Chmielewski commented on FLINK-27246:
---

master merge commit: af9a1128f728c691b896bc9c591e9be1327601c4

Preparing backports to 1.16

> Code of method 
> "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V"
>  of class "HashAggregateWithKeys$9211" grows beyond 64 KB
> -
>
> Key: FLINK-27246
> URL: https://issues.apache.org/jira/browse/FLINK-27246
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.14.3, 1.16.0, 1.15.3
>Reporter: Maciej Bryński
>Assignee: Krzysztof Chmielewski
>Priority: Major
>  Labels: pull-request-available
> Attachments: endInput_falseFilter9123_split9704.txt
>
>
> I think this bug should get fixed in 
> https://issues.apache.org/jira/browse/FLINK-23007
> Unfortunately I spotted it on Flink 1.14.3
> {code}
> java.lang.RuntimeException: Could not instantiate generated class 
> 'HashAggregateWithKeys$9211'
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:85)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.operators.CodeGenOperatorFactory.createStreamOperator(CodeGenOperatorFactory.java:40)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:81)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:198)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.(RegularOperatorChain.java:63)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:666)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: org.apache.flink.util.FlinkRuntimeException: 
> org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:76)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:102)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:83)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   ... 11 more
> Caused by: 
> org.apache.flink.shaded.guava30.com.google.common.util.concurrent.UncheckedExecutionException:
>  org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2051)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache.get(LocalCache.java:3962)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4859)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:74)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:102)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> 

[jira] [Comment Edited] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2023-02-06 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17684496#comment-17684496
 ] 

Krzysztof Chmielewski edited comment on FLINK-27246 at 2/6/23 8:12 AM:
---

master merge commit: af9a1128f728c691b896bc9c591e9be1327601c4

Preparing backport to 1.16


was (Author: kristoffsc):
master merge commit: af9a1128f728c691b896bc9c591e9be1327601c4

Preparing backports to 1.16

> Code of method 
> "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V"
>  of class "HashAggregateWithKeys$9211" grows beyond 64 KB
> -
>
> Key: FLINK-27246
> URL: https://issues.apache.org/jira/browse/FLINK-27246
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.14.3, 1.16.0, 1.15.3
>Reporter: Maciej Bryński
>Assignee: Krzysztof Chmielewski
>Priority: Major
>  Labels: pull-request-available
> Attachments: endInput_falseFilter9123_split9704.txt
>
>
> I think this bug should get fixed in 
> https://issues.apache.org/jira/browse/FLINK-23007
> Unfortunately I spotted it on Flink 1.14.3
> {code}
> java.lang.RuntimeException: Could not instantiate generated class 
> 'HashAggregateWithKeys$9211'
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:85)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.operators.CodeGenOperatorFactory.createStreamOperator(CodeGenOperatorFactory.java:40)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:81)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:198)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.(RegularOperatorChain.java:63)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:666)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: org.apache.flink.util.FlinkRuntimeException: 
> org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:76)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:102)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:83)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   ... 11 more
> Caused by: 
> org.apache.flink.shaded.guava30.com.google.common.util.concurrent.UncheckedExecutionException:
>  org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2051)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache.get(LocalCache.java:3962)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4859)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:74)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> 

[jira] [Commented] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2023-01-06 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655573#comment-17655573
 ] 

Krzysztof Chmielewski commented on FLINK-27246:
---

Hi [~TsReaper], [~twalthr] and [~jingge]

I have finally finished working on my PR for this issue.
The PR is here: https://github.com/apache/flink/pull/21393

In short, I've created a new rewriter that can rewrite IF/ELSE and WHILE blocks, 
including combinations and nested statements. I also removed IfStatementRewriter 
since its logic is covered by my new BlockStatementRewriter.
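
To illustrate the idea (a hand-made example with hypothetical names, not actual splitter output): a long method body whose IF/ELSE sits inside a WHILE gets its branch bodies extracted into separate methods, for example:

{code:java}
// Hand-made illustration with hypothetical names; not actual splitter output.
class SplitSketch {

    // Before splitting: an oversized method with an IF/ELSE nested inside a WHILE;
    // imagine each branch holding hundreds of generated lines.
    void processElement(int[] records) {
        int i = 0;
        while (i < records.length) {
            if (records[i] > 0) {
                records[i] = records[i] * 2;  // stands in for a huge branch body
            } else {
                records[i] = -records[i];     // stands in for a huge branch body
            }
            i++;
        }
    }

    // After splitting: the branch bodies are extracted into separate methods,
    // so every single method stays well below the 64 KB bytecode limit.
    void processElementSplit(int[] records) {
        int i = 0;
        while (i < records.length) {
            if (records[i] > 0) {
                branchTrue(records, i);
            } else {
                branchFalse(records, i);
            }
            i++;
        }
    }

    void branchTrue(int[] records, int i) {
        records[i] = records[i] * 2;
    }

    void branchFalse(int[] records, int i) {
        records[i] = -records[i];
    }
}
{code}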

I ran the new Code Splitter against the SQL query attached to this ticket and it 
works perfectly.

I would appreciate it if you could take a look and do the review. I added a detailed 
description of my change/solution to the PR.
Please let me know what you think.

Cheers.



> Code of method 
> "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V"
>  of class "HashAggregateWithKeys$9211" grows beyond 64 KB
> -
>
> Key: FLINK-27246
> URL: https://issues.apache.org/jira/browse/FLINK-27246
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.14.3, 1.16.0, 1.15.3
>Reporter: Maciej Bryński
>Assignee: Krzysztof Chmielewski
>Priority: Major
> Attachments: endInput_falseFilter9123_split9704.txt
>
>
> I think this bug should get fixed in 
> https://issues.apache.org/jira/browse/FLINK-23007
> Unfortunately I spotted it on Flink 1.14.3
> {code}
> java.lang.RuntimeException: Could not instantiate generated class 
> 'HashAggregateWithKeys$9211'
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:85)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.operators.CodeGenOperatorFactory.createStreamOperator(CodeGenOperatorFactory.java:40)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:81)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:198)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.(RegularOperatorChain.java:63)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:666)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: org.apache.flink.util.FlinkRuntimeException: 
> org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:76)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:102)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:83)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   ... 11 more
> Caused by: 
> org.apache.flink.shaded.guava30.com.google.common.util.concurrent.UncheckedExecutionException:
>  org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2051)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache.get(LocalCache.java:3962)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> 

[jira] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2023-01-06 Thread Krzysztof Chmielewski (Jira)


[ https://issues.apache.org/jira/browse/FLINK-27246 ]


Krzysztof Chmielewski deleted comment on FLINK-27246:
---

was (Author: kristoffsc):
Hi [~TsReaper], [~twalthr] and [~jingge]
I have finally finished working on my PR for this issue.
The PR is here: https://github.com/apache/flink/pull/21393

In short I've created new Rewritter that can rewrite IF/ESLE and WHILE blocks 
including combination and nested statements. I also removed IfStatementRewriter 
since its logic is covered by my new BlockStatementRewriter.

I ran the new Code Splitter against SQL query attached to this ticket and it 
works perfectly.

I would appreciate if you could take a look and do the review. I added detailed 
description of my change/solution to the PR.
Please let me know what do you think.

Cheers.



> Code of method 
> "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V"
>  of class "HashAggregateWithKeys$9211" grows beyond 64 KB
> -
>
> Key: FLINK-27246
> URL: https://issues.apache.org/jira/browse/FLINK-27246
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.14.3, 1.16.0, 1.15.3
>Reporter: Maciej Bryński
>Assignee: Krzysztof Chmielewski
>Priority: Major
> Attachments: endInput_falseFilter9123_split9704.txt
>
>
> I think this bug should get fixed in 
> https://issues.apache.org/jira/browse/FLINK-23007
> Unfortunately I spotted it on Flink 1.14.3
> {code}
> java.lang.RuntimeException: Could not instantiate generated class 
> 'HashAggregateWithKeys$9211'
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:85)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.operators.CodeGenOperatorFactory.createStreamOperator(CodeGenOperatorFactory.java:40)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:81)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:198)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.(RegularOperatorChain.java:63)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:666)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: org.apache.flink.util.FlinkRuntimeException: 
> org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:76)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:102)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:83)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   ... 11 more
> Caused by: 
> org.apache.flink.shaded.guava30.com.google.common.util.concurrent.UncheckedExecutionException:
>  org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2051)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache.get(LocalCache.java:3962)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> 

[jira] [Comment Edited] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2023-01-06 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655572#comment-17655572
 ] 

Krzysztof Chmielewski edited comment on FLINK-27246 at 1/6/23 8:16 PM:
---

Hi [~TsReaper], [~twalthr] and [~jingge]
I have finally finished working on my PR for this issue.
The PR is here: https://github.com/apache/flink/pull/21393

In short, I've created a new rewriter that can rewrite IF/ELSE and WHILE blocks, 
including combinations and nested statements. I also removed IfStatementRewriter 
since its logic is covered by my new BlockStatementRewriter.

I ran the new Code Splitter against the SQL query attached to this ticket and it 
works perfectly.

I would appreciate it if you could take a look and do the review. I added a detailed 
description of my change/solution to the PR.
Please let me know what you think.

Cheers.




was (Author: kristoffsc):
Hi [~TsReaper]
I have finally finished working on my PR for this issue.
The PR is here: https://github.com/apache/flink/pull/21393

In short I've created new Rewritter that can rewrite IF/ESLE and WHILE blocks 
including combination and nested statements. I also removed IfStatementRewriter 
since its logic is covered by my new BlockStatementRewriter.

I ran the new Code Splitter against SQL query attached to this ticket and it 
works perfectly.

I would appreciate if you could take a look and do the review. I added detailed 
description of my change/solution to the PR.
Please let me know what do you think.

Cheers.



> Code of method 
> "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V"
>  of class "HashAggregateWithKeys$9211" grows beyond 64 KB
> -
>
> Key: FLINK-27246
> URL: https://issues.apache.org/jira/browse/FLINK-27246
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.14.3, 1.16.0, 1.15.3
>Reporter: Maciej Bryński
>Assignee: Krzysztof Chmielewski
>Priority: Major
> Attachments: endInput_falseFilter9123_split9704.txt
>
>
> I think this bug should get fixed in 
> https://issues.apache.org/jira/browse/FLINK-23007
> Unfortunately I spotted it on Flink 1.14.3
> {code}
> java.lang.RuntimeException: Could not instantiate generated class 
> 'HashAggregateWithKeys$9211'
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:85)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.operators.CodeGenOperatorFactory.createStreamOperator(CodeGenOperatorFactory.java:40)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:81)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:198)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.(RegularOperatorChain.java:63)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:666)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: org.apache.flink.util.FlinkRuntimeException: 
> org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:76)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:102)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:83)
>  

[jira] [Comment Edited] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2023-01-06 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655572#comment-17655572
 ] 

Krzysztof Chmielewski edited comment on FLINK-27246 at 1/6/23 8:12 PM:
---

Hi [~TsReaper]
I have finally finished working on my PR for this issue.
The PR is here: https://github.com/apache/flink/pull/21393

In short I've created new Rewritter that can rewrite IF/ESLE and WHILE blocks 
including combination and nested statements. I also removed IfStatementRewriter 
since its logic is covered by my new BlockStatementRewriter.

I ran the new Code Splitter against SQL query attached to this ticket and it 
works perfectly.

I would appreciate if you could take a look and do the review. I added detailed 
description of my change/solution to the PR.
Please let me know what do you think.

Cheers.




was (Author: kristoffsc):
Hi [~TsReaper]
I have finally finished working on my PR for this issue.
The PR is here: https://github.com/apache/flink/pull/21393

I would appreciate if you could take a look and do the review. I added detailed 
description of my change/solution to the PR.
Please let me know what do you think.

Cheers.



> Code of method 
> "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V"
>  of class "HashAggregateWithKeys$9211" grows beyond 64 KB
> -
>
> Key: FLINK-27246
> URL: https://issues.apache.org/jira/browse/FLINK-27246
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.14.3, 1.16.0, 1.15.3
>Reporter: Maciej Bryński
>Assignee: Krzysztof Chmielewski
>Priority: Major
> Attachments: endInput_falseFilter9123_split9704.txt
>
>
> I think this bug should get fixed in 
> https://issues.apache.org/jira/browse/FLINK-23007
> Unfortunately I spotted it on Flink 1.14.3
> {code}
> java.lang.RuntimeException: Could not instantiate generated class 
> 'HashAggregateWithKeys$9211'
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:85)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.operators.CodeGenOperatorFactory.createStreamOperator(CodeGenOperatorFactory.java:40)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:81)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:198)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.(RegularOperatorChain.java:63)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:666)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: org.apache.flink.util.FlinkRuntimeException: 
> org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:76)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:102)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:83)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   ... 11 more
> Caused by: 
> org.apache.flink.shaded.guava30.com.google.common.util.concurrent.UncheckedExecutionException:
>  org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> 

[jira] [Commented] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2023-01-06 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655572#comment-17655572
 ] 

Krzysztof Chmielewski commented on FLINK-27246:
---

Hi [~TsReaper]
I have finally finished working on my PR for this issue.
The PR is here: https://github.com/apache/flink/pull/21393

I would appreciate it if you could take a look and do the review. I added a detailed 
description of my change/solution to the PR.
Please let me know what you think.

Cheers.



> Code of method 
> "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V"
>  of class "HashAggregateWithKeys$9211" grows beyond 64 KB
> -
>
> Key: FLINK-27246
> URL: https://issues.apache.org/jira/browse/FLINK-27246
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.14.3, 1.16.0, 1.15.3
>Reporter: Maciej Bryński
>Assignee: Krzysztof Chmielewski
>Priority: Major
> Attachments: endInput_falseFilter9123_split9704.txt
>
>
> I think this bug should get fixed in 
> https://issues.apache.org/jira/browse/FLINK-23007
> Unfortunately I spotted it on Flink 1.14.3
> {code}
> java.lang.RuntimeException: Could not instantiate generated class 
> 'HashAggregateWithKeys$9211'
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:85)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.operators.CodeGenOperatorFactory.createStreamOperator(CodeGenOperatorFactory.java:40)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:81)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:198)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.(RegularOperatorChain.java:63)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:666)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: org.apache.flink.util.FlinkRuntimeException: 
> org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:76)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:102)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:83)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   ... 11 more
> Caused by: 
> org.apache.flink.shaded.guava30.com.google.common.util.concurrent.UncheckedExecutionException:
>  org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2051)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache.get(LocalCache.java:3962)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4859)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:74)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> 

[jira] [Commented] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2022-12-30 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653156#comment-17653156
 ] 

Krzysztof Chmielewski commented on FLINK-27246:
---

Hi [~TsReaper]
I have modified my PR.

I reverted all changes from FunctionSplitter and introduced a totally new 
BlockStatementRewriter that uses the new classes from the original PR, 
BlockStatementSplitter and BlockStatementGrouper.

The new BlockStatementRewriter can handle If/Else/While statements. For IF/ELSE 
statements it produces a similar, but not identical, result to what 
IfStatementRewriter did.

However, to make the original problem go away I had to use both rewriters, so 
JavaCodeSplitter now has this in splitImpl:

{code:java}
return Optional.ofNullable(
                new DeclarationRewriter(returnValueRewrittenCode, maxMethodLength)
                        .rewrite())
        .map(text -> new IfStatementRewriter(text, maxMethodLength).rewrite())
        .map(text -> new BlockStatementRewriter(text, maxMethodLength).rewrite())
        .map(text -> new FunctionSplitter(text, maxMethodLength).rewrite())
        .map(text -> new MemberFieldRewriter(text, maxClassMemberCount).rewrite())
        .orElse(code);
}
{code}

The good news is that all tests on CI/CD are passing. 
Still, I have to investigate more. My goal would be to drop IfStatementRewriter and 
use only BlockStatementRewriter, unless I find some reason not to. I will keep you 
posted on this.
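
If that works out, splitImpl could end up looking roughly like this (just my sketch of the intended end state, not a committed change):

{code:java}
// Sketch only: the same chain as above with IfStatementRewriter dropped,
// because BlockStatementRewriter covers the IF/ELSE (and WHILE) cases it handled.
return Optional.ofNullable(
                new DeclarationRewriter(returnValueRewrittenCode, maxMethodLength)
                        .rewrite())
        .map(text -> new BlockStatementRewriter(text, maxMethodLength).rewrite())
        .map(text -> new FunctionSplitter(text, maxMethodLength).rewrite())
        .map(text -> new MemberFieldRewriter(text, maxClassMemberCount).rewrite())
        .orElse(code);
}
{code}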

> Code of method 
> "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V"
>  of class "HashAggregateWithKeys$9211" grows beyond 64 KB
> -
>
> Key: FLINK-27246
> URL: https://issues.apache.org/jira/browse/FLINK-27246
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.14.3, 1.16.0, 1.15.3
>Reporter: Maciej Bryński
>Assignee: Krzysztof Chmielewski
>Priority: Major
> Attachments: endInput_falseFilter9123_split9704.txt
>
>
> I think this bug should get fixed in 
> https://issues.apache.org/jira/browse/FLINK-23007
> Unfortunately I spotted it on Flink 1.14.3
> {code}
> java.lang.RuntimeException: Could not instantiate generated class 
> 'HashAggregateWithKeys$9211'
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:85)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.operators.CodeGenOperatorFactory.createStreamOperator(CodeGenOperatorFactory.java:40)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:81)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:198)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.(RegularOperatorChain.java:63)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:666)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: org.apache.flink.util.FlinkRuntimeException: 
> org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:76)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:102)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:83)
>  

[jira] [Commented] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2022-12-27 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652279#comment-17652279
 ] 

Krzysztof Chmielewski commented on FLINK-27246:
---

Hi [~TsReaper]
Thanks for the feedback.

Regarding IfStatementRewriter, it's a little bit tricky for me.
I think my new classes might handle more cases than IfStatementRewriter did. Also, 
IfStatementRewriter and the existing rewriters seem to expect a method declaration 
plus body, whereas my BlockStatementGrouper and BlockStatementSplitter process 
individual block statements. They are called from FunctionSplitter::FunctionSplitVisitor 
while processing the block statements of a method's body.

Now it seems that after my change, FunctionSplitter is also rewriting the code, similar 
to IfStatementRewriter, and maybe this is not the best thing to do from a clean 
code/architecture perspective.

The problem with IfStatementRewriter is that it will not rewrite an If/Else branch if 
the branch contains a "while" statement, or when the entire if/else statement is inside 
a while statement, which was the original problem.
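
For example, this is roughly the shape that was skipped (hypothetical generated code, for illustration only):

{code:java}
// Illustration only: because the IF/ELSE sits inside a WHILE, the old
// IfStatementRewriter did not extract the branches, so a very large branch
// body stayed inline and the method could still exceed the 64 KB limit.
int process(java.util.Iterator<Integer> values) {
    int acc = 0;
    while (values.hasNext()) {
        int value = values.next();
        if (value > 0) {
            acc += value;   // stands in for a very large generated branch
        } else {
            acc -= value;   // stands in for a very large generated branch
        }
    }
    return acc;
}
{code}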

So now I'm wondering whether we should extract this from FunctionSplitter into a new 
BlockStatementRewriter that will handle while/if/else statements in combination, or 
whether it can stay inside FunctionSplitter as it is now.



> Code of method 
> "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V"
>  of class "HashAggregateWithKeys$9211" grows beyond 64 KB
> -
>
> Key: FLINK-27246
> URL: https://issues.apache.org/jira/browse/FLINK-27246
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.14.3, 1.16.0, 1.15.3
>Reporter: Maciej Bryński
>Assignee: Krzysztof Chmielewski
>Priority: Major
> Attachments: endInput_falseFilter9123_split9704.txt
>
>
> I think this bug should get fixed in 
> https://issues.apache.org/jira/browse/FLINK-23007
> Unfortunately I spotted it on Flink 1.14.3
> {code}
> java.lang.RuntimeException: Could not instantiate generated class 
> 'HashAggregateWithKeys$9211'
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:85)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.operators.CodeGenOperatorFactory.createStreamOperator(CodeGenOperatorFactory.java:40)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:81)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:198)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.(RegularOperatorChain.java:63)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:666)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: org.apache.flink.util.FlinkRuntimeException: 
> org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:76)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:102)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:83)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   ... 11 more
> Caused by: 
> org.apache.flink.shaded.guava30.com.google.common.util.concurrent.UncheckedExecutionException:
>  

[jira] [Commented] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2022-12-01 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642072#comment-17642072
 ] 

Krzysztof Chmielewski commented on FLINK-27246:
---

[~TsReaper] I would love your feedback on my PoC fix.

thanks.

> Code of method 
> "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V"
>  of class "HashAggregateWithKeys$9211" grows beyond 64 KB
> -
>
> Key: FLINK-27246
> URL: https://issues.apache.org/jira/browse/FLINK-27246
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.14.3
>Reporter: Maciej Bryński
>Priority: Major
> Attachments: endInput_falseFilter9123_split9704.txt
>
>
> I think this bug should get fixed in 
> https://issues.apache.org/jira/browse/FLINK-23007
> Unfortunately I spotted it on Flink 1.14.3
> {code}
> java.lang.RuntimeException: Could not instantiate generated class 
> 'HashAggregateWithKeys$9211'
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:85)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.operators.CodeGenOperatorFactory.createStreamOperator(CodeGenOperatorFactory.java:40)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:81)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:198)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.(RegularOperatorChain.java:63)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:666)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: org.apache.flink.util.FlinkRuntimeException: 
> org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:76)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:102)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:83)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   ... 11 more
> Caused by: 
> org.apache.flink.shaded.guava30.com.google.common.util.concurrent.UncheckedExecutionException:
>  org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2051)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache.get(LocalCache.java:3962)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4859)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:74)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:102)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:83)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   ... 11 more
> Caused by: 

[jira] [Comment Edited] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2022-12-01 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642072#comment-17642072
 ] 

Krzysztof Chmielewski edited comment on FLINK-27246 at 12/1/22 5:42 PM:


[~TsReaper] I would love your feedback on my PoC fix above.

thanks.


was (Author: kristoffsc):
[~TsReaper] I would love for your feedbeck on my PoC fix.

thanks.

> Code of method 
> "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V"
>  of class "HashAggregateWithKeys$9211" grows beyond 64 KB
> -
>
> Key: FLINK-27246
> URL: https://issues.apache.org/jira/browse/FLINK-27246
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.14.3
>Reporter: Maciej Bryński
>Priority: Major
> Attachments: endInput_falseFilter9123_split9704.txt
>
>
> I think this bug should get fixed in 
> https://issues.apache.org/jira/browse/FLINK-23007
> Unfortunately I spotted it on Flink 1.14.3
> {code}
> java.lang.RuntimeException: Could not instantiate generated class 
> 'HashAggregateWithKeys$9211'
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:85)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.operators.CodeGenOperatorFactory.createStreamOperator(CodeGenOperatorFactory.java:40)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:81)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:198)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.(RegularOperatorChain.java:63)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:666)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: org.apache.flink.util.FlinkRuntimeException: 
> org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:76)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:102)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:83)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   ... 11 more
> Caused by: 
> org.apache.flink.shaded.guava30.com.google.common.util.concurrent.UncheckedExecutionException:
>  org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2051)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache.get(LocalCache.java:3962)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4859)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:74)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:102)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> 

[jira] [Commented] (FLINK-26051) one sql has row_number =1 and the subsequent SQL has "case when" and "where" statement result Exception : The window can only be ordered in ASCENDING mode

2022-11-28 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17640037#comment-17640037
 ] 

Krzysztof Chmielewski commented on FLINK-26051:
---

Hi :) [~qingyue]
Do you have any update on this one? :)

> one sql has row_number =1 and the subsequent SQL has "case when" and "where" 
> statement result Exception : The window can only be ordered in ASCENDING mode
> --
>
> Key: FLINK-26051
> URL: https://issues.apache.org/jira/browse/FLINK-26051
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.12.2, 1.14.4
>Reporter: chuncheng wu
>Assignee: Jane Chan
>Priority: Major
> Attachments: image-2022-02-10-20-13-14-424.png, 
> image-2022-02-11-11-18-20-594.png, image-2022-06-17-21-28-54-886.png
>
>
> hello,
>    i have 2 sqls. One sql (sql0) is "select xx from ( ROW_NUMBER statement) 
> where rn=1" and the other one (sql1) is "select ${fields} 
> from result where ${filter_conditions}". The fields quoted in sql1 
> include one "case when" field. The two sqls work well separately, but when 
> they are combined it results in the exception below. It happens when the 
> logical plan is turned into the physical plan:
>  
> {code:java}
> org.apache.flink.table.api.TableException: The window can only be ordered in 
> ASCENDING mode.
>     at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecOverAggregate.translateToPlanInternal(StreamExecOverAggregate.scala:98)
>     at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecOverAggregate.translateToPlanInternal(StreamExecOverAggregate.scala:52)
>     at 
> org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:59)
>     at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecOverAggregateBase.translateToPlan(StreamExecOverAggregateBase.scala:42)
>     at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:54)
>     at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:39)
>     at 
> org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:59)
>     at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalcBase.translateToPlan(StreamExecCalcBase.scala:38)
>     at 
> org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:66)
>     at 
> org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:65)
>     at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>     at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>     at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>     at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>     at 
> org.apache.flink.table.planner.delegation.StreamPlanner.translateToPlan(StreamPlanner.scala:65)
>     at 
> org.apache.flink.table.planner.delegation.StreamPlanner.explain(StreamPlanner.scala:103)
>     at 
> org.apache.flink.table.planner.delegation.StreamPlanner.explain(StreamPlanner.scala:42)
>     at 
> org.apache.flink.table.api.internal.TableEnvironmentImpl.explainInternal(TableEnvironmentImpl.java:630)
>     at 
> org.apache.flink.table.api.internal.TableImpl.explain(TableImpl.java:582)
>     at 
> com.meituan.grocery.data.flink.test.BugTest.testRowNumber(BugTest.java:69)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>     at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>     at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>     at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
>     at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60){code}
> In the stacktrace above, rownumber()'s physical rel, which is 
> StreamExecRank in the normal case, changes to StreamExecOverAggregate. The 
> StreamExecOverAggregate rel has a window= ROWS 

[jira] [Comment Edited] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2022-11-25 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17638614#comment-17638614
 ] 

Krzysztof Chmielewski edited comment on FLINK-27246 at 11/25/22 7:21 PM:
-

Hi [~TsReaper]
I'm sorry for the long delay. I was actually trying to develop a PoC fix for this 
problem, and I think I managed to at least prove the concept. You can find my draft PR 
here -> https://github.com/apache/flink/pull/21393  The code from this PR made 
the SQL from this ticket compile and execute, which is at least something :) 

The idea is to enhance FunctionSplitter so that for every code block 
(getMergedCodeBlocks method) that is bigger than "maxMethodLength" it tries to 
further split it by calling two new splitters that I've created:
1. BlockStatementSplitter
2. BlockStatementGrouper

The BlockStatementSplitter splits the bodies of WHILE and IF/ELSE statements into new 
methods. The original statement is rewritten so that it calls those new methods. 

Next, the *BlockStatementGrouper* groups the calls created by 
*BlockStatementSplitter* into blocks with lengths < "maxMethodLength" and 
extracts those into another new method. Finally *BlockStatementGrouper* 
rewrites the original code block to call the methods created by *BlockStatementGrouper*.

For example, an input statement:
{code:java}
while (
(kvPair$9687 = 
(org.apache.flink.api.java.tuple.Tuple2) iterator.next()) != null) {
key$6207 = (org.apache.flink.table.data.binary.BinaryRowData) 
kvPair$9687.f0;
val$9688 = (org.apache.flink.table.data.binary.BinaryRowData) 
kvPair$9687.f1;
local$5912.replace(key$6207, val$9688);
if (lastKey$6208 == null) {
  lastKey$6208 = key$6207.copy();
  agg0_sumIsNull = true;
agg0_sum = ((org.apache.flink.table.data.DecimalData) null);
agg1_sumIsNull = true;
agg1_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sum = ((org.apache.flink.table.data.DecimalData) null);
  } else if (lastKey$6209 == null) { 
agg2_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sumIsNull = true;
} else {
agg2_sumIsNull = true;
}};
{code}

will be converted by BlockStatementSplitter to:

{code:java}
while (
(kvPair$9687 = 
(org.apache.flink.api.java.tuple.Tuple2) iterator.next()) != null) {
top_whileBody0_0();
if (lastKey$6208 == null) {
  top_whileBody0_0_ifBody0();
  } else if (lastKey$6209 == null) { 
top_whileBody0_0_ifBody1_ifBody0();
} else {
top_whileBody0_0_ifBody1_ifBody1();
}};
{code}

Further this will be converted by BlockStatementGrouper with maxMethodLength 
parameter set to 4000 to:
{code:java}
while (
(kvPair$9687 = 
(org.apache.flink.api.java.tuple.Tuple2) iterator.next()) != null) {

top_rewriteGroup_0();
}
{code}


Body for the new methods would be:
{code:java}
private void top_whileBody0_0_ifBody1_ifBody0() {
agg2_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sumIsNull = true;
}

private void top_whileBody0_0_ifBody1_ifBody1() {
agg2_sumIsNull = true;
}

private void top_whileBody0_0() {
key$6207 = (org.apache.flink.table.data.binary.BinaryRowData) kvPair$9687.f0;
val$9688 = (org.apache.flink.table.data.binary.BinaryRowData) kvPair$9687.f1;
local$5912.replace(key$6207, val$9688);
}

private void top_whileBody0_0_ifBody0() {
lastKey$6208 = key$6207.copy();
agg0_sumIsNull = true;
agg0_sum = ((org.apache.flink.table.data.DecimalData) null);
agg1_sumIsNull = true;
agg1_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sum = ((org.apache.flink.table.data.DecimalData) null);
}

void top_rewriteGroup_0() {
top_whileBody0_0();
if (lastKey$6208 == null) {
  top_whileBody0_0_ifBody0();
  } else if (lastKey$6209 == null) { 
top_whileBody0_0_ifBody1_ifBody0();
  } else {   
 top_whileBody0_0_ifBody1_ifBody1();
  }
}
{code}


What do you think [~TsReaper]?




was (Author: kristoffsc):
Hi [~TsReaper]
Im sory for a long delay. I was actyally trying to develop a PoC fix for this 
problem. I think I managed to at least proof a concept. You can find my raft PR 
here -> https://github.com/apache/flink/pull/21393  The code from this PR made 
the SQL from this ticket to compile and execute which is at least something :) 

The idea is to enhance FunctionSplitter that for every codeBlock 
(getMergedCodeBlocks method) that is bigger than "maxMethodLength" try to 
further split it by calling two new splitters that I've created:
1. BlockStatementSplitter
2. BlockStatementGrouper

The BlockStatementSplitter splits body of WHILE, IF/ELSE statements to new 
methods. The original statement is rewritten that will call those new methods. 

Next 

[jira] [Comment Edited] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2022-11-25 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17638614#comment-17638614
 ] 

Krzysztof Chmielewski edited comment on FLINK-27246 at 11/25/22 7:20 PM:
-

Hi [~TsReaper]
I'm sorry for the long delay. I was actually trying to develop a PoC fix for this 
problem, and I think I managed to at least prove the concept. You can find my draft PR 
here -> https://github.com/apache/flink/pull/21393  The code from this PR made 
the SQL from this ticket compile and execute, which is at least something :) 

The idea is to enhance FunctionSplitter so that for every code block 
(getMergedCodeBlocks method) that is bigger than "maxMethodLength" it tries to 
further split it by calling two new splitters that I've created:
1. BlockStatementSplitter
2. BlockStatementGrouper

The BlockStatementSplitter splits the bodies of WHILE and IF/ELSE statements into new 
methods. The original statement is rewritten so that it calls those new methods. 

Next, the *BlockStatementGrouper* groups the calls created by 
*BlockStatementSplitter* into blocks with lengths < "maxMethodLength" and 
extracts those into another new method. Finally *BlockStatementGrouper* 
rewrites the original code block to call the methods created by *BlockStatementGrouper*.

For example, an input statement:
{code:java}
while (
(kvPair$9687 = 
(org.apache.flink.api.java.tuple.Tuple2) iterator.next()) != null) {
key$6207 = (org.apache.flink.table.data.binary.BinaryRowData) 
kvPair$9687.f0;
val$9688 = (org.apache.flink.table.data.binary.BinaryRowData) 
kvPair$9687.f1;
// prepare input
local$5912.replace(key$6207, val$9688);
if (lastKey$6208 == null) {
  lastKey$6208 = key$6207.copy();
  agg0_sumIsNull = true;
agg0_sum = ((org.apache.flink.table.data.DecimalData) null);
agg1_sumIsNull = true;
agg1_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sum = ((org.apache.flink.table.data.DecimalData) null);
  } else if (lastKey$6209 == null) { 
agg2_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sumIsNull = true;
} else {
agg2_sumIsNull = true;
}};
{code}

will be converted by BlockStatementSplitter to:

{code:java}
while (
(kvPair$9687 = 
(org.apache.flink.api.java.tuple.Tuple2) iterator.next()) != null) {
top_whileBody0_0();
if (lastKey$6208 == null) {
  top_whileBody0_0_ifBody0();
  } else if (lastKey$6209 == null) { 
top_whileBody0_0_ifBody1_ifBody0();
} else {
top_whileBody0_0_ifBody1_ifBody1();
}};
{code}

Further this will be converted by BlockStatementGrouper with maxMethodLength 
parameter set to 4000 to:
{code:java}
while (
(kvPair$9687 = 
(org.apache.flink.api.java.tuple.Tuple2) iterator.next()) != null) {

top_rewriteGroup_0();
}
{code}


Body for the new methods would be:
{code:java}
private void top_whileBody0_0_ifBody1_ifBody0 {
agg2_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sumIsNull = true;
}

private void top_whileBody0_0_ifBody1_ifBody1 {
agg2_sumIsNull = true;
}

private void top_whileBody0_0 {
key$6207 = (org.apache.flink.table.data.binary.BinaryRowData) kvPair$9687.f0;
val$9688 = (org.apache.flink.table.data.binary.BinaryRowData) kvPair$9687.f1;
local$5912.replace(key$6207, val$9688);
}

private void top_whileBody0_0_ifBody0 {
lastKey$6208 = key$6207.copy();
agg0_sumIsNull = true;
agg0_sum = ((org.apache.flink.table.data.DecimalData) null);
agg1_sumIsNull = true;
agg1_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sum = ((org.apache.flink.table.data.DecimalData) null);
}

void top_rewriteGroup_0() {
top_whileBody0_0();
if (lastKey$6208 == null) {
  top_whileBody0_0_ifBody0();
  } else if (lastKey$6209 == null) { 
top_whileBody0_0_ifBody1_ifBody0();
  } else {   
 top_whileBody0_0_ifBody1_ifBody1();
  }
}
{code}


What do you think [~TsReaper]?




was (Author: kristoffsc):
Hi [~TsReaper]
Im sory for a long delay. I was actyally trying to develop a PoC fix for this 
problem. I think I managed to at least proof a concept. You can find my raft PR 
here -> https://github.com/apache/flink/pull/21393  The code from this PR made 
the SQL from this ticket to compile and execute which is at least something :) 

The idea is to enhance FunctionSplitter that for every codeBlock 
(getMergedCodeBlocks method) that is bigger than "maxMethodLength" try to 
further split it by calling two new splitters that I've created:
1. BlockStatementSplitter
2. BlockStatementGrouper

The BlockStatementSplitter splits body of WHILE, IF/ELSE statements to new 
methods. The original statement is rewritten that will 

[jira] [Comment Edited] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2022-11-25 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17638614#comment-17638614
 ] 

Krzysztof Chmielewski edited comment on FLINK-27246 at 11/25/22 10:23 AM:
--

Hi [~TsReaper]
I'm sorry for the long delay. I was actually trying to develop a PoC fix for this 
problem, and I think I managed to at least prove the concept. You can find my draft PR 
here -> https://github.com/apache/flink/pull/21393  The code from this PR made 
the SQL from this ticket compile and execute, which is at least something :) 

The idea is to enhance FunctionSplitter so that for every code block 
(getMergedCodeBlocks method) that is bigger than "maxMethodLength" it tries to 
further split it by calling two new splitters that I've created:
1. BlockStatementSplitter
2. BlockStatementGrouper

The BlockStatementSplitter splits the bodies of WHILE and IF/ELSE statements into new 
methods. The original statement is rewritten so that it calls those new methods. 

Next, the *BlockStatementGrouper* groups the calls created by 
*BlockStatementSplitter* into blocks with lengths < "maxMethodLength" and 
extracts those into another new method. Finally *BlockStatementGrouper* 
rewrites the original code block to call the methods created by *BlockStatementGrouper*.

For example, an input statement:
{code:java}
while (
(kvPair$9687 = 
(org.apache.flink.api.java.tuple.Tuple2) iterator.next()) != null) {
key$6207 = (org.apache.flink.table.data.binary.BinaryRowData) 
kvPair$9687.f0;
val$9688 = (org.apache.flink.table.data.binary.BinaryRowData) 
kvPair$9687.f1;
// prepare input
local$5912.replace(key$6207, val$9688);
if (lastKey$6208 == null) {
  // found first key group
  lastKey$6208 = key$6207.copy();
  agg0_sumIsNull = true;
agg0_sum = ((org.apache.flink.table.data.DecimalData) null);
agg1_sumIsNull = true;
agg1_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sum = ((org.apache.flink.table.data.DecimalData) null);
  } else if (lastKey$6209 == null) { 
agg2_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sumIsNull = true;
} else {
agg2_sumIsNull = true;
}};
{code}

will be converted by BlockStatementSplitter to:

{code:java}
while (
(kvPair$9687 = 
(org.apache.flink.api.java.tuple.Tuple2) iterator.next()) != null) {
top_whileBody0_0();
if (lastKey$6208 == null) {
  // found first key group
  top_whileBody0_0_ifBody0();
  } else if (lastKey$6209 == null) { 
top_whileBody0_0_ifBody1_ifBody0();
} else {
top_whileBody0_0_ifBody1_ifBody1();
}};
{code}

Further this will be converted by BlockStatementGrouper with maxMethodLength 
parameter set to 4000 to:
{code:java}
while (
(kvPair$9687 = 
(org.apache.flink.api.java.tuple.Tuple2) iterator.next()) != null) {

top_rewriteGroup_0();
}
{code}


Body for the new methods would be:
{code:java}
private void top_whileBody0_0_ifBody1_ifBody0 {
agg2_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sumIsNull = true;
}

private void top_whileBody0_0_ifBody1_ifBody1 {
agg2_sumIsNull = true;
}

private void top_whileBody0_0 {
key$6207 = (org.apache.flink.table.data.binary.BinaryRowData) kvPair$9687.f0;
val$9688 = (org.apache.flink.table.data.binary.BinaryRowData) kvPair$9687.f1;
local$5912.replace(key$6207, val$9688);
}

private void top_whileBody0_0_ifBody0 {
lastKey$6208 = key$6207.copy();
agg0_sumIsNull = true;
agg0_sum = ((org.apache.flink.table.data.DecimalData) null);
agg1_sumIsNull = true;
agg1_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sum = ((org.apache.flink.table.data.DecimalData) null);
}

void top_rewriteGroup_0() {
top_whileBody0_0();
if (lastKey$6208 == null) {
  // found first key group
  top_whileBody0_0_ifBody0();
  } else if (lastKey$6209 == null) { 
top_whileBody0_0_ifBody1_ifBody0();
  } else {   
 top_whileBody0_0_ifBody1_ifBody1();
  }
}
{code}


What do you think [~TsReaper]?




was (Author: kristoffsc):
Hi [~TsReaper]
Im sory for a long delay. I was actyally trying to develop a PoC fix for this 
problem. I think I managed to at least proof a concept. You can find my raft PR 
here -> https://github.com/apache/flink/pull/21393  The code from this PR made 
the SQL from this ticket to compile and execute which is at least something :) 

The idea is to enhance FunctionSplitter that for every codeBlock 
(getMergedCodeBlocks method) that is bigger than "maxMethodLength" try to 
further split it by calling two new splitters that I've created:
1. BlockStatementSplitter
2. BlockStatementGrouper

The 

[jira] [Comment Edited] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2022-11-25 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17638614#comment-17638614
 ] 

Krzysztof Chmielewski edited comment on FLINK-27246 at 11/25/22 10:23 AM:
--

Hi [~TsReaper]
I'm sorry for the long delay. I was actually trying to develop a PoC fix for this 
problem, and I think I managed to at least prove the concept. You can find my draft PR 
here -> https://github.com/apache/flink/pull/21393  The code from this PR made 
the SQL from this ticket compile and execute, which is at least something :) 

The idea is to enhance FunctionSplitter so that for every code block 
(getMergedCodeBlocks method) that is bigger than "maxMethodLength" it tries to 
further split it by calling two new splitters that I've created:
1. BlockStatementSplitter
2. BlockStatementGrouper

The BlockStatementSplitter splits the bodies of WHILE and IF/ELSE statements into new 
methods. The original statement is rewritten so that it calls those new methods. 

Next, the *BlockStatementGrouper* groups the calls created by 
*BlockStatementSplitter* into blocks with lengths < "maxMethodLength" and 
extracts those into another new method. Finally *BlockStatementGrouper* 
rewrites the original code block to call the methods created by *BlockStatementGrouper*.

For example, an input statement:
{code:java}
while (
(kvPair$9687 = 
(org.apache.flink.api.java.tuple.Tuple2) iterator.next()) != null) {
key$6207 = (org.apache.flink.table.data.binary.BinaryRowData) 
kvPair$9687.f0;
val$9688 = (org.apache.flink.table.data.binary.BinaryRowData) 
kvPair$9687.f1;
// prepare input
local$5912.replace(key$6207, val$9688);
if (lastKey$6208 == null) {
  // found first key group
  lastKey$6208 = key$6207.copy();
  agg0_sumIsNull = true;
agg0_sum = ((org.apache.flink.table.data.DecimalData) null);
agg1_sumIsNull = true;
agg1_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sum = ((org.apache.flink.table.data.DecimalData) null);
  } else if (lastKey$6209 == null) { 
agg2_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sumIsNull = true;
} else {
agg2_sumIsNull = true;
}};
{code}

will be converted by BlockStatementSplitter to:

{code:java}
while (
(kvPair$9687 = 
(org.apache.flink.api.java.tuple.Tuple2) iterator.next()) != null) {
top_whileBody0_0();
if (lastKey$6208 == null) {
  // found first key group
  top_whileBody0_0_ifBody0();
  } else if (lastKey$6209 == null) { 
top_whileBody0_0_ifBody1_ifBody0();
} else {
top_whileBody0_0_ifBody1_ifBody1();
}};
{code}

Further this will be converted by BlockStatementGrouper with maxMethodLength 
parameter set to 4000 to:
{code:java}
while (
(kvPair$9687 = 
(org.apache.flink.api.java.tuple.Tuple2) iterator.next()) != null) {

top_rewriteGroup_0();
}
{code}


New methods body would be:
{code:java}
private void top_whileBody0_0_ifBody1_ifBody0 {
agg2_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sumIsNull = true;
}

private void top_whileBody0_0_ifBody1_ifBody1 {
agg2_sumIsNull = true;
}

private void top_whileBody0_0 {
key$6207 = (org.apache.flink.table.data.binary.BinaryRowData) kvPair$9687.f0;
val$9688 = (org.apache.flink.table.data.binary.BinaryRowData) kvPair$9687.f1;
local$5912.replace(key$6207, val$9688);
}

private void top_whileBody0_0_ifBody0 {
lastKey$6208 = key$6207.copy();
agg0_sumIsNull = true;
agg0_sum = ((org.apache.flink.table.data.DecimalData) null);
agg1_sumIsNull = true;
agg1_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sum = ((org.apache.flink.table.data.DecimalData) null);
}

void top_rewriteGroup_0() {
top_whileBody0_0();
if (lastKey$6208 == null) {
  // found first key group
  top_whileBody0_0_ifBody0();
  } else if (lastKey$6209 == null) { 
top_whileBody0_0_ifBody1_ifBody0();
  } else {   
 top_whileBody0_0_ifBody1_ifBody1();
  }
}
{code}


What do you think [~TsReaper]?




was (Author: kristoffsc):
Hi [~TsReaper]
Im sory for a long delay. I was actyally trying to develop a PoC fix for this 
problem. I think I managed to at least proof a concept. You can find my raft PR 
here -> https://github.com/apache/flink/pull/21393  The code from this PR made 
the SQL from this ticket to compile and execute which is at least something :) 

The idea is to enhance FunctionSplitter that for every codeBlock 
(getMergedCodeBlocks method) that is bigger than "maxMethodLength" try to 
further split it by calling two new splitters that I've created:
1. BlockStatementSplitter
2. BlockStatementGrouper

The 

[jira] [Comment Edited] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2022-11-25 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17638614#comment-17638614
 ] 

Krzysztof Chmielewski edited comment on FLINK-27246 at 11/25/22 10:14 AM:
--

Hi [~TsReaper]
I'm sorry for the long delay. I was actually trying to develop a PoC fix for this 
problem, and I think I managed to at least prove the concept. You can find my draft PR 
here -> https://github.com/apache/flink/pull/21393  The code from this PR made 
the SQL from this ticket compile and execute, which is at least something :) 

The idea is to enhance FunctionSplitter so that for every code block 
(getMergedCodeBlocks method) that is bigger than "maxMethodLength" it tries to 
further split it by calling two new splitters that I've created:
1. BlockStatementSplitter
2. BlockStatementGrouper

The BlockStatementSplitter splits the bodies of WHILE and IF/ELSE statements into new 
methods. The original statement is rewritten so that it calls those new methods. 

Next, the *BlockStatementGrouper* groups the calls created by 
*BlockStatementSplitter* into blocks with lengths < "maxMethodLength" and 
extracts those into another new method. Finally *BlockStatementGrouper* 
rewrites the original code block to call the methods created by *BlockStatementGrouper*.

For example, an input statement:
{code:java}
while (
(kvPair$9687 = 
(org.apache.flink.api.java.tuple.Tuple2) iterator.next()) != null) {
key$6207 = (org.apache.flink.table.data.binary.BinaryRowData) 
kvPair$9687.f0;
val$9688 = (org.apache.flink.table.data.binary.BinaryRowData) 
kvPair$9687.f1;
// prepare input
local$5912.replace(key$6207, val$9688);
if (lastKey$6208 == null) {
  // found first key group
  lastKey$6208 = key$6207.copy();
  agg0_sumIsNull = true;
agg0_sum = ((org.apache.flink.table.data.DecimalData) null);
agg1_sumIsNull = true;
agg1_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sum = ((org.apache.flink.table.data.DecimalData) null);
  } else if (lastKey$6209 == null) { agg2_sum = 
((org.apache.flink.table.data.DecimalData) null);
agg3_sumIsNull = true;
} else {agg2_sumIsNull = true;
}};
{code}

will be converted by BlockStatementSplitter to:

{code:java}
while (
(kvPair$9687 = 
(org.apache.flink.api.java.tuple.Tuple2) iterator.next()) != null) {
top_whileBody0_0();
if (lastKey$6208 == null) {
  // found first key group
  top_whileBody0_0_ifBody0();
  } else if (lastKey$6209 == null) { 
top_whileBody0_0_ifBody1_ifBody0();
} else {top_whileBody0_0_ifBody1_ifBody1();
}};
{code}

Further this will be converted by BlockStatementGrouper with maxMethodLength 
parameter set to 4000 to:
{code:java}
while (
(kvPair$9687 = 
(org.apache.flink.api.java.tuple.Tuple2) iterator.next()) != null) {

top_rewriteGroup_0();
}
{code}


New methods body would be:
{code:java}
private void top_whileBody0_0_ifBody1_ifBody0 {
agg2_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sumIsNull = true;
}

private void top_whileBody0_0_ifBody1_ifBody1 {
agg2_sumIsNull = true;
}

private void top_whileBody0_0 {
key$6207 = (org.apache.flink.table.data.binary.BinaryRowData) kvPair$9687.f0;
val$9688 = (org.apache.flink.table.data.binary.BinaryRowData) kvPair$9687.f1;
local$5912.replace(key$6207, val$9688);
}

private void top_whileBody0_0_ifBody0 {
lastKey$6208 = key$6207.copy();
agg0_sumIsNull = true;
agg0_sum = ((org.apache.flink.table.data.DecimalData) null);
agg1_sumIsNull = true;
agg1_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sum = ((org.apache.flink.table.data.DecimalData) null);
}

void top_rewriteGroup_0() {
top_whileBody0_0();
if (lastKey$6208 == null) {
  // found first key group
  top_whileBody0_0_ifBody0();
  } else if (lastKey$6209 == null) { 
top_whileBody0_0_ifBody1_ifBody0();
  } else {   
 top_whileBody0_0_ifBody1_ifBody1();
  }
}
{code}


What do you think [~TsReaper]?




was (Author: kristoffsc):
Hi [~TsReaper]
Im sory for a long delay. I was actyally trying to develop a PoC fix for this 
problem. I think I managed to at least proof a concept. You can find my raft PR 
here -> https://github.com/apache/flink/pull/21393  The code from this PR made 
the SQL from this ticket to compile and execute which is at least something :) 

The idea is to enhance FunctionSplitter that for every codeBlock 
(getMergedCodeBlocks method) that is bigger than "maxMethodLength" try to 
further split it by calling two new splitters that I've created:
1. BlockStatementSplitter
2. BlockStatementGrouper

The 

[jira] [Commented] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2022-11-25 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17638614#comment-17638614
 ] 

Krzysztof Chmielewski commented on FLINK-27246:
---

Hi [~TsReaper]
I'm sorry for the long delay. I was actually trying to develop a PoC fix for this 
problem, and I think I managed to at least prove the concept. You can find my draft PR 
here -> https://github.com/apache/flink/pull/21393  The code from this PR made 
the SQL from this ticket compile and execute, which is at least something :) 

The idea is to enhance FunctionSplitter so that for every code block 
(getMergedCodeBlocks method) that is bigger than "maxMethodLength" it tries to 
further split it by calling two new splitters that I've created:
1. BlockStatementSplitter
2. BlockStatementGrouper

The BlockStatementSplitter splits the bodies of WHILE and IF/ELSE statements into new 
methods. The original statement is rewritten so that it calls those new methods. 

Next, the *BlockStatementGrouper* groups those new method calls created by 
*BlockStatementSplitter* into blocks with lengths < "maxMethodLength" and 
extracts those into yet another new method. Finally *BlockStatementGrouper* 
rewrites the original code block to call the methods created by *BlockStatementGrouper*.

For example, an input statement:
{code:java}
while (
(kvPair$9687 = 
(org.apache.flink.api.java.tuple.Tuple2) iterator.next()) != null) {
key$6207 = (org.apache.flink.table.data.binary.BinaryRowData) 
kvPair$9687.f0;
val$9688 = (org.apache.flink.table.data.binary.BinaryRowData) 
kvPair$9687.f1;
// prepare input
local$5912.replace(key$6207, val$9688);
if (lastKey$6208 == null) {
  // found first key group
  lastKey$6208 = key$6207.copy();
  agg0_sumIsNull = true;
agg0_sum = ((org.apache.flink.table.data.DecimalData) null);
agg1_sumIsNull = true;
agg1_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sum = ((org.apache.flink.table.data.DecimalData) null);
  } else if (lastKey$6209 == null) { agg2_sum = 
((org.apache.flink.table.data.DecimalData) null);
agg3_sumIsNull = true;
} else {agg2_sumIsNull = true;
}};
{code}

will be converted by BlockStatementSplitter to:

{code:java}
while (
(kvPair$9687 = 
(org.apache.flink.api.java.tuple.Tuple2) iterator.next()) != null) {
top_whileBody0_0();
if (lastKey$6208 == null) {
  // found first key group
  top_whileBody0_0_ifBody0();
  } else if (lastKey$6209 == null) { 
top_whileBody0_0_ifBody1_ifBody0();
} else {top_whileBody0_0_ifBody1_ifBody1();
}};
{code}

Further this will be converted by BlockStatementGrouper with maxMethodLength 
parameter set to 4000 to:
{code:java}
while (
(kvPair$9687 = 
(org.apache.flink.api.java.tuple.Tuple2) iterator.next()) != null) {

top_rewriteGroup_0();
}
{code}


New methods body would be:
{code:java}
private void top_whileBody0_0_ifBody1_ifBody0 {
agg2_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sumIsNull = true;
}

private void top_whileBody0_0_ifBody1_ifBody1 {
agg2_sumIsNull = true;
}

private void top_whileBody0_0 {
key$6207 = (org.apache.flink.table.data.binary.BinaryRowData) kvPair$9687.f0;
val$9688 = (org.apache.flink.table.data.binary.BinaryRowData) kvPair$9687.f1;
local$5912.replace(key$6207, val$9688);
}

private void top_whileBody0_0_ifBody0 {
lastKey$6208 = key$6207.copy();
agg0_sumIsNull = true;
agg0_sum = ((org.apache.flink.table.data.DecimalData) null);
agg1_sumIsNull = true;
agg1_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sum = ((org.apache.flink.table.data.DecimalData) null);
agg3_sum = ((org.apache.flink.table.data.DecimalData) null);
}

void top_rewriteGroup_0() {
top_whileBody0_0();
if (lastKey$6208 == null) {
  // found first key group
  top_whileBody0_0_ifBody0();
  } else if (lastKey$6209 == null) { 
top_whileBody0_0_ifBody1_ifBody0();
  } else {   
 top_whileBody0_0_ifBody1_ifBody1();
  }
}
{code}


What do you think [~TsReaper]?



> Code of method 
> "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V"
>  of class "HashAggregateWithKeys$9211" grows beyond 64 KB
> -
>
> Key: FLINK-27246
> URL: https://issues.apache.org/jira/browse/FLINK-27246
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.14.3
>Reporter: Maciej Bryński
>Priority: Major
> Attachments: 

[jira] [Comment Edited] (FLINK-25920) Allow receiving updates of CommittableSummary

2022-11-09 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17631277#comment-17631277
 ] 

Krzysztof Chmielewski edited comment on FLINK-25920 at 11/9/22 8:41 PM:


[~bdine] and [~qinjunjerry] could you share what kind of Sink you are using?

Recently we were fixing various issues with the Sink architecture for Flink 1.15:
https://issues.apache.org/jira/browse/FLINK-29509
https://issues.apache.org/jira/browse/FLINK-29512
https://issues.apache.org/jira/browse/FLINK-29627

One of the symptoms was this issue, for setups with aligned checkpoints. 
You would need all 3 of those fixes, so you would need to use Flink 
1.15.3 or 1.16.1 (both not yet released).


was (Author: kristoffsc):
[~bdine] and [~qinjunjerry] could you share what kind of Sink are you suing?

Recently we were fixing various issues with Sink architecture for Flink 1.15.
https://issues.apache.org/jira/browse/FLINK-29509
https://issues.apache.org/jira/browse/FLINK-29512
https://issues.apache.org/jira/browse/FLINK-29627

One of the symptoms was this issue for setup with aligned checkpoints. 
You would need to have all 3 of those fixes so you would need to use Flink 
1.15.3 or 1.16.1 (both not yet released).

> Allow receiving updates of CommittableSummary
> -
>
> Key: FLINK-25920
> URL: https://issues.apache.org/jira/browse/FLINK-25920
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / DataStream, Connectors / Common
>Affects Versions: 1.15.0, 1.16.0
>Reporter: Fabian Paul
>Priority: Major
>
> In the case of unaligned checkpoints, it might happen that the checkpoint 
> barrier overtakes the records and an empty committable summary is emitted 
> that needs to be correct at a later point when the records arrive.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-25920) Allow receiving updates of CommittableSummary

2022-11-09 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17631277#comment-17631277
 ] 

Krzysztof Chmielewski edited comment on FLINK-25920 at 11/9/22 8:41 PM:


[~bdine] and [~qinjunjerry] could you share what kind of Sink you are using?

Recently we were fixing various issues with the Sink architecture for Flink 1.15:
https://issues.apache.org/jira/browse/FLINK-29509
https://issues.apache.org/jira/browse/FLINK-29512
https://issues.apache.org/jira/browse/FLINK-29627

One of the symptoms was this issue, for setups with aligned checkpoints. 
You would need all 3 of those fixes, so you would need to use Flink 
1.15.3 or 1.16.1 (both not yet released).


was (Author: kristoffsc):
[~bdine] and [~qinjunjerry] could you share what kind of Sink are you suing?

Recently we were fixing various issues with Sink architecture for Flink 1.15.
https://issues.apache.org/jira/browse/FLINK-29509
https://issues.apache.org/jira/browse/FLINK-29512
https://issues.apache.org/jira/browse/FLINK-29627

One of the symptoms was this issue for setup with aligned checkpoints. 
You would need to have all 3 of those tickets so you would need to use Flink 
1.15.3 or 1.16.1 (both not yet released).

> Allow receiving updates of CommittableSummary
> -
>
> Key: FLINK-25920
> URL: https://issues.apache.org/jira/browse/FLINK-25920
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / DataStream, Connectors / Common
>Affects Versions: 1.15.0, 1.16.0
>Reporter: Fabian Paul
>Priority: Major
>
> In the case of unaligned checkpoints, it might happen that the checkpoint 
> barrier overtakes the records and an empty committable summary is emitted 
> that needs to be correct at a later point when the records arrive.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-25920) Allow receiving updates of CommittableSummary

2022-11-09 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17631277#comment-17631277
 ] 

Krzysztof Chmielewski edited comment on FLINK-25920 at 11/9/22 8:40 PM:


[~bdine] and [~qinjunjerry] could you share what kind of Sink you are using?

Recently we were fixing various issues with the Sink architecture for Flink 1.15:
https://issues.apache.org/jira/browse/FLINK-29509
https://issues.apache.org/jira/browse/FLINK-29512
https://issues.apache.org/jira/browse/FLINK-29627

One of the symptoms was this issue, for setups with aligned checkpoints. 
You would need the fixes from all 3 of those tickets, so you would need to use Flink 
1.15.3 or 1.16.1 (both not yet released).


was (Author: kristoffsc):
[~bdine] could you share what kind of Sink are you suing?

Recently we were fixing various issues with Sink architecture for Flink 1.15.
https://issues.apache.org/jira/browse/FLINK-29509
https://issues.apache.org/jira/browse/FLINK-29512
https://issues.apache.org/jira/browse/FLINK-29627

One of the symptoms was this issue for setup with aligned checkpoints. 
You would need to have all 3 of those tickets so you would need to use Flink 
1.15.3 or 1.16.1 (both not yet released).

> Allow receiving updates of CommittableSummary
> -
>
> Key: FLINK-25920
> URL: https://issues.apache.org/jira/browse/FLINK-25920
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / DataStream, Connectors / Common
>Affects Versions: 1.15.0, 1.16.0
>Reporter: Fabian Paul
>Priority: Major
>
> In the case of unaligned checkpoints, it might happen that the checkpoint 
> barrier overtakes the records and an empty committable summary is emitted 
> that needs to be correct at a later point when the records arrive.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-25920) Allow receiving updates of CommittableSummary

2022-11-09 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17631277#comment-17631277
 ] 

Krzysztof Chmielewski commented on FLINK-25920:
---

[~bdine] could you share what kind of Sink you are using?

Recently we were fixing various issues with the Sink architecture for Flink 1.15:
https://issues.apache.org/jira/browse/FLINK-29509
https://issues.apache.org/jira/browse/FLINK-29512
https://issues.apache.org/jira/browse/FLINK-29627

One of the symptoms was this issue, for setups with aligned checkpoints. 
You would need the fixes from all 3 of those tickets, so you would need to use Flink 
1.15.3 or 1.16.1 (both not yet released).

> Allow receiving updates of CommittableSummary
> -
>
> Key: FLINK-25920
> URL: https://issues.apache.org/jira/browse/FLINK-25920
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / DataStream, Connectors / Common
>Affects Versions: 1.15.0, 1.16.0
>Reporter: Fabian Paul
>Priority: Major
>
> In the case of unaligned checkpoints, it might happen that the checkpoint 
> barrier overtakes the records and an empty committable summary is emitted 
> that needs to be correct at a later point when the records arrive.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2022-11-03 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17628218#comment-17628218
 ] 

Krzysztof Chmielewski edited comment on FLINK-27246 at 11/3/22 9:31 AM:


[~TsReaper]
I see why my comment was causing confusion, I apologize for that. 

For a moment I thought that FunctionSplitter was rewriting this while block, but 
now I see clearly that it does not. 
At the same time I saw that some "rewrite" process was applied to this block, but 
now I also see it was the MemberFieldRewriter logic. 

Long story short, I did not want to run JavaCodeSplitter again but just enhance 
the current logic to handle this "while" case.


was (Author: kristoffsc):
[~TsReaper]
I see why my comment was causing confusion, apologize for that. 

For a moment I though that FunctionSplitter is rewriting this while block but 
now I see clearly it does not. 
At the same time I saw that some "rewrite" process is applyed to this block but 
also now I see it was MemberFieldRewriter logic. 

Long story short, I did not want to run JavaCodeSplitter again but just extend 
rewrite to handle this "while" case. 

> Code of method 
> "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V"
>  of class "HashAggregateWithKeys$9211" grows beyond 64 KB
> -
>
> Key: FLINK-27246
> URL: https://issues.apache.org/jira/browse/FLINK-27246
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.14.3
>Reporter: Maciej Bryński
>Priority: Major
> Attachments: endInput_falseFilter9123_split9704.txt
>
>
> I think this bug should get fixed in 
> https://issues.apache.org/jira/browse/FLINK-23007
> Unfortunately I spotted it on Flink 1.14.3
> {code}
> java.lang.RuntimeException: Could not instantiate generated class 
> 'HashAggregateWithKeys$9211'
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:85)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.operators.CodeGenOperatorFactory.createStreamOperator(CodeGenOperatorFactory.java:40)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:81)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:198)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.(RegularOperatorChain.java:63)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:666)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: org.apache.flink.util.FlinkRuntimeException: 
> org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:76)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:102)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:83)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   ... 11 more
> Caused by: 
> org.apache.flink.shaded.guava30.com.google.common.util.concurrent.UncheckedExecutionException:
>  org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2051)
>  

[jira] [Commented] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2022-11-03 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17628218#comment-17628218
 ] 

Krzysztof Chmielewski commented on FLINK-27246:
---

[~TsReaper]
I see why my comment was causing confusion, I apologize for that. 

For a moment I thought that FunctionSplitter was rewriting this while block, but 
now I see clearly it does not. 
At the same time I saw that some "rewrite" process is applied to this block, but 
now I see it was the MemberFieldRewriter logic. 

Long story short, I did not want to run JavaCodeSplitter again, but just to extend 
the rewrite to handle this "while" case, roughly in the spirit of the sketch below. 
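
To make the intent concrete, here is a purely illustrative before/after of the 
shape I have in mind. This is not the splitter's actual output; the class, the 
method names and the trivial loop body are all made up:

{code:java}
// Illustration only: not generated Flink code, all names are made up.
public class WhileSplitSketch {

    private final long[] acc = new long[1024];

    // Before: one method whose while-loop body alone can push the method past
    // the 64 KB bytecode limit once the planner emits thousands of statements.
    void processAllInOneMethod() {
        int i = 0;
        while (i < acc.length) {
            acc[i] = acc[i] + 1; // imagine thousands of generated lines here
            i++;
        }
    }

    // After: the loop stays where it is, but its body is cut into chunk
    // methods, so no single method grows beyond the limit.
    void processWithExtractedBody() {
        int i = 0;
        while (i < acc.length) {
            whileBodyChunk0(i);
            whileBodyChunk1(i);
            i++;
        }
    }

    private void whileBodyChunk0(int i) {
        acc[i] = acc[i] + 1; // first slice of the original body
    }

    private void whileBodyChunk1(int i) {
        // second slice of the original body would go here
    }
}
{code}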

> Code of method 
> "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V"
>  of class "HashAggregateWithKeys$9211" grows beyond 64 KB
> -
>
> Key: FLINK-27246
> URL: https://issues.apache.org/jira/browse/FLINK-27246
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.14.3
>Reporter: Maciej Bryński
>Priority: Major
> Attachments: endInput_falseFilter9123_split9704.txt
>
>
> I think this bug should get fixed in 
> https://issues.apache.org/jira/browse/FLINK-23007
> Unfortunately I spotted it on Flink 1.14.3
> {code}
> java.lang.RuntimeException: Could not instantiate generated class 
> 'HashAggregateWithKeys$9211'
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:85)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.operators.CodeGenOperatorFactory.createStreamOperator(CodeGenOperatorFactory.java:40)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:81)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:198)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.(RegularOperatorChain.java:63)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:666)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: org.apache.flink.util.FlinkRuntimeException: 
> org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:76)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:102)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:83)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   ... 11 more
> Caused by: 
> org.apache.flink.shaded.guava30.com.google.common.util.concurrent.UncheckedExecutionException:
>  org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2051)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache.get(LocalCache.java:3962)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4859)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:74)
>  

[jira] [Commented] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2022-11-02 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17627602#comment-17627602
 ] 

Krzysztof Chmielewski commented on FLINK-27246:
---

Hi [~TsReaper]
Thanks for replying.

The extracted while-loop method contains many, many references to 
rewrite$index[] methods and a lot of self-contained if/else blocks. 
There are no break, return or continue statements. I've attached the 
method body to this ticket.  [^endInput_falseFilter9123_split9704.txt] 

So it looks like the body of this method was already reprocessed and rewritten 
by the Splitter.
Given this, I was thinking that maybe if we "rewrite" it again, to further split 
it into smaller groups, we could fix the problem (see the sketch below).
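
Purely to illustrate what I mean by "smaller groups", here is a toy sketch (not 
a patch proposal; the chunk-method naming, the wrapper format and the threshold 
are all made up):

{code:java}
// Toy illustration only: groups a long run of self-contained statements into
// helper-method bodies that each stay below a maximum source length.
import java.util.ArrayList;
import java.util.List;

public class ResplitSketch {

    static List<String> toHelperMethods(List<String> statements, int maxMethodLength) {
        List<String> methods = new ArrayList<>();
        StringBuilder body = new StringBuilder();
        for (String stmt : statements) {
            // close the current chunk once adding the next statement would exceed the cap
            if (body.length() > 0 && body.length() + stmt.length() > maxMethodLength) {
                methods.add(wrap(methods.size(), body.toString()));
                body.setLength(0);
            }
            body.append(stmt).append('\n');
        }
        if (body.length() > 0) {
            methods.add(wrap(methods.size(), body.toString()));
        }
        return methods;
    }

    private static String wrap(int index, String body) {
        return "void whileBodyChunk" + index + "() {\n" + body + "}\n";
    }

    public static void main(String[] args) {
        List<String> statements = new ArrayList<>();
        for (int i = 0; i < 50; i++) {
            statements.add("rewrite$9722[" + i + "] = null;"); // stand-in for generated statements
        }
        toHelperMethods(statements, 200).forEach(System.out::println);
    }
}
{code}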

What do you think?

> Code of method 
> "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V"
>  of class "HashAggregateWithKeys$9211" grows beyond 64 KB
> -
>
> Key: FLINK-27246
> URL: https://issues.apache.org/jira/browse/FLINK-27246
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.14.3
>Reporter: Maciej Bryński
>Priority: Major
> Attachments: endInput_falseFilter9123_split9704.txt
>
>
> I think this bug should get fixed in 
> https://issues.apache.org/jira/browse/FLINK-23007
> Unfortunately I spotted it on Flink 1.14.3
> {code}
> java.lang.RuntimeException: Could not instantiate generated class 
> 'HashAggregateWithKeys$9211'
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:85)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.operators.CodeGenOperatorFactory.createStreamOperator(CodeGenOperatorFactory.java:40)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:81)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:198)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.(RegularOperatorChain.java:63)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:666)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: org.apache.flink.util.FlinkRuntimeException: 
> org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:76)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:102)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:83)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   ... 11 more
> Caused by: 
> org.apache.flink.shaded.guava30.com.google.common.util.concurrent.UncheckedExecutionException:
>  org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2051)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache.get(LocalCache.java:3962)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4859)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> 

[jira] [Updated] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2022-11-02 Thread Krzysztof Chmielewski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krzysztof Chmielewski updated FLINK-27246:
--
Attachment: endInput_falseFilter9123_split9704.txt

> Code of method 
> "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V"
>  of class "HashAggregateWithKeys$9211" grows beyond 64 KB
> -
>
> Key: FLINK-27246
> URL: https://issues.apache.org/jira/browse/FLINK-27246
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.14.3
>Reporter: Maciej Bryński
>Priority: Major
> Attachments: endInput_falseFilter9123_split9704.txt
>
>
> I think this bug should get fixed in 
> https://issues.apache.org/jira/browse/FLINK-23007
> Unfortunately I spotted it on Flink 1.14.3
> {code}
> java.lang.RuntimeException: Could not instantiate generated class 
> 'HashAggregateWithKeys$9211'
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:85)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.operators.CodeGenOperatorFactory.createStreamOperator(CodeGenOperatorFactory.java:40)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:81)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:198)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.(RegularOperatorChain.java:63)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:666)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) 
> ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: org.apache.flink.util.FlinkRuntimeException: 
> org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:76)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:102)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:83)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   ... 11 more
> Caused by: 
> org.apache.flink.shaded.guava30.com.google.common.util.concurrent.UncheckedExecutionException:
>  org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2051)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache.get(LocalCache.java:3962)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4859)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:74)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:102)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:83)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   ... 11 more
> Caused by: org.apache.flink.api.common.InvalidProgramException: Table 

[jira] [Comment Edited] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2022-10-29 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625871#comment-17625871
 ] 

Krzysztof Chmielewski edited comment on FLINK-27246 at 10/29/22 8:24 AM:
-

Hi,
I would like to try to fix this problem but I would appreciate any guidance. 
FYI I've verified that it still occurs on the latest master branch.

From what I've debugged, the problem is caused by one rewritten method created 
in:
[JavaCodeSplitter -> new FunctionSplitter(text, 
maxMethodLength).rewrite()|https://github.com/apache/flink/blob/master/flink-table/flink-table-code-splitter/src/main/java/org/apache/flink/table/codesplit/JavaCodeSplitter.java#:~:text=new%20FunctionSplitter(text%2C%20maxMethodLength).rewrite()]

The FunctionSplitter::visitMethodDeclaration method iterates through 
JavaParser.BlockStatementContext elements from 
ctx.methodBody().block().blockStatement() -> 
[code|https://github.com/apache/flink/blob/87c33711fa3a4844598772ceafd66dd4a776eea9/flink-table/flink-table-code-splitter/src/main/java/org/apache/flink/table/codesplit/FunctionSplitter.java#L100:~:text=for%20(JavaParser.BlockStatementContext%20blockStatementContext]

For every element we take its context string and add it to the splitFuncBodies 
list. Later we iterate through this list and merge the entries into [Code 
Blocks|https://github.com/apache/flink/blob/87c33711fa3a4844598772ceafd66dd4a776eea9/flink-table/flink-table-code-splitter/src/main/java/org/apache/flink/table/codesplit/FunctionSplitter.java#L100:~:text=getMergedCodeBlocks(List%3CString%3E%20codeBlock)]
according to maxMethodLength.

However, the problem is that in our case a single 
JavaParser.BlockStatementContext element from 
ctx.methodBody().block().blockStatement() is by itself larger than 
maxMethodLength. Its entire body is turned into a single method by 
FunctionSplitter::getMergedCodeBlocks, and this causes the exception.

The code block is a big while loop that contains a long run of rewrite$ 
assignments like so:

{code:java}
rewrite$9722[10418] = ((org.apache.flink.table.data.DecimalData) null);
rewrite$9729[10431] = true;
rewrite$9722[10419] = ((org.apache.flink.table.data.DecimalData) null);
rewrite$9729[10432] = true;
rewrite$9722[10420] = ((org.apache.flink.table.data.DecimalData) null);
rewrite$9729[10433] = true;
{code}

In my opinion these could easily be extracted to separate methods, which would 
solve the problem, for example along the lines of the sketch below.
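
A purely hypothetical shape of that extraction, with simplified field types and 
made-up method names (this is not generated Flink code):

{code:java}
// Hypothetical illustration only: the long run of assignments above, moved
// into small helper methods that the while loop then calls. Field types are
// simplified to keep the sketch self-contained.
public class ExtractAssignmentsSketch {

    private final Object[] rewrite$9722 = new Object[20000];
    private final boolean[] rewrite$9729 = new boolean[20000];

    void whileLoopBody() {
        resetDecimals0();
        resetDecimals1();
        // ... more slices, each comfortably below the 64 KB limit
    }

    private void resetDecimals0() {
        rewrite$9722[10418] = null;
        rewrite$9729[10431] = true;
        rewrite$9722[10419] = null;
        rewrite$9729[10432] = true;
    }

    private void resetDecimals1() {
        rewrite$9722[10420] = null;
        rewrite$9729[10433] = true;
    }
}
{code}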

I would like to ask:
1. Is my understanding, and the proposed high-level solution of splitting the 
problematic code block into smaller chunks, correct?
2. Should this be done by FunctionSplitter, or should it be implemented in the 
Scala code-generation code?

I'm quite familiar with Antlr4 so I do understand what is happening there. 





was (Author: kristoffsc):
Hi,
I would like to try fix this problem but I would appreciate any guidance. FYI 
I've verified that it still occurs on latest master branch.

>From what I've debugged the problem is caused by one rewrited method created 
>from in:
[JavaCodeSplitter -> new FunctionSplitter(text, 
maxMethodLength).rewrite()|https://github.com/apache/flink/blob/master/flink-table/flink-table-code-splitter/src/main/java/org/apache/flink/table/codesplit/JavaCodeSplitter.java#:~:text=new%20FunctionSplitter(text%2C%20maxMethodLength).rewrite()]

The FunctionSplitter::visitMethodDeclaration method iterates through 
JavaParser.BlockStatementContext elements from 
ctx.methodBody().block().blockStatement() - > 
[code|https://github.com/apache/flink/blob/87c33711fa3a4844598772ceafd66dd4a776eea9/flink-table/flink-table-code-splitter/src/main/java/org/apache/flink/table/codesplit/FunctionSplitter.java#L100:~:text=for%20(JavaParser.BlockStatementContext%20blockStatementContext]

For every element we get ContextString from it and add to the splitFuncBodies 
list. Later we iterated through this list to Merged them into [Code 
Blocks|https://github.com/apache/flink/blob/87c33711fa3a4844598772ceafd66dd4a776eea9/flink-table/flink-table-code-splitter/src/main/java/org/apache/flink/table/codesplit/FunctionSplitter.java#L100:~:text=getMergedCodeBlocks(List%3CString%3E%20codeBlock)]
 respectively to  maxMethodLength.

However the problem is that for our case the single 
JavaParser.BlockStatementContext element from 
ctx.methodBody().block().blockStatement() by it self is larger than 
maxMethodLength. Its entire body is converted to the method by 
FunctionSplitter::getMergedCodeBlocks and this causes the exception.

The code block is a big while loop that contains a bunch of calls to 
rewrite$ methods like so:

{code:java}
rewrite$9722[10418] = ((org.apache.flink.table.data.DecimalData) null);
rewrite$9729[10431] = true;
rewrite$9722[10419] = ((org.apache.flink.table.data.DecimalData) null);
rewrite$9729[10432] = 

[jira] [Commented] (FLINK-27246) Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB

2022-10-28 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625871#comment-17625871
 ] 

Krzysztof Chmielewski commented on FLINK-27246:
---

Hi,
I would like to try to fix this problem but I would appreciate any guidance. FYI 
I've verified that it still occurs on the latest master branch.

From what I've debugged, the problem is caused by one rewritten method created 
in:
[JavaCodeSplitter -> new FunctionSplitter(text, 
maxMethodLength).rewrite()|https://github.com/apache/flink/blob/master/flink-table/flink-table-code-splitter/src/main/java/org/apache/flink/table/codesplit/JavaCodeSplitter.java#:~:text=new%20FunctionSplitter(text%2C%20maxMethodLength).rewrite()]

The FunctionSplitter::visitMethodDeclaration method iterates through 
JavaParser.BlockStatementContext elements from 
ctx.methodBody().block().blockStatement() -> 
[code|https://github.com/apache/flink/blob/87c33711fa3a4844598772ceafd66dd4a776eea9/flink-table/flink-table-code-splitter/src/main/java/org/apache/flink/table/codesplit/FunctionSplitter.java#L100:~:text=for%20(JavaParser.BlockStatementContext%20blockStatementContext]

For every element we take its context string and add it to the splitFuncBodies 
list. Later we iterate through this list and merge the entries into [Code 
Blocks|https://github.com/apache/flink/blob/87c33711fa3a4844598772ceafd66dd4a776eea9/flink-table/flink-table-code-splitter/src/main/java/org/apache/flink/table/codesplit/FunctionSplitter.java#L100:~:text=getMergedCodeBlocks(List%3CString%3E%20codeBlock)]
according to maxMethodLength.

However, the problem is that in our case a single 
JavaParser.BlockStatementContext element from 
ctx.methodBody().block().blockStatement() is by itself larger than 
maxMethodLength. Its entire body is turned into a single method by 
FunctionSplitter::getMergedCodeBlocks, and this causes the exception.

The code block is a big while loop that contains a long run of rewrite$ 
assignments like so:

{code:java}
rewrite$9722[10418] = ((org.apache.flink.table.data.DecimalData) null);
rewrite$9729[10431] = true;
rewrite$9722[10419] = ((org.apache.flink.table.data.DecimalData) null);
rewrite$9729[10432] = true;
rewrite$9722[10420] = ((org.apache.flink.table.data.DecimalData) null);
rewrite$9729[10433] = true;
{code}

In my opinion these could easily be extracted to separate methods, which would 
solve the problem.

I would like to ask:
1. Is my understanding, and the proposed high-level solution of splitting the 
problematic code block into smaller chunks, correct?
2. Should this be done by FunctionSplitter, or should it be implemented in the 
Scala code-generation code?

I'm quite familiar with Antlr4 so I do understand what is happening there. 




> Code of method 
> "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V"
>  of class "HashAggregateWithKeys$9211" grows beyond 64 KB
> -
>
> Key: FLINK-27246
> URL: https://issues.apache.org/jira/browse/FLINK-27246
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.14.3
>Reporter: Maciej Bryński
>Priority: Major
>
> I think this bug should get fixed in 
> https://issues.apache.org/jira/browse/FLINK-23007
> Unfortunately I spotted it on Flink 1.14.3
> {code}
> java.lang.RuntimeException: Could not instantiate generated class 
> 'HashAggregateWithKeys$9211'
>   at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:85)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.table.runtime.operators.CodeGenOperatorFactory.createStreamOperator(CodeGenOperatorFactory.java:40)
>  ~[flink-table_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:81)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:198)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.(RegularOperatorChain.java:63)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:666)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654)
>  ~[flink-dist_2.12-1.14.3-stream1.jar:1.14.3-stream1]
>   at 
> 

[jira] [Commented] (FLINK-29459) Sink v2 has bugs in supporting legacy v1 implementations with global committer

2022-10-20 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17621187#comment-17621187
 ] 

Krzysztof Chmielewski commented on FLINK-29459:
---

FYI, tickets
29509
29512
29627

fix issues with Task Manager recovery for the Sink architecture with a global 
committer.

Ticket 29583 is about recovering the Flink 1.14 unified sink committer state and 
migrating it to the extended unified model.

> Sink v2 has bugs in supporting legacy v1 implementations with global committer
> --
>
> Key: FLINK-29459
> URL: https://issues.apache.org/jira/browse/FLINK-29459
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.16.0, 1.17.0, 1.15.2
>Reporter: Yun Gao
>Assignee: Yun Gao
>Priority: Major
> Fix For: 1.17.0, 1.15.3, 1.16.1
>
>
> Currently when supporting Sink implementation using version 1 interface, 
> there are issues after restoring from a checkpoint after failover:
>  # In global committer operator, when restoring SubtaskCommittableManager, 
> the subtask id is replaced with the one in the current operator. This means 
> that the id originally is the id of the sender task (0 ~ N - 1), but after 
> restoring it has to be 0. This would cause Duplication Key exception during 
> restoring.
>  # For Committer operator, the subtaskId of CheckpointCommittableManagerImpl 
> is always restored to 0 after failover for all the subtasks. This makes the 
> summary sent to the Global Committer is attached with wrong subtask id.
>  # For Committer operator, the checkpoint id of SubtaskCommittableManager is 
> always restored to 1 after failover, this make the following committable sent 
> to the global committer is attached with wrong checkpoint id. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-29589) Data Loss in Sink GlobalCommitter during Task Manager recovery

2022-10-20 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17616307#comment-17616307
 ] 

Krzysztof Chmielewski edited comment on FLINK-29589 at 10/20/22 9:10 AM:
-

Hi [~chesnay]

Sink V2 on 1.15, 1.16 and 1.17 has its own issues that we have found, and we are 
actually working with Fabian Paul to fix them.

https://issues.apache.org/jira/browse/FLINK-29509
https://issues.apache.org/jira/browse/FLINK-29583
https://issues.apache.org/jira/browse/FLINK-29512
https://issues.apache.org/jira/browse/FLINK-29627

With those still on the plate we can't really tell whether there is data loss on 
V2, since the Task Manager fails to start during recovery when running a Sink 
with a global committer.





was (Author: kristoffsc):
Hi [~chesnay]

V2 on 1.15, 1.16 and 1.17 has its own issues that we have found and actually we 
are working to fix them with Fabian Paul.

https://issues.apache.org/jira/browse/FLINK-29509
https://issues.apache.org/jira/browse/FLINK-29583
https://issues.apache.org/jira/browse/FLINK-29512

With those stil on the plate we cant really tell if there is a data loss on V2 
since Task manager is failing to start during recovery when running Sink with 
global committer.




> Data Loss in Sink GlobalCommitter during Task Manager recovery
> --
>
> Key: FLINK-29589
> URL: https://issues.apache.org/jira/browse/FLINK-29589
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Krzysztof Chmielewski
>Priority: Blocker
>
> Flink's Sink architecture with global committer seems to be vulnerable for 
> data loss during Task Manager recovery. The entire checkpoint can be lost by 
> _GlobalCommitter_ resulting with data loss.
> Issue was observed in Delta Sink connector on a real 1.14.x cluster and was 
> replicated using Flink's 1.14.6 Test Utils classes.
> Scenario:
>  #  Streaming source emitting constant number of events per checkpoint (20 
> events per commit for 5 commits in total, that gives 100 records).
>  #  Sink with parallelism > 1 with committer and _GlobalCommitter_ elements.
>  #  _Commiters_ processed committables for *checkpointId 2*.
>  #  _GlobalCommitter_ throws exception (desired exception) during 
> *checkpointId 2* (third commit) while processing data from *checkpoint 1* (it 
> is expected to global committer architecture lag one commit behind in 
> reference to rest of the pipeline).
>  # Task Manager recovery, source resumes sending data.
>  # Streaming source ends.
>  # We are missing 20 records (one checkpoint).
> What is happening is that during recovery, committers are performing "retry" 
> on committables for *checkpointId 2*, however those committables, reprocessed 
> from "retry" task are not emit downstream to the global committer. 
> The issue can be reproduced using Junit Test build with Flink's TestSink.
> The test was [implemented 
> here|https://github.com/kristoffSC/flink/blob/Flink_1.14_DataLoss_SinkGlobalCommitter/flink-tests/src/test/java/org/apache/flink/test/streaming/runtime/SinkITCase.java#:~:text=testGlobalCommitterMissingRecordsDuringRecovery]
>  and it is based on other tests from `SinkITCase.java` class.
> The test reproduces the issue in more than 90% of runs.
> I believe that problem is somewhere around 
> *SinkOperator::notifyCheckpointComplete* method. In there we see that Retry 
> async task is scheduled however its result is never emitted downstream like 
> it is done for regular flow one line above.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29627) Sink - Duplicate key exception during recover more than 1 committable.

2022-10-19 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620355#comment-17620355
 ] 

Krzysztof Chmielewski commented on FLINK-29627:
---

Backports:
1.15 - https://github.com/apache/flink/pull/21113
1.16 - https://github.com/apache/flink/pull/21115
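
As a side note for anyone skimming this thread: the core of the fix, as the 
description below explains, is to merge recovered managers that share a subtask 
id instead of letting the map collector throw on the duplicate key. A minimal, 
stand-alone sketch of that idea with stand-in types (not the actual Flink code):

{code:java}
// Minimal sketch of the merge-on-duplicate idea with stand-in types
// (NOT the actual Flink classes; names are simplified).
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MergeOnRecoverySketch {

    // Stand-in for SubtaskCommittableManager: only the parts needed here.
    static class Manager {
        final int subtaskId;
        final int pendingCommittables;

        Manager(int subtaskId, int pendingCommittables) {
            this.subtaskId = subtaskId;
            this.pendingCommittables = pendingCommittables;
        }

        Manager merge(Manager other) {
            return new Manager(subtaskId, pendingCommittables + other.pendingCommittables);
        }

        @Override
        public String toString() {
            return "Manager{subtaskId=" + subtaskId + ", pending=" + pendingCommittables + "}";
        }
    }

    public static void main(String[] args) {
        List<Manager> recovered =
                Arrays.asList(new Manager(0, 3), new Manager(0, 2), new Manager(1, 1));

        // This is what the stack trace shows: toMap without a merge function
        // throws IllegalStateException "Duplicate key 0" on the second manager.
        // recovered.stream().collect(Collectors.toMap(m -> m.subtaskId, m -> m));

        // With a merge function, managers that share a subtask id are combined:
        Map<Integer, Manager> bySubtask = recovered.stream()
                .collect(Collectors.toMap(m -> m.subtaskId, m -> m, Manager::merge));

        System.out.println(bySubtask);
    }
}
{code}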

> Sink - Duplicate key exception during recover more than 1 committable.
> --
>
> Key: FLINK-29627
> URL: https://issues.apache.org/jira/browse/FLINK-29627
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.16.0, 1.17.0, 1.15.2, 1.16.1
>Reporter: Krzysztof Chmielewski
>Assignee: Krzysztof Chmielewski
>Priority: Critical
>
> Recovery more than one Committable  causes `IllegalStateException` and 
> prevents cluster to start.
> When we recover the `CheckpointCommittableManager` we deserialize 
> SubtaskCommittableManager instances from recovery state and we put them into 
> `Map>`. The key of this map is 
> subtaskId of the recovered manager. However this will fail if we have to 
> recover more than one committable. 
> What w should do is to call `SubtaskCommittableManager::merge` if we already 
> deserialize manager for this subtaskId.
> Stack Trace:
> {code:java}
> 28603 [flink-akka.actor.default-dispatcher-8] INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: Global 
> Committer (1/1) 
> (485dc57aca56235b9d1ab803c8c966ad_47d89856a1cf553f16e7063d953b7d42_0_1) 
> switched from INITIALIZING to FAILED on 2ed5c848-d360-48ae-9a92-730b022c8a39 
> @ kubernetes.docker.internal (dataPort=-1).
> java.lang.IllegalStateException: Duplicate key 0 (attempted merging values 
> org.apache.flink.streaming.runtime.operators.sink.committables.SubtaskCommittableManager@631940ac
>  and 
> org.apache.flink.streaming.runtime.operators.sink.committables.SubtaskCommittableManager@7ff3bd7)
>   at 
> java.util.stream.Collectors.duplicateKeyException(Collectors.java:133) ~[?:?]
>   at 
> java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:180)
>  ~[?:?]
>   at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) 
> ~[?:?]
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
>  ~[?:?]
>   at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) 
> ~[?:?]
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) 
> ~[?:?]
>   at 
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
>   at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578) ~[?:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer$CheckpointSimpleVersionedSerializer.deserialize(CommittableCollectorSerializer.java:153)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer$CheckpointSimpleVersionedSerializer.deserialize(CommittableCollectorSerializer.java:124)
>  ~[classes/:?]
>   at 
> org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeserializeList(SimpleVersionedSerialization.java:148)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserializeV2(CommittableCollectorSerializer.java:105)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserialize(CommittableCollectorSerializer.java:82)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserialize(CommittableCollectorSerializer.java:41)
>  ~[classes/:?]
>   at 
> org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeSerialize(SimpleVersionedSerialization.java:121)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserializeV2(GlobalCommitterSerializer.java:128)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserialize(GlobalCommitterSerializer.java:99)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserialize(GlobalCommitterSerializer.java:42)
>  ~[classes/:?]
>   at 
> org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeSerialize(SimpleVersionedSerialization.java:227)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.operators.util.SimpleVersionedListState$DeserializingIterator.next(SimpleVersionedListState.java:138)
>  ~[classes/:?]
>   at 

[jira] [Commented] (FLINK-29627) Sink - Duplicate key exception during recover more than 1 committable.

2022-10-18 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17619570#comment-17619570
 ] 

Krzysztof Chmielewski commented on FLINK-29627:
---

New PR without SinkItTest
https://github.com/apache/flink/pull/21101

> Sink - Duplicate key exception during recover more than 1 committable.
> --
>
> Key: FLINK-29627
> URL: https://issues.apache.org/jira/browse/FLINK-29627
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.16.0, 1.17.0, 1.15.2, 1.16.1
>Reporter: Krzysztof Chmielewski
>Assignee: Krzysztof Chmielewski
>Priority: Critical
>
> Recovery more than one Committable  causes `IllegalStateException` and 
> prevents cluster to start.
> When we recover the `CheckpointCommittableManager` we deserialize 
> SubtaskCommittableManager instances from recovery state and we put them into 
> `Map>`. The key of this map is 
> subtaskId of the recovered manager. However this will fail if we have to 
> recover more than one committable. 
> What w should do is to call `SubtaskCommittableManager::merge` if we already 
> deserialize manager for this subtaskId.
> Stack Trace:
> {code:java}
> 28603 [flink-akka.actor.default-dispatcher-8] INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: Global 
> Committer (1/1) 
> (485dc57aca56235b9d1ab803c8c966ad_47d89856a1cf553f16e7063d953b7d42_0_1) 
> switched from INITIALIZING to FAILED on 2ed5c848-d360-48ae-9a92-730b022c8a39 
> @ kubernetes.docker.internal (dataPort=-1).
> java.lang.IllegalStateException: Duplicate key 0 (attempted merging values 
> org.apache.flink.streaming.runtime.operators.sink.committables.SubtaskCommittableManager@631940ac
>  and 
> org.apache.flink.streaming.runtime.operators.sink.committables.SubtaskCommittableManager@7ff3bd7)
>   at 
> java.util.stream.Collectors.duplicateKeyException(Collectors.java:133) ~[?:?]
>   at 
> java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:180)
>  ~[?:?]
>   at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) 
> ~[?:?]
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
>  ~[?:?]
>   at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) 
> ~[?:?]
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) 
> ~[?:?]
>   at 
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
>   at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578) ~[?:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer$CheckpointSimpleVersionedSerializer.deserialize(CommittableCollectorSerializer.java:153)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer$CheckpointSimpleVersionedSerializer.deserialize(CommittableCollectorSerializer.java:124)
>  ~[classes/:?]
>   at 
> org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeserializeList(SimpleVersionedSerialization.java:148)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserializeV2(CommittableCollectorSerializer.java:105)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserialize(CommittableCollectorSerializer.java:82)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserialize(CommittableCollectorSerializer.java:41)
>  ~[classes/:?]
>   at 
> org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeSerialize(SimpleVersionedSerialization.java:121)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserializeV2(GlobalCommitterSerializer.java:128)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserialize(GlobalCommitterSerializer.java:99)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserialize(GlobalCommitterSerializer.java:42)
>  ~[classes/:?]
>   at 
> org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeSerialize(SimpleVersionedSerialization.java:227)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.operators.util.SimpleVersionedListState$DeserializingIterator.next(SimpleVersionedListState.java:138)
>  ~[classes/:?]
>   at java.lang.Iterable.forEach(Iterable.java:74) ~[?:?]
>   at 
> 

[jira] [Comment Edited] (FLINK-29627) Sink - Duplicate key exception during recover more than 1 committable.

2022-10-14 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617611#comment-17617611
 ] 

Krzysztof Chmielewski edited comment on FLINK-29627 at 10/14/22 9:54 AM:
-

PR ready:
https://github.com/apache/flink/pull/21052


Pending on https://github.com/apache/flink/pull/21022 to be merged.


was (Author: kristoffsc):
PR ready but waiting on https://github.com/apache/flink/pull/21022 to be merged.
https://github.com/apache/flink/pull/21052

> Sink - Duplicate key exception during recover more than 1 committable.
> --
>
> Key: FLINK-29627
> URL: https://issues.apache.org/jira/browse/FLINK-29627
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.16.0, 1.17.0, 1.15.2, 1.16.1
>Reporter: Krzysztof Chmielewski
>Assignee: Krzysztof Chmielewski
>Priority: Critical
>
> Recovery more than one Committable  causes `IllegalStateException` and 
> prevents cluster to start.
> When we recover the `CheckpointCommittableManager` we deserialize 
> SubtaskCommittableManager instances from recovery state and we put them into 
> `Map>`. The key of this map is 
> subtaskId of the recovered manager. However this will fail if we have to 
> recover more than one committable. 
> What w should do is to call `SubtaskCommittableManager::merge` if we already 
> deserialize manager for this subtaskId.
> Stack Trace:
> {code:java}
> 28603 [flink-akka.actor.default-dispatcher-8] INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: Global 
> Committer (1/1) 
> (485dc57aca56235b9d1ab803c8c966ad_47d89856a1cf553f16e7063d953b7d42_0_1) 
> switched from INITIALIZING to FAILED on 2ed5c848-d360-48ae-9a92-730b022c8a39 
> @ kubernetes.docker.internal (dataPort=-1).
> java.lang.IllegalStateException: Duplicate key 0 (attempted merging values 
> org.apache.flink.streaming.runtime.operators.sink.committables.SubtaskCommittableManager@631940ac
>  and 
> org.apache.flink.streaming.runtime.operators.sink.committables.SubtaskCommittableManager@7ff3bd7)
>   at 
> java.util.stream.Collectors.duplicateKeyException(Collectors.java:133) ~[?:?]
>   at 
> java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:180)
>  ~[?:?]
>   at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) 
> ~[?:?]
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
>  ~[?:?]
>   at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) 
> ~[?:?]
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) 
> ~[?:?]
>   at 
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
>   at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578) ~[?:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer$CheckpointSimpleVersionedSerializer.deserialize(CommittableCollectorSerializer.java:153)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer$CheckpointSimpleVersionedSerializer.deserialize(CommittableCollectorSerializer.java:124)
>  ~[classes/:?]
>   at 
> org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeserializeList(SimpleVersionedSerialization.java:148)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserializeV2(CommittableCollectorSerializer.java:105)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserialize(CommittableCollectorSerializer.java:82)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserialize(CommittableCollectorSerializer.java:41)
>  ~[classes/:?]
>   at 
> org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeSerialize(SimpleVersionedSerialization.java:121)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserializeV2(GlobalCommitterSerializer.java:128)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserialize(GlobalCommitterSerializer.java:99)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserialize(GlobalCommitterSerializer.java:42)
>  ~[classes/:?]
>   at 
> org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeSerialize(SimpleVersionedSerialization.java:227)
>  ~[classes/:?]
> 

[jira] [Comment Edited] (FLINK-29627) Sink - Duplicate key exception during recover more than 1 committable.

2022-10-14 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617611#comment-17617611
 ] 

Krzysztof Chmielewski edited comment on FLINK-29627 at 10/14/22 9:54 AM:
-

PR ready but waiting on https://github.com/apache/flink/pull/21022 to be merged.
https://github.com/apache/flink/pull/21052


was (Author: kristoffsc):
PR ready but waiting on #21022 to be merged.
https://github.com/apache/flink/pull/21052

> Sink - Duplicate key exception during recover more than 1 committable.
> --
>
> Key: FLINK-29627
> URL: https://issues.apache.org/jira/browse/FLINK-29627
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.16.0, 1.17.0, 1.15.2, 1.16.1
>Reporter: Krzysztof Chmielewski
>Assignee: Krzysztof Chmielewski
>Priority: Critical
>
> Recovery more than one Committable  causes `IllegalStateException` and 
> prevents cluster to start.
> When we recover the `CheckpointCommittableManager` we deserialize 
> SubtaskCommittableManager instances from recovery state and we put them into 
> `Map>`. The key of this map is 
> subtaskId of the recovered manager. However this will fail if we have to 
> recover more than one committable. 
> What w should do is to call `SubtaskCommittableManager::merge` if we already 
> deserialize manager for this subtaskId.
> Stack Trace:
> {code:java}
> 28603 [flink-akka.actor.default-dispatcher-8] INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: Global 
> Committer (1/1) 
> (485dc57aca56235b9d1ab803c8c966ad_47d89856a1cf553f16e7063d953b7d42_0_1) 
> switched from INITIALIZING to FAILED on 2ed5c848-d360-48ae-9a92-730b022c8a39 
> @ kubernetes.docker.internal (dataPort=-1).
> java.lang.IllegalStateException: Duplicate key 0 (attempted merging values 
> org.apache.flink.streaming.runtime.operators.sink.committables.SubtaskCommittableManager@631940ac
>  and 
> org.apache.flink.streaming.runtime.operators.sink.committables.SubtaskCommittableManager@7ff3bd7)
>   at 
> java.util.stream.Collectors.duplicateKeyException(Collectors.java:133) ~[?:?]
>   at 
> java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:180)
>  ~[?:?]
>   at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) 
> ~[?:?]
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
>  ~[?:?]
>   at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) 
> ~[?:?]
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) 
> ~[?:?]
>   at 
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
>   at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578) ~[?:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer$CheckpointSimpleVersionedSerializer.deserialize(CommittableCollectorSerializer.java:153)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer$CheckpointSimpleVersionedSerializer.deserialize(CommittableCollectorSerializer.java:124)
>  ~[classes/:?]
>   at 
> org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeserializeList(SimpleVersionedSerialization.java:148)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserializeV2(CommittableCollectorSerializer.java:105)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserialize(CommittableCollectorSerializer.java:82)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserialize(CommittableCollectorSerializer.java:41)
>  ~[classes/:?]
>   at 
> org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeSerialize(SimpleVersionedSerialization.java:121)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserializeV2(GlobalCommitterSerializer.java:128)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserialize(GlobalCommitterSerializer.java:99)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserialize(GlobalCommitterSerializer.java:42)
>  ~[classes/:?]
>   at 
> org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeSerialize(SimpleVersionedSerialization.java:227)
>  ~[classes/:?]
>   at 
> 

[jira] [Comment Edited] (FLINK-29627) Sink - Duplicate key exception during recover more than 1 committable.

2022-10-14 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617611#comment-17617611
 ] 

Krzysztof Chmielewski edited comment on FLINK-29627 at 10/14/22 9:54 AM:
-

PR ready:
https://github.com/apache/flink/pull/21052


Pending on https://github.com/apache/flink/pull/21022.


was (Author: kristoffsc):
PR ready:
https://github.com/apache/flink/pull/21052


Pending on https://github.com/apache/flink/pull/21022 to be merged.

> Sink - Duplicate key exception during recover more than 1 committable.
> --
>
> Key: FLINK-29627
> URL: https://issues.apache.org/jira/browse/FLINK-29627
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.16.0, 1.17.0, 1.15.2, 1.16.1
>Reporter: Krzysztof Chmielewski
>Assignee: Krzysztof Chmielewski
>Priority: Critical
>
> Recovery more than one Committable  causes `IllegalStateException` and 
> prevents cluster to start.
> When we recover the `CheckpointCommittableManager` we deserialize 
> SubtaskCommittableManager instances from recovery state and we put them into 
> `Map>`. The key of this map is 
> subtaskId of the recovered manager. However this will fail if we have to 
> recover more than one committable. 
> What w should do is to call `SubtaskCommittableManager::merge` if we already 
> deserialize manager for this subtaskId.
> Stack Trace:
> {code:java}
> 28603 [flink-akka.actor.default-dispatcher-8] INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: Global 
> Committer (1/1) 
> (485dc57aca56235b9d1ab803c8c966ad_47d89856a1cf553f16e7063d953b7d42_0_1) 
> switched from INITIALIZING to FAILED on 2ed5c848-d360-48ae-9a92-730b022c8a39 
> @ kubernetes.docker.internal (dataPort=-1).
> java.lang.IllegalStateException: Duplicate key 0 (attempted merging values 
> org.apache.flink.streaming.runtime.operators.sink.committables.SubtaskCommittableManager@631940ac
>  and 
> org.apache.flink.streaming.runtime.operators.sink.committables.SubtaskCommittableManager@7ff3bd7)
>   at 
> java.util.stream.Collectors.duplicateKeyException(Collectors.java:133) ~[?:?]
>   at 
> java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:180)
>  ~[?:?]
>   at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) 
> ~[?:?]
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
>  ~[?:?]
>   at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) 
> ~[?:?]
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) 
> ~[?:?]
>   at 
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
>   at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578) ~[?:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer$CheckpointSimpleVersionedSerializer.deserialize(CommittableCollectorSerializer.java:153)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer$CheckpointSimpleVersionedSerializer.deserialize(CommittableCollectorSerializer.java:124)
>  ~[classes/:?]
>   at 
> org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeserializeList(SimpleVersionedSerialization.java:148)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserializeV2(CommittableCollectorSerializer.java:105)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserialize(CommittableCollectorSerializer.java:82)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserialize(CommittableCollectorSerializer.java:41)
>  ~[classes/:?]
>   at 
> org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeSerialize(SimpleVersionedSerialization.java:121)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserializeV2(GlobalCommitterSerializer.java:128)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserialize(GlobalCommitterSerializer.java:99)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserialize(GlobalCommitterSerializer.java:42)
>  ~[classes/:?]
>   at 
> org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeSerialize(SimpleVersionedSerialization.java:227)
>  ~[classes/:?]
>   at 
> 

[jira] [Comment Edited] (FLINK-29627) Sink - Duplicate key exception during recover more than 1 committable.

2022-10-14 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617611#comment-17617611
 ] 

Krzysztof Chmielewski edited comment on FLINK-29627 at 10/14/22 9:53 AM:
-

PR ready but waiting on #21022 to be merged.
https://github.com/apache/flink/pull/21052


was (Author: kristoffsc):
PR
https://github.com/apache/flink/pull/21052

> Sink - Duplicate key exception during recover more than 1 committable.
> --
>
> Key: FLINK-29627
> URL: https://issues.apache.org/jira/browse/FLINK-29627
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.16.0, 1.17.0, 1.15.2, 1.16.1
>Reporter: Krzysztof Chmielewski
>Assignee: Krzysztof Chmielewski
>Priority: Critical
>
> Recovery more than one Committable  causes `IllegalStateException` and 
> prevents cluster to start.
> When we recover the `CheckpointCommittableManager` we deserialize 
> SubtaskCommittableManager instances from recovery state and we put them into 
> `Map>`. The key of this map is 
> subtaskId of the recovered manager. However this will fail if we have to 
> recover more than one committable. 
> What w should do is to call `SubtaskCommittableManager::merge` if we already 
> deserialize manager for this subtaskId.
> Stack Trace:
> {code:java}
> 28603 [flink-akka.actor.default-dispatcher-8] INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: Global 
> Committer (1/1) 
> (485dc57aca56235b9d1ab803c8c966ad_47d89856a1cf553f16e7063d953b7d42_0_1) 
> switched from INITIALIZING to FAILED on 2ed5c848-d360-48ae-9a92-730b022c8a39 
> @ kubernetes.docker.internal (dataPort=-1).
> java.lang.IllegalStateException: Duplicate key 0 (attempted merging values 
> org.apache.flink.streaming.runtime.operators.sink.committables.SubtaskCommittableManager@631940ac
>  and 
> org.apache.flink.streaming.runtime.operators.sink.committables.SubtaskCommittableManager@7ff3bd7)
>   at 
> java.util.stream.Collectors.duplicateKeyException(Collectors.java:133) ~[?:?]
>   at 
> java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:180)
>  ~[?:?]
>   at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) 
> ~[?:?]
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
>  ~[?:?]
>   at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) 
> ~[?:?]
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) 
> ~[?:?]
>   at 
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
>   at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578) ~[?:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer$CheckpointSimpleVersionedSerializer.deserialize(CommittableCollectorSerializer.java:153)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer$CheckpointSimpleVersionedSerializer.deserialize(CommittableCollectorSerializer.java:124)
>  ~[classes/:?]
>   at 
> org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeserializeList(SimpleVersionedSerialization.java:148)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserializeV2(CommittableCollectorSerializer.java:105)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserialize(CommittableCollectorSerializer.java:82)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserialize(CommittableCollectorSerializer.java:41)
>  ~[classes/:?]
>   at 
> org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeSerialize(SimpleVersionedSerialization.java:121)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserializeV2(GlobalCommitterSerializer.java:128)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserialize(GlobalCommitterSerializer.java:99)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserialize(GlobalCommitterSerializer.java:42)
>  ~[classes/:?]
>   at 
> org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeSerialize(SimpleVersionedSerialization.java:227)
>  ~[classes/:?]
>   at 
> 

[jira] [Commented] (FLINK-29627) Sink - Duplicate key exception during recover more than 1 committable.

2022-10-14 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617611#comment-17617611
 ] 

Krzysztof Chmielewski commented on FLINK-29627:
---

PR
https://github.com/apache/flink/pull/21052

> Sink - Duplicate key exception during recover more than 1 committable.
> --
>
> Key: FLINK-29627
> URL: https://issues.apache.org/jira/browse/FLINK-29627
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.16.0, 1.17.0, 1.15.2, 1.16.1
>Reporter: Krzysztof Chmielewski
>Assignee: Krzysztof Chmielewski
>Priority: Critical
>
> Recovering more than one committable causes an `IllegalStateException` and 
> prevents the cluster from starting.
> When we recover the `CheckpointCommittableManager` we deserialize 
> `SubtaskCommittableManager` instances from the recovered state and put them 
> into a map keyed by the subtaskId of the recovered manager. However, this 
> will fail if we have to recover more than one committable.
> What we should do is call `SubtaskCommittableManager::merge` if we have 
> already deserialized a manager for this subtaskId.
> Stack Trace:
> {code:java}
> 28603 [flink-akka.actor.default-dispatcher-8] INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: Global 
> Committer (1/1) 
> (485dc57aca56235b9d1ab803c8c966ad_47d89856a1cf553f16e7063d953b7d42_0_1) 
> switched from INITIALIZING to FAILED on 2ed5c848-d360-48ae-9a92-730b022c8a39 
> @ kubernetes.docker.internal (dataPort=-1).
> java.lang.IllegalStateException: Duplicate key 0 (attempted merging values 
> org.apache.flink.streaming.runtime.operators.sink.committables.SubtaskCommittableManager@631940ac
>  and 
> org.apache.flink.streaming.runtime.operators.sink.committables.SubtaskCommittableManager@7ff3bd7)
>   at 
> java.util.stream.Collectors.duplicateKeyException(Collectors.java:133) ~[?:?]
>   at 
> java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:180)
>  ~[?:?]
>   at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) 
> ~[?:?]
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
>  ~[?:?]
>   at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) 
> ~[?:?]
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) 
> ~[?:?]
>   at 
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
>   at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578) ~[?:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer$CheckpointSimpleVersionedSerializer.deserialize(CommittableCollectorSerializer.java:153)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer$CheckpointSimpleVersionedSerializer.deserialize(CommittableCollectorSerializer.java:124)
>  ~[classes/:?]
>   at 
> org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeserializeList(SimpleVersionedSerialization.java:148)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserializeV2(CommittableCollectorSerializer.java:105)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserialize(CommittableCollectorSerializer.java:82)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserialize(CommittableCollectorSerializer.java:41)
>  ~[classes/:?]
>   at 
> org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeSerialize(SimpleVersionedSerialization.java:121)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserializeV2(GlobalCommitterSerializer.java:128)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserialize(GlobalCommitterSerializer.java:99)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserialize(GlobalCommitterSerializer.java:42)
>  ~[classes/:?]
>   at 
> org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeSerialize(SimpleVersionedSerialization.java:227)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.operators.util.SimpleVersionedListState$DeserializingIterator.next(SimpleVersionedListState.java:138)
>  ~[classes/:?]
>   at java.lang.Iterable.forEach(Iterable.java:74) ~[?:?]
>   at 
> 

[jira] [Updated] (FLINK-29627) Sink - Duplicate key exception during recover more than 1 committable.

2022-10-13 Thread Krzysztof Chmielewski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krzysztof Chmielewski updated FLINK-29627:
--
Description: 
Recovering more than one committable causes an `IllegalStateException` and 
prevents the cluster from starting.

When we recover the `CheckpointCommittableManager` we deserialize 
`SubtaskCommittableManager` instances from the recovered state and put them 
into a map keyed by the subtaskId of the recovered manager. However, this will 
fail if we have to recover more than one committable.

What we should do is call `SubtaskCommittableManager::merge` if we have already 
deserialized a manager for this subtaskId.
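
Below is a minimal, self-contained sketch of the proposed fix (illustrative 
names only, not the actual Flink internals): when collecting the recovered 
managers into a map keyed by subtaskId, supply a merge function instead of 
failing on the duplicate key.

{code:java}
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MergeOnRecoverySketch {

    // Simplified stand-in for SubtaskCommittableManager; illustrative only.
    static class Manager {
        final int subtaskId;
        final int pendingCommittables;

        Manager(int subtaskId, int pendingCommittables) {
            this.subtaskId = subtaskId;
            this.pendingCommittables = pendingCommittables;
        }

        // Merge two recovered managers that belong to the same subtaskId.
        Manager merge(Manager other) {
            return new Manager(subtaskId, pendingCommittables + other.pendingCommittables);
        }
    }

    public static void main(String[] args) {
        // Two recovered managers share subtaskId 0, as in the reported failure.
        List<Manager> recovered = List.of(new Manager(0, 20), new Manager(0, 20));

        // Collectors.toMap without a merge function throws
        // "IllegalStateException: Duplicate key 0" here; passing Manager::merge
        // as the third argument merges the two managers instead.
        Map<Integer, Manager> bySubtaskId = recovered.stream()
                .collect(Collectors.toMap(m -> m.subtaskId, m -> m, Manager::merge));

        System.out.println(bySubtaskId.get(0).pendingCommittables); // prints 40
    }
}
{code}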


Stack Trace:
{code:java}
28603 [flink-akka.actor.default-dispatcher-8] INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: Global 
Committer (1/1) 
(485dc57aca56235b9d1ab803c8c966ad_47d89856a1cf553f16e7063d953b7d42_0_1) 
switched from INITIALIZING to FAILED on 2ed5c848-d360-48ae-9a92-730b022c8a39 @ 
kubernetes.docker.internal (dataPort=-1).
java.lang.IllegalStateException: Duplicate key 0 (attempted merging values 
org.apache.flink.streaming.runtime.operators.sink.committables.SubtaskCommittableManager@631940ac
 and 
org.apache.flink.streaming.runtime.operators.sink.committables.SubtaskCommittableManager@7ff3bd7)
at 
java.util.stream.Collectors.duplicateKeyException(Collectors.java:133) ~[?:?]
at 
java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:180)
 ~[?:?]
at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) 
~[?:?]
at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655) 
~[?:?]
at 
java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) 
~[?:?]
at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) 
~[?:?]
at 
java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
at 
java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578) ~[?:?]
at 
org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer$CheckpointSimpleVersionedSerializer.deserialize(CommittableCollectorSerializer.java:153)
 ~[classes/:?]
at 
org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer$CheckpointSimpleVersionedSerializer.deserialize(CommittableCollectorSerializer.java:124)
 ~[classes/:?]
at 
org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeserializeList(SimpleVersionedSerialization.java:148)
 ~[classes/:?]
at 
org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserializeV2(CommittableCollectorSerializer.java:105)
 ~[classes/:?]
at 
org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserialize(CommittableCollectorSerializer.java:82)
 ~[classes/:?]
at 
org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserialize(CommittableCollectorSerializer.java:41)
 ~[classes/:?]
at 
org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeSerialize(SimpleVersionedSerialization.java:121)
 ~[classes/:?]
at 
org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserializeV2(GlobalCommitterSerializer.java:128)
 ~[classes/:?]
at 
org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserialize(GlobalCommitterSerializer.java:99)
 ~[classes/:?]
at 
org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserialize(GlobalCommitterSerializer.java:42)
 ~[classes/:?]
at 
org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeSerialize(SimpleVersionedSerialization.java:227)
 ~[classes/:?]
at 
org.apache.flink.streaming.api.operators.util.SimpleVersionedListState$DeserializingIterator.next(SimpleVersionedListState.java:138)
 ~[classes/:?]
at java.lang.Iterable.forEach(Iterable.java:74) ~[?:?]
at 
org.apache.flink.streaming.api.connector.sink2.GlobalCommitterOperator.initializeState(GlobalCommitterOperator.java:133)
 ~[classes/:?]
at 
org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.initializeOperatorState(StreamOperatorStateHandler.java:122)
 ~[classes/:?]
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:286)
 ~[classes/:?]
at 
org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
 ~[classes/:?]
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:727)
 ~[classes/:?]
at 

[jira] [Updated] (FLINK-29627) Sink - Duplicate key exception during recover more than 1 committable.

2022-10-13 Thread Krzysztof Chmielewski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krzysztof Chmielewski updated FLINK-29627:
--
Description: 
Recovering more than one committable causes an `IllegalStateException` and 
prevents the cluster from starting.

When we recover the `CheckpointCommittableManager` we deserialize 
`SubtaskCommittableManager` instances from the recovered state and put them 
into a map keyed by the subtaskId of the recovered manager. However, this will 
fail if we have to recover more than one committable.

What we should do is call `SubtaskCommittableManager::merge` if we have already 
deserialized a manager for this subtaskId.


Stack Trace:
{code:java}
28603 [flink-akka.actor.default-dispatcher-8] INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: Global 
Committer (1/1) 
(485dc57aca56235b9d1ab803c8c966ad_47d89856a1cf553f16e7063d953b7d42_0_1) 
switched from INITIALIZING to FAILED on 2ed5c848-d360-48ae-9a92-730b022c8a39 @ 
kubernetes.docker.internal (dataPort=-1).
java.lang.IllegalStateException: Duplicate key 0 (attempted merging values 
org.apache.flink.streaming.runtime.operators.sink.committables.SubtaskCommittableManager@631940ac
 and 
org.apache.flink.streaming.runtime.operators.sink.committables.SubtaskCommittableManager@7ff3bd7)
at 
java.util.stream.Collectors.duplicateKeyException(Collectors.java:133) ~[?:?]
at 
java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:180)
 ~[?:?]
at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) 
~[?:?]
at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655) 
~[?:?]
at 
java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) 
~[?:?]
at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) 
~[?:?]
at 
java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
at 
java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578) ~[?:?]
at 
org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer$CheckpointSimpleVersionedSerializer.deserialize(CommittableCollectorSerializer.java:153)
 ~[classes/:?]
at 
org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer$CheckpointSimpleVersionedSerializer.deserialize(CommittableCollectorSerializer.java:124)
 ~[classes/:?]
at 
org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeserializeList(SimpleVersionedSerialization.java:148)
 ~[classes/:?]
at 
org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserializeV2(CommittableCollectorSerializer.java:105)
 ~[classes/:?]
at 
org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserialize(CommittableCollectorSerializer.java:82)
 ~[classes/:?]
at 
org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserialize(CommittableCollectorSerializer.java:41)
 ~[classes/:?]
at 
org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeSerialize(SimpleVersionedSerialization.java:121)
 ~[classes/:?]
at 
org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserializeV2(GlobalCommitterSerializer.java:128)
 ~[classes/:?]
at 
org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserialize(GlobalCommitterSerializer.java:99)
 ~[classes/:?]
at 
org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserialize(GlobalCommitterSerializer.java:42)
 ~[classes/:?]
at 
org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeSerialize(SimpleVersionedSerialization.java:227)
 ~[classes/:?]
at 
org.apache.flink.streaming.api.operators.util.SimpleVersionedListState$DeserializingIterator.next(SimpleVersionedListState.java:138)
 ~[classes/:?]
at java.lang.Iterable.forEach(Iterable.java:74) ~[?:?]
at 
org.apache.flink.streaming.api.connector.sink2.GlobalCommitterOperator.initializeState(GlobalCommitterOperator.java:133)
 ~[classes/:?]
at 
org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.initializeOperatorState(StreamOperatorStateHandler.java:122)
 ~[classes/:?]
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:286)
 ~[classes/:?]
at 
org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
 ~[classes/:?]
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:727)
 ~[classes/:?]
at 

[jira] [Commented] (FLINK-29627) Sink - Duplicate key exception during recover more than 1 committable.

2022-10-13 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17616925#comment-17616925
 ] 

Krzysztof Chmielewski commented on FLINK-29627:
---

I have a fix and tests for this issue.
Will provide a PR shortly.

> Sink - Duplicate key exception during recover more than 1 committable.
> --
>
> Key: FLINK-29627
> URL: https://issues.apache.org/jira/browse/FLINK-29627
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.16.0, 1.17.0, 1.15.2, 1.16.1
>Reporter: Krzysztof Chmielewski
>Priority: Critical
>
> Recovering more than one committable causes an `IllegalStateException` and 
> prevents the cluster from starting.
> When we recover the `CheckpointCommittableManager` we deserialize 
> `SubtaskCommittableManager` instances from the recovered state and put them 
> into a map keyed by the subtaskId of the recovered manager. However, this 
> will fail if we have to recover more than one committable.
> What we should do is call `SubtaskCommittableManager::merge` if we have 
> already deserialized a manager for this subtaskId.
> Stack Trace:
> {code:java}
> 28603 [flink-akka.actor.default-dispatcher-8] INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: Global 
> Committer (1/1) 
> (485dc57aca56235b9d1ab803c8c966ad_47d89856a1cf553f16e7063d953b7d42_0_1) 
> switched from INITIALIZING to FAILED on 2ed5c848-d360-48ae-9a92-730b022c8a39 
> @ kubernetes.docker.internal (dataPort=-1).
> java.lang.IllegalStateException: Duplicate key 0 (attempted merging values 
> org.apache.flink.streaming.runtime.operators.sink.committables.SubtaskCommittableManager@631940ac
>  and 
> org.apache.flink.streaming.runtime.operators.sink.committables.SubtaskCommittableManager@7ff3bd7)
>   at 
> java.util.stream.Collectors.duplicateKeyException(Collectors.java:133) ~[?:?]
>   at 
> java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:180)
>  ~[?:?]
>   at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) 
> ~[?:?]
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
>  ~[?:?]
>   at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) 
> ~[?:?]
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) 
> ~[?:?]
>   at 
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
>   at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578) ~[?:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer$CheckpointSimpleVersionedSerializer.deserialize(CommittableCollectorSerializer.java:153)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer$CheckpointSimpleVersionedSerializer.deserialize(CommittableCollectorSerializer.java:124)
>  ~[classes/:?]
>   at 
> org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeserializeList(SimpleVersionedSerialization.java:148)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserializeV2(CommittableCollectorSerializer.java:105)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserialize(CommittableCollectorSerializer.java:82)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserialize(CommittableCollectorSerializer.java:41)
>  ~[classes/:?]
>   at 
> org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeSerialize(SimpleVersionedSerialization.java:121)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserializeV2(GlobalCommitterSerializer.java:128)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserialize(GlobalCommitterSerializer.java:99)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserialize(GlobalCommitterSerializer.java:42)
>  ~[classes/:?]
>   at 
> org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeSerialize(SimpleVersionedSerialization.java:227)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.api.operators.util.SimpleVersionedListState$DeserializingIterator.next(SimpleVersionedListState.java:138)
>  ~[classes/:?]
>   at java.lang.Iterable.forEach(Iterable.java:74) ~[?:?]
>   at 
> 

[jira] [Created] (FLINK-29627) Sink - Duplicate key exception during recover more than 1 committable.

2022-10-13 Thread Krzysztof Chmielewski (Jira)
Krzysztof Chmielewski created FLINK-29627:
-

 Summary: Sink - Duplicate key exception during recover more than 1 
committable.
 Key: FLINK-29627
 URL: https://issues.apache.org/jira/browse/FLINK-29627
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.15.2, 1.16.0, 1.17.0, 1.16.1
Reporter: Krzysztof Chmielewski


Recovering more than one committable causes an `IllegalStateException` and 
prevents the cluster from starting.

When we recover the `CheckpointCommittableManager` we deserialize 
`SubtaskCommittableManager` instances from the recovered state and put them 
into a map keyed by the subtaskId of the recovered manager. However, this will 
fail if we have to recover more than one committable.

What we should do is call `SubtaskCommittableManager::merge` if we have already 
deserialized a manager for this subtaskId.


Stack Trace:
{code:java}
28603 [flink-akka.actor.default-dispatcher-8] INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: Global 
Committer (1/1) 
(485dc57aca56235b9d1ab803c8c966ad_47d89856a1cf553f16e7063d953b7d42_0_1) 
switched from INITIALIZING to FAILED on 2ed5c848-d360-48ae-9a92-730b022c8a39 @ 
kubernetes.docker.internal (dataPort=-1).
java.lang.IllegalStateException: Duplicate key 0 (attempted merging values 
org.apache.flink.streaming.runtime.operators.sink.committables.SubtaskCommittableManager@631940ac
 and 
org.apache.flink.streaming.runtime.operators.sink.committables.SubtaskCommittableManager@7ff3bd7)
at 
java.util.stream.Collectors.duplicateKeyException(Collectors.java:133) ~[?:?]
at 
java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:180)
 ~[?:?]
at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) 
~[?:?]
at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655) 
~[?:?]
at 
java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) 
~[?:?]
at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) 
~[?:?]
at 
java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
at 
java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578) ~[?:?]
at 
org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer$CheckpointSimpleVersionedSerializer.deserialize(CommittableCollectorSerializer.java:153)
 ~[classes/:?]
at 
org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer$CheckpointSimpleVersionedSerializer.deserialize(CommittableCollectorSerializer.java:124)
 ~[classes/:?]
at 
org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeserializeList(SimpleVersionedSerialization.java:148)
 ~[classes/:?]
at 
org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserializeV2(CommittableCollectorSerializer.java:105)
 ~[classes/:?]
at 
org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserialize(CommittableCollectorSerializer.java:82)
 ~[classes/:?]
at 
org.apache.flink.streaming.runtime.operators.sink.committables.CommittableCollectorSerializer.deserialize(CommittableCollectorSerializer.java:41)
 ~[classes/:?]
at 
org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeSerialize(SimpleVersionedSerialization.java:121)
 ~[classes/:?]
at 
org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserializeV2(GlobalCommitterSerializer.java:128)
 ~[classes/:?]
at 
org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserialize(GlobalCommitterSerializer.java:99)
 ~[classes/:?]
at 
org.apache.flink.streaming.api.connector.sink2.GlobalCommitterSerializer.deserialize(GlobalCommitterSerializer.java:42)
 ~[classes/:?]
at 
org.apache.flink.core.io.SimpleVersionedSerialization.readVersionAndDeSerialize(SimpleVersionedSerialization.java:227)
 ~[classes/:?]
at 
org.apache.flink.streaming.api.operators.util.SimpleVersionedListState$DeserializingIterator.next(SimpleVersionedListState.java:138)
 ~[classes/:?]
at java.lang.Iterable.forEach(Iterable.java:74) ~[?:?]
at 
org.apache.flink.streaming.api.connector.sink2.GlobalCommitterOperator.initializeState(GlobalCommitterOperator.java:133)
 ~[classes/:?]
at 
org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.initializeOperatorState(StreamOperatorStateHandler.java:122)
 ~[classes/:?]
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:286)
 ~[classes/:?]
at 

[jira] [Commented] (FLINK-29589) Data Loss in Sink GlobalCommitter during Task Manager recovery

2022-10-12 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17616307#comment-17616307
 ] 

Krzysztof Chmielewski commented on FLINK-29589:
---

Hi [~chesnay]

V2 on 1.15, 1.16 and 1.17 has its own issues that we have found, and we are 
currently working with Fabian Paul to fix them:

https://issues.apache.org/jira/browse/FLINK-29509
https://issues.apache.org/jira/browse/FLINK-29583
https://issues.apache.org/jira/browse/FLINK-29512

With those still on the plate we can't really tell whether there is data loss 
on V2, since the Task Manager fails to start during recovery when running a 
Sink with a global committer.




> Data Loss in Sink GlobalCommitter during Task Manager recovery
> --
>
> Key: FLINK-29589
> URL: https://issues.apache.org/jira/browse/FLINK-29589
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Krzysztof Chmielewski
>Priority: Blocker
>
> Flink's Sink architecture with global committer seems to be vulnerable to 
> data loss during Task Manager recovery. An entire checkpoint can be lost by 
> the _GlobalCommitter_, resulting in data loss.
> The issue was observed in the Delta Sink connector on a real 1.14.x cluster 
> and was replicated using Flink's 1.14.6 Test Utils classes.
> Scenario:
>  # Streaming source emitting a constant number of events per checkpoint (20 
> events per commit for 5 commits in total, which gives 100 records).
>  # Sink with parallelism > 1 with committer and _GlobalCommitter_ elements.
>  # _Committers_ processed committables for *checkpointId 2*.
>  # _GlobalCommitter_ throws an exception (the desired exception) during 
> *checkpointId 2* (the third commit) while processing data from *checkpoint 1* 
> (the global committer architecture is expected to lag one commit behind the 
> rest of the pipeline).
>  # Task Manager recovery, source resumes sending data.
>  # Streaming source ends.
>  # We are missing 20 records (one checkpoint).
> What is happening is that during recovery, committers perform a "retry" on 
> the committables for *checkpointId 2*, however those committables, 
> reprocessed by the "retry" task, are never emitted downstream to the global 
> committer.
> The issue can be reproduced using a JUnit test built with Flink's TestSink.
> The test was [implemented 
> here|https://github.com/kristoffSC/flink/blob/Flink_1.14_DataLoss_SinkGlobalCommitter/flink-tests/src/test/java/org/apache/flink/test/streaming/runtime/SinkITCase.java#:~:text=testGlobalCommitterMissingRecordsDuringRecovery]
>  and it is based on other tests from the `SinkITCase.java` class.
> The test reproduces the issue in more than 90% of runs.
> I believe the problem is somewhere around the 
> *SinkOperator::notifyCheckpointComplete* method. There we see that the retry 
> async task is scheduled, however its result is never emitted downstream as it 
> is for the regular flow one line above.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-29589) Data Loss in Sink GlobalCommitter during Task Manager recovery

2022-10-11 Thread Krzysztof Chmielewski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krzysztof Chmielewski updated FLINK-29589:
--
Description: 
Flink's Sink architecture with global committer seems to be vulnerable to data 
loss during Task Manager recovery. An entire checkpoint can be lost by the 
_GlobalCommitter_, resulting in data loss.

The issue was observed in the Delta Sink connector on a real 1.14.x cluster and 
was replicated using Flink's 1.14.6 Test Utils classes.

Scenario:
 # Streaming source emitting a constant number of events per checkpoint (20 
events per commit for 5 commits in total, which gives 100 records).
 # Sink with parallelism > 1 with committer and _GlobalCommitter_ elements.
 # _Committers_ processed committables for *checkpointId 2*.
 # _GlobalCommitter_ throws an exception (the desired exception) during 
*checkpointId 2* (the third commit) while processing data from *checkpoint 1* 
(the global committer architecture is expected to lag one commit behind the 
rest of the pipeline).
 # Task Manager recovery, source resumes sending data.
 # Streaming source ends.
 # We are missing 20 records (one checkpoint).

What is happening is that during recovery, committers perform a "retry" on the 
committables for *checkpointId 2*, however those committables, reprocessed by 
the "retry" task, are never emitted downstream to the global committer.

The issue can be reproduced using a JUnit test built with Flink's TestSink.
The test was [implemented 
here|https://github.com/kristoffSC/flink/blob/Flink_1.14_DataLoss_SinkGlobalCommitter/flink-tests/src/test/java/org/apache/flink/test/streaming/runtime/SinkITCase.java#:~:text=testGlobalCommitterMissingRecordsDuringRecovery]
 and it is based on other tests from the `SinkITCase.java` class.
The test reproduces the issue in more than 90% of runs.

I believe the problem is somewhere around the 
*SinkOperator::notifyCheckpointComplete* method. There we see that the retry 
async task is scheduled, however its result is never emitted downstream as it 
is for the regular flow one line above.
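
As an editorial illustration of the suspected behaviour (hypothetical method 
names, not the actual Flink 1.14 source), the toy model below shows a commit 
step that forwards its results downstream on the regular path but drops them on 
the retry path, which is exactly how a recovered checkpoint could go missing:

{code:java}
import java.util.ArrayList;
import java.util.List;

public class RetryEmitSketch {

    // Stand-in for the stream of committables that reaches the GlobalCommitter.
    static final List<String> globalCommitterInput = new ArrayList<>();

    // Regular flow: commit the committables and emit them downstream.
    static void commitAndEmit(List<String> committables) {
        globalCommitterInput.addAll(committables);
    }

    // Retry flow as described in the report: the committables are re-processed
    // after recovery, but the downstream emit is missing, so the
    // GlobalCommitter never receives them.
    static void retryCommit(List<String> committables) {
        // commit happens here, but note the absence of
        // globalCommitterInput.addAll(committables);
    }

    public static void main(String[] args) {
        commitAndEmit(List.of("committables of checkpoint 1"));
        retryCommit(List.of("committables of checkpoint 2")); // recovered after the TM failure
        System.out.println(globalCommitterInput); // checkpoint 2 is missing -> data loss
    }
}
{code}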

  was:
Flink's Sink architecture with global committer seems to be vulnerable to data 
loss during Task Manager recovery. An entire checkpoint can be lost by the 
`GlobalCommitter`, resulting in data loss.

The issue was observed in the Delta Sink connector on a real 1.14.x cluster and 
was replicated using Flink's 1.14.6 Test Utils classes.

Scenario:
 #  Streaming source emitting constant number of events per checkpoint (20 
events per commit for 5 commits in total, that gives 100 records).
 #  Sink with parallelism > 1 with committer and `GlobalCommitter` elements.
 #  `Commiters` processed committables for checkpointId 2.
 #  `GlobalCommitter` throws exception (desired exception) during `checkpointId 
2` (third commit) while processing data from checkpoint 1 (it is expected to 
global committer architecture lag one commit behind in reference to rest of the 
pipeline).
 # Task Manager recovery, source resumes sending data.
 # Streaming source ends.
 # We are missing 20 records (one checkpoint).

What is happening is that during recovery, committers are performing "retry" on 
committables for `checkpointId 2`, however those committables, reprocessed from 
"retry" task are not emit downstream to the global committer. 

The issue can be reproduced using Junit Test build with Flink's TestSink.
The test was [implemented 
here|https://github.com/kristoffSC/flink/blob/Flink_1.14_DataLoss_SinkGlobalCommitter/flink-tests/src/test/java/org/apache/flink/test/streaming/runtime/SinkITCase.java#:~:text=testGlobalCommitterMissingRecordsDuringRecovery]
 and it is based on other tests from `SinkITCase.java` class.
The test reproduces the issue in more than 90% of runs.

I believe that problem is somewhere around 
`SinkOperator::notifyCheckpointComplete` method. In there we see that Retry 
async task is scheduled however its result is never emitted downstream like it 
is done for regular flow one line above.


> Data Loss in Sink GlobalCommitter during Task Manager recovery
> --
>
> Key: FLINK-29589
> URL: https://issues.apache.org/jira/browse/FLINK-29589
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.14.0, 1.14.2, 1.14.3, 1.14.4, 1.14.5, 1.14.6
>Reporter: Krzysztof Chmielewski
>Priority: Blocker
>
> Flink's Sink architecture with global committer seems to be vulnerable to 
> data loss during Task Manager recovery. An entire checkpoint can be lost by 
> the _GlobalCommitter_, resulting in data loss.
> The issue was observed in the Delta Sink connector on a real 1.14.x cluster 
> and was replicated using Flink's 1.14.6 Test Utils classes.
> Scenario:
>  #  Streaming source emitting constant number of events per checkpoint (20 
> events per commit for 5 commits in total, that gives 

[jira] [Updated] (FLINK-29589) Data Loss in Sink GlobalCommitter during Task Manager recovery

2022-10-11 Thread Krzysztof Chmielewski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krzysztof Chmielewski updated FLINK-29589:
--
Description: 
Flink's Sink architecture with global committer seems to be vulnerable to data 
loss during Task Manager recovery. An entire checkpoint can be lost by the 
`GlobalCommitter`, resulting in data loss.

The issue was observed in the Delta Sink connector on a real 1.14.x cluster and 
was replicated using Flink's 1.14.6 Test Utils classes.

Scenario:
 # Streaming source emitting a constant number of events per checkpoint (20 
events per commit for 5 commits in total, which gives 100 records).
 # Sink with parallelism > 1 with committer and `GlobalCommitter` elements.
 # `Committers` processed committables for checkpointId 2.
 # `GlobalCommitter` throws an exception (the desired exception) during 
`checkpointId 2` (the third commit) while processing data from checkpoint 1 
(the global committer architecture is expected to lag one commit behind the 
rest of the pipeline).
 # Task Manager recovery, source resumes sending data.
 # Streaming source ends.
 # We are missing 20 records (one checkpoint).

What is happening is that during recovery, committers perform a "retry" on the 
committables for `checkpointId 2`, however those committables, reprocessed by 
the "retry" task, are never emitted downstream to the global committer.

The issue can be reproduced using a JUnit test built with Flink's TestSink.
The test was [implemented 
here|https://github.com/kristoffSC/flink/blob/Flink_1.14_DataLoss_SinkGlobalCommitter/flink-tests/src/test/java/org/apache/flink/test/streaming/runtime/SinkITCase.java#:~:text=testGlobalCommitterMissingRecordsDuringRecovery]
 and it is based on other tests from the `SinkITCase.java` class.
The test reproduces the issue in more than 90% of runs.

I believe the problem is somewhere around the 
`SinkOperator::notifyCheckpointComplete` method. There we see that the retry 
async task is scheduled, however its result is never emitted downstream as it 
is for the regular flow one line above.

  was:
Flink's Sink architecture with global committer seems to be vulnerable to data 
loss during Task Manager recovery. An entire checkpoint can be lost by the 
GlobalCommitter, resulting in data loss.

The issue was observed in the Delta Sink connector on a real 1.14.x cluster and 
was replicated using Flink's 1.14.6 Test Utils classes.

Scenario:
 #  Streaming source emitting constant number of events per checkpoint (20 
events per commit for 5 commits in total, that gives 100 records).
 #  Sink with parallelism > 1 with committer and GlobalCommitter elements.
 #  Commiters processed committables for checkpointId 2.
 #  GlobalCommitter throws exception (desired exception) during checkpointId 2 
(third commit) while processing data from checkpoint 1 (it is expected to 
global committer architecture lag one commit behind in reference to rest of the 
pipeline).
 # Task Manager recovery, source resumes sending data.
 # Streaming source ends.
 #  We are missing 20 records (one checkpoint).

What is happening is that during recovery, committers are performing "retry" on 
committables for checkpointId 2, however those committables, reprocessed from 
"retry" task are not emit downstream to the global committer. 

The issue can be reproduced using Junit Test builded with Flink's TestSink.
The test was [implemented 
here|https://github.com/kristoffSC/flink/blob/Flink_1.14_DataLoss_SinkGlobalCommitter/flink-tests/src/test/java/org/apache/flink/test/streaming/runtime/SinkITCase.java#:~:text=testGlobalCommitterMissingRecordsDuringRecovery]
 and it is based on other tests from SinkITCase.java class.

I believe that problem is somewhere around 
`SinkOperator::notifyCheckpointComplete` method. In there we see that Retry 
async task is scheduled however its result is never emitted downstream like it 
is done for regular flow one line above.


> Data Loss in Sink GlobalCommitter during Task Manager recovery
> --
>
> Key: FLINK-29589
> URL: https://issues.apache.org/jira/browse/FLINK-29589
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.14.0, 1.14.2, 1.14.3, 1.14.4, 1.14.5, 1.14.6
>Reporter: Krzysztof Chmielewski
>Priority: Blocker
>
> Flink's Sink architecture with global committer seems to be vulnerable to 
> data loss during Task Manager recovery. An entire checkpoint can be lost by 
> the `GlobalCommitter`, resulting in data loss.
> The issue was observed in the Delta Sink connector on a real 1.14.x cluster 
> and was replicated using Flink's 1.14.6 Test Utils classes.
> Scenario:
>  #  Streaming source emitting constant number of events per checkpoint (20 
> events per commit for 5 commits in total, that gives 100 records).
>  #  Sink with parallelism > 1 with committer and 

[jira] [Updated] (FLINK-29589) Data Loss in Sink GlobalCommitter during Task Manager recovery

2022-10-11 Thread Krzysztof Chmielewski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krzysztof Chmielewski updated FLINK-29589:
--
Description: 
Flink's Sink architecture with global committer seems to be vulnerable to data 
loss during Task Manager recovery. An entire checkpoint can be lost by the 
GlobalCommitter, resulting in data loss.

The issue was observed in the Delta Sink connector on a real 1.14.x cluster and 
was replicated using Flink's 1.14.6 Test Utils classes.

Scenario:
 # Streaming source emitting a constant number of events per checkpoint (20 
events per commit for 5 commits in total, which gives 100 records).
 # Sink with parallelism > 1 with committer and GlobalCommitter elements.
 # Committers processed committables for checkpointId 2.
 # GlobalCommitter throws an exception (the desired exception) during 
checkpointId 2 (the third commit) while processing data from checkpoint 1 (the 
global committer architecture is expected to lag one commit behind the rest of 
the pipeline).
 # Task Manager recovery, source resumes sending data.
 # Streaming source ends.
 # We are missing 20 records (one checkpoint).

What is happening is that during recovery, committers perform a "retry" on the 
committables for checkpointId 2, however those committables, reprocessed by the 
"retry" task, are never emitted downstream to the global committer.

The issue can be reproduced using a JUnit test built with Flink's TestSink.
The test was [implemented 
here|https://github.com/kristoffSC/flink/blob/Flink_1.14_DataLoss_SinkGlobalCommitter/flink-tests/src/test/java/org/apache/flink/test/streaming/runtime/SinkITCase.java#:~:text=testGlobalCommitterMissingRecordsDuringRecovery]
 and it is based on other tests from the SinkITCase.java class.

I believe the problem is somewhere around the 
`SinkOperator::notifyCheckpointComplete` method. There we see that the retry 
async task is scheduled, however its result is never emitted downstream as it 
is for the regular flow one line above.

  was:
Flink's Sink architecture with global committer seems to be vulnerable to data 
loss during Task Manager recovery. An entire checkpoint can be lost by the 
GlobalCommitter, resulting in data loss.

The issue was observed in the Delta Sink connector on a real 1.14.x cluster and 
was replicated using Flink's 1.14.6 Test Utils classes.

Scenario:
 #  Streaming source emitting constant number of events per checkpoint (20 
events per commit for 5 commits in total, that gives 100 records).
 #  Sink with parallelism > 1 with committer and GlobalCommitter elements.
 #  Commiters processed committables for checkpointId 2.
 #  GlobalCommitter throws exception (desired exception) during checkpointId 2 
(third commit) while processing data from checkpoint 1 (it is expected to 
global committer architecture lag one commit behind in reference to rest of the 
pipeline).
 # Task Manager recovery, source resumes sending data.
 # Streaming source ends.
 #  We are missing 20 records (one checkpoint).

What is happening is that during recovery, committers are performing "retry" on 
committables for checkpointId 2, however those committables, reprocessed from 
"retry" task are not emit downstream to the global committer. 

The issue can be reproduced using Junit Test builded with Flink's TestSink.
The test was implemented 
[here|https://github.com/kristoffSC/flink/blob/Flink_1.14_DataLoss_SinkGlobalCommitter/flink-tests/src/test/java/org/apache/flink/test/streaming/runtime/SinkITCase.java#:~:text=testGlobalCommitterMissingRecordsDuringRecovery]
 and it is based on other tests from SinkITCase.java class.

I believe that problem is somewhere around 
`SinkOperator::notifyCheckpointComplete` method. In there we see that Retry 
async task is scheduled however its result is never emitted downstream like it 
is done for regular flow one line above.


> Data Loss in Sink GlobalCommitter during Task Manager recovery
> --
>
> Key: FLINK-29589
> URL: https://issues.apache.org/jira/browse/FLINK-29589
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.14.0, 1.14.2, 1.14.3, 1.14.4, 1.14.5, 1.14.6
>Reporter: Krzysztof Chmielewski
>Priority: Blocker
>
> Flink's Sink architecture with global committer seems to be vulnerable to 
> data loss during Task Manager recovery. An entire checkpoint can be lost by 
> the GlobalCommitter, resulting in data loss.
> The issue was observed in the Delta Sink connector on a real 1.14.x cluster 
> and was replicated using Flink's 1.14.6 Test Utils classes.
> Scenario:
>  #  Streaming source emitting constant number of events per checkpoint (20 
> events per commit for 5 commits in total, that gives 100 records).
>  #  Sink with parallelism > 1 with committer and GlobalCommitter elements.
>  #  Commiters processed committables for 

[jira] [Updated] (FLINK-29589) Data Loss in Sink GlobalCommitter during Task Manager recovery

2022-10-11 Thread Krzysztof Chmielewski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krzysztof Chmielewski updated FLINK-29589:
--
Description: 
Flink's Sink architecture with global committer seems to be vulnerable to data 
loss during Task Manager recovery. An entire checkpoint can be lost by the 
GlobalCommitter, resulting in data loss.

The issue was observed in the Delta Sink connector on a real 1.14.x cluster and 
was replicated using Flink's 1.14.6 Test Utils classes.

Scenario:
 # Streaming source emitting a constant number of events per checkpoint (20 
events per commit for 5 commits in total, which gives 100 records).
 # Sink with parallelism > 1 with committer and GlobalCommitter elements.
 # Committers processed committables for checkpointId 2.
 # GlobalCommitter throws an exception (the desired exception) during 
checkpointId 2 (the third commit) while processing data from checkpoint 1 (the 
global committer architecture is expected to lag one commit behind the rest of 
the pipeline).
 # Task Manager recovery, source resumes sending data.
 # Streaming source ends.
 # We are missing 20 records (one checkpoint).

What is happening is that during recovery, committers perform a "retry" on the 
committables for checkpointId 2, however those committables, reprocessed by the 
"retry" task, are never emitted downstream to the global committer.

The issue can be reproduced using a JUnit test built with Flink's TestSink.
The test was implemented 
[here|https://github.com/kristoffSC/flink/blob/Flink_1.14_DataLoss_SinkGlobalCommitter/flink-tests/src/test/java/org/apache/flink/test/streaming/runtime/SinkITCase.java#:~:text=testGlobalCommitterMissingRecordsDuringRecovery]
 and it is based on other tests from the SinkITCase.java class.

I believe the problem is somewhere around the 
`SinkOperator::notifyCheckpointComplete` method. There we see that the retry 
async task is scheduled, however its result is never emitted downstream as it 
is for the regular flow one line above.

  was:
Flink's Sink architecture with global committer seems to be vulnerable to data 
loss during Task Manager recovery. An entire checkpoint can be lost by the 
GlobalCommitter, resulting in data loss.

The issue was observed in the Delta Sink connector on a real 1.14.x cluster and 
was replicated using Flink's 1.14.6 Test Utils classes.

Scenario:
 #  Streaming source emitting constant number of events per checkpoint (20 
events for 5 commits)
 #  Sink with parallelism > 1 with committer and GlobalCommitter elements.
 #  Commiters processed committables for checkpointId 2.
 #  GlobalCommitter throws exception (desired exception) during checkpointId 2 
(third commit) while processing data from checkpoint 1 (it is expected to 
global committer architecture lag one commit behind in reference to rest of the 
pipeline).
 # Task Manager recovery, source resumes sending data.
 # Streaming source ends.
 #  We are missing 20 records (one checkpoint).

What is happening is that during recovery, committers are performing "retry" on 
committables for checkpointId 2, however those committables, reprocessed from 
"retry" task are not emit downstream to the global committer. 

The issue can be reproduced using Junit Test builded with Flink's TestSink.
The test was implemented 
[here|https://github.com/kristoffSC/flink/blob/Flink_1.14_DataLoss_SinkGlobalCommitter/flink-tests/src/test/java/org/apache/flink/test/streaming/runtime/SinkITCase.java#:~:text=testGlobalCommitterMissingRecordsDuringRecovery]
 and it is based on other tests from SinkITCase.java class.

I believe that problem is somewhere around 
`SinkOperator::notifyCheckpointComplete` method. In there we see that Retry 
async task is scheduled however its result is never emitted downstream like it 
is done for regular flow one line above.


> Data Loss in Sink GlobalCommitter during Task Manager recovery
> --
>
> Key: FLINK-29589
> URL: https://issues.apache.org/jira/browse/FLINK-29589
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.14.0, 1.14.2, 1.14.3, 1.14.4, 1.14.5, 1.14.6
>Reporter: Krzysztof Chmielewski
>Priority: Major
>
> Flink's Sink architecture with global committer seems to be vulnerable to 
> data loss during Task Manager recovery. An entire checkpoint can be lost by 
> the GlobalCommitter, resulting in data loss.
> The issue was observed in the Delta Sink connector on a real 1.14.x cluster 
> and was replicated using Flink's 1.14.6 Test Utils classes.
> Scenario:
>  #  Streaming source emitting constant number of events per checkpoint (20 
> events per commit for 5 commits in total, that gives 100 records).
>  #  Sink with parallelism > 1 with committer and GlobalCommitter elements.
>  #  Commiters processed committables for checkpointId 2.
>  #  GlobalCommitter throws exception 

[jira] [Updated] (FLINK-29589) Data Loss in Sink GlobalCommitter during Task Manager recovery

2022-10-11 Thread Krzysztof Chmielewski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krzysztof Chmielewski updated FLINK-29589:
--
Description: 
Flink's Sink architecture with global committer seems to be vulnerable to data 
loss during Task Manager recovery. An entire checkpoint can be lost by the 
GlobalCommitter, resulting in data loss.

The issue was observed in the Delta Sink connector on a real 1.14.x cluster and 
was replicated using Flink's 1.14.6 Test Utils classes.

Scenario:
 # Streaming source emitting a constant number of events per checkpoint (20 
events for 5 commits).
 # Sink with parallelism > 1 with committer and GlobalCommitter elements.
 # Committers processed committables for checkpointId 2.
 # GlobalCommitter throws an exception (the desired exception) during 
checkpointId 2 (the third commit) while processing data from checkpoint 1 (the 
global committer architecture is expected to lag one commit behind the rest of 
the pipeline).
 # Task Manager recovery, source resumes sending data.
 # Streaming source ends.
 # We are missing 20 records (one checkpoint).

What is happening is that during recovery, committers perform a "retry" on the 
committables for checkpointId 2, however those committables, reprocessed by the 
"retry" task, are never emitted downstream to the global committer.

The issue can be reproduced using a JUnit test built with Flink's TestSink.
The test was implemented 
[here|https://github.com/kristoffSC/flink/blob/Flink_1.14_DataLoss_SinkGlobalCommitter/flink-tests/src/test/java/org/apache/flink/test/streaming/runtime/SinkITCase.java#:~:text=testGlobalCommitterMissingRecordsDuringRecovery]
 and it is based on other tests from the SinkITCase.java class.

I believe the problem is somewhere around the 
`SinkOperator::notifyCheckpointComplete` method. There we see that the retry 
async task is scheduled, however its result is never emitted downstream as it 
is for the regular flow one line above.

  was:
Flink's Sink architecture with global committer seems to be vulnerable to data 
loss during Task Manager recovery. An entire checkpoint can be lost by the 
GlobalCommitter, resulting in data loss.

The issue was observed in the Delta Sink connector on a real 1.14.x cluster and 
was replicated using Flink's 1.14.6 Test Utils classes.

Scenario:
 #  Streaming source emitting constant number of events per checkpoint (20 
events for 5 commits)
 #  Sink with parallelism > 1 with committer and GlobalCommitter elements.
 #  Commiters processed committables for checkpointId 2.
 #  GlobalCommitter throws exception (desired exception) during checkpointId 2 
(third commit) while processing data from checkpoint 1 (it is expected to 
global committer architecture lag one commit behind in reference to rest of the 
pipeline).
 # Task Manager recovery, source resumes sending data.
 # Streaming source ends.
 #  We are missing 20 records (one checkpoint).

What is happening is that during recovery, committers are performing "retry" on 
committables for checkpointId 2, however those committables, reprocessed from 
"retry" task are not emit downstream to the global committer. 

The issue can be reproduced using Junit Test builded with Flink's TestSink.
The test was implemented 
[here|https://github.com/kristoffSC/flink/blob/Flink_1.14_DataLoss_SinkGlobalCommitter/flink-tests/src/test/java/org/apache/flink/test/streaming/runtime/SinkITCase.java#:~:text=testGlobalCommitterMissingRecordsDuringRecovery].

I believe that problem is somewhere around 
`SinkOperator::notifyCheckpointComplete` method. In there we see that Retry 
async task is scheduled however its result is never emitted downstream like it 
is done for regular flow one line above.


> Data Loss in Sink GlobalCommitter during Task Manager recovery
> --
>
> Key: FLINK-29589
> URL: https://issues.apache.org/jira/browse/FLINK-29589
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.14.0, 1.14.2, 1.14.3, 1.14.4, 1.14.5, 1.14.6
>Reporter: Krzysztof Chmielewski
>Priority: Major
>
> Flink's Sink architecture with global committer seems to be vulnerable to 
> data loss during Task Manager recovery. An entire checkpoint can be lost by 
> the GlobalCommitter, resulting in data loss.
> The issue was observed in the Delta Sink connector on a real 1.14.x cluster 
> and was replicated using Flink's 1.14.6 Test Utils classes.
> Scenario:
>  #  Streaming source emitting constant number of events per checkpoint (20 
> events for 5 commits)
>  #  Sink with parallelism > 1 with committer and GlobalCommitter elements.
>  #  Commiters processed committables for checkpointId 2.
>  #  GlobalCommitter throws exception (desired exception) during checkpointId 
> 2 (third commit) while processing data from checkpoint 1 (it is expected to 
> global committer 

[jira] [Updated] (FLINK-29589) Data Loss in Sink GlobalCommitter during Task Manager recovery

2022-10-11 Thread Krzysztof Chmielewski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krzysztof Chmielewski updated FLINK-29589:
--
Description: 
Flink's Sink architecture with global committer seems to be vulnerable to data 
loss during Task Manager recovery. An entire checkpoint can be lost by the 
GlobalCommitter, resulting in data loss.

The issue was observed in the Delta Sink connector on a real 1.14.x cluster and 
was replicated using Flink's 1.14.6 Test Utils classes.

Scenario:
 # Streaming source emitting a constant number of events per checkpoint (20 
events for 5 commits).
 # Sink with parallelism > 1 with committer and GlobalCommitter elements.
 # Committers processed committables for checkpointId 2.
 # GlobalCommitter throws an exception (the desired exception) during 
checkpointId 2 (the third commit) while processing data from checkpoint 1 (the 
global committer architecture is expected to lag one commit behind the rest of 
the pipeline).
 # Task Manager recovery, source resumes sending data.
 # Streaming source ends.
 # We are missing 20 records (one checkpoint).

What is happening is that during recovery, committers perform a "retry" on the 
committables for checkpointId 2, however those committables, reprocessed by the 
"retry" task, are never emitted downstream to the global committer.

The issue can be reproduced using a JUnit test built with Flink's TestSink.
The test was implemented 
[here|https://github.com/kristoffSC/flink/blob/Flink_1.14_DataLoss_SinkGlobalCommitter/flink-tests/src/test/java/org/apache/flink/test/streaming/runtime/SinkITCase.java#:~:text=testGlobalCommitterMissingRecordsDuringRecovery].

I believe the problem is somewhere around the 
`SinkOperator::notifyCheckpointComplete` method. There we see that the retry 
async task is scheduled, however its result is never emitted downstream as it 
is for the regular flow one line above.

  was:
Flink's Sink architecture with global committer seems to be vulnerable to data 
loss during Task Manager recovery. An entire checkpoint can be lost by the 
GlobalCommitter, resulting in data loss for sinks.

The issue was observed in the Delta Sink connector on a real 1.14.x cluster and 
was replicated using Flink's 1.14.6 Test Utils classes.

Scenario:
 #  Streaming source emitting constant number of events per checkpoint (20 
events for 5 commits)
 #  Sink with parallelism > 1 with committer and GlobalCommitter elements.
 #  Commitaers processed committables for checkpointId 2.
 #  GlobalCommitter throws exception (desired exception) during checkpointId 2 
(third commit) while processing data from checkpoint 1 (it is expected to 
global committer architecture lag one commit behind in reference to rest of the 
pipeline).
 # Streaming source ends.
 #  we are missing 20 records (one checkpoint).

What is happening is that during recovery, committers are performing "retry" on 
committables for checkpointId 2, however those committables, reprocessed from 
"retry" task are not emit to the global committer. 

The issue can be reproduced using Junit Test builded with Flink's TestSink.
The test was implemented 
[here|https://github.com/kristoffSC/flink/blob/Flink_1.14_DataLoss_SinkGlobalCommitter/flink-tests/src/test/java/org/apache/flink/test/streaming/runtime/SinkITCase.java#:~:text=testGlobalCommitterMissingRecordsDuringRecovery].

I believe that problem is somewhere around 
`SinkOperator::notifyCheckpointComplete` method. In there we see that Retry 
async task is scheduled however its result is never emitted downstream like it 
is done for regular flow one line above.


> Data Loss in Sink GlobalCommitter during Task Manager recovery
> --
>
> Key: FLINK-29589
> URL: https://issues.apache.org/jira/browse/FLINK-29589
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.14.0, 1.14.2, 1.14.3, 1.14.4, 1.14.5, 1.14.6
>Reporter: Krzysztof Chmielewski
>Priority: Major
>
> Flink's Sink architecture with a global committer seems to be vulnerable to 
> data loss during Task Manager recovery. An entire checkpoint can be lost by the 
> GlobalCommitter, resulting in data loss.
> The issue was observed in the Delta Sink connector on a real 1.14.x cluster and was 
> replicated using Flink's 1.14.6 Test Utils classes.
> Scenario:
>  #  Streaming source emitting a constant number of events per checkpoint (20 
> events for 5 commits).
>  #  Sink with parallelism > 1 with committer and GlobalCommitter elements.
>  #  Committers processed committables for checkpointId 2.
>  #  GlobalCommitter throws an exception (the desired exception) during checkpointId 
> 2 (third commit) while processing data from checkpoint 1 (it is expected for the 
> global committer architecture to lag one commit behind the rest of 
> the pipeline).
>  #  Task Manager recovery; the source resumes 

[jira] [Created] (FLINK-29589) Data Loss in Sink GlobalCommitter during Task Manager recovery

2022-10-11 Thread Krzysztof Chmielewski (Jira)
Krzysztof Chmielewski created FLINK-29589:
-

 Summary: Data Loss in Sink GlobalCommitter during Task Manager 
recovery
 Key: FLINK-29589
 URL: https://issues.apache.org/jira/browse/FLINK-29589
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.14.6, 1.14.5, 1.14.4, 1.14.3, 1.14.2, 1.14.0
Reporter: Krzysztof Chmielewski


Flink's Sink architecture with a global committer seems to be vulnerable to 
data loss during Task Manager recovery. An entire checkpoint can be lost by the 
GlobalCommitter, resulting in data loss for sinks.

The issue was observed in the Delta Sink connector on a real 1.14.x cluster and was 
replicated using Flink's 1.14.6 Test Utils classes.

Scenario:
1. Streaming source emitting a constant number of events per checkpoint (20 
events for 5 commits).
2. Sink with parallelism > 1 with committer and GlobalCommitter elements.
3. Committers processed committables for checkpointId 2.
4. GlobalCommitter throws an exception (the desired exception) during checkpointId 2 
(third commit) while processing data from checkpoint 1 (it is expected for the 
global committer architecture to lag one commit behind the rest of the 
pipeline).
5. Streaming source ends.
6. We are missing 20 records (one checkpoint).

What is happening is that during recovery the committers perform a "retry" on the 
committables for checkpointId 2, however the committables reprocessed by the 
"retry" task are never emitted to the global committer.

The issue can be reproduced with a JUnit test built on Flink's TestSink.
The test is implemented 
[here|https://github.com/kristoffSC/flink/blob/Flink_1.14_DataLoss_SinkGlobalCommitter/flink-tests/src/test/java/org/apache/flink/test/streaming/runtime/SinkITCase.java#:~:text=testGlobalCommitterMissingRecordsDuringRecovery].



I believe the problem is somewhere around the 
`SinkOperator::notifyCheckpointComplete` method. There we can see that the retry 
async task is scheduled, however its result is never emitted downstream as it 
is done for the regular flow one line above.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-29589) Data Loss in Sink GlobalCommitter during Task Manager recovery

2022-10-11 Thread Krzysztof Chmielewski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krzysztof Chmielewski updated FLINK-29589:
--
Description: 
Flink's Sink architecture with a global committer seems to be vulnerable to 
data loss during Task Manager recovery. An entire checkpoint can be lost by the 
GlobalCommitter, resulting in data loss for sinks.

The issue was observed in the Delta Sink connector on a real 1.14.x cluster and was 
replicated using Flink's 1.14.6 Test Utils classes.

Scenario:
 #  Streaming source emitting a constant number of events per checkpoint (20 
events for 5 commits).
 #  Sink with parallelism > 1 with committer and GlobalCommitter elements.
 #  Committers processed committables for checkpointId 2.
 #  GlobalCommitter throws an exception (the desired exception) during checkpointId 2 
(third commit) while processing data from checkpoint 1 (it is expected for the 
global committer architecture to lag one commit behind the rest of the 
pipeline).
 #  Streaming source ends.
 #  We are missing 20 records (one checkpoint).

What is happening is that during recovery the committers perform a "retry" on the 
committables for checkpointId 2, however the committables reprocessed by the 
"retry" task are never emitted to the global committer.

The issue can be reproduced with a JUnit test built on Flink's TestSink.
The test is implemented 
[here|https://github.com/kristoffSC/flink/blob/Flink_1.14_DataLoss_SinkGlobalCommitter/flink-tests/src/test/java/org/apache/flink/test/streaming/runtime/SinkITCase.java#:~:text=testGlobalCommitterMissingRecordsDuringRecovery].

I believe the problem is somewhere around the 
`SinkOperator::notifyCheckpointComplete` method. There we can see that the retry 
async task is scheduled, however its result is never emitted downstream as it 
is done for the regular flow one line above.

  was:
Flink's Sink architecture with a global committer seems to be vulnerable to 
data loss during Task Manager recovery. An entire checkpoint can be lost by the 
GlobalCommitter, resulting in data loss for sinks.

The issue was observed in the Delta Sink connector on a real 1.14.x cluster and was 
replicated using Flink's 1.14.6 Test Utils classes.

Scenario:
1. Streaming source emitting a constant number of events per checkpoint (20 
events for 5 commits).
2. Sink with parallelism > 1 with committer and GlobalCommitter elements.
3. Committers processed committables for checkpointId 2.
4. GlobalCommitter throws an exception (the desired exception) during checkpointId 2 
(third commit) while processing data from checkpoint 1 (it is expected for the 
global committer architecture to lag one commit behind the rest of the 
pipeline).
5. Streaming source ends.
6. We are missing 20 records (one checkpoint).

What is happening is that during recovery the committers perform a "retry" on the 
committables for checkpointId 2, however the committables reprocessed by the 
"retry" task are never emitted to the global committer.

The issue can be reproduced with a JUnit test built on Flink's TestSink.
The test is implemented 
[here|https://github.com/kristoffSC/flink/blob/Flink_1.14_DataLoss_SinkGlobalCommitter/flink-tests/src/test/java/org/apache/flink/test/streaming/runtime/SinkITCase.java#:~:text=testGlobalCommitterMissingRecordsDuringRecovery].



I believe the problem is somewhere around the 
`SinkOperator::notifyCheckpointComplete` method. There we can see that the retry 
async task is scheduled, however its result is never emitted downstream as it 
is done for the regular flow one line above.


> Data Loss in Sink GlobalCommitter during Task Manager recovery
> --
>
> Key: FLINK-29589
> URL: https://issues.apache.org/jira/browse/FLINK-29589
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.14.0, 1.14.2, 1.14.3, 1.14.4, 1.14.5, 1.14.6
>Reporter: Krzysztof Chmielewski
>Priority: Major
>
> Flink's Sink architecture with a global committer seems to be vulnerable to 
> data loss during Task Manager recovery. An entire checkpoint can be lost by the 
> GlobalCommitter, resulting in data loss for sinks.
> The issue was observed in the Delta Sink connector on a real 1.14.x cluster and was 
> replicated using Flink's 1.14.6 Test Utils classes.
> Scenario:
>  #  Streaming source emitting a constant number of events per checkpoint (20 
> events for 5 commits).
>  #  Sink with parallelism > 1 with committer and GlobalCommitter elements.
>  #  Committers processed committables for checkpointId 2.
>  #  GlobalCommitter throws an exception (the desired exception) during checkpointId 
> 2 (third commit) while processing data from checkpoint 1 (it is expected for the 
> global committer architecture to lag one commit behind the rest of 
> the pipeline).
>  #  Streaming source ends.
>  #  We are missing 20 records (one checkpoint).
> What is 

[jira] [Comment Edited] (FLINK-29509) Set correct subtaskId during recovery of committables

2022-10-07 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17613888#comment-17613888
 ] 

Krzysztof Chmielewski edited comment on FLINK-29509 at 10/7/22 4:29 PM:


PR ready for review :)
[https://github.com/apache/flink/pull/20979]


was (Author: kristoffsc):
PR:
https://github.com/apache/flink/pull/20979

> Set correct subtaskId during recovery of committables
> -
>
> Key: FLINK-29509
> URL: https://issues.apache.org/jira/browse/FLINK-29509
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Common
>Affects Versions: 1.17.0, 1.15.2, 1.16.1
>Reporter: Fabian Paul
>Assignee: Krzysztof Chmielewski
>Priority: Critical
>
> When we recover the `CheckpointCommittableManager` we ignore the subtaskId it 
> is recovered on. 
> [https://github.com/apache/flink/blob/d191bda7e63a2c12416cba56090e5cd75426079b/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/operators/sink/committables/CheckpointCommittableManagerImpl.java#L58]
> This becomes a problem when a sink uses a post-commit topology because 
> multiple committer operators might forward committable summaries coming from 
> the same subtaskId.
>  
> It should be possible to use the subtaskId already present in the 
> `CommittableCollector` when creating the `CheckpointCommittableManager`s.
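
A hedged, simplified sketch of that fix direction (hypothetical types, not the actual `CheckpointCommittableManagerImpl` / `CommittableCollector` internals): attribute the recovered manager to the subtask that is doing the recovery rather than to the subtask id stored in the recovered state.
{code:java}
// Toy illustration of the two choices; names and signatures are stand-ins only.
class RecoveredManagerSketch {
    final long checkpointId;
    final int subtaskId;

    private RecoveredManagerSketch(long checkpointId, int subtaskId) {
        this.checkpointId = checkpointId;
        this.subtaskId = subtaskId;
    }

    // Problematic variant: reuse the subtask id recorded in the recovered state, so two
    // committer subtasks may forward summaries claiming the same origin subtask.
    static RecoveredManagerSketch withStoredSubtaskId(long checkpointId, int storedSubtaskId) {
        return new RecoveredManagerSketch(checkpointId, storedSubtaskId);
    }

    // Proposed variant: attribute recovered committables to the subtask doing the
    // recovery, i.e. the id the recovering collector already knows about.
    static RecoveredManagerSketch withRecoveringSubtaskId(long checkpointId, int recoveringSubtaskId) {
        return new RecoveredManagerSketch(checkpointId, recoveringSubtaskId);
    }
}
{code}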



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29509) Set correct subtaskId during recovery of committables

2022-10-07 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17613888#comment-17613888
 ] 

Krzysztof Chmielewski commented on FLINK-29509:
---

PR:
https://github.com/apache/flink/pull/20979

> Set correct subtaskId during recovery of committables
> -
>
> Key: FLINK-29509
> URL: https://issues.apache.org/jira/browse/FLINK-29509
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Common
>Affects Versions: 1.17.0, 1.15.2, 1.16.1
>Reporter: Fabian Paul
>Assignee: Krzysztof Chmielewski
>Priority: Critical
>
> When we recover the `CheckpointCommittableManager` we ignore the subtaskId it 
> is recovered on. 
> [https://github.com/apache/flink/blob/d191bda7e63a2c12416cba56090e5cd75426079b/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/operators/sink/committables/CheckpointCommittableManagerImpl.java#L58]
> This becomes a problem when a sink uses a post-commit topology because 
> multiple committer operators might forward committable summaries coming from 
> the same subtaskId.
>  
> It should be possible to use the subtaskId already present in the 
> `CommittableCollector` when creating the `CheckpointCommittableManager`s.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29509) Set correct subtaskId during recovery of committables

2022-10-06 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17613527#comment-17613527
 ] 

Krzysztof Chmielewski commented on FLINK-29509:
---

Hi,
I would like to work on this ticket.

Can someone assign it to me? It seems I can't do that.

> Set correct subtaskId during recovery of committables
> -
>
> Key: FLINK-29509
> URL: https://issues.apache.org/jira/browse/FLINK-29509
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Common
>Affects Versions: 1.17.0, 1.15.2, 1.16.1
>Reporter: Fabian Paul
>Priority: Critical
>
> When we recover the `CheckpointCommittableManager` we ignore the subtaskId it 
> is recovered on. 
> [https://github.com/apache/flink/blob/d191bda7e63a2c12416cba56090e5cd75426079b/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/operators/sink/committables/CheckpointCommittableManagerImpl.java#L58]
> This becomes a problem when a sink uses a post-commit topology because 
> multiple committer operators might forward committable summaries coming from 
> the same subtaskId.
>  
> It should be possible to use the subtaskId already present in the 
> `CommittableCollector` when creating the `CheckpointCommittableManager`s.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-28591) Array> is not serialized correctly when BigInt is present

2022-07-19 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17568488#comment-17568488
 ] 

Krzysztof Chmielewski edited comment on FLINK-28591 at 7/19/22 11:51 AM:
-

The potential issue might be in _CopyingChainingOutput.class_, line 82, where we 
call

input.processElement(copy);
The type of input is "StreamExecCalc", but I do not see a processElement method on 
this type, and when I try to step into it with the IntelliJ debugger I actually 
don't see anything.

Anyway,
for the case with bigint,
{code:java}
input.processElement(copy);{code}
leads us to {_}GenericArrayData{_}, whereas for the case with int this 
object is not created.

Unfortunately I don't know what is happening inside
{code:java}
input.processElement(copy);{code}
Any hint about how to debug this place would help.
Currently I see this:
!image-2022-07-19-13-51-45-254.png!
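
One way to narrow this down is to read the array back through the public `ArrayData` element getter, outside the generated code. A small probe sketch (assuming roughly Flink 1.15 with `flink-table-api-java` and `flink-table-common` on the classpath): with a hand-built `GenericArrayData` both rows come back distinct, so if the same loop over the planner-produced `BinaryArrayData` returns the last row twice, the data was already malformed before any format or connector touched it.
{code:java}
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.data.ArrayData;
import org.apache.flink.table.data.GenericArrayData;
import org.apache.flink.table.data.GenericRowData;
import org.apache.flink.table.data.StringData;
import org.apache.flink.table.types.logical.LogicalType;

public class ArrayGetterProbe {
    public static void main(String[] args) {
        // Two distinct ROW<foo1 STRING, foo2 STRING> elements.
        GenericRowData row1 = GenericRowData.of(StringData.fromString("Field1"), StringData.fromString("Value1"));
        GenericRowData row2 = GenericRowData.of(StringData.fromString("Field2"), StringData.fromString("Value2"));
        ArrayData array = new GenericArrayData(new Object[] {row1, row2});

        LogicalType elementType =
                DataTypes.ROW(
                                DataTypes.FIELD("foo1", DataTypes.STRING()),
                                DataTypes.FIELD("foo2", DataTypes.STRING()))
                        .getLogicalType();
        ArrayData.ElementGetter getter = ArrayData.createElementGetter(elementType);

        // Prints both rows; running the same loop against the BinaryArrayData seen in
        // the debugger is what exposes the duplicated last element.
        for (int i = 0; i < array.size(); i++) {
            System.out.println(i + " -> " + getter.getElementOrNull(array, i));
        }
    }
}
{code}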


was (Author: kristoffsc):
The potential issue might be in _CopyingChainingOutput.class_, line 82, where we 
call

input.processElement(copy);
The type of input is "StreamExecCalc", but I do not see a processElement method on 
this type, and when I try to step into it with the IntelliJ debugger I actually 
don't see anything.

Anyway,
for the case with bigint,
{code:java}
input.processElement(copy);{code}
leads us to {_}GenericArrayData{_}, whereas for the case with int this 
object is not created.

Unfortunately I don't know what is happening inside
{code:java}
input.processElement(copy);{code}

> Array> is not serialized correctly when BigInt is present
> --
>
> Key: FLINK-28591
> URL: https://issues.apache.org/jira/browse/FLINK-28591
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API, Table SQL / Planner
>Affects Versions: 1.15.0
>Reporter: Andrzej Swatowski
>Priority: Major
> Attachments: image-2022-07-19-13-51-45-254.png
>
>
> When using the Table API to insert data into an array of rows, the data apparently 
> is serialized incorrectly internally, which leads to incorrect serialization 
> at the connectors. It happens when one of the table fields is a BIGINT (and 
> does not happen when it is INT).
> E.g., the following table:
> {code:java}
> CREATE TABLE wrongArray (
>     foo bigint,
>     bar ARRAY<ROW<foo1 STRING, foo2 STRING>>
> ) WITH (
>   'connector' = 'filesystem',
>   'path' = 'file://path/to/somewhere',
>   'format' = 'json'
> ) {code}
> along with the following insert:
> {code:java}
> insert into wrongArray (
>     SELECT
>         1,
>         array[
>             ('Field1', 'Value1'),
>             ('Field2', 'Value2')
>         ]
>     FROM (VALUES(1))
> ) {code}
> gets serialized into: 
> {code:java}
> {
>   "foo":1,
>   "bar":[
>     {
>       "foo1":"Field2",
>       "foo2":"Value2"
>     },
>     {
>       "foo1":"Field2",
>       "foo2":"Value2"
>     }
>   ]
> }{code}
> It is easy to spot that `bar` (an Array of Rows with two Strings) consists of 
> duplicates of the last row in the array.
> On the other hand, when `foo` is of type `int` instead of `bigint`:
> {code:java}
> CREATE TABLE wrongArray (
>     foo int,
>     bar ARRAY<ROW<foo1 STRING, foo2 STRING>>
> ) WITH (
>   'connector' = 'filesystem',
>   'path' = 'file://path/to/somewhere',
>   'format' = 'json'
> ) {code}
> the previous insert yields the correct value: 
> {code:java}
> {
>   "foo":1,
>   "bar":[
>     {
>       "foo1":"Field1",
>       "foo2":"Value1"
>     },
>     {
>       "foo1":"Field2",
>       "foo2":"Value2"
>     }
>   ]
> }{code}
> Bug reproduced in the Flink project: 
> [https://github.com/swtwsk/flink-array-row-bug]
> 
> It is not an error connected with either a specific connector or format. I 
> have done a bit of debugging when trying to implement my own format and it 
> seems that `BinaryArrayData` holding the row values has wrong data saved in 
> its `MemorySegment`, i.e. calling: 
> {code:java}
> for (var i = 0; i < array.size(); i++) {
>   Object element = arrayDataElementGetter.getElementOrNull(array, i);
> }{code}
> correctly calculates the offsets but yields the same element for every index, 
> because the data is already malformed in the array's `MemorySegment`. Such a call 
> can be found, e.g., in `flink-json`, more specifically in 
> {color:#e8912d}org.apache.flink.formats.json.RowDataToJsonConverters::createArrayConverter
>  {color}(line 241 in version 1.15.0)
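
For completeness, a minimal end-to-end reproduction sketch mirroring the DDL and INSERT from the report (hedged: it assumes roughly Flink 1.15 with the Table planner, filesystem connector and `flink-json` on the classpath, and spells the element type out as `ROW<foo1 STRING, foo2 STRING>`, matching the field names visible in the JSON output above):
{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ArrayRowRepro {
    public static void main(String[] args) throws Exception {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        tEnv.executeSql(
                "CREATE TABLE wrongArray ("
                        + "  foo BIGINT,"  // switch to INT and the output becomes correct
                        + "  bar ARRAY<ROW<foo1 STRING, foo2 STRING>>"
                        + ") WITH ("
                        + "  'connector' = 'filesystem',"
                        + "  'path' = 'file:///tmp/wrong_array',"
                        + "  'format' = 'json'"
                        + ")");

        tEnv.executeSql(
                        "INSERT INTO wrongArray "
                                + "SELECT 1, array[('Field1', 'Value1'), ('Field2', 'Value2')] "
                                + "FROM (VALUES (1))")
                .await(); // wait for the single row to be written, then inspect the JSON files
    }
}
{code}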



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-28591) Array> is not serialized correctly when BigInt is present

2022-07-19 Thread Krzysztof Chmielewski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17568488#comment-17568488
 ] 

Krzysztof Chmielewski commented on FLINK-28591:
---

The potential issue might be in _CopyingChainingOutput.class_, line 82, where we 
call

input.processElement(copy);
The type of input is "StreamExecCalc", but I do not see a processElement method on 
this type, and when I try to step into it with the IntelliJ debugger I actually 
don't see anything.

Anyway,
for the case with bigint,
{code:java}
input.processElement(copy);{code}
leads us to {_}GenericArrayData{_}, whereas for the case with int this 
object is not created.

Unfortunately I don't know what is happening inside
{code:java}
input.processElement(copy);{code}

> Array> is not serialized correctly when BigInt is present
> --
>
> Key: FLINK-28591
> URL: https://issues.apache.org/jira/browse/FLINK-28591
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API, Table SQL / Planner
>Affects Versions: 1.15.0
>Reporter: Andrzej Swatowski
>Priority: Major
>
> When using the Table API to insert data into an array of rows, the data apparently 
> is serialized incorrectly internally, which leads to incorrect serialization 
> at the connectors. It happens when one of the table fields is a BIGINT (and 
> does not happen when it is INT).
> E.g., the following table:
> {code:java}
> CREATE TABLE wrongArray (
>     foo bigint,
>     bar ARRAY<ROW<foo1 STRING, foo2 STRING>>
> ) WITH (
>   'connector' = 'filesystem',
>   'path' = 'file://path/to/somewhere',
>   'format' = 'json'
> ) {code}
> along with the following insert:
> {code:java}
> insert into wrongArray (
>     SELECT
>         1,
>         array[
>             ('Field1', 'Value1'),
>             ('Field2', 'Value2')
>         ]
>     FROM (VALUES(1))
> ) {code}
> gets serialized into: 
> {code:java}
> {
>   "foo":1,
>   "bar":[
>     {
>       "foo1":"Field2",
>       "foo2":"Value2"
>     },
>     {
>       "foo1":"Field2",
>       "foo2":"Value2"
>     }
>   ]
> }{code}
> It is easy to spot that `bar` (an Array of Rows with two Strings) consists of 
> duplicates of the last row in the array.
> On the other hand, when `foo` is of type `int` instead of `bigint`:
> {code:java}
> CREATE TABLE wrongArray (
>     foo int,
>     bar ARRAY<ROW<foo1 STRING, foo2 STRING>>
> ) WITH (
>   'connector' = 'filesystem',
>   'path' = 'file://path/to/somewhere',
>   'format' = 'json'
> ) {code}
> the previous insert yields the correct value: 
> {code:java}
> {
>   "foo":1,
>   "bar":[
>     {
>       "foo1":"Field1",
>       "foo2":"Value1"
>     },
>     {
>       "foo1":"Field2",
>       "foo2":"Value2"
>     }
>   ]
> }{code}
> Bug reproduced in the Flink project: 
> [https://github.com/swtwsk/flink-array-row-bug]
> 
> It is not an error connected with either a specific connector or format. I 
> have done a bit of debugging when trying to implement my own format and it 
> seems that `BinaryArrayData` holding the row values has wrong data saved in 
> its `MemorySegment`, i.e. calling: 
> {code:java}
> for (var i = 0; i < array.size(); i++) {
>   Object element = arrayDataElementGetter.getElementOrNull(array, i);
> }{code}
> correctly calculates the offsets but yields the same element for every index, 
> because the data is already malformed in the array's `MemorySegment`. Such a call 
> can be found, e.g., in `flink-json`, more specifically in 
> {color:#e8912d}org.apache.flink.formats.json.RowDataToJsonConverters::createArrayConverter
>  {color}(line 241 in version 1.15.0)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)