[jira] [Commented] (FLINK-33251) SQL Client query execution aborts after a few seconds: ConnectTimeoutException

2024-04-23 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840061#comment-17840061
 ] 

Robert Metzger commented on FLINK-33251:


I'm having this problem with 1.19.0 as well, on an M1 MBP.
The problem is tricky to reproduce (e.g. it doesn't happen every time).
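When the SQL Client aborts with a ConnectTimeoutException, one quick sanity check is to raise the REST client's timeout/retry settings, which rules out a plainly slow connect. A sketch for flink-conf.yaml (option names are from Flink's RestOptions; the values are illustrative, not recommendations):

```yaml
# Illustrative values only; the defaults are usually fine.
rest.connection-timeout: 30000   # ms to establish a connection (default 15000)
rest.retry.max-attempts: 40      # max retries for retryable failures (default 20)
rest.retry.delay: 3000           # ms between retries (default 3000)
```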

{code}
2024-04-23 12:51:14,317 DEBUG org.apache.flink.runtime.rest.RestClient  
   [] - Shutting down rest endpoint.
2024-04-23 12:51:14,317 DEBUG 
org.apache.flink.shaded.netty4.io.netty.buffer.PoolThreadCache [] - Freed 2 
thread-local buffer(s) from thread: flink-rest-client-netty-thread-1
2024-04-23 12:51:14,318 DEBUG org.apache.flink.runtime.rest.RestClient  
   [] - Rest endpoint shutdown complete.
2024-04-23 12:51:14,318 TRACE 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop [] - 
instrumented a special java.util.Set into: sun.nio.ch.KQueueSelectorImpl@c1d225b
2024-04-23 12:51:14,318 DEBUG org.apache.flink.runtime.rest.RestClient  
   [] - Rest client endpoint started.
2024-04-23 12:51:14,318 DEBUG org.apache.flink.runtime.rest.RestClient  
   [] - Sending request of class class 
org.apache.flink.runtime.rest.messages.job.coordination.ClientCoordinationRequestBody
 to 
localhost:8081/v1/jobs/5d8d1b8ef7dc49381a3855ae10a18ec5/coordinators/b728d985904d42b0fdd945a9e3253fca
2024-04-23 12:51:14,320 DEBUG org.apache.flink.runtime.rest.RestClient  
   [] - Received response 
{"serializedCoordinationResult":"rO0ABXNyAExvcmcuYXBhY2hlLmZsaW5rLnN0cmVhbWluZy5hcGkub3BlcmF0b3JzLmNvbGxlY3QuQ29sbGVjdENvb3JkaW5hdGlvblJlc3BvbnNlAAECAANKABZsYXN0Q2hlY2twb2ludGVkT2Zmc2V0TAARc2VyaWFsaXplZFJlc3VsdHN0ABBMamF2YS91dGlsL0xpc3Q7TAAHdmVyc2lvbnQAEkxqYXZhL2xhbmcvU3RyaW5nO3hwAABzcgATamF2YS51dGlsLkFycmF5TGlzdHiB0h2Zx2GdAwABSQAEc2l6ZXhwAHcEAHh0ACQ3MzgyMjA5Ni0wODE2LTQ5NTMtODA4NC1kMDJhZTg0ZjNhNWU="}.
2024-04-23 12:51:14,321 DEBUG org.apache.flink.runtime.rest.RestClient  
   [] - Shutting down rest endpoint.
2024-04-23 12:51:14,321 DEBUG 
org.apache.flink.shaded.netty4.io.netty.buffer.PoolThreadCache [] - Freed 3 
thread-local buffer(s) from thread: flink-rest-client-netty-thread-1
2024-04-23 12:51:14,321 DEBUG org.apache.flink.runtime.rest.RestClient  
   [] - Rest endpoint shutdown complete.
2024-04-23 12:51:14,390 DEBUG org.apache.flink.runtime.rest.RestClient  
   [] - Sending request of class class 
org.apache.flink.runtime.rest.messages.EmptyRequestBody to 
localhost:62113/v2/sessions/88b98272-be33-4303-a649-942acd213e84/heartbeat
2024-04-23 12:51:14,391 TRACE org.apache.flink.runtime.rest.FileUploadHandler   
   [] - Received request. 
URL:/v2/sessions/88b98272-be33-4303-a649-942acd213e84/heartbeat Method:POST
2024-04-23 12:51:14,391 TRACE 
org.apache.flink.table.gateway.rest.handler.session.TriggerSessionHeartbeatHandler
 [] - Received request 
/v2/sessions/88b98272-be33-4303-a649-942acd213e84/heartbeat.
2024-04-23 12:51:14,391 TRACE 
org.apache.flink.table.gateway.rest.handler.session.TriggerSessionHeartbeatHandler
 [] - Starting request processing.
2024-04-23 12:51:14,391 DEBUG org.apache.flink.runtime.rest.RestClient  
   [] - Received response {}.
2024-04-23 12:51:14,425 TRACE 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop [] - 
instrumented a special java.util.Set into: sun.nio.ch.KQueueSelectorImpl@1436813
2024-04-23 12:51:14,425 DEBUG org.apache.flink.runtime.rest.RestClient  
   [] - Rest client endpoint started.
2024-04-23 12:51:14,426 DEBUG org.apache.flink.runtime.rest.RestClient  
   [] - Sending request of class class 
org.apache.flink.runtime.rest.messages.EmptyRequestBody to 
localhost:8081/v1/jobs/5d8d1b8ef7dc49381a3855ae10a18ec5/status
2024-04-23 12:51:14,434 DEBUG org.apache.flink.runtime.rest.RestClient  
   [] - Received response {"status":"RUNNING"}.
2024-04-23 12:51:14,435 DEBUG org.apache.flink.runtime.rest.RestClient  
   [] - Shutting down rest endpoint.
2024-04-23 12:51:14,435 DEBUG 
org.apache.flink.shaded.netty4.io.netty.buffer.PoolThreadCache [] - Freed 2 
thread-local buffer(s) from thread: flink-rest-client-netty-thread-1
2024-04-23 12:51:14,435 DEBUG org.apache.flink.runtime.rest.RestClient  
   [] - Rest endpoint shutdown complete.
2024-04-23 12:51:14,435 TRACE 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop [] - 
instrumented a special java.util.Set into: 
sun.nio.ch.KQueueSelectorImpl@64f0cb77
2024-04-23 12:51:14,435 DEBUG org.apache.flink.runtime.rest.RestClient  
   [] - Rest client endpoint started.
2024-04-23 12:51:14,436 DEBUG org.apache.flink.runtime.rest.RestClient  
   [] - Sending request of class class 

[jira] [Commented] (FLINK-35040) The performance of serializerHeavyString regresses since April 3

2024-04-13 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17836843#comment-17836843
 ] 

Robert Metzger commented on FLINK-35040:


Do we know what causes the performance degradation with commons-io? Maybe 
there's a ticket in the commons-io project that explains what is 
going on? If not, it might make sense to report it to commons-io, so that they 
are aware of the performance regression.
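One way to confirm the commons-io suspicion would be to rerun the benchmark against the old and new dependency versions. A dry-run sketch (the `commons-io.version` property, the version numbers, and `benchmarks.jar` are assumptions about the flink-benchmarks build, not verified against its pom; set `DRY_RUN=` to actually execute):

```shell
# Prints the commands instead of running them; drop DRY_RUN=echo to execute.
DRY_RUN=echo
for v in 2.11.0 2.15.1; do   # hypothetical "before" and "after" versions
  # Rebuild with the overridden commons-io version (property name assumed).
  $DRY_RUN mvn -Dcommons-io.version="$v" clean install -DskipTests
  # Rerun only the regressed benchmark (jar name assumed).
  $DRY_RUN java -jar benchmarks.jar serializerHeavyString
done
```

If the slow numbers only appear with the newer version, that is strong evidence for reporting upstream.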

> The performance of serializerHeavyString regresses since April 3
> 
>
> Key: FLINK-35040
> URL: https://issues.apache.org/jira/browse/FLINK-35040
> Project: Flink
>  Issue Type: Bug
>  Components: Benchmarks
>Affects Versions: 1.20.0
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Blocker
>  Labels: pull-request-available
> Attachments: image-2024-04-08-10-51-07-403.png, 
> image-2024-04-11-12-53-53-353.png, screenshot-1.png
>
>
> The performance of serializerHeavyString regresses since April 3, and had not 
> yet recovered on April 8th.
> It seems Java 11 regresses, and Java 8 and Java 17 are fine.
> http://flink-speed.xyz/timeline/#/?exe=1,6,12=serializerHeavyString=on=on=off=3=200
>  !screenshot-1.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34997) PyFlink YARN per-job on Docker test failed on azure

2024-04-09 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reassigned FLINK-34997:
--

Assignee: Robert Metzger

> PyFlink YARN per-job on Docker test failed on azure
> ---
>
> Key: FLINK-34997
> URL: https://issues.apache.org/jira/browse/FLINK-34997
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI
>Affects Versions: 1.20.0
>Reporter: Weijie Guo
>Assignee: Robert Metzger
>Priority: Blocker
>  Labels: test-stability
>
> {code}
> Apr 03 03:12:37 
> ==
> Apr 03 03:12:37 Running 'PyFlink YARN per-job on Docker test'
> Apr 03 03:12:37 
> ==
> Apr 03 03:12:37 TEST_DATA_DIR: 
> /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-37046085202
> Apr 03 03:12:37 Flink dist directory: 
> /home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT
> Apr 03 03:12:38 Flink dist directory: 
> /home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT
> Apr 03 03:12:38 Docker version 24.0.9, build 2936816
> /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/common_docker.sh: 
> line 24: docker-compose: command not found
> Apr 03 03:12:38 [FAIL] Test script contains errors.
> Apr 03 03:12:38 Checking of logs skipped.
> Apr 03 03:12:38 
> Apr 03 03:12:38 [FAIL] 'PyFlink YARN per-job on Docker test' failed after 0 
> minutes and 1 seconds! Test exited with exit code 1
> {code}
> {code}
> /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/common_docker.sh: 
> line 24: docker-compose: command not found
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58709=logs=f8e16326-dc75-5ba0-3e95-6178dd55bf6c=94ccd692-49fc-5c64-8775-d427c6e65440=10236
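The failure is the CI image shipping Docker 24 without the standalone `docker-compose` (v1) binary; newer Docker provides Compose as the `docker compose` plugin. A small shim in the test scripts could support either form (function name and structure are illustrative, not the actual fix applied):

```shell
# Pick whichever Compose invocation is available on this machine.
compose_cmd() {
  if command -v docker-compose >/dev/null 2>&1; then
    echo "docker-compose"          # standalone v1 binary
  elif docker compose version >/dev/null 2>&1; then
    echo "docker compose"          # v2 CLI plugin
  else
    echo ""                        # neither available
  fi
}
COMPOSE=$(compose_cmd)
# Callers would then run: $COMPOSE up -d ...
```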





[jira] [Assigned] (FLINK-35040) The performance of serializerHeavyString regresses since April 3

2024-04-09 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reassigned FLINK-35040:
--

Assignee: Rui Fan

> The performance of serializerHeavyString regresses since April 3
> 
>
> Key: FLINK-35040
> URL: https://issues.apache.org/jira/browse/FLINK-35040
> Project: Flink
>  Issue Type: Bug
>  Components: Benchmarks
>Affects Versions: 1.20.0
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Blocker
> Attachments: image-2024-04-08-10-51-07-403.png, screenshot-1.png
>
>
> The performance of serializerHeavyString regresses since April 3, and had not 
> yet recovered on April 8th.
> It seems Java 11 regresses, and Java 8 and Java 17 are fine.
> http://flink-speed.xyz/timeline/#/?exe=1,6,12=serializerHeavyString=on=on=off=3=200
>  !screenshot-1.png! 





[jira] [Commented] (FLINK-34450) TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding failed

2024-04-08 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835010#comment-17835010
 ] 

Robert Metzger commented on FLINK-34450:


I was not able to reproduce the issue with a single push, so it is "just" a 
build instability.

> TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding failed
> ---
>
> Key: FLINK-34450
> URL: https://issues.apache.org/jira/browse/FLINK-34450
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Task
>Affects Versions: 1.20.0
>Reporter: Matthias Pohl
>Assignee: Robert Metzger
>Priority: Critical
>  Labels: github-actions, test-stability
> Fix For: 1.20.0
>
>
> https://github.com/XComp/flink/actions/runs/7927275243/job/21643615491#step:10:9880
> {code}
> Error: 07:48:06 07:48:06.643 [ERROR] Tests run: 11, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 0.309 s <<< FAILURE! -- in 
> org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest
> Error: 07:48:06 07:48:06.646 [ERROR] 
> org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding
>  -- Time elapsed: 0.036 s <<< FAILURE!
> Feb 16 07:48:06 Output was not correct.: array lengths differed, 
> expected.length=8 actual.length=7; arrays first differed at element [6]; 
> expected: but was:
> Feb 16 07:48:06   at 
> org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:78)
> Feb 16 07:48:06   at 
> org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:28)
> Feb 16 07:48:06   at org.junit.Assert.internalArrayEquals(Assert.java:534)
> Feb 16 07:48:06   at org.junit.Assert.assertArrayEquals(Assert.java:285)
> Feb 16 07:48:06   at 
> org.apache.flink.streaming.util.TestHarnessUtil.assertOutputEquals(TestHarnessUtil.java:59)
> Feb 16 07:48:06   at 
> org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding(TwoInputStreamTaskTest.java:248)
> Feb 16 07:48:06   at java.lang.reflect.Method.invoke(Method.java:498)
> Feb 16 07:48:06 Caused by: java.lang.AssertionError: expected: 
> but was:
> Feb 16 07:48:06   at org.junit.Assert.fail(Assert.java:89)
> Feb 16 07:48:06   at org.junit.Assert.failNotEquals(Assert.java:835)
> Feb 16 07:48:06   at org.junit.Assert.assertEquals(Assert.java:120)
> Feb 16 07:48:06   at org.junit.Assert.assertEquals(Assert.java:146)
> Feb 16 07:48:06   at 
> org.junit.internal.ExactComparisonCriteria.assertElementsEqual(ExactComparisonCriteria.java:8)
> Feb 16 07:48:06   at 
> org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:76)
> Feb 16 07:48:06   ... 6 more
> {code}
> I couldn't reproduce it locally with 2 runs.





[jira] [Assigned] (FLINK-34450) TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding failed

2024-04-05 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reassigned FLINK-34450:
--

Assignee: Robert Metzger

> TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding failed
> ---
>
> Key: FLINK-34450
> URL: https://issues.apache.org/jira/browse/FLINK-34450
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Task
>Affects Versions: 1.20.0
>Reporter: Matthias Pohl
>Assignee: Robert Metzger
>Priority: Critical
>  Labels: github-actions, test-stability
> Fix For: 1.20.0
>
>
> https://github.com/XComp/flink/actions/runs/7927275243/job/21643615491#step:10:9880
> {code}
> Error: 07:48:06 07:48:06.643 [ERROR] Tests run: 11, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 0.309 s <<< FAILURE! -- in 
> org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest
> Error: 07:48:06 07:48:06.646 [ERROR] 
> org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding
>  -- Time elapsed: 0.036 s <<< FAILURE!
> Feb 16 07:48:06 Output was not correct.: array lengths differed, 
> expected.length=8 actual.length=7; arrays first differed at element [6]; 
> expected: but was:
> Feb 16 07:48:06   at 
> org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:78)
> Feb 16 07:48:06   at 
> org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:28)
> Feb 16 07:48:06   at org.junit.Assert.internalArrayEquals(Assert.java:534)
> Feb 16 07:48:06   at org.junit.Assert.assertArrayEquals(Assert.java:285)
> Feb 16 07:48:06   at 
> org.apache.flink.streaming.util.TestHarnessUtil.assertOutputEquals(TestHarnessUtil.java:59)
> Feb 16 07:48:06   at 
> org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding(TwoInputStreamTaskTest.java:248)
> Feb 16 07:48:06   at java.lang.reflect.Method.invoke(Method.java:498)
> Feb 16 07:48:06 Caused by: java.lang.AssertionError: expected: 
> but was:
> Feb 16 07:48:06   at org.junit.Assert.fail(Assert.java:89)
> Feb 16 07:48:06   at org.junit.Assert.failNotEquals(Assert.java:835)
> Feb 16 07:48:06   at org.junit.Assert.assertEquals(Assert.java:120)
> Feb 16 07:48:06   at org.junit.Assert.assertEquals(Assert.java:146)
> Feb 16 07:48:06   at 
> org.junit.internal.ExactComparisonCriteria.assertElementsEqual(ExactComparisonCriteria.java:8)
> Feb 16 07:48:06   at 
> org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:76)
> Feb 16 07:48:06   ... 6 more
> {code}
> I couldn't reproduce it locally with 2 runs.





[jira] [Commented] (FLINK-34450) TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding failed

2024-04-05 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834196#comment-17834196
 ] 

Robert Metzger commented on FLINK-34450:


I'm checking how easy this is to reproduce with GHA, and whether Roman's latest 
changes have anything to do with it.
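One way to gauge how flaky the test is locally is to rerun the single test in a loop. A dry-run sketch (the `flink-streaming-java` module path is assumed; `DRY_RUN=echo` prints the Maven command instead of running it):

```shell
# Rerun one JUnit test repeatedly to estimate its failure rate.
DRY_RUN=echo      # drop the value to actually invoke Maven
TEST="TwoInputStreamTaskTest#testWatermarkAndWatermarkStatusForwarding"
RUNS=20
for i in $(seq 1 "$RUNS"); do
  $DRY_RUN mvn -pl flink-streaming-java test -Dtest="$TEST" -q || {
    echo "failed on run $i"
    break
  }
done
```

A CI matrix job that repeats the test step gives the same signal with GHA-side timing.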

> TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding failed
> ---
>
> Key: FLINK-34450
> URL: https://issues.apache.org/jira/browse/FLINK-34450
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Task
>Affects Versions: 1.20.0
>Reporter: Matthias Pohl
>Assignee: Robert Metzger
>Priority: Critical
>  Labels: github-actions, test-stability
> Fix For: 1.20.0
>
>
> https://github.com/XComp/flink/actions/runs/7927275243/job/21643615491#step:10:9880
> {code}
> Error: 07:48:06 07:48:06.643 [ERROR] Tests run: 11, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 0.309 s <<< FAILURE! -- in 
> org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest
> Error: 07:48:06 07:48:06.646 [ERROR] 
> org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding
>  -- Time elapsed: 0.036 s <<< FAILURE!
> Feb 16 07:48:06 Output was not correct.: array lengths differed, 
> expected.length=8 actual.length=7; arrays first differed at element [6]; 
> expected: but was:
> Feb 16 07:48:06   at 
> org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:78)
> Feb 16 07:48:06   at 
> org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:28)
> Feb 16 07:48:06   at org.junit.Assert.internalArrayEquals(Assert.java:534)
> Feb 16 07:48:06   at org.junit.Assert.assertArrayEquals(Assert.java:285)
> Feb 16 07:48:06   at 
> org.apache.flink.streaming.util.TestHarnessUtil.assertOutputEquals(TestHarnessUtil.java:59)
> Feb 16 07:48:06   at 
> org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding(TwoInputStreamTaskTest.java:248)
> Feb 16 07:48:06   at java.lang.reflect.Method.invoke(Method.java:498)
> Feb 16 07:48:06 Caused by: java.lang.AssertionError: expected: 
> but was:
> Feb 16 07:48:06   at org.junit.Assert.fail(Assert.java:89)
> Feb 16 07:48:06   at org.junit.Assert.failNotEquals(Assert.java:835)
> Feb 16 07:48:06   at org.junit.Assert.assertEquals(Assert.java:120)
> Feb 16 07:48:06   at org.junit.Assert.assertEquals(Assert.java:146)
> Feb 16 07:48:06   at 
> org.junit.internal.ExactComparisonCriteria.assertElementsEqual(ExactComparisonCriteria.java:8)
> Feb 16 07:48:06   at 
> org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:76)
> Feb 16 07:48:06   ... 6 more
> {code}
> I couldn't reproduce it locally with 2 runs.





[jira] [Updated] (FLINK-34450) TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding failed

2024-04-05 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-34450:
---
Priority: Critical  (was: Blocker)

> TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding failed
> ---
>
> Key: FLINK-34450
> URL: https://issues.apache.org/jira/browse/FLINK-34450
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Task
>Affects Versions: 1.20.0
>Reporter: Matthias Pohl
>Priority: Critical
>  Labels: github-actions, test-stability
> Fix For: 1.20.0
>
>
> https://github.com/XComp/flink/actions/runs/7927275243/job/21643615491#step:10:9880
> {code}
> Error: 07:48:06 07:48:06.643 [ERROR] Tests run: 11, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 0.309 s <<< FAILURE! -- in 
> org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest
> Error: 07:48:06 07:48:06.646 [ERROR] 
> org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding
>  -- Time elapsed: 0.036 s <<< FAILURE!
> Feb 16 07:48:06 Output was not correct.: array lengths differed, 
> expected.length=8 actual.length=7; arrays first differed at element [6]; 
> expected: but was:
> Feb 16 07:48:06   at 
> org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:78)
> Feb 16 07:48:06   at 
> org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:28)
> Feb 16 07:48:06   at org.junit.Assert.internalArrayEquals(Assert.java:534)
> Feb 16 07:48:06   at org.junit.Assert.assertArrayEquals(Assert.java:285)
> Feb 16 07:48:06   at 
> org.apache.flink.streaming.util.TestHarnessUtil.assertOutputEquals(TestHarnessUtil.java:59)
> Feb 16 07:48:06   at 
> org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding(TwoInputStreamTaskTest.java:248)
> Feb 16 07:48:06   at java.lang.reflect.Method.invoke(Method.java:498)
> Feb 16 07:48:06 Caused by: java.lang.AssertionError: expected: 
> but was:
> Feb 16 07:48:06   at org.junit.Assert.fail(Assert.java:89)
> Feb 16 07:48:06   at org.junit.Assert.failNotEquals(Assert.java:835)
> Feb 16 07:48:06   at org.junit.Assert.assertEquals(Assert.java:120)
> Feb 16 07:48:06   at org.junit.Assert.assertEquals(Assert.java:146)
> Feb 16 07:48:06   at 
> org.junit.internal.ExactComparisonCriteria.assertElementsEqual(ExactComparisonCriteria.java:8)
> Feb 16 07:48:06   at 
> org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:76)
> Feb 16 07:48:06   ... 6 more
> {code}
> I couldn't reproduce it locally with 2 runs.





[jira] [Updated] (FLINK-34450) TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding failed

2024-04-04 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-34450:
---
Fix Version/s: 1.20.0

> TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding failed
> ---
>
> Key: FLINK-34450
> URL: https://issues.apache.org/jira/browse/FLINK-34450
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Task
>Affects Versions: 1.20.0
>Reporter: Matthias Pohl
>Priority: Blocker
>  Labels: github-actions, test-stability
> Fix For: 1.20.0
>
>
> https://github.com/XComp/flink/actions/runs/7927275243/job/21643615491#step:10:9880
> {code}
> Error: 07:48:06 07:48:06.643 [ERROR] Tests run: 11, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 0.309 s <<< FAILURE! -- in 
> org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest
> Error: 07:48:06 07:48:06.646 [ERROR] 
> org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding
>  -- Time elapsed: 0.036 s <<< FAILURE!
> Feb 16 07:48:06 Output was not correct.: array lengths differed, 
> expected.length=8 actual.length=7; arrays first differed at element [6]; 
> expected: but was:
> Feb 16 07:48:06   at 
> org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:78)
> Feb 16 07:48:06   at 
> org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:28)
> Feb 16 07:48:06   at org.junit.Assert.internalArrayEquals(Assert.java:534)
> Feb 16 07:48:06   at org.junit.Assert.assertArrayEquals(Assert.java:285)
> Feb 16 07:48:06   at 
> org.apache.flink.streaming.util.TestHarnessUtil.assertOutputEquals(TestHarnessUtil.java:59)
> Feb 16 07:48:06   at 
> org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding(TwoInputStreamTaskTest.java:248)
> Feb 16 07:48:06   at java.lang.reflect.Method.invoke(Method.java:498)
> Feb 16 07:48:06 Caused by: java.lang.AssertionError: expected: 
> but was:
> Feb 16 07:48:06   at org.junit.Assert.fail(Assert.java:89)
> Feb 16 07:48:06   at org.junit.Assert.failNotEquals(Assert.java:835)
> Feb 16 07:48:06   at org.junit.Assert.assertEquals(Assert.java:120)
> Feb 16 07:48:06   at org.junit.Assert.assertEquals(Assert.java:146)
> Feb 16 07:48:06   at 
> org.junit.internal.ExactComparisonCriteria.assertElementsEqual(ExactComparisonCriteria.java:8)
> Feb 16 07:48:06   at 
> org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:76)
> Feb 16 07:48:06   ... 6 more
> {code}
> I couldn't reproduce it locally with 2 runs.





[jira] [Commented] (FLINK-34450) TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding failed

2024-04-04 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834061#comment-17834061
 ] 

Robert Metzger commented on FLINK-34450:


Builds on master are failing with this -- setting to blocker: 
https://github.com/apache/flink/actions/runs/8545965922/job/23415690241

> TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding failed
> ---
>
> Key: FLINK-34450
> URL: https://issues.apache.org/jira/browse/FLINK-34450
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Task
>Affects Versions: 1.20.0
>Reporter: Matthias Pohl
>Priority: Blocker
>  Labels: github-actions, test-stability
>
> https://github.com/XComp/flink/actions/runs/7927275243/job/21643615491#step:10:9880
> {code}
> Error: 07:48:06 07:48:06.643 [ERROR] Tests run: 11, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 0.309 s <<< FAILURE! -- in 
> org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest
> Error: 07:48:06 07:48:06.646 [ERROR] 
> org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding
>  -- Time elapsed: 0.036 s <<< FAILURE!
> Feb 16 07:48:06 Output was not correct.: array lengths differed, 
> expected.length=8 actual.length=7; arrays first differed at element [6]; 
> expected: but was:
> Feb 16 07:48:06   at 
> org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:78)
> Feb 16 07:48:06   at 
> org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:28)
> Feb 16 07:48:06   at org.junit.Assert.internalArrayEquals(Assert.java:534)
> Feb 16 07:48:06   at org.junit.Assert.assertArrayEquals(Assert.java:285)
> Feb 16 07:48:06   at 
> org.apache.flink.streaming.util.TestHarnessUtil.assertOutputEquals(TestHarnessUtil.java:59)
> Feb 16 07:48:06   at 
> org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding(TwoInputStreamTaskTest.java:248)
> Feb 16 07:48:06   at java.lang.reflect.Method.invoke(Method.java:498)
> Feb 16 07:48:06 Caused by: java.lang.AssertionError: expected: 
> but was:
> Feb 16 07:48:06   at org.junit.Assert.fail(Assert.java:89)
> Feb 16 07:48:06   at org.junit.Assert.failNotEquals(Assert.java:835)
> Feb 16 07:48:06   at org.junit.Assert.assertEquals(Assert.java:120)
> Feb 16 07:48:06   at org.junit.Assert.assertEquals(Assert.java:146)
> Feb 16 07:48:06   at 
> org.junit.internal.ExactComparisonCriteria.assertElementsEqual(ExactComparisonCriteria.java:8)
> Feb 16 07:48:06   at 
> org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:76)
> Feb 16 07:48:06   ... 6 more
> {code}
> I couldn't reproduce it locally with 2 runs.





[jira] [Updated] (FLINK-34450) TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding failed

2024-04-04 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-34450:
---
Priority: Blocker  (was: Major)

> TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding failed
> ---
>
> Key: FLINK-34450
> URL: https://issues.apache.org/jira/browse/FLINK-34450
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Task
>Affects Versions: 1.20.0
>Reporter: Matthias Pohl
>Priority: Blocker
>  Labels: github-actions, test-stability
>
> https://github.com/XComp/flink/actions/runs/7927275243/job/21643615491#step:10:9880
> {code}
> Error: 07:48:06 07:48:06.643 [ERROR] Tests run: 11, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 0.309 s <<< FAILURE! -- in 
> org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest
> Error: 07:48:06 07:48:06.646 [ERROR] 
> org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding
>  -- Time elapsed: 0.036 s <<< FAILURE!
> Feb 16 07:48:06 Output was not correct.: array lengths differed, 
> expected.length=8 actual.length=7; arrays first differed at element [6]; 
> expected: but was:
> Feb 16 07:48:06   at 
> org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:78)
> Feb 16 07:48:06   at 
> org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:28)
> Feb 16 07:48:06   at org.junit.Assert.internalArrayEquals(Assert.java:534)
> Feb 16 07:48:06   at org.junit.Assert.assertArrayEquals(Assert.java:285)
> Feb 16 07:48:06   at 
> org.apache.flink.streaming.util.TestHarnessUtil.assertOutputEquals(TestHarnessUtil.java:59)
> Feb 16 07:48:06   at 
> org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest.testWatermarkAndWatermarkStatusForwarding(TwoInputStreamTaskTest.java:248)
> Feb 16 07:48:06   at java.lang.reflect.Method.invoke(Method.java:498)
> Feb 16 07:48:06 Caused by: java.lang.AssertionError: expected: 
> but was:
> Feb 16 07:48:06   at org.junit.Assert.fail(Assert.java:89)
> Feb 16 07:48:06   at org.junit.Assert.failNotEquals(Assert.java:835)
> Feb 16 07:48:06   at org.junit.Assert.assertEquals(Assert.java:120)
> Feb 16 07:48:06   at org.junit.Assert.assertEquals(Assert.java:146)
> Feb 16 07:48:06   at 
> org.junit.internal.ExactComparisonCriteria.assertElementsEqual(ExactComparisonCriteria.java:8)
> Feb 16 07:48:06   at 
> org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:76)
> Feb 16 07:48:06   ... 6 more
> {code}
> I couldn't reproduce it locally with 2 runs.





[jira] [Assigned] (FLINK-34999) PR CI stopped operating

2024-04-04 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reassigned FLINK-34999:
--

Assignee: Lorenzo Affetti

> PR CI stopped operating
> ---
>
> Key: FLINK-34999
> URL: https://issues.apache.org/jira/browse/FLINK-34999
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Assignee: Lorenzo Affetti
>Priority: Blocker
>
> There are no [new PR CI 
> runs|https://dev.azure.com/apache-flink/apache-flink/_build?definitionId=2] 
> being picked up anymore. [Recently updated 
> PRs|https://github.com/apache/flink/pulls?q=sort%3Aupdated-desc] are not 
> picked up by the @flinkbot.
> In the meantime there was a notification sent from GitHub that the password 
> of the [@flinkbot|https://github.com/flinkbot] was reset for security 
> reasons. It's quite likely that these two events are related.





[jira] [Commented] (FLINK-34999) PR CI stopped operating

2024-04-03 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833629#comment-17833629
 ] 

Robert Metzger commented on FLINK-34999:


I have access to the Flinkbot GitHub account (Matthias Pohl and Chesnay have 
access too).

[~lorenzo.affetti] I pinged you in the Flink slack regarding the VV flinkbot 
stuff.

> PR CI stopped operating
> ---
>
> Key: FLINK-34999
> URL: https://issues.apache.org/jira/browse/FLINK-34999
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Priority: Blocker
>
> There are no [new PR CI 
> runs|https://dev.azure.com/apache-flink/apache-flink/_build?definitionId=2] 
> being picked up anymore. [Recently updated 
> PRs|https://github.com/apache/flink/pulls?q=sort%3Aupdated-desc] are not 
> picked up by the @flinkbot.
> In the meantime there was a notification sent from GitHub that the password 
> of the [@flinkbot|https://github.com/flinkbot] was reset for security 
> reasons. It's quite likely that these two events are related.





[jira] [Commented] (FLINK-34999) PR CI stopped operating

2024-04-03 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833587#comment-17833587
 ] 

Robert Metzger commented on FLINK-34999:


I'm trying to restore access to the flinkbot gh account.

> PR CI stopped operating
> ---
>
> Key: FLINK-34999
> URL: https://issues.apache.org/jira/browse/FLINK-34999
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Priority: Blocker
>
> There are no [new PR CI 
> runs|https://dev.azure.com/apache-flink/apache-flink/_build?definitionId=2] 
> being picked up anymore. [Recently updated 
> PRs|https://github.com/apache/flink/pulls?q=sort%3Aupdated-desc] are not 
> picked up by the @flinkbot.
> In the meantime there was a notification sent from GitHub that the password 
> of the [@flinkbot|https://github.com/flinkbot] was reset for security 
> reasons. It's quite likely that these two events are related.





[jira] [Commented] (FLINK-34960) NullPointerException while applying parallelism overrides for session jobs

2024-04-02 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833086#comment-17833086
 ] 

Robert Metzger commented on FLINK-34960:


Sorry, that's something you need to figure out yourself.

Thanks for validating that the missing configuration caused the issue. I think 
this is a valid bug we need to fix in Flink.
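
A minimal sketch of the suspected failure mode and a defensive fix, assuming (per the comment in this thread) that the NPE comes from writing the parallelism override into a `flinkConfiguration` map that is null when the spec omits it. The class and method names here are illustrative, not the operator's actual code; only the config key mirrors Flink's real `pipeline.jobvertex-parallelism-overrides` option.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: if the session job spec carries no flinkConfiguration,
// the map is null and a put() into it throws NullPointerException. Defaulting
// a missing config section to an empty map avoids the crash.
public class ParallelismOverrideSketch {

    static Map<String, String> applyOverride(Map<String, String> flinkConfiguration,
                                             String overrides) {
        // Defensive copy; treat a missing config section as an empty map
        Map<String, String> conf = (flinkConfiguration == null)
                ? new HashMap<>()
                : new HashMap<>(flinkConfiguration);
        conf.put("pipeline.jobvertex-parallelism-overrides", overrides);
        return conf;
    }

    public static void main(String[] args) {
        // Works even when the spec carried no flinkConfiguration at all
        Map<String, String> conf = applyOverride(null, "vertex-1:4");
        System.out.println(conf.get("pipeline.jobvertex-parallelism-overrides")); // prints "vertex-1:4"
    }
}
```

This also matches the workaround suggested later in the thread: adding any dummy `flinkConfiguration` entry makes the map non-null and sidesteps the NPE.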

> NullPointerException while applying parallelism overrides for session jobs
> --
>
> Key: FLINK-34960
> URL: https://issues.apache.org/jira/browse/FLINK-34960
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Affects Versions: 1.8.0
>Reporter: Kunal Rohitas
>Priority: Major
>
> While using the autoscaler for session jobs, the operator throws a 
> NullPointerException while trying to apply parallelism overrides, though it's 
> able to generate a parallelism suggestion report for scaling. The versions used 
> here are flink-1.18.1 and flink-kubernetes-operator-1.8.0. 
> {code:java}
> 2024-03-26 08:41:21,617 o.a.f.a.JobAutoScalerImpl 
> [ERROR][default/clientsession-job] Error applying overrides. 
> java.lang.NullPointerException at 
> org.apache.flink.kubernetes.operator.autoscaler.KubernetesScalingRealizer.realizeParallelismOverrides(KubernetesScalingRealizer.java:52)
>  at 
> org.apache.flink.kubernetes.operator.autoscaler.KubernetesScalingRealizer.realizeParallelismOverrides(KubernetesScalingRealizer.java:40)
>  at 
> org.apache.flink.autoscaler.JobAutoScalerImpl.applyParallelismOverrides(JobAutoScalerImpl.java:161)
>  at 
> org.apache.flink.autoscaler.JobAutoScalerImpl.scale(JobAutoScalerImpl.java:111)
>  at 
> org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler.applyAutoscaler(AbstractFlinkResourceReconciler.java:192)
>  at 
> org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler.reconcile(AbstractFlinkResourceReconciler.java:139)
>  at 
> org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:116)
>  at 
> org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:53)
>  at 
> io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:152)
>  at 
> io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:110)
>  at 
> org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics.timeControllerExecution(OperatorJosdkMetrics.java:80)
>  at 
> io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:109)
>  at 
> io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:140)
>  at 
> io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:121)
>  at 
> io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:91)
>  at 
> io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:64)
>  at 
> io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:417)
>  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
> Source) at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
> at java.base/java.lang.Thread.run(Unknown Source){code}
>  
> {code:java}
> 2024-03-26 08:41:21,617 o.a.f.a.JobAutoScalerImpl 
> [ERROR][default/clientsession-job] Error while scaling job 
> java.lang.NullPointerException at 
> org.apache.flink.kubernetes.operator.autoscaler.KubernetesScalingRealizer.realizeParallelismOverrides(KubernetesScalingRealizer.java:52)
>  at 
> org.apache.flink.kubernetes.operator.autoscaler.KubernetesScalingRealizer.realizeParallelismOverrides(KubernetesScalingRealizer.java:40)
>  at 
> org.apache.flink.autoscaler.JobAutoScalerImpl.applyParallelismOverrides(JobAutoScalerImpl.java:161)
>  at 
> org.apache.flink.autoscaler.JobAutoScalerImpl.scale(JobAutoScalerImpl.java:111)
>  at 
> org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler.applyAutoscaler(AbstractFlinkResourceReconciler.java:192)
>  at 
> org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler.reconcile(AbstractFlinkResourceReconciler.java:139)
>  at 
> org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:116)
>  at 
> org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:53)
>  at 
> io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:152)
>  at 
> 

[jira] [Commented] (FLINK-34960) NullPointerException while applying parallelism overrides for session jobs

2024-04-02 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833060#comment-17833060
 ] 

Robert Metzger commented on FLINK-34960:


Is it possible that your `FlinkSessionJobSpec` does not contain any 
`flinkConfiguration`? Can you try adding some dummy Flink config (e.g. foo: 
bar), just to validate whether this resolves the bug?

> NullPointerException while applying parallelism overrides for session jobs
> --
>
> Key: FLINK-34960
> URL: https://issues.apache.org/jira/browse/FLINK-34960
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Affects Versions: 1.8.0
>Reporter: Kunal Rohitas
>Priority: Major
>
> While using the autoscaler for session jobs, the operator throws a 
> NullPointerException while trying to apply parallelism overrides, though it's 
> able to generate a parallelism suggestion report for scaling. The versions used 
> here are flink-1.18.1 and flink-kubernetes-operator-1.8.0. 
> {code:java}
> 2024-03-26 08:41:21,617 o.a.f.a.JobAutoScalerImpl 
> [ERROR][default/clientsession-job] Error applying overrides. 
> java.lang.NullPointerException at 
> org.apache.flink.kubernetes.operator.autoscaler.KubernetesScalingRealizer.realizeParallelismOverrides(KubernetesScalingRealizer.java:52)
>  at 
> org.apache.flink.kubernetes.operator.autoscaler.KubernetesScalingRealizer.realizeParallelismOverrides(KubernetesScalingRealizer.java:40)
>  at 
> org.apache.flink.autoscaler.JobAutoScalerImpl.applyParallelismOverrides(JobAutoScalerImpl.java:161)
>  at 
> org.apache.flink.autoscaler.JobAutoScalerImpl.scale(JobAutoScalerImpl.java:111)
>  at 
> org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler.applyAutoscaler(AbstractFlinkResourceReconciler.java:192)
>  at 
> org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler.reconcile(AbstractFlinkResourceReconciler.java:139)
>  at 
> org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:116)
>  at 
> org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:53)
>  at 
> io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:152)
>  at 
> io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:110)
>  at 
> org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics.timeControllerExecution(OperatorJosdkMetrics.java:80)
>  at 
> io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:109)
>  at 
> io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:140)
>  at 
> io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:121)
>  at 
> io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:91)
>  at 
> io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:64)
>  at 
> io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:417)
>  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
> Source) at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
> at java.base/java.lang.Thread.run(Unknown Source){code}
>  
> {code:java}
> 2024-03-26 08:41:21,617 o.a.f.a.JobAutoScalerImpl 
> [ERROR][default/clientsession-job] Error while scaling job 
> java.lang.NullPointerException at 
> org.apache.flink.kubernetes.operator.autoscaler.KubernetesScalingRealizer.realizeParallelismOverrides(KubernetesScalingRealizer.java:52)
>  at 
> org.apache.flink.kubernetes.operator.autoscaler.KubernetesScalingRealizer.realizeParallelismOverrides(KubernetesScalingRealizer.java:40)
>  at 
> org.apache.flink.autoscaler.JobAutoScalerImpl.applyParallelismOverrides(JobAutoScalerImpl.java:161)
>  at 
> org.apache.flink.autoscaler.JobAutoScalerImpl.scale(JobAutoScalerImpl.java:111)
>  at 
> org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler.applyAutoscaler(AbstractFlinkResourceReconciler.java:192)
>  at 
> org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler.reconcile(AbstractFlinkResourceReconciler.java:139)
>  at 
> org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:116)
>  at 
> org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:53)
>  at 
> io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:152)
>  at 
> 

[jira] [Commented] (FLINK-34470) Transactional message + Table api kafka source with 'latest-offset' scan bound mode causes indefinitely hanging

2024-03-25 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830552#comment-17830552
 ] 

Robert Metzger commented on FLINK-34470:


[~renqs] can you take a look at this?

> Transactional message + Table api kafka source with 'latest-offset' scan 
> bound mode causes indefinitely hanging
> ---
>
> Key: FLINK-34470
> URL: https://issues.apache.org/jira/browse/FLINK-34470
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.17.1
>Reporter: dongwoo.kim
>Priority: Major
>
> h2. Summary  
> Hi, we have faced an issue with transactional messages and the Table API 
> Kafka source. If we configure *'scan.bounded.mode'* to *'latest-offset'*, the 
> Flink SQL request hangs and eventually times out. We can always reproduce 
> this unexpected behavior by following the steps below.
> This is related to this 
> [issue|https://issues.apache.org/jira/browse/FLINK-33484] too.
> h2. How to reproduce
> 1. Deploy a transactional producer and stop it after producing a certain 
> amount of messages
> 2. Configure *'scan.bounded.mode'* to *'latest-offset'* and submit a simple 
> query, such as getting the count of the produced messages
> 3. The Flink SQL job gets stuck and times out.
> h2. Cause
> A transactional producer always produces [control 
> records|https://kafka.apache.org/documentation/#controlbatch] at the end of 
> the transaction, and these control messages are not returned by 
> {*}consumer.poll(){*} (they are filtered internally). In the 
> *KafkaPartitionSplitReader* code, a split is finished only when the condition 
> *lastRecord.offset() >= stoppingOffset - 1* is met. This might work 
> well with non-transactional messages or in a streaming environment, but in 
> some batch use cases it causes an unexpected hang.
> h2. Proposed solution
> {code:java}
> if (consumer.position(tp) >= stoppingOffset) {
> recordsBySplits.setPartitionStoppingOffset(tp, 
> stoppingOffset);
> finishSplitAtRecord(
> tp,
> stoppingOffset,
> lastRecord.offset(),
> finishedPartitions,
> recordsBySplits);
> }
> {code}
> Replacing the if condition with *consumer.position(tp) >= stoppingOffset* in 
> [here|https://github.com/apache/flink-connector-kafka/blob/15f2662eccf461d9d539ed87a78c9851cd17fa43/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/reader/KafkaPartitionSplitReader.java#L137]
>  can solve the problem. 
> *consumer.position(tp)* returns the next record's offset if it exists, and 
> the last record's offset if the next record doesn't exist. 
> This allows KafkaPartitionSplitReader to finish the split even when the 
> stopping offset points at a control record's offset. 
> I would be happy to implement this fix if we can reach an agreement. 
> Thanks
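
The off-by-one described above can be illustrated with assumed offsets. This is a toy sketch, not the connector's actual code; only the two boolean conditions mirror the issue text, and the offset values are invented for illustration.

```java
public class StoppingOffsetSketch {

    // Current check (sketched from the issue): finish the split once the last
    // polled record reaches the record just before the stopping offset.
    static boolean oldCondition(long lastRecordOffset, long stoppingOffset) {
        return lastRecordOffset >= stoppingOffset - 1;
    }

    // Proposed check: compare the consumer's position, which advances past
    // filtered control records, against the stopping offset instead.
    static boolean newCondition(long positionAfterPoll, long stoppingOffset) {
        return positionAfterPoll >= stoppingOffset;
    }

    public static void main(String[] args) {
        // Illustrative offsets only: data records at 0..8, a control record at 9,
        // stopping offset (log end) 10. poll() never returns the control record,
        // so the last visible record stays at offset 8 while position reaches 10.
        long stoppingOffset = 10;
        System.out.println(oldCondition(8, stoppingOffset));  // prints "false": split never finishes
        System.out.println(newCondition(10, stoppingOffset)); // prints "true": split finishes
    }
}
```

Under these assumed offsets the old condition compares 8 against 9 and never fires, which is exactly the hang described; the position-based check fires as soon as the consumer has read past the control batch.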





[jira] [Resolved] (FLINK-29122) Improve robustness of FileUtils.expandDirectory()

2024-03-13 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger resolved FLINK-29122.

Resolution: Fixed

> Improve robustness of FileUtils.expandDirectory() 
> --
>
> Key: FLINK-29122
> URL: https://issues.apache.org/jira/browse/FLINK-29122
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core
>Affects Versions: 1.16.0, 1.17.0
>Reporter: Robert Metzger
>Assignee: Anupam Aggarwal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> `FileUtils.expandDirectory()` can potentially write to invalid locations if 
> the zip file is invalid (contains entry names with ../).
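
Flink's actual fix is not shown in this thread; as a general illustration of the vulnerability class ("zip slip"), a guard can resolve each archive entry against the target directory, normalize the result, and reject entries whose normalized path escapes the target. This is a hypothetical sketch, not the `FileUtils.expandDirectory()` implementation.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Generic zip-slip guard: an entry name like "../../etc/passwd" normalizes
// to a path outside the extraction target and must be rejected.
public class ZipSlipGuardSketch {

    static boolean isSafeEntry(Path targetDir, String entryName) {
        Path resolved = targetDir.resolve(entryName).normalize();
        return resolved.startsWith(targetDir.normalize());
    }

    public static void main(String[] args) {
        Path target = Paths.get("/tmp/expand");
        System.out.println(isSafeEntry(target, "conf/flink-conf.yaml")); // prints "true"
        System.out.println(isSafeEntry(target, "../../etc/passwd"));     // prints "false"
    }
}
```

The key detail is normalizing *after* resolving, so `..` segments embedded in the entry name cannot climb out of the target directory.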





[jira] [Commented] (FLINK-29122) Improve robustness of FileUtils.expandDirectory()

2024-03-13 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826762#comment-17826762
 ] 

Robert Metzger commented on FLINK-29122:


Merged to master in 
https://github.com/apache/flink/commit/0da60ca1a4754f858cf7c52dd4f0c97ae0e1b0cb

> Improve robustness of FileUtils.expandDirectory() 
> --
>
> Key: FLINK-29122
> URL: https://issues.apache.org/jira/browse/FLINK-29122
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core
>Affects Versions: 1.16.0, 1.17.0
>Reporter: Robert Metzger
>Assignee: Anupam Aggarwal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> `FileUtils.expandDirectory()` can potentially write to invalid locations if 
> the zip file is invalid (contains entry names with ../).





[jira] [Updated] (FLINK-29122) Improve robustness of FileUtils.expandDirectory()

2024-03-13 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-29122:
---
Fix Version/s: 1.20.0

> Improve robustness of FileUtils.expandDirectory() 
> --
>
> Key: FLINK-29122
> URL: https://issues.apache.org/jira/browse/FLINK-29122
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core
>Affects Versions: 1.16.0, 1.17.0
>Reporter: Robert Metzger
>Assignee: Anupam Aggarwal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> `FileUtils.expandDirectory()` can potentially write to invalid locations if 
> the zip file is invalid (contains entry names with ../).





[jira] [Commented] (FLINK-34148) Potential regression (Jan. 13): stringWrite with Java8

2024-02-21 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819269#comment-17819269
 ] 

Robert Metzger commented on FLINK-34148:


Thanks, but how does the broken shading behavior cause a performance regression 
in the {{StringSerializationBenchmark}} ?

> Potential regression (Jan. 13): stringWrite with Java8
> --
>
> Key: FLINK-34148
> URL: https://issues.apache.org/jira/browse/FLINK-34148
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Type Serialization System
>Reporter: Zakelly Lan
>Assignee: Sergey Nuyanzin
>Priority: Blocker
> Fix For: 1.19.0
>
>
> Significant drop of performance in stringWrite with Java8 from commit 
> [881062f352|https://github.com/apache/flink/commit/881062f352f8bf8c21ab7cbea95e111fd82fdf20]
>  to 
> [5d9d8748b6|https://github.com/apache/flink/commit/5d9d8748b64ff1a75964a5cd2857ab5061312b51]
>  It only involves relatively short strings (length 128 or 4).
> stringWrite.128.ascii(Java8) baseline=1089.107756 current_value=754.52452
> stringWrite.128.chinese(Java8) baseline=504.244575 current_value=295.358989
> stringWrite.128.russian(Java8) baseline=655.582639 current_value=421.030188
> stringWrite.4.chinese(Java8) baseline=9598.791964 current_value=6627.929927
> stringWrite.4.russian(Java8) baseline=11070.666415 current_value=8289.95767
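
For scale, the baselines listed above correspond to roughly a 25-40% throughput drop. A quick computation from those reported numbers (the figures are copied from the benchmark output; the helper class itself is just illustrative arithmetic):

```java
public class RegressionRatioSketch {

    // Percentage drop from the baseline to the current value
    static double dropPercent(double baseline, double current) {
        return (baseline - current) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Numbers copied from the benchmark report in the issue
        System.out.printf("stringWrite.128.ascii:   %.1f%%%n", dropPercent(1089.107756, 754.52452));
        System.out.printf("stringWrite.128.chinese: %.1f%%%n", dropPercent(504.244575, 295.358989));
        System.out.printf("stringWrite.128.russian: %.1f%%%n", dropPercent(655.582639, 421.030188));
        System.out.printf("stringWrite.4.chinese:   %.1f%%%n", dropPercent(9598.791964, 6627.929927));
        System.out.printf("stringWrite.4.russian:   %.1f%%%n", dropPercent(11070.666415, 8289.95767));
    }
}
```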





[jira] [Commented] (FLINK-34148) Potential regression (Jan. 13): stringWrite with Java8

2024-02-21 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819265#comment-17819265
 ] 

Robert Metzger commented on FLINK-34148:


Thanks for addressing this issue.
Maybe I'm overlooking something in the discussion or related tickets, but have 
we understood what caused the performance regression in flink-shaded 1.18? E.g. 
do we know what we need to fix in flink-shaded to move to flink-shaded 1.18.1 or 
1.19?

> Potential regression (Jan. 13): stringWrite with Java8
> --
>
> Key: FLINK-34148
> URL: https://issues.apache.org/jira/browse/FLINK-34148
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Type Serialization System
>Reporter: Zakelly Lan
>Assignee: Sergey Nuyanzin
>Priority: Blocker
> Fix For: 1.19.0
>
>
> Significant drop of performance in stringWrite with Java8 from commit 
> [881062f352|https://github.com/apache/flink/commit/881062f352f8bf8c21ab7cbea95e111fd82fdf20]
>  to 
> [5d9d8748b6|https://github.com/apache/flink/commit/5d9d8748b64ff1a75964a5cd2857ab5061312b51]
>  It only involves relatively short strings (length 128 or 4).
> stringWrite.128.ascii(Java8) baseline=1089.107756 current_value=754.52452
> stringWrite.128.chinese(Java8) baseline=504.244575 current_value=295.358989
> stringWrite.128.russian(Java8) baseline=655.582639 current_value=421.030188
> stringWrite.4.chinese(Java8) baseline=9598.791964 current_value=6627.929927
> stringWrite.4.russian(Java8) baseline=11070.666415 current_value=8289.95767





[jira] [Assigned] (FLINK-29122) Improve robustness of FileUtils.expandDirectory()

2024-02-12 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reassigned FLINK-29122:
--

Assignee: Anupam Aggarwal  (was: Robert Metzger)

> Improve robustness of FileUtils.expandDirectory() 
> --
>
> Key: FLINK-29122
> URL: https://issues.apache.org/jira/browse/FLINK-29122
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core
>Affects Versions: 1.16.0, 1.17.0
>Reporter: Robert Metzger
>Assignee: Anupam Aggarwal
>Priority: Major
>
> `FileUtils.expandDirectory()` can potentially write to invalid locations if 
> the zip file is invalid (contains entry names with ../).





[jira] [Assigned] (FLINK-33587) Tidy up docs around JDBC

2023-11-17 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reassigned FLINK-33587:
--

Assignee: Robin Moffatt

> Tidy up docs around JDBC
> 
>
> Key: FLINK-33587
> URL: https://issues.apache.org/jira/browse/FLINK-33587
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Robin Moffatt
>Assignee: Robin Moffatt
>Priority: Not a Priority
>
> The documentation around using JDBC with Flink is, IMHO, a bit disjointed and 
> could be easier to follow. Specifically on
>  * 
> [https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/jdbcdriver/]
>  ** Add note to this page regarding use of this vs Hive JDBC option through 
> HiveServer2 endpoint
>  * 
> [https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/hive-compatibility/hiveserver2/]
>  should simply redirect to 
> [https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql-gateway/hiveserver2/]
>  since this is a feature of the SQL Gateway. Having two pages of the same 
> title is confusing.
> There are also various nits in the above pages, such as missing 
> capitalisation, links to github repos that would be useful to add, etc.





[jira] [Commented] (FLINK-33335) Remove unused e2e tests

2023-10-24 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-5?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17779087#comment-17779087
 ] 

Robert Metzger commented on FLINK-5:


It's been a long time since we worked on this :) I don't recall the details, and 
there's no explicit comment about removing those tests.
My gut feeling is that removing these tests was not intentional, and if it is 
not very difficult to re-activate them, I would do so.

> Remove unused e2e tests
> ---
>
> Key: FLINK-5
> URL: https://issues.apache.org/jira/browse/FLINK-5
> Project: Flink
>  Issue Type: Improvement
>Reporter: Alexander Fedulov
>Assignee: Alexander Fedulov
>Priority: Major
>
> FLINK-17375 removed _run-pre-commit-tests.sh_ in Flink 1.12 [1]. Since then 
> the following tests are not executed anymore:
> _test_state_migration.sh_
> _test_state_evolution.sh_
> _test_streaming_kinesis.sh_
> _test_streaming_classloader.sh_
> _test_streaming_distributed_cache_via_blob.sh_
> [1]   
> https://github.com/apache/flink/pull/12268/files#diff-39f0aea40d2dd3f026544bb4c2502b2e9eab4c825df5f2b68c6d4ca8c39d7b5e





[jira] [Updated] (FLINK-33217) Flink SQL: UNNEST fails with on LEFT JOIN with NOT NULL type in array

2023-10-09 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-33217:
---
Attachment: UnnestNullErrorTest.scala

> Flink SQL: UNNEST fails with on LEFT JOIN with NOT NULL type in array
> -
>
> Key: FLINK-33217
> URL: https://issues.apache.org/jira/browse/FLINK-33217
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.15.3, 1.18.0, 1.19.0
>Reporter: Robert Metzger
>Priority: Major
> Attachments: UnnestNullErrorTest.scala
>
>
> Steps to reproduce:
> Take a column of type 
> {code:java}
> business_data ARRAY
> {code}
> Take this query
> {code:java}
> select bd_name from reproduce_unnest LEFT JOIN 
> UNNEST(reproduce_unnest.business_data) AS exploded_bd(bd_name) ON true
> {code}
> And get this error
> {code:java}
> Caused by: java.lang.AssertionError: Type mismatch:
> rowtype of rel before registration: RecordType(VARCHAR(2147483647) CHARACTER 
> SET "UTF-16LE" NOT NULL ARRAY business_data, VARCHAR(2147483647) CHARACTER 
> SET "UTF-16LE" bd_name) NOT NULL
> rowtype of rel after registration: RecordType(VARCHAR(2147483647) CHARACTER 
> SET "UTF-16LE" NOT NULL ARRAY business_data, VARCHAR(2147483647) CHARACTER 
> SET "UTF-16LE" NOT NULL f0) NOT NULL
> Difference:
> bd_name: VARCHAR(2147483647) CHARACTER SET "UTF-16LE" -> VARCHAR(2147483647) 
> CHARACTER SET "UTF-16LE" NOT NULL
>   at org.apache.calcite.util.Litmus$1.fail(Litmus.java:32)
>   at org.apache.calcite.plan.RelOptUtil.equal(RelOptUtil.java:2206)
>   at 
> org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:275)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1270)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:598)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:613)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.changeTraits(VolcanoPlanner.java:498)
>   at 
> org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:315)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkVolcanoProgram.optimize(FlinkVolcanoProgram.scala:62)
> {code}
> I have implemented a small test case, which fails against Flink 1.15, 1.18 and 
> the latest master branch.
> Workarounds:
> 1. Drop "NOT NULL" in array type
> 2. Drop "LEFT" from "LEFT JOIN".





[jira] [Comment Edited] (FLINK-33217) Flink SQL: UNNEST fails with on LEFT JOIN with NOT NULL type in array

2023-10-09 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773311#comment-17773311
 ] 

Robert Metzger edited comment on FLINK-33217 at 10/9/23 1:31 PM:
-

Actually, you can make the reproducer even simpler:

{code}business_data ARRAY{code}
also fails with the same error. I updated the code & description again.


was (Author: rmetzger):
Actually, you can make the reproducer even simpler:

{code}business_data ARRAY{code}
also works. I updated the code & description again.

> Flink SQL: UNNEST fails with on LEFT JOIN with NOT NULL type in array
> -
>
> Key: FLINK-33217
> URL: https://issues.apache.org/jira/browse/FLINK-33217
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.15.3, 1.18.0, 1.19.0
>Reporter: Robert Metzger
>Priority: Major
> Attachments: UnnestNullErrorTest.scala
>
>
> Steps to reproduce:
> Take a column of type 
> {code:java}
> business_data ARRAY
> {code}
> Take this query
> {code:java}
> select bd_name from reproduce_unnest LEFT JOIN 
> UNNEST(reproduce_unnest.business_data) AS exploded_bd(bd_name) ON true
> {code}
> And get this error
> {code:java}
> Caused by: java.lang.AssertionError: Type mismatch:
> rowtype of rel before registration: RecordType(VARCHAR(2147483647) CHARACTER 
> SET "UTF-16LE" NOT NULL ARRAY business_data, VARCHAR(2147483647) CHARACTER 
> SET "UTF-16LE" bd_name) NOT NULL
> rowtype of rel after registration: RecordType(VARCHAR(2147483647) CHARACTER 
> SET "UTF-16LE" NOT NULL ARRAY business_data, VARCHAR(2147483647) CHARACTER 
> SET "UTF-16LE" NOT NULL f0) NOT NULL
> Difference:
> bd_name: VARCHAR(2147483647) CHARACTER SET "UTF-16LE" -> VARCHAR(2147483647) 
> CHARACTER SET "UTF-16LE" NOT NULL
>   at org.apache.calcite.util.Litmus$1.fail(Litmus.java:32)
>   at org.apache.calcite.plan.RelOptUtil.equal(RelOptUtil.java:2206)
>   at 
> org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:275)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1270)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:598)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:613)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.changeTraits(VolcanoPlanner.java:498)
>   at 
> org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:315)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkVolcanoProgram.optimize(FlinkVolcanoProgram.scala:62)
> {code}
> I have implemented a small test case, which fails against Flink 1.15, 1.18 and 
> the latest master branch.
> Workarounds:
> 1. Drop "NOT NULL" in array type
> 2. Drop "LEFT" from "LEFT JOIN".





[jira] [Updated] (FLINK-33217) Flink SQL: UNNEST fails with on LEFT JOIN with NOT NULL type in array

2023-10-09 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-33217:
---
Attachment: (was: UnnestNullErrorTest-1.scala)

> Flink SQL: UNNEST fails with on LEFT JOIN with NOT NULL type in array
> -
>
> Key: FLINK-33217
> URL: https://issues.apache.org/jira/browse/FLINK-33217
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.15.3, 1.18.0, 1.19.0
>Reporter: Robert Metzger
>Priority: Major
> Attachments: UnnestNullErrorTest.scala
>
>
> Steps to reproduce:
> Take a column of type 
> {code:java}
> business_data ARRAY
> {code}
> Take this query
> {code:java}
> select bd_name from reproduce_unnest LEFT JOIN 
> UNNEST(reproduce_unnest.business_data) AS exploded_bd(bd_name) ON true
> {code}
> And get this error
> {code:java}
> Caused by: java.lang.AssertionError: Type mismatch:
> rowtype of rel before registration: RecordType(VARCHAR(2147483647) CHARACTER 
> SET "UTF-16LE" NOT NULL ARRAY business_data, VARCHAR(2147483647) CHARACTER 
> SET "UTF-16LE" bd_name) NOT NULL
> rowtype of rel after registration: RecordType(VARCHAR(2147483647) CHARACTER 
> SET "UTF-16LE" NOT NULL ARRAY business_data, VARCHAR(2147483647) CHARACTER 
> SET "UTF-16LE" NOT NULL f0) NOT NULL
> Difference:
> bd_name: VARCHAR(2147483647) CHARACTER SET "UTF-16LE" -> VARCHAR(2147483647) 
> CHARACTER SET "UTF-16LE" NOT NULL
>   at org.apache.calcite.util.Litmus$1.fail(Litmus.java:32)
>   at org.apache.calcite.plan.RelOptUtil.equal(RelOptUtil.java:2206)
>   at 
> org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:275)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1270)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:598)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:613)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.changeTraits(VolcanoPlanner.java:498)
>   at 
> org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:315)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkVolcanoProgram.optimize(FlinkVolcanoProgram.scala:62)
> {code}
> I have implemented a small test case, which fails against Flink 1.15, 1.18 and 
> the latest master branch.
> Workarounds:
> 1. Drop "NOT NULL" in array type
> 2. Drop "LEFT" from "LEFT JOIN".





[jira] [Updated] (FLINK-33217) Flink SQL: UNNEST fails with on LEFT JOIN with NOT NULL type in array

2023-10-09 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-33217:
---
Description: 
Steps to reproduce:

Take a column of type 

{code:java}
business_data ARRAY
{code}

Take this query

{code:java}
select bd_name from reproduce_unnest LEFT JOIN 
UNNEST(reproduce_unnest.business_data) AS exploded_bd(bd_name) ON true
{code}

And get this error

{code:java}
Caused by: java.lang.AssertionError: Type mismatch:
rowtype of rel before registration: RecordType(VARCHAR(2147483647) CHARACTER 
SET "UTF-16LE" NOT NULL ARRAY business_data, VARCHAR(2147483647) CHARACTER SET 
"UTF-16LE" bd_name) NOT NULL
rowtype of rel after registration: RecordType(VARCHAR(2147483647) CHARACTER SET 
"UTF-16LE" NOT NULL ARRAY business_data, VARCHAR(2147483647) CHARACTER SET 
"UTF-16LE" NOT NULL f0) NOT NULL
Difference:
bd_name: VARCHAR(2147483647) CHARACTER SET "UTF-16LE" -> VARCHAR(2147483647) 
CHARACTER SET "UTF-16LE" NOT NULL

at org.apache.calcite.util.Litmus$1.fail(Litmus.java:32)
at org.apache.calcite.plan.RelOptUtil.equal(RelOptUtil.java:2206)
at 
org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:275)
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1270)
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:598)
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:613)
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.changeTraits(VolcanoPlanner.java:498)
at 
org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:315)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkVolcanoProgram.optimize(FlinkVolcanoProgram.scala:62)
{code}
I have implemented a small test case, which fails against Flink 1.15, 1.18 and 
the latest master branch.

Workarounds:
1. Drop "NOT NULL" in array type
2. Drop "LEFT" from "LEFT JOIN".
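
A sketch of both workarounds against the table above (illustrative only — not 
re-verified on every affected version; {{reproduce_unnest}} is the table from 
the reproducer):

{code:sql}
-- Workaround 1: declare the array element as nullable in the DDL
-- business_data ARRAY<STRING>    instead of    ARRAY<STRING NOT NULL>

-- Workaround 2: drop "LEFT", i.e. use an inner join
select bd_name from reproduce_unnest JOIN
UNNEST(reproduce_unnest.business_data) AS exploded_bd(bd_name) ON true
{code}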

  was:
Steps to reproduce:

Take a column of type 

{code:java}
business_data ROW<`updateEvent` ARRAY<STRING NOT NULL>>
{code}

Take this query

{code:java}
select ue_name from reproduce_unnest LEFT JOIN 
UNNEST(reproduce_unnest.business_data.updateEvent) AS exploded_ue(ue_name) ON 
true
{code}

And get this error

{code:java}
Caused by: java.lang.AssertionError: Type mismatch:
rowtype of rel before registration: 
RecordType(RecordType:peek_no_expand(VARCHAR(2147483647) CHARACTER SET 
"UTF-16LE" NOT NULL ARRAY updateEvent) business_data, VARCHAR(2147483647) 
CHARACTER SET "UTF-16LE" ue_name) NOT NULL
rowtype of rel after registration: 
RecordType(RecordType:peek_no_expand(VARCHAR(2147483647) CHARACTER SET 
"UTF-16LE" NOT NULL ARRAY updateEvent) business_data, VARCHAR(2147483647) 
CHARACTER SET "UTF-16LE" NOT NULL f0) NOT NULL
Difference:
ue_name: VARCHAR(2147483647) CHARACTER SET "UTF-16LE" -> VARCHAR(2147483647) 
CHARACTER SET "UTF-16LE" NOT NULL

at org.apache.calcite.util.Litmus$1.fail(Litmus.java:32)
at org.apache.calcite.plan.RelOptUtil.equal(RelOptUtil.java:2206)
at 
org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:275)
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1270)
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:598)
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:613)
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.changeTraits(VolcanoPlanner.java:498)
at 
org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:315)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkVolcanoProgram.optimize(FlinkVolcanoProgram.scala:62)
{code}
I have implemented a small test case, which fails against Flink 1.15, 1.18 and 
the latest master branch.


> Flink SQL: UNNEST fails with on LEFT JOIN with NOT NULL type in array
> -
>
> Key: FLINK-33217
> URL: https://issues.apache.org/jira/browse/FLINK-33217
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.15.3, 1.18.0, 1.19.0
>Reporter: Robert Metzger
>Priority: Major
> Attachments: UnnestNullErrorTest-1.scala
>
>
> Steps to reproduce:
> Take a column of type 
> {code:java}
> business_data ARRAY<STRING NOT NULL>
> {code}
> Take this query
> {code:java}
> select bd_name from reproduce_unnest LEFT JOIN 
> UNNEST(reproduce_unnest.business_data) AS exploded_bd(bd_name) ON true
> {code}
> And get this error
> {code:java}
> Caused by: java.lang.AssertionError: Type mismatch:
> rowtype of rel before registration: RecordType(VARCHAR(2147483647) CHARACTER 
> SET "UTF-16LE" NOT NULL ARRAY business_data, VARCHAR(2147483647) CHARACTER 
> SET "UTF-16LE" bd_name) 

[jira] [Updated] (FLINK-33217) Flink SQL: UNNEST fails with on LEFT JOIN with NOT NULL type in array

2023-10-09 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-33217:
---
Attachment: UnnestNullErrorTest-1.scala

> Flink SQL: UNNEST fails with on LEFT JOIN with NOT NULL type in array
> -
>
> Key: FLINK-33217
> URL: https://issues.apache.org/jira/browse/FLINK-33217
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.15.3, 1.18.0, 1.19.0
>Reporter: Robert Metzger
>Priority: Major
> Attachments: UnnestNullErrorTest-1.scala, UnnestNullErrorTest.scala
>
>
> Steps to reproduce:
> Take a column of type 
> {code:java}
> business_data ROW<`id` STRING, `updateEvent` ARRAY<ROW<`name` STRING NOT NULL> NOT NULL>> {code}
> Take this query
> {code:java}
> select id, ue_name from reproduce_unnest LEFT JOIN 
> UNNEST(reproduce_unnest.business_data.updateEvent) AS exploded_ue(ue_name) ON 
> true {code}
> And get this error
> {code:java}
> Caused by: java.lang.AssertionError: Type mismatch:
> rowtype of rel before registration: 
> RecordType(RecordType:peek_no_expand(VARCHAR(2147483647) CHARACTER SET 
> "UTF-16LE" id, RecordType:peek_no_expand(VARCHAR(2147483647) CHARACTER SET 
> "UTF-16LE" NOT NULL name) NOT NULL ARRAY updateEvent) business_data, 
> VARCHAR(2147483647) CHARACTER SET "UTF-16LE" ue_name) NOT NULL
> rowtype of rel after registration: 
> RecordType(RecordType:peek_no_expand(VARCHAR(2147483647) CHARACTER SET 
> "UTF-16LE" id, RecordType:peek_no_expand(VARCHAR(2147483647) CHARACTER SET 
> "UTF-16LE" NOT NULL name) NOT NULL ARRAY updateEvent) business_data, 
> VARCHAR(2147483647) CHARACTER SET "UTF-16LE" NOT NULL name) NOT NULL
> Difference:
> ue_name: VARCHAR(2147483647) CHARACTER SET "UTF-16LE" -> VARCHAR(2147483647) 
> CHARACTER SET "UTF-16LE" NOT NULL
>   at org.apache.calcite.util.Litmus$1.fail(Litmus.java:32)
>   at org.apache.calcite.plan.RelOptUtil.equal(RelOptUtil.java:2206)
>   at 
> org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:275)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1270)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:598)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:613)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.changeTraits(VolcanoPlanner.java:498)
>   at 
> org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:315)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkVolcanoProgram.optimize(FlinkVolcanoProgram.scala:62)
>   ... 66 more
> {code}
> I have implemented a small test case, which fails against Flink 1.15, 1.18 and 
> the latest master branch.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33217) Flink SQL: UNNEST fails with on LEFT JOIN with NOT NULL type in array

2023-10-09 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-33217:
---
Attachment: UnnestNullErrorTest.scala

> Flink SQL: UNNEST fails with on LEFT JOIN with NOT NULL type in array
> -
>
> Key: FLINK-33217
> URL: https://issues.apache.org/jira/browse/FLINK-33217
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.15.3, 1.18.0, 1.19.0
>Reporter: Robert Metzger
>Priority: Major
> Attachments: UnnestNullErrorTest.scala
>
>
> Steps to reproduce:
> Take a column of type 
> {code:java}
> business_data ROW<`id` STRING, `updateEvent` ARRAY<ROW<`name` STRING NOT NULL> NOT NULL>> {code}
> Take this query
> {code:java}
> select id, ue_name from reproduce_unnest LEFT JOIN 
> UNNEST(reproduce_unnest.business_data.updateEvent) AS exploded_ue(ue_name) ON 
> true {code}
> And get this error
> {code:java}
> Caused by: java.lang.AssertionError: Type mismatch:
> rowtype of rel before registration: 
> RecordType(RecordType:peek_no_expand(VARCHAR(2147483647) CHARACTER SET 
> "UTF-16LE" id, RecordType:peek_no_expand(VARCHAR(2147483647) CHARACTER SET 
> "UTF-16LE" NOT NULL name) NOT NULL ARRAY updateEvent) business_data, 
> VARCHAR(2147483647) CHARACTER SET "UTF-16LE" ue_name) NOT NULL
> rowtype of rel after registration: 
> RecordType(RecordType:peek_no_expand(VARCHAR(2147483647) CHARACTER SET 
> "UTF-16LE" id, RecordType:peek_no_expand(VARCHAR(2147483647) CHARACTER SET 
> "UTF-16LE" NOT NULL name) NOT NULL ARRAY updateEvent) business_data, 
> VARCHAR(2147483647) CHARACTER SET "UTF-16LE" NOT NULL name) NOT NULL
> Difference:
> ue_name: VARCHAR(2147483647) CHARACTER SET "UTF-16LE" -> VARCHAR(2147483647) 
> CHARACTER SET "UTF-16LE" NOT NULL
>   at org.apache.calcite.util.Litmus$1.fail(Litmus.java:32)
>   at org.apache.calcite.plan.RelOptUtil.equal(RelOptUtil.java:2206)
>   at 
> org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:275)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1270)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:598)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:613)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.changeTraits(VolcanoPlanner.java:498)
>   at 
> org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:315)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkVolcanoProgram.optimize(FlinkVolcanoProgram.scala:62)
>   ... 66 more
> {code}
> I have implemented a small test case, which fails against Flink 1.15, 1.18 and 
> the latest master branch.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33217) Flink SQL: UNNEST fails with on LEFT JOIN with NOT NULL type in array

2023-10-09 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-33217:
--

 Summary: Flink SQL: UNNEST fails with on LEFT JOIN with NOT NULL 
type in array
 Key: FLINK-33217
 URL: https://issues.apache.org/jira/browse/FLINK-33217
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.15.3, 1.18.0, 1.19.0
Reporter: Robert Metzger


Steps to reproduce:

Take a column of type 
{code:java}
business_data ROW<`id` STRING, `updateEvent` ARRAY<ROW<`name` STRING NOT NULL> 
NOT NULL>> {code}
Take this query
{code:java}
select id, ue_name from reproduce_unnest LEFT JOIN 
UNNEST(reproduce_unnest.business_data.updateEvent) AS exploded_ue(ue_name) ON 
true {code}
And get this error
{code:java}
Caused by: java.lang.AssertionError: Type mismatch:
rowtype of rel before registration: 
RecordType(RecordType:peek_no_expand(VARCHAR(2147483647) CHARACTER SET 
"UTF-16LE" id, RecordType:peek_no_expand(VARCHAR(2147483647) CHARACTER SET 
"UTF-16LE" NOT NULL name) NOT NULL ARRAY updateEvent) business_data, 
VARCHAR(2147483647) CHARACTER SET "UTF-16LE" ue_name) NOT NULL
rowtype of rel after registration: 
RecordType(RecordType:peek_no_expand(VARCHAR(2147483647) CHARACTER SET 
"UTF-16LE" id, RecordType:peek_no_expand(VARCHAR(2147483647) CHARACTER SET 
"UTF-16LE" NOT NULL name) NOT NULL ARRAY updateEvent) business_data, 
VARCHAR(2147483647) CHARACTER SET "UTF-16LE" NOT NULL name) NOT NULL
Difference:
ue_name: VARCHAR(2147483647) CHARACTER SET "UTF-16LE" -> VARCHAR(2147483647) 
CHARACTER SET "UTF-16LE" NOT NULL

at org.apache.calcite.util.Litmus$1.fail(Litmus.java:32)
at org.apache.calcite.plan.RelOptUtil.equal(RelOptUtil.java:2206)
at org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:275)
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1270)
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:598)
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:613)
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.changeTraits(VolcanoPlanner.java:498)
at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:315)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkVolcanoProgram.optimize(FlinkVolcanoProgram.scala:62)
{code}
I have implemented a small test case, which fails against Flink 1.15, 1.18 and 
the latest master branch.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-28518) Exception: AssertionError: Cannot add expression of different type to set in sub-query with ROW type

2023-10-05 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-28518:
---
Fix Version/s: 1.18.0

> Exception: AssertionError: Cannot add expression of different type to set in 
> sub-query with ROW type
> 
>
> Key: FLINK-28518
> URL: https://issues.apache.org/jira/browse/FLINK-28518
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.15.1
>Reporter: Koylubaev Nikita
>Priority: Major
> Fix For: 1.18.0
>
> Attachments: SubQueryRowTypeTest.scala, test.sql
>
>
> All scripts are attached in the file test.sql.
> Create two tables:
>  
> {code:java}
> SET 'sql-client.execution.result-mode' = 'tableau';
> SET 'execution.runtime-mode' = 'batch';
> SET 'sql-client.execution.mode' = 'streaming';
> SET 'parallelism.default' = '8';
> SET 'table.dml-sync' = 'true';
> CREATE TEMPORARY TABLE fl (
> `id` INT,
> `name` STRING) 
> WITH (
> 'connector' = 'faker',
> 'number-of-rows' = '10',
> 'rows-per-second' = '1',
> 'fields.id.expression' = '#{number.numberBetween ''0'',''10''}',
> 'fields.name.expression' = '#{superhero.name}');
> CREATE TEMPORARY TABLE application (
> `id` INT,
> `fl_id` INT,
> `num` INT,
> `db` DOUBLE) 
> WITH (
> 'connector' = 'faker',
> 'number-of-rows' = '100',
> 'rows-per-second' = '100',
> 'fields.id.expression' = '#{number.numberBetween ''0'',''100''}',
> 'fields.fl_id.expression' = '#{number.numberBetween ''0'',''10''}',
> 'fields.num.expression' = '#{number.numberBetween 
> ''-2147483648'',''2147483647''}',
> 'fields.db.expression' = '#{number.randomDouble ''3'',''-1000'',''1000''}'); 
> {code}
> The following SQL throws an exception:
> {code:java}
> select fl.name,
>(select (COLLECT(application.num), COLLECT(application.db))
> from application
> where fl.id = application.fl_id)
> from fl;{code}
> Error stack trace is (I marked what is different in type: it's just NOT NULL):
>  
>  
> {code:java}
> [ERROR] Could not execute SQL statement. Reason:
> java.lang.AssertionError: Cannot add expression of different type to set:
> set type is RecordType(INTEGER id, VARCHAR(2147483647) CHARACTER SET 
> "UTF-16LE" name, RecordType(INTEGER MULTISET EXPR$0, DOUBLE MULTISET EXPR$1) 
> $f0) NOT NULL
> expression type is RecordType(INTEGER id, VARCHAR(2147483647) CHARACTER SET 
> "UTF-16LE" name, RecordType(INTEGER MULTISET EXPR$0, DOUBLE MULTISET EXPR$1) 
> NOT NULL $f0) NOT NULL
> set is 
> rel#129:LogicalCorrelate.NONE.any.[](left=HepRelVertex#119,right=HepRelVertex#128,correlation=$cor0,joinType=left,requiredColumns={0})
> expression is LogicalProject(id=[$0], name=[$1], $f0=[ROW($2, $3)])
>   LogicalCorrelate(correlation=[$cor0], joinType=[left], 
> requiredColumns=[{0}])
>     LogicalTableScan(table=[[default_catalog, default_database, fl]])
>     LogicalAggregate(group=[{}], agg#0=[COLLECT($0)], agg#1=[COLLECT($1)])
>       LogicalProject(num=[$2], db=[$3])
>         LogicalFilter(condition=[=($cor0.id, $1)])
>           LogicalTableScan(table=[[default_catalog, default_database, 
> application]])
>  {code}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-28518) Exception: AssertionError: Cannot add expression of different type to set in sub-query with ROW type

2023-10-05 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger closed FLINK-28518.
--
Resolution: Fixed

> Exception: AssertionError: Cannot add expression of different type to set in 
> sub-query with ROW type
> 
>
> Key: FLINK-28518
> URL: https://issues.apache.org/jira/browse/FLINK-28518
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.15.1
>Reporter: Koylubaev Nikita
>Priority: Major
> Fix For: 1.18.0
>
> Attachments: SubQueryRowTypeTest.scala, test.sql
>
>
> All scripts are attached in the file test.sql.
> Create two tables:
>  
> {code:java}
> SET 'sql-client.execution.result-mode' = 'tableau';
> SET 'execution.runtime-mode' = 'batch';
> SET 'sql-client.execution.mode' = 'streaming';
> SET 'parallelism.default' = '8';
> SET 'table.dml-sync' = 'true';
> CREATE TEMPORARY TABLE fl (
> `id` INT,
> `name` STRING) 
> WITH (
> 'connector' = 'faker',
> 'number-of-rows' = '10',
> 'rows-per-second' = '1',
> 'fields.id.expression' = '#{number.numberBetween ''0'',''10''}',
> 'fields.name.expression' = '#{superhero.name}');
> CREATE TEMPORARY TABLE application (
> `id` INT,
> `fl_id` INT,
> `num` INT,
> `db` DOUBLE) 
> WITH (
> 'connector' = 'faker',
> 'number-of-rows' = '100',
> 'rows-per-second' = '100',
> 'fields.id.expression' = '#{number.numberBetween ''0'',''100''}',
> 'fields.fl_id.expression' = '#{number.numberBetween ''0'',''10''}',
> 'fields.num.expression' = '#{number.numberBetween 
> ''-2147483648'',''2147483647''}',
> 'fields.db.expression' = '#{number.randomDouble ''3'',''-1000'',''1000''}'); 
> {code}
> The following SQL throws an exception:
> {code:java}
> select fl.name,
>(select (COLLECT(application.num), COLLECT(application.db))
> from application
> where fl.id = application.fl_id)
> from fl;{code}
> Error stack trace is (I marked what is different in type: it's just NOT NULL):
>  
>  
> {code:java}
> [ERROR] Could not execute SQL statement. Reason:
> java.lang.AssertionError: Cannot add expression of different type to set:
> set type is RecordType(INTEGER id, VARCHAR(2147483647) CHARACTER SET 
> "UTF-16LE" name, RecordType(INTEGER MULTISET EXPR$0, DOUBLE MULTISET EXPR$1) 
> $f0) NOT NULL
> expression type is RecordType(INTEGER id, VARCHAR(2147483647) CHARACTER SET 
> "UTF-16LE" name, RecordType(INTEGER MULTISET EXPR$0, DOUBLE MULTISET EXPR$1) 
> NOT NULL $f0) NOT NULL
> set is 
> rel#129:LogicalCorrelate.NONE.any.[](left=HepRelVertex#119,right=HepRelVertex#128,correlation=$cor0,joinType=left,requiredColumns={0})
> expression is LogicalProject(id=[$0], name=[$1], $f0=[ROW($2, $3)])
>   LogicalCorrelate(correlation=[$cor0], joinType=[left], 
> requiredColumns=[{0}])
>     LogicalTableScan(table=[[default_catalog, default_database, fl]])
>     LogicalAggregate(group=[{}], agg#0=[COLLECT($0)], agg#1=[COLLECT($1)])
>       LogicalProject(num=[$2], db=[$3])
>         LogicalFilter(condition=[=($cor0.id, $1)])
>           LogicalTableScan(table=[[default_catalog, default_database, 
> application]])
>  {code}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-28518) Exception: AssertionError: Cannot add expression of different type to set in sub-query with ROW type

2023-10-05 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17772179#comment-17772179
 ] 

Robert Metzger commented on FLINK-28518:


This is the ticket for the calcite upgrade, which has been closed with 1.18 
https://issues.apache.org/jira/browse/FLINK-27998

> Exception: AssertionError: Cannot add expression of different type to set in 
> sub-query with ROW type
> 
>
> Key: FLINK-28518
> URL: https://issues.apache.org/jira/browse/FLINK-28518
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.15.1
>Reporter: Koylubaev Nikita
>Priority: Major
> Attachments: SubQueryRowTypeTest.scala, test.sql
>
>
> All scripts are attached in the file test.sql.
> Create two tables:
>  
> {code:java}
> SET 'sql-client.execution.result-mode' = 'tableau';
> SET 'execution.runtime-mode' = 'batch';
> SET 'sql-client.execution.mode' = 'streaming';
> SET 'parallelism.default' = '8';
> SET 'table.dml-sync' = 'true';
> CREATE TEMPORARY TABLE fl (
> `id` INT,
> `name` STRING) 
> WITH (
> 'connector' = 'faker',
> 'number-of-rows' = '10',
> 'rows-per-second' = '1',
> 'fields.id.expression' = '#{number.numberBetween ''0'',''10''}',
> 'fields.name.expression' = '#{superhero.name}');
> CREATE TEMPORARY TABLE application (
> `id` INT,
> `fl_id` INT,
> `num` INT,
> `db` DOUBLE) 
> WITH (
> 'connector' = 'faker',
> 'number-of-rows' = '100',
> 'rows-per-second' = '100',
> 'fields.id.expression' = '#{number.numberBetween ''0'',''100''}',
> 'fields.fl_id.expression' = '#{number.numberBetween ''0'',''10''}',
> 'fields.num.expression' = '#{number.numberBetween 
> ''-2147483648'',''2147483647''}',
> 'fields.db.expression' = '#{number.randomDouble ''3'',''-1000'',''1000''}'); 
> {code}
> The following SQL throws an exception:
> {code:java}
> select fl.name,
>(select (COLLECT(application.num), COLLECT(application.db))
> from application
> where fl.id = application.fl_id)
> from fl;{code}
> Error stack trace is (I marked what is different in type: it's just NOT NULL):
>  
>  
> {code:java}
> [ERROR] Could not execute SQL statement. Reason:
> java.lang.AssertionError: Cannot add expression of different type to set:
> set type is RecordType(INTEGER id, VARCHAR(2147483647) CHARACTER SET 
> "UTF-16LE" name, RecordType(INTEGER MULTISET EXPR$0, DOUBLE MULTISET EXPR$1) 
> $f0) NOT NULL
> expression type is RecordType(INTEGER id, VARCHAR(2147483647) CHARACTER SET 
> "UTF-16LE" name, RecordType(INTEGER MULTISET EXPR$0, DOUBLE MULTISET EXPR$1) 
> NOT NULL $f0) NOT NULL
> set is 
> rel#129:LogicalCorrelate.NONE.any.[](left=HepRelVertex#119,right=HepRelVertex#128,correlation=$cor0,joinType=left,requiredColumns={0})
> expression is LogicalProject(id=[$0], name=[$1], $f0=[ROW($2, $3)])
>   LogicalCorrelate(correlation=[$cor0], joinType=[left], 
> requiredColumns=[{0}])
>     LogicalTableScan(table=[[default_catalog, default_database, fl]])
>     LogicalAggregate(group=[{}], agg#0=[COLLECT($0)], agg#1=[COLLECT($1)])
>       LogicalProject(num=[$2], db=[$3])
>         LogicalFilter(condition=[=($cor0.id, $1)])
>           LogicalTableScan(table=[[default_catalog, default_database, 
> application]])
>  {code}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-30719) flink-runtime-web failed due to a corrupted nodejs dependency

2023-09-05 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-30719:
---
Summary: flink-runtime-web failed due to a corrupted nodejs dependency  
(was: flink-runtime-web failed due to a corrupted )

> flink-runtime-web failed due to a corrupted nodejs dependency
> -
>
> Key: FLINK-30719
> URL: https://issues.apache.org/jira/browse/FLINK-30719
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend, Test Infrastructure, Tests
>Affects Versions: 1.16.0, 1.17.0, 1.18.0
>Reporter: Matthias Pohl
>Assignee: Sergey Nuyanzin
>Priority: Critical
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44954=logs=52b61abe-a3cc-5bde-cc35-1bbe89bb7df5=54421a62-0c80-5aad-3319-094ff69180bb=12550
> The build failed due to a corrupted nodejs dependency:
> {code}
> [ERROR] The archive file 
> /__w/1/.m2/repository/com/github/eirslett/node/16.13.2/node-16.13.2-linux-x64.tar.gz
>  is corrupted and will be deleted. Please try the build again.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-30636) Typo fix: "to to" -> "to"

2023-08-03 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger closed FLINK-30636.
--
Resolution: Fixed

> Typo fix: "to to" -> "to"
> -
>
> Key: FLINK-30636
> URL: https://issues.apache.org/jira/browse/FLINK-30636
> Project: Flink
>  Issue Type: Technical Debt
>Reporter: Gunnar Morling
>Assignee: Gunnar Morling
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> There's a surprising number of occurrences of "to to" in JavaDoc and the 
> like, where it actually should just be "to".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30636) Typo fix: "to to" -> "to"

2023-08-03 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17750689#comment-17750689
 ] 

Robert Metzger commented on FLINK-30636:


Merged to master in 
https://github.com/apache/flink/commit/b78cd3f536331d02eff9af4702904f331d90bc07

> Typo fix: "to to" -> "to"
> -
>
> Key: FLINK-30636
> URL: https://issues.apache.org/jira/browse/FLINK-30636
> Project: Flink
>  Issue Type: Technical Debt
>Reporter: Gunnar Morling
>Assignee: Gunnar Morling
>Priority: Minor
>  Labels: pull-request-available
>
> There's a surprising number of occurrences of "to to" in JavaDoc and the 
> like, where it actually should just be "to".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-30636) Typo fix: "to to" -> "to"

2023-08-03 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-30636:
---
Fix Version/s: 1.18.0

> Typo fix: "to to" -> "to"
> -
>
> Key: FLINK-30636
> URL: https://issues.apache.org/jira/browse/FLINK-30636
> Project: Flink
>  Issue Type: Technical Debt
>Reporter: Gunnar Morling
>Assignee: Gunnar Morling
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> There's a surprising number of occurrences of "to to" in JavaDoc and the 
> like, where it actually should just be "to".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32212) Job restarting indefinitely after an IllegalStateException from BlobLibraryCacheManager

2023-07-04 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17739982#comment-17739982
 ] 

Robert Metzger commented on FLINK-32212:


I also just ran into this situation with the Flink K8s Operator and resolved it 
by regenerating the JobGraph.
To do so, I first scaled the Flink JobManager deployment to 0.
Then, I removed the "jobGraph-yyy" key from the "xxx-cluster-config-map" 
ConfigMap.

Next, I scaled the deployment up to 1 again, and watched the job successfully 
(and with state) recover from the last checkpoint.
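
The steps above, sketched as kubectl commands (deployment and key names are 
placeholders following the "xxx"/"yyy" convention above; whether the jobGraph 
entry sits under {{data}} or {{binaryData}} depends on the setup, so inspect 
the ConfigMap before patching):

{code}
# 1. Scale the JobManager deployment to 0
kubectl scale deployment/<flink-jobmanager> --replicas=0

# 2. Remove the stale jobGraph entry from the HA ConfigMap
kubectl patch configmap xxx-cluster-config-map --type=json \
  -p='[{"op":"remove","path":"/binaryData/jobGraph-yyy"}]'

# 3. Scale back to 1; the job should recover from the last checkpoint
kubectl scale deployment/<flink-jobmanager> --replicas=1
{code}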

> Job restarting indefinitely after an IllegalStateException from 
> BlobLibraryCacheManager
> ---
>
> Key: FLINK-32212
> URL: https://issues.apache.org/jira/browse/FLINK-32212
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Task
>Affects Versions: 1.16.1
> Environment: Apache Flink Kubernetes Operator 1.4
>Reporter: Matheus Felisberto
>Priority: Major
>
> After running for a few hours, the job starts to throw IllegalStateException 
> and I can't figure out why. To restore the job, I need to manually delete the 
> FlinkDeployment so it is recreated, and redeploy everything.
> The jar is built-in into the docker image, hence is defined accordingly with 
> the Operator's documentation:
> {code:java}
> // jarURI: local:///opt/flink/usrlib/my-job.jar {code}
> I've tried to move it into /opt/flink/lib/my-job.jar but it didn't work 
> either. 
>  
> {code:java}
> // Source: my-topic (1/2)#30587 
> (b82d2c7f9696449a2d9f4dc298c0a008_bc764cd8ddf7a0cff126f51c16239658_0_30587) 
> switched from DEPLOYING to FAILED with failure cause: 
> java.lang.IllegalStateException: The library registration references a 
> different set of library BLOBs than previous registrations for this job:
> old:[p-5d91888083d38a3ff0b6c350f05a3013632137c6-7237ecbb12b0b021934b0c81aef78396]
> new:[p-5d91888083d38a3ff0b6c350f05a3013632137c6-943737c6790a3ec6870cecd652b956c2]
>     at 
> org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$ResolvedClassLoader.verifyClassLoader(BlobLibraryCacheManager.java:419)
>     at 
> org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$ResolvedClassLoader.access$500(BlobLibraryCacheManager.java:359)
>     at 
> org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$LibraryCacheEntry.getOrResolveClassLoader(BlobLibraryCacheManager.java:235)
>     at 
> org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$LibraryCacheEntry.access$1100(BlobLibraryCacheManager.java:202)
>     at 
> org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$DefaultClassLoaderLease.getOrResolveClassLoader(BlobLibraryCacheManager.java:336)
>     at 
> org.apache.flink.runtime.taskmanager.Task.createUserCodeClassloader(Task.java:1024)
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:612)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550)
>     at java.base/java.lang.Thread.run(Unknown Source) {code}
> If there is any other information that can help to identify the problem, 
> please let me know.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32439) Kubernetes operator is silently overwriting the "execution.savepoint.path" config

2023-06-26 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-32439:
--

 Summary: Kubernetes operator is silently overwriting the 
"execution.savepoint.path" config
 Key: FLINK-32439
 URL: https://issues.apache.org/jira/browse/FLINK-32439
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Reporter: Robert Metzger


I recently stumbled across the fact that the K8s operator is silently deleting 
/ overwriting the execution.savepoint.path config option.

I understand why this happens, but I wonder if the operator should write a log 
message if the user configured the execution.savepoint.path option.

And / or add a list to the docs about "Operator managed" config options?

https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/deployment/ApplicationReconciler.java#L155-L159



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-21883) Introduce cooldown period into adaptive scheduler

2023-06-15 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17733007#comment-17733007
 ] 

Robert Metzger commented on FLINK-21883:


There's FLIP for this Jira: 
https://cwiki.apache.org/confluence/plugins/servlet/mobile?contentId=255072164#content/view/255072164
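
For reference, the FLIP proposes time-based cooldown options roughly along 
these lines (option names taken from FLIP-322 and not guaranteed to match what 
ships; check the configuration docs of your Flink version):

{code}
# flink-conf.yaml (sketch, names per FLIP-322)
jobmanager.adaptive-scheduler.scaling-interval.min: 5 min
jobmanager.adaptive-scheduler.scaling-interval.max: 15 min
{code}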

> Introduce cooldown period into adaptive scheduler
> -
>
> Key: FLINK-21883
> URL: https://issues.apache.org/jira/browse/FLINK-21883
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Reporter: Robert Metzger
>Assignee: Etienne Chauchot
>Priority: Not a Priority
>  Labels: auto-deprioritized-major, auto-deprioritized-minor, 
> reactive
>
> This is a follow up to reactive mode, introduced in FLINK-10407.
> Introduce a cooldown timeout, during which no further scaling actions are 
> performed, after a scaling action.
> Without such a cooldown timeout, unfortunate timing can cause the job to be 
> rescaled very frequently, because TaskManagers do not all connect at the 
> same time.
> With the current implementation (1.13), this only applies to scaling up, but 
> this can also apply to scaling down with autoscaling support.
> With this implemented, users can define a cooldown timeout of say 5 minutes: 
> If taskmanagers are now slowly connecting one after another, we will only 
> rescale every 5 minutes.
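
The cooldown described above can be sketched roughly like this (an illustrative toy, not the AdaptiveScheduler's actual implementation; all names are made up):

```java
import java.time.Duration;
import java.time.Instant;

/** Toy sketch of a rescale cooldown: a rescale request is only honored if
 *  the configured cooldown since the last rescale has elapsed. */
public class RescaleCooldown {
    private final Duration cooldown;
    private Instant lastRescale = Instant.MIN; // never rescaled yet

    public RescaleCooldown(Duration cooldown) {
        this.cooldown = cooldown;
    }

    /** Returns true (and records the rescale) if the cooldown has elapsed. */
    public boolean tryRescale(Instant now) {
        if (Duration.between(lastRescale, now).compareTo(cooldown) < 0) {
            return false; // still cooling down: ignore this resource change
        }
        lastRescale = now;
        return true;
    }
}
```

With a 5-minute cooldown, TaskManagers connecting one after another within the window collapse into at most one rescale per 5 minutes.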





[jira] [Commented] (FLINK-19830) Properly implements processing-time temporal table join

2023-06-15 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732978#comment-17732978
 ] 

Robert Metzger commented on FLINK-19830:


Awesome, thanks for the update!

> Properly implements processing-time temporal table join
> ---
>
> Key: FLINK-19830
> URL: https://issues.apache.org/jira/browse/FLINK-19830
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Runtime
>Reporter: Leonard Xu
>Assignee: Leonard Xu
>Priority: Major
>
> The existing TemporalProcessTimeJoinOperator already supports temporal 
> table join.
>  However, the semantics of this implementation are problematic: the 
> join processing for the left stream doesn't wait for the complete snapshot of 
> the temporal table, which may mislead users in production environments.
> Under processing-time temporal join semantics, obtaining the complete 
> snapshot of the temporal table may require introducing a new mechanism in Flink SQL in 
> the future.
> **Background**: 
>  * The reason we turned off the switch[1] for `FOR SYSTEM_TIME AS OF` 
> syntax for *temporal table join* is only the semantic consideration above.
>  * The reason we keep the *temporal table function* enabled is that it has been 
> around for a long time; although it has the same semantic problem, we 
> still support it for compatibility.
> [1] 
> [https://github.com/apache/flink/blob/4fe9f525a92319acc1e3434bebed601306f7a16f/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/nodes/exec/stream/StreamExecTemporalJoin.java#L257]





[jira] [Commented] (FLINK-30218) [Kafka Connector] java.lang.OutOfMemoryError: Metaspace

2023-05-30 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17727517#comment-17727517
 ] 

Robert Metzger commented on FLINK-30218:


We had a pretty similar problem (also related to AWS auth) with the Kinesis 
connector: https://issues.apache.org/jira/browse/FLINK-19259. I guess the fix 
for Kafka is similar / the same.

> [Kafka Connector] java.lang.OutOfMemoryError: Metaspace
> ---
>
> Key: FLINK-30218
> URL: https://issues.apache.org/jira/browse/FLINK-30218
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.15.1
> Environment: +*AWS EMR*+
>  * Standard AWS EMR Cluster (1 master YARN node, 1 core node) - 2 vCore, 8 GB 
> memory each
>  * JDK 11 Coretto
> +*Kafka Consumer Config*+
> {code:java}
>     acks = 1
>     batch.size = 16384
>     bootstrap.servers = [...]
>     buffer.memory = 33554432
>     client.dns.lookup = use_all_dns_ips
>     client.id = producer-1
>     compression.type = none
>     connections.max.idle.ms = 54
>     delivery.timeout.ms = 12
>     enable.idempotence = false
>     interceptor.classes = []
>     internal.auto.downgrade.txn.commit = false
>     key.serializer = class 
> org.apache.kafka.common.serialization.ByteArraySerializer
>     linger.ms = 0
>     max.block.ms = 6
>     max.in.flight.requests.per.connection = 5
>     max.request.size = 1048576
>     metadata.max.age.ms = 30
>     metadata.max.idle.ms = 30
>     metric.reporters = []
>     metrics.num.samples = 2
>     metrics.recording.level = INFO
>     metrics.sample.window.ms = 3
>     partitioner.class = class 
> org.apache.kafka.clients.producer.internals.DefaultPartitioner
>     receive.buffer.bytes = 32768
>     reconnect.backoff.max.ms = 1000
>     reconnect.backoff.ms = 50
>     request.timeout.ms = 3
>     retries = 2147483647
>     retry.backoff.ms = 100
>     sasl.client.callback.handler.class = class 
> software.amazon.msk.auth.iam.IAMClientCallbackHandler
>     sasl.jaas.config = [hidden]
>     sasl.kerberos.kinit.cmd = /usr/bin/kinit
>     sasl.kerberos.min.time.before.relogin = 6
>     sasl.kerberos.service.name = null
>     sasl.kerberos.ticket.renew.jitter = 0.05
>     sasl.kerberos.ticket.renew.window.factor = 0.8
>     sasl.login.callback.handler.class = null
>     sasl.login.class = null
>     sasl.login.refresh.buffer.seconds = 300
>     sasl.login.refresh.min.period.seconds = 60
>     sasl.login.refresh.window.factor = 0.8
>     sasl.login.refresh.window.jitter = 0.05
>     sasl.mechanism = AWS_MSK_IAM
>     security.protocol = SASL_SSL
>     security.providers = null
>     send.buffer.bytes = 131072
>     socket.connection.setup.timeout.max.ms = 3
>     socket.connection.setup.timeout.ms = 1
>     ssl.cipher.suites = null
>     ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
>     ssl.endpoint.identification.algorithm = https
>     ssl.engine.factory.class = null
>     ssl.key.password = null
>     ssl.keymanager.algorithm = SunX509
>     ssl.keystore.certificate.chain = null
>     ssl.keystore.key = null
>     ssl.keystore.location = null
>     ssl.keystore.password = null
>     ssl.keystore.type = JKS
>     ssl.protocol = TLSv1.3
>     ssl.provider = null
>     ssl.secure.random.implementation = null
>     ssl.trustmanager.algorithm = PKIX
>     ssl.truststore.certificates = null
>     ssl.truststore.location = null
>     ssl.truststore.password = null
>     ssl.truststore.type = JKS
>     transaction.timeout.ms = 360
>     transactional.id = null
>     value.serializer = class 
> org.apache.kafka.common.serialization.ByteArraySerializer{code}
> +*Flink Config*+
> {code:java}
> taskmanager.memory.process.size=3g
> taskmanager.memory.jvm-metaspace.size=512m
> taskmanager.numberOfTaskSlots=2
> jobmanager.memory.process.size=3g
> jobmanager.memory.jvm-metaspace.size=256m
> jobmanager.web.address=0.0.0.0
> env.java.opts.all=-XX:+HeapDumpOnOutOfMemoryError 
> -XX:HeapDumpPath=${FLINK_LOG_PREFIX}.hprof {code}
>  
>Reporter: Lukas Mahl
>Priority: Major
>  Labels: bug
> Attachments: dump.hprof.zip, image-2022-11-25-15-55-43-559.png
>
>
> Hello!
> I'm running a Flink application on AWS EMR which consumes from a Kafka topic 
> using the official Flink Kafka consumer. I'm running the application as a 
> Flink batch job every 30 minutes, and I see that the JobManager's metaspace 
> increases every time I submit a new job and doesn't shrink once a job has 
> finished executing. Eventually the metaspace overflows and I get an 
> OutOfMemoryError: Metaspace exception. I've tried increasing the metaspace to 512m, 
> but this just delays the problem - hence it's definitely a classloading leak.
> I debugged the issue by creating a simple Flink application with a 

[jira] [Commented] (FLINK-31860) FlinkDeployments never finalize when namespace is deleted

2023-05-05 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719694#comment-17719694
 ] 

Robert Metzger commented on FLINK-31860:


We were also facing this problem, and we've solved it for now in this hacky way:

{code}
---
 .../kubernetes/operator/utils/EventUtils.java | 24 ++-
 .../templates/rbac.yaml                       |  1 +
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/utils/EventUtils.java b/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/utils/EventUtils.java
index d993de2..36c49a4 100644
--- a/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/utils/EventUtils.java
+++ b/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/utils/EventUtils.java
@@ -22,6 +22,8 @@
 import io.fabric8.kubernetes.api.model.HasMetadata;
 import io.fabric8.kubernetes.api.model.ObjectReferenceBuilder;
 import io.fabric8.kubernetes.client.KubernetesClient;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;

 import java.time.Instant;
 import java.util.function.Consumer;
@@ -31,6 +33,7 @@
  * https://github.com/EnMasseProject/enmasse/blob/master/k8s-api/src/main/java/io/enmasse/k8s/api/KubeEventLogger.java
  */
 public class EventUtils {
+    private static final Logger LOG = LoggerFactory.getLogger(EventUtils.class);

     public static String generateEventName(
             HasMetadata target,
@@ -58,14 +61,14 @@ public static boolean createOrUpdateEvent(
             String message,
             EventRecorder.Component component,
             Consumer eventListener) {
+        var namespace = target.getMetadata().getNamespace();
+        if (isNamespaceMarkedForDeletion(client, namespace)) {
+            LOG.info("Ignoring event because namespace is marked for deletion");
+            return true;
+        }
         var eventName = generateEventName(target, type, reason, message, component);

-        var existing =
-                client.v1()
-                        .events()
-                        .inNamespace(target.getMetadata().getNamespace())
-                        .withName(eventName)
-                        .get();
+        var existing = client.v1().events().inNamespace(namespace).withName(eventName).get();

         if (existing != null
                 && existing.getType().equals(type.name())
@@ -109,4 +112,13 @@ public static boolean createOrUpdateEvent(
             return true;
         }
     }
+
+    private static boolean isNamespaceMarkedForDeletion(KubernetesClient client, String namespace) {
+        try {
+            return client.namespaces().withName(namespace).get().isMarkedForDeletion();
+        } catch (Exception e) {
+            LOG.warn("Error while checking namespace status", e);
+            return false;
+        }
+    }
 }
diff --git a/helm/flink-kubernetes-operator/templates/rbac.yaml b/helm/flink-kubernetes-operator/templates/rbac.yaml
index f50852e..21d7071 100644
--- a/helm/flink-kubernetes-operator/templates/rbac.yaml
+++ b/helm/flink-kubernetes-operator/templates/rbac.yaml
@@ -29,6 +29,7 @@ rules:
   - events
   - configmaps
   - secrets
+  - namespaces
 verbs:
   - "*"
 {{- if .Values.rbac.nodesRule.create }}
{code}

> FlinkDeployments never finalize when namespace is deleted
> -
>
> Key: FLINK-31860
> URL: https://issues.apache.org/jira/browse/FLINK-31860
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.3.1
> Environment: Apache Flink Kubernetes Operator 1.3.1
> Kubernetes 1.24.9
>Reporter: Jayme Howard
>Assignee: Gyula Fora
>Priority: Blocker
>  Labels: pull-request-available
>
> This appears to be a pretty straightforward issue, but I don't know the 
> codebase well enough to propose a fix.  When a FlinkDeployment is present in 
> a namespace, and the namespace is deleted, the FlinkDeployment never 
> reconciles and fails to complete its finalizer.  This leads to the namespace 
> being blocked from deletion indefinitely, requiring manual manipulation to 
> remove the finalizer on the FlinkDeployment.
>  
> Namespace conditions:
> {code:java}
> conditions:
> - lastTransitionTime: '2023-04-18T22:17:48Z'
>   message: All resources successfully discovered
>   reason: ResourcesDiscovered
>   status: 'False'
>   type: NamespaceDeletionDiscoveryFailure
> - lastTransitionTime: '2023-03-23T18:27:37Z'
>   message: All legacy kube types successfully parsed
>   reason: ParsedGroupVersions
>   status: 'False'
>   type: NamespaceDeletionGroupVersionParsingFailure
> - lastTransitionTime: 

[jira] [Updated] (FLINK-31834) Azure Warning: no space left on device

2023-05-05 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-31834:
---
Fix Version/s: 1.16.2
   1.17.1

> Azure Warning: no space left on device
> --
>
> Key: FLINK-31834
> URL: https://issues.apache.org/jira/browse/FLINK-31834
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / Azure Pipelines
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Major
>  Labels: build-stability, pull-request-available
> Fix For: 1.16.2, 1.18.0, 1.17.1
>
>
> In this CI run: 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=48213=logs=af184cdd-c6d8-5084-0b69-7e9c67b35f7a=841082b6-1a93-5908-4d37-a071f4387a5f=21
> There was this warning:
> {code}
> Loaded image: confluentinc/cp-kafka:6.2.2
> Loaded image: testcontainers/ryuk:0.3.3
> ApplyLayer exit status 1 stdout:  stderr: write 
> /opt/jdk-15.0.1+9/lib/modules: no space left on device
> ##[error]Bash exited with code '1'.
> Finishing: Restore docker images
> {code}





[jira] [Commented] (FLINK-31834) Azure Warning: no space left on device

2023-05-05 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719688#comment-17719688
 ] 

Robert Metzger commented on FLINK-31834:


No, I think that's just an oversight.
I've just pushed to 1.17: 
https://github.com/apache/flink/commit/91dfb22e0bc7ac10a9a9f59cd9da6d62a723dadd

> Azure Warning: no space left on device
> --
>
> Key: FLINK-31834
> URL: https://issues.apache.org/jira/browse/FLINK-31834
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / Azure Pipelines
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Major
>  Labels: build-stability, pull-request-available
> Fix For: 1.18.0
>
>
> In this CI run: 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=48213=logs=af184cdd-c6d8-5084-0b69-7e9c67b35f7a=841082b6-1a93-5908-4d37-a071f4387a5f=21
> There was this warning:
> {code}
> Loaded image: confluentinc/cp-kafka:6.2.2
> Loaded image: testcontainers/ryuk:0.3.3
> ApplyLayer exit status 1 stdout:  stderr: write 
> /opt/jdk-15.0.1+9/lib/modules: no space left on device
> ##[error]Bash exited with code '1'.
> Finishing: Restore docker images
> {code}





[jira] [Commented] (FLINK-31999) CI fails in preparing the e2e test runs due openssl unavailable

2023-05-04 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719282#comment-17719282
 ] 

Robert Metzger commented on FLINK-31999:


I think some e2e test was failing with the Ubuntu-provided OpenSSL version. 
But maybe this problem has been addressed in Netty by now?

> CI fails in preparing the e2e test runs due openssl unavailable
> ---
>
> Key: FLINK-31999
> URL: https://issues.apache.org/jira/browse/FLINK-31999
> Project: Flink
>  Issue Type: Bug
>  Components: Test Infrastructure
>Affects Versions: 1.17.0, 1.16.1, 1.18.0
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Critical
>  Labels: test-stability
>
> We experience build failures due to the openssl download URL causing a 404:
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=48653=logs=bea52777-eaf8-5663-8482-18fbc3630e81=d6e79740-7cf7-5407-2e69-ca34c9be0efb]
> It is indeed due to the URL having changed slightly:
>  * old:   
> [http://security.ubuntu.com/ubuntu/pool/main/o/openssl1.0/libssl1.0.0_1.0.2n-1ubuntu5.11_amd64.deb]
>  * new: 
> [http://security.ubuntu.com/ubuntu/pool/main/o/openssl1.0/libssl1.0.0_1.0.2n-1ubuntu5.12_amd64.deb]





[jira] [Closed] (FLINK-21711) DataStreamSink doesn't allow setting maxParallelism

2023-04-27 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-21711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger closed FLINK-21711.
--
Fix Version/s: (was: 1.11.7)
   (was: 1.12.8)
   (was: 1.18.0)
   Resolution: Duplicate

Closing as duplicate of FLINK-31873

> DataStreamSink doesn't allow setting maxParallelism
> ---
>
> Key: FLINK-21711
> URL: https://issues.apache.org/jira/browse/FLINK-21711
> Project: Flink
>  Issue Type: Improvement
>  Components: API / DataStream
>Affects Versions: 1.11.3, 1.12.2, 1.13.0
>Reporter: Robert Metzger
>Priority: Minor
>  Labels: auto-deprioritized-major
>
> It seems that we can set the max parallelism of the sink only via this 
> internal API:
> {code}
> input.addSink(new 
> ParallelismTrackingSink<>()).getTransformation().setMaxParallelism(1);
> {code}





[jira] [Closed] (FLINK-31834) Azure Warning: no space left on device

2023-04-19 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger closed FLINK-31834.
--
Fix Version/s: 1.18.0
   Resolution: Fixed

Merged to master in 
https://github.com/apache/flink/commit/15c4d88eb78cb8f702e7e56563cb10e49d424b14

> Azure Warning: no space left on device
> --
>
> Key: FLINK-31834
> URL: https://issues.apache.org/jira/browse/FLINK-31834
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / Azure Pipelines
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Major
>  Labels: build-stability, pull-request-available
> Fix For: 1.18.0
>
>
> In this CI run: 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=48213=logs=af184cdd-c6d8-5084-0b69-7e9c67b35f7a=841082b6-1a93-5908-4d37-a071f4387a5f=21
> There was this warning:
> {code}
> Loaded image: confluentinc/cp-kafka:6.2.2
> Loaded image: testcontainers/ryuk:0.3.3
> ApplyLayer exit status 1 stdout:  stderr: write 
> /opt/jdk-15.0.1+9/lib/modules: no space left on device
> ##[error]Bash exited with code '1'.
> Finishing: Restore docker images
> {code}





[jira] [Commented] (FLINK-31810) RocksDBException: Bad table magic number on checkpoint rescale

2023-04-18 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17713613#comment-17713613
 ] 

Robert Metzger commented on FLINK-31810:


Thanks for confirming. I will try and see if this file is still around, or if I 
can reproduce the issue.

> RocksDBException: Bad table magic number on checkpoint rescale
> --
>
> Key: FLINK-31810
> URL: https://issues.apache.org/jira/browse/FLINK-31810
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.15.2
>Reporter: Robert Metzger
>Priority: Major
>
> While rescaling a job from checkpoint, I ran into this exception:
> {code:java}
> SinkMaterializer[7] -> rob-result[7]: Writer -> rob-result[7]: Committer 
> (4/4)#3 (c1b348f7eed6e1ce0e41ef75338ae754) switched from INITIALIZING to 
> FAILED with failure cause: java.lang.Exception: Exception while creating 
> StreamOperatorStateContext.
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:255)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:265)
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:703)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:679)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:646)
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:917)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
>   at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for 
> SinkUpsertMaterializer_7d9b7588bc2ff89baed50d7a4558caa4_(4/4) from any of the 
> 1 provided restore options.
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:346)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:164)
>   ... 11 more
> Caused by: org.apache.flink.runtime.state.BackendBuildingException: Caught 
> unexpected exception.
>   at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:395)
>   at 
> org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend.createKeyedStateBackend(EmbeddedRocksDBStateBackend.java:483)
>   at 
> org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend.createKeyedStateBackend(EmbeddedRocksDBStateBackend.java:97)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:329)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
>   ... 13 more
> Caused by: java.io.IOException: Error while opening RocksDB instance.
>   at 
> org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.openDB(RocksDBOperationUtils.java:92)
>   at 
> org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreDBInstanceFromStateHandle(RocksDBIncrementalRestoreOperation.java:465)
>   at 
> org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithRescaling(RocksDBIncrementalRestoreOperation.java:321)
>   at 
> org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restore(RocksDBIncrementalRestoreOperation.java:164)
>   at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:315)
>   ... 18 more
> Caused by: org.rocksdb.RocksDBException: Bad table magic number: expected 
> 9863518390377041911, found 4096 in 
> 

[jira] [Assigned] (FLINK-31834) Azure Warning: no space left on device

2023-04-18 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reassigned FLINK-31834:
--

Assignee: Robert Metzger

> Azure Warning: no space left on device
> --
>
> Key: FLINK-31834
> URL: https://issues.apache.org/jira/browse/FLINK-31834
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / Azure Pipelines
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Major
>  Labels: build-stability
>
> In this CI run: 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=48213=logs=af184cdd-c6d8-5084-0b69-7e9c67b35f7a=841082b6-1a93-5908-4d37-a071f4387a5f=21
> There was this warning:
> {code}
> Loaded image: confluentinc/cp-kafka:6.2.2
> Loaded image: testcontainers/ryuk:0.3.3
> ApplyLayer exit status 1 stdout:  stderr: write 
> /opt/jdk-15.0.1+9/lib/modules: no space left on device
> ##[error]Bash exited with code '1'.
> Finishing: Restore docker images
> {code}





[jira] [Commented] (FLINK-31834) Azure Warning: no space left on device

2023-04-18 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17713608#comment-17713608
 ] 

Robert Metzger commented on FLINK-31834:


Looking closer at this issue, I notice:

a) this message didn't fail the build; probably the caching just didn't work properly.
b) the cleanup script runs after this caching step. Before the cleanup, 3.9 GB of free 
disk space is reported: 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=48213=logs=af184cdd-c6d8-5084-0b69-7e9c67b35f7a=affb2083-df4e-5398-e502-35356824fd45=290
After the cleanup script, 32 GB of disk space is available: 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=48213=logs=af184cdd-c6d8-5084-0b69-7e9c67b35f7a=affb2083-df4e-5398-e502-35356824fd45=1804

It looks like I can move the cleanup script before the caching step. I'll try that.

> Azure Warning: no space left on device
> --
>
> Key: FLINK-31834
> URL: https://issues.apache.org/jira/browse/FLINK-31834
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / Azure Pipelines
>Reporter: Robert Metzger
>Priority: Major
>  Labels: build-stability
>
> In this CI run: 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=48213=logs=af184cdd-c6d8-5084-0b69-7e9c67b35f7a=841082b6-1a93-5908-4d37-a071f4387a5f=21
> There was this warning:
> {code}
> Loaded image: confluentinc/cp-kafka:6.2.2
> Loaded image: testcontainers/ryuk:0.3.3
> ApplyLayer exit status 1 stdout:  stderr: write 
> /opt/jdk-15.0.1+9/lib/modules: no space left on device
> ##[error]Bash exited with code '1'.
> Finishing: Restore docker images
> {code}





[jira] [Commented] (FLINK-31840) NullPointerException in operators.window.slicing.SliceAssigners$AbstractSliceAssigner.assignSliceEnd

2023-04-18 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17713599#comment-17713599
 ] 

Robert Metzger commented on FLINK-31840:


Yeah, I wanted to mention the stack trace and the fixed version somewhere, and Jira is 
ideal for that.

Ideally, your PR should have included a test case to make sure nobody breaks 
this in the future.

> NullPointerException in 
> operators.window.slicing.SliceAssigners$AbstractSliceAssigner.assignSliceEnd
> 
>
> Key: FLINK-31840
> URL: https://issues.apache.org/jira/browse/FLINK-31840
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Reporter: Robert Metzger
>Assignee: lincoln lee
>Priority: Major
> Fix For: 1.16.0
>
>
> While running a Flink SQL Query (with a hop window), I got this error.
> {code}
> Caused by: 
> org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: 
> Could not forward element to next operator
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:99)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:57)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
>   at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:56)
>   at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:29)
>   at StreamExecCalc$11.processElement(Unknown Source)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82)
>   ... 23 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.flink.table.runtime.operators.window.slicing.SliceAssigners$AbstractSliceAssigner.assignSliceEnd(SliceAssigners.java:558)
>   at 
> org.apache.flink.table.runtime.operators.aggregate.window.LocalSlicingWindowAggOperator.processElement(LocalSlicingWindowAggOperator.java:114)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82)
>   ... 29 more
> {code}
> It was caused by a timestamp field containing NULL values.
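
The NPE above boils down to unboxing a null timestamp when computing the slice end. A minimal, illustrative sketch of the failure mode and a guard (this is not the actual SliceAssigners code; the window size and all names are made up):

```java
import java.util.Optional;

/** Toy slice assigner: maps an event timestamp to the end of its window slice.
 *  A null timestamp would NPE on unboxing, so it is handled explicitly here. */
public class SliceEnd {
    static final long WINDOW_SIZE_MS = 60_000L; // illustrative 1-minute slices

    /** Returns the end timestamp of the slice containing tsMillis, or empty for null input. */
    static Optional<Long> assignSliceEnd(Long tsMillis) {
        if (tsMillis == null) {
            // Unboxing a null Long here is exactly what throws the NPE.
            return Optional.empty();
        }
        long end = (tsMillis / WINDOW_SIZE_MS + 1) * WINDOW_SIZE_MS;
        return Optional.of(end);
    }
}
```

At the SQL level, declaring the time attribute column NOT NULL, or filtering out null timestamps before the window, avoids hitting this path.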





[jira] [Assigned] (FLINK-31840) NullPointerException in operators.window.slicing.SliceAssigners$AbstractSliceAssigner.assignSliceEnd

2023-04-18 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reassigned FLINK-31840:
--

Assignee: lincoln lee

> NullPointerException in 
> operators.window.slicing.SliceAssigners$AbstractSliceAssigner.assignSliceEnd
> 
>
> Key: FLINK-31840
> URL: https://issues.apache.org/jira/browse/FLINK-31840
> Project: Flink
>  Issue Type: Bug
>Reporter: Robert Metzger
>Assignee: lincoln lee
>Priority: Major
> Fix For: 1.16.0
>
>
> While running a Flink SQL Query (with a hop window), I got this error.
> {code}
> Caused by: 
> org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: 
> Could not forward element to next operator
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:99)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:57)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
>   at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:56)
>   at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:29)
>   at StreamExecCalc$11.processElement(Unknown Source)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82)
>   ... 23 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.flink.table.runtime.operators.window.slicing.SliceAssigners$AbstractSliceAssigner.assignSliceEnd(SliceAssigners.java:558)
>   at 
> org.apache.flink.table.runtime.operators.aggregate.window.LocalSlicingWindowAggOperator.processElement(LocalSlicingWindowAggOperator.java:114)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82)
>   ... 29 more
> {code}
> It was caused by a timestamp field containing NULL values.





[jira] [Updated] (FLINK-31840) NullPointerException in operators.window.slicing.SliceAssigners$AbstractSliceAssigner.assignSliceEnd

2023-04-18 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-31840:
---
Component/s: Table SQL / Runtime

> NullPointerException in 
> operators.window.slicing.SliceAssigners$AbstractSliceAssigner.assignSliceEnd
> 
>
> Key: FLINK-31840
> URL: https://issues.apache.org/jira/browse/FLINK-31840
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Reporter: Robert Metzger
>Assignee: lincoln lee
>Priority: Major
> Fix For: 1.16.0
>
>
> While running a Flink SQL Query (with a hop window), I got this error.
> {code}
> Caused by: 
> org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: 
> Could not forward element to next operator
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:99)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:57)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
>   at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:56)
>   at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:29)
>   at StreamExecCalc$11.processElement(Unknown Source)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82)
>   ... 23 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.flink.table.runtime.operators.window.slicing.SliceAssigners$AbstractSliceAssigner.assignSliceEnd(SliceAssigners.java:558)
>   at 
> org.apache.flink.table.runtime.operators.aggregate.window.LocalSlicingWindowAggOperator.processElement(LocalSlicingWindowAggOperator.java:114)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82)
>   ... 29 more
> {code}
> It was caused by a timestamp field containing NULL values.





[jira] [Closed] (FLINK-31840) NullPointerException in operators.window.slicing.SliceAssigners$AbstractSliceAssigner.assignSliceEnd

2023-04-18 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger closed FLINK-31840.
--
Resolution: Fixed

> NullPointerException in 
> operators.window.slicing.SliceAssigners$AbstractSliceAssigner.assignSliceEnd
> 
>
> Key: FLINK-31840
> URL: https://issues.apache.org/jira/browse/FLINK-31840
> Project: Flink
>  Issue Type: Bug
>Reporter: Robert Metzger
>Priority: Major
> Fix For: 1.16.0
>
>
> While running a Flink SQL Query (with a hop window), I got this error.
> {code}
> Caused by: 
> org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: 
> Could not forward element to next operator
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:99)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:57)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
>   at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:56)
>   at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:29)
>   at StreamExecCalc$11.processElement(Unknown Source)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82)
>   ... 23 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.flink.table.runtime.operators.window.slicing.SliceAssigners$AbstractSliceAssigner.assignSliceEnd(SliceAssigners.java:558)
>   at 
> org.apache.flink.table.runtime.operators.aggregate.window.LocalSlicingWindowAggOperator.processElement(LocalSlicingWindowAggOperator.java:114)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82)
>   ... 29 more
> {code}
> It was caused by a timestamp field containing NULL values.





[jira] [Updated] (FLINK-31840) NullPointerException in operators.window.slicing.SliceAssigners$AbstractSliceAssigner.assignSliceEnd

2023-04-18 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-31840:
---
Fix Version/s: 1.16.0

> NullPointerException in 
> operators.window.slicing.SliceAssigners$AbstractSliceAssigner.assignSliceEnd
> 
>
> Key: FLINK-31840
> URL: https://issues.apache.org/jira/browse/FLINK-31840
> Project: Flink
>  Issue Type: Bug
>Reporter: Robert Metzger
>Priority: Major
> Fix For: 1.16.0
>
>
> While running a Flink SQL Query (with a hop window), I got this error.
> {code}
> Caused by: 
> org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: 
> Could not forward element to next operator
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:99)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:57)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
>   at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:56)
>   at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:29)
>   at StreamExecCalc$11.processElement(Unknown Source)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82)
>   ... 23 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.flink.table.runtime.operators.window.slicing.SliceAssigners$AbstractSliceAssigner.assignSliceEnd(SliceAssigners.java:558)
>   at 
> org.apache.flink.table.runtime.operators.aggregate.window.LocalSlicingWindowAggOperator.processElement(LocalSlicingWindowAggOperator.java:114)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82)
>   ... 29 more
> {code}
> It was caused by a timestamp field containing NULL values.





[jira] [Commented] (FLINK-31840) NullPointerException in operators.window.slicing.SliceAssigners$AbstractSliceAssigner.assignSliceEnd

2023-04-18 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17713563#comment-17713563
 ] 

Robert Metzger commented on FLINK-31840:


Note: I've filed this ticket just for tracking purposes, because I couldn't 
find any information about this error on the internet. Now the problem and 
its solution are at least publicly available.

> NullPointerException in 
> operators.window.slicing.SliceAssigners$AbstractSliceAssigner.assignSliceEnd
> 
>
> Key: FLINK-31840
> URL: https://issues.apache.org/jira/browse/FLINK-31840
> Project: Flink
>  Issue Type: Bug
>Reporter: Robert Metzger
>Priority: Major
>
> While running a Flink SQL Query (with a hop window), I got this error.
> {code}
> Caused by: 
> org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: 
> Could not forward element to next operator
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:99)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:57)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
>   at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:56)
>   at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:29)
>   at StreamExecCalc$11.processElement(Unknown Source)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82)
>   ... 23 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.flink.table.runtime.operators.window.slicing.SliceAssigners$AbstractSliceAssigner.assignSliceEnd(SliceAssigners.java:558)
>   at 
> org.apache.flink.table.runtime.operators.aggregate.window.LocalSlicingWindowAggOperator.processElement(LocalSlicingWindowAggOperator.java:114)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82)
>   ... 29 more
> {code}
> It was caused by a timestamp field containing NULL values.





[jira] [Commented] (FLINK-31840) NullPointerException in operators.window.slicing.SliceAssigners$AbstractSliceAssigner.assignSliceEnd

2023-04-18 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17713561#comment-17713561
 ] 

Robert Metzger commented on FLINK-31840:


This issue was fixed in https://github.com/apache/flink/pull/20302/files

> NullPointerException in 
> operators.window.slicing.SliceAssigners$AbstractSliceAssigner.assignSliceEnd
> 
>
> Key: FLINK-31840
> URL: https://issues.apache.org/jira/browse/FLINK-31840
> Project: Flink
>  Issue Type: Bug
>Reporter: Robert Metzger
>Priority: Major
>
> While running a Flink SQL Query (with a hop window), I got this error.
> {code}
> Caused by: 
> org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: 
> Could not forward element to next operator
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:99)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:57)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
>   at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:56)
>   at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:29)
>   at StreamExecCalc$11.processElement(Unknown Source)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82)
>   ... 23 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.flink.table.runtime.operators.window.slicing.SliceAssigners$AbstractSliceAssigner.assignSliceEnd(SliceAssigners.java:558)
>   at 
> org.apache.flink.table.runtime.operators.aggregate.window.LocalSlicingWindowAggOperator.processElement(LocalSlicingWindowAggOperator.java:114)
>   at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82)
>   ... 29 more
> {code}
> It was caused by a timestamp field containing NULL values.





[jira] [Created] (FLINK-31840) NullPointerException in operators.window.slicing.SliceAssigners$AbstractSliceAssigner.assignSliceEnd

2023-04-18 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-31840:
--

 Summary: NullPointerException in 
operators.window.slicing.SliceAssigners$AbstractSliceAssigner.assignSliceEnd
 Key: FLINK-31840
 URL: https://issues.apache.org/jira/browse/FLINK-31840
 Project: Flink
  Issue Type: Bug
Reporter: Robert Metzger


While running a Flink SQL Query (with a hop window), I got this error.
{code}
Caused by: 
org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: 
Could not forward element to next operator
at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:99)
at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:57)
at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
at 
org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:56)
at 
org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:29)
at StreamExecCalc$11.processElement(Unknown Source)
at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82)
... 23 more
Caused by: java.lang.NullPointerException
at 
org.apache.flink.table.runtime.operators.window.slicing.SliceAssigners$AbstractSliceAssigner.assignSliceEnd(SliceAssigners.java:558)
at 
org.apache.flink.table.runtime.operators.aggregate.window.LocalSlicingWindowAggOperator.processElement(LocalSlicingWindowAggOperator.java:114)
at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82)
... 29 more
{code}

It was caused by a timestamp field containing NULL values.
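A minimal Flink SQL sketch of a workaround (all table and column names are hypothetical, not from the ticket): filter out NULL event-time values before the hop window, so the slice assigner never has to compute a slice end for a row without a timestamp.

```sql
-- Hypothetical source table; the NPE surfaces when event_time is NULL
-- and the hop window's slice assigner computes a slice end for the row.
CREATE TABLE events (
  id BIGINT,
  event_time TIMESTAMP(3),
  WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
) WITH ('connector' = 'datagen');

-- Workaround: drop NULL timestamps before the windowed aggregation.
CREATE TEMPORARY VIEW clean_events AS
SELECT * FROM events WHERE event_time IS NOT NULL;

SELECT window_start, window_end, COUNT(*) AS cnt
FROM TABLE(
  HOP(TABLE clean_events, DESCRIPTOR(event_time),
      INTERVAL '1' MINUTE, INTERVAL '5' MINUTE))
GROUP BY window_start, window_end;
```

Alternatively, declaring the event-time column NOT NULL at ingestion time prevents such rows from reaching the window operator at all.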





[jira] [Created] (FLINK-31834) Azure Warning: no space left on device

2023-04-18 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-31834:
--

 Summary: Azure Warning: no space left on device
 Key: FLINK-31834
 URL: https://issues.apache.org/jira/browse/FLINK-31834
 Project: Flink
  Issue Type: Bug
  Components: Build System / Azure Pipelines
Reporter: Robert Metzger


In this CI run: 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=48213=logs=af184cdd-c6d8-5084-0b69-7e9c67b35f7a=841082b6-1a93-5908-4d37-a071f4387a5f=21

There was this warning:
{code}
Loaded image: confluentinc/cp-kafka:6.2.2
Loaded image: testcontainers/ryuk:0.3.3
ApplyLayer exit status 1 stdout:  stderr: write /opt/jdk-15.0.1+9/lib/modules: 
no space left on device
##[error]Bash exited with code '1'.
Finishing: Restore docker images
{code}





[jira] [Created] (FLINK-31810) RocksDBException: Bad table magic number on checkpoint rescale

2023-04-14 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-31810:
--

 Summary: RocksDBException: Bad table magic number on checkpoint 
rescale
 Key: FLINK-31810
 URL: https://issues.apache.org/jira/browse/FLINK-31810
 Project: Flink
  Issue Type: Bug
  Components: Runtime / State Backends
Affects Versions: 1.15.2
Reporter: Robert Metzger


While rescaling a job from checkpoint, I ran into this exception:

{code:java}
SinkMaterializer[7] -> rob-result[7]: Writer -> rob-result[7]: Committer 
(4/4)#3 (c1b348f7eed6e1ce0e41ef75338ae754) switched from INITIALIZING to FAILED 
with failure cause: java.lang.Exception: Exception while creating 
StreamOperatorStateContext.
at 
org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:255)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:265)
at 
org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:703)
at 
org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:679)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:646)
at 
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
at 
org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:917)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state 
backend for SinkUpsertMaterializer_7d9b7588bc2ff89baed50d7a4558caa4_(4/4) from 
any of the 1 provided restore options.
at 
org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
at 
org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:346)
at 
org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:164)
... 11 more
Caused by: org.apache.flink.runtime.state.BackendBuildingException: Caught 
unexpected exception.
at 
org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:395)
at 
org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend.createKeyedStateBackend(EmbeddedRocksDBStateBackend.java:483)
at 
org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend.createKeyedStateBackend(EmbeddedRocksDBStateBackend.java:97)
at 
org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:329)
at 
org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168)
at 
org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
... 13 more
Caused by: java.io.IOException: Error while opening RocksDB instance.
at 
org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.openDB(RocksDBOperationUtils.java:92)
at 
org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreDBInstanceFromStateHandle(RocksDBIncrementalRestoreOperation.java:465)
at 
org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithRescaling(RocksDBIncrementalRestoreOperation.java:321)
at 
org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restore(RocksDBIncrementalRestoreOperation.java:164)
at 
org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:315)
... 18 more
Caused by: org.rocksdb.RocksDBException: Bad table magic number: expected 
9863518390377041911, found 4096 in 
/tmp/job__op_SinkUpsertMaterializer_7d9b7588bc2ff89baed50d7a4558caa4__4_4__uuid_d5587dfc-78b3-427c-8cb6-35507b71bc4b/46475654-5515-430e-b215-389d42cddb97/000232.sst
at org.rocksdb.RocksDB.open(Native Method)
at org.rocksdb.RocksDB.open(RocksDB.java:306)
at 
org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.openDB(RocksDBOperationUtils.java:80)
... 22 more
{code}

I haven't found any other cases of this 

[jira] [Closed] (FLINK-31566) Publish the Dockerfiles for the new release

2023-03-24 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger closed FLINK-31566.
--
Resolution: Fixed

> Publish the Dockerfiles for the new release
> ---
>
> Key: FLINK-31566
> URL: https://issues.apache.org/jira/browse/FLINK-31566
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Major
>  Labels: pull-request-available
>
> Note: the official Dockerfiles fetch the binary distribution of the target 
> Flink version from an Apache mirror. After publishing the binary release 
> artifacts, mirrors can take some hours to start serving the new artifacts, so 
> you may want to wait to do this step until you are ready to continue with the 
> "Promote the release" steps in the follow-up Jira.
> Follow the instructions in the flink-docker repo to build the new Dockerfiles 
> and send an updated manifest to Docker Hub so the new images are built and 
> published.
>  
> 
> h3. Expectations
>  * Dockerfiles in [flink-docker|https://github.com/apache/flink-docker] 
> updated for the new Flink release and pull request opened on the Docker 
> official-images with an updated manifest





[jira] [Commented] (FLINK-31566) Publish the Dockerfiles for the new release

2023-03-24 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17704484#comment-17704484
 ] 

Robert Metzger commented on FLINK-31566:


Also the official image is online https://hub.docker.com/_/flink/tags

> Publish the Dockerfiles for the new release
> ---
>
> Key: FLINK-31566
> URL: https://issues.apache.org/jira/browse/FLINK-31566
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Major
>  Labels: pull-request-available
>
> Note: the official Dockerfiles fetch the binary distribution of the target 
> Flink version from an Apache mirror. After publishing the binary release 
> artifacts, mirrors can take some hours to start serving the new artifacts, so 
> you may want to wait to do this step until you are ready to continue with the 
> "Promote the release" steps in the follow-up Jira.
> Follow the instructions in the flink-docker repo to build the new Dockerfiles 
> and send an updated manifest to Docker Hub so the new images are built and 
> published.
>  
> 
> h3. Expectations
>  * Dockerfiles in [flink-docker|https://github.com/apache/flink-docker] 
> updated for the new Flink release and pull request opened on the Docker 
> official-images with an updated manifest





[jira] [Commented] (FLINK-31566) Publish the Dockerfiles for the new release

2023-03-23 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17704328#comment-17704328
 ] 

Robert Metzger commented on FLINK-31566:


Opened a PR with the official docker library: 
https://github.com/docker-library/official-images/pull/14324

Asked the Flink PMC to help with releasing from the apache/flink Docker Hub repo 
(I hope to have access again soon).

> Publish the Dockerfiles for the new release
> ---
>
> Key: FLINK-31566
> URL: https://issues.apache.org/jira/browse/FLINK-31566
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Major
>  Labels: pull-request-available
>
> Note: the official Dockerfiles fetch the binary distribution of the target 
> Flink version from an Apache mirror. After publishing the binary release 
> artifacts, mirrors can take some hours to start serving the new artifacts, so 
> you may want to wait to do this step until you are ready to continue with the 
> "Promote the release" steps in the follow-up Jira.
> Follow the instructions in the flink-docker repo to build the new Dockerfiles 
> and send an updated manifest to Docker Hub so the new images are built and 
> published.
>  
> 
> h3. Expectations
>  * Dockerfiles in [flink-docker|https://github.com/apache/flink-docker] 
> updated for the new Flink release and pull request opened on the Docker 
> official-images with an updated manifest





[jira] [Commented] (FLINK-30787) dmesg fails to save data to file due to permissions

2023-01-25 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17680583#comment-17680583
 ] 

Robert Metzger commented on FLINK-30787:


I'm pretty sure this worked before. Maybe it only worked on the Alibaba 
machines, not on the Azure machines (or vice versa).

> dmesg fails to save data to file due to permissions
> ---
>
> Key: FLINK-30787
> URL: https://issues.apache.org/jira/browse/FLINK-30787
> Project: Flink
>  Issue Type: Bug
>  Components: Test Infrastructure
>Affects Versions: 1.16.0, 1.17.0, 1.15.3
>Reporter: Matthias Pohl
>Priority: Major
>
> We're not collecting the {{dmesg}} output due to a permission issue in any 
> build:
> {code}
> 2023-01-12T10:10:25.1598207Z dmesg: read kernel buffer failed: Operation not 
> permitted
> {code}





[jira] [Commented] (FLINK-29427) LookupJoinITCase failed with classloader problem

2023-01-11 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17675557#comment-17675557
 ] 

Robert Metzger commented on FLINK-29427:


[~smiralex] what's the status of this issue?

> LookupJoinITCase failed with classloader problem
> 
>
> Key: FLINK-29427
> URL: https://issues.apache.org/jira/browse/FLINK-29427
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.16.0, 1.17.0
>Reporter: Huang Xingbo
>Assignee: Alexander Smirnov
>Priority: Blocker
>  Labels: pull-request-available, test-stability
>
> {code:java}
> 2022-09-27T02:49:20.9501313Z Sep 27 02:49:20 Caused by: 
> org.codehaus.janino.InternalCompilerException: Compiling 
> "KeyProjection$108341": Trying to access closed classloader. Please check if 
> you store classloaders directly or indirectly in static fields. If the 
> stacktrace suggests that the leak occurs in a third party library and cannot 
> be fixed immediately, you can disable this check with the configuration 
> 'classloader.check-leaked-classloader'.
> 2022-09-27T02:49:20.9502654Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:382)
> 2022-09-27T02:49:20.9503366Z Sep 27 02:49:20  at 
> org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:237)
> 2022-09-27T02:49:20.9504044Z Sep 27 02:49:20  at 
> org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:465)
> 2022-09-27T02:49:20.9504704Z Sep 27 02:49:20  at 
> org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:216)
> 2022-09-27T02:49:20.9505341Z Sep 27 02:49:20  at 
> org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:207)
> 2022-09-27T02:49:20.9505965Z Sep 27 02:49:20  at 
> org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80)
> 2022-09-27T02:49:20.9506584Z Sep 27 02:49:20  at 
> org.codehaus.commons.compiler.Cookable.cook(Cookable.java:75)
> 2022-09-27T02:49:20.9507261Z Sep 27 02:49:20  at 
> org.apache.flink.table.runtime.generated.CompileUtils.doCompile(CompileUtils.java:104)
> 2022-09-27T02:49:20.9507883Z Sep 27 02:49:20  ... 30 more
> 2022-09-27T02:49:20.9509266Z Sep 27 02:49:20 Caused by: 
> java.lang.IllegalStateException: Trying to access closed classloader. Please 
> check if you store classloaders directly or indirectly in static fields. If 
> the stacktrace suggests that the leak occurs in a third party library and 
> cannot be fixed immediately, you can disable this check with the 
> configuration 'classloader.check-leaked-classloader'.
> 2022-09-27T02:49:20.9510835Z Sep 27 02:49:20  at 
> org.apache.flink.util.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:184)
> 2022-09-27T02:49:20.9511760Z Sep 27 02:49:20  at 
> org.apache.flink.util.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.loadClass(FlinkUserCodeClassLoaders.java:192)
> 2022-09-27T02:49:20.9512456Z Sep 27 02:49:20  at 
> java.lang.Class.forName0(Native Method)
> 2022-09-27T02:49:20.9513014Z Sep 27 02:49:20  at 
> java.lang.Class.forName(Class.java:348)
> 2022-09-27T02:49:20.9513649Z Sep 27 02:49:20  at 
> org.codehaus.janino.ClassLoaderIClassLoader.findIClass(ClassLoaderIClassLoader.java:89)
> 2022-09-27T02:49:20.9514339Z Sep 27 02:49:20  at 
> org.codehaus.janino.IClassLoader.loadIClass(IClassLoader.java:312)
> 2022-09-27T02:49:20.9514990Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.findTypeByName(UnitCompiler.java:8556)
> 2022-09-27T02:49:20.9515659Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.getReferenceType(UnitCompiler.java:6749)
> 2022-09-27T02:49:20.9516337Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.getReferenceType(UnitCompiler.java:6594)
> 2022-09-27T02:49:20.9516989Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.getType2(UnitCompiler.java:6573)
> 2022-09-27T02:49:20.9517632Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.access$13900(UnitCompiler.java:215)
> 2022-09-27T02:49:20.9518319Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler$22$1.visitReferenceType(UnitCompiler.java:6481)
> 2022-09-27T02:49:20.9519018Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler$22$1.visitReferenceType(UnitCompiler.java:6476)
> 2022-09-27T02:49:20.9519680Z Sep 27 02:49:20  at 
> org.codehaus.janino.Java$ReferenceType.accept(Java.java:3928)
> 2022-09-27T02:49:20.9520386Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler$22.visitType(UnitCompiler.java:6476)
> 2022-09-27T02:49:20.9521042Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler$22.visitType(UnitCompiler.java:6469)
> 2022-09-27T02:49:20.9521677Z Sep 27 02:49:20  at 
> org.codehaus.janino.Java$ReferenceType.accept(Java.java:3927)
> 2022-09-27T02:49:20.9522299Z Sep 27 02:49:20  at 
> 

[jira] [Commented] (FLINK-26802) StreamPhysicalOverAggregate doesn't support consuming update and delete changes which is produced by node Deduplicate

2023-01-10 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656695#comment-17656695
 ] 

Robert Metzger commented on FLINK-26802:


I think this is a known limitation tracked here: 
https://issues.apache.org/jira/browse/FLINK-19059 

> StreamPhysicalOverAggregate doesn't support consuming update and delete 
> changes which is produced by node Deduplicate
> -
>
> Key: FLINK-26802
> URL: https://issues.apache.org/jira/browse/FLINK-26802
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.14.4
> Environment: flink version 1.14
> flink sql  job
>Reporter: zhiyuan
>Priority: Major
> Attachments: image-2022-03-22-17-28-17-759.png, 
> image-2022-03-22-17-37-09-443.png
>
>
> Problem description:
> Using top-N to de-duplicate, another layer of top-N queries is nested.
> tran_tm is defined as the watermark.
> SQL:
> !image-2022-03-22-17-28-17-759.png!
>  
> I tested it and found that using ProcTime worked, but using RowTime had 
> syntax problems
>  
> !image-2022-03-22-17-37-09-443.png!
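Consistent with the reporter's observation that proctime works: a first-row deduplication ordered by processing time emits only inserts (an append-only stream), which downstream operators such as OVER aggregates can consume, whereas event-time (rowtime) deduplication may emit updates and deletes. A sketch with hypothetical table and column names:

```sql
-- Keeping the FIRST row per key, ordered by proctime, yields an
-- append-only stream: once a key's first row is emitted it never changes,
-- so no retractions reach the downstream OVER aggregation.
SELECT order_id, amount
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY proc_time ASC) AS rn
  FROM orders
) WHERE rn = 1;
```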





[jira] [Commented] (FLINK-27969) StreamPhysicalOverAggregate doesn't support consuming update and delete changes

2023-01-10 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656694#comment-17656694
 ] 

Robert Metzger commented on FLINK-27969:


I think this is a known limitation, tracked here: 
https://issues.apache.org/jira/browse/FLINK-19059

> StreamPhysicalOverAggregate doesn't support consuming update and delete 
> changes
> ---
>
> Key: FLINK-27969
> URL: https://issues.apache.org/jira/browse/FLINK-27969
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Affects Versions: 1.14.3
>Reporter: Spongebob
>Priority: Major
>
> Exception trace:
> {code:java}
> // exception
> StreamPhysicalOverAggregate doesn't support consuming update and delete 
> changes which is produced by node Join(joinType=[LeftOuterJoin], where=[(COL2 
> = COL4)], select=[...], leftInputSpec=[NoUniqueKey], 
> rightInputSpec=[NoUniqueKey]) {code}
> FlinkSQL that scheduled as streaming table like this:
> {code:java}
> // dml
> SELECT RANK() OVER (PARTITION BY A.COL1 ORDER BY A.COL2) AS ODER_ONUM
> FROM A
> INNER JOIN B ON A.COL1 = B.COL1
> LEFT JOIN C ON C.COL3 = 1 AND CAST(A.COL2 AS STRING) = C.COL4{code}
>  





[jira] [Commented] (FLINK-30093) [Flink SQL][Protobuf] CompileException when querying Kafka topic using google.protobuf.Timestamp

2023-01-02 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653556#comment-17653556
 ] 

Robert Metzger commented on FLINK-30093:


I'm not a protobuf expert, so I cannot decide what's best from a user 
experience perspective.

Using {{row}} sounds like a workaround to me? 
Wouldn't it be nicer if the Timestamp-type is handled out of the box by Flink? 
If you think this is an ok user experience, then I agree with Benchao that we 
should at least document how to use the timestamp type with Flink.

> [Flink SQL][Protobuf] CompileException when querying Kafka topic using 
> google.protobuf.Timestamp 
> -
>
> Key: FLINK-30093
> URL: https://issues.apache.org/jira/browse/FLINK-30093
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table 
> SQL / Ecosystem
>Affects Versions: 1.16.0
> Environment: Mac OS Ventura
>Reporter: James Mcguire
>Assignee: hubert dulay
>Priority: Major
>  Labels: pull-request-available
> Attachments: taskmanager_172.22.0 (1).4_46291-40eec2_log
>
>
> I am encountering an issue when trying to use Flink SQL to query a Kafka 
> topic that uses {{{}google.protobuf.Timestamp{}}}.
>  
> When attempting to use Flink SQL to query a protobuf serialized Kafka topic 
> that uses  {{{}google.protobuf.Timestamp{}}}, a 
> {{org.codehaus.commons.compiler.CompileException: Line 23, Column 5: Cannot 
> determine simple type name "com" }}error occurs when trying to query the 
> table.
>  
> *Replication steps:*
> 1. Use a protobuf definition that contains a 
> {{{}google.protobuf.Timestamp{}}}:
> {noformat}
> syntax = "proto3";
> package example.message;
> import "google/protobuf/timestamp.proto";
> option java_package = "com.example.message";
> option java_multiple_files = true;
> message Test {
>   int64 id = 1;
>   google.protobuf.Timestamp created_at = 5;
> }{noformat}
> 2. Use protobuf definition to produce message to topic
> 3. Confirm message is deserializable by protoc:
> {code:java}
> kcat -C -t development.example.message -b localhost:9092 -o -1 -e -q -D "" | 
> protoc --decode=example.message.Test 
> --proto_path=/Users/jamesmcguire/repos/flink-proto-example/schemas/ 
> example/message/test.proto 
> id: 123
> created_at {
>   seconds: 456
>   nanos: 789
> }{code}
> 4. Create table in Flink SQL using kafka connector and protobuf format
> {code:java}
> CREATE TABLE tests (
>   id BIGINT,
>   created_at row<seconds BIGINT, nanos INT>
> )
> COMMENT ''
> WITH (
>   'connector' = 'kafka',
>   'format' = 'protobuf',
>   'protobuf.message-class-name' = 'com.example.message.Test',
>   'properties.auto.offset.reset' = 'earliest',
>   'properties.bootstrap.servers' = 'host.docker.internal:9092',
>   'properties.group.id' = 'test-1',
>   'topic' = 'development.example.message'
> );{code}
> 5. Run query in Flink SQL and encounter error:
> {code:java}
> Flink SQL> select * from tests;
> [ERROR] Could not execute SQL statement. Reason:
> org.codehaus.commons.compiler.CompileException: Line 23, Column 5: Cannot 
> determine simple type name "com" {code}
> {*}NOTE{*}: If you repeat steps 4-5 without {{created_at row<seconds BIGINT, nanos INT>}} in the table, step 5 will complete successfully.
> 6. Observe in the attached log file that Flink appears to be using the incorrect 
> namespace (should be {{google.protobuf.Timestamp}}):
> {code:java}
> com.example.message.Timestamp message3 = message0.getCreatedAt(); {code}
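For reference, a hedged sketch of the resolution rule the generated line above gets wrong; `ProtoJavaNames.resolveJavaClass` is a hypothetical helper, not Flink's actual codegen API. Well-known types such as `google.protobuf.Timestamp` declare their own `java_package` (`com.google.protobuf`), so the `java_package` option of the file that merely *uses* them must not be applied:

```java
// Illustrative sketch of Java class-name resolution for protobuf messages.
// resolveJavaClass is a hypothetical helper, not Flink's generated-code API.
public class ProtoJavaNames {
    static String resolveJavaClass(String protoFullName, String userJavaPackage) {
        String simpleName = protoFullName.substring(protoFullName.lastIndexOf('.') + 1);
        if (protoFullName.startsWith("google.protobuf.")) {
            // Well-known types ship with the protobuf runtime under their own package.
            return "com.google.protobuf." + simpleName;
        }
        // Ordinary messages follow the java_package option of their own .proto file.
        return userJavaPackage + "." + simpleName;
    }
}
```

Under this rule, `created_at` would resolve to `com.google.protobuf.Timestamp` instead of the non-existent `com.example.message.Timestamp`, which is what the compile error above points at.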





[jira] [Closed] (FLINK-29363) Allow web ui to fully redirect to other page

2023-01-02 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger closed FLINK-29363.
--
Fix Version/s: 1.17.0
 Assignee: Zhenqiu Huang
   Resolution: Fixed

Merged to master for 1.17 in 
https://github.com/apache/flink/commit/ded2df542fd5d585842e77d021fb84a92a5bea76

> Allow web ui to fully redirect to other page
> 
>
> Key: FLINK-29363
> URL: https://issues.apache.org/jira/browse/FLINK-29363
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Affects Versions: 1.15.3
>Reporter: Zhenqiu Huang
>Assignee: Zhenqiu Huang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.17.0
>
>
> In a streaming platform system, the web UI usually integrates with internal 
> authentication and authorization systems. When validation fails, the 
> request needs to be redirected to a landing page, but this doesn't work for AJAX 
> requests. It would be great to make the web UI configurable to allow an 
> automatic full redirect. 





[jira] [Closed] (FLINK-30443) Expand list of sensitive keys

2022-12-22 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger closed FLINK-30443.
--
Fix Version/s: 1.17.0
   Resolution: Fixed

Merged to master (1.17) in 
https://github.com/apache/flink/commit/d7b63aee3b02e71f837e1f0b18f1b93790765d9f

> Expand list of sensitive keys
> -
>
> Key: FLINK-30443
> URL: https://issues.apache.org/jira/browse/FLINK-30443
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Core
>Reporter: Gunnar Morling
>Assignee: Gunnar Morling
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0
>
>
> In {{{}GlobalConfiguration{}}}, there is [a list of known configuration 
> keys|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/configuration/GlobalConfiguration.java#L47-L48]
>  whose values will be masked in log output. In our Flink deployment there's a 
> few more keys which we would like to mask, specifically, the following ones:
> * "auth-params"
> * "service-key"
> * "token"
> * "basic-auth"
> * "jaas.config"
> While those are somewhat use-case specific, I feel they are generic enough 
> for being added to that list, and there already is precedence in form of 
> "fs.azure.account.key". In that light, I don't think it's worth making this 
> somehow pluggable, but I'm curious what other folks here think. Thanks!
>  
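The masking described in the issue can be sketched as a simple substring check. The class and method names below are illustrative (Flink's actual implementation lives in {{GlobalConfiguration}}), and the key list includes the additions proposed in this ticket alongside the existing defaults:

```java
import java.util.List;
import java.util.Locale;

// Illustrative sketch of substring-based masking of sensitive configuration
// values; names are hypothetical, not Flink's actual API.
public class SensitiveKeys {
    // Existing defaults plus the additions proposed in this ticket.
    static final List<String> SENSITIVE_SUBSTRINGS = List.of(
            "password", "secret", "fs.azure.account.key",
            "auth-params", "service-key", "token", "basic-auth", "jaas.config");
    static final String HIDDEN_CONTENT = "******";

    // Returns the value to print in logs: masked if the key matches any
    // sensitive substring, unchanged otherwise.
    static String valueForLogging(String key, String value) {
        String k = key.toLowerCase(Locale.ROOT);
        for (String s : SENSITIVE_SUBSTRINGS) {
            if (k.contains(s)) {
                return HIDDEN_CONTENT;
            }
        }
        return value;
    }
}
```

The substring match is deliberately broad: a key like {{pulsar.client.auth-params}} is masked even though only its suffix appears in the list.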





[jira] [Assigned] (FLINK-30443) Expand list of sensitive keys

2022-12-22 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reassigned FLINK-30443:
--

Assignee: Gunnar Morling

> Expand list of sensitive keys
> -
>
> Key: FLINK-30443
> URL: https://issues.apache.org/jira/browse/FLINK-30443
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Core
>Reporter: Gunnar Morling
>Assignee: Gunnar Morling
>Priority: Major
>  Labels: pull-request-available
>
> In {{{}GlobalConfiguration{}}}, there is [a list of known configuration 
> keys|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/configuration/GlobalConfiguration.java#L47-L48]
>  whose values will be masked in log output. In our Flink deployment there's a 
> few more keys which we would like to mask, specifically, the following ones:
> * "auth-params"
> * "service-key"
> * "token"
> * "basic-auth"
> * "jaas.config"
> While those are somewhat use-case specific, I feel they are generic enough 
> for being added to that list, and there already is precedence in form of 
> "fs.azure.account.key". In that light, I don't think it's worth making this 
> somehow pluggable, but I'm curious what other folks here think. Thanks!
>  





[jira] [Commented] (FLINK-30454) Inconsistent class hierarchy in TaskIOMetricGroup

2022-12-22 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17651304#comment-17651304
 ] 

Robert Metzger commented on FLINK-30454:


Thanks for your contribution!

Merged to master (1.17) in f82be845f3f673264a13eaf29e11b19e00f37222

> Inconsistent class hierarchy in TaskIOMetricGroup
> -
>
> Key: FLINK-30454
> URL: https://issues.apache.org/jira/browse/FLINK-30454
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Metrics
>Reporter: Gunnar Morling
>Assignee: Gunnar Morling
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0
>
>
> I noticed an interesting issue when trying to compile the flink-runtime 
> module with Eclipse (same for VSCode): the _private_ inner class 
> {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.SizeGauge}} has 
> yet another _public_ inner class, {{{}SizeSupplier{}}}. The public method 
> {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.registerMailboxSizeSupplier(SizeSupplier)}}
>  has a parameter of that type.
> The invocation of this method in 
> {{org.apache.flink.streaming.runtime.tasks.StreamTask.StreamTask(Environment, 
> TimerService, UncaughtExceptionHandler, StreamTaskActionExecutor, 
> TaskMailbox)}} can be compiled with the javac compiler of the JDK, but fails 
> to compile with ecj:
> {code:java}
> The type TaskIOMetricGroup.SizeGauge from the descriptor computed for the 
> target context is not visible here.  
> {code}
> I tend to believe that the behavior of Eclipse's compiler is the correct one. 
> After all, you couldn't explicitly reference the {{SizeSupplier}} type either.
> One possible fix would be to promote {{SizeSupplier}} to the same level as 
> {{{}SizeGauge{}}}. This would be source-compatible but not binary-compatible, 
> though. I.e. code compiled against the earlier signature of 
> {{registerMailboxSizeSupplier()}} would be broken with a 
> {{{}NoSuchMethodError{}}}. Depending on whether 
> {{registerMailboxSizeSupplier()}} are expected in client code or not, this 
> may or may not be acceptable.
> Another fix would be to make {{SizeGauge}} public. I think that's the change 
> I'd do. Curious what other folks here think.
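The class shape in question can be reduced to a minimal, self-contained example (names mirror the report, but the code is illustrative): a public method whose parameter type is a public interface nested inside a private inner class. As described above, javac accepts call sites against such a method, while ecj rejects them:

```java
// Minimal reproduction of the visibility inconsistency: SizeSupplier is
// public, but its enclosing class SizeGauge is private, and SizeSupplier
// still leaks into a public method signature.
public class MetricGroupDemo {
    private static class SizeGauge {
        public interface SizeSupplier<R> {
            R get();
        }
        private SizeSupplier<Integer> supplier;
    }

    private final SizeGauge gauge = new SizeGauge();

    // Public method whose parameter type is nested in a private class;
    // javac compiles external calls to this, ecj reports the descriptor
    // type as not visible.
    public void registerMailboxSizeSupplier(SizeGauge.SizeSupplier<Integer> supplier) {
        gauge.supplier = supplier;
    }

    public int currentSize() {
        return gauge.supplier.get();
    }
}
```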





[jira] [Updated] (FLINK-30454) Inconsistent class hierarchy in TaskIOMetricGroup

2022-12-22 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-30454:
---
Fix Version/s: 1.17.0

> Inconsistent class hierarchy in TaskIOMetricGroup
> -
>
> Key: FLINK-30454
> URL: https://issues.apache.org/jira/browse/FLINK-30454
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Metrics
>Reporter: Gunnar Morling
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0
>
>
> I noticed an interesting issue when trying to compile the flink-runtime 
> module with Eclipse (same for VSCode): the _private_ inner class 
> {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.SizeGauge}} has 
> yet another _public_ inner class, {{{}SizeSupplier{}}}. The public method 
> {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.registerMailboxSizeSupplier(SizeSupplier)}}
>  has a parameter of that type.
> The invocation of this method in 
> {{org.apache.flink.streaming.runtime.tasks.StreamTask.StreamTask(Environment, 
> TimerService, UncaughtExceptionHandler, StreamTaskActionExecutor, 
> TaskMailbox)}} can be compiled with the javac compiler of the JDK, but fails 
> to compile with ecj:
> {code:java}
> The type TaskIOMetricGroup.SizeGauge from the descriptor computed for the 
> target context is not visible here.  
> {code}
> I tend to believe that the behavior of Eclipse's compiler is the correct one. 
> After all, you couldn't explicitly reference the {{SizeSupplier}} type either.
> One possible fix would be to promote {{SizeSupplier}} to the same level as 
> {{{}SizeGauge{}}}. This would be source-compatible but not binary-compatible, 
> though. I.e. code compiled against the earlier signature of 
> {{registerMailboxSizeSupplier()}} would be broken with a 
> {{{}NoSuchMethodError{}}}. Depending on whether 
> {{registerMailboxSizeSupplier()}} are expected in client code or not, this 
> may or may not be acceptable.
> Another fix would be to make {{SizeGauge}} public. I think that's the change 
> I'd do. Curious what other folks here think.





[jira] [Assigned] (FLINK-30454) Inconsistent class hierarchy in TaskIOMetricGroup

2022-12-22 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reassigned FLINK-30454:
--

Assignee: Gunnar Morling

> Inconsistent class hierarchy in TaskIOMetricGroup
> -
>
> Key: FLINK-30454
> URL: https://issues.apache.org/jira/browse/FLINK-30454
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Metrics
>Reporter: Gunnar Morling
>Assignee: Gunnar Morling
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0
>
>
> I noticed an interesting issue when trying to compile the flink-runtime 
> module with Eclipse (same for VSCode): the _private_ inner class 
> {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.SizeGauge}} has 
> yet another _public_ inner class, {{{}SizeSupplier{}}}. The public method 
> {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.registerMailboxSizeSupplier(SizeSupplier)}}
>  has a parameter of that type.
> The invocation of this method in 
> {{org.apache.flink.streaming.runtime.tasks.StreamTask.StreamTask(Environment, 
> TimerService, UncaughtExceptionHandler, StreamTaskActionExecutor, 
> TaskMailbox)}} can be compiled with the javac compiler of the JDK, but fails 
> to compile with ecj:
> {code:java}
> The type TaskIOMetricGroup.SizeGauge from the descriptor computed for the 
> target context is not visible here.  
> {code}
> I tend to believe that the behavior of Eclipse's compiler is the correct one. 
> After all, you couldn't explicitly reference the {{SizeSupplier}} type either.
> One possible fix would be to promote {{SizeSupplier}} to the same level as 
> {{{}SizeGauge{}}}. This would be source-compatible but not binary-compatible, 
> though. I.e. code compiled against the earlier signature of 
> {{registerMailboxSizeSupplier()}} would be broken with a 
> {{{}NoSuchMethodError{}}}. Depending on whether 
> {{registerMailboxSizeSupplier()}} are expected in client code or not, this 
> may or may not be acceptable.
> Another fix would be to make {{SizeGauge}} public. I think that's the change 
> I'd do. Curious what other folks here think.





[jira] [Commented] (FLINK-30443) Expand list of sensitive keys

2022-12-22 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17651302#comment-17651302
 ] 

Robert Metzger commented on FLINK-30443:


I suggest merging this as-is and filing a follow-up ticket for making it 
configurable. 
This will allow us to collect feedback on the ticket.

At my current company, we had to make this change in our internal fork to 
extend the list of sensitive keys. If this were configurable, it would have been 
easier for us ... so I see benefits here.
In addition, our configuration page distinguishes between normal and 
"advanced" configuration keys... and this is certainly an advanced configuration.

> Expand list of sensitive keys
> -
>
> Key: FLINK-30443
> URL: https://issues.apache.org/jira/browse/FLINK-30443
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Core
>Reporter: Gunnar Morling
>Priority: Major
>  Labels: pull-request-available
>
> In {{{}GlobalConfiguration{}}}, there is [a list of known configuration 
> keys|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/configuration/GlobalConfiguration.java#L47-L48]
>  whose values will be masked in log output. In our Flink deployment there's a 
> few more keys which we would like to mask, specifically, the following ones:
> * "auth-params"
> * "service-key"
> * "token"
> * "basic-auth"
> * "jaas.config"
> While those are somewhat use-case specific, I feel they are generic enough 
> for being added to that list, and there already is precedence in form of 
> "fs.azure.account.key". In that light, I don't think it's worth making this 
> somehow pluggable, but I'm curious what other folks here think. Thanks!
>  





[jira] [Assigned] (FLINK-30478) Don't depend on IPAddressUtil

2022-12-21 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reassigned FLINK-30478:
--

Assignee: Gunnar Morling

> Don't depend on IPAddressUtil
> -
>
> Key: FLINK-30478
> URL: https://issues.apache.org/jira/browse/FLINK-30478
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Core
>Reporter: Gunnar Morling
>Assignee: Gunnar Morling
>Priority: Major
>
> The class {{org.apache.flink.util.NetUtils}} uses the JDK-internal class 
> {{sun.net.util.IPAddressUtil}}. On current JDKs (16+), this causes issues, as 
> access to this class is prevented by default and would require an additional 
> {{--add-opens}} clause. That's undesirable, in particular in cases where we 
> don't control the JVM start-up arguments, e.g. when using Flink embedded in 
> a custom Java application.
> I suggest replacing this logic with the 
> [IPAddress|https://github.com/seancfoley/IPAddress/] library (Apache License 
> v2), which implements everything we need without relying on internal classes. 
> I have a patch for that ready and will submit it for discussion.





[jira] [Updated] (FLINK-30455) Avoid "cleaning" of java.lang.String in ClosureCleaner

2022-12-20 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-30455:
---
Component/s: API / Core

> Avoid "cleaning" of java.lang.String in ClosureCleaner
> --
>
> Key: FLINK-30455
> URL: https://issues.apache.org/jira/browse/FLINK-30455
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Core
>Reporter: Gunnar Morling
>Assignee: Gunnar Morling
>Priority: Major
>  Labels: pull-request-available
>
> When running a simple "hello world" example on JDK 17, I noticed the closure 
> cleaner tries to reflectively access the {{java.lang.String}} class, which 
> fails due to the strong encapsulation enabled by default in JDK 17 and 
> beyond. I don't think the closure cleaner needs to act on {{String}} at all, 
> as it doesn't contain any inner classes. Unless there are objections, I'll 
> provide a fix along those lines. With this change in place, I can run that 
> example on JDK 17.
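A minimal sketch of the fix suggested above, under the assumption that the closure cleaner may skip JDK classes entirely instead of reflecting into them; `needsCleaning` is a hypothetical guard, not the actual ClosureCleaner API. Skipping these classes avoids the `InaccessibleObjectException` raised by strong encapsulation on JDK 17+:

```java
// Illustrative guard for a closure cleaner: JDK classes (and primitives
// and arrays) contain no user closures, so reflecting into them is both
// unnecessary and forbidden under JDK 17 strong encapsulation.
public class CleanerGuard {
    static boolean needsCleaning(Class<?> clazz) {
        if (clazz.isPrimitive() || clazz.isArray()) {
            return false;
        }
        String name = clazz.getName();
        // Skip platform classes such as java.lang.String.
        return !(name.startsWith("java.")
                || name.startsWith("javax.")
                || name.startsWith("jdk."));
    }
}
```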





[jira] [Commented] (FLINK-30454) Inconsistent class hierarchy in TaskIOMetricGroup

2022-12-19 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649365#comment-17649365
 ] 

Robert Metzger commented on FLINK-30454:


I'm pretty sure these are not public classes. We use the @Public / 
@PublicEvolving / @Internal annotations to mark interface visibility in Flink.
The TaskIOMetricGroup classes are, in my understanding, internal anyway. 

> Inconsistent class hierarchy in TaskIOMetricGroup
> -
>
> Key: FLINK-30454
> URL: https://issues.apache.org/jira/browse/FLINK-30454
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Metrics
>Reporter: Gunnar Morling
>Priority: Major
>
> I noticed an interesting issue when trying to compile the flink-runtime 
> module with Eclipse (same for VSCode): the _private_ inner class 
> {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.SizeGauge}} has 
> yet another _public_ inner class, {{{}SizeSupplier{}}}. The public method 
> {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.registerMailboxSizeSupplier(SizeSupplier)}}
>  has a parameter of that type.
> The invocation of this method in 
> {{org.apache.flink.streaming.runtime.tasks.StreamTask.StreamTask(Environment, 
> TimerService, UncaughtExceptionHandler, StreamTaskActionExecutor, 
> TaskMailbox)}} can be compiled with the javac compiler of the JDK, but fails 
> to compile with ecj:
> {code:java}
> The type TaskIOMetricGroup.SizeGauge from the descriptor computed for the 
> target context is not visible here.  
> {code}
> I tend to believe that the behavior of Eclipse's compiler is the correct one. 
> After all, you couldn't explicitly reference the {{SizeSupplier}} type either.
> One possible fix would be to promote {{SizeSupplier}} to the same level as 
> {{{}SizeGauge{}}}. This would be source-compatible but not binary-compatible, 
> though. I.e. code compiled against the earlier signature of 
> {{registerMailboxSizeSupplier()}} would be broken with a 
> {{{}NoSuchMethodError{}}}. Depending on whether 
> {{registerMailboxSizeSupplier()}} are expected in client code or not, this 
> may or may not be acceptable.
> Another fix would be to make {{SizeGauge}} public. I think that's the change 
> I'd do. Curious what other folks here think.





[jira] [Commented] (FLINK-29755) PulsarSourceUnorderedE2ECase.testSavepoint failed because of missing TaskManagers

2022-12-19 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649270#comment-17649270
 ] 

Robert Metzger commented on FLINK-29755:


Another instability with the Pulsar E2e tests:

{code}
2022-12-18T16:01:17.3859942Z Dec 18 16:01:17 [ERROR] 
org.apache.flink.tests.util.pulsar.PulsarSourceUnorderedE2ECase.testScaleDown(TestEnvironment,
 DataStreamSourceExternalContext, CheckpointingMode)[2]  Time elapsed: 124.035 
s  <<< FAILURE!

2022-12-18T16:01:17.3861087Z Dec 18 16:01:17 Expecting
2022-12-18T16:01:17.3861386Z Dec 18 16:01:17   
2022-12-18T16:01:17.3861716Z Dec 18 16:01:17 to be completed within 2M.
2022-12-18T16:01:17.3861992Z Dec 18 16:01:17 
2022-12-18T16:01:17.3862369Z Dec 18 16:01:17 exception caught while trying to 
get the future result: java.util.concurrent.TimeoutException
2022-12-18T16:01:17.3862901Z Dec 18 16:01:17at 
java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
2022-12-18T16:01:17.3863456Z Dec 18 16:01:17at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
2022-12-18T16:01:17.3863987Z Dec 18 16:01:17at 
org.assertj.core.internal.Futures.assertSucceededWithin(Futures.java:109)
2022-12-18T16:01:17.3864610Z Dec 18 16:01:17at 
org.assertj.core.api.AbstractCompletableFutureAssert.internalSucceedsWithin(AbstractCompletableFutureAssert.java:400)
2022-12-18T16:01:17.3865289Z Dec 18 16:01:17at 
org.assertj.core.api.AbstractCompletableFutureAssert.succeedsWithin(AbstractCompletableFutureAssert.java:396)
2022-12-18T16:01:17.3866019Z Dec 18 16:01:17at 
org.apache.flink.connector.pulsar.testutils.source.UnorderedSourceTestSuiteBase.checkResultWithSemantic(UnorderedSourceTestSuiteBase.java:55)
2022-12-18T16:01:17.3866771Z Dec 18 16:01:17at 
org.apache.flink.connector.testframe.testsuites.SourceTestSuiteBase.restartFromSavepoint(SourceTestSuiteBase.java:329)
2022-12-18T16:01:17.3867465Z Dec 18 16:01:17at 
org.apache.flink.connector.testframe.testsuites.SourceTestSuiteBase.testScaleDown(SourceTestSuiteBase.java:279)
2022-12-18T16:01:17.3868017Z Dec 18 16:01:17at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2022-12-18T16:01:17.3868513Z Dec 18 16:01:17at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2022-12-18T16:01:17.3869078Z Dec 18 16:01:17at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2022-12-18T16:01:17.3869596Z Dec 18 16:01:17at 
java.lang.reflect.Method.invoke(Method.java:498)
2022-12-18T16:01:17.3870098Z Dec 18 16:01:17at 
org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:727)
2022-12-18T16:01:17.3870794Z Dec 18 16:01:17at 
org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
2022-12-18T16:01:17.3871458Z Dec 18 16:01:17at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
2022-12-18T16:01:17.3872128Z Dec 18 16:01:17at 
org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
2022-12-18T16:01:17.3872754Z Dec 18 16:01:17at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
2022-12-18T16:01:17.3873407Z Dec 18 16:01:17at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:94)
2022-12-18T16:01:17.3874141Z Dec 18 16:01:17at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
2022-12-18T16:01:17.3874918Z Dec 18 16:01:17at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
2022-12-18T16:01:17.3875638Z Dec 18 16:01:17at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
2022-12-18T16:01:17.3876515Z Dec 18 16:01:17at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
2022-12-18T16:01:17.3877239Z Dec 18 16:01:17at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
2022-12-18T16:01:17.3877912Z Dec 18 16:01:17at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
2022-12-18T16:01:17.3878578Z Dec 18 16:01:17at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
2022-12-18T16:01:17.3879243Z Dec 18 16:01:17at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
2022-12-18T16:01:17.3879939Z Dec 18 16:01:17at 

[jira] [Assigned] (FLINK-30439) Update playgrounds for 1.16

2022-12-16 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reassigned FLINK-30439:
--

Assignee: Gunnar Morling

> Update playgrounds for 1.16
> ---
>
> Key: FLINK-30439
> URL: https://issues.apache.org/jira/browse/FLINK-30439
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation / Training
>Reporter: David Anderson
>Assignee: Gunnar Morling
>Priority: Major
>  Labels: starter
> Fix For: 1.16.0
>
>
> All of the playgrounds should be updated for Flink 1.16. This should include 
> reworking the code as necessary to avoid using anything that has been 
> deprecated.
> See [https://github.com/apache/flink-playgrounds] for more info.





[jira] [Commented] (FLINK-30439) Update playgrounds for 1.16

2022-12-16 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17648699#comment-17648699
 ] 

Robert Metzger commented on FLINK-30439:


I assigned you!

> Update playgrounds for 1.16
> ---
>
> Key: FLINK-30439
> URL: https://issues.apache.org/jira/browse/FLINK-30439
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation / Training
>Reporter: David Anderson
>Assignee: Gunnar Morling
>Priority: Major
>  Labels: starter
> Fix For: 1.16.0
>
>
> All of the playgrounds should be updated for Flink 1.16. This should include 
> reworking the code as necessary to avoid using anything that has been 
> deprecated.
> See [https://github.com/apache/flink-playgrounds] for more info.





[jira] [Commented] (FLINK-29940) ExecutionGraph logs job state change at ERROR level when job fails

2022-12-05 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643312#comment-17643312
 ] 

Robert Metzger commented on FLINK-29940:


I agree that ERROR is not the right log level here, because it is indeed not a 
log message indicating that the system cannot proceed normally.
However, it is also not really an INFO event, since it is a pretty important 
message that the user should notice. I was wondering whether WARN would be a 
log level we could consider here?

> ExecutionGraph logs job state change at ERROR level when job fails
> --
>
> Key: FLINK-29940
> URL: https://issues.apache.org/jira/browse/FLINK-29940
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.16.0
>Reporter: Mingliang Liu
>Priority: Minor
>  Labels: pull-request-available
>
> When job switched to FAILED state, the log is very useful to understand why 
> it failed along with the root cause exception stack. However, the current log 
> level is INFO - a bit inconvenient for users to search from logging with so 
> many surrounding log lines. We can log at ERROR level when the job switched 
> to FAILED state.
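The compromise discussed above can be sketched as a pure level-selection function (illustrative only, not Flink's actual logging code): the terminal FAILED transition surfaces at WARN, while routine state changes stay at INFO:

```java
// Illustrative sketch: choose a log level for a job state transition.
// Names are hypothetical; Flink's ExecutionGraph does not expose this helper.
public class StateChangeLogLevel {
    enum JobStatus { CREATED, RUNNING, FINISHED, CANCELED, FAILED }

    static String levelFor(JobStatus newStatus) {
        // FAILED is important enough to stand out, but does not mean the
        // cluster itself cannot proceed, so WARN rather than ERROR.
        return newStatus == JobStatus.FAILED ? "WARN" : "INFO";
    }
}
```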





[jira] [Commented] (FLINK-30257) SqlClientITCase#testMatchRecognize failed

2022-12-01 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17641844#comment-17641844
 ] 

Robert Metzger commented on FLINK-30257:


Another case: 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=43633=logs=af184cdd-c6d8-5084-0b69-7e9c67b35f7a=160c9ae5-96fd-516e-1c91-deb81f59292a

> SqlClientITCase#testMatchRecognize failed
> -
>
> Key: FLINK-30257
> URL: https://issues.apache.org/jira/browse/FLINK-30257
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Client
>Affects Versions: 1.17.0
>Reporter: Martijn Visser
>Priority: Major
>  Labels: test-stability
>
> {code:java}
> Nov 30 21:54:41 [ERROR] Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, 
> Time elapsed: 224.683 s <<< FAILURE! - in SqlClientITCase
> Nov 30 21:54:41 [ERROR] SqlClientITCase.testMatchRecognize  Time elapsed: 
> 50.164 s  <<< FAILURE!
> Nov 30 21:54:41 org.opentest4j.AssertionFailedError: 
> Nov 30 21:54:41 
> Nov 30 21:54:41 expected: 1
> Nov 30 21:54:41  but was: 0
> Nov 30 21:54:41   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> Nov 30 21:54:41   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> Nov 30 21:54:41   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> Nov 30 21:54:41   at 
> SqlClientITCase.verifyNumberOfResultRecords(SqlClientITCase.java:297)
> Nov 30 21:54:41   at 
> SqlClientITCase.testMatchRecognize(SqlClientITCase.java:255)
> Nov 30 21:54:41   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> Nov 30 21:54:41   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Nov 30 21:54:41   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Nov 30 21:54:41   at java.lang.reflect.Method.invoke(Method.java:498)
> Nov 30 21:54:41   at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMetho
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=43635=logs=af184cdd-c6d8-5084-0b69-7e9c67b35f7a=160c9ae5-96fd-516e-1c91-deb81f59292a=14817





[jira] [Assigned] (FLINK-30093) [Flink SQL][Protobuf] CompileException when querying Kafka topic using google.protobuf.Timestamp

2022-11-30 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reassigned FLINK-30093:
--

Assignee: hubert dulay

> [Flink SQL][Protobuf] CompileException when querying Kafka topic using 
> google.protobuf.Timestamp 
> -
>
> Key: FLINK-30093
> URL: https://issues.apache.org/jira/browse/FLINK-30093
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table 
> SQL / Ecosystem
>Affects Versions: 1.16.0
> Environment: Mac OS Ventura
>Reporter: James Mcguire
>Assignee: hubert dulay
>Priority: Major
> Attachments: taskmanager_172.22.0 (1).4_46291-40eec2_log
>
>
> I am encountering an issue when trying to use Flink SQL to query a Kafka 
> topic that uses {{{}google.protobuf.Timestamp{}}}.
>  
> When attempting to use Flink SQL to query a protobuf serialized Kafka topic 
> that uses  {{{}google.protobuf.Timestamp{}}}, a 
> {{org.codehaus.commons.compiler.CompileException: Line 23, Column 5: Cannot 
> determine simple type name "com" }}error occurs when trying to query the 
> table.
>  
> *Replication steps:*
> 1. Use a protobuf definition that contains a 
> {{{}google.protobuf.Timestamp{}}}:
> {noformat}
> syntax = "proto3";
> package example.message;
> import "google/protobuf/timestamp.proto";
> option java_package = "com.example.message";
> option java_multiple_files = true;
> message Test {
>   int64 id = 1;
>   google.protobuf.Timestamp created_at = 5;
> }{noformat}
> 2. Use protobuf definition to produce message to topic
> 3. Confirm message is deserializable by protoc:
> {code:java}
> kcat -C -t development.example.message -b localhost:9092 -o -1 -e -q -D "" | 
> protoc --decode=example.message.Test 
> --proto_path=/Users/jamesmcguire/repos/flink-proto-example/schemas/ 
> example/message/test.proto 
> id: 123
> created_at {
>   seconds: 456
>   nanos: 789
> }{code}
> 4. Create table in Flink SQL using kafka connector and protobuf format
> {code:java}
> CREATE TABLE tests (
>   id BIGINT,
>   created_at row<seconds BIGINT, nanos INT>
> )
> COMMENT ''
> WITH (
>   'connector' = 'kafka',
>   'format' = 'protobuf',
>   'protobuf.message-class-name' = 'com.example.message.Test',
>   'properties.auto.offset.reset' = 'earliest',
>   'properties.bootstrap.servers' = 'host.docker.internal:9092',
>   'properties.group.id' = 'test-1',
>   'topic' = 'development.example.message'
> );{code}
> 5. Run query in Flink SQL and encounter error:
> {code:java}
> Flink SQL> select * from tests;
> [ERROR] Could not execute SQL statement. Reason:
> org.codehaus.commons.compiler.CompileException: Line 23, Column 5: Cannot 
> determine simple type name "com" {code}
> {*}NOTE{*}: If you repeat steps 4-5 without {{created_at row<seconds BIGINT, nanos INT>}} in the table, step 5 will complete successfully.
> 6. Observe in the attached log file that Flink appears to be using the incorrect 
> namespace (should be {{google.protobuf.Timestamp}}):
> {code:java}
> com.example.message.Timestamp message3 = message0.getCreatedAt(); {code}
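The wrong-namespace symptom in step 6 can be reduced to a small, hypothetical sketch (plain Python, not Flink's actual protobuf code generator): a generator that qualifies every nested field type with the enclosing message's `java_package` emits `com.example.message.Timestamp`, while one that uses the package the field's type was declared in resolves `com.google.protobuf.Timestamp`. The `FieldType` class and function names below are illustrative assumptions, not Flink internals.

```python
# Hypothetical illustration of the namespace bug -- NOT Flink's real codegen.
# Each field type carries the package it was declared in; a buggy generator
# ignores it and reuses the enclosing message's java_package instead.

from dataclasses import dataclass


@dataclass
class FieldType:
    name: str          # simple type name, e.g. "Timestamp"
    java_package: str  # package of the .proto file that declared the type


ENCLOSING_JAVA_PACKAGE = "com.example.message"  # from 'option java_package'


def buggy_qualified_name(field: FieldType) -> str:
    # Bug: always prefixes the enclosing message's java_package.
    return f"{ENCLOSING_JAVA_PACKAGE}.{field.name}"


def fixed_qualified_name(field: FieldType) -> str:
    # Fix: use the package the field's type was actually declared in.
    return f"{field.java_package}.{field.name}"


created_at = FieldType("Timestamp", "com.google.protobuf")

print(buggy_qualified_name(created_at))  # com.example.message.Timestamp (wrong)
print(fixed_qualified_name(created_at))  # com.google.protobuf.Timestamp (right)
```

Run against the reporter's schema, the buggy path produces exactly the `com.example.message.Timestamp` accessor type seen in the attached log.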



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-30156) [FLIP-242] Blogpost about the customisable RateLimitingStrategy in the AsyncSinkBase

2022-11-25 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger closed FLINK-30156.
--
Resolution: Fixed

> [FLIP-242] Blogpost about the customisable RateLimitingStrategy in the 
> AsyncSinkBase
> 
>
> Key: FLINK-30156
> URL: https://issues.apache.org/jira/browse/FLINK-30156
> Project: Flink
>  Issue Type: Improvement
>Reporter: Hong Liang Teoh
>Assignee: Hong Liang Teoh
>Priority: Major
>  Labels: pull-request-available
>
> Create a blogpost to explain the customisability of the RateLimitingStrategy 
> in the AsyncSinkBase. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30156) [FLIP-242] Blogpost about the customisable RateLimitingStrategy in the AsyncSinkBase

2022-11-25 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17638568#comment-17638568
 ] 

Robert Metzger commented on FLINK-30156:


Merged in 
https://github.com/apache/flink-web/commit/0e4b8a20ac88202c8ff5c671014d4c464c86ec8d

> [FLIP-242] Blogpost about the customisable RateLimitingStrategy in the 
> AsyncSinkBase
> 
>
> Key: FLINK-30156
> URL: https://issues.apache.org/jira/browse/FLINK-30156
> Project: Flink
>  Issue Type: Improvement
>Reporter: Hong Liang Teoh
>Assignee: Hong Liang Teoh
>Priority: Major
>  Labels: pull-request-available
>
> Create a blogpost to explain the customisability of the RateLimitingStrategy 
> in the AsyncSinkBase. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-30156) [FLIP-242] Blogpost about the customisable RateLimitingStrategy in the AsyncSinkBase

2022-11-25 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reassigned FLINK-30156:
--

Assignee: Hong Liang Teoh

> [FLIP-242] Blogpost about the customisable RateLimitingStrategy in the 
> AsyncSinkBase
> 
>
> Key: FLINK-30156
> URL: https://issues.apache.org/jira/browse/FLINK-30156
> Project: Flink
>  Issue Type: Improvement
>Reporter: Hong Liang Teoh
>Assignee: Hong Liang Teoh
>Priority: Major
>  Labels: pull-request-available
>
> Create a blogpost to explain the customisability of the RateLimitingStrategy 
> in the AsyncSinkBase. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-30083) Bump maven-shade-plugin to 3.4.1

2022-11-24 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-30083:
---
Summary: Bump maven-shade-plugin to 3.4.1  (was: Bump maven-shade-plugin to 
3.4.0)

> Bump maven-shade-plugin to 3.4.1
> 
>
> Key: FLINK-30083
> URL: https://issues.apache.org/jira/browse/FLINK-30083
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Affects Versions: 1.17.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0
>
>
> FLINK-24273 proposes to relocate the io.fabric8 dependencies of 
> flink-kubernetes.
> This is not possible because of a problem with the maven shade plugin ("mvn 
> install" doesn't work, it needs to be "mvn clean install").
> MSHADE-425 solves this issue, and has been released with maven-shade-plugin 
> 3.4.0.
> Upgrading the shade plugin will solve the problem, unblocking the K8s 
> relocation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-24273) Relocate io.fabric8 dependency

2022-11-24 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger closed FLINK-24273.
--
Fix Version/s: (was: 1.17.0)
 Assignee: (was: David Morávek)
   Resolution: Duplicate

> Relocate io.fabric8 dependency
> --
>
> Key: FLINK-24273
> URL: https://issues.apache.org/jira/browse/FLINK-24273
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes
>Reporter: David Morávek
>Priority: Major
>  Labels: pull-request-available, stale-assigned
>
> We are shipping io.fabric8 classes with flink-kubernetes shaded jar.
> We should relocate them to `org.apache.flink.kubernetes.shaded.io.fabric8`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-24273) Relocate io.fabric8 dependency

2022-11-24 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17638298#comment-17638298
 ] 

Robert Metzger commented on FLINK-24273:


Yes, this is indeed a duplicate. Closing this ticket and PR.

> Relocate io.fabric8 dependency
> --
>
> Key: FLINK-24273
> URL: https://issues.apache.org/jira/browse/FLINK-24273
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes
>Reporter: David Morávek
>Assignee: David Morávek
>Priority: Major
>  Labels: pull-request-available, stale-assigned
> Fix For: 1.17.0
>
>
> We are shipping io.fabric8 classes with flink-kubernetes shaded jar.
> We should relocate them to `org.apache.flink.kubernetes.shaded.io.fabric8`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-29779) Allow using MiniCluster with a PluginManager to use metrics reporters

2022-11-24 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger closed FLINK-29779.
--
Fix Version/s: 1.17.0
   (was: 1.16.1)
   Resolution: Fixed

Merged to master (for 1.17) in 
https://github.com/apache/flink/commit/9984b09cb9a2af9f2bde0e973cb8ce375942bd8c

> Allow using MiniCluster with a PluginManager to use metrics reporters
> -
>
> Key: FLINK-29779
> URL: https://issues.apache.org/jira/browse/FLINK-29779
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.16.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0
>
>
> Currently, using MiniCluster with a metric reporter loaded as a plugin is not 
> supported, because the {{ReporterSetup.fromConfiguration(config, null));}} 
> gets passed {{null}} for the PluginManager.
> I think it is generally valuable to allow passing a PluginManager to the 
> MiniCluster.
> I'll open a PR for this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30093) [Flink SQL][Protobuf] CompileException when querying Kafka topic using google.protobuf.Timestamp

2022-11-18 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17635995#comment-17635995
 ] 

Robert Metzger commented on FLINK-30093:


[~maosuhan] Sorry for pinging you directly, could you take a look at this?

> [Flink SQL][Protobuf] CompileException when querying Kafka topic using 
> google.protobuf.Timestamp 
> -
>
> Key: FLINK-30093
> URL: https://issues.apache.org/jira/browse/FLINK-30093
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table 
> SQL / Ecosystem
>Affects Versions: 1.16.0
> Environment: Mac OS Ventura
>Reporter: James Mcguire
>Priority: Major
> Attachments: taskmanager_172.22.0 (1).4_46291-40eec2_log
>
>
> I am encountering an issue when trying to use Flink SQL to query a Kafka 
> topic that uses {{{}google.protobuf.Timestamp{}}}.
>  
> When attempting to use Flink SQL to query a protobuf-serialized Kafka topic 
> that uses {{{}google.protobuf.Timestamp{}}}, a 
> {{org.codehaus.commons.compiler.CompileException: Line 23, Column 5: Cannot 
> determine simple type name "com"}} error occurs.
>  
> *Replication steps:*
> 1. Use a protobuf definition that contains a 
> {{{}google.protobuf.Timestamp{}}}:
> {noformat}
> syntax = "proto3";
> package example.message;
> import "google/protobuf/timestamp.proto";
> option java_package = "com.example.message";
> option java_multiple_files = true;
> message Test {
>   int64 id = 1;
>   google.protobuf.Timestamp created_at = 5;
> }{noformat}
> 2. Use protobuf definition to produce message to topic
> 3. Confirm message is deserializable by protoc:
> {code:java}
> kcat -C -t development.example.message -b localhost:9092 -o -1 -e -q -D "" | 
> protoc --decode=example.message.Test 
> --proto_path=/Users/jamesmcguire/repos/flink-proto-example/schemas/ 
> example/message/test.proto 
> id: 123
> created_at {
>   seconds: 456
>   nanos: 789
> }{code}
> 4. Create table in Flink SQL using kafka connector and protobuf format
> {code:java}
> CREATE TABLE tests (
>   id BIGINT,
>   created_at row<seconds BIGINT, nanos INT>
> )
> COMMENT ''
> WITH (
>   'connector' = 'kafka',
>   'format' = 'protobuf',
>   'protobuf.message-class-name' = 'com.example.message.Test',
>   'properties.auto.offset.reset' = 'earliest',
>   'properties.bootstrap.servers' = 'host.docker.internal:9092',
>   'properties.group.id' = 'test-1',
>   'topic' = 'development.example.message'
> );{code}
> 5. Run query in Flink SQL and encounter error:
> {code:java}
> Flink SQL> select * from tests;
> [ERROR] Could not execute SQL statement. Reason:
> org.codehaus.commons.compiler.CompileException: Line 23, Column 5: Cannot 
> determine simple type name "com" {code}
> {*}NOTE{*}: If you repeat steps 4-5 without {{created_at row<seconds BIGINT, nanos INT>}} in the table, step 5 will complete successfully.
> 6. Observe in the attached log file that Flink appears to be using the incorrect 
> namespace (should be {{google.protobuf.Timestamp}}):
> {code:java}
> com.example.message.Timestamp message3 = message0.getCreatedAt(); {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-30083) Bump maven-shade-plugin to 3.4.0

2022-11-18 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reassigned FLINK-30083:
--

Assignee: Robert Metzger

> Bump maven-shade-plugin to 3.4.0
> 
>
> Key: FLINK-30083
> URL: https://issues.apache.org/jira/browse/FLINK-30083
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Affects Versions: 1.17.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Major
> Fix For: 1.17.0
>
>
> FLINK-24273 proposes to relocate the io.fabric8 dependencies of 
> flink-kubernetes.
> This is not possible because of a problem with the maven shade plugin ("mvn 
> install" doesn't work, it needs to be "mvn clean install").
> MSHADE-425 solves this issue, and has been released with maven-shade-plugin 
> 3.4.0.
> Upgrading the shade plugin will solve the problem, unblocking the K8s 
> relocation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30083) Bump maven-shade-plugin to 3.4.0

2022-11-18 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-30083:
--

 Summary: Bump maven-shade-plugin to 3.4.0
 Key: FLINK-30083
 URL: https://issues.apache.org/jira/browse/FLINK-30083
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Affects Versions: 1.17.0
Reporter: Robert Metzger
 Fix For: 1.17.0


FLINK-24273 proposes to relocate the io.fabric8 dependencies of 
flink-kubernetes.
This is not possible because of a problem with the maven shade plugin ("mvn 
install" doesn't work, it needs to be "mvn clean install").
MSHADE-425 solves this issue, and has been released with maven-shade-plugin 
3.4.0.

Upgrading the shade plugin will solve the problem, unblocking the K8s 
relocation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-24273) Relocate io.fabric8 dependency

2022-11-17 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17635371#comment-17635371
 ] 

Robert Metzger commented on FLINK-24273:


+1 to fix this.
I'm facing a similar issue when trying to use the 
"io.fabric8.kubernetes.client.server.mock.KubernetesMockServer". This breaks 
because of the lack of relocation.

> Relocate io.fabric8 dependency
> --
>
> Key: FLINK-24273
> URL: https://issues.apache.org/jira/browse/FLINK-24273
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes
>Reporter: David Morávek
>Assignee: David Morávek
>Priority: Major
>  Labels: pull-request-available, stale-assigned
> Fix For: 1.17.0
>
>
> We are shipping io.fabric8 classes with flink-kubernetes shaded jar.
> We should relocate them to `org.apache.flink.kubernetes.shaded.io.fabric8`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29427) LookupJoinITCase failed with classloader problem

2022-10-31 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626607#comment-17626607
 ] 

Robert Metzger commented on FLINK-29427:


Another instance: 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=42642=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4

I'm including the entire stack trace here again for better discoverability of 
the ticket.
{code}
2022-10-31T10:36:52.5727692Z Oct 31 10:36:52 [ERROR] 
LookupJoinITCase.testJoinTemporalTableWithComputedColumnAndPushDown  Time 
elapsed: 0.259 s  <<< ERROR!
2022-10-31T10:36:52.5728256Z Oct 31 10:36:52 java.lang.RuntimeException: Failed 
to fetch next result
2022-10-31T10:36:52.5729008Z Oct 31 10:36:52at 
org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:109)
2022-10-31T10:36:52.5729945Z Oct 31 10:36:52at 
org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:80)
2022-10-31T10:36:52.5730765Z Oct 31 10:36:52at 
org.apache.flink.table.planner.connectors.CollectDynamicSink$CloseableRowIteratorWrapper.hasNext(CollectDynamicSink.java:222)
2022-10-31T10:36:52.5731457Z Oct 31 10:36:52at 
java.util.Iterator.forEachRemaining(Iterator.java:115)
2022-10-31T10:36:52.5732046Z Oct 31 10:36:52at 
org.apache.flink.util.CollectionUtil.iteratorToList(CollectionUtil.java:115)
2022-10-31T10:36:52.5732709Z Oct 31 10:36:52at 
org.apache.flink.table.planner.runtime.utils.BatchTestBase.executeQuery(BatchTestBase.scala:308)
2022-10-31T10:36:52.5733412Z Oct 31 10:36:52at 
org.apache.flink.table.planner.runtime.utils.BatchTestBase.check(BatchTestBase.scala:144)
2022-10-31T10:36:52.5734111Z Oct 31 10:36:52at 
org.apache.flink.table.planner.runtime.utils.BatchTestBase.checkResult(BatchTestBase.scala:108)
2022-10-31T10:36:52.5734927Z Oct 31 10:36:52at 
org.apache.flink.table.planner.runtime.batch.sql.join.LookupJoinITCase.testJoinTemporalTableWithComputedColumnAndPushDown(LookupJoinITCase.scala:350)
2022-10-31T10:36:52.5735660Z Oct 31 10:36:52at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2022-10-31T10:36:52.5736232Z Oct 31 10:36:52at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2022-10-31T10:36:52.5736871Z Oct 31 10:36:52at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2022-10-31T10:36:52.5737463Z Oct 31 10:36:52at 
java.lang.reflect.Method.invoke(Method.java:498)
2022-10-31T10:36:52.5738053Z Oct 31 10:36:52at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
2022-10-31T10:36:52.5738698Z Oct 31 10:36:52at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
2022-10-31T10:36:52.5739454Z Oct 31 10:36:52at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
2022-10-31T10:36:52.5740160Z Oct 31 10:36:52at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
2022-10-31T10:36:52.5740803Z Oct 31 10:36:52at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
2022-10-31T10:36:52.5741416Z Oct 31 10:36:52at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
2022-10-31T10:36:52.5742038Z Oct 31 10:36:52at 
org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
2022-10-31T10:36:52.5742704Z Oct 31 10:36:52at 
org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
2022-10-31T10:36:52.5743255Z Oct 31 10:36:52at 
org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
2022-10-31T10:36:52.5743882Z Oct 31 10:36:52at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
2022-10-31T10:36:52.5744496Z Oct 31 10:36:52at 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
2022-10-31T10:36:52.5745093Z Oct 31 10:36:52at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
2022-10-31T10:36:52.5745755Z Oct 31 10:36:52at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
2022-10-31T10:36:52.5746353Z Oct 31 10:36:52at 
org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
2022-10-31T10:36:52.5746901Z Oct 31 10:36:52at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
2022-10-31T10:36:52.5747477Z Oct 31 10:36:52at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
2022-10-31T10:36:52.5748061Z Oct 31 10:36:52at 
org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
2022-10-31T10:36:52.5748617Z Oct 31 10:36:52at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
2022-10-31T10:36:52.5749257Z Oct 31 10:36:52at 

[jira] [Commented] (FLINK-28790) Incorrect KafkaProducer metrics initialization

2022-10-31 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626469#comment-17626469
 ] 

Robert Metzger commented on FLINK-28790:


[~chesnay] Do you have any advice here?
I don't think we can break existing metric names for the Kafka connector.

Other ideas:
- Introduce a config parameter to switch the behavior
- Add the `producer-metrics` and `producer-node-metrics` prefixes in addition 
to the current behavior, though this would introduce more, duplicated metrics.

> Incorrect KafkaProducer metrics initialization
> --
>
> Key: FLINK-28790
> URL: https://issues.apache.org/jira/browse/FLINK-28790
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.14.4, 1.15.1
>Reporter: Valentina Predtechenskaya
>Priority: Major
>
> Problem
> KafkaProducer Flink metrics have unpredictable behavior because of concurrent 
> initialization of broker's and topic's metrics.
> Reproducing
> Firstly we found the problem with our Flink cluster: the metric 
> KafkaProducer.outgoing-byte-rate was periodically missing (equal to zero or 
> near zero) on several subtasks, while at the same time other subtasks were fine 
> with this metric. The actual outgoing rate was the same on different subtasks, as 
> was clear from, for example, the KafkaProducer.records-send-rate metric, which was 
> ok on every subtask; the problem was 100% with the metric itself.
> After a long investigation we found the root cause of this behavior:
>  
> * KafkaWriter creates an instance of FlinkKafkaInternalProducer and then 
> [initializes|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/sink/KafkaWriter.java#L327-L330]
>  metric wrappers over existing KafkaProducer metrics (gauges)
> * KafkaProducer itself in the constructor [creates 
> Sender|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L458-L460]
>  to access the brokers, starts a thread (kafka-producer-network-thread), and runs 
> the Sender in this separate thread
> * After starting the Sender, metrics connected with topics and brokers 
> register for some time. If they register quickly, KafkaWriter will see them 
> before the end of initialization and these metrics will be wrapped as Flink 
> gauges. Otherwise, they will not.
> * [Some KafkaProducer 
> metrics|https://docs.confluent.io/platform/current/kafka/monitoring.html#producer-metrics]
>  from the producer and from the broker have the same name - for example, 
> outgoing-byte-rate
> * If two metrics have the same name, the Flink KafkaWriter 
> [rewrites|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/sink/KafkaWriter.java#L359-L360]
>  the metric in its wrapper
> So, to reproduce this bug it is enough to run any job with a Kafka sink and to 
> look at the KafkaProducer metrics: some of them will be absent (the broker's or 
> the topic's) or some of them will be overwritten (like outgoing-byte-rate in the 
> example).
> I suppose there are at least two ways to fix it:
> 1. Add a tag (producer-metric, producer-node-metric, etc.) to Flink's metric 
> names
> 2. Use only metrics with tag=producer-metrics and ignore any other tags - 
> without considering broker's and topic's metrics
>  
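The overwrite described above comes down to keying metrics by name only: when a producer-level and a node-level metric share the name "outgoing-byte-rate", whichever registers last silently replaces the other. A minimal sketch (plain Python, not the actual KafkaWriter code) of the collision and of fix option 1, including the metric group in the registered name:

```python
# Minimal sketch of the metric-name collision -- NOT the real KafkaWriter.
# Two Kafka metrics named "outgoing-byte-rate" exist in different groups:
# "producer-metrics" (aggregate) and "producer-node-metrics" (per broker).

kafka_metrics = [
    ("producer-metrics", "outgoing-byte-rate", 1024.0),
    ("producer-node-metrics", "outgoing-byte-rate", 0.0),
]

# Current behavior: key only on the metric name -> the later entry wins.
by_name = {}
for group, name, value in kafka_metrics:
    by_name[name] = value  # silently overwrites on a name collision

# Fix option 1: include the group (tag) in the registered metric name,
# so both metrics survive under distinct keys.
by_group_and_name = {}
for group, name, value in kafka_metrics:
    by_group_and_name[f"{group}.{name}"] = value

print(len(by_name))            # 1 -- one metric was lost
print(len(by_group_and_name))  # 2 -- both metrics preserved
```

With name-only keying, the surviving value depends on registration order, which matches the unpredictable per-subtask behavior the reporter observed.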



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

