[jira] [Commented] (FLINK-34424) BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times out

2024-02-21 Thread Yunfeng Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819495#comment-17819495
 ] 

Yunfeng Zhou commented on FLINK-34424:
--

Hi [~mapohl] and [~pnowojski], [~tanyuxin] and I have not yet been able to find
the cause of this problem. As far as I can tell, a Java thread can appear
blocked without an explicit "waiting to lock ..." hint in the thread dump if it
involves JNI calls like MonitorEnter/MonitorExit, internal locks like Semaphore
and CountDownLatch, or low-level synchronization primitives like
LockSupport.park(). However, I did not find any of these patterns in the
blocked thread's stack. Besides, it seems that Java's GC can also cause a
thread to be reported as blocked, so we are not even sure that there is a
blocking or deadlock issue to resolve.
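
For reference, the difference between these thread states can be observed with a small standalone program. This is only a minimal sketch, not Flink code: the class and method names (ThreadStateSketch, awaitNonRunnable, ...) are made up for illustration. It does not reproduce the situation in this ticket (a BLOCKED thread whose top frame is a native method); it only contrasts plain monitor contention, which a thread dump normally annotates with "waiting to lock", with LockSupport.park(), which is reported as WAITING.

```java
import java.util.concurrent.locks.LockSupport;

public class ThreadStateSketch {

    /** Spins until the thread is neither NEW nor RUNNABLE, then returns its state. */
    static Thread.State awaitNonRunnable(Thread t) {
        Thread.State s = t.getState();
        while (s == Thread.State.NEW || s == Thread.State.RUNNABLE) {
            Thread.yield();
            s = t.getState();
        }
        return s;
    }

    /** A thread contending for a monitor held by someone else is reported as BLOCKED. */
    static Thread.State stateWhileBlockedOnMonitor() {
        Object lock = new Object();
        Thread blocked = new Thread(() -> {
            synchronized (lock) {
                // Nothing to do; only the wait to enter the monitor matters.
            }
        });
        blocked.setDaemon(true);
        Thread.State observed;
        synchronized (lock) {
            // The caller holds the monitor, so the new thread cannot enter it.
            blocked.start();
            observed = awaitNonRunnable(blocked);
        }
        return observed;
    }

    /** A thread suspended in LockSupport.park() is reported as WAITING. */
    static Thread.State stateWhileParked() {
        Thread parked = new Thread(() -> {
            // park() may return spuriously, so loop until interrupted.
            while (!Thread.currentThread().isInterrupted()) {
                LockSupport.park();
            }
        });
        parked.setDaemon(true);
        parked.start();
        Thread.State observed = awaitNonRunnable(parked);
        parked.interrupt(); // makes park() return so that the thread can exit
        return observed;
    }

    public static void main(String[] args) {
        System.out.println("monitor contention: " + stateWhileBlockedOnMonitor());
        System.out.println("LockSupport.park(): " + stateWhileParked());
    }
}
```

Taking a thread dump (for example with jstack) while such threads are alive should show the corresponding annotations, which is what I compared the stack in this ticket against.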

Given that we could not reproduce this error in our local environment and have
not found a similar error in the Flink CI history, this appears to be a
low-probability issue that can be ignored for now. Could we mark this issue as
low priority and revisit it when we have more information on failures like
this?

> BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times out
>
> Key: FLINK-34424
> URL: https://issues.apache.org/jira/browse/FLINK-34424
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Network
> Affects Versions: 1.19.0, 1.20.0
> Reporter: Matthias Pohl
> Assignee: Yunfeng Zhou
> Priority: Critical
> Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57446=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=24c3384f-1bcb-57b3-224f-51bf973bbee8=9151
> {code}
> Feb 11 13:55:29 "ForkJoinPool-50-worker-25" #414 daemon prio=5 os_prio=0 tid=0x7f19503af800 nid=0x284c in Object.wait() [0x7f191b6db000]
> Feb 11 13:55:29    java.lang.Thread.State: WAITING (on object monitor)
> Feb 11 13:55:29   at java.lang.Object.wait(Native Method)
> Feb 11 13:55:29   at java.lang.Thread.join(Thread.java:1252)
> Feb 11 13:55:29   - locked <0xe2e019a8> (a org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest$LongReader)
> Feb 11 13:55:29   at org.apache.flink.core.testutils.CheckedThread.trySync(CheckedThread.java:104)
> Feb 11 13:55:29   at org.apache.flink.core.testutils.CheckedThread.sync(CheckedThread.java:92)
> Feb 11 13:55:29   at org.apache.flink.core.testutils.CheckedThread.sync(CheckedThread.java:81)
> Feb 11 13:55:29   at org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest.testRead10ConsumersConcurrent(BoundedBlockingSubpartitionWriteReadTest.java:177)
> Feb 11 13:55:29   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [...]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34424) BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times out

2024-02-19 Thread Yuxin Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818615#comment-17818615
 ] 

Yuxin Tan commented on FLINK-34424:
---

[~mapohl] [~pnowojski] Sorry for the late reply. I tried to reproduce the
issue but could not do so in my local environment. I think it may be an
occasional failure with a low probability. [~yunfengzhou] and I will continue
investigating the cause.



[jira] [Commented] (FLINK-34424) BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times out

2024-02-19 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818444#comment-17818444
 ] 

Matthias Pohl commented on FLINK-34424:
---

Thank you. Much appreciated :)



[jira] [Commented] (FLINK-34424) BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times out

2024-02-19 Thread Yunfeng Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818436#comment-17818436
 ] 

Yunfeng Zhou commented on FLINK-34424:
--

[~mapohl] [~pnowojski] I'll try to figure out the cause of this problem. 
FLINK-33743 is not supposed to affect Flink's behavior when using bounded 
blocking shuffle, but anyway I'll look into this.



[jira] [Commented] (FLINK-34424) BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times out

2024-02-14 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817251#comment-17817251
 ] 

Piotr Nowojski commented on FLINK-34424:


:D

[~mapohl] I haven't made any modifications to this area of the code for a very
long time, and unfortunately I haven't had much time lately to investigate this
kind of issue. I would suggest pinging someone else, maybe the authors of the
most recent change to this test file: FLINK-33743




[jira] [Commented] (FLINK-34424) BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times out

2024-02-13 Thread Piotr Nowicki (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817083#comment-17817083
 ] 

Piotr Nowicki commented on FLINK-34424:
---

Yes, we did! A week ago was the anniversary of the last mention :D

No problem and good luck with concurrency issues :)



[jira] [Commented] (FLINK-34424) BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times out

2024-02-13 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817082#comment-17817082
 ] 

Matthias Pohl commented on FLINK-34424:
---

Argh. Didn't we have this in the past? Sorry again - the auto-completion and
the guy behind the screen are to blame here. Yes, you're right.



[jira] [Commented] (FLINK-34424) BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times out

2024-02-13 Thread Piotr Nowicki (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817073#comment-17817073
 ] 

Piotr Nowicki commented on FLINK-34424:
---

[~mapohl] I guess you wanted to mention [~pnowojski] ? :)



[jira] [Commented] (FLINK-34424) BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times out

2024-02-13 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817069#comment-17817069
 ] 

Matthias Pohl commented on FLINK-34424:
---

[~piotr.nowicki] (because it's networking; feel free to delegate) 
[~yunfengzhou] (because you touched the code in FLINK-33743 recently): Can 
someone help with investigating the cause of the issue?



[jira] [Commented] (FLINK-34424) BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times out

2024-02-13 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817063#comment-17817063
 ] 

Matthias Pohl commented on FLINK-34424:
---

I'm wondering whether that has anything to do with the blocked reader thread:
{code}
Feb 11 13:55:29 "Thread-76" #476 daemon prio=5 os_prio=0 tid=0x7f190bbf1800 nid=0x5a40 waiting for monitor entry [0x7f191bce4000]
Feb 11 13:55:29    java.lang.Thread.State: BLOCKED (on object monitor)
Feb 11 13:55:29 at net.jpountz.lz4.LZ4JNI.LZ4_decompress_fast(Native Method)
Feb 11 13:55:29 at net.jpountz.lz4.LZ4JNIFastDecompressor.decompress(LZ4JNIFastDecompressor.java:70)
Feb 11 13:55:29 at org.apache.flink.runtime.io.compression.Lz4BlockDecompressor.decompress(Lz4BlockDecompressor.java:68)
Feb 11 13:55:29 at org.apache.flink.runtime.io.network.buffer.BufferDecompressor.decompress(BufferDecompressor.java:126)
Feb 11 13:55:29 at org.apache.flink.runtime.io.network.buffer.BufferDecompressor.decompressToIntermediateBuffer(BufferDecompressor.java:68)
Feb 11 13:55:29 at org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest.readLongs(BoundedBlockingSubpartitionWriteReadTest.java:206)
Feb 11 13:55:29 at org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest.access$000(BoundedBlockingSubpartitionWriteReadTest.java:55)
Feb 11 13:55:29 at org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest$LongReader.go(BoundedBlockingSubpartitionWriteReadTest.java:323)
Feb 11 13:55:29 at org.apache.flink.core.testutils.CheckedThread.run(CheckedThread.java:67)
{code}

The test started at 13:32:18.152 and timed out at 13:55:39.
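
To put a number on how long the test was stuck, the elapsed time between these two timestamps can be computed as below. This is just a sketch using java.time, assuming both timestamps are from the same day and time zone; the class and method names are made up for illustration.

```java
import java.time.Duration;
import java.time.LocalTime;

public class TestRuntimeSketch {

    /** Elapsed time between two same-day wall-clock timestamps (ISO-8601 local time). */
    static Duration elapsed(String start, String end) {
        return Duration.between(LocalTime.parse(start), LocalTime.parse(end));
    }

    public static void main(String[] args) {
        // Timestamps taken from the CI log quoted above.
        Duration d = elapsed("13:32:18.152", "13:55:39");
        System.out.println(d.toMinutes() + " min " + (d.getSeconds() % 60) + " s");
    }
}
```

That is a bit more than 23 minutes between test start and timeout.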
