[jira] [Commented] (FLINK-8020) Deadlock found in Async I/O operator
[ https://issues.apache.org/jira/browse/FLINK-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357296#comment-16357296 ]

Tzu-Li (Gordon) Tai commented on FLINK-8020:

Moving this to 1.4.2, since the community agreed on the mailing lists to move forward with what we already have for 1.4.1.
Please reopen and let me know if you disagree.

> Deadlock found in Async I/O operator
>
>                 Key: FLINK-8020
>                 URL: https://issues.apache.org/jira/browse/FLINK-8020
>             Project: Flink
>          Issue Type: Bug
>          Components: Kafka Connector, Streaming, Streaming Connectors
>   Affects Versions: 1.3.2
>         Environment: Kafka 0.8.2 and Flink 1.3.2 on YARN mode
>            Reporter: Weihua Jiang
>            Priority: Critical
>             Fix For: 1.5.0, 1.4.2
>
>         Attachments: jstack53009(2).out, jstack67976-2.log
>
>
> Our streaming job has recently run into trouble after running smoothly for a
> long time. One issue we found is
> [https://issues.apache.org/jira/browse/FLINK-8019] and another one is this
> one.
> After analyzing the jstack output, we believe we found a DEADLOCK in Flink:
> 1. The thread "cache-process0 -> async-operator0 -> Sink: hbase-sink0 (8/8)"
> holds lock 0x0007b6aa1788 and is waiting for lock 0x0007b6aa1940.
> 2. The thread "Time Trigger for cache-process0 -> async-operator0 -> Sink:
> hbase-sink0 (8/8)" holds lock 0x0007b6aa1940 and is waiting for lock
> 0x0007b6aa1788.
> This DEADLOCK prevented the job from making progress.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (FLINK-8020) Deadlock found in Async I/O operator
[ https://issues.apache.org/jira/browse/FLINK-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349926#comment-16349926 ]

Stephan Ewen commented on FLINK-8020:

I think the original diagnosis is not quite correct. The thread locks {{0x0007b692ad98}} (which is the checkpoint lock) and then waits on the lock (as in {{Object.wait()}}), which means it is waiting for a notification. The thread is not blocked on the lock (note that the Java thread state WAITING is not the same as BLOCKED).

I am curious if this is still a problem?

The stack trace does not actually show a specific problem. At first glance, it looks as if the Async I/O operations (to HBase?) do not complete. Because of that, the pipeline stops and waits for those async I/O requests.

> Deadlock found in Async I/O operator
>
>                 Key: FLINK-8020
>                 URL: https://issues.apache.org/jira/browse/FLINK-8020
>             Project: Flink
>          Issue Type: Bug
>          Components: Kafka Connector, Streaming, Streaming Connectors
>   Affects Versions: 1.3.2
>         Environment: Kafka 0.8.2 and Flink 1.3.2 on YARN mode
>            Reporter: Weihua Jiang
>            Priority: Blocker
>             Fix For: 1.5.0, 1.4.1
>
>         Attachments: jstack53009(2).out, jstack67976-2.log
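
To illustrate the WAITING versus BLOCKED distinction made in the comment above, here is a minimal, self-contained Java sketch. It is not taken from Flink or from the attached jstack files: the class name {{ThreadStateDemo}} and its helper methods are hypothetical, and the lock is a plain {{Object}} standing in for the checkpoint lock. It shows only which {{Thread.State}} the JVM reports for a thread parked in {{Object.wait()}} compared with a thread contending for a monitor that another thread holds.

```java
class ThreadStateDemo {

    /** Polls until the thread reports the expected state or the timeout expires. */
    static Thread.State awaitState(Thread t, Thread.State expected, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (t.getState() != expected && System.currentTimeMillis() < deadline) {
            Thread.sleep(5);
        }
        return t.getState();
    }

    /** Observes a thread that took the lock and then called Object.wait() on it. */
    static Thread.State stateWhileWaiting() throws InterruptedException {
        final Object lock = new Object();          // stand-in for the checkpoint lock
        final boolean[] released = {false};        // guarded by lock
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                try {
                    while (!released[0]) {
                        lock.wait();               // gives up the monitor until notified
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        waiter.setDaemon(true);
        waiter.start();
        Thread.State observed = awaitState(waiter, Thread.State.WAITING, 2000);
        synchronized (lock) {                      // possible because wait() released the monitor
            released[0] = true;
            lock.notifyAll();
        }
        waiter.join(2000);
        return observed;
    }

    /** Observes a thread that tries to enter a monitor held by the calling thread. */
    static Thread.State stateWhileBlocked() throws InterruptedException {
        final Object lock = new Object();
        Thread contender = new Thread(() -> {
            synchronized (lock) {
                // entered only after the caller leaves its synchronized block
            }
        });
        contender.setDaemon(true);
        Thread.State observed;
        synchronized (lock) {
            contender.start();
            observed = awaitState(contender, Thread.State.BLOCKED, 2000);
        }
        contender.join(2000);
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("in Object.wait():       " + stateWhileWaiting());
        System.out.println("contending for monitor: " + stateWhileBlocked());
    }
}
```

In a jstack dump, the first case typically appears as "locked <addr>" followed by "waiting on <addr>" with state WAITING, which can be misread as a lock cycle. A genuine monitor deadlock would show the threads as BLOCKED, and jstack normally reports it explicitly.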
[jira] [Commented] (FLINK-8020) Deadlock found in Async I/O operator
[ https://issues.apache.org/jira/browse/FLINK-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349913#comment-16349913 ]

Stephan Ewen commented on FLINK-8020:

Updated the title to better describe the root problem.

> Deadlock found in Async I/O operator
>
>                 Key: FLINK-8020
>                 URL: https://issues.apache.org/jira/browse/FLINK-8020
>             Project: Flink
>          Issue Type: Bug
>          Components: Kafka Connector, Streaming, Streaming Connectors
>   Affects Versions: 1.3.2
>         Environment: Kafka 0.8.2 and Flink 1.3.2 on YARN mode
>            Reporter: Weihua Jiang
>            Priority: Blocker
>             Fix For: 1.5.0, 1.4.1
>
>         Attachments: jstack53009(2).out, jstack67976-2.log