[jira] [Updated] (FLINK-11783) Deadlock during Join operation
[ https://issues.apache.org/jira/browse/FLINK-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-11783: --- Labels: auto-deprioritized-major auto-deprioritized-minor (was: auto-deprioritized-major stale-minor) Priority: Not a Priority (was: Minor) This issue was labeled "stale-minor" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Minor, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Deadlock during Join operation > -- > > Key: FLINK-11783 > URL: https://issues.apache.org/jira/browse/FLINK-11783 > Project: Flink > Issue Type: Bug > Components: API / DataSet >Affects Versions: 1.7.2 >Reporter: Julien Nioche >Priority: Not a Priority > Labels: auto-deprioritized-major, auto-deprioritized-minor > Attachments: flink_is_stuck.png > > > I am running a filtering job on a large dataset with Flink running in > distributed mode. Most tasks in the Join operation have completed a while ago > and only the tasks from a particular TaskManager are still running. These > tasks make progress but extremely slowly. > When logging onto the machine running this TM I can see that all threads are > TIMED_WAITING . > Could there be a synchronization problem? > See attachment for a screenshot of the Flink UI and the stack below. > > *{{$ jstack 9183 | grep -A 15 "DataSetFilterJob"}}* > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (66/150)" #155 prio=5 os_prio=0 > tid=0x7faa5c01c000 nid=0x248c waiting on condition [0x7fa9d15d5000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at sun.misc.Unsafe.park(Native Method)}} > {{ - parking to wait for <0x0007bfa89578> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)}} > {{ at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)}} > {{ at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)}} > {{ at > java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:98)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:43)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.nextSegment(ChannelReaderInputView.java:228)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.advance(AbstractPagedInputView.java:158)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readByte(AbstractPagedInputView.java:271)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:278)}} > {{ at org.apache.flink.types.StringValue.readString(StringValue.java:746)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:75)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:80)}} > {{--}} > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (65/150)" #154 prio=5 os_prio=0 > tid=0x7faa5c01b000 nid=0x248b waiting on condition [0x7fa9d14d4000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at sun.misc.Unsafe.park(Native Method)}} > {{ - parking to wait for <0x0007b8e0eb50> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)}} > {{ at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)}} > {{ at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)}} > {{ at > java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:98)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:43)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.nextSegment(ChannelReaderInputView.java:228)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.advance(AbstractPagedInputView.java:158)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readByte(AbstractPagedInputView.java:271)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:278)}} > {{ at org.apache.flink.types.StringValue.readString(StringValue.java:746)}} > {{ at >
[jira] [Updated] (FLINK-11783) Deadlock during Join operation
[ https://issues.apache.org/jira/browse/FLINK-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-11783: --- Labels: auto-deprioritized-major stale-minor (was: auto-deprioritized-major) I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help the community manage its development. I see this issues has been marked as Minor but is unassigned and neither itself nor its Sub-Tasks have been updated for 180 days. I have gone ahead and marked it "stale-minor". If this ticket is still Minor, please either assign yourself or give an update. Afterwards, please remove the label or in 7 days the issue will be deprioritized. > Deadlock during Join operation > -- > > Key: FLINK-11783 > URL: https://issues.apache.org/jira/browse/FLINK-11783 > Project: Flink > Issue Type: Bug > Components: API / DataSet >Affects Versions: 1.7.2 >Reporter: Julien Nioche >Priority: Minor > Labels: auto-deprioritized-major, stale-minor > Attachments: flink_is_stuck.png > > > I am running a filtering job on a large dataset with Flink running in > distributed mode. Most tasks in the Join operation have completed a while ago > and only the tasks from a particular TaskManager are still running. These > tasks make progress but extremely slowly. > When logging onto the machine running this TM I can see that all threads are > TIMED_WAITING . > Could there be a synchronization problem? > See attachment for a screenshot of the Flink UI and the stack below. > > *{{$ jstack 9183 | grep -A 15 "DataSetFilterJob"}}* > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (66/150)" #155 prio=5 os_prio=0 > tid=0x7faa5c01c000 nid=0x248c waiting on condition [0x7fa9d15d5000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at sun.misc.Unsafe.park(Native Method)}} > {{ - parking to wait for <0x0007bfa89578> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)}} > {{ at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)}} > {{ at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)}} > {{ at > java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:98)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:43)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.nextSegment(ChannelReaderInputView.java:228)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.advance(AbstractPagedInputView.java:158)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readByte(AbstractPagedInputView.java:271)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:278)}} > {{ at org.apache.flink.types.StringValue.readString(StringValue.java:746)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:75)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:80)}} > {{--}} > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (65/150)" #154 prio=5 os_prio=0 > tid=0x7faa5c01b000 nid=0x248b waiting on condition [0x7fa9d14d4000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at sun.misc.Unsafe.park(Native Method)}} > {{ - parking to wait for <0x0007b8e0eb50> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)}} > {{ at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)}} > {{ at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)}} > {{ at > java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:98)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:43)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.nextSegment(ChannelReaderInputView.java:228)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.advance(AbstractPagedInputView.java:158)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readByte(AbstractPagedInputView.java:271)}} > {{ at >
[jira] [Updated] (FLINK-11783) Deadlock during Join operation
[ https://issues.apache.org/jira/browse/FLINK-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-11783: --- Labels: auto-deprioritized-major (was: stale-major) > Deadlock during Join operation > -- > > Key: FLINK-11783 > URL: https://issues.apache.org/jira/browse/FLINK-11783 > Project: Flink > Issue Type: Bug > Components: API / DataSet >Affects Versions: 1.7.2 >Reporter: Julien Nioche >Priority: Major > Labels: auto-deprioritized-major > Attachments: flink_is_stuck.png > > > I am running a filtering job on a large dataset with Flink running in > distributed mode. Most tasks in the Join operation have completed a while ago > and only the tasks from a particular TaskManager are still running. These > tasks make progress but extremely slowly. > When logging onto the machine running this TM I can see that all threads are > TIMED_WAITING . > Could there be a synchronization problem? > See attachment for a screenshot of the Flink UI and the stack below. > > *{{$ jstack 9183 | grep -A 15 "DataSetFilterJob"}}* > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (66/150)" #155 prio=5 os_prio=0 > tid=0x7faa5c01c000 nid=0x248c waiting on condition [0x7fa9d15d5000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at sun.misc.Unsafe.park(Native Method)}} > {{ - parking to wait for <0x0007bfa89578> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)}} > {{ at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)}} > {{ at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)}} > {{ at > java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:98)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:43)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.nextSegment(ChannelReaderInputView.java:228)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.advance(AbstractPagedInputView.java:158)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readByte(AbstractPagedInputView.java:271)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:278)}} > {{ at org.apache.flink.types.StringValue.readString(StringValue.java:746)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:75)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:80)}} > {{--}} > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (65/150)" #154 prio=5 os_prio=0 > tid=0x7faa5c01b000 nid=0x248b waiting on condition [0x7fa9d14d4000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at sun.misc.Unsafe.park(Native Method)}} > {{ - parking to wait for <0x0007b8e0eb50> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)}} > {{ at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)}} > {{ at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)}} > {{ at > java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:98)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:43)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.nextSegment(ChannelReaderInputView.java:228)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.advance(AbstractPagedInputView.java:158)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readByte(AbstractPagedInputView.java:271)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:278)}} > {{ at org.apache.flink.types.StringValue.readString(StringValue.java:746)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:75)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:80)}} > {{--}} > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (68/150)" #153 prio=5 os_prio=0 > tid=0x7faa5c019800 nid=0x248a
[jira] [Updated] (FLINK-11783) Deadlock during Join operation
[ https://issues.apache.org/jira/browse/FLINK-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-11783: --- Priority: Minor (was: Major) > Deadlock during Join operation > -- > > Key: FLINK-11783 > URL: https://issues.apache.org/jira/browse/FLINK-11783 > Project: Flink > Issue Type: Bug > Components: API / DataSet >Affects Versions: 1.7.2 >Reporter: Julien Nioche >Priority: Minor > Labels: auto-deprioritized-major > Attachments: flink_is_stuck.png > > > I am running a filtering job on a large dataset with Flink running in > distributed mode. Most tasks in the Join operation have completed a while ago > and only the tasks from a particular TaskManager are still running. These > tasks make progress but extremely slowly. > When logging onto the machine running this TM I can see that all threads are > TIMED_WAITING . > Could there be a synchronization problem? > See attachment for a screenshot of the Flink UI and the stack below. > > *{{$ jstack 9183 | grep -A 15 "DataSetFilterJob"}}* > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (66/150)" #155 prio=5 os_prio=0 > tid=0x7faa5c01c000 nid=0x248c waiting on condition [0x7fa9d15d5000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at sun.misc.Unsafe.park(Native Method)}} > {{ - parking to wait for <0x0007bfa89578> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)}} > {{ at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)}} > {{ at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)}} > {{ at > java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:98)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:43)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.nextSegment(ChannelReaderInputView.java:228)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.advance(AbstractPagedInputView.java:158)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readByte(AbstractPagedInputView.java:271)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:278)}} > {{ at org.apache.flink.types.StringValue.readString(StringValue.java:746)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:75)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:80)}} > {{--}} > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (65/150)" #154 prio=5 os_prio=0 > tid=0x7faa5c01b000 nid=0x248b waiting on condition [0x7fa9d14d4000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at sun.misc.Unsafe.park(Native Method)}} > {{ - parking to wait for <0x0007b8e0eb50> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)}} > {{ at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)}} > {{ at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)}} > {{ at > java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:98)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:43)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.nextSegment(ChannelReaderInputView.java:228)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.advance(AbstractPagedInputView.java:158)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readByte(AbstractPagedInputView.java:271)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:278)}} > {{ at org.apache.flink.types.StringValue.readString(StringValue.java:746)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:75)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:80)}} > {{--}} > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (68/150)" #153 prio=5 os_prio=0 > tid=0x7faa5c019800 nid=0x248a waiting on condition
[jira] [Updated] (FLINK-11783) Deadlock during Join operation
[ https://issues.apache.org/jira/browse/FLINK-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-11783: --- Labels: stale-major (was: ) > Deadlock during Join operation > -- > > Key: FLINK-11783 > URL: https://issues.apache.org/jira/browse/FLINK-11783 > Project: Flink > Issue Type: Bug > Components: API / DataSet >Affects Versions: 1.7.2 >Reporter: Julien Nioche >Priority: Major > Labels: stale-major > Attachments: flink_is_stuck.png > > > I am running a filtering job on a large dataset with Flink running in > distributed mode. Most tasks in the Join operation have completed a while ago > and only the tasks from a particular TaskManager are still running. These > tasks make progress but extremely slowly. > When logging onto the machine running this TM I can see that all threads are > TIMED_WAITING . > Could there be a synchronization problem? > See attachment for a screenshot of the Flink UI and the stack below. > > *{{$ jstack 9183 | grep -A 15 "DataSetFilterJob"}}* > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (66/150)" #155 prio=5 os_prio=0 > tid=0x7faa5c01c000 nid=0x248c waiting on condition [0x7fa9d15d5000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at sun.misc.Unsafe.park(Native Method)}} > {{ - parking to wait for <0x0007bfa89578> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)}} > {{ at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)}} > {{ at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)}} > {{ at > java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:98)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:43)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.nextSegment(ChannelReaderInputView.java:228)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.advance(AbstractPagedInputView.java:158)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readByte(AbstractPagedInputView.java:271)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:278)}} > {{ at org.apache.flink.types.StringValue.readString(StringValue.java:746)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:75)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:80)}} > {{--}} > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (65/150)" #154 prio=5 os_prio=0 > tid=0x7faa5c01b000 nid=0x248b waiting on condition [0x7fa9d14d4000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at sun.misc.Unsafe.park(Native Method)}} > {{ - parking to wait for <0x0007b8e0eb50> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)}} > {{ at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)}} > {{ at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)}} > {{ at > java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:98)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:43)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.nextSegment(ChannelReaderInputView.java:228)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.advance(AbstractPagedInputView.java:158)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readByte(AbstractPagedInputView.java:271)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:278)}} > {{ at org.apache.flink.types.StringValue.readString(StringValue.java:746)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:75)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:80)}} > {{--}} > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (68/150)" #153 prio=5 os_prio=0 > tid=0x7faa5c019800 nid=0x248a waiting on condition
[jira] [Updated] (FLINK-11783) Deadlock during Join operation
[ https://issues.apache.org/jira/browse/FLINK-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-11783: --- Component/s: API / DataSet > Deadlock during Join operation > -- > > Key: FLINK-11783 > URL: https://issues.apache.org/jira/browse/FLINK-11783 > Project: Flink > Issue Type: Bug > Components: API / DataSet >Affects Versions: 1.7.2 >Reporter: Julien Nioche >Priority: Major > Attachments: flink_is_stuck.png > > > I am running a filtering job on a large dataset with Flink running in > distributed mode. Most tasks in the Join operation have completed a while ago > and only the tasks from a particular TaskManager are still running. These > tasks make progress but extremely slowly. > When logging onto the machine running this TM I can see that all threads are > TIMED_WAITING . > Could there be a synchronization problem? > See attachment for a screenshot of the Flink UI and the stack below. > > *{{$ jstack 9183 | grep -A 15 "DataSetFilterJob"}}* > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (66/150)" #155 prio=5 os_prio=0 > tid=0x7faa5c01c000 nid=0x248c waiting on condition [0x7fa9d15d5000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at sun.misc.Unsafe.park(Native Method)}} > {{ - parking to wait for <0x0007bfa89578> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)}} > {{ at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)}} > {{ at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)}} > {{ at > java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:98)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:43)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.nextSegment(ChannelReaderInputView.java:228)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.advance(AbstractPagedInputView.java:158)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readByte(AbstractPagedInputView.java:271)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:278)}} > {{ at org.apache.flink.types.StringValue.readString(StringValue.java:746)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:75)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:80)}} > {{--}} > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (65/150)" #154 prio=5 os_prio=0 > tid=0x7faa5c01b000 nid=0x248b waiting on condition [0x7fa9d14d4000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at sun.misc.Unsafe.park(Native Method)}} > {{ - parking to wait for <0x0007b8e0eb50> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)}} > {{ at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)}} > {{ at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)}} > {{ at > java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:98)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:43)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.nextSegment(ChannelReaderInputView.java:228)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.advance(AbstractPagedInputView.java:158)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readByte(AbstractPagedInputView.java:271)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:278)}} > {{ at org.apache.flink.types.StringValue.readString(StringValue.java:746)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:75)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:80)}} > {{--}} > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (68/150)" #153 prio=5 os_prio=0 > tid=0x7faa5c019800 nid=0x248a waiting on condition [0x7fa981df6000]}} > {{ java.lang.Thread.State:
[jira] [Updated] (FLINK-11783) Deadlock during Join operation
[ https://issues.apache.org/jira/browse/FLINK-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated FLINK-11783: -- Attachment: (was: flink_is_stuck.png) > Deadlock during Join operation > -- > > Key: FLINK-11783 > URL: https://issues.apache.org/jira/browse/FLINK-11783 > Project: Flink > Issue Type: Bug >Affects Versions: 1.7.2 >Reporter: Julien Nioche >Priority: Major > > I am running a filtering job on a large dataset with Flink running in > distributed mode. Most tasks in the Join operation have completed a while ago > and only the tasks from a particular TaskManager are still running. These > tasks make progress but extremely slowly. > When logging onto the machine running this TM I can see that all threads are > TIMED_WAITING . > Could there be a synchronization problem? > See attachment for a screenshot of the Flink UI and the stack below. > > *{{$ jstack 9183 | grep -A 15 "DataSetFilterJob"}}* > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (66/150)" #155 prio=5 os_prio=0 > tid=0x7faa5c01c000 nid=0x248c waiting on condition [0x7fa9d15d5000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at sun.misc.Unsafe.park(Native Method)}} > {{ - parking to wait for <0x0007bfa89578> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)}} > {{ at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)}} > {{ at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)}} > {{ at > java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:98)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:43)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.nextSegment(ChannelReaderInputView.java:228)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.advance(AbstractPagedInputView.java:158)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readByte(AbstractPagedInputView.java:271)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:278)}} > {{ at org.apache.flink.types.StringValue.readString(StringValue.java:746)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:75)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:80)}} > {{--}} > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (65/150)" #154 prio=5 os_prio=0 > tid=0x7faa5c01b000 nid=0x248b waiting on condition [0x7fa9d14d4000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at sun.misc.Unsafe.park(Native Method)}} > {{ - parking to wait for <0x0007b8e0eb50> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)}} > {{ at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)}} > {{ at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)}} > {{ at > java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:98)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:43)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.nextSegment(ChannelReaderInputView.java:228)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.advance(AbstractPagedInputView.java:158)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readByte(AbstractPagedInputView.java:271)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:278)}} > {{ at org.apache.flink.types.StringValue.readString(StringValue.java:746)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:75)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:80)}} > {{--}} > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (68/150)" #153 prio=5 os_prio=0 > tid=0x7faa5c019800 nid=0x248a waiting on condition [0x7fa981df6000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at sun.misc.Unsafe.park(Native Method)}}
[jira] [Updated] (FLINK-11783) Deadlock during Join operation
[ https://issues.apache.org/jira/browse/FLINK-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated FLINK-11783: -- Attachment: flink_is_stuck.png > Deadlock during Join operation > -- > > Key: FLINK-11783 > URL: https://issues.apache.org/jira/browse/FLINK-11783 > Project: Flink > Issue Type: Bug >Affects Versions: 1.7.2 >Reporter: Julien Nioche >Priority: Major > Attachments: flink_is_stuck.png > > > I am running a filtering job on a large dataset with Flink running in > distributed mode. Most tasks in the Join operation have completed a while ago > and only the tasks from a particular TaskManager are still running. These > tasks make progress but extremely slowly. > When logging onto the machine running this TM I can see that all threads are > TIMED_WAITING . > Could there be a synchronization problem? > See attachment for a screenshot of the Flink UI and the stack below. > > *{{$ jstack 9183 | grep -A 15 "DataSetFilterJob"}}* > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (66/150)" #155 prio=5 os_prio=0 > tid=0x7faa5c01c000 nid=0x248c waiting on condition [0x7fa9d15d5000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at sun.misc.Unsafe.park(Native Method)}} > {{ - parking to wait for <0x0007bfa89578> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)}} > {{ at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)}} > {{ at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)}} > {{ at > java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:98)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:43)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.nextSegment(ChannelReaderInputView.java:228)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.advance(AbstractPagedInputView.java:158)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readByte(AbstractPagedInputView.java:271)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:278)}} > {{ at org.apache.flink.types.StringValue.readString(StringValue.java:746)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:75)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:80)}} > {{--}} > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (65/150)" #154 prio=5 os_prio=0 > tid=0x7faa5c01b000 nid=0x248b waiting on condition [0x7fa9d14d4000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at sun.misc.Unsafe.park(Native Method)}} > {{ - parking to wait for <0x0007b8e0eb50> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)}} > {{ at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)}} > {{ at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)}} > {{ at > java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:98)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:43)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.nextSegment(ChannelReaderInputView.java:228)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.advance(AbstractPagedInputView.java:158)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readByte(AbstractPagedInputView.java:271)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:278)}} > {{ at org.apache.flink.types.StringValue.readString(StringValue.java:746)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:75)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:80)}} > {{--}} > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (68/150)" #153 prio=5 os_prio=0 > tid=0x7faa5c019800 nid=0x248a waiting on condition [0x7fa981df6000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at
[jira] [Updated] (FLINK-11783) Deadlock during Join operation
[ https://issues.apache.org/jira/browse/FLINK-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated FLINK-11783: -- Attachment: flink_is_stuck.png > Deadlock during Join operation > -- > > Key: FLINK-11783 > URL: https://issues.apache.org/jira/browse/FLINK-11783 > Project: Flink > Issue Type: Bug >Affects Versions: 1.7.2 >Reporter: Julien Nioche >Priority: Major > > I am running a filtering job on a large dataset with Flink running in > distributed mode. Most tasks in the Join operation have completed a while ago > and only the tasks from a particular TaskManager are still running. These > tasks make progress but extremely slowly. > When logging onto the machine running this TM I can see that all threads are > TIMED_WAITING . > Could there be a synchronization problem? > See attachment for a screenshot of the Flink UI and the stack below. > > *{{$ jstack 9183 | grep -A 15 "DataSetFilterJob"}}* > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (66/150)" #155 prio=5 os_prio=0 > tid=0x7faa5c01c000 nid=0x248c waiting on condition [0x7fa9d15d5000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at sun.misc.Unsafe.park(Native Method)}} > {{ - parking to wait for <0x0007bfa89578> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)}} > {{ at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)}} > {{ at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)}} > {{ at > java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:98)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:43)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.nextSegment(ChannelReaderInputView.java:228)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.advance(AbstractPagedInputView.java:158)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readByte(AbstractPagedInputView.java:271)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:278)}} > {{ at org.apache.flink.types.StringValue.readString(StringValue.java:746)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:75)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:80)}} > {{--}} > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (65/150)" #154 prio=5 os_prio=0 > tid=0x7faa5c01b000 nid=0x248b waiting on condition [0x7fa9d14d4000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at sun.misc.Unsafe.park(Native Method)}} > {{ - parking to wait for <0x0007b8e0eb50> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)}} > {{ at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)}} > {{ at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)}} > {{ at > java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:98)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:43)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.nextSegment(ChannelReaderInputView.java:228)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.advance(AbstractPagedInputView.java:158)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readByte(AbstractPagedInputView.java:271)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:278)}} > {{ at org.apache.flink.types.StringValue.readString(StringValue.java:746)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:75)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:80)}} > {{--}} > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (68/150)" #153 prio=5 os_prio=0 > tid=0x7faa5c019800 nid=0x248a waiting on condition [0x7fa981df6000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at sun.misc.Unsafe.park(Native Method)}} > {{ -
[jira] [Updated] (FLINK-11783) Deadlock during Join operation
[ https://issues.apache.org/jira/browse/FLINK-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated FLINK-11783: -- Attachment: (was: flink_is_stuck.png) > Deadlock during Join operation > -- > > Key: FLINK-11783 > URL: https://issues.apache.org/jira/browse/FLINK-11783 > Project: Flink > Issue Type: Bug >Affects Versions: 1.7.2 >Reporter: Julien Nioche >Priority: Major > > I am running a filtering job on a large dataset with Flink running in > distributed mode. Most tasks in the Join operation have completed a while ago > and only the tasks from a particular TaskManager are still running. These > tasks make progress but extremely slowly. > When logging onto the machine running this TM I can see that all threads are > TIMED_WAITING . > Could there be a synchronization problem? > See attachment for a screenshot of the Flink UI and the stack below. > > *{{$ jstack 9183 | grep -A 15 "DataSetFilterJob"}}* > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (66/150)" #155 prio=5 os_prio=0 > tid=0x7faa5c01c000 nid=0x248c waiting on condition [0x7fa9d15d5000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at sun.misc.Unsafe.park(Native Method)}} > {{ - parking to wait for <0x0007bfa89578> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)}} > {{ at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)}} > {{ at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)}} > {{ at > java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:98)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:43)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.nextSegment(ChannelReaderInputView.java:228)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.advance(AbstractPagedInputView.java:158)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readByte(AbstractPagedInputView.java:271)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:278)}} > {{ at org.apache.flink.types.StringValue.readString(StringValue.java:746)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:75)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:80)}} > {{--}} > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (65/150)" #154 prio=5 os_prio=0 > tid=0x7faa5c01b000 nid=0x248b waiting on condition [0x7fa9d14d4000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at sun.misc.Unsafe.park(Native Method)}} > {{ - parking to wait for <0x0007b8e0eb50> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)}} > {{ at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)}} > {{ at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)}} > {{ at > java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:98)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:43)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.nextSegment(ChannelReaderInputView.java:228)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.advance(AbstractPagedInputView.java:158)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readByte(AbstractPagedInputView.java:271)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:278)}} > {{ at org.apache.flink.types.StringValue.readString(StringValue.java:746)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:75)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:80)}} > {{--}} > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (68/150)" #153 prio=5 os_prio=0 > tid=0x7faa5c019800 nid=0x248a waiting on condition [0x7fa981df6000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at sun.misc.Unsafe.park(Native Method)}}