[jira] [Updated] (FLINK-11783) Deadlock during Join operation

2021-11-28 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-11783:
---
  Labels: auto-deprioritized-major auto-deprioritized-minor  (was: 
auto-deprioritized-major stale-minor)
Priority: Not a Priority  (was: Minor)

This issue was labeled "stale-minor" 7 days ago and has not received any 
updates so it is being deprioritized. If this ticket is actually Minor, please 
raise the priority and ask a committer to assign you the issue or revive the 
public discussion.


> Deadlock during Join operation
> --
>
> Key: FLINK-11783
> URL: https://issues.apache.org/jira/browse/FLINK-11783
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataSet
>Affects Versions: 1.7.2
>Reporter: Julien Nioche
>Priority: Not a Priority
>  Labels: auto-deprioritized-major, auto-deprioritized-minor
> Attachments: flink_is_stuck.png
>
>
> I am running a filtering job on a large dataset with Flink running in 
> distributed mode. Most tasks in the Join operation completed a while ago, 
> and only the tasks from a particular TaskManager are still running. These 
> tasks make progress, but extremely slowly.
> When logging onto the machine running this TM I can see that all threads are 
> TIMED_WAITING.
> Could there be a synchronization problem?
> See attachment for a screenshot of the Flink UI and the stack below.
>  
> $ jstack 9183 | grep -A 15 "DataSetFilterJob"
> "CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at (DataSetFilterJob.java:67)) (66/150)" #155 prio=5 os_prio=0 tid=0x7faa5c01c000 nid=0x248c waiting on condition [0x7fa9d15d5000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x0007bfa89578> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
>     at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
>     at org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:98)
>     at org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:43)
>     at org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.nextSegment(ChannelReaderInputView.java:228)
>     at org.apache.flink.runtime.memory.AbstractPagedInputView.advance(AbstractPagedInputView.java:158)
>     at org.apache.flink.runtime.memory.AbstractPagedInputView.readByte(AbstractPagedInputView.java:271)
>     at org.apache.flink.runtime.memory.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:278)
>     at org.apache.flink.types.StringValue.readString(StringValue.java:746)
>     at org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:75)
>     at org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:80)
> --
> "CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at (DataSetFilterJob.java:67)) (65/150)" #154 prio=5 os_prio=0 tid=0x7faa5c01b000 nid=0x248b waiting on condition [0x7fa9d14d4000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x0007b8e0eb50> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
>     at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
>     at org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:98)
>     at org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:43)
>     at org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.nextSegment(ChannelReaderInputView.java:228)
>     at org.apache.flink.runtime.memory.AbstractPagedInputView.advance(AbstractPagedInputView.java:158)
>     at org.apache.flink.runtime.memory.AbstractPagedInputView.readByte(AbstractPagedInputView.java:271)
>     at org.apache.flink.runtime.memory.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:278)
>     at org.apache.flink.types.StringValue.readString(StringValue.java:746)
>     at 
> 
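
A note on reading the thread dump above: a thread that calls LinkedBlockingQueue.poll with a timeout on an empty queue parks via LockSupport.parkNanos and is reported as TIMED_WAITING (parking). That state alone does not indicate a deadlock; it is equally what a consumer looks like while it waits for a producer to hand over the next item. The frame names in the dump suggest the Join tasks are waiting for the asynchronous block reader to return a block, which would fit the reported slow-but-nonzero progress, though that reading is an assumption and not confirmed in the ticket. The following minimal sketch reproduces the thread state with plain JDK classes only; the class name, method name, and queue name are hypothetical and unrelated to Flink's code.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class TimedWaitingDemo {

    /**
     * Starts a consumer that polls an empty queue with a timeout and
     * returns the thread state observed while it is waiting.
     */
    public static Thread.State observeConsumerState() throws InterruptedException {
        // Hypothetical stand-in for a queue of blocks returned by an I/O thread.
        LinkedBlockingQueue<String> returnedBlocks = new LinkedBlockingQueue<>();

        Thread consumer = new Thread(() -> {
            try {
                // Nothing is ever offered, so this call parks until the
                // timeout expires or the thread is interrupted.
                returnedBlocks.poll(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "demo-consumer");
        consumer.setDaemon(true);
        consumer.start();

        // Sample the state until the consumer has parked (bounded wait).
        Thread.State state = consumer.getState();
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(2);
        while (state != Thread.State.TIMED_WAITING && System.nanoTime() < deadline) {
            Thread.sleep(10);
            state = consumer.getState();
        }

        // Release the consumer so the demo finishes promptly.
        consumer.interrupt();
        consumer.join(1000);
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("consumer state: " + observeConsumerState());
    }
}
```

To tell a slow producer from a genuine hang, it usually helps to take several dumps a few seconds apart and check whether the waiting threads ever leave the poll frame, and to look at the state of the threads that are supposed to fill the queue.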

[jira] [Updated] (FLINK-11783) Deadlock during Join operation

2021-11-17 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-11783:
---
Labels: auto-deprioritized-major stale-minor  (was: 
auto-deprioritized-major)

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help 
the community manage its development. I see this issue has been marked as 
Minor but is unassigned and neither itself nor its Sub-Tasks have been updated 
for 180 days. I have gone ahead and marked it "stale-minor". If this ticket is 
still Minor, please either assign yourself or give an update. Afterwards, 
please remove the label or in 7 days the issue will be deprioritized.
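
The rule described in this message can be sketched as a small decision function. This is only an illustration of the policy as worded in the notifications (unassigned, no update for 180 days, then a 7-day grace period after labeling); it is not the bot's actual implementation, and the class, enum, and parameter names are hypothetical. It also assumes that lastUpdated already reflects the most recent update across the ticket and its Sub-Tasks.

```java
import java.time.Duration;
import java.time.Instant;

public class StaleMinorRule {

    public enum Action { NONE, LABEL_STALE_MINOR, DEPRIORITIZE }

    // Thresholds taken from the wording of the bot's messages.
    static final Duration STALE_AFTER = Duration.ofDays(180);
    static final Duration GRACE_PERIOD = Duration.ofDays(7);

    /**
     * Decides what to do with a Minor ticket.
     *
     * @param assigned        whether the ticket has an assignee
     * @param staleLabelSince when "stale-minor" was applied, or null if unlabeled
     * @param lastUpdated     latest update of the ticket or any Sub-Task (assumption)
     * @param now             evaluation time
     */
    public static Action decide(boolean assigned, Instant staleLabelSince,
                                Instant lastUpdated, Instant now) {
        if (staleLabelSince != null) {
            // Already labeled: deprioritize once the grace period has passed
            // without any update after the label was applied.
            boolean updatedSinceLabel = lastUpdated.isAfter(staleLabelSince);
            boolean graceOver =
                    Duration.between(staleLabelSince, now).compareTo(GRACE_PERIOD) >= 0;
            return (!updatedSinceLabel && graceOver) ? Action.DEPRIORITIZE : Action.NONE;
        }
        // Not labeled yet: label only unassigned tickets that have been idle long enough.
        boolean idle = Duration.between(lastUpdated, now).compareTo(STALE_AFTER) >= 0;
        return (!assigned && idle) ? Action.LABEL_STALE_MINOR : Action.NONE;
    }

    public static void main(String[] args) {
        // Illustrative timestamps based on the dates of the two notifications.
        Instant labeled = Instant.parse("2021-11-17T00:00:00Z");
        Instant now = Instant.parse("2021-11-28T00:00:00Z");
        System.out.println(decide(false, labeled, labeled, now));
    }
}
```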


> Deadlock during Join operation
> --
>
> Key: FLINK-11783
> URL: https://issues.apache.org/jira/browse/FLINK-11783
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataSet
>Affects Versions: 1.7.2
>Reporter: Julien Nioche
>Priority: Minor
>  Labels: auto-deprioritized-major, stale-minor
> Attachments: flink_is_stuck.png
>
>

[jira] [Updated] (FLINK-11783) Deadlock during Join operation

2021-04-29 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-11783:
---
Labels: auto-deprioritized-major  (was: stale-major)

> Deadlock during Join operation
> --
>
> Key: FLINK-11783
> URL: https://issues.apache.org/jira/browse/FLINK-11783
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataSet
>Affects Versions: 1.7.2
>Reporter: Julien Nioche
>Priority: Major
>  Labels: auto-deprioritized-major
> Attachments: flink_is_stuck.png
>
>

[jira] [Updated] (FLINK-11783) Deadlock during Join operation

2021-04-29 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-11783:
---
Priority: Minor  (was: Major)

> Deadlock during Join operation
> --
>
> Key: FLINK-11783
> URL: https://issues.apache.org/jira/browse/FLINK-11783
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataSet
>Affects Versions: 1.7.2
>Reporter: Julien Nioche
>Priority: Minor
>  Labels: auto-deprioritized-major
> Attachments: flink_is_stuck.png
>
>

[jira] [Updated] (FLINK-11783) Deadlock during Join operation

2021-04-22 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-11783:
---
Labels: stale-major  (was: )

> Deadlock during Join operation
> --
>
> Key: FLINK-11783
> URL: https://issues.apache.org/jira/browse/FLINK-11783
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataSet
>Affects Versions: 1.7.2
>Reporter: Julien Nioche
>Priority: Major
>  Labels: stale-major
> Attachments: flink_is_stuck.png
>
>

[jira] [Updated] (FLINK-11783) Deadlock during Join operation

2019-02-28 Thread Robert Metzger (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-11783:
---
Component/s: API / DataSet

> Deadlock during Join operation
> --
>
> Key: FLINK-11783
> URL: https://issues.apache.org/jira/browse/FLINK-11783
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataSet
>Affects Versions: 1.7.2
>Reporter: Julien Nioche
>Priority: Major
> Attachments: flink_is_stuck.png
>
>

[jira] [Updated] (FLINK-11783) Deadlock during Join operation

2019-02-28 Thread Julien Nioche (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche updated FLINK-11783:
--
Attachment: (was: flink_is_stuck.png)

> Deadlock during Join operation
> --
>
> Key: FLINK-11783
> URL: https://issues.apache.org/jira/browse/FLINK-11783
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.7.2
>Reporter: Julien Nioche
>Priority: Major
>
> tid=0x7faa5c019800 nid=0x248a waiting on condition [0x7fa981df6000]}}
> {{ java.lang.Thread.State: TIMED_WAITING (parking)}}
> {{ at sun.misc.Unsafe.park(Native Method)}}

[jira] [Updated] (FLINK-11783) Deadlock during Join operation

2019-02-28 Thread Julien Nioche (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche updated FLINK-11783:
--
Attachment: flink_is_stuck.png

> Deadlock during Join operation
> --
>
> Key: FLINK-11783
> URL: https://issues.apache.org/jira/browse/FLINK-11783
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.7.2
> Reporter: Julien Nioche
> Priority: Major
> Attachments: flink_is_stuck.png
>
>

[jira] [Updated] (FLINK-11783) Deadlock during Join operation

2019-02-28 Thread Julien Nioche (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche updated FLINK-11783:
--
Attachment: flink_is_stuck.png

> Deadlock during Join operation
> --
>
> Key: FLINK-11783
> URL: https://issues.apache.org/jira/browse/FLINK-11783
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.7.2
> Reporter: Julien Nioche
> Priority: Major
>

[jira] [Updated] (FLINK-11783) Deadlock during Join operation

2019-02-28 Thread Julien Nioche (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche updated FLINK-11783:
--
Attachment: (was: flink_is_stuck.png)

> Deadlock during Join operation
> --
>
> Key: FLINK-11783
> URL: https://issues.apache.org/jira/browse/FLINK-11783
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.7.2
> Reporter: Julien Nioche
> Priority: Major
>