[jira] [Commented] (FLINK-17322) Enable latency tracker would corrupt the broadcast state

2020-06-16 Thread Arvid Heise (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136400#comment-17136400
 ] 

Arvid Heise commented on FLINK-17322:
-

Merged to master as 7e7b132f611f5a564e8c7b1e6b195692293fa90f
Merged to 1.11 as dfdfdadf445ea055c841c526b1a382424e1e1865
Merged to 1.10 as 5c0f827bf1c1f531ba30f97bdf6ab4393a32b0f0. 

> Enable latency tracker would corrupt the broadcast state
> 
>
> Key: FLINK-17322
> URL: https://issues.apache.org/jira/browse/FLINK-17322
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Affects Versions: 1.9.3, 1.10.1, 1.11.0, 1.12.0
>Reporter: Yun Tang
>Assignee: Arvid Heise
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0, 1.10.2, 1.12.0
>
> Attachments: 
> Telematics2-feature-flink-1.10-latency-tracking-broken.zip
>
>
> This bug is reported from user mail list:
>  
> [http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Latency-tracking-together-with-broadcast-state-can-cause-job-failure-td34013.html]
> Execute {{BroadcastStateIT#broadcastStateWorksWithLatencyTracking}} would 
> easily reproduce this problem.
> From current information, the broadcast element would be corrupt once we 
> enable {{env.getConfig().setLatencyTrackingInterval(2000)}}.
>  The exception stack trace would be: (based on current master branch)
> {code:java}
> Caused by: java.io.IOException: Corrupt stream, found tag: 84
>   at 
> org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:217)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:46)
>  ~[classes/:?]
>   at 
> org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55)
>  ~[classes/:?]
>   at 
> org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:157)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:123)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:181)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:332)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxStep(MailboxProcessor.java:206)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:196)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:505)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:485)
>  ~[classes/:?]
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:720) 
> ~[classes/:?]
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:544) 
> ~[classes/:?]
>   at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_144]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17322) Enable latency tracker would corrupt the broadcast state

2020-06-15 Thread Arvid Heise (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17135772#comment-17135772
 ] 

Arvid Heise commented on FLINK-17322:
-

Merged fix to 1.11 and 1.10 is about to be merged.

For 1.9, there seems to be a completely different cause; the buggy code first 
appeared in 1.10. Since 1.11 is about to pop and latency markers in conjunction 
with broadcast state is rarely used, we decided to not fix it in 1.9.

I'll close this ticket as soon as 1.10 is merged.

> Enable latency tracker would corrupt the broadcast state
> 
>
> Key: FLINK-17322
> URL: https://issues.apache.org/jira/browse/FLINK-17322
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Affects Versions: 1.9.3, 1.10.1, 1.11.0, 1.12.0
>Reporter: Yun Tang
>Assignee: Arvid Heise
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0, 1.10.2, 1.12.0
>
> Attachments: 
> Telematics2-feature-flink-1.10-latency-tracking-broken.zip
>
>
> This bug is reported from user mail list:
>  
> [http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Latency-tracking-together-with-broadcast-state-can-cause-job-failure-td34013.html]
> Execute {{BroadcastStateIT#broadcastStateWorksWithLatencyTracking}} would 
> easily reproduce this problem.
> From current information, the broadcast element would be corrupt once we 
> enable {{env.getConfig().setLatencyTrackingInterval(2000)}}.
>  The exception stack trace would be: (based on current master branch)
> {code:java}
> Caused by: java.io.IOException: Corrupt stream, found tag: 84
>   at 
> org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:217)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:46)
>  ~[classes/:?]
>   at 
> org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55)
>  ~[classes/:?]
>   at 
> org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:157)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:123)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:181)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:332)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxStep(MailboxProcessor.java:206)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:196)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:505)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:485)
>  ~[classes/:?]
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:720) 
> ~[classes/:?]
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:544) 
> ~[classes/:?]
>   at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_144]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17322) Enable latency tracker would corrupt the broadcast state

2020-06-12 Thread Arvid Heise (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134561#comment-17134561
 ] 

Arvid Heise commented on FLINK-17322:
-

Proper fix has been merged to master; currently backporting.

> Enable latency tracker would corrupt the broadcast state
> 
>
> Key: FLINK-17322
> URL: https://issues.apache.org/jira/browse/FLINK-17322
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Affects Versions: 1.9.3, 1.10.1, 1.11.0, 1.12.0
>Reporter: Yun Tang
>Assignee: Arvid Heise
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
> Attachments: 
> Telematics2-feature-flink-1.10-latency-tracking-broken.zip
>
>
> This bug is reported from user mail list:
>  
> [http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Latency-tracking-together-with-broadcast-state-can-cause-job-failure-td34013.html]
> Execute {{BroadcastStateIT#broadcastStateWorksWithLatencyTracking}} would 
> easily reproduce this problem.
> From current information, the broadcast element would be corrupt once we 
> enable {{env.getConfig().setLatencyTrackingInterval(2000)}}.
>  The exception stack trace would be: (based on current master branch)
> {code:java}
> Caused by: java.io.IOException: Corrupt stream, found tag: 84
>   at 
> org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:217)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:46)
>  ~[classes/:?]
>   at 
> org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55)
>  ~[classes/:?]
>   at 
> org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:157)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:123)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:181)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:332)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxStep(MailboxProcessor.java:206)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:196)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:505)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:485)
>  ~[classes/:?]
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:720) 
> ~[classes/:?]
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:544) 
> ~[classes/:?]
>   at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_144]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17322) Enable latency tracker would corrupt the broadcast state

2020-05-21 Thread Arvid Heise (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113083#comment-17113083
 ] 

Arvid Heise commented on FLINK-17322:
-

There is a fundamental issue with latency markers and legacy sources that also 
applies to the current master.

 

Latency markers are emitted over task thread, while all other records come over 
legacy thread. Since buffer builder is not thread-safe, we should not interact 
with it over task thread at all.

However, I also don't see a way to reliably emit them over legacy thread. The 
only way would be to fall back to record hand-over from legacy to task thread 
and then do everything in task thread, but that would degrade performance 
tremendously.

 

So until we get FLIP-27, it might be that legacy markers are not working.

> Enable latency tracker would corrupt the broadcast state
> 
>
> Key: FLINK-17322
> URL: https://issues.apache.org/jira/browse/FLINK-17322
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Affects Versions: 1.9.2, 1.10.0
>Reporter: Yun Tang
>Priority: Major
> Fix For: 1.11.0
>
> Attachments: 
> Telematics2-feature-flink-1.10-latency-tracking-broken.zip
>
>
> This bug is reported from user mail list:
>  
> [http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Latency-tracking-together-with-broadcast-state-can-cause-job-failure-td34013.html]
> Execute {{BroadcastStateIT#broadcastStateWorksWithLatencyTracking}} would 
> easily reproduce this problem.
> From current information, the broadcast element would be corrupt once we 
> enable {{env.getConfig().setLatencyTrackingInterval(2000)}}.
>  The exception stack trace would be: (based on current master branch)
> {code:java}
> Caused by: java.io.IOException: Corrupt stream, found tag: 84
>   at 
> org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:217)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:46)
>  ~[classes/:?]
>   at 
> org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55)
>  ~[classes/:?]
>   at 
> org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:157)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:123)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:181)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:332)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxStep(MailboxProcessor.java:206)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:196)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:505)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:485)
>  ~[classes/:?]
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:720) 
> ~[classes/:?]
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:544) 
> ~[classes/:?]
>   at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_144]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17322) Enable latency tracker would corrupt the broadcast state

2020-04-22 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17089483#comment-17089483
 ] 

Yun Tang commented on FLINK-17322:
--

I have also uploaded the related project to reproduce this problem, which is 
offered from user Lasse Nedergaard 

> Enable latency tracker would corrupt the broadcast state
> 
>
> Key: FLINK-17322
> URL: https://issues.apache.org/jira/browse/FLINK-17322
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Reporter: Yun Tang
>Priority: Major
> Attachments: 
> Telematics2-feature-flink-1.10-latency-tracking-broken.zip
>
>
> This bug is reported from user mail list:
>  
> [http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Latency-tracking-together-with-broadcast-state-can-cause-job-failure-td34013.html]
> Execute {{BroadcastStateIT#broadcastStateWorksWithLatencyTracking}} would 
> easily reproduce this problem.
> From current information, the broadcast element would be corrupt once we 
> enable {{env.getConfig().setLatencyTrackingInterval(2000)}}.
>  The exception stack trace would be: (based on current master branch)
> {code:java}
> Caused by: java.io.IOException: Corrupt stream, found tag: 84
>   at 
> org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:217)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:46)
>  ~[classes/:?]
>   at 
> org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55)
>  ~[classes/:?]
>   at 
> org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:157)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:123)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:181)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:332)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxStep(MailboxProcessor.java:206)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:196)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:505)
>  ~[classes/:?]
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:485)
>  ~[classes/:?]
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:720) 
> ~[classes/:?]
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:544) 
> ~[classes/:?]
>   at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_144]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)