[jira] [Commented] (BEAM-9225) Flink uber jar job server hangs

2020-02-25 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044916#comment-17044916
 ] 

Kyle Weaver commented on BEAM-9225:
---

I was waiting for fix to the tests to verify this, but I'm pretty sure it's 
resolved.

> Flink uber jar job server hangs
> ---
>
> Key: BEAM-9225
> URL: https://issues.apache.org/jira/browse/BEAM-9225
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
> Fix For: 2.20.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This was observed on Kubernetes. I suspect this behavior might also be the 
> reason beam_PostCommit_PortableJar_Flink is timing out.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9225) Flink uber jar job server hangs

2020-02-24 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043970#comment-17043970
 ] 

Rui Wang commented on BEAM-9225:


I am going to cut 2.20.0 branch on 02/26/2020. Can this Jira be finished by 
that date? Will you able to push this back to 2.21.0?

> Flink uber jar job server hangs
> ---
>
> Key: BEAM-9225
> URL: https://issues.apache.org/jira/browse/BEAM-9225
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
> Fix For: 2.20.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This was observed on Kubernetes. I suspect this behavior might also be the 
> reason beam_PostCommit_PortableJar_Flink is timing out.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9225) Flink uber jar job server hangs

2020-02-20 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17041248#comment-17041248
 ] 

Kyle Weaver commented on BEAM-9225:
---

The Flink uber jar job server has two streams, one for state and one for 
messages. They share history [1] and don't yield a message if it is a duplicate 
according to that history [2]. So if the message stream reads the terminal 
state first, the state history will never yield it, and the job will never 
register as complete.

I avoided this in the Spark implementation by deferring de-duplication to 
later, so all messages are yielded regardless of whether they are duplicates or 
not.

[1] 
https://github.com/apache/beam/blob/57ce5b966cfc4f549082a47f50c29d5f9caa2909/sdks/python/apache_beam/runners/portability/abstract_job_service.py#L252
[2] 
https://github.com/apache/beam/blob/57ce5b966cfc4f549082a47f50c29d5f9caa2909/sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server.py#L205

> Flink uber jar job server hangs
> ---
>
> Key: BEAM-9225
> URL: https://issues.apache.org/jira/browse/BEAM-9225
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
> Fix For: 2.20.0
>
>
> This was observed on Kubernetes. I suspect this behavior might also be the 
> reason beam_PostCommit_PortableJar_Flink is timing out.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)