[jira] [Comment Edited] (BEAM-10115) Staging requirements.txt fails but staging setup.py succeeds

2020-05-28 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119112#comment-17119112
 ] 

Ankur Goenka edited comment on BEAM-10115 at 5/28/20, 10:12 PM:


An action item from this seems to be cleaning /tmp/dataflow-requirements-cache  
before uploading the requirements as it might contain old artifacts.

https://issues.apache.org/jira/browse/BEAM-10147


was (Author: angoenka):
An action item from this seems to be cleaning /tmp/dataflow-requirements-cache  
before uploading the requirements as it might contain old artifacts.

> Staging requirements.txt fails but staging setup.py succeeds
> 
>
> Key: BEAM-10115
> URL: https://issues.apache.org/jira/browse/BEAM-10115
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Kenneth Knowles
>Priority: P2
>
> User reports on StackOverflow: 
> https://stackoverflow.com/questions/62032382/dataflow-fails-when-i-add-requirements-txt-python
> The issue appears to be a problem with staging, and a difference between 
> using `requirements.txt` and `setup.py` for some reason.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-10147) Clean /tmp/dataflow-requirements-cache before downloading the requirements from requirements.txt

2020-05-28 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-10147:
---

 Summary: Clean /tmp/dataflow-requirements-cache before downloading 
the requirements from requirements.txt
 Key: BEAM-10147
 URL: https://issues.apache.org/jira/browse/BEAM-10147
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow, sdk-py-core
Reporter: Ankur Goenka


when using requirements.txt with beam pipelines, 
/tmp/dataflow-requirements-cache is not cleaned and all the new dependencies 
are downloaded and then all the files in /tmp/dataflow-requirements-cache  are 
staged which is wasteful and unecessary.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10115) Staging requirements.txt fails but staging setup.py succeeds

2020-05-28 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119112#comment-17119112
 ] 

Ankur Goenka commented on BEAM-10115:
-

An action item from this seems to be cleaning /tmp/dataflow-requirements-cache  
before uploading the requirements as it might contain old artifacts.

> Staging requirements.txt fails but staging setup.py succeeds
> 
>
> Key: BEAM-10115
> URL: https://issues.apache.org/jira/browse/BEAM-10115
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Kenneth Knowles
>Priority: P2
>
> User reports on StackOverflow: 
> https://stackoverflow.com/questions/62032382/dataflow-fails-when-i-add-requirements-txt-python
> The issue appears to be a problem with staging, and a difference between 
> using `requirements.txt` and `setup.py` for some reason.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10115) Staging requirements.txt fails but staging setup.py succeeds

2020-05-27 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118213#comment-17118213
 ] 

Ankur Goenka commented on BEAM-10115:
-

I suspect the issue might be because of old data in requirements cache.

all the dependencies are recursively downloaded from requirements.txt in 
/tmp/dataflow-requirements-cache 

I don't think we delete and recreate the /tmp/dataflow-requirements-cache which 
can result into unintended files to get staged. Some of those files might have 
uncommon name which can potentially cause this issue.

> Staging requirements.txt fails but staging setup.py succeeds
> 
>
> Key: BEAM-10115
> URL: https://issues.apache.org/jira/browse/BEAM-10115
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Kenneth Knowles
>Priority: P2
>
> User reports on StackOverflow: 
> https://stackoverflow.com/questions/62032382/dataflow-fails-when-i-add-requirements-txt-python
> The issue appears to be a problem with staging, and a difference between 
> using `requirements.txt` and `setup.py` for some reason.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9974) beam_PostRelease_NightlySnapshot failing

2020-05-26 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116999#comment-17116999
 ] 

Ankur Goenka commented on BEAM-9974:


cc: [~robertwb] 

> beam_PostRelease_NightlySnapshot failing
> 
>
> Key: BEAM-9974
> URL: https://issues.apache.org/jira/browse/BEAM-9974
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Brian Hulette
>Priority: P1
>  Labels: currently-failing
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Another failure mode:
> 07:02:29 > Task 
> :runners:google-cloud-dataflow-java:runMobileGamingJavaDataflow
> 07:02:29 [ERROR] Failed to execute goal 
> org.codehaus.mojo:exec-maven-plugin:1.6.0:java (default-cli) on project 
> word-count-beam: An exception occured while executing the Java class. No 
> filesystem found for scheme gs -> [Help 1]
> 07:02:29 [ERROR] 
> 07:02:29 [ERROR] To see the full stack trace of the errors, re-run Maven with 
> the -e switch.
> 07:02:29 [ERROR] Re-run Maven using the -X switch to enable full debug 
> logging.
> 07:02:29 [ERROR] 
> 07:02:29 [ERROR] For more information about the errors and possible 
> solutions, please read the following articles:
> 07:02:29 [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> 07:02:29 [ERROR] Failed to execute goal 
> org.codehaus.mojo:exec-maven-plugin:1.6.0:java (default-cli) on project 
> word-count-beam: An exception occured while executing the Java class. No 
> filesystem found for scheme gs -> [Help 1]
> 07:02:29 [ERROR] 
> 07:02:29 [ERROR] To see the full stack trace of the errors, re-run Maven with 
> the -e switch.
> 07:02:29 [ERROR] Re-run Maven using the -X switch to enable full debug 
> logging.
> 07:02:29 [ERROR] 
> 07:02:29 [ERROR] For more information about the errors and possible 
> solutions, please read the following articles:
> 07:02:29 [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> 07:02:29 [ERROR] Failed command
> 07:02:29 
> 07:02:29 > Task 
> :runners:google-cloud-dataflow-java:runMobileGamingJavaDataflow FAILED



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-10060) Release new Python container image for Dataflow

2020-05-26 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka resolved BEAM-10060.
-
Resolution: Fixed

> Release new Python container image for Dataflow
> ---
>
> Key: BEAM-10060
> URL: https://issues.apache.org/jira/browse/BEAM-10060
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-harness
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: P2
> Fix For: 2.23.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Release beam-master-20200521 for python legacy and fnapi



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10060) Release new Python container image for Dataflow

2020-05-21 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-10060:

Fix Version/s: 2.23.0

> Release new Python container image for Dataflow
> ---
>
> Key: BEAM-10060
> URL: https://issues.apache.org/jira/browse/BEAM-10060
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-harness
>Reporter: Ankur Goenka
>Priority: P2
> Fix For: 2.23.0
>
>
> Release beam-master-20200521 for python legacy and fnapi



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-10060) Release new Python container image for Dataflow

2020-05-21 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-10060:
---

 Summary: Release new Python container image for Dataflow
 Key: BEAM-10060
 URL: https://issues.apache.org/jira/browse/BEAM-10060
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow, sdk-py-harness
Reporter: Ankur Goenka


Release beam-master-20200521 for python legacy and fnapi



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9707) Hardcode runner harness container image for unified worker

2020-05-20 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112559#comment-17112559
 ] 

Ankur Goenka commented on BEAM-9707:


We can delay it to 2.23

> Hardcode runner harness container image for unified worker
> --
>
> Key: BEAM-9707
> URL: https://issues.apache.org/jira/browse/BEAM-9707
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Hardcode runner harness image temporarily to support usage of unified worker 
> on head.
> Remove this hardcoding once 2.21 is release



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9707) Hardcode runner harness container image for unified worker

2020-05-20 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-9707:
---
Fix Version/s: (was: 2.22.0)
   2.23.0

> Hardcode runner harness container image for unified worker
> --
>
> Key: BEAM-9707
> URL: https://issues.apache.org/jira/browse/BEAM-9707
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: P2
> Fix For: 2.23.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Hardcode runner harness image temporarily to support usage of unified worker 
> on head.
> Remove this hardcoding once 2.21 is release



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9987) Add Gradle instruction to Java quick start guide

2020-05-13 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-9987:
---
Parent: BEAM-9962
Issue Type: Sub-task  (was: Bug)

> Add Gradle instruction to Java quick start guide
> 
>
> Key: BEAM-9987
> URL: https://issues.apache.org/jira/browse/BEAM-9987
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Minor
>
> [Java quick start|https://beam.apache.org/get-started/quickstart-java/] only 
> has instruction to use maven as the build system.
> Let's add instruction to use gradle in there.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9962) CI/CD for Beam

2020-05-13 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-9962:
---
Issue Type: Task  (was: Bug)

> CI/CD for Beam
> --
>
> Key: BEAM-9962
> URL: https://issues.apache.org/jira/browse/BEAM-9962
> Project: Beam
>  Issue Type: Task
>  Components: sdk-ideas
>Reporter: Ankur Goenka
>Priority: Major
>
> Uber task for CI CD



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9987) Add Gradle instruction to Java quick start guide

2020-05-13 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-9987:
--

 Summary: Add Gradle instruction to Java quick start guide
 Key: BEAM-9987
 URL: https://issues.apache.org/jira/browse/BEAM-9987
 Project: Beam
  Issue Type: Bug
  Components: build-system
Reporter: Ankur Goenka
Assignee: Ankur Goenka


[Java quick start|https://beam.apache.org/get-started/quickstart-java/] only 
has instruction to use maven as the build system.

Let's add instruction to use gradle in there.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9962) CI/CD for Beam

2020-05-12 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-9962:
--

 Summary: CI/CD for Beam
 Key: BEAM-9962
 URL: https://issues.apache.org/jira/browse/BEAM-9962
 Project: Beam
  Issue Type: Bug
  Components: sdk-ideas
Reporter: Ankur Goenka


Uber task for CI CD



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9791) Precommit for dataflow runner v2

2020-05-11 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka resolved BEAM-9791.

Resolution: Fixed

> Precommit for dataflow runner v2
> 
>
> Key: BEAM-9791
> URL: https://issues.apache.org/jira/browse/BEAM-9791
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, testing
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.22.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9791) Precommit for dataflow runner v2

2020-05-11 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-9791:
---
Fix Version/s: 2.22.0

> Precommit for dataflow runner v2
> 
>
> Key: BEAM-9791
> URL: https://issues.apache.org/jira/browse/BEAM-9791
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, testing
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.22.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9791) Precommit for dataflow runner v2

2020-04-20 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-9791:
--

 Summary: Precommit for dataflow runner v2
 Key: BEAM-9791
 URL: https://issues.apache.org/jira/browse/BEAM-9791
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow, testing
Reporter: Ankur Goenka






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9752) Too many shards in GCS

2020-04-13 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-9752:
--

 Summary: Too many shards in GCS
 Key: BEAM-9752
 URL: https://issues.apache.org/jira/browse/BEAM-9752
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core, sdk-py-harness
Reporter: Ankur Goenka


We have observed case where the data was spread very thinly over automatically 
computed number of shards.

This caused wait for the buffers to fill before sending the data over to gcs 
causing upload timeout as we did not upload any data for while waiting.

However, by setting an explicit number of shards (1000 in my case) solved this 
problem potentially because all the shards had enough data to fill the buffer 
write avoiding timeout.

 

We can improve the sharding logic so that we don't create too many shards.

Alternatively, we can improve connection handling so that the connection does 
not timeout.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9735) Performance regression in Python Batch pipeline in Reshuffle

2020-04-09 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-9735:
---
Priority: Blocker  (was: Major)

> Performance regression in Python Batch pipeline in Reshuffle
> 
>
> Key: BEAM-9735
> URL: https://issues.apache.org/jira/browse/BEAM-9735
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Blocker
> Fix For: 2.21.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9735) Performance regression in Python Batch pipeline in Reshuffle

2020-04-09 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-9735:
--

 Summary: Performance regression in Python Batch pipeline in 
Reshuffle
 Key: BEAM-9735
 URL: https://issues.apache.org/jira/browse/BEAM-9735
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Ankur Goenka
Assignee: Ankur Goenka
 Fix For: 2.21.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9707) Hardcode runner harness container image for unified worker

2020-04-08 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-9707:
---
Fix Version/s: (was: 2.21.0)
   2.22.0

> Hardcode runner harness container image for unified worker
> --
>
> Key: BEAM-9707
> URL: https://issues.apache.org/jira/browse/BEAM-9707
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Ankur Goenka
>Priority: Major
> Fix For: 2.22.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Hardcode runner harness image temporarily to support usage of unified worker 
> on head.
> Remove this hardcoding once 2.21 is release



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9707) Hardcode runner harness container image for unified worker

2020-04-08 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078719#comment-17078719
 ] 

Ankur Goenka commented on BEAM-9707:


I will keep it open for cleanup.

 

Marking the fixed version to 2.22

> Hardcode runner harness container image for unified worker
> --
>
> Key: BEAM-9707
> URL: https://issues.apache.org/jira/browse/BEAM-9707
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Ankur Goenka
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Hardcode runner harness image temporarily to support usage of unified worker 
> on head.
> Remove this hardcoding once 2.21 is release



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9725) Perfomance regression in reshuffle

2020-04-08 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-9725:
--

 Summary: Perfomance regression in reshuffle 
 Key: BEAM-9725
 URL: https://issues.apache.org/jira/browse/BEAM-9725
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow, sdk-py-core, sdk-py-harness
Affects Versions: 2.20.0
Reporter: Ankur Goenka
 Fix For: 2.21.0


PR [https://github.com/apache/beam/pull/11066] is causing a performance 
regression for reshuffle transform. 

 

cc: [~amaliujia] [~altay]

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9707) Hardcode runner harness container image for unified worker

2020-04-06 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-9707:
--

 Summary: Hardcode runner harness container image for unified worker
 Key: BEAM-9707
 URL: https://issues.apache.org/jira/browse/BEAM-9707
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow, sdk-py-core
Reporter: Ankur Goenka
 Fix For: 2.21.0


Hardcode runner harness image temporarily to support usage of unified worker on 
head.

Remove this hardcoding once 2.21 is release



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9445) Preoptimize causes error with environment setup

2020-03-30 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071400#comment-17071400
 ] 

Ankur Goenka commented on BEAM-9445:


Was trying to reproduce this error with wordcount without luck.

Seems like some weird combination where environment might be dropped.

Will try TFX chicagotaxi example.

> Preoptimize causes error with environment setup
> ---
>
> Key: BEAM-9445
> URL: https://issues.apache.org/jira/browse/BEAM-9445
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Kyle Weaver
>Assignee: Ankur Goenka
>Priority: Major
>
> Setting --experiments=pre_optimize=all causes an error with StatisticsGen in 
> the TFX taxi example pipeline [1]:
>   File 
> "[redacted]/apache_beam/runners/portability/fn_api_runner_transforms.py", 
> line 250, in executable_stage_transform
> environment=components.environments[self.environment],
> TypeError: None has type NoneType, but expected one of: bytes, unicode [while 
> running 'Run[StatisticsGen]']
> cc [~angoenka]
> [1] 
> https://github.com/tensorflow/tfx/blob/ff314a6803675548c89a016a5110a91e5bf98024/tfx/examples/chicago_taxi_pipeline/taxi_pipeline_portable_beam.py#L155



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9287) Python Validates runner tests for Unified Worker

2020-03-30 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka resolved BEAM-9287.

Resolution: Fixed

> Python Validates runner tests for Unified Worker
> 
>
> Key: BEAM-9287
> URL: https://issues.apache.org/jira/browse/BEAM-9287
> Project: Beam
>  Issue Type: Test
>  Components: runner-dataflow, testing
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-9445) Preoptimize causes error with environment setup

2020-03-24 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka reassigned BEAM-9445:
--

Assignee: Ankur Goenka

> Preoptimize causes error with environment setup
> ---
>
> Key: BEAM-9445
> URL: https://issues.apache.org/jira/browse/BEAM-9445
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Kyle Weaver
>Assignee: Ankur Goenka
>Priority: Major
>
> Setting --experiments=pre_optimize=all causes an error with StatisticsGen in 
> the TFX taxi example pipeline [1]:
>   File 
> "[redacted]/apache_beam/runners/portability/fn_api_runner_transforms.py", 
> line 250, in executable_stage_transform
> environment=components.environments[self.environment],
> TypeError: None has type NoneType, but expected one of: bytes, unicode [while 
> running 'Run[StatisticsGen]']
> cc [~angoenka]
> [1] 
> https://github.com/tensorflow/tfx/blob/ff314a6803675548c89a016a5110a91e5bf98024/tfx/examples/chicago_taxi_pipeline/taxi_pipeline_portable_beam.py#L155



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9504) Remove streaming test_streaming_pipeline_returns_expected_user_metrics_fnapi_it from batch validates runner

2020-03-13 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-9504:
--

 Summary: Remove streaming 
test_streaming_pipeline_returns_expected_user_metrics_fnapi_it from batch 
validates runner
 Key: BEAM-9504
 URL: https://issues.apache.org/jira/browse/BEAM-9504
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow, testing
Reporter: Ankur Goenka


We need not run streaming tests as part of batch VR tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-9499) test_multi_triggered_gbk_side_input is failing on head

2020-03-13 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058545#comment-17058545
 ] 

Ankur Goenka edited comment on BEAM-9499 at 3/13/20, 9:22 AM:
--

[~ruoyun] I have disable the test for now. Can you take a look and reenable it. 

Please feel free to re-assign it to the right person.


was (Author: angoenka):
[~ruoyun] Can you take a look. 

Please feel free to re-assign it to the right person.

> test_multi_triggered_gbk_side_input is failing on head
> --
>
> Key: BEAM-9499
> URL: https://issues.apache.org/jira/browse/BEAM-9499
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> test_multi_triggered_gbk_side_input is failing after it was fixed to run on 
> Dataflow runner. Earlier it was always running on DirectRunner.
>  Example failure: 
> [https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/6004/testReport/junit/apache_beam.transforms.sideinputs_test/SideInputsTest/test_multi_triggered_gbk_side_input_7/]
> Error:
> h3. Error Message
> 'list' object has no attribute 'proto'  >> begin captured 
> logging <<  apache_beam.options.pipeline_options: 
> WARNING: --region not set; will default to us-central1. Future releases of 
> Beam will require the user to set --region explicitly, or else have a default 
> set via the gcloud tool.
> {{[https://cloud.google.com/compute/docs/regions-zones]}}
> root: DEBUG: Unhandled type_constraint: Union[] root: DEBUG: Unhandled 
> type_constraint: Union[] root: DEBUG: Unhandled type_constraint: Union[] 
> apache_beam.runners.runner: ERROR: Error while visiting Main windowInto 
> - >> end captured logging << -
> h3. Stacktrace
> File "/usr/lib/python3.6/unittest/case.py", line 59, in testPartExecutor 
> yield File "/usr/lib/python3.6/unittest/case.py", line 605, in run 
> testMethod() File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/transforms/sideinputs_test.py",
>  line 406, in test_multi_triggered_gbk_side_input p.run() File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 112, in run False if self.not_use_test_runner_api else 
> test_runner_api)) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 495, in run self._options).run(False) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 508, in run return self.runner.run_pipeline(self, self._options) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
>  line 57, in run_pipeline self).run_pipeline(pipeline, options) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 536, in run_pipeline self.visit_transforms(pipeline, options) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/runner.py",
>  line 224, in visit_transforms pipeline.visit(RunVisitor(self)) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 545, in visit self._root_transform().visit(visitor, self, visited) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 1033, in visit part.visit(visitor, pipeline, visited) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 1036, in visit visitor.visit_transform(self) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/runner.py",
>  line 219, in visit_transform self.runner.run_transform(transform_node, 
> options) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/runner.py",
>  line 246, in run_transform return m(transform_node, options) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 957, in run_ParDo PropertyNames.STEP_NAME: input_step.proto.name, 
> 'list' object has no attribute 'proto'  >> begin captured 
> logging <<  apache_beam.options.pipeline_options: 
> 

[jira] [Commented] (BEAM-9499) test_multi_triggered_gbk_side_input is failing on head

2020-03-13 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058545#comment-17058545
 ] 

Ankur Goenka commented on BEAM-9499:


[~ruoyun] Can you take a look. 

Please feel free to re-assign it to the right person.

> test_multi_triggered_gbk_side_input is failing on head
> --
>
> Key: BEAM-9499
> URL: https://issues.apache.org/jira/browse/BEAM-9499
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ruoyun Huang
>Priority: Major
>
> test_multi_triggered_gbk_side_input is failing after it was fixed to run on 
> Dataflow runner. Earlier it was always running on DirectRunner.
>  Example failure: 
> [https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/6004/testReport/junit/apache_beam.transforms.sideinputs_test/SideInputsTest/test_multi_triggered_gbk_side_input_7/]
> Error:
> h3. Error Message
> 'list' object has no attribute 'proto'  >> begin captured 
> logging <<  apache_beam.options.pipeline_options: 
> WARNING: --region not set; will default to us-central1. Future releases of 
> Beam will require the user to set --region explicitly, or else have a default 
> set via the gcloud tool.
> {{[https://cloud.google.com/compute/docs/regions-zones]}}
> root: DEBUG: Unhandled type_constraint: Union[] root: DEBUG: Unhandled 
> type_constraint: Union[] root: DEBUG: Unhandled type_constraint: Union[] 
> apache_beam.runners.runner: ERROR: Error while visiting Main windowInto 
> - >> end captured logging << -
> h3. Stacktrace
> File "/usr/lib/python3.6/unittest/case.py", line 59, in testPartExecutor 
> yield File "/usr/lib/python3.6/unittest/case.py", line 605, in run 
> testMethod() File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/transforms/sideinputs_test.py",
>  line 406, in test_multi_triggered_gbk_side_input p.run() File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 112, in run False if self.not_use_test_runner_api else 
> test_runner_api)) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 495, in run self._options).run(False) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 508, in run return self.runner.run_pipeline(self, self._options) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
>  line 57, in run_pipeline self).run_pipeline(pipeline, options) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 536, in run_pipeline self.visit_transforms(pipeline, options) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/runner.py",
>  line 224, in visit_transforms pipeline.visit(RunVisitor(self)) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 545, in visit self._root_transform().visit(visitor, self, visited) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 1033, in visit part.visit(visitor, pipeline, visited) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 1036, in visit visitor.visit_transform(self) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/runner.py",
>  line 219, in visit_transform self.runner.run_transform(transform_node, 
> options) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/runner.py",
>  line 246, in run_transform return m(transform_node, options) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 957, in run_ParDo PropertyNames.STEP_NAME: input_step.proto.name, 
> 'list' object has no attribute 'proto'  >> begin captured 
> logging <<  apache_beam.options.pipeline_options: 
> WARNING: --region not set; will default to us-central1. Future releases of 
> Beam will require the user to set --region explicitly, or else have a default 
> set via the gcloud tool.
> {{[https://cloud.google.com/compute/docs/regions-zones]}}
> root: DEBUG: Unhandled 

[jira] [Assigned] (BEAM-9499) test_multi_triggered_gbk_side_input is failing on head

2020-03-13 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka reassigned BEAM-9499:
--

Assignee: Ruoyun Huang

> test_multi_triggered_gbk_side_input is failing on head
> --
>
> Key: BEAM-9499
> URL: https://issues.apache.org/jira/browse/BEAM-9499
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ruoyun Huang
>Priority: Major
>
> test_multi_triggered_gbk_side_input is failing after it was fixed to run on 
> Dataflow runner. Earlier it was always running on DirectRunner.
>  Example failure: 
> [https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/6004/testReport/junit/apache_beam.transforms.sideinputs_test/SideInputsTest/test_multi_triggered_gbk_side_input_7/]
> Error:
> h3. Error Message
> 'list' object has no attribute 'proto'  >> begin captured 
> logging <<  apache_beam.options.pipeline_options: 
> WARNING: --region not set; will default to us-central1. Future releases of 
> Beam will require the user to set --region explicitly, or else have a default 
> set via the gcloud tool.
> {{[https://cloud.google.com/compute/docs/regions-zones]}}
> root: DEBUG: Unhandled type_constraint: Union[] root: DEBUG: Unhandled 
> type_constraint: Union[] root: DEBUG: Unhandled type_constraint: Union[] 
> apache_beam.runners.runner: ERROR: Error while visiting Main windowInto 
> - >> end captured logging << -
> h3. Stacktrace
> File "/usr/lib/python3.6/unittest/case.py", line 59, in testPartExecutor 
> yield File "/usr/lib/python3.6/unittest/case.py", line 605, in run 
> testMethod() File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/transforms/sideinputs_test.py",
>  line 406, in test_multi_triggered_gbk_side_input p.run() File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 112, in run False if self.not_use_test_runner_api else 
> test_runner_api)) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 495, in run self._options).run(False) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 508, in run return self.runner.run_pipeline(self, self._options) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
>  line 57, in run_pipeline self).run_pipeline(pipeline, options) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 536, in run_pipeline self.visit_transforms(pipeline, options) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/runner.py",
>  line 224, in visit_transforms pipeline.visit(RunVisitor(self)) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 545, in visit self._root_transform().visit(visitor, self, visited) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 1033, in visit part.visit(visitor, pipeline, visited) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 1036, in visit visitor.visit_transform(self) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/runner.py",
>  line 219, in visit_transform self.runner.run_transform(transform_node, 
> options) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/runner.py",
>  line 246, in run_transform return m(transform_node, options) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 957, in run_ParDo PropertyNames.STEP_NAME: input_step.proto.name, 
> 'list' object has no attribute 'proto'  >> begin captured 
> logging <<  apache_beam.options.pipeline_options: 
> WARNING: --region not set; will default to us-central1. Future releases of 
> Beam will require the user to set --region explicitly, or else have a default 
> set via the gcloud tool.
> {{[https://cloud.google.com/compute/docs/regions-zones]}}
> root: DEBUG: Unhandled type_constraint: Union[] root: DEBUG: Unhandled 
> type_constraint: Union[] root: DEBUG: Unhandled 

[jira] [Commented] (BEAM-9499) test_multi_triggered_gbk_side_input is failing on head

2020-03-13 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058540#comment-17058540
 ] 

Ankur Goenka commented on BEAM-9499:


Sickbaying this test as this test has never truly run on Dataflow. 

 

CC: [~HuangLED] [~robertwb] 

> test_multi_triggered_gbk_side_input is failing on head
> --
>
> Key: BEAM-9499
> URL: https://issues.apache.org/jira/browse/BEAM-9499
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Ankur Goenka
>Priority: Major
>
> test_multi_triggered_gbk_side_input is failing after it was fixed to run on 
> Dataflow runner. Earlier it was always running on DirectRunner.
>  Example failure: 
> [https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/6004/testReport/junit/apache_beam.transforms.sideinputs_test/SideInputsTest/test_multi_triggered_gbk_side_input_7/]
> Error:
> h3. Error Message
> 'list' object has no attribute 'proto'  >> begin captured 
> logging <<  apache_beam.options.pipeline_options: 
> WARNING: --region not set; will default to us-central1. Future releases of 
> Beam will require the user to set --region explicitly, or else have a default 
> set via the gcloud tool.
> {{[https://cloud.google.com/compute/docs/regions-zones]}}
> root: DEBUG: Unhandled type_constraint: Union[] root: DEBUG: Unhandled 
> type_constraint: Union[] root: DEBUG: Unhandled type_constraint: Union[] 
> apache_beam.runners.runner: ERROR: Error while visiting Main windowInto 
> - >> end captured logging << -
> h3. Stacktrace
> File "/usr/lib/python3.6/unittest/case.py", line 59, in testPartExecutor 
> yield File "/usr/lib/python3.6/unittest/case.py", line 605, in run 
> testMethod() File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/transforms/sideinputs_test.py",
>  line 406, in test_multi_triggered_gbk_side_input p.run() File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 112, in run False if self.not_use_test_runner_api else 
> test_runner_api)) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 495, in run self._options).run(False) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 508, in run return self.runner.run_pipeline(self, self._options) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
>  line 57, in run_pipeline self).run_pipeline(pipeline, options) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 536, in run_pipeline self.visit_transforms(pipeline, options) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/runner.py",
>  line 224, in visit_transforms pipeline.visit(RunVisitor(self)) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 545, in visit self._root_transform().visit(visitor, self, visited) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 1033, in visit part.visit(visitor, pipeline, visited) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 1036, in visit visitor.visit_transform(self) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/runner.py",
>  line 219, in visit_transform self.runner.run_transform(transform_node, 
> options) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/runner.py",
>  line 246, in run_transform return m(transform_node, options) File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 957, in run_ParDo PropertyNames.STEP_NAME: input_step.proto.name, 
> 'list' object has no attribute 'proto'  >> begin captured 
> logging <<  apache_beam.options.pipeline_options: 
> WARNING: --region not set; will default to us-central1. Future releases of 
> Beam will require the user to set --region explicitly, or else have a default 
> set via the gcloud tool.
> {{[https://cloud.google.com/compute/docs/regions-zones]}}
> root: DEBUG: Unhandled type_constraint: Union[] 

[jira] [Updated] (BEAM-9499) test_multi_triggered_gbk_side_input is failing on head

2020-03-13 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-9499:
---
Description: 
test_multi_triggered_gbk_side_input is failing after it was fixed to run on 
Dataflow runner. Earlier it was always running on DirectRunner.

 Example failure: 
[https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/6004/testReport/junit/apache_beam.transforms.sideinputs_test/SideInputsTest/test_multi_triggered_gbk_side_input_7/]

Error:
h3. Error Message

'list' object has no attribute 'proto'  >> begin captured 
logging <<  apache_beam.options.pipeline_options: WARNING: 
--region not set; will default to us-central1. Future releases of Beam will 
require the user to set --region explicitly, or else have a default set via the 
gcloud tool.

{{[https://cloud.google.com/compute/docs/regions-zones]}}

root: DEBUG: Unhandled type_constraint: Union[] root: DEBUG: Unhandled 
type_constraint: Union[] root: DEBUG: Unhandled type_constraint: Union[] 
apache_beam.runners.runner: ERROR: Error while visiting Main windowInto 
- >> end captured logging << -
h3. Stacktrace

File "/usr/lib/python3.6/unittest/case.py", line 59, in testPartExecutor yield 
File "/usr/lib/python3.6/unittest/case.py", line 605, in run testMethod() File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/transforms/sideinputs_test.py",
 line 406, in test_multi_triggered_gbk_side_input p.run() File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/testing/test_pipeline.py",
 line 112, in run False if self.not_use_test_runner_api else test_runner_api)) 
File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
 line 495, in run self._options).run(False) File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
 line 508, in run return self.runner.run_pipeline(self, self._options) File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
 line 57, in run_pipeline self).run_pipeline(pipeline, options) File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
 line 536, in run_pipeline self.visit_transforms(pipeline, options) File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/runner.py",
 line 224, in visit_transforms pipeline.visit(RunVisitor(self)) File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
 line 545, in visit self._root_transform().visit(visitor, self, visited) File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
 line 1033, in visit part.visit(visitor, pipeline, visited) File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
 line 1036, in visit visitor.visit_transform(self) File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/runner.py",
 line 219, in visit_transform self.runner.run_transform(transform_node, 
options) File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/runner.py",
 line 246, in run_transform return m(transform_node, options) File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
 line 957, in run_ParDo PropertyNames.STEP_NAME: input_step.proto.name, 'list' 
object has no attribute 'proto'  >> begin captured logging 
<<  apache_beam.options.pipeline_options: WARNING: --region 
not set; will default to us-central1. Future releases of Beam will require the 
user to set --region explicitly, or else have a default set via the gcloud tool.

{{[https://cloud.google.com/compute/docs/regions-zones]}}

root: DEBUG: Unhandled type_constraint: Union[] root: DEBUG: Unhandled 
type_constraint: Union[] root: DEBUG: Unhandled type_constraint: Union[] 
apache_beam.runners.runner: ERROR: Error while visiting Main windowInto 
- >> end captured logging << -

  was:
test_multi_triggered_gbk_side_input is failing after it was fixed to run on 
Dataflow runner. Earlier it was always running on DirectRunner.

 

Error:
h3. Error Message

'list' object has no attribute 'proto'  >> begin captured 
logging <<  apache_beam.options.pipeline_options: WARNING: 
--region not set; will default to 

[jira] [Created] (BEAM-9499) test_multi_triggered_gbk_side_input is failing on head

2020-03-13 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-9499:
--

 Summary: test_multi_triggered_gbk_side_input is failing on head
 Key: BEAM-9499
 URL: https://issues.apache.org/jira/browse/BEAM-9499
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow, sdk-py-core
Reporter: Ankur Goenka


test_multi_triggered_gbk_side_input is failing after it was fixed to run on 
Dataflow runner. Earlier it was always running on DirectRunner.

 

Error:
h3. Error Message

'list' object has no attribute 'proto'  >> begin captured 
logging <<  apache_beam.options.pipeline_options: WARNING: 
--region not set; will default to us-central1. Future releases of Beam will 
require the user to set --region explicitly, or else have a default set via the 
gcloud tool.

{{[https://cloud.google.com/compute/docs/regions-zones]}}

root: DEBUG: Unhandled type_constraint: Union[] root: DEBUG: Unhandled 
type_constraint: Union[] root: DEBUG: Unhandled type_constraint: Union[] 
apache_beam.runners.runner: ERROR: Error while visiting Main windowInto 
- >> end captured logging << -
h3. Stacktrace

File "/usr/lib/python3.6/unittest/case.py", line 59, in testPartExecutor yield 
File "/usr/lib/python3.6/unittest/case.py", line 605, in run testMethod() File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/transforms/sideinputs_test.py",
 line 406, in test_multi_triggered_gbk_side_input p.run() File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/testing/test_pipeline.py",
 line 112, in run False if self.not_use_test_runner_api else test_runner_api)) 
File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
 line 495, in run self._options).run(False) File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
 line 508, in run return self.runner.run_pipeline(self, self._options) File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
 line 57, in run_pipeline self).run_pipeline(pipeline, options) File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
 line 536, in run_pipeline self.visit_transforms(pipeline, options) File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/runner.py",
 line 224, in visit_transforms pipeline.visit(RunVisitor(self)) File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
 line 545, in visit self._root_transform().visit(visitor, self, visited) File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
 line 1033, in visit part.visit(visitor, pipeline, visited) File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
 line 1036, in visit visitor.visit_transform(self) File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/runner.py",
 line 219, in visit_transform self.runner.run_transform(transform_node, 
options) File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/runner.py",
 line 246, in run_transform return m(transform_node, options) File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
 line 957, in run_ParDo PropertyNames.STEP_NAME: input_step.proto.name, 'list' 
object has no attribute 'proto'  >> begin captured logging 
<<  apache_beam.options.pipeline_options: WARNING: --region 
not set; will default to us-central1. Future releases of Beam will require the 
user to set --region explicitly, or else have a default set via the gcloud tool.

{{[https://cloud.google.com/compute/docs/regions-zones]}}

root: DEBUG: Unhandled type_constraint: Union[] root: DEBUG: Unhandled 
type_constraint: Union[] root: DEBUG: Unhandled type_constraint: Union[] 
apache_beam.runners.runner: ERROR: Error while visiting Main windowInto 
- >> end captured logging << -



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (BEAM-9287) Python Validates runner tests for Unified Worker

2020-03-12 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka reopened BEAM-9287:


We need to disable the test for batch validates runner as TeasStreams is only 
supported in streaming mode.

> Python Validates runner tests for Unified Worker
> 
>
> Key: BEAM-9287
> URL: https://issues.apache.org/jira/browse/BEAM-9287
> Project: Beam
>  Issue Type: Test
>  Components: runner-dataflow, testing
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9485) Dataflow Silently drops Non implemented transform in fnapi mode.

2020-03-12 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058316#comment-17058316
 ] 

Ankur Goenka commented on BEAM-9485:


Thanks for taking a look. Happy to answer. 

Is this a regression from a previous release? -> No, the issue is present in 
previous releases.
Is this a new feature or related to a new feature? -> Its related to UW. I am 
not sure if it impact any of the current users. [~chamikara]  Do you have more 
context on this?
What percentage of users would be impacted by this issue if it is not fixed? -> 
anyone using UW which will be small.
Would it be possible for the impacted users to skip this version? -> No. UW 
will only be supported for 2.20

 

> Dataflow Silently drops Non implemented transform in fnapi mode.
> 
>
> Key: BEAM-9485
> URL: https://issues.apache.org/jira/browse/BEAM-9485
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> We need to raise an error here 
> https://github.com/apache/beam/blob/02cb8d807314a38542c9894b19483e4333d8223b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L857



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9485) Dataflow Silently drops Non implemented transform in fnapi mode.

2020-03-12 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058159#comment-17058159
 ] 

Ankur Goenka commented on BEAM-9485:


[~ruiwang] We should also cherry pick this change as it silently drops 
transforms causing hard to find bug in pipleine

 

cc [~altay]   [~chamikara] 

 

> Dataflow Silently drops Non implemented transform in fnapi mode.
> 
>
> Key: BEAM-9485
> URL: https://issues.apache.org/jira/browse/BEAM-9485
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> We need to raise an error here 
> https://github.com/apache/beam/blob/02cb8d807314a38542c9894b19483e4333d8223b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L857



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-9485) Dataflow Silently drops Non implemented transform in fnapi mode.

2020-03-12 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka reassigned BEAM-9485:
--

Assignee: Ankur Goenka

> Dataflow Silently drops Non implemented transform in fnapi mode.
> 
>
> Key: BEAM-9485
> URL: https://issues.apache.org/jira/browse/BEAM-9485
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> We need to raise an error here 
> https://github.com/apache/beam/blob/02cb8d807314a38542c9894b19483e4333d8223b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L857



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9485) Dataflow Silently drops Non implemented transform in fnapi mode.

2020-03-12 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-9485:
---
Fix Version/s: 2.20.0

> Dataflow Silently drops Non implemented transform in fnapi mode.
> 
>
> Key: BEAM-9485
> URL: https://issues.apache.org/jira/browse/BEAM-9485
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Ankur Goenka
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> We need to raise an error here 
> https://github.com/apache/beam/blob/02cb8d807314a38542c9894b19483e4333d8223b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L857



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9485) Dataflow Silently drops Non implemented transform in fnapi mode.

2020-03-11 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-9485:
--

 Summary: Dataflow Silently drops Non implemented transform in 
fnapi mode.
 Key: BEAM-9485
 URL: https://issues.apache.org/jira/browse/BEAM-9485
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow, sdk-py-core
Reporter: Ankur Goenka


We need to raise an error here 
https://github.com/apache/beam/blob/02cb8d807314a38542c9894b19483e4333d8223b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L857



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-9465) Reshuffle should trigger repeatedly

2020-03-06 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka reassigned BEAM-9465:
--

Assignee: Ankur Goenka

> Reshuffle should trigger repeatedly
> ---
>
> Key: BEAM-9465
> URL: https://issues.apache.org/jira/browse/BEAM-9465
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> [https://github.com/apache/beam/blob/403a08f8b95d13e5381a22c1c032ad22c8848650/sdks/python/apache_beam/transforms/trigger.py#L516|https://www.google.com/url?q=https://github.com/apache/beam/blob/403a08f8b95d13e5381a22c1c032ad22c8848650/sdks/python/apache_beam/transforms/trigger.py%23L516=D]
> should fire repeatedly 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9465) Reshuffle should trigger repeatedly

2020-03-06 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-9465:
--

 Summary: Reshuffle should trigger repeatedly
 Key: BEAM-9465
 URL: https://issues.apache.org/jira/browse/BEAM-9465
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Ankur Goenka
 Fix For: 2.20.0


[https://github.com/apache/beam/blob/403a08f8b95d13e5381a22c1c032ad22c8848650/sdks/python/apache_beam/transforms/trigger.py#L516|https://www.google.com/url?q=https://github.com/apache/beam/blob/403a08f8b95d13e5381a22c1c032ad22c8848650/sdks/python/apache_beam/transforms/trigger.py%23L516=D]

should fire repeatedly 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9431) test_streaming_pipeline_returns_expected_user_metrics_fnapi_it is not supported with streaming engine

2020-03-03 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-9431:
---
Component/s: runner-dataflow

> test_streaming_pipeline_returns_expected_user_metrics_fnapi_it is not 
> supported with streaming engine
> -
>
> Key: BEAM-9431
> URL: https://issues.apache.org/jira/browse/BEAM-9431
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, testing
>Reporter: Ankur Goenka
>Priority: Major
>
> test_streaming_pipeline_returns_expected_user_metrics_fnapi_it is failing on 
> Dataflow V2 as ReadFromPubSub/Read-out0-ElementCount is not implemented in 
> with streaming engine.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9431) test_streaming_pipeline_returns_expected_user_metrics_fnapi_it is not supported with streaming engine

2020-03-03 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-9431:
---
Description: test_streaming_pipeline_returns_expected_user_metrics_fnapi_it 
is failing on Dataflow V2 as ReadFromPubSub/Read-out0-ElementCount is not 
implemented in with streaming engine.

> test_streaming_pipeline_returns_expected_user_metrics_fnapi_it is not 
> supported with streaming engine
> -
>
> Key: BEAM-9431
> URL: https://issues.apache.org/jira/browse/BEAM-9431
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Ankur Goenka
>Priority: Major
>
> test_streaming_pipeline_returns_expected_user_metrics_fnapi_it is failing on 
> Dataflow V2 as ReadFromPubSub/Read-out0-ElementCount is not implemented in 
> with streaming engine.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9431) test_streaming_pipeline_returns_expected_user_metrics_fnapi_it is not supported with streaming engine

2020-03-03 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-9431:
--

 Summary: 
test_streaming_pipeline_returns_expected_user_metrics_fnapi_it is not supported 
with streaming engine
 Key: BEAM-9431
 URL: https://issues.apache.org/jira/browse/BEAM-9431
 Project: Beam
  Issue Type: Bug
  Components: testing
Reporter: Ankur Goenka






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9287) Python Validates runner tests for Unified Worker

2020-03-02 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-9287:
---
Fix Version/s: (was: 2.20.0)
   2.21.0

> Python Validates runner tests for Unified Worker
> 
>
> Key: BEAM-9287
> URL: https://issues.apache.org/jira/browse/BEAM-9287
> Project: Beam
>  Issue Type: Test
>  Components: runner-dataflow, testing
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9287) Python Validates runner tests for Unified Worker

2020-03-02 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka resolved BEAM-9287.

Resolution: Fixed

> Python Validates runner tests for Unified Worker
> 
>
> Key: BEAM-9287
> URL: https://issues.apache.org/jira/browse/BEAM-9287
> Project: Beam
>  Issue Type: Test
>  Components: runner-dataflow, testing
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9287) Python Validates runner tests for Unified Worker

2020-03-02 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049629#comment-17049629
 ] 

Ankur Goenka commented on BEAM-9287:


The tests are enabled now.

> Python Validates runner tests for Unified Worker
> 
>
> Key: BEAM-9287
> URL: https://issues.apache.org/jira/browse/BEAM-9287
> Project: Beam
>  Issue Type: Test
>  Components: runner-dataflow, testing
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (BEAM-9287) Python Validates runner tests for Unified Worker

2020-02-27 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka reopened BEAM-9287:


> Python Validates runner tests for Unified Worker
> 
>
> Key: BEAM-9287
> URL: https://issues.apache.org/jira/browse/BEAM-9287
> Project: Beam
>  Issue Type: Test
>  Components: runner-dataflow, testing
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9402) test_multi_triggered_gbk_side_input is always using DirectRunner

2020-02-27 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-9402:
---
Description: 
The option provided in the test overwrite the argument provided during 
validates runner test

 

 

- >> end captured logging << -

==
 FAIL: Test a GBK sideinput, with multiple triggering.
 --
 Traceback (most recent call last):
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/transforms/sideinputs_test.py",
 line 401, in test_multi_triggered_gbk_side_input
 p.run()
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/testing/test_pipeline.py",
 line 112, in run
 False if self.not_use_test_runner_api else test_runner_api))
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/pipeline.py",
 line 483, in run
 self._options).run(False)
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/pipeline.py",
 line 809, in from_runner_api
 p.transforms_stack = [context.transforms.get_by_id(root_transform_id)]
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/runners/pipeline_context.py",
 line 103, in get_by_id
 self._id_to_proto[id], self._pipeline_context)
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/pipeline.py",
 line 1117, in from_runner_api
 part = context.transforms.get_by_id(transform_id)
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/runners/pipeline_context.py",
 line 103, in get_by_id
 self._id_to_proto[id], self._pipeline_context)
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/pipeline.py",
 line 1104, in from_runner_api
 transform = ptransform.PTransform.from_runner_api(proto, context)
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/transforms/ptransform.py",
 line 684, in from_runner_api
 context)
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/testing/test_stream.py",
 line 316, in from_runner_api_parameter
 output_tags=output_tags)
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/testing/test_stream.py",
 line 200, in __init__
 assert event_tags.issubset(self.output_tags)
 AssertionError: 
  >> begin captured logging << 
 root: INFO: Missing pipeline option (runner). Executing pipeline using the 
default runner: DirectRunner.
 - >> end captured logging << -

  was:
- >> end captured logging << -

==
FAIL: Test a GBK sideinput, with multiple triggering.
--
Traceback (most recent call last):
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/transforms/sideinputs_test.py",
 line 401, in test_multi_triggered_gbk_side_input
 p.run()
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/testing/test_pipeline.py",
 line 112, in run
 False if self.not_use_test_runner_api else test_runner_api))
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/pipeline.py",
 line 483, in run
 self._options).run(False)
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/pipeline.py",
 line 809, in from_runner_api
 p.transforms_stack = [context.transforms.get_by_id(root_transform_id)]
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/runners/pipeline_context.py",
 line 103, in get_by_id
 self._id_to_proto[id], self._pipeline_context)
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/pipeline.py",
 line 1117, in from_runner_api
 part = context.transforms.get_by_id(transform_id)
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/runners/pipeline_context.py",
 line 103, in get_by_id
 self._id_to_proto[id], self._pipeline_context)
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/pipeline.py",
 line 1104, in from_runner_api
 transform = ptransform.PTransform.from_runner_api(proto, context)
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/transforms/ptransform.py",
 line 684, in from_runner_api
 context)
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/testing/test_stream.py",
 line 316, in from_runner_api_parameter
 output_tags=output_tags)
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/testing/test_stream.py",
 line 200, in __init__
 assert 

[jira] [Created] (BEAM-9402) test_multi_triggered_gbk_side_input is always using DirectRunner

2020-02-27 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-9402:
--

 Summary: test_multi_triggered_gbk_side_input is always using 
DirectRunner
 Key: BEAM-9402
 URL: https://issues.apache.org/jira/browse/BEAM-9402
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core, testing
Reporter: Ankur Goenka
Assignee: Ankur Goenka


- >> end captured logging << -

==
FAIL: Test a GBK sideinput, with multiple triggering.
--
Traceback (most recent call last):
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/transforms/sideinputs_test.py",
 line 401, in test_multi_triggered_gbk_side_input
 p.run()
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/testing/test_pipeline.py",
 line 112, in run
 False if self.not_use_test_runner_api else test_runner_api))
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/pipeline.py",
 line 483, in run
 self._options).run(False)
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/pipeline.py",
 line 809, in from_runner_api
 p.transforms_stack = [context.transforms.get_by_id(root_transform_id)]
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/runners/pipeline_context.py",
 line 103, in get_by_id
 self._id_to_proto[id], self._pipeline_context)
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/pipeline.py",
 line 1117, in from_runner_api
 part = context.transforms.get_by_id(transform_id)
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/runners/pipeline_context.py",
 line 103, in get_by_id
 self._id_to_proto[id], self._pipeline_context)
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/pipeline.py",
 line 1104, in from_runner_api
 transform = ptransform.PTransform.from_runner_api(proto, context)
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/transforms/ptransform.py",
 line 684, in from_runner_api
 context)
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/testing/test_stream.py",
 line 316, in from_runner_api_parameter
 output_tags=output_tags)
 File 
"/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/apache_beam/testing/test_stream.py",
 line 200, in __init__
 assert event_tags.issubset(self.output_tags)
AssertionError: 
 >> begin captured logging << 
root: INFO: Missing pipeline option (runner). Executing pipeline using the 
default runner: DirectRunner.
- >> end captured logging << -



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9391) Cleanup hardcoded unified worker images

2020-02-26 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-9391:
--

 Summary: Cleanup hardcoded unified worker images  
 Key: BEAM-9391
 URL: https://issues.apache.org/jira/browse/BEAM-9391
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-harness, testing
Reporter: Ankur Goenka
Assignee: Ankur Goenka






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9347) Remove default image for Unified Worker

2020-02-24 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka resolved BEAM-9347.

Resolution: Fixed

> Remove default image for Unified Worker
> ---
>
> Key: BEAM-9347
> URL: https://issues.apache.org/jira/browse/BEAM-9347
> Project: Beam
>  Issue Type: Test
>  Components: runner-dataflow
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> The runner will choose the Runner Harness image for UW so we don't need to 
> overwrite the image in default behavior.
> Also, this will help us distinguish between user requested overwrites for the 
> default overwrites(which is not used).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9347) Remove default image for Unified Worker

2020-02-24 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043982#comment-17043982
 ] 

Ankur Goenka commented on BEAM-9347:


Yes, The PR is merged. 

> Remove default image for Unified Worker
> ---
>
> Key: BEAM-9347
> URL: https://issues.apache.org/jira/browse/BEAM-9347
> Project: Beam
>  Issue Type: Test
>  Components: runner-dataflow
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> The runner will choose the Runner Harness image for UW so we don't need to 
> overwrite the image in default behavior.
> Also, this will help us distinguish between user requested overwrites for the 
> default overwrites(which is not used).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9347) Remove default image for Unified Worker

2020-02-20 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-9347:
---
Description: 
The runner will choose the Runner Harness image for UW so we don't need to 
overwrite the image in default behavior.

Also, this will help us distinguish between user requested overwrites for the 
default overwrites(which is not used).

> Remove default image for Unified Worker
> ---
>
> Key: BEAM-9347
> URL: https://issues.apache.org/jira/browse/BEAM-9347
> Project: Beam
>  Issue Type: Test
>  Components: runner-dataflow
>Reporter: Ankur Goenka
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The runner will choose the Runner Harness image for UW so we don't need to 
> overwrite the image in default behavior.
> Also, this will help us distinguish between user requested overwrites for the 
> default overwrites(which is not used).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9347) Remove default image for Unified Worker

2020-02-20 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-9347:
---
Component/s: (was: testing)

> Remove default image for Unified Worker
> ---
>
> Key: BEAM-9347
> URL: https://issues.apache.org/jira/browse/BEAM-9347
> Project: Beam
>  Issue Type: Test
>  Components: runner-dataflow
>Reporter: Ankur Goenka
>Priority: Major
> Fix For: 2.20.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9347) Remove default image for Unified Worker

2020-02-20 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-9347:
---
Fix Version/s: 2.20.0

> Remove default image for Unified Worker
> ---
>
> Key: BEAM-9347
> URL: https://issues.apache.org/jira/browse/BEAM-9347
> Project: Beam
>  Issue Type: Test
>  Components: runner-dataflow, testing
>Reporter: Ankur Goenka
>Priority: Major
> Fix For: 2.20.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9347) Remove default image for Unified Worker

2020-02-20 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-9347:
--

 Summary: Remove default image for Unified Worker
 Key: BEAM-9347
 URL: https://issues.apache.org/jira/browse/BEAM-9347
 Project: Beam
  Issue Type: Test
  Components: runner-dataflow, testing
Reporter: Ankur Goenka






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9290) runner_harness_container_image experiment is not honored in python released sdks.

2020-02-20 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka resolved BEAM-9290.

Resolution: Fixed

> runner_harness_container_image experiment is not honored in python released 
> sdks.
> -
>
> Key: BEAM-9290
> URL: https://issues.apache.org/jira/browse/BEAM-9290
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
>  
> {code:java}
> --experiments=runner_harness_container_image=foo_image{code}
> does not have any affect on the job.
>  
>  
> cc: [~tvalentyn]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-4032) Support staging binary distributions of dependency packages.

2020-02-19 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17040489#comment-17040489
 ] 

Ankur Goenka commented on BEAM-4032:


This will become less of a concern when we start supporting custom containers 
as the binaries will be preinstalled on it.

> Support staging binary distributions of dependency packages.
> 
>
> Key: BEAM-4032
> URL: https://issues.apache.org/jira/browse/BEAM-4032
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
>
> requirements.txt only supports source-distribution dependencies [1].
> --extra_packages does not officially support wheel files [2].
> It is possible to expand this to support binary distributions as long as we 
> have the knowledge of the target platform.
> We should take into consideration the mechanisms of staging dependencies 
> through portability framework, and perhaps consolidate some of the existing 
> options.
> [https://github.com/apache/beam/blob/a79d1b4fc27eb81db0d9a773047820a206f3d238/sdks/python/apache_beam/runners/dataflow/internal/dependency.py#L260]
> [https://github.com/apache/beam/blob/a79d1b4fc27eb81db0d9a773047820a206f3d238/sdks/python/apache_beam/runners/dataflow/internal/dependency.py#L188]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9299) Upgrade Flink Runner to 1.8.3 and 1.9.2

2020-02-14 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17037271#comment-17037271
 ] 

Ankur Goenka commented on BEAM-9299:


+1 for option 2. 

We should not make deprecation of old flink version a blocker for adding 
support for new flink versions.

 

As a side item, I think we should have easy ways to test old versions of flink. 
the beam precommit tests primarily use latest flink version.

> Upgrade Flink Runner to 1.8.3 and 1.9.2
> ---
>
> Key: BEAM-9299
> URL: https://issues.apache.org/jira/browse/BEAM-9299
> Project: Beam
>  Issue Type: Task
>  Components: runner-flink
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> I would like to Upgrade Flink Runner to 18.3 and 1.9.2 due to both the Apache 
> Flink 1.8.3 and Apache Flink 1.9.2 have been released [1]. 
> What do you think?
> [1] https://dist.apache.org/repos/dist/release/flink/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9290) runner_harness_container_image experiment is not honored in python released sdks.

2020-02-10 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-9290:
---
Fix Version/s: 2.20.0

> runner_harness_container_image experiment is not honored in python released 
> sdks.
> -
>
> Key: BEAM-9290
> URL: https://issues.apache.org/jira/browse/BEAM-9290
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Ankur Goenka
>Priority: Major
> Fix For: 2.20.0
>
>
>  
> {code:java}
> --experiments=runner_harness_container_image=foo_image{code}
> does not have any affect on the job.
>  
>  
> cc: [~tvalentyn]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-9290) runner_harness_container_image experiment is not honored in python released sdks.

2020-02-10 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka reassigned BEAM-9290:
--

Assignee: Ankur Goenka

> runner_harness_container_image experiment is not honored in python released 
> sdks.
> -
>
> Key: BEAM-9290
> URL: https://issues.apache.org/jira/browse/BEAM-9290
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.20.0
>
>
>  
> {code:java}
> --experiments=runner_harness_container_image=foo_image{code}
> does not have any affect on the job.
>  
>  
> cc: [~tvalentyn]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9290) runner_harness_container_image experiment is not honored in python released sdks.

2020-02-10 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-9290:
--

 Summary: runner_harness_container_image experiment is not honored 
in python released sdks.
 Key: BEAM-9290
 URL: https://issues.apache.org/jira/browse/BEAM-9290
 Project: Beam
  Issue Type: Test
  Components: runner-dataflow, sdk-py-core
Reporter: Ankur Goenka


 
{code:java}
--experiments=runner_harness_container_image=foo_image{code}
does not have any affect on the job.

 

 

cc: [~tvalentyn]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9290) runner_harness_container_image experiment is not honored in python released sdks.

2020-02-10 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-9290:
---
Issue Type: Bug  (was: Test)

> runner_harness_container_image experiment is not honored in python released 
> sdks.
> -
>
> Key: BEAM-9290
> URL: https://issues.apache.org/jira/browse/BEAM-9290
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Ankur Goenka
>Priority: Major
>
>  
> {code:java}
> --experiments=runner_harness_container_image=foo_image{code}
> does not have any affect on the job.
>  
>  
> cc: [~tvalentyn]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9287) Python Validates runner tests for Unified Worker

2020-02-10 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-9287:
--

 Summary: Python Validates runner tests for Unified Worker
 Key: BEAM-9287
 URL: https://issues.apache.org/jira/browse/BEAM-9287
 Project: Beam
  Issue Type: Test
  Components: runner-dataflow, testing
Reporter: Ankur Goenka






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9220) Add use_runner_v2 argument for dataflow

2020-01-29 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-9220:
--

 Summary: Add use_runner_v2 argument for dataflow
 Key: BEAM-9220
 URL: https://issues.apache.org/jira/browse/BEAM-9220
 Project: Beam
  Issue Type: New Feature
  Components: runner-dataflow
Reporter: Ankur Goenka






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8970) Spark portable runner supports Yarn

2020-01-27 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17024818#comment-17024818
 ] 

Ankur Goenka commented on BEAM-8970:


{code:java}
['--runner=SparkRunner',
'--output_executable_path=~/path/to/output.jar']
{code}

Would be the best way forward.

Once you have the jar, you can use regular jar submission mode as described in 
spark documentation here 
https://spark.apache.org/docs/latest/running-on-yarn.html
The entry class would be 


{code:java}
--class org.apache.beam.runners.spark.SparkPipelineRunner
{code}


> Spark portable runner supports Yarn
> ---
>
> Key: BEAM-8970
> URL: https://issues.apache.org/jira/browse/BEAM-8970
> Project: Beam
>  Issue Type: Wish
>  Components: runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9093) Pipeline options which with different underlying store variable does not get over written

2020-01-11 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-9093:
--

 Summary: Pipeline options which with different underlying store 
variable does not get over written
 Key: BEAM-9093
 URL: https://issues.apache.org/jira/browse/BEAM-9093
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Ankur Goenka


Example:

PipelineOptions(flags=[],**\{'no_use_public_ips': True,})

Expectation: use_public_ips should be set False.

Actual: the value is not used as its not passed through argparser

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9002) test_flatten_same_pcollections (apache_beam.transforms.ptransform_test.PTransformTest) does not work in Streaming VR suite on Dataflow

2020-01-09 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-9002:
---
Component/s: (was: sdk-py-core)
 runner-dataflow

> test_flatten_same_pcollections 
> (apache_beam.transforms.ptransform_test.PTransformTest) does not work in 
> Streaming VR suite on Dataflow
> --
>
> Key: BEAM-9002
> URL: https://issues.apache.org/jira/browse/BEAM-9002
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Valentyn Tymofieiev
>Assignee: Ankur Goenka
>Priority: Major
>
> Per investigation in https://issues.apache.org/jira/browse/BEAM-8877, the 
> test times out and was recently added to VR test suite.
> [~liumomo315], I will sickbay this test for streaming, could you please help 
> triage the failure?
> Thank you!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9002) test_flatten_same_pcollections (apache_beam.transforms.ptransform_test.PTransformTest) does not work in Streaming VR suite on Dataflow

2020-01-09 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012263#comment-17012263
 ] 

Ankur Goenka commented on BEAM-9002:


I would consider it a bug as it works on direct runner.

The resolution can be to fail such pipelines at submission time itself.

> test_flatten_same_pcollections 
> (apache_beam.transforms.ptransform_test.PTransformTest) does not work in 
> Streaming VR suite on Dataflow
> --
>
> Key: BEAM-9002
> URL: https://issues.apache.org/jira/browse/BEAM-9002
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Ankur Goenka
>Priority: Major
>
> Per investigation in https://issues.apache.org/jira/browse/BEAM-8877, the 
> test times out and was recently added to VR test suite.
> [~liumomo315], I will sickbay this test for streaming, could you please help 
> triage the failure?
> Thank you!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9081) Fail Dataflow Streaming pipeline on passert failure

2020-01-09 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-9081:
--

 Summary: Fail Dataflow Streaming pipeline on passert failure
 Key: BEAM-9081
 URL: https://issues.apache.org/jira/browse/BEAM-9081
 Project: Beam
  Issue Type: Bug
  Components: testing
Reporter: Ankur Goenka


Dataflow retries a failed work item indefinitely.

As PAsserts are translated to PTransforms, A failed PAssert will be retried 
indefinitely in Streaming mode.

So we should stop fail the pipeline in case of PAssert failure instead of 
waiting for test timeout.

 

Example job 
[https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-west1/jobs/2020-01-09_13_25_08-2233620052523810338?project=apache-beam-testing]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9002) test_flatten_same_pcollections (apache_beam.transforms.ptransform_test.PTransformTest) does not work in Streaming VR suite on Dataflow

2020-01-09 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012243#comment-17012243
 ] 

Ankur Goenka commented on BEAM-9002:


Based on this error stack, same Pcollection flatten is not working on dataflow

 
{
  job: "2020-01-09_13_25_08-2233620052523810338"   
  logger: 
"/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py:171"
   
  message: "Error processing instruction -364. Original traceback is
Traceback (most recent call last):
  File 
"/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
 line 165, in _execute
response = task()
  File 
"/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
 line 221, in 
lambda: self.create_worker().do_instruction(request), request)
  File 
"/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
 line 350, in do_instruction
request.instruction_id)
  File 
"/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
 line 384, in process_bundle
bundle_processor.process_bundle(instruction_id))
  File 
"/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py",
 line 811, in process_bundle
data.transform_id].process_encoded(data.data)
  File 
"/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py",
 line 206, in process_encoded
self.output(decoded_value)
  File "apache_beam/runners/worker/operations.py", line 302, in 
apache_beam.runners.worker.operations.Operation.output
def output(self, windowed_value, output_index=0):
  File "apache_beam/runners/worker/operations.py", line 304, in 
apache_beam.runners.worker.operations.Operation.output
cython.cast(Receiver, self.receivers[output_index]).receive(windowed_value)
  File "apache_beam/runners/worker/operations.py", line 178, in 
apache_beam.runners.worker.operations.SingletonConsumerSet.receive
self.consumer.process(windowed_value)
  File "apache_beam/runners/worker/operations.py", line 657, in 
apache_beam.runners.worker.operations.DoOperation.process
with self.scoped_process_state:
  File "apache_beam/runners/worker/operations.py", line 658, in 
apache_beam.runners.worker.operations.DoOperation.process
delayed_application = self.dofn_receiver.receive(o)
  File "apache_beam/runners/common.py", line 876, in 
apache_beam.runners.common.DoFnRunner.receive
self.process(windowed_value)
  File "apache_beam/runners/common.py", line 883, in 
apache_beam.runners.common.DoFnRunner.process
self._reraise_augmented(exn)
  File "apache_beam/runners/common.py", line 937, in 
apache_beam.runners.common.DoFnRunner._reraise_augmented
raise
  File "apache_beam/runners/common.py", line 881, in 
apache_beam.runners.common.DoFnRunner.process
return self.do_fn_invoker.invoke_process(windowed_value)
  File "apache_beam/runners/common.py", line 670, in 
apache_beam.runners.common.PerWindowInvoker.invoke_process
self._invoke_process_per_window(
  File "apache_beam/runners/common.py", line 748, in 
apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
output_processor.process_outputs(
  File "apache_beam/runners/common.py", line 984, in 
apache_beam.runners.common._OutputProcessor.process_outputs
def process_outputs(self, windowed_input_element, results):
  File "apache_beam/runners/common.py", line 1024, in 
apache_beam.runners.common._OutputProcessor.process_outputs
self.main_receivers.receive(windowed_value)
  File "apache_beam/runners/worker/operations.py", line 178, in 
apache_beam.runners.worker.operations.SingletonConsumerSet.receive
self.consumer.process(windowed_value)
  File "apache_beam/runners/worker/operations.py", line 657, in 
apache_beam.runners.worker.operations.DoOperation.process
with self.scoped_process_state:
  File "apache_beam/runners/worker/operations.py", line 658, in 
apache_beam.runners.worker.operations.DoOperation.process
delayed_application = self.dofn_receiver.receive(o)
  File "apache_beam/runners/common.py", line 876, in 
apache_beam.runners.common.DoFnRunner.receive
self.process(windowed_value)
  File "apache_beam/runners/common.py", line 883, in 
apache_beam.runners.common.DoFnRunner.process
self._reraise_augmented(exn)
  File "apache_beam/runners/common.py", line 937, in 
apache_beam.runners.common.DoFnRunner._reraise_augmented
raise
  File "apache_beam/runners/common.py", line 881, in 
apache_beam.runners.common.DoFnRunner.process
return self.do_fn_invoker.invoke_process(windowed_value)
  File "apache_beam/runners/common.py", line 496, in 
apache_beam.runners.common.SimpleInvoker.invoke_process
output_processor.process_outputs(
  File "apache_beam/runners/common.py", line 1024, in 
apache_beam.runners.common._OutputProcessor.process_outputs
self.main_receivers.receive(windowed_value)
  

[jira] [Commented] (BEAM-9002) test_flatten_same_pcollections (apache_beam.transforms.ptransform_test.PTransformTest) does not work in Streaming VR suite on Dataflow

2020-01-09 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012242#comment-17012242
 ] 

Ankur Goenka commented on BEAM-9002:


Dataflow streaming VR tests are actually timing out in case of assertion error 
as steaming pipelines have indefinite retry for bundle failure.

 

We should also introduce early failure for streaming pipelines in case of 
passert failure.

> test_flatten_same_pcollections 
> (apache_beam.transforms.ptransform_test.PTransformTest) does not work in 
> Streaming VR suite on Dataflow
> --
>
> Key: BEAM-9002
> URL: https://issues.apache.org/jira/browse/BEAM-9002
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Ankur Goenka
>Priority: Major
>
> Per investigation in https://issues.apache.org/jira/browse/BEAM-8877, the 
> test times out and was recently added to VR test suite.
> [~liumomo315], I will sickbay this test for streaming, could you please help 
> triage the failure?
> Thank you!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9002) test_flatten_same_pcollections (apache_beam.transforms.ptransform_test.PTransformTest) does not work in Streaming VR suite on Dataflow

2020-01-09 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012203#comment-17012203
 ] 

Ankur Goenka commented on BEAM-9002:


The test should work fine unless there is a bug.

I think timeout is a manifestation of that bug if their is one.

> test_flatten_same_pcollections 
> (apache_beam.transforms.ptransform_test.PTransformTest) does not work in 
> Streaming VR suite on Dataflow
> --
>
> Key: BEAM-9002
> URL: https://issues.apache.org/jira/browse/BEAM-9002
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Ankur Goenka
>Priority: Major
>
> Per investigation in https://issues.apache.org/jira/browse/BEAM-8877, the 
> test times out and was recently added to VR test suite.
> [~liumomo315], I will sickbay this test for streaming, could you please help 
> triage the failure?
> Thank you!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-9002) test_flatten_same_pcollections (apache_beam.transforms.ptransform_test.PTransformTest) does not work in Streaming VR suite on Dataflow

2020-01-09 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012203#comment-17012203
 ] 

Ankur Goenka edited comment on BEAM-9002 at 1/9/20 8:23 PM:


The test should work fine unless there is a bug.

I think timeout is a manifestation of that bug if their is one.

So I think we should investigate it.


was (Author: angoenka):
The test should work fine unless there is a bug.

I think timeout is a manifestation of that bug if their is one.

> test_flatten_same_pcollections 
> (apache_beam.transforms.ptransform_test.PTransformTest) does not work in 
> Streaming VR suite on Dataflow
> --
>
> Key: BEAM-9002
> URL: https://issues.apache.org/jira/browse/BEAM-9002
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Ankur Goenka
>Priority: Major
>
> Per investigation in https://issues.apache.org/jira/browse/BEAM-8877, the 
> test times out and was recently added to VR test suite.
> [~liumomo315], I will sickbay this test for streaming, could you please help 
> triage the failure?
> Thank you!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8413) test_streaming_pipeline_returns_expected_user_metrics_fnapi_it failed on latest PostCommit Py36

2020-01-07 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010224#comment-17010224
 ] 

Ankur Goenka commented on BEAM-8413:


The latest runs are passing 
[https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/5455/testReport/apache_beam.runners.dataflow.dataflow_exercise_streaming_metrics_pipeline_test/ExerciseStreamingMetricsPipelineTest/]
 
[https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/5455/testReport/apache_beam.runners.dataflow.dataflow_exercise_streaming_metrics_pipeline_test/ExerciseStreamingMetricsPipelineTest/]
 

and older runs were cleaned.

> test_streaming_pipeline_returns_expected_user_metrics_fnapi_it  failed on 
> latest PostCommit Py36 
> -
>
> Key: BEAM-8413
> URL: https://issues.apache.org/jira/browse/BEAM-8413
> Project: Beam
>  Issue Type: New Feature
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Ankur Goenka
>Priority: Major
>
> https://builds.apache.org/job/beam_PostCommit_Python36/731/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8945) DirectStreamObserver race condition

2020-01-07 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-8945:
---
Description: 
The DirectStreamObserver can get into a dead lock if the channel become 
unhealthy of is not ready. An extended period of unhealthyness should result 
into failure.

This is supported by following thread dumps where we see that 1 thread is 
having on getting the lock on actual stream observer while the remaining worker 
threads are waiting on the lock on the stream observer.
 The thread which is having lock on stream observer is probably in the while 
loop because the outboundObserver is not ready.
 Their is also 1 thread which is waiting to execute onError which means that 
the stream observer has become unhealthy and probably never going to get ready.

100s of threads are blocked on:

 
 
org.apache.beam.sdk.fn.stream.SynchronizedStreamObserver.onNext(SynchronizedStreamObserver.java:46)
 
org.apache.beam.runners.fnexecution.control.FnApiControlClient.handle(FnApiControlClient.java:84)
 
org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.getProcessBundleProgress(RegisterAndProcessBundleOperation.java:393)
 
org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor$SingularProcessBundleProgressTracker.updateProgress(BeamFnMapTaskExecutor.java:347)
 
org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor$SingularProcessBundleProgressTracker.periodicProgressUpdate(BeamFnMapTaskExecutor.java:334)
 
org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor$SingularProcessBundleProgressTracker$$Lambda$107/1297335196.run(Unknown
 Source)
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 java.lang.Thread.run(Thread.java:745)

 

 

One thread having the lock:

State: TIMED_WAITING stack: —
 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
 java.util.concurrent.Phaser$QNode.block(Phaser.java:1142)
 java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
 java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067)
 java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:796)
 
org.apache.beam.sdk.fn.stream.DirectStreamObserver.onNext(DirectStreamObserver.java:70)
 
org.apache.beam.sdk.fn.stream.SynchronizedStreamObserver.onNext(SynchronizedStreamObserver.java:46)
 
org.apache.beam.runners.fnexecution.control.FnApiControlClient.handle(FnApiControlClient.java:84)
 
org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.getProcessBundleProgress(RegisterAndProcessBundleOperation.java:393)
 
org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor$SingularProcessBundleProgressTracker.updateProgress(BeamFnMapTaskExecutor.java:347)
 
org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor$SingularProcessBundleProgressTracker.periodicProgressUpdate(BeamFnMapTaskExecutor.java:334)
 
org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor$SingularProcessBundleProgressTracker$$Lambda$107/1297335196.run(Unknown
 Source)
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 java.lang.Thread.run(Thread.java:745)

 

 

One thread waiting to execute onError

State: BLOCKED stack: —
 
org.apache.beam.sdk.fn.stream.SynchronizedStreamObserver.onError(SynchronizedStreamObserver.java:53)
 
org.apache.beam.runners.fnexecution.control.FnApiControlClient.closeAndTerminateOutstandingRequests(FnApiControlClient.java:117)
 
org.apache.beam.runners.fnexecution.control.FnApiControlClient.access$300(FnApiControlClient.java:49)
 
org.apache.beam.runners.fnexecution.control.FnApiControlClient$ResponseStreamObserver.onError(FnApiControlClient.java:174)
 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onCancel(ServerCalls.java:270)
 

[jira] [Updated] (BEAM-6499) Support HDFS for artifact staging with Flink Runner

2020-01-07 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-6499:
---
Issue Type: New Feature  (was: Bug)

> Support HDFS for artifact staging with Flink Runner
> ---
>
> Key: BEAM-6499
> URL: https://issues.apache.org/jira/browse/BEAM-6499
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>
> Based on the PR [https://github.com/apache/beam/pull/5806/]
> We should enable HDFS for artifact staging.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-6499) Support HDFS for artifact staging with Flink Runner

2020-01-07 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010221#comment-17010221
 ] 

Ankur Goenka commented on BEAM-6499:


We have introduced uber jar mode of pipeline submission which eliminates the 
need for having a separate dfs to distribute files.

 

Changing the bug to feature request.

> Support HDFS for artifact staging with Flink Runner
> ---
>
> Key: BEAM-6499
> URL: https://issues.apache.org/jira/browse/BEAM-6499
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>
> Based on the PR [https://github.com/apache/beam/pull/5806/]
> We should enable HDFS for artifact staging.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-5156) Apache Beam on dataflow runner can't find Tensorflow for workers

2020-01-07 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka resolved BEAM-5156.

Fix Version/s: (was: 2.6.0)
   (was: 2.5.0)
   2.17.0
   Resolution: Fixed

> Apache Beam on dataflow runner can't find Tensorflow for workers
> 
>
> Key: BEAM-5156
> URL: https://issues.apache.org/jira/browse/BEAM-5156
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
> Environment: google cloud compute instance running linux
>Reporter: Thomas Johns
>Assignee: Ankur Goenka
>Priority: Major
>  Labels: triaged
> Fix For: 2.17.0
>
>
> Adding serialized tensorflow model to apache beam pipeline with python sdk 
> but it can not find any version of tensorflow when applied to dataflow runner 
> although it is not a problem locally. Tried various versions of tensorflow 
> from 1.6 to 1.10. I thought it might be a conflicting package some where so I 
> removed all other packages and tried to just install tensorflow and same 
> problem.
> Could not find a version that satisfies the requirement tensorflow==1.6.0 
> (from -r reqtest.txt (line 59)) (from versions: )No matching distribution 
> found for tensorflow==1.6.0 (from -r reqtest.txt (line 59))



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-5156) Apache Beam on dataflow runner can't find Tensorflow for workers

2020-01-07 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010220#comment-17010220
 ] 

Ankur Goenka commented on BEAM-5156:


We have moved to an unbounded thread factory and this issue does not require 
any additional arguments to be passed.

> Apache Beam on dataflow runner can't find Tensorflow for workers
> 
>
> Key: BEAM-5156
> URL: https://issues.apache.org/jira/browse/BEAM-5156
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
> Environment: google cloud compute instance running linux
>Reporter: Thomas Johns
>Assignee: Ankur Goenka
>Priority: Major
>  Labels: triaged
> Fix For: 2.5.0, 2.6.0
>
>
> Adding serialized tensorflow model to apache beam pipeline with python sdk 
> but it can not find any version of tensorflow when applied to dataflow runner 
> although it is not a problem locally. Tried various versions of tensorflow 
> from 1.6 to 1.10. I thought it might be a conflicting package some where so I 
> removed all other packages and tried to just install tensorflow and same 
> problem.
> Could not find a version that satisfies the requirement tensorflow==1.6.0 
> (from -r reqtest.txt (line 59)) (from versions: )No matching distribution 
> found for tensorflow==1.6.0 (from -r reqtest.txt (line 59))



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-7243) Release python{36,37}-fnapi containers images required by Dataflow runner.

2020-01-07 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka resolved BEAM-7243.

Fix Version/s: 2.16.0
   Resolution: Fixed

> Release python{36,37}-fnapi containers images required by Dataflow runner. 
> ---
>
> Key: BEAM-7243
> URL: https://issues.apache.org/jira/browse/BEAM-7243
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Valentyn Tymofieiev
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> We have not yet released all container images for currently used dev fnapi 
> containers [1]. 
> For example:
> :~$ docker pull 
> gcr.io/cloud-dataflow/v1beta3/python36-fnapi:beam-master-20190213
> Error response from daemon: manifest for 
> gcr.io/cloud-dataflow/v1beta3/python36-fnapi:beam-master-20190213 not found
> This causes failures in Python streaming postcommit tests on Dataflow runner 
> for Python 3.6 and higher versions. 
> We need to release the containers and update names.py.
> [~angoenka], I think you were planning an update of dev containers that will 
> also take care of this. If you don't plan to do that or need help, please 
> reassign the issue back to me. Thanks!
> [1] 
> https://github.com/apache/beam/blob/79a463784fce36c12292b4e642238ef124c184e0/sdks/python/apache_beam/runners/dataflow/internal/names.py#L44



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-7243) Release python{36,37}-fnapi containers images required by Dataflow runner.

2020-01-07 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010217#comment-17010217
 ] 

Ankur Goenka commented on BEAM-7243:


The PR was committed and the latest container is available for pull 

docker pull gcr.io/cloud-dataflow/v1beta3/python36-fnapi:beam-master-20191220

> Release python{36,37}-fnapi containers images required by Dataflow runner. 
> ---
>
> Key: BEAM-7243
> URL: https://issues.apache.org/jira/browse/BEAM-7243
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Valentyn Tymofieiev
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> We have not yet released all container images for currently used dev fnapi 
> containers [1]. 
> For example:
> :~$ docker pull 
> gcr.io/cloud-dataflow/v1beta3/python36-fnapi:beam-master-20190213
> Error response from daemon: manifest for 
> gcr.io/cloud-dataflow/v1beta3/python36-fnapi:beam-master-20190213 not found
> This causes failures in Python streaming postcommit tests on Dataflow runner 
> for Python 3.6 and higher versions. 
> We need to release the containers and update names.py.
> [~angoenka], I think you were planning an update of dev containers that will 
> also take care of this. If you don't plan to do that or need help, please 
> reassign the issue back to me. Thanks!
> [1] 
> https://github.com/apache/beam/blob/79a463784fce36c12292b4e642238ef124c184e0/sdks/python/apache_beam/runners/dataflow/internal/names.py#L44



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9046) Kafka connector for Python throws ClassCastException when reading KafkaRecord

2020-01-07 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010192#comment-17010192
 ] 

Ankur Goenka commented on BEAM-9046:


cc: [~mxm] [~chamikara]

> Kafka connector for Python throws ClassCastException when reading KafkaRecord
> -
>
> Key: BEAM-9046
> URL: https://issues.apache.org/jira/browse/BEAM-9046
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-kafka
>Affects Versions: 2.16.0
>Reporter: Berkay Öztürk
>Priority: Major
>  Labels: KafkaIO, Python
>
>  I'm trying to read the data streaming from Apache Kafka using the Python SDK 
> for Apache Beam with the Flink runner. After running Kafka 2.4.0 and Flink 
> 1.8.3, I follow these steps:
>  * Compile and run Beam 2.16 with Flink 1.8 runner.
> {code:java}
> git clone --single-branch --branch release-2.16.0 
> https://github.com/apache/beam.git beam-2.16.0
> cd beam-2.16.0
> nohup ./gradlew :runners:flink:1.8:job-server:runShadow 
> -PflinkMasterUrl=localhost:8081 &
> {code}
>  * Run the Python pipeline.
> {code:python}
> from apache_beam import Pipeline
> from apache_beam.io.external.kafka import ReadFromKafka
> from apache_beam.options.pipeline_options import PipelineOptions
> if __name__ == '__main__':
> with Pipeline(options=PipelineOptions([
> '--runner=FlinkRunner',
> '--flink_version=1.8',
> '--flink_master_url=localhost:8081',
> '--environment_type=LOOPBACK',
> '--streaming'
> ])) as pipeline:
> (
> pipeline
> | 'read' >> ReadFromKafka({'bootstrap.servers': 
> 'localhost:9092'}, ['test'])  # [BEAM-3788] ???
> )
> result = pipeline.run()
> result.wait_until_finish()
> {code}
>  * Publish some data to Kafka.
> {code:java}
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
> >{"hello":"world!"}
> {code}
> The Python script throws this error:
> {code:java}
> [flink-runner-job-invoker] ERROR 
> org.apache.beam.runners.fnexecution.jobsubmission.JobInvocation - Error 
> during job invocation BeamApp-USER-somejob. 
> org.apache.flink.client.program.ProgramInvocationException: Job failed. 
> (JobID: xxx)
> at 
> org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:268)
> at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:483)
> at 
> org.apache.beam.runners.flink.FlinkExecutionEnvironments$BeamFlinkRemoteStreamEnvironment.executeRemotely(FlinkExecutionEnvironments.java:360)
> at 
> org.apache.flink.streaming.api.environment.RemoteStreamEnvironment.execute(RemoteStreamEnvironment.java:310)
> at 
> org.apache.beam.runners.flink.FlinkStreamingPortablePipelineTranslator$StreamingTranslationContext.execute(FlinkStreamingPortablePipelineTranslator.java:173)
> at 
> org.apache.beam.runners.flink.FlinkPipelineRunner.runPipelineWithTranslator(FlinkPipelineRunner.java:104)
> at 
> org.apache.beam.runners.flink.FlinkPipelineRunner.run(FlinkPipelineRunner.java:80)
> at 
> org.apache.beam.runners.fnexecution.jobsubmission.JobInvocation.runPipeline(JobInvocation.java:78)
> at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
> at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
> at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed.
> at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146)
> at 
> org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:265)
> ... 13 more
> Caused by: java.lang.ClassCastException: 
> org.apache.beam.sdk.io.kafka.KafkaRecord cannot be cast to [B
> at 
> org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41)
> at 
> org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56)
> at 
> org.apache.beam.sdk.values.ValueWithRecordId$ValueWithRecordIdCoder.encode(ValueWithRecordId.java:105)
> at 
> org.apache.beam.sdk.values.ValueWithRecordId$ValueWithRecordIdCoder.encode(ValueWithRecordId.java:81)

[jira] [Commented] (BEAM-8980) Running GroupByKeyLoadTest on Portable Flink fails

2019-12-17 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998487#comment-16998487
 ] 

Ankur Goenka commented on BEAM-8980:


The stack trace is generally not the root cause as it is printed on grpc 
termination.
Please link the jenkins/gradle link for the job for complete logs.

> Running GroupByKeyLoadTest on Portable Flink fails
> --
>
> Key: BEAM-8980
> URL: https://issues.apache.org/jira/browse/BEAM-8980
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Michal Walenia
>Priority: Major
>
> When running a GBK Load test using Java harness image and JobServer image 
> generated from master, the load test fails with a cryptic exception:
> {code:java}
> Exception in thread "main" java.lang.RuntimeException: Invalid job state: 
> FAILED.
> 11:45:31  at 
> org.apache.beam.sdk.loadtests.JobFailure.handleFailure(JobFailure.java:55)
> 11:45:31  at org.apache.beam.sdk.loadtests.LoadTest.run(LoadTest.java:106)
> 11:45:31  at 
> org.apache.beam.sdk.loadtests.CombineLoadTest.run(CombineLoadTest.java:66)
> 11:45:31  at 
> org.apache.beam.sdk.loadtests.CombineLoadTest.main(CombineLoadTest.java:169)
> {code}
>  
> After some investigation, I found a stacktrace of the error:
> {code:java}
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: 
> CANCELLED: call already 
> cancelledorg.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: 
> CANCELLED: call already cancelled at 
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524)
>  at 
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:339)
>  at 
> org.apache.beam.sdk.fn.stream.DirectStreamObserver.onNext(DirectStreamObserver.java:98)
>  at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.flush(BeamFnDataSizeBasedBufferingOutboundObserver.java:90)
>  at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:102)
>  at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:278)
>  at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:201)
>  at 
> org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
>  at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504) at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369) at 
> org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705) at 
> org.apache.flink.runtime.taskmanager.Task.run(Task.java:530) at 
> java.lang.Thread.run(Thread.java:748) Suppressed: 
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: 
> CANCELLED: call already cancelled at 
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524)
>  at 
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:339)
>  at 
> org.apache.beam.sdk.fn.stream.DirectStreamObserver.onNext(DirectStreamObserver.java:98)
>  at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.close(BeamFnDataSizeBasedBufferingOutboundObserver.java:84)
>  at 
> org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:298)
>  at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.$closeResource(FlinkExecutableStageFunction.java:202)
>  at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:202)
>  ... 6 more Suppressed: java.lang.IllegalStateException: Processing bundle 
> failed, TODO: [BEAM-3962] abort bundle. at 
> org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:320)
>  ... 8 more
> {code}
> It seems that the core issue is an IllegalStateException thrown from 
> SdkHarnessClient.java:320, related to BEAM-3962.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8945) DirectStreamObserver race condition

2019-12-10 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-8945:
--

 Summary: DirectStreamObserver race condition
 Key: BEAM-8945
 URL: https://issues.apache.org/jira/browse/BEAM-8945
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-harness
Affects Versions: 2.16.0
Reporter: Ankur Goenka
Assignee: Ankur Goenka


The DirectStreamObserver can bet into a dead lock if the channel become 
unhealthy of is not ready. An extended period of unhealthyness should result 
into failure.

This is supported by following thread dumps where we see that 1 thread is 
having on getting the lock on actual stream observer while the remaining worker 
threads are waiting on the lock on the stream observer.
 The thread which is having lock on stream observer is probably in the while 
loop because the outboundObserver is not ready.
 Their is also 1 thread which is waiting to execute onError which means that 
the stream observer has become unhealthy and probably never going to get ready.

100s of threads are blocked on:

 
 
org.apache.beam.sdk.fn.stream.SynchronizedStreamObserver.onNext(SynchronizedStreamObserver.java:46)
 
org.apache.beam.runners.fnexecution.control.FnApiControlClient.handle(FnApiControlClient.java:84)
 
org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.getProcessBundleProgress(RegisterAndProcessBundleOperation.java:393)
 
org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor$SingularProcessBundleProgressTracker.updateProgress(BeamFnMapTaskExecutor.java:347)
 
org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor$SingularProcessBundleProgressTracker.periodicProgressUpdate(BeamFnMapTaskExecutor.java:334)
 
org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor$SingularProcessBundleProgressTracker$$Lambda$107/1297335196.run(Unknown
 Source)
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 java.lang.Thread.run(Thread.java:745)

 

 

One thread having the lock:

State: TIMED_WAITING stack: ---
 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
 java.util.concurrent.Phaser$QNode.block(Phaser.java:1142)
 java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
 java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067)
 java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:796)
 
org.apache.beam.sdk.fn.stream.DirectStreamObserver.onNext(DirectStreamObserver.java:70)
 
org.apache.beam.sdk.fn.stream.SynchronizedStreamObserver.onNext(SynchronizedStreamObserver.java:46)
 
org.apache.beam.runners.fnexecution.control.FnApiControlClient.handle(FnApiControlClient.java:84)
 
org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.getProcessBundleProgress(RegisterAndProcessBundleOperation.java:393)
 
org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor$SingularProcessBundleProgressTracker.updateProgress(BeamFnMapTaskExecutor.java:347)
 
org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor$SingularProcessBundleProgressTracker.periodicProgressUpdate(BeamFnMapTaskExecutor.java:334)
 
org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor$SingularProcessBundleProgressTracker$$Lambda$107/1297335196.run(Unknown
 Source)
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 java.lang.Thread.run(Thread.java:745)

 

 

One thread waiting to execute onError

State: BLOCKED stack: ---
 
org.apache.beam.sdk.fn.stream.SynchronizedStreamObserver.onError(SynchronizedStreamObserver.java:53)
 
org.apache.beam.runners.fnexecution.control.FnApiControlClient.closeAndTerminateOutstandingRequests(FnApiControlClient.java:117)
 
org.apache.beam.runners.fnexecution.control.FnApiControlClient.access$300(FnApiControlClient.java:49)
 

[jira] [Comment Edited] (BEAM-6499) Support HDFS for artifact staging with Flink Runner

2019-12-06 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990239#comment-16990239
 ] 

Ankur Goenka edited comment on BEAM-6499 at 12/7/19 12:09 AM:
--

We were not able to prioritize it yet and don't have a ETA yet.
Any help would be appreciated. 

Let me know if I can help you get started :) 


was (Author: angoenka):
We were not able to prioritize it yet and don't have a ETA yet.

> Support HDFS for artifact staging with Flink Runner
> ---
>
> Key: BEAM-6499
> URL: https://issues.apache.org/jira/browse/BEAM-6499
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>
> Based on the PR [https://github.com/apache/beam/pull/5806/]
> We should enable HDFS for artifact staging.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-6499) Support HDFS for artifact staging with Flink Runner

2019-12-06 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990239#comment-16990239
 ] 

Ankur Goenka commented on BEAM-6499:


We were not able to prioritize it yet and don't have a ETA yet.

> Support HDFS for artifact staging with Flink Runner
> ---
>
> Key: BEAM-6499
> URL: https://issues.apache.org/jira/browse/BEAM-6499
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>
> Based on the PR [https://github.com/apache/beam/pull/5806/]
> We should enable HDFS for artifact staging.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8479) Python pattern for side input

2019-10-24 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-8479:
--

 Summary: Python pattern for side input
 Key: BEAM-8479
 URL: https://issues.apache.org/jira/browse/BEAM-8479
 Project: Beam
  Issue Type: Bug
  Components: website
Reporter: Ankur Goenka


[https://beam.apache.org/documentation/patterns/side-inputs/] has pattern for 
java pipeline. 

 

Add example for python pipeline



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8477) Prioritize precommit over post commit jobs

2019-10-24 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-8477:
--

 Summary: Prioritize precommit over post commit jobs
 Key: BEAM-8477
 URL: https://issues.apache.org/jira/browse/BEAM-8477
 Project: Beam
  Issue Type: Bug
  Components: testing
Reporter: Ankur Goenka


Precommit give an immediate signal to the health of the PR and are more time 
sensitive as it gives feedback to the contributor.

 

It will be good if we can prioritize precommit or manually triggered jenkins 
job over automated post commit jobs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8476) Flag to skip triggering precommit jobs

2019-10-24 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-8476:
--

 Summary: Flag to skip triggering precommit jobs
 Key: BEAM-8476
 URL: https://issues.apache.org/jira/browse/BEAM-8476
 Project: Beam
  Issue Type: Bug
  Components: testing
Reporter: Ankur Goenka


Accidental/incomplete PR updates trigger a pre commit jobs on jenkins consuming 
resources.

We can introduce some mechanism to not trigger jenkins job in certain cases.

Example: WIP in PR title can skip jenkins precommit jobs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8475) Kill old jenkins job for a PR when PR is updated

2019-10-24 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-8475:
--

 Summary: Kill old jenkins job for a PR when PR is updated
 Key: BEAM-8475
 URL: https://issues.apache.org/jira/browse/BEAM-8475
 Project: Beam
  Issue Type: Bug
  Components: testing
Reporter: Ankur Goenka


Once the a PR is update, old jenkins job becomes obsolete so their is no point 
in running them to completion. 

It will save some resources to kill the old jenkins job once PR is updated



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8183) Optionally bundle multiple pipelines into a single Flink jar

2019-10-02 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943201#comment-16943201
 ] 

Ankur Goenka commented on BEAM-8183:


I agree, changing a bit of configuration in the proto will serve a lot of use 
cases. A few can be the input/output data file etc.
{quote} 
 You are correct that the Python entry point / driver program would need to be 
(re)executed for a fully generic solution. But that's not necessary for the 
majority of use cases. Those are artifact + configuration. If there is a way to 
parameterize configuration values in the proto, we can address that majority of 
use cases with a single job jar artifact.
{quote}
Will 
[value_provider|[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/options/value_provider.py]]
 help in this case? Dataflow templates use this.

Also, we can enhance the driver class to swap the actual option values in the 
options proto to parameters provided at the submission time.
{quote} 
 But beyond that we also have (in our infrastructure) the use case of multiple 
entry points that the user can pick at submit time.
  
{quote}
 
 Thats a valid usecase. I can't imagine a good way to model it in beam as all 
the beam notions are build considering a single pipeline at a time. Will a 
shell script capable of merging merging the jars for different pipeline.

I think a pipeline docker can resolve a lot of these issues as it will be 
capable of running the submission code in a consistent manner based on the 
arguments provided.

> Optionally bundle multiple pipelines into a single Flink jar
> 
>
> Key: BEAM-8183
> URL: https://issues.apache.org/jira/browse/BEAM-8183
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>
> [https://github.com/apache/beam/pull/9331#issuecomment-526734851]
> "With Flink you can bundle multiple entry points into the same jar file and 
> specify which one to use with optional flags. It may be desirable to allow 
> inclusion of multiple pipelines for this tool also, although that would 
> require a different workflow. Absent this option, it becomes quite convoluted 
> for users that need the flexibility to choose which pipeline to launch at 
> submission time."



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8183) Optionally bundle multiple pipelines into a single Flink jar

2019-10-02 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943161#comment-16943161
 ] 

Ankur Goenka commented on BEAM-8183:


I see. Thanks for explaining the use case.

I think hardcoded pipeline options are definitely a limitation as of now. We 
can see how we can use Beam's ValueProvider to give dynamic arguments. We can 
also think of overwriting the pipeline options when submitting the jar to fink.
{quote}Running the same pipeline in different environments with different 
parameters is a common need. Virtually everyone has dev/staging/prod or 
whatever their environments are and they want to use the same build artifact. 
That normally requires some amount of parameterization.
{quote}
I don't really have a good solution for dev/staging/prod use case. This is not 
going to be solved by jar with multiple pipelines (as each pipeline will have a 
static set of pipeline options) but by a jar creating dynamic pipelines (as the 
pipeline changes based on the pipeline options and environment).

The major issue to me seems to be that we need to execute pipeline construction 
code which is environment dependent. To generate new pipelines for an 
environment, we need to execute the pipeline submission code in that 
environment. And this is where I see a problem. Python pipelines have to 
execute user code in python using python sdk to construct the pipeline.

Considering this jar as the artifact would not be idle for different 
environment as the actual sdk/lib etc can differ between environments. From 
environment point of view, a docker container capable of submitting the 
pipeline should be an artifact as it has all the dependencies bundled in it and 
is capable of executing code with consistent dependencies. And if we don't want 
consistent dependency across environment, then pipeline code should be 
considered as an artifact as it can work with different dependencies.

 

For context, In dataflow we pack multiple pipeline in a single jar for java and 
for python we generate separate par for each pipeline (We do publish them as a 
single mpm). Further, this does not materialize the pipeline but create an 
executable which is later used in an environment having the right sdk 
installed. The submission process just runs the "python test_pipeline.par 
--runner=DataflowRunner --apiary=testapiary" which goes though dataflow job 
submission api and is submitted as a regular dataflow job.

This is similar to docker model just that instead of docker we use par file and 
execute it using python/java.
{quote}The other use case is bundling multiple pipelines into the same 
container and select which to run at launch time.
{quote}
This will save some space at the time of deployment. Specifically the jobserver 
jar and pipeline staged artifacts if they are shared. We don't really 
introspect the staged artifacts so we don't know what can be shared and what 
can't across pipelines. I think a better approach would be to just write a 
separate script to merge multiple pipeline jars (jar with single pipeline) and 
replace main class to consider the name of the pipeline to pick the right 
proto. The script can be infrastructure infrastructure aware and can make the 
appropriate lib changes. Beam does not have a notion of multiple pipelines in 
any sense so it will be interesting to see how we model this if we decide to 
introduce it in beam.

Note: As the pipelines are materialized, they will still not work across 
environments.

 

Please let me know if you have any ideas for solution this.

 

> Optionally bundle multiple pipelines into a single Flink jar
> 
>
> Key: BEAM-8183
> URL: https://issues.apache.org/jira/browse/BEAM-8183
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>
> [https://github.com/apache/beam/pull/9331#issuecomment-526734851]
> "With Flink you can bundle multiple entry points into the same jar file and 
> specify which one to use with optional flags. It may be desirable to allow 
> inclusion of multiple pipelines for this tool also, although that would 
> require a different workflow. Absent this option, it becomes quite convoluted 
> for users that need the flexibility to choose which pipeline to launch at 
> submission time."



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8183) Optionally bundle multiple pipelines into a single Flink jar

2019-09-30 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941330#comment-16941330
 ] 

Ankur Goenka commented on BEAM-8183:


Flink has some neat feature of picking the pipeline on the fly. 

I don't think this is a very common usecase with Beam though.

Given Beam Job Submission api work on a single pipeline at a time it will be 
very convoluted work flow to introduce multiple pipelines in a single jar.

 

Will it be easier to just store pipelines as separate jar in global storage 
(hdfs etc) and pass the right jar at the time of pipeline submission?

In case the submission happen through a service then will it be easier and less 
error prone to just keep these jars separately on the service and submit the 
right jar to flink based on the parameter?

> Optionally bundle multiple pipelines into a single Flink jar
> 
>
> Key: BEAM-8183
> URL: https://issues.apache.org/jira/browse/BEAM-8183
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>
> [https://github.com/apache/beam/pull/9331#issuecomment-526734851]
> "With Flink you can bundle multiple entry points into the same jar file and 
> specify which one to use with optional flags. It may be desirable to allow 
> inclusion of multiple pipelines for this tool also, although that would 
> require a different workflow. Absent this option, it becomes quite convoluted 
> for users that need the flexibility to choose which pipeline to launch at 
> submission time."



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8314) Beam Fn Api metrics piling causes pipeline to stuck after running for a while

2019-09-27 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka reassigned BEAM-8314:
--

Assignee: Ankur Goenka

> Beam Fn Api metrics piling causes pipeline to stuck after running for a while
> -
>
> Key: BEAM-8314
> URL: https://issues.apache.org/jira/browse/BEAM-8314
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Yichi Zhang
>Assignee: Ankur Goenka
>Priority: Blocker
> Fix For: 2.16.0
>
> Attachments: E4UaSUhJJKF.png
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Seems that in StreamingDataflowWorker we are not able to update the metrics 
> fast enough to dataflow service, the piling metrics causes memory usage to 
> increase and eventually leads to excessive memory thrashing/GC. And it will 
> almost stop the pipeline from processing new items.
>  
>  !E4UaSUhJJKF.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8314) Beam Fn Api metrics piling causes pipeline to stuck after running for a while

2019-09-27 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka reassigned BEAM-8314:
--

Assignee: (was: Ankur Goenka)

> Beam Fn Api metrics piling causes pipeline to stuck after running for a while
> -
>
> Key: BEAM-8314
> URL: https://issues.apache.org/jira/browse/BEAM-8314
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Yichi Zhang
>Priority: Blocker
> Fix For: 2.16.0
>
> Attachments: E4UaSUhJJKF.png
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Seems that in StreamingDataflowWorker we are not able to update the metrics 
> fast enough to dataflow service, the piling metrics causes memory usage to 
> increase and eventually leads to excessive memory thrashing/GC. And it will 
> almost stop the pipeline from processing new items.
>  
>  !E4UaSUhJJKF.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8314) Beam Fn Api metrics piling causes pipeline to stuck after running for a while

2019-09-27 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939820#comment-16939820
 ] 

Ankur Goenka commented on BEAM-8314:


Cherry pick is out [https://github.com/apache/beam/pull/9682]

> Beam Fn Api metrics piling causes pipeline to stuck after running for a while
> -
>
> Key: BEAM-8314
> URL: https://issues.apache.org/jira/browse/BEAM-8314
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Yichi Zhang
>Priority: Blocker
> Fix For: 2.16.0
>
> Attachments: E4UaSUhJJKF.png
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Seems that in StreamingDataflowWorker we are not able to update the metrics 
> fast enough to dataflow service, the piling metrics causes memory usage to 
> increase and eventually leads to excessive memory thrashing/GC. And it will 
> almost stop the pipeline from processing new items.
>  
>  !E4UaSUhJJKF.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8314) Beam Fn Api metrics piling causes pipeline to stuck after running for a while

2019-09-27 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939758#comment-16939758
 ] 

Ankur Goenka commented on BEAM-8314:


We are refining the PR to make make it more readable and smaller.
Also we are adding tests for the change which will give us confidence.

We tried running the pipeline overnight with this fix and found it working in 
expected manner.

However, CounterUpdate structure is very generic and not very well defined 
which makes it harder to be 100% confident.

Given the size and number of work items in streaming, we will anyways need to 
do this fix to support counters.



> Beam Fn Api metrics piling causes pipeline to stuck after running for a while
> -
>
> Key: BEAM-8314
> URL: https://issues.apache.org/jira/browse/BEAM-8314
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Yichi Zhang
>Priority: Blocker
> Fix For: 2.16.0
>
> Attachments: E4UaSUhJJKF.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Seems that in StreamingDataflowWorker we are not able to update the metrics 
> fast enough to dataflow service, the piling metrics causes memory usage to 
> increase and eventually leads to excessive memory thrashing/GC. And it will 
> almost stop the pipeline from processing new items.
>  
>  !E4UaSUhJJKF.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


  1   2   3   4   >