[jira] [Created] (BEAM-5703) Migrate Python streaming and portable integration tests to use a staged dataflow worker jar

2018-10-10 Thread Henning Rohde (JIRA)
Henning Rohde created BEAM-5703:
---

 Summary: Migrate Python streaming and portable integration tests 
to use a staged dataflow worker jar
 Key: BEAM-5703
 URL: https://issues.apache.org/jira/browse/BEAM-5703
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-py-core
Reporter: Henning Rohde
Assignee: Ruoyun Huang






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-5699) Migrate Go integration test to use a staged dataflow worker jar

2018-10-10 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde resolved BEAM-5699.
-
   Resolution: Fixed
Fix Version/s: 2.9.0

> Migrate Go integration test to use a staged dataflow worker jar
> ---
>
> Key: BEAM-5699
> URL: https://issues.apache.org/jira/browse/BEAM-5699
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Major
> Fix For: 2.9.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5699) Migrate Go integration test to use a staged dataflow worker jar

2018-10-09 Thread Henning Rohde (JIRA)
Henning Rohde created BEAM-5699:
---

 Summary: Migrate Go integration test to use a staged dataflow 
worker jar
 Key: BEAM-5699
 URL: https://issues.apache.org/jira/browse/BEAM-5699
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-go
Reporter: Henning Rohde
Assignee: Henning Rohde






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5698) Migrate Dataflow tests to use a staged dataflow worker jar

2018-10-09 Thread Henning Rohde (JIRA)
Henning Rohde created BEAM-5698:
---

 Summary: Migrate Dataflow tests to use a staged dataflow worker jar
 Key: BEAM-5698
 URL: https://issues.apache.org/jira/browse/BEAM-5698
 Project: Beam
  Issue Type: Task
  Components: runner-dataflow
Reporter: Henning Rohde


Needs to be done for all Dataflow testing at HEAD for all SDKs, except legacy 
Python batch. For java legacy jobs, we should not specify a worker harness 
container image.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-09 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-5637:
---

Assignee: Ruoyun Huang

> Python support for custom dataflow worker jar
> -
>
> Key: BEAM-5637
> URL: https://issues.apache.org/jira/browse/BEAM-5637
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Henning Rohde
>Assignee: Ruoyun Huang
>Priority: Major
>
> One of the slightly subtle aspects is that we would need to ignore one of the 
> staged jars for portable Python jobs. That requires a change to the Python 
> boot code: 
> https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5686) Remove DataflowRunnerHarness shim again

2018-10-09 Thread Henning Rohde (JIRA)
Henning Rohde created BEAM-5686:
---

 Summary: Remove DataflowRunnerHarness shim again
 Key: BEAM-5686
 URL: https://issues.apache.org/jira/browse/BEAM-5686
 Project: Beam
  Issue Type: Task
  Components: runner-dataflow
Reporter: Henning Rohde
Assignee: Henning Rohde






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5689) Remove artifact naming constraint for portable Dataflow job

2018-10-09 Thread Henning Rohde (JIRA)
Henning Rohde created BEAM-5689:
---

 Summary: Remove artifact naming constraint for portable Dataflow 
job
 Key: BEAM-5689
 URL: https://issues.apache.org/jira/browse/BEAM-5689
 Project: Beam
  Issue Type: Task
  Components: runner-dataflow
Reporter: Henning Rohde
Assignee: Henning Rohde


Artifact names/keys are not preserved in Dataflow. Remove the below workarounds 
when they are.

 * Go Dataflow runner
 * Java and Python container boot code (probably)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-03 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde updated BEAM-5637:

Description: One of the slightly subtle aspects is that we would need to 
ignore one of the staged jars for portable Python jobs. That requires a change 
to the Python boot code: 
https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104

> Python support for custom dataflow worker jar
> -
>
> Key: BEAM-5637
> URL: https://issues.apache.org/jira/browse/BEAM-5637
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Henning Rohde
>Priority: Major
>
> One of the slightly subtle aspects is that we would need to ignore one of the 
> staged jars for portable Python jobs. That requires a change to the Python 
> boot code: 
> https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5636) Java support for custom dataflow worker jar

2018-10-03 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde updated BEAM-5636:

Description: One of the slightly subtle aspects is that we would need to 
ignore one of the staged jars for portable Java jobs. That requires a change to 
the Java boot code: 
https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/java/container/boot.go#L107

> Java support for custom dataflow worker jar
> ---
>
> Key: BEAM-5636
> URL: https://issues.apache.org/jira/browse/BEAM-5636
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Henning Rohde
>Assignee: Boyuan Zhang
>Priority: Major
>
> One of the slightly subtle aspects is that we would need to ignore one of the 
> staged jars for portable Java jobs. That requires a change to the Java boot 
> code: 
> https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/java/container/boot.go#L107



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-03 Thread Henning Rohde (JIRA)
Henning Rohde created BEAM-5637:
---

 Summary: Python support for custom dataflow worker jar
 Key: BEAM-5637
 URL: https://issues.apache.org/jira/browse/BEAM-5637
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-py-core
Reporter: Henning Rohde






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5326) SDK support for custom dataflow worker jar

2018-10-03 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde updated BEAM-5326:

Summary: SDK support for custom dataflow worker jar  (was: Support for 
custom dataflow worker jar)

> SDK support for custom dataflow worker jar
> --
>
> Key: BEAM-5326
> URL: https://issues.apache.org/jira/browse/BEAM-5326
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Henning Rohde
>Assignee: Boyuan Zhang
>Priority: Major
>
> Doc: 
> https://docs.google.com/document/d/1-m-GzkYWIODKOEl1ZSUNXYbcGRvRr3QkasfHsJxbuoA/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5636) Java support for custom dataflow worker jar

2018-10-03 Thread Henning Rohde (JIRA)
Henning Rohde created BEAM-5636:
---

 Summary: Java support for custom dataflow worker jar
 Key: BEAM-5636
 URL: https://issues.apache.org/jira/browse/BEAM-5636
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-java-core
Reporter: Henning Rohde
Assignee: Boyuan Zhang






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5634) Bring Dataflow Java Worker Code into Beam

2018-10-03 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde updated BEAM-5634:

Description: 
Discussion: 
https://lists.apache.org/thread.html/89efd3bc1d30f3d43d4b361a5ee05bd52778c9dc3f43ac72354c2bd9@%3Cdev.beam.apache.org%3E
Vote: 
https://lists.apache.org/thread.html/2bdc645659e2fbd7e29f3a2758941faefedb01148a2a11558dfe60f8@%3Cdev.beam.apache.org%3E


> Bring Dataflow Java Worker Code into Beam
> -
>
> Key: BEAM-5634
> URL: https://issues.apache.org/jira/browse/BEAM-5634
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>
> Discussion: 
> https://lists.apache.org/thread.html/89efd3bc1d30f3d43d4b361a5ee05bd52778c9dc3f43ac72354c2bd9@%3Cdev.beam.apache.org%3E
> Vote: 
> https://lists.apache.org/thread.html/2bdc645659e2fbd7e29f3a2758941faefedb01148a2a11558dfe60f8@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-2880) Artifact server proxies

2018-09-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde resolved BEAM-2880.
-
   Resolution: Won't Fix
Fix Version/s: Not applicable

> Artifact server proxies
> ---
>
> Key: BEAM-2880
> URL: https://issues.apache.org/jira/browse/BEAM-2880
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Minor
>  Labels: portability
> Fix For: Not applicable
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> As per https://s.apache.org/beam-fn-api-container-contract, we should add 
> runner-agnostic (dockerized) artifact proxies for various runner 
> environments. These proxies implement the server side of the artifact API.
> We'll likely need proxies for GCS, S3, Azure Storage, and local. If the 
> proxies are implemented in a statically-linked language like Go, say, then 
> they do not have a dependency footprint and can easily be used as native 
> binaries or in dockerized form -- at the discretion of the runner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5440) Add option to mount a directory inside SDK harness containers

2018-09-21 Thread Henning Rohde (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624222#comment-16624222
 ] 

Henning Rohde commented on BEAM-5440:
-

This is something that I think makes sense and is userful for local development 
only. For non-local, I agree with [~thw] that process is likely a better option 
or something runner or deployment specific, such as fixed mappings 
(/opt/lyft/catphotos is just always mapped in).

> Add option to mount a directory inside SDK harness containers
> -
>
> Key: BEAM-5440
> URL: https://issues.apache.org/jira/browse/BEAM-5440
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution
>Reporter: Maximilian Michels
>Priority: Major
>  Labels: portability, portability-flink
> Fix For: 2.8.0
>
>
> While experimenting with the Python SDK locally, I found it inconvenient that 
> I can't mount a host directory to the Docker containers, i.e. the input must 
> already be in the container and the results of a Write remain inside the 
> container. For local testing, users may want to mount a host directory.
> Since BEAM-5288 the {{Environment}} carries explicit environment information, 
> we could a) add volume args to the {{DockerPayload}}, or b) provide a general 
> Docker arguments field.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5447) Support running user pipelines with the Universal Local Runner in Go.

2018-09-21 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde updated BEAM-5447:

Component/s: sdk-go

> Support running user pipelines with the Universal Local Runner in Go.
> -
>
> Key: BEAM-5447
> URL: https://issues.apache.org/jira/browse/BEAM-5447
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-direct, sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
>
> In order to aid testing, devs should be able to write pipelines and then 
> easily run them with the ULR. This should be generally trivial with Go as the 
> existing local runner for Go uses a build rule that should be nearly 
> identical to the build rule needed to complete this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-2901) Containerize ULR

2018-09-21 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-2901:
---

Assignee: Daniel Oliveira

> Containerize ULR
> 
>
> Key: BEAM-2901
> URL: https://issues.apache.org/jira/browse/BEAM-2901
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Henning Rohde
>Assignee: Daniel Oliveira
>Priority: Minor
>  Labels: portability
>
> We should containerize ULR as a convenience options for users who have docker 
> installed, but perhaps not java.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-2900) ULR: configurable container/process management

2018-09-21 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde resolved BEAM-2900.
-
   Resolution: Won't Fix
Fix Version/s: Not applicable

Obsolete now that we're adding direct process support.

> ULR: configurable container/process management
> --
>
> Key: BEAM-2900
> URL: https://issues.apache.org/jira/browse/BEAM-2900
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Henning Rohde
>Priority: Minor
>  Labels: portability
> Fix For: Not applicable
>
>
> The ULR should support configurable container/process management as per 
> https://s.apache.org/beam-fn-api-container-contract. It would be convenient 
> if containers are optional for testing purposes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-2884) Dataflow runs portable pipelines

2018-09-21 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde resolved BEAM-2884.
-
   Resolution: Fixed
Fix Version/s: 2.8.0

All SDKs have a Dataflow-specific job-submission code to work around the lack 
of job api support, but it otherwise runs portable pipelines. Closing.

> Dataflow runs portable pipelines
> 
>
> Key: BEAM-2884
> URL: https://issues.apache.org/jira/browse/BEAM-2884
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Henning Rohde
>Priority: Major
>  Labels: portability
> Fix For: 2.8.0
>
>
> Dataflow should run pipelines using the full portability API as currently 
> defined:
> https://s.apache.org/beam-fn-api 
> https://s.apache.org/beam-runner-api
> https://s.apache.org/beam-job-api
> https://s.apache.org/beam-fn-api-container-contract
> This issue tracks its adoption of the portability framework. New Fn API and 
> other features will be tracked separately.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-3514) Use portable WindowIntoPayload in DataflowRunner

2018-09-21 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde updated BEAM-3514:

Issue Type: Task  (was: Sub-task)
Parent: (was: BEAM-2884)

> Use portable WindowIntoPayload in DataflowRunner
> 
>
> Key: BEAM-3514
> URL: https://issues.apache.org/jira/browse/BEAM-3514
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Kenneth Knowles
>Assignee: Henning Rohde
>Priority: Major
>  Labels: portability
>
> The Java-specific blobs transmitted to Dataflow need more context, in the 
> form of portability framework protos.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-2769) Java SDK support for submitting a Portable Pipeline

2018-09-21 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-2769:
---

Assignee: Ankur Goenka

> Java SDK support for submitting a Portable Pipeline
> ---
>
> Key: BEAM-2769
> URL: https://issues.apache.org/jira/browse/BEAM-2769
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Assignee: Ankur Goenka
>Priority: Major
>  Labels: portability
>
> The Java codebase should provide a way to submit a Job to a Job Service.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-2769) Java SDK support for submitting a Portable Pipeline

2018-09-21 Thread Henning Rohde (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624181#comment-16624181
 ] 

Henning Rohde commented on BEAM-2769:
-

[~angoenka] Isn't this done a long time ago?

> Java SDK support for submitting a Portable Pipeline
> ---
>
> Key: BEAM-2769
> URL: https://issues.apache.org/jira/browse/BEAM-2769
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Assignee: Ankur Goenka
>Priority: Major
>  Labels: portability
>
> The Java codebase should provide a way to submit a Job to a Job Service.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-2880) Artifact server proxies

2018-09-21 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-2880:
---

Assignee: Henning Rohde

> Artifact server proxies
> ---
>
> Key: BEAM-2880
> URL: https://issues.apache.org/jira/browse/BEAM-2880
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Minor
>  Labels: portability
>
> As per https://s.apache.org/beam-fn-api-container-contract, we should add 
> runner-agnostic (dockerized) artifact proxies for various runner 
> environments. These proxies implement the server side of the artifact API.
> We'll likely need proxies for GCS, S3, Azure Storage, and local. If the 
> proxies are implemented in a statically-linked language like Go, say, then 
> they do not have a dependency footprint and can easily be used as native 
> binaries or in dockerized form -- at the discretion of the runner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-2882) S3 artifact server proxy

2018-09-21 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde resolved BEAM-2882.
-
   Resolution: Won't Do
Fix Version/s: Not applicable

> S3 artifact server proxy
> 
>
> Key: BEAM-2882
> URL: https://issues.apache.org/jira/browse/BEAM-2882
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-core
>Reporter: Henning Rohde
>Priority: Minor
>  Labels: portability
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-2880) Artifact server proxies

2018-09-21 Thread Henning Rohde (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623884#comment-16623884
 ] 

Henning Rohde commented on BEAM-2880:
-

Based on the portable Flink runner work, it seems these proxies are not the 
direction that we want to go. I'll close the this issue and remove the (unused) 
sketch code. We can always revive that path later.

> Artifact server proxies
> ---
>
> Key: BEAM-2880
> URL: https://issues.apache.org/jira/browse/BEAM-2880
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Priority: Minor
>  Labels: portability
>
> As per https://s.apache.org/beam-fn-api-container-contract, we should add 
> runner-agnostic (dockerized) artifact proxies for various runner 
> environments. These proxies implement the server side of the artifact API.
> We'll likely need proxies for GCS, S3, Azure Storage, and local. If the 
> proxies are implemented in a statically-linked language like Go, say, then 
> they do not have a dependency footprint and can easily be used as native 
> binaries or in dockerized form -- at the discretion of the runner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-2863) Add support for Side Inputs over the Fn API

2018-09-21 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde resolved BEAM-2863.
-
   Resolution: Fixed
Fix Version/s: 2.8.0

> Add support for Side Inputs over the Fn API
> ---
>
> Key: BEAM-2863
> URL: https://issues.apache.org/jira/browse/BEAM-2863
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
> Fix For: 2.8.0
>
>
> See:
> * https://s.apache.org/beam-side-inputs-1-pager
> * http://s.apache.org/beam-fn-api-state-api



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-3419) Enable iterable side input for beam runners.

2018-09-21 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde updated BEAM-3419:

Issue Type: Improvement  (was: Sub-task)
Parent: (was: BEAM-2863)

> Enable iterable side input for beam runners.
> 
>
> Key: BEAM-3419
> URL: https://issues.apache.org/jira/browse/BEAM-3419
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Robert Bradshaw
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-2928) ULR support for portable side input

2018-09-21 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde updated BEAM-2928:

Issue Type: Improvement  (was: Sub-task)
Parent: (was: BEAM-2863)

> ULR support for portable side input
> ---
>
> Key: BEAM-2928
> URL: https://issues.apache.org/jira/browse/BEAM-2928
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core, runner-direct
>Reporter: Henning Rohde
>Assignee: Daniel Oliveira
>Priority: Major
>  Labels: portability
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-3080) Improve the side input materialization for the DirectRunner/ULR from iterable to storing the multimap directly

2018-09-21 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde updated BEAM-3080:

Issue Type: Improvement  (was: Sub-task)
Parent: (was: BEAM-2863)

> Improve the side input materialization for the DirectRunner/ULR from iterable 
> to storing the multimap directly
> --
>
> Key: BEAM-3080
> URL: https://issues.apache.org/jira/browse/BEAM-3080
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Luke Cwik
>Priority: Minor
>  Labels: portability
>
> https://github.com/apache/beam/pull/4011 migrated to using a multimap as the 
> materialization format for side inputs.
> The migration used a trivial multimap -> iterable -> multimap conversion 
> within the DirectRunner for first pass implementation purposes. Note that 
> this is no different then the current materialization from a performance 
> perspective it just moves this logic within the purview of the runner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-2928) ULR support for portable side input

2018-09-21 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-2928:
---

Assignee: Daniel Oliveira  (was: Luke Cwik)

> ULR support for portable side input
> ---
>
> Key: BEAM-2928
> URL: https://issues.apache.org/jira/browse/BEAM-2928
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-core, runner-direct
>Reporter: Henning Rohde
>Assignee: Daniel Oliveira
>Priority: Major
>  Labels: portability
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-3286) Go SDK support for portable side input

2018-09-19 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde resolved BEAM-3286.
-
   Resolution: Fixed
Fix Version/s: 2.8.0

> Go SDK support for portable side input
> --
>
> Key: BEAM-3286
> URL: https://issues.apache.org/jira/browse/BEAM-3286
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Major
>  Labels: portability
> Fix For: 2.8.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-2947) SDK harnesses should be able to indicate panic in FnAPI

2018-09-18 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-2947:
---

Assignee: (was: Henning Rohde)

> SDK harnesses should be able to indicate panic in FnAPI
> ---
>
> Key: BEAM-2947
> URL: https://issues.apache.org/jira/browse/BEAM-2947
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Priority: Minor
>  Labels: portability
>
> If the SDK harness encounters a permanent problem (local file corruption or 
> bad staged files, say) it should be able to rely that information to the 
> runner as a "panic" message, say. Such conditions may also happen during the 
> boot stage.
> Using process/container exit codes would likely not work well, because that 
> information may not flow easily to the runner logic depending on how 
> containers are managed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5350) Running autocomplete.go on dataflow fails

2018-09-18 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-5350:
---

Assignee: (was: Henning Rohde)

> Running autocomplete.go on dataflow fails
> -
>
> Key: BEAM-5350
> URL: https://issues.apache.org/jira/browse/BEAM-5350
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Affects Versions: Not applicable
>Reporter: Tomas Roos
>Priority: Major
>
> I'm in the process as a external developer make sure that all examples are 
> runnable on both direct and the dataflow runner as its crucial for people 
> onboarding this project.
>  
> I've visted the projects before and some are runnable, some probably where 
> previously, and some are def not runnable.
>  
> So I started top down today, in order to make autocomplete.go run on dataflow 
> as well as the direct runner i changed the input in order to make it platform 
> independent instead of pointing to a local file.The reading of the source 
> from the public cloud storage went fine but it fails to run the top.Largest 
> anonymous less function (ran on id: go-job-1-1536575613531078735) failed with
>  
>  
> {code:java}
> RESP: instruction_id: "-205" error: "Invalid bundle desc: decode: bad userfn: 
> bad struct encoding: failed to decode data: decode: failed to find symbol 
> main.main.func1: main.main.func1 not found. Use runtime.RegisterFunction in 
> unit tests" register: < >
>  
> {code}
>  
> [https://github.com/apache/beam/blob/master/sdks/go/examples/complete/autocomplete/autocomplete.go#L63]
>  
> So in order to fix this I introduced the local func called lessFn and 
> registered in the init process. This though now instead when running
>  
> {code:java}
>  
> go run autocomplete.go --project fair-app-213019 --runner dataflow 
> --staging_location=gs://fair-app-213019/staging-test2 
> --worker_harness_container_image=apache-docker-beam-snapshots-docker.bintray.io/beam/go:20180515
> {code}
>  
>  
> fails with
>  
> {code:java}
> 2018/09/10 13:37:10 Running autocomplete
>  Unable to encode combiner for lifting: failed to encode custom coder: bad 
> underlying type: bad field type: bad element: unencodable type: interface 
> {}2018/09/10 13:37:10 Using running binary as worker binary: 
> '/tmp/go-build157286122/b001/exe/autocomplete'
>  2018/09/10 13:37:10 Staging worker binary: 
> /tmp/go-build157286122/b001/exe/autocomplete{code}
>  
> And I know this is when invoking the top.Largest since I've removed the piece 
> of code and then the job runs fine, could you please point me in the right 
> direction why my local func is not encoable as a interface {} and I will of 
> course happily send a PR when this is working on direct and the dataflow 
> direct so I can move on to the other examples
>  
> (All changes can be seen here) 
> [https://github.com/apache/beam/compare/master...ptomasroos:make-autocomplete-dataflowable?expand=1]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5391) Use repeated fields for model for ordering of input/output/etc maps

2018-09-14 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-5391:
---

Assignee: (was: Kenneth Knowles)

> Use repeated fields for model for ordering of input/output/etc maps
> ---
>
> Key: BEAM-5391
> URL: https://issues.apache.org/jira/browse/BEAM-5391
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Henning Rohde
>Priority: Major
>
> The Go SDK uses positional side input tagging and needs the ordering. It 
> cannot as easily encode it into a sequential tag ("i0", "i1", ..) because 
> runners much with the naming when doing certain optimizations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5391) Use repeated fields for model for ordering of input/output/etc maps

2018-09-14 Thread Henning Rohde (JIRA)
Henning Rohde created BEAM-5391:
---

 Summary: Use repeated fields for model for ordering of 
input/output/etc maps
 Key: BEAM-5391
 URL: https://issues.apache.org/jira/browse/BEAM-5391
 Project: Beam
  Issue Type: Sub-task
  Components: beam-model
Reporter: Henning Rohde
Assignee: Kenneth Knowles


The Go SDK uses positional side input tagging and needs the ordering. It cannot 
as easily encode it into a sequential tag ("i0", "i1", ..) because runners much 
with the naming when doing certain optimizations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5288) Modify Environment to support non-dockerized SDK harness deployments

2018-09-13 Thread Henning Rohde (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614237#comment-16614237
 ] 

Henning Rohde commented on BEAM-5288:
-

If we're reusing the container contract (which I think is reasonable), then 
these extra proto args would have to come either before or after the 
runner-provided args. Having just the environment avoids that potential 
semantic confusion. I was originally thinking that the program would be 
executed in a (OS dependent) '/bin/bash -c ""' fashion and defer any 
escaping to the sdk, but I have no problem with more explicit args as long as 
the semantics is clear.

> Modify Environment to support non-dockerized SDK harness deployments 
> -
>
> Key: BEAM-5288
> URL: https://issues.apache.org/jira/browse/BEAM-5288
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Maximilian Michels
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> As of mailing discussions and BEAM-5187, it has become clear that we need to 
> extend the Environment information. In addition to the Docker environment, 
> the extended environment holds deployment options for 1) a process-based 
> environment, 2) an externally managed environment.
> The proto definition, as of now, looks as follows:
> {noformat}
>  message Environment {
>// (Required) The URN of the payload
>string urn = 1;
>// (Optional) The data specifying any parameters to the URN. If
>// the URN does not require any arguments, this may be omitted.
>bytes payload = 2;
>  }
>  message StandardEnvironments {
>enum Environments {
>  DOCKER = 0 [(beam_urn) = "beam:env:docker:v1"];
>  PROCESS = 1 [(beam_urn) = "beam:env:process:v1"];
>  EXTERNAL = 2 [(beam_urn) = "beam:env:external:v1"];
>}
>  }
>  // The payload of a Docker image
>  message DockerPayload {
>string container_image = 1;  // implicitly linux_amd64.
>  }
>  message ProcessPayload {
>string os = 1;  // "linux", "darwin", ..
>string arch = 2;  // "amd64", ..
>string command = 3; // process to execute
>map env = 4; // environment variables
>  }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5354) Side Inputs seems to be non-working in the sdk-go

2018-09-13 Thread Henning Rohde (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613736#comment-16613736
 ] 

Henning Rohde commented on BEAM-5354:
-

Technically, the side input feature on Dataflow is not implemented yet: 
https://issues.apache.org/jira/browse/BEAM-3286. I ran into an error on 
Dataflow that might be explained by Robert's work, but never got back to it. So 
YMMV.

> Side Inputs seems to be non-working in the sdk-go
> -
>
> Key: BEAM-5354
> URL: https://issues.apache.org/jira/browse/BEAM-5354
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Tomas Roos
>Priority: Major
>
> Running the contains example fails with
>  
> {code:java}
> Output i0 for step was not found.
> {code}
> This is because of the call to debug.Head (which internally uses SideInput)
> Removing the following line 
> [https://github.com/apache/beam/blob/master/sdks/go/examples/contains/contains.go#L50]
>  
> The pipeline executes well.
>  
> Executed on id's
>  
> go-job-1-1536664417610678545 
> vs
> go-job-1-1536664934354466938
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5357) Go check for IsWorkerCompatibleBinary is wrong

2018-09-11 Thread Henning Rohde (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611406#comment-16611406
 ] 

Henning Rohde commented on BEAM-5357:
-

De-prioritizing now that [~ptomasroos]'s fix is in.

> Go check for IsWorkerCompatibleBinary is wrong
> --
>
> Key: BEAM-5357
> URL: https://issues.apache.org/jira/browse/BEAM-5357
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Robert Burke
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Per BEAM-5253, The linux/amd64 check in IsWorkerCompatibleBinary is 
> insufficient:
> https://github.com/apache/beam/blob/609a42978405173a60e5d91f35170a5c0b5d5332/sdks/go/pkg/beam/runners/universal/runnerlib/compile.go#L37
> We need to see if we can do a better check here (such as looking up the 
> symbol table or similar) or disable this optimization.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5357) Go check for IsWorkerCompatibleBinary is wrong

2018-09-11 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde updated BEAM-5357:

Priority: Major  (was: Critical)

> Go check for IsWorkerCompatibleBinary is wrong
> --
>
> Key: BEAM-5357
> URL: https://issues.apache.org/jira/browse/BEAM-5357
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Robert Burke
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Per BEAM-5253, The linux/amd64 check in IsWorkerCompatibleBinary is 
> insufficient:
> https://github.com/apache/beam/blob/609a42978405173a60e5d91f35170a5c0b5d5332/sdks/go/pkg/beam/runners/universal/runnerlib/compile.go#L37
> We need to see if we can do a better check here (such as looking up the 
> symbol table or similar) or disable this optimization.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5253) Go SDK PubSub example currently broken

2018-09-11 Thread Henning Rohde (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611017#comment-16611017
 ] 

Henning Rohde commented on BEAM-5253:
-

Great. Thanks for investigating!  Opened 
https://issues.apache.org/jira/browse/BEAM-5357.

[~ptomasroos] Just to be clear: are you sending a PR with the short-term fix of 
just always returning false or would you like me to do it?

> Go SDK PubSub example currently broken
> --
>
> Key: BEAM-5253
> URL: https://issues.apache.org/jira/browse/BEAM-5253
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Sean Patrick Hagen
>Assignee: Robert Burke
>Priority: Critical
> Fix For: Not applicable
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The Go SDK contains an example for creating a streaming pipeline that reads 
> from pubsub and outputs the messages. It can be found here: 
> [https://github.com/apache/beam/blob/master/sdks/go/examples/streaming_wordcap/wordcap.go]
>  
> This example is broken and does not work. It fails with the error "failed to 
> execute job: translation failed: no root units" when I try to run it with the 
> direct runner, and it just fails with "Internal Issue (8ed815a0a259018f): 
> 65177287:8503 " when run in Google Dataflow.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5357) Go check for IsWorkerCompatibleBinary is wrong

2018-09-11 Thread Henning Rohde (JIRA)
Henning Rohde created BEAM-5357:
---

 Summary: Go check for IsWorkerCompatibleBinary is wrong
 Key: BEAM-5357
 URL: https://issues.apache.org/jira/browse/BEAM-5357
 Project: Beam
  Issue Type: Bug
  Components: sdk-go
Reporter: Henning Rohde
Assignee: Robert Burke


Per BEAM-5253, The linux/amd64 check in IsWorkerCompatibleBinary is 
insufficient:

https://github.com/apache/beam/blob/609a42978405173a60e5d91f35170a5c0b5d5332/sdks/go/pkg/beam/runners/universal/runnerlib/compile.go#L37

We need to see if we can do a better check here (such as looking up the symbol 
table or similar) or disable this optimization.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5253) Go SDK PubSub example currently broken

2018-09-11 Thread Henning Rohde (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610861#comment-16610861
 ] 

Henning Rohde commented on BEAM-5253:
-

"go run" is expected to work. Would you mind trying to make 
IsWorkerCompatibleBinary always return false here:

https://github.com/apache/beam/blob/609a42978405173a60e5d91f35170a5c0b5d5332/sdks/go/pkg/beam/runners/universal/runnerlib/compile.go#L37

If that works it's a quick workaround. But we need to see if we can do a better 
check here (such as looking up the symbol table or similar).

> Go SDK PubSub example currently broken
> --
>
> Key: BEAM-5253
> URL: https://issues.apache.org/jira/browse/BEAM-5253
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Sean Patrick Hagen
>Assignee: Robert Burke
>Priority: Critical
> Fix For: Not applicable
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The Go SDK contains an example for creating a streaming pipeline that reads 
> from pubsub and outputs the messages. It can be found here: 
> [https://github.com/apache/beam/blob/master/sdks/go/examples/streaming_wordcap/wordcap.go]
>  
> This example is broken and does not work. It fails with the error "failed to 
> execute job: translation failed: no root units" when I try to run it with the 
> direct runner, and it just fails with "Internal Issue (8ed815a0a259018f): 
> 65177287:8503 " when run in Google Dataflow.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5350) Running autocomplete.go on dataflow fails

2018-09-10 Thread Henning Rohde (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609984#comment-16609984
 ] 

Henning Rohde commented on BEAM-5350:
-

Tomas -- do you want to re-purpose this JIRA to remove (or complete) the 
example? If you send a PR to either effect, I can merge it to avoid a defunct 
example. For some reason, I can't simply assign you the JIRA.

> Running autocomplete.go on dataflow fails
> -
>
> Key: BEAM-5350
> URL: https://issues.apache.org/jira/browse/BEAM-5350
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Affects Versions: Not applicable
>Reporter: Tomas Roos
>Assignee: Henning Rohde
>Priority: Major
>
> I'm in the process as a external developer make sure that all examples are 
> runnable on both direct and the dataflow runner as its crucial for people 
> onboarding this project.
>  
> I've visted the projects before and some are runnable, some probably where 
> previously, and some are def not runnable.
>  
> So I started top down today, in order to make autocomplete.go run on dataflow 
> as well as the direct runner i changed the input in order to make it platform 
> independent instead of pointing to a local file.The reading of the source 
> from the public cloud storage went fine but it fails to run the top.Largest 
> anonymous less function (ran on id: go-job-1-1536575613531078735) failed with
>  
>  
> {code:java}
> RESP: instruction_id: "-205" error: "Invalid bundle desc: decode: bad userfn: 
> bad struct encoding: failed to decode data: decode: failed to find symbol 
> main.main.func1: main.main.func1 not found. Use runtime.RegisterFunction in 
> unit tests" register: < >
>  
> {code}
>  
> [https://github.com/apache/beam/blob/master/sdks/go/examples/complete/autocomplete/autocomplete.go#L63]
>  
> So in order to fix this I introduced the local func called lessFn and 
> registered in the init process. This though now instead when running
>  
> {code:java}
>  
> go run autocomplete.go --project fair-app-213019 --runner dataflow 
> --staging_location=gs://fair-app-213019/staging-test2 
> --worker_harness_container_image=apache-docker-beam-snapshots-docker.bintray.io/beam/go:20180515
> {code}
>  
>  
> fails with
>  
> {code:java}
> 2018/09/10 13:37:10 Running autocomplete
>  Unable to encode combiner for lifting: failed to encode custom coder: bad 
> underlying type: bad field type: bad element: unencodable type: interface 
> {}2018/09/10 13:37:10 Using running binary as worker binary: 
> '/tmp/go-build157286122/b001/exe/autocomplete'
>  2018/09/10 13:37:10 Staging worker binary: 
> /tmp/go-build157286122/b001/exe/autocomplete{code}
>  
> And I know this is when invoking the top.Largest since I've removed the piece 
> of code and then the job runs fine, could you please point me in the right 
> direction why my local func is not encoable as a interface {} and I will of 
> course happily send a PR when this is working on direct and the dataflow 
> direct so I can move on to the other examples
>  
> (All changes can be seen here) 
> [https://github.com/apache/beam/compare/master...ptomasroos:make-autocomplete-dataflowable?expand=1]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5253) Go SDK PubSub example currently broken

2018-09-10 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-5253:
---

Assignee: Robert Burke  (was: Henning Rohde)

> Go SDK PubSub example currently broken
> --
>
> Key: BEAM-5253
> URL: https://issues.apache.org/jira/browse/BEAM-5253
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Sean Patrick Hagen
>Assignee: Robert Burke
>Priority: Critical
> Fix For: Not applicable
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The Go SDK contains an example for creating a streaming pipeline that reads 
> from pubsub and outputs the messages. It can be found here: 
> [https://github.com/apache/beam/blob/master/sdks/go/examples/streaming_wordcap/wordcap.go]
>  
> This example is broken and does not work. It fails with the error "failed to 
> execute job: translation failed: no root units" when I try to run it with the 
> direct runner, and it just fails with "Internal Issue (8ed815a0a259018f): 
> 65177287:8503 " when run in Google Dataflow.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5253) Go SDK PubSub example currently broken

2018-09-10 Thread Henning Rohde (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609968#comment-16609968
 ] 

Henning Rohde commented on BEAM-5253:
-

I've been testing on mac with cross-compiling to linux/amd64 (just via "go 
build"). Same version of Go.

$ go version
go version go1.10.1 darwin/amd64

I made a change to re-use the running binary as the worker binary if 
linux/amd64, but it can be overridden via --worker_binary. It seems the binary 
submitting the job doesn't include the symbol information we're expecting. Not 
sure why. Could you try to see if the pkg/symtab program works on your machine?

[~lostluck] Do you have time to take a deeper look? You have better context 
here.

> Go SDK PubSub example currently broken
> --
>
> Key: BEAM-5253
> URL: https://issues.apache.org/jira/browse/BEAM-5253
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Sean Patrick Hagen
>Assignee: Henning Rohde
>Priority: Critical
> Fix For: Not applicable
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The Go SDK contains an example for creating a streaming pipeline that reads 
> from pubsub and outputs the messages. It can be found here: 
> [https://github.com/apache/beam/blob/master/sdks/go/examples/streaming_wordcap/wordcap.go]
>  
> This example is broken and does not work. It fails with the error "failed to 
> execute job: translation failed: no root units" when I try to run it with the 
> direct runner, and it just fails with "Internal Issue (8ed815a0a259018f): 
> 65177287:8503 " when run in Google Dataflow.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5350) Running autocomplete.go on dataflow fails

2018-09-10 Thread Henning Rohde (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609950#comment-16609950
 ] 

Henning Rohde commented on BEAM-5350:
-

That is a really awesome idea. Thanks for taking that on!

Let's consolidate the investigation in 
https://issues.apache.org/jira/browse/BEAM-5253.

> Running autocomplete.go on dataflow fails
> -
>
> Key: BEAM-5350
> URL: https://issues.apache.org/jira/browse/BEAM-5350
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Affects Versions: Not applicable
>Reporter: Tomas Roos
>Assignee: Henning Rohde
>Priority: Major
>
> I'm in the process as a external developer make sure that all examples are 
> runnable on both direct and the dataflow runner as its crucial for people 
> onboarding this project.
>  
> I've visted the projects before and some are runnable, some probably where 
> previously, and some are def not runnable.
>  
> So I started top down today, in order to make autocomplete.go run on dataflow 
> as well as the direct runner i changed the input in order to make it platform 
> independent instead of pointing to a local file.The reading of the source 
> from the public cloud storage went fine but it fails to run the top.Largest 
> anonymous less function (ran on id: go-job-1-1536575613531078735) failed with
>  
>  
> {code:java}
> RESP: instruction_id: "-205" error: "Invalid bundle desc: decode: bad userfn: 
> bad struct encoding: failed to decode data: decode: failed to find symbol 
> main.main.func1: main.main.func1 not found. Use runtime.RegisterFunction in 
> unit tests" register: < >
>  
> {code}
>  
> [https://github.com/apache/beam/blob/master/sdks/go/examples/complete/autocomplete/autocomplete.go#L63]
>  
> So in order to fix this I introduced the local func called lessFn and 
> registered in the init process. This though now instead when running
>  
> {code:java}
>  
> go run autocomplete.go --project fair-app-213019 --runner dataflow 
> --staging_location=gs://fair-app-213019/staging-test2 
> --worker_harness_container_image=apache-docker-beam-snapshots-docker.bintray.io/beam/go:20180515
> {code}
>  
>  
> fails with
>  
> {code:java}
> 2018/09/10 13:37:10 Running autocomplete
>  Unable to encode combiner for lifting: failed to encode custom coder: bad 
> underlying type: bad field type: bad element: unencodable type: interface 
> {}2018/09/10 13:37:10 Using running binary as worker binary: 
> '/tmp/go-build157286122/b001/exe/autocomplete'
>  2018/09/10 13:37:10 Staging worker binary: 
> /tmp/go-build157286122/b001/exe/autocomplete{code}
>  
> And I know this is when invoking the top.Largest since I've removed the piece 
> of code and then the job runs fine, could you please point me in the right 
> direction why my local func is not encoable as a interface {} and I will of 
> course happily send a PR when this is working on direct and the dataflow 
> direct so I can move on to the other examples
>  
> (All changes can be seen here) 
> [https://github.com/apache/beam/compare/master...ptomasroos:make-autocomplete-dataflowable?expand=1]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5350) Running autocomplete.go on dataflow fails

2018-09-10 Thread Henning Rohde (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609622#comment-16609622
 ] 

Henning Rohde commented on BEAM-5350:
-

The Go autocomplete example is a stub from a long time ago. It was never 
completed and it makes sense to either complete (so that it computes the same 
thing as the Java example) or delete.

> Running autocomplete.go on dataflow fails
> -
>
> Key: BEAM-5350
> URL: https://issues.apache.org/jira/browse/BEAM-5350
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Affects Versions: Not applicable
>Reporter: Tomas Roos
>Assignee: Henning Rohde
>Priority: Major
>
> I'm in the process as a external developer make sure that all examples are 
> runnable on both direct and the dataflow runner as its crucial for people 
> onboarding this project.
>  
> I've visted the projects before and some are runnable, some probably where 
> previously, and some are def not runnable.
>  
> So I started top down today, in order to make autocomplete.go run on dataflow 
> as well as the direct runner i changed the input in order to make it platform 
> independent instead of pointing to a local file.The reading of the source 
> from the public cloud storage went fine but it fails to run the top.Largest 
> anonymous less function (ran on id: go-job-1-1536575613531078735) failed with
>  
>  
> {code:java}
> RESP: instruction_id: "-205" error: "Invalid bundle desc: decode: bad userfn: 
> bad struct encoding: failed to decode data: decode: failed to find symbol 
> main.main.func1: main.main.func1 not found. Use runtime.RegisterFunction in 
> unit tests" register: < >
>  
> {code}
>  
> [https://github.com/apache/beam/blob/master/sdks/go/examples/complete/autocomplete/autocomplete.go#L63]
>  
> So in order to fix this I introduced the local func called lessFn and 
> registered in the init process. This though now instead when running
>  
> {code:java}
>  
> go run autocomplete.go --project fair-app-213019 --runner dataflow 
> --staging_location=gs://fair-app-213019/staging-test2 
> --worker_harness_container_image=apache-docker-beam-snapshots-docker.bintray.io/beam/go:20180515
> {code}
>  
>  
> fails with
>  
> {code:java}
> 2018/09/10 13:37:10 Running autocomplete
>  Unable to encode combiner for lifting: failed to encode custom coder: bad 
> underlying type: bad field type: bad element: unencodable type: interface 
> {}2018/09/10 13:37:10 Using running binary as worker binary: 
> '/tmp/go-build157286122/b001/exe/autocomplete'
>  2018/09/10 13:37:10 Staging worker binary: 
> /tmp/go-build157286122/b001/exe/autocomplete{code}
>  
> And I know this is when invoking the top.Largest since I've removed the piece 
> of code and then the job runs fine, could you please point me in the right 
> direction why my local func is not encoable as a interface {} and I will of 
> course happily send a PR when this is working on direct and the dataflow 
> direct so I can move on to the other examples
>  
> (All changes can be seen here) 
> [https://github.com/apache/beam/compare/master...ptomasroos:make-autocomplete-dataflowable?expand=1]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5253) Go SDK PubSub example currently broken

2018-09-10 Thread Henning Rohde (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609612#comment-16609612
 ] 

Henning Rohde commented on BEAM-5253:
-

Looking at the PR: https://github.com/apache/beam/pull/6357

This is curious. beam.RegisterFunction is optional (to avoid hitting the symbol 
table at runtime) and the example works for me when run with:

$ [...]/src/github.com/apache/beam/sdks/go
$ go run examples/streaming_wordcap/wordcap.go --runner=dataflow [...]

 While the fix is not wrong, something else is going on -- probably how the 
binary is cross-compiled. It seems doing a separate "go build" is causing 
trouble. The instructions on beam.apache.org does a "go install" for comparison.

> Go SDK PubSub example currently broken
> --
>
> Key: BEAM-5253
> URL: https://issues.apache.org/jira/browse/BEAM-5253
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Sean Patrick Hagen
>Assignee: Henning Rohde
>Priority: Critical
> Fix For: Not applicable
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Go SDK contains an example for creating a streaming pipeline that reads 
> from pubsub and outputs the messages. It can be found here: 
> [https://github.com/apache/beam/blob/master/sdks/go/examples/streaming_wordcap/wordcap.go]
>  
> This example is broken and does not work. It fails with the error "failed to 
> execute job: translation failed: no root units" when I try to run it with the 
> direct runner, and it just fails with "Internal Issue (8ed815a0a259018f): 
> 65177287:8503 " when run in Google Dataflow.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5340) [beam_PostCommit_Python_Verify][test_hourly_team_score_it][Flake] Failed to install packages

2018-09-07 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-5340:
---

Assignee: Ahmet Altay  (was: Henning Rohde)

> [beam_PostCommit_Python_Verify][test_hourly_team_score_it][Flake] Failed to 
> install packages
> 
>
> Key: BEAM-5340
> URL: https://issues.apache.org/jira/browse/BEAM-5340
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Mikhail Gryzykhin
>Assignee: Ahmet Altay
>Priority: Major
>
> boot.go failed to install SDK.
> Failing job url:
> [https://builds.apache.org/job/beam_PostCommit_Python_Verify/5920/]
> Relevant log:
> Stackdriver logs:
>  
> [https://pantheon.corp.google.com/logs/viewer?resource=dataflow_step%2Fjob_id%2F2018-09-07_11_04_49-10526452128675467543=projects%2Fapache-beam-testing%2Flogs%2Fdataflow.googleapis.com%252Fworker-startup=NO_LIMIT=apache-beam-testing=500=false=2018-09-07T19:12:48.64300Z==true=2018-09-07T18:13:26.58416Z=forwardInTime]
>  
> +Worker startup:+
> Installing setup packages ... 
> Executing: /usr/local/bin/pip install 
> /var/opt/google/dataflow/dataflow_python_sdk.tar[gcp] 
> Traceback (most recent call last): 
> File "/usr/local/bin/pip", line 7, in 
> from pip._internal import main 
> ImportError : No module named pip._internal
> /usr/local/bin/pip failed with exit status 1
> Dataflow base path override: [https://dataflow.googleapis.com/]
>  
> 2018-09-07 11:12:47.377 PDTFailed to install packages: failed to install SDK: 
> exit status 1
>  {
>  insertId: "237402094728839677:887122:0:16965" 
>  jsonPayload: 
> { line: "boot.go:145"  message: "Failed to install packages: failed to 
> install SDK: exit status 1" }
> labels: 
> { compute.googleapis.com/resource_id: "237402094728839677"  
> compute.googleapis.com/resource_name: 
> "beamapp-jenkins-090718044-09071104-8yaz-harness-dzk6"  
> compute.googleapis.com/resource_type: "instance"  
> dataflow.googleapis.com/job_id: "2018-09-07_11_04_49-10526452128675467543"  
> dataflow.googleapis.com/job_name: "beamapp-jenkins-0907180442-302722"  
> dataflow.googleapis.com/region: "us-central1" }
> logName: 
> "projects/apache-beam-testing/logs/dataflow.googleapis.com%2Fworker-startup" 
>  receiveTimestamp: "2018-09-07T18:12:49.639606153Z" 
>  resource: {
>  labels: 
> { job_id: "2018-09-07_11_04_49-10526452128675467543"  job_name: 
> "beamapp-jenkins-0907180442-302722"  project_id: "apache-beam-testing"  
> region: "us-central1"  step_id: "" }
> type: "dataflow_step" }
>  severity: "CRITICAL" 
>  timestamp: "2018-09-07T18:12:47.377871Z" }
>   



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-5327) Go support for custom dataflow worker jar

2018-09-06 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde resolved BEAM-5327.
-
   Resolution: Fixed
Fix Version/s: 2.8.0

> Go support for custom dataflow worker jar
> -
>
> Key: BEAM-5327
> URL: https://issues.apache.org/jira/browse/BEAM-5327
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Major
> Fix For: 2.8.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5326) Support for custom dataflow worker jar

2018-09-05 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde updated BEAM-5326:

Description: Doc: 
https://docs.google.com/document/d/1-m-GzkYWIODKOEl1ZSUNXYbcGRvRr3QkasfHsJxbuoA/edit?usp=sharing

> Support for custom dataflow worker jar
> --
>
> Key: BEAM-5326
> URL: https://issues.apache.org/jira/browse/BEAM-5326
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Henning Rohde
>Assignee: Boyuan Zhang
>Priority: Major
>
> Doc: 
> https://docs.google.com/document/d/1-m-GzkYWIODKOEl1ZSUNXYbcGRvRr3QkasfHsJxbuoA/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5327) Go support for custom dataflow worker jar

2018-09-05 Thread Henning Rohde (JIRA)
Henning Rohde created BEAM-5327:
---

 Summary: Go support for custom dataflow worker jar
 Key: BEAM-5327
 URL: https://issues.apache.org/jira/browse/BEAM-5327
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-go
Reporter: Henning Rohde
Assignee: Henning Rohde






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5326) Support for custom dataflow worker jar

2018-09-05 Thread Henning Rohde (JIRA)
Henning Rohde created BEAM-5326:
---

 Summary: Support for custom dataflow worker jar
 Key: BEAM-5326
 URL: https://issues.apache.org/jira/browse/BEAM-5326
 Project: Beam
  Issue Type: Improvement
  Components: runner-dataflow
Reporter: Henning Rohde
Assignee: Boyuan Zhang






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-5253) Go SDK PubSub example currently broken

2018-09-05 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde resolved BEAM-5253.
-
   Resolution: Not A Bug
Fix Version/s: Not applicable

> Go SDK PubSub example currently broken
> --
>
> Key: BEAM-5253
> URL: https://issues.apache.org/jira/browse/BEAM-5253
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Sean Patrick Hagen
>Assignee: Henning Rohde
>Priority: Critical
> Fix For: Not applicable
>
>
> The Go SDK contains an example for creating a streaming pipeline that reads 
> from pubsub and outputs the messages. It can be found here: 
> [https://github.com/apache/beam/blob/master/sdks/go/examples/streaming_wordcap/wordcap.go]
>  
> This example is broken and does not work. It fails with the error "failed to 
> execute job: translation failed: no root units" when I try to run it with the 
> direct runner, and it just fails with "Internal Issue (8ed815a0a259018f): 
> 65177287:8503 " when run in Google Dataflow.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5288) Modify Environment to support non-dockerized SDK harness deployments

2018-09-04 Thread Henning Rohde (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603454#comment-16603454
 ] 

Henning Rohde commented on BEAM-5288:
-

Clarifying question [~mxm] [~angoenka]: what is the semantics of the params 
field in ProcessPayload? The SDK harness will need to know the id and endpoints 
from the runner, which is why I suggested to re-use the container contract 
invocation here.


> Modify Environment to support non-dockerized SDK harness deployments 
> -
>
> Key: BEAM-5288
> URL: https://issues.apache.org/jira/browse/BEAM-5288
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Maximilian Michels
>Priority: Major
>
> As of mailing discussions and BEAM-5187, it has become clear that we need to 
> extend the Environment information. In addition to the Docker environment, 
> the extended environment holds deployment options for 1) a process-based 
> environment, 2) an externally managed environment.
> The proto definition, as of now, looks as follows:
> {noformat}
>  message Environment {
>// (Required) The URN of the payload
>string urn = 1;
>// (Optional) The data specifying any parameters to the URN. If
>// the URN does not require any arguments, this may be omitted.
>bytes payload = 2;
>  }
>  message StandardEnvironments {
>enum Environments {
>  DOCKER = 0 [(beam_urn) = "beam:env:docker:v1"];
>  PROCESS = 1 [(beam_urn) = "beam:env:process:v1"];
>  EXTERNAL = 2 [(beam_urn) = "beam:env:external:v1"];
>}
>  }
>  // The payload of a Docker image
>  message DockerPayload {
>string container_image = 1;  // implicitly linux_amd64.
>  }
>  message ProcessPayload {
>string os = 1;  // "linux", "darwin", ..
>string arch = 2;  // "amd64", ..
>string command = 3; // process to execute
>repeated string params = 4; // parameters
>  }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5253) Go SDK PubSub example currently broken

2018-08-31 Thread Henning Rohde (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599425#comment-16599425
 ] 

Henning Rohde commented on BEAM-5253:
-

[~seanhagen] It looks like project doesn't have access to one of the buckets 
you specify, likely "gs://events/". The Dataflow logs in the cloud console 
doesn't give you an error to that effect?

If you don't specify the temp_location, it uses a subdirectory of the 
staging_location.

> Go SDK PubSub example currently broken
> --
>
> Key: BEAM-5253
> URL: https://issues.apache.org/jira/browse/BEAM-5253
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Sean Patrick Hagen
>Assignee: Henning Rohde
>Priority: Critical
>
> The Go SDK contains an example for creating a streaming pipeline that reads 
> from pubsub and outputs the messages. It can be found here: 
> [https://github.com/apache/beam/blob/master/sdks/go/examples/streaming_wordcap/wordcap.go]
>  
> This example is broken and does not work. It fails with the error "failed to 
> execute job: translation failed: no root units" when I try to run it with the 
> direct runner, and it just fails with "Internal Issue (8ed815a0a259018f): 
> 65177287:8503 " when run in Google Dataflow.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-4424) Improvements to hooks module

2018-08-31 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-4424:
---

Assignee: (was: Bill Neubauer)

> Improvements to hooks module
> 
>
> Key: BEAM-4424
> URL: https://issues.apache.org/jira/browse/BEAM-4424
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Affects Versions: Not applicable
>Reporter: Bill Neubauer
>Priority: Minor
>
> Proposed improvements to the Go hooks API:
> Execution order of the hooks should be based on the order in which the hook 
> is enabled. This gives the runner precise control over ordering so 
> dependencies on hooked behavior can be well-managed.
> Provide an API to disable a hook. Disabling a hook removes it from the 
> ordered list. If the same hook is later re-Enabled, it has lost its previous 
> ordering and would be placed at the end of the ordered list.
> The invocation of setupRemoteLogging() in harness.Main() will be replaced by 
> a hook. This new hook will be called by the default translate code. The net 
> effect is the default behavior for runners remains unchanged. If a runner 
> wants a different logging behavior, it can disable the default logging hook 
> and enable its own hook.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-4124) Support elements larger than 4 MB

2018-08-31 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-4124:
---

Assignee: (was: Bill Neubauer)

> Support elements larger than 4 MB
> -
>
> Key: BEAM-4124
> URL: https://issues.apache.org/jira/browse/BEAM-4124
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Cody Schroeder
>Priority: Major
>
> The Go SDK harness is limited by a gRPC message size limit of 4 MB.
> https://github.com/apache/beam/blob/4a32353/sdks/go/pkg/beam/core/runtime/harness/datamgr.go#L31



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-4017) Go session runner should write multiple files

2018-08-31 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-4017:
---

Assignee: (was: Bill Neubauer)

> Go session runner should write multiple files
> -
>
> Key: BEAM-4017
> URL: https://issues.apache.org/jira/browse/BEAM-4017
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Bill Neubauer
>Priority: Minor
>
> The Go session runner allows a worker to "play back" a previous execution, 
> which can be useful for debugging or profiling sessions. However, the 
> recording facility produces one file for the entire lifetime of the worker. 
> While this is useful for local debugging, it won't work well for workers at 
> scale.
> Having the session capture facility make the output chunkable will help 
> larger systems scale. I suggest that the interface for session writing be 
> expanded from a io.WriteCloser to include a sequence number that systems can 
> use to produce an ordered set of files for playback.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5253) Go SDK PubSub example currently broken

2018-08-31 Thread Henning Rohde (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599220#comment-16599220
 ] 

Henning Rohde commented on BEAM-5253:
-

[~seanhagen]:
(1) The direct runner doesn't support pubsub (or streaming pipelines). So that 
error is expected, although it could be clearer.
(2) Can you include the command you use on Dataflow? And maybe include a job 
id? The example works for me.

> Go SDK PubSub example currently broken
> --
>
> Key: BEAM-5253
> URL: https://issues.apache.org/jira/browse/BEAM-5253
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Sean Patrick Hagen
>Assignee: Henning Rohde
>Priority: Critical
>
> The Go SDK contains an example for creating a streaming pipeline that reads 
> from pubsub and outputs the messages. It can be found here: 
> [https://github.com/apache/beam/blob/master/sdks/go/examples/streaming_wordcap/wordcap.go]
>  
> This example is broken and does not work. It fails with the error "failed to 
> execute job: translation failed: no root units" when I try to run it with the 
> direct runner, and it just fails with "Internal Issue (8ed815a0a259018f): 
> 65177287:8503 " when run in Google Dataflow.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-2930) Flink support for portable side input

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-2930:
---

Assignee: Thomas Weise

> Flink support for portable side input
> -
>
> Key: BEAM-2930
> URL: https://issues.apache.org/jira/browse/BEAM-2930
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Henning Rohde
>Assignee: Thomas Weise
>Priority: Major
>  Labels: portability
>  Time Spent: 4h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-4683) Integrate support for timers using the portability APIs into Spark

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde closed BEAM-4683.
---
   Resolution: Later
Fix Version/s: Not applicable

> Integrate support for timers using the portability APIs into Spark
> --
>
> Key: BEAM-4683
> URL: https://issues.apache.org/jira/browse/BEAM-4683
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-spark
>Reporter: Luke Cwik
>Priority: Major
>  Labels: portability
> Fix For: Not applicable
>
>
> Consider using the code produced in BEAM-4658 to support timers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3672) FlinkRunner: Implement an Artifact service using the Flink DistributedCache

2018-08-22 Thread Henning Rohde (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589062#comment-16589062
 ] 

Henning Rohde commented on BEAM-3672:
-

Moving this to a separate task, if it's optional and not imminent.

> FlinkRunner: Implement an Artifact service using the Flink DistributedCache
> ---
>
> Key: BEAM-3672
> URL: https://issues.apache.org/jira/browse/BEAM-3672
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Ben Sidhom
>Assignee: Aljoscha Krettek
>Priority: Minor
>
> We need to have a DistributedCache-based artifact service to ship with the 
> portable Flink runner. The DistributedCache is a perfect fit for Flink 
> because it comes for free and is the mechanism that Flink already uses to 
> distribute its own artifacts.
>  
> The final artifact service implementation should be pluggable, but using the 
> DistributedCache allows the Flink runner to work without additional external 
> dependencies (beyond perhaps Docker).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-3672) FlinkRunner: Implement an Artifact service using the Flink DistributedCache

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde updated BEAM-3672:

Issue Type: Improvement  (was: Sub-task)
Parent: (was: BEAM-2889)

> FlinkRunner: Implement an Artifact service using the Flink DistributedCache
> ---
>
> Key: BEAM-3672
> URL: https://issues.apache.org/jira/browse/BEAM-3672
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Ben Sidhom
>Assignee: Aljoscha Krettek
>Priority: Minor
>
> We need to have a DistributedCache-based artifact service to ship with the 
> portable Flink runner. The DistributedCache is a perfect fit for Flink 
> because it comes for free and is the mechanism that Flink already uses to 
> distribute its own artifacts.
>  
> The final artifact service implementation should be pluggable, but using the 
> DistributedCache allows the Flink runner to work without additional external 
> dependencies (beyond perhaps Docker).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-2951) Containerize job proxy

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde resolved BEAM-2951.
-
   Resolution: Won't Fix
Fix Version/s: Not applicable

We'll containerize each runner/runner proxy instead.

> Containerize job proxy
> --
>
> Key: BEAM-2951
> URL: https://issues.apache.org/jira/browse/BEAM-2951
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-core
>Reporter: Henning Rohde
>Priority: Major
>  Labels: portability
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-2886) Portable pipeline submission proxy

2018-08-22 Thread Henning Rohde (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589051#comment-16589051
 ] 

Henning Rohde commented on BEAM-2886:
-

I think this issue should be resolved, given we are going with a library (= 
what this issue was intended to capture) and separate runner-specific job 
servers. [~axelmagn] feel free to reopen or clarify, if that not correct.

> Portable pipeline submission proxy
> --
>
> Key: BEAM-2886
> URL: https://issues.apache.org/jira/browse/BEAM-2886
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Henning Rohde
>Assignee: Axel Magnuson
>Priority: Major
>  Labels: portability
> Fix For: 2.7.0
>
>
> Add an local extensible proxy for portable job submission as per 
> https://s.apache.org/beam-job-api as well as a runner/shim for each SDK 
> targeting it. This proxy is a stepping stone towards the longer-term goal of 
> native runner support of these APIs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-2886) Portable pipeline submission proxy

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde resolved BEAM-2886.
-
   Resolution: Fixed
Fix Version/s: 2.7.0

> Portable pipeline submission proxy
> --
>
> Key: BEAM-2886
> URL: https://issues.apache.org/jira/browse/BEAM-2886
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Henning Rohde
>Assignee: Axel Magnuson
>Priority: Major
>  Labels: portability
> Fix For: 2.7.0
>
>
> Add an local extensible proxy for portable job submission as per 
> https://s.apache.org/beam-job-api as well as a runner/shim for each SDK 
> targeting it. This proxy is a stepping stone towards the longer-term goal of 
> native runner support of these APIs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-2885) Local Dataflow proxy for portable job submission

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde updated BEAM-2885:

Summary: Local Dataflow proxy for portable job submission  (was: Support 
job+artifact APIs locally)

> Local Dataflow proxy for portable job submission
> 
>
> Key: BEAM-2885
> URL: https://issues.apache.org/jira/browse/BEAM-2885
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Major
>  Labels: portability
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> As per https://s.apache.org/beam-job-api, use local support for 
> submission-side. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-2885) Local Dataflow proxy for portable job submission

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde updated BEAM-2885:

Issue Type: Improvement  (was: Sub-task)
Parent: (was: BEAM-2884)

> Local Dataflow proxy for portable job submission
> 
>
> Key: BEAM-2885
> URL: https://issues.apache.org/jira/browse/BEAM-2885
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Major
>  Labels: portability
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> As per https://s.apache.org/beam-job-api, use local support for 
> submission-side. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-2932) Spark support for portable side input

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde closed BEAM-2932.
---
   Resolution: Later
Fix Version/s: Not applicable

> Spark support for portable side input
> -
>
> Key: BEAM-2932
> URL: https://issues.apache.org/jira/browse/BEAM-2932
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-spark
>Reporter: Henning Rohde
>Priority: Major
>  Labels: portability
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-2920) Spark support for portable user state

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde closed BEAM-2920.
---
   Resolution: Later
Fix Version/s: Not applicable

> Spark support for portable user state
> -
>
> Key: BEAM-2920
> URL: https://issues.apache.org/jira/browse/BEAM-2920
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-spark
>Reporter: Henning Rohde
>Priority: Minor
>  Labels: portability
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-2909) Spark supports portable progress reporting

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde closed BEAM-2909.
---
   Resolution: Later
Fix Version/s: Not applicable

> Spark supports portable progress reporting
> --
>
> Key: BEAM-2909
> URL: https://issues.apache.org/jira/browse/BEAM-2909
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-spark
>Reporter: Henning Rohde
>Priority: Minor
>  Labels: portability
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-2921) Gearpump support for portable user state

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde closed BEAM-2921.
---
   Resolution: Later
Fix Version/s: Not applicable

> Gearpump support for portable user state
> 
>
> Key: BEAM-2921
> URL: https://issues.apache.org/jira/browse/BEAM-2921
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-gearpump
>Reporter: Henning Rohde
>Priority: Minor
>  Labels: portability
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-2910) Gearpump supports portable progress reporting

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde closed BEAM-2910.
---
   Resolution: Later
Fix Version/s: Not applicable

> Gearpump supports portable progress reporting
> -
>
> Key: BEAM-2910
> URL: https://issues.apache.org/jira/browse/BEAM-2910
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-gearpump
>Reporter: Henning Rohde
>Priority: Minor
>  Labels: portability
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-2933) Gearpump support for portable side input

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde closed BEAM-2933.
---
   Resolution: Later
Fix Version/s: Not applicable

> Gearpump support for portable side input
> 
>
> Key: BEAM-2933
> URL: https://issues.apache.org/jira/browse/BEAM-2933
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-gearpump
>Reporter: Henning Rohde
>Priority: Major
>  Labels: portability
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-2931) Apex support for portable side input

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde closed BEAM-2931.
---
   Resolution: Later
Fix Version/s: Not applicable

> Apex support for portable side input
> 
>
> Key: BEAM-2931
> URL: https://issues.apache.org/jira/browse/BEAM-2931
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-apex
>Reporter: Henning Rohde
>Priority: Major
>  Labels: portability
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-2908) Apex supports portable progress reporting

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde closed BEAM-2908.
---
   Resolution: Later
Fix Version/s: Not applicable

> Apex supports portable progress reporting
> -
>
> Key: BEAM-2908
> URL: https://issues.apache.org/jira/browse/BEAM-2908
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-apex
>Reporter: Henning Rohde
>Priority: Minor
>  Labels: portability
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-2919) Apex support for portable user state

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde closed BEAM-2919.
---
   Resolution: Later
Fix Version/s: Not applicable

> Apex support for portable user state
> 
>
> Key: BEAM-2919
> URL: https://issues.apache.org/jira/browse/BEAM-2919
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-apex
>Reporter: Henning Rohde
>Priority: Minor
>  Labels: portability
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-2936) JStorm support for portable side input

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde closed BEAM-2936.
---
   Resolution: Later
Fix Version/s: Not applicable

> JStorm support for portable side input
> --
>
> Key: BEAM-2936
> URL: https://issues.apache.org/jira/browse/BEAM-2936
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-jstorm
>Reporter: Henning Rohde
>Priority: Major
>  Labels: portability
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-2924) JStorm support for portable user state

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde closed BEAM-2924.
---
   Resolution: Later
Fix Version/s: Not applicable

> JStorm support for portable user state
> --
>
> Key: BEAM-2924
> URL: https://issues.apache.org/jira/browse/BEAM-2924
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-jstorm
>Reporter: Henning Rohde
>Priority: Minor
>  Labels: portability
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-2913) JStorm supports portable progress reporting

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde closed BEAM-2913.
---
   Resolution: Later
Fix Version/s: Not applicable

> JStorm supports portable progress reporting
> ---
>
> Key: BEAM-2913
> URL: https://issues.apache.org/jira/browse/BEAM-2913
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-jstorm
>Reporter: Henning Rohde
>Priority: Minor
>  Labels: portability
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-2923) MapReduce support for portable user state

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde closed BEAM-2923.
---
   Resolution: Later
Fix Version/s: Not applicable

> MapReduce support for portable user state
> -
>
> Key: BEAM-2923
> URL: https://issues.apache.org/jira/browse/BEAM-2923
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-mapreduce
>Reporter: Henning Rohde
>Priority: Minor
>  Labels: portability
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-2912) MapReduce supports portable progress reporting

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde closed BEAM-2912.
---
   Resolution: Later
Fix Version/s: Not applicable

> MapReduce supports portable progress reporting
> --
>
> Key: BEAM-2912
> URL: https://issues.apache.org/jira/browse/BEAM-2912
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-mapreduce
>Reporter: Henning Rohde
>Priority: Minor
>  Labels: portability
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-2935) MapReduce support for portable side input

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde closed BEAM-2935.
---
   Resolution: Later
Fix Version/s: Not applicable

> MapReduce support for portable side input
> -
>
> Key: BEAM-2935
> URL: https://issues.apache.org/jira/browse/BEAM-2935
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-mapreduce
>Reporter: Henning Rohde
>Priority: Major
>  Labels: portability
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-2922) Tez support for portable user state

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde closed BEAM-2922.
---
   Resolution: Later
Fix Version/s: Not applicable

> Tez support for portable user state
> ---
>
> Key: BEAM-2922
> URL: https://issues.apache.org/jira/browse/BEAM-2922
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-tez
>Reporter: Henning Rohde
>Priority: Minor
>  Labels: portability
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-2911) Tez supports portable progress reporting

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde closed BEAM-2911.
---
   Resolution: Later
Fix Version/s: Not applicable

> Tez supports portable progress reporting
> 
>
> Key: BEAM-2911
> URL: https://issues.apache.org/jira/browse/BEAM-2911
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-tez
>Reporter: Henning Rohde
>Priority: Minor
>  Labels: portability
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-2934) Tez support for portable side input

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde closed BEAM-2934.
---
   Resolution: Later
Fix Version/s: Not applicable

> Tez support for portable side input
> ---
>
> Key: BEAM-2934
> URL: https://issues.apache.org/jira/browse/BEAM-2934
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-tez
>Reporter: Henning Rohde
>Priority: Major
>  Labels: portability
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-3532) Add Distinct cookbook example to Go SDK

2018-08-21 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-3532:
---

Assignee: (was: Willy Lulciuc)

> Add Distinct cookbook example to Go SDK
> ---
>
> Key: BEAM-3532
> URL: https://issues.apache.org/jira/browse/BEAM-3532
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Priority: Minor
>
> Add the Go equivalent of
> [https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/cookbook/DistinctExample.java]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4636) Make beam.Run() (and/or friends) thread-safe.

2018-08-14 Thread Henning Rohde (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580469#comment-16580469
 ] 

Henning Rohde commented on BEAM-4636:
-

I understand. I'm suggesting that you use the dataflowlib package directly 
instead of beamx to avoid flags.

> Make beam.Run() (and/or friends) thread-safe.
> -
>
> Key: BEAM-4636
> URL: https://issues.apache.org/jira/browse/BEAM-4636
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Eduardo Morales
>Assignee: Henning Rohde
>Priority: Minor
> Fix For: 2.7.0
>
>
> It would be nice to be able to launch dataflow pipelines in parallel. 
> For example, here is my use case:
>  * I consume data produced by my clients/customers.
>  * I need to launch a pipeline on an event dispatch.
>  * I may receive multiple events, from multiple customers at the same time.
>  * Go code could be simpler if synchronization/cooperation wouldn't be needed 
> from goroutines handling each customer. In particular, setting options 
> through flags is cumbersome.
>  * Launching dataflow pipelines serially may not scale if I am able to sign 
> up many customers. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-4699) BeamFileSystemArtifactServicesTest.putArtifactsSingleSmallFileTest flake

2018-08-09 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde updated BEAM-4699:

Issue Type: Bug  (was: New Feature)

> BeamFileSystemArtifactServicesTest.putArtifactsSingleSmallFileTest flake
> 
>
> Key: BEAM-4699
> URL: https://issues.apache.org/jira/browse/BEAM-4699
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Ankur Goenka
>Priority: Major
>  Labels: portability
>
> I've seen a few transient failures from 
> {{BeamFileSystemArtifactServicesTest}}. I don't recall if they are all 
> {{putArtifactsSingleSmallFileTest}} or how often they occur.
> https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/
> {code}
> java.io.FileNotFoundException: 
> /tmp/junit8499382858780569091/staging/123/artifacts/artifact_c147efcfc2d7ea666a9e4f5187b115c90903f0fc896a56df9a6ef5d8f3fc9f31
>  (No such file or directory)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-4699) BeamFileSystemArtifactServicesTest.putArtifactsSingleSmallFileTest flake

2018-08-09 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-4699:
---

Assignee: Ankur Goenka  (was: Henning Rohde)

> BeamFileSystemArtifactServicesTest.putArtifactsSingleSmallFileTest flake
> 
>
> Key: BEAM-4699
> URL: https://issues.apache.org/jira/browse/BEAM-4699
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Ankur Goenka
>Priority: Major
>  Labels: portability
>
> I've seen a few transient failures from 
> {{BeamFileSystemArtifactServicesTest}}. I don't recall if they are all 
> {{putArtifactsSingleSmallFileTest}} or how often they occur.
> https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/
> {code}
> java.io.FileNotFoundException: 
> /tmp/junit8499382858780569091/staging/123/artifacts/artifact_c147efcfc2d7ea666a9e4f5187b115c90903f0fc896a56df9a6ef5d8f3fc9f31
>  (No such file or directory)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-4636) Make beam.Run() (and/or friends) thread-safe.

2018-08-09 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde resolved BEAM-4636.
-
   Resolution: Fixed
Fix Version/s: 2.7.0

> Make beam.Run() (and/or friends) thread-safe.
> -
>
> Key: BEAM-4636
> URL: https://issues.apache.org/jira/browse/BEAM-4636
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Eduardo Morales
>Assignee: Henning Rohde
>Priority: Minor
> Fix For: 2.7.0
>
>
> It would be nice to be able to launch dataflow pipelines in parallel. 
> For example, here is my use case:
>  * I consume data produced by my clients/customers.
>  * I need to launch a pipeline on an event dispatch.
>  * I may receive multiple events, from multiple customers at the same time.
>  * Go code could be simpler if synchronization/cooperation wouldn't be needed 
> from goroutines handling each customer. In particular, setting options 
> through flags is cumbersome.
>  * Launching dataflow pipelines serially may not scale if I am able to sign 
> up many customers. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-4699) BeamFileSystemArtifactServicesTest.putArtifactsSingleSmallFileTest flake

2018-08-09 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde updated BEAM-4699:

Labels: portability  (was: )

> BeamFileSystemArtifactServicesTest.putArtifactsSingleSmallFileTest flake
> 
>
> Key: BEAM-4699
> URL: https://issues.apache.org/jira/browse/BEAM-4699
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Henning Rohde
>Priority: Major
>  Labels: portability
>
> I've seen a few transient failures from 
> {{BeamFileSystemArtifactServicesTest}}. I don't recall if they are all 
> {{putArtifactsSingleSmallFileTest}} or how often they occur.
> https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/
> {code}
> java.io.FileNotFoundException: 
> /tmp/junit8499382858780569091/staging/123/artifacts/artifact_c147efcfc2d7ea666a9e4f5187b115c90903f0fc896a56df9a6ef5d8f3fc9f31
>  (No such file or directory)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4636) Make beam.Run() (and/or friends) thread-safe.

2018-08-09 Thread Henning Rohde (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575337#comment-16575337
 ] 

Henning Rohde commented on BEAM-4636:
-

[~exm] I think this should be fixed. I also refactored the Go Dataflow runner 
into a dataflowlib and a driver (similarly to the universal runner), so that 
you can use the lib directly without flags.

> Make beam.Run() (and/or friends) thread-safe.
> -
>
> Key: BEAM-4636
> URL: https://issues.apache.org/jira/browse/BEAM-4636
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Eduardo Morales
>Assignee: Henning Rohde
>Priority: Minor
>
> It would be nice to be able to launch dataflow pipelines in parallel. 
> For example, here is my use case:
>  * I consume data produced by my clients/customers.
>  * I need to launch a pipeline on an event dispatch.
>  * I may receive multiple events, from multiple customers at the same time.
>  * Go code could be simpler if synchronization/cooperation wouldn't be needed 
> from goroutines handling each customer. In particular, setting options 
> through flags is cumbersome.
>  * Launching dataflow pipelines serially may not scale if I am able to sign 
> up many customers. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-3917) Pipeline roots are not computed correctly for CoGBK

2018-08-02 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde resolved BEAM-3917.
-
   Resolution: Fixed
Fix Version/s: 2.7.0

> Pipeline roots are not computed correctly for CoGBK
> ---
>
> Key: BEAM-3917
> URL: https://issues.apache.org/jira/browse/BEAM-3917
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Minor
> Fix For: 2.7.0
>
>
> The roots in graphx/translate.go do not take into account expansions, so a 
> top-level CoGBK would introduce root PTransforms that are not in the roots 
> list.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-4813) Make Go Dataflow translation use protos directly

2018-08-02 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde resolved BEAM-4813.
-
   Resolution: Fixed
Fix Version/s: 2.7.0

> Make Go Dataflow translation use protos directly
> 
>
> Key: BEAM-4813
> URL: https://issues.apache.org/jira/browse/BEAM-4813
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Major
> Fix For: 2.7.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> The Go SDK maintains 2 pipeline translations and keeps various tweaks in 
> sync. It would be better to remove the Dataflow one and extract a more 
> flexible (such as running as a separate proxy) translation from proto to 
> v1beta3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4826) Flink runner sends bad flatten to SDK

2018-08-01 Thread Henning Rohde (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566235#comment-16566235
 ] 

Henning Rohde commented on BEAM-4826:
-

Dataflow drops the flatten altogether.

> Flink runner sends bad flatten to SDK
> -
>
> Key: BEAM-4826
> URL: https://issues.apache.org/jira/browse/BEAM-4826
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Henning Rohde
>Assignee: Ankur Goenka
>Priority: Major
>  Labels: portability
>
> For a Go flatten test w/ 3 input, the Flink runner splits this into 3 bundle 
> descriptors. But it sends the original 3-input flatten but w/ 1 actual input 
> present in each bundle descriptor. This is inconsistent and the SDK shouldn't 
> expect dangling PCollections. In contrast, Dataflow removes the flatten when 
> it does the same split.
> Snippet:
> register: <
>   process_bundle_descriptor: <
> id: "3"
> transforms: <
>   key: "e4"
>   value: <
> unique_name: "github.com/apache/beam/sdks/go/pkg/beam.createFn'1"
> spec: <
>   urn: "urn:beam:transform:pardo:v1"
>   payload: [...]
> >
> inputs: <
>   key: "i0"
>   value: "n3"
> >
> outputs: <
>   key: "i0"
>   value: "n4"
> >
>   >
> >
> transforms: <
>   key: "e7"
>   value: <
> unique_name: "Flatten"
> spec: <
>   urn: "beam:transform:flatten:v1"
> >
> inputs: <
>   key: "i0"
>   value: "n2"
> >
> inputs: <
>   key: "i1"
>   value: "n4" . // <--- only one present.
> >
> inputs: <
>   key: "i2"
>   value: "n6"
> >
> outputs: <
>   key: "i0"
>   value: "n7"
> >
>   >
> >
> [...]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4826) Flink runner sends bad flatten to SDK

2018-08-01 Thread Henning Rohde (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566219#comment-16566219
 ] 

Henning Rohde commented on BEAM-4826:
-

Yeah - unused outputs from ParDo is fine. As you say, they may occur from user 
code naturally.

Flatten is the only transform I can think of, where the runner would split it 
by input. We should just remove it in that case, like Dataflow does (because a 
1 input flatten is a no-op) -- although for Go at least, it would work fine. 
Other transforms generally have an expected input arity and the SDK can't 
execute the bundle if the inputs are not all wired up. Mucking with such inputs 
probably wouldn't work well.

> Flink runner sends bad flatten to SDK
> -
>
> Key: BEAM-4826
> URL: https://issues.apache.org/jira/browse/BEAM-4826
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Henning Rohde
>Assignee: Ankur Goenka
>Priority: Major
>  Labels: portability
>
> For a Go flatten test w/ 3 input, the Flink runner splits this into 3 bundle 
> descriptors. But it sends the original 3-input flatten but w/ 1 actual input 
> present in each bundle descriptor. This is inconsistent and the SDK shouldn't 
> expect dangling PCollections. In contrast, Dataflow removes the flatten when 
> it does the same split.
> Snippet:
> register: <
>   process_bundle_descriptor: <
> id: "3"
> transforms: <
>   key: "e4"
>   value: <
> unique_name: "github.com/apache/beam/sdks/go/pkg/beam.createFn'1"
> spec: <
>   urn: "urn:beam:transform:pardo:v1"
>   payload: [...]
> >
> inputs: <
>   key: "i0"
>   value: "n3"
> >
> outputs: <
>   key: "i0"
>   value: "n4"
> >
>   >
> >
> transforms: <
>   key: "e7"
>   value: <
> unique_name: "Flatten"
> spec: <
>   urn: "beam:transform:flatten:v1"
> >
> inputs: <
>   key: "i0"
>   value: "n2"
> >
> inputs: <
>   key: "i1"
>   value: "n4" . // <--- only one present.
> >
> inputs: <
>   key: "i2"
>   value: "n6"
> >
> outputs: <
>   key: "i0"
>   value: "n7"
> >
>   >
> >
> [...]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-4826) Flink runner sends bad flatten to SDK

2018-08-01 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-4826:
---

Assignee: Ankur Goenka  (was: Aljoscha Krettek)

> Flink runner sends bad flatten to SDK
> -
>
> Key: BEAM-4826
> URL: https://issues.apache.org/jira/browse/BEAM-4826
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Henning Rohde
>Assignee: Ankur Goenka
>Priority: Major
>  Labels: portability
>
> For a Go flatten test w/ 3 input, the Flink runner splits this into 3 bundle 
> descriptors. But it sends the original 3-input flatten but w/ 1 actual input 
> present in each bundle descriptor. This is inconsistent and the SDK shouldn't 
> expect dangling PCollections. In contrast, Dataflow removes the flatten when 
> it does the same split.
> Snippet:
> register: <
>   process_bundle_descriptor: <
> id: "3"
> transforms: <
>   key: "e4"
>   value: <
> unique_name: "github.com/apache/beam/sdks/go/pkg/beam.createFn'1"
> spec: <
>   urn: "urn:beam:transform:pardo:v1"
>   payload: [...]
> >
> inputs: <
>   key: "i0"
>   value: "n3"
> >
> outputs: <
>   key: "i0"
>   value: "n4"
> >
>   >
> >
> transforms: <
>   key: "e7"
>   value: <
> unique_name: "Flatten"
> spec: <
>   urn: "beam:transform:flatten:v1"
> >
> inputs: <
>   key: "i0"
>   value: "n2"
> >
> inputs: <
>   key: "i1"
>   value: "n4" . // <--- only one present.
> >
> inputs: <
>   key: "i2"
>   value: "n6"
> >
> outputs: <
>   key: "i0"
>   value: "n7"
> >
>   >
> >
> [...]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4826) Flink runner sends bad flatten to SDK

2018-07-19 Thread Henning Rohde (JIRA)
Henning Rohde created BEAM-4826:
---

 Summary: Flink runner sends bad flatten to SDK
 Key: BEAM-4826
 URL: https://issues.apache.org/jira/browse/BEAM-4826
 Project: Beam
  Issue Type: Bug
  Components: runner-flink
Reporter: Henning Rohde
Assignee: Aljoscha Krettek


For a Go flatten test w/ 3 input, the Flink runner splits this into 3 bundle 
descriptors. But it sends the original 3-input flatten but w/ 1 actual input 
present in each bundle descriptor. This is inconsistent and the SDK shouldn't 
expect dangling PCollections. In contrast, Dataflow removes the flatten when it 
does the same split.

Snippet:

register: <
  process_bundle_descriptor: <
id: "3"
transforms: <
  key: "e4"
  value: <
unique_name: "github.com/apache/beam/sdks/go/pkg/beam.createFn'1"
spec: <
  urn: "urn:beam:transform:pardo:v1"
  payload: [...]
>
inputs: <
  key: "i0"
  value: "n3"
>
outputs: <
  key: "i0"
  value: "n4"
>
  >
>
transforms: <
  key: "e7"
  value: <
unique_name: "Flatten"
spec: <
  urn: "beam:transform:flatten:v1"
>
inputs: <
  key: "i0"
  value: "n2"
>
inputs: <
  key: "i1"
  value: "n4" . // <--- only one present.
>
inputs: <
  key: "i2"
  value: "n6"
>
outputs: <
  key: "i0"
  value: "n7"
>
  >
>
[...]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


  1   2   3   4   5   >