[jira] [Created] (BEAM-10206) [Go SDK] Jenkins static checks

2020-06-05 Thread Robert Burke (Jira)
Robert Burke created BEAM-10206:
---

 Summary: [Go SDK] Jenkins static checks  
 Key: BEAM-10206
 URL: https://issues.apache.org/jira/browse/BEAM-10206
 Project: Beam
  Issue Type: Improvement
  Components: sdk-go
Reporter: Robert Burke


We should probably hook up static checks (https://staticcheck.io/) in Jenkins 
to avoid style and lint regressions. We should also run them locally and fix 
most of what they report.

Any additional configuration we integrate should probably take into account 
the proto import conventions we've established (see PR 11927) so that we use 
consistent short names.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10206) [Go SDK] Jenkins static checks

2020-06-05 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-10206:

Status: Open  (was: Triage Needed)

> [Go SDK] Jenkins static checks  
> 
>
> Key: BEAM-10206
> URL: https://issues.apache.org/jira/browse/BEAM-10206
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Robert Burke
>Priority: P3
>
> We should probably hook up static checks (https://staticcheck.io/) in Jenkins 
> to avoid style and lint regressions. We should also run them locally and fix 
> most of what they report.
> Any additional configuration we integrate should probably take into account 
> the proto import conventions we've established (see PR 11927) so that we use 
> consistent short names.





[jira] [Comment Edited] (BEAM-10169) ParDo* functions should declare the correct output N in their error message

2020-06-05 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126926#comment-17126926
 ] 

Robert Burke edited comment on BEAM-10169 at 6/5/20, 4:23 PM:
--

Since a given DoFn will only be suitable for one of the ParDo* methods, 
we're probably better off being less polite. And in Go, there's a strong 
"got before want" convention, at least for test outputs. Further, while we're 
panicking, there's no reason we shouldn't clearly indicate the context of the 
error. So I was thinking:

    DoFn {doFnName} has {numOutputs} outputs, but ParDo{parDoNum} requires 
    {parDoNum}. Use ParDo{numOutputs} instead.

Of course, there are also edge cases to consider. E.g., it should just print 
ParDo instead of ParDo1, and if there are more than 7 outputs, it should 
recommend ParDoN instead. Having the doFnName helps localize which DoFn is 
being used incorrectly, and the panic trace will hopefully make the caller's 
line number unambiguous. Further, having the doFnName helps when a user's 
ParDo call happens indirectly.

This isn't perfect for all problems, but it covers most of them, I think.
What do you think of that? [~codeBehindMe]



> ParDo* functions should declare the correct output N in their error message
> ---
>
> Key: BEAM-10169
> URL: https://issues.apache.org/jira/browse/BEAM-10169
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Aaron Tillekeratne
>Priority: P3
>  Labels: noob, starter
>
> User report noted the confusion in the error if you use a DoFn with 0 outputs 
> with beam.ParDo instead of beam.ParDo0. 
> In that case, a panic stack trace is followed by the cryptic: "expected 1 
> output. Found: []"
> We can do better.
> While we can't change the return signature dynamically (that's for ParDoN 
> only), we can instead clearly indicate: 
> *  the DoFn in question.
> * the number of outputs the DoFn has
> * and recommend using ParDo0, ParDo, ParDo2,...ParDo7,  or ParDoN, as 
> appropriate.
> https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/pardo.go#L361 
> would need to change as well as any of the specific cases that follow. 





[jira] [Comment Edited] (BEAM-10169) ParDo* functions should declare the correct output N in their error message

2020-06-05 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126926#comment-17126926
 ] 

Robert Burke edited comment on BEAM-10169 at 6/5/20, 4:22 PM:
--

Since a given DoFn will only be suitable for one of the ParDo* methods, 
we're probably better off being less polite. And in Go, there's a strong 
"got before want" convention, at least for test outputs. Further, while we're 
panicking, there's no reason we shouldn't clearly indicate the context of the 
error. So I was thinking:

```DoFn {doFnName} has {numOutputs} outputs, but ParDo{parDoNum} requires 
{parDoNum}. Use ParDo{numOutputs} instead.```

Of course, there are also edge cases to consider. E.g., it should just print 
ParDo instead of ParDo1, and if there are more than 7 outputs, it should 
recommend ParDoN instead. Having the doFnName helps localize which DoFn is 
being used incorrectly, and the panic trace will hopefully make the caller's 
line number unambiguous. Further, having the doFnName helps when a user's 
ParDo call happens indirectly.

This isn't perfect for all problems, but it covers most of them, I think.
What do you think of that? [~codeBehindMe]



> ParDo* functions should declare the correct output N in their error message
> ---
>
> Key: BEAM-10169
> URL: https://issues.apache.org/jira/browse/BEAM-10169
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Aaron Tillekeratne
>Priority: P3
>  Labels: noob, starter
>
> User report noted the confusion in the error if you use a DoFn with 0 outputs 
> with beam.ParDo instead of beam.ParDo0. 
> In that case, a panic stack trace is followed by the cryptic: "expected 1 
> output. Found: []"
> We can do better.
> While we can't change the return signature dynamically (that's for ParDoN 
> only), we can instead clearly indicate: 
> *  the DoFn in question.
> * the number of outputs the DoFn has
> * and recommend using ParDo0, ParDo, ParDo2,...ParDo7,  or ParDoN, as 
> appropriate.
> https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/pardo.go#L361 
> would need to change as well as any of the specific cases that follow. 





[jira] [Commented] (BEAM-10169) ParDo* functions should declare the correct output N in their error message

2020-06-05 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126926#comment-17126926
 ] 

Robert Burke commented on BEAM-10169:
-

Since a given DoFn will only be suitable for one of the ParDo* methods, 
we're probably better off being less polite. And in Go, there's a strong 
"got before want" convention, at least for test outputs. Further, while we're 
panicking, there's no reason we shouldn't clearly indicate the context of the 
error. So I was thinking:

```DoFn {doFnName} has {numOutputs} outputs, but ParDo{parDoNum} requires 
{parDoNum}. Use ParDo{numOutputs} instead.```

Of course, there are also edge cases to consider. E.g., it should just print 
ParDo instead of ParDo1, and if there are more than 7 outputs, it should 
recommend ParDoN instead. Having the doFnName helps localize which DoFn is 
being used incorrectly, and the panic trace will hopefully make the caller's 
line number unambiguous. Further, having the doFnName helps when a user's 
ParDo call happens indirectly.

This isn't perfect for all problems, but it covers most of them, I think.
What do you think of that?

> ParDo* functions should declare the correct output N in their error message
> ---
>
> Key: BEAM-10169
> URL: https://issues.apache.org/jira/browse/BEAM-10169
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Aaron Tillekeratne
>Priority: P3
>  Labels: noob, starter
>
> User report noted the confusion in the error if you use a DoFn with 0 outputs 
> with beam.ParDo instead of beam.ParDo0. 
> In that case, a panic stack trace is followed by the cryptic: "expected 1 
> output. Found: []"
> We can do better.
> While we can't change the return signature dynamically (that's for ParDoN 
> only), we can instead clearly indicate: 
> *  the DoFn in question.
> * the number of outputs the DoFn has
> * and recommend using ParDo0, ParDo, ParDo2,...ParDo7,  or ParDoN, as 
> appropriate.
> https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/pardo.go#L361 
> would need to change as well as any of the specific cases that follow. 





[jira] [Comment Edited] (BEAM-10169) ParDo* functions should declare the correct output N in their error message

2020-06-05 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126926#comment-17126926
 ] 

Robert Burke edited comment on BEAM-10169 at 6/5/20, 4:19 PM:
--

Since a given DoFn will only be suitable for one of the ParDo* methods, 
we're probably better off being less polite. And in Go, there's a strong 
"got before want" convention, at least for test outputs. Further, while we're 
panicking, there's no reason we shouldn't clearly indicate the context of the 
error. So I was thinking:

```DoFn {doFnName} has {numOutputs} outputs, but ParDo{parDoNum} requires 
{parDoNum}. Use ParDo{numOutputs} instead.```

Of course, there are also edge cases to consider. E.g., it should just print 
ParDo instead of ParDo1, and if there are more than 7 outputs, it should 
recommend ParDoN instead. Having the doFnName helps localize which DoFn is 
being used incorrectly, and the panic trace will hopefully make the caller's 
line number unambiguous. Further, having the doFnName helps when a user's 
ParDo call happens indirectly.

This isn't perfect for all problems, but it covers most of them, I think.
What do you think of that? [~codeBehindMe]



> ParDo* functions should declare the correct output N in their error message
> ---
>
> Key: BEAM-10169
> URL: https://issues.apache.org/jira/browse/BEAM-10169
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Aaron Tillekeratne
>Priority: P3
>  Labels: noob, starter
>
> User report noted the confusion in the error if you use a DoFn with 0 outputs 
> with beam.ParDo instead of beam.ParDo0. 
> In that case, a panic stack trace is followed by the cryptic: "expected 1 
> output. Found: []"
> We can do better.
> While we can't change the return signature dynamically (that's for ParDoN 
> only), we can instead clearly indicate: 
> *  the DoFn in question.
> * the number of outputs the DoFn has
> * and recommend using ParDo0, ParDo, ParDo2,...ParDo7,  or ParDoN, as 
> appropriate.
> https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/pardo.go#L361 
> would need to change as well as any of the specific cases that follow. 





[jira] [Updated] (BEAM-9615) [Go SDK] Beam Schemas

2020-06-04 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-9615:
---
Labels:   (was: stale-assigned)

> [Go SDK] Beam Schemas
> -
>
> Key: BEAM-9615
> URL: https://issues.apache.org/jira/browse/BEAM-9615
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Schema support is required for advanced cross language features in Beam, and 
> has the opportunity to replace the current default JSON encoding of elements.
> Some quick notes, though a better fleshed out doc with details will be 
> forthcoming:
>  * All base coders should be implemented, and listed as coder capabilities. I 
> think only stringutf8 is missing presently.
>  * Should support fairly arbitrary user types, seamlessly. That is, users 
> should be able to rely on it "just working" if their type is compatible.
>  * Should support schema metadata tagging.
> In particular, one breaking shift in the default will be to explicitly fail 
> pipelines if elements have unexported fields, when no other custom coder has 
> been added. This has been a source of errors/dropped data/keys, and a simple 
> warning at construction time won't cut it. However, we could provide a manual 
> "use beam schemas, but ignore unexported fields" registration as a 
> workaround.
> Edit: Doc is now at https://s.apache.org/beam-go-schemas





[jira] [Commented] (BEAM-9615) [Go SDK] Beam Schemas

2020-06-04 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126280#comment-17126280
 ] 

Robert Burke commented on BEAM-9615:


State of the world slowed down progress on this, but I'm now rolling out PRs 
for review.

> [Go SDK] Beam Schemas
> -
>
> Key: BEAM-9615
> URL: https://issues.apache.org/jira/browse/BEAM-9615
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Schema support is required for advanced cross language features in Beam, and 
> has the opportunity to replace the current default JSON encoding of elements.
> Some quick notes, though a better fleshed out doc with details will be 
> forthcoming:
>  * All base coders should be implemented, and listed as coder capabilities. I 
> think only stringutf8 is missing presently.
>  * Should support fairly arbitrary user types, seamlessly. That is, users 
> should be able to rely on it "just working" if their type is compatible.
>  * Should support schema metadata tagging.
> In particular, one breaking shift in the default will be to explicitly fail 
> pipelines if elements have unexported fields, when no other custom coder has 
> been added. This has been a source of errors/dropped data/keys, and a simple 
> warning at construction time won't cut it. However, we could provide a manual 
> "use beam schemas, but ignore unexported fields" registration as a 
> workaround.
> Edit: Doc is now at https://s.apache.org/beam-go-schemas





[jira] [Updated] (BEAM-10169) ParDo* functions should declare the correct output N in their error message

2020-06-02 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-10169:

Status: Open  (was: Triage Needed)

> ParDo* functions should declare the correct output N in their error message
> ---
>
> Key: BEAM-10169
> URL: https://issues.apache.org/jira/browse/BEAM-10169
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Aaron Tillekeratne
>Priority: P3
>  Labels: noob, starter
>
> User report noted the confusion in the error if you use a DoFn with 0 outputs 
> with beam.ParDo instead of beam.ParDo0. 
> In that case, a panic stack trace is followed by the cryptic: "expected 1 
> output. Found: []"
> We can do better.
> While we can't change the return signature dynamically (that's for ParDoN 
> only), we can instead clearly indicate: 
> *  the DoFn in question.
> * the number of outputs the DoFn has
> * and recommend using ParDo0, ParDo, ParDo2,...ParDo7,  or ParDoN, as 
> appropriate.
> https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/pardo.go#L361 
> would need to change as well as any of the specific cases that follow. 





[jira] [Created] (BEAM-10169) ParDo* functions should declare the correct output N in their error message

2020-06-01 Thread Robert Burke (Jira)
Robert Burke created BEAM-10169:
---

 Summary: ParDo* functions should declare the correct output N in 
their error message
 Key: BEAM-10169
 URL: https://issues.apache.org/jira/browse/BEAM-10169
 Project: Beam
  Issue Type: Improvement
  Components: sdk-go
Reporter: Robert Burke


User report noted the confusion in the error if you use a DoFn with 0 outputs 
with beam.ParDo instead of beam.ParDo0. 

In that case, a panic stack trace is followed by the cryptic: "expected 1 
output. Found: []"

We can do better.

While we can't change the return signature dynamically (that's for ParDoN 
only), we can instead clearly indicate: 
*  the DoFn in question.
* the number of outputs the DoFn has
* and recommend using ParDo0, ParDo, ParDo2,...ParDo7,  or ParDoN, as 
appropriate.

https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/pardo.go#L361 would 
need to change as well as any of the specific cases that follow. 





[jira] [Updated] (BEAM-10166) Improve execution time errors

2020-06-01 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-10166:

Labels: beginner n00b starter  (was: )

> Improve execution time errors
> -
>
> Key: BEAM-10166
> URL: https://issues.apache.org/jira/browse/BEAM-10166
> Project: Beam
>  Issue Type: Task
>  Components: sdk-go
>Reporter: Robert Burke
>Priority: P2
>  Labels: beginner, n00b, starter
>
> The Go SDK uses errors returned by DoFns to signal failures to process 
> bundles, and to terminate bundle processing. However, if the preceding DoFn 
> uses emitters, rather than error returns, the code has no choice but to panic 
> to avoid user code handling or ignoring the cross-DoFn error (which could 
> cause data loss or other correctness problems).
> All bundle executions are wrapped in `callNoPanic` to prevent worker 
> termination on such panics, and to terminate just the affected bundle in an 
> orderly way instead. `callNoPanic` uses Go's built-in recover mechanism to 
> get the error and provide a stack trace.
> We can do better.
> The value returned by recover is just an interface{}, which means we could 
> detect the specific type of error it is. In particular, we could have the 
> exec package define an error type that we can detect. If the recovered value 
> is that error, then we could use it to provide a clearer error message than 
> a panic stack trace.
> Such an error wrapper would contain: the error in question, the user DoFn 
> that caused it, and the debug id of the DoFn node (to be related back to the 
> plan).
> Then in `callNoPanic` we could detect this error wrapper and produce a 
> clearer error message based on the existing plan. If the value is not the 
> wrapper, we can maintain the current behavior. This latter part is necessary 
> to handle panics originating in user code.
> To avoid mistaken use by users, which would breach this protocol, we're best 
> off keeping the wrapper unexported from the exec package.





[jira] [Updated] (BEAM-10166) Improve execution time errors

2020-06-01 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-10166:

Issue Type: Improvement  (was: Task)

> Improve execution time errors
> -
>
> Key: BEAM-10166
> URL: https://issues.apache.org/jira/browse/BEAM-10166
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Robert Burke
>Priority: P2
>  Labels: beginner, n00b, starter
>
> The Go SDK uses errors returned by DoFns to signal failures to process 
> bundles, and to terminate bundle processing. However, if the preceding DoFn 
> uses emitters, rather than error returns, the code has no choice but to panic 
> to avoid user code handling or ignoring the cross-DoFn error (which could 
> cause data loss or other correctness problems).
> All bundle executions are wrapped in `callNoPanic` to prevent worker 
> termination on such panics, and to terminate just the affected bundle in an 
> orderly way instead. `callNoPanic` uses Go's built-in recover mechanism to 
> get the error and provide a stack trace.
> We can do better.
> The value returned by recover is just an interface{}, which means we could 
> detect the specific type of error it is. In particular, we could have the 
> exec package define an error type that we can detect. If the recovered value 
> is that error, then we could use it to provide a clearer error message than 
> a panic stack trace.
> Such an error wrapper would contain: the error in question, the user DoFn 
> that caused it, and the debug id of the DoFn node (to be related back to the 
> plan).
> Then in `callNoPanic` we could detect this error wrapper and produce a 
> clearer error message based on the existing plan. If the value is not the 
> wrapper, we can maintain the current behavior. This latter part is necessary 
> to handle panics originating in user code.
> To avoid mistaken use by users, which would breach this protocol, we're best 
> off keeping the wrapper unexported from the exec package.





[jira] [Updated] (BEAM-10166) Improve execution time errors

2020-06-01 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-10166:

Priority: P3  (was: P2)

> Improve execution time errors
> -
>
> Key: BEAM-10166
> URL: https://issues.apache.org/jira/browse/BEAM-10166
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Robert Burke
>Priority: P3
>  Labels: beginner, n00b, starter
>
> The Go SDK uses errors returned by DoFns to signal failures to process 
> bundles, and to terminate bundle processing. However, if the preceding DoFn 
> uses emitters, rather than error returns, the code has no choice but to panic 
> to avoid user code handling or ignoring the cross-DoFn error (which could 
> cause data loss or other correctness problems).
> All bundle executions are wrapped in `callNoPanic` to prevent worker 
> termination on such panics, and to terminate just the affected bundle in an 
> orderly way instead. `callNoPanic` uses Go's built-in recover mechanism to 
> get the error and provide a stack trace.
> We can do better.
> The value returned by recover is just an interface{}, which means we could 
> detect the specific type of error it is. In particular, we could have the 
> exec package define an error type that we can detect. If the recovered value 
> is that error, then we could use it to provide a clearer error message than 
> a panic stack trace.
> Such an error wrapper would contain: the error in question, the user DoFn 
> that caused it, and the debug id of the DoFn node (to be related back to the 
> plan).
> Then in `callNoPanic` we could detect this error wrapper and produce a 
> clearer error message based on the existing plan. If the value is not the 
> wrapper, we can maintain the current behavior. This latter part is necessary 
> to handle panics originating in user code.
> To avoid mistaken use by users, which would breach this protocol, we're best 
> off keeping the wrapper unexported from the exec package.





[jira] [Updated] (BEAM-10166) Improve execution time errors

2020-06-01 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-10166:

Status: Open  (was: Triage Needed)

> Improve execution time errors
> -
>
> Key: BEAM-10166
> URL: https://issues.apache.org/jira/browse/BEAM-10166
> Project: Beam
>  Issue Type: Task
>  Components: sdk-go
>Reporter: Robert Burke
>Priority: P2
>  Labels: beginner, n00b, starter
>
> The Go SDK uses errors returned by DoFns to signal failures to process 
> bundles, and to terminate bundle processing. However, if the preceding DoFn 
> uses emitters, rather than error returns, the code has no choice but to panic 
> to avoid user code handling or ignoring the cross-DoFn error (which could 
> cause data loss or other correctness problems).
> All bundle executions are wrapped in `callNoPanic` to prevent worker 
> termination on such panics, and to terminate just the affected bundle in an 
> orderly way instead. `callNoPanic` uses Go's built-in recover mechanism to 
> get the error and provide a stack trace.
> We can do better.
> The value returned by recover is just an interface{}, which means we could 
> detect the specific type of error it is. In particular, we could have the 
> exec package define an error type that we can detect. If the recovered value 
> is that error, then we could use it to provide a clearer error message than 
> a panic stack trace.
> Such an error wrapper would contain: the error in question, the user DoFn 
> that caused it, and the debug id of the DoFn node (to be related back to the 
> plan).
> Then in `callNoPanic` we could detect this error wrapper and produce a 
> clearer error message based on the existing plan. If the value is not the 
> wrapper, we can maintain the current behavior. This latter part is necessary 
> to handle panics originating in user code.
> To avoid mistaken use by users, which would breach this protocol, we're best 
> off keeping the wrapper unexported from the exec package.





[jira] [Created] (BEAM-10166) Improve execution time errors

2020-06-01 Thread Robert Burke (Jira)
Robert Burke created BEAM-10166:
---

 Summary: Improve execution time errors
 Key: BEAM-10166
 URL: https://issues.apache.org/jira/browse/BEAM-10166
 Project: Beam
  Issue Type: Task
  Components: sdk-go
Reporter: Robert Burke


The Go SDK uses errors returned by DoFns to signal failures to process bundles, 
and to terminate bundle processing. However, if the preceding DoFn uses 
emitters, rather than error returns, the code has no choice but to panic to 
avoid user code handling or ignoring the cross-DoFn error (which could cause 
data loss or other correctness problems).

All bundle executions are wrapped in `callNoPanic` to prevent worker 
termination on such panics, and to terminate just the affected bundle in an 
orderly way instead. `callNoPanic` uses Go's built-in recover mechanism to get 
the error and provide a stack trace.

We can do better.

The value returned by recover is just an interface{}, which means we could 
detect the specific type of error it is. In particular, we could have the exec 
package define an error type that we can detect. If the recovered value is that 
error, then we could use it to provide a clearer error message than a panic 
stack trace.
Such an error wrapper would contain: the error in question, the user DoFn that 
caused it, and the debug id of the DoFn node (to be related back to the plan).

Then in `callNoPanic` we could detect this error wrapper and produce a clearer 
error message based on the existing plan. If the value is not the wrapper, we 
can maintain the current behavior. This latter part is necessary to handle 
panics originating in user code.
To avoid mistaken use by users, which would breach this protocol, we're best 
off keeping the wrapper unexported from the exec package.





[jira] [Resolved] (BEAM-9789) Locking error in harness.go

2020-06-01 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke resolved BEAM-9789.

Resolution: Fixed

> Locking error in harness.go
> ---
>
> Key: BEAM-9789
> URL: https://issues.apache.org/jira/browse/BEAM-9789
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: P2
> Fix For: Not applicable
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When there's an error on lookup or construction of an execution plan, the 
> lock is accidentally held, causing the worker to freeze. 
> This shouldn't be user-affecting, as most plans and lookups complete without 
> error, but a transient gRPC issue on lookup might cause an otherwise healthy 
> worker to deadlock.





[jira] [Resolved] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container

2020-05-29 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke resolved BEAM-9815.

Fix Version/s: Not applicable
   Resolution: Fixed

Dataflow's portable artifact service was updated, so Go Dataflow PostCommits 
are green again.

> beam_PostCommit_Go perma red due to failing to start container
> --
>
> Key: BEAM-9815
> URL: https://issues.apache.org/jira/browse/BEAM-9815
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Robert Bradshaw
>Priority: P1
>  Labels: currently-failing
> Fix For: Not applicable
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> For example,
> [https://builds.apache.org/job/beam_PostCommit_Go/6847/]
> [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z]





[jira] [Updated] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container

2020-05-29 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-9815:
---
Labels:   (was: currently-failing)

> beam_PostCommit_Go perma red due to failing to start container
> --
>
> Key: BEAM-9815
> URL: https://issues.apache.org/jira/browse/BEAM-9815
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Robert Bradshaw
>Priority: P1
> Fix For: Not applicable
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> For example,
> [https://builds.apache.org/job/beam_PostCommit_Go/6847/]
> [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z]





[jira] [Created] (BEAM-10110) Populate pipeline_proto_coder_id field for dataflow.

2020-05-27 Thread Robert Burke (Jira)
Robert Burke created BEAM-10110:
---

 Summary: Populate pipeline_proto_coder_id field for dataflow.
 Key: BEAM-10110
 URL: https://issues.apache.org/jira/browse/BEAM-10110
 Project: Beam
  Issue Type: Task
  Components: runner-dataflow, sdk-go
Reporter: Robert Burke
Assignee: Robert Burke


Dataflow isn't natively translating from the Beam Pipeline Proto yet, and 
instead requires SDKs to translate the graph into its own format. Adding this 
hint for custom coders (coders not known to Dataflow/Beam) avoids having 
Dataflow re-synthesize coders from its format back to the pipeline proto.

Currently there's an awkward restriction on which coders should receive the 
ID, rather than having the SDK apply the field to all of them, but this is a 
good first step to get there. This restriction may be lifted in a subsequent 
Dataflow release. 





[jira] [Commented] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container

2020-05-26 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117009#comment-17117009
 ] 

Robert Burke commented on BEAM-9815:


This seems to be resolved since Dataflow updated.

> beam_PostCommit_Go perma red due to failing to start container
> --
>
> Key: BEAM-9815
> URL: https://issues.apache.org/jira/browse/BEAM-9815
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Robert Bradshaw
>Priority: P1
>  Labels: currently-failing
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> For example,
> [https://builds.apache.org/job/beam_PostCommit_Go/6847/]
> [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z]





[jira] [Updated] (BEAM-9679) Core Transforms | Go SDK Code Katas

2020-05-26 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-9679:
---
Description: 
A kata devoted to core Beam transforms, patterned after 
[https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms]
 where the takeaway is an individual's ability to master the following in 
an Apache Beam pipeline using the Go SDK.

 
||Transform||Pull Request||Status||
|Map|[11564|https://github.com/apache/beam/pull/11564]|Closed|
|GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed|
|CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Open|
|Combine| | |
|Flatten|[11806|https://github.com/apache/beam/pull/11806]| |
|Partition| | |
|Side Input| | |
|Side Output| | |
|Branching| | |
|Composite Transform| | |
|DoFn Additional Parameters| | |

  was:
A kata devoted to core beam transforms patterns after 
[https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms]
 where the take away is an individual's ability to master the following using 
an Apache Beam pipeline using the Golang SDK.

 
||Transform||Pull Request||Status||
|Map|[11564|https://github.com/apache/beam/pull/11564]|Closed|
|GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed|
|CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Open|
|Combine| | |
|Flatten|[11806|https://github.com/apache/beam/pull/11806]| | |
|Partition| | |
|Side Input| | |
|Side Output| | |
|Branching| | |
|Composite Transform| | |
|DoFn Additional Parameters| | |


> Core Transforms | Go SDK Code Katas
> ---
>
> Key: BEAM-9679
> URL: https://issues.apache.org/jira/browse/BEAM-9679
> Project: Beam
>  Issue Type: Sub-task
>  Components: katas, sdk-go
>Reporter: Damon Douglas
>Assignee: Damon Douglas
>Priority: P2
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> A kata devoted to core Beam transforms, patterned after 
> [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms]
>  where the takeaway is an individual's ability to master the following in 
> an Apache Beam pipeline using the Go SDK.
>  
> ||Transform||Pull Request||Status||
> |Map|[11564|https://github.com/apache/beam/pull/11564]|Closed|
> |GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed|
> |CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Open|
> |Combine| | |
> |Flatten|[11806|https://github.com/apache/beam/pull/11806]| |
> |Partition| | |
> |Side Input| | |
> |Side Output| | |
> |Branching| | |
> |Composite Transform| | |
> |DoFn Additional Parameters| | |





[jira] [Updated] (BEAM-9679) Core Transforms | Go SDK Code Katas

2020-05-26 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-9679:
---
Description: 
A kata devoted to core Beam transforms, patterned after 
[https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms]
 where the takeaway is an individual's ability to master the following in 
an Apache Beam pipeline using the Go SDK.

 
||Transform||Pull Request||Status||
|Map|[11564|https://github.com/apache/beam/pull/11564]|Closed|
|GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed|
|CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Open|
|Combine| | |
|Flatten|[11806|https://github.com/apache/beam/pull/11806]| | |
|Partition| | |
|Side Input| | |
|Side Output| | |
|Branching| | |
|Composite Transform| | |
|DoFn Additional Parameters| | |

  was:
A kata devoted to core beam transforms patterns after 
[https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms]
 where the take away is an individual's ability to master the following using 
an Apache Beam pipeline using the Golang SDK.

 
||Transform||Pull Request||Status||
|Map|[11564|https://github.com/apache/beam/pull/11564]|Closed|
|GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed|
|CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Open|
|Combine| | |
|Flatten| | |
|Partition| | |
|Side Input| | |
|Side Output| | |
|Branching| | |
|Composite Transform| | |
|DoFn Additional Parameters| | |


> Core Transforms | Go SDK Code Katas
> ---
>
> Key: BEAM-9679
> URL: https://issues.apache.org/jira/browse/BEAM-9679
> Project: Beam
>  Issue Type: Sub-task
>  Components: katas, sdk-go
>Reporter: Damon Douglas
>Assignee: Damon Douglas
>Priority: P2
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> A kata devoted to core Beam transforms, patterned after 
> [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms]
>  where the takeaway is an individual's ability to master the following in 
> an Apache Beam pipeline using the Go SDK.
>  
> ||Transform||Pull Request||Status||
> |Map|[11564|https://github.com/apache/beam/pull/11564]|Closed|
> |GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed|
> |CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Open|
> |Combine| | |
> |Flatten|[11806|https://github.com/apache/beam/pull/11806]| | |
> |Partition| | |
> |Side Input| | |
> |Side Output| | |
> |Branching| | |
> |Composite Transform| | |
> |DoFn Additional Parameters| | |





[jira] [Assigned] (BEAM-10051) Misordered check WRT closed data readers.

2020-05-21 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke reassigned BEAM-10051:
---

Assignee: Robert Burke

> Misordered check WRT closed data readers.
> -
>
> Key: BEAM-10051
> URL: https://issues.apache.org/jira/browse/BEAM-10051
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This check 
> https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/harness/datamgr.go#L269
> in its current position prevents the "normal teardown" that the reader 
> expects. This means that readers for instructions that terminate early, such 
> as due to splitting, stay resident in memory and never close.
> In practice this is benign as the buffer would already be closed, but with 
> streaming this memory leak would become noticeable.
> The fix is to move the check to after the sentinel check, and additionally 
> check there for early termination to avoid closing the buffer twice.





[jira] [Created] (BEAM-10056) Side Input Validation too tight, doesn't allow CoGBK

2020-05-21 Thread Robert Burke (Jira)
Robert Burke created BEAM-10056:
---

 Summary: Side Input Validation too tight, doesn't allow CoGBK
 Key: BEAM-10056
 URL: https://issues.apache.org/jira/browse/BEAM-10056
 Project: Beam
  Issue Type: Bug
  Components: sdk-go
Reporter: Robert Burke
Assignee: Robert Burke


The following doesn't pass validation, though it should as it's a valid 
signature for ParDo accepting a PCollection>

func (fn *writer) StartBundle(ctx context.Context) error

func (fn *writer) ProcessElement(
ctx context.Context,
key string,
iter1, iter2 func(**clientHistory) bool)

func (fn *writer) FinishBundle(ctx context.Context)

It returns an error:

Missing side inputs in the StartBundle method of a DoFn. If side inputs are 
present in ProcessElement those side inputs must also be present in StartBundle.
Full error:
inserting ParDo in scope root:
graph.AsDoFn: for Fn named <...pii...>/userpackage.writer:
side inputs expected in method StartBundle [recovered]
panic: Missing side inputs in the StartBundle method of a DoFn. If side 
inputs are present in ProcessElement those side inputs must also be present in 
StartBundle.
Full error:
inserting ParDo in scope root:
graph.AsDoFn: for Fn named <...pii...>/userpackage.writer:
side inputs expected in method StartBundle


This is happening in the input-unaware validation, which means that check 
needs to be loosened, with the stricter validation done elsewhere.
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/graph/fn.go#L527

There are "sibling" cases for the DoFn signature:

func (fn *writer) StartBundle(ctx context.Context, side func(**clientHistory) bool) error

func (fn *writer) ProcessElement(
ctx context.Context,
key string,
iter, side func(**clientHistory) bool)

func (fn *writer) FinishBundle(ctx context.Context, side func(**clientHistory) bool)

and

func (fn *writer) StartBundle(ctx context.Context, side1, side2 func(**clientHistory) bool) error

func (fn *writer) ProcessElement(
ctx context.Context,
key string,
side1, side2 func(**clientHistory) bool)

func (fn *writer) FinishBundle(ctx context.Context, side1, side2 func(**clientHistory) bool)

Would be for  > with <*clientHistory> on the 
side, and
  with <*clientHistory> and <*clientHistory> on the side respectively.

These cases can only be fully distinguished with the input, and should produce 
a clear error when PCollection binding is occurring.





[jira] [Updated] (BEAM-10051) Misordered check WRT closed data readers.

2020-05-20 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-10051:

Description: 
This check 
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/harness/datamgr.go#L269

in its current position prevents the "normal teardown" that the reader 
expects. This means that readers for instructions that terminate early, such 
as due to splitting, stay resident in memory and never close.

In practice this is benign as the buffer would already be closed, but with 
streaming this memory leak would become noticeable.

The fix is to move the check to after the sentinel check, and additionally 
check there for early termination to avoid closing the buffer twice.

> Misordered check WRT closed data readers.
> -
>
> Key: BEAM-10051
> URL: https://issues.apache.org/jira/browse/BEAM-10051
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Robert Burke
>Priority: P2
>
> This check 
> https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/harness/datamgr.go#L269
> in its current position prevents the "normal teardown" that the reader 
> expects. This means that readers for instructions that terminate early, such 
> as due to splitting, stay resident in memory and never close.
> In practice this is benign as the buffer would already be closed, but with 
> streaming this memory leak would become noticeable.
> The fix is to move the check to after the sentinel check, and additionally 
> check there for early termination to avoid closing the buffer twice.





[jira] [Created] (BEAM-10051) Misordered check WRT closed data readers.

2020-05-20 Thread Robert Burke (Jira)
Robert Burke created BEAM-10051:
---

 Summary: Misordered check WRT closed data readers.
 Key: BEAM-10051
 URL: https://issues.apache.org/jira/browse/BEAM-10051
 Project: Beam
  Issue Type: Bug
  Components: sdk-go
Reporter: Robert Burke








[jira] [Updated] (BEAM-10049) Add licenses to Go SDK containers

2020-05-20 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-10049:

Description: 
This will be a prerequisite to publishing Go SDK containers as part of the 
release again. See BEAM-9685

There's a tool to pull in dependency license information for a Go package: 
https://github.com/google/go-licenses

And once the license file from PR https://github.com/apache/beam/pull/11657 
is picked up, 
pkg.go.dev will also display them: 
https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=licenses  

  was:This will be a prerequisite to publishing Go SDK containers as part of 
the release again. See BEAM-9685


> Add licenses to Go SDK containers
> -
>
> Key: BEAM-10049
> URL: https://issues.apache.org/jira/browse/BEAM-10049
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, sdk-go
>Reporter: Kyle Weaver
>Priority: P2
>
> This will be a prerequisite to publishing Go SDK containers as part of the 
> release again. See BEAM-9685
> There's a tool to pull in dependency license information for a Go package: 
> https://github.com/google/go-licenses
> And once the license file from PR https://github.com/apache/beam/pull/11657 
> is picked up, 
> pkg.go.dev will also display them: 
> https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=licenses  





[jira] [Updated] (BEAM-9615) [Go SDK] Beam Schemas

2020-05-14 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-9615:
---
Description: 
Schema support is required for advanced cross language features in Beam, and 
has the opportunity to replace the current default JSON encoding of elements.



Some quick notes, though a better fleshed out doc with details will be 
forthcoming:
 * All base coders should be implemented, and listed as coder capabilities. I 
think only stringutf8 is missing presently.
 * Should support fairly arbitrary user types, seamlessly. That is, users 
should be able to rely on it "just working" if their type is compatible.
 * Should support schema metadata tagging.

In particular, one breaking shift in the default will be to explicitly fail 
pipelines if elements have unexported fields, when no other custom coder has 
been added. This has been a source of errors/dropped data/keys and a simple 
warning at construction time won't cut it. However, we could provide a manual 
"use beam schemas, but ignore unexported fields" registration as a workaround.

Edit: Doc is now at https://s.apache.org/beam-go-schemas

  was:
Schema support is required for advanced cross language features in Beam, and 
has the opportunity to replace the current default JSON encoding of elements.

 

Some quick notes, though a better fleshed out doc with details will be 
forthcoming:
 * All base coders should be implemented, and listed as coder capabilities. I 
think only stringutf8 is missing presently.
 * Should support fairly arbitrary user types, seamlessly. That is, users 
should be able to rely on it "just working" if their type is compatible.
 * Should support schema metadata tagging.

In particular, one breaking shift in the default will be to explicitly fail 
pipelines if elements have unexported fields, when no other custom coder has 
been added. This has been a source of errors/dropped data/keys and a simply 
warning at construction time won't cut it. However, we could provide a manual 
"use beam schemas, but ignore unexported fields" registration as a work around.


> [Go SDK] Beam Schemas
> -
>
> Key: BEAM-9615
> URL: https://issues.apache.org/jira/browse/BEAM-9615
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: Major
>
> Schema support is required for advanced cross language features in Beam, and 
> has the opportunity to replace the current default JSON encoding of elements.
> Some quick notes, though a better fleshed out doc with details will be 
> forthcoming:
>  * All base coders should be implemented, and listed as coder capabilities. I 
> think only stringutf8 is missing presently.
>  * Should support fairly arbitrary user types, seamlessly. That is, users 
> should be able to rely on it "just working" if their type is compatible.
>  * Should support schema metadata tagging.
> In particular, one breaking shift in the default will be to explicitly fail 
> pipelines if elements have unexported fields, when no other custom coder has 
> been added. This has been a source of errors/dropped data/keys and a simple 
> warning at construction time won't cut it. However, we could provide a manual 
> "use beam schemas, but ignore unexported fields" registration as a workaround.
> Edit: Doc is now at https://s.apache.org/beam-go-schemas





[jira] [Resolved] (BEAM-7178) Add package comment to "errors" package.

2020-05-13 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke resolved BEAM-7178.

Fix Version/s: Not applicable
   Resolution: Fixed

> Add package comment to "errors" package.
> 
>
> Key: BEAM-7178
> URL: https://issues.apache.org/jira/browse/BEAM-7178
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I forgot to add a package comment to the errors package: 
> [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/internal/errors/errors.go]
> I should fix that.





[jira] [Resolved] (BEAM-8292) Add a Reshuffle PTransform preventing fusion of the surrounding transforms

2020-05-13 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke resolved BEAM-8292.

Fix Version/s: Not applicable
   Resolution: Fixed

> Add a Reshuffle PTransform preventing fusion of the surrounding transforms
> --
>
> Key: BEAM-8292
> URL: https://issues.apache.org/jira/browse/BEAM-8292
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go
>Reporter: John Patoch
>Assignee: Robert Burke
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Reshuffle is a PTransform that takes a PCollection and shuffles the data 
> to help increase parallelism.
> Reshuffle adds a temporary random key to each element, performs a
>  GroupByKey, and finally removes the temporary key.





[jira] [Updated] (BEAM-9982) Replace graphx.MustMarshal with protox.MustEncode

2020-05-13 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-9982:
---
Description: 
A redundant helper function, 
[graphx.MustMarshal|https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L117],
 was accidentally introduced recently. There exists an identical function in a 
different package that was already being used in that same file, 
[protox.MustEncode|https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/util/protox/protox.go#L22]

This task is to remove all instances of the graphx.MustMarshal function and 
replace them with the protox.MustEncode call instead.

  was:
A redundant helper function, 
[graphx.MustMarshal][https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L117],
 was accidentally introduced recently. There exists an identical function in a 
different package that was already being used in that same file, 
[protox.MustEncode][https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/util/protox/protox.go#L22]

This task is to remove all instances of the graphx.MustMarshal function and 
replace them with the protox.MustEncode call instead.


> Replace  graphx.MustMarshal with protox.MustEncode
> --
>
> Key: BEAM-9982
> URL: https://issues.apache.org/jira/browse/BEAM-9982
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: Minor
>
> A redundant helper function, 
> [graphx.MustMarshal|https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L117],
>  was accidentally introduced recently. There exists an identical function in 
> a different package that was already being used in that same file, 
> [protox.MustEncode|https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/util/protox/protox.go#L22]
> This task is to remove all instances of the graphx.MustMarshal function and 
> replace them with the protox.MustEncode call instead.





[jira] [Updated] (BEAM-9982) Replace graphx.MustMarshal with protox.MustEncode

2020-05-13 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-9982:
---
Description: 
A redundant helper function, 
[graphx.MustMarshal][https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L117],
 was accidentally introduced recently. There exists an identical function in a 
different package that was already being used in that same file, 
[protox.MustEncode][https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/util/protox/protox.go#L22]

This task is to remove all instances of the graphx.MustMarshal function and 
replace them with the protox.MustEncode call instead.

  was:
A redundant helper function, 
[graphx.MustMarshal](https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L117),
 was accidentally introduced recently. There exists an identical function in a 
different package that was already being used in that same file, 
[protox.MustEncode](https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/util/protox/protox.go#L22)

This task is to remove all instances of the graphx.MustMarshal function and 
replace them with the protox.MustEncode call instead.


> Replace  graphx.MustMarshal with protox.MustEncode
> --
>
> Key: BEAM-9982
> URL: https://issues.apache.org/jira/browse/BEAM-9982
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: Minor
>
> A redundant helper function, 
> [graphx.MustMarshal][https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L117],
>  was accidentally introduced recently. There exists an identical function in 
> a different package that was already being used in that same file, 
> [protox.MustEncode][https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/util/protox/protox.go#L22]
> This task is to remove all instances of the graphx.MustMarshal function and 
> replace them with the protox.MustEncode call instead.





[jira] [Created] (BEAM-9982) Replace graphx.MustMarshal with protox.MustEncode

2020-05-13 Thread Robert Burke (Jira)
Robert Burke created BEAM-9982:
--

 Summary: Replace  graphx.MustMarshal with protox.MustEncode
 Key: BEAM-9982
 URL: https://issues.apache.org/jira/browse/BEAM-9982
 Project: Beam
  Issue Type: Bug
  Components: sdk-go
Reporter: Robert Burke
Assignee: Robert Burke


A redundant helper function, 
[graphx.MustMarshal](https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L117),
 was accidentally introduced recently. There exists an identical function in a 
different package that was already being used in that same file, 
[protox.MustEncode](https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/util/protox/protox.go#L22)

This task is to remove all instances of the graphx.MustMarshal function and 
replace them with the protox.MustEncode call instead.





[jira] [Commented] (BEAM-9959) Mistakes Computing Composite Inputs and Outputs

2020-05-12 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105750#comment-17105750
 ] 

Robert Burke commented on BEAM-9959:


The right overall fix for that is to check for cycles WRT the composites after 
the topological sort, and print out that there's a cycle involving the 
*composite* node represented by the scope. Anything without the full cycle is 
much harder to debug. Further, the individual PTransforms involved should be 
fully qualified with their composite parent hierarchies to make it easier to 
find where they are coming from. The error should recommend either merging the 
two scopes or moving the new scope objects to their own functions, with one 
scope per function, which makes the bad construction impossible.

> Mistakes Computing Composite Inputs and Outputs
> ---
>
> Key: BEAM-9959
> URL: https://issues.apache.org/jira/browse/BEAM-9959
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: Major
>
> The Go SDK uses a Scope object to manage beam Composites.
> A bug was discovered when consuming a PCollection in both the composite that 
> created it, and in a separate composite.
> Further, the Go SDK should verify that the root hypergraph structure is a DAG 
> and provide a reasonable error if it isn't.  In particular, the leaf nodes of 
> the graph could form a DAG, but due to how the beam.Scope object is used, the 
> hypergraph might not be a DAG.
> E.g. it's possible to write the following in the Go SDK.
>  PTransforms A, B, C and PCollections colA, colB, and Composites a, b.
> A and C are in a, and B is in b.
> A generates colA
> B consumes colA, and generates colB.
> C consumes colA and colB.
> ```
> a := s.Scope(a)
> b := s.Scope(b)
> colA := beam.Impulse(*a*)
> colB := beam.ParDo(*b*, , colA)
> beam.ParDo0(*a*, , colA, beam.SideInput{colB})
> ```
> If it doesn't already, the Go SDK must emit a clear error, and fail pipeline 
> construction.
> If the affected composites are roots in the graph, the cycle prevents being 
> able to topologically sort the root ptransforms for the pipeline graph, which 
> can adversely affect runners.
> The recommendation is always to wrap uses of scope in functions or other 
> scopes to prevent such incorrect constructions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9959) Mistakes Computing Composite Inputs and Outputs

2020-05-12 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-9959:
---
Description: 
The Go SDK uses a Scope object to manage beam Composites.

A bug was discovered when consuming a PCollection in both the composite that 
created it, and in a separate composite.

Further, the Go SDK should verify that the root hypergraph structure is a DAG 
and provide a reasonable error if it isn't.  In particular, the leaf nodes of 
the graph could form a DAG, but due to how the beam.Scope object is used, the 
hypergraph might not be a DAG.

E.g. it's possible to write the following in the Go SDK.

 PTransforms A, B, C and PCollections colA, colB, and Composites a, b.
A and C are in a, and B is in b.
A generates colA
B consumes colA, and generates colB.
C consumes colA and colB.

```
a := s.Scope(a)
b := s.Scope(b)
colA := beam.Impulse(*a*)
colB := beam.ParDo(*b*, , colA)
beam.ParDo0(*a*, , colA, beam.SideInput{colB})
```

If it doesn't already, the Go SDK must emit a clear error, and fail pipeline 
construction.

If the affected composites are roots in the graph, the cycle prevents being 
able to topologically sort the root ptransforms for the pipeline graph, which 
can adversely affect runners.

The recommendation is always to wrap uses of scope in functions or other scopes 
to prevent such incorrect constructions.
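
A minimal sketch of the kind of check described above, written against the standard library only rather than the Beam API. The `transform` type and the `compositeEdges` and `hasCycle` helpers are hypothetical names made up for illustration, not anything from the Go SDK. The idea is to lift the leaf-level data flow to the composite level and then attempt a topological sort (Kahn's algorithm is assumed here); if some composite can never be scheduled, the composites form a cycle.

```go
package main

import "fmt"

// transform is a hypothetical stand-in for a leaf PTransform: it records the
// composite that owns it and the PCollections it reads and writes.
type transform struct {
	name      string
	composite string
	inputs    []string
	outputs   []string
}

// compositeEdges lifts leaf-level data flow to the composite level: there is
// an edge p -> c whenever a transform in composite c consumes a PCollection
// produced in a different composite p.
func compositeEdges(ts []transform) map[string]map[string]bool {
	producer := map[string]string{} // PCollection -> producing composite
	for _, t := range ts {
		for _, out := range t.outputs {
			producer[out] = t.composite
		}
	}
	edges := map[string]map[string]bool{}
	for _, t := range ts {
		if edges[t.composite] == nil {
			edges[t.composite] = map[string]bool{}
		}
		for _, in := range t.inputs {
			if p, ok := producer[in]; ok && p != t.composite {
				if edges[p] == nil {
					edges[p] = map[string]bool{}
				}
				edges[p][t.composite] = true
			}
		}
	}
	return edges
}

// hasCycle attempts a topological sort with Kahn's algorithm. If some node
// can never reach in-degree zero, the graph contains a cycle.
func hasCycle(edges map[string]map[string]bool) bool {
	indeg := map[string]int{}
	for from, tos := range edges {
		if _, ok := indeg[from]; !ok {
			indeg[from] = 0
		}
		for to := range tos {
			indeg[to]++
		}
	}
	var ready []string
	for n, d := range indeg {
		if d == 0 {
			ready = append(ready, n)
		}
	}
	visited := 0
	for len(ready) > 0 {
		n := ready[len(ready)-1]
		ready = ready[:len(ready)-1]
		visited++
		for to := range edges[n] {
			indeg[to]--
			if indeg[to] == 0 {
				ready = append(ready, to)
			}
		}
	}
	return visited != len(indeg)
}

func main() {
	// The example from the description: the leaves A -> B -> C form a DAG,
	// but composite a feeds b (colA) and b feeds a (colB).
	pipeline := []transform{
		{name: "A", composite: "a", outputs: []string{"colA"}},
		{name: "B", composite: "b", inputs: []string{"colA"}, outputs: []string{"colB"}},
		{name: "C", composite: "a", inputs: []string{"colA", "colB"}},
	}
	fmt.Println("composite cycle:", hasCycle(compositeEdges(pipeline)))
}
```

With C removed, the only composite-level edge is a -> b and the same check reports no cycle, which matches the recommendation to keep each scope's transforms together.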




  was:
The Go SDK uses a Scope object to manage beam Composites.

A bug was discovered when consuming a PCollection in both the composite that 
created it, and in a separate composite.

Further, the Go SDK should verify that the root hypergraph structure is a DAG 
and provides a reasonable error.  In particular, the leaf nodes of the graph 
could form a DAG, but due to how the beam.Scope object is used, might cause the 
hypergraph to not be a DAG.

Eg. It's possible to write the following in the Go SDK.

 PTransforms A, B, C and PCollections colA, colB, and Composites a, b.
A and C are in a, and B are in b.
A generates colA
B consumes colA, and generates colB.
C consumes colB.

```
a := s.Scope(a)
b := s.Scope(b)
colA := beam.Impulse(*a*)
colB := beam.ParDo(*b*, , colA)
beam.ParDo0(*a*, , colA)
```

If it doesn't already the Go SDK must emit a clear error, and fail pipeline 
construction.

If the affected composites are roots in the graph, the cycle prevents being 
able to topologically sort the root ptransforms for the pipeline graph, which 
can adversely affect runners.

The recommendation is always to wrap uses of scope in functions or other scopes 
to prevent such incorrect constructions.





> Mistakes Computing Composite Inputs and Outputs
> ---
>
> Key: BEAM-9959
> URL: https://issues.apache.org/jira/browse/BEAM-9959
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: Major
>
> The Go SDK uses a Scope object to manage beam Composites.
> A bug was discovered when consuming a PCollection in both the composite that 
> created it, and in a separate composite.
> Further, the Go SDK should verify that the root hypergraph structure is a DAG 
> and provide a reasonable error if it is not.  In particular, the leaf nodes of 
> the graph could form a DAG while the way the beam.Scope object is used causes 
> the hypergraph to not be a DAG.
> E.g. it's possible to write the following in the Go SDK.
> Take PTransforms A, B, C, PCollections colA, colB, and Composites a, b.
> A and C are in a, and B is in b.
> A generates colA
> B consumes colA, and generates colB.
> C consumes colA and colB.
> ```
> a := s.Scope(a)
> b := s.Scope(b)
> colA := beam.Impulse(*a*)
> colB := beam.ParDo(*b*, , colA)
> beam.ParDo0(*a*, , colA, beam.SideInput{colB})
> ```
> If it doesn't already, the Go SDK must emit a clear error, and fail pipeline 
> construction.
> If the affected composites are roots in the graph, the cycle prevents being 
> able to topologically sort the root ptransforms for the pipeline graph, which 
> can adversely affect runners.
> The recommendation is always to wrap uses of scope in functions or other 
> scopes to prevent such incorrect constructions.





[jira] [Created] (BEAM-9959) Mistakes Computing Composite Inputs and Outputs

2020-05-12 Thread Robert Burke (Jira)
Robert Burke created BEAM-9959:
--

 Summary: Mistakes Computing Composite Inputs and Outputs
 Key: BEAM-9959
 URL: https://issues.apache.org/jira/browse/BEAM-9959
 Project: Beam
  Issue Type: Bug
  Components: sdk-go
Reporter: Robert Burke
Assignee: Robert Burke


The Go SDK uses a Scope object to manage beam Composites.

A bug was discovered when consuming a PCollection in both the composite that 
created it, and in a separate composite.

Further, the Go SDK should verify that the root hypergraph structure is a DAG 
and provides a reasonable error.  In particular, the leaf nodes of the graph 
could form a DAG, but due to how the beam.Scope object is used, might cause the 
hypergraph to not be a DAG.

Eg. It's possible to write the following in the Go SDK.

 PTransforms A, B, C and PCollections colA, colB, and Composites a, b.
A and C are in a, and B are in b.
A generates colA
B consumes colA, and generates colB.
C consumes colB.

```
a := s.Scope(a)
b := s.Scope(b)
colA := beam.Impulse(*a*)
colB := beam.ParDo(*b*, , colA)
beam.ParDo0(*a*, , colA)
```

If it doesn't already the Go SDK must emit a clear error, and fail pipeline 
construction.

If the affected composites are roots in the graph, the cycle prevents being 
able to topologically sort the root ptransforms for the pipeline graph, which 
can adversely affect runners.

The recommendation is always to wrap uses of scope in functions or other scopes 
to prevent such incorrect constructions.








[jira] [Resolved] (BEAM-7030) Make it possible to display the full PCollection when passert fails

2020-05-06 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke resolved BEAM-7030.

Fix Version/s: Not applicable
 Assignee: Paul Fisher
   Resolution: Fixed

I believe this got addressed in BEAM-9731. PAssert now prints the whole 
PCollection under test, and will soon also sort it for easier comparison.
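
To illustrate why sorting helps, here is a small standard-library sketch. It is not the passert implementation; `sortedCopy` and `diffMessage` are hypothetical helpers, and plain string slices stand in for PCollections. Both sides are sorted before being printed, so matching elements line up and the difference is easy to spot.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedCopy returns a sorted copy so the caller's slice is left untouched.
func sortedCopy(in []string) []string {
	out := append([]string(nil), in...)
	sort.Strings(out)
	return out
}

// diffMessage returns "" when got and want hold the same elements in any
// order, and otherwise a message listing both collections in sorted order.
func diffMessage(got, want []string) string {
	g, w := sortedCopy(got), sortedCopy(want)
	same := len(g) == len(w)
	for i := 0; same && i < len(g); i++ {
		same = g[i] == w[i]
	}
	if same {
		return ""
	}
	return fmt.Sprintf("collections differ\n  got:  %v\n  want: %v", g, w)
}

func main() {
	fmt.Println(diffMessage([]string{"b", "a", "c"}, []string{"a", "b", "d"}))
}
```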

> Make it possible to display the full PCollection when passert fails
> ---
>
> Key: BEAM-7030
> URL: https://issues.apache.org/jira/browse/BEAM-7030
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go, testing
>Reporter: Damien Desfontaines
>Assignee: Paul Fisher
>Priority: Major
> Fix For: Not applicable
>
>
> If I use passert.Equals with two PCollections, and the test fails, the error 
> message only says something like "value _ present, but not expected". This is 
> not very useful — to debug failing tests, I'd like to print both PCollections 
> so I can compare them directly instead.





[jira] [Commented] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container

2020-04-25 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092409#comment-17092409
 ] 

Robert Burke commented on BEAM-9815:


I'm going to stop looking at this point, as the open source side of the 
investigation seems to be exhausted. The code eventually runs a binary on the 
Dataflow side to set up an artifact service that [proxies reading from 
GCS](https://github.com/apache/beam/blob/24361d1b5981ef7d18e586a8e5deaf683f4329f1/sdks/go/pkg/beam/artifact/gcsproxy/retrieval.go#L82).
That code is in the [artifact 
package](https://github.com/apache/beam/blob/24361d1b5981ef7d18e586a8e5deaf683f4329f1/sdks/go/pkg/beam/artifact/materialize.go#L135),
though it's called from something inside Google. There might be some kind of 
version skew with the container/boot.go code, which would then need updating on 
the Google side. I'm unable to figure that one out at this time.


> beam_PostCommit_Go perma red due to failing to start container
> --
>
> Key: BEAM-9815
> URL: https://issues.apache.org/jira/browse/BEAM-9815
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Robert Bradshaw
>Priority: Critical
>  Labels: currently-failing
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> For example,
> [https://builds.apache.org/job/beam_PostCommit_Go/6847/]
> [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z]





[jira] [Commented] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container

2020-04-25 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092397#comment-17092397
 ] 

Robert Burke commented on BEAM-9815:


The provision info for Dataflow has no dependencies listed (though the pipeline 
proto does have one listed), and also no retrieval token. So a "quick" fix 
might be to hack it to assume something is there.

The data there was not at the "assumed" path on the worker, and it never used 
the staging location for the worker binary.

https://pantheon.corp.google.com/storage/browser/temp-storage-for-end-to-end-tests/staging-validatesrunner-test/go-1-1587852331972351853/?forceOnBucketsSortingFiltering=false=apache-beam-testing

But the model was able to be found at the staging location (which makes sense 
since we can see it in the dataflow explorer there).
I 2020-04-25T22:06:49.848812Z Downloading: 
gs://temp-storage-for-end-to-end-tests/staging-validatesrunner-test/go-2-1587852331972428268/model
 to /tmp/tmp/download.0.148567144/file.0 (size: 3 Kb, MD5: 
sceqVeC8VgLLgWXiRJ0Kvg==) 
I 2020-04-25T22:06:49.952462Z Download completed: 
gs://temp-storage-for-end-to-end-tests/staging-validatesrunner-test/go-2-1587852331972428268/model
 (duration: 103 ms @ 37 Kb/s) 


So I'm now going to look at where the model is actually getting downloaded 
(since it's doing that somewhere, and printing it out), and see why the worker 
binary is not getting the same treatment.

> beam_PostCommit_Go perma red due to failing to start container
> --
>
> Key: BEAM-9815
> URL: https://issues.apache.org/jira/browse/BEAM-9815
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Robert Bradshaw
>Priority: Critical
>  Labels: currently-failing
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> For example,
> [https://builds.apache.org/job/beam_PostCommit_Go/6847/]
> [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z]





[jira] [Commented] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container

2020-04-25 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092390#comment-17092390
 ] 

Robert Burke commented on BEAM-9815:


`2020/04/25 21:29:37 Initializing AWESOME Go harness: /opt/apache/beam/boot 
--id=1 --logging_endpoint=localhost:12370 --control_endpoint=localhost:12371 
--artifact_endpoint=localhost:12372 --provision_endpoint=localhost:12373 
--semi_persist_dir=/var/opt/google`
 
is what Dataflow tells the boot container, while for Flink, only the provision 
service is provided.

`12:16:29 2020/04/25 19:16:28 Initializing Go harness: /opt/apache/beam/boot 
--id=23-1 --provision_endpoint=localhost:46247`

> beam_PostCommit_Go perma red due to failing to start container
> --
>
> Key: BEAM-9815
> URL: https://issues.apache.org/jira/browse/BEAM-9815
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Robert Bradshaw
>Priority: Critical
>  Labels: currently-failing
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> For example,
> [https://builds.apache.org/job/beam_PostCommit_Go/6847/]
> [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z]





[jira] [Comment Edited] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container

2020-04-25 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092381#comment-17092381
 ] 

Robert Burke edited comment on BEAM-9815 at 4/25/20, 9:38 PM:
--

Further investigation reveals that there must be some other logic error in the 
artifact fetching in the boot harness.

First I thought it was an entirely different harness container that was on the 
dataflow side, but it turns out if I modify the [go binary 
booter](https://github.com/apache/beam/blob/master/sdks/go/container/boot.go) 
it is reflected when submitting the job, so for some reason the artifacts are 
either being queried incorrectly, OR being staged incorrectly, leading to the 
"No artifacts staged" message.


was (Author: lostluck):
Further investigation reveals that there must be some other logic error in the 
artifact fetching in the boot harness.

First I though it was an entirely different harness container that was on the 
dataflow side, but it turns out if I modify the [go binary 
booter](https://github.com/apache/beam/blob/master/sdks/go/container/boot.go) 
it is reflected when submitting the job, so for some reason the artifacts are 
either being queried incorrectly, OR being staged incorrectly, leading to the 
"No artifacts staged" message.

> beam_PostCommit_Go perma red due to failing to start container
> --
>
> Key: BEAM-9815
> URL: https://issues.apache.org/jira/browse/BEAM-9815
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Robert Bradshaw
>Priority: Critical
>  Labels: currently-failing
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> For example,
> [https://builds.apache.org/job/beam_PostCommit_Go/6847/]
> [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z]





[jira] [Commented] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container

2020-04-25 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092381#comment-17092381
 ] 

Robert Burke commented on BEAM-9815:


Further investigation reveals that there must be some other logic error in the 
artifact fetching in the boot harness.

First I thought it was an entirely different harness container that was on the 
dataflow side, but it turns out if I modify the [go binary 
booter](https://github.com/apache/beam/blob/master/sdks/go/container/boot.go) 
it is reflected when submitting the job, so for some reason the artifacts are 
either being queried incorrectly, OR being staged incorrectly, leading to the 
"No artifacts staged" message.

> beam_PostCommit_Go perma red due to failing to start container
> --
>
> Key: BEAM-9815
> URL: https://issues.apache.org/jira/browse/BEAM-9815
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Robert Bradshaw
>Priority: Critical
>  Labels: currently-failing
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> For example,
> [https://builds.apache.org/job/beam_PostCommit_Go/6847/]
> [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z]





[jira] [Commented] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container

2020-04-25 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092371#comment-17092371
 ] 

Robert Burke commented on BEAM-9815:


Confirmed that the "dev" tag doesn't exist:
https://hub.docker.com/r/apache/beam_go_sdk/tags

That tag comes from 
https://github.com/apache/beam/blob/master/sdks/go/test/run_integration_tests.sh#L152

I think something else must have changed as well for this to have worked 
before, since the tests should be building and pushing an image to the beam 
testing repo, and not defaulting to the "dev" tagged image. That path should 
only be for the universal Python runner.

I've now confirmed that that comparison is not working for some reason...



> beam_PostCommit_Go perma red due to failing to start container
> --
>
> Key: BEAM-9815
> URL: https://issues.apache.org/jira/browse/BEAM-9815
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Robert Bradshaw
>Priority: Critical
>  Labels: currently-failing
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> For example,
> [https://builds.apache.org/job/beam_PostCommit_Go/6847/]
> [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z]





[jira] [Commented] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container

2020-04-25 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092314#comment-17092314
 ] 

Robert Burke commented on BEAM-9815:


Digging into this further, it reads like the "apache/beam_go_sdk:dev" image is 
not found, and IIRC we changed all that up lately, so it's probable that the 
container was never built and doesn't exist at all at this point.

This commit removed the Go SDK containers from the release, which was when we 
moved from our own repo to the official apache repo.
https://github.com/apache/beam/commit/061c5c7db5064e20eef50a6a51f976235b30aae2

> beam_PostCommit_Go perma red due to failing to start container
> --
>
> Key: BEAM-9815
> URL: https://issues.apache.org/jira/browse/BEAM-9815
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Robert Bradshaw
>Priority: Critical
>  Labels: currently-failing
>
> For example,
> [https://builds.apache.org/job/beam_PostCommit_Go/6847/]
> [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z]





[jira] [Updated] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container

2020-04-25 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-9815:
---
Component/s: sdk-go

> beam_PostCommit_Go perma red due to failing to start container
> --
>
> Key: BEAM-9815
> URL: https://issues.apache.org/jira/browse/BEAM-9815
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Robert Bradshaw
>Priority: Critical
>  Labels: currently-failing
>
> For example,
> [https://builds.apache.org/job/beam_PostCommit_Go/6847/]
> [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z]





[jira] [Resolved] (BEAM-9459) Go Postcommit failing at GBK

2020-04-25 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke resolved BEAM-9459.

Fix Version/s: Not applicable
   Resolution: Fixed

The change that caused the original issue was rolled back.

> Go Postcommit failing at GBK
> 
>
> Key: BEAM-9459
> URL: https://issues.apache.org/jira/browse/BEAM-9459
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, test-failures
>Reporter: Daniel Oliveira
>Assignee: Robert Burke
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Example: [https://builds.apache.org/job/beam_PostCommit_Go_PR/106/]
> [https://scans.gradle.com/s/es67rfaomu26m]
>  
> {noformat}
> 2020/03/06 00:47:41 Submitted job: 2020-03-05_16_47_40-13139296997856231782
> 2020/03/06 00:47:41 Submitted job: 2020-03-05_16_47_40-13139296997856231782 
> 2020/03/06 00:47:41 Console: 
> https://console.cloud.google.com/dataflow/job/2020-03-05_16_47_40-13139296997856231782?project=apache-beam-testing
> 2020/03/06 00:47:41 Logs: 
> https://console.cloud.google.com/logs/viewer?project=apache-beam-testing=dataflow_step%2Fjob_id%2F2020-03-05_16_47_40-13139296997856231782
> ...
> 2020/03/06 00:50:41 Test cogbk:cogbk failed: job 
> 2020-03-05_16_47_40-13139296997856231782 failed{noformat}
> And then in the console logs: 
> [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=dataflow_step%2Fjob_id%2F2020-03-05_16_47_40-13139296997856231782=500=false=2020-03-06T01:01:14.21000Z==true=2020-03-06T00:01:14.460Z=2020-03-06T01:01:14.460Z=PT1H=2020-03-06T00:49:14.413355915Z]
>  
> {code:java}
> exception: "java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: Error received from SDK harness for instruction 
> -165: process bundle failed for instruction -165 using plan -122 : panic: 
> Unexpected coder: 
> CoGBK goroutine 81 
> [running]:
> runtime/debug.Stack(0xc001103970, 0xd2c5e0, 0xc000bd7f40)
>   /usr/lib/go-1.12/src/runtime/debug/stack.go:24 +0x9d
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic.func1(0xc001103b90)
>   
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:40
>  +0x60
> panic(0xd2c5e0, 0xc000bd7f40)
>   /usr/lib/go-1.12/src/runtime/panic.go:522 +0x1b5
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.MakeElementEncoder(0xc000b99cc0,
>  0xc000aa4930, 0xc000b64a00)
>   
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/coder.go:91
>  +0x479
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*PCollection).Up(0xc000af3dd0,
>  0x10018e0, 0xc000b57f80, 0x0, 0xc000346b50)
>   
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/pcollection.go:59
>  +0xfe
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic(0x10018e0,
>  0xc000b57f80, 0xc000346c28, 0x0, 0x0)
>   
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:43
>  +0x6c
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*Plan).Execute(0xc0002623f0,
>  0x10018e0, 0xc000b57f80, 0xc0002365a0, 0x4, 0xff0340, 0xc000aa4750, 
> 0xff0380, 0xc000b57fc0, 0xc000346de0, ...)
>   
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/plan.go:93
>  +0xdf
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.(*control).handleInstruction(0xc0001f4680,
>  0x10017a0, 0xc0001bafc0, 0xc000b57dc0, 0xc0001bafc0)
>   
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/harness.go:211
>  +0xa34
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.Main.func2(0x10017a0,
>  0xc0001bafc0, 

[jira] [Commented] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container

2020-04-24 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091727#comment-17091727
 ] 

Robert Burke commented on BEAM-9815:


If I knew how to update the Dataflow artifact boot container that hasn't been 
updated, I would do it, but I've been unable to trace where and how that 
container is generated, chosen, or set by the service.

Last I heard, it might require a Dataflow service release to resolve. Given 
that Dataflow doesn't yet support using the Go SDK on its service, I 
suspect this will not be a high priority at this time.

I'm happy, however, that the Flink and Spark runs, which are the same tests, 
are still passing.

> beam_PostCommit_Go perma red due to failing to start container
> --
>
> Key: BEAM-9815
> URL: https://issues.apache.org/jira/browse/BEAM-9815
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Robert Bradshaw
>Priority: Critical
>  Labels: currently-failing
>
> For example,
> [https://builds.apache.org/job/beam_PostCommit_Go/6847/]
> [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z]





[jira] [Assigned] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container

2020-04-24 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke reassigned BEAM-9815:
--

Assignee: Robert Bradshaw  (was: Robert Burke)

> beam_PostCommit_Go perma red due to failing to start container
> --
>
> Key: BEAM-9815
> URL: https://issues.apache.org/jira/browse/BEAM-9815
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Robert Bradshaw
>Priority: Critical
>  Labels: currently-failing
>
> For example,
> [https://builds.apache.org/job/beam_PostCommit_Go/6847/]
> [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z]





[jira] [Commented] (BEAM-9459) Go Postcommit failing at GBK

2020-04-23 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091087#comment-17091087
 ] 

Robert Burke commented on BEAM-9459:


Dataflow Postcommits are broken since the Artifact API was changed
recently, and the Dataflow boot container that fetches artifacts hasn't been
updated yet.




> Go Postcommit failing at GBK
> 
>
> Key: BEAM-9459
> URL: https://issues.apache.org/jira/browse/BEAM-9459
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, test-failures
>Reporter: Daniel Oliveira
>Assignee: Robert Burke
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Example: [https://builds.apache.org/job/beam_PostCommit_Go_PR/106/]
> [https://scans.gradle.com/s/es67rfaomu26m]
>  
> {noformat}
> 2020/03/06 00:47:41 Submitted job: 2020-03-05_16_47_40-13139296997856231782
> 2020/03/06 00:47:41 Submitted job: 2020-03-05_16_47_40-13139296997856231782 
> 2020/03/06 00:47:41 Console: 
> https://console.cloud.google.com/dataflow/job/2020-03-05_16_47_40-13139296997856231782?project=apache-beam-testing
> 2020/03/06 00:47:41 Logs: 
> https://console.cloud.google.com/logs/viewer?project=apache-beam-testing=dataflow_step%2Fjob_id%2F2020-03-05_16_47_40-13139296997856231782
> ...
> 2020/03/06 00:50:41 Test cogbk:cogbk failed: job 
> 2020-03-05_16_47_40-13139296997856231782 failed{noformat}
> And then in the console logs: 
> [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=dataflow_step%2Fjob_id%2F2020-03-05_16_47_40-13139296997856231782=500=false=2020-03-06T01:01:14.21000Z==true=2020-03-06T00:01:14.460Z=2020-03-06T01:01:14.460Z=PT1H=2020-03-06T00:49:14.413355915Z]
>  
> {code:java}
> exception: "java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: Error received from SDK harness for instruction 
> -165: process bundle failed for instruction -165 using plan -122 : panic: 
> Unexpected coder: 
> CoGBK goroutine 81 
> [running]:
> runtime/debug.Stack(0xc001103970, 0xd2c5e0, 0xc000bd7f40)
>   /usr/lib/go-1.12/src/runtime/debug/stack.go:24 +0x9d
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic.func1(0xc001103b90)
>   
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:40
>  +0x60
> panic(0xd2c5e0, 0xc000bd7f40)
>   /usr/lib/go-1.12/src/runtime/panic.go:522 +0x1b5
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.MakeElementEncoder(0xc000b99cc0,
>  0xc000aa4930, 0xc000b64a00)
>   
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/coder.go:91
>  +0x479
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*PCollection).Up(0xc000af3dd0,
>  0x10018e0, 0xc000b57f80, 0x0, 0xc000346b50)
>   
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/pcollection.go:59
>  +0xfe
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic(0x10018e0,
>  0xc000b57f80, 0xc000346c28, 0x0, 0x0)
>   
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:43
>  +0x6c
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*Plan).Execute(0xc0002623f0,
>  0x10018e0, 0xc000b57f80, 0xc0002365a0, 0x4, 0xff0340, 0xc000aa4750, 
> 0xff0380, 0xc000b57fc0, 0xc000346de0, ...)
>   
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/plan.go:93
>  +0xdf
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.(*control).handleInstruction(0xc0001f4680,
>  0x10017a0, 0xc0001bafc0, 0xc000b57dc0, 0xc0001bafc0)
>   
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/harness.go:211
>  +0xa34
> 

[jira] [Created] (BEAM-9789) Locking error in harness.go

2020-04-20 Thread Robert Burke (Jira)
Robert Burke created BEAM-9789:
--

 Summary: Locking error in harness.go
 Key: BEAM-9789
 URL: https://issues.apache.org/jira/browse/BEAM-9789
 Project: Beam
  Issue Type: Bug
  Components: sdk-go
Reporter: Robert Burke
Assignee: Robert Burke
 Fix For: Not applicable


When lookup or construction of an execution plan returns an error, the lock 
is accidentally left held, causing the worker to freeze. 

This shouldn't affect most users, since plan lookups and construction normally 
succeed, but a transient gRPC issue on lookup could cause an otherwise healthy 
worker to deadlock.
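
For illustration, a minimal sketch of the failure mode and the usual remedy. 
This is not the actual harness.go code: planCache, getPlan, failingLookup and 
workingLookup are hypothetical names. Releasing the mutex with defer means an 
error return cannot leave it held.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// planCache is a hypothetical stand-in for the harness's plan bookkeeping.
type planCache struct {
	mu    sync.Mutex
	plans map[string]string
}

// getPlan returns the cached plan for id, building it with lookup on a miss.
// The deferred Unlock runs on every return path, including the error one.
func (c *planCache) getPlan(id string, lookup func(string) (string, error)) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	if p, ok := c.plans[id]; ok {
		return p, nil
	}
	p, err := lookup(id)
	if err != nil {
		return "", fmt.Errorf("plan %q: %w", id, err)
	}
	c.plans[id] = p
	return p, nil
}

// failingLookup simulates a transient lookup failure.
func failingLookup(string) (string, error) {
	return "", errors.New("transient gRPC failure")
}

// workingLookup simulates a successful lookup.
func workingLookup(id string) (string, error) {
	return "plan-for-" + id, nil
}

func main() {
	c := &planCache{plans: map[string]string{}}
	_, err := c.getPlan("a", failingLookup)
	fmt.Println("first call failed:", err != nil)
	// This call would block forever if the first had returned with the lock held.
	p, _ := c.getPlan("a", workingLookup)
	fmt.Println(p)
}
```

The sketch holds the lock across the lookup for brevity; a real harness would 
probably not want to hold a lock during a remote call.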



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8472) Get default GCP region from gcloud

2020-04-13 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082711#comment-17082711
 ] 

Robert Burke edited comment on BEAM-8472 at 4/13/20, 10:26 PM:
---

Just to be clear, the protocol is to check the environment variables, and then 
execute the gcloud command?

Which would be to use [os.Getenv|https://godoc.org/pkg/os#Getenv] with 
"CLOUDSDK_COMPUTE_REGION" and then use the [os/exec 
package|https://godoc.org/pkg/os/exec] to call the gcloud executable?
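
If that is the intended order, it could be sketched roughly as below. The 
names resolveRegion, gcloudRegion and fallbackRegion are made up for this 
example, and the exact gcloud invocation ("gcloud config get-value 
compute/region") is an assumption rather than something settled in this issue.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// fallbackRegion is the default the issue says is used when --region is unset.
const fallbackRegion = "us-central1"

// resolveRegion checks the environment variable first, then asks gcloud, then
// falls back. getenv and gcloud are parameters so the ordering can be tested
// without a real environment or a gcloud install.
func resolveRegion(getenv func(string) string, gcloud func() (string, error)) string {
	if r := strings.TrimSpace(getenv("CLOUDSDK_COMPUTE_REGION")); r != "" {
		return r
	}
	if out, err := gcloud(); err == nil {
		if r := strings.TrimSpace(out); r != "" {
			return r
		}
	}
	return fallbackRegion
}

// gcloudRegion shells out to the gcloud executable (assumed invocation).
func gcloudRegion() (string, error) {
	out, err := exec.Command("gcloud", "config", "get-value", "compute/region").Output()
	return string(out), err
}

func main() {
	fmt.Println(resolveRegion(os.Getenv, gcloudRegion))
}
```

Passing the two lookups in as functions keeps the precedence logic separate 
from the process environment, which makes it easy to unit test.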


was (Author: lostluck):
Just to be clear, the protocol is to check the environment variables, and then 
execute the gcloud command?

Which would be to use [os.Getenv|https://godoc.corp.google.com/pkg/os#Getenv] 
with "CLOUDSDK_COMPUTE_REGION" and then use the [os/exec 
package|https://godoc.corp.google.com/pkg/os/exec] to call the gcloud 
executable?

> Get default GCP region from gcloud
> --
>
> Key: BEAM-8472
> URL: https://issues.apache.org/jira/browse/BEAM-8472
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-go
>Reporter: Kyle Weaver
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Currently, we default to us-central1 if --region flag is not set. The Google 
> Cloud SDK generally tries to get a default value in this case for 
> convenience, which we should follow. 
> [https://cloud.google.com/compute/docs/gcloud-compute/#order_of_precedence_for_default_properties]
> Update 11/12: this is complete for Python and Java, Go remains.





[jira] [Commented] (BEAM-8472) Get default GCP region from gcloud

2020-04-13 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082711#comment-17082711
 ] 

Robert Burke commented on BEAM-8472:


Just to be clear, the protocol is to check the environment variables, and then 
execute the gcloud command?

Which would be to use [os.Getenv|https://godoc.corp.google.com/pkg/os#Getenv] 
with "CLOUDSDK_COMPUTE_REGION" and then use the [os/exec 
package|https://godoc.corp.google.com/pkg/os/exec] to call the gcloud 
executable?

> Get default GCP region from gcloud
> --
>
> Key: BEAM-8472
> URL: https://issues.apache.org/jira/browse/BEAM-8472
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-go
>Reporter: Kyle Weaver
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Currently, we default to us-central1 if --region flag is not set. The Google 
> Cloud SDK generally tries to get a default value in this case for 
> convenience, which we should follow. 
> [https://cloud.google.com/compute/docs/gcloud-compute/#order_of_precedence_for_default_properties]
> Update 11/12: this is complete for Python and Java, Go remains.





[jira] [Commented] (BEAM-8472) Get default GCP region from gcloud

2020-04-13 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082707#comment-17082707
 ] 

Robert Burke commented on BEAM-8472:


Eventually. Dataflow doesn't currently support the Go SDK so this won't be 
prioritized above current work any time soon.

> Get default GCP region from gcloud
> --
>
> Key: BEAM-8472
> URL: https://issues.apache.org/jira/browse/BEAM-8472
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-go
>Reporter: Kyle Weaver
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Currently, we default to us-central1 if --region flag is not set. The Google 
> Cloud SDK generally tries to get a default value in this case for 
> convenience, which we should follow. 
> [https://cloud.google.com/compute/docs/gcloud-compute/#order_of_precedence_for_default_properties]
> Update 11/12: this is complete for Python and Java, Go remains.





[jira] [Assigned] (BEAM-8472) Get default GCP region from gcloud

2020-04-13 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke reassigned BEAM-8472:
--

Assignee: (was: Kyle Weaver)

> Get default GCP region from gcloud
> --
>
> Key: BEAM-8472
> URL: https://issues.apache.org/jira/browse/BEAM-8472
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-go
>Reporter: Kyle Weaver
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Currently, we default to us-central1 if --region flag is not set. The Google 
> Cloud SDK generally tries to get a default value in this case for 
> convenience, which we should follow. 
> [https://cloud.google.com/compute/docs/gcloud-compute/#order_of_precedence_for_default_properties]
> Update 11/12: this is complete for Python and Java, Go remains.





[jira] [Created] (BEAM-9746) [Go SDK] Empty side inputs causing spurious zero elements

2020-04-13 Thread Robert Burke (Jira)
Robert Burke created BEAM-9746:
--

 Summary: [Go SDK] Empty side inputs causing spurious zero elements
 Key: BEAM-9746
 URL: https://issues.apache.org/jira/browse/BEAM-9746
 Project: Beam
  Issue Type: Improvement
  Components: sdk-go
Reporter: Robert Burke
Assignee: Robert Burke


A user discovered that empty side inputs would spuriously provide a single zero 
element.

The error was narrowed down to the Go SDK's state manager code: when copying 
the stateGetResponse data, it wasn't checking that the original data source 
even had any bytes in it. In particular, this led it to interpret length 
prefixed data as having 0 length, which caused zero value elements to be 
generated. Notably, this produced empty strings.
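
A minimal sketch of the distinction, not the SDK's state manager code: 
decodeStrings is a hypothetical helper, and the varint length prefix is an 
assumed framing chosen for illustration. Hitting EOF while reading the length 
prefix means "no more elements", which is different from a prefix whose value 
is 0.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// decodeStrings reads varint-length-prefixed strings from data until EOF.
// An empty input yields no elements, because EOF on the length prefix is
// treated as "no more data" rather than as a length of 0.
func decodeStrings(data []byte) ([]string, error) {
	r := bytes.NewReader(data)
	var out []string
	for {
		n, err := binary.ReadUvarint(r)
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		buf := make([]byte, n)
		if _, err := io.ReadFull(r, buf); err != nil {
			return nil, err
		}
		out = append(out, string(buf))
	}
}

func main() {
	empty, _ := decodeStrings(nil)
	fmt.Println("elements from empty input:", len(empty))

	// A single 0 byte is a real element: one string of length 0.
	one, _ := decodeStrings([]byte{0})
	fmt.Println("elements from one zero-length prefix:", len(one))
}
```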





[jira] [Closed] (BEAM-9731) golang passert.Equals output is unhelpful

2020-04-09 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke closed BEAM-9731.
--
Fix Version/s: Not applicable
   Resolution: Fixed

Thanks Paul Fisher!

> golang passert.Equals output is unhelpful
> -
>
> Key: BEAM-9731
> URL: https://issues.apache.org/jira/browse/BEAM-9731
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go, testing
>Reporter: Paul Fisher
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The output from using passert.Equals includes only one of the missing or 
> unexpected elements from the diff. Including all of the missing and 
> unexpected elements will make tests much easier to debug.
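
One way to report everything at once is a multiset difference. The sketch 
below is an illustration only and not the passert implementation; diff is a 
hypothetical helper that works on strings for simplicity.

```go
package main

import (
	"fmt"
	"sort"
)

// diff reports every element that was expected but absent (missing) and every
// element that was present but not expected (unexpected), respecting duplicates.
func diff(want, got []string) (missing, unexpected []string) {
	counts := map[string]int{}
	for _, w := range want {
		counts[w]++
	}
	for _, g := range got {
		counts[g]--
	}
	for k, c := range counts {
		for ; c > 0; c-- {
			missing = append(missing, k)
		}
		for ; c < 0; c++ {
			unexpected = append(unexpected, k)
		}
	}
	// Sort so the report is stable across runs despite map iteration order.
	sort.Strings(missing)
	sort.Strings(unexpected)
	return missing, unexpected
}

func main() {
	missing, unexpected := diff(
		[]string{"a", "b", "b", "c"},
		[]string{"b", "c", "d", "d"},
	)
	fmt.Println("missing:", missing)
	fmt.Println("unexpected:", unexpected)
}
```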





[jira] [Commented] (BEAM-9690) Go build failing: undefined: primitives.Reshuffle(KV)

2020-04-03 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074936#comment-17074936
 ] 

Robert Burke commented on BEAM-9690:


I've been unable to replicate this issue locally, and the post commits are
differently broken at present due to artifact issues, though when they were
first committed, they did correctly run in post commit.




> Go build failing: undefined: primitives.Reshuffle(KV)
> -
>
> Key: BEAM-9690
> URL: https://issues.apache.org/jira/browse/BEAM-9690
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Kyle Weaver
>Assignee: Robert Burke
>Priority: Major
>
> Go SDK build is failing on head (1d3e3ef9ffb4aaa913dc223d92626ca9f0f43207). I 
> tried ./gradlew sdks:go:clean but it didn't seem to make a difference.
> Logs:
> ./gradlew :sdks:go:container:docker
> Resolving dependencies...
> # github.com/apache/beam/sdks/go/test/integration
> .gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/integration/driver.go:67:27:
>  undefined: primitives.Reshuffle
> .gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/integration/driver.go:68:29:
>  undefined: primitives.ReshuffleKV
> > Task :sdks:go:buildDarwinAmd64 FAILED
> FAILURE: Build failed with an exception.
> * What went wrong:
> Execution failed for task ':sdks:go:buildDarwinAmd64'.
> > Build failed due to return code 2 of: 
>   Command:
>/Users/kcweaver/.gradle/go/binary/1.12/go/bin/go build -o 
> ./build/bin/integration github.com/apache/beam/sdks/go/test/integration
>   Env:
>GOEXE=
>
> GOPATH=/Users/kcweaver/go/src/github.com/apache/beam/sdks/go/.gogradle/project_gopath
>GOROOT=/Users/kcweaver/.gradle/go/binary/1.12/go
>GOOS=darwin
>GOARCH=amd64





[jira] [Updated] (BEAM-9676) Go SDK Code Katas

2020-04-02 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-9676:
---
Status: Open  (was: Triage Needed)

> Go SDK Code Katas
> -
>
> Key: BEAM-9676
> URL: https://issues.apache.org/jira/browse/BEAM-9676
> Project: Beam
>  Issue Type: Improvement
>  Components: katas, sdk-go
>Reporter: Robert Burke
>Assignee: Damon Douglas
>Priority: Major
>
> There should be code katas for the Go SDK similar to the Java and Python SDKs.





[jira] [Assigned] (BEAM-9676) Go SDK Code Katas

2020-04-02 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke reassigned BEAM-9676:
--

Assignee: Damon Douglas

> Go SDK Code Katas
> -
>
> Key: BEAM-9676
> URL: https://issues.apache.org/jira/browse/BEAM-9676
> Project: Beam
>  Issue Type: Improvement
>  Components: katas, sdk-go
>Reporter: Robert Burke
>Assignee: Damon Douglas
>Priority: Major
>
> There should be code katas for the Go SDK similar to the Java and Python SDKs.





[jira] [Created] (BEAM-9676) Go SDK Code Katas

2020-04-02 Thread Robert Burke (Jira)
Robert Burke created BEAM-9676:
--

 Summary: Go SDK Code Katas
 Key: BEAM-9676
 URL: https://issues.apache.org/jira/browse/BEAM-9676
 Project: Beam
  Issue Type: Improvement
  Components: katas, sdk-go
Reporter: Robert Burke


There should be code katas for the Go SDK similar to the Java and Python SDKs.





[jira] [Updated] (BEAM-9667) Allow metrics use during DoFn Setup

2020-04-01 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-9667:
---
Status: Open  (was: Triage Needed)

> Allow metrics use during DoFn Setup
> ---
>
> Key: BEAM-9667
> URL: https://issues.apache.org/jira/browse/BEAM-9667
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Robert Burke
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> A user found a bug where runners were crashing because the PTransform label 
> for metrics was not being populated by the Go SDK. It was narrowed down to 
> the Setup method providing a bundle context without populating the 
> PTransform ID in it.
> Populating the PTransform ID for Setup should be safe as long as users 
> aren't caching the context in their DoFns and we don't cache it either, as 
> the bundle ID will be different for subsequent executions.





[jira] [Updated] (BEAM-9667) Allow metrics use during DoFn Setup

2020-04-01 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-9667:
---
Description: 
A user found a bug where runners were crashing because the PTransform label 
for metrics was not being populated by the Go SDK. It was narrowed down to 
the Setup method providing a bundle context without populating the 
PTransform ID in it.

Populating the PTransform ID for Setup should be safe as long as users 
aren't caching the context in their DoFns and we don't cache it either, as 
the bundle ID will be different for subsequent executions.
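
A rough sketch of the idea using the standard context package. None of these 
names come from the SDK: ctxKey, newBundleContext, setupContext and 
metricLabel are hypothetical, and the real metrics code may store these IDs 
differently.

```go
package main

import (
	"context"
	"fmt"
)

type ctxKey string

const (
	bundleKey     ctxKey = "bundle"
	ptransformKey ctxKey = "ptransform"
)

// newBundleContext returns a context carrying only the bundle ID, which is
// what Setup is described as receiving before the fix.
func newBundleContext(bundleID string) context.Context {
	return context.WithValue(context.Background(), bundleKey, bundleID)
}

// setupContext derives the context handed to Setup from the bundle context,
// adding the PTransform ID so metrics can be labeled.
func setupContext(bundleCtx context.Context, ptransformID string) context.Context {
	return context.WithValue(bundleCtx, ptransformKey, ptransformID)
}

// metricLabel is a hypothetical metrics helper that needs both IDs.
func metricLabel(ctx context.Context, name string) (string, error) {
	b, _ := ctx.Value(bundleKey).(string)
	p, _ := ctx.Value(ptransformKey).(string)
	if b == "" || p == "" {
		return "", fmt.Errorf("metric %q: context is missing the bundle or PTransform ID", name)
	}
	return p + "/" + name + "@" + b, nil
}

func main() {
	bundle := newBundleContext("bundle-1")
	_, err := metricLabel(bundle, "count")
	fmt.Println("bundle-only context fails:", err != nil)

	label, _ := metricLabel(setupContext(bundle, "pt-7"), "count")
	fmt.Println(label)
}
```

Because the derived context wraps a per-bundle context, holding on to it past 
the bundle would keep a stale bundle ID, which is why caching it is the risk 
called out above.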

> Allow metrics use during DoFn Setup
> ---
>
> Key: BEAM-9667
> URL: https://issues.apache.org/jira/browse/BEAM-9667
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Robert Burke
>Priority: Minor
>
> A user found a bug where runners were crashing because the PTransform label 
> for metrics was not being populated by the Go SDK. It was narrowed down to 
> the Setup method providing a bundle context without populating the 
> PTransform ID in it.
> Populating the PTransform ID for Setup should be safe as long as users 
> aren't caching the context in their DoFns and we don't cache it either, as 
> the bundle ID will be different for subsequent executions.





[jira] [Created] (BEAM-9667) Allow metrics use during DoFn Setup

2020-04-01 Thread Robert Burke (Jira)
Robert Burke created BEAM-9667:
--

 Summary: Allow metrics use during DoFn Setup
 Key: BEAM-9667
 URL: https://issues.apache.org/jira/browse/BEAM-9667
 Project: Beam
  Issue Type: Bug
  Components: sdk-go
Reporter: Robert Burke








[jira] [Updated] (BEAM-9616) [Go SDK] starcgen improvements

2020-03-26 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-9616:
---
Labels: golang  (was: )

> [Go SDK] starcgen improvements
> --
>
> Key: BEAM-9616
> URL: https://issues.apache.org/jira/browse/BEAM-9616
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Robert Burke
>Priority: Major
>  Labels: golang
>
> The starcgen code generator works OK, but could do with some improvements.
>  * Uniquifying imports (handling multiple imports with same short suffix)
>  * Generating multiple iterNatives (e.g. when the normal symbol is already 
> taken).
>  * Keying off of beam.Register* calls rather than the command line.
>  ** Avoids duplicating lists of identifiers, and improves default behavior.
>  ** Possibly have a new beam.RegisterDoFn which can take a list of DoFn and 
> struct types a function or a struct, and key off those, reducing boilerplate 
> somewhat.
>  * Perhaps having a specific single import alias package for components 
> required for import, rather than the current 3-4.
>  * Generate efficient Beam Schema coders for registered types?
>  * Handle SplittableDoFns properly.





[jira] [Updated] (BEAM-9616) [Go SDK] starcgen improvements

2020-03-26 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-9616:
---
Status: Open  (was: Triage Needed)

> [Go SDK] starcgen improvements
> --
>
> Key: BEAM-9616
> URL: https://issues.apache.org/jira/browse/BEAM-9616
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Robert Burke
>Priority: Major
>
> The starcgen code generator works OK, but could do with some improvements.
>  * Uniquifying imports (handling multiple imports with same short suffix)
>  * Generating multiple iterNatives (e.g. when the normal symbol is already 
> taken).
>  * Keying off of beam.Register* calls rather than the command line.
>  ** Avoids duplicating lists of identifiers, and improves default behavior.
>  ** Possibly have a new beam.RegisterDoFn which can take a list of DoFn and 
> struct types a function or a struct, and key off those, reducing boilerplate 
> somewhat.
>  * Perhaps having a specific single import alias package for components 
> required for import, rather than the current 3-4.
>  * Generate efficient Beam Schema coders for registered types?
>  * Handle SplittableDoFns properly.





[jira] [Created] (BEAM-9616) [Go SDK] starcgen improvements

2020-03-26 Thread Robert Burke (Jira)
Robert Burke created BEAM-9616:
--

 Summary: [Go SDK] starcgen improvements
 Key: BEAM-9616
 URL: https://issues.apache.org/jira/browse/BEAM-9616
 Project: Beam
  Issue Type: Improvement
  Components: sdk-go
Reporter: Robert Burke


The starcgen code generator works OK, but could do with some improvements.
 * Uniquifying imports (handling multiple imports with same short suffix)
 * Generating multiple iterNatives (e.g. when the normal symbol is already taken).
 * Keying off of beam.Register* calls rather than the command line.
 ** Avoids duplicating lists of identifiers, and improves default behavior.
 ** Possibly have a new beam.RegisterDoFn which can take a list of DoFn and 
struct types a function or a struct, and key off those, reducing boilerplate 
somewhat.
 * Perhaps having a specific single import alias package for components 
required for import, rather than the current 3-4.
 * Generate efficient Beam Schema coders for registered types?
 * Handle SplittableDoFns properly.
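
As an illustration of the first item (uniquifying imports), here is a small 
sketch. It is not starcgen code: uniqueAliases is a hypothetical helper, and 
it ignores cases where the last path element is not a valid Go identifier.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strconv"
)

// uniqueAliases assigns each import path a short name based on its last
// element, appending a numeric suffix when that name is already taken.
// Paths are processed in sorted order so the result is deterministic.
func uniqueAliases(paths []string) map[string]string {
	sorted := append([]string(nil), paths...)
	sort.Strings(sorted)

	aliases := make(map[string]string, len(sorted))
	used := map[string]bool{}
	for _, p := range sorted {
		if _, done := aliases[p]; done {
			continue // the same path was listed twice
		}
		base := path.Base(p)
		name := base
		for i := 2; used[name]; i++ {
			name = base + strconv.Itoa(i)
		}
		used[name] = true
		aliases[p] = name
	}
	return aliases
}

func main() {
	aliases := uniqueAliases([]string{
		"example.com/b/util",
		"example.com/a/util",
		"fmt",
	})
	for p, a := range aliases {
		fmt.Printf("import %s %q\n", a, p)
	}
}
```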





[jira] [Updated] (BEAM-9615) [Go SDK] Beam Schemas

2020-03-26 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-9615:
---
Status: Open  (was: Triage Needed)

> [Go SDK] Beam Schemas
> -
>
> Key: BEAM-9615
> URL: https://issues.apache.org/jira/browse/BEAM-9615
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: Major
>
> Schema support is required for advanced cross language features in Beam, and 
> has the opportunity to replace the current default JSON encoding of elements.
>  
> Some quick notes, though a better fleshed out doc with details will be 
> forthcoming:
>  * All base coders should be implemented, and listed as coder capabilities. I 
> think only stringutf8 is missing presently.
>  * Should support fairly arbitrary user types, seamlessly. That is, users 
> should be able to rely on it "just working" if their type is compatible.
>  * Should support schema metadata tagging.
> In particular, one breaking shift in the default will be to explicitly fail 
> pipelines if elements have unexported fields, when no other custom coder has 
> been added. This has been a source of errors/dropped data/keys and a simple 
> warning at construction time won't cut it. However, we could provide a manual 
> "use beam schemas, but ignore unexported fields" registration as a 
> workaround.





[jira] [Updated] (BEAM-9615) [Go SDK] Beam Schemas

2020-03-26 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-9615:
---
Description: 
Schema support is required for advanced cross language features in Beam, and 
has the opportunity to replace the current default JSON encoding of elements.

 

Some quick notes, though a better fleshed out doc with details will be 
forthcoming:
 * All base coders should be implemented, and listed as coder capabilities. I 
think only stringutf8 is missing presently.
 * Should support fairly arbitrary user types, seamlessly. That is, users 
should be able to rely on it "just working" if their type is compatible.
 * Should support schema metadata tagging.

In particular, one breaking shift in the default will be to explicitly fail 
pipelines if elements have unexported fields, when no other custom coder has 
been added. This has been a source of errors/dropped data/keys and a simple 
warning at construction time won't cut it. However, we could provide a manual 
"use beam schemas, but ignore unexported fields" registration as a workaround.

  was:
Schema support is required for advanced cross language features in Beam, and 
has the opportunity to replace the current default JSON encoding of elements.

 

Some quick notes
 * All base coders should be implemented, and listed as coder capabilities. I 
think only stringutf8 is missing presently.
 * Should support fairly arbitrary user types, seamlessly. That is, users 
should be able to rely on it "just working" if their type is compatible.
 * Should support schema metadata tagging.

In particular, one breaking shift in the default will be to explicitly fail 
pipelines if elements have unexported fields, when no other custom coder has 
been added. This has been a source of errors/dropped data/keys and a simple 
warning at construction time won't cut it. However, we could provide a manual 
"use beam schemas, but ignore unexported fields" registration as a workaround.


> [Go SDK] Beam Schemas
> -
>
> Key: BEAM-9615
> URL: https://issues.apache.org/jira/browse/BEAM-9615
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: Major
>
> Schema support is required for advanced cross language features in Beam, and 
> has the opportunity to replace the current default JSON encoding of elements.
>  
> Some quick notes, though a better fleshed out doc with details will be 
> forthcoming:
>  * All base coders should be implemented, and listed as coder capabilities. I 
> think only stringutf8 is missing presently.
>  * Should support fairly arbitrary user types, seamlessly. That is, users 
> should be able to rely on it "just working" if their type is compatible.
>  * Should support schema metadata tagging.
> In particular, one breaking shift in the default will be to explicitly fail 
> pipelines if elements have unexported fields, when no other custom coder has 
> been added. This has been a source of errors/dropped data/keys and a simple 
> warning at construction time won't cut it. However, we could provide a manual 
> "use beam schemas, but ignore unexported fields" registration as a 
> workaround.





[jira] [Created] (BEAM-9615) [Go SDK] Beam Schemas

2020-03-26 Thread Robert Burke (Jira)
Robert Burke created BEAM-9615:
--

 Summary: [Go SDK] Beam Schemas
 Key: BEAM-9615
 URL: https://issues.apache.org/jira/browse/BEAM-9615
 Project: Beam
  Issue Type: New Feature
  Components: sdk-go
Reporter: Robert Burke
Assignee: Robert Burke


Schema support is required for advanced cross language features in Beam, and 
has the opportunity to replace the current default JSON encoding of elements.

 

Some quick notes
 * All base coders should be implemented, and listed as coder capabilities. I 
think only stringutf8 is missing presently.
 * Should support fairly arbitrary user types, seamlessly. That is, users 
should be able to rely on it "just working" if their type is compatible.
 * Should support schema metadata tagging.

In particular, one breaking shift in the default will be to explicitly fail 
pipelines if elements have unexported fields, when no other custom coder has 
been added. This has been a source of errors/dropped data/keys and a simple 
warning at construction time won't cut it. However, we could provide a manual 
"use beam schemas, but ignore unexported fields" registration as a workaround.
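
A sketch of what such a construction-time check could look like, using only 
the reflect package. This is an assumption about the shape of the check, not 
the planned implementation; unexportedFields, checkEncodable and the two 
example types are hypothetical, and nested structs are not examined.

```go
package main

import (
	"fmt"
	"reflect"
)

// unexportedFields returns the names of the unexported fields of a struct
// type, looking through pointers. Non-struct types yield nil.
func unexportedFields(t reflect.Type) []string {
	for t.Kind() == reflect.Ptr {
		t = t.Elem()
	}
	if t.Kind() != reflect.Struct {
		return nil
	}
	var names []string
	for i := 0; i < t.NumField(); i++ {
		// PkgPath is non-empty only for unexported fields.
		if f := t.Field(i); f.PkgPath != "" {
			names = append(names, f.Name)
		}
	}
	return names
}

// checkEncodable fails for element types whose unexported fields would be
// silently dropped by a field-based default encoding.
func checkEncodable(v interface{}) error {
	t := reflect.TypeOf(v)
	if t == nil {
		return fmt.Errorf("cannot check a nil value")
	}
	if names := unexportedFields(t); len(names) > 0 {
		return fmt.Errorf("type %v has unexported fields %v; register a custom coder", t, names)
	}
	return nil
}

// Example element types for the sketch.
type exported struct {
	Name  string
	Count int
}

type lossy struct {
	Key    string
	secret int
}

func main() {
	fmt.Println(checkEncodable(exported{Name: "a", Count: 1}))
	fmt.Println(checkEncodable(&lossy{Key: "k"}))
}
```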





[jira] [Commented] (BEAM-9614) Declare versioned capability for identifying the Go SDK.

2020-03-26 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067912#comment-17067912
 ] 

Robert Burke commented on BEAM-9614:


A quick search doesn't indicate any good way to do this other than having a 
Go file somewhere that gets updated for each release. Right now the Dataflow 
runner package has a constant which declares the version to be 0.5.0, but 
ideally the version would be set by whatever script generates the release 
branches.

> Declare versioned capability for identifying the Go SDK.
> 
>
> Key: BEAM-9614
> URL: https://issues.apache.org/jira/browse/BEAM-9614
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Robert Bradshaw
>Priority: Major
>






[jira] [Assigned] (BEAM-8292) Add a Reshuffle PTransform preventing fusion of the surrounding transforms

2020-03-23 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke reassigned BEAM-8292:
--

Assignee: Robert Burke

> Add a Reshuffle PTransform preventing fusion of the surrounding transforms
> --
>
> Key: BEAM-8292
> URL: https://issues.apache.org/jira/browse/BEAM-8292
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go
>Reporter: John Patoch
>Assignee: Robert Burke
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Reshuffle is a PTransform that takes a PCollection and shuffles the data 
> to help increase parallelism.
> Reshuffle adds a temporary random key to each element, performs a
>  GroupByKey, and finally removes the temporary key.
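
The three steps can be mimicked on an in-memory slice. This is only an 
illustration of the key/group/unkey shape and does not use the Beam API; 
reshuffle, keyed and newRNG are hypothetical names, and buckets must be 
greater than zero.

```go
package main

import (
	"fmt"
	"math/rand"
)

// keyed pairs an element with its temporary key.
type keyed struct {
	key   int
	value string
}

// newRNG returns a seeded random source so runs are repeatable.
func newRNG(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// reshuffle redistributes elements across buckets and returns them with the
// temporary keys removed.
func reshuffle(in []string, buckets int, rng *rand.Rand) []string {
	// 1. Add a temporary random key to each element.
	pairs := make([]keyed, len(in))
	for i, v := range in {
		pairs[i] = keyed{key: rng.Intn(buckets), value: v}
	}
	// 2. Group by key (the stand-in for GroupByKey).
	groups := make(map[int][]string)
	for _, p := range pairs {
		groups[p.key] = append(groups[p.key], p.value)
	}
	// 3. Remove the temporary key, emitting the grouped values.
	out := make([]string, 0, len(in))
	for k := 0; k < buckets; k++ {
		out = append(out, groups[k]...)
	}
	return out
}

func main() {
	fmt.Println(reshuffle([]string{"a", "b", "c", "d", "e"}, 3, newRNG(1)))
}
```

In a real pipeline the grouping step is what forces a materialization point, 
which is how the surrounding transforms are kept from being fused.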





[jira] [Resolved] (BEAM-9551) Pass around Environment PB as pointer not value

2020-03-20 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke resolved BEAM-9551.

Fix Version/s: Not applicable
   Resolution: Fixed

> Pass around Environment PB as pointer not value
> ---
>
> Key: BEAM-9551
> URL: https://issues.apache.org/jira/browse/BEAM-9551
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Go protocol buffers should be passed around by pointer rather than by value. 
> This was caught by a linter, and should be fixed as good practice.





[jira] [Updated] (BEAM-9551) Pass around Environment PB as pointer not value

2020-03-18 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-9551:
---
Status: Open  (was: Triage Needed)

> Pass around Environment PB as pointer not value
> ---
>
> Key: BEAM-9551
> URL: https://issues.apache.org/jira/browse/BEAM-9551
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Go protocol buffers should be passed around by pointer rather than by value. 
> This was caught by a linter, and should be fixed as good practice.





[jira] [Created] (BEAM-9551) Pass around Environment PB as pointer not value

2020-03-18 Thread Robert Burke (Jira)
Robert Burke created BEAM-9551:
--

 Summary: Pass around Environment PB as pointer not value
 Key: BEAM-9551
 URL: https://issues.apache.org/jira/browse/BEAM-9551
 Project: Beam
  Issue Type: Bug
  Components: sdk-go
Reporter: Robert Burke
Assignee: Robert Burke


Go protocol buffers should be passed around by pointer rather than by value. 
This was caught by a linter, and should be fixed as good practice.
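
A small sketch of the pattern. The environment type here is a hypothetical 
stand-in, not the generated Environment message; its mutex field is an 
assumption that plays the role of whatever internal state makes copying a 
message unsafe, and "beam:env:docker:v1" is just an example value.

```go
package main

import (
	"fmt"
	"sync"
)

// environment stands in for a generated protobuf message. Passing it by value
// would copy the mutex, which is the kind of copy `go vet` reports.
type environment struct {
	mu  sync.Mutex
	URN string
}

// setURN takes a pointer, so the caller's message is updated in place and
// nothing is copied.
func setURN(env *environment, urn string) {
	env.mu.Lock()
	defer env.mu.Unlock()
	env.URN = urn
}

func main() {
	env := &environment{}
	setURN(env, "beam:env:docker:v1")
	fmt.Println(env.URN)
}
```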





[jira] [Closed] (BEAM-9374) Go Postcommits not pulling right container name

2020-02-24 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke closed BEAM-9374.
--
Fix Version/s: Not applicable
   Resolution: Fixed

They were fixed by 
[https://github.com/apache/beam/commit/88914cf7c79ca185e2f67a03a7d1dc57372c6873#diff-2f9709e332964eeedae560738d7e]
 before this was filed.  I had stale pages, which had the old content cached 
somehow.

> Go Postcommits not pulling right container name
> ---
>
> Key: BEAM-9374
> URL: https://issues.apache.org/jira/browse/BEAM-9374
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: Not applicable
>
>
> It looks like a script variable CONTAINERS wasn't updated in 
> [https://github.com/apache/beam/pull/10612] , causing the container pull to 
> fail.
> [https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/2518/]
> [https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/consoleText]





[jira] [Assigned] (BEAM-9374) Go Postcommits not pulling right container name

2020-02-24 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke reassigned BEAM-9374:
--

Assignee: Hannah Jiang  (was: Robert Burke)

> Go Postcommits not pulling right container name
> ---
>
> Key: BEAM-9374
> URL: https://issues.apache.org/jira/browse/BEAM-9374
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Hannah Jiang
>Priority: Major
>
> It looks like a script variable CONTAINERS wasn't updated in 
> [https://github.com/apache/beam/pull/10612] , causing the container pull to 
> fail.
> [https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/2518/]
> [https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/consoleText]





[jira] [Created] (BEAM-9374) Go Postcommits not pulling right container name

2020-02-24 Thread Robert Burke (Jira)
Robert Burke created BEAM-9374:
--

 Summary: Go Postcommits not pulling right container name
 Key: BEAM-9374
 URL: https://issues.apache.org/jira/browse/BEAM-9374
 Project: Beam
  Issue Type: Bug
  Components: sdk-go
Reporter: Robert Burke
Assignee: Robert Burke


It looks like a script variable CONTAINERS wasn't updated in 
[https://github.com/apache/beam/pull/10612] , causing the container pull to 
fail.

[https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/2518/]

[https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/consoleText]





[jira] [Assigned] (BEAM-6374) "elements added" for input and output collections is always empty

2020-02-22 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke reassigned BEAM-6374:
--

Assignee: Robert Burke

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: Major
>
> The fields for "Elements added" and "Estimated size" are always blank when 
> running a Go binary on Dataflow. For example, when running the word count 
> example: https://pasteboard.co/HVf80BU.png





[jira] [Resolved] (BEAM-3306) Consider: Go coder registry

2020-02-19 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke resolved BEAM-3306.

Fix Version/s: Not applicable
   Resolution: Fixed

The Go SDK supports a coder registry via beam.RegisterCoder.

Remaining work might be to optionally support "direct" access to an io.Reader 
or io.Writer interface, which could yield efficiency gains in some situations 
for user types.
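
As a rough illustration of the idea only (this is not the Beam implementation), a coder registry with "direct" stream access can be sketched as a map keyed by reflect.Type whose encoders and decoders take an io.Writer and io.Reader. The names registry, register, roundTrip and point are hypothetical, and only the standard library is used.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"reflect"
)

// coder pairs an encoder and a decoder that work directly on streams.
type coder struct {
	enc func(io.Writer, interface{}) error
	dec func(io.Reader) (interface{}, error)
}

// registry maps a user type to its coder. Hypothetical; not Beam's internals.
var registry = map[reflect.Type]coder{}

func register(t reflect.Type, c coder) { registry[t] = c }

// point is an example user type with fixed-size fields.
type point struct{ X, Y int32 }

// roundTrip encodes v with its registered coder and decodes it again.
func roundTrip(v interface{}) (interface{}, error) {
	c, ok := registry[reflect.TypeOf(v)]
	if !ok {
		return nil, fmt.Errorf("no coder registered for %T", v)
	}
	var buf bytes.Buffer
	if err := c.enc(&buf, v); err != nil {
		return nil, err
	}
	return c.dec(&buf)
}

func init() {
	register(reflect.TypeOf(point{}), coder{
		enc: func(w io.Writer, v interface{}) error {
			return binary.Write(w, binary.BigEndian, v.(point))
		},
		dec: func(r io.Reader) (interface{}, error) {
			var p point
			err := binary.Read(r, binary.BigEndian, &p)
			return p, err
		},
	})
}

func main() {
	out, err := roundTrip(point{X: 3, Y: 4})
	fmt.Println(out, err)
}
```

Writing to the stream directly avoids building an intermediate byte slice per element, which is the kind of efficiency gain the comment alludes to.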

> Consider: Go coder registry
> ---
>
> Key: BEAM-3306
> URL: https://issues.apache.org/jira/browse/BEAM-3306
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Robert Burke
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Add coder registry to allow easier overwrite of default coders. We may also 
> allow otherwise un-encodable types, but that would require that function 
> analysis depends on it.
> If we're hardcoding support for proto/avro, then there may be little need for 
> such a feature. Conversely, this may be how we implement such support.
>  
> Proposal Doc: 
> [https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit#|https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit]
>  





[jira] [Assigned] (BEAM-3545) Fn API metrics in Go SDK harness

2020-02-04 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke reassigned BEAM-3545:
--

Assignee: Robert Burke

> Fn API metrics in Go SDK harness
> 
>
> Key: BEAM-3545
> URL: https://issues.apache.org/jira/browse/BEAM-3545
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Kenneth Knowles
>Assignee: Robert Burke
>Priority: Major
>  Labels: portability
>  Time Spent: 13h 40m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (BEAM-9167) Reduce overhead of Go SDK side metrics

2020-02-04 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke resolved BEAM-9167.

Fix Version/s: Not applicable
   Resolution: Fixed

The SDK-side overhead of user metrics is now significantly reduced if the proxy 
object is used. There's other metrics-related work (e.g. framework metrics 
around PCollections and ParDos, programmatic extraction, using the updated 
Monitoring infos), but it is tracked by other JIRAs.

> Reduce overhead of Go SDK side metrics
> --
>
> Key: BEAM-9167
> URL: https://issues.apache.org/jira/browse/BEAM-9167
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Locking overhead due to the global store and local caches of SDK counter data 
> can dominate certain workloads, which means we can do better.
> Instead of having a global store of metrics data to extract counters, we 
> should use per ptransform (or per bundle) counter sets, which would avoid 
> requiring locking per counter operation. The main detriment compared to the 
> current implementation is that a user would need to add their own locking if 
> they were to spawn multiple goroutines to process a Bundle's work in a DoFn.
> Given that self multithreaded DoFns aren't recommended/safe in Java,  largely 
> impossible in Python, and the other beam Go SDK provided constructs (like 
> Iterators and Emitters) are not thread safe, this is a small concern, 
> provided the documentation is clear on this.
> Removing the locking and switching to atomic ops reduces the overhead 
> significantly in example jobs and in the benchmarks.
> A second part of this change should be to move the exec package to manage 
> its own per bundle state, rather than relying on a global datastore to 
> extract the per bundle, per ptransform values.
> Related: https://issues.apache.org/jira/browse/BEAM-6541 
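
The per-bundle counter set described in the issue can be sketched roughly as follows. This is an assumption-laden illustration, not the SDK's code: the names counter, bundleCounters and get are hypothetical, and the map is left unlocked on the premise stated above that a single goroutine processes a bundle.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// counter is a single metric cell updated with atomic operations.
type counter struct{ v int64 }

func (c *counter) Inc(n int64)  { atomic.AddInt64(&c.v, n) }
func (c *counter) Value() int64 { return atomic.LoadInt64(&c.v) }

// bundleCounters is owned by one bundle, so the map itself needs no lock as
// long as a single goroutine creates the counters.
type bundleCounters struct {
	cells map[string]*counter
}

func newBundleCounters() *bundleCounters {
	return &bundleCounters{cells: map[string]*counter{}}
}

// get returns the counter for name, creating it on first use.
func (b *bundleCounters) get(name string) *counter {
	c, ok := b.cells[name]
	if !ok {
		c = &counter{}
		b.cells[name] = c
	}
	return c
}

func main() {
	b := newBundleCounters()
	c := b.get("elements")
	for i := 0; i < 1000; i++ {
		c.Inc(1)
	}
	fmt.Println(c.Value())
}
```

A DoFn that spawns its own goroutines would have to guard get with its own lock, which is the trade-off the description calls out.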





[jira] [Resolved] (BEAM-7726) [Go SDK] State Backed Iterables

2020-02-04 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke resolved BEAM-7726.

Resolution: Fixed

The Go SDK now supports using State Backed iterables if the runner triggers it.

> [Go SDK] State Backed Iterables
> ---
>
> Key: BEAM-7726
> URL: https://issues.apache.org/jira/browse/BEAM-7726
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Affects Versions: Not applicable
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> The Go SDK should support the State backed iterables protocol per the proto.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L644]
>  
> Primary case is for iterables after CoGBKs.





[jira] [Comment Edited] (BEAM-7726) [Go SDK] State Backed Iterables

2020-02-04 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17004337#comment-17004337
 ] 

Robert Burke edited comment on BEAM-7726 at 2/4/20 10:46 PM:
-

The data channel is correctly multiplexing bundles. There's no other way to do 
the multiple streams thing in the current protocol and GRPC without the runner 
having multiple endpoints, or the process doing so (eg. Multiple SDK Harnesses 
per worker, which is how python handles it).

I think I have a resolution for state backed iterables blocking the 
datachannel, which will work for any runners that support datasource split 
requests. If the data channel is eventually split down to the current value 
and no more, we can close the reader, which will cause the channel to be 
unblocked. Any buffered data will be drained. Care needs to be taken to avoid 
deadlocks, data loss, or race conditions, but there should only be lock 
contention when the Split thread is closing the reader.

Edit (2020/02/04): I wasn't able to confirm that this actually worked better, 
and even though there was no material locking overhead, the additional 
complexity in that part of the code isn't worth the questionable benefits. Tabling 
for now. 


was (Author: lostluck):
The data channel is correctly multiplexing bundles. There's no other way to do 
the multiple streams thing in the current protocol and GRPC without the runner 
having multiple endpoints, or the process doing so (eg. Multiple SDK Harnesses 
per worker, which is how python handles it).

I think I have a resolution for state backed iterables blocking the 
datachannel, which will work for any runners that support datasource split 
requests. If the data channel is eventually split down to the current value 
and no more, we can close the reader, which will cause the channel to be 
unblocked. Any buffered data will be drained. Care needs to be taken to avoid 
deadlocks, data loss, or race conditions, but there should only be lock 
contention when the Split thread is closing the reader.

Edit: I wasn't able to confirm that this actually worked better, and even 
though there was no material locking overhead, the additional complexity in 
that part of the code isn't worth the questionable benefits. Tabling for now. 

> [Go SDK] State Backed Iterables
> ---
>
> Key: BEAM-7726
> URL: https://issues.apache.org/jira/browse/BEAM-7726
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Affects Versions: Not applicable
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> The Go SDK should support the State backed iterables protocol per the proto.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L644]
>  
> Primary case is for iterables after CoGBKs.





[jira] [Comment Edited] (BEAM-7726) [Go SDK] State Backed Iterables

2020-02-04 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17004337#comment-17004337
 ] 

Robert Burke edited comment on BEAM-7726 at 2/4/20 10:45 PM:
-

The data channel is correctly multiplexing bundles. There's no other way to do 
the multiple streams thing in the current protocol and GRPC without the runner 
having multiple endpoints, or the process doing so (eg. Multiple SDK Harnesses 
per worker, which is how python handles it).

I think I have a resolution for state backed iterables blocking the 
datachannel, which will work for any runners that support datasource split 
requests. If the data channel is eventually split down to the current value 
and no more, we can close the reader, which will cause the channel to be 
unblocked. Any buffered data will be drained. Care needs to be taken to avoid 
deadlocks, data loss, or race conditions, but there should only be lock 
contention when the Split thread is closing the reader.

Edit: I wasn't able to confirm that this actually worked better, and even 
though there was no material locking overhead, the additional complexity in 
that part of the code isn't worth the questionable benefits. Tabling for now. 


was (Author: lostluck):
The data channel is correctly multiplexing bundles. There's no other way to do 
the multiple streams thing in the current protocol and GRPC without the runner 
having multiple endpoints, or the process doing so (eg. Multiple SDK Harnesses 
per worker, which is how python handles it).

I think I have a resolution for state backed iterables blocking the 
datachannel, which will work for any runners that support datasource split 
requests. If the data channel is eventually split down to the current value 
and no more, we can close the reader, which will cause the channel to be 
unblocked. Any buffered data will be drained. Care needs to be taken to avoid 
deadlocks, data loss, or race conditions, but there should only be lock 
contention when the Split thread is closing the reader.

 

> [Go SDK] State Backed Iterables
> ---
>
> Key: BEAM-7726
> URL: https://issues.apache.org/jira/browse/BEAM-7726
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Affects Versions: Not applicable
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> The Go SDK should support the State backed iterables protocol per the proto.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L644]
>  
> Primary case is for iterables after CoGBKs.





[jira] [Closed] (BEAM-9233) Go: unregistered Go functions fail when using -buildmode=pie -ldflags=-w

2020-01-31 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke closed BEAM-9233.
--
Fix Version/s: Not applicable
   Resolution: Fixed

Fixed by linked patch. Thanks!

> Go: unregistered Go functions fail when using -buildmode=pie -ldflags=-w
> 
>
> Key: BEAM-9233
> URL: https://issues.apache.org/jira/browse/BEAM-9233
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
> Environment: GNU/Linux
>Reporter: Ian Lance Taylor
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> If a Go program is built with -buildmode=pie -ldflags=-w, the code that 
> transfers an unregistered function fails.  It tries to look up the symbol in 
> the DWARF debug info, but that info has been stripped because of the -w flag. 
>  This causes a program crash when calling the function.
> I have a patch for this problem that I will send shortly.
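
For context, a function's name can be obtained from the runtime's function table without DWARF, and an explicit registry lets the reverse (name to function) lookup avoid debug info entirely. The sketch below is a standard-library illustration of that idea, not the patch referred to above; funcName, register and registered are hypothetical names.

```go
package main

import (
	"fmt"
	"reflect"
	"runtime"
)

// funcName resolves a function value to its symbol name using the runtime's
// function table.
func funcName(fn interface{}) string {
	return runtime.FuncForPC(reflect.ValueOf(fn).Pointer()).Name()
}

// registered maps symbol names back to function values, so the reverse lookup
// never has to consult debug info. Hypothetical registry for illustration.
var registered = map[string]interface{}{}

func register(fn interface{}) { registered[funcName(fn)] = fn }

func double(x int) int { return 2 * x }

func main() {
	register(double)
	name := funcName(double)
	fn := registered[name].(func(int) int)
	fmt.Println(name, fn(21))
}
```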





[jira] [Updated] (BEAM-9233) Go: unregistered Go functions fail when using -buildmode=pie -ldflags=-w

2020-01-31 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-9233:
---
Affects Version/s: (was: 2.18.0)

> Go: unregistered Go functions fail when using -buildmode=pie -ldflags=-w
> 
>
> Key: BEAM-9233
> URL: https://issues.apache.org/jira/browse/BEAM-9233
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
> Environment: GNU/Linux
>Reporter: Ian Lance Taylor
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> If a Go program is built with -buildmode=pie -ldflags=-w, the code that 
> transfers an unregistered function fails.  It tries to look up the symbol in 
> the DWARF debug info, but that info has been stripped because of the -w flag. 
>  This causes a program crash when calling the function.
> I have a patch for this problem that I will send shortly.





[jira] [Resolved] (BEAM-6498) Consider using sync/atomic for Go SDK metrics.

2020-01-22 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke resolved BEAM-6498.

Fix Version/s: Not applicable
   Resolution: Fixed

Resolved in [GitHub Pull Request 
#10654|https://github.com/apache/beam/pull/10654] instead. In particular 
counters were updated to use atomics, and the lock adds ~10ns for the other two 
types, which is fine given they do more work.

> Consider using sync/atomic for Go SDK metrics.
> --
>
> Key: BEAM-6498
> URL: https://issues.apache.org/jira/browse/BEAM-6498
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Affects Versions: Not applicable
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: Minor
> Fix For: Not applicable
>
>
> Changing a portion of the metrics code to use the atomic counters might yield 
> a performance improvement and the opportunity to remove a lock or two.
> Care needs to be taken though: 
> [https://stackoverflow.com/questions/47445344/is-there-a-difference-in-go-between-a-counter-using-atomic-operations-and-one-us]
> The outcome of this task is a benchmark demonstrating the benefit (or 
> detriment) in a quasi-real situation for the Go SDK, and if warranted 
> switching metrics where possible, to use atomics.





[jira] [Assigned] (BEAM-6498) Consider using sync/atomic for Go SDK metrics.

2020-01-22 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke reassigned BEAM-6498:
--

Assignee: Robert Burke

> Consider using sync/atomic for Go SDK metrics.
> --
>
> Key: BEAM-6498
> URL: https://issues.apache.org/jira/browse/BEAM-6498
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Affects Versions: Not applicable
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: Minor
>
> Changing a portion of the metrics code to use the atomic counters might yield 
> a performance improvement and the opportunity to remove a lock or two.
> Care needs to be taken though: 
> [https://stackoverflow.com/questions/47445344/is-there-a-difference-in-go-between-a-counter-using-atomic-operations-and-one-us]
> The outcome of this task is a benchmark demonstrating the benefit (or 
> detriment) in a quasi-real situation for the Go SDK, and if warranted 
> switching metrics where possible, to use atomics.





[jira] [Updated] (BEAM-9167) Reduce overhead of Go SDK side metrics

2020-01-22 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-9167:
---
Parent: BEAM-4725
Issue Type: Sub-task  (was: Improvement)

> Reduce overhead of Go SDK side metrics
> --
>
> Key: BEAM-9167
> URL: https://issues.apache.org/jira/browse/BEAM-9167
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Locking overhead due to the global store and local caches of SDK counter data 
> can dominate certain workloads, which means we can do better.
> Instead of having a global store of metrics data to extract counters, we 
> should use per ptransform (or per bundle) counter sets, which would avoid 
> requiring locking per counter operation. The main detriment compared to the 
> current implementation is that a user would need to add their own locking if 
> they were to spawn multiple goroutines to process a Bundle's work in a DoFn.
> Given that self multithreaded DoFns aren't recommended/safe in Java,  largely 
> impossible in Python, and the other beam Go SDK provided constructs (like 
> Iterators and Emitters) are not thread safe, this is a small concern, 
> provided the documentation is clear on this.
> Removing the locking and switching to atomic ops reduces the overhead 
> significantly in example jobs and in the benchmarks.
> A second part of this change should be to move the exec package to manage 
> its own per bundle state, rather than relying on a global datastore to 
> extract the per bundle, per ptransform values.
> Related: https://issues.apache.org/jira/browse/BEAM-6541 





[jira] [Resolved] (BEAM-6148) Support Go "Unit" tests on arbitrary runners

2020-01-22 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke resolved BEAM-6148.

Fix Version/s: Not applicable
   Resolution: Fixed

> Support Go "Unit" tests on arbitrary runners
> 
>
> Key: BEAM-6148
> URL: https://issues.apache.org/jira/browse/BEAM-6148
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> There's no clear path to testing pipelines on runners other than the direct 
> runner. It should be possible to "redirect" tests to use a runner of choice. 
> This would enable more "testy" ValidatesRunner tests in Go.
>  
> In particular, users should need to at least _ import the runner they want, 
> and be able to set a flag.
> The tricky bit is ensuring beam.Init is called so that each individual test 
> can convert to WorkerMode when it's spun up as a SDK harness. This can be 
> done by having a TestMain. 
> ptest should provide convenience functions to help with this.
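
The "blank import plus flag" pattern described above relies on runner packages registering themselves from init functions. A self-contained sketch of that mechanism follows; it assumes nothing about ptest's actual API, and runners, registerRunner and runWith are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
)

// runners is populated by init functions, which is what a blank import of a
// runner package would trigger.
var runners = map[string]func(pipeline string) error{}

// registerRunner is a hypothetical stand-in for a runner package's registration.
func registerRunner(name string, run func(string) error) { runners[name] = run }

func init() {
	registerRunner("direct", func(p string) error {
		fmt.Println("direct runner executing", p)
		return nil
	})
}

// runWith looks up the requested runner and executes the pipeline on it.
func runWith(name, pipeline string) error {
	run, ok := runners[name]
	if !ok {
		return fmt.Errorf("runner %q not registered; was it imported?", name)
	}
	return run(pipeline)
}

func main() {
	runner := flag.String("runner", "direct", "runner to execute the pipeline on")
	flag.Parse()
	if err := runWith(*runner, "wordcount"); err != nil {
		fmt.Println("error:", err)
	}
}
```

In a test binary the flag parsing and any harness initialization would live in TestMain, as the issue suggests.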





[jira] [Assigned] (BEAM-6371) Add support for reading and writing to CSV files

2020-01-22 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke reassigned BEAM-6371:
--

Assignee: (was: Robert Burke)

> Add support for reading and writing to CSV files
> 
>
> Key: BEAM-6371
> URL: https://issues.apache.org/jira/browse/BEAM-6371
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Andrew Brampton
>Priority: Major
>
> A very simple CSV Reader and Writer could be created, similar to [this 
> one|https://github.com/bramp/morebeam/tree/master/csvio].
> It would support reading a header, and support similar options to the 
> standard [go csv package|https://golang.org/pkg/encoding/csv/].





[jira] [Resolved] (BEAM-5354) Side Inputs seems to be non-working in the sdk-go

2020-01-22 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke resolved BEAM-5354.

Fix Version/s: Not applicable
   Resolution: Fixed

> Side Inputs seems to be non-working in the sdk-go
> -
>
> Key: BEAM-5354
> URL: https://issues.apache.org/jira/browse/BEAM-5354
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Tomas Roos
>Assignee: Robert Burke
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Running the contains example fails with
>  
> {code:java}
> Output i0 for step was not found.
> {code}
> This is because of the call to debug.Head (which internally uses SideInput).
> After removing the following line, the pipeline executes successfully:
> [https://github.com/apache/beam/blob/master/sdks/go/examples/contains/contains.go#L50]
>  
>  
> Executed on id's
>  
> go-job-1-1536664417610678545 
> vs
> go-job-1-1536664934354466938
>  





[jira] [Commented] (BEAM-7928) Being able to specify disk type and disk size

2020-01-22 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021366#comment-17021366
 ] 

Robert Burke commented on BEAM-7928:


This is also about running templates of Go SDK jobs on Dataflow, which hasn't 
been tested at all. As usual, Dataflow doesn't currently support the Go SDK, 
so if it works, that is by luck rather than by design.

> Being able to specify disk type and disk size
> -
>
> Key: BEAM-7928
> URL: https://issues.apache.org/jira/browse/BEAM-7928
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go
>Reporter: Thomas
>Priority: Major
>
> Hi everyone,
> I'd like to launch a job from a template, so I'm using 
> [https://godoc.org/google.golang.org/api/dataflow/v1b3#CreateJobFromTemplateRequest]
>  and then I call the `Create` method.
> With this (particularly inside the `RuntimeEnvironment` type) I'm able to specify 
> the machine type and so on, but I'm unable to specify disk settings (type and 
> size).
>  
> Do you think such settings could be added there as well? Or do I need to define them 
> another way?
>  
> Thank you,





[jira] [Commented] (BEAM-7928) Being able to specify disk type and disk size

2020-01-22 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021365#comment-17021365
 ] 

Robert Burke commented on BEAM-7928:


Apparently I don't get emails from JIRA anymore. 

Adding new options/flags and ensuring they're plumbed through isn't difficult 
to do though. See [https://github.com/apache/beam/pull/9906] for an example PR 
that does so. I'd be happy to review if you mention me: @lostluck 

> Being able to specify disk type and disk size
> -
>
> Key: BEAM-7928
> URL: https://issues.apache.org/jira/browse/BEAM-7928
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go
>Reporter: Thomas
>Priority: Major
>
> Hi everyone,
> I'd like to launch a job from a template, so I'm using 
> [https://godoc.org/google.golang.org/api/dataflow/v1b3#CreateJobFromTemplateRequest]
>  and then I call the `Create` method.
> With this (particularly inside the `RuntimeEnvironment` type) I'm able to specify 
> the machine type and so on, but I'm unable to specify disk settings (type and 
> size).
>  
> Do you think such settings could be added there as well? Or do I need to define them 
> another way?
>  
> Thank you,





[jira] [Closed] (BEAM-8166) Support Graceful shutdown of worker harness.

2020-01-22 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke closed BEAM-8166.
--
Fix Version/s: Not applicable
   Resolution: Fixed

> Support Graceful shutdown of worker harness.
> 
>
> Key: BEAM-8166
> URL: https://issues.apache.org/jira/browse/BEAM-8166
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core, sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Ideally there should be a clear Shutdown control RPC a runner can send a 
> worker harness to trigger an orderly shutdown.
> Absent that, errors on the runner side shouldn't manifest as SDK worker 
> harness errors. SDKs should log, and shut down gracefully on GRPC errors.





[jira] [Assigned] (BEAM-8166) Support Graceful shutdown of worker harness.

2020-01-22 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke reassigned BEAM-8166:
--

Assignee: Robert Burke

> Support Graceful shutdown of worker harness.
> 
>
> Key: BEAM-8166
> URL: https://issues.apache.org/jira/browse/BEAM-8166
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core, sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Ideally there should be a clear Shutdown control RPC a runner can send a 
> worker harness to trigger an orderly shutdown.
> Absent that, errors on the runner side shouldn't manifest as SDK worker 
> harness errors. SDKs should log, and shut down gracefully on GRPC errors.





[jira] [Closed] (BEAM-6541) Consider converting bundle & ptransform ids to ints eagerly.

2020-01-21 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke closed BEAM-6541.
--
Fix Version/s: Not applicable
 Assignee: Robert Burke
   Resolution: Won't Fix

I'm taking a different approach in 
https://issues.apache.org/jira/browse/BEAM-9167 which better relies on the 
structure of bundles and ptransforms to reduce the overhead.

Granted, I'm also using the technique mentioned here, but hashing the 
metric names rather than the higher level structs.

> Consider converting bundle & ptransform ids to ints eagerly.
> 
>
> Key: BEAM-6541
> URL: https://issues.apache.org/jira/browse/BEAM-6541
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: Minor
> Fix For: Not applicable
>
>
> BundleIDs and PTransformIDs necessary for communicating with the Runner 
> interface in the Go SDK are currently strings, and are used as is for metrics 
> contexts. We use them for getting bundle & ptransform specific metrics, and 
> transmitting the same. We could instead eagerly assign them a local index 
> that is then converted back when communicating metrics over the FnAPI. This 
> would reduce overhead on metric lookups in the various maps.
> Note: the same could be done for the user's metric-name, completing the 
> optimization. Measuring the per-report overhead for tentative/final metric 
> reporting is required before committing to this approach.
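
The eager string-to-index conversion proposed in this issue amounts to interning. A minimal sketch, with hypothetical names (interner, id, name) and no claim about how the SDK would structure it:

```go
package main

import "fmt"

// interner assigns each distinct string a small integer index on first sight.
type interner struct {
	index map[string]int
	names []string
}

func newInterner() *interner { return &interner{index: map[string]int{}} }

// id returns the local index for s, allocating one if needed.
func (in *interner) id(s string) int {
	if i, ok := in.index[s]; ok {
		return i
	}
	i := len(in.names)
	in.index[s] = i
	in.names = append(in.names, s)
	return i
}

// name converts an index back to the original string, e.g. when reporting
// metrics over the wire.
func (in *interner) name(i int) string { return in.names[i] }

func main() {
	in := newInterner()
	a := in.id("ptransform-1")
	b := in.id("ptransform-2")
	fmt.Println(a, b, in.id("ptransform-1"), in.name(b))
}
```

After interning, hot-path lookups can index a slice by integer instead of hashing a string each time; whether that pays off is exactly what the issue says must be measured first.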





[jira] [Updated] (BEAM-9167) Reduce overhead of Go SDK side metrics

2020-01-21 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-9167:
---
Description: 
Locking overhead due to the global store and local caches of SDK counter data 
can dominate certain workloads, which means we can do better.

Instead of having a global store of metrics data to extract counters, we should 
use per ptransform (or per bundle) counter sets, which would avoid requiring 
locking per counter operation. The main detriment compared to the current 
implementation is that a user would need to add their own locking if they were 
to spawn multiple goroutines to process a Bundle's work in a DoFn.

Given that self multithreaded DoFns aren't recommended/safe in Java,  largely 
impossible in Python, and the other beam Go SDK provided constructs (like 
Iterators and Emitters) are not thread safe, this is a small concern, provided 
the documentation is clear on this.

Removing the locking and switching to atomic ops reduces the overhead 
significantly in example jobs and in the benchmarks.

A second part of this change should be to move the exec package to manage its 
own per bundle state, rather than relying on a global datastore to extract the 
per bundle, per ptransform values.

Related: https://issues.apache.org/jira/browse/BEAM-6541 

  was:
Locking overhead due to the global store and local caches of SDK counter data 
can dominate certain workloads, which means we can do better.

Instead of having a global store of metrics data to extract counters, we should 
use per ptransform (or per bundle) counter sets, which would avoid requiring 
locking per counter operation. The main detriment compared to the current 
implementation is that a user would need to add their own locking if they were 
to spawn multiple goroutines to process a Bundle's work in a DoFn.

Given that self multithreaded DoFns aren't recommended/safe in Java,  largely 
impossible in Python, and the other beam Go SDK provided constructs (like 
Iterators and Emitters) are not thread safe, this is a small concern, provided 
the documentation is clear on this.

Removing the locking and switching to atomic ops reduces the overhead 
significantly in example jobs and in the benchmarks.

Related: https://issues.apache.org/jira/browse/BEAM-6541 


> Reduce overhead of Go SDK side metrics
> --
>
> Key: BEAM-9167
> URL: https://issues.apache.org/jira/browse/BEAM-9167
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: Major
>
> Locking overhead due to the global store and local caches of SDK counter data 
> can dominate certain workloads, which means we can do better.
> Instead of having a global store of metrics data to extract counters, we 
> should use per ptransform (or per bundle) counter sets, which would avoid 
> requiring locking per counter operation. The main detriment compared to the 
> current implementation is that a user would need to add their own locking if 
> they were to spawn multiple goroutines to process a Bundle's work in a DoFn.
> Given that self multithreaded DoFns aren't recommended/safe in Java,  largely 
> impossible in Python, and the other beam Go SDK provided constructs (like 
> Iterators and Emitters) are not thread safe, this is a small concern, 
> provided the documentation is clear on this.
> Removing the locking and switching to atomic ops reduces the overhead 
> significantly in example jobs and in the benchmarks.
> A second part of this change should be to move the exec package to manage 
> its own per bundle state, rather than relying on a global datastore to 
> extract the per bundle, per ptransform values.
> Related: https://issues.apache.org/jira/browse/BEAM-6541 





[jira] [Assigned] (BEAM-9167) Reduce overhead of Go SDK side metrics

2020-01-21 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke reassigned BEAM-9167:
--

Assignee: Robert Burke

> Reduce overhead of Go SDK side metrics
> --
>
> Key: BEAM-9167
> URL: https://issues.apache.org/jira/browse/BEAM-9167
> Project: Beam
> Issue Type: Improvement
> Components: sdk-go
> Reporter: Robert Burke
> Assignee: Robert Burke
> Priority: Major
>
> Locking overhead due to the global store and local caches of SDK counter data 
> can dominate certain workloads, which means we can do better.
> Instead of having a global store of metrics data to extract counters, we 
> should use per ptransform (or per bundle) counter sets, which would avoid 
> requiring locking per counter operation. The main detriment compared to the 
> current implementation is that a user would need to add their own locking if 
> they were to spawn multiple goroutines to process a Bundle's work in a DoFn.
> Given that self-multithreaded DoFns aren't recommended/safe in Java, are largely
> impossible in Python, and the other Beam Go SDK-provided constructs (like
> Iterators and Emitters) are not thread-safe, this is a small concern,
> provided the documentation is clear on this.
> Removing the locking and switching to atomic ops reduces the overhead 
> significantly in example jobs and in the benchmarks.
> Related: https://issues.apache.org/jira/browse/BEAM-6541 





[jira] [Created] (BEAM-9167) Reduce overhead of Go SDK side metrics

2020-01-21 Thread Robert Burke (Jira)
Robert Burke created BEAM-9167:
--

 Summary: Reduce overhead of Go SDK side metrics
 Key: BEAM-9167
 URL: https://issues.apache.org/jira/browse/BEAM-9167
 Project: Beam
  Issue Type: Improvement
  Components: sdk-go
Reporter: Robert Burke


Locking overhead due to the global store and local caches of SDK counter data 
can dominate certain workloads, which means we can do better.

Instead of having a global store of metrics data to extract counters, we should 
use per ptransform (or per bundle) counter sets, which would avoid requiring 
locking per counter operation. The main detriment compared to the current 
implementation is that a user would need to add their own locking if they were 
to spawn multiple goroutines to process a Bundle's work in a DoFn.

Given that self-multithreaded DoFns aren't recommended/safe in Java, are largely
impossible in Python, and the other Beam Go SDK-provided constructs (like
Iterators and Emitters) are not thread-safe, this is a small concern, provided
the documentation is clear on this.

Removing the locking and switching to atomic ops reduces the overhead 
significantly in example jobs and in the benchmarks.

Related: https://issues.apache.org/jira/browse/BEAM-6541 




