[jira] [Created] (BEAM-10206) [Go SDK] Jenkins static checks
Robert Burke created BEAM-10206:
---

Summary: [Go SDK] Jenkins static checks
Key: BEAM-10206
URL: https://issues.apache.org/jira/browse/BEAM-10206
Project: Beam
Issue Type: Improvement
Components: sdk-go
Reporter: Robert Burke

We should probably hook up staticcheck (https://staticcheck.io/) to avoid style and lint regressions, and also run it locally and fix most of what it finds.

Any additional configuration we integrate should probably take into account the proto import conventions we've established (see PR 11927) so that we use consistent short names.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10206) [Go SDK] Jenkins static checks
[ https://issues.apache.org/jira/browse/BEAM-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Burke updated BEAM-10206:
    Status: Open (was: Triage Needed)

> [Go SDK] Jenkins static checks
>
> Key: BEAM-10206
> URL: https://issues.apache.org/jira/browse/BEAM-10206
> Project: Beam
> Issue Type: Improvement
> Components: sdk-go
> Reporter: Robert Burke
> Priority: P3
[jira] [Comment Edited] (BEAM-10169) ParDo* functions should declare the correct output N in their error message
[ https://issues.apache.org/jira/browse/BEAM-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126926#comment-17126926 ]

Robert Burke edited comment on BEAM-10169 at 6/5/20, 4:23 PM:
--

Given that a DoFn will only be suitable for one of the ParDo* methods, we're probably better off being less polite. In Go there's also a strong got-before-want convention, at least for test outputs. Further, while we're panicking, there's no reason we shouldn't clearly indicate the context of the error.

So I was thinking:

    DoFn {doFnName} has {numOutputs} outputs, but ParDo{parDoNum} requires {parDoNum}. Use ParDo{numOutputs} instead.

Of course there are also edge cases to consider. E.g. it should just print ParDo instead of ParDo1, and if there are more than 7 outputs, it should recommend ParDoN instead.

Having the doFnName helps localize which DoFn is being used wrong, and the panic trace will hopefully make the caller's line number unambiguous. It also helps when a user has ParDo called indirectly. This isn't perfect for all problems, but I think it covers most of them.

What do you think of that? [~codeBehindMe]

> ParDo* functions should declare the correct output N in their error message
>
> Key: BEAM-10169
> URL: https://issues.apache.org/jira/browse/BEAM-10169
> Project: Beam
> Issue Type: Improvement
> Components: sdk-go
> Reporter: Robert Burke
> Assignee: Aaron Tillekeratne
> Priority: P3
> Labels: noob, starter
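The message format proposed in the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions: `parDoName` and `outputMismatchMsg` are hypothetical helper names, not functions in the Beam Go SDK, and the actual change would have to be made in pardo.go.

```go
package main

import "fmt"

// parDoName returns the ParDo variant to suggest for n outputs.
// Hypothetical helper; the naming rules follow the comment:
// ParDo0, ParDo (not ParDo1), ParDo2..ParDo7, otherwise ParDoN.
func parDoName(n int) string {
	switch {
	case n == 1:
		return "ParDo"
	case n >= 0 && n <= 7:
		return fmt.Sprintf("ParDo%d", n)
	default:
		return "ParDoN"
	}
}

// outputMismatchMsg builds the proposed message, with what the DoFn has
// first and what the called ParDo variant requires second.
// Hypothetical helper, not part of the SDK.
func outputMismatchMsg(doFnName string, numOutputs, parDoNum int) string {
	return fmt.Sprintf("DoFn %s has %d outputs, but %s requires %d. Use %s instead.",
		doFnName, numOutputs, parDoName(parDoNum), parDoNum, parDoName(numOutputs))
}

func main() {
	// A DoFn with no outputs passed to beam.ParDo, as in the user report.
	fmt.Println(outputMismatchMsg("main.countFn", 0, 1))
}
```

The DoFn name `main.countFn` is made up for the example.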
[jira] [Updated] (BEAM-9615) [Go SDK] Beam Schemas
[ https://issues.apache.org/jira/browse/BEAM-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Burke updated BEAM-9615:
---
    Labels: (was: stale-assigned)

> [Go SDK] Beam Schemas
>
> Key: BEAM-9615
> URL: https://issues.apache.org/jira/browse/BEAM-9615
> Project: Beam
> Issue Type: New Feature
> Components: sdk-go
> Reporter: Robert Burke
> Assignee: Robert Burke
> Priority: P2
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Schema support is required for advanced cross-language features in Beam, and has the opportunity to replace the current default JSON encoding of elements.
> Some quick notes, though a better fleshed-out doc with details will be forthcoming:
> * All base coders should be implemented, and listed as coder capabilities. I think only stringutf8 is missing presently.
> * Should support fairly arbitrary user types, seamlessly. That is, users should be able to rely on it "just working" if their type is compatible.
> * Should support schema metadata tagging.
> In particular, one breaking shift in the default will be to explicitly fail pipelines if elements have unexported fields, when no other custom coder has been added. This has been a source of errors, dropped data, and dropped keys, and a simple warning at construction time won't cut it. However, we could provide a manual "use beam schemas, but ignore unexported fields" registration as a workaround.
> Edit: Doc is now at https://s.apache.org/beam-go-schemas
[jira] [Commented] (BEAM-9615) [Go SDK] Beam Schemas
[ https://issues.apache.org/jira/browse/BEAM-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126280#comment-17126280 ]

Robert Burke commented on BEAM-9615:

State of the world slowed down progress on this, but I'm now rolling out PRs for review.

> [Go SDK] Beam Schemas
>
> Key: BEAM-9615
> URL: https://issues.apache.org/jira/browse/BEAM-9615
> Project: Beam
> Issue Type: New Feature
> Components: sdk-go
> Reporter: Robert Burke
> Assignee: Robert Burke
> Priority: P2
> Time Spent: 40m
> Remaining Estimate: 0h
[jira] [Updated] (BEAM-10169) ParDo* functions should declare the correct output N in their error message
[ https://issues.apache.org/jira/browse/BEAM-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Burke updated BEAM-10169:
    Status: Open (was: Triage Needed)

> ParDo* functions should declare the correct output N in their error message
>
> Key: BEAM-10169
> URL: https://issues.apache.org/jira/browse/BEAM-10169
> Project: Beam
> Issue Type: Improvement
> Components: sdk-go
> Reporter: Robert Burke
> Assignee: Aaron Tillekeratne
> Priority: P3
> Labels: noob, starter
[jira] [Created] (BEAM-10169) ParDo* functions should declare the correct output N in their error message
Robert Burke created BEAM-10169:
---

Summary: ParDo* functions should declare the correct output N in their error message
Key: BEAM-10169
URL: https://issues.apache.org/jira/browse/BEAM-10169
Project: Beam
Issue Type: Improvement
Components: sdk-go
Reporter: Robert Burke

User report noted the confusion in the error if you use a DoFn with 0 outputs with beam.ParDo instead of beam.ParDo0.

In that case, a panic stack trace is followed by the cryptic: "expected 1 output. Found: []"

We can do better.

While we can't change the return signature dynamically (that's for ParDoN only), we can instead clearly indicate:
* the DoFn in question.
* the number of outputs the DoFn has
* and recommend using ParDo0, ParDo, ParDo2,...ParDo7, or ParDoN, as appropriate.

https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/pardo.go#L361 would need to change as well as any of the specific cases that follow.
[jira] [Updated] (BEAM-10166) Improve execution time errors
[ https://issues.apache.org/jira/browse/BEAM-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Burke updated BEAM-10166:
    Labels: beginner n00b starter (was: )

> Improve execution time errors
>
> Key: BEAM-10166
> URL: https://issues.apache.org/jira/browse/BEAM-10166
> Project: Beam
> Issue Type: Task
> Components: sdk-go
> Reporter: Robert Burke
> Priority: P2
> Labels: beginner, n00b, starter
[jira] [Updated] (BEAM-10166) Improve execution time errors
[ https://issues.apache.org/jira/browse/BEAM-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Burke updated BEAM-10166:
    Issue Type: Improvement (was: Task)

> Improve execution time errors
>
> Key: BEAM-10166
> URL: https://issues.apache.org/jira/browse/BEAM-10166
> Project: Beam
> Issue Type: Improvement
> Components: sdk-go
> Reporter: Robert Burke
> Priority: P2
> Labels: beginner, n00b, starter
[jira] [Updated] (BEAM-10166) Improve execution time errors
[ https://issues.apache.org/jira/browse/BEAM-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Burke updated BEAM-10166:
    Priority: P3 (was: P2)

> Improve execution time errors
>
> Key: BEAM-10166
> URL: https://issues.apache.org/jira/browse/BEAM-10166
> Project: Beam
> Issue Type: Improvement
> Components: sdk-go
> Reporter: Robert Burke
> Priority: P3
> Labels: beginner, n00b, starter
[jira] [Updated] (BEAM-10166) Improve execution time errors
[ https://issues.apache.org/jira/browse/BEAM-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Burke updated BEAM-10166:
    Status: Open (was: Triage Needed)

> Improve execution time errors
>
> Key: BEAM-10166
> URL: https://issues.apache.org/jira/browse/BEAM-10166
> Project: Beam
> Issue Type: Task
> Components: sdk-go
> Reporter: Robert Burke
> Priority: P2
> Labels: beginner, n00b, starter
[jira] [Created] (BEAM-10166) Improve execution time errors
Robert Burke created BEAM-10166:
---

Summary: Improve execution time errors
Key: BEAM-10166
URL: https://issues.apache.org/jira/browse/BEAM-10166
Project: Beam
Issue Type: Task
Components: sdk-go
Reporter: Robert Burke

The Go SDK uses errors returned by DoFns to signal failures to process bundles, and to terminate bundle processing. However, if the preceding DoFn uses emitters rather than error returns, the code has no choice but to panic, to avoid user code handling or ignoring the cross-DoFn error (which could cause data loss or other correctness problems).

All bundle executions are wrapped in `callNoPanic` to prevent worker termination on such panics, and to terminate just the affected bundle in an orderly way instead. `callNoPanic` uses Go's built-in recover mechanism to get the error and provide a stack trace.

We can do better.

The value returned by recover is just an interface{}, which means we could detect the specific type of error it is. In particular, the exec package could define an error type that we can detect. If the recovered value is that error, we could use it to provide a clearer error message than a panic stack trace.

Such an error wrapper would contain: the error in question, the user DoFn that caused it, and the debug id of the DoFn node (to be related back to the plan).

Then in `callNoPanic` we could detect this error wrapper and produce a clearer error message based on the existing plan. If the recovered value is not the wrapper, we can maintain the current behavior. This latter part is necessary to handle panics originating in user code.

To avoid mistaken user use which would breach this protocol, we're best off keeping the wrapper unexported from the exec package.
[jira] [Resolved] (BEAM-9789) Locking error in harness.go
[ https://issues.apache.org/jira/browse/BEAM-9789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Burke resolved BEAM-9789.
    Resolution: Fixed

> Locking error in harness.go
>
> Key: BEAM-9789
> URL: https://issues.apache.org/jira/browse/BEAM-9789
> Project: Beam
> Issue Type: Bug
> Components: sdk-go
> Reporter: Robert Burke
> Assignee: Robert Burke
> Priority: P2
> Fix For: Not applicable
> Time Spent: 50m
> Remaining Estimate: 0h
>
> When there's an error on lookup or construction of an execution plan, the lock is accidentally left held, causing the worker to freeze.
> This shouldn't be user affecting, as most plans and lookups succeed without error, but a transient gRPC issue on lookup might cause an otherwise healthy worker to deadlock.
[jira] [Resolved] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container
[ https://issues.apache.org/jira/browse/BEAM-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Burke resolved BEAM-9815.
    Fix Version/s: Not applicable
    Resolution: Fixed

Dataflow's portable artifact service was updated, so Go Dataflow PostCommits are green again.

> beam_PostCommit_Go perma red due to failing to start container
>
> Key: BEAM-9815
> URL: https://issues.apache.org/jira/browse/BEAM-9815
> Project: Beam
> Issue Type: Bug
> Components: sdk-go, test-failures
> Reporter: Chamikara Madhusanka Jayalath
> Assignee: Robert Bradshaw
> Priority: P1
> Labels: currently-failing
> Fix For: Not applicable
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> For example,
> [https://builds.apache.org/job/beam_PostCommit_Go/6847/]
> [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z]
[jira] [Updated] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container
[ https://issues.apache.org/jira/browse/BEAM-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Burke updated BEAM-9815:
---
    Labels: (was: currently-failing)

> beam_PostCommit_Go perma red due to failing to start container
>
> Key: BEAM-9815
> URL: https://issues.apache.org/jira/browse/BEAM-9815
> Project: Beam
> Issue Type: Bug
> Components: sdk-go, test-failures
> Reporter: Chamikara Madhusanka Jayalath
> Assignee: Robert Bradshaw
> Priority: P1
> Fix For: Not applicable
> Time Spent: 2h 40m
> Remaining Estimate: 0h
[jira] [Created] (BEAM-10110) Populate pipeline_proto_coder_id field for dataflow.
Robert Burke created BEAM-10110: --- Summary: Populate pipeline_proto_coder_id field for dataflow. Key: BEAM-10110 URL: https://issues.apache.org/jira/browse/BEAM-10110 Project: Beam Issue Type: Task Components: runner-dataflow, sdk-go Reporter: Robert Burke Assignee: Robert Burke Dataflow doesn't yet translate natively from the Beam Pipeline Proto; instead it requires SDKs to translate the graph into its own format. Adding this hint for custom coders (coders not known to Dataflow/Beam) avoids having Dataflow re-synthesize coders from its format back to the pipeline proto. Currently there's an awkward restriction on which coders should receive the ID, rather than having the SDK apply the field to all of them, but this is a good first step. This restriction may be lifted in a subsequent Dataflow release. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container
[ https://issues.apache.org/jira/browse/BEAM-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117009#comment-17117009 ] Robert Burke commented on BEAM-9815: This seems to be resolved since Dataflow updated. > beam_PostCommit_Go perma red due to failing to start container > -- > > Key: BEAM-9815 > URL: https://issues.apache.org/jira/browse/BEAM-9815 > Project: Beam > Issue Type: Bug > Components: sdk-go, test-failures >Reporter: Chamikara Madhusanka Jayalath >Assignee: Robert Bradshaw >Priority: P1 > Labels: currently-failing > Time Spent: 2h 40m > Remaining Estimate: 0h > > For example, > [https://builds.apache.org/job/beam_PostCommit_Go/6847/] > [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9679) Core Transforms | Go SDK Code Katas
[ https://issues.apache.org/jira/browse/BEAM-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke updated BEAM-9679: --- Description: A kata devoted to core beam transforms patterns after [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms] where the take away is an individual's ability to master the following using an Apache Beam pipeline using the Golang SDK. ||Transform||Pull Request||Status|| |Map|[11564|https://github.com/apache/beam/pull/11564]|Closed| |GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed| |CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Open| |Combine| | | |Flatten|[11806|https://github.com/apache/beam/pull/11806]| | |Partition| | | |Side Input| | | |Side Output| | | |Branching| | | |Composite Transform| | | |DoFn Additional Parameters| | | was: A kata devoted to core beam transforms patterns after [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms] where the take away is an individual's ability to master the following using an Apache Beam pipeline using the Golang SDK. 
||Transform||Pull Request||Status|| |Map|[11564|https://github.com/apache/beam/pull/11564]|Closed| |GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed| |CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Open| |Combine| | | |Flatten|[11806|https://github.com/apache/beam/pull/11806]| | | |Partition| | | |Side Input| | | |Side Output| | | |Branching| | | |Composite Transform| | | |DoFn Additional Parameters| | | > Core Transforms | Go SDK Code Katas > --- > > Key: BEAM-9679 > URL: https://issues.apache.org/jira/browse/BEAM-9679 > Project: Beam > Issue Type: Sub-task > Components: katas, sdk-go >Reporter: Damon Douglas >Assignee: Damon Douglas >Priority: P2 > Time Spent: 2h 20m > Remaining Estimate: 0h > > A kata devoted to core beam transforms patterns after > [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms] > where the take away is an individual's ability to master the following using > an Apache Beam pipeline using the Golang SDK. > > ||Transform||Pull Request||Status|| > |Map|[11564|https://github.com/apache/beam/pull/11564]|Closed| > |GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed| > |CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Open| > |Combine| | | > |Flatten|[11806|https://github.com/apache/beam/pull/11806]| | > |Partition| | | > |Side Input| | | > |Side Output| | | > |Branching| | | > |Composite Transform| | | > |DoFn Additional Parameters| | | -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9679) Core Transforms | Go SDK Code Katas
[ https://issues.apache.org/jira/browse/BEAM-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke updated BEAM-9679: --- Description: A kata devoted to core beam transforms patterns after [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms] where the take away is an individual's ability to master the following using an Apache Beam pipeline using the Golang SDK. ||Transform||Pull Request||Status|| |Map|[11564|https://github.com/apache/beam/pull/11564]|Closed| |GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed| |CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Open| |Combine| | | |Flatten|[11806|https://github.com/apache/beam/pull/11806]| | | |Partition| | | |Side Input| | | |Side Output| | | |Branching| | | |Composite Transform| | | |DoFn Additional Parameters| | | was: A kata devoted to core beam transforms patterns after [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms] where the take away is an individual's ability to master the following using an Apache Beam pipeline using the Golang SDK. 
||Transform||Pull Request||Status|| |Map|[11564|https://github.com/apache/beam/pull/11564]|Closed| |GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed| |CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Open| |Combine| | | |Flatten| | | |Partition| | | |Side Input| | | |Side Output| | | |Branching| | | |Composite Transform| | | |DoFn Additional Parameters| | | > Core Transforms | Go SDK Code Katas > --- > > Key: BEAM-9679 > URL: https://issues.apache.org/jira/browse/BEAM-9679 > Project: Beam > Issue Type: Sub-task > Components: katas, sdk-go >Reporter: Damon Douglas >Assignee: Damon Douglas >Priority: P2 > Time Spent: 2h 20m > Remaining Estimate: 0h > > A kata devoted to core beam transforms patterns after > [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms] > where the take away is an individual's ability to master the following using > an Apache Beam pipeline using the Golang SDK. > > ||Transform||Pull Request||Status|| > |Map|[11564|https://github.com/apache/beam/pull/11564]|Closed| > |GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed| > |CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Open| > |Combine| | | > |Flatten|[11806|https://github.com/apache/beam/pull/11806]| | | > |Partition| | | > |Side Input| | | > |Side Output| | | > |Branching| | | > |Composite Transform| | | > |DoFn Additional Parameters| | | -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-10051) Misordered check WRT closed data readers.
[ https://issues.apache.org/jira/browse/BEAM-10051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke reassigned BEAM-10051: --- Assignee: Robert Burke > Misordered check WRT closed data readers. > - > > Key: BEAM-10051 > URL: https://issues.apache.org/jira/browse/BEAM-10051 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: P2 > Time Spent: 20m > Remaining Estimate: 0h > > This check > https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/harness/datamgr.go#L269 > in its current position prevents the "normal teardown" that the reader > expects. This means that readers for instructions that terminate early, such > as due to splitting, stay resident in memory and never close. > In practice this is benign as the buffer would already be closed, but with > streaming this memory leak would become noticeable. > The fix is to move the check to after the sentinel check, and additionally > check there for early termination to avoid closing the buffer twice. -- This message was sent by Atlassian Jira (v8.3.4#803005)
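A minimal sketch of the reordering this issue proposes, to make the intended control flow concrete. Everything here is an assumption for illustration: `reader`, `msg`, `handle`, and `terminateEarly` are hypothetical names and do not mirror the actual types in datamgr.go.

```go
package main

import "fmt"

// reader is a hypothetical stand-in for a data reader; it is not the
// harness's real type.
type reader struct {
	buf       []string // payloads delivered so far
	completed bool     // true once the instruction terminated early, e.g. after a split
	closes    int      // number of times the buffer was closed; must never exceed 1
}

// terminateEarly models an early stop: the buffer is closed right away,
// but the reader stays registered until the sentinel arrives.
func (r *reader) terminateEarly() {
	r.completed = true
	r.closes++
}

// msg is a hypothetical element message; last marks the end-of-stream sentinel.
type msg struct {
	id   string
	data string
	last bool
}

// handle applies the proposed ordering: sentinel check first, then the
// early-termination check.
func handle(readers map[string]*reader, m msg) {
	r, ok := readers[m.id]
	if !ok {
		return
	}
	if m.last {
		// Normal teardown always unregisters the reader. The buffer is
		// closed here only if early termination has not closed it already.
		if !r.completed {
			r.closes++
		}
		delete(readers, m.id)
		return
	}
	if r.completed {
		return // drop data for a reader that already terminated early
	}
	r.buf = append(r.buf, m.data)
}

func main() {
	r := &reader{}
	readers := map[string]*reader{"inst-1": r}
	handle(readers, msg{id: "inst-1", data: "x"})
	r.terminateEarly()
	handle(readers, msg{id: "inst-1", data: "y"})  // dropped
	handle(readers, msg{id: "inst-1", last: true}) // reader is released
	fmt.Println(len(readers), r.closes, r.buf)
}
```

If the `completed` check ran before the sentinel check instead, the sentinel message would be dropped along with the data and the reader would never leave the map, which is the leak described in the issue.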
[jira] [Created] (BEAM-10056) Side Input Validation too tight, doesn't allow CoGBK
Robert Burke created BEAM-10056: --- Summary: Side Input Validation too tight, doesn't allow CoGBK Key: BEAM-10056 URL: https://issues.apache.org/jira/browse/BEAM-10056 Project: Beam Issue Type: Bug Components: sdk-go Reporter: Robert Burke Assignee: Robert Burke

The following doesn't pass validation, though it should, as it's a valid signature for ParDo accepting a PCollection>

func (fn *writer) StartBundle(ctx context.Context) error
func (fn *writer) ProcessElement(ctx context.Context, key string, iter1, iter2 func(**clientHistory) bool)
func (fn *writer) FinishBundle(ctx context.Context)

It returns an error:

Missing side inputs in the StartBundle method of a DoFn. If side inputs are present in ProcessElement those side inputs must also be present in StartBundle. Full error: inserting ParDo in scope root: graph.AsDoFn: for Fn named <...pii...>/userpackage.writer: side inputs expected in method StartBundle [recovered] panic: Missing side inputs in the StartBundle method of a DoFn. If side inputs are present in ProcessElement those side inputs must also be present in StartBundle. Full error: inserting ParDo in scope root: graph.AsDoFn: for Fn named <...pii...>/userpackage.writer: side inputs expected in method StartBundle

This is happening in the input-unaware validation, which means that validation needs to be loosened, and the check performed elsewhere. https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/graph/fn.go#L527

There are "sibling" cases for the DoFn signature

func (fn *writer) StartBundle(ctx context.Context, side func(**clientHistory) bool) error
func (fn *writer) ProcessElement(ctx context.Context, key string, iter, side func(**clientHistory) bool)
func (fn *writer) FinishBundle(ctx context.Context, side func(**clientHistory) bool)

and

func (fn *writer) StartBundle(ctx context.Context, side1, side2 func(**clientHistory) bool) error
func (fn *writer) ProcessElement(ctx context.Context, key string, side1, side2 func(**clientHistory) bool)
func (fn *writer) FinishBundle(ctx context.Context, side1, side2 func(**clientHistory) bool)

These would be for > with <*clientHistory> on the side, and with <*clientHistory> and <*clientHistory> on the side respectively. Which case applies can only be fully determined with the input, and a mismatch should produce a clear error when PCollection binding is occurring. -- This message was sent by Atlassian Jira (v8.3.4#803005)
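A rough sketch of why the three signatures above can only be told apart once the input is known. It assumes a hypothetical input-aware check, `checkSideInputs`, which is told how many grouped value iterators the bound main input supplies; `countIters`, `iter`, and the free functions standing in for the DoFn methods are likewise made up for illustration and are not the SDK's validation code.

```go
package main

import (
	"context"
	"fmt"
	"reflect"
)

type clientHistory struct{}

// iter is the iterator shape used in the issue's signatures.
type iter = func(**clientHistory) bool

// countIters counts parameters shaped like func(*T) bool, the form shared by
// CoGBK value iterators and iterable side inputs.
func countIters(fn interface{}) int {
	t := reflect.TypeOf(fn)
	n := 0
	for i := 0; i < t.NumIn(); i++ {
		p := t.In(i)
		if p.Kind() == reflect.Func && p.NumIn() == 1 && p.In(0).Kind() == reflect.Ptr &&
			p.NumOut() == 1 && p.Out(0).Kind() == reflect.Bool {
			n++
		}
	}
	return n
}

// checkSideInputs is a hypothetical input-aware check. groupedStreams is the
// number of value iterators supplied by the bound main input (0 if it is not
// a grouped input). Whatever iterators remain are treated as side inputs.
func checkSideInputs(start, process interface{}, groupedStreams int) error {
	side := countIters(process) - groupedStreams
	if side < 0 {
		return fmt.Errorf("ProcessElement has %d iterators, but the input supplies %d",
			countIters(process), groupedStreams)
	}
	if got := countIters(start); got != side {
		return fmt.Errorf("StartBundle has %d side inputs, but ProcessElement has %d", got, side)
	}
	return nil
}

// Free functions standing in for the DoFn methods quoted in the issue.
func start0(ctx context.Context) error                             { return nil }
func start1(ctx context.Context, side iter) error                  { return nil }
func process2(ctx context.Context, key string, iter1, iter2 iter) {}

func main() {
	fmt.Println(checkSideInputs(start0, process2, 2)) // two grouped iterators, no side inputs
	fmt.Println(checkSideInputs(start1, process2, 1)) // one grouped iterator, one side input
	fmt.Println(checkSideInputs(start0, process2, 0)) // both iterators read as side inputs
}
```

The same `ProcessElement` signature passes or fails depending only on `groupedStreams`, which is information the input-unaware validation does not have.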
[jira] [Updated] (BEAM-10051) Misordered check WRT closed data readers.
[ https://issues.apache.org/jira/browse/BEAM-10051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke updated BEAM-10051: Description: This check https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/harness/datamgr.go#L269 in its current position prevents the "normal teardown" that the reader expects. This means that readers for instructions that terminate early, such as due to splitting, stay resident in memory and never close. In practice this is benign as the buffer would already be closed, but with streaming this memory leak would become noticeable. The fix is to move the check to after the sentinel check, and additionally check there for early termination to avoid closing the buffer twice. > Misordered check WRT closed data readers. > - > > Key: BEAM-10051 > URL: https://issues.apache.org/jira/browse/BEAM-10051 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Robert Burke >Priority: P2 > > This check > https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/harness/datamgr.go#L269 > in its current position prevents the "normal teardown" that the reader > expects. This means that readers for instructions that terminate early, such > as due to splitting, stay resident in memory and never close. > In practice this is benign as the buffer would already be closed, but with > streaming this memory leak would become noticeable. > The fix is to move the check to after the sentinel check, and additionally > check there for early termination to avoid closing the buffer twice. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-10051) Misordered check WRT closed data readers.
Robert Burke created BEAM-10051: --- Summary: Misordered check WRT closed data readers. Key: BEAM-10051 URL: https://issues.apache.org/jira/browse/BEAM-10051 Project: Beam Issue Type: Bug Components: sdk-go Reporter: Robert Burke -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10049) Add licenses to Go SDK containers
[ https://issues.apache.org/jira/browse/BEAM-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke updated BEAM-10049: Description: This will be a prerequisite to publishing Go SDK containers as part of the release again. See BEAM-9685 There's a tool to pull in dependency license information for a Go package: https://github.com/google/go-licenses And once the License file from PR https://github.com/apache/beam/pull/11657 is picked up, pkg.go.dev will also display them: https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=licenses was:This will be a prerequisite to publishing Go SDK containers as part of the release again. See BEAM-9685 > Add licenses to Go SDK containers > - > > Key: BEAM-10049 > URL: https://issues.apache.org/jira/browse/BEAM-10049 > Project: Beam > Issue Type: Improvement > Components: build-system, sdk-go >Reporter: Kyle Weaver >Priority: P2 > > This will be a prerequisite to publishing Go SDK containers as part of the > release again. See BEAM-9685 > There's a tool to pull in dependency license information for a Go package: > https://github.com/google/go-licenses > And once the License file from PR https://github.com/apache/beam/pull/11657 > is picked up, > pkg.go.dev will also display them: > https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=licenses -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9615) [Go SDK] Beam Schemas
[ https://issues.apache.org/jira/browse/BEAM-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke updated BEAM-9615: --- Description: Schema support is required for advanced cross-language features in Beam, and has the opportunity to replace the current default JSON encoding of elements. Some quick notes, though a better fleshed out doc with details will be forthcoming: * All base coders should be implemented, and listed as coder capabilities. I think only stringutf8 is missing presently. * Should support fairly arbitrary user types, seamlessly. That is, users should be able to rely on it "just working" if their type is compatible. * Should support schema metadata tagging. In particular, one breaking shift in the default will be to explicitly fail pipelines if elements have unexported fields, when no other custom coder has been added. This has been a source of errors/dropped data/keys and a simple warning at construction time won't cut it. However, we could provide a manual "use beam schemas, but ignore unexported fields" registration as a workaround. Edit: Doc is now at https://s.apache.org/beam-go-schemas was: Schema support is required for advanced cross-language features in Beam, and has the opportunity to replace the current default JSON encoding of elements. Some quick notes, though a better fleshed out doc with details will be forthcoming: * All base coders should be implemented, and listed as coder capabilities. I think only stringutf8 is missing presently. * Should support fairly arbitrary user types, seamlessly. That is, users should be able to rely on it "just working" if their type is compatible. * Should support schema metadata tagging. In particular, one breaking shift in the default will be to explicitly fail pipelines if elements have unexported fields, when no other custom coder has been added. This has been a source of errors/dropped data/keys and a simple warning at construction time won't cut it. 
However, we could provide a manual "use beam schemas, but ignore unexported fields" registration as a workaround. > [Go SDK] Beam Schemas > - > > Key: BEAM-9615 > URL: https://issues.apache.org/jira/browse/BEAM-9615 > Project: Beam > Issue Type: New Feature > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Major > > Schema support is required for advanced cross-language features in Beam, and > has the opportunity to replace the current default JSON encoding of elements. > Some quick notes, though a better fleshed out doc with details will be > forthcoming: > * All base coders should be implemented, and listed as coder capabilities. I > think only stringutf8 is missing presently. > * Should support fairly arbitrary user types, seamlessly. That is, users > should be able to rely on it "just working" if their type is compatible. > * Should support schema metadata tagging. > In particular, one breaking shift in the default will be to explicitly fail > pipelines if elements have unexported fields, when no other custom coder has > been added. This has been a source of errors/dropped data/keys and a simple > warning at construction time won't cut it. However, we could provide a manual > "use beam schemas, but ignore unexported fields" registration as a > workaround. > Edit: Doc is now at https://s.apache.org/beam-go-schemas -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-7178) Add package comment to "errors" package.
[ https://issues.apache.org/jira/browse/BEAM-7178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke resolved BEAM-7178. Fix Version/s: Not applicable Resolution: Fixed > Add package comment to "errors" package. > > > Key: BEAM-7178 > URL: https://issues.apache.org/jira/browse/BEAM-7178 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Minor > Fix For: Not applicable > > Time Spent: 0.5h > Remaining Estimate: 0h > > I forgot to add a package comment to the errors package: > [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/internal/errors/errors.go] > I should fix that. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8292) Add a Reshuffle PTransform preventing fusion of the surrounding transforms
[ https://issues.apache.org/jira/browse/BEAM-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke resolved BEAM-8292. Fix Version/s: Not applicable Resolution: Fixed > Add a Reshuffle PTransform preventing fusion of the surrounding transforms > -- > > Key: BEAM-8292 > URL: https://issues.apache.org/jira/browse/BEAM-8292 > Project: Beam > Issue Type: New Feature > Components: sdk-go >Reporter: John Patoch >Assignee: Robert Burke >Priority: Minor > Fix For: Not applicable > > Time Spent: 4h > Remaining Estimate: 0h > > Reshuffle is a PTransform that takes a PCollection and shuffles the data > to help increase parallelism. > Reshuffle adds a temporary random key to each element, performs a > GroupByKey, and finally removes the temporary key. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9982) Replace graphx.MustMarshal with protox.MustEncode
[ https://issues.apache.org/jira/browse/BEAM-9982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke updated BEAM-9982: --- Description: A redundant helper function, [graphx.MustMarshal|https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L117], was accidentally introduced recently. There exists an identical function in a different package that was already being used in that same file, [protox.MustEncode|https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/util/protox/protox.go#L22] This task is to remove all instances of the graphx.MustMarshal function and replace them with the protox.MustEncode call instead. was: A redundant helper function, [graphx.MustMarshal][https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L117], was accidentally introduced recently. There exists an identical function in a different package that was already being used in that same file, [protox.MustEncode][https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/util/protox/protox.go#L22] This task is to remove all instances of the graphx.MustMarshal function and replace them with the protox.MustEncode call instead. > Replace graphx.MustMarshal with protox.MustEncode > -- > > Key: BEAM-9982 > URL: https://issues.apache.org/jira/browse/BEAM-9982 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Minor > > A redundant helper function, > [graphx.MustMarshal|https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L117], > was accidentally introduced recently. 
There exists an identical function in > a different package that was already being used in that same file, > [protox.MustEncode|https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/util/protox/protox.go#L22] > This task is to remove all instances of the graphx.MustMarshal function and > replace them with the protox.MustEncode call instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9982) Replace graphx.MustMarshal with protox.MustEncode
[ https://issues.apache.org/jira/browse/BEAM-9982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke updated BEAM-9982: --- Description: A redundant helper function, [graphx.MustMarshal][https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L117], was accidentally introduced recently. There exists an identical function in a different package that was already being used in that same file, [protox.MustEncode][https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/util/protox/protox.go#L22] This task is to remove all instances of the graphx.MustMarshal function and replace them with the protox.MustEncode call instead. was: A redundant helper function, [graphx.MustMarshal](https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L117), was accidentally introduced recently. There exists an identical function in a different package that was already being used in that same file, [protox.MustEncode](https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/util/protox/protox.go#L22) This task is to remove all instances of the graphx.MustMarshal function and replace them with the protox.MustEncode call instead. > Replace graphx.MustMarshal with protox.MustEncode > -- > > Key: BEAM-9982 > URL: https://issues.apache.org/jira/browse/BEAM-9982 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Minor > > A redundant helper function, > [graphx.MustMarshal][https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L117], > was accidentally introduced recently. 
There exists an identical function in > a different package that was already being used in that same file, > [protox.MustEncode][https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/util/protox/protox.go#L22] > This task is to remove all instances of the graphx.MustMarshal function and > replace them with the protox.MustEncode call instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9982) Replace graphx.MustMarshal with protox.MustEncode
Robert Burke created BEAM-9982: -- Summary: Replace graphx.MustMarshal with protox.MustEncode Key: BEAM-9982 URL: https://issues.apache.org/jira/browse/BEAM-9982 Project: Beam Issue Type: Bug Components: sdk-go Reporter: Robert Burke Assignee: Robert Burke A redundant helper function, [graphx.MustMarshal](https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L117), was accidentally introduced recently. There exists an identical function in a different package that was already being used in that same file, [protox.MustEncode](https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/util/protox/protox.go#L22) This task is to remove all instances of the graphx.MustMarshal function and replace them with the protox.MustEncode call instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9959) Mistakes Computing Composite Inputs and Outputs
[ https://issues.apache.org/jira/browse/BEAM-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105750#comment-17105750 ] Robert Burke commented on BEAM-9959: The right overall fix is to check for cycles among the composites after the topological sort, and to report that there's a cycle involving the *composite* node represented by the scope. Anything short of the full cycle is much harder to debug. Further, the individual PTransforms involved should be fully qualified with their composite parent hierarchies to make it easier to find where they come from. The error should also recommend either merging the two scopes (or similar), or moving the new scope objects into their own functions, with one scope per function, which makes the bad construction impossible. > Mistakes Computing Composite Inputs and Outputs > --- > > Key: BEAM-9959 > URL: https://issues.apache.org/jira/browse/BEAM-9959 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Major > > The Go SDK uses a Scope object to manage beam Composites. > A bug was discovered when consuming a PCollection in both the composite that > created it, and in a separate composite. > Further, the Go SDK should verify that the root hypergraph structure is a DAG > and provides a reasonable error. In particular, the leaf nodes of the graph > could form a DAG, but due to how the beam.Scope object is used, might cause > the hypergraph to not be a DAG. > Eg. It's possible to write the following in the Go SDK. > PTransforms A, B, C and PCollections colA, colB, and Composites a, b. > A and C are in a, and B are in b. > A generates colA > B consumes colA, and generates colB. > C consumes colA and colB. 
> ``` > a := s.Scope(a) > b := s.Scope(b) > colA := beam.Impulse(*a*) > colB := beam.ParDo(*b*, , colA) > beam.ParDo0(*a*, , colA, beam.SideInput{colB}) > ``` > If it doesn't already, the Go SDK must emit a clear error, and fail pipeline > construction. > If the affected composites are roots in the graph, the cycle prevents being > able to topologically sort the root ptransforms for the pipeline graph, which > can adversely affect runners. > The recommendation is always to wrap uses of scope in functions or other > scopes to prevent such incorrect constructions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
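A sketch of the composite-level cycle check suggested in the comment, using the A/B/C example from the description. It assumes a simplified, hypothetical `transform` record (leaf name, enclosing composite, consumed and produced PCollections); `compositeEdges` and `findCycle` are illustrative names and not part of the SDK's graph package.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// transform is a hypothetical, simplified stand-in for a leaf PTransform.
type transform struct {
	name      string
	composite string   // enclosing composite (scope)
	inputs    []string // consumed PCollections
	outputs   []string // produced PCollections
}

// compositeEdges lifts leaf data flow to composites: an edge X -> Y exists
// when a transform in Y consumes a PCollection produced in X (X != Y).
func compositeEdges(ts []transform) map[string][]string {
	producer := map[string]string{}
	for _, t := range ts {
		for _, o := range t.outputs {
			producer[o] = t.composite
		}
	}
	seen := map[[2]string]bool{}
	edges := map[string][]string{}
	for _, t := range ts {
		if _, ok := edges[t.composite]; !ok {
			edges[t.composite] = nil // register composites without outgoing edges
		}
		for _, in := range t.inputs {
			from, ok := producer[in]
			if !ok || from == t.composite {
				continue
			}
			if e := [2]string{from, t.composite}; !seen[e] {
				seen[e] = true
				edges[from] = append(edges[from], t.composite)
			}
		}
	}
	return edges
}

// findCycle returns one full cycle (first node repeated at the end), or nil
// if the composite graph is a DAG. It is a depth-first search with colors.
func findCycle(edges map[string][]string) []string {
	const (
		white = iota // not visited
		grey         // on the current DFS path
		black        // fully explored
	)
	state := map[string]int{}
	var stack []string
	var visit func(n string) []string
	visit = func(n string) []string {
		state[n] = grey
		stack = append(stack, n)
		for _, m := range edges[n] {
			switch state[m] {
			case grey: // back edge: the cycle is the path from m to n, then m again
				for i, s := range stack {
					if s == m {
						return append(append([]string{}, stack[i:]...), m)
					}
				}
			case white:
				if c := visit(m); c != nil {
					return c
				}
			}
		}
		stack = stack[:len(stack)-1]
		state[n] = black
		return nil
	}
	nodes := make([]string, 0, len(edges))
	for n := range edges {
		nodes = append(nodes, n)
	}
	sort.Strings(nodes) // deterministic starting order
	for _, n := range nodes {
		if state[n] == white {
			if c := visit(n); c != nil {
				return c
			}
		}
	}
	return nil
}

func main() {
	ts := []transform{
		{name: "A", composite: "a", outputs: []string{"colA"}},
		{name: "B", composite: "b", inputs: []string{"colA"}, outputs: []string{"colB"}},
		{name: "C", composite: "a", inputs: []string{"colA", "colB"}},
	}
	if c := findCycle(compositeEdges(ts)); c != nil {
		fmt.Println("cycle among composites:", strings.Join(c, " -> "))
	}
}
```

Reporting the whole path, rather than only the fact that a cycle exists, follows the comment's point that anything short of the full cycle is much harder to debug.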
[jira] [Updated] (BEAM-9959) Mistakes Computing Composite Inputs and Outputs
[ https://issues.apache.org/jira/browse/BEAM-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke updated BEAM-9959: --- Description: The Go SDK uses a Scope object to manage beam Composites. A bug was discovered when consuming a PCollection in both the composite that created it, and in a separate composite. Further, the Go SDK should verify that the root hypergraph structure is a DAG and provides a reasonable error. In particular, the leaf nodes of the graph could form a DAG, but due to how the beam.Scope object is used, might cause the hypergraph to not be a DAG. Eg. It's possible to write the following in the Go SDK. PTransforms A, B, C and PCollections colA, colB, and Composites a, b. A and C are in a, and B are in b. A generates colA B consumes colA, and generates colB. C consumes colA and colB. ``` a := s.Scope(a) b := s.Scope(b) colA := beam.Impulse(*a*) colB := beam.ParDo(*b*, , colA) beam.ParDo0(*a*, , colA, beam.SideInput{colB}) ``` If it doesn't already, the Go SDK must emit a clear error, and fail pipeline construction. If the affected composites are roots in the graph, the cycle prevents being able to topologically sort the root ptransforms for the pipeline graph, which can adversely affect runners. The recommendation is always to wrap uses of scope in functions or other scopes to prevent such incorrect constructions. was: The Go SDK uses a Scope object to manage beam Composites. A bug was discovered when consuming a PCollection in both the composite that created it, and in a separate composite. Further, the Go SDK should verify that the root hypergraph structure is a DAG and provides a reasonable error. In particular, the leaf nodes of the graph could form a DAG, but due to how the beam.Scope object is used, might cause the hypergraph to not be a DAG. Eg. It's possible to write the following in the Go SDK. PTransforms A, B, C and PCollections colA, colB, and Composites a, b. A and C are in a, and B are in b. 
A generates colA B consumes colA, and generates colB. C consumes colB. ``` a := s.Scope(a) b := s.Scope(b) colA := beam.Impulse(*a*) colB := beam.ParDo(*b*, , colA) beam.ParDo0(*a*, , colA) ``` If it doesn't already the Go SDK must emit a clear error, and fail pipeline construction. If the affected composites are roots in the graph, the cycle prevents being able to topologically sort the root ptransforms for the pipeline graph, which can adversely affect runners. The recommendation is always to wrap uses of scope in functions or other scopes to prevent such incorrect constructions. > Mistakes Computing Composite Inputs and Outputs > --- > > Key: BEAM-9959 > URL: https://issues.apache.org/jira/browse/BEAM-9959 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Major > > The Go SDK uses a Scope object to manage beam Composites. > A bug was discovered when consuming a PCollection in both the composite that > created it, and in a separate composite. > Further, the Go SDK should verify that the root hypergraph structure is a DAG > and provides a reasonable error. In particular, the leaf nodes of the graph > could form a DAG, but due to how the beam.Scope object is used, might cause > the hypergraph to not be a DAG. > Eg. It's possible to write the following in the Go SDK. > PTransforms A, B, C and PCollections colA, colB, and Composites a, b. > A and C are in a, and B are in b. > A generates colA > B consumes colA, and generates colB. > C consumes colA and colB. > ``` > a := s.Scope(a) > b := s.Scope(b) > colA := beam.Impulse(*a*) > colB := beam.ParDo(*b*, , colA) > beam.ParDo0(*a*, , colA, beam.SideInput{colB}) > ``` > If it doesn't already, the Go SDK must emit a clear error, and fail pipeline > construction. > If the affected composites are roots in the graph, the cycle prevents being > able to topologically sort the root ptransforms for the pipeline graph, which > can adversely affect runners. 
> The recommendation is always to wrap uses of scope in functions or other > scopes to prevent such incorrect constructions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9959) Mistakes Computing Composite Inputs and Outputs
Robert Burke created BEAM-9959: -- Summary: Mistakes Computing Composite Inputs and Outputs Key: BEAM-9959 URL: https://issues.apache.org/jira/browse/BEAM-9959 Project: Beam Issue Type: Bug Components: sdk-go Reporter: Robert Burke Assignee: Robert Burke The Go SDK uses a Scope object to manage beam Composites. A bug was discovered when consuming a PCollection in both the composite that created it, and in a separate composite. Further, the Go SDK should verify that the root hypergraph structure is a DAG and provides a reasonable error. In particular, the leaf nodes of the graph could form a DAG, but due to how the beam.Scope object is used, might cause the hypergraph to not be a DAG. Eg. It's possible to write the following in the Go SDK. PTransforms A, B, C and PCollections colA, colB, and Composites a, b. A and C are in a, and B are in b. A generates colA B consumes colA, and generates colB. C consumes colB. ``` a := s.Scope(a) b := s.Scope(b) colA := beam.Impulse(*a*) colB := beam.ParDo(*b*, , colA) beam.ParDo0(*a*, , colA) ``` If it doesn't already the Go SDK must emit a clear error, and fail pipeline construction. If the affected composites are roots in the graph, the cycle prevents being able to topologically sort the root ptransforms for the pipeline graph, which can adversely affect runners. The recommendation is always to wrap uses of scope in functions or other scopes to prevent such incorrect constructions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-7030) Make it possible to display the full PCollection when passert fails
[ https://issues.apache.org/jira/browse/BEAM-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke resolved BEAM-7030. Fix Version/s: Not applicable Assignee: Paul Fisher Resolution: Fixed I believe this got addressed in BEAM-9731. PAssert now prints the whole PCollections under test, and soon, also sorts it for easier comparison. > Make it possible to display the full PCollection when passert fails > --- > > Key: BEAM-7030 > URL: https://issues.apache.org/jira/browse/BEAM-7030 > Project: Beam > Issue Type: Improvement > Components: sdk-go, testing >Reporter: Damien Desfontaines >Assignee: Paul Fisher >Priority: Major > Fix For: Not applicable > > > If I use passert.Equals with two PCollections, and the test fails, the error > message only says something like "value _ present, but not expected". This is > not very useful — to debug failing tests, I'd like to print both PCollections > so I can compare them directly instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container
[ https://issues.apache.org/jira/browse/BEAM-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092409#comment-17092409 ] Robert Burke commented on BEAM-9815: I'm going to stop looking at this point, but the open source side seems to be exhausted. The code eventually runs a binary on the Dataflow side to set up an artifact service that [proxies reading from GCS](https://github.com/apache/beam/blob/24361d1b5981ef7d18e586a8e5deaf683f4329f1/sdks/go/pkg/beam/artifact/gcsproxy/retrieval.go#L82). That code is in the [artifact package](https://github.com/apache/beam/blob/24361d1b5981ef7d18e586a8e5deaf683f4329f1/sdks/go/pkg/beam/artifact/materialize.go#L135), though it's called from something inside Google. It might have some kind of version skew with the container/boot.go code, and need updating on the Google side. I'm unable to figure that one out at this time. > beam_PostCommit_Go perma red due to failing to start container > -- > > Key: BEAM-9815 > URL: https://issues.apache.org/jira/browse/BEAM-9815 > Project: Beam > Issue Type: Bug > Components: sdk-go, test-failures >Reporter: Chamikara Madhusanka Jayalath >Assignee: Robert Bradshaw >Priority: Critical > Labels: currently-failing > Time Spent: 1h > Remaining Estimate: 0h > > For example, > [https://builds.apache.org/job/beam_PostCommit_Go/6847/] > [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container
[ https://issues.apache.org/jira/browse/BEAM-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092397#comment-17092397 ] Robert Burke commented on BEAM-9815: The provision info for Dataflow has no dependencies listed (though the pipeline proto does have one listed), and also no retrieval token. So a "quick" fix might be to hack it to assume something is there. The data there was not at the "assumed" path on the worker, and it never used the staging location for the worker binary. https://pantheon.corp.google.com/storage/browser/temp-storage-for-end-to-end-tests/staging-validatesrunner-test/go-1-1587852331972351853/?forceOnBucketsSortingFiltering=false=apache-beam-testing But the model was found at the staging location (which makes sense since we can see it in the Dataflow explorer there). I 2020-04-25T22:06:49.848812Z Downloading: gs://temp-storage-for-end-to-end-tests/staging-validatesrunner-test/go-2-1587852331972428268/model to /tmp/tmp/download.0.148567144/file.0 (size: 3 Kb, MD5: sceqVeC8VgLLgWXiRJ0Kvg==) I 2020-04-25T22:06:49.952462Z Download completed: gs://temp-storage-for-end-to-end-tests/staging-validatesrunner-test/go-2-1587852331972428268/model (duration: 103 ms @ 37 Kb/s) So, I'm going to look now at where the model is actually getting downloaded (since it's doing that somewhere, and printing it out), and see why the worker binary is not getting the same treatment. 
> beam_PostCommit_Go perma red due to failing to start container > -- > > Key: BEAM-9815 > URL: https://issues.apache.org/jira/browse/BEAM-9815 > Project: Beam > Issue Type: Bug > Components: sdk-go, test-failures >Reporter: Chamikara Madhusanka Jayalath >Assignee: Robert Bradshaw >Priority: Critical > Labels: currently-failing > Time Spent: 1h > Remaining Estimate: 0h > > For example, > [https://builds.apache.org/job/beam_PostCommit_Go/6847/] > [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container
[ https://issues.apache.org/jira/browse/BEAM-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092390#comment-17092390 ] Robert Burke commented on BEAM-9815: `2020/04/25 21:29:37 Initializing AWESOME Go harness: /opt/apache/beam/boot --id=1 --logging_endpoint=localhost:12370 --control_endpoint=localhost:12371 --artifact_endpoint=localhost:12372 --provision_endpoint=localhost:12373 --semi_persist_dir=/var/opt/google` is what Dataflow tells the boot container, while for Flink, only the provision service is provided. `12:16:29 2020/04/25 19:16:28 Initializing Go harness: /opt/apache/beam/boot --id=23-1 --provision_endpoint=localhost:46247` > beam_PostCommit_Go perma red due to failing to start container > -- > > Key: BEAM-9815 > URL: https://issues.apache.org/jira/browse/BEAM-9815 > Project: Beam > Issue Type: Bug > Components: sdk-go, test-failures >Reporter: Chamikara Madhusanka Jayalath >Assignee: Robert Bradshaw >Priority: Critical > Labels: currently-failing > Time Spent: 50m > Remaining Estimate: 0h > > For example, > [https://builds.apache.org/job/beam_PostCommit_Go/6847/] > [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container
[ https://issues.apache.org/jira/browse/BEAM-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092381#comment-17092381 ] Robert Burke edited comment on BEAM-9815 at 4/25/20, 9:38 PM: -- Further investigation reveals that there must be some other logic error in the artifact fetching in the boot harness. First I thought it was an entirely different harness container that was on the dataflow side, but it turns out if I modify the [go binary booter](https://github.com/apache/beam/blob/master/sdks/go/container/boot.go) it is reflected when submitting the job, so for some reason the artifacts are either being queried incorrectly, OR being staged incorrectly, leading to the "No artifacts staged" message. was (Author: lostluck): Further investigation reveals that there must be some other logic error in the artifact fetching in the boot harness. First I though it was an entirely different harness container that was on the dataflow side, but it turns out if I modify the [go binary booter](https://github.com/apache/beam/blob/master/sdks/go/container/boot.go) it is reflected when submitting the job, so for some reason the artifacts are either being queried incorrectly, OR being staged incorrectly, leading to the "No artifacts staged" message. 
> beam_PostCommit_Go perma red due to failing to start container > -- > > Key: BEAM-9815 > URL: https://issues.apache.org/jira/browse/BEAM-9815 > Project: Beam > Issue Type: Bug > Components: sdk-go, test-failures >Reporter: Chamikara Madhusanka Jayalath >Assignee: Robert Bradshaw >Priority: Critical > Labels: currently-failing > Time Spent: 50m > Remaining Estimate: 0h > > For example, > [https://builds.apache.org/job/beam_PostCommit_Go/6847/] > [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container
[ https://issues.apache.org/jira/browse/BEAM-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092381#comment-17092381 ] Robert Burke commented on BEAM-9815: Further investigation reveals that there must be some other logic error in the artifact fetching in the boot harness. First I thought it was an entirely different harness container that was on the dataflow side, but it turns out if I modify the [go binary booter](https://github.com/apache/beam/blob/master/sdks/go/container/boot.go) it is reflected when submitting the job, so for some reason the artifacts are either being queried incorrectly, OR being staged incorrectly, leading to the "No artifacts staged" message. > beam_PostCommit_Go perma red due to failing to start container > -- > > Key: BEAM-9815 > URL: https://issues.apache.org/jira/browse/BEAM-9815 > Project: Beam > Issue Type: Bug > Components: sdk-go, test-failures >Reporter: Chamikara Madhusanka Jayalath >Assignee: Robert Bradshaw >Priority: Critical > Labels: currently-failing > Time Spent: 0.5h > Remaining Estimate: 0h > > For example, > [https://builds.apache.org/job/beam_PostCommit_Go/6847/] > [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container
[ https://issues.apache.org/jira/browse/BEAM-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092371#comment-17092371 ] Robert Burke commented on BEAM-9815: Confirmed that the "dev" tag doesn't exist: https://hub.docker.com/r/apache/beam_go_sdk/tags The tag comes from https://github.com/apache/beam/blob/master/sdks/go/test/run_integration_tests.sh#L152 I think this worked before because something else changed as well, since the tests should be building and pushing an image to the beam testing repo, and not defaulting to the "dev" tagged image. That path should only be for the universal python runner, and I've now confirmed that this comparison is not working for some reason... > beam_PostCommit_Go perma red due to failing to start container > -- > > Key: BEAM-9815 > URL: https://issues.apache.org/jira/browse/BEAM-9815 > Project: Beam > Issue Type: Bug > Components: sdk-go, test-failures >Reporter: Chamikara Madhusanka Jayalath >Assignee: Robert Bradshaw >Priority: Critical > Labels: currently-failing > Time Spent: 0.5h > Remaining Estimate: 0h > > For example, > [https://builds.apache.org/job/beam_PostCommit_Go/6847/] > [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container
[ https://issues.apache.org/jira/browse/BEAM-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092314#comment-17092314 ] Robert Burke commented on BEAM-9815: Digging into this further, it reads like the "apache/beam_go_sdk:dev" is not found, and IIRC we changed all that up lately, so it's probable that the container was never built and doesn't exist at all at this point. This commit removed the Go SDK containers from the release, which was when we moved from our own repo to the official apache repo. https://github.com/apache/beam/commit/061c5c7db5064e20eef50a6a51f976235b30aae2 > beam_PostCommit_Go perma red due to failing to start container > -- > > Key: BEAM-9815 > URL: https://issues.apache.org/jira/browse/BEAM-9815 > Project: Beam > Issue Type: Bug > Components: sdk-go, test-failures >Reporter: Chamikara Madhusanka Jayalath >Assignee: Robert Bradshaw >Priority: Critical > Labels: currently-failing > > For example, > [https://builds.apache.org/job/beam_PostCommit_Go/6847/] > [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container
[ https://issues.apache.org/jira/browse/BEAM-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke updated BEAM-9815: --- Component/s: sdk-go > beam_PostCommit_Go perma red due to failing to start container > -- > > Key: BEAM-9815 > URL: https://issues.apache.org/jira/browse/BEAM-9815 > Project: Beam > Issue Type: Bug > Components: sdk-go, test-failures >Reporter: Chamikara Madhusanka Jayalath >Assignee: Robert Bradshaw >Priority: Critical > Labels: currently-failing > > For example, > [https://builds.apache.org/job/beam_PostCommit_Go/6847/] > [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9459) Go Postcommit failing at GBK
[ https://issues.apache.org/jira/browse/BEAM-9459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke resolved BEAM-9459. Fix Version/s: Not applicable Resolution: Fixed The original issue cause was rolled back. > Go Postcommit failing at GBK > > > Key: BEAM-9459 > URL: https://issues.apache.org/jira/browse/BEAM-9459 > Project: Beam > Issue Type: Bug > Components: sdk-go, test-failures >Reporter: Daniel Oliveira >Assignee: Robert Burke >Priority: Major > Fix For: Not applicable > > Time Spent: 10m > Remaining Estimate: 0h > > Example: [https://builds.apache.org/job/beam_PostCommit_Go_PR/106/] > [https://scans.gradle.com/s/es67rfaomu26m] > > {noformat} > 2020/03/06 00:47:41 Submitted job: 2020-03-05_16_47_40-13139296997856231782 > 2020/03/06 00:47:41 Submitted job: 2020-03-05_16_47_40-13139296997856231782 > 2020/03/06 00:47:41 Console: > https://console.cloud.google.com/dataflow/job/2020-03-05_16_47_40-13139296997856231782?project=apache-beam-testing > 2020/03/06 00:47:41 Logs: > https://console.cloud.google.com/logs/viewer?project=apache-beam-testing=dataflow_step%2Fjob_id%2F2020-03-05_16_47_40-13139296997856231782 > ... 
> 2020/03/06 00:50:41 Test cogbk:cogbk failed: job > 2020-03-05_16_47_40-13139296997856231782 failed{noformat} > And then in the console logs: > [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=dataflow_step%2Fjob_id%2F2020-03-05_16_47_40-13139296997856231782=500=false=2020-03-06T01:01:14.21000Z==true=2020-03-06T00:01:14.460Z=2020-03-06T01:01:14.460Z=PT1H=2020-03-06T00:49:14.413355915Z] > > {code:java} > exception: "java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error received from SDK harness for instruction > -165: process bundle failed for instruction -165 using plan -122 : panic: > Unexpected coder: > CoGBK goroutine 81 > [running]: > runtime/debug.Stack(0xc001103970, 0xd2c5e0, 0xc000bd7f40) > /usr/lib/go-1.12/src/runtime/debug/stack.go:24 +0x9d > github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic.func1(0xc001103b90) > > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:40 > +0x60 > panic(0xd2c5e0, 0xc000bd7f40) > /usr/lib/go-1.12/src/runtime/panic.go:522 +0x1b5 > github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.MakeElementEncoder(0xc000b99cc0, > 0xc000aa4930, 0xc000b64a00) > > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/coder.go:91 > +0x479 > github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*PCollection).Up(0xc000af3dd0, > 0x10018e0, 0xc000b57f80, 0x0, 0xc000346b50) > > 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/pcollection.go:59 > +0xfe > github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic(0x10018e0, > 0xc000b57f80, 0xc000346c28, 0x0, 0x0) > > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:43 > +0x6c > github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*Plan).Execute(0xc0002623f0, > 0x10018e0, 0xc000b57f80, 0xc0002365a0, 0x4, 0xff0340, 0xc000aa4750, > 0xff0380, 0xc000b57fc0, 0xc000346de0, ...) > > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/plan.go:93 > +0xdf > github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.(*control).handleInstruction(0xc0001f4680, > 0x10017a0, 0xc0001bafc0, 0xc000b57dc0, 0xc0001bafc0) > > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/harness.go:211 > +0xa34 > github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.Main.func2(0x10017a0, > 0xc0001bafc0,
[jira] [Commented] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container
[ https://issues.apache.org/jira/browse/BEAM-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091727#comment-17091727 ] Robert Burke commented on BEAM-9815: If I knew how to update the Dataflow artifact boot container that's not been updated I would do it, but I've been unable to trace where and how that container is generated or chosen or set by the service. Last I heard, it might require a Dataflow service release to resolve. Given that Dataflow doesn't yet support using the Go SDK on its service, I suspect this will not be a high priority at this time. I'm happier that the Flink and Spark runs, which are the same tests, are still passing, however. > beam_PostCommit_Go perma red due to failing to start container > -- > > Key: BEAM-9815 > URL: https://issues.apache.org/jira/browse/BEAM-9815 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Chamikara Madhusanka Jayalath >Assignee: Robert Bradshaw >Priority: Critical > Labels: currently-failing > > For example, > [https://builds.apache.org/job/beam_PostCommit_Go/6847/] > [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container
[ https://issues.apache.org/jira/browse/BEAM-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke reassigned BEAM-9815: -- Assignee: Robert Bradshaw (was: Robert Burke) > beam_PostCommit_Go perma red due to failing to start container > -- > > Key: BEAM-9815 > URL: https://issues.apache.org/jira/browse/BEAM-9815 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Chamikara Madhusanka Jayalath >Assignee: Robert Bradshaw >Priority: Critical > Labels: currently-failing > > For example, > [https://builds.apache.org/job/beam_PostCommit_Go/6847/] > [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9459) Go Postcommit failing at GBK
[ https://issues.apache.org/jira/browse/BEAM-9459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091087#comment-17091087 ] Robert Burke commented on BEAM-9459: Dataflow Postcommits are broken since the Artifact API was changed recently, and the Dataflow boot container that fetches artifacts hasn't been updated to match yet. > Go Postcommit failing at GBK > > > Key: BEAM-9459 > URL: https://issues.apache.org/jira/browse/BEAM-9459 > Project: Beam > Issue Type: Bug > Components: sdk-go, test-failures >Reporter: Daniel Oliveira >Assignee: Robert Burke >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Example: [https://builds.apache.org/job/beam_PostCommit_Go_PR/106/] > [https://scans.gradle.com/s/es67rfaomu26m] > > {noformat} > 2020/03/06 00:47:41 Submitted job: 2020-03-05_16_47_40-13139296997856231782 > 2020/03/06 00:47:41 Submitted job: 2020-03-05_16_47_40-13139296997856231782 > 2020/03/06 00:47:41 Console: > https://console.cloud.google.com/dataflow/job/2020-03-05_16_47_40-13139296997856231782?project=apache-beam-testing > 2020/03/06 00:47:41 Logs: > https://console.cloud.google.com/logs/viewer?project=apache-beam-testing=dataflow_step%2Fjob_id%2F2020-03-05_16_47_40-13139296997856231782 > ... 
> 2020/03/06 00:50:41 Test cogbk:cogbk failed: job > 2020-03-05_16_47_40-13139296997856231782 failed{noformat} > And then in the console logs: > [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=dataflow_step%2Fjob_id%2F2020-03-05_16_47_40-13139296997856231782=500=false=2020-03-06T01:01:14.21000Z==true=2020-03-06T00:01:14.460Z=2020-03-06T01:01:14.460Z=PT1H=2020-03-06T00:49:14.413355915Z] > > {code:java} > exception: "java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error received from SDK harness for instruction > -165: process bundle failed for instruction -165 using plan -122 : panic: > Unexpected coder: > CoGBK goroutine 81 > [running]: > runtime/debug.Stack(0xc001103970, 0xd2c5e0, 0xc000bd7f40) > /usr/lib/go-1.12/src/runtime/debug/stack.go:24 +0x9d > github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic.func1(0xc001103b90) > > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:40 > +0x60 > panic(0xd2c5e0, 0xc000bd7f40) > /usr/lib/go-1.12/src/runtime/panic.go:522 +0x1b5 > github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.MakeElementEncoder(0xc000b99cc0, > 0xc000aa4930, 0xc000b64a00) > > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/coder.go:91 > +0x479 > github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*PCollection).Up(0xc000af3dd0, > 0x10018e0, 0xc000b57f80, 0x0, 0xc000346b50) > > 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/pcollection.go:59 > +0xfe > github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic(0x10018e0, > 0xc000b57f80, 0xc000346c28, 0x0, 0x0) > > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:43 > +0x6c > github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*Plan).Execute(0xc0002623f0, > 0x10018e0, 0xc000b57f80, 0xc0002365a0, 0x4, 0xff0340, 0xc000aa4750, > 0xff0380, 0xc000b57fc0, 0xc000346de0, ...) > > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/plan.go:93 > +0xdf > github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.(*control).handleInstruction(0xc0001f4680, > 0x10017a0, 0xc0001bafc0, 0xc000b57dc0, 0xc0001bafc0) > > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/harness.go:211 > +0xa34 >
[jira] [Created] (BEAM-9789) Locking error in harness.go
Robert Burke created BEAM-9789: -- Summary: Locking error in harness.go Key: BEAM-9789 URL: https://issues.apache.org/jira/browse/BEAM-9789 Project: Beam Issue Type: Bug Components: sdk-go Reporter: Robert Burke Assignee: Robert Burke Fix For: Not applicable When there's an error on lookup or construction of an execution plan, the lock is accidentally left held, causing the worker to freeze. This shouldn't affect most users, as most plans and lookups complete without error, but a transient gRPC issue on lookup could cause an otherwise healthy worker to deadlock. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-8472) Get default GCP region from gcloud
[ https://issues.apache.org/jira/browse/BEAM-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082711#comment-17082711 ] Robert Burke edited comment on BEAM-8472 at 4/13/20, 10:26 PM: --- Just to be clear, the protocol is to check the environment variables, and then execute the gcloud command? Which would be to use [os.Getenv|https://godoc.org/pkg/os#Getenv] with "CLOUDSDK_COMPUTE_REGION" and then use the [os/exec package|https://godoc.org/pkg/os/exec] to call the gcloud executable? was (Author: lostluck): Just to be clear, the protocol is to check the environment variables, and then execute the gcloud command? Which would be to use [os.Getenv|https://godoc.corp.google.com/pkg/os#Getenv] with "CLOUDSDK_COMPUTE_REGION" and then use the [os/exec package|https://godoc.corp.google.com/pkg/os/exec] to call the gcloud executable? > Get default GCP region from gcloud > -- > > Key: BEAM-8472 > URL: https://issues.apache.org/jira/browse/BEAM-8472 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow, sdk-go >Reporter: Kyle Weaver >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > Currently, we default to us-central1 if --region flag is not set. The Google > Cloud SDK generally tries to get a default value in this case for > convenience, which we should follow. > [https://cloud.google.com/compute/docs/gcloud-compute/#order_of_precedence_for_default_properties] > Update 11/12: this is complete for Python and Java, Go remains. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8472) Get default GCP region from gcloud
[ https://issues.apache.org/jira/browse/BEAM-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082711#comment-17082711 ] Robert Burke commented on BEAM-8472: Just to be clear, the protocol is to check the environment variables, and then execute the gcloud command? Which would be to use [os.Getenv|https://godoc.corp.google.com/pkg/os#Getenv] with "CLOUDSDK_COMPUTE_REGION" and then use the [os/exec package|https://godoc.corp.google.com/pkg/os/exec] to call the gcloud executable? > Get default GCP region from gcloud > -- > > Key: BEAM-8472 > URL: https://issues.apache.org/jira/browse/BEAM-8472 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow, sdk-go >Reporter: Kyle Weaver >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > Currently, we default to us-central1 if --region flag is not set. The Google > Cloud SDK generally tries to get a default value in this case for > convenience, which we should follow. > [https://cloud.google.com/compute/docs/gcloud-compute/#order_of_precedence_for_default_properties] > Update 11/12: this is complete for Python and Java, Go remains. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8472) Get default GCP region from gcloud
[ https://issues.apache.org/jira/browse/BEAM-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082707#comment-17082707 ] Robert Burke commented on BEAM-8472: Eventually. Dataflow doesn't currently support the Go SDK so this won't be prioritized above current work any time soon. > Get default GCP region from gcloud > -- > > Key: BEAM-8472 > URL: https://issues.apache.org/jira/browse/BEAM-8472 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow, sdk-go >Reporter: Kyle Weaver >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > Currently, we default to us-central1 if --region flag is not set. The Google > Cloud SDK generally tries to get a default value in this case for > convenience, which we should follow. > [https://cloud.google.com/compute/docs/gcloud-compute/#order_of_precedence_for_default_properties] > Update 11/12: this is complete for Python and Java, Go remains. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-8472) Get default GCP region from gcloud
[ https://issues.apache.org/jira/browse/BEAM-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke reassigned BEAM-8472: -- Assignee: (was: Kyle Weaver) > Get default GCP region from gcloud > -- > > Key: BEAM-8472 > URL: https://issues.apache.org/jira/browse/BEAM-8472 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow, sdk-go >Reporter: Kyle Weaver >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > Currently, we default to us-central1 if --region flag is not set. The Google > Cloud SDK generally tries to get a default value in this case for > convenience, which we should follow. > [https://cloud.google.com/compute/docs/gcloud-compute/#order_of_precedence_for_default_properties] > Update 11/12: this is complete for Python and Java, Go remains. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9746) [Go SDK] Empty side inputs causing spurious zero elements
Robert Burke created BEAM-9746: -- Summary: [Go SDK] Empty side inputs causing spurious zero elements Key: BEAM-9746 URL: https://issues.apache.org/jira/browse/BEAM-9746 Project: Beam Issue Type: Improvement Components: sdk-go Reporter: Robert Burke Assignee: Robert Burke A user discovered that empty side inputs would spuriously provide a single zero element. The error was narrowed down to the Go SDK's state manager code: when copying the stateGetResponse data, it wasn't checking that the original data source had any bytes in it. In particular, this led it to interpret length-prefixed data as having 0 length, which would cause zero value elements to be generated. Notably, this caused empty strings. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-9731) golang passert.Equals output is unhelpful
[ https://issues.apache.org/jira/browse/BEAM-9731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke closed BEAM-9731. -- Fix Version/s: Not applicable Resolution: Fixed Thanks Paul Fisher! > golang passert.Equals output is unhelpful > - > > Key: BEAM-9731 > URL: https://issues.apache.org/jira/browse/BEAM-9731 > Project: Beam > Issue Type: Improvement > Components: sdk-go, testing >Reporter: Paul Fisher >Priority: Minor > Fix For: Not applicable > > Time Spent: 2h 10m > Remaining Estimate: 0h > > The output from using passert.Equals includes only one of the missing or > unexpected elements from the diff. Including all of the missing and > unexpected elements will make tests much easier to debug. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9690) Go build failing: undefined: primitives.Reshuffle(KV)
[ https://issues.apache.org/jira/browse/BEAM-9690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074936#comment-17074936 ] Robert Burke commented on BEAM-9690: I've been unable to replicate this issue locally, and the post commits are differently broken at present due to artifact issues, though when they were first committed, they did correctly run in post commit. > Go build failing: undefined: primitives.Reshuffle(KV) > - > > Key: BEAM-9690 > URL: https://issues.apache.org/jira/browse/BEAM-9690 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Kyle Weaver >Assignee: Robert Burke >Priority: Major > > Go SDK build is failing on head (1d3e3ef9ffb4aaa913dc223d92626ca9f0f43207). I > tried ./gradlew sdks:go:clean but it didn't seem to make a difference. > Logs: > ./gradlew :sdks:go:container:docker > Resolving dependencies... > # github.com/apache/beam/sdks/go/test/integration > .gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/integration/driver.go:67:27: > undefined: primitives.Reshuffle > .gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/integration/driver.go:68:29: > undefined: primitives.ReshuffleKV > > Task :sdks:go:buildDarwinAmd64 FAILED > FAILURE: Build failed with an exception. > * What went wrong: > Execution failed for task ':sdks:go:buildDarwinAmd64'. > > Build failed due to return code 2 of: > Command: >/Users/kcweaver/.gradle/go/binary/1.12/go/bin/go build -o > ./build/bin/integration github.com/apache/beam/sdks/go/test/integration > Env: >GOEXE= > > GOPATH=/Users/kcweaver/go/src/github.com/apache/beam/sdks/go/.gogradle/project_gopath >GOROOT=/Users/kcweaver/.gradle/go/binary/1.12/go >GOOS=darwin >GOARCH=amd64 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9676) Go SDK Code Katas
[ https://issues.apache.org/jira/browse/BEAM-9676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke updated BEAM-9676: --- Status: Open (was: Triage Needed) > Go SDK Code Katas > - > > Key: BEAM-9676 > URL: https://issues.apache.org/jira/browse/BEAM-9676 > Project: Beam > Issue Type: Improvement > Components: katas, sdk-go >Reporter: Robert Burke >Assignee: Damon Douglas >Priority: Major > > There should be code katas for the Go SDK similar to the Java and Python SDKs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-9676) Go SDK Code Katas
[ https://issues.apache.org/jira/browse/BEAM-9676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke reassigned BEAM-9676: -- Assignee: Damon Douglas > Go SDK Code Katas > - > > Key: BEAM-9676 > URL: https://issues.apache.org/jira/browse/BEAM-9676 > Project: Beam > Issue Type: Improvement > Components: katas, sdk-go >Reporter: Robert Burke >Assignee: Damon Douglas >Priority: Major > > There should be code katas for the Go SDK similar to the Java and Python SDKs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9676) Go SDK Code Katas
Robert Burke created BEAM-9676: -- Summary: Go SDK Code Katas Key: BEAM-9676 URL: https://issues.apache.org/jira/browse/BEAM-9676 Project: Beam Issue Type: Improvement Components: katas, sdk-go Reporter: Robert Burke There should be code katas for the Go SDK similar to the Java and Python SDKs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9667) Allow metrics use during DoFn Setup
[ https://issues.apache.org/jira/browse/BEAM-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke updated BEAM-9667: --- Status: Open (was: Triage Needed) > Allow metrics use during DoFn Setup > --- > > Key: BEAM-9667 > URL: https://issues.apache.org/jira/browse/BEAM-9667 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Robert Burke >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > A user found a bug where runners were crashing because the PTransform label for > metrics was not being populated by the Go SDK. It was narrowed down to the > Setup method providing a bundle context but not populating the PTransformId > context. > As long as users aren't caching the context in their DoFns and we don't cache > it either, populating the PTransformID for Setup should be safe, as the > bundle Id will be different for subsequent executions.
[jira] [Updated] (BEAM-9667) Allow metrics use during DoFn Setup
[ https://issues.apache.org/jira/browse/BEAM-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke updated BEAM-9667: --- Description: A user found a bug where runners were crashing because the PTransform label for metrics was not being populated by the Go SDK. It was narrowed down to the Setup method providing a bundle context but not populating the PTransformId context. As long as users aren't caching the context in their DoFns and we don't cache it either, populating the PTransformID for Setup should be safe, as the bundle Id will be different for subsequent executions. > Allow metrics use during DoFn Setup > --- > > Key: BEAM-9667 > URL: https://issues.apache.org/jira/browse/BEAM-9667 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Robert Burke >Priority: Minor > > A user found a bug where runners were crashing because the PTransform label for > metrics was not being populated by the Go SDK. It was narrowed down to the > Setup method providing a bundle context but not populating the PTransformId > context. > As long as users aren't caching the context in their DoFns and we don't cache > it either, populating the PTransformID for Setup should be safe, as the > bundle Id will be different for subsequent executions.
[jira] [Created] (BEAM-9667) Allow metrics use during DoFn Setup
Robert Burke created BEAM-9667: -- Summary: Allow metrics use during DoFn Setup Key: BEAM-9667 URL: https://issues.apache.org/jira/browse/BEAM-9667 Project: Beam Issue Type: Bug Components: sdk-go Reporter: Robert Burke -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9616) [Go SDK] starcgen improvements
[ https://issues.apache.org/jira/browse/BEAM-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke updated BEAM-9616: --- Labels: golang (was: ) > [Go SDK] starcgen improvements > -- > > Key: BEAM-9616 > URL: https://issues.apache.org/jira/browse/BEAM-9616 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Robert Burke >Priority: Major > Labels: golang > > The starcgen code generator works OK, but could do with some improvements. > * Uniquifying imports (handling multiple imports with same short suffix) > * Generating multiple iterNatives (eg when the normal symbol is already > taken). > * Keying off of beam.Register* calls rather than command line. > ** Avoids duplicating lists of identifiers, and improves default behavior. > ** Possibly have a new beam.RegisterDoFn which can take a list of DoFn and > struct types a function or a struct, and key off those, reducing boiler plate > somewhat. > * Perhaps having a specific single import alias package for components > required for import, rather than the current 3-4. > * Generate efficient Beam Schema coders for registered types? > * Handle SplittableDoFns properly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9616) [Go SDK] starcgen improvements
[ https://issues.apache.org/jira/browse/BEAM-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke updated BEAM-9616: --- Status: Open (was: Triage Needed) > [Go SDK] starcgen improvements > -- > > Key: BEAM-9616 > URL: https://issues.apache.org/jira/browse/BEAM-9616 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Robert Burke >Priority: Major > > The starcgen code generator works OK, but could do with some improvements. > * Uniquifying imports (handling multiple imports with same short suffix) > * Generating multiple iterNatives (eg when the normal symbol is already > taken). > * Keying off of beam.Register* calls rather than command line. > ** Avoids duplicating lists of identifiers, and improves default behavior. > ** Possibly have a new beam.RegisterDoFn which can take a list of DoFn and > struct types a function or a struct, and key off those, reducing boiler plate > somewhat. > * Perhaps having a specific single import alias package for components > required for import, rather than the current 3-4. > * Generate efficient Beam Schema coders for registered types? > * Handle SplittableDoFns properly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9616) [Go SDK] starcgen improvements
Robert Burke created BEAM-9616: -- Summary: [Go SDK] starcgen improvements Key: BEAM-9616 URL: https://issues.apache.org/jira/browse/BEAM-9616 Project: Beam Issue Type: Improvement Components: sdk-go Reporter: Robert Burke The starcgen code generator works OK, but could do with some improvements. * Uniquifying imports (handling multiple imports with same short suffix) * Generating multiple iterNatives (eg when the normal symbol is already taken). * Keying off of beam.Register* calls rather than command line. ** Avoids duplicating lists of identifiers, and improves default behavior. ** Possibly have a new beam.RegisterDoFn which can take a list of DoFn and struct types a function or a struct, and key off those, reducing boiler plate somewhat. * Perhaps having a specific single import alias package for components required for import, rather than the current 3-4. * Generate efficient Beam Schema coders for registered types? * Handle SplittableDoFns properly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
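The first item in the list above, uniquifying imports that share a short suffix, could look roughly like this. It is a sketch, not starcgen code: `uniquifyImports` is a hypothetical name, and it assumes the package name equals the last path element, which is not always true for real Go packages.

```go
package main

import (
	"fmt"
	"path"
)

// uniquifyImports (hypothetical) assigns each import path an alias that is
// unique within the generated file. Colliding short names get a numeric suffix.
func uniquifyImports(importPaths []string) map[string]string {
	aliases := make(map[string]string, len(importPaths))
	taken := make(map[string]bool, len(importPaths))
	for _, p := range importPaths {
		if _, ok := aliases[p]; ok {
			continue // the same path listed twice keeps its first alias
		}
		base := path.Base(p) // assumption: package name == last path element
		alias := base
		for i := 2; taken[alias]; i++ {
			alias = fmt.Sprintf("%s%d", base, i)
		}
		taken[alias] = true
		aliases[p] = alias
	}
	return aliases
}

func main() {
	paths := []string{"example.com/a/util", "example.com/b/util", "example.com/c/coder"}
	aliases := uniquifyImports(paths)
	for _, p := range paths {
		fmt.Printf("%s %q\n", aliases[p], p)
	}
}
```

Looping until an unused alias is found also covers the case where a generated name such as `util2` collides with a real package of that name.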
[jira] [Updated] (BEAM-9615) [Go SDK] Beam Schemas
[ https://issues.apache.org/jira/browse/BEAM-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke updated BEAM-9615: --- Status: Open (was: Triage Needed) > [Go SDK] Beam Schemas > - > > Key: BEAM-9615 > URL: https://issues.apache.org/jira/browse/BEAM-9615 > Project: Beam > Issue Type: New Feature > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Major > > Schema support is required for advanced cross language features in Beam, and > has the opportunity to replace the current default JSON encoding of elements. > > Some quick notes, though a better fleshed out doc with details will be > forthcoming: > * All base coders should be implemented, and listed as coder capabilities. I > think only stringutf8 is missing presently. > * Should support fairly arbitrary user types, seamlessly. That is, users > should be able to rely on it "just working" if their type is compatible. > * Should support schema metadata tagging. > In particular, one breaking shift in the default will be to explicitly fail > pipelines if elements have unexported fields, when no other custom coder has > been added. This has been a source of errors/dropped data/keys and a simple > warning at construction time won't cut it. However, we could provide a manual > "use beam schemas, but ignore unexported fields" registration as a > workaround.
[jira] [Updated] (BEAM-9615) [Go SDK] Beam Schemas
[ https://issues.apache.org/jira/browse/BEAM-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke updated BEAM-9615: --- Description: Schema support is required for advanced cross language features in Beam, and has the opportunity to replace the current default JSON encoding of elements. Some quick notes, though a better fleshed out doc with details will be forthcoming: * All base coders should be implemented, and listed as coder capabilities. I think only stringutf8 is missing presently. * Should support fairly arbitrary user types, seamlessly. That is, users should be able to rely on it "just working" if their type is compatible. * Should support schema metadata tagging. In particular, one breaking shift in the default will be to explicitly fail pipelines if elements have unexported fields, when no other custom coder has been added. This has been a source of errors/dropped data/keys and a simple warning at construction time won't cut it. However, we could provide a manual "use beam schemas, but ignore unexported fields" registration as a workaround. was: Schema support is required for advanced cross language features in Beam, and has the opportunity to replace the current default JSON encoding of elements. Some quick notes * All base coders should be implemented, and listed as coder capabilities. I think only stringutf8 is missing presently. * Should support fairly arbitrary user types, seamlessly. That is, users should be able to rely on it "just working" if their type is compatible. * Should support schema metadata tagging. In particular, one breaking shift in the default will be to explicitly fail pipelines if elements have unexported fields, when no other custom coder has been added. This has been a source of errors/dropped data/keys and a simple warning at construction time won't cut it. However, we could provide a manual "use beam schemas, but ignore unexported fields" registration as a workaround. 
> [Go SDK] Beam Schemas > - > > Key: BEAM-9615 > URL: https://issues.apache.org/jira/browse/BEAM-9615 > Project: Beam > Issue Type: New Feature > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Major > > Schema support is required for advanced cross language features in Beam, and > has the opportunity to replace the current default JSON encoding of elements. > > Some quick notes, though a better fleshed out doc with details will be > forthcoming: > * All base coders should be implemented, and listed as coder capabilities. I > think only stringutf8 is missing presently. > * Should support fairly arbitrary user types, seamlessly. That is, users > should be able to rely on it "just working" if their type is compatible. > * Should support schema metadata tagging. > In particular, one breaking shift in the default will be to explicitly fail > pipelines if elements have unexported fields, when no other custom coder has > been added. This has been a source of errors/dropped data/keys and a simple > warning at construction time won't cut it. However, we could provide a manual > "use beam schemas, but ignore unexported fields" registration as a > workaround.
[jira] [Created] (BEAM-9615) [Go SDK] Beam Schemas
Robert Burke created BEAM-9615: -- Summary: [Go SDK] Beam Schemas Key: BEAM-9615 URL: https://issues.apache.org/jira/browse/BEAM-9615 Project: Beam Issue Type: New Feature Components: sdk-go Reporter: Robert Burke Assignee: Robert Burke Schema support is required for advanced cross language features in Beam, and has the opportunity to replace the current default JSON encoding of elements. Some quick notes * All base coders should be implemented, and listed as coder capabilities. I think only stringutf8 is missing presently. * Should support fairly arbitrary user types, seamlessly. That is, users should be able to rely on it "just working" if their type is compatible. * Should support schema metadata tagging. In particular, one breaking shift in the default will be to explicitly fail pipelines if elements have unexported fields, when no other custom coder has been added. This has been a source of errors/dropped data/keys and a simple warning at construction time won't cut it. However, we could provide a manual "use beam schemas, but ignore unexported fields" registration as a workaround.
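The proposed "fail if elements have unexported fields" check can be sketched with the standard `reflect` package. This is an illustration under assumptions, not Beam code: `unexportedFields`, `checkEncodable`, and the `record` type are hypothetical, and only top-level fields are inspected.

```go
package main

import (
	"fmt"
	"reflect"
)

// unexportedFields (hypothetical) lists the unexported top-level fields of a
// struct value or pointer to struct. Non-struct values yield nil.
func unexportedFields(v interface{}) []string {
	t := reflect.TypeOf(v)
	for t != nil && t.Kind() == reflect.Ptr {
		t = t.Elem()
	}
	if t == nil || t.Kind() != reflect.Struct {
		return nil
	}
	var names []string
	for i := 0; i < t.NumField(); i++ {
		// PkgPath is non-empty only for unexported fields.
		if f := t.Field(i); f.PkgPath != "" {
			names = append(names, f.Name)
		}
	}
	return names
}

// checkEncodable (hypothetical) fails loudly instead of silently dropping data.
func checkEncodable(v interface{}) error {
	if names := unexportedFields(v); len(names) > 0 {
		return fmt.Errorf("type %T has unexported fields %v", v, names)
	}
	return nil
}

// record is an example element type with one field an encoder would drop.
type record struct {
	ID    int
	Name  string
	cache string
}

func main() {
	fmt.Println(checkEncodable(record{}))
}
```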
[jira] [Commented] (BEAM-9614) Declare versioned capability for identifying the Go SDK.
[ https://issues.apache.org/jira/browse/BEAM-9614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067912#comment-17067912 ] Robert Burke commented on BEAM-9614: A quick search doesn't indicate any good way to do this without simply having a go file somewhere that gets updated for each release. Right now the Dataflow runner package has a constant which declares the version to be 0.5.0, but ideally it's something we can include in some script that generates the release branches. > Declare versioned capability for identifying the Go SDK. > > > Key: BEAM-9614 > URL: https://issues.apache.org/jira/browse/BEAM-9614 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Robert Bradshaw >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-8292) Add a Reshuffle PTransform preventing fusion of the surrounding transforms
[ https://issues.apache.org/jira/browse/BEAM-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke reassigned BEAM-8292: -- Assignee: Robert Burke > Add a Reshuffle PTransform preventing fusion of the surrounding transforms > -- > > Key: BEAM-8292 > URL: https://issues.apache.org/jira/browse/BEAM-8292 > Project: Beam > Issue Type: New Feature > Components: sdk-go >Reporter: John Patoch >Assignee: Robert Burke >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > Reshuffle is a PTransform that takes a PCollection and shuffles the data > to help increase parallelism. > Reshuffle adds a temporary random key to each element, performs a > GroupByKey, and finally removes the temporary key. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9551) Pass around Environment PB as pointer not value
[ https://issues.apache.org/jira/browse/BEAM-9551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke resolved BEAM-9551. Fix Version/s: Not applicable Resolution: Fixed > Pass around Environment PB as pointer not value > --- > > Key: BEAM-9551 > URL: https://issues.apache.org/jira/browse/BEAM-9551 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Minor > Fix For: Not applicable > > Time Spent: 40m > Remaining Estimate: 0h > > Go protocol buffers prefer being passed around by pointer rather than by value. > This was caught by a linter and should be fixed as good practice.
[jira] [Updated] (BEAM-9551) Pass around Environment PB as pointer not value
[ https://issues.apache.org/jira/browse/BEAM-9551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke updated BEAM-9551: --- Status: Open (was: Triage Needed) > Pass around Environment PB as pointer not value > --- > > Key: BEAM-9551 > URL: https://issues.apache.org/jira/browse/BEAM-9551 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > Go protocol buffers prefer being passed around by pointer rather than by value. > This was caught by a linter and should be fixed as good practice.
[jira] [Created] (BEAM-9551) Pass around Environment PB as pointer not value
Robert Burke created BEAM-9551: -- Summary: Pass around Environment PB as pointer not value Key: BEAM-9551 URL: https://issues.apache.org/jira/browse/BEAM-9551 Project: Beam Issue Type: Bug Components: sdk-go Reporter: Robert Burke Assignee: Robert Burke Go protocol buffers prefer being passed around by pointer rather than by value. This was caught by a linter and should be fixed as good practice.
[jira] [Closed] (BEAM-9374) Go Postcommits not pulling right container name
[ https://issues.apache.org/jira/browse/BEAM-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke closed BEAM-9374. -- Fix Version/s: Not applicable Resolution: Fixed They were fixed by [https://github.com/apache/beam/commit/88914cf7c79ca185e2f67a03a7d1dc57372c6873#diff-2f9709e332964eeedae560738d7e] before this was filed. I had stale pages, which had the old content cached somehow. > Go Postcommits not pulling right container name > --- > > Key: BEAM-9374 > URL: https://issues.apache.org/jira/browse/BEAM-9374 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Robert Burke >Assignee: Hannah Jiang >Priority: Major > Fix For: Not applicable > > > It looks like a script variable CONTAINERS wasn't updated in > [https://github.com/apache/beam/pull/10612] , causing the container pull to > fail. > [https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/2518/] > [https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/consoleText] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-9374) Go Postcommits not pulling right container name
[ https://issues.apache.org/jira/browse/BEAM-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke reassigned BEAM-9374: -- Assignee: Hannah Jiang (was: Robert Burke) > Go Postcommits not pulling right container name > --- > > Key: BEAM-9374 > URL: https://issues.apache.org/jira/browse/BEAM-9374 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Robert Burke >Assignee: Hannah Jiang >Priority: Major > > It looks like a script variable CONTAINERS wasn't updated in > [https://github.com/apache/beam/pull/10612] , causing the container pull to > fail. > [https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/2518/] > [https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/consoleText] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9374) Go Postcommits not pulling right container name
Robert Burke created BEAM-9374: -- Summary: Go Postcommits not pulling right container name Key: BEAM-9374 URL: https://issues.apache.org/jira/browse/BEAM-9374 Project: Beam Issue Type: Bug Components: sdk-go Reporter: Robert Burke Assignee: Robert Burke It looks like a script variable CONTAINERS wasn't updated in [https://github.com/apache/beam/pull/10612] , causing the container pull to fail. [https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/2518/] [https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/consoleText] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-6374) "elements added" for input and output collections is always empty
[ https://issues.apache.org/jira/browse/BEAM-6374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke reassigned BEAM-6374: -- Assignee: Robert Burke > "elements added" for input and output collections is always empty > - > > Key: BEAM-6374 > URL: https://issues.apache.org/jira/browse/BEAM-6374 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-go >Reporter: Andrew Brampton >Assignee: Robert Burke >Priority: Major > > The fields for "Elements added" and "Estimated size" are always blank when > running a Go binary on Dataflow. For example when running the word count > example: https://pasteboard.co/HVf80BU.png
[jira] [Resolved] (BEAM-3306) Consider: Go coder registry
[ https://issues.apache.org/jira/browse/BEAM-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke resolved BEAM-3306. Fix Version/s: Not applicable Resolution: Fixed Go supports a coder registry w/beam.RegisterCoder Remaining work might be to optionally support "direct" access to an io.Reader or io.Writer interface which could yield efficiency gains in some situations for user types. > Consider: Go coder registry > --- > > Key: BEAM-3306 > URL: https://issues.apache.org/jira/browse/BEAM-3306 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Henning Rohde >Assignee: Robert Burke >Priority: Minor > Fix For: Not applicable > > Time Spent: 2h 50m > Remaining Estimate: 0h > > Add coder registry to allow easier overwrite of default coders. We may also > allow otherwise un-encodable types, but that would require that function > analysis depends on it. > If we're hardcoding support for proto/avro, then there may be little need for > such a feature. Conversely, this may be how we implement such support. > > Proposal Doc: > [https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit#|https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-3545) Fn API metrics in Go SDK harness
[ https://issues.apache.org/jira/browse/BEAM-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke reassigned BEAM-3545: -- Assignee: Robert Burke > Fn API metrics in Go SDK harness > > > Key: BEAM-3545 > URL: https://issues.apache.org/jira/browse/BEAM-3545 > Project: Beam > Issue Type: Sub-task > Components: sdk-go >Reporter: Kenneth Knowles >Assignee: Robert Burke >Priority: Major > Labels: portability > Time Spent: 13h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9167) Reduce overhead of Go SDK side metrics
[ https://issues.apache.org/jira/browse/BEAM-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke resolved BEAM-9167. Fix Version/s: Not applicable Resolution: Fixed The SDK-side overhead of user metrics is now reduced significantly if the proxy object is used. There is other metrics-related work (eg. framework metrics around PCollections and ParDos, programmatic extraction, using the updated Monitoring infos), but it is tracked by other JIRAs. > Reduce overhead of Go SDK side metrics > -- > > Key: BEAM-9167 > URL: https://issues.apache.org/jira/browse/BEAM-9167 > Project: Beam > Issue Type: Sub-task > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Major > Fix For: Not applicable > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Locking overhead due to the global store and local caches of SDK counter data > can dominate certain workloads, which means we can do better. > Instead of having a global store of metrics data to extract counters, we > should use per ptransform (or per bundle) counter sets, which would avoid > requiring locking per counter operation. The main detriment compared to the > current implementation is that a user would need to add their own locking if > they were to spawn multiple goroutines to process a Bundle's work in a DoFn. > Given that self multithreaded DoFns aren't recommended/safe in Java, largely > impossible in Python, and the other beam Go SDK provided constructs (like > Iterators and Emitters) are not thread safe, this is a small concern, > provided the documentation is clear on this. > Removing the locking and switching to atomic ops reduces the overhead > significantly in example jobs and in the benchmarks. > A second part of this change should be to move the exec package to manage > its own per bundle state, rather than relying on a global datastore to > extract the per bundle, per ptransform values. 
> Related: https://issues.apache.org/jira/browse/BEAM-6541 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-7726) [Go SDK] State Backed Iterables
[ https://issues.apache.org/jira/browse/BEAM-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke resolved BEAM-7726. Resolution: Fixed The Go SDK now supports using State Backed iterables if the runner triggers it. > [Go SDK] State Backed Iterables > --- > > Key: BEAM-7726 > URL: https://issues.apache.org/jira/browse/BEAM-7726 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Affects Versions: Not applicable >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Major > Fix For: Not applicable > > Time Spent: 3h > Remaining Estimate: 0h > > The Go SDK should support the State backed iterables protocol per the proto. > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L644] > > Primary case is for iterables after CoGBKs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-7726) [Go SDK] State Backed Iterables
[ https://issues.apache.org/jira/browse/BEAM-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17004337#comment-17004337 ] Robert Burke edited comment on BEAM-7726 at 2/4/20 10:46 PM: - The data channel is correctly multiplexing bundles. There's no other way to do the multiple streams thing in the current protocol and gRPC without the runner having multiple endpoints, or the process doing so (eg. Multiple SDK Harnesses per worker, which is how Python handles it). I think I have a resolution for state backed iterables blocking the data channel, which will work for any runners that support datasource split requests. If the data channel is eventually split down to the current value and no more, we can close the reader, which will cause the channel to be unblocked. Any buffered data will be drained. Care needs to be taken to avoid deadlocking or data loss or race conditions, but there should only be lock contention when the Split thread is closing the reader. Edit (2020/02/04): I wasn't able to confirm that this actually worked better, and even though there was no material locking overhead, the additional complexity to that part of the code isn't worth questionable benefits. Tabling for now. was (Author: lostluck): The data channel is correctly multiplexing bundles. There's no other way to do the multiple streams thing in the current protocol and gRPC without the runner having multiple endpoints, or the process doing so (eg. Multiple SDK Harnesses per worker, which is how Python handles it). I think I have a resolution for state backed iterables blocking the data channel, which will work for any runners that support datasource split requests. If the data channel is eventually split down to the current value and no more, we can close the reader, which will cause the channel to be unblocked. Any buffered data will be drained. 
Care needs to be taken to avoid deadlocking or data loss or race conditions, but there should only be lock contention when the Split thread is closing the reader. Edit: I wasn't able to confirm that this actually worked better, and even though there was no material locking overhead, the additional complexity to that part of the code isn't worth questionable benefits. Tabling for now. > [Go SDK] State Backed Iterables > --- > > Key: BEAM-7726 > URL: https://issues.apache.org/jira/browse/BEAM-7726 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Affects Versions: Not applicable >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Major > Fix For: Not applicable > > Time Spent: 3h > Remaining Estimate: 0h > > The Go SDK should support the State backed iterables protocol per the proto. > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L644] > > Primary case is for iterables after CoGBKs.
[jira] [Comment Edited] (BEAM-7726) [Go SDK] State Backed Iterables
[ https://issues.apache.org/jira/browse/BEAM-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17004337#comment-17004337 ] Robert Burke edited comment on BEAM-7726 at 2/4/20 10:45 PM: - The data channel is correctly multiplexing bundles. There's no other way to support multiple streams in the current protocol and gRPC without the runner having multiple endpoints, or the process doing so (e.g. multiple SDK harnesses per worker, which is how Python handles it). I think I have a resolution for state-backed iterables blocking the data channel, which will work for any runners that support datasource split requests. If the data channel is eventually split down to the current value and no more, we can close the reader, which will cause the channel to be unblocked. Any buffered data will be drained. Care needs to be taken to avoid deadlocking, data loss, or race conditions, but there should only be lock contention when the Split thread is closing the reader. Edit: I wasn't able to confirm that this actually worked better, and even though there was no material locking overhead, the additional complexity to that part of the code isn't worth the questionable benefits. Tabling for now. was (Author: lostluck): The data channel is correctly multiplexing bundles. There's no other way to support multiple streams in the current protocol and gRPC without the runner having multiple endpoints, or the process doing so (e.g. multiple SDK harnesses per worker, which is how Python handles it). I think I have a resolution for state-backed iterables blocking the data channel, which will work for any runners that support datasource split requests. If the data channel is eventually split down to the current value and no more, we can close the reader, which will cause the channel to be unblocked. Any buffered data will be drained. 
Care needs to be taken to avoid deadlocking, data loss, or race conditions, but there should only be lock contention when the Split thread is closing the reader. > [Go SDK] State Backed Iterables > --- > > Key: BEAM-7726 > URL: https://issues.apache.org/jira/browse/BEAM-7726 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Affects Versions: Not applicable >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Major > Fix For: Not applicable > > Time Spent: 3h > Remaining Estimate: 0h > > The Go SDK should support the State backed iterables protocol per the proto. > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L644] > > Primary case is for iterables after CoGBKs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-9233) Go: unregistered Go functions fail when using -buildmode=pie -ldflags=-w
[ https://issues.apache.org/jira/browse/BEAM-9233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke closed BEAM-9233. -- Fix Version/s: Not applicable Resolution: Fixed Fixed by linked patch. Thanks! > Go: unregistered Go functions fail when using -buildmode=pie -ldflags=-w > > > Key: BEAM-9233 > URL: https://issues.apache.org/jira/browse/BEAM-9233 > Project: Beam > Issue Type: Bug > Components: sdk-go > Environment: GNU/Linux >Reporter: Ian Lance Taylor >Priority: Major > Fix For: Not applicable > > Time Spent: 1h 10m > Remaining Estimate: 0h > > If a Go program is built with -buildmode=pie -ldflags=-w, the code that > transfers an unregistered function fails. It tries to look up the symbol in > the DWARF debug info, but that info has been stripped because of the -w flag. > This causes a program crash when calling the function. > I have a patch for this problem that I will send shortly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
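For readers unfamiliar with the registered versus unregistered distinction in the issue above, here is a rough sketch of why a registered function does not need debug info: it is stored in a table under its name ahead of time, so resolving it later is a map lookup rather than a symbol search. The `register` and `resolve` helpers and the `registry` map are hypothetical stand-ins written for this note, not the SDK's actual code; only `reflect` and `runtime.FuncForPC` from the standard library are used.

```go
package main

import (
	"fmt"
	"reflect"
	"runtime"
)

// registry maps a function's qualified name to the function value.
// Hypothetical: it stands in for whatever table an SDK might keep.
var registry = map[string]interface{}{}

// register stores fn under its runtime name and returns that name.
func register(fn interface{}) (string, error) {
	v := reflect.ValueOf(fn)
	if v.Kind() != reflect.Func {
		return "", fmt.Errorf("not a function: %T", fn)
	}
	f := runtime.FuncForPC(v.Pointer())
	if f == nil {
		return "", fmt.Errorf("no runtime info for %T", fn)
	}
	registry[f.Name()] = fn
	return f.Name(), nil
}

// resolve looks a function up by name without consulting any debug info.
func resolve(name string) (interface{}, bool) {
	fn, ok := registry[name]
	return fn, ok
}

func double(x int) int { return 2 * x }

func main() {
	name, err := register(double)
	if err != nil {
		fmt.Println("register failed:", err)
		return
	}
	fn, ok := resolve(name)
	if !ok {
		fmt.Println("not found:", name)
		return
	}
	out := reflect.ValueOf(fn).Call([]reflect.Value{reflect.ValueOf(21)})
	fmt.Println(name, "->", out[0].Int())
}
```

The lookup key here comes from the runtime's own function table, so the sketch does not depend on DWARF being present in the binary.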
[jira] [Updated] (BEAM-9233) Go: unregistered Go functions fail when using -buildmode=pie -ldflags=-w
[ https://issues.apache.org/jira/browse/BEAM-9233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke updated BEAM-9233: --- Affects Version/s: (was: 2.18.0) > Go: unregistered Go functions fail when using -buildmode=pie -ldflags=-w > > > Key: BEAM-9233 > URL: https://issues.apache.org/jira/browse/BEAM-9233 > Project: Beam > Issue Type: Bug > Components: sdk-go > Environment: GNU/Linux >Reporter: Ian Lance Taylor >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > If a Go program is built with -buildmode=pie -ldflags=-w, the code that > transfers an unregistered function fails. It tries to look up the symbol in > the DWARF debug info, but that info has been stripped because of the -w flag. > This causes a program crash when calling the function. > I have a patch for this problem that I will send shortly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-6498) Consider using sync/atomic for Go SDK metrics.
[ https://issues.apache.org/jira/browse/BEAM-6498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke resolved BEAM-6498. Fix Version/s: Not applicable Resolution: Fixed Resolved in [GitHub Pull Request #10654|https://github.com/apache/beam/pull/10654] instead. In particular counters were updated to use atomics, and the lock adds ~10ns for the other two types, which is fine given they do more work. > Consider using sync/atomic for Go SDK metrics. > -- > > Key: BEAM-6498 > URL: https://issues.apache.org/jira/browse/BEAM-6498 > Project: Beam > Issue Type: Sub-task > Components: sdk-go >Affects Versions: Not applicable >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Minor > Fix For: Not applicable > > > Changing a portion of the metrics code to use the atomic counters might yield > a performance improvement and the opportunity to remove a lock or two. > Care needs to be taken though: > [https://stackoverflow.com/questions/47445344/is-there-a-difference-in-go-between-a-counter-using-atomic-operations-and-one-us] > The outcome of this task is a benchmark demonstrating the benefit (or > detriment) in a quasi-real situation for the Go SDK, and if warranted > switching metrics where possible, to use atomics. -- This message was sent by Atlassian Jira (v8.3.4#803005)
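As a rough illustration of the split described in the resolution (atomics for counters, a lock for metric types that do more work per update), here is a minimal sketch. The type names `atomicCounter` and `lockedDistribution` are made up for this note and are not the SDK's types; the distribution is just one example of a metric whose fields must change together.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// atomicCounter accumulates a sum with sync/atomic and takes no lock.
type atomicCounter struct {
	value int64
}

func (c *atomicCounter) Inc(v int64) { atomic.AddInt64(&c.value, v) }
func (c *atomicCounter) Get() int64  { return atomic.LoadInt64(&c.value) }

// lockedDistribution updates four fields together, so a single atomic
// operation is not enough and a mutex guards the whole update.
type lockedDistribution struct {
	mu                 sync.Mutex
	count, sum, lo, hi int64
}

func (d *lockedDistribution) Update(v int64) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.count == 0 || v < d.lo {
		d.lo = v
	}
	if d.count == 0 || v > d.hi {
		d.hi = v
	}
	d.count++
	d.sum += v
}

// Snapshot returns a consistent view of all four fields.
func (d *lockedDistribution) Snapshot() (count, sum, lo, hi int64) {
	d.mu.Lock()
	defer d.mu.Unlock()
	return d.count, d.sum, d.lo, d.hi
}

func main() {
	var c atomicCounter
	var d lockedDistribution
	var wg sync.WaitGroup
	for w := 0; w < 4; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := int64(1); i <= 1000; i++ {
				c.Inc(1)
				d.Update(i)
			}
		}()
	}
	wg.Wait()
	count, sum, lo, hi := d.Snapshot()
	fmt.Println("counter:", c.Get())
	fmt.Println("distribution:", count, sum, lo, hi)
}
```

Whether the lock's cost matters depends on the workload, which is why the issue asked for a benchmark before switching anything.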
[jira] [Assigned] (BEAM-6498) Consider using sync/atomic for Go SDK metrics.
[ https://issues.apache.org/jira/browse/BEAM-6498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke reassigned BEAM-6498: -- Assignee: Robert Burke > Consider using sync/atomic for Go SDK metrics. > -- > > Key: BEAM-6498 > URL: https://issues.apache.org/jira/browse/BEAM-6498 > Project: Beam > Issue Type: Sub-task > Components: sdk-go >Affects Versions: Not applicable >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Minor > > Changing a portion of the metrics code to use the atomic counters might yield > a performance improvement and the opportunity to remove a lock or two. > Care needs to be taken though: > [https://stackoverflow.com/questions/47445344/is-there-a-difference-in-go-between-a-counter-using-atomic-operations-and-one-us] > The outcome of this task is a benchmark demonstrating the benefit (or > detriment) in a quasi-real situation for the Go SDK, and if warranted > switching metrics where possible, to use atomics. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9167) Reduce overhead of Go SDK side metrics
[ https://issues.apache.org/jira/browse/BEAM-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke updated BEAM-9167: --- Parent: BEAM-4725 Issue Type: Sub-task (was: Improvement) > Reduce overhead of Go SDK side metrics > -- > > Key: BEAM-9167 > URL: https://issues.apache.org/jira/browse/BEAM-9167 > Project: Beam > Issue Type: Sub-task > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > Locking overhead due to the global store and local caches of SDK counter data > can dominate certain workloads, which means we can do better. > Instead of having a global store of metrics data to extract counters, we > should use per ptransform (or per bundle) counter sets, which would avoid > requiring locking per counter operation. The main detriment compared to the > current implementation is that a user would need to add their own locking if > they were to spawn multiple goroutines to process a Bundle's work in a DoFn. > Given that self multithreaded DoFns aren't recommended/safe in Java, largely > impossible in Python, and the other Beam Go SDK provided constructs (like > Iterators and Emitters) are not thread safe, this is a small concern, > provided the documentation is clear on this. > Removing the locking and switching to atomic ops reduces the overhead > significantly in example jobs and in the benchmarks. > A second part of this change should be to move the exec package to manage > its own per bundle state, rather than relying on a global datastore to > extract the per bundle, per ptransform values. > Related: https://issues.apache.org/jira/browse/BEAM-6541 -- This message was sent by Atlassian Jira (v8.3.4#803005)
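The per-bundle idea in the description can be sketched roughly as follows. This is an assumption-laden illustration, not the SDK's implementation: `counterSet`, `newCounterSet`, `Inc`, and `Snapshot` are hypothetical names, and the sketch assumes counter names are known when the bundle is set up, so the map is never written after construction and needs no lock.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// counterSet belongs to exactly one bundle, so bundles never contend on a
// shared global store. The map is read-only after newCounterSet returns;
// only the pointed-to values change, and they change atomically.
type counterSet struct {
	counters map[string]*int64
}

func newCounterSet(names ...string) *counterSet {
	s := &counterSet{counters: make(map[string]*int64, len(names))}
	for _, n := range names {
		s.counters[n] = new(int64)
	}
	return s
}

// Inc adds v to the named counter and reports whether the name was known.
func (s *counterSet) Inc(name string, v int64) bool {
	p, ok := s.counters[name]
	if !ok {
		return false
	}
	atomic.AddInt64(p, v)
	return true
}

// Snapshot copies the current values, e.g. for a progress report.
func (s *counterSet) Snapshot() map[string]int64 {
	out := make(map[string]int64, len(s.counters))
	for n, p := range s.counters {
		out[n] = atomic.LoadInt64(p)
	}
	return out
}

func main() {
	// Two bundles processed concurrently, each with its own counter set.
	bundles := []*counterSet{newCounterSet("elements"), newCounterSet("elements")}
	var wg sync.WaitGroup
	for _, b := range bundles {
		wg.Add(1)
		go func(b *counterSet) {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				b.Inc("elements", 1)
			}
		}(b)
	}
	wg.Wait()
	for i, b := range bundles {
		fmt.Println("bundle", i, b.Snapshot())
	}
}
```

A design where counters can be created lazily during processing would need extra synchronization around the map, which is the trade-off the description alludes to.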
[jira] [Resolved] (BEAM-6148) Support Go "Unit" tests on arbitrary runners
[ https://issues.apache.org/jira/browse/BEAM-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke resolved BEAM-6148. Fix Version/s: Not applicable Resolution: Fixed > Support Go "Unit" tests on arbitrary runners > > > Key: BEAM-6148 > URL: https://issues.apache.org/jira/browse/BEAM-6148 > Project: Beam > Issue Type: New Feature > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Major > Fix For: Not applicable > > Time Spent: 0.5h > Remaining Estimate: 0h > > There's no clear path to testing pipelines on runners other than the direct > runner. It should be possible to "redirect" tests to use a runner of choice. > This would enable more "testy" ValidatesRunner tests in Go. > > In particular, users should need to at least _ import the runner they want, > and be able to set a flag. > The tricky bit is ensuring beam.Init is called so that each individual test > can convert to WorkerMode when it's spun up as a SDK harness. This can be > done by having a TestMain. > ptest should provide convenience functions to help with this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
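The "import the runner, then set a flag" pattern from the issue above can be sketched with the standard library alone. Everything named here (`registerRunner`, `selectRunner`, the `--runner` flag, and the `direct` and `fake` runners) is hypothetical and only illustrates the registry idea; it is not the ptest API. In a real test package the selection would presumably be driven from a TestMain, as the issue suggests.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// runFn executes a pipeline, identified here by a plain string.
type runFn func(pipeline string) error

// runners is filled in by init functions. In a real setup each runner
// package would register itself when imported for side effects.
var runners = map[string]runFn{}

func registerRunner(name string, fn runFn) { runners[name] = fn }

func init() {
	registerRunner("direct", func(pipeline string) error { return nil })
	registerRunner("fake", func(pipeline string) error { return nil })
}

// selectRunner parses args for a --runner flag and returns the matching
// registered runner, defaulting to "direct".
func selectRunner(args []string) (string, runFn, error) {
	fs := flag.NewFlagSet("runner-select", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of stderr in this sketch
	name := fs.String("runner", "direct", "name of a registered runner")
	if err := fs.Parse(args); err != nil {
		return "", nil, err
	}
	fn, ok := runners[*name]
	if !ok {
		return "", nil, fmt.Errorf("runner %q is not registered; was its package imported?", *name)
	}
	return *name, fn, nil
}

func main() {
	name, run, err := selectRunner([]string{"--runner=fake"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("running on", name, "->", run("my-pipeline"))
}
```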
[jira] [Assigned] (BEAM-6371) Add support for reading and writing to CSV files
[ https://issues.apache.org/jira/browse/BEAM-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke reassigned BEAM-6371: -- Assignee: (was: Robert Burke) > Add support for reading and writing to CSV files > > > Key: BEAM-6371 > URL: https://issues.apache.org/jira/browse/BEAM-6371 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Andrew Brampton >Priority: Major > > A very simple CSV Reader and Writer could be created, similar to [this > one|https://github.com/bramp/morebeam/tree/master/csvio]. > It would support reading a header, and support similar options to the > standard [go csv package|https://golang.org/pkg/encoding/csv/]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
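For context on what "reading a header" on top of the standard csv package might look like, here is a small sketch that uses only `encoding/csv`. It is not the linked csvio package and not a Beam transform; `readWithHeader` and `parseString` are names invented for this note.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"io"
	"strings"
)

// readWithHeader treats the first record as column names and returns each
// later record as a map from column name to value.
func readWithHeader(r io.Reader) ([]map[string]string, error) {
	cr := csv.NewReader(r)
	header, err := cr.Read()
	if errors.Is(err, io.EOF) {
		return nil, nil // empty input: no header, no rows
	}
	if err != nil {
		return nil, err
	}
	var rows []map[string]string
	for {
		// With the default FieldsPerRecord, the reader rejects records
		// whose field count differs from the header's.
		rec, err := cr.Read()
		if errors.Is(err, io.EOF) {
			return rows, nil
		}
		if err != nil {
			return nil, err
		}
		row := make(map[string]string, len(header))
		for i, name := range header {
			row[name] = rec[i]
		}
		rows = append(rows, row)
	}
}

// parseString is a convenience wrapper for in-memory input.
func parseString(s string) ([]map[string]string, error) {
	return readWithHeader(strings.NewReader(s))
}

func main() {
	rows, err := parseString("name,qty\napple,3\npear,5\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, row := range rows {
		fmt.Println(row["name"], row["qty"])
	}
}
```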
[jira] [Resolved] (BEAM-5354) Side Inputs seems to be non-working in the sdk-go
[ https://issues.apache.org/jira/browse/BEAM-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke resolved BEAM-5354. Fix Version/s: Not applicable Resolution: Fixed > Side Inputs seems to be non-working in the sdk-go > - > > Key: BEAM-5354 > URL: https://issues.apache.org/jira/browse/BEAM-5354 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Tomas Roos >Assignee: Robert Burke >Priority: Major > Fix For: Not applicable > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Running the contains example fails with > > {code:java} > Output i0 for step was not found. > {code} > This is because of the call to debug.Head (which internally uses SideInput) > Removing the following line > [https://github.com/apache/beam/blob/master/sdks/go/examples/contains/contains.go#L50] > > The pipeline executes well. > > Executed on id's > > go-job-1-1536664417610678545 > vs > go-job-1-1536664934354466938 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-7928) Being able to specify disk type and disk size
[ https://issues.apache.org/jira/browse/BEAM-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021366#comment-17021366 ] Robert Burke commented on BEAM-7928: This is also about running templates of Go SDK jobs on Dataflow, which hasn't been tested at all. As usual, Dataflow doesn't currently support the Go SDK, so if it works, that's luck rather than intent. > Being able to specify disk type and disk size > - > > Key: BEAM-7928 > URL: https://issues.apache.org/jira/browse/BEAM-7928 > Project: Beam > Issue Type: New Feature > Components: sdk-go >Reporter: Thomas >Priority: Major > > Hi everyone, > I'd like to launch a job from a template, so I'm using > [https://godoc.org/google.golang.org/api/dataflow/v1b3#CreateJobFromTemplateRequest] > and then I call the `Create` method. > With this (particularly inside `RuntimeEnvironment` type) I'm able to specify > the machine type and so on, but I'm unable to specify disk settings (type and > size). > > Do you think such settings could be there also? Or do I need to define them > in another way? > > Thank you, -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-7928) Being able to specify disk type and disk size
[ https://issues.apache.org/jira/browse/BEAM-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021365#comment-17021365 ] Robert Burke commented on BEAM-7928: Apparently I don't get emails from JIRA anymore. Adding new options/flags and ensuring they're plumbed through isn't difficult to do though. See [https://github.com/apache/beam/pull/9906] for an example PR that does so. I'd be happy to review if you mention me: @lostluck > Being able to specify disk type and disk size > - > > Key: BEAM-7928 > URL: https://issues.apache.org/jira/browse/BEAM-7928 > Project: Beam > Issue Type: New Feature > Components: sdk-go >Reporter: Thomas >Priority: Major > > Hi everyone, > I'd like to launch a job from a template, so I'm using > [https://godoc.org/google.golang.org/api/dataflow/v1b3#CreateJobFromTemplateRequest] > and then I call the `Create` method. > With this (particularly inside `RuntimeEnvironment` type) I'm able to specify > the machine type and so on, but I'm unable to specify disk settings (type and > size). > > Do you think such settings could be there also? Or do I need to define them > in another way? > > Thank you, -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-8166) Support Graceful shutdown of worker harness.
[ https://issues.apache.org/jira/browse/BEAM-8166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke closed BEAM-8166. -- Fix Version/s: Not applicable Resolution: Fixed > Support Graceful shutdown of worker harness. > > > Key: BEAM-8166 > URL: https://issues.apache.org/jira/browse/BEAM-8166 > Project: Beam > Issue Type: Improvement > Components: runner-core, sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Minor > Fix For: Not applicable > > Time Spent: 50m > Remaining Estimate: 0h > > Ideally there should be a clear Shutdown control RPC a runner can send a > worker harness to trigger an orderly shutdown. > Absent that, errors on the runner side shouldn't manifest as SDK worker > harness errors. SDKs should log gRPC errors and shut down gracefully. -- This message was sent by Atlassian Jira (v8.3.4#803005)
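The "log, then shut down gracefully" behaviour asked for above can be sketched as a receive loop that treats stream termination as a normal exit rather than a harness failure. This is a standard-library-only illustration: `controlStream`, `runControlLoop`, and `fakeStream` are hypothetical names, and a real harness would read from a gRPC stream instead of the in-memory fake used here.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// controlStream is a stand-in for the receiving side of a control channel.
type controlStream interface {
	Recv() (string, error)
}

// runControlLoop handles instructions until the stream ends or fails.
// Either way it logs and returns normally instead of propagating the
// runner-side problem as a harness error. It returns the number handled.
func runControlLoop(s controlStream, handle func(string), logf func(string, ...interface{})) int {
	handled := 0
	for {
		req, err := s.Recv()
		if errors.Is(err, io.EOF) {
			logf("control stream closed by runner; shutting down")
			return handled
		}
		if err != nil {
			logf("control stream failed, shutting down: %v", err)
			return handled
		}
		handle(req)
		handled++
	}
}

// fakeStream yields its queued requests, then err (io.EOF if err is nil).
type fakeStream struct {
	reqs []string
	err  error
}

func (f *fakeStream) Recv() (string, error) {
	if len(f.reqs) == 0 {
		if f.err == nil {
			return "", io.EOF
		}
		return "", f.err
	}
	req := f.reqs[0]
	f.reqs = f.reqs[1:]
	return req, nil
}

func main() {
	s := &fakeStream{
		reqs: []string{"process-bundle-1", "process-bundle-2"},
		err:  errors.New("transport is closing"),
	}
	logf := func(format string, args ...interface{}) { fmt.Printf(format+"\n", args...) }
	n := runControlLoop(s, func(req string) { fmt.Println("handling", req) }, logf)
	fmt.Println("handled", n, "instructions before shutdown")
}
```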
[jira] [Assigned] (BEAM-8166) Support Graceful shutdown of worker harness.
[ https://issues.apache.org/jira/browse/BEAM-8166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke reassigned BEAM-8166: -- Assignee: Robert Burke > Support Graceful shutdown of worker harness. > > > Key: BEAM-8166 > URL: https://issues.apache.org/jira/browse/BEAM-8166 > Project: Beam > Issue Type: Improvement > Components: runner-core, sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > Ideally there should be a clear Shutdown control RPC a runner can send a > worker harness to trigger an orderly shutdown. > Absent that, errors on the runner side shouldn't manifest as SDK worker > harness errors. SDKs should log gRPC errors and shut down gracefully. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-6541) Consider converting bundle & ptransform ids to ints eagerly.
[ https://issues.apache.org/jira/browse/BEAM-6541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke closed BEAM-6541. -- Fix Version/s: Not applicable Assignee: Robert Burke Resolution: Won't Fix I'm taking a different approach in https://issues.apache.org/jira/browse/BEAM-9167 which relies more directly on the structure of bundles and ptransforms to reduce the overhead. Granted, I'm also using the technique mentioned here, but hashing the metric names rather than the higher level structs. > Consider converting bundle & ptransform ids to ints eagerly. > > > Key: BEAM-6541 > URL: https://issues.apache.org/jira/browse/BEAM-6541 > Project: Beam > Issue Type: Sub-task > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Minor > Fix For: Not applicable > > > BundleIDs and PTransformIDs necessary for communicating with the Runner > interface in the Go SDK are currently strings, and used as is for metrics > contexts. We use them for getting bundle & ptransform specific metrics, and > transmitting the same. We could instead eagerly assign them a local index > that is then converted out when communicating metrics over the FnAPI. This > would reduce overhead on metric lookups in the various maps. > Note: the same could be done for the user's metric-name, completing the > optimization. Measuring the per-report overhead for tentative/final metric > reporting is required before committing to this approach. -- This message was sent by Atlassian Jira (v8.3.4#803005)
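The "eagerly assign a local index" idea from the issue description can be sketched as a small string interner: each id string is mapped to an int once, metric maps are keyed by the ints, and the original strings are recovered only when reporting. The `interner` and `metricKey` types and the example id strings are hypothetical, and the sketch is not safe for concurrent use.

```go
package main

import "fmt"

// interner assigns each distinct string a small local index on first use.
type interner struct {
	index map[string]int
	names []string
}

func newInterner() *interner { return &interner{index: map[string]int{}} }

// ID returns the index for s, assigning the next free one if s is new.
func (in *interner) ID(s string) int {
	if id, ok := in.index[s]; ok {
		return id
	}
	id := len(in.names)
	in.index[s] = id
	in.names = append(in.names, s)
	return id
}

// Name converts an index back to its string, e.g. when reporting metrics.
func (in *interner) Name(id int) (string, bool) {
	if id < 0 || id >= len(in.names) {
		return "", false
	}
	return in.names[id], true
}

// metricKey is a compact, comparable map key built from two indices.
type metricKey struct{ bundle, ptransform int }

func main() {
	bundles, ptransforms := newInterner(), newInterner()
	counts := map[metricKey]int64{}

	// Two updates for the same bundle and ptransform land on the same key.
	counts[metricKey{bundles.ID("bundle-7"), ptransforms.ID("extract")}] += 3
	counts[metricKey{bundles.ID("bundle-7"), ptransforms.ID("extract")}] += 2

	for key, v := range counts {
		b, _ := bundles.Name(key.bundle)
		p, _ := ptransforms.Name(key.ptransform)
		fmt.Println(b, p, v)
	}
}
```

As the issue notes, the conversion back to strings adds per-report work, so the gain would need measuring before adopting anything like this.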
[jira] [Updated] (BEAM-9167) Reduce overhead of Go SDK side metrics
[ https://issues.apache.org/jira/browse/BEAM-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke updated BEAM-9167: --- Description: Locking overhead due to the global store and local caches of SDK counter data can dominate certain workloads, which means we can do better. Instead of having a global store of metrics data to extract counters, we should use per ptransform (or per bundle) counter sets, which would avoid requiring locking per counter operation. The main detriment compared to the current implementation is that a user would need to add their own locking if they were to spawn multiple goroutines to process a Bundle's work in a DoFn. Given that self multithreaded DoFns aren't recommended/safe in Java, largely impossible in Python, and the other Beam Go SDK provided constructs (like Iterators and Emitters) are not thread safe, this is a small concern, provided the documentation is clear on this. Removing the locking and switching to atomic ops reduces the overhead significantly in example jobs and in the benchmarks. A second part of this change should be to move the exec package to manage its own per bundle state, rather than relying on a global datastore to extract the per bundle, per ptransform values. Related: https://issues.apache.org/jira/browse/BEAM-6541 was: Locking overhead due to the global store and local caches of SDK counter data can dominate certain workloads, which means we can do better. Instead of having a global store of metrics data to extract counters, we should use per ptransform (or per bundle) counter sets, which would avoid requiring locking per counter operation. The main detriment compared to the current implementation is that a user would need to add their own locking if they were to spawn multiple goroutines to process a Bundle's work in a DoFn. 
Given that self multithreaded DoFns aren't recommended/safe in Java, largely impossible in Python, and the other Beam Go SDK provided constructs (like Iterators and Emitters) are not thread safe, this is a small concern, provided the documentation is clear on this. Removing the locking and switching to atomic ops reduces the overhead significantly in example jobs and in the benchmarks. Related: https://issues.apache.org/jira/browse/BEAM-6541 > Reduce overhead of Go SDK side metrics > -- > > Key: BEAM-9167 > URL: https://issues.apache.org/jira/browse/BEAM-9167 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Major > > Locking overhead due to the global store and local caches of SDK counter data > can dominate certain workloads, which means we can do better. > Instead of having a global store of metrics data to extract counters, we > should use per ptransform (or per bundle) counter sets, which would avoid > requiring locking per counter operation. The main detriment compared to the > current implementation is that a user would need to add their own locking if > they were to spawn multiple goroutines to process a Bundle's work in a DoFn. > Given that self multithreaded DoFns aren't recommended/safe in Java, largely > impossible in Python, and the other Beam Go SDK provided constructs (like > Iterators and Emitters) are not thread safe, this is a small concern, > provided the documentation is clear on this. > Removing the locking and switching to atomic ops reduces the overhead > significantly in example jobs and in the benchmarks. > A second part of this change should be to move the exec package to manage > its own per bundle state, rather than relying on a global datastore to > extract the per bundle, per ptransform values. > Related: https://issues.apache.org/jira/browse/BEAM-6541 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-9167) Reduce overhead of Go SDK side metrics
[ https://issues.apache.org/jira/browse/BEAM-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke reassigned BEAM-9167: -- Assignee: Robert Burke > Reduce overhead of Go SDK side metrics > -- > > Key: BEAM-9167 > URL: https://issues.apache.org/jira/browse/BEAM-9167 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Major > > Locking overhead due to the global store and local caches of SDK counter data > can dominate certain workloads, which means we can do better. > Instead of having a global store of metrics data to extract counters, we > should use per ptransform (or per bundle) counter sets, which would avoid > requiring locking per counter operation. The main detriment compared to the > current implementation is that a user would need to add their own locking if > they were to spawn multiple goroutines to process a Bundle's work in a DoFn. > Given that self multithreaded DoFns aren't recommended/safe in Java, largely > impossible in Python, and the other beam Go SDK provided constructs (like > Iterators and Emitters) are not thread safe, this is a small concern, > provided the documentation is clear on this. > Removing the locking and switching to atomic ops reduces the overhead > significantly in example jobs and in the benchmarks. > Related: https://issues.apache.org/jira/browse/BEAM-6541 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9167) Reduce overhead of Go SDK side metrics
Robert Burke created BEAM-9167: -- Summary: Reduce overhead of Go SDK side metrics Key: BEAM-9167 URL: https://issues.apache.org/jira/browse/BEAM-9167 Project: Beam Issue Type: Improvement Components: sdk-go Reporter: Robert Burke Locking overhead due to the global store and local caches of SDK counter data can dominate certain workloads, which means we can do better. Instead of having a global store of metrics data to extract counters, we should use per ptransform (or per bundle) counter sets, which would avoid requiring locking per counter operation. The main detriment compared to the current implementation is that a user would need to add their own locking if they were to spawn multiple goroutines to process a Bundle's work in a DoFn. Given that self multithreaded DoFns aren't recommended/safe in Java, largely impossible in Python, and the other beam Go SDK provided constructs (like Iterators and Emitters) are not thread safe, this is a small concern, provided the documentation is clear on this. Removing the locking and switching to atomic ops reduces the overhead significantly in example jobs and in the benchmarks. Related: https://issues.apache.org/jira/browse/BEAM-6541 -- This message was sent by Atlassian Jira (v8.3.4#803005)