[jira] [Work logged] (BEAM-3855) Add Go SDK support for protobuf coder
[ https://issues.apache.org/jira/browse/BEAM-3855?focusedWorklogId=83226=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83226 ] ASF GitHub Bot logged work on BEAM-3855: Author: ASF GitHub Bot Created on: 22/Mar/18 16:34 Start Date: 22/Mar/18 16:34 Worklog Time Spent: 10m Work Description: tgroh closed pull request #4908: BEAM-3855: Add Protocol Buffer support URL: https://github.com/apache/beam/pull/4908 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/sdks/go/pkg/beam/coder.go b/sdks/go/pkg/beam/coder.go index db0c06a3991..236347a6cd7 100644 --- a/sdks/go/pkg/beam/coder.go +++ b/sdks/go/pkg/beam/coder.go @@ -25,11 +25,14 @@ import ( "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/coderx" "github.com/apache/beam/sdks/go/pkg/beam/core/typex" "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/golang/protobuf/proto" ) func init() { RegisterFunction(JSONDec) RegisterFunction(JSONEnc) + RegisterFunction(ProtoEnc) + RegisterFunction(ProtoDec) } // Coder defines how to encode and decode values of type 'A' into byte streams. @@ -87,6 +90,8 @@ func NewCoder(t FullType) Coder { return Coder{c} } +var protoMessageType = reflect.TypeOf((*proto.Message)(nil)).Elem() + func inferCoder(t FullType) (*coder.Coder, error) { switch t.Class() { case typex.Concrete, typex.Container: @@ -108,6 +113,16 @@ func inferCoder(t FullType) (*coder.Coder, error) { // conversions at runtime in inconvenient places. return {Kind: coder.Bytes, T: t}, nil default: + // TODO(BEAM-3306): the coder registry should be consulted here for user + // specified types and their coders. + if t.Type().Implements(protoMessageType) { + c, err := newProtoCoder(t.Type()) + if err != nil { + return nil, err + } + return {Kind: coder.Custom, T: t, Custom: c}, nil + } + c, err := newJSONCoder(t.Type()) if err != nil { return nil, err @@ -154,6 +169,26 @@ func inferCoders(list []FullType) ([]*coder.Coder, error) { // form that doesn't require LengthPrefix'ing to cut up the bytestream from // the FnHarness. +func ProtoEnc(in typex.T) ([]byte, error) { + return proto.Marshal(in.(proto.Message)) +} + +func ProtoDec(t reflect.Type, in []byte) (typex.T, error) { + val := reflect.New(t.Elem()).Interface().(proto.Message) + if err := proto.Unmarshal(in, val); err != nil { + return nil, err + } + return val, nil +} + +func newProtoCoder(t reflect.Type) (*coder.CustomCoder, error) { + c, err := coder.NewCustomCoder("proto", t, ProtoEnc, ProtoDec) + if err != nil { + return nil, fmt.Errorf("invalid coder: %v", err) + } + return c, nil +} + // Concrete and universal custom coders both have a similar signature. // Conversion is handled by reflection. diff --git a/sdks/go/pkg/beam/core/typex/class.go b/sdks/go/pkg/beam/core/typex/class.go index 8dfa79fbd54..a477ecf5f1d 100644 --- a/sdks/go/pkg/beam/core/typex/class.go +++ b/sdks/go/pkg/beam/core/typex/class.go @@ -20,6 +20,8 @@ import ( "reflect" "unicode" "unicode/utf8" + + "github.com/golang/protobuf/proto" ) // Class is the type "class" of data as distinguished by the runtime. The class @@ -45,6 +47,8 @@ const ( Composite ) +var protoMessageType = reflect.TypeOf((*proto.Message)(nil)).Elem() + // TODO(herohde) 5/16/2017: maybe we should add more classes, so that every // reasonable type (such as error) is not Invalid, even though it is not // valid in FullType. "Special", say? Right now, a valid DoFn signature may @@ -76,6 +80,12 @@ func IsConcrete(t reflect.Type) bool { return false } + // TODO(BEAM-3306): the coder registry should be consulted here for user + // specified types and their coders. + if t.Implements(protoMessageType) { + return true + } + switch t.Kind() { case reflect.Invalid, reflect.UnsafePointer, reflect.Uintptr, reflect.Interface: return false // no unmanageable types This is an automated message from the Apache Git Service. To
[jira] [Work logged] (BEAM-3855) Add Go SDK support for protobuf coder
[ https://issues.apache.org/jira/browse/BEAM-3855?focusedWorklogId=83215=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83215 ] ASF GitHub Bot logged work on BEAM-3855: Author: ASF GitHub Bot Created on: 22/Mar/18 16:02 Start Date: 22/Mar/18 16:02 Worklog Time Spent: 10m Work Description: herohde commented on issue #4908: BEAM-3855: Add Protocol Buffer support URL: https://github.com/apache/beam/pull/4908#issuecomment-375360407 R: @tgroh This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 83215) Time Spent: 1h 50m (was: 1h 40m) > Add Go SDK support for protobuf coder > - > > Key: BEAM-3855 > URL: https://issues.apache.org/jira/browse/BEAM-3855 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Willy Lulciuc >Assignee: Bill Neubauer >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > This JIRA is for something functional. We might want to use the coder > registry for a more general solution, when implemented. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3855) Add Go SDK support for protobuf coder
[ https://issues.apache.org/jira/browse/BEAM-3855?focusedWorklogId=83013=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83013 ] ASF GitHub Bot logged work on BEAM-3855: Author: ASF GitHub Bot Created on: 22/Mar/18 01:09 Start Date: 22/Mar/18 01:09 Worklog Time Spent: 10m Work Description: herohde commented on issue #4908: BEAM-3855: Add Protocol Buffer support URL: https://github.com/apache/beam/pull/4908#issuecomment-375145473 Yep This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 83013) Time Spent: 1h 40m (was: 1.5h) > Add Go SDK support for protobuf coder > - > > Key: BEAM-3855 > URL: https://issues.apache.org/jira/browse/BEAM-3855 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Willy Lulciuc >Assignee: Bill Neubauer >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > This JIRA is for something functional. We might want to use the coder > registry for a more general solution, when implemented. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3855) Add Go SDK support for protobuf coder
[ https://issues.apache.org/jira/browse/BEAM-3855?focusedWorklogId=83012=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83012 ] ASF GitHub Bot logged work on BEAM-3855: Author: ASF GitHub Bot Created on: 22/Mar/18 01:03 Start Date: 22/Mar/18 01:03 Worklog Time Spent: 10m Work Description: wcn3 commented on issue #4908: BEAM-3855: Add Protocol Buffer support URL: https://github.com/apache/beam/pull/4908#issuecomment-375144582 Got it! Said differently, Concrete will just mean the set of types that can be encoded "natively" and the new type, say UserCoded, will capture the concept that's been introduced into IsConcrete. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 83012) Time Spent: 1.5h (was: 1h 20m) > Add Go SDK support for protobuf coder > - > > Key: BEAM-3855 > URL: https://issues.apache.org/jira/browse/BEAM-3855 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Willy Lulciuc >Assignee: Bill Neubauer >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > This JIRA is for something functional. We might want to use the coder > registry for a more general solution, when implemented. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3855) Add Go SDK support for protobuf coder
[ https://issues.apache.org/jira/browse/BEAM-3855?focusedWorklogId=83010=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83010 ] ASF GitHub Bot logged work on BEAM-3855: Author: ASF GitHub Bot Created on: 22/Mar/18 00:59 Start Date: 22/Mar/18 00:59 Worklog Time Spent: 10m Work Description: herohde commented on issue #4908: BEAM-3855: Add Protocol Buffer support URL: https://github.com/apache/beam/pull/4908#issuecomment-375144050 LGTM. By "non-concrete codeable types", I meant types that require a user-provided coder to work (= are currently classified as Invalid). Protobuf is one example. I expect some interfaces will also fall into this category. While protobufs might play well with JSON as fields, it won't be the case in general. That's why I'm thinking we may want to add a new class to handle such types. But that is something to consider as a part of a proper coder registry. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 83010) Time Spent: 1h 20m (was: 1h 10m) > Add Go SDK support for protobuf coder > - > > Key: BEAM-3855 > URL: https://issues.apache.org/jira/browse/BEAM-3855 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Willy Lulciuc >Assignee: Bill Neubauer >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > This JIRA is for something functional. We might want to use the coder > registry for a more general solution, when implemented. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3855) Add Go SDK support for protobuf coder
[ https://issues.apache.org/jira/browse/BEAM-3855?focusedWorklogId=83004=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83004 ] ASF GitHub Bot logged work on BEAM-3855: Author: ASF GitHub Bot Created on: 22/Mar/18 00:39 Start Date: 22/Mar/18 00:39 Worklog Time Spent: 10m Work Description: wcn3 commented on issue #4908: BEAM-3855: Add Protocol Buffer support URL: https://github.com/apache/beam/pull/4908#issuecomment-375141079 This version has been tested internally to Google. The decode routine did have a bug where the instantiated type wasn't correct. This is resolved and I've validated the messages are serialized and deserialized using the proto codings. I'll still continue to develop a Cloud Dataflow suite, but if we want to submit this now, I feel confident in this code. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 83004) Time Spent: 1h 10m (was: 1h) > Add Go SDK support for protobuf coder > - > > Key: BEAM-3855 > URL: https://issues.apache.org/jira/browse/BEAM-3855 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Willy Lulciuc >Assignee: Bill Neubauer >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > This JIRA is for something functional. We might want to use the coder > registry for a more general solution, when implemented. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3855) Add Go SDK support for protobuf coder
[ https://issues.apache.org/jira/browse/BEAM-3855?focusedWorklogId=82961=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82961 ] ASF GitHub Bot logged work on BEAM-3855: Author: ASF GitHub Bot Created on: 21/Mar/18 22:12 Start Date: 21/Mar/18 22:12 Worklog Time Spent: 10m Work Description: wcn3 commented on issue #4908: BEAM-3855: Add Protocol Buffer support URL: https://github.com/apache/beam/pull/4908#issuecomment-375113157 Added the registrations. I am still working on a test case running against Cloud Dataflow to verify the serialization actually works. The direct runner only requires the IsConcrete method to be working in order to handle protocol buffers; I need to test against a runner that will actually serialize data. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 82961) Time Spent: 1h (was: 50m) > Add Go SDK support for protobuf coder > - > > Key: BEAM-3855 > URL: https://issues.apache.org/jira/browse/BEAM-3855 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Willy Lulciuc >Assignee: Bill Neubauer >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > This JIRA is for something functional. We might want to use the coder > registry for a more general solution, when implemented. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3855) Add Go SDK support for protobuf coder
[ https://issues.apache.org/jira/browse/BEAM-3855?focusedWorklogId=82940=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82940 ] ASF GitHub Bot logged work on BEAM-3855: Author: ASF GitHub Bot Created on: 21/Mar/18 21:42 Start Date: 21/Mar/18 21:42 Worklog Time Spent: 10m Work Description: wcn3 commented on issue #4908: BEAM-3855: Add Protocol Buffer support URL: https://github.com/apache/beam/pull/4908#issuecomment-375105611 I'm not following the points in the "On a similar vein" paragraph. If a Go struct has a proto message as a field, the resulting struct is still concrete and serializable. Unfortunately, it'd devolve to encoding the proto using the JSON coder based on the top-level type. This is unfortunate, but it would work. I'm not sure it needs to be disallowed. I see how having the coder registry would alleviate this situation: the top-level struct would be JSON encoded, but the proto message member would be looked up and use the proto encoding for that "subtree" I'm confused by the phrase "non-concrete codeable types." Of the non-concrete types, we have Containers, which are serialized based on its underlying Concrete type, Universals which aren't actually serialized, and Composites, which are a custom encoding based on the underlying types. What did you mean by this phrase? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 82940) Time Spent: 50m (was: 40m) > Add Go SDK support for protobuf coder > - > > Key: BEAM-3855 > URL: https://issues.apache.org/jira/browse/BEAM-3855 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Willy Lulciuc >Assignee: Bill Neubauer >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > This JIRA is for something functional. We might want to use the coder > registry for a more general solution, when implemented. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3855) Add Go SDK support for protobuf coder
[ https://issues.apache.org/jira/browse/BEAM-3855?focusedWorklogId=82938=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82938 ] ASF GitHub Bot logged work on BEAM-3855: Author: ASF GitHub Bot Created on: 21/Mar/18 21:41 Start Date: 21/Mar/18 21:41 Worklog Time Spent: 10m Work Description: wcn3 commented on a change in pull request #4908: BEAM-3855: Add Protocol Buffer support URL: https://github.com/apache/beam/pull/4908#discussion_r176247577 ## File path: sdks/go/pkg/beam/core/typex/class.go ## @@ -76,6 +78,13 @@ func IsConcrete(t reflect.Type) bool { return false } + // TODO(wcn): perhaps we need an extension hook to teach this method about Review comment: Done. Left similar comment in coder.go as well. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 82938) Time Spent: 0.5h (was: 20m) > Add Go SDK support for protobuf coder > - > > Key: BEAM-3855 > URL: https://issues.apache.org/jira/browse/BEAM-3855 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Willy Lulciuc >Assignee: Bill Neubauer >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > This JIRA is for something functional. We might want to use the coder > registry for a more general solution, when implemented. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3855) Add Go SDK support for protobuf coder
[ https://issues.apache.org/jira/browse/BEAM-3855?focusedWorklogId=82939=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82939 ] ASF GitHub Bot logged work on BEAM-3855: Author: ASF GitHub Bot Created on: 21/Mar/18 21:41 Start Date: 21/Mar/18 21:41 Worklog Time Spent: 10m Work Description: wcn3 commented on a change in pull request #4908: BEAM-3855: Add Protocol Buffer support URL: https://github.com/apache/beam/pull/4908#discussion_r176247586 ## File path: sdks/go/pkg/beam/core/typex/class.go ## @@ -76,6 +78,13 @@ func IsConcrete(t reflect.Type) bool { return false } + // TODO(wcn): perhaps we need an extension hook to teach this method about + // types that are "concrete" because they are already sourced by external code? + pt := reflect.TypeOf((*proto.Message)(nil)).Elem() Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 82939) Time Spent: 40m (was: 0.5h) > Add Go SDK support for protobuf coder > - > > Key: BEAM-3855 > URL: https://issues.apache.org/jira/browse/BEAM-3855 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Willy Lulciuc >Assignee: Bill Neubauer >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > This JIRA is for something functional. We might want to use the coder > registry for a more general solution, when implemented. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3855) Add Go SDK support for protobuf coder
[ https://issues.apache.org/jira/browse/BEAM-3855?focusedWorklogId=82401=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82401 ] ASF GitHub Bot logged work on BEAM-3855: Author: ASF GitHub Bot Created on: 20/Mar/18 18:40 Start Date: 20/Mar/18 18:40 Worklog Time Spent: 10m Work Description: herohde commented on a change in pull request #4908: BEAM-3855: Add Protocol Buffer support URL: https://github.com/apache/beam/pull/4908#discussion_r175877521 ## File path: sdks/go/pkg/beam/core/typex/class.go ## @@ -76,6 +78,13 @@ func IsConcrete(t reflect.Type) bool { return false } + // TODO(wcn): perhaps we need an extension hook to teach this method about Review comment: Perhaps link this TODO to the coder registry jira? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 82401) Time Spent: 20m (was: 10m) > Add Go SDK support for protobuf coder > - > > Key: BEAM-3855 > URL: https://issues.apache.org/jira/browse/BEAM-3855 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Willy Lulciuc >Assignee: Bill Neubauer >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > This JIRA is for something functional. We might want to use the coder > registry for a more general solution, when implemented. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3855) Add Go SDK support for protobuf coder
[ https://issues.apache.org/jira/browse/BEAM-3855?focusedWorklogId=82400=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82400 ] ASF GitHub Bot logged work on BEAM-3855: Author: ASF GitHub Bot Created on: 20/Mar/18 18:40 Start Date: 20/Mar/18 18:40 Worklog Time Spent: 10m Work Description: herohde commented on a change in pull request #4908: BEAM-3855: Add Protocol Buffer support URL: https://github.com/apache/beam/pull/4908#discussion_r175877897 ## File path: sdks/go/pkg/beam/core/typex/class.go ## @@ -76,6 +78,13 @@ func IsConcrete(t reflect.Type) bool { return false } + // TODO(wcn): perhaps we need an extension hook to teach this method about + // types that are "concrete" because they are already sourced by external code? + pt := reflect.TypeOf((*proto.Message)(nil)).Elem() Review comment: Small suggestion: made this a global similarly to other types elsewhere? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 82400) Time Spent: 10m Remaining Estimate: 0h > Add Go SDK support for protobuf coder > - > > Key: BEAM-3855 > URL: https://issues.apache.org/jira/browse/BEAM-3855 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Willy Lulciuc >Assignee: Bill Neubauer >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > This JIRA is for something functional. We might want to use the coder > registry for a more general solution, when implemented. -- This message was sent by Atlassian JIRA (v7.6.3#76005)