Re: Easy Multi-language via a SchemaTransform-aware Expansion Service
Hi All,

The implementation of https://s.apache.org/easy-multi-language (with the dynamic API for Python) was merged and should be available with Beam 2.44.0: https://github.com/apache/beam/pull/23413

Thanks,
Cham

On Fri, Aug 19, 2022 at 3:35 PM Chamikara Jayalath wrote:
> [...]
Re: Easy Multi-language via a SchemaTransform-aware Expansion Service
Hi All,

Thanks for the comments so far. Seems like we generally agree on this proposal.

Please see https://github.com/apache/beam/pull/22802 for a prototype implementation that adds the following.

* Support for dynamically discovering and registering SchemaTransforms in the Java expansion service.
* Support for dynamically discovering registered SchemaTransforms from the Python side.
* Support for using SchemaTransforms in Python pipelines.

Feel free to add more comments to the doc and/or the PR.

Thanks,
Cham

On Mon, Aug 8, 2022 at 9:34 PM Chamikara Jayalath wrote:
> [...]
Re: Easy Multi-language via a SchemaTransform-aware Expansion Service
I think the *DiscoverSchemaTransform()* RPC introduced in this proposal and the ability to easily deploy/use available *SchemaTransforms* using an expansion service essentially provide the tooling necessary for implementing such a service. Such a service could even start up expansion services to discover/list transforms available in given artifacts (for example, jar files).

Thanks,
Cham

On Mon, Aug 8, 2022 at 3:48 PM Byron Ellis wrote:
> I like that idea, sort of like Kafka’s Schema Service but for transforms?
> [...]
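The catalog service described in this message could be little more than a loop over discovery calls. The Python sketch below is one hypothetical shape for it, not a Beam API: each "service" is simulated by a plain function that returns identifiers and config schemas (standing in for a DiscoverSchemaTransform() call against an expansion service started from a jar), and the hypothetical `build_catalog` merges the results, records which service each transform came from, and flags identifiers that two services describe differently.

```python
from typing import Any, Callable, Dict, List, Tuple

# A config schema is modeled as a tuple of (field name, type name) pairs.
Schema = Tuple[Tuple[str, str], ...]
# Hypothetical stand-in for a discovery RPC against one expansion service.
DiscoverFn = Callable[[], Dict[str, Schema]]


def build_catalog(services: Dict[str, DiscoverFn]) -> Tuple[Dict[str, Dict[str, Any]], List[str]]:
    """Merge discovery results from several services into one catalog.

    Returns (catalog, conflicts): catalog maps identifier -> {"schema", "sources"};
    conflicts lists identifiers whose schema differs between services.
    """
    catalog: Dict[str, Dict[str, Any]] = {}
    conflicts: List[str] = []
    for service_name in sorted(services):  # deterministic order
        for identifier, schema in services[service_name]().items():
            # The first service to report an identifier defines its schema.
            entry = catalog.setdefault(identifier, {"schema": schema, "sources": []})
            if entry["schema"] != schema and identifier not in conflicts:
                conflicts.append(identifier)
            entry["sources"].append(service_name)
    return catalog, conflicts


if __name__ == "__main__":
    # Hypothetical artifacts and identifiers, for illustration only.
    services = {
        "java-io.jar": lambda: {"example:kafka_read:v1": (("topic", "str"),)},
        "java-sql.jar": lambda: {"example:sql:v1": (("query", "str"),),
                                 "example:kafka_read:v1": (("topic", "str"), ("group", "str"))},
    }
    catalog, conflicts = build_catalog(services)
    print(sorted(catalog))  # ['example:kafka_read:v1', 'example:sql:v1']
    print(conflicts)        # ['example:kafka_read:v1']
```

Keeping the first-seen schema and reporting conflicts (rather than silently overwriting) is one possible policy; a real service might instead namespace identifiers per artifact.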
Re: Easy Multi-language via a SchemaTransform-aware Expansion Service
I like that idea, sort of like Kafka’s Schema Service but for transforms?

On Mon, Aug 8, 2022 at 2:45 PM Robert Bradshaw via dev wrote:
> This is a great idea. I would like to approach this from the
> perspective of making it easy to provide a catalog of well-defined
> transforms for use in expansion services from typical SDKs and also
> elsewhere (e.g. for documentation purposes, GUIs, etc.)
> [...]
Re: Easy Multi-language via a SchemaTransform-aware Expansion Service
This is a great idea. I would like to approach this from the perspective of making it easy to provide a catalog of well-defined transforms for use in expansion services from typical SDKs and also elsewhere (e.g. for documentation purposes, GUIs, etc.). Ideally everything about what a transform is (its config, documentation, expectations on inputs, etc.) can be specified programmatically in a way that's much easier to both author and consume than it is now.

On Thu, Aug 4, 2022 at 6:51 PM Chamikara Jayalath via dev wrote:
> [...]
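A rough illustration of the idea above, a single programmatic description of a transform that is easy to author and easy to consume: the Python sketch below keeps config fields, documentation, and expected input/output collection names in one hypothetical `TransformDescriptor` (not a Beam class), and renders it both as JSON (for a GUI or another SDK) and as plain text (for generated docs). The field names and identifier are assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass(frozen=True)
class ConfigField:
    name: str
    type_name: str
    description: str
    required: bool = True


@dataclass(frozen=True)
class TransformDescriptor:
    """Hypothetical catalog entry describing one transform in one place."""
    identifier: str
    description: str
    config: List[ConfigField] = field(default_factory=list)
    inputs: List[str] = field(default_factory=list)   # expected input collection names
    outputs: List[str] = field(default_factory=list)  # produced output collection names

    def to_json(self) -> str:
        # Machine-readable form, e.g. for a GUI or a stub generator.
        return json.dumps(asdict(self), sort_keys=True)

    def to_text(self) -> str:
        # Human-readable form, e.g. for generated documentation.
        lines = [
            self.identifier,
            f"  {self.description}",
            f"  inputs: {', '.join(self.inputs) or '(none)'}",
            f"  outputs: {', '.join(self.outputs) or '(none)'}",
        ]
        for cf in self.config:
            flag = "required" if cf.required else "optional"
            lines.append(f"  - {cf.name} ({cf.type_name}, {flag}): {cf.description}")
        return "\n".join(lines)


if __name__ == "__main__":
    desc = TransformDescriptor(
        identifier="example:schematransform:filter_rows:v1",  # hypothetical
        description="Keeps rows whose field matches a value.",
        config=[ConfigField("field_name", "str", "Field to compare."),
                ConfigField("value", "str", "Value to match.")],
        inputs=["input"],
        outputs=["output"],
    )
    print(desc.to_text())
    print(desc.to_json())
```

Both renderings come from the same object, so documentation and machine-readable metadata cannot drift apart.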
Re: Easy Multi-language via a SchemaTransform-aware Expansion Service
Indeed, there's nothing stopping you from doing codegen where it's useful, but I think it's probably easier to implement codegen from dynamic than it is to go the other way around (Avro vs Proto).

On Fri, Aug 5, 2022 at 1:15 PM Chamikara Jayalath wrote:
> Yeah, that's my thought as well. I think it will be pretty useful during
> development/testing cycles, especially if we push code generation to the
> release time. Also, it will be useful for trying out any SchemaTransforms
> developed/released by third parties where generated stubs might not be
> available.
> [...]
Re: Easy Multi-language via a SchemaTransform-aware Expansion Service
On Fri, Aug 5, 2022 at 12:00 PM Byron Ellis wrote:
> I think there are some practical advantages to having the ability to
> support a dynamic version---at previous places where I've worked having
> Kafka's Schema Service was incredibly useful for data processing (it was a
> Java/Scala shop and we mostly used a "decode to POJO" approach rather than
> codegen.)

Yeah, that's my thought as well. I think it will be pretty useful during development/testing cycles, especially if we push code generation to the release time. Also, it will be useful for trying out any SchemaTransforms developed/released by third parties where generated stubs might not be available.
Re: Easy Multi-language via a SchemaTransform-aware Expansion Service
I think there are some practical advantages to having the ability to support a dynamic version---at previous places where I've worked having Kafka's Schema Service was incredibly useful for data processing (it was a Java/Scala shop and we mostly used a "decode to POJO" approach rather than codegen.)

On Fri, Aug 5, 2022 at 10:08 AM Chamikara Jayalath via dev <dev@beam.apache.org> wrote:
> [...]
Re: Easy Multi-language via a SchemaTransform-aware Expansion Service
On Fri, Aug 5, 2022 at 9:44 AM Brian Hulette wrote:
> > Pipeline SDKs can generate user-friendly stub-APIs based on transforms
> registered with an expansion service, eliminating the need to develop
> language-specific wrappers.
> This would be great! I think one point to consider is whether we can do
> this statically. We could package up these stubs with releases and include
> them in API docs for each language, making them much more discoverable.
> That could be an extension on top of your proposal (e.g. as part of its
> build, each SDK spins up other known expansion services and generates code
> based on the discovery responses), but maybe it could be cleaner if we
> don't really need the dynamic version?

So my proposal suggested two solutions for wrappers.

* A higher-level (dynamic) API (SchemaAwareExternalTransform) that can be used to discover/initialize/use any SchemaTransform.
* Developing tooling to generate stubs for each language. This is possible since SchemaTransform gives a cleaner way to define/interpret the construction API of a transform.

I think both can be useful. For example, the former might be useful to quickly test/try out new SchemaTransforms without going through code generation.

Also, I agree with you that it might be good to generate such stubs (and corresponding docs) during release time instead of generating and committing stubs to the repo.

Thanks,
Cham
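To make the first option (a generic, dynamic wrapper) concrete, the Python sketch below builds a transform configuration purely from a discovery response, so no per-transform wrapper code is needed. It is not the SchemaAwareExternalTransform API mentioned above: the class `DynamicTransform`, the shape of the `DISCOVERY` entries, and the identifier are all hypothetical. It only shows that keyword arguments can be checked against a discovered config schema at construction time and arranged in schema field order before being sent to an expansion service.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical discovery response: config fields listed in schema order
# as (field name, type name) pairs.
DISCOVERY: List[Dict[str, Any]] = [
    {"identifier": "example:schematransform:write_text:v1",
     "fields": [("path", "str"), ("num_shards", "int")]},
]

_TYPES = {"str": str, "int": int, "float": float, "bool": bool}


class DynamicTransform:
    """Generic wrapper configured only by an identifier and keyword arguments."""

    def __init__(self, identifier: str, discovery: List[Dict[str, Any]], **kwargs: Any):
        entries = {e["identifier"]: e for e in discovery}
        if identifier not in entries:
            raise KeyError(f"{identifier} was not found in the discovery response")
        fields = entries[identifier]["fields"]
        names = [name for name, _ in fields]
        unknown = sorted(set(kwargs) - set(names))
        missing = [n for n in names if n not in kwargs]
        if unknown or missing:
            raise ValueError(f"unknown={unknown} missing={missing}")
        for name, type_name in fields:
            if not isinstance(kwargs[name], _TYPES[type_name]):
                raise TypeError(f"{name} should be {type_name}")
        self.identifier = identifier
        # Values in schema field order, regardless of the caller's keyword order.
        self.payload: List[Tuple[str, Any]] = [(n, kwargs[n]) for n in names]


if __name__ == "__main__":
    t = DynamicTransform("example:schematransform:write_text:v1", DISCOVERY,
                         num_shards=2, path="/tmp/out")
    print(t.payload)  # [('path', '/tmp/out'), ('num_shards', 2)]
```

Nothing in the class is specific to `write_text`, which is what makes a dynamic route convenient for trying out new or third-party SchemaTransforms before any stubs exist.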
Re: Easy Multi-language via a SchemaTransform-aware Expansion Service
Thanks Cham! I really like the proposal; I left a few comments. I also had one higher-level point I wanted to elevate here:

> Pipeline SDKs can generate user-friendly stub-APIs based on transforms registered with an expansion service, eliminating the need to develop language-specific wrappers.

This would be great! I think one point to consider is whether we can do this statically. We could package up these stubs with releases and include them in API docs for each language, making them much more discoverable. That could be an extension on top of your proposal (e.g. as part of its build, each SDK spins up other known expansion services and generates code based on the discovery responses), but maybe it could be cleaner if we don't really need the dynamic version?

Brian

On Thu, Aug 4, 2022 at 6:51 PM Chamikara Jayalath via dev <dev@beam.apache.org> wrote:
> [...]
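One way the build-time step described here could look: turn each discovery entry into stub source with explicit, documented, keyword-only parameters. The Python sketch below is hypothetical throughout: the entry shape (name, type name, description), the `generate_stub` and `stub_name` helpers, and the naming scheme are assumptions, and the generated stub just returns the identifier and config instead of calling into an SDK's external-transform API as a real generator would.

```python
import keyword
import re
from typing import Any, Dict, List, Tuple


def stub_name(identifier: str) -> str:
    """Derive a function name from a URN-like identifier (hypothetical scheme)."""
    parts = identifier.split(":")
    # "example:schematransform:read_csv:v1" -> "read_csv"
    name = re.sub(r"\W", "_", parts[-2] if len(parts) >= 2 else parts[-1])
    if not name.isidentifier() or keyword.iskeyword(name):
        name = "t_" + name
    return name


def generate_stub(entry: Dict[str, Any]) -> str:
    """Emit Python source for a typed, documented stub from one discovery entry.

    Assumes each field is (name, Python type name, description) and that
    descriptions contain no triple quotes.
    """
    identifier: str = entry["identifier"]
    fields: List[Tuple[str, str, str]] = entry["fields"]
    params = ", ".join(f"{name}: {type_name}" for name, type_name, _ in fields)
    signature = f"*, {params}" if params else ""  # keyword-only parameters
    doc = "\n    ".join(
        [f"Stub for {identifier}.", ""]
        + [f"{name} ({type_name}): {desc}" for name, type_name, desc in fields])
    config = ", ".join(f"{name!r}: {name}" for name, _, _ in fields)
    return (
        f"def {stub_name(identifier)}({signature}):\n"
        f'    """{doc}"""\n'
        f"    return ({identifier!r}, {{{config}}})\n"
    )


if __name__ == "__main__":
    entry = {"identifier": "example:schematransform:read_csv:v1",
             "fields": [("path", "str", "Input file pattern."),
                        ("skip_header", "bool", "Whether to skip the first line.")]}
    source = generate_stub(entry)
    print(source)
    namespace: Dict[str, Any] = {}
    exec(source, namespace)  # a release build would write `source` to a module instead
    print(namespace["read_csv"](path="in-*.csv", skip_header=True))
```

Explicit parameters and docstrings are what make generated stubs show up in API docs and IDEs; the trade-off is that a stub only reflects the discovery response available when it was generated, e.g. at release time.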
Easy Multi-language via a SchemaTransform-aware Expansion Service
Hi All,

I believe we can make the multi-language pipelines offering [1] much easier to use by updating the expansion service to be fully aware of SchemaTransforms. Additionally, this will make it easy to register/discover/use transforms defined in one SDK from all other SDKs. Specifically, we could add the following features.

- The expansion service can be used to easily initialize and expand transforms without the need for additional code.
- The expansion service can be used to easily discover already registered transforms.
- Pipeline SDKs can generate user-friendly stub-APIs based on transforms registered with an expansion service, eliminating the need to develop language-specific wrappers.

Please see here for my proposal: https://s.apache.org/easy-multi-language

Lemme know if you have any comments/questions/suggestions :)

Thanks,
Cham

[1] https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines
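To make the first two features concrete, here is a minimal, self-contained Python sketch of a schema-aware registry. It is not Beam's expansion service or its API: `TransformSpec`, `TransformRegistry`, `register`, `discover`, `expand`, the identifier string, and the dict-based config schema are all hypothetical stand-ins for SchemaTransforms and their configuration schemas. It only shows the idea that once every transform declares an identifier and a typed config schema, one generic service can list transforms and build any of them from a plain config, with no per-transform wrapper code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class TransformSpec:
    """Hypothetical stand-in for a registered SchemaTransform."""
    identifier: str                         # URN-like name, e.g. "example:...:v1"
    config_schema: Dict[str, type]          # config field name -> expected Python type
    build: Callable[[Dict[str, Any]], Any]  # builds the transform from a validated config


class TransformRegistry:
    """Toy model of a schema-aware expansion service (not Beam's API)."""

    def __init__(self) -> None:
        self._specs: Dict[str, TransformSpec] = {}

    def register(self, spec: TransformSpec) -> None:
        if spec.identifier in self._specs:
            raise ValueError(f"duplicate identifier: {spec.identifier}")
        self._specs[spec.identifier] = spec

    def discover(self) -> List[Dict[str, Any]]:
        # Plays the role of a discovery call: identifiers plus their config schemas.
        return [
            {"identifier": s.identifier,
             "config_schema": {name: t.__name__ for name, t in s.config_schema.items()}}
            for s in sorted(self._specs.values(), key=lambda s: s.identifier)
        ]

    def expand(self, identifier: str, config: Dict[str, Any]) -> Any:
        # One generic code path for every registered transform: validate, then build.
        spec = self._specs[identifier]  # raises KeyError for unknown identifiers
        missing = sorted(set(spec.config_schema) - set(config))
        extra = sorted(set(config) - set(spec.config_schema))
        if missing or extra:
            raise ValueError(f"missing fields {missing}, unexpected fields {extra}")
        for name, expected in spec.config_schema.items():
            if not isinstance(config[name], expected):
                raise TypeError(f"field {name!r} should be {expected.__name__}")
        return spec.build(config)


if __name__ == "__main__":
    registry = TransformRegistry()
    registry.register(TransformSpec(
        identifier="example:schematransform:read_file:v1",  # hypothetical identifier
        config_schema={"path": str, "limit": int},
        build=lambda cfg: f"ReadFile({cfg['path']}, limit={cfg['limit']})",
    ))
    print(registry.discover())
    print(registry.expand("example:schematransform:read_file:v1",
                          {"path": "/tmp/in.txt", "limit": 10}))
```

The discovery output is plain data, which is what would let other SDKs build dynamic wrappers or generate stubs from it.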