Re: Standarizing the "Runner" concept across website content
When speaking to people unfamiliar with these concepts, I generally fold everything into "Runner" to keep things simple, though I also mention "execution engine" at times. Glad there appears to be concrete consensus on how we want to talk about this. It will also help guide me in being consistent :-)

On Wed, Jan 6, 2021 at 3:05 PM Griselda Cuevas wrote:
> Thank you all for this productive conversation!
> [...]
Re: Standarizing the "Runner" concept across website content
Thank you all for this productive conversation!

Interestingly enough, a usability study we ran for Apache Beam (more details coming soon) pointed out that our documentation and website assume that readers are already familiar with basic data processing concepts such as engines, pipelines, etc. So introducing a glossary, and even rethinking how we add these concepts to our new documentation, is a good practice to keep in mind.

In the meantime, I will adopt the suggestion of differentiating between engine and runner. The first application I made of this is in the copy for the home page, which you can find as an attached file in this Jira ticket [1] in case you want to add comments/suggestions.

The home page is the most important page on the website, as it's the one that explains Beam to the world and markets its features, so I'd appreciate feedback there too.

Thanks everyone!

[1] https://issues.apache.org/jira/browse/BEAM-11346?jql=project%20%3D%20beam%20AND%20assignee%20%3D%20gris%20ORDER%20BY%20priority%20DESC

On Wed, 6 Jan 2021 at 13:33, Kenneth Knowles wrote:
> An explicit glossary is a great idea to combine with standardizing
> terminology across the site.
> [...]
Re: Standarizing the "Runner" concept across website content
On Wed, Jan 6, 2021 at 12:28 PM Robert Burke wrote:
> +1 on consolidating and being consistent with our terms.
> [...]

An explicit glossary is a great idea to combine with standardizing terminology across the site. I think the important context is that most of the engines already existed before Beam and many of them are better known. In fact, a pretty good way for a user to understand the essence of what Beam is about is by taking a look at all the engines for which there are Beam runners :-)

Engine: a system/product for doing [big] data processing
Pipeline: the logic a user authors to say what they want to compute (I think the fact that it is a DAG of PTransforms is relevant, but we can get away with omitting it for the high-level view and to avoid introducing the term PTransform too early)
Runner: executes a Beam pipeline on an engine (agree that "adapter" is too generic)

I'd say anything below that level of granularity is getting into things that you need to know only after you have started writing pipelines. Possibly you need to introduce the SDK harness to make clear that Beam pipelines are inherently multi-language/multi-runtime, even if the engine isn't (my personal opinion is that "UDF server" is the best understood terminology for this, and so much better that it is never too late to abandon the cryptic term "SDK harness").

Kenn
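As a rough illustration of the Engine / Pipeline / Runner vocabulary from the glossary in this message, here is a toy sketch in plain Python. It is not Beam code: every class and method name below (`Pipeline`, `ListEngine`, `GeneratorEngine`, the two runner classes) is hypothetical and exists only to show one pipeline being executed on two different "engines" through two different "runners".

```python
# Toy model only; none of these classes are Beam APIs and all names are hypothetical.
from typing import Callable, Iterable, Iterator, List


class Pipeline:
    """What the user wants computed: an ordered list of element-wise steps."""

    def __init__(self) -> None:
        self.steps: List[Callable[[int], int]] = []

    def apply(self, fn: Callable[[int], int]) -> "Pipeline":
        self.steps.append(fn)
        return self


class ListEngine:
    """A pre-existing 'engine' with its own API: maps one function over a list."""

    def map_all(self, fn: Callable[[int], int], data: List[int]) -> List[int]:
        return [fn(x) for x in data]


class GeneratorEngine:
    """A second 'engine' with a different API: chains lazy stages over a stream."""

    def run_stages(self, stages: List[Callable[[int], int]],
                   data: Iterable[int]) -> Iterator[int]:
        stream = iter(data)
        for stage in stages:
            stream = map(stage, stream)
        return stream


class ListEngineRunner:
    """Runner: executes a Pipeline on ListEngine by calling map_all per step."""

    def __init__(self, engine: ListEngine) -> None:
        self.engine = engine

    def run(self, pipeline: Pipeline, data: List[int]) -> List[int]:
        for fn in pipeline.steps:
            data = self.engine.map_all(fn, data)
        return list(data)


class GeneratorEngineRunner:
    """Runner: executes the same Pipeline on GeneratorEngine in one call."""

    def __init__(self, engine: GeneratorEngine) -> None:
        self.engine = engine

    def run(self, pipeline: Pipeline, data: List[int]) -> List[int]:
        return list(self.engine.run_stages(pipeline.steps, data))


# The pipeline is authored once and knows nothing about either engine.
pipeline = Pipeline().apply(lambda x: x + 1).apply(lambda x: x * 10)
result_a = ListEngineRunner(ListEngine()).run(pipeline, [1, 2, 3])
result_b = GeneratorEngineRunner(GeneratorEngine()).run(pipeline, [1, 2, 3])
print(result_a, result_b)
```

In this sketch the engines never see a `Pipeline` object, and the pipeline never calls an engine; only the runner touches both, which is the distinction the glossary is drawing.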
Re: Standarizing the "Runner" concept across website content
+1 on consolidating and being consistent with our terms.

I've always considered them (Runner/Engine) synonymous. From a user perspective, an engine without a runner isn't any good for their Beam pipeline. That there's an adapter is an implementation detail in some instances. I do appreciate not using Adapter as a term, avoiding confusing descriptions.

However, if we make the change and there's a clear glossary of terms somewhere then

That puts the lifecycle of a pipeline to be (loosely) something like...

A Beam User authors Pipelines by writing DoFns, adding them as PTransforms connected by PCollections into a Pipeline using a Beam SDK. An SDK converts the pipeline into a portable representation, and submits it to the Job Management Service of a Beam Runner. A Beam Runner translates the portable pipeline representation into terms an underlying Engine understands for Execution. The Beam Runner also reverses this translation when the Engine delegates tasks to workers, so that the Beam SDKs can execute the user's DoFns in keeping with the Beam Semantics.

(Not covered: bundles, etc., but you get the idea...)

On Wed, Jan 6, 2021, 11:16 AM Robert Bradshaw wrote:
> +1 to keeping the distinction between Runner and Engine as Kenn described,
> and cleaning up the site with these in mind (I don't think the term engine
> is widely used yet).
> [...]
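The lifecycle described in this message (SDK produces a portable representation, the runner translates it for the engine, and the engine's work is handed back to the SDK to run user code) can be loosely sketched in plain Python. This is a toy under stated assumptions, not Beam's portability API: `USER_FNS`, `sdk_to_portable`, `sdk_worker_execute`, `ToyRunner`, and `ToyEngine` are all hypothetical names, and JSON merely stands in for the real portable format.

```python
import json

# Hypothetical registry standing in for user functions known to the SDK worker.
USER_FNS = {"strip": str.strip, "upper": str.upper}


def sdk_to_portable(step_names):
    """SDK side: convert the authored pipeline into a language-neutral form."""
    return json.dumps({"steps": list(step_names)})


def sdk_worker_execute(fn_name, element):
    """SDK side: run one named user function on one element."""
    return USER_FNS[fn_name](element)


class ToyEngine:
    """Engine: only knows how to run opaque tasks over data, nothing about the SDK."""

    def execute(self, tasks, data):
        for task in tasks:
            data = [task(x) for x in data]
        return data


class ToyRunner:
    """Runner: accepts the portable form and translates it into engine tasks."""

    def __init__(self, engine):
        self.engine = engine

    def submit(self, portable, data):
        spec = json.loads(portable)
        # Each engine task calls back into the SDK worker, so user code
        # still runs under the SDK rather than inside the engine itself.
        tasks = [
            (lambda x, name=name: sdk_worker_execute(name, x))
            for name in spec["steps"]
        ]
        return self.engine.execute(tasks, data)


portable = sdk_to_portable(["strip", "upper"])
output = ToyRunner(ToyEngine()).submit(portable, ["  beam ", " runner"])
print(output)
```

The `name=name` default argument pins each step name at the time its task is created; without it, every task would call the last step in the list.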
Re: Standarizing the "Runner" concept across website content
+1 to keeping the distinction between Runner and Engine as Kenn described, and cleaning up the site with these in mind (I don't think the term engine is widely used yet).

On Wed, Jan 6, 2021 at 11:15 AM Yichi Zhang wrote:
> I agree with what kenn said, in most cases I would refer to the term
> runner as the adapter for translating user's pipeline code into a job
> representation and submitting it to the execution engine.
> [...]
Re: Standarizing the "Runner" concept across website content
I agree with what Kenn said. In most cases I would use the term runner for the adapter that translates the user's pipeline code into a job representation and submits it to the execution engine. Though in some cases the two may still be used interchangeably, such as with the direct runner?

On Wed, Jan 6, 2021 at 11:02 AM Kenneth Knowles wrote:
> I personally try to always distinguish two concepts: the thing doing the
> computing (like Spark or Flink), and the adapter for running a Beam
> pipeline (like SparkRunner or FlinkRunner).
> [...]
Re: Standarizing the "Runner" concept across website content
+1 to distinguishing between runners and engines (Spark/Flink/Dataflow). Those terms are clear and make sense to me.

*~Vincent*

On Wed, Jan 6, 2021 at 11:02 AM Kenneth Knowles wrote:
> I personally try to always distinguish two concepts: the thing doing the
> computing (like Spark or Flink), and the adapter for running a Beam
> pipeline (like SparkRunner or FlinkRunner).
> [...]
Re: Standarizing the "Runner" concept across website content
I personally try to always distinguish two concepts: the thing doing the computing (like Spark or Flink), and the adapter for running a Beam pipeline (like SparkRunner or FlinkRunner). I use the term "runner" to mean the adapter, and have been trying to use the term "engine" to refer to the thing doing the computing. Do you think that users will use these two interchangeably? Do you have recommendations about whether these terms make sense to users?

Kenn

On Wed, Jan 6, 2021 at 10:23 AM Griselda Cuevas wrote:
> Hi dev@ community, Happy New Year!
>
> I'm working on updating the copy of a few website pages, and something
> that I want to solve is to standardize how we refer to runners across the
> site. So far I've identified these definitions:
>
>    - Back-end
>    - Backend systems
>    - Execution environments
>    - Runtime
>    - Runtime system
>    - Runner
>
> Even when the majority of users will understand these concepts
> interchangeably, it's a good idea to be consistent so new users get
> familiar with how Beam works and its components.
>
> I'm going to start using the word "Runner" as I update the copy and will
> ask the team working on the UI revamp to do the same. Let me know if you
> have any questions/concerns.