Re: Standarizing the "Runner" concept across website content

2021-01-07 Thread Austin Bennett
When talking to those unfamiliar with these concepts, I generally fold
everything into "Runner" to keep things simple, though I also mention
"execution engine" at times. Glad there appears to be concrete consensus on
how we want to talk about this. It will also help guide me in being
consistent :-)




Re: Standarizing the "Runner" concept across website content

2021-01-06 Thread Griselda Cuevas
Thank you all for this productive conversation!

Interestingly enough, a usability study we ran for Apache Beam (more
details coming soon) pointed out that our documentation and website assume
readers are already familiar with basic data processing concepts such as
engines, pipelines, etc. So introducing a glossary, and even rethinking how
we introduce these concepts in our new documentation, is a good practice to
keep in mind.

In the meantime, I will adopt the suggestion of differentiating between
engine and runner. The first application I made of this is in the copy for
the home page, which you can find as an attached file in this Jira ticket
[1] in case you want to add comments/suggestions.

The home page is the most important page on the website, as it's the one
that explains Beam to the world and markets its features, so I'd appreciate
feedback there too.

Thanks everyone!

[1]
https://issues.apache.org/jira/browse/BEAM-11346?jql=project%20%3D%20beam%20AND%20assignee%20%3D%20gris%20ORDER%20BY%20priority%20DESC


Re: Standarizing the "Runner" concept across website content

2021-01-06 Thread Kenneth Knowles
On Wed, Jan 6, 2021 at 12:28 PM Robert Burke wrote:

> +1 on consolidating and being consistent with our terms.
>
> I've always considered them (Runner/Engine) synonymous. From a user
> perspective, an engine without a runner isn't any good for their Beam
> pipeline. That there's an adapter is an implementation detail in some
> instances. I do appreciate not using Adapter as a term, avoiding confusing
> descriptions.
>
> However, if we make the change and there's a clear glossary of terms
> somewhere then
>
> That puts the lifecycle of a pipeline to be (loosely) something like...
>
> A Beam User authors Pipelines by writing DoFns, adding them as PTransforms
> connected by PCollections into a Pipeline using a Beam SDK. An SDK converts
> the pipeline into a portable representation, and submits it to the Job
> Management Service of a Beam Runner. A Beam Runner translates the portable
> pipeline representation into terms an underlying Engine understands for
> Execution. The Beam Runner also reverses this translation when the Engine
> delegates tasks to workers, so that the Beam SDKs can execute the user's
> DoFns in keeping with the Beam Semantics.
>

An explicit glossary is a great idea to combine with standardizing
terminology across the site. I think the important context is that most of
the engines already existed before Beam and many of them are more
well-known. In fact, a pretty good way for a user to understand the essence
of what Beam is about is by taking a look at all the engines for which
there are Beam runners :-)

Engine: a system/product for doing [big] data processing
Pipeline: user authors this logic that says what they want to compute (I
think the fact that it is a DAG of PTransforms is relevant but we can get
away with omitting it for the high-level view and to avoid introducing the
term PTransform too early)
Runner: executes a Beam pipeline on an engine (agree that "adapter" is too
generic)
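
To make the three terms concrete, here is a toy sketch in plain Python. It is
not Beam code: every class and method name below (Pipeline, BatchEngine,
StreamEngine, BatchRunner, StreamRunner, run_batch, push) is made up for
illustration. The only point is that the same user-authored pipeline can be
executed on two unrelated engines, each through its own runner.

```python
# Toy model only: none of these classes are Beam APIs.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Pipeline:
    """What the user authors: the logic they want computed."""
    steps: List[Callable[[int], int]]


class BatchEngine:
    """Hypothetical engine that processes a whole list per call."""

    def run_batch(self, fn: Callable[[int], int], data: List[int]) -> List[int]:
        return [fn(x) for x in data]


class StreamEngine:
    """Hypothetical engine that processes one record per call."""

    def push(self, fn: Callable[[int], int], record: int) -> int:
        return fn(record)


class BatchRunner:
    """Runner: executes a Pipeline on a BatchEngine."""

    def __init__(self, engine: BatchEngine) -> None:
        self.engine = engine

    def run(self, pipeline: Pipeline, data: List[int]) -> List[int]:
        # Apply each step to the full dataset before moving to the next step.
        for step in pipeline.steps:
            data = self.engine.run_batch(step, data)
        return data


class StreamRunner:
    """Runner: executes the same Pipeline on a StreamEngine."""

    def __init__(self, engine: StreamEngine) -> None:
        self.engine = engine

    def run(self, pipeline: Pipeline, data: List[int]) -> List[int]:
        out = []
        # Push each record through every step before taking the next record.
        for record in data:
            for step in pipeline.steps:
                record = self.engine.push(step, record)
            out.append(record)
        return out


# One pipeline, two engines, two runners.
pipeline = Pipeline(steps=[lambda x: x + 1, lambda x: x * 2])
batch_result = BatchRunner(BatchEngine()).run(pipeline, [1, 2, 3])
stream_result = StreamRunner(StreamEngine()).run(pipeline, [1, 2, 3])
```

In this sketch the engines know nothing about Pipeline, and the pipeline knows
nothing about either engine; only the runners are aware of both sides.
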

I'd say anything below that level of granularity gets into things that you
need to know only after you have started writing pipelines. Possibly you
need to introduce SDK harness to make clear that Beam pipelines are
inherently multi-language/multi-runtime, even if the engine isn't (my
personal opinion is that "UDF server" is the best understood terminology
for this, and so much better that it is never too late to abandon the
cryptic term "SDK harness").

Kenn




Re: Standarizing the "Runner" concept across website content

2021-01-06 Thread Robert Burke
+1 on consolidating and being consistent with our terms.

I've always considered them (Runner/Engine) synonymous. From a user
perspective, an engine without a runner isn't any good for their Beam
pipeline. That there's an adapter is an implementation detail in some
instances. I do appreciate not using Adapter as a term, avoiding confusing
descriptions.

However, if we make the change and there's a clear glossary of terms
somewhere then

That puts the lifecycle of a pipeline to be (loosely) something like...

A Beam User authors Pipelines by writing DoFns, adding them as PTransforms
connected by PCollections into a Pipeline using a Beam SDK. An SDK converts
the pipeline into a portable representation, and submits it to the Job
Management Service of a Beam Runner. A Beam Runner translates the portable
pipeline representation into terms an underlying Engine understands for
Execution. The Beam Runner also reverses this translation when the Engine
delegates tasks to workers, so that the Beam SDKs can execute the user's
DoFns in keeping with the Beam Semantics.

(Not covered, bundles etc, but you get the idea...)
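
As a rough illustration of that lifecycle, here is a toy walk-through in plain
Python. Nothing in it is a Beam API: the names DOFNS, sdk_to_portable,
sdk_execute, ToyEngine and ToyRunner are hypothetical, and JSON merely stands
in for whatever the portable representation actually is. The numbered comments
follow the steps in the paragraph above.

```python
# Toy walk-through of the lifecycle; every name here is hypothetical.
import json
from typing import Callable, Dict, List

# 1. The user writes DoFns (here: plain functions registered by name).
DOFNS: Dict[str, Callable[[str], str]] = {
    "strip": str.strip,
    "upper": str.upper,
}


def sdk_to_portable(step_names: List[str]) -> str:
    """2. The SDK converts the pipeline into a language-neutral form."""
    return json.dumps({"steps": step_names})


def sdk_execute(step_name: str, element: str) -> str:
    """5. Back in the SDK: run the user's DoFn on one element."""
    return DOFNS[step_name](element)


class ToyEngine:
    """Stand-in for an engine: it only knows how to run opaque tasks."""

    def execute(self, tasks: List[Callable[[str], str]], data: List[str]) -> List[str]:
        for task in tasks:
            data = [task(x) for x in data]
        return data


class ToyRunner:
    """Stand-in for a runner sitting between the SDK and the engine."""

    def __init__(self, engine: ToyEngine) -> None:
        self.engine = engine

    def submit(self, portable: str, data: List[str]) -> List[str]:
        # 3. Translate the portable form into tasks the engine understands.
        steps = json.loads(portable)["steps"]
        # 4. Each task, when the engine runs it, calls back into the SDK.
        tasks = [lambda x, name=name: sdk_execute(name, x) for name in steps]
        return self.engine.execute(tasks, data)


portable = sdk_to_portable(["strip", "upper"])
result = ToyRunner(ToyEngine()).submit(portable, ["  beam ", "runner"])
```

The engine here never sees the user's functions directly; it only runs the
tasks the runner hands it, which is the "reverse translation" part of the
description.
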



Re: Standarizing the "Runner" concept across website content

2021-01-06 Thread Robert Bradshaw
+1 to keeping the distinction between Runner and Engine as Kenn described,
and cleaning up the site with these in mind (I don't think the term engine
is widely used yet).



Re: Standarizing the "Runner" concept across website content

2021-01-06 Thread Yichi Zhang
I agree with what Kenn said. In most cases I would use the term runner for
the adapter that translates the user's pipeline code into a job
representation and submits it to the execution engine. Though in some
cases the two may still be used interchangeably, such as with the direct
runner?



Re: Standarizing the "Runner" concept across website content

2021-01-06 Thread Vincent Marquez
+1 to distinguishing between runners and engines (Spark/Flink/Dataflow).
Those terms are clear and make sense to me.

*~Vincent*




Re: Standarizing the "Runner" concept across website content

2021-01-06 Thread Kenneth Knowles
I personally try to always distinguish two concepts: the thing doing the
computing (like Spark or Flink), and the adapter for running a Beam
pipeline (like SparkRunner or FlinkRunner). I use the term "runner" to mean
the adapter, and have been trying to use the term "engine" to refer to the
thing doing the computing. Do you think that users will use these two
interchangeably? Do you have recommendations about whether these terms make
sense to users?

Kenn

On Wed, Jan 6, 2021 at 10:23 AM Griselda Cuevas wrote:

> Hi dev@ community, Happy New Year!
>
> I'm working on updating the copy of a few website pages, and one thing
> I want to do is standardize how we refer to runners across the site. So
> far I've identified these terms in use:
>
>    - Back-end
>    - Backend systems
>    - Execution environments
>    - Runtime
>    - Runtime system
>    - Runner
>
> Even though most users will treat these terms as interchangeable, it's a
> good idea to be consistent so that new users become familiar with how Beam
> works and with its components.
>
> I'm going to start using the word "Runner" as I update the copy and will
> ask the team working on the UI revamp to do the same. Let me know if you
> have any questions/concerns.
>