Re: Newbie question : Apache Beam/Flink

HG Tue, 18 Jan 2022 07:53:52 -0800

Hi Matt
I had already seen your talk (though I have not yet watched it in its entirety).
Will do tomorrow.


Regards Hans-Peter

On Tue, 18 Jan 2022 at 15:59, Matt Casters <[email protected]> wrote:

> Hi Hans-Peter,
>
> "Flink" is typically a cluster of Flink servers.  You can run pipelines on
> it which process data in a streaming or batch fashion in a safe and
> parallel way.
> Apache Beam is a general API for running these pipelines.
>
> So now with Apache Hop you can design pipelines in a graphical way (or
> with an SDK for that matter) and this is then represented as pure metadata.
> The metadata describes the things that need to happen in a pipeline: the
> transforms and the way they are connected with hops.
>
> For more details I can point you to our getting started with Beam page:
> https://hop.apache.org/manual/latest/pipeline/beam/getting-started-with-beam.html
> I also did a 2h workshop for the Apache Beam Summit last summer:
> https://www.youtube.com/watch?v=sZSIbcPtebI
>
> The gist of it, to come back to your question, is that you can design
> pipelines which are capable of executing on Apache Flink, Apache Spark or
> indeed GCP Dataflow.
> The exact same pipeline metadata can run on all of these platforms and
> can, for example, be unit tested as well.
> We accomplished this by wrapping all the platform specific options in what
> we call Pipeline Run Configurations.  For example, here is the
> documentation for the Apache Flink run configuration:
> https://hop.apache.org/manual/latest/pipeline/pipeline-run-configurations/beam-flink-pipeline-engine.html
>
>
> This way you can simply specify how (on which platform) you want to run
> your pipeline and Hop will take care of it.  All the software and libraries
> that you need from Apache Hop, Beam and Flink are included if you use Hop.
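
To make the separation described above concrete, here is a toy sketch in plain Python. It is not Hop, Beam or Flink code: `PipelineMeta`, `RunConfiguration`, `run` and the `example-host:8081` address are all hypothetical names invented for illustration. It only shows the idea that the pipeline description stays the same while the platform-specific settings live in a separate run configuration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


# Hypothetical toy types: none of these names exist in Apache Hop, Beam or Flink.
@dataclass(frozen=True)
class PipelineMeta:
    """Engine-independent description: an ordered chain of transforms."""
    name: str
    transforms: Tuple[Callable[[str], str], ...]


@dataclass(frozen=True)
class RunConfiguration:
    """Everything platform-specific, kept apart from the pipeline itself."""
    name: str
    engine: str
    options: Dict[str, str] = field(default_factory=dict)


def run(pipeline: PipelineMeta, config: RunConfiguration, rows: List[str]) -> Dict[str, object]:
    """Apply each transform in order and report which engine was selected.

    A real engine would submit the work to a cluster; this toy version runs
    everything in-process so that only the selection mechanism is visible.
    """
    out = list(rows)
    for transform in pipeline.transforms:
        out = [transform(row) for row in out]
    return {"pipeline": pipeline.name, "engine": config.engine, "rows": out}


# One pipeline definition, two run configurations.
clean_names = PipelineMeta("clean-names", (str.strip, str.upper))
local = RunConfiguration("local", engine="direct")
flink = RunConfiguration("flink", engine="flink", options={"master": "example-host:8081"})

local_result = run(clean_names, local, [" hop ", "beam"])
flink_result = run(clean_names, flink, [" hop ", "beam"])
```

Because the pipeline object never mentions an engine, the same definition can be exercised locally (for instance in a unit test) and then handed to a different configuration unchanged.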
>
> Cheers,
> Matt
>
> On Tue, Jan 18, 2022 at 2:57 PM HG <[email protected]> wrote:
>
>> Hi,
>>
>> One thing has not yet become entirely clear to me.
>> I want to run jobs on Flink.
>> Do I also need to install and use Beam?
>>
>> On the "What is Beam" page it says:
>> Pipelines can also run on Apache Spark, Apache Flink and Google Dataflow
>> through the Apache Beam runtime configurations.
>>
>> What does this mean in practice?
>>
>> Regards Hans-Peter
>>
