Re: How to send pipeline to Flink?

podunk Tue, 31 May 2022 06:24:22 -0700

Wow! That's really you Matt!? You're a living legend to me :-)

Kettle was (and still is) an amazing piece of software. I was sad that Hitachi Vantara is killing the product. But when I started looking on the Internet for an alternative and found Hop, I was delighted. This software, like Kettle, is amazing! I use it as a door to any system, to work on any data.

Thanks a lot for the help - I will test the solution you proposed and get back to you if there is any problem (hope not :-))

Mike

Sent: Monday, May 30, 2022 at 5:59 PM
From: "Matt Casters" <[email protected]>
To: [email protected]
Subject: Re: How to send pipeline to Flink?

Hi Mike,

the variable PROJECT_HOME is actually not set by Flink, so what I had to do was add it to conf/flink-conf.yaml:

env.java.opts: -DPROJECT_HOME=/path/to/project/home/folder
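
A quick way to confirm that the setting is in place before submitting a job is a small shell check. This is only a sketch: the function name has_project_home is made up for illustration, and it assumes the option sits on a single env.java.opts line as shown above.

```shell
# has_project_home is a hypothetical helper, not part of Flink or Hop.
# It succeeds (exit status 0) when the config file passed as $1 contains
# an env.java.opts line that defines -DPROJECT_HOME.
has_project_home() {
  grep -Eq '^env\.java\.opts:.*-DPROJECT_HOME=' "$1"
}
```

Call it with the path to the Flink configuration file, for example `has_project_home "$conf_file" || echo "PROJECT_HOME is not set"`, where conf_file is whatever path your installation uses.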

My script to run right now looks something like this:

bin/flink run \
--class org.apache.hop.beam.run.MainBeam \
/tmp/hop-fatjar.jar \
/path/to/project/home/folder/beam/pipelines/input-process-output.hpl \
/tmp/hop-metadata.json \
Flink
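
The same command can be wrapped in a small function that fails early when one of the input files is missing. This is a sketch under assumptions: run_hop_on_flink, FLINK_BIN and DRY_RUN are illustrative names, not Flink or Hop settings, and the last argument is simply passed through, defaulting to Flink as in the script above.

```shell
# run_hop_on_flink is a hypothetical wrapper around the command above.
# Arguments: fat jar, pipeline (.hpl), metadata JSON, optional last argument
# (defaults to "Flink", as in the original script).
run_hop_on_flink() {
  fat_jar="$1"; pipeline="$2"; metadata="$3"; run_config="${4:-Flink}"

  # Fail early if any of the three input files does not exist.
  for f in "$fat_jar" "$pipeline" "$metadata"; do
    if [ ! -f "$f" ]; then
      echo "missing file: $f" >&2
      return 1
    fi
  done

  # Build the command as this function's positional parameters.
  # FLINK_BIN (illustrative) overrides the path to the flink launcher.
  set -- "${FLINK_BIN:-bin/flink}" run \
    --class org.apache.hop.beam.run.MainBeam \
    "$fat_jar" "$pipeline" "$metadata" "$run_config"

  if [ -n "${DRY_RUN:-}" ]; then
    # DRY_RUN (illustrative) prints the command instead of running it.
    echo "$*"
  else
    "$@"
  fi
}
```

Setting DRY_RUN=1 before calling the function prints the assembled command, which is handy for checking paths before anything is submitted to the cluster.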

As to your questions...

- I guess there is no way to see the data somewhere in the Flink GUI?

No, but I'm thinking of adding a Hop call-back service (on Hop Server probably) so that we can capture progress and allow Hop GUI to see that.

- How can I modify this 'generate-synthetic-data' pipeline so Flink saves this sample data to a file or database? In the last transform, 'Beam output', I see the variable ${DATA_OUTPUT} - does it mean I can do some mapping in the 'Variables' tab of the Pipeline Run configuration to point to a file? But what about a database?

The Table Output transform will allow you to write to a relational database. Performance-wise, please read up on the basics, specifically the section on "Row batching with non-Beam transforms".

- Hop exports the pipeline (I know it can also be executed with 'run') and it runs permanently until it is stopped, right? Or is there some parameter to execute it just once?

Flink Run will execute only once as shown in the script above.

HTH,
Matt

On Mon, May 30, 2022 at 5:42 PM <[email protected]> wrote:

Hello community!

I'm learning Apache Flink. I launched a test installation on my PC (version 1.13.6), I have Hop running, and now I want to run my first pipeline on Flink.

Yet for some reason it does not work. What I did:

1. I exported the fat jar according to this tutorial:

https://hop.apache.org/manual/latest/pipeline/beam/running-the-beam-samples.html#_prerequisites

2. I created a new Pipeline Run Configuration with these details:

Name: Local Flink

Description:

Engine type: Beam Flink pipeline engine

The Flink master: [local]

Parallelism: 2

...

...

User agent: Hop

Temp file: file://C:/Temp

..

Fat jar file location: ${PROJECT_HOME}/fat_jar.jar

3. I opened the 'generate-synthetic-data' pipeline and executed it with the Pipeline Run Configuration 'Local Flink'.

Looks like it runs; I see it in the Flink dashboard.

I have three questions:

- I guess there is no way to see the data somewhere in the Flink GUI?

- How can I modify this 'generate-synthetic-data' pipeline so Flink saves this sample data to a file or database? In the last transform, 'Beam output', I see the variable ${DATA_OUTPUT} - does it mean I can do some mapping in the 'Variables' tab of the Pipeline Run configuration to point to a file? But what about a database?

- Hop exports the pipeline (I know it can also be executed with 'run') and it runs permanently until it is stopped, right? Or is there some parameter to execute it just once?

Thanks a lot for your help

Mike
