Re: How to send pipeline to Flink?

podunk Wed, 01 Jun 2022 08:51:34 -0700

I tried several things but I need help:

"[local], [collection] or [auto]":

- Does 'local' mean using an embedded Flink (so Hop has a Flink engine built in)?

- Will 'auto' generate some engine-independent Flink job?

- 'collection' ?

The pipeline is executed on the Flink installation on my PC only if I set 'The Flink master' to '127.0.0.1:8081'.

If I want to execute the following pipeline in Flink:

text input => some transform => table output

Will it run as is, or do I have to add 'Beam output' at the end (I think not)? Something like:

text file input => some transform => table output => Beam output

Will 'Text file input' not work, so that I have to insert 'Beam input' instead? A similar question about 'Beam output': are 'Beam input/output' just faster than 'Text file input/output'?

If I ultimately need an Excel file as the result, which will be faster: saving directly with 'Excel writer', or saving to a text file and running another pipeline that opens this file and saves it to Excel?

Regards

Sent: Monday, May 30, 2022 at 5:59 PM
From: "Matt Casters" <[email protected]>
To: [email protected]
Subject: Re: How to send pipeline to Flink?

Hi Mike,

The variable PROJECT_HOME is not actually set by Flink, so I had to add it to conf/flink-conf.yaml:

env.java.opts: -DPROJECT_HOME=/path/to/project/home/folder
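
A minimal sketch of applying that setting from a script, assuming a POSIX shell. The function name set_project_home is hypothetical and not part of Hop or Flink; it appends the line only when the file does not already define env.java.opts, so existing JVM options are not overwritten.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hop or Flink).
# Usage: set_project_home CONF_FILE PROJECT_HOME_DIR
set_project_home() {
  conf_file=$1
  project_home=$2
  line="env.java.opts: -DPROJECT_HOME=${project_home}"

  # If env.java.opts is already defined, do not touch the file:
  # merging JVM options is safer done by hand.
  if grep -q '^env\.java\.opts:' "$conf_file" 2>/dev/null; then
    echo "env.java.opts already set in $conf_file; edit it manually" >&2
    return 1
  fi

  # Append the setting as its own line.
  printf '%s\n' "$line" >> "$conf_file"
}
```

For example: set_project_home conf/flink-conf.yaml /path/to/project/home/folder. Flink reads this file at startup, so the cluster presumably needs a restart before the option takes effect.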

My script to run right now looks something like this:

bin/flink run \
  --class org.apache.hop.beam.run.MainBeam \
  /tmp/hop-fatjar.jar \
  /path/to/project/home/folder/beam/pipelines/input-process-output.hpl \
  /tmp/hop-metadata.json \
  Flink
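
The same invocation can be wrapped in a small function that checks its input files first. This is only a sketch: the function name run_hop_pipeline and the FLINK_BIN variable are hypothetical, while the class name, argument order and trailing Flink argument are copied from the script above.

```shell
#!/bin/sh
# Hypothetical wrapper around the 'flink run' call shown above.
# Usage: run_hop_pipeline FAT_JAR PIPELINE_HPL METADATA_JSON
# FLINK_BIN (assumed variable) points at the flink launcher; defaults to bin/flink.
run_hop_pipeline() {
  fat_jar=$1
  pipeline=$2
  metadata=$3

  # Fail early with a clear message if any input file is missing.
  for f in "$fat_jar" "$pipeline" "$metadata"; do
    if [ ! -f "$f" ]; then
      echo "missing file: $f" >&2
      return 1
    fi
  done

  # Same arguments, in the same order, as the script in this message.
  "${FLINK_BIN:-bin/flink}" run \
    --class org.apache.hop.beam.run.MainBeam \
    "$fat_jar" "$pipeline" "$metadata" Flink
}
```

Called as run_hop_pipeline /tmp/hop-fatjar.jar /path/to/project/home/folder/beam/pipelines/input-process-output.hpl /tmp/hop-metadata.json, it submits the pipeline once and returns the exit status of flink.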

As to your questions...

- I guess there is no way to see the data somewhere in the Flink GUI?

No, but I'm thinking of adding a Hop callback service (probably on Hop Server) so that we can capture progress and allow Hop GUI to see it.

- How can I modify this 'generate-synthetic-data' pipeline so that Flink saves the sample data to a file or database? In the last transform, 'Beam output', I see the variable ${DATA_OUTPUT}. Does that mean I can map it in the 'Variables' tab of the Pipeline Run Configuration to point to a file? And what about a database?

The Table Output transform will allow you to write to a relational database. Performance-wise, please read up on the basics, specifically the section on "Row batching with non-Beam transforms".

- Hop exports the pipeline (I know it can also be executed with 'run'), and it runs permanently until it is stopped, right? Or is there some parameter to execute it just once?

Flink Run will execute the pipeline only once, as shown in the script above.

HTH,
Matt

On Mon, May 30, 2022 at 5:42 PM <[email protected]> wrote:

Hello community!

I'm learning Apache Flink. I launched a test installation on my PC (version 1.13.6), I have Hop running, and now I want to run my first pipeline on Flink.

Yet for some reason it does not work. What I did:

1. I exported a fat jar according to this tutorial:

https://hop.apache.org/manual/latest/pipeline/beam/running-the-beam-samples.html#_prerequisites

2. I created a new Pipeline Run Configuration with these details:

Name: Local Flink

Description:

Engine type: Beam Flink pipeline engine

The Flink master: [local]

Parallelism: 2

...

...

User agent: Hop

Temp file: file://C:/Temp

..

Fat jar file location: ${PROJECT_HOME}/fat_jar.jar

3. I opened the 'generate-synthetic-data' pipeline and executed it with the Pipeline Run Configuration 'Local Flink'.

It looks like it runs; I see it in the Flink dashboard.

I have three questions:

- I guess there is no way to see the data somewhere in the Flink GUI?

- How can I modify this 'generate-synthetic-data' pipeline so that Flink saves the sample data to a file or database? In the last transform, 'Beam output', I see the variable ${DATA_OUTPUT}. Does that mean I can map it in the 'Variables' tab of the Pipeline Run Configuration to point to a file? And what about a database?

- Hop exports the pipeline (I know it can also be executed with 'run'), and it runs permanently until it is stopped, right? Or is there some parameter to execute it just once?

Thanks a lot for your help

Mike
