I tried several things but I need help:
"[local], [collection] or [auto]":
- 'local' means use embedded Flink (so Hop has a Flink engine built in)?
- 'auto' will generate some engine-independent Flink job?
- 'collection' ?
It is executed in the Flink installation on my PC only if I set the Flink master to '127.0.0.1:8081'.
If I want to execute following pipeline in Flink:
text input => some transform => table output
Will it run as is, or do I have to add 'Beam Output' at the end (I think not)? Something like:
text file input => some transform => table output => Beam output
Or will 'Text file input' not work, so that I have to insert 'Beam input'? Similarly for 'Beam output': are 'Beam input/output' just faster than 'Text file input/output'?
If I finally need an Excel file as a result, which will be faster: saving directly with 'Excel writer', or saving to a txt file and running another pipeline that opens this file and saves it to Excel?
Regards
M.
Sent: Monday, May 30, 2022 at 5:59 PM
From: "Matt Casters" <[email protected]>
To: [email protected]
Subject: Re: How to send pipeline to Flink?
Hi Mike,
The variable PROJECT_HOME is actually not set by Flink, so what I had to do was add it to conf/flink-conf.yaml:
env.java.opts: -DPROJECT_HOME=/path/to/project/home/folder
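A minimal sketch of adding that line from a script, in case it helps. The function name add_project_home and its two arguments are hypothetical, and it assumes env.java.opts is not already used for other JVM options; if it is, the function leaves the file alone so the option can be merged by hand.

```shell
# Hypothetical helper: append the PROJECT_HOME system property to a Flink
# configuration file unless an env.java.opts entry is already present.
add_project_home() {
  conf_file="$1"     # path to the Flink configuration file
  project_home="$2"  # project home folder to expose as -DPROJECT_HOME

  # Do not add a second env.java.opts key; an existing one needs manual merging.
  if grep -q '^env\.java\.opts:' "$conf_file"; then
    echo "env.java.opts already set in $conf_file; edit it by hand" >&2
    return 0
  fi

  printf 'env.java.opts: -DPROJECT_HOME=%s\n' "$project_home" >> "$conf_file"
}
```

Usage would be something like: add_project_home /path/to/flink/conf/<config file> /path/to/project/home/folder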
My script to run right now looks something like this:
bin/flink run \
--class org.apache.hop.beam.run.MainBeam \
/tmp/hop-fatjar.jar \
/path/to/project/home/folder/beam/pipelines/input-process-output.hpl \
/tmp/hop-metadata.json \
Flink
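The same call can be wrapped in a small function so the paths are easier to swap. This is only a sketch: the function name run_hop_pipeline and the FLINK_BIN and DRY_RUN variables are hypothetical, and it assumes the last argument ("Flink" above) is the name of the pipeline run configuration.

```shell
# Hypothetical wrapper around the "flink run" call above.
# FLINK_BIN and DRY_RUN are illustrative variables, not Flink or Hop options.
run_hop_pipeline() {
  fat_jar="$1"     # fat jar exported from Hop
  pipeline="$2"    # .hpl pipeline file
  metadata="$3"    # exported Hop metadata JSON
  run_config="$4"  # assumed: name of the pipeline run configuration
  flink_bin="${FLINK_BIN:-bin/flink}"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of submitting the job.
    echo "$flink_bin run --class org.apache.hop.beam.run.MainBeam $fat_jar $pipeline $metadata $run_config"
    return 0
  fi

  # Fail early if one of the input files is missing.
  for f in "$fat_jar" "$pipeline" "$metadata"; do
    if [ ! -f "$f" ]; then
      echo "missing file: $f" >&2
      return 1
    fi
  done

  "$flink_bin" run \
    --class org.apache.hop.beam.run.MainBeam \
    "$fat_jar" "$pipeline" "$metadata" "$run_config"
}
```

Running it with DRY_RUN=1 only prints the command, which is handy for checking the argument order before submitting anything.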
As to your questions...
- I guess there is no way to see the data somewhere in the Flink GUI?
No, but I'm thinking of adding a Hop callback service (on Hop Server probably) so that we can capture progress and allow Hop GUI to see that.
- How can I modify this 'generate-synthetic-data' pipeline so Flink saves this sample data to a file or database? In the last transform, 'Beam output', I see the variable ${DATA_OUTPUT}. Does that mean I can do some mapping in the 'Variables' tab of the Pipeline Run Configuration to point to a file? But what about a database?
The Table Output transform will allow you to write to a relational database. Performance-wise, please read up on the basics, specifically the section on "Row batching with non-Beam transforms".
- Hop exports the pipeline (I know it can be executed as well with 'run') and it runs permanently until it is stopped, right? Or is there some parameter to execute it just once?
Flink Run will execute only once as shown in the script above.
HTH,
Matt
On Mon, May 30, 2022 at 5:42 PM <[email protected]> wrote:
Hello community!

I'm learning Apache Flink. I launched a test installation on my PC (version 1.13.6), I have Hop running, and now I want to run my first pipeline on Flink. Yet for some reason it does not work. What I did:

1. I exported the fat jar according to this tutorial:
2. I created a new Pipeline Run Configuration with these details:
   Name: Local Flink
   Description:
   Engine type: Beam Flink pipeline engine
   The Flink master: [local]
   Parallelism: 2
   ......
   User agent: Hop
   Temp file: file://C:/Temp
   ..
   Fat jar file location: ${PROJECT_HOME}/fat_jar.jar
3. I opened the 'generate-synthetic-data' pipeline and executed it with the pipeline run configuration: Local Flink

Looks like it runs; I see it in the Flink dashboard.

I have a few questions:
- I guess there is no way to see the data somewhere in the Flink GUI?
- How can I modify this 'generate-synthetic-data' pipeline so Flink saves this sample data to a file or database? In the last transform, 'Beam output', I see the variable ${DATA_OUTPUT}. Does that mean I can do some mapping in the 'Variables' tab of the Pipeline Run Configuration to point to a file? But what about a database?
- Hop exports the pipeline (I know it can be executed as well with 'run') and it runs permanently until it is stopped, right? Or is there some parameter to execute it just once?

Thanks a lot for your help
Mike
