Wow! That's really you, Matt!? You're a living legend to me :-)
Kettle was (and still is) an amazing piece of software. I was sad that Hitachi Vantara was killing this product. But when I started searching the Internet for an alternative and found out that there is Hop, I was delighted. This software (Kettle, that is) is amazing! I use it as a door to any system, to work on any data.
Thanks a lot for the help. I will test the solution you proposed and get back to you if there are any problems (hopefully not :-))
Mike
Sent: Monday, May 30, 2022 at 5:59 PM
From: "Matt Casters" <[email protected]>
To: [email protected]
Subject: Re: How to send pipeline to Flink?
Hi Mike,
the variable PROJECT_HOME is actually not set by Flink, so I had to add it to conf/flink-conf.yaml:
env.java.opts: -DPROJECT_HOME=/path/to/project/home/folder
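If you want to script this step, here is a minimal sketch that appends that line to Flink's configuration file, refusing to do so when an env.java.opts entry already exists (a second key would conflict, and any existing JVM options should be kept and extended by hand). It assumes the file is conf/flink-conf.yaml, the default name in Flink 1.13; the function name is hypothetical. Because env.java.opts applies when the JVMs start, the local cluster presumably needs a restart (bin/stop-cluster.sh, then bin/start-cluster.sh) afterwards.

```shell
#!/bin/sh
# Sketch: expose PROJECT_HOME to Flink's JVMs through env.java.opts.
# add_project_home_opt is a hypothetical helper; it only edits the file passed in.

add_project_home_opt() {
  local conf_file="$1"      # e.g. "$FLINK_HOME/conf/flink-conf.yaml"
  local project_home="$2"   # folder to pass as -DPROJECT_HOME

  # Do not add a second env.java.opts key: an existing entry must be
  # extended by hand so its other JVM options are preserved.
  if grep -q '^env\.java\.opts:' "$conf_file"; then
    echo "env.java.opts already set in $conf_file; add -DPROJECT_HOME there" >&2
    return 1
  fi

  printf 'env.java.opts: -DPROJECT_HOME=%s\n' "$project_home" >> "$conf_file"
}

# Example (placeholder paths):
# add_project_home_opt "$FLINK_HOME/conf/flink-conf.yaml" /path/to/project/home/folder
```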
My script to run right now looks something like this:
bin/flink run \
--class org.apache.hop.beam.run.MainBeam \
/tmp/hop-fatjar.jar \
/path/to/project/home/folder/beam/pipelines/input-process-output.hpl \
/tmp/hop-metadata.json \
Flink
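Since this is a plain flink run call, it can be wrapped in a small reusable script. A minimal sketch, assuming the argument order used above: the fat jar, the pipeline .hpl file, the exported metadata JSON, and then what appears to be the name of the pipeline run configuration ("Flink" here). The function name, the HOP_FAT_JAR and DRY_RUN variables, and the default paths are hypothetical.

```shell
#!/bin/sh
# Sketch: submit a Hop pipeline to Flink through MainBeam.
# submit_hop_pipeline, HOP_FAT_JAR and DRY_RUN are hypothetical names.

submit_hop_pipeline() {
  local pipeline="$1"                           # .hpl file to run
  local metadata="${2:-/tmp/hop-metadata.json}" # exported Hop metadata
  local run_config="${3:-Flink}"                # pipeline run configuration name

  # Build the argument list in the same order as the command above.
  set -- "${FLINK_HOME:-.}/bin/flink" run \
    --class org.apache.hop.beam.run.MainBeam \
    "${HOP_FAT_JAR:-/tmp/hop-fatjar.jar}" \
    "$pipeline" "$metadata" "$run_config"

  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"   # only print the command for inspection
  else
    "$@"                 # actually submit the job to Flink
  fi
}
```

With DRY_RUN=1 it only prints the command, which is handy for checking paths before submitting; without it, the job is submitted through flink run.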
As to your questions...
- I guess there is no way to see the data somewhere in the Flink GUI?
No, but I'm thinking of adding a Hop callback service (probably on Hop Server) so that we can capture progress and let Hop GUI display it.
- How can I modify this 'generate-synthetic-data' pipeline so that Flink saves the sample data to a file or database? In the last transform, 'Beam output', I see the variable ${DATA_OUTPUT}. Does that mean I can map it in the 'Variables' tab of the pipeline run configuration to point to a file? But what about a database?
The Table Output transform will allow you to write to a relational database. Performance-wise, please read up on the basics, specifically the section on "Row batching with non-Beam transforms".
- Hop exports the pipeline (I know it can also be executed with 'run') and it runs permanently until it is stopped, right? Or is there some parameter to execute it just once?
flink run will execute the pipeline only once, as shown in the script above.
HTH,
Matt
On Mon, May 30, 2022 at 5:42 PM <[email protected]> wrote:
Hello community!

I'm learning Apache Flink. I launched a test installation on my PC (version 1.13.6), I have Hop running, and now I want to run my first pipeline on Flink. Yet for some reason it does not work. What I did:

1. I exported a fat jar according to this tutorial:
2. I created a new Pipeline Run Configuration with these details:
   Name: Local Flink
   Description:
   Engine type: Beam Flink pipeline engine
   The Flink master: [local]
   Parallelism: 2
   ......
   User agent: Hop
   Temp file: file://C:/Temp
   ..
   Fat jar file location: ${PROJECT_HOME}/fat_jar.jar
3. I opened the 'generate-synthetic-data' pipeline and executed it with the pipeline run configuration Local Flink.

It looks like it runs; I see it in the Flink dashboard.

I have three questions:
- I guess there is no way to see the data somewhere in the Flink GUI?
- How can I modify this 'generate-synthetic-data' pipeline so that Flink saves the sample data to a file or database? In the last transform, 'Beam output', I see the variable ${DATA_OUTPUT}. Does that mean I can map it in the 'Variables' tab of the pipeline run configuration to point to a file? But what about a database?
- Hop exports the pipeline (I know it can also be executed with 'run') and it runs permanently until it is stopped, right? Or is there some parameter to execute it just once?

Thanks a lot for your help,
Mike
