Hello Hans,

I went through the flex-template process yesterday but the generated template 
does not work. The main piece that's missing for me is how to pass the actual 
pipeline that should be run. My test boiled down to:

gcloud dataflow flex-template build gs://foo_ag_dataflow/tmp/todays-directories.json \
      --image-gcr-path "europe-west1-docker.pkg.dev/dashboard-foo/dataflow/hop:latest" \
      --sdk-language "JAVA" \
      --flex-template-base-image JAVA11 \
      --metadata-file "/Users/fabian/Documents/src/foo/fooDataEngineering/hop/dataflow/todays-directories.json" \
      --jar "/Users/fabian/tmp/fat-hop.jar" \
      --env FLEX_TEMPLATE_JAVA_MAIN_CLASS="org.apache.hop.beam.run.MainBeam"
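
If I understand org.apache.hop.beam.run.MainBeam correctly, it expects three
positional arguments: the pipeline file, a JSON export of the Hop metadata, and
the name of the pipeline run configuration. Run directly it would look roughly
like this (untested; the file names and the run configuration name are just
placeholders from my setup):

java -cp /Users/fabian/tmp/fat-hop.jar org.apache.hop.beam.run.MainBeam \
    todays-directories.hpl \
    hop-metadata.json \
    Dataflow

What I don't see is where these three values would go in the flex-template
setup. To run the template I then used: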

gcloud dataflow flex-template run "todays-directories-`date +%Y%m%d-%H%M%S`" \
    --template-file-gcs-location "gs://foo_ag_dataflow/tmp/todays-directories.json" \
    --region "europe-west1"

With Dockerfile:

# Google's Java 11 flex-template launcher base image
FROM gcr.io/dataflow-templates-base/java11-template-launcher-base

ARG WORKDIR=/dataflow/template
RUN mkdir -p ${WORKDIR}
WORKDIR ${WORKDIR}

# Tell the launcher which main class to run and where to find the jars
ENV FLEX_TEMPLATE_JAVA_MAIN_CLASS="org.apache.hop.beam.run.MainBeam"
ENV FLEX_TEMPLATE_JAVA_CLASSPATH="/dataflow/template/*"

ENTRYPOINT ["/opt/google/dataflow/java_template_launcher"]


And "todays-directories.json":

{
    "defaultEnvironment": {},
    "image": "europe-west1-docker.pkg.dev/dashboard-foo/dataflow/hop:latest",
    "metadata": {
        "description": "Test templates creation with Apache Hop",
        "name": "Todays directories"
    },
    "sdkInfo": {
        "language": "JAVA"
    }
}
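
And if template parameters are indeed the mechanism, I suppose the "metadata"
section of the spec would also need a "parameters" block, again with a made-up
parameter name:

    "metadata": {
        "description": "Test templates creation with Apache Hop",
        "name": "Todays directories",
        "parameters": [
            {
                "name": "pipeline",
                "label": "Hop pipeline",
                "helpText": "GCS path of the .hpl pipeline to run",
                "isOptional": false
            }
        ]
    },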

Thanks for having a look at it!

cheers

Fabian

> On 10.08.2022 at 16:03, Hans Van Akelyen <[email protected]> wrote:
> 
> Hi Fabian,
> 
> You have indeed found something we have not yet documented, mainly because we 
> have not yet tried it out ourselves.
> The main class that gets called when running Beam pipelines is 
> "org.apache.hop.beam.run.MainBeam".
> 
> I was hoping the "Import as pipeline" button on a job would give you 
> everything you need to execute this but it does not.
> I'll take a closer look in the following days to see what is needed to use this 
> functionality; it could be that we need to export the template based on a 
> pipeline.
> 
> Kr,
> Hans
> 
> On Wed, 10 Aug 2022 at 15:46, Fabian Peters <[email protected]> wrote:
> Hi all!
> 
> Thanks to Hans' work on the REST transform, I can now deploy my jobs to 
> Dataflow.
> 
> Next, I'd like to schedule a batch job
> <https://cloud.google.com/community/tutorials/schedule-dataflow-jobs-with-cloud-scheduler>,
> but for this I need to create a template
> <https://cloud.google.com/dataflow/docs/concepts/dataflow-templates>. I've
> searched the Hop documentation but haven't found anything on this. I'm
> guessing that flex-templates
> <https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates#create_a_flex_template>
> are the way to go, due to the fat-jar, but I'm wondering what to pass as the
> FLEX_TEMPLATE_JAVA_MAIN_CLASS.
> 
> cheers
> 
> Fabian
