Thanks, all. Yes, I was under the misunderstanding that we could directly use one of these base images as a template, without creating a custom template. Thanks for clarifying it for me.
Regards,
Sumit Desai

On Mon, 18 Dec 2023, 10:34 pm Bruno Volpato via user, <user@beam.apache.org> wrote:

> Right, there's some misunderstanding here, so Bartosz's and XQ's inputs are
> correct.
>
> Just want to add that the template_location parameter is the GCS path where
> you want to store your template, not the image reference of the base image.
> The GCR path that you are trying to use belongs in the Dockerfile, in case
> you are trying to use a Flex Template (see here:
> https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates#use_custom_container_images).
>
> Best,
> Bruno
>
> On Mon, Dec 18, 2023 at 11:39 AM XQ Hu via user <user@beam.apache.org> wrote:
>
>> https://github.com/google/dataflow-ml-starter/tree/main?tab=readme-ov-file#run-the-beam-pipeline-with-dataflow-flex-templates
>> has a full example of how to create your own flex template. FYI.
>>
>> On Mon, Dec 18, 2023 at 5:01 AM Bartosz Zabłocki via user <user@beam.apache.org> wrote:
>>
>>> Hi Sumit,
>>> could you elaborate a little bit more on what you are trying to achieve
>>> with the templates?
>>>
>>> As far as I know, these base Docker images serve as base images for your
>>> own custom templates.
>>> If you want to use an existing template, you can use one of these:
>>> https://cloud.google.com/dataflow/docs/guides/templates/provided-templates.
>>> To run it, you just need to invoke `gcloud dataflow jobs run ...` or an
>>> equivalent command
>>> (https://cloud.google.com/dataflow/docs/guides/templates/provided/pubsub-to-pubsub#gcloud),
>>> or just use the UI to launch it (Cloud Console -> Dataflow -> Jobs ->
>>> Create Job From Template).
>>>
>>> If you want to create your own template (i.e. a reusable Dataflow
>>> pipeline), take a look at this page:
>>> https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates#create_a_flex_template.
>>> This will let you package your own pipeline as a template, and you'll be
>>> able to launch it with the `gcloud dataflow jobs run ...` command.
>>> If you want more control over the environment and dependencies, you can
>>> create your own custom container (Docker) image. That's where you'll use
>>> the base image you mentioned. See this page for an example:
>>> https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates#use_a_custom_container_for_dependencies.
>>>
>>> I hope this helps. Let me know if you have any other questions.
>>>
>>> Cheers,
>>> Bartosz Zablocki
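[Inline note: as a concrete illustration of the "`gcloud dataflow jobs run ...` or equivalent" route Bartosz describes above, here is a minimal, untested sketch that launches a provided template programmatically with the Google API Python client. The project, region, topics, and job name are illustrative, and the template path follows the pattern from the provided-templates doc linked above.]

# Sketch: launch a provided (classic) Dataflow template via the REST API.
# Requires: pip install google-api-python-client, plus Application Default
# Credentials (e.g. `gcloud auth application-default login`).
from googleapiclient.discovery import build

def launch_provided_template():
    dataflow = build('dataflow', 'v1b3')
    request = dataflow.projects().locations().templates().launch(
        projectId='my-project',  # illustrative
        location='us-east1',     # illustrative
        # GCS path of a provided template (pattern from the docs above):
        gcsPath='gs://dataflow-templates-us-east1/latest/Cloud_PubSub_to_Cloud_PubSub',
        body={
            'jobName': 'pubsub-to-pubsub-copy',  # illustrative
            'parameters': {
                'inputTopic': 'projects/my-project/topics/input-topic',    # illustrative
                'outputTopic': 'projects/my-project/topics/output-topic',  # illustrative
            },
        },
    )
    print(request.execute())

[This is the REST equivalent of `gcloud dataflow jobs run ... --gcs-location gs://...`; the Cloud Console UI route ends up calling the same API.]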
>>> On Mon, Dec 18, 2023 at 8:36 AM Sumit Desai via user <user@beam.apache.org> wrote:
>>>
>>>> I am creating an Apache Beam pipeline using the Python SDK. I want to
>>>> use a standard Dataflow template (this one:
>>>> https://console.cloud.google.com/gcr/images/dataflow-templates-base/global/python310-template-launcher-base?tab=info).
>>>> But when I specify it using the 'template_location' key while creating
>>>> the pipeline_options object, I get an error `FileNotFoundError: [Errno 2]
>>>> No such file or directory:
>>>> 'gcr.io/dataflow-templates-base/python310-template-launcher-base'`.
>>>>
>>>> I also tried to specify the complete version
>>>> `gcr.io/dataflow-templates-base/python310-template-launcher-base::flex_templates_base_image_release_20231127_RC00`
>>>> but got the same error. Can someone suggest what I might be doing wrong?
>>>> The code snippet to create pipeline_options is as follows:
>>>>
>>>> def __create_pipeline_options_dataflow(job_name):
>>>>     # Set up the Dataflow runner options
>>>>     gcp_project_id = os.environ.get(GCP_PROJECT_ID)
>>>>     # TODO: Move to environmental variables
>>>>     pipeline_options = {
>>>>         'project': gcp_project_id,
>>>>         'region': "us-east1",
>>>>         'job_name': job_name,  # Provide a unique job name
>>>>         'temp_location': f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/temp',
>>>>         'staging_location': f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/staging',
>>>>         'runner': 'DataflowRunner',
>>>>         'save_main_session': True,
>>>>         'service_account_email': service_account,
>>>>         # 'network': f'projects/{gcp_project_id}/global/networks/default',
>>>>         # 'subnetwork': f'projects/{gcp_project_id}/regions/us-east1/subnetworks/default'
>>>>         'template_location': 'gcr.io/dataflow-templates-base/python310-template-launcher-base'
>>>>     }
>>>>     logger.debug(f"pipeline_options created as {pipeline_options}")
>>>>     return pipeline_options
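[Inline note: to make Bruno's point concrete against the snippet above: 'template_location' must be a writable gs:// path where Dataflow stores the generated (classic) template, never a gcr.io image reference; the base image only appears in a Flex Template's Dockerfile. Below is a minimal, untested sketch of corrected options; the bucket and env-var names are illustrative stand-ins for the ones in the snippet.]

import os

# Sketch: options for staging a *classic* template to GCS, per Bruno's reply.
# 'template_location' is where the template file gets written, not an image.
def create_template_staging_options(job_name):
    bucket = f"my-templates-bucket-{os.getenv('UP_PLATFORM_ENV', 'dev')}"  # illustrative
    return {
        'project': os.environ.get('GCP_PROJECT_ID'),
        'region': 'us-east1',
        'job_name': job_name,
        'temp_location': f'gs://{bucket}/temp',
        'staging_location': f'gs://{bucket}/staging',
        'runner': 'DataflowRunner',
        'save_main_session': True,
        # Writable GCS path: running the pipeline with this key set stages
        # the template here instead of executing the job.
        'template_location': f'gs://{bucket}/templates/{job_name}',
    }

[Dropping 'template_location' entirely would instead run the job directly on Dataflow, with no template involved.]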