Thanks, all. Yes, I was under the misunderstanding that we could use one of
these base images directly as a template, without creating a custom template.
Thanks for clarifying it for me.

Regards,
Sumit Desai

On Mon, 18 Dec 2023, 10:34 pm Bruno Volpato via user, <user@beam.apache.org>
wrote:

> Right, there's some misunderstanding here; Bartosz's and XQ's inputs are
> correct.
>
> Just want to add that the template_location parameter is the GCS path where
> you want to store your template, not a reference to the base image.
> The GCR path that you are trying to use belongs in the Dockerfile, in case
> you are building a Flex Template (see here:
> https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates#use_custom_container_images
> ).
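>
> For example (just a sketch; 'my-bucket' below is a placeholder, not
> something from your setup), template_location would point at a GCS path
> where the template file gets written, along these lines:
>
>     pipeline_options = {
>         # ... other options ...
>         # GCS path where the template spec will be stored, not an image:
>         'template_location': 'gs://my-bucket/templates/my_template',
>     }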
>
> Best,
> Bruno
>
>
>
>
> On Mon, Dec 18, 2023 at 11:39 AM XQ Hu via user <user@beam.apache.org>
> wrote:
>
>>
>> https://github.com/google/dataflow-ml-starter/tree/main?tab=readme-ov-file#run-the-beam-pipeline-with-dataflow-flex-templates
>> has a full example of how to create your own flex template. FYI.
>>
>> On Mon, Dec 18, 2023 at 5:01 AM Bartosz Zabłocki via user <
>> user@beam.apache.org> wrote:
>>
>>> Hi Sumit,
>>> could you elaborate a little bit more on what you are trying to achieve
>>> with the templates?
>>>
>>> As far as I know, these base Docker images serve as base images for your
>>> own custom templates.
>>> If you want to use an existing template, you can use one of these:
>>> https://cloud.google.com/dataflow/docs/guides/templates/provided-templates
>>> .
>>> To run it, you just need to invoke `gcloud dataflow jobs run ...` or an
>>> equivalent command (
>>> https://cloud.google.com/dataflow/docs/guides/templates/provided/pubsub-to-pubsub#gcloud).
>>> Or just use the UI to launch it (Cloud Console -> Dataflow -> Jobs ->
>>> Create Job From Template).
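>>>
>>> For instance, launching the provided Pub/Sub-to-Pub/Sub template could
>>> look roughly like this (job name, region, bucket and topics below are
>>> placeholders):
>>>
>>>     gcloud dataflow jobs run my-job \
>>>       --gcs-location gs://dataflow-templates-us-east1/latest/Cloud_PubSub_to_Cloud_PubSub \
>>>       --region us-east1 \
>>>       --staging-location gs://my-bucket/temp \
>>>       --parameters inputTopic=projects/my-project/topics/input,outputTopic=projects/my-project/topics/output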
>>>
>>> If you want to create your own template (i.e., a reusable Dataflow
>>> pipeline), take a look at this page:
>>> https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates#create_a_flex_template.
>>> This will let you package your own pipeline as a template. You'll be able
>>> to launch it with the `gcloud dataflow flex-template run ...` command.
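>>>
>>> As a rough sketch (the bucket, project, repository and file names below
>>> are placeholders), building and then launching a Flex Template might look
>>> something like:
>>>
>>>     gcloud dataflow flex-template build gs://my-bucket/templates/my_template.json \
>>>       --image-gcr-path "us-east1-docker.pkg.dev/my-project/my-repo/my-template-image:latest" \
>>>       --sdk-language "PYTHON" \
>>>       --flex-template-base-image "PYTHON3" \
>>>       --py-path "." \
>>>       --env "FLEX_TEMPLATE_PYTHON_PY_FILE=main.py"
>>>
>>>     gcloud dataflow flex-template run my-job \
>>>       --template-file-gcs-location gs://my-bucket/templates/my_template.json \
>>>       --region us-east1
>>>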
>>> If you want more control over the environment and dependencies, you can
>>> create your own custom container image. That's where you'll use the base
>>> image you mentioned. See this page for an example:
>>> https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates#use_a_custom_container_for_dependencies
>>> .
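>>>
>>> For reference, a minimal Dockerfile along those lines (file names below
>>> are placeholders) might start from the base image you mentioned:
>>>
>>>     FROM gcr.io/dataflow-templates-base/python310-template-launcher-base
>>>
>>>     WORKDIR /template
>>>     COPY requirements.txt main.py ./
>>>     RUN pip install --no-cache-dir -r requirements.txt
>>>
>>>     ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE="/template/requirements.txt"
>>>     ENV FLEX_TEMPLATE_PYTHON_PY_FILE="/template/main.py"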
>>>
>>> I hope this helps; let me know if you have any other questions.
>>>
>>> Cheers,
>>> Bartosz Zablocki
>>>
>>> On Mon, Dec 18, 2023 at 8:36 AM Sumit Desai via user <
>>> user@beam.apache.org> wrote:
>>>
>>>> I am creating an Apache Beam pipeline using the Python SDK. I want to use
>>>> a standard Dataflow template (this one
>>>> <https://console.cloud.google.com/gcr/images/dataflow-templates-base/global/python310-template-launcher-base?tab=info>).
>>>> But when I specify it using the 'template_location' key while creating the
>>>> pipeline_options object, I get an error: `FileNotFoundError: [Errno 2] No
>>>> such file or directory:
>>>> 'gcr.io/dataflow-templates-base/python310-template-launcher-base'`
>>>>
>>>> I also tried specifying the complete version
>>>> `gcr.io/dataflow-templates-base/python310-template-launcher-base::flex_templates_base_image_release_20231127_RC00`
>>>> but got the same error. Can someone suggest what I might be doing wrong?
>>>> The code snippet that creates pipeline_options is as follows:
>>>>
>>>> def __create_pipeline_options_dataflow(job_name):
>>>>     # Set up the Dataflow runner options
>>>>     gcp_project_id = os.environ.get(GCP_PROJECT_ID)
>>>>     # TODO: Move to environmental variables
>>>>     pipeline_options = {
>>>>         'project': gcp_project_id,
>>>>         'region': "us-east1",
>>>>         'job_name': job_name,  # Provide a unique job name
>>>>         'temp_location': f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/temp',
>>>>         'staging_location': f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/staging',
>>>>         'runner': 'DataflowRunner',
>>>>         'save_main_session': True,
>>>>         'service_account_email': service_account,
>>>>         # 'network': f'projects/{gcp_project_id}/global/networks/default',
>>>>         # 'subnetwork': f'projects/{gcp_project_id}/regions/us-east1/subnetworks/default'
>>>>         'template_location': 'gcr.io/dataflow-templates-base/python310-template-launcher-base'
>>>>     }
>>>>     logger.debug(f"pipeline_options created as {pipeline_options}")
>>>>     return pipeline_options
>>>>
>>>>
>>>>
