Thank you! I added WorkerOptions (setting the machine type through
GoogleCloudOptions fired an error on the worker machine).

Code below for anyone else who might need it:

Apache Beam SDK:
!pip show apache-beam

Name: apache-beam
Version: 2.20.0
Summary: Apache Beam SDK for Python
Home-page: https://beam.apache.org
Author: Apache Software Foundation
Author-email: [email protected]
License: Apache License, Version 2.0
Location: /usr/local/envs/py3env/lib/python3.5/site-packages
Requires: grpcio, python-dateutil, typing, dill, pyarrow, pydot,
future, hdfs, httplib2, pytz, typing-extensions, fastavro,
avro-python3, crcmod, mock, oauth2client, protobuf, pymongo, numpy
Required-by:


*Python 3 code:*


import random
from apache_beam.options.pipeline_options import (
    PipelineOptions, StandardOptions, WorkerOptions,
    SetupOptions, GoogleCloudOptions)

# RUNNER and BUCKET_URL are defined elsewhere in the notebook
options = PipelineOptions()
standard_cloud_options = options.view_as(StandardOptions)
standard_cloud_options.runner = RUNNER  # 'DataflowRunner'
# The machine type has to be set on WorkerOptions, not on GoogleCloudOptions
worker_cloud_options = options.view_as(WorkerOptions)
worker_cloud_options.machine_type = 'n1-highcpu-96'
setup_cloud_options = options.view_as(SetupOptions)
setup_cloud_options.setup_file = "./setup.py"
google_cloud_options = options.view_as(GoogleCloudOptions)
google_cloud_options.project = 'project_name'
# Short random hex suffix keeps the job name unique across runs
job_rand = ''.join(random.choice('0123456789abcdef') for j in range(4))
google_cloud_options.job_name = 'n1-highcpu-96-' + job_rand
google_cloud_options.staging_location = '%s/staging' % BUCKET_URL
google_cloud_options.temp_location = '%s/tmp' % BUCKET_URL
google_cloud_options.region = 'us-central1'
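
To launch the pipeline with these options, something like the following
works (a minimal sketch; the transforms are whatever your job builds):

import apache_beam as beam

p = beam.Pipeline(options=options)
# ... build and apply your transforms here ...
result = p.run()
result.wait_until_finish()  # block until the Dataflow job completes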

HTH,
Eila



On Tue, May 12, 2020 at 11:43 AM Brian Hulette <[email protected]> wrote:

> Hi Eila,
>
> It looks like you're attempting to set the option on the
> GoogleCloudOptions class directly, I think you want to set it on an
> instance of PipelineOptions that you've viewed as GoogleCloudOptions. Like
> this example from
> https://cloud.google.com/dataflow/docs/guides/specifying-exec-params#configuring-pipelineoptions-for-execution-on-the-cloud-dataflow-service
>
> # Create and set your PipelineOptions.
> options = PipelineOptions(flags=argv)
>
> # For Cloud execution, specify DataflowRunner and set the Cloud Platform
> # project, job name, staging file location, temp file location, and region.
> options.view_as(StandardOptions).runner = 'DataflowRunner'
> google_cloud_options = options.view_as(GoogleCloudOptions)
> google_cloud_options.project = 'my-project-id'
> ...
> # Create the Pipeline with the specified options.
> p = Pipeline(options=options)
>
> Alternatively, you should be able to just specify --worker_machine_type at
> the command line if you're parsing the PipelineOptions from sys.argv. Does
> that help?
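>
> Something like this untested sketch, if the options are built from the
> command-line arguments (pipeline.py is a placeholder for your script):
>
> import sys
> from apache_beam.options.pipeline_options import PipelineOptions
>
> options = PipelineOptions(sys.argv[1:])
>
> # invoked as, e.g.:
> # python pipeline.py --runner=DataflowRunner --project=my-project-id \
> #     --region=us-central1 --worker_machine_type=n1-highcpu-96 ...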
>
> Brian
>
> On Tue, May 12, 2020 at 8:30 AM OrielResearch Eila Arich-Landkof <
> [email protected]> wrote:
>
>> Hello,
>>
>> I am trying to check whether the resource settings are actually being
>> applied. What would be the right way to do it?
>> *The code is:*
>> GoogleCloudOptions.worker_machine_type = 'n1-highcpu-96'
>>
>> and *the Dataflow view is* the following (nothing reflects the
>> n1-highcpu-96 machine).
>> Please advise.
>>
>> Thanks,
>> Eila
>> Resource metrics:
>> Current vCPUs: 1
>> Total vCPU time: 0.07 vCPU hr
>> Current memory: 3.75 GB
>> Total memory time: 0.264 GB hr
>> Current PD: 250 GB
>> Total PD time: 17.632 GB hr
>> Current SSD PD: 0 B
>> Total SSD PD time: 0 GB hr
>>
>>
>> --
>> Eila
>> <http://www.orielresearch.com>
>> Meetup <https://www.meetup.com/Deep-Learning-In-Production/>
>>
>

-- 
Eila
<http://www.orielresearch.com>
Meetup <https://www.meetup.com/Deep-Learning-In-Production/>
