Thank you! I added WorkerOptions (setting the machine type via GoogleCloudOptions raised an error on the worker machine).
Code below for anyone else who might need it.

Apache Beam SDK:

!pip show apache-beam
Name: apache-beam
Version: 2.20.0
Summary: Apache Beam SDK for Python
Home-page: https://beam.apache.org
Author: Apache Software Foundation
Author-email: [email protected]
License: Apache License, Version 2.0
Location: /usr/local/envs/py3env/lib/python3.5/site-packages
Requires: grpcio, python-dateutil, typing, dill, pyarrow, pydot, future, hdfs, httplib2, pytz, typing-extensions, fastavro, avro-python3, crcmod, mock, oauth2client, protobuf, pymongo, numpy
Required-by:

Python 3 code:

import random

from apache_beam.options.pipeline_options import (
    GoogleCloudOptions,
    PipelineOptions,
    SetupOptions,
    StandardOptions,
    WorkerOptions,
)

# RUNNER and BUCKET_URL are defined elsewhere in the notebook.
options = PipelineOptions()

standard_cloud_options = options.view_as(StandardOptions)
standard_cloud_options.runner = RUNNER  # 'DataflowRunner'

# Set the machine type on WorkerOptions, not GoogleCloudOptions.
worker_cloud_options = options.view_as(WorkerOptions)
worker_cloud_options.machine_type = 'n1-highcpu-96'

setup_cloud_options = options.view_as(SetupOptions)
setup_cloud_options.setup_file = "./setup.py"

google_cloud_options = options.view_as(GoogleCloudOptions)
google_cloud_options.project = 'project_name'
job_rand = ''.join(random.choice('0123456789abcdef') for j in range(4))
google_cloud_options.job_name = 'n1-highcpu-96-' + job_rand
google_cloud_options.staging_location = '%s/staging' % BUCKET_URL
google_cloud_options.temp_location = '%s/tmp' % BUCKET_URL
google_cloud_options.region = 'us-central1'

HTH,
Eila

On Tue, May 12, 2020 at 11:43 AM Brian Hulette <[email protected]> wrote:

> Hi Eila,
>
> It looks like you're attempting to set the option on the GoogleCloudOptions
> class directly; I think you want to set it on an instance of PipelineOptions
> that you've viewed as GoogleCloudOptions, like this example from
> https://cloud.google.com/dataflow/docs/guides/specifying-exec-params#configuring-pipelineoptions-for-execution-on-the-cloud-dataflow-service
>
> # Create and set your PipelineOptions.
> options = PipelineOptions(flags=argv)
>
> # For Cloud execution, specify DataflowRunner and set the Cloud Platform
> # project, job name, staging file location, temp file location, and region.
> options.view_as(StandardOptions).runner = 'DataflowRunner'
> google_cloud_options = options.view_as(GoogleCloudOptions)
> google_cloud_options.project = 'my-project-id'
> ...
> # Create the Pipeline with the specified options.
> p = Pipeline(options=options)
>
> Alternatively, you should be able to just specify --worker_machine_type at
> the command line if you're parsing the PipelineOptions from sys.argv. Does
> that help?
>
> Brian
>
> On Tue, May 12, 2020 at 8:30 AM OrielResearch Eila Arich-Landkof <
> [email protected]> wrote:
>
>> Hello,
>>
>> I am trying to check whether the resource settings are actually being
>> applied. What would be the right way to do that?
>>
>> The code is:
>>
>> GoogleCloudOptions.worker_machine_type = 'n1-highcpu-96'
>>
>> and the Dataflow view is the following (nothing reflects the high-CPU
>> machine). Please advise.
>>
>> Thanks,
>> Eila
>>
>> Resource metrics
>> Current vCPUs: 1
>> Total vCPU time: 0.07 vCPU hr
>> Current memory: 3.75 GB
>> Total memory time: 0.264 GB hr
>> Current PD: 250 GB
>> Total PD time: 17.632 GB hr
>> Current SSD PD: 0 B
>> Total SSD PD time: 0 GB hr
>>
>> --
>> Eila
>> <http://www.orielresearch.com>
>> Meetup <https://www.meetup.com/Deep-Learning-In-Production/>

--
Eila
<http://www.orielresearch.com>
Meetup <https://www.meetup.com/Deep-Learning-In-Production/>
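P.S. A minimal sketch of Brian's command-line alternative, for completeness (the script name and the exact flag values are illustrative assumptions, not taken from the thread):

import sys

from apache_beam.options.pipeline_options import PipelineOptions, WorkerOptions

# Run as, e.g.:
#   python pipeline.py --runner=DataflowRunner --worker_machine_type=n1-highcpu-96
# PipelineOptions parses the flags itself when given argv.
options = PipelineOptions(sys.argv[1:])

# Quick sanity check that the flag was parsed; should print 'n1-highcpu-96'.
print(options.view_as(WorkerOptions).machine_type)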
