The "ongoing" message is there to help you diagnose where the long waits occur in your pipeline. Usually they indicate a stuck thread, but there are use cases where the message is a red herring, because the user is intentionally processing a single element for several minutes.
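If the subprocess itself is the thing that is stuck, the usual first step is to capture everything it writes along with its exit code. A minimal sketch (the command string and the `run_and_log` helper are illustrative, not from your pipeline; the `Popen` arguments mirror the ones in the quoted code below):

```python
import logging
import subprocess

logging.basicConfig(level=logging.INFO)

def run_and_log(cmd):
    """Run a shell command, logging combined stdout/stderr and the exit code."""
    proc = subprocess.Popen(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout so nothing is lost
    )
    out, _ = proc.communicate()
    logging.info("command %r exited with code %d", cmd, proc.returncode)
    logging.info("output:\n%s", out.decode("utf-8", errors="replace"))
    return proc.returncode, out

# Illustrative command; substitute the actual kallisto invocation.
rc, out = run_and_log("echo starting; exit 0")
```

On Dataflow, anything sent through `logging` shows up in the worker logs, so a non-zero exit code or an error message from kallisto (e.g. a missing index file on the worker) becomes visible instead of the process just appearing to hang.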
Have you been able to log the output of stdout/stderr and any exit codes returned by the kallisto application?

On Sat, May 2, 2020 at 12:22 AM OrielResearch Eila Arich-Landkof <[email protected]> wrote:

> Hi all,
>
> I would appreciate any help on that issue. I have no idea where to start
> debugging it. On a local machine, it is working fine. The challenge is to
> have it working on a worker machine.
>
> There is an option to run the command below (kallisto) using multiple
> threads with -t [# threads]. This option didn't work at all, so I tried the
> no-threads version (below). After the error below, the pipeline iterates,
> fails at the same point, and iterates again...
>
> *Environment:*
> SDK version: Apache Beam Python 3.5 SDK 2.20.0
>
> *The code:*
>
> cmd1 = "export PATH=$PATH:/opt/userowned/kallisto"
> cmd2 = "kallisto quant -i {} --single -l 200 -s 20 -o {} {}".format(
>     local_kallisto_index_file, local_folder_align_out, local_fastq_filenames[0])
>
> from subprocess import Popen, PIPE, STDOUT
> final = Popen("{}; {}".format(cmd1, cmd2), shell=True, stdin=PIPE,
>               stdout=PIPE, stderr=STDOUT, close_fds=True)
> stdout, nothing = final.communicate()
>
> The execution fires the following error:
>
> {"work": 8708750263297725979, "timestamp": {"seconds": 1588400583,
> "nanos": 503702878}, "severity": "WARN", "message": "Operation ongoing for
> over 647.37 seconds in state process-msecs in step s2 .
> Current Traceback:
>   File "/usr/local/lib/python3.5/runpy.py", line 193, in _run_module_as_main
>     "__main__", mod_spec)
>   File "/usr/local/lib/python3.5/runpy.py", line 85, in _run_code
>     exec(code, run_globals)
>   File "/usr/local/lib/python3.5/site-packages/dataflow_worker/start.py", line 144, in <module>
>     main()
>   File "/usr/local/lib/python3.5/site-packages/dataflow_worker/start.py", line 140, in main
>     batchworker.BatchWorker(properties, sdk_pipeline_options).run()
>   File "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchw
>
> *The next log is reinitiating the pipeline execution:*
>
> Worker started with properties: {'job_id': '2020-05-01_23_08_31-17246501580113091346',
> 'service_path': 'https://dataflow.googleapis.com/', 'temp_gcs_directory': 'gs://unused',
> 'root_url': 'https://dataflow.googleapis.com',
> 'dataflow.worker.logging.location': '/var/log/dataflow/python-dataflow-0-json.log',
> 'reporting_enabled': 'True', 'project_id': 'orielresearch-188115',
> 'worker_id': 'step-1-rna-expression-no--05012308-qrjn-harness-qlpw',
> 'local_staging_directory': '/var/opt/google/dataflow',
> 'sdk_pipeline_options':
> '{"display_data":[{"key":"temp_location","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"gs://ort_execution_aux/tmp/step-1-rna-expression-no-fastqc-5054.1588399703.032773"},
> {"key":"beam_plugins","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"[\'apache_beam.io.filesystem.FileSystem\',
> \'apache_beam.io.hadoopfilesystem.HadoopFileSystem\',
> \'apache_beam.io.localfilesystem.LocalFileSystem\',
> \'apache_beam.io.aws.s3filesystem.S3FileSystem\',
> \'apache_beam.io.gcp.gcsfilesystem.GCSFileSystem\']"},
> {"key":"setup_file","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"./setup.py"},
> {"key":"staging_location","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"gs://ort_execution_aux/staging/step-1-rna-expression-no-fastqc-5054.1588399703.032773"},
> {"key":"labels","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"[\'goog-dataflow-notebook=2_20_0\']"},
> {"key":"experiments","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"[\'use_fastavro\']"},
> {"key":"runner","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"DataflowRunner"}],
> "options":{"artifact_port":0,"autoscalingAlgorithm":"NONE","beam_plugins":["apache_beam.io.filesystem.FileSystem","apache_beam.io.hadoopfilesystem.HadoopFileSystem","apache_beam.io.localfilesystem.LocalFileSystem","apache_beam.io.aws.s3filesystem.S3FileSystem","apache_beam.io.gcp.gcsfilesystem.GCSFileSystem"],
> "dataflowJobId":"2020-05-01_23_08_31-17246501580113091346","dataflow_endpoint":"https://dataflow.googleapis.com",
> "direct_num_workers":1,"direct_runner_bundle_repeat":0,"direct_runner_use_stacked_bundle":true,"direct_running_mode":"in_memory","dry_run":false,
> "enable_streaming_engine":false,"environment_cache_millis":0,"expansion_port":0,"experiments":["use_fastavro"],"flink_master":"[auto]","flink_submit_uber_jar":false,"flink_version":"1.9",
> "gcpTempLocation":"gs://ort_execution_aux/tmp","hdfs_full_urls":false,"job_port":0,"job_server_timeout":60,"labels":["goog-dataflow-notebook=2_20_0"],
> "maxNumWorkers":0,"no_auth":false,"numWorkers":3,"pipelineUrl":"gs://ort_execution_aux/staging/pipeline.pb","pipeline_type_check":true,
> "profile_cpu":false,"profile_memory":false,"profile_sample_rate":1,"project":"orielresearch-188115","runner":"DataflowRunner","runtime_type_check":false,
> "save_main_session":false,"sdk_location":"default","sdk_worker_parallelism":1,"setup_file":"./setup.py","spark_master_url":"local[4]","spark_submit_uber_jar":false,
> "staging_location":"gs://ort_execution_aux/staging/step-1-rna-expression-no-fastqc-5054.1588399703.032773","streaming":false,
> "temp_location":"gs://ort_execution_aux/tmp/step-1-rna-expression-no-fastqc-5054.1588399703.032773","type_check_strictness":"DEFAULT_TO_ANY","update":false}}'}
>
> It is reiterating like that for more than 30 minutes with no output.
> Many thanks for any advice on how to move forward.
>
> Best,
> --
> Eila
> <http://www.orielresearch.com>
> Meetup <https://www.meetup.com/Deep-Learning-In-Production/>
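Since the worker just loops on the same element, it may also help to bound the subprocess with a timeout so a hung kallisto run fails fast instead of tripping the "operation ongoing" watchdog indefinitely. A sketch, assuming the worker's Python supports `communicate(timeout=...)` (available since Python 3.3; the `run_with_timeout` name is illustrative):

```python
import subprocess

def run_with_timeout(cmd, timeout_s):
    """Run a shell command, killing it and raising if it exceeds timeout_s."""
    proc = subprocess.Popen(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    try:
        out, _ = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # don't leave the worker thread blocked on a dead child
        proc.communicate()  # reap the killed process
        raise RuntimeError(
            "command timed out after {}s: {!r}".format(timeout_s, cmd))
    if proc.returncode != 0:
        raise RuntimeError("command failed (exit {}): {}".format(
            proc.returncode, out.decode("utf-8", errors="replace")))
    return out
```

On timeout or failure the raised exception carries the command and its output into the worker logs, which is much easier to act on than the generic stuck-thread warning above.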
