Hi all,
I would appreciate any help with the issue below; I have no idea where to
start debugging it.
The pipeline works fine on a local machine. The challenge is getting it to
work on a worker machine.
The command below (kallisto) can run with multiple threads using
-t [# threads]. That option didn't work at all, so I tried the version
without threads (below). After the error below, the pipeline restarts,
fails at the same point, and restarts again.
*Environment:*
SDK version
Apache Beam Python 3.5 SDK 2.20.0
*The code:*
cmd1 = "export PATH=$PATH:/opt/userowned/kallisto"
cmd2 = "kallisto quant -i {} --single -l 200 -s 20 -o {} {}".format(
    local_kallisto_index_file,
    local_folder_align_out,
    local_fastq_filenames[0])

from subprocess import Popen, PIPE, STDOUT
final = Popen("{}; {}".format(cmd1, cmd2), shell=True, stdin=PIPE,
              stdout=PIPE, stderr=STDOUT, close_fds=True)
stdout, nothing = final.communicate()
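
For comparison, here is a minimal sketch of the same call with a timeout and an explicit exit-code check, so that a hang or a failure on the worker surfaces as an exception instead of a silent stall. The helper names run_checked and run_kallisto and the one-hour timeout are assumptions for illustration only; the kallisto flags and the install path are the ones from the snippet above. It passes PATH through env rather than through a shell "export", and is not claimed to fix the problem.

```python
import os
import subprocess

# Install directory taken from the snippet above.
KALLISTO_DIR = "/opt/userowned/kallisto"


def run_checked(args, env=None, timeout_s=3600):
    """Run a command without a shell; return its combined stdout/stderr.

    Raises subprocess.TimeoutExpired if the command runs longer than
    timeout_s, and RuntimeError if it exits with a non-zero status.
    """
    proc = subprocess.run(
        args,
        env=env,
        stdin=subprocess.DEVNULL,   # the child gets no interactive stdin
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge stderr into stdout, as in the original
        timeout=timeout_s,
    )
    output = proc.stdout.decode("utf-8", errors="replace")
    if proc.returncode != 0:
        raise RuntimeError(
            "command {} exited with {}:\n{}".format(args, proc.returncode, output)
        )
    return output


def run_kallisto(index_file, out_dir, fastq_file, timeout_s=3600):
    """Hypothetical wrapper around the single-end kallisto call shown above."""
    env = dict(os.environ)
    # Same effect as "export PATH=$PATH:/opt/userowned/kallisto".
    env["PATH"] = env.get("PATH", "") + os.pathsep + KALLISTO_DIR
    args = [
        "kallisto", "quant", "-i", index_file, "--single",
        "-l", "200", "-s", "20", "-o", out_dir, fastq_file,
    ]
    return run_checked(args, env=env, timeout_s=timeout_s)
```

Inside the pipeline step this would be called as run_kallisto(local_kallisto_index_file, local_folder_align_out, local_fastq_filenames[0]), and the returned text could be logged to see what kallisto printed on the worker.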
The execution produces the following error:
{"work": 8708750263297725979, "timestamp": {"seconds": 1588400583, "nanos":
503702878}, "severity": "WARN", "message": "Operation ongoing for over
647.37 seconds in state process-msecs in step s2 . Current Traceback:\n
File \"/usr/local/lib/python3.5/runpy.py\", line 193, in
_run_module_as_main\n \"__main__\", mod_spec)\n File
\"/usr/local/lib/python3.5/runpy.py\", line 85, in _run_code\n exec(code,
run_globals)\n File
\"/usr/local/lib/python3.5/site-packages/dataflow_worker/start.py\", line
144, in <module>\n main()\n File
\"/usr/local/lib/python3.5/site-packages/dataflow_worker/start.py\", line
140, in main\n batchworker.BatchWorker(properties,
sdk_pipeline_options).run()\n File
\"/usr/local/lib/python3.5/site-packages/dataflow_worker/batchw
*The next log entry shows the pipeline execution being restarted:*
Worker started with properties: {'job_id':
'2020-05-01_23_08_31-17246501580113091346',
'service_path': 'https://dataflow.googleapis.com/', 'temp_gcs_directory': 'gs://unused',
'root_url': 'https://dataflow.googleapis.com',
'dataflow.worker.logging.location':
'/var/log/dataflow/python-dataflow-0-json.log', 'reporting_enabled':
'True', 'project_id': 'orielresearch-188115', 'worker_id':
'step-1-rna-expression-no--05012308-qrjn-harness-qlpw',
'local_staging_directory': '/var/opt/google/dataflow',
'sdk_pipeline_options':
'{"display_data":[{"key":"temp_location","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"gs://ort_execution_aux/tmp/step-1-rna-expression-no-fastqc-5054.1588399703.032773"},{"key":"beam_plugins","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"[\'apache_beam.io.filesystem.FileSystem\',
\'apache_beam.io.hadoopfilesystem.HadoopFileSystem\',
\'apache_beam.io.localfilesystem.LocalFileSystem\',
\'apache_beam.io.aws.s3filesystem.S3FileSystem\',
\'apache_beam.io.gcp.gcsfilesystem.GCSFileSystem\']"},{"key":"setup_file","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"./setup.py"},{"key":"staging_location","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"gs://ort_execution_aux/staging/step-1-rna-expression-no-fastqc-5054.1588399703.032773"},{"key":"labels","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"[\'goog-dataflow-notebook=2_20_0\']"},{"key":"experiments","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"[\'use_fastavro\']"},{"key":"runner","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"DataflowRunner"}],"options":{"artifact_port":0,"autoscalingAlgorithm":"NONE","beam_plugins":["apache_beam.io.filesystem.FileSystem","apache_beam.io.hadoopfilesystem.HadoopFileSystem","apache_beam.io.localfilesystem.LocalFileSystem","apache_beam.io.aws.s3filesystem.S3FileSystem","apache_beam.io.gcp.gcsfilesystem.GCSFileSystem"],"dataflowJobId":"2020-05-01_23_08_31-17246501580113091346","dataflow_endpoint":"
https://dataflow.googleapis.com
","direct_num_workers":1,"direct_runner_bundle_repeat":0,"direct_runner_use_stacked_bundle":true,"direct_running_mode":"in_memory","dry_run":false,"enable_streaming_engine":false,"environment_cache_millis":0,"expansion_port":0,"experiments":["use_fastavro"],"flink_master":"[auto]","flink_submit_uber_jar":false,"flink_version":"1.9","gcpTempLocation":"gs://ort_execution_aux/tmp","hdfs_full_urls":false,"job_port":0,"job_server_timeout":60,"labels":["goog-dataflow-notebook=2_20_0"],"maxNumWorkers":0,"no_auth":false,"numWorkers":3,"pipelineUrl":"gs://ort_execution_aux/staging/pipeline.pb","pipeline_type_check":true,"profile_cpu":false,"profile_memory":false,"profile_sample_rate":1,"project":"orielresearch-188115","runner":"DataflowRunner","runtime_type_check":false,"save_main_session":false,"sdk_location":"default","sdk_worker_parallelism":1,"setup_file":"./setup.py","spark_master_url":"local[4]","spark_submit_uber_jar":false,"staging_location":"gs://ort_execution_aux/staging/step-1-rna-expression-no-fastqc-5054.1588399703.032773","streaming":false,"temp_location":"gs://ort_execution_aux/tmp/step-1-rna-expression-no-fastqc-5054.1588399703.032773","type_check_strictness":"DEFAULT_TO_ANY","update":false}}'}
It keeps restarting like this for more than 30 minutes with no output.
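
Since there is no output at all, one thing that might help narrow this down is forwarding the subprocess output to the logs line by line while the command is still running, rather than collecting it only at the end with communicate(). The sketch below assumes that standard Python logging from the step ends up in the worker logs; run_and_log is a hypothetical helper name, not part of Beam.

```python
import logging
import subprocess


def run_and_log(args, env=None):
    """Run a command and log each output line as soon as it is produced.

    Returns (returncode, lines) so the caller can decide how to react.
    """
    proc = subprocess.Popen(
        args,
        env=env,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,    # progress and errors end up in one stream
        universal_newlines=True,     # decode bytes to text, line by line
    )
    lines = []
    for line in proc.stdout:
        line = line.rstrip("\n")
        logging.info("subprocess: %s", line)
        lines.append(line)
    proc.stdout.close()
    returncode = proc.wait()
    return returncode, lines
```

If the last logged line is always the same, that would at least show where the command stalls on the worker.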
Many thanks for any advice on how to move forward.
Best,
--
Eila
<http://www.orielresearch.com>
Meetup <https://www.meetup.com/Deep-Learning-In-Production/>