The "ongoing" message is there to help you diagnose where the long waits occur in your pipeline. Usually they indicate a stuck thread, but there are use cases where the message is a red herring, because the user is intentionally processing a single element for several minutes.
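If the subprocess itself is the thing that is stuck, the usual first step is to capture everything it writes along with its exit code. A minimal sketch (the command string and the `run_and_log` helper are illustrative, not from your pipeline; the `Popen` arguments mirror the ones in the quoted code below):

```python
import logging
import subprocess

logging.basicConfig(level=logging.INFO)

def run_and_log(cmd):
    """Run a shell command, logging combined stdout/stderr and the exit code."""
    proc = subprocess.Popen(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout so nothing is lost
    )
    out, _ = proc.communicate()
    logging.info("command %r exited with code %d", cmd, proc.returncode)
    logging.info("output:\n%s", out.decode("utf-8", errors="replace"))
    return proc.returncode, out

# Illustrative command; substitute the actual kallisto invocation.
rc, out = run_and_log("echo starting; exit 0")
```

On Dataflow, anything sent through `logging` shows up in the worker logs, so a non-zero exit code or an error message from kallisto (e.g. a missing index file on the worker) becomes visible instead of the process just appearing to hang.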
Have you been able to log the output of stdout/stderr and any exit codes returned by the kallisto application?

On Sat, May 2, 2020 at 12:22 AM OrielResearch Eila Arich-Landkof <[email protected]> wrote:

> Hi all,
>
> I would appreciate any help on that issue. I have no idea where to start
> debugging it. On a local machine, it is working fine. The challenge is to
> have it working on a worker machine.
>
> There is an option to run the command below (kallisto) using multiple
> threads with -t [# threads]. This option didn't work at all, so I tried the
> no-threads version (below). After the error below, the pipeline iterates,
> fails at the same point, and iterates again...
>
> *Environment:*
> SDK version: Apache Beam Python 3.5 SDK 2.20.0
>
> *The code:*
>
> cmd1 = "export PATH=$PATH:/opt/userowned/kallisto"
> cmd2 = "kallisto quant -i {} --single -l 200 -s 20 -o {} {}".format(
>     local_kallisto_index_file, local_folder_align_out, local_fastq_filenames[0])
>
> from subprocess import Popen, PIPE, STDOUT
> final = Popen("{}; {}".format(cmd1, cmd2), shell=True, stdin=PIPE,
>               stdout=PIPE, stderr=STDOUT, close_fds=True)
> stdout, nothing = final.communicate()
>
> The execution fires the following error:
>
> {"work": 8708750263297725979, "timestamp": {"seconds": 1588400583,
> "nanos": 503702878}, "severity": "WARN", "message": "Operation ongoing for
> over 647.37 seconds in state process-msecs in step s2 .
> Current Traceback:
>   File "/usr/local/lib/python3.5/runpy.py", line 193, in _run_module_as_main
>     "__main__", mod_spec)
>   File "/usr/local/lib/python3.5/runpy.py", line 85, in _run_code
>     exec(code, run_globals)
>   File "/usr/local/lib/python3.5/site-packages/dataflow_worker/start.py", line 144, in <module>
>     main()
>   File "/usr/local/lib/python3.5/site-packages/dataflow_worker/start.py", line 140, in main
>     batchworker.BatchWorker(properties, sdk_pipeline_options).run()
>   File "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchw
>
> *The next log is reinitiating the pipeline execution:*
>
> Worker started with properties: {'job_id': '2020-05-01_23_08_31-17246501580113091346',
> 'service_path': 'https://dataflow.googleapis.com/', 'temp_gcs_directory': 'gs://unused',
> 'root_url': 'https://dataflow.googleapis.com',
> 'dataflow.worker.logging.location': '/var/log/dataflow/python-dataflow-0-json.log',
> 'reporting_enabled': 'True', 'project_id': 'orielresearch-188115',
> 'worker_id': 'step-1-rna-expression-no--05012308-qrjn-harness-qlpw',
> 'local_staging_directory': '/var/opt/google/dataflow',
> 'sdk_pipeline_options':
> '{"display_data":[{"key":"temp_location","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"gs://ort_execution_aux/tmp/step-1-rna-expression-no-fastqc-5054.1588399703.032773"},
> {"key":"beam_plugins","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"[\'apache_beam.io.filesystem.FileSystem\',
> \'apache_beam.io.hadoopfilesystem.HadoopFileSystem\',
> \'apache_beam.io.localfilesystem.LocalFileSystem\',
> \'apache_beam.io.aws.s3filesystem.S3FileSystem\',
> \'apache_beam.io.gcp.gcsfilesystem.GCSFileSystem\']"},
> {"key":"setup_file","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"./setup.py"},
> {"key":"staging_location","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"gs://ort_execution_aux/staging/step-1-rna-expression-no-fastqc-5054.1588399703.032773"},
> {"key":"labels","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"[\'goog-dataflow-notebook=2_20_0\']"},
> {"key":"experiments","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"[\'use_fastavro\']"},
> {"key":"runner","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"DataflowRunner"}],
> "options":{"artifact_port":0,"autoscalingAlgorithm":"NONE","beam_plugins":["apache_beam.io.filesystem.FileSystem","apache_beam.io.hadoopfilesystem.HadoopFileSystem","apache_beam.io.localfilesystem.LocalFileSystem","apache_beam.io.aws.s3filesystem.S3FileSystem","apache_beam.io.gcp.gcsfilesystem.GCSFileSystem"],
> "dataflowJobId":"2020-05-01_23_08_31-17246501580113091346","dataflow_endpoint":"https://dataflow.googleapis.com",
> "direct_num_workers":1,"direct_runner_bundle_repeat":0,"direct_runner_use_stacked_bundle":true,"direct_running_mode":"in_memory","dry_run":false,
> "enable_streaming_engine":false,"environment_cache_millis":0,"expansion_port":0,"experiments":["use_fastavro"],"flink_master":"[auto]","flink_submit_uber_jar":false,"flink_version":"1.9",
> "gcpTempLocation":"gs://ort_execution_aux/tmp","hdfs_full_urls":false,"job_port":0,"job_server_timeout":60,"labels":["goog-dataflow-notebook=2_20_0"],
> "maxNumWorkers":0,"no_auth":false,"numWorkers":3,"pipelineUrl":"gs://ort_execution_aux/staging/pipeline.pb","pipeline_type_check":true,
> "profile_cpu":false,"profile_memory":false,"profile_sample_rate":1,"project":"orielresearch-188115","runner":"DataflowRunner","runtime_type_check":false,
> "save_main_session":false,"sdk_location":"default","sdk_worker_parallelism":1,"setup_file":"./setup.py","spark_master_url":"local[4]","spark_submit_uber_jar":false,
> "staging_location":"gs://ort_execution_aux/staging/step-1-rna-expression-no-fastqc-5054.1588399703.032773","streaming":false,
> "temp_location":"gs://ort_execution_aux/tmp/step-1-rna-expression-no-fastqc-5054.1588399703.032773","type_check_strictness":"DEFAULT_TO_ANY","update":false}}'}
>
> It is reiterating like that for more than 30 minutes with no output.
> Many thanks for any advice on how to move forward.
>
> Best,
> --
> Eila
> <http://www.orielresearch.com>
> Meetup <https://www.meetup.com/Deep-Learning-In-Production/>
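Since the worker just loops on the same element, it may also help to bound the subprocess with a timeout so a hung kallisto run fails fast instead of tripping the "operation ongoing" watchdog indefinitely. A sketch, assuming the worker's Python supports `communicate(timeout=...)` (available since Python 3.3; the `run_with_timeout` name is illustrative):

```python
import subprocess

def run_with_timeout(cmd, timeout_s):
    """Run a shell command, killing it and raising if it exceeds timeout_s."""
    proc = subprocess.Popen(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    try:
        out, _ = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # don't leave the worker thread blocked on a dead child
        proc.communicate()  # reap the killed process
        raise RuntimeError(
            "command timed out after {}s: {!r}".format(timeout_s, cmd))
    if proc.returncode != 0:
        raise RuntimeError("command failed (exit {}): {}".format(
            proc.returncode, out.decode("utf-8", errors="replace")))
    return out
```

On timeout or failure the raised exception carries the command and its output into the worker logs, which is much easier to act on than the generic stuck-thread warning above.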
