Hmm, having more than one DagFileProcessorManager alive at the same time does indicate something has gone wrong -- there should only be one of those.
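
One way to check for this condition is to count the lines of `ps` output that mention DagFileProcessorManager. The following is only a minimal sketch, assuming `ps aux`-style columns with the PID in the second field. `manager_pids` and `SAMPLE` are hypothetical names, not part of Airflow, and the sample rows are copied from the `ps` output quoted in this thread.

```python
from typing import List

# Process title fragment seen in the ps output quoted in this thread.
MARKER = "DagFileProcessorManager"


def manager_pids(ps_output: str) -> List[int]:
    """Return the PIDs of ps lines that mention the manager process."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Assumed layout: USER PID %CPU %MEM ... COMMAND.
        # Skip short lines, unrelated processes, and the grep command itself.
        if len(fields) < 2 or MARKER not in line or "grep" in fields:
            continue
        if fields[1].isdigit():
            pids.append(int(fields[1]))
    return pids


SAMPLE = """\
airflow 3409 0.2 0.1 523336 67384 ? S Oct24 115:06 airflow scheduler -- DagFileProcessorManager
airflow 25569 0.0 0.0 112716 968 pts/4 S+ 16:00 0:00 grep --color=auto scheduler
airflow 56771 0.0 0.1 662560 67264 ? S Nov26 4:09 airflow scheduler -- DagFileProcessorManager
"""

if __name__ == "__main__":
    pids = manager_pids(SAMPLE)
    if len(pids) > 1:
        print("more than one manager alive:", pids)
```

In practice the text would come from something like `ps aux` rather than a literal string. More than one PID in the result is the situation described above.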

Are you using sub-dags or doing anything else "unusual" in any of your dag files? What executor are you using? What process supervisor (if any) are you using to run your scheduler?

-ash

> On 9 Dec 2019, at 20:48, Reed Villanueva <[email protected]> wrote:
>
> Have problem where the airflow (v1.10.5) webserver will complain...
>
>     The scheduler does not appear to be running. Last heartbeat was received 45 minutes ago.
>
> But checking the scheduler daemon process (started via airflow scheduler -D)
> can see...
>
>     [airflow@airflowetl airflow]$ cat airflow-scheduler.pid
>     64186
>     [airflow@airflowetl airflow]$ ps -aux | grep 64186
>     airflow 64186 0.0 0.1 663340 67796 ? S 15:03 0:00 /usr/bin/python3 /home/airflow/.local/bin/airflow scheduler -D
>     airflow 94305 0.0 0.0 112716 964 pts/4 R+ 16:01 0:00 grep --color=auto 64186
>
> (and after some period of time the error message goes away again).
>
> This happens very frequently off-and-on even after restarting both the
> webserver and scheduler.
>
> The airflow-scheduler.err file is empty and the .out and .log files appear
> innocuous (need more time to look through deeper).
>
> Running the scheduler in the terminal to see the feed live, everything seems
> to run fine until I see this output in the middle of the dag execution
>
>     [2019-11-29 15:51:57,825] {__init__.py:51} INFO - Using executor SequentialExecutor
>     [2019-11-29 15:51:58,259] {dagbag.py:90} INFO - Filling up the DagBag from /home/airflow/airflow/dags/my_dag_file.py
>
> Once this pops up, I can see in the web UI that the scheduler heartbeat error
> message appears. (Oddly, killing the scheduler process here does not generate
> the heartbeat error message in the web UI). Checking for the scheduler
> process, I see...
>
>     [airflow@airflowetl airflow]$ ps -aux | grep scheduler
>     airflow 3409 0.2 0.1 523336 67384 ? S Oct24 115:06 airflow scheduler -- DagFileProcessorManager
>     airflow 25569 0.0 0.0 112716 968 pts/4 S+ 16:00 0:00 grep --color=auto scheduler
>     airflow 56771 0.0 0.1 662560 67264 ? S Nov26 4:09 airflow scheduler -- DagFileProcessorManager
>     airflow 64187 0.0 0.1 662564 67096 ? S Nov27 0:00 airflow scheduler -- DagFileProcessorManager
>     airflow 153959 0.1 0.1 662568 67232 ? S 15:01 0:06 airflow scheduler -- DagFileProcessorManager
>
> IDK if this is normal or not.
>
> Thought the problem may have been that there were older scheduler processes
> that were not deleted that were still running...
>
>     [airflow@airflowetl airflow]$ kill -9 3409 36771
>     bash: kill: (36771) - No such process
>     [airflow@airflowetl airflow]$ ps -aux | grep scheduler
>     airflow 56771 0.0 0.1 662560 67264 ? S Nov26 4:09 airflow scheduler -- DagFileProcessorManager
>     airflow 64187 0.0 0.1 662564 67096 ? S Nov27 0:00 airflow scheduler -- DagFileProcessorManager
>     airflow 153959 0.0 0.1 662568 67232 ? S Nov29 0:06 airflow scheduler -- DagFileProcessorManager
>     airflow 155741 0.0 0.0 112712 968 pts/2 R+ 15:54 0:00 grep --color=auto scheduler
>
> Notice all the various start times in the output.
>
> Doing a kill -9 56771 64187 ... and then rerunning airflow scheduler -D does
> not seem to have fixed the problem.
>
> Note: the scheduler seems to consistently stop running after a task fails to
> move a file from an FTP location to an HDFS one...
>
>     hadoop fs -Dfs.mapr.trace=debug -get \
>         ftp://$FTP_CLIENT:$FTP_PASS@$FTP_IP/$FTP_DIR"$TABLENAME.TSV" \
>         $PROJECT_HOME/tmp/"$TABLENAME.TSV" \
>         | hadoop fs -moveFromLocal $PROJECT_HOME/tmp/"$TABLENAME.TSV" "$DATASTORE"
>     # see https://stackoverflow.com/a/46433847/8236733
>
> Note there is a logic error in this line since $DATASTORE is an HDFS dir path,
> not a file path, but either way I don't think that the airflow scheduler
> should be missing heartbeats like this from something seemingly so unrelated.
>
> Anyone know what could be going on here or how to fix?
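
A side note on the quoted hadoop command: a shell pipeline starts both sides at the same time, so the `-moveFromLocal` step may begin before `-get` has finished writing the local file. Whether that has anything to do with the missed heartbeats is not established in this thread. As a rough sketch of the alternative (run each step to completion, stop at the first failure, and bound the time each step may take), something like the following could be used inside the task. `run_in_order` is a hypothetical helper, not an Airflow or Hadoop API, and the timeout value is an assumption.

```python
import subprocess
import sys
from typing import List, Sequence


def run_in_order(commands: Sequence[List[str]], timeout_s: float) -> int:
    """Run each command to completion, in order; stop at the first failure.

    Returns how many commands succeeded. A non-zero exit status or a
    command exceeding timeout_s seconds counts as a failure.
    """
    succeeded = 0
    for cmd in commands:
        try:
            # check=True raises CalledProcessError on a non-zero exit;
            # timeout raises TimeoutExpired after killing the child.
            subprocess.run(cmd, check=True, timeout=timeout_s)
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
            break
        succeeded += 1
    return succeeded


if __name__ == "__main__":
    # Stand-in commands; in the task these would be the two hadoop
    # invocations from the quoted message, each as its own argument list.
    steps = [
        [sys.executable, "-c", "pass"],
        [sys.executable, "-c", "pass"],
    ]
    print(run_in_order(steps, timeout_s=30))
```

With argument lists there is no shell, so variables such as `$TABLENAME` would have to be expanded in Python (for example via `os.environ`) before building each command.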
