Take this with a grain of salt, since I'm a novice with Airflow, but my 
understanding is that best practice is to have the scheduler restart every so 
often (command line: -r <seconds>, or config: scheduler.run_duration = 
<seconds>). Kill all the scheduler processes and try setting that to something 
low; then, if the problem goes away, increase it to a day or so.
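
For illustration, a restart loop along those lines might look like the sketch 
below. It assumes Airflow 1.10.x, where airflow scheduler accepts -r (run 
duration in seconds) and exits once that time is up, so something has to start 
it again. RUN_DURATION, MAX_CYCLES, RESTART_PAUSE and run_scheduler_cycles are 
hypothetical names, and 300 seconds is only a guess at a "low" starting value.

```shell
#!/usr/bin/env bash
# Sketch only: run the scheduler in short cycles so a wedged process is replaced.
# Assumes Airflow 1.10.x, where `airflow scheduler` accepts -r/--run-duration.

RUN_DURATION="${RUN_DURATION:-300}"    # seconds per cycle (hypothetical default)
MAX_CYCLES="${MAX_CYCLES:-0}"          # 0 means loop forever (hypothetical knob)
RESTART_PAUSE="${RESTART_PAUSE:-5}"    # seconds to wait between cycles

run_scheduler_cycles() {
    local cycle=0
    # Loop forever when MAX_CYCLES is 0, otherwise stop after MAX_CYCLES runs.
    while [ "$MAX_CYCLES" -eq 0 ] || [ "$cycle" -lt "$MAX_CYCLES" ]; do
        # Runs in the foreground and returns once the run duration has elapsed.
        airflow scheduler -r "$RUN_DURATION"
        cycle=$((cycle + 1))
        sleep "$RESTART_PAUSE"
    done
}
```

Calling run_scheduler_cycles at the end of the script keeps a fresh scheduler 
running. A process supervisor configured to restart the scheduler whenever it 
exits would serve the same purpose.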

From: Reed Villanueva <[email protected]>
Sent: Monday, December 9, 2019 3:48 PM
To: [email protected]
Subject: Airflow scheduler complains no heartbeat when running daemon


I have a problem where the Airflow (v1.10.5) webserver will complain...

The scheduler does not appear to be running. Last heartbeat was received 45 
minutes ago.

But checking the scheduler daemon process (started via airflow scheduler -D), 
I can see...

[airflow@airflowetl airflow]$ cat airflow-scheduler.pid
64186
[airflow@airflowetl airflow]$ ps -aux | grep 64186
airflow   64186  0.0  0.1 663340 67796 ?        S    15:03   0:00 /usr/bin/python3 /home/airflow/.local/bin/airflow scheduler -D
airflow   94305  0.0  0.0 112716   964 pts/4    R+   16:01   0:00 grep --color=auto 64186

(After some period of time, the error message goes away again.)

This happens very frequently off-and-on even after restarting both the 
webserver and scheduler.

The airflow-scheduler.err file is empty and the .out and .log files appear 
innocuous (need more time to look through deeper).

Running the scheduler in the terminal to see the feed live, everything seems to 
run fine until I see this output in the middle of the dag execution

[2019-11-29 15:51:57,825] {__init__.py:51} INFO - Using executor SequentialExecutor
[2019-11-29 15:51:58,259] {dagbag.py:90} INFO - Filling up the DagBag from /home/airflow/airflow/dags/my_dag_file.py

Once this pops up, I can see in the web UI that the scheduler heartbeat error 
message appears. (Oddly, killing the scheduler process here does not generate 
the heartbeat error message in the web UI). Checking for the scheduler process, 
I see...

[airflow@airflowetl airflow]$ ps -aux | grep scheduler
airflow    3409  0.2  0.1 523336 67384 ?        S    Oct24 115:06 airflow scheduler -- DagFileProcessorManager
airflow   25569  0.0  0.0 112716   968 pts/4    S+   16:00   0:00 grep --color=auto scheduler
airflow   56771  0.0  0.1 662560 67264 ?        S    Nov26   4:09 airflow scheduler -- DagFileProcessorManager
airflow   64187  0.0  0.1 662564 67096 ?        S    Nov27   0:00 airflow scheduler -- DagFileProcessorManager
airflow  153959  0.1  0.1 662568 67232 ?        S    15:01   0:06 airflow scheduler -- DagFileProcessorManager

IDK if this is normal or not.

I thought the problem might be that older scheduler processes had not been 
killed and were still running...

[airflow@airflowetl airflow]$ kill -9 3409 36771
bash: kill: (36771) - No such process
[airflow@airflowetl airflow]$ ps -aux | grep scheduler
airflow   56771  0.0  0.1 662560 67264 ?        S    Nov26   4:09 airflow scheduler -- DagFileProcessorManager
airflow   64187  0.0  0.1 662564 67096 ?        S    Nov27   0:00 airflow scheduler -- DagFileProcessorManager
airflow  153959  0.0  0.1 662568 67232 ?        S    Nov29   0:06 airflow scheduler -- DagFileProcessorManager
airflow  155741  0.0  0.0 112712   968 pts/2    R+   15:54   0:00 grep --color=auto scheduler

Notice all the various start times in the output.

Doing a kill -9 56771 64187 ... and then rerunning airflow scheduler -D does 
not seem to have fixed the problem.
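
One way to check for leftover schedulers without reading ps output by eye is 
sketched below. It is only an illustration: list_other_schedulers is a 
hypothetical helper, and it assumes that the live scheduler's PID is the one in 
airflow-scheduler.pid and that its DagFileProcessorManager child reports that 
PID as its parent. Anything it prints should be reviewed before being killed.

```shell
#!/usr/bin/env bash
# Sketch only: print PIDs of "airflow scheduler" processes that are neither the
# scheduler recorded in the pidfile nor a direct child of it.
# Reads `ps -eo pid,ppid,args` output on stdin; $1 is the PID to keep.

list_other_schedulers() {
    local keep_pid="${1:-}"
    # [a]irflow matches "airflow" but keeps this awk command line itself
    # from matching when it shows up in the ps listing.
    awk -v keep="$keep_pid" '
        $1 ~ /^[0-9]+$/ && /[a]irflow scheduler/ && $1 != keep && $2 != keep {
            print $1
        }
    '
}
```

Usage, from the directory holding the pidfile:

    ps -eo pid,ppid,args | list_other_schedulers "$(cat airflow-scheduler.pid)"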

Note: the scheduler seems to consistently stop running after a task fails to 
move a file from an FTP location to an HDFS one...

hadoop fs -Dfs.mapr.trace=debug -get \
        ftp://$FTP_CLIENT:$FTP_PASS@$FTP_IP/$FTP_DIR"$TABLENAME.TSV"; \
        $PROJECT_HOME/tmp/"$TABLENAME.TSV" \
        | hadoop fs -moveFromLocal $PROJECT_HOME/tmp/"$TABLENAME.TSV" "$DATASTORE"

# see https://stackoverflow.com/a/46433847/8236733

Note there is a logic error in this line, since $DATASTORE is an HDFS directory 
path, not a file path, but either way I don't think the Airflow scheduler should 
be missing heartbeats like this because of something seemingly so unrelated.

Does anyone know what could be going on here or how to fix it?
