I have a problem where the Airflow (v1.10.5) webserver will complain...
The scheduler does not appear to be running. Last heartbeat was received 45
minutes ago.
But checking the scheduler daemon process (started via airflow scheduler -D),
I can see...
[airflow@airflowetl airflow]$ cat airflow-scheduler.pid
64186
[airflow@airflowetl airflow]$ ps -aux | grep 64186
airflow   64186  0.0  0.1 663340 67796 ?      S   15:03  0:00 /usr/bin/python3 /home/airflow/.local/bin/airflow scheduler -D
airflow   94305  0.0  0.0 112716   964 pts/4  R+  16:01  0:00 grep --color=auto 64186
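For reference, my understanding is that the webserver derives that banner from
the scheduler's latest heartbeat row in the metadata DB, so the pidfile view
can be cross-checked against what the webserver actually sees. A rough sketch
of that check, assuming the 1.10 schema's job table and the default SQLite
metadata DB at ~/airflow/airflow.db (both assumptions about the setup):

# is the PID recorded in the pidfile actually alive?
kill -0 "$(cat airflow-scheduler.pid)" && echo "pid alive" || echo "pid dead"
# what heartbeat would the webserver see? (assumes scheduler heartbeats land
# in the `job` table, as in the 1.10 metadata schema)
sqlite3 ~/airflow/airflow.db "SELECT state, latest_heartbeat FROM job
  WHERE job_type = 'SchedulerJob' ORDER BY latest_heartbeat DESC LIMIT 1;"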
After some period of time, the error message *goes away again*, then comes
back. This happens off and on very frequently, even after restarting both the
webserver and the scheduler.
The airflow-scheduler.err file is empty, and the .out and .log files appear
innocuous (I need more time to look through them in depth).
Running the scheduler in the terminal to watch its output live, everything
seems to run fine until I see this in the middle of the DAG execution:
[2019-11-29 15:51:57,825] {__init__.py:51} INFO - Using executor SequentialExecutor
[2019-11-29 15:51:58,259] {dagbag.py:90} INFO - Filling up the DagBag from /home/airflow/airflow/dags/my_dag_file.py
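Side note: the executor named there is whatever the [core] section of
airflow.cfg sets; SequentialExecutor is the 1.10 default when nothing else is
configured. A quick way to confirm what the scheduler is picking up, assuming
the config lives at the default path ~/airflow/airflow.cfg:

grep -E '^(executor|sql_alchemy_conn)' ~/airflow/airflow.cfg
# on a default setup this should print something like: executor = SequentialExecutor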
Once this log output appears, I can see the scheduler-heartbeat error message
show up in the web UI. (Oddly, killing the scheduler process at this point
does *not* generate the heartbeat error message in the web UI.) Checking for
the scheduler process, I see...
[airflow@airflowetl airflow]$ ps -aux | grep scheduler
airflow    3409  0.2  0.1 523336 67384 ?      S   Oct24  115:06 airflow scheduler -- DagFileProcessorManager
airflow   25569  0.0  0.0 112716   968 pts/4  S+  16:00  0:00 grep --color=auto scheduler
airflow   56771  0.0  0.1 662560 67264 ?      S   Nov26  4:09 airflow scheduler -- DagFileProcessorManager
airflow   64187  0.0  0.1 662564 67096 ?      S   Nov27  0:00 airflow scheduler -- DagFileProcessorManager
airflow  153959  0.1  0.1 662568 67232 ?      S   15:01  0:06 airflow scheduler -- DagFileProcessorManager
IDK if this is normal or not.
I thought the problem may have been that there were older scheduler processes
still running that had never been cleaned up...
[airflow@airflowetl airflow]$ kill -9 3409 36771
bash: kill: (36771) - No such process
[airflow@airflowetl airflow]$ ps -aux | grep scheduler
airflow   56771  0.0  0.1 662560 67264 ?      S   Nov26  4:09 airflow scheduler -- DagFileProcessorManager
airflow   64187  0.0  0.1 662564 67096 ?      S   Nov27  0:00 airflow scheduler -- DagFileProcessorManager
airflow  153959  0.0  0.1 662568 67232 ?      S   Nov29  0:06 airflow scheduler -- DagFileProcessorManager
airflow  155741  0.0  0.0 112712   968 pts/2  R+  15:54  0:00 grep --color=auto scheduler
Notice all the various start times in the output.
Doing a kill -9 56771 64187 ... and then rerunning airflow scheduler -D does
not seem to have fixed the problem.
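For the next attempt, something like the following should at least clear the
slate before rerunning; a sketch, assuming pkill is available and the default
~/airflow paths:

# stop the daemon via its pidfile, then sweep any leftover scheduler /
# DagFileProcessorManager processes before starting fresh
kill "$(cat ~/airflow/airflow-scheduler.pid)" 2>/dev/null
sleep 5
pkill -9 -f 'airflow scheduler'   # -f matches the full command line, so it catches the manager processes too
rm -f ~/airflow/airflow-scheduler.pid
airflow scheduler -D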
Note: the scheduler seems to consistently stop running after a task fails
to move a file from an FTP location to an HDFS one...
hadoop fs -Dfs.mapr.trace=debug -get \
    ftp://$FTP_CLIENT:$FTP_PASS@$FTP_IP/$FTP_DIR"$TABLENAME.TSV" \
    $PROJECT_HOME/tmp/"$TABLENAME.TSV" \
  | hadoop fs -moveFromLocal $PROJECT_HOME/tmp/"$TABLENAME.TSV" "$DATASTORE"
# see https://stackoverflow.com/a/46433847/8236733
Note there *is* a logic error in this line, since $DATASTORE is an HDFS
directory path, not a file path, but either way I don't think the Airflow
scheduler should be missing heartbeats because of something seemingly so
unrelated.
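For completeness, a corrected sketch of what that step is meant to do, with
the two commands sequenced instead of piped and an explicit destination
filename to address the dir-vs-file mismatch mentioned above (same variable
names as the original; untested):

# fetch from FTP to a local tmp file, and only on success move it into HDFS
hadoop fs -Dfs.mapr.trace=debug -get \
    ftp://$FTP_CLIENT:$FTP_PASS@$FTP_IP/$FTP_DIR"$TABLENAME.TSV" \
    $PROJECT_HOME/tmp/"$TABLENAME.TSV" \
  && hadoop fs -moveFromLocal $PROJECT_HOME/tmp/"$TABLENAME.TSV" \
       "$DATASTORE"/"$TABLENAME.TSV"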
Does anyone know what could be going on here, or how to fix it?