Hmm, having more than one DagFileProcessorManager alive at the same time does 
indicate something has gone wrong -- there should only be one of those.
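
A quick way to keep an eye on this is to count the manager processes. Below is
a rough sketch, not anything that ships with Airflow. The helper names
(find_manager_pids, current_manager_pids) are made up for illustration. It
assumes a Linux box where `ps -eo pid,args` is available and that the process
title contains "DagFileProcessorManager", as it does in your ps output.

```python
import subprocess

# Assumption: the manager's process title contains this string, as seen in
# the quoted `ps` output ("airflow scheduler -- DagFileProcessorManager").
MARKER = "DagFileProcessorManager"


def find_manager_pids(ps_output):
    """Return the PIDs of lines in `ps -eo pid,args` output that mention MARKER."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip the header row and anything malformed
        pid, args = parts
        if MARKER in args and "grep" not in args:
            pids.append(int(pid))
    return pids


def current_manager_pids():
    """Run ps (hypothetical helper; assumes procps-style `ps`) and parse its output."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_manager_pids(result.stdout)
```

If current_manager_pids() returns more than one PID, the extra ones are most
likely left over from earlier scheduler runs.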

Are you using sub-dags or doing anything else "unusual" in any of your dag 
files?

What executor are you using? What process supervisor (if any) are you using to 
run your scheduler?

-ash

> On 9 Dec 2019, at 20:48, Reed Villanueva wrote:
> 
> Have problem where the airflow (v1.10.5) webserver will complain...
> 
> The scheduler does not appear to be running. Last heartbeat was received 45 
> minutes ago.
> But checking the scheduler daemon process (started via airflow scheduler -D) 
> can see...
> 
> [airflow@airflowetl airflow]$ cat airflow-scheduler.pid
> 64186
> [airflow@airflowetl airflow]$ ps -aux | grep 64186
> airflow   64186  0.0  0.1 663340 67796 ?        S    15:03   0:00 
> /usr/bin/python3 /home/airflow/.local/bin/airflow scheduler -D
> airflow   94305  0.0  0.0 112716   964 pts/4    R+   16:01   0:00 grep 
> --color=auto 64186
> and after some period of time the error message goes away again).
> 
> This happens very frequently off-and-on even after restarting both the 
> webserver and scheduler.
> 
> The airflow-scheduler.err file is empty and the .out and .log files appear 
> innocuous (need more time to look through deeper).
> 
> Running the scheduler in the terminal to see the feed live, everything seems 
> to run fine until I see this output in the middle of the dag execution
> 
> [2019-11-29 15:51:57,825] {__init__.py:51} INFO - Using executor 
> SequentialExecutor
> [2019-11-29 15:51:58,259] {dagbag.py:90} INFO - Filling up the DagBag from 
> /home/airflow/airflow/dags/my_dag_file.py
> Once this pops up, I can see in the web UI that the scheduler heartbeat error 
> message appears. (Oddly, killing the scheduler process here does not generate 
> the heartbeat error message in the web UI). Checking for the scheduler 
> process, I see...
> 
> [airflow@airflowetl airflow]$ ps -aux | grep scheduler
> airflow    3409  0.2  0.1 523336 67384 ?        S    Oct24 115:06 airflow 
> scheduler -- DagFileProcessorManager
> airflow   25569  0.0  0.0 112716   968 pts/4    S+   16:00   0:00 grep 
> --color=auto scheduler
> airflow   56771  0.0  0.1 662560 67264 ?        S    Nov26   4:09 airflow 
> scheduler -- DagFileProcessorManager
> airflow   64187  0.0  0.1 662564 67096 ?        S    Nov27   0:00 airflow 
> scheduler -- DagFileProcessorManager
> airflow  153959  0.1  0.1 662568 67232 ?        S    15:01   0:06 airflow 
> scheduler -- DagFileProcessorManager
> IDK if this is normal or not.
> 
> Thought the problem may have been that there were older scheduler processes 
> that were not deleted that were still running...
> 
> [airflow@airflowetl airflow]$ kill -9 3409 36771
> bash: kill: (36771) - No such process
> [airflow@airflowetl airflow]$ ps -aux | grep scheduler
> airflow   56771  0.0  0.1 662560 67264 ?        S    Nov26   4:09 airflow 
> scheduler -- DagFileProcessorManager
> airflow   64187  0.0  0.1 662564 67096 ?        S    Nov27   0:00 airflow 
> scheduler -- DagFileProcessorManager
> airflow  153959  0.0  0.1 662568 67232 ?        S    Nov29   0:06 airflow 
> scheduler -- DagFileProcessorManager
> airflow  155741  0.0  0.0 112712   968 pts/2    R+   15:54   0:00 grep 
> --color=auto scheduler
> Notice all the various start times in the output.
> 
> Doing a kill -9 56771 64187 ... and then rerunning airflow scheduler -D does 
> not seem to have fixed the problem.
> 
> Note: the scheduler seems to consistently stop running after a task fails to 
> move a file from an FTP location to an HDFS one...
> 
> hadoop fs -Dfs.mapr.trace=debug -get \
>         ftp://$FTP_CLIENT:$FTP_PASS@$FTP_IP/$FTP_DIR"$TABLENAME.TSV"; \
>         $PROJECT_HOME/tmp/"$TABLENAME.TSV" \
>         | hadoop fs -moveFromLocal $PROJECT_HOME/tmp/"$TABLENAME.TSV" 
> "$DATASTORE"
> # see https://stackoverflow.com/a/46433847/8236733
> Note there is a logic error in this line since $DATASTORE is a hdfs dir path, 
> not a file path, but either way I don't think that the airflow scheduler 
> should be missing heartbeats like this from something seemingly so unrelated.
> 
> Anyone know what could be going on here or how to fix?
