Yes, every task runs in process isolation (and tasks could be running on
separate machines), so every task builds the DAG from scratch.
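Here is a toy model of why this happens. Each "task" below is a fresh Python interpreter that executes a stand-in DAG file, so the file's top-level code runs once per task. It only illustrates the process-isolation point. It is not how Airflow's task runner is implemented, and the file contents and function name are made up for illustration.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for a DAG file: its top-level code runs every time it is executed.
DAG_FILE_SOURCE = 'print("building DAG")\n'


def run_tasks_in_fresh_processes(task_ids):
    """Toy model: run each 'task' in its own interpreter, which re-executes the DAG file."""
    outputs = {}
    with tempfile.TemporaryDirectory() as workdir:
        dag_path = os.path.join(workdir, "mydag.py")
        with open(dag_path, "w") as f:
            f.write(DAG_FILE_SOURCE)
        for task_id in task_ids:
            # A new process shares no memory with the previous one, so nothing built
            # earlier is reused: the module-level code runs again from scratch.
            result = subprocess.run(
                [sys.executable, dag_path],
                capture_output=True,
                text=True,
                check=True,
            )
            outputs[task_id] = result.stdout
    return outputs


for task_id, out in run_tasks_in_fresh_processes(["precheck_db_perms", "mytask"]).items():
    print(task_id, "->", out.strip())
```

The build cost is paid once per task process, so it scales with the number of tasks rather than once per DAG run.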

If you don’t expect your DAG to change over some period of time, and the tasks
run on the same machine, you could cache / pickle the DAG object: before
building the DAG, check whether the cache / pickle file is available and recent,
and if so load it from there. I am sure there are many other solutions, too.
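A minimal sketch of that idea follows, in Python since DAG files are Python. It assumes the expensive part of building the DAG can be wrapped in a function whose result pickles cleanly. Whether a whole DAG object does depends on what it holds (the standard pickle module cannot serialize lambdas, for example), so caching the slow-to-fetch data the DAG is built from may be the easier variant. `load_or_build`, `fetch_task_spec`, and the cache path are hypothetical names, not Airflow APIs.

```python
import os
import pickle
import tempfile
import time
from typing import Any, Callable


def load_or_build(cache_path: str, max_age_seconds: float, build: Callable[[], Any]) -> Any:
    """Return the pickled object at cache_path if it is younger than max_age_seconds;
    otherwise call build(), pickle the result to cache_path, and return it."""
    try:
        age = time.time() - os.path.getmtime(cache_path)
        if age < max_age_seconds:
            with open(cache_path, "rb") as f:
                return pickle.load(f)
    except (OSError, EOFError, pickle.UnpicklingError):
        # Missing, unreadable, or truncated cache file: fall through and rebuild.
        pass

    obj = build()
    # Write to a temporary file first, then rename it into place (os.replace is
    # atomic on POSIX), so a concurrent reader does not see a half-written pickle.
    tmp_path = f"{cache_path}.{os.getpid()}.tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(obj, f)
    os.replace(tmp_path, cache_path)
    return obj


def fetch_task_spec() -> dict:
    # Hypothetical stand-in for the slow part of building the DAG
    # (the question mentions ~1-3 min); here it just returns static data.
    return {"task_ids": ["precheck_db_perms", "mytask"]}


# Hypothetical cache location; it must be on the machine where the tasks run.
CACHE_PATH = os.path.join(tempfile.gettempdir(), "mydag_spec.pkl")
spec = load_or_build(CACHE_PATH, max_age_seconds=3600, build=fetch_task_spec)
print(spec["task_ids"])
```

In a DAG file, the operators would then be created from `spec` as usual, and only the slow part is skipped while the cache is fresh. `max_age_seconds` bounds how stale the DAG can get, which is why this only suits a DAG that is not expected to change within that window.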

Damian

From: Reed Villanueva <[email protected]>
Sent: Thursday, January 30, 2020 00:14
To: [email protected]
Subject: How often is dag definition file read during a single dag run?


How often is a dag definition file read during a single dag run?

I have a large dag that takes a long time to build (~1-3 min). Looking at the
logs of each task as the dag is running, it appears that the dag definition
file is being executed for every task before it runs...

*** Reading local file: /home/airflow/airflow/logs/mydag/mytask/2020-01-30T04:51:34.621883+00:00/1.log
[2020-01-29 19:02:10,844] {taskinstance.py:655} INFO - Dependencies all met for <TaskInstance: mydag.mytask 2020-01-30T04:51:34.621883+00:00 [queued]>
[2020-01-29 19:02:10,866] {taskinstance.py:655} INFO - Dependencies all met for <TaskInstance: mydag.mytask 2020-01-30T04:51:34.621883+00:00 [queued]>
[2020-01-29 19:02:10,866] {taskinstance.py:866} INFO -
--------------------------------------------------------------------------------
[2020-01-29 19:02:10,866] {taskinstance.py:867} INFO - Starting attempt 1 of 1
[2020-01-29 19:02:10,866] {taskinstance.py:868} INFO -
--------------------------------------------------------------------------------
[2020-01-29 19:02:10,883] {taskinstance.py:887} INFO - Executing <Task(BashOperator): precheck_db_perms> on 2020-01-30T04:51:34.621883+00:00
[2020-01-29 19:02:10,887] {standard_task_runner.py:52} INFO - Started process 140570 to run task
[2020-01-29 19:02:11,048] {logging_mixin.py:112} INFO - [2020-01-29 19:02:11,047] {dagbag.py:403} INFO - Filling up the DagBag from /home/airflow/airflow/dags/mydag.py
[2020-01-29 19:02:11,052] {logging_mixin.py:112} INFO - <output from my dag definition file>
[2020-01-29 19:02:11,101] {logging_mixin.py:112} INFO - <more output from my dag definition file>
....
....
....
[2020-01-29 19:02:58,651] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: mydag.mytask 2020-01-30T04:51:34.621883+00:00 [running]> airflowetl.co.local
[2020-01-29 19:02:58,674] {bash_operator.py:81} INFO - Tmp dir root location:
 /tmp
[2020-01-29 19:02:58,674] {bash_operator.py:91} INFO - Exporting the following env vars:
[email protected]
AIRFLOW_CTX_DAG_OWNER=me
AIRFLOW_CTX_DAG_ID=mydag
AIRFLOW_CTX_TASK_ID=mytask
AIRFLOW_CTX_EXECUTION_DATE=2020-01-30T04:51:34.621883+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2020-01-30T04:51:34.621883+00:00
[2020-01-29 19:02:58,675] {bash_operator.py:105} INFO - Temporary script location: /tmp/airflowtmphwu1ckty/mytaskbmnsizw5

<only now does the actual task logic output seem to start>

The first part of the log seems to imply that the dag file is being run each
time a new task is run (I see this for every task).

Is this indeed what is happening here? Is this normal / expected behavior?
Since my dag takes some time to build, this would mean that the build time is
multiplied across every task in the dag (of which there are many in this case),
which makes me think that either this is not normal or there is some best
practice I am not following. Could anyone with more airflow experience help
explain what I'm seeing here?
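As a rough check of that per-task cost, the two log lines that bracket the DAG file parse can be diffed. The sketch below hard-codes those two lines from the log quoted above and assumes the `[YYYY-MM-DD HH:MM:SS,mmm]` prefix they carry. The helper names are made up for illustration.

```python
import re
from datetime import datetime

# Two lines copied from the task log quoted above.
LOG_LINES = [
    "[2020-01-29 19:02:11,048] {logging_mixin.py:112} INFO - [2020-01-29 19:02:11,047] "
    "{dagbag.py:403} INFO - Filling up the DagBag from /home/airflow/airflow/dags/mydag.py",
    "[2020-01-29 19:02:58,651] {logging_mixin.py:112} INFO - Running %s on host %s "
    "<TaskInstance: mydag.mytask 2020-01-30T04:51:34.621883+00:00 [running]> airflowetl.co.local",
]

# Timestamp prefix as it appears in the log: [YYYY-MM-DD HH:MM:SS,mmm]
TS_PATTERN = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]")


def log_timestamp(line: str) -> datetime:
    match = TS_PATTERN.match(line)
    if match is None:
        raise ValueError(f"no timestamp prefix in: {line!r}")
    # %f accepts the 3-digit millisecond field and pads it to microseconds.
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")


def parse_overhead_seconds(lines: list[str]) -> float:
    """Seconds from the 'Filling up the DagBag' line to the 'Running ... on host' line."""
    start = next(log_timestamp(line) for line in lines if "Filling up the DagBag" in line)
    end = next(log_timestamp(line) for line in lines if "on host" in line)
    return (end - start).total_seconds()


overhead = parse_overhead_seconds(LOG_LINES)
print(f"DAG file parse before task start: {overhead:.1f} s")  # prints 47.6 s
```

So this task spent roughly 48 seconds re-parsing the DAG file before its own logic started. Summed over every task in a run, that is the multiplication described above, although tasks running in parallel overlap in wall-clock time.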
