[jira] [Commented] (AIRFLOW-3347) Unable to configure Kubernetes secrets through environment
[ https://issues.apache.org/jira/browse/AIRFLOW-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687005#comment-16687005 ]

Chris Bandy commented on AIRFLOW-3347:
--------------------------------------

I was able to track this down. The following in kubernetes_executor was always returning an empty OrderedDict():

{code:python}
self.kube_secrets = configuration_dict.get('kubernetes_secrets', {})
{code}

The following line in AirflowConfigParser was not prepared for double underscores within a section or key name. Limiting the split seemed to do the trick:

{code:python}
diff --git a/airflow/configuration.py b/airflow/configuration.py
index 2e05fde0..4c923b80 100644
--- a/airflow/configuration.py
+++ b/airflow/configuration.py
@@ -358,7 +358,7 @@ class AirflowConfigParser(ConfigParser):
         # add env vars and overwrite because they have priority
         for ev in [ev for ev in os.environ if ev.startswith('AIRFLOW__')]:
             try:
-                _, section, key = ev.split('__')
+                _, section, key = ev.split('__', 2)
                 opt = self._get_env_var_option(section, key)
             except ValueError:
                 opt = None
{code}

> Unable to configure Kubernetes secrets through environment
> ----------------------------------------------------------
>
> Key: AIRFLOW-3347
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3347
> Project: Apache Airflow
> Issue Type: Bug
> Components: configuration, kubernetes
> Affects Versions: 1.10.0
> Reporter: Chris Bandy
> Priority: Major
>
> We configure Airflow through environment variables. While setting up the
> Kubernetes Executor, we wanted to pass the SQL Alchemy connection string to
> workers by including it in the {{kubernetes_secrets}} section of config.
> Unfortunately, even with
> {{AIRFLOW_\_KUBERNETES_SECRETS_\_AIRFLOW_\_CORE_\_SQL_ALCHEMY_CONN}} set in
> the scheduler environment, the worker gets no secret environment variables.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
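The failure mode can be reproduced without Airflow at all (a minimal sketch using only str.split): an unbounded split on a variable whose key itself contains double underscores produces more than three parts, so the three-way tuple unpacking raises ValueError and the option is silently discarded; a maxsplit of 2 keeps everything after the section name together.

```python
# The variable name comes from the issue description above.
ev = 'AIRFLOW__KUBERNETES_SECRETS__AIRFLOW__CORE__SQL_ALCHEMY_CONN'

try:
    _, section, key = ev.split('__')        # old behavior: 5 parts, unpack fails
except ValueError as exc:
    print('unbounded split fails:', exc)

_, section, key = ev.split('__', 2)         # patched behavior: at most 3 parts
print(section)                              # KUBERNETES_SECRETS
print(key)                                  # AIRFLOW__CORE__SQL_ALCHEMY_CONN
```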
[jira] [Updated] (AIRFLOW-3347) Unable to configure Kubernetes secrets through environment
[ https://issues.apache.org/jira/browse/AIRFLOW-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Bandy updated AIRFLOW-3347:
---------------------------------

Description: We configure Airflow through environment variables. While setting up the Kubernetes Executor, we wanted to pass the SQL Alchemy connection string to workers by including it in the {{kubernetes_secrets}} section of config. Unfortunately, even with {{AIRFLOW_\_KUBERNETES_SECRETS_\_AIRFLOW_\_CORE_\_SQL_ALCHEMY_CONN}} set in the scheduler environment, the worker gets no secret environment variables.

was: We configure Airflow through environment variables. While setting up the Kubernetes Executor, we wanted to pass the SQL Alchemy connection string to workers by including it in the {{kubernetes_secrets}} section of config. Unfortunately, even with {{AIRFLOW__KUBERNETES_SECRETS__AIRFLOW__CORE__SQL_ALCHEMY_CONN}} set in the scheduler environment, the worker gets no secret environment variables.
[jira] [Created] (AIRFLOW-3347) Unable to configure Kubernetes secrets through environment
Chris Bandy created AIRFLOW-3347:

Summary: Unable to configure Kubernetes secrets through environment
Key: AIRFLOW-3347
URL: https://issues.apache.org/jira/browse/AIRFLOW-3347
Project: Apache Airflow
Issue Type: Bug
Components: configuration, kubernetes
Affects Versions: 1.10.0
Reporter: Chris Bandy

We configure Airflow through environment variables. While setting up the Kubernetes Executor, we wanted to pass the SQL Alchemy connection string to workers by including it in the {{kubernetes_secrets}} section of config. Unfortunately, even with {{AIRFLOW__KUBERNETES_SECRETS__AIRFLOW__CORE__SQL_ALCHEMY_CONN}} set in the scheduler environment, the worker gets no secret environment variables.
[jira] [Commented] (AIRFLOW-2143) Try number displays incorrect values in the web UI
[ https://issues.apache.org/jira/browse/AIRFLOW-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679843#comment-16679843 ]

Chris Bandy commented on AIRFLOW-2143:
--------------------------------------

Affects 1.10.0 as well.

> Try number displays incorrect values in the web UI
> --------------------------------------------------
>
> Key: AIRFLOW-2143
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2143
> Project: Apache Airflow
> Issue Type: Bug
> Affects Versions: 1.9.0
> Reporter: James Davidheiser
> Priority: Minor
> Attachments: adhoc_query.png, task_instance_page.png
>
> This was confusing us a lot in our task runs - in the database, a task that
> ran is marked as 1 try. However, when we view it in the UI, it shows as 2
> tries in several places. These include:
> * Task Instance Details (i.e. https://airflow/task?execution_date=xxx&dag_id=xxx&task_id=xxx)
> * Task instance browser (/admin/taskinstance/)
> * Task Tries graph (/admin/airflow/tries)
> Notably, it is correctly shown as 1 try in the log filenames, on the log
> viewer page (admin/airflow/log?execution_date=), and some other places.
[jira] [Created] (AIRFLOW-3312) No log output from BashOperator under test
Chris Bandy created AIRFLOW-3312:

Summary: No log output from BashOperator under test
Key: AIRFLOW-3312
URL: https://issues.apache.org/jira/browse/AIRFLOW-3312
Project: Apache Airflow
Issue Type: Bug
Components: logging, operators
Affects Versions: 1.10.0
Reporter: Chris Bandy

The BashOperator logs some messages, as well as the stdout of its command, at the info level, but none of these appear when running {{airflow test}} with the default configuration. For example, this DAG emits the following in Airflow 1.10.0:

{code:python}
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime

dag = DAG('please', start_date=datetime(year=2018, month=11, day=1))
BashOperator(dag=dag, task_id='mine', bash_command='echo thank you')
{code}

{noformat}
$ airflow test please mine '2018-11-01'
[2018-11-08 00:06:54,098] {__init__.py:51} INFO - Using executor SequentialExecutor
[2018-11-08 00:06:54,246] {models.py:258} INFO - Filling up the DagBag from /usr/local/airflow/dags
{noformat}

When executed by the scheduler, logs go to a file:

{noformat}
$ airflow scheduler -n 1
...
[2018-11-08 00:41:02,674] {dag_processing.py:582} INFO - Started a process (PID: 9) to generate tasks for /usr/local/airflow/dags/please.py
[2018-11-08 00:41:03,185] {dag_processing.py:495} INFO - Processor for /usr/local/airflow/dags/please.py finished
[2018-11-08 00:41:03,525] {jobs.py:1114} INFO - Tasks up for execution:
[2018-11-08 00:41:03,536] {jobs.py:1147} INFO - Figuring out tasks to run in Pool(name=None) with 128 open slots and 1 task instances in queue
[2018-11-08 00:41:03,539] {jobs.py:1184} INFO - DAG please has 0/16 running and queued tasks
[2018-11-08 00:41:03,540] {jobs.py:1216} INFO - Setting the follow tasks to queued state:
[2018-11-08 00:41:03,573] {jobs.py:1297} INFO - Setting the follow tasks to queued state:
[2018-11-08 00:41:03,576] {jobs.py:1339} INFO - Sending ('please', 'mine', datetime.datetime(2018, 11, 1, 0, 0, tzinfo=)) to executor with priority 1 and queue default
[2018-11-08 00:41:03,578] {base_executor.py:56} INFO - Adding to queue: airflow run please mine 2018-11-01T00:00:00+00:00 --local -sd /usr/local/airflow/dags/please.py
[2018-11-08 00:41:03,593] {sequential_executor.py:45} INFO - Executing command: airflow run please mine 2018-11-01T00:00:00+00:00 --local -sd /usr/local/airflow/dags/please.py
[2018-11-08 00:41:04,262] {__init__.py:51} INFO - Using executor SequentialExecutor
[2018-11-08 00:41:04,406] {models.py:258} INFO - Filling up the DagBag from /usr/local/airflow/dags/please.py
[2018-11-08 00:41:04,458] {cli.py:492} INFO - Running on host e2e08cf4dfaa
[2018-11-08 00:41:09,684] {jobs.py:1443} INFO - Executor reports please.mine execution_date=2018-11-01 00:00:00+00:00 as success

$ cat logs/please/mine/2018-11-01T00\:00\:00+00\:00/1.log
[2018-11-08 00:41:04,554] {models.py:1335} INFO - Dependencies all met for
[2018-11-08 00:41:04,564] {models.py:1335} INFO - Dependencies all met for
[2018-11-08 00:41:04,565] {models.py:1547} INFO - Starting attempt 1 of 1
[2018-11-08 00:41:04,605] {models.py:1569} INFO - Executing on 2018-11-01T00:00:00+00:00
[2018-11-08 00:41:04,605] {base_task_runner.py:124} INFO - Running: ['bash', '-c', 'airflow run please mine 2018-11-01T00:00:00+00:00 --job_id 142 --raw -sd DAGS_FOLDER/please.py --cfg_path /tmp/tmp9prq7knr']
[2018-11-08 00:41:05,214] {base_task_runner.py:107} INFO - Job 142: Subtask mine [2018-11-08 00:41:05,213] {__init__.py:51} INFO - Using executor SequentialExecutor
[2018-11-08 00:41:05,334] {base_task_runner.py:107} INFO - Job 142: Subtask mine [2018-11-08 00:41:05,333] {models.py:258} INFO - Filling up the DagBag from /usr/local/airflow/dags/please.py
[2018-11-08 00:41:05,368] {base_task_runner.py:107} INFO - Job 142: Subtask mine [2018-11-08 00:41:05,367] {cli.py:492} INFO - Running on host e2e08cf4dfaa
[2018-11-08 00:41:05,398] {bash_operator.py:74} INFO - Tmp dir root location: /tmp
[2018-11-08 00:41:05,398] {bash_operator.py:87} INFO - Temporary script location: /tmp/airflowtmp0is6wwxi/mine8tmew5y4
[2018-11-08 00:41:05,399] {bash_operator.py:97} INFO - Running command: echo thank you
[2018-11-08 00:41:05,402] {bash_operator.py:106} INFO - Output:
[2018-11-08 00:41:05,404] {bash_operator.py:110} INFO - thank you
[2018-11-08 00:41:05,404] {bash_operator.py:114} INFO - Command exited with return code 0
[2018-11-08 00:41:09,504] {logging_mixin.py:95} INFO - [2018-11-08 00:41:09,503] {jobs.py:2612} INFO - Task exited with return code 0
{noformat}

This appears to be a regression. In Airflow 1.9.0, the same DAG with default configuration emi
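The symptom above is consistent with how Python's standard logging module behaves in general. The following is a generic sketch, not Airflow's actual logging configuration: when a logger has no handler of its own and propagation to the root handler is disabled, INFO records are silently dropped, which is one way operator output can vanish from the console.

```python
import io
import logging

# A stand-in for a console: the root logger writes here.
stream = io.StringIO()
logging.getLogger().addHandler(logging.StreamHandler(stream))

task_log = logging.getLogger('sketch.task')
task_log.setLevel(logging.INFO)
task_log.propagate = False          # records no longer reach the root handler

task_log.info('thank you')          # dropped: no handler, no propagation
assert stream.getvalue() == ''

task_log.propagate = True           # re-enable propagation
task_log.info('thank you')          # now reaches the root handler
assert 'thank you' in stream.getvalue()
```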
[jira] [Commented] (AIRFLOW-3299) Logs for currently running sensors not visible in the UI
[ https://issues.apache.org/jira/browse/AIRFLOW-3299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679846#comment-16679846 ]

Chris Bandy commented on AIRFLOW-3299:
--------------------------------------

Possibly related to AIRFLOW-2143?

> Logs for currently running sensors not visible in the UI
> --------------------------------------------------------
>
> Key: AIRFLOW-3299
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3299
> Project: Apache Airflow
> Issue Type: Bug
> Components: ui
> Reporter: Brad Holmes
> Priority: Major
>
> When a task is actively running, the logs are not appearing. I have tracked
> this down to the {{next_try_number}} logic of task-instances.
> In [the view at line 836|https://github.com/apache/incubator-airflow/blame/master/airflow/www/views.py#L836],
> we have
> {code:python}
> logs = [''] * (ti.next_try_number - 1 if ti is not None else 0)
> {code}
> The length of the {{logs}} array tells the frontend how many
> {{attempts}} exist, and thus how many AJAX calls to make to load the
> logs.
> Here is the current logic I have observed:
> ||Task State||Current length of 'logs'||Needed length of 'logs'||
> |Successfully completed in 1 attempt|1|1|
> |Successfully completed in 2 attempts|2|2|
> |Not yet attempted|0|0|
> |Actively running task, first time|0|1|
> That last case is the bug. Perhaps task-instance needs a method like
> {{most_recent_try_number}}? I don't see how to make use of {{try_number()}}
> or {{next_try_number()}} to meet the need here.
> ||Task State||try_number()||next_try_number()||Number of Attempts _Should_ Display||
> |Successfully completed in 1 attempt|2|2|1|
> |Successfully completed in 2 attempts|3|3|2|
> |Not yet attempted|1|1|0|
> |Actively running task, first time|0|1|1|
> [~ashb]: You implemented this portion of task-instance 11 months ago. Any
> suggestions? Or perhaps the problem is elsewhere?
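The off-by-one the reporter tabulates can be checked with plain values (a toy mirror of the quoted view expression, not a real TaskInstance; the next_try_number values come from the issue's own table):

```python
def log_slots(next_try_number):
    """Mirror of: logs = [''] * (ti.next_try_number - 1 if ti is not None else 0)"""
    return [''] * (next_try_number - 1)

# (state, next_try_number, attempts the UI should display)
cases = [
    ('completed in 1 attempt', 2, 1),
    ('completed in 2 attempts', 3, 2),
    ('not yet attempted', 1, 0),
    ('actively running, first try', 1, 1),   # the buggy case from the table
]
for state, ntn, should in cases:
    got = len(log_slots(ntn))
    marker = '' if got == should else '  <- bug'
    print(f'{state}: shows {got}, needs {should}{marker}')
```

Only the last case disagrees, matching the report: a first attempt that is still running gets zero log slots instead of one.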
[jira] [Commented] (AIRFLOW-593) Tasks do not get backfilled sequentially
[ https://issues.apache.org/jira/browse/AIRFLOW-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551139#comment-16551139 ]

Chris Bandy commented on AIRFLOW-593:
-------------------------------------

I can see that every instance of `load_transactions` believes it is the first instance:

{noformat}
[2018-07-20 18:23:53,724] {models.py:1216} DEBUG - dependency 'Previous Dagrun State' PASSED: True, This task instance was the first task instance for its task.
[2018-07-20 18:23:53,774] {models.py:1216} DEBUG - dependency 'Previous Dagrun State' PASSED: True, This task instance was the first task instance for its task.
[2018-07-20 18:24:33,619] {models.py:1216} DEBUG - dependency 'Previous Dagrun State' PASSED: True, This task instance was the first task instance for its task.
[2018-07-20 18:24:33,689] {models.py:1216} DEBUG - dependency 'Previous Dagrun State' PASSED: True, This task instance was the first task instance for its task.
[2018-07-20 18:24:59,968] {models.py:1216} DEBUG - dependency 'Previous Dagrun State' PASSED: True, This task instance was the first task instance for its task.
{noformat}

> Tasks do not get backfilled sequentially
> ----------------------------------------
>
> Key: AIRFLOW-593
> URL: https://issues.apache.org/jira/browse/AIRFLOW-593
> Project: Apache Airflow
> Issue Type: Bug
> Components: DagRun, scheduler
> Affects Versions: Airflow 1.7.1.3
> Reporter: Jong Kim
> Priority: Minor
> Attachments: Screen Shot 2018-07-20 at 10.04.24 AM.png
>
> I need to have the tasks within a DAG complete in order when running
> backfills. I am running on my Mac locally using SequentialExecutor.
> Let's say I have a DAG running daily at 11AM UTC (0 11 * * *) with a
> start_date: datetime(2016, 10, 20, 11, 0, 0). The DAG consists of 3 tasks,
> which must complete in order: task0 -> task1 -> task2. This dependency is set
> using .set_downstream().
> Today (2016/10/22) I reset the database, turn on the DAG run using the on/off
> toggle in the webserver, and issue "airflow scheduler", which will
> automatically backfill starting from start_date.
> It will backfill for 2016/10/20 and 2016/10/21. I expect backfill to run
> sequentially, like the following:
> datetime(2016, 10, 20, 11, 0, 0) task0
> datetime(2016, 10, 20, 11, 0, 0) task1
> datetime(2016, 10, 20, 11, 0, 0) task2
> datetime(2016, 10, 21, 11, 0, 0) task0
> datetime(2016, 10, 21, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task2
> With 'depends_on_past': False, I see Airflow running tasks grouped by
> sequence number, something like this, which is not what I want:
> datetime(2016, 10, 20, 11, 0, 0) task0
> datetime(2016, 10, 21, 11, 0, 0) task0
> datetime(2016, 10, 20, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task1
> datetime(2016, 10, 20, 11, 0, 0) task2
> datetime(2016, 10, 21, 11, 0, 0) task2
> With 'depends_on_past': True and 'wait_for_downstream': True, I expect it to
> run like what I need, but instead it runs some tasks out of order, like this:
> datetime(2016, 10, 20, 11, 0, 0) task0
> datetime(2016, 10, 20, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task0 <- out of order!
> datetime(2016, 10, 20, 11, 0, 0) task2 <- out of order!
> datetime(2016, 10, 21, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task2
> Is this a bug? If not, am I understanding 'depends_on_past' and
> 'wait_for_downstream' correctly? What do I need to do?
> The only remedy I can think of is to backfill each date manually.
> Public gist of DAG:
> https://gist.github.com/jong-eatsa/cba1bf3c182b38e966696da47164faf1
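The two orderings the reporter contrasts (run-by-run versus grouped-by-task) can be sketched without Airflow at all; this is a toy illustration of the expected and observed sequences, not scheduler code:

```python
from datetime import datetime
from itertools import product

dates = [datetime(2016, 10, 20, 11), datetime(2016, 10, 21, 11)]
tasks = ['task0', 'task1', 'task2']

# Expected: finish each DAG run before starting the next (dates outer loop).
expected = [(d, t) for d, t in product(dates, tasks)]

# Observed with depends_on_past=False: grouped by task across runs
# (tasks outer loop), per the description above.
observed = [(d, t) for t, d in product(tasks, dates)]

for d, t in observed:
    print(d.date(), t)

# Same work in both cases, only the interleaving differs.
assert sorted(expected) == sorted(observed)
assert expected != observed
```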
[jira] [Commented] (AIRFLOW-593) Tasks do not get backfilled sequentially
[ https://issues.apache.org/jira/browse/AIRFLOW-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550894#comment-16550894 ]

Chris Bandy commented on AIRFLOW-593:
-------------------------------------

https://lists.apache.org/thread.html/ef9ab995d019590eb7b072a74efca2a160b9a4916b6c1618c2ab762b@%3Cdev.airflow.apache.org%3E
[jira] [Commented] (AIRFLOW-593) Tasks do not get backfilled sequentially
[ https://issues.apache.org/jira/browse/AIRFLOW-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550869#comment-16550869 ]

Chris Bandy commented on AIRFLOW-593:
-------------------------------------

I'm seeing this same behavior in Airflow 1.9.0. The `load_transactions` task has `depends_on_past=True`, but earlier instances are getting queued/executed after later ones during backfill. (The errors occurred when I killed the backfill command.)

!Screen Shot 2018-07-20 at 10.04.24 AM.png!
[jira] [Updated] (AIRFLOW-593) Tasks do not get backfilled sequentially
[ https://issues.apache.org/jira/browse/AIRFLOW-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Bandy updated AIRFLOW-593:
--------------------------------

Attachment: Screen Shot 2018-07-20 at 10.04.24 AM.png
[jira] [Commented] (AIRFLOW-2143) Try number displays incorrect values in the web UI
[ https://issues.apache.org/jira/browse/AIRFLOW-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487329#comment-16487329 ]

Chris Bandy commented on AIRFLOW-2143:
--------------------------------------

Introduced by AIRFLOW-1873, I expect.

https://github.com/apache/incubator-airflow/commit/f205fae9abdba271c1eaecdf1c9db950154a8199
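A plausible minimal model of the off-by-one the issue describes (an illustration of the linked commit's reported effect, not the actual TaskInstance code; the attribute and property names here are assumptions):

```python
class TaskInstanceSketch:
    """Toy model: the database column counts completed tries, while a
    property meant to answer 'which try comes next' adds one. A UI that
    renders the property instead of the column shows one try too many."""

    def __init__(self, completed_tries):
        self._try_number = completed_tries   # what the database stores

    @property
    def try_number(self):
        return self._try_number + 1          # the next try the task would run

ti = TaskInstanceSketch(completed_tries=1)
print('database says', ti._try_number, 'try; UI renders', ti.try_number)
```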
[jira] [Commented] (AIRFLOW-2390) FlaskWTFDeprecationWarning: "flask_wtf.Form" has been renamed
[ https://issues.apache.org/jira/browse/AIRFLOW-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16457664#comment-16457664 ]

Chris Bandy commented on AIRFLOW-2390:
--------------------------------------

If I understand correctly, this can be remedied by renaming one import here: https://github.com/apache/incubator-airflow/blob/1.9.0/airflow/www/forms.py#L23

> FlaskWTFDeprecationWarning: "flask_wtf.Form" has been renamed
> -------------------------------------------------------------
>
> Key: AIRFLOW-2390
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2390
> Project: Apache Airflow
> Issue Type: Improvement
> Components: webserver
> Affects Versions: 1.9.0
> Reporter: Chris Bandy
> Priority: Trivial
>
> Webserver complains about Flask deprecation:
> {noformat}
> /usr/local/lib/python3.5/dist-packages/airflow/www/views.py:661: FlaskWTFDeprecationWarning: "flask_wtf.Form" has been renamed to "FlaskForm" and will be removed in 1.0.
>   form = DateTimeForm(data={'execution_date': dttm})
> {noformat}
[jira] [Created] (AIRFLOW-2390) FlaskWTFDeprecationWarning: "flask_wtf.Form" has been renamed
Chris Bandy created AIRFLOW-2390:

Summary: FlaskWTFDeprecationWarning: "flask_wtf.Form" has been renamed
Key: AIRFLOW-2390
URL: https://issues.apache.org/jira/browse/AIRFLOW-2390
Project: Apache Airflow
Issue Type: Improvement
Components: webserver
Affects Versions: 1.9.0
Reporter: Chris Bandy

Webserver complains about Flask deprecation:

{noformat}
/usr/local/lib/python3.5/dist-packages/airflow/www/views.py:661: FlaskWTFDeprecationWarning: "flask_wtf.Form" has been renamed to "FlaskForm" and will be removed in 1.0.
  form = DateTimeForm(data={'execution_date': dttm})
{noformat}
[jira] [Comment Edited] (AIRFLOW-2128) 'Tall' DAGs scale worse than 'wide' DAGs
[ https://issues.apache.org/jira/browse/AIRFLOW-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429355#comment-16429355 ]

Chris Bandy edited comment on AIRFLOW-2128 at 4/7/18 1:21 PM:
--------------------------------------------------------------

[~szmate1618] what is your {{scheduler.min_file_process_interval}} (or {{AIRFLOW_\_SCHEDULER__MIN_FILE_PROCESS_INTERVAL}} environment) set to?

was (Author: cbandy): [~szmate1618] what is your {{scheduler.min_file_process_interval}} (or {{AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL}} environment) set to?

> 'Tall' DAGs scale worse than 'wide' DAGs
> ----------------------------------------
>
> Key: AIRFLOW-2128
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2128
> Project: Apache Airflow
> Issue Type: Bug
> Components: DAG, DagRun, scheduler
> Affects Versions: 1.9.0
> Reporter: Máté Szabó
> Priority: Major
> Labels: performance, usability
> Attachments: tall_dag.py, wide_dag.py
>
> Tall DAG = a DAG with long chains of dependencies, e.g.: 0 -> 1 -> 2 -> ... -> 998 -> 999
> Wide DAG = a DAG with many short, parallel dependencies, e.g. 0 -> 1; 0 -> 2; ... 0 -> 999
> Take a super simple case where both graphs are of 1000 tasks, and all the
> tasks are just "sleep 0.03" bash commands (see the attached files).
> With the default SequentialExecutor (without parallelism), I would expect my
> 2 example DAGs to take (approximately) the same time to run, but apparently
> this is not the case.
> For the wide DAG it was about 80 successfully executed tasks in 10 minutes;
> for the tall one it was 0.
> This anomaly also seems to affect the web UI. Opening up the graph view or the
> tree view for the wide DAG takes about 6 seconds on my machine, but for the
> tall one it takes significantly longer; in fact currently it does not load at
> all.
[jira] [Commented] (AIRFLOW-2128) 'Tall' DAGs scale worse than 'wide' DAGs
[ https://issues.apache.org/jira/browse/AIRFLOW-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429355#comment-16429355 ]

Chris Bandy commented on AIRFLOW-2128:
--------------------------------------

[~szmate1618] what is your {{scheduler.min_file_process_interval}} (or {{AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL}} environment) set to?
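For anyone checking: the setting follows the same AIRFLOW__SECTION__KEY environment convention discussed elsewhere in this thread archive, mirroring the [scheduler] section of airflow.cfg. The value 30 below is only an illustration, not a recommendation or the default:

```shell
# Environment form of [scheduler] min_file_process_interval (seconds).
# The value here is a hypothetical example.
export AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL=30
echo "$AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL"
```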
[jira] [Commented] (AIRFLOW-978) subdags concurrency setting is not working
[ https://issues.apache.org/jira/browse/AIRFLOW-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370935#comment-16370935 ]

Chris Bandy commented on AIRFLOW-978:
-------------------------------------

[~jeffliujing] what is your {{core.parallelism}} set to?

> subdags concurrency setting is not working
> ------------------------------------------
>
> Key: AIRFLOW-978
> URL: https://issues.apache.org/jira/browse/AIRFLOW-978
> Project: Apache Airflow
> Issue Type: Bug
> Reporter: Jeff Liu
> Priority: Major
>
> I have a dag with one subdag; inside this one subdag (level2), there are
> more than 100 subdags.
> It seems that the concurrency settings on the level2 subdag don't work as
> expected. With a concurrency setting of 12 on the level2 subdag, the subdag
> only runs 4 concurrent jobs.
[jira] [Updated] (AIRFLOW-2108) BashOperator discards process indentation
[ https://issues.apache.org/jira/browse/AIRFLOW-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Bandy updated AIRFLOW-2108: --------------------------------- Description: When the BashOperator logs every line of output from the executing process, it strips leading whitespace which makes it difficult to interpret output that was formatted with indentation. For example, I'm executing [PGLoader|http://pgloader.readthedocs.io/] through this operator. When it finishes, it prints a summary which appears in the logs like so:
{noformat}
[2018-02-14 07:31:44,524] {bash_operator.py:101} INFO - 2018-02-14T07:31:44.115000Z LOG report summary reset
[2018-02-14 07:31:44,564] {bash_operator.py:101} INFO - table name  errors  read  imported  bytes  total time  read  write
[2018-02-14 07:31:44,564] {bash_operator.py:101} INFO - ----------------------  ------  ----  --------  -----  ----------
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - fetch meta data  0  524  524  1.438s
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Create Schemas  0  0  0  0.161s
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Create SQL Types  0  19  19  20.413s
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Create tables  0  310  310  3m2.316s
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Set Table OIDs  0  155  155  0.458s
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - ----------------------  ------  ----  --------  -----  ----------
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - ----------------------  ------  ----  --------  -----  ----------
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Index Build Completion  0  353  353  1m37.323s
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Create Indexes  0  353  353  3m25.929s
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Reset Sequences  0  0  0  2.677s
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Primary Keys  0  147  147  1m21.091s
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Create Foreign Keys  0  16  16  8.283s
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Create Triggers  0  0  0  0.339s
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Install Comments  0  0  0  0.000s
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - ----------------------  ------  ----  --------  -----  ----------
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Total import time  ✓  0  0  6m35.642s
{noformat}
Ideally, the leading whitespace would be retained, so the logs look like this:
{noformat}
[2018-02-14 07:31:44,524] {bash_operator.py:101} INFO - 2018-02-14T07:31:44.115000Z LOG report summary reset
[2018-02-14 07:31:44,564] {bash_operator.py:101} INFO -             table name  errors  read  imported  bytes  total time  read  write
[2018-02-14 07:31:44,564] {bash_operator.py:101} INFO - ----------------------  ------  ----  --------  -----  ----------
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO -        fetch meta data  0  524  524  1.438s
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO -         Create Schemas  0  0  0  0.161s
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO -       Create SQL Types  0  19  19  20.413s
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO -          Create tables  0  310  310  3m2.316s
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO -         Set Table OIDs  0  155  155  0.458s
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - ----------------------  ------  ----  --------  -----  ----------
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - ----------------------  ------  ----  --------  -----  ----------
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Index Build Completion  0  353  353  1m37.323s
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO -         Create Indexes  0  353  353  3m25.929s
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO -        Reset Sequences  0  0  0  2.677s
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO -           Primary Keys  0  147  147  1m21.091s
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO -    Create Foreign Keys  0  16  16  8.283s
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO -        Create Triggers  0  0  0  0.339s
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO -       Install Comments  0  0  0  0.000s
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - ----------------------  ------  ----  --------  -----  ----------
[2018-02-14 07:31:44,567] {bash_operator.py:101} INFO -      Total import time  ✓  0  0  6m35.642s
{noformat}
[jira] [Commented] (AIRFLOW-2108) BashOperator discards process indentation
[ https://issues.apache.org/jira/browse/AIRFLOW-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16364816#comment-16364816 ] Chris Bandy commented on AIRFLOW-2108: -- If I understand correctly, this could be fixed by replacing {{line.strip()}} with {{line.rstrip()}}. > BashOperator discards process indentation > ----------------------------------------- > > Key: AIRFLOW-2108 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2108 > Project: Apache Airflow > Issue Type: Bug > Components: operators > Affects Versions: 1.9.0 > Reporter: Chris Bandy > Priority: Minor > > When the BashOperator logs every line of output from the executing process, it strips leading whitespace which makes it difficult to interpret output that was formatted with indentation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
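The {{line.strip()}} to {{line.rstrip()}} change suggested in the comment above can be sketched with a paraphrase of the operator's output loop (illustrative only, not the exact Airflow source):

```python
# Paraphrased sketch of the BashOperator output loop: the operator
# reads the subprocess's stdout line by line and logs each one.

def log_lines(raw_lines, sink):
    for line in raw_lines:
        # Proposed change: line.rstrip() instead of line.strip(), so
        # leading indentation survives and only the trailing newline
        # (and any trailing spaces) are removed.
        sink.append(line.rstrip())
    return sink

print(log_lines(["        fetch meta data  0  524  524\n"], []))
```

With {{strip()}}, the leading spaces in the example line would be lost; {{rstrip()}} preserves them while still avoiding a double newline in the log output.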
[jira] [Created] (AIRFLOW-2108) BashOperator discards process indentation
Chris Bandy created AIRFLOW-2108: Summary: BashOperator discards process indentation Key: AIRFLOW-2108 URL: https://issues.apache.org/jira/browse/AIRFLOW-2108 Project: Apache Airflow Issue Type: Bug Components: operators Affects Versions: 1.9.0 Reporter: Chris Bandy -- When the BashOperator logs every line of output from the executing process, it strips leading whitespace which makes it difficult to interpret output that was formatted with indentation. For example, I'm executing [PGLoader|http://pgloader.readthedocs.io/] through this operator.