[jira] [Resolved] (AIRFLOW-6529) Serialization error occurs when the scheduler tries to run on macOS.

2021-02-08 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-6529.
-
Resolution: Fixed

Fixed by https://github.com/apache/airflow/pull/8671

> Serialization error occurs when the scheduler tries to run on macOS.
> 
>
> Key: AIRFLOW-6529
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6529
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.10.8
> Environment: macOS
> Python 3.8
> multiprocessing with spawn mode
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> When we try to run the scheduler on macOS, we get a serialization error like 
> the following.
> {code}
>   ____________       _____________
>  ____    |__( )_________  __/__  /________      __
> ____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
> ___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
>  _/_/  |_/_/  /_/    /_/    \____/____/|__/
> [2020-01-10 19:54:41,974] {executor_loader.py:59} INFO - Using executor 
> SequentialExecutor
> [2020-01-10 19:54:41,983] {scheduler_job.py:1462} INFO - Starting the 
> scheduler
> [2020-01-10 19:54:41,984] {scheduler_job.py:1469} INFO - Processing each file 
> at most -1 times
> [2020-01-10 19:54:41,984] {scheduler_job.py:1472} INFO - Searching for files 
> in /Users/sarutak/airflow/dags
> [2020-01-10 19:54:42,025] {scheduler_job.py:1474} INFO - There are 27 files 
> in /Users/sarutak/airflow/dags
> [2020-01-10 19:54:42,025] {scheduler_job.py:1527} INFO - Resetting orphaned 
> tasks for active dag runs
> [2020-01-10 19:54:42,059] {scheduler_job.py:1500} ERROR - Exception when 
> executing execute_helper
> Traceback (most recent call last):
>   File 
> "/Users/sarutak/work/oss/airflow-env/master-python3.8.1/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py",
>  line 1498, in _execute
> self._execute_helper()
>   File 
> "/Users/sarutak/work/oss/airflow-env/master-python3.8.1/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py",
>  line 1531, in _execute_helper
> self.processor_agent.start()
>   File 
> "/Users/sarutak/work/oss/airflow-env/master-python3.8.1/lib/python3.8/site-packages/airflow/utils/dag_processing.py",
>  line 348, in start
> self._process.start()
>   File "/opt/python/3.8.1/lib/python3.8/multiprocessing/process.py", line 
> 121, in start
> self._popen = self._Popen(self)
>   File "/opt/python/3.8.1/lib/python3.8/multiprocessing/context.py", line 
> 224, in _Popen
> return _default_context.get_context().Process._Popen(process_obj)
>   File "/opt/python/3.8.1/lib/python3.8/multiprocessing/context.py", line 
> 283, in _Popen
> return Popen(process_obj)
>   File 
> "/opt/python/3.8.1/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 
> 32, in __init__
> super().__init__(process_obj)
>   File "/opt/python/3.8.1/lib/python3.8/multiprocessing/popen_fork.py", line 
> 19, in __init__
> self._launch(process_obj)
>   File 
> "/opt/python/3.8.1/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 
> 47, in _launch
> reduction.dump(process_obj, fp)
>   File "/opt/python/3.8.1/lib/python3.8/multiprocessing/reduction.py", line 
> 60, in dump
> ForkingPickler(file, protocol).dump(obj)
> AttributeError: Can't pickle local object 
> 'SchedulerJob._execute.<locals>.processor_factory'
> {code}
> The reason is that the scheduler tries to run subprocesses using 
> multiprocessing in spawn mode, and spawn mode pickles objects to send them to 
> the child process. In this case, it is the inner method `processor_factory` 
> that gets pickled, and local (nested) functions cannot be pickled.
> Note that, as of Python 3.8, spawn mode is the default mode on macOS.
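The failure described above can be reproduced without Airflow. This is a minimal sketch (the function names here are ours, standing in for the inner `processor_factory`) showing that pickling a nested function fails exactly as in the traceback:

```python
import pickle

def make_processor_factory():
    # Nested function, analogous to the processor_factory defined
    # inside SchedulerJob._execute in the traceback above.
    def processor_factory():
        return "processed"
    return processor_factory

factory = make_processor_factory()
try:
    # multiprocessing's spawn mode pickles the process object (including
    # its target callable), much like reduction.dump in the traceback.
    pickle.dumps(factory)
except (AttributeError, pickle.PicklingError) as exc:
    # On Python 3.8 this reports:
    # Can't pickle local object 'make_processor_factory.<locals>.processor_factory'
    print(exc)
```

The fix in the linked PR follows the same idea: make the callable a module-level (importable) function so spawn-mode pickling can resolve it by qualified name.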



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (AIRFLOW-3964) Consolidate and de-duplicate sensor tasks

2020-09-08 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-3964.
-
Resolution: Fixed

> Consolidate and de-duplicate sensor tasks 
> --
>
> Key: AIRFLOW-3964
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3964
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: dependencies, operators, scheduler
>Affects Versions: 1.10.0
>Reporter: Yingbo Wang
>Assignee: Yingbo Wang
>Priority: Critical
>
> h2. Problem
> h3. Airflow Sensor:
> Sensors are a certain type of operator that will keep running until a certain 
> criterion is met. Examples include a specific file landing in HDFS or S3, a 
> partition appearing in Hive, or a specific time of the day. Sensors are 
> derived from BaseSensorOperator and run a poke method at a specified 
> poke_interval until it returns True.
> Sensor duplication is a common problem for large-scale Airflow projects: the 
> same partitions need to be detected from the same or different DAGs. At 
> Airbnb, 88 boxes run four different types of sensors every day, and the 
> number of running sensor tasks ranges from 8k to 16k, which takes a great 
> amount of resources. Although the Airflow team has redirected all sensors to 
> a dedicated queue with relatively minor resources, there is still large room 
> to reduce the number of workers and relieve DB pressure by optimizing the 
> sensor mechanism.
> The existing sensor implementation creates an identical task for each sensor 
> task with a specific dag_id, task_id and execution_date. This task is 
> responsible for querying the DB until the specified partition exists. Even if 
> two tasks are waiting for the same partition, they create two connections to 
> the DB and check the status in two separate processes. On the one hand, the 
> DB runs duplicate jobs in multiple processes, which takes both CPU and 
> memory resources. At the same time, Airflow has to maintain a process for 
> each sensor to query and wait for the partition/table to be created.
> h1. Design
> There are several issues that need to be resolved for our smart sensor.
> h2. Persist sensor info in the DB and avoid file parsing before running
> The current Airflow implementation needs to parse the DAG Python file before 
> running a task. Parsing multiple Python files in a smart sensor would be 
> inefficient and a heavy load. Sensor tasks need relatively “lightweight” 
> execution information: a small number of properties with simple structure 
> (most are built-in types rather than functions or objects). We therefore 
> propose to skip the parsing for the smart sensor. The easiest way is to 
> persist all task instance information in the Airflow metaDB.
> h3. Solution:
> It would be hard to dump the whole task instance object dictionary, and we do 
> not really need that much information.
> We add two sets to the base sensor class: “persist_fields” and 
> “execute_fields”.
> h4. “persist_fields”, dumped to airflow.task_instance column “attr_dict”
> This set saves the attribute names needed to accomplish a sensor's poking 
> job. For example:
>  # The NamedHivePartitionSensor defines its persist_fields as 
> ('partition_names', 'metastore_conn_id', 'hook'), since these properties are 
> enough for its poking function.
>  # The HivePartitionSensor is slightly different and uses persist_fields of 
> ('schema', 'table', 'partition', 'metastore_conn_id').
> If two tasks have the same value for every field in persist_fields, they are 
> poking the same item and holding duplicate poking jobs in the sensor.
> *The persist_fields can help us deduplicate sensor tasks.* More broadly, if 
> we listed persist_fields for all operators, it could help dedup all Airflow 
> tasks.
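The deduplication idea can be sketched as follows; the dictionaries below stand in for sensor tasks' persist_fields values, and the field names mirror the proposal but are otherwise illustrative, not Airflow's actual API:

```python
# Each dict holds one sensor task's persist_fields values.
sensors = [
    {"partition_names": ["my_table/ds=2019-02-01"], "metastore_conn_id": "metastore_default"},
    {"partition_names": ["my_table/ds=2019-02-01"], "metastore_conn_id": "metastore_default"},
    {"partition_names": ["my_table/ds=2019-02-02"], "metastore_conn_id": "metastore_default"},
]

def dedup_key(fields):
    # Tasks with identical persist_fields values poke the same item,
    # so they can share a single poking job.
    return tuple(sorted((k, repr(v)) for k, v in fields.items()))

unique_jobs = {dedup_key(s): s for s in sensors}
print(len(unique_jobs))  # the three tasks collapse into two distinct poking jobs
```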
> h4. “execute_fields”, dumped to airflow.task_instance column “exec_dict”
> This set saves execution configuration such as “poke_interval”, “timeout” and 
> “execution_timeout”.
> Fields in this set do not affect the detail of the poking job itself. They 
> control how frequently we should poke, when the task should time out, how 
> many timeouts constitute a failure, etc. For now we only include logic that a 
> smart sensor can easily handle. This is a smart-sensor “doable whitelist” and 
> can be extended as more logic is “unlocked” by the smart sensor 
> implementation.
> When we initialize a task instance object, we dump the attribute values of 
> these two sets and persist them in the Airflow metaDB. The smart sensor can 
> then read the DB to get all the information required to run sensor tasks and 
> does not need to parse any DAG files.
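A minimal sketch of that dump step. The stub class, its attribute values, and the column names are illustrative stand-ins based on the proposal, not Airflow's actual implementation:

```python
import json

class NamedHivePartitionSensorStub:
    # Hypothetical stand-in mirroring the proposal's two field sets.
    persist_fields = ("partition_names", "metastore_conn_id")
    execute_fields = ("poke_interval", "timeout")

    def __init__(self):
        self.partition_names = ["my_table/ds=2019-02-01"]
        self.metastore_conn_id = "metastore_default"
        self.poke_interval = 60
        self.timeout = 3600

def dump_fields(task, names):
    # Serialize only the whitelisted, simply-structured attributes,
    # skipping DAG-file parsing entirely.
    return json.dumps({name: getattr(task, name) for name in names})

task = NamedHivePartitionSensorStub()
attr_dict = dump_fields(task, task.persist_fields)   # -> task_instance "attr_dict"
exec_dict = dump_fields(task, task.execute_fields)   # -> task_instance "exec_dict"
print(attr_dict)
```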
> h2. Airflow scheduler change
> We do not want to break any existing logic in the scheduler.

[jira] [Resolved] (AIRFLOW-5948) Replace SimpleDag with serialized version

2020-09-03 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-5948.
-
Fix Version/s: 2.0.0
   Resolution: Fixed

> Replace SimpleDag with serialized version
> -
>
> Key: AIRFLOW-5948
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5948
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core, scheduler
>Affects Versions: 2.0.0, 1.10.7
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Critical
>  Labels: dag-serialization
> Fix For: 2.0.0
>
>
> Replace SimpleDag with the serialized version (JSON over multiprocessing) in 
> SchedulerJob etc., with no other change in scheduler behaviour. (This doesn't 
> make sense long term, but it does tidy up the code.)
> Currently we have two serialized representations:
> # SimpleDag (created because DAGs were not pickleable)
> # Serialized DAG
> We should remove SimpleDag.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (AIRFLOW-5500) Bug in trigger api endpoint

2020-08-19 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-5500.
-
Fix Version/s: 1.10.11
   Resolution: Fixed

> Bug in trigger api endpoint 
> 
>
> Key: AIRFLOW-5500
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5500
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: api
>Affects Versions: 1.10.1
>Reporter: Deavarajegowda M T
>Priority: Critical
> Fix For: 1.10.11
>
> Attachments: 3level.py
>
>
> Unable to trigger a workflow with nested subdags; getting the following error:
>  sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) duplicate key value 
> (dag_id,execution_date)=('dummy.task1.task_level1.task_level2','2019-09-10 
> 13:00:27+00:00') violates unique constraint 
> "dag_run_dag_id_execution_date_key"
>  trigger_dag for nested subdags is called twice.
>  
> Fix:
> In airflow/api/common/experimental/trigger_dag.py: when subdags are populated 
> for a DAG, each subdag's own subdags are also added to the main DAG, so there 
> is no need to repopulate the subdags of each subdag separately.
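The double-trigger can be illustrated with a toy model. This sketch is not Airflow's actual code; the `Dag` class below merely imitates the documented behaviour of `DAG.subdags` (a fully flattened list of all descendant subdags) to show why recursing into each subdag's subdags triggers nested ones twice:

```python
class Dag:
    """Toy stand-in for an Airflow DAG with nested subdags."""
    def __init__(self, dag_id, children=()):
        self.dag_id = dag_id
        self.children = list(children)

    @property
    def subdags(self):
        # Like Airflow's DAG.subdags: ALL descendants, fully flattened.
        out = []
        for child in self.children:
            out.append(child)
            out.extend(child.subdags)
        return out

level2 = Dag("dummy.task1.task_level1.task_level2")
level1 = Dag("dummy.task1.task_level1", [level2])
main = Dag("dummy", [level1])

# Buggy approach: trigger the main DAG's subdags AND then each
# subdag's subdags, revisiting nested subdags a second time.
triggered = [d.dag_id for d in main.subdags]
for sub in main.subdags:
    triggered += [d.dag_id for d in sub.subdags]
print(triggered.count(level2.dag_id))  # the nested subdag is triggered twice

# Fixed approach: main.subdags is already complete, so triggering
# each entry exactly once covers every nesting level.
assert len({d.dag_id for d in main.subdags}) == len(main.subdags)
```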



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (AIRFLOW-4541) Replace mkdirs usage with pathlib

2020-08-02 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-4541.
-
Resolution: Fixed

Fixed in https://github.com/apache/airflow/pull/10117

> Replace mkdirs usage with pathlib
> -
>
> Key: AIRFLOW-4541
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4541
> Project: Apache Airflow
>  Issue Type: Sub-task
>  Components: core
>Reporter: Jarek Potiuk
>Assignee: Bas Harenslak
>Priority: Major
>
> _makedirs is used in 'airflow.utils.file.mkdirs' - it could be replaced with 
> pathlib now that we require Python 3.5+_
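A minimal sketch of the replacement (the directory paths are illustrative; `tempfile` is used here only to make the example self-contained):

```python
import os
import tempfile
from pathlib import Path

base = tempfile.mkdtemp()

# Old style: os.makedirs creates intermediate directories,
# exist_ok=True makes the call idempotent.
os.makedirs(os.path.join(base, "old", "a", "b"), exist_ok=True)

# pathlib equivalent, available since Python 3.5:
# parents=True creates intermediate directories.
Path(base, "new", "a", "b").mkdir(parents=True, exist_ok=True)

print(Path(base, "new", "a", "b").is_dir())
```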



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (AIRFLOW-4202) Make setup.py compatible with Python 3 only

2020-08-02 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-4202.
-
Resolution: Fixed

This was fixed in Airflow Master

> Make setup.py compatible with Python 3 only
> ---
>
> Key: AIRFLOW-4202
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4202
> Project: Apache Airflow
>  Issue Type: Sub-task
>  Components: core
>Reporter: Fokko Driesprong
>Assignee: Jiajie Zhong
>Priority: Major
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (AIRFLOW-4196) AIP-3 Drop support for Python 2

2020-08-02 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-4196.
-
Resolution: Fixed

> AIP-3 Drop support for Python 2
> ---
>
> Key: AIRFLOW-4196
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4196
> Project: Apache Airflow
>  Issue Type: Task
>  Components: core
>Reporter: Fokko Driesprong
>Priority: Major
> Fix For: 2.0.0
>
>
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-3+Drop+support+for+Python+2



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (AIRFLOW-5071) Thousands of Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?

2020-07-08 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-5071:

Fix Version/s: 1.10.12

> Thousands of Executor reports task instance X finished (success) although 
> the task says its queued. Was the task killed externally?
> --
>
> Key: AIRFLOW-5071
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5071
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG, scheduler
>Affects Versions: 1.10.3
>Reporter: msempere
>Priority: Critical
> Fix For: 1.10.12
>
> Attachments: image-2020-01-27-18-10-29-124.png, 
> image-2020-07-08-07-58-42-972.png
>
>
> I'm opening this issue because, since I updated to 1.10.3, I'm seeing 
> thousands of daily messages like the following in the logs:
>  
> ```
>  {{__init__.py:1580}} ERROR - Executor reports task instance  2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance  2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> ```
> -And it looks like this is also triggering thousands of daily emails because 
> the flag to send email in case of failure is set to True.-
> I have Airflow set up to use Celery and Redis as a backend queue service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5071) Thousands of Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?

2020-07-08 Thread Kaxil Naik (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154068#comment-17154068
 ] 

Kaxil Naik commented on AIRFLOW-5071:
-

Thanks for the info [~sgrzemski].

[~potiuk] Yes, let's tackle this for 1.10.12. Seems like it has been occurring 
for some time.

> Thousands of Executor reports task instance X finished (success) although 
> the task says its queued. Was the task killed externally?
> --
>
> Key: AIRFLOW-5071
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5071
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG, scheduler
>Affects Versions: 1.10.3
>Reporter: msempere
>Priority: Critical
> Fix For: 1.10.12
>
> Attachments: image-2020-01-27-18-10-29-124.png, 
> image-2020-07-08-07-58-42-972.png
>
>
> I'm opening this issue because, since I updated to 1.10.3, I'm seeing 
> thousands of daily messages like the following in the logs:
>  
> ```
>  {{__init__.py:1580}} ERROR - Executor reports task instance  2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance  2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> ```
> -And it looks like this is also triggering thousands of daily emails because 
> the flag to send email in case of failure is set to True.-
> I have Airflow set up to use Celery and Redis as a backend queue service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (AIRFLOW-6708) Set unique logger names

2020-06-30 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-6708:

Fix Version/s: (was: 2.0.0)
   1.10.11

> Set unique logger names
> ---
>
> Key: AIRFLOW-6708
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6708
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core, logging
>Affects Versions: 1.10.7
>Reporter: Kamil Bregula
>Priority: Major
> Fix For: 1.10.11
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (AIRFLOW-4052) To allow filtering using "event" and "owner" in "Log" view

2020-05-14 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4052:

Fix Version/s: (was: 1.10.3)
   1.10.11

> To allow filtering using "event" and "owner" in "Log" view
> --
>
> Key: AIRFLOW-4052
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4052
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ui
>Affects Versions: 1.10.2
>Reporter: Xiaodong Deng
>Assignee: Xiaodong Deng
>Priority: Minor
>  Labels: webapp
> Fix For: 1.10.11
>
>
> In the RBAC UI, users can check Logs. But they can only use "dag id", "task 
> id", "execution date", or "extra" to filter, while filtering using "event" 
> and "owner" will be very useful (to allow users to check specific events 
> happened, or check what a specific user did).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (AIRFLOW-1156) Using a timedelta object as a Schedule Interval with catchup=False causes the start_date to no longer be honored.

2020-05-11 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-1156.
-
Fix Version/s: 1.10.11
   Resolution: Fixed

> Using a timedelta object as a Schedule Interval with catchup=False causes the 
> start_date to no longer be honored.
> -
>
> Key: AIRFLOW-1156
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1156
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.8.0
>Reporter: Zachary Lawson
>Assignee: Kaxil Naik
>Priority: Minor
> Fix For: 1.10.11
>
>
> Currently, in Airflow v1.8, if you set your schedule_interval to a timedelta 
> object and set catchup=False, the start_date is no longer honored and the DAG 
> is scheduled immediately upon unpausing. It is then scheduled on the schedule 
> interval from that point onward. Example below:
> {code}
> from airflow import DAG
> from datetime import datetime, timedelta
> import logging
> from airflow.operators.python_operator import PythonOperator
>
> default_args = {
>     'owner': 'airflow',
>     'depends_on_past': False,
>     'start_date': datetime(2015, 6, 1),
> }
>
> dag = DAG('test', default_args=default_args,
>           schedule_interval=timedelta(seconds=5), catchup=False)
>
> def context_test(ds, **context):
>     logging.info('testing')
>
> test_context = PythonOperator(
>     task_id='test_context',
>     provide_context=True,
>     python_callable=context_test,
>     dag=dag
> )
> {code}
> If you switch the above over to a cron expression, the scheduling behavior 
> returns to what is expected.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (AIRFLOW-1156) Using a timedelta object as a Schedule Interval with catchup=False causes the start_date to no longer be honored.

2020-05-09 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-1156 started by Kaxil Naik.
---
> Using a timedelta object as a Schedule Interval with catchup=False causes the 
> start_date to no longer be honored.
> -
>
> Key: AIRFLOW-1156
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1156
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.8.0
>Reporter: Zachary Lawson
>Assignee: Kaxil Naik
>Priority: Minor
>
> Currently, in Airflow v1.8, if you set your schedule_interval to a timedelta 
> object and set catchup=False, the start_date is no longer honored and the DAG 
> is scheduled immediately upon unpausing. It is then scheduled on the schedule 
> interval from that point onward. Example below:
> {code}
> from airflow import DAG
> from datetime import datetime, timedelta
> import logging
> from airflow.operators.python_operator import PythonOperator
>
> default_args = {
>     'owner': 'airflow',
>     'depends_on_past': False,
>     'start_date': datetime(2015, 6, 1),
> }
>
> dag = DAG('test', default_args=default_args,
>           schedule_interval=timedelta(seconds=5), catchup=False)
>
> def context_test(ds, **context):
>     logging.info('testing')
>
> test_context = PythonOperator(
>     task_id='test_context',
>     provide_context=True,
>     python_callable=context_test,
>     dag=dag
> )
> {code}
> If you switch the above over to a cron expression, the scheduling behavior 
> returns to what is expected.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-3369) Un-pausing a DAG with catchup =False creates an extra DAG run (1.10)

2020-05-09 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik closed AIRFLOW-3369.
---
Resolution: Duplicate

PR to fix this: https://github.com/apache/airflow/pull/8776
Closing the issue as https://issues.apache.org/jira/browse/AIRFLOW-1156 
describes the same issue 

> Un-pausing a DAG with catchup =False creates an extra DAG run (1.10)
> 
>
> Key: AIRFLOW-3369
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3369
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.10.0
>Reporter: Andrew Harmon
>Assignee: Kaxil Naik
>Priority: Major
> Attachments: image.png
>
>
> If you create a DAG with catchup=False, when it is un-paused, it creates 2 
> dag runs. One for the most recent scheduled interval (expected) and one for 
> the interval before that (unexpected).
> *Sample DAG*
> {code:java}
> from airflow import DAG
> from datetime import datetime
> from airflow.operators.dummy_operator import DummyOperator
> dag = DAG(
> dag_id='DummyTest',
> start_date=datetime(2018,1,1),
> catchup=False
> )
> do = DummyOperator(
> task_id='dummy_task',
> dag=dag
> )
> {code}
> *Result:*
> 2 DAG runs are created: 2018-11-18 and 2018-11-17
> *Expected Result:*
> Only 1 DAG run should have been created (2018-11-18)
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-6577) DAG Backfill with timedelta runs twice

2020-05-09 Thread Kaxil Naik (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103622#comment-17103622
 ] 

Kaxil Naik commented on AIRFLOW-6577:
-

PR to fix this: https://github.com/apache/airflow/pull/8776
Closing the issue as https://issues.apache.org/jira/browse/AIRFLOW-1156 
describes the same issue 

> DAG Backfill with timedelta runs twice
> --
>
> Key: AIRFLOW-6577
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6577
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG, DagRun
>Affects Versions: 1.10.7
> Environment: ProductName: Mac OS X
> ProductVersion:   10.14.6
> BuildVersion: 18G2022
> Client: Docker Engine - Community
>  Version:   19.03.5
>  API version:   1.40
>  Go version:go1.12.12
>  Git commit:633a0ea
>  Built: Wed Nov 13 07:22:34 2019
>  OS/Arch:   darwin/amd64
>  Experimental:  false
>Reporter: Nick Benthem
>Priority: Minor
>
> If you use a {{timedelta}} schedule interval (with any value) and have 
> {{catchup=False}}, it will cause a DOUBLE run of your DAG! The only 
> workaround I found was to use a cron expression, i.e.
> schedule_interval='@daily',
> rather than
> schedule_interval=timedelta(days=1),
> The cause almost certainly lies in
> def create_dag_run(self, dag, session=None):
> in 
> {{/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py}}
> around line {{643}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-1056) Single dag run triggered when un-pausing job with catchup=False

2020-05-09 Thread Kaxil Naik (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103621#comment-17103621
 ] 

Kaxil Naik commented on AIRFLOW-1056:
-

PR to fix this: https://github.com/apache/airflow/pull/8776
Closing the issue as https://issues.apache.org/jira/browse/AIRFLOW-1156 
describes the same issue 

> Single dag run triggered when un-pausing job with catchup=False
> ---
>
> Key: AIRFLOW-1056
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1056
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.8.0
>Reporter: Andrew Heuermann
>Priority: Major
>
> When "catchup=False" a single job run is still triggered when un-pausing a 
> dag when there are missed run windows. 
> In airflow/jobs.py:create_dag_run(): When catchup is disabled it updates the 
> dag.start_date here to prevent the backfill: 
> https://github.com/apache/incubator-airflow/blob/bb39078a35cf2bceea58d7831d7a2028c8ef849f/airflow/jobs.py#L770.
> But it looks like the function schedules dags based on a window (using 
> sequential run times as lower and upper bounds) so it will always schedule a 
> single dag run if there is a missed run between the last run and the time 
> which it was unpaused. Even if it was un-paused AFTER those missed runs.
> Some ideas on solutions:
> * Pass in the time when the scheduler last ran and use that as the lower 
> bound of the window, but not sure how easy that is to get to. 
> * Update the start_date when a dag with catchup=False is unpaused. Or add a 
> new "unpaused_date" field that would serve the same purpose.
> * If paused have the scheduler insert a skipped Job record when the job would 
> have run.
> There might be a simpler solution I'm missing.
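The window behaviour described above can be sketched in miniature. This is an illustrative model, not Airflow's actual `create_dag_run` code; the function name and window rule are our simplification of "sequential run times as lower and upper bounds":

```python
from datetime import datetime, timedelta

def next_run(last_run, interval, now):
    # Schedule the next period as soon as a full interval has elapsed
    # after it, i.e. when the window [candidate, candidate + interval]
    # lies entirely in the past.
    candidate = last_run + interval
    if candidate + interval <= now:
        return candidate
    return None

last = datetime(2017, 4, 1)
# Un-paused long after a daily window was missed: a run is still
# created for the first missed period, which is the reported bug.
print(next_run(last, timedelta(days=1), datetime(2017, 4, 10)))
```

Under this model, using the scheduler's last-run time (or an "unpaused_date") as the lower bound, as suggested above, would suppress that stale candidate.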



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-6577) DAG Backfill with timedelta runs twice

2020-05-09 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik closed AIRFLOW-6577.
---
Resolution: Duplicate

> DAG Backfill with timedelta runs twice
> --
>
> Key: AIRFLOW-6577
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6577
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG, DagRun
>Affects Versions: 1.10.7
> Environment: ProductName: Mac OS X
> ProductVersion:   10.14.6
> BuildVersion: 18G2022
> Client: Docker Engine - Community
>  Version:   19.03.5
>  API version:   1.40
>  Go version:go1.12.12
>  Git commit:633a0ea
>  Built: Wed Nov 13 07:22:34 2019
>  OS/Arch:   darwin/amd64
>  Experimental:  false
>Reporter: Nick Benthem
>Priority: Minor
>
> If you use a {{timedelta}} schedule interval (with any value) and have 
> {{catchup=False}}, it will cause a DOUBLE run of your DAG! The only 
> workaround I found was to use a cron expression, i.e.
> schedule_interval='@daily',
> rather than
> schedule_interval=timedelta(days=1),
> The cause almost certainly lies in
> def create_dag_run(self, dag, session=None):
> in 
> {{/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py}}
> around line {{643}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-1056) Single dag run triggered when un-pausing job with catchup=False

2020-05-09 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik closed AIRFLOW-1056.
---
Resolution: Duplicate

> Single dag run triggered when un-pausing job with catchup=False
> ---
>
> Key: AIRFLOW-1056
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1056
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.8.0
>Reporter: Andrew Heuermann
>Priority: Major
>
> When "catchup=False" a single job run is still triggered when un-pausing a 
> dag when there are missed run windows. 
> In airflow/jobs.py:create_dag_run(): When catchup is disabled it updates the 
> dag.start_date here to prevent the backfill: 
> https://github.com/apache/incubator-airflow/blob/bb39078a35cf2bceea58d7831d7a2028c8ef849f/airflow/jobs.py#L770.
> But it looks like the function schedules dags based on a window (using 
> sequential run times as lower and upper bounds) so it will always schedule a 
> single dag run if there is a missed run between the last run and the time 
> which it was unpaused. Even if it was un-paused AFTER those missed runs.
> Some ideas on solutions:
> * Pass in the time when the scheduler last ran and use that as the lower 
> bound of the window, but not sure how easy that is to get to. 
> * Update the start_date when a dag with catchup=False is unpaused. Or add a 
> new "unpaused_date" field that would serve the same purpose.
> * If paused have the scheduler insert a skipped Job record when the job would 
> have run.
> There might be a simpler solution I'm missing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (AIRFLOW-3369) Un-pausing a DAG with catchup =False creates an extra DAG run (1.10)

2020-05-07 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-3369 started by Kaxil Naik.
---
> Un-pausing a DAG with catchup =False creates an extra DAG run (1.10)
> 
>
> Key: AIRFLOW-3369
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3369
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.10.0
>Reporter: Andrew Harmon
>Assignee: Kaxil Naik
>Priority: Major
> Attachments: image.png
>
>
> If you create a DAG with catchup=False, when it is un-paused, it creates 2 
> dag runs. One for the most recent scheduled interval (expected) and one for 
> the interval before that (unexpected).
> *Sample DAG*
> {code:java}
> from airflow import DAG
> from datetime import datetime
> from airflow.operators.dummy_operator import DummyOperator
> dag = DAG(
> dag_id='DummyTest',
> start_date=datetime(2018,1,1),
> catchup=False
> )
> do = DummyOperator(
> task_id='dummy_task',
> dag=dag
> )
> {code}
> *Result:*
> 2 DAG runs are created: 2018-11-18 and 2018-11-17
> *Expected Result:*
> Only 1 DAG run should have been created (2018-11-18)
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (AIRFLOW-3369) Un-pausing a DAG with catchup =False creates an extra DAG run (1.10)

2020-05-07 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik reassigned AIRFLOW-3369:
---

Assignee: Kaxil Naik

> Un-pausing a DAG with catchup =False creates an extra DAG run (1.10)
> 
>
> Key: AIRFLOW-3369
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3369
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.10.0
>Reporter: Andrew Harmon
>Assignee: Kaxil Naik
>Priority: Major
> Attachments: image.png
>
>
> If you create a DAG with catchup=False, when it is un-paused, it creates 2 
> dag runs. One for the most recent scheduled interval (expected) and one for 
> the interval before that (unexpected).
> *Sample DAG*
> {code:java}
> from airflow import DAG
> from datetime import datetime
> from airflow.operators.dummy_operator import DummyOperator
> dag = DAG(
> dag_id='DummyTest',
> start_date=datetime(2018,1,1),
> catchup=False
> )
> do = DummyOperator(
> task_id='dummy_task',
> dag=dag
> )
> {code}
> *Result:*
> 2 DAG runs are created: 2018-11-18 and 2018-11-17
> *Expected Result:*
> Only 1 DAG run should have been created (2018-11-18)
>  





[jira] [Resolved] (AIRFLOW-7048) Provide "timezone selection" mechanism in front-end

2020-04-29 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-7048.
-
Fix Version/s: (was: 2.0.0)
   1.10.10
   Resolution: Fixed

> Provide "timezone selection" mechanism in front-end
> ---
>
> Key: AIRFLOW-7048
> URL: https://issues.apache.org/jira/browse/AIRFLOW-7048
> Project: Apache Airflow
>  Issue Type: Sub-task
>  Components: ui, webserver
>Affects Versions: 1.10.0
>Reporter: Ash Berlin-Taylor
>Assignee: Ash Berlin-Taylor
>Priority: Major
>  Labels: timezone
> Fix For: 1.10.10
>
>
> Often users will want to see timezones in their "local" timezone (as defined 
> by browser), but there are useful cases when they would want to be able to 
> view times in other timezones.
> We should have a "dropdown"/selection mechanism that lets the user choose:
> - Browser local (probably the default)
> - UTC
> - Airflow server's configured TZ (if it's not UTC)
> Possibly we should also have "DAG timezone" for any DAG-specific page, when 
> that DAG is set to use a different TZ.
> And then for flexibility (and if it's not loads of work) let the user select 
> any arbitrary timezone ("America/New_York" or "Europe/Paris", for instance) 
> from a list.
> The user's chosen TZ should be stored in a cookie (I don't think it's 
> necessary to store this server side if all the translation is being done 
> client side).
> It might be useful to have a "quick toggle" between local and whatever other 
> setting is selected.
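The fallback logic described above can be sketched as follows; the function name and cookie values are assumptions, not the shipped implementation:

```python
def effective_timezone(cookie_value, server_tz="UTC"):
    # Resolve the displayed timezone from the stored cookie, defaulting
    # to browser-local; "server" maps to the Airflow-configured TZ.
    if cookie_value == "server":
        return server_tz
    if cookie_value in ("utc", "local"):
        return cookie_value
    return "local"  # missing/unknown cookie -> browser-local default
```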





[jira] [Updated] (AIRFLOW-6959) Use NULL as dag.description default value and change related UI

2020-04-27 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-6959:

Fix Version/s: (was: 2.0.0)
   1.10.11

> Use NULL as dag.description default value and change related UI
> ---
>
> Key: AIRFLOW-6959
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6959
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: DAG, database, ui
>Affects Versions: 1.10.9
>Reporter: Jiajie Zhong
>Assignee: Jiajie Zhong
>Priority: Minor
> Fix For: 1.10.11
>
>






[jira] [Resolved] (AIRFLOW-6796) Serialized DAGs can be incorrectly deleted

2020-04-27 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-6796.
-
Fix Version/s: 1.10.11
   Resolution: Fixed

> Serialized DAGs can be incorrectly deleted
> --
>
> Key: AIRFLOW-6796
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6796
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: serialization
>Affects Versions: 1.10.9
>Reporter: Matthew Bruce
>Priority: Major
> Fix For: 1.10.11
>
>
> With serialization of DAGs enabled, `SerializedDagModel.remove_deleted_dags` 
> called from `DagFileProcessManager.refresh_dag_dir` can delete the 
> serialization of DAGs if they were loaded via a DagBag and globals in a 
> different `.py` file:
> Consider something like this:
>  {{/home/airflow/dags/loader.py}}
> {code:python}
> dag_bags = []
> dag_bags.append(models.DagBag('/home/airflow/project-a/dags'))
> dag_bags.append(models.DagBag('/home/airflow/project-b/dags'))
> for dag_bag in dag_bags:
>     for dag in dag_bag.dags.values():
>         globals()[dag.dag_id] = dag{code}
> with files:
> {code:java}
> /home/airflow/project-a/dags/dag-a.py
> /home/airflow/project-b/dags/dag-b.py
> {code}
>  
> The list of file paths passed to {{SerializedDagModel.remove_deleted_dags}} 
> is only going to contain {{/home/airflow/dags/loader.py}} and the method will 
> remove the serializations for the DAGs in dag-a.py and dag-b.py
> With non-serialized DAGs, airflow seems to mark DAGs as inactive based on 
> when the scheduler last processed them - I wonder if we should make these two 
> methods consistent?
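The deletion described above can be illustrated with a minimal sketch (paths from the example; the dict-based logic is a simplified stand-in for what the real method does, not its actual code):

```python
# Simplified stand-in for SerializedDagModel.remove_deleted_dags: rows
# whose recorded fileloc is not among the scanned DAG files get deleted.
serialized_filelocs = {
    "dag_a": "/home/airflow/project-a/dags/dag-a.py",
    "dag_b": "/home/airflow/project-b/dags/dag-b.py",
}
scanned_files = {"/home/airflow/dags/loader.py"}  # only loader.py is scanned

deleted = sorted(dag_id for dag_id, loc in serialized_filelocs.items()
                 if loc not in scanned_files)
# deleted == ["dag_a", "dag_b"]: both serializations are removed even
# though the DAG files still exist on disk
```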





[jira] [Resolved] (AIRFLOW-7111) Expose generate_presigned_url of boto3 to S3Hook

2020-04-24 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-7111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-7111.
-
Fix Version/s: 1.10.11
   2.0.0
   Resolution: Fixed

> Expose generate_presigned_url of boto3 to S3Hook
> 
>
> Key: AIRFLOW-7111
> URL: https://issues.apache.org/jira/browse/AIRFLOW-7111
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws
>Affects Versions: 1.10.9
>Reporter: korni
>Assignee: Jerome Carless
>Priority: Major
>  Labels: S3, S3Hook, aws, easyfix, gsoc, gsoc2020
> Fix For: 2.0.0, 1.10.11
>
>
> boto3 has {{generate_presigned_url}}, which should be exposed in the Hook:
> {{[https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.generate_presigned_url]}}
> {{generate_presigned_url}}(_ClientMethod_, _Params=None_, _ExpiresIn=3600_, 
> _HttpMethod=None_)
> Generate a presigned url given a client, its method, and arguments
> Parameters
>  * *ClientMethod* (_string_) -- The client method to presign for
>  * *Params* (_dict_) -- The parameters normally passed to {{ClientMethod}}.
>  * *ExpiresIn* (_int_) -- The number of seconds the presigned url is valid 
> for. By default it expires in an hour (3600 seconds)
>  * *HttpMethod* (_string_) -- The http method to use on the generated url. By 
> default, the http method is whatever is used in the method's model.
> Returns The presigned url
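A minimal sketch of how the hook could wrap this boto3 call; the class name, constructor, and method name here are assumptions for illustration, not the merged API:

```python
class S3HookSketch:
    """Hypothetical wrapper; a real S3Hook would obtain a boto3 client
    from its Airflow connection rather than taking one directly."""

    def __init__(self, client):
        self._client = client  # a boto3 S3 client in practice

    def generate_presigned_url(self, client_method, params=None,
                               expires_in=3600, http_method=None):
        # Delegate straight to boto3's generate_presigned_url, keeping
        # its defaults (1 hour expiry, method's own HTTP verb).
        return self._client.generate_presigned_url(
            ClientMethod=client_method, Params=params,
            ExpiresIn=expires_in, HttpMethod=http_method)
```

Called as `hook.generate_presigned_url("get_object", {"Bucket": "b", "Key": "k"})`, this returns whatever URL the underlying client produces.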





[jira] [Resolved] (AIRFLOW-4357) Tool tip offset when using RBAC

2020-04-23 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-4357.
-
Resolution: Fixed

> Tool tip offset when using RBAC
> ---
>
> Key: AIRFLOW-4357
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4357
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Affects Versions: 1.10.2, 1.10.3
> Environment: Fedora 29 with Python 3.5.6 from conda
>Reporter: Charles Surett
>Assignee: Ash Berlin-Taylor
>Priority: Minor
>  Labels: rbac, web
> Fix For: 1.10.11
>
> Attachments: Expected Behavior.png, Issue.png, installed-packages.txt
>
>
> Tool tips are offset when the page is scrolled when using the RBAC web UI
>  
> See attached images for more details.
>  
> It seems to be related to 
> [https://github.com/twbs/bootstrap/blob/v3.3.7/js/tooltip.js#L367-L372]





[jira] [Updated] (AIRFLOW-5517) SparkSubmitOperator: spark-binary parameter no longer taken from connection extra

2020-04-22 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-5517:

Fix Version/s: (was: 1.10.7)
   1.10.11

> SparkSubmitOperator: spark-binary parameter no longer taken from connection 
> extra
> -
>
> Key: AIRFLOW-5517
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5517
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Affects Versions: 1.10.4, 1.10.5
>Reporter: Alexander Kazarin
>Priority: Major
> Fix For: 1.10.11
>
>
> We have an extra parameters in spark connection:
> {code:java}
> {"deploy-mode": "cluster", "spark-binary": "spark2-submit"}
> {code}
> After upgrading from 1.10.3 to 1.10.5, the 'spark-binary' parameter in extra 
> no longer takes effect.
>  Broken after 
> [this|https://github.com/apache/airflow/commit/8be59fb4edf0f2a132b13d0ffd1df0b8908191ab]
>  commit, I think
> Workaround: call SparkSubmitOperator with spark_binary=None argument
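The pre-upgrade behaviour the reporter relied on amounts to reading the binary name from the connection's Extra JSON; a sketch (key names from the report, the helper name is hypothetical):

```python
import json

def spark_binary_from_extra(extra_json, default="spark-submit"):
    # Read "spark-binary" from the connection's Extra field, falling
    # back to the stock binary name when the key is absent.
    extra = json.loads(extra_json) if extra_json else {}
    return extra.get("spark-binary", default)

binary = spark_binary_from_extra(
    '{"deploy-mode": "cluster", "spark-binary": "spark2-submit"}')
# binary == "spark2-submit"
```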





[jira] [Updated] (AIRFLOW-5659) Add support for ephemeral storage on KubernetesPodOperator

2020-04-19 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-5659:

Fix Version/s: (was: 2.0.0)
   1.10.11

> Add support for ephemeral storage on KubernetesPodOperator
> --
>
> Key: AIRFLOW-5659
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5659
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: operators
>Affects Versions: 2.0.0
>Reporter: Leonardo Miguel
>Assignee: Leonardo Miguel
>Priority: Minor
> Fix For: 1.10.11
>
>
> KubernetesPodOperator currently doesn't support requests and limits for 
> resource 'ephemeral-storage'.
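For reference, the Kubernetes resource spec the request targets looks like this (values illustrative):

```python
# Illustrative pod resource spec including the ephemeral-storage
# requests/limits that KubernetesPodOperator did not yet accept:
resources = {
    "requests": {"cpu": "500m", "memory": "512Mi", "ephemeral-storage": "1Gi"},
    "limits": {"cpu": "1", "memory": "1Gi", "ephemeral-storage": "2Gi"},
}
```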





[jira] [Commented] (AIRFLOW-6609) Airflow upgradedb fails serialized_dag table add on revision id d38e04c12aa2

2020-04-18 Thread Kaxil Naik (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17086435#comment-17086435
 ] 

Kaxil Naik commented on AIRFLOW-6609:
-

That looks like someone ran airflow initdb/upgradedb and stopped it midway.

Go to the Airflow metadata DB, find the alembic_version table and update its 
value to d38e04c12aa2.
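Concretely, that is a one-row update of the `alembic_version` table; the SQLite snippet below only illustrates the statement (run the equivalent UPDATE against your real metadata DB):

```python
import sqlite3

# Throwaway in-memory DB standing in for the Airflow metadata DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alembic_version (version_num TEXT)")
conn.execute("INSERT INTO alembic_version VALUES ('6e96a59344a4')")

# The actual fix: point Alembic at the revision whose DDL already ran.
conn.execute("UPDATE alembic_version SET version_num = 'd38e04c12aa2'")
(version,) = conn.execute("SELECT version_num FROM alembic_version").fetchone()
# version == 'd38e04c12aa2'
```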

> Airflow upgradedb fails serialized_dag table add on revision id d38e04c12aa2
> 
>
> Key: AIRFLOW-6609
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6609
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: database
>Affects Versions: 1.10.7
>Reporter: Chris Schmautz
>Priority: Major
>  Labels: database, postgres
>
> We're attempting an upgrade from 1.10.3 to 1.10.7 to use some of the great 
> features available in later revisions; however, the upgrade from 1.10.6 to 
> 1.10.7 is causing some heartburn.
> +Runtime environment:+
>  - Docker containers for each runtime segment (webserver, scheduler, flower, 
> postgres, redis, worker)
>  - Using CeleryExecutor queued with Redis
>  - Using Postgres backend
>  
> +Steps to reproduce:+
>  1. Author base images relating to each version of Airflow between 1.10.3 and 
> 1.10.7 (if you want the full regression we have done)
>  2. 'airflow initdb' on revision 1.10.3
>  3. Start up the containers, run some dags, produce metadata
>  4. Increment / swap out base image revision from 1.10.3 base to 1.10.4 base 
> image
>  5. Run 'airflow upgradedb'
>  6. Validate success
>  n. Eventually you will get to the 1.10.6 revision, stepping up to 1.10.7, 
> which produces the error below
>  
> {code:java}
> INFO  [alembic.runtime.migration] Running upgrade 6e96a59344a4 -> 
> d38e04c12aa2, add serialized_dag table
> Revision ID: d38e04c12aa2
> Revises: 6e96a59344a4
> Create Date: 2019-08-01 14:39:35.616417
> Traceback (most recent call last):
>   File 
> "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py",
>  line 1246, in _execute_context
> cursor, statement, parameters, context
>   File 
> "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/default.py",
>  line 581, in do_execute
> cursor.execute(statement, parameters)
> psycopg2.errors.DuplicateTable: relation "serialized_dag" already exists
> The above exception was the direct cause of the following exception:Traceback 
> (most recent call last):
>   File "/opt/anaconda/miniconda3/envs/airflow/bin/airflow", line 37, in 
> 
> args.func(args)
>   File 
> "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/airflow/utils/cli.py",
>  line 75, in wrapper
> return f(*args, **kwargs)
>   File 
> "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/airflow/bin/cli.py",
>  line 1193, in upgradedb
> db.upgradedb()
>   File 
> "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/airflow/utils/db.py",
>  line 376, in upgradedb
> command.upgrade(config, 'heads')
>   File 
> "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/alembic/command.py",
>  line 298, in upgrade
> script.run_env()
>   File 
> "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/alembic/script/base.py",
>  line 489, in run_env
> util.load_python_file(self.dir, "env.py")
>   File 
> "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/alembic/util/pyfiles.py",
>  line 98, in load_python_file
> module = load_module_py(module_id, path)
>   File 
> "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/alembic/util/compat.py",
>  line 173, in load_module_py
> spec.loader.exec_module(module)
>   File "", line 678, in exec_module
>   File "", line 219, in _call_with_frames_removed
>   File 
> "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/airflow/migrations/env.py",
>  line 96, in 
> run_migrations_online()
>   File 
> "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/airflow/migrations/env.py",
>  line 90, in run_migrations_online
> context.run_migrations()
>   File "", line 8, in run_migrations
>   File 
> "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/alembic/runtime/environment.py",
>  line 846, in run_migrations
> self.get_context().run_migrations(**kw)
>   File 
> "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/alembic/runtime/migration.py",
>  line 518, in run_migrations
> step.migration_fn(**kw)
>   File 
> "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/airflow/migrations/versions/d38e04c12aa2_add_serialized_dag_table.py",
>  line 54, in upgrade
> sa.PrimaryKeyConstraint('dag_id'))
>   File "", line 8, in create_table
>   File "", line 3, in create_table
>   File 

[jira] [Comment Edited] (AIRFLOW-6609) Airflow upgradedb fails serialized_dag table add on revision id d38e04c12aa2

2020-04-18 Thread Kaxil Naik (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17086435#comment-17086435
 ] 

Kaxil Naik edited comment on AIRFLOW-6609 at 4/18/20, 12:20 PM:


That looks like someone ran airflow initdb/upgradedb and stopped it midway,
causing Alembic to not update the revision identifier in its table.

Go to the Airflow metadata DB, find the alembic_version table and update its 
value to d38e04c12aa2.


was (Author: kaxilnaik):
That looks someone ran airflow initdb/upgradedb and stopped it midway

Go to Airflow Metadata DB, find the alembic table and update the value to 
d38e04c12aa2

> Airflow upgradedb fails serialized_dag table add on revision id d38e04c12aa2
> 
>
> Key: AIRFLOW-6609
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6609
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: database
>Affects Versions: 1.10.7
>Reporter: Chris Schmautz
>Priority: Major
>  Labels: database, postgres
>
> We're attempting an upgrade from 1.10.3 to 1.10.7 to use some of the great 
> features available in later revisions; however, the upgrade from 1.10.6 to 
> 1.10.7 is causing some heartburn.
> +Runtime environment:+
>  - Docker containers for each runtime segment (webserver, scheduler, flower, 
> postgres, redis, worker)
>  - Using CeleryExecutor queued with Redis
>  - Using Postgres backend
>  
> +Steps to reproduce:+
>  1. Author base images relating to each version of Airflow between 1.10.3 and 
> 1.10.7 (if you want the full regression we have done)
>  2. 'airflow initdb' on revision 1.10.3
>  3. Start up the containers, run some dags, produce metadata
>  4. Increment / swap out base image revision from 1.10.3 base to 1.10.4 base 
> image
>  5. Run 'airflow upgradedb'
>  6. Validate success
>  n. Eventually you will get to the 1.10.6 revision, stepping up to 1.10.7, 
> which produces the error below
>  
> {code:java}
> INFO  [alembic.runtime.migration] Running upgrade 6e96a59344a4 -> 
> d38e04c12aa2, add serialized_dag table
> Revision ID: d38e04c12aa2
> Revises: 6e96a59344a4
> Create Date: 2019-08-01 14:39:35.616417
> Traceback (most recent call last):
>   File 
> "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py",
>  line 1246, in _execute_context
> cursor, statement, parameters, context
>   File 
> "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/default.py",
>  line 581, in do_execute
> cursor.execute(statement, parameters)
> psycopg2.errors.DuplicateTable: relation "serialized_dag" already exists
> The above exception was the direct cause of the following exception:Traceback 
> (most recent call last):
>   File "/opt/anaconda/miniconda3/envs/airflow/bin/airflow", line 37, in 
> 
> args.func(args)
>   File 
> "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/airflow/utils/cli.py",
>  line 75, in wrapper
> return f(*args, **kwargs)
>   File 
> "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/airflow/bin/cli.py",
>  line 1193, in upgradedb
> db.upgradedb()
>   File 
> "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/airflow/utils/db.py",
>  line 376, in upgradedb
> command.upgrade(config, 'heads')
>   File 
> "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/alembic/command.py",
>  line 298, in upgrade
> script.run_env()
>   File 
> "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/alembic/script/base.py",
>  line 489, in run_env
> util.load_python_file(self.dir, "env.py")
>   File 
> "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/alembic/util/pyfiles.py",
>  line 98, in load_python_file
> module = load_module_py(module_id, path)
>   File 
> "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/alembic/util/compat.py",
>  line 173, in load_module_py
> spec.loader.exec_module(module)
>   File "", line 678, in exec_module
>   File "", line 219, in _call_with_frames_removed
>   File 
> "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/airflow/migrations/env.py",
>  line 96, in 
> run_migrations_online()
>   File 
> "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/airflow/migrations/env.py",
>  line 90, in run_migrations_online
> context.run_migrations()
>   File "", line 8, in run_migrations
>   File 
> "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/alembic/runtime/environment.py",
>  line 846, in run_migrations
> self.get_context().run_migrations(**kw)
>   File 
> "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/alembic/runtime/migration.py",
>  line 518, in run_migrations
> step.migration_fn(**kw)
>  

[jira] [Commented] (AIRFLOW-3347) Unable to configure Kubernetes secrets through environment

2020-04-17 Thread Kaxil Naik (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17086061#comment-17086061
 ] 

Kaxil Naik commented on AIRFLOW-3347:
-

Duplicate of https://issues.apache.org/jira/browse/AIRFLOW-5030 . Solved by 
https://github.com/apache/airflow/pull/5650

> Unable to configure Kubernetes secrets through environment
> --
>
> Key: AIRFLOW-3347
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3347
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: configuration, executors
>Affects Versions: 1.10.0
>Reporter: Chris Bandy
>Priority: Major
>  Labels: kubernetes
>
> We configure Airflow through environment variables. While setting up the 
> Kubernetes Executor, we wanted to pass the SQL Alchemy connection string to 
> workers by including it in the {{kubernetes_secrets}} section of config.
> Unfortunately, even with 
> {{AIRFLOW_\_KUBERNETES_SECRETS_\_AIRFLOW_\_CORE_\_SQL_ALCHEMY_CONN}} set in 
> the scheduler environment, the worker gets no secret environment variables.
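For context, Airflow maps environment variables to config options by splitting on double underscores; a sketch of that parsing (the helper name is hypothetical):

```python
def parse_airflow_env(name):
    # AIRFLOW__<SECTION>__<KEY>: the first "__" after the prefix
    # separates the config section from the option key.
    prefix = "AIRFLOW__"
    if not name.startswith(prefix):
        return None
    section, _, key = name[len(prefix):].partition("__")
    return section.lower(), key.lower()

result = parse_airflow_env(
    "AIRFLOW__KUBERNETES_SECRETS__AIRFLOW__CORE__SQL_ALCHEMY_CONN")
# -> ("kubernetes_secrets", "airflow__core__sql_alchemy_conn")
```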





[jira] [Closed] (AIRFLOW-3347) Unable to configure Kubernetes secrets through environment

2020-04-17 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik closed AIRFLOW-3347.
---
Resolution: Duplicate

> Unable to configure Kubernetes secrets through environment
> --
>
> Key: AIRFLOW-3347
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3347
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: configuration, executors
>Affects Versions: 1.10.0
>Reporter: Chris Bandy
>Priority: Major
>  Labels: kubernetes
>
> We configure Airflow through environment variables. While setting up the 
> Kubernetes Executor, we wanted to pass the SQL Alchemy connection string to 
> workers by including it in the {{kubernetes_secrets}} section of config.
> Unfortunately, even with 
> {{AIRFLOW_\_KUBERNETES_SECRETS_\_AIRFLOW_\_CORE_\_SQL_ALCHEMY_CONN}} set in 
> the scheduler environment, the worker gets no secret environment variables.





[jira] [Commented] (AIRFLOW-5577) Dags Filter_by_owner is missing in RBAC

2020-04-17 Thread Kaxil Naik (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085840#comment-17085840
 ] 

Kaxil Naik commented on AIRFLOW-5577:
-

Well, you can also use the access_control parameter on each DAG:
https://github.com/apache/airflow/blob/4b25cb9d08565502172cb847c79d81559775d504/airflow/models/dag.py#L174

You can restrict DAGs to be accessible only by certain roles. After doing this, 
run `airflow sync_perm` from the CLI to update the permissions.
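The shape of that parameter can be sketched as plain data (the role and permission names below are illustrative):

```python
# access_control maps role names to DAG-level permission names; passed
# as DAG(..., access_control=access_control), followed by running
# `airflow sync_perm` so the webserver picks up the new permissions.
access_control = {
    "team-a": {"can_dag_read", "can_dag_edit"},
    "auditors": {"can_dag_read"},
}
# A role sees or edits the DAG only if the matching permission is listed.
```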

> Dags Filter_by_owner is missing in RBAC
> ---
>
> Key: AIRFLOW-5577
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5577
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Affects Versions: 1.10.3, 1.10.4, 1.10.5
>Reporter: Hari
>Assignee: Hari
>Priority: Major
>  Labels: easyfix
>
> After enabling the RBAC, the dags filter by owner option is missing. All the 
> Dags will be visible to all the users.





[jira] [Commented] (AIRFLOW-5577) Dags Filter_by_owner is missing in RBAC

2020-04-16 Thread Kaxil Naik (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085318#comment-17085318
 ] 

Kaxil Naik commented on AIRFLOW-5577:
-

Check: https://airflow.readthedocs.io/en/latest/security.html#dag-level-role

You can give certain users read permissions on specific DAGs.

> Dags Filter_by_owner is missing in RBAC
> ---
>
> Key: AIRFLOW-5577
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5577
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Affects Versions: 1.10.3, 1.10.4, 1.10.5
>Reporter: Hari
>Assignee: Hari
>Priority: Major
>  Labels: easyfix
>
> After enabling the RBAC, the dags filter by owner option is missing. All the 
> Dags will be visible to all the users.





[jira] [Commented] (AIRFLOW-5577) Dags Filter_by_owner is missing in RBAC

2020-04-16 Thread Kaxil Naik (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084727#comment-17084727
 ] 

Kaxil Naik commented on AIRFLOW-5577:
-

filter_by_owner was a hack to achieve DAG RBAC, but since the FAB-based UI 
already has it, there is no need for this flag in the RBAC UI. Hence it is 
closed as "Not A Problem".

> Dags Filter_by_owner is missing in RBAC
> ---
>
> Key: AIRFLOW-5577
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5577
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Affects Versions: 1.10.3, 1.10.4, 1.10.5
>Reporter: Hari
>Assignee: Hari
>Priority: Major
>  Labels: easyfix
>
> After enabling the RBAC, the dags filter by owner option is missing. All the 
> Dags will be visible to all the users.





[jira] [Updated] (AIRFLOW-6885) Add option to only delete KubernetesExecutor pods on successful task completion

2020-04-15 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-6885:

Fix Version/s: (was: 2.0.0)
   1.10.11

> Add option to only delete KubernetesExecutor pods on successful task 
> completion
> ---
>
> Key: AIRFLOW-6885
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6885
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: executor-kubernetes
>Affects Versions: 1.10.9
>Reporter: Daniel Imberman
>Assignee: Daniel Imberman
>Priority: Minor
> Fix For: 1.10.11
>
>






[jira] [Updated] (AIRFLOW-6320) Add quarterly to crontab presets

2020-04-14 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-6320:

Fix Version/s: (was: 2.0.0)
   1.10.11

> Add quarterly to crontab presets
> 
>
> Key: AIRFLOW-6320
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6320
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: utils
>Affects Versions: 1.10.6
>Reporter: Jiajie Zhong
>Assignee: Jiajie Zhong
>Priority: Minor
> Fix For: 1.10.11
>
>
> Add quarterly to crontab presets
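By analogy with the existing presets, the addition could look like this; the exact cron expression chosen by the patch is an assumption here:

```python
# Airflow-style cron presets plus the proposed quarterly entry;
# "0 0 1 */3 *" fires at midnight on the 1st of every third month
# (January, April, July, October).
cron_presets = {
    "@monthly": "0 0 1 * *",
    "@quarterly": "0 0 1 */3 *",  # assumed expression
    "@yearly": "0 0 1 1 *",
}
```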





[jira] [Updated] (AIRFLOW-6351) security - ui - Add Cross Site Scripting defence

2020-04-14 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-6351:

Fix Version/s: (was: 2.0.0)
   1.10.11

> security - ui - Add Cross Site Scripting defence
> 
>
> Key: AIRFLOW-6351
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6351
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Affects Versions: 1.10.6, 1.10.7
>Reporter: t oo
>Assignee: t oo
>Priority: Major
> Fix For: 1.10.11
>
>
> *escape search -->*
>  
> *BEFORE*
> return self.render(
> 'airflow/dags.html',
> webserver_dags=webserver_dags_filtered,
> orm_dags=orm_dags,
> hide_paused=hide_paused,
> current_page=current_page,
> search_query=arg_search_query if arg_search_query else '',
> page_size=dags_per_page,
> num_of_pages=num_of_pages,
> num_dag_from=start + 1,
> num_dag_to=min(end, num_of_all_dags),
> num_of_all_dags=num_of_all_dags,
> paging=wwwutils.generate_pages(current_page, num_of_pages,
> {color:#FF}search=arg_search_query,{color}
> showPaused=not hide_paused),
> dag_ids_in_page=page_dag_ids,
> auto_complete_data=auto_complete_data)
>  
> *AFTER*
> return self.render(
> 'airflow/dags.html',
> webserver_dags=webserver_dags_filtered,
> orm_dags=orm_dags,
> hide_paused=hide_paused,
> current_page=current_page,
> search_query=arg_search_query if arg_search_query else '',
> page_size=dags_per_page,
> num_of_pages=num_of_pages,
> num_dag_from=start + 1,
> num_dag_to=min(end, num_of_all_dags),
> num_of_all_dags=num_of_all_dags,
> paging=wwwutils.generate_pages(current_page, num_of_pages,
> {color:#FF}search=escape(arg_search_query) if arg_search_query else 
> None,{color}
> showPaused=not hide_paused),
> dag_ids_in_page=page_dag_ids,
> auto_complete_data=auto_complete_data)
>  
> [https://github.com/apache/airflow/blob/v1-10-stable/airflow/www/views.py#L2278]
>  
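The essence of the patch is escaping the user-supplied query before it is echoed into the paging links; stdlib `html.escape` is shown here as a self-contained stand-in for the `escape` used in the webserver code:

```python
import html

def safe_search(arg_search_query):
    # Escape the query before reflecting it into generated URLs/markup,
    # neutralising injected script tags; empty/missing queries pass
    # through as None, mirroring the patched call site.
    return html.escape(arg_search_query) if arg_search_query else None

escaped = safe_search("<script>alert(1)</script>")
# -> "&lt;script&gt;alert(1)&lt;/script&gt;"
```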





[jira] [Resolved] (AIRFLOW-4038) Remove DagBag from /home

2020-04-14 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-4038.
-
Fix Version/s: (was: 2.0.0)
   1.10.11
   Resolution: Fixed

> Remove DagBag from /home
> 
>
> Key: AIRFLOW-4038
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4038
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Reporter: Peter van 't Hof
>Assignee: Peter van 't Hof
>Priority: Major
> Fix For: 1.10.11
>
>






[jira] [Updated] (AIRFLOW-4235) home page dags table: highlight rows on mouse hover

2020-04-14 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4235:

Fix Version/s: (was: 2.0.0)
   1.10.11

> home page dags table: highlight rows on mouse hover
> ---
>
> Key: AIRFLOW-4235
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4235
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ui
>Affects Versions: 1.10.2
>Reporter: Nando Quintana
>Assignee: Fokko Driesprong
>Priority: Trivial
>  Labels: easyfix, newbie
> Fix For: 1.10.11
>
> Attachments: table_dags_table_hover.png
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> In wide screens, home page dags table becomes too wide. There is too much 
> distance between "DAG" and "Links" columns and it is difficult to appreciate 
> which is the icon you should click.
> It would be very useful to highlight row on mouse hover.
> This could be fixed very quickly, adding 'table-hover' css class to 
> table#dags.





[jira] [Resolved] (AIRFLOW-6697) Base date and Search in Graph view don't dim when modal activate

2020-04-14 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-6697.
-
Fix Version/s: 1.10.11
   Resolution: Fixed

> Base date and Search in Graph view don't dim when modal activate
> 
>
> Key: AIRFLOW-6697
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6697
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Affects Versions: 1.10.7
>Reporter: James Coder
>Assignee: James Coder
>Priority: Minor
> Fix For: 1.10.11
>
> Attachments: image-2020-01-31-13-46-57-069.png
>
>
> !image-2020-01-31-13-46-57-069.png|width=972,height=431!





[jira] [Updated] (AIRFLOW-4357) Tool tip offset when using RBAC

2020-04-10 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4357:

Fix Version/s: 1.10.11

> Tool tip offset when using RBAC
> ---
>
> Key: AIRFLOW-4357
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4357
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Affects Versions: 1.10.2, 1.10.3
> Environment: Fedora 29 with Python 3.5.6 from conda
>Reporter: Charles Surett
>Priority: Minor
>  Labels: rbac, web
> Fix For: 1.10.11
>
> Attachments: Expected Behavior.png, Issue.png, installed-packages.txt
>
>
> Tool tips are offset after the page has been scrolled when using the RBAC web 
> UI.
> See attached images for more details.
>  
> It seems to be related to 
> [https://github.com/twbs/bootstrap/blob/v3.3.7/js/tooltip.js#L367-L372]





[jira] [Resolved] (AIRFLOW-6515) log level of INFO/WARN when ERROR happened

2020-04-09 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-6515.
-
Fix Version/s: 1.10.11
   Resolution: Fixed

> log level of INFO/WARN when ERROR happened
> --
>
> Key: AIRFLOW-6515
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6515
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 1.10.7
>Reporter: t oo
>Assignee: Will Hudgins
>Priority: Major
> Fix For: 1.10.11
>
>
> The log level should be ERROR for some of these (but there are false positives):
> grep -iE 
> 'log\.(info|warn).*(error|exceptio|fail|unab|couldn|lost|gone|missing|not 
> fou|abort|exit|could not)' -R *
> airflow/sensors/base_sensor_operator.py:self.log.info("Success 
> criteria met. Exiting.")
> airflow/logging_config.py:log.warning('Unable to load the config, 
> contains a configuration error.')
> airflow/operators/check_operator.py:self.log.warning("The 
> following %s tests out of %s failed:", j, n)
> airflow/operators/sql_to_gcs.py:self.log.warning('Using default 
> schema due to missing name or type. Please '
> airflow/operators/bash_operator.py:self.log.info('Command exited 
> with return code %s', self.sub_process.returncode)
> airflow/serialization/serialized_objects.py:LOG.warning('Failed 
> to stringify.', exc_info=True)
> airflow/providers/amazon/aws/operators/batch.py:
> self.log.info("AWS Batch Job has failed")
> airflow/providers/amazon/aws/hooks/s3.py:
> self.log.info(e.response["Error"]["Message"])
> airflow/providers/amazon/aws/hooks/s3.py:
> self.log.info(e.response["Error"]["Message"])
> airflow/utils/dag_processing.py:self.log.info("Exiting gracefully 
> upon receiving signal %s", signum)
> airflow/utils/dag_processing.py:self.log.info("Exiting dag 
> parsing loop as all files "
> airflow/utils/dag_processing.py:self.log.info("Failing jobs 
> without heartbeat after %s", limit_dttm)
> airflow/utils/dag_processing.py:self.log.info("Waiting up to %s 
> seconds for processes to exit...", timeout)
> airflow/utils/helpers.py:log.info("Process %s (%s) terminated with 
> exit code %s", p, p.pid, p.returncode)
> airflow/models/dagrun.py:self.log.info('Marking run %s failed', 
> self)
> airflow/models/dagrun.py:self.log.info('Deadlock; marking run %s 
> failed', self)
> airflow/models/dagrun.py:self.log.warning("Failed to get 
> task '{}' for dag '{}'. "
> airflow/gcp/sensors/gcs.py:self.log.warning("FAILURE: Inactivity 
> Period passed, not enough objects found in %s", path)
> airflow/gcp/operators/spanner.py:self.log.info("The Cloud Spanner 
> database was missing: "
> airflow/gcp/hooks/kubernetes_engine.py:self.log.info('Assuming 
> Success: %s', error.message)
> airflow/gcp/hooks/kubernetes_engine.py:self.log.info('Assuming 
> Success: %s', error.message)
> airflow/gcp/hooks/cloud_memorystore.py:self.log.info("Failovering 
> Instance: %s", name)
> airflow/gcp/hooks/cloud_memorystore.py:self.log.info("Instance 
> failovered: %s", name)
> airflow/gcp/hooks/bigquery.py:self.log.info(error_msg)
> airflow/gcp/hooks/bigtable.py:self.log.info("The instance '%s' 
> does not exist in project '%s'. Exiting", instance_id,
> airflow/contrib/sensors/bash_sensor.py:self.log.info("Command 
> exited with return code %s", sp.returncode)
> airflow/contrib/sensors/ftp_sensor.py:self.log.info('Ftp 
> error encountered: %s', str(e))
> airflow/contrib/operators/azure_container_instances_operator.py:
> self.log.info("Container had exit code: %s", exit_code)
> airflow/contrib/operators/azure_container_instances_operator.py:  
>   self.log.info("Container exited with detail_status %s", detail_status)
> airflow/contrib/operators/azure_container_instances_operator.py:  
>   self.log.info("Azure provision failure")
> airflow/contrib/operators/winrm_operator.py:self.log.info("Hook 
> not found, creating...")
> airflow/contrib/operators/docker_swarm_operator.py:
> self.log.info('Service status before exiting: %s', status)
> airflow/contrib/auth/backends/ldap_auth.py:log.warning("Unable to 
> find group for %s %s", search_base, search_filter)
> airflow/contrib/auth/backends/ldap_auth.py:log.warning("""Missing 
> attribute "%s" when looked-up in Ldap database.
> airflow/contrib/auth/backends/ldap_auth.py:log.warning("Parsing error 
> when retrieving the user's group(s)."
> airflow/contrib/utils/sendgrid.py:

[jira] [Updated] (AIRFLOW-6167) Escape col name in MysqlToHive operator

2020-04-09 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-6167:

Fix Version/s: (was: 1.10.10)
   1.10.11

> Escape col name in MysqlToHive operator
> ---
>
> Key: AIRFLOW-6167
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6167
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 1.10.4
>Reporter: Ping Zhang
>Assignee: Ping Zhang
>Priority: Major
> Fix For: 1.10.11
>
>






[jira] [Updated] (AIRFLOW-2516) Deadlock found when trying to update task_instance table

2020-04-09 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-2516:

Fix Version/s: (was: 1.10.10)
   1.10.11

> Deadlock found when trying to update task_instance table
> 
>
> Key: AIRFLOW-2516
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2516
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DagRun
>Affects Versions: 1.8.0, 1.9.0, 1.10.0, 1.10.1, 1.10.2, 1.10.3, 1.10.4, 
> 1.10.5, 1.10.6, 1.10.7
>Reporter: Jeff Liu
>Assignee: Jarek Potiuk
>Priority: Major
> Fix For: 1.10.11
>
> Attachments: Screenshot 2019-12-30 at 10.42.52.png, 
> image-2019-12-30-10-48-41-313.png, image-2019-12-30-10-58-02-610.png, 
> jobs.py, jobs_fixed_deadlock_possibly_1.9.py, 
> scheduler_job_fixed_deadlock_possibly_1.10.6.py
>
>
>  
>  
> {code:java}
> [2018-05-23 17:59:57,218] {base_task_runner.py:98} INFO - Subtask: 
> [2018-05-23 17:59:57,217] {base_executor.py:49} INFO - Adding to queue: 
> airflow run production_wipeout_wipe_manager.Carat Carat_20180227 
> 2018-05-23T17:41:18.815809 --local -sd DAGS_FOLDER/wipeout/wipeout.py
> [2018-05-23 17:59:57,231] {base_task_runner.py:98} INFO - Subtask: Traceback 
> (most recent call last):
> [2018-05-23 17:59:57,232] {base_task_runner.py:98} INFO - Subtask: File 
> "/usr/local/bin/airflow", line 27, in 
> [2018-05-23 17:59:57,232] {base_task_runner.py:98} INFO - Subtask: 
> args.func(args)
> [2018-05-23 17:59:57,232] {base_task_runner.py:98} INFO - Subtask: File 
> "/usr/local/lib/python2.7/site-packages/airflow/bin/cli.py", line 392, in run
> [2018-05-23 17:59:57,232] {base_task_runner.py:98} INFO - Subtask: 
> pool=args.pool,
> [2018-05-23 17:59:57,233] {base_task_runner.py:98} INFO - Subtask: File 
> "/usr/local/lib/python2.7/site-packages/airflow/utils/db.py", line 50, in 
> wrapper
> [2018-05-23 17:59:57,233] {base_task_runner.py:98} INFO - Subtask: result = 
> func(*args, **kwargs)
> [2018-05-23 17:59:57,233] {base_task_runner.py:98} INFO - Subtask: File 
> "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1532, in 
> _run_raw_task
> [2018-05-23 17:59:57,234] {base_task_runner.py:98} INFO - Subtask: 
> self.handle_failure(e, test_mode, context)
> [2018-05-23 17:59:57,234] {base_task_runner.py:98} INFO - Subtask: File 
> "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1641, in 
> handle_failure
> [2018-05-23 17:59:57,234] {base_task_runner.py:98} INFO - Subtask: 
> session.merge(self)
> [2018-05-23 17:59:57,235] {base_task_runner.py:98} INFO - Subtask: File 
> "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 
> 1920, in merge
> [2018-05-23 17:59:57,235] {base_task_runner.py:98} INFO - Subtask: 
> _resolve_conflict_map=_resolve_conflict_map)
> [2018-05-23 17:59:57,235] {base_task_runner.py:98} INFO - Subtask: File 
> "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 
> 1974, in _merge
> [2018-05-23 17:59:57,236] {base_task_runner.py:98} INFO - Subtask: merged = 
> self.query(mapper.class_).get(key[1])
> [2018-05-23 17:59:57,236] {base_task_runner.py:98} INFO - Subtask: File 
> "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 882, 
> in get
> [2018-05-23 17:59:57,236] {base_task_runner.py:98} INFO - Subtask: ident, 
> loading.load_on_pk_identity)
> [2018-05-23 17:59:57,236] {base_task_runner.py:98} INFO - Subtask: File 
> "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 952, 
> in _get_impl
> [2018-05-23 17:59:57,237] {base_task_runner.py:98} INFO - Subtask: return 
> db_load_fn(self, primary_key_identity)
> [2018-05-23 17:59:57,237] {base_task_runner.py:98} INFO - Subtask: File 
> "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/loading.py", line 247, 
> in load_on_pk_i
> dentity
> [2018-05-23 17:59:57,237] {base_task_runner.py:98} INFO - Subtask: return 
> q.one()
> [2018-05-23 17:59:57,238] {base_task_runner.py:98} INFO - Subtask: File 
> "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2884, 
> in one
> [2018-05-23 17:59:57,238] {base_task_runner.py:98} INFO - Subtask: ret = 
> self.one_or_none()
> [2018-05-23 17:59:57,238] {base_task_runner.py:98} INFO - Subtask: File 
> "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2854, 
> in one_or_none
> [2018-05-23 17:59:57,238] {base_task_runner.py:98} INFO - Subtask: ret = 
> list(self)
> [2018-05-23 17:59:57,239] {base_task_runner.py:98} INFO - Subtask: File 
> "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2925, 
> in __iter__
> [2018-05-23 17:59:57,239] {base_task_runner.py:98} INFO - Subtask: return 
> self._execute_and_instances(context)
> [2018-05-23 17:59:57,239] {base_task_runner.py:98} INFO - 

[jira] [Updated] (AIRFLOW-7046) Support Locale-formatted datetimes in Webserver

2020-04-09 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-7046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-7046:

Fix Version/s: (was: 1.10.10)
   1.10.11

> Support Locale-formatted datetimes in Webserver
> ---
>
> Key: AIRFLOW-7046
> URL: https://issues.apache.org/jira/browse/AIRFLOW-7046
> Project: Apache Airflow
>  Issue Type: Sub-task
>  Components: ui, webserver
>Affects Versions: 1.10.10
>Reporter: Kaxil Naik
>Assignee: Samantha Black
>Priority: Minor
> Fix For: 1.10.11
>
>
> Support human-readable time in local language





[jira] [Reopened] (AIRFLOW-6914) Add a default robots.txt to deny all search engines

2020-04-07 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik reopened AIRFLOW-6914:
-

> Add a default robots.txt to deny all search engines
> ---
>
> Key: AIRFLOW-6914
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6914
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: security, ui
>Affects Versions: 1.10.6, 1.10.7, 1.10.8, 1.10.9
>Reporter: Kaxil Naik
>Priority: Major
>  Labels: gsoc
> Fix For: 1.10.11
>
>
> If the Airflow UI is public, Google can index it, and if authentication 
> has not been enabled this is a serious security threat on a production cluster.
> Something like this probably should work
> {code:python}
> @app.route('/robots.txt', methods=['GET'])
> def robotstxt():
>     return send_from_directory(os.path.join(app.root_path, 'static', 'txt'),
>                                'robots.txt')
> {code}
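The effect of the proposed deny-all file can be checked with the standard library's `robotparser`. The sketch below assumes a hypothetical `ROBOTS_TXT` content; the actual file shipped by the fix may differ:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical deny-all content for static/txt/robots.txt (an assumption,
# not necessarily the exact file added by the fix):
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Under this policy no crawler may fetch any path.
print(parser.can_fetch("Googlebot", "/admin/"))  # False
print(parser.can_fetch("*", "/"))                # False
```

`Disallow: /` with `User-agent: *` blocks every well-behaved crawler from the whole site, which is the behaviour the issue asks for by default.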





[jira] [Resolved] (AIRFLOW-6914) Add a default robots.txt to deny all search engines

2020-04-07 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-6914.
-
Fix Version/s: 1.10.11
   Resolution: Fixed

> Add a default robots.txt to deny all search engines
> ---
>
> Key: AIRFLOW-6914
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6914
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: security, ui
>Affects Versions: 1.10.6, 1.10.7, 1.10.8, 1.10.9
>Reporter: Kaxil Naik
>Priority: Major
>  Labels: gsoc
> Fix For: 1.10.11
>
>
> If the Airflow UI is public, Google can index it, and if authentication 
> has not been enabled this is a serious security threat on a production cluster.
> Something like this probably should work
> {code:python}
> @app.route('/robots.txt', methods=['GET'])
> def robotstxt():
>     return send_from_directory(os.path.join(app.root_path, 'static', 'txt'),
>                                'robots.txt')
> {code}





[jira] [Resolved] (AIRFLOW-6822) AWS hooks dont always cache the boto3 client

2020-04-04 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-6822.
-
Fix Version/s: 2.0.0
   Resolution: Fixed

> AWS hooks dont always cache the boto3 client
> 
>
> Key: AIRFLOW-6822
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6822
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: aws
>Affects Versions: 1.10.9
>Reporter: Bjorn Olsen
>Assignee: Bjorn Olsen
>Priority: Minor
> Fix For: 2.0.0
>
>
> Implementations of the Amazon AWS hooks (e.g. the S3 hook, Glue hook, etc.) 
> vary in how they call the underlying aws_hook.get_client_type(X) method.
> Most of the time the returned client is cached by the superclass, 
> but not always. The client should always be cached for performance reasons: 
> creating a client is a time-consuming process.
> Example of how to do it (athena.py):
>  
> {code:python}
> def get_conn(self):
>     """
>     Check if an AWS conn exists already, or create one and return it.
>
>     :return: boto3 session
>     """
>     if not self.conn:
>         self.conn = self.get_client_type('athena')
>     return self.conn
> {code}
>  
>  
> Example of how not to do it: (s3.py):
>  
> {code:python}
> def get_conn(self):
>     return self.get_client_type('s3')
> {code}
>  
>  
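The caching pattern the issue asks for can be sketched independently of boto3. The `CachedClientHook` class and `client_factory` parameter below are hypothetical names used only to illustrate the idea:

```python
class CachedClientHook:
    """Sketch of the desired pattern: create the expensive client once,
    then reuse it on every later get_conn() call."""

    def __init__(self, client_factory):
        self._client_factory = client_factory  # stands in for get_client_type
        self._conn = None

    def get_conn(self):
        # The expensive factory call happens at most once per hook instance.
        if self._conn is None:
            self._conn = self._client_factory()
        return self._conn


calls = []
hook = CachedClientHook(lambda: calls.append(1) or object())
first, second = hook.get_conn(), hook.get_conn()
print(first is second, len(calls))  # True 1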





[jira] [Assigned] (AIRFLOW-7049) Make show_paused persistent across navigation

2020-04-04 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik reassigned AIRFLOW-7049:
---

Assignee: (was: Ryan McKinley)

> Make show_paused persistent across navigation
> -
>
> Key: AIRFLOW-7049
> URL: https://issues.apache.org/jira/browse/AIRFLOW-7049
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Affects Versions: 1.10.9
>Reporter: Ash Berlin-Taylor
>Priority: Minor
>  Labels: gsoc
>
> The "Show/hide paused DAGs" toggle controls the setting for a page, but if 
> you navigate to a different page and then click on the "home" link the 
> setting will be lost.
> We should persist this as a per-user setting.
> A cookie might work for this -- we could also possibly store the "current" 
> value in the session.
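The per-user persistence suggested above can be sketched with a plain dict standing in for the session; the helper name and the `showPaused` parameter are assumptions for illustration:

```python
def resolve_show_paused(args, session, default=True):
    """Resolve the effective show_paused value: an explicit query argument
    wins and is remembered in the session; otherwise fall back to the
    remembered value, then to the default."""
    if 'showPaused' in args:
        session['show_paused'] = args['showPaused'] == 'True'
    return session.get('show_paused', default)


session = {}
print(resolve_show_paused({'showPaused': 'False'}, session))  # False
print(resolve_show_paused({}, session))  # False: survives navigation
```

Because the session outlives the page, the toggle is still honoured after navigating away and back to the home page.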





[jira] [Work stopped] (AIRFLOW-7049) Make show_paused persistent across navigation

2020-04-04 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-7049 stopped by Kaxil Naik.
---
> Make show_paused persistent across navigation
> -
>
> Key: AIRFLOW-7049
> URL: https://issues.apache.org/jira/browse/AIRFLOW-7049
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Affects Versions: 1.10.9
>Reporter: Ash Berlin-Taylor
>Assignee: Ryan McKinley
>Priority: Minor
>  Labels: gsoc
>
> The "Show/hide paused DAGs" toggle controls the setting for a page, but if 
> you navigate to a different page and then click on the "home" link the 
> setting will be lost.
> We should persist this on a per-user setting.
> A cookie might work for this -- we could also possibly store the "current" 
> value in the session.





[jira] [Assigned] (AIRFLOW-7049) Make show_paused persistent across navigation

2020-04-04 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik reassigned AIRFLOW-7049:
---

Assignee: Ryan McKinley  (was: Kaxil Naik)

> Make show_paused persistent across navigation
> -
>
> Key: AIRFLOW-7049
> URL: https://issues.apache.org/jira/browse/AIRFLOW-7049
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Affects Versions: 1.10.9
>Reporter: Ash Berlin-Taylor
>Assignee: Ryan McKinley
>Priority: Minor
>  Labels: gsoc
>
> The "Show/hide paused DAGs" toggle controls the setting for a page, but if 
> you navigate to a different page and then click on the "home" link the 
> setting will be lost.
> We should persist this as a per-user setting.
> A cookie might work for this -- we could also possibly store the "current" 
> value in the session.





[jira] [Work started] (AIRFLOW-7049) Make show_paused persistent across navigation

2020-04-04 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-7049 started by Kaxil Naik.
---
> Make show_paused persistent across navigation
> -
>
> Key: AIRFLOW-7049
> URL: https://issues.apache.org/jira/browse/AIRFLOW-7049
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Affects Versions: 1.10.9
>Reporter: Ash Berlin-Taylor
>Assignee: Kaxil Naik
>Priority: Minor
>  Labels: gsoc
>
> The "Show/hide paused DAGs" toggle controls the setting for a page, but if 
> you navigate to a different page and then click on the "home" link the 
> setting will be lost.
> We should persist this as a per-user setting.
> A cookie might work for this -- we could also possibly store the "current" 
> value in the session.





[jira] [Resolved] (AIRFLOW-4529) Support for Azure Batch

2020-04-03 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-4529.
-
Fix Version/s: 2.0.0
   Resolution: Fixed

> Support for Azure Batch
> ---
>
> Key: AIRFLOW-4529
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4529
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib, operators
>Reporter: Christian Lellmann
>Assignee: Ephraim E Anierobi
>Priority: Major
> Fix For: 2.0.0
>
>
> Add an operator to support Azure Batch jobs, with monitoring similar to the 
> AWSBatchOperator.
> The operator must be able to create a pool and a job and assign tasks to that 
> job. The infrastructure must be definable, as must the Docker containers that 
> run on it.
> Periodic polling of task status must be implemented to recognize the 
> completion or failure of tasks.





[jira] [Resolved] (AIRFLOW-6885) Add option to only delete KubernetesExecutor pods on successful task completion

2020-04-03 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-6885.
-
Resolution: Fixed

Marking it for 2.0.0, but we will release it in 1.10.11

> Add option to only delete KubernetesExecutor pods on successful task 
> completion
> ---
>
> Key: AIRFLOW-6885
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6885
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: executor-kubernetes
>Affects Versions: 1.10.9
>Reporter: Daniel Imberman
>Assignee: Daniel Imberman
>Priority: Minor
> Fix For: 2.0.0
>
>






[jira] [Resolved] (AIRFLOW-6836) DebugExecutor failing to change task state.

2020-04-02 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-6836.
-
Fix Version/s: (was: 2.0.0)
   1.10.10
   Resolution: Fixed

> DebugExecutor failing to change task state.
> ---
>
> Key: AIRFLOW-6836
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6836
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executors
>Affects Versions: 1.10.9
>Reporter: Sam Wheating
>Assignee: Sam Wheating
>Priority: Minor
>  Labels: easyfix
> Fix For: 1.10.10
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Running a DAG locally with the DebugExecutor fails with the following error:
>  
> {noformat}
>   File 
> "/Users/samwheating/.pyenv/versions/3.7.3/lib/python3.7/site-packages/airflow/executors/debug_executor.py",
>  line 148, in change_state
>  self.running.remove(key)
>  AttributeError: 'dict' object has no attribute 'remove'
> {noformat}
> This seems to be because the change_state function expects `self.running` to 
> be a set, but it is a dict. This should be updated to use `del 
> self.running[key]` or `self.running.pop(key)`.
> I'll submit a PR for this tomorrow. 
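The proposed fix can be illustrated on a plain dict; the key and value below are made up for the example:

```python
# In 1.10.9 self.running is a dict keyed by task-instance key, so the
# set-style remove() raises AttributeError. dict.pop(key, None) removes
# the entry and also tolerates a missing key.
running = {('example_dag', 'example_task'): object()}
key = ('example_dag', 'example_task')

# running.remove(key)  # AttributeError: 'dict' object has no attribute 'remove'
running.pop(key, None)
print(key in running)  # False
```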





[jira] [Updated] (AIRFLOW-6959) Use NULL as dag.description default value and change related UI

2020-04-02 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-6959:

Fix Version/s: (was: 1.10.10)
   2.0.0

> Use NULL as dag.description default value and change related UI
> ---
>
> Key: AIRFLOW-6959
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6959
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: DAG, database, ui
>Affects Versions: 1.10.9
>Reporter: zhongjiajie
>Assignee: zhongjiajie
>Priority: Minor
> Fix For: 2.0.0
>
>






[jira] [Resolved] (AIRFLOW-5800) Add a default connection entry for PinotDbApiHook

2020-04-02 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-5800.
-
Fix Version/s: 2.0.0
   Resolution: Fixed

> Add a default connection entry for PinotDbApiHook
> -
>
> Key: AIRFLOW-5800
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5800
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: database
>Affects Versions: 1.10.5
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Minor
> Fix For: 2.0.0
>
>
> Airflow provides PinotDbApiHook, but its default connection doesn't exist in 
> the DB. It'd be convenient for testing purposes, or as an example, if 
> {{pinot_broker_default}} were defined in the DB with commonly used parameters, 
> just as for other hooks.





[jira] [Updated] (AIRFLOW-6959) Use NULL as dag.description default value and change related UI

2020-04-02 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-6959:

Fix Version/s: (was: 2.0.0)
   1.10.10

> Use NULL as dag.description default value and change related UI
> ---
>
> Key: AIRFLOW-6959
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6959
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: DAG, database, ui
>Affects Versions: 1.10.9
>Reporter: zhongjiajie
>Assignee: zhongjiajie
>Priority: Minor
> Fix For: 1.10.10
>
>






[jira] [Commented] (AIRFLOW-6797) Create policy hooks for DAGs

2020-04-02 Thread Kaxil Naik (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073741#comment-17073741
 ] 

Kaxil Naik commented on AIRFLOW-6797:
-

Yup, I have a few different bits planned for DAG policies that I will 
hopefully create this month

> Create policy hooks for DAGs
> 
>
> Key: AIRFLOW-6797
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6797
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: scheduler
>Affects Versions: 1.10.9
>Reporter: Matthew Bruce
>Assignee: Kaxil Naik
>Priority: Minor
>
> Policy hooks exist to modify task objects just before they are run:
> [https://airflow.apache.org/docs/stable/concepts.html?highlight=policy#cluster-policy]
>  
> Similar functionality for DAGs at loading time so that they could be rejected 
> or modified would be useful (i.e. to validate DAG naming, etc.)
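By analogy with the existing task-level cluster policy, a DAG-loading policy could look like the sketch below. The `dag_policy` hook name and the `Dag` stub are assumptions for illustration only, not an API that Airflow provides:

```python
class Dag:
    """Minimal stand-in for airflow.models.DAG, for illustration only."""
    def __init__(self, dag_id):
        self.dag_id = dag_id


def dag_policy(dag):
    """Hypothetical policy hook run at DAG load time: reject DAGs that
    violate a naming convention, mirroring the task cluster policy."""
    if not dag.dag_id.startswith('team_'):
        raise ValueError(f"DAG '{dag.dag_id}' violates the naming policy")


dag_policy(Dag('team_etl'))  # accepted: no exception raised
try:
    dag_policy(Dag('adhoc_job'))
except ValueError as err:
    print(err)  # DAG 'adhoc_job' violates the naming policy
```

The scheduler would call such a hook for each DAG as it is parsed, either rejecting the DAG (by raising) or mutating it in place before it is registered.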





[jira] [Resolved] (AIRFLOW-7045) RenderedTIFields: Update Sqlalchemy query to support all backends

2020-04-01 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-7045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-7045.
-
Resolution: Fixed

> RenderedTIFields: Update Sqlalchemy query to support all backends
> -
>
> Key: AIRFLOW-7045
> URL: https://issues.apache.org/jira/browse/AIRFLOW-7045
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: serialization
>Affects Versions: 1.10.10
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Major
>  Labels: dag-serialization
> Fix For: 1.10.10
>
>
> https://github.com/apache/airflow/pull/6788#discussion_r391268396





[jira] [Work started] (AIRFLOW-7045) RenderedTIFields: Update Sqlalchemy query to support all backends

2020-04-01 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-7045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-7045 started by Kaxil Naik.
---
> RenderedTIFields: Update Sqlalchemy query to support all backends
> -
>
> Key: AIRFLOW-7045
> URL: https://issues.apache.org/jira/browse/AIRFLOW-7045
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: serialization
>Affects Versions: 1.10.10
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Major
>  Labels: dag-serialization
> Fix For: 1.10.10
>
>
> https://github.com/apache/airflow/pull/6788#discussion_r391268396





[jira] [Updated] (AIRFLOW-7045) RenderedTIFields: Update Sqlalchemy query to support all backends

2020-04-01 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-7045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-7045:

Summary: RenderedTIFields: Update Sqlalchemy query to support all backends  
(was: Update Sqlalchemy query to support all backends)

> RenderedTIFields: Update Sqlalchemy query to support all backends
> -
>
> Key: AIRFLOW-7045
> URL: https://issues.apache.org/jira/browse/AIRFLOW-7045
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: serialization
>Affects Versions: 1.10.10
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Major
>  Labels: dag-serialization
> Fix For: 1.10.10
>
>
> https://github.com/apache/airflow/pull/6788#discussion_r391268396





[jira] [Resolved] (AIRFLOW-6871) Tree view unusable for large DAGs

2020-04-01 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-6871.
-
Fix Version/s: 1.10.10
   Resolution: Fixed

> Tree view unusable for large DAGs
> -
>
> Key: AIRFLOW-6871
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6871
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Affects Versions: 2.0.0
>Reporter: QP Hou
>Assignee: QP Hou
>Priority: Major
> Fix For: 1.10.10
>
>
> By default Airflow loads 25 runs in the tree view. For our main DAG, it's 
> just not usable, because all we get is a 5xx error when hitting that page. 
> Manually overriding the number of runs to 15 makes the page load again, but 
> it's very slow and takes more than 1 minute to render.





[jira] [Updated] (AIRFLOW-3607) Decreasing scheduler delay between tasks

2020-04-01 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-3607:

Fix Version/s: (was: 1.10.10)
   2.0.0

> Decreasing scheduler delay between tasks
> 
>
> Key: AIRFLOW-3607
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3607
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 1.10.0, 1.10.1, 1.10.2
> Environment: ubuntu 14.04
>Reporter: Amichai Horvitz
>Assignee: Amichai Horvitz
>Priority: Major
> Fix For: 2.0.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> I came across the TODO in airflow/ti_deps/deps/trigger_rule_dep (line 52) 
> that says that instead of checking the query for every task, the tasks should 
> report to the dagrun. I have a DAG with many tasks, and the delay between 
> tasks can rise to 10 seconds or more. I have already changed the 
> configuration, added processes and memory, checked the code, and done 
> research, profiling, and other experiments. I hope this change will 
> drastically reduce the delay. 
> I would be happy to discuss this solution, the research, and other solutions 
> for this issue.  
> Thanks





[jira] [Updated] (AIRFLOW-6685) Add ThresholdCheckOperator for Data Quality Checking

2020-04-01 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-6685:

Fix Version/s: (was: 2.0.0)
   1.10.10

> Add ThresholdCheckOperator for Data Quality Checking 
> -
>
> Key: AIRFLOW-6685
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6685
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: operators
>Affects Versions: 2.0.0
>Reporter: alex l
>Assignee: alex l
>Priority: Major
> Fix For: 1.10.10
>
>
> This PR adds a new operator to *{{CheckOperator}}* that allows users to 
> perform a threshold data quality check.
> *{{ThresholdCheckOperator}}* will check a single-value SQL result against a 
> threshold range, and will fail the task if it is outside this range. The lower 
> and upper bounds of the threshold can be defined either as numeric values 
> or as SQL statements that return a numeric value.
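The core check the operator performs reduces to a range test; a minimal sketch follows (the function name is assumed, and the SQL evaluation is replaced by plain numbers):

```python
def threshold_check(value, lower_bound, upper_bound):
    """Pass only when the checked value lies inside
    [lower_bound, upper_bound]; the real operator would first run SQL
    statements to obtain any of the three numbers."""
    return lower_bound <= value <= upper_bound


print(threshold_check(42, 0, 100))   # True: within range, task succeeds
print(threshold_check(150, 0, 100))  # False: out of range, task fails
```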





[jira] [Resolved] (AIRFLOW-5907) Add S3 to MySql Operator

2020-04-01 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-5907.
-
Fix Version/s: 2.0.0
   Resolution: Fixed

> Add S3 to MySql Operator
> 
>
> Key: AIRFLOW-5907
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5907
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: operators
>Affects Versions: 2.0.0, 1.10.6
>Reporter: Felix Uellendall
>Assignee: Felix Uellendall
>Priority: Major
> Fix For: 2.0.0
>
>






[jira] [Updated] (AIRFLOW-5391) Clearing a task skipped by BranchPythonOperator will cause the task to execute

2020-03-31 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-5391:

Fix Version/s: (was: 1.10.10)
   2.0.0

> Clearing a task skipped by BranchPythonOperator will cause the task to execute
> --
>
> Key: AIRFLOW-5391
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5391
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 1.10.4
>Reporter: Qian Yu
>Assignee: Qian Yu
>Priority: Major
> Fix For: 2.0.0
>
>
> I tried this on 1.10.3 and 1.10.4; both have this issue.
> E.g. in this example from the docs, branch_a executed and branch_false was 
> skipped because of the branching condition. However, if someone clears 
> branch_false, it'll cause branch_false to execute. 
> !https://airflow.apache.org/_images/branch_good.png!
> This behaviour is understandable given how BranchPythonOperator is 
> implemented. BranchPythonOperator does not store its decision anywhere; it 
> skips its own downstream tasks in the branch at runtime. So there's currently 
> no way for branch_false to know it should be skipped without rerunning the 
> branching task.
> This is obviously counter-intuitive from the user's perspective. In this 
> example, users would not expect branch_false to execute when they clear it, 
> because the branching task should have skipped it.
> There are a few ways to improve this:
> Option 1: Make downstream tasks skipped by BranchPythonOperator not 
> clearable without also clearing the upstream BranchPythonOperator. In this 
> example, if someone clears branch_false without clearing branching, the Clear 
> action should fail with an error telling the user they need to clear the 
> branching task as well.
> Option 2: Make BranchPythonOperator store the result of its skip condition 
> somewhere. Make downstream tasks check for this stored decision and skip 
> themselves if they should have been skipped by the condition. This probably 
> means the decision of BranchPythonOperator needs to be stored in the db.
>  
> [kevcampb|https://blog.diffractive.io/author/kevcampb/] attempted a 
> workaround on his blog, and acknowledged that the workaround is not 
> perfect and a better permanent fix is needed:
> [https://blog.diffractive.io/2018/08/07/replacement-shortcircuitoperator-for-airflow/]
>  
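Option 2 could be sketched roughly like this; `decision_store` is a hypothetical stand-in for a db-backed store (in Airflow, e.g. an XCom or a dedicated table), not an actual API.

```python
# Hypothetical store for branch decisions, keyed by dag run id.
decision_store = {}

def branch(dag_run_id, followed_task_id):
    # The BranchPythonOperator would persist which branch it followed.
    decision_store[dag_run_id] = followed_task_id

def should_run(dag_run_id, task_id):
    # A cleared downstream task re-checks the stored decision instead of
    # blindly executing.
    return decision_store.get(dag_run_id) == task_id

branch("run_1", "branch_a")
```

With the decision persisted, a cleared branch_false could see it was never chosen and re-skip itself.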





[jira] [Updated] (AIRFLOW-1467) allow tasks to use more than one pool slot

2020-03-31 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-1467:

Fix Version/s: (was: 2.0.0)
   1.10.10

> allow tasks to use more than one pool slot
> --
>
> Key: AIRFLOW-1467
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1467
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Adrian Bridgett
>Assignee: Lokesh Lal
>Priority: Trivial
>  Labels: pool
> Fix For: 1.10.10
>
>
> It would be useful to have tasks use more than a single pool slot. 
> Our use case is actually to limit how many tasks run on a head node (due to 
> memory constraints); currently we have to set a pool limit that caps how many 
> tasks run concurrently.
> Ideally we could set the pool size to, e.g., the amount of memory, and then set 
> those tasks' pool_usage (or whatever the option would be called) to the amount 
> of memory we think they'll use.  This way the pool would let lots of small 
> tasks run, or just a few large tasks.
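The memory-sized-pool idea above can be sketched with simple slot arithmetic; the function and names below are illustrative, not the eventual Airflow option names.

```python
# A hedged sketch of the proposal: size the pool in memory units and let
# each task consume several slots instead of exactly one.
def can_schedule(pool_size, running_slot_counts, task_slots):
    """Return True if a task needing task_slots fits in the remaining pool."""
    used = sum(running_slot_counts)
    return used + task_slots <= pool_size
```

For example, a 16-unit pool with two 4-unit tasks running still admits an 8-unit task, but not a second 8-unit task.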





[jira] [Updated] (AIRFLOW-1467) allow tasks to use more than one pool slot

2020-03-31 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-1467:

Fix Version/s: (was: 1.10.10)
   2.0.0

> allow tasks to use more than one pool slot
> --
>
> Key: AIRFLOW-1467
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1467
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Adrian Bridgett
>Assignee: Lokesh Lal
>Priority: Trivial
>  Labels: pool
> Fix For: 2.0.0
>
>
> It would be useful to have tasks use more than a single pool slot. 
> Our use case is actually to limit how many tasks run on a head node (due to 
> memory constraints); currently we have to set a pool limit that caps how many 
> tasks run concurrently.
> Ideally we could set the pool size to, e.g., the amount of memory, and then set 
> those tasks' pool_usage (or whatever the option would be called) to the amount 
> of memory we think they'll use.  This way the pool would let lots of small 
> tasks run, or just a few large tasks.





[jira] [Updated] (AIRFLOW-6872) Git Version is always Not Available in UI

2020-03-31 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-6872:

Fix Version/s: (was: 1.10.10)
   2.0.0

> Git Version is always Not Available in UI
> -
>
> Key: AIRFLOW-6872
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6872
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.10.7, 1.10.8, 1.10.9
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Major
> Fix For: 2.0.0
>
>
> git_version is always "Not Available" in the webserver. This is because the 
> file (git_version) is missing from the final built package.





[jira] [Updated] (AIRFLOW-7017) Respect default dag view on redirect after trigger

2020-03-31 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-7017:

Fix Version/s: (was: 1.10.10)
   2.0.0

> Respect default dag view on redirect after trigger
> --
>
> Key: AIRFLOW-7017
> URL: https://issues.apache.org/jira/browse/AIRFLOW-7017
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Affects Versions: 1.10.9
>Reporter: Joshua Carp
>Priority: Trivial
> Fix For: 2.0.0
>
>
> Triggering a dag from the dag detail page always redirects to the dag tree 
> view. This redirect should respect the default dag view if configured.





[jira] [Updated] (AIRFLOW-7017) Respect default dag view on redirect after trigger

2020-03-31 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-7017:

Fix Version/s: (was: 2.0.0)
   1.10.10

> Respect default dag view on redirect after trigger
> --
>
> Key: AIRFLOW-7017
> URL: https://issues.apache.org/jira/browse/AIRFLOW-7017
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Affects Versions: 1.10.9
>Reporter: Joshua Carp
>Priority: Trivial
> Fix For: 1.10.10
>
>
> Triggering a dag from the dag detail page always redirects to the dag tree 
> view. This redirect should respect the default dag view if configured.





[jira] [Updated] (AIRFLOW-6562) mushroom cloud error when clicking 'mark failed/success' from graph view of dag that has never been run yet

2020-03-31 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-6562:

Fix Version/s: (was: 1.10.10)
   2.0.0

> mushroom cloud error when clicking 'mark failed/success' from graph view of 
> dag that has never been run yet
> ---
>
> Key: AIRFLOW-6562
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6562
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Affects Versions: 1.10.6
> Environment: localexec, mysql metastore, 1.10.6
>Reporter: t oo
>Assignee: t oo
>Priority: Major
> Fix For: 2.0.0
>
>
> # create a new dag
>  # go to graph view
>  # click on one of the tasks (it should have a white border)
>  # click on 'past/future' on either 2nd last row (mark failed) or last row 
> (mark success)
>  # then click either (mark failed) or (mark success)
> The error below appears:
> Traceback (most recent call last):
>  File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 2446, in 
> wsgi_app
>  response = self.full_dispatch_request()
>  File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1951, in 
> full_dispatch_request
>  rv = self.handle_user_exception(e)
>  File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1820, in 
> handle_user_exception
>  reraise(exc_type, exc_value, tb)
>  File "/usr/local/lib/python3.7/site-packages/flask/_compat.py", line 39, in 
> reraise
>  raise value
>  File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1949, in 
> full_dispatch_request
>  rv = self.dispatch_request()
>  File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1935, in 
> dispatch_request
>  return self.view_functions[rule.endpoint](**req.view_args)
>  File "/usr/local/lib/python3.7/site-packages/flask_admin/base.py", line 69, 
> in inner
>  return self._run_view(f, *args, **kwargs)
>  File "/usr/local/lib/python3.7/site-packages/flask_admin/base.py", line 368, 
> in _run_view
>  return fn(self, *args, **kwargs)
>  File "/usr/local/lib/python3.7/site-packages/flask_login/utils.py", line 
> 258, in decorated_view
>  return func(*args, **kwargs)
>  File "/usr/local/lib/python3.7/site-packages/airflow/www/utils.py", line 
> 290, in wrapper
>  return f(*args, **kwargs)
>  File "/usr/local/lib/python3.7/site-packages/airflow/www/utils.py", line 
> 337, in wrapper
>  return f(*args, **kwargs)
>  File "/usr/local/lib/python3.7/site-packages/airflow/www/views.py", line 
> 1449, in failed
>  future, past, State.FAILED)
>  File "/usr/local/lib/python3.7/site-packages/airflow/www/views.py", line 
> 1420, in _mark_task_instance_state
>  commit=False)
>  File "/usr/local/lib/python3.7/site-packages/airflow/utils/db.py", line 74, 
> in wrapper
>  return func(*args, **kwargs)
>  File 
> "/usr/local/lib/python3.7/site-packages/airflow/api/common/experimental/mark_tasks.py",
>  line 105, in set_state
>  dates = get_execution_dates(dag, execution_date, future, past)
>  File 
> "/usr/local/lib/python3.7/site-packages/airflow/api/common/experimental/mark_tasks.py",
>  line 246, in get_execution_dates
>  raise ValueError("Received non-localized date {}".format(execution_date))
> ValueError: Received non-localized date 2020-01-14T21:58:44.855743+00:00
>  
>  
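The `ValueError` at the bottom of the traceback comes from a timezone check on the execution date; a hedged sketch of that validation, not the actual Airflow implementation:

```python
from datetime import datetime, timezone

# Illustrative version of the check in mark_tasks.get_execution_dates:
# execution dates must be timezone-aware ("localized"), i.e. carry tzinfo.
def get_execution_dates(execution_date):
    if execution_date.tzinfo is None:
        raise ValueError("Received non-localized date {}".format(execution_date))
    return [execution_date]

# A timezone-aware date passes the check.
get_execution_dates(datetime(2020, 1, 14, 21, 58, tzinfo=timezone.utc))
```

Since the DAG has never run, the view builds a naive date, which trips this check.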





[jira] [Updated] (AIRFLOW-6994) SparkSubmitOperator re launches spark driver even when original driver still running

2020-03-31 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-6994:

Fix Version/s: (was: 1.10.10)
   2.0.0

> SparkSubmitOperator re launches spark driver even when original driver still 
> running
> 
>
> Key: AIRFLOW-6994
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6994
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.10.8, 1.10.9
>Reporter: t oo
>Assignee: t oo
>Priority: Major
> Fix For: 2.0.0
>
>
> https://issues.apache.org/jira/browse/AIRFLOW-6229 introduced a bug.
> Due to a temporary network blip in the connection to Spark, the state goes to 
> UNKNOWN (as no tags are found in the curl response) and forces a retry.
> The fix in spark_submit_hook.py:
>   
> {code:python}
> def _process_spark_status_log(self, itr):
>     """
>     Parses the logs of the spark driver status query process.
>
>     :param itr: An iterator which iterates over the input of the subprocess
>     """
>     response_found = False
>     driver_found = False
>     # Consume the iterator
>     for line in itr:
>         line = line.strip()
>         if "submissionId" in line:
>             response_found = True
>
>         # Check if the log line is about the driver status and extract the status.
>         if "driverState" in line:
>             self._driver_status = line.split(' : ')[1] \
>                 .replace(',', '').replace('\"', '').strip()
>             driver_found = True
>             self.log.debug("spark driver status log: {}".format(line))
>
>     if response_found and not driver_found:
>         self._driver_status = "UNKNOWN"
> {code}





[jira] [Resolved] (AIRFLOW-5807) Move SFTP from contrib to core

2020-03-30 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-5807.
-
Resolution: Fixed

> Move SFTP from contrib to core
> --
>
> Key: AIRFLOW-5807
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5807
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib, hooks, operators
>Affects Versions: 1.10.5
>Reporter: Tobiasz Kedzierski
>Assignee: Tobiasz Kedzierski
>Priority: Minor
> Fix For: 2.0.0
>
>






[jira] [Updated] (AIRFLOW-6167) Escape col name in MysqlToHive operator

2020-03-30 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-6167:

Fix Version/s: (was: 2.0.0)
   1.10.10

> Escape col name in MysqlToHive operator
> ---
>
> Key: AIRFLOW-6167
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6167
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 1.10.4
>Reporter: Ping Zhang
>Assignee: Ping Zhang
>Priority: Major
> Fix For: 1.10.10
>
>






[jira] [Updated] (AIRFLOW-6542) sparkKubernetes operator for https://github.com/GoogleCloudPlatform/spark-on-k8s-operator

2020-03-30 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-6542:

Fix Version/s: (was: 1.10.10)
   2.0.0

> sparkKubernetes operator for 
> https://github.com/GoogleCloudPlatform/spark-on-k8s-operator
> -
>
> Key: AIRFLOW-6542
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6542
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Affects Versions: 1.10.7
>Reporter: Roi Teveth
>Priority: Minor
> Fix For: 2.0.0
>
>
> Hi, 
> We are working on a Spark-on-Kubernetes POC using the Google Cloud Platform 
> spark-on-k8s-operator 
> [https://github.com/GoogleCloudPlatform/spark-on-k8s-operator] and haven't 
> found a native Airflow integration for it, so we wrote one: 
> kubernetes_hook, which creates and gets Kubernetes CRD objects
> spark_kubernetes_operator, which sends a SparkApplication CRD to the 
> Kubernetes cluster
> spark_kubernetes_sensor, which pokes the SparkApplication state
> Operator example with a spark-pi 
> application: [https://github.com/roitvt/airflow-spark-on-k8s-operator]
> I'll be glad to contribute our operator to Airflow contrib.
> Thanks
>  
>  





[jira] [Updated] (AIRFLOW-6860) Default ignore_first_depends_on_past to True

2020-03-30 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-6860:

Fix Version/s: (was: 1.10.10)

> Default ignore_first_depends_on_past to True
> 
>
> Key: AIRFLOW-6860
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6860
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: cli
>Affects Versions: 1.10.9
>Reporter: Ping Zhang
>Assignee: Ping Zhang
>Priority: Minor
> Fix For: 2.0.0
>
>
> To avoid the error:
> BackfillJob is deadlocked. Some of the deadlocked tasks were unable to run 
> because of "depends_on_past" relationships.





[jira] [Updated] (AIRFLOW-6296) add mssql odbc hook

2020-03-30 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-6296:

Fix Version/s: (was: 1.10.10)
   2.0.0

> add mssql odbc hook
> ---
>
> Key: AIRFLOW-6296
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6296
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks
>Affects Versions: 1.10.7
>Reporter: Daniel Standish
>Assignee: Daniel Standish
>Priority: Major
> Fix For: 2.0.0
>
>






[jira] [Resolved] (AIRFLOW-5825) SageMakerEndpointOperator is not idempotent

2020-03-27 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-5825.
-
Fix Version/s: 2.0.0
   Resolution: Fixed

> SageMakerEndpointOperator is not idempotent
> ---
>
> Key: AIRFLOW-5825
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5825
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws
>Affects Versions: 2.0.0, 1.10.7
>Reporter: Bas Harenslak
>Assignee: Omair Khan
>Priority: Major
>  Labels: gsoc, gsoc2020, mentor
> Fix For: 2.0.0
>
>
> The SageMakerEndpointOperator currently takes an argument "operation" with 
> value "create"/"update", which determines whether to create or update a 
> SageMaker endpoint. However, this doesn't work in the following situation:
>  * DAG run #1 creates the endpoint (have to provide operation="create" here)
>  * Following DAG runs will update the endpoint created by DAG run #1 (would 
> have to edit the DAG and set operation="update" here)
> This should be a very valid use case IMO.
> The SageMakerEndpointOperator should check itself whether an endpoint with 
> name X already exists and overwrite it (configurable as desired by the user).
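The requested idempotent behaviour could be sketched like this; `existing` is a hypothetical stand-in for a SageMaker describe-endpoint lookup, and the function name is illustrative, not the operator's API.

```python
# A hedged sketch: decide create vs update by checking whether the endpoint
# already exists, instead of a user-supplied operation="create"/"update".
def deploy_endpoint(name, existing):
    if name in existing:
        return "update"  # endpoint already exists: update it in place
    existing.add(name)
    return "create"      # first DAG run: create the endpoint
```

With this shape, the same DAG code works unchanged for run #1 and every later run.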





[jira] [Updated] (AIRFLOW-2906) DataDog Integration for Airflow

2020-03-27 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-2906:

Fix Version/s: (was: 1.10.10)
   2.0.0

> DataDog Integration for Airflow
> ---
>
> Key: AIRFLOW-2906
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2906
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: utils
>Affects Versions: 1.8.0
>Reporter: Austin Hsu
>Assignee: Chandu Kavar
>Priority: Minor
>  Labels: metrics
> Fix For: 2.0.0
>
>
> Add functionality to Airflow to enable sending metrics to DataDog.  
> DataDog provides support for tags, which allows us to aggregate data more 
> easily and visualize it.  We can utilize the [Datadog python 
> library|https://github.com/DataDog/datadogpy] and the [Datadog 
> ThreadStats 
> module|https://datadogpy.readthedocs.io/en/latest/#datadog-threadstats-module]
>  to send metrics directly to DataDog without needing to spin up an agent to 
> forward the metrics.  The current implementation in 1.8 uses the statsd 
> library to send the metrics, which gives us much less control to 
> filter our data.
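The aggregation benefit of tags can be illustrated with a small sketch; this is not the datadogpy API, just a model of metrics keyed by (name, tags) rather than by an encoded metric name.

```python
from collections import defaultdict

# Illustrative sketch of why tag support helps: samples keyed by
# (metric, tags) land on one series that can later be sliced per tag,
# instead of encoding dag/task into the statsd metric name itself.
def record(store, metric, value, tags):
    store[(metric, frozenset(tags))] += value

store = defaultdict(int)
record(store, "task_duration", 3, ["dag:etl", "task:load"])
record(store, "task_duration", 5, ["dag:etl", "task:load"])
```

Both samples aggregate into a single tagged series, which is what makes filtering by dag or task straightforward on the DataDog side.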





[jira] [Updated] (AIRFLOW-6530) Allow for custom Statsd client

2020-03-27 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-6530:

Fix Version/s: (was: 1.10.10)
   2.0.0

> Allow for custom Statsd client
> --
>
> Key: AIRFLOW-6530
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6530
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: scheduler, webserver
>Affects Versions: 1.10.7
>Reporter: Usman Arshad
>Assignee: Usman Arshad
>Priority: Major
>  Labels: features
> Fix For: 2.0.0
>
>
> We are currently using Airflow at Skyscanner, and we have a custom 
> implementation of Statsd which offers features that wire in nicely into our 
> metrics platform/tooling.
> I'm quite sure that other companies using Airflow would also find 
> great benefit in being able to utilise their own custom Statsd client, 
> therefore I am proposing this addition.
>  
> The proposed solution looks something along the lines of changing this:
> {code:java}
> statsd = StatsClient(
> host=conf.get('scheduler', 'statsd_host'),
> port=conf.getint('scheduler', 'statsd_port'),
> prefix=conf.get('scheduler', 'statsd_prefix'))
> {code}
>  Into
> {code:java}
> statsd = conf.get('STATSD_CLIENT') or StatsClient(
> host=conf.get('scheduler', 'statsd_host'),
> port=conf.getint('scheduler', 'statsd_port'),
> prefix=conf.get('scheduler', 'statsd_prefix'))
> {code}
> Note: Pseudocode, not actual code
>  
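A pluggable client would only need to expose the same surface as `statsd.StatsClient`; a minimal sketch of such a custom client, assuming the `incr`/`gauge`/`timing` method names (this class is illustrative, not an actual Airflow or Skyscanner API):

```python
# A hedged sketch of a drop-in custom Statsd client: any object exposing
# the incr/gauge/timing surface of statsd.StatsClient would do.
class CustomStatsClient:
    def __init__(self):
        self.events = []  # here we just record events instead of sending them

    def incr(self, stat, count=1, rate=1):
        self.events.append(("incr", stat, count))

    def gauge(self, stat, value, rate=1, delta=False):
        self.events.append(("gauge", stat, value))

    def timing(self, stat, dt):
        self.events.append(("timing", stat, dt))

statsd = CustomStatsClient()
statsd.incr("scheduler_heartbeat")
```

The config-driven fallback in the proposal would then pick this class up instead of the default `StatsClient`.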





[jira] [Updated] (AIRFLOW-6711) Drop plugin support for stat_name_handler

2020-03-27 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-6711:

Fix Version/s: (was: 1.10.10)
   2.0.0

> Drop plugin support for stat_name_handler
> -
>
> Key: AIRFLOW-6711
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6711
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.10.7
>Reporter: Kamil Bregula
>Priority: Major
> Fix For: 2.0.0
>
>






[jira] [Updated] (AIRFLOW-4057) airflow should handle invalid stats name

2020-03-27 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4057:

Fix Version/s: (was: 1.10.10)
   2.0.0

> airflow should handle invalid stats name
> 
>
> Key: AIRFLOW-4057
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4057
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
> Fix For: 2.0.0
>
>






[jira] [Resolved] (AIRFLOW-4453) none_failed trigger rule cascading skipped state to downstream tasks

2020-03-26 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-4453.
-
Fix Version/s: 1.10.10
   Resolution: Fixed

> none_failed trigger rule cascading skipped state to downstream tasks
> 
>
> Key: AIRFLOW-4453
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4453
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG, scheduler
>Affects Versions: 1.10.3, 1.10.4, 1.10.5, 1.10.6, 1.10.7
>Reporter: Dmytro Kulyk
>Assignee: Kaxil Naik
>Priority: Major
>  Labels: skipped
> Fix For: 1.10.10, 1.10.5
>
> Attachments: 3_step.png, cube_update.py, 
> image-2019-05-02-18-11-28-307.png, simple_skip.png
>
>
> Task with trigger_rule = 'none_failed' cascades *skipped* status to a 
> downstream task:
>  * the task has multiple upstream tasks
>  * trigger_rule is set to 'none_failed'
>  * some of the upstream tasks can be skipped due to *latest only*
> Based on the documentation, this shouldn't happen.
>  !image-2019-05-02-18-11-28-307.png|width=655,height=372! 
>  DAG attached





[jira] [Resolved] (AIRFLOW-6399) Serialization: DAG access_control field should be decorated field in DAG serialization

2020-03-26 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-6399.
-
Fix Version/s: 1.10.10
   Resolution: Fixed

> Serialization: DAG access_control field should be decorated field in DAG 
> serialization
> --
>
> Key: AIRFLOW-6399
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6399
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG, scheduler, webserver
>Affects Versions: 1.10.7, 1.10.8
>Reporter: Xich Long Le
>Assignee: Xich Long Le
>Priority: Major
> Fix For: 1.10.10
>
>
> When DAG serialization is enabled, if any DAG contains the field 
> "access_control":
> - The scheduler will fail to schedule all jobs.
> - The webserver cannot browse those DAGs with "access_control".
> Scheduler output:
> ValidationError: Additional properties are not allowed ('_access_control' was 
> unexpected)
> Failed validating u'additionalProperties' in 
> schema[u'allOf'][0][u'properties'][u'dag']
> Reason: 
> [https://github.com/apache/airflow/blob/1.10.7/airflow/serialization/schema.json]
>  does not have access_control as a whitelisted field.
> AIRFLOW-6425 added '_access_control' to the JSON schema but did not define it 
> as a decorated field, causing the field to be serialized/deserialized 
> incorrectly.





[jira] [Updated] (AIRFLOW-7000) Allow passing in env var JSON dict in task_test

2020-03-25 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-7000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-7000:

Fix Version/s: (was: 1.10.10)
   2.0.0

> Allow passing in env var JSON dict in task_test
> ---
>
> Key: AIRFLOW-7000
> URL: https://issues.apache.org/jira/browse/AIRFLOW-7000
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: cli
>Affects Versions: 1.10.9
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Trivial
> Fix For: 2.0.0
>
>
> Add a convenient way to batch-add env vars, e.g. {"test_mode": true, 
> "write_to_production": true}





[jira] [Updated] (AIRFLOW-6530) Allow for custom Statsd client

2020-03-25 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-6530:

Fix Version/s: (was: 2.0.0)
   1.10.10

> Allow for custom Statsd client
> --
>
> Key: AIRFLOW-6530
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6530
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: scheduler, webserver
>Affects Versions: 1.10.7
>Reporter: Usman Arshad
>Assignee: Usman Arshad
>Priority: Major
>  Labels: features
> Fix For: 1.10.10
>
>
> We are currently using Airflow at Skyscanner, and we have a custom 
> implementation of Statsd which offers features that wire in nicely into our 
> metrics platform/tooling.
> I'm quite sure that other companies using Airflow would also find 
> great benefit in being able to utilise their own custom Statsd client, 
> therefore I am proposing this addition.
>  
> The proposed solution looks something along the lines of changing this:
> {code:java}
> statsd = StatsClient(
> host=conf.get('scheduler', 'statsd_host'),
> port=conf.getint('scheduler', 'statsd_port'),
> prefix=conf.get('scheduler', 'statsd_prefix'))
> {code}
>  Into
> {code:java}
> statsd = conf.get('STATSD_CLIENT') or StatsClient(
> host=conf.get('scheduler', 'statsd_host'),
> port=conf.getint('scheduler', 'statsd_port'),
> prefix=conf.get('scheduler', 'statsd_prefix'))
> {code}
> Note: Pseudocode, not actual code
>  





[jira] [Updated] (AIRFLOW-4057) airflow should handle invalid stats name

2020-03-25 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4057:

Fix Version/s: (was: 2.0.0)
   1.10.10

> airflow should handle invalid stats name
> 
>
> Key: AIRFLOW-4057
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4057
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
> Fix For: 1.10.10
>
>






[jira] [Updated] (AIRFLOW-6711) Drop plugin support for stat_name_handler

2020-03-25 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-6711:

Fix Version/s: (was: 2.0.0)
   1.10.10

> Drop plugin support for stat_name_handler
> -
>
> Key: AIRFLOW-6711
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6711
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.10.7
>Reporter: Kamil Bregula
>Priority: Major
> Fix For: 1.10.10
>
>






[jira] [Updated] (AIRFLOW-5231) S3Hook delete fails with over 1000 keys

2020-03-25 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-5231:

Fix Version/s: (was: 1.10.10)
   2.0.0

> S3Hook delete fails with over 1000 keys
> ---
>
> Key: AIRFLOW-5231
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5231
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: aws, hooks
>Affects Versions: 2.0.0
>Reporter: Silviu Tantos
>Assignee: Cyril Shcherbin
>Priority: Major
> Fix For: 2.0.0
>
>
> Error raised:
> {noformat}
> botocore.exceptions.ClientError: An error occurred (MalformedXML) when 
> calling the DeleteObjects operation: The XML you provided was not well-formed 
> or did not validate against our published schema{noformat}
> See also: https://github.com/spotify/luigi/issues/2511
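The DeleteObjects API accepts at most 1000 keys per request, so a natural fix is to chunk the key list before each call; a minimal sketch of the batching (the hook's own delete call would consume each batch):

```python
# A hedged sketch of the fix: split the key list into batches of at most
# 1000 before each DeleteObjects request.
def chunks(keys, size=1000):
    for i in range(0, len(keys), size):
        yield keys[i:i + size]

# 2500 keys split into batches of at most 1000 keys each.
batches = list(chunks(["key-{}".format(i) for i in range(2500)]))
```

Each batch stays within the API limit, avoiding the MalformedXML error above.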





[jira] [Updated] (AIRFLOW-6850) Handle not being able to display code more gracefully

2020-03-25 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-6850:

Fix Version/s: (was: 1.10.10)

> Handle not being able to display code more gracefully
> -
>
> Key: AIRFLOW-6850
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6850
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Affects Versions: 1.10.9
>Reporter: Anita Fronczak
>Assignee: Anita Fronczak
>Priority: Major
> Fix For: 2.0.0
>
>
> Currently, when the webserver does not have access to the code, it displays an 
> exception. It would be better to display a message saying the code could not 
> be loaded.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (AIRFLOW-6437) sql filters - remove in (NULL)

2020-03-25 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-6437:

Fix Version/s: (was: 1.10.10)

> sql filters - remove in (NULL)
> --
>
> Key: AIRFLOW-6437
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6437
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.10.7
>Reporter: t oo
>Assignee: t oo
>Priority: Trivial
> Fix For: 2.0.0
>
>
> Looking at the SQL generated by SQLAlchemy, an example query:
> 2020-01-03 09:10:25,373 INFO sqlalchemy.engine.base.Engine SELECT 
> task_instance.try_number AS task_instance_try_number, task_instance.task_id 
> AS task_instance_task_id, task_instance.dag_id AS task_instance_dag_id, 
> task_instance.execution_date AS task_instance_execution_date, 
> task_instance.start_date AS task_instance_start_date, task_instance.end_date 
> AS task_instance_end_date, task_instance.duration AS task_instance_duration, 
> task_instance.state AS task_instance_state, task_instance.max_tries AS 
> task_instance_max_tries, task_instance.hostname AS task_instance_hostname, 
> task_instance.unixname AS task_instance_unixname, task_instance.job_id AS 
> task_instance_job_id, task_instance.pool AS task_instance_pool, 
> task_instance.queue AS task_instance_queue, task_instance.priority_weight AS 
> task_instance_priority_weight, task_instance.operator AS 
> task_instance_operator, task_instance.queued_dttm AS 
> task_instance_queued_dttm, task_instance.pid AS task_instance_pid, 
> task_instance.executor_config AS task_instance_executor_config 
> FROM task_instance 
> WHERE task_instance.dag_id = ? AND task_instance.execution_date = ? AND 
> (task_instance.state IN (?, *NULL*, ?) OR task_instance.state IS NULL)
> 2020-01-03 09:10:25,374 INFO sqlalchemy.engine.base.Engine 
> ('example_bash_operator', '2020-01-01 00:00:00.00', 'up_for_reschedule', 
> 'up_for_retry')
> the bolded part should not go into the query.
> The fix is to rewrite the states list so it does not contain NULL before
> passing it to the query, in each of the places below:
> grep 'if None in' -R *
> airflow/models/dagrun.py:if None in state:
> airflow/models/dag.py:if None in state:
> airflow/models/dag.py:if None in states:
> airflow/jobs/scheduler_job.py:if None in states:
> airflow/jobs/scheduler_job.py:if None in acceptable_states:
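The fix described above can be sketched as a small helper that separates `None` from the real state values before they reach the `IN (...)` clause. This is an illustrative sketch; the function name is hypothetical and not part of Airflow.

```python
def split_states(states):
    """Return (non_null_states, include_null) so a caller can build
    `state IN (non_null_states) OR state IS NULL` instead of letting a
    literal NULL leak into the IN list."""
    include_null = None in states
    non_null = [s for s in states if s is not None]
    return non_null, include_null
```

With SQLAlchemy, the caller would then combine `TaskInstance.state.in_(non_null)` with `TaskInstance.state.is_(None)` only when `include_null` is true.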





[jira] [Updated] (AIRFLOW-6768) Graph view rendering angular edges

2020-03-25 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-6768:

Fix Version/s: (was: 1.10.10)

> Graph view rendering angular edges
> --
>
> Key: AIRFLOW-6768
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6768
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Affects Versions: 1.10.8, 1.10.9
>Reporter: Nathan Hadfield
>Assignee: Ry Walker
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: Screenshot 2020-02-10 at 08.51.02.png, Screenshot 
> 2020-02-10 at 08.51.20.png
>
>
> Since the release of v1.10.8, the DAG graph view has been rendering the edges 
> between nodes as angular lines rather than nice smooth curves.
> This appears to have been caused by a version bump of dagre-d3.
> [https://github.com/apache/airflow/pull/7280]
> [https://github.com/dagrejs/dagre-d3/issues/305]





[jira] [Updated] (AIRFLOW-6722) Can't modify a variable with an existing value to have an empty value

2020-03-25 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-6722:

Fix Version/s: (was: 1.10.10)

> Can't modify a variable with an existing value to have an empty value
> -
>
> Key: AIRFLOW-6722
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6722
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: configuration
>Affects Versions: 1.10.6, 1.10.7
>Reporter: Andrew Desousa
>Assignee: Andrew Desousa
>Priority: Minor
> Fix For: 2.0.0
>
>
> In Admin >> Variables:
> If you attempt to blank out the value of a variable, the change won't 
> register and the variable will keep the value it had before.
>  
> This issue was previously fixed in AIRFLOW-5449, but the fix was merged to 
> the wrong branch.





[jira] [Resolved] (AIRFLOW-4363) Encounter JSON Decode Error when using docker operator

2020-03-25 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-4363.
-
Fix Version/s: 1.10.10
   Resolution: Fixed

> Encounter JSON Decode Error when using docker operator
> --
>
> Key: AIRFLOW-4363
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4363
> Project: Apache Airflow
>  Issue Type: Bug
> Environment: - Mac OS 10.14
> - python 3.6.8
> - airflow 1.10.2
>Reporter: Ben Chen
>Assignee: Ben Chen
>Priority: Blocker
> Fix For: 1.10.10
>
>
> *[Description]*
> When using the docker_operator on Mac OS 10.14.4, I ran into a 
> json.JSONDecodeError. After investigating, I found that several log messages 
> are not well separated: a single message can contain \n and should have been 
> split into two or more separate messages.
> *[Update]*  
> Confirmed that the issue comes from the implementation in Airflow; it cannot 
> be solved by just passing `decode` as a parameter to the docker.pull method 
> of the Docker API.
> *[Solution]*
> For now, I wrap the original implementation in a try-catch; in the exception 
> handler I split the message into a list and parse each piece. I am looking 
> for a simpler solution to this non-critical but still blocking point.
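The try-catch workaround described above can be sketched like this. It is an illustrative sketch of the reporter's approach, not the actual docker_operator code; the function name is hypothetical.

```python
import json

def parse_log_line(line):
    """Parse a docker pull log line. On macOS, one line may contain
    several newline-joined JSON documents, which makes json.loads raise
    'Extra data'; fall back to splitting and parsing each piece."""
    try:
        return [json.loads(line)]
    except json.JSONDecodeError:
        return [json.loads(part) for part in line.splitlines() if part.strip()]
```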
> *[Logs]*  
> {docker_operator.py:188}
> INFO - Starting docker container from image hello-world
> {docker_operator.py:202}
> INFO - Pulling docker image hello-world
> {docker_operator.py:207}
> INFO - Pulling from library/hello-world
> {docker_operator.py:207}
> INFO - Pulling fs layer
> {docker_operator.py:207}
> INFO - Downloading
> {docker_operator.py:207}
> INFO - Downloading
> {docker_operator.py:207}
> INFO - Download complete
> {docker_operator.py:207}
> INFO - Extracting
> {docker_operator.py:207}
> INFO - Extracting
> {docker_operator.py:207}
> INFO - Pull complete
> {docker_operator.py:207}
> INFO - Digest: 
> sha256:92695bc579f31df7a63da6922075d0666e565ceccad16b59c3374d2cf4e8e50e
> {docker_operator.py:207}
> INFO - Pulling from library/hello-world
> {docker_operator.py:207}
> INFO - Digest: 
> sha256:1a67c1115b199aa9d964d5da5646917cbac2d5450c71a1deed7b1bfb79c2c82d
> {models.py:1788}
> ERROR - Extra data: line 2 column 1 (char 70)
>  Traceback (most recent call last):
>  line 1657, in _run_raw_task, result = task_copy.execute(context=context)
>  line 205, in execute output = json.loads(line)
>  line 354, in loads, return _default_decoder.decode(s)
>  line 342, in decode, raise JSONDecodeError("Extra data", s, end)
>  json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 70)





[jira] [Updated] (AIRFLOW-6833) HA for webhdfs connection

2020-03-25 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-6833:

Priority: Minor  (was: Major)

> HA for webhdfs connection
> -
>
> Key: AIRFLOW-6833
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6833
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Affects Versions: 1.10.9
>Reporter: Jakub Guzik
>Priority: Minor
> Fix For: 2.0.0
>
>
> Creating a connection to WebHDFS with two hosts for high availability (e.g. 
> connection 1, connection 2) is not possible, because the entire entered 
> value is used as a single host. For our needs, it is necessary to try each 
> host in turn and connect to the first one that works. This change checks the 
> hosts and then connects to a working HDFS.
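The failover behaviour requested above can be sketched generically. This is not the WebHDFSHook implementation; the function and the injected `make_client`/`is_alive` callables are hypothetical stand-ins for creating a WebHDFS client and probing it.

```python
def first_working_client(hosts, make_client, is_alive):
    """Try each namenode host in turn; return a client for the first
    one that responds, skipping hosts that error out."""
    for host in hosts:
        client = make_client(host)
        try:
            if is_alive(client):
                return client
        except Exception:
            continue  # this host is down; try the next one
    raise RuntimeError("No working WebHDFS host among: %s" % ", ".join(hosts))
```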





[jira] [Resolved] (AIRFLOW-6833) HA for webhdfs connection

2020-03-25 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-6833.
-
Resolution: Fixed

> HA for webhdfs connection
> -
>
> Key: AIRFLOW-6833
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6833
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Affects Versions: 1.10.9
>Reporter: Jakub Guzik
>Priority: Major
> Fix For: 2.0.0
>
>
> Creating a connection to WebHDFS with two hosts for high availability (e.g. 
> connection 1, connection 2) is not possible, because the entire entered 
> value is used as a single host. For our needs, it is necessary to try each 
> host in turn and connect to the first one that works. This change checks the 
> hosts and then connects to a working HDFS.





[jira] [Updated] (AIRFLOW-7067) Add apache-airflow-pinned version

2020-03-23 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-7067:

Fix Version/s: (was: 1.10.10)
   2.0.0

> Add apache-airflow-pinned version
> -
>
> Key: AIRFLOW-7067
> URL: https://issues.apache.org/jira/browse/AIRFLOW-7067
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ci
>Affects Versions: 2.0.0, 1.10.9
>Reporter: Jarek Potiuk
>Priority: Major
> Fix For: 2.0.0
>
>
> For the official Docker image we need a fixed set of requirements so that 
> rebuilding the image is reproducible.
> We need a -pinned version of apache-airflow for that.





[jira] [Resolved] (AIRFLOW-7105) Unify Secrets Backend method interfaces

2020-03-23 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-7105.
-
Fix Version/s: 1.10.10
   Resolution: Fixed

> Unify Secrets Backend method interfaces
> ---
>
> Key: AIRFLOW-7105
> URL: https://issues.apache.org/jira/browse/AIRFLOW-7105
> Project: Apache Airflow
>  Issue Type: Task
>  Components: security
>Affects Versions: 2.0.0, 1.10.10
>Reporter: Xinbin Huang
>Assignee: Xinbin Huang
>Priority: Major
> Fix For: 1.10.10
>
>
> The AWS SSM, HashiCorp Vault, and GCP Secrets Manager secrets backends all 
> have exactly the same implementation of get_connections(). We can move it 
> into BaseSecretsBackend and allow users to override it where needed.
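The refactoring described above can be sketched as pulling the shared method into the base class. The class and method names mirror Airflow's secrets backends, but the bodies here are simplified illustrations, and `DictBackend` is a toy stand-in for SSM/Vault/Secrets Manager.

```python
class BaseSecretsBackend:
    """Shared get_connections(); subclasses only supply get_conn_uri()."""

    def get_conn_uri(self, conn_id):
        raise NotImplementedError

    def get_connections(self, conn_id):
        # Identical across backends, so it lives here once.
        uri = self.get_conn_uri(conn_id)
        return [uri] if uri else []


class DictBackend(BaseSecretsBackend):
    """Toy backend: looks connection URIs up in a dict."""

    def __init__(self, store):
        self.store = store

    def get_conn_uri(self, conn_id):
        return self.store.get(conn_id)
```

A backend with special needs can still override `get_connections()` outright.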




