[jira] [Created] (AIRFLOW-3353) redis-py 3.0.0 dependency breaks celery executor

2018-11-15 Thread Stefan Seelmann (JIRA)
Stefan Seelmann created AIRFLOW-3353:


 Summary: redis-py 3.0.0 dependency breaks celery executor
 Key: AIRFLOW-3353
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3353
 Project: Apache Airflow
  Issue Type: Bug
  Components: celery
Affects Versions: 1.10.0
Reporter: Stefan Seelmann


redis-py 3.0.0 was just released. Airflow 1.10.0 requires redis>=2.10.5, so 
pip now installs redis-py 3.0.0.

The worker fails with the error below.

Workaround: Pin redis==2.10.6 (e.g. in constraints.txt)

{code}
[2018-11-15 12:06:18,441: CRITICAL/MainProcess] Unrecoverable error: AttributeError("'float' object has no attribute 'items'",)
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/celery/worker/worker.py", line 205, in start
    self.blueprint.start(self)
  File "/usr/local/lib/python3.6/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/usr/local/lib/python3.6/site-packages/celery/bootsteps.py", line 369, in start
    return self.obj.start()
  File "/usr/local/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 317, in start
    blueprint.start(self)
  File "/usr/local/lib/python3.6/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/usr/local/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 593, in start
    c.loop(*c.loop_args())
  File "/usr/local/lib/python3.6/site-packages/celery/worker/loops.py", line 91, in asynloop
    next(loop)
  File "/usr/local/lib/python3.6/site-packages/kombu/asynchronous/hub.py", line 354, in create_loop
    cb(*cbargs)
  File "/usr/local/lib/python3.6/site-packages/kombu/transport/redis.py", line 1040, in on_readable
    self.cycle.on_readable(fileno)
  File "/usr/local/lib/python3.6/site-packages/kombu/transport/redis.py", line 337, in on_readable
    chan.handlers[type]()
  File "/usr/local/lib/python3.6/site-packages/kombu/transport/redis.py", line 724, in _brpop_read
    self.connection._deliver(loads(bytes_to_str(item)), dest)
  File "/usr/local/lib/python3.6/site-packages/kombu/transport/virtual/base.py", line 983, in _deliver
    callback(message)
  File "/usr/local/lib/python3.6/site-packages/kombu/transport/virtual/base.py", line 632, in _callback
    self.qos.append(message, message.delivery_tag)
  File "/usr/local/lib/python3.6/site-packages/kombu/transport/redis.py", line 149, in append
    pipe.zadd(self.unacked_index_key, time(), delivery_tag) \
  File "/usr/local/lib/python3.6/site-packages/redis/client.py", line 2263, in zadd
    for pair in iteritems(mapping):
  File "/usr/local/lib/python3.6/site-packages/redis/_compat.py", line 123, in iteritems
    return iter(x.items())
AttributeError: 'float' object has no attribute 'items'

{code}
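The last frames show the mismatch: kombu calls zadd(key, score, member) positionally, while redis-py 3.0 changed zadd to take members and scores together as a single {member: score} mapping, so the float timestamp lands in the mapping argument. A minimal sketch of that call-shape mismatch; zadd_3x and the two helpers are hypothetical stand-ins that only build the command list and do not talk to Redis or reproduce kombu's code:

```python
import time


def zadd_3x(name, mapping, nx=False, xx=False, ch=False, incr=False):
    """Hypothetical stand-in for the redis-py 3.x call shape: members and
    scores arrive together as one {member: score} mapping. It only builds
    the ZADD command instead of sending it to Redis."""
    pieces = []
    for member, score in mapping.items():  # a float here raises AttributeError
        pieces.extend([score, member])
    return ["ZADD", name] + pieces


def append_unacked_2x_style(key, delivery_tag):
    # Positional (score, member) call, as in the traceback's
    # pipe.zadd(self.unacked_index_key, time(), delivery_tag)
    return zadd_3x(key, time.time(), delivery_tag)


def append_unacked_3x_style(key, delivery_tag):
    # Same intent expressed through the mapping argument
    return zadd_3x(key, {delivery_tag: time.time()})
```

Calling append_unacked_2x_style fails with the same AttributeError as the worker, while the mapping form builds a valid command. Pinning redis==2.10.6 sidesteps the problem by keeping the old call shape working.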




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2001) Make sensors relinquish their execution slots

2018-10-05 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640110#comment-16640110
 ] 

Stefan Seelmann commented on AIRFLOW-2001:
--

Yes, this issue requests what was implemented in AIRFLOW-2747. [~Fokko] could 
you please also close this one?

> Make sensors relinquish their execution slots
> -
>
> Key: AIRFLOW-2001
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2001
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db, scheduler
>Reporter: Yati
>Assignee: Yati
>Priority: Major
>
> A sensor task instance should not take up an execution slot for the entirety 
> of its lifetime (as is currently the case). Indeed, for reasons outlined 
> below, it would be better if sensor execution was preempted by the scheduler 
> by parking it away from the slot till the next poll.
> Some sensors wait for a condition that is affected only by an 
> external party (e.g., materialization by external means of certain rows in a 
> table). By external, I mean external to the Airflow installation in question, 
> such that the producing entity itself does not need an execution slot in an 
> Airflow pool. If all sensors and their dependencies were of this nature, 
> there would be no issue. Unfortunately, a lot of real-world DAGs have sensor 
> dependencies on results produced by another task, typically in some other 
> DAG, but scheduled by the same Airflow scheduler.
> Consider a simple example (arrow direction represents "must happen before", 
> just like in Airflow): DAG1(a >> b) and DAG2(c:sensor(DAG1.b) >> d). In other 
> words, the opening task c of the second DAG has a sensor dependency on the 
> ending task b of the first DAG. Imagine we have a single pool with 10 
> execution slots, and somehow task instances for c fill up the pool, while the 
> corresponding task instances of DAG1.b have not had a chance to execute (in 
> the real world this happens because of, say, back-fills or reprocesses by 
> clearing those sensor instances and their upstream). This is a deadlock 
> situation, since no progress can be made here – the sensors have filled up 
> the pool waiting on tasks that themselves will never get a chance to run. 
> This problem has been [acknowledged here|https://cwiki.apache.org/confluence/display/AIRFLOW/Common+Pitfalls].
> One way (suggested by Fokko) to solve this is to always run sensors in their 
> own pool, and to be careful with the concurrency settings of sensor tasks. This 
> is what a lot of users do now, but there are better solutions. Since 
> all the sensor interface allows for is a poll, we can, after each poll, 
> "park" the sensor's execution slot and yield it to other tasks. In the above 
> scenario, there would be no "filling up" of the pool by sensor tasks, as 
> they will be polled, determined to be still unfulfilled, and then parked 
> away, thereby giving a chance to other tasks.
> This would likely require some changes to the DB, and of course to the scheduler.
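The DAG1/DAG2 deadlock can be reproduced with a toy scheduler. This is a simplified model under stated assumptions, not Airflow's scheduler: only the tasks competing for the pool (sensor c and its target b, per run) are modelled, every task finishes in one tick, free slots are handed out in FIFO order, and a backfill has queued all sensors ahead of the b tasks. The function name and parameters are hypothetical.

```python
from collections import deque


def simulate(num_runs, pool_size, reschedule, max_ticks=50):
    """Toy model of DAG1(a >> b) and DAG2(c: sensor(DAG1.b) >> d).

    Returns the tick at which every sensor has succeeded, or None if that
    never happens within max_ticks (i.e. the pool is deadlocked).
    """
    # A backfill has queued all sensors ahead of the b tasks they wait for.
    queue = deque([("c", i) for i in range(num_runs)] +
                  [("b", i) for i in range(num_runs)])
    running = []                  # tasks currently holding a pool slot
    done_b, done_c = set(), set()

    for tick in range(1, max_ticks + 1):
        # Hand free slots to queued tasks in FIFO order.
        while len(running) < pool_size and queue:
            running.append(queue.popleft())

        still_running = []
        for kind, run in running:
            if kind == "b":
                done_b.add(run)                    # b finishes, frees its slot
            elif run in done_b:
                done_c.add(run)                    # sensor satisfied, frees its slot
            elif reschedule:
                queue.append((kind, run))          # failed poke: park, give slot back
            else:
                still_running.append((kind, run))  # failed poke: sleep in the slot
        running = still_running

        if len(done_c) == num_runs:
            return tick
    return None
```

With 10 runs and a 10-slot pool, simulate(10, 10, reschedule=False) returns None because the sensors never release their slots, while simulate(10, 10, reschedule=True) returns 3: the sensors park after their first poke, the b tasks run, and the sensors succeed on the next poll.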





[jira] [Commented] (AIRFLOW-2639) Dagrun of subdags is set to RUNNING immediately

2018-10-05 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640001#comment-16640001
 ] 

Stefan Seelmann commented on AIRFLOW-2639:
--

Yes, please close. It was opened based on my comment in AIRFLOW-2355. I'd close 
it but don't have permission.

> Dagrun of subdags is set to RUNNING immediately
> ---
>
> Key: AIRFLOW-2639
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2639
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> This change has a side effect: the subdag run and its task instances are 
> eagerly created, and the subdag is immediately set to the "RUNNING" state. This means 
> it is immediately visible in the UI (tree view and dagrun view).
> In our case we skip the SubDagOperator based on some conditions. However, the 
> subdag run is then still visible in the UI and in "RUNNING" state, which looks 
> scary, see attached screenshot. Previously, no subdag run was visible at all 
> for skipped subdags.
> One option I see is to not set subdags to "RUNNING" state but "NONE". Then it 
> will still be visible in the UI but not as running. Another idea is to try to 
> pass the conf directly in the SubDagOperator.





[jira] [Commented] (AIRFLOW-317) Avoid exception when reading logs from s3.

2018-09-22 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624830#comment-16624830
 ] 

Stefan Seelmann commented on AIRFLOW-317:
-

No longer valid according to 
https://github.com/apache/incubator-airflow/pull/1656#issuecomment-383647450

> Avoid exception when reading logs from s3.
> --
>
> Key: AIRFLOW-317
> URL: https://issues.apache.org/jira/browse/AIRFLOW-317
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: aws
>Affects Versions: 1.7.1.2, 1.7.1.3
>Reporter: Oleksandr Vilchynskyy
>Assignee: Oleksandr Vilchynskyy
>Priority: Minor
>  Labels: patch
> Fix For: 1.7.1.2, 1.7.1.3
>
>
> There is an unneeded decode() call after s3_key.get_contents_as_string(); 
> get_contents_as_string() already returns a string. If the log file contains 
> non-ASCII characters, decode() raises an exception and the log is not 
> processed.
> Without decode() it works as before, and decoding the log is left to the 
> browser.
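A Python 3 sketch of the failure mode the issue describes: decoding log bytes that contain non-ASCII characters with an ASCII codec raises, while a tolerant decode does not. The function names and sample log line are hypothetical; this is not the S3 log handler code, and the original issue predates Python 3 (where a bare decode() used ASCII by default).

```python
def decode_ascii(data: bytes) -> str:
    # Strict ASCII decode, like a bare decode() under an ASCII default codec
    return data.decode("ascii")


def decode_tolerant(data: bytes) -> str:
    # Decode as UTF-8 and substitute U+FFFD for undecodable bytes instead of raising
    return data.decode("utf-8", errors="replace")


# Hypothetical log line containing non-ASCII characters
log_bytes = "Größe: 42 rows\n".encode("utf-8")
```

decode_ascii(log_bytes) raises UnicodeDecodeError, so the log would not be shown; decode_tolerant(log_bytes) returns the original text.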





[jira] [Commented] (AIRFLOW-2653) Add Twitter to the community list

2018-09-22 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624824#comment-16624824
 ] 

Stefan Seelmann commented on AIRFLOW-2653:
--

It was added in this PR: https://github.com/apache/incubator-airflow/pull/3747

> Add Twitter to the community list
> -
>
> Key: AIRFLOW-2653
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2653
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Samuel Ngahane
>Assignee: Samuel Ngahane
>Priority: Trivial
>
> This ticket is about adding Twitter to the "Who uses Airflow?" list of 
> companies.





[jira] [Commented] (AIRFLOW-1532) Tree view arbitrarily adds seven hours to the times

2018-09-22 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624823#comment-16624823
 ] 

Stefan Seelmann commented on AIRFLOW-1532:
--

This was fixed in commit a13618dd869ec06ad574b8ace98dfb07bfc9e198  / PR 
https://github.com/apache/incubator-airflow/pull/2687

> Tree view arbitrarily adds seven hours to the times
> ---
>
> Key: AIRFLOW-1532
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1532
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Affects Versions: 1.8.1
> Environment: Docker on Linux
>Reporter: michael crozier
>Priority: Trivial
> Attachments: Screenshot from 2017-08-25 10-04-19.png
>
>
> On the Tree view of a DAG, the times on the timeline appear to have seven 
> hours arbitrarily added to the UTC timestamp. A screenshot and code reference 
> are attached.
> https://github.com/apache/incubator-airflow/blob/32a26d84b679a54add43092d0bdb77350dcbaeaf/airflow/www/templates/airflow/tree.html#L125
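The ticket does not state the cause. One common way a fixed offset like this appears (an assumption here, not confirmed by the ticket or the linked template) is a UTC timestamp serialized without a timezone marker and then interpreted as local time; for a viewer at UTC-7 that adds exactly seven hours. A minimal sketch with a hypothetical offset and timestamp:

```python
from datetime import datetime, timedelta, timezone

# Assumed viewer offset: UTC-7, e.g. US Pacific daylight time (hypothetical)
PDT = timezone(timedelta(hours=-7))

# A UTC execution date serialized without any timezone information
naive_utc = datetime(2017, 8, 25, 10, 0)

# Misreading the naive value as local time and converting to UTC shifts it by +7h
misread_as_local = naive_utc.replace(tzinfo=PDT).astimezone(timezone.utc)

# Attaching UTC explicitly keeps the value unchanged
correct = naive_utc.replace(tzinfo=timezone.utc)
```

Here misread_as_local is 17:00 UTC while correct stays at 10:00 UTC, which is why sending timestamps with an explicit offset avoids this class of bug.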





[jira] [Commented] (AIRFLOW-294) Add support for creating users with password (password_auth backend)

2018-09-22 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624817#comment-16624817
 ] 

Stefan Seelmann commented on AIRFLOW-294:
-

This is solved in the new RBAC UI.

> Add support for creating users with password (password_auth backend)
> 
>
> Key: AIRFLOW-294
> URL: https://issues.apache.org/jira/browse/AIRFLOW-294
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Vineet Goel
>Assignee: Chinmay Kousik
>Priority: Minor
>
> Currently the only way to add new users through the UI is the user creation 
> form. This form takes the username and the email id of the user, but there 
> seems to be no place to provide the password (unless I am mistaken) or, for 
> that matter, to specify whether the user is a superuser. This limits the scope of 
> this auth backend, as new users need to be created using DB commands 
> instead of the UI. Could one of the maintainers shed some light on this?





[jira] [Commented] (AIRFLOW-2419) SubDag Operator uses default view of airflow.config

2018-09-22 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624806#comment-16624806
 ] 

Stefan Seelmann commented on AIRFLOW-2419:
--

Included in 1.10.0

> SubDag Operator uses default view of airflow.config
> ---
>
> Key: AIRFLOW-2419
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2419
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Milan
>Assignee: Milan
>Priority: Minor
>






[jira] [Commented] (AIRFLOW-1386) Add a SLEEP state for sensors

2018-09-22 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624796#comment-16624796
 ] 

Stefan Seelmann commented on AIRFLOW-1386:
--

I didn't see this issue and its PR earlier; that would have saved me some time. 
Anyway, something similar is now implemented, see AIRFLOW-2747.

> Add a SLEEP state for sensors
> -
>
> Key: AIRFLOW-1386
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1386
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: scheduler
>Reporter: Yong-Siang Shih
>Priority: Minor
>
> Currently, while the sensor is sleeping, the task process is in the RUNNING state 
> and still consumes resources such as memory (because the process does not 
> terminate). This is a waste of resources, especially when there are thousands of 
> sensors sleeping and poking. Preferably, when it sleeps, it would enter a 
> SLEEP state and terminate, waiting for the scheduler to reschedule it.





[jira] [Commented] (AIRFLOW-2810) Typo in Xcom model timestamp field

2018-09-22 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624793#comment-16624793
 ] 

Stefan Seelmann commented on AIRFLOW-2810:
--

Fixed in master and cherry-picked into 1.10.0.

> Typo in Xcom model timestamp field
> --
>
> Key: AIRFLOW-2810
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2810
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: models
>Affects Versions: 2.0.0
>Reporter: Andy Wilcox
>Assignee: Andy Wilcox
>Priority: Minor
>
>  
> Looks like a find/replace error. The type should be UtcDateTime, but it is the legacy 
> DateTime.





[jira] [Commented] (AIRFLOW-7) Unit test for ExternalTaskSensor depends on a different unit test

2018-09-22 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624787#comment-16624787
 ] 

Stefan Seelmann commented on AIRFLOW-7:
---

Seems to be fixed; {{self.test_time_sensor()}} is now called explicitly: 
https://github.com/apache/incubator-airflow/blob/master/tests/sensors/test_external_task_sensor.py#L188

> Unit test for ExternalTaskSensor depends on a different unit test
> -
>
> Key: AIRFLOW-7
> URL: https://issues.apache.org/jira/browse/AIRFLOW-7
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: tests
>Reporter: Jeremiah Lowin
>Priority: Minor
>  Labels: test
>
> The unit test {{core:CoreTest.test_external_task_sensor}} appears to depend 
> on the result of a different unit test. I discovered this when I created a 
> {{tearDown()}} method that deleted any TaskInstances created by a unit test. 
> I think it's bad to have cross-test dependencies, especially since I'm not 
> sure if there is a guarantee about unit test run order.
> Full test:
> {code}
> def test_external_task_sensor_delta(self):
> t = operators.ExternalTaskSensor(
> task_id='test_external_task_sensor_check_delta',
> external_dag_id=TEST_DAG_ID,
> external_task_id='time_sensor_check',
> execution_delta=timedelta(0),
> allowed_states=['success'],
> dag=self.dag)
> t.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE, force=True)
> {code}
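A minimal sketch of the pattern the fix follows: the dependent test sets up its own precondition by calling the other test explicitly, so it passes even when run alone or after a tearDown() that wipes shared state. The in-memory TASK_STATES dict and the helper names are hypothetical stand-ins for task instances in the metadata database, not Airflow's test code.

```python
import unittest

# Hypothetical shared state standing in for TaskInstance rows in the metadata DB
TASK_STATES = {}


def run_time_sensor():
    """Stand-in for running the upstream task the ExternalTaskSensor waits on."""
    TASK_STATES["time_sensor_check"] = "success"


def external_task_succeeded(task_id):
    return TASK_STATES.get(task_id) == "success"


class ExternalSensorTest(unittest.TestCase):
    def tearDown(self):
        # Clean up after every test, like the tearDown() described in the issue
        TASK_STATES.clear()

    def test_time_sensor(self):
        run_time_sensor()
        self.assertTrue(external_task_succeeded("time_sensor_check"))

    def test_external_task_sensor(self):
        # Create the precondition explicitly instead of relying on
        # test_time_sensor having run earlier in the same process.
        self.test_time_sensor()
        self.assertTrue(external_task_succeeded("time_sensor_check"))
```

Because tearDown() only runs after each test method, the explicit self.test_time_sensor() call populates the state the sensor check needs, independent of test run order.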





[jira] [Commented] (AIRFLOW-529) dag_stats is accessible without authentification

2018-09-22 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624786#comment-16624786
 ] 

Stefan Seelmann commented on AIRFLOW-529:
-

https://github.com/apache/incubator-airflow/commit/0bf7adb209ce969243ffaf4fc5213ff3957cbbc9
 added the @login_required decorator.

> dag_stats is accessible without authentification
> 
>
> Key: AIRFLOW-529
> URL: https://issues.apache.org/jira/browse/AIRFLOW-529
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
> Environment: Config :
> authenticate = True
> auth_backend = airflow.contrib.auth.backends.ldap_auth
>Reporter: Maxime Bugeia
>Priority: Major
>
> It's possible to query the endpoint 
> http://AIRFLOWHOST/admin/airflow/dag_stats without being authenticated, even 
> when authenticate = True is set in the Airflow configuration.
> This is a security issue, since it leaks information about the names and 
> status of all DAGs.
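What the fix does can be illustrated without Flask: a decorator rejects requests that carry no authenticated user before the view body runs. This is a framework-free sketch with a hypothetical request dict, Unauthorized exception, and response payload; it is not Flask-Login's login_required implementation.

```python
from functools import wraps


class Unauthorized(Exception):
    """Hypothetical error raised for unauthenticated callers."""


def login_required(view):
    """Hypothetical stand-in for a login_required decorator: the wrapped
    view only runs if the request carries an authenticated user."""
    @wraps(view)
    def wrapper(request, *args, **kwargs):
        if not request.get("user"):
            raise Unauthorized("authentication required")
        return view(request, *args, **kwargs)
    return wrapper


@login_required
def dag_stats(request):
    # Hypothetical payload: per-DAG task state counts, the kind of data the
    # issue says was exposed to anonymous users.
    return {"example_dag": {"success": 3, "running": 1}}
```

An anonymous call such as dag_stats({}) now raises Unauthorized, while dag_stats({"user": "alice"}) returns the statistics; functools.wraps keeps the view's name, which routing code often relies on.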





[jira] [Commented] (AIRFLOW-1448) Revert PR 2433 which added merge conflicts to master

2018-09-22 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624770#comment-16624770
 ] 

Stefan Seelmann commented on AIRFLOW-1448:
--

This is done, can be closed.

> Revert PR 2433 which added merge conflicts to master
> 
>
> Key: AIRFLOW-1448
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1448
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Dan Davydov
>Assignee: Dan Davydov
>Priority: Major
>
> https://github.com/apache/incubator-airflow/pull/2433 has logical merge 
> conflicts in master and causes tests to fail; it needs to be reverted.





[jira] [Commented] (AIRFLOW-2744) RBAC app doesn't integrate plugins (blueprints etc)

2018-09-22 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624743#comment-16624743
 ] 

Stefan Seelmann commented on AIRFLOW-2744:
--

Yes, I'd suggest submitting the PR; you may mark it as "WIP".

Please also note the (short) discussion on the dev mailing list; I don't know 
if Ian is already working on it too. If possible, please also comment there (I 
can also do so if you are not subscribed).
https://lists.apache.org/thread.html/30b9f524ab72743f72c397ea7ee2a8f22e263cbbf4d6c048f4079124@%3Cdev.airflow.apache.org%3E


> RBAC app doesn't integrate plugins (blueprints etc)
> ---
>
> Key: AIRFLOW-2744
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2744
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webapp, webserver
>Affects Versions: 2.0.0
>Reporter: David Dossett
>Priority: Major
>
> In the current 1.10.0rc tag, the new RBAC app doesn't integrate any plugins 
> created by a user extending Airflow. In the old www/app.py you had the 
> [integrate_plugins|https://github.com/apache/incubator-airflow/blob/f1083cbada337731ed0b7e27b09eee7a26c8189a/airflow/www/app.py#L126]
>  function. But currently the 
> [www_rbac/app.py|https://github.com/apache/incubator-airflow/blob/f1083cbada337731ed0b7e27b09eee7a26c8189a/airflow/www_rbac/app.py]
>  doesn't pull in any plugins from the plugin_manager. So nothing you do to 
> extend Airflow's webapp will work.
> I think adding the code for registering the blueprints and menu links is a 
> pretty simple fix. I'm not sure how the FAB system is handling the same 
> functionality as Flask-Admin views though.
>  





[jira] [Commented] (AIRFLOW-2001) Make sensors relinquish their execution slots

2018-09-21 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624060#comment-16624060
 ] 

Stefan Seelmann commented on AIRFLOW-2001:
--

AIRFLOW-2747 is merged to master which should also solve this issue.



[jira] [Resolved] (AIRFLOW-2747) Explicit re-schedule of sensors

2018-09-21 Thread Stefan Seelmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Seelmann resolved AIRFLOW-2747.
--
Resolution: Fixed

> Explicit re-schedule of sensors
> ---
>
> Key: AIRFLOW-2747
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2747
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core, operators
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Stefan Seelmann
>Assignee: Stefan Seelmann
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: Screenshot_2018-07-12_14-10-24.png, 
> Screenshot_2018-09-16_20-09-28.png, Screenshot_2018-09-16_20-19-23.png, 
> google_apis-23_r01.zip
>
>
> By default sensors block a worker and just sleep between pokes. This is very 
> inefficient, especially when there are many long-running sensors.
> There is a hacky workaround: set a small timeout value and a high retry 
> number. But that has drawbacks:
>  * Errors raised by sensors are hidden and the sensor retries too often
>  * The sensor is retried in a fixed time interval (with optional exponential 
> backoff)
>  * There are many attempts and many log files are generated
>  I'd like to propose an explicit reschedule mechanism:
>  * A new "reschedule" flag for sensors, if set to True it will raise an 
> AirflowRescheduleException that causes a reschedule.
>  * AirflowRescheduleException contains the (earliest) re-schedule date.
>  * Reschedule requests are recorded in a new `task_reschedule` table and 
> visualized in the Gantt view.
>  * A new TI dependency that checks if a sensor task is ready to be 
> re-scheduled.
> Advantages:
>  * This change is backward compatible. Existing sensors behave like before. 
> But it's possible to set the "reschedule" flag.
>  * The poke_interval, timeout, and soft_fail parameters are still respected 
> and used to calculate the next schedule time.
>  * Custom sensor implementations can even define the next sensible schedule 
> date by raising AirflowRescheduleException themselves.
>  * Existing TimeSensor and TimeDeltaSensor can also be changed to be 
> rescheduled when the time is reached.
>  * This mechanism can also be used by non-sensor operators (but then the new 
> ReadyToRescheduleDep has to be added to deps or BaseOperator).
> Design decisions and caveats:
>  * When handling AirflowRescheduleException the `try_number` is decremented. 
> That means that subsequent runs use the same try number and write to the same 
> log file.
>  * Sensor TI dependency check now depends on `task_reschedule` table. However 
> only the BaseSensorOperator includes the new ReadyToRescheduleDep for now.
> Open questions and TODOs:
>  * Should a dedicated state `UP_FOR_RESCHEDULE` be used instead of setting 
> the state back to `NONE`? This would require more changes in scheduler code 
> and especially in the UI, but the state of a task would be more explicit and 
> more transparent to the user.
>  * Add example/test for a non-sensor operator
>  * Document the new feature
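The proposed flow can be sketched in plain Python. All class and function names below are hypothetical stand-ins, not Airflow's API: each run pokes once and, instead of sleeping in a worker slot, raises an exception carrying the earliest re-schedule date; a separate check decides when the task instance is ready to run again. Timeout and soft_fail handling follow the proposal's statement that these parameters are still respected, with the first poke time assumed to come from the task_reschedule records.

```python
from datetime import datetime, timedelta


class RescheduleRequested(Exception):
    """Hypothetical stand-in for the proposed AirflowRescheduleException:
    carries the earliest date at which the task may run again."""

    def __init__(self, reschedule_date):
        super().__init__(reschedule_date)
        self.reschedule_date = reschedule_date


class SensorTimedOut(Exception):
    """Hypothetical: raised when the sensor exceeds its timeout."""


class SensorSkipped(Exception):
    """Hypothetical: raised instead of SensorTimedOut when soft_fail is set."""


def poke_once(poke, first_poke_at, now, poke_interval, timeout, soft_fail=False):
    """Run a single poke in reschedule style instead of sleeping in a slot.

    first_poke_at is when the first poke of this try happened (assumed to be
    read from task_reschedule), so the timeout spans all rescheduled runs.
    """
    if poke():
        return "success"
    if (now - first_poke_at).total_seconds() > timeout:
        if soft_fail:
            raise SensorSkipped("sensor timed out")
        raise SensorTimedOut("sensor timed out")
    # Give the worker slot back and ask to be run again after poke_interval.
    raise RescheduleRequested(now + timedelta(seconds=poke_interval))


def ready_to_reschedule(reschedule_dates, now):
    """Sketch of the proposed dependency check: a task with recorded
    reschedule requests may run once the latest requested date is reached."""
    return not reschedule_dates or now >= max(reschedule_dates)
```

In this sketch the scheduler side would store the raised reschedule_date and re-queue the task only once ready_to_reschedule returns True, so a sleeping sensor holds no worker slot between pokes.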





[jira] [Assigned] (AIRFLOW-2747) Explicit re-schedule of sensors

2018-09-21 Thread Stefan Seelmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Seelmann reassigned AIRFLOW-2747:


 Assignee: Stefan Seelmann  (was: Roufique hossain)
Affects Version/s: 1.10.0



[jira] [Commented] (AIRFLOW-3085) Log viewing not possible in default RBAC setting

2018-09-19 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620344#comment-16620344
 ] 

Stefan Seelmann commented on AIRFLOW-3085:
--

See also:
 * https://issues.apache.org/jira/browse/AIRFLOW-3072
 * [https://github.com/apache/incubator-airflow/pull/3913]

:)

> Log viewing not possible in default RBAC setting
> 
>
> Key: AIRFLOW-3085
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3085
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Joy Gao
>Priority: Major
>
> Apart from the Admin role, no role is able to view logs right now 
> due to a missing permission in the default settings. The permission should be 
> added to Viewer/User/Op as well.





[jira] [Commented] (AIRFLOW-2747) Explicit re-schedule of sensors

2018-09-17 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618498#comment-16618498
 ] 

Stefan Seelmann commented on AIRFLOW-2747:
--

I assume this "radhefa Roufique hossain" is a spam user. Can someone with admin 
access please delete the attached google_apis-23_r01.zip? Also reported in 
https://issues.apache.org/jira/browse/INFRA-17031

> Explicit re-schedule of sensors
> ---
>
> Key: AIRFLOW-2747
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2747
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core, operators
>Affects Versions: 1.9.0
>Reporter: Stefan Seelmann
>Assignee: Roufique hossain
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: Screenshot_2018-07-12_14-10-24.png, 
> Screenshot_2018-09-16_20-09-28.png, Screenshot_2018-09-16_20-19-23.png, 
> google_apis-23_r01.zip
>
>
> By default sensors block a worker and just sleep between pokes. This is very 
> inefficient, especially when there are many long-running sensors.
> There is a hacky workaroud by setting a small timeout value and a high retry 
> number. But that has drawbacks:
>  * Errors raised by sensors are hidden and the sensor retries too often
>  * The sensor is retried in a fixed time interval (with optional exponential 
> backoff)
>  * There are many attempts and many log files are generated
>  I'd like to propose an explicit reschedule mechanism:
>  * A new "reschedule" flag for sensors; if set to True, the sensor raises an 
> AirflowRescheduleException that causes a reschedule.
>  * AirflowRescheduleException contains the (earliest) re-schedule date.
>  * Reschedule requests are recorded in a new `task_reschedule` table and 
> visualized in the Gantt view.
>  * A new TI dependency that checks if a sensor task is ready to be 
> re-scheduled.
> Advantages:
>  * This change is backward compatible. Existing sensors behave as before, 
> but it's possible to set the "reschedule" flag.
>  * The poke_interval, timeout, and soft_fail parameters are still respected 
> and used to calculate the next schedule time.
>  * Custom sensor implementations can even define the next sensible schedule 
> date by raising AirflowRescheduleException themselves.
>  * Existing TimeSensor and TimeDeltaSensor can also be changed to be 
> rescheduled when the time is reached.
>  * This mechanism can also be used by non-sensor operators (but then the new 
> ReadyToRescheduleDep has to be added to deps or BaseOperator).
> Design decisions and caveats:
>  * When handling AirflowRescheduleException, the `try_number` is decremented. 
> That means that subsequent runs use the same try number and write to the same 
> log file.
>  * Sensor TI dependency check now depends on the `task_reschedule` table. However, 
> only the BaseSensorOperator includes the new ReadyToRescheduleDep for now.
> Open questions and TODOs:
>  * Should a dedicated state `UP_FOR_RESCHEDULE` be used instead of setting 
> the state back to `NONE`? This would require more changes in scheduler code 
> and especially in the UI, but the state of a task would be more explicit and 
> more transparent to the user.
>  * Add example/test for a non-sensor operator
>  * Document the new feature





[jira] [Work started] (AIRFLOW-3072) Only admin can view logs in RBAC UI

2018-09-17 Thread Stefan Seelmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-3072 started by Stefan Seelmann.

> Only admin can view logs in RBAC UI
> ---
>
> Key: AIRFLOW-3072
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3072
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Affects Versions: 1.10.0
>Reporter: Stefan Seelmann
>Assignee: Stefan Seelmann
>Priority: Major
>
> With RBAC enabled, only users with the Admin role can view logs.
> The default roles (excluding Public) include the permission {{can_log}}, which 
> allows opening the /log page. However, the actual log message is loaded with 
> another XHR request, which requires the additional permission 
> {{get_logs_with_metadata}}.
> My suggestion is to add the permission and assign it to the Viewer role. Or is 
> there a reason why only Admin should be able to see logs?





[jira] [Commented] (AIRFLOW-2639) Dagrun of subdags is set to RUNNING immediately

2018-09-17 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618044#comment-16618044
 ] 

Stefan Seelmann commented on AIRFLOW-2639:
--

Since https://github.com/apache/incubator-airflow/pull/3460 / 
https://issues.apache.org/jira/browse/AIRFLOW-2355 is now in 1.10, I suggest 
not pursuing this change because it would change the behaviour again. I closed 
https://github.com/apache/incubator-airflow/pull/3540.

> Dagrun of subdags is set to RUNNING immediately
> ---
>
> Key: AIRFLOW-2639
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2639
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> This change has a side effect: the subdag run and its task instances are 
> eagerly created, and the subdag run is immediately set to the "RUNNING" state. 
> This means it is immediately visible in the UI (tree view and dagrun view).
> In our case we skip the SubDagOperator based on some conditions. However, the 
> subdag run is then still visible in the UI and in the "RUNNING" state, which 
> looks scary; see the attached screenshot. Before, there was no subdag run 
> visible at all for skipped subdags.
> One option I see is not to set subdag runs to the "RUNNING" state but to 
> "NONE". Then they will still be visible in the UI, but not as running. Another 
> idea is to try to pass the conf directly in the SubDagOperator.





[jira] [Updated] (AIRFLOW-3072) Only admin can view logs in RBAC UI

2018-09-17 Thread Stefan Seelmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Seelmann updated AIRFLOW-3072:
-
Description: 
With RBAC enabled, only users with the Admin role can view logs.

The default roles (excluding Public) include the permission {{can_log}}, which 
allows opening the /log page. However, the actual log message is loaded with 
another XHR request, which requires the additional permission 
{{get_logs_with_metadata}}.

My suggestion is to add the permission and assign it to the Viewer role. Or is 
there a reason why only Admin should be able to see logs?
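The failure mode described above can be modelled in a few lines. This is a hypothetical sketch, not Airflow's security code: a role is a plain set of permission names, `can_index` is an illustrative filler, and `can_get_logs_with_metadata` assumes Flask-AppBuilder's convention of prefixing view method names with `can_`.

```python
# Hypothetical model of the two requests made when a user opens a task log.
# Permission names assume Flask-AppBuilder's "can_<method>" naming convention.
PAGE_PERMISSION = "can_log"                    # renders the /log page
XHR_PERMISSION = "can_get_logs_with_metadata"  # fetches the actual log text


def can_view_log_text(role_perms: set) -> bool:
    """Log text is shown only if both the page and the XHR call are allowed."""
    return PAGE_PERMISSION in role_perms and XHR_PERMISSION in role_perms


viewer_before = {"can_index", "can_log"}         # illustrative permission set
viewer_after = viewer_before | {XHR_PERMISSION}  # proposed fix

print(can_view_log_text(viewer_before))  # False: page opens, log stays empty
print(can_view_log_text(viewer_after))   # True
```

Holding only the page permission is exactly the "page opens but no log appears" symptom reported here.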

  was:
With RBAC enabled, only users with the Admin role can view logs.

The cause is that no permission for {{get_logs_with_metadata}} is defined in 
{{security.py}}.

My suggestion is to add the permission and assign it to the Viewer role. Or is 
there a reason why only Admin should be able to see logs?


> Only admin can view logs in RBAC UI
> ---
>
> Key: AIRFLOW-3072
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3072
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Affects Versions: 1.10.0
>Reporter: Stefan Seelmann
>Assignee: Stefan Seelmann
>Priority: Major
>





[jira] [Created] (AIRFLOW-3072) Only admin can view logs in RBAC UI

2018-09-16 Thread Stefan Seelmann (JIRA)
Stefan Seelmann created AIRFLOW-3072:


 Summary: Only admin can view logs in RBAC UI
 Key: AIRFLOW-3072
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3072
 Project: Apache Airflow
  Issue Type: Bug
  Components: ui
Affects Versions: 1.10.0
Reporter: Stefan Seelmann
Assignee: Stefan Seelmann


With RBAC enabled, only users with the Admin role can view logs.

The cause is that no permission for {{get_logs_with_metadata}} is defined in 
{{security.py}}.

My suggestion is to add the permission and assign it to the Viewer role. Or is 
there a reason why only Admin should be able to see logs?
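Assuming, hypothetically, that the default roles are assembled as unions of layered permission sets (Viewer, then User on top of Viewer, then Op on top of User), registering the missing permission once in the Viewer set would also cover User and Op, as AIRFLOW-3085 asks. All set contents below are illustrative, not the actual lists in security.py.

```python
# Illustrative layered permission sets; not the real contents of security.py.
VIEWER_PERMS = {"can_index", "can_log"}
USER_PERMS = {"can_trigger"}
OP_PERMS = {"can_variable"}

# Proposed fix: grant the permission needed by the log XHR at the Viewer level.
VIEWER_PERMS.add("can_get_logs_with_metadata")

# Each role includes the permissions of the roles below it.
ROLES = {
    "Viewer": VIEWER_PERMS,
    "User": VIEWER_PERMS | USER_PERMS,
    "Op": VIEWER_PERMS | USER_PERMS | OP_PERMS,
}

for name, perms in ROLES.items():
    print(name, "can_get_logs_with_metadata" in perms)
# Viewer True
# User True
# Op True
```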





[jira] [Work started] (AIRFLOW-2747) Explicit re-schedule of sensors

2018-09-16 Thread Stefan Seelmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-2747 started by Stefan Seelmann.

> Explicit re-schedule of sensors
> ---
>
> Key: AIRFLOW-2747
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2747
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core, operators
>Affects Versions: 1.9.0
>Reporter: Stefan Seelmann
>Assignee: Stefan Seelmann
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: Screenshot_2018-07-12_14-10-24.png, 
> Screenshot_2018-09-16_20-09-28.png, Screenshot_2018-09-16_20-19-23.png
>
>





[jira] [Updated] (AIRFLOW-2747) Explicit re-schedule of sensors

2018-09-16 Thread Stefan Seelmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Seelmann updated AIRFLOW-2747:
-
Attachment: Screenshot_2018-09-16_20-09-28.png

> Explicit re-schedule of sensors
> ---
>
> Key: AIRFLOW-2747
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2747
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core, operators
>Affects Versions: 1.9.0
>Reporter: Stefan Seelmann
>Assignee: Stefan Seelmann
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: Screenshot_2018-07-12_14-10-24.png, 
> Screenshot_2018-09-16_20-09-28.png, Screenshot_2018-09-16_20-19-23.png
>
>





[jira] [Commented] (AIRFLOW-2747) Explicit re-schedule of sensors

2018-09-16 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616842#comment-16616842
 ] 

Stefan Seelmann commented on AIRFLOW-2747:
--

I changed the Gantt view to show only a single bar instead of each individual 
reschedule. The color changes between light green (if currently running) and 
white (if currently inactive); those colors are also used in other views, so 
it's consistent. Failed attempts, however, are still shown as separate bars (as 
before). I attached two screenshots for demonstration.

!Screenshot_2018-09-16_20-19-23.png!!Screenshot_2018-09-16_20-09-28.png!
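The single-bar rendering can be sketched as a function that collapses all executions of one try into a single span and colors it by whether the task is currently running. The function name, tuple layout, and color strings are assumptions for illustration, not the actual Gantt view code; failed attempts (kept as separate bars) are not modelled. The sample times are the two s3 executions from the 2018-07-12 task_reschedule query output, with the date dropped.

```python
from datetime import datetime
from typing import List, Tuple

Span = Tuple[datetime, datetime]  # (start_date, end_date) of one execution


def parse_time(ts: str) -> datetime:
    return datetime.strptime(ts, "%H:%M:%S.%f")


def collapse_executions(executions: List[Span], running: bool) -> dict:
    """Merge all executions of one try into a single bar."""
    start = min(s for s, _ in executions)
    end = max(e for _, e in executions)
    # Light green while an execution is running, white while waiting.
    color = "lightgreen" if running else "white"
    return {"start": start, "end": end, "color": color}


bar = collapse_executions(
    [(parse_time("12:06:54.430185"), parse_time("12:06:59.339554")),
     (parse_time("12:07:17.111816"), parse_time("12:07:18.444199"))],
    running=False,
)
print(bar["start"].time(), bar["end"].time(), bar["color"])
# 12:06:54.430185 12:07:18.444199 white
```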

 

> Explicit re-schedule of sensors
> ---
>
> Key: AIRFLOW-2747
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2747
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core, operators
>Affects Versions: 1.9.0
>Reporter: Stefan Seelmann
>Assignee: Stefan Seelmann
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: Screenshot_2018-07-12_14-10-24.png, 
> Screenshot_2018-09-16_20-09-28.png, Screenshot_2018-09-16_20-19-23.png
>
>





[jira] [Updated] (AIRFLOW-2747) Explicit re-schedule of sensors

2018-09-16 Thread Stefan Seelmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Seelmann updated AIRFLOW-2747:
-
Attachment: Screenshot_2018-09-16_20-19-23.png

> Explicit re-schedule of sensors
> ---
>
> Key: AIRFLOW-2747
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2747
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core, operators
>Affects Versions: 1.9.0
>Reporter: Stefan Seelmann
>Assignee: Stefan Seelmann
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: Screenshot_2018-07-12_14-10-24.png, 
> Screenshot_2018-09-16_20-19-23.png
>
>





[jira] [Commented] (AIRFLOW-2747) Explicit re-schedule of sensors

2018-07-12 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16542191#comment-16542191
 ] 

Stefan Seelmann commented on AIRFLOW-2747:
--

[~pedromachado] Thanks for the feedback.

I added the contents of the task_fail and task_instance tables above; I hope 
this makes things clearer.

Regarding the colors:
 * The black bars are executions that requested a reschedule (i.e. the sensor 
raised an AirflowRescheduleException). The start_date and end_date are the 
actual dates the sensor task ran; the reschedule_date is the date it requested 
to be rescheduled for. I borrowed the layout of the task_reschedule table from 
the task_fail table and added the two additional columns (try_number and 
reschedule_date).
 * The red bars are failures (which then triggered a retry); those are recorded 
in the task_fail table and are already shown like this in the Gantt view today 
(in master and 1.10).

Regarding a start_date before the reschedule_date: I cannot see that problem. 
The start_date of the next row (with the same sensor task_id) is always after 
the previous reschedule_date. Note that the table contains rows of two sensors, 
s2 and s3.
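The ordering claim can be checked mechanically. The sketch below hard-codes the s2 and s3 rows of the task_reschedule query output from the 2018-07-12 comment (times only, since all rows are on the same day) and verifies that, per sensor, each start_date is later than the previous row's reschedule_date. The helper names are illustrative.

```python
from collections import defaultdict
from datetime import datetime

# (id, task_id, start_date, reschedule_date) from the task_reschedule output.
ROWS = [
    (42, "s3", "12:06:54.430185", "12:07:14.312456"),
    (44, "s2", "12:07:09.381193", "12:07:22.467206"),
    (45, "s3", "12:07:17.111816", "12:07:33.4376"),
    (47, "s2", "12:07:34.499979", "12:07:45.817533"),
    (49, "s2", "12:07:49.407569", "12:08:00.834584"),
    (51, "s2", "12:08:14.526", "12:08:25.762619"),
    (53, "s2", "12:08:29.329766", "12:08:41.160209"),
]


def parse(ts: str) -> datetime:
    # %f accepts 1 to 6 fractional digits when parsing.
    return datetime.strptime(ts, "%H:%M:%S.%f")


def starts_after_previous_reschedule(rows) -> bool:
    """True if every run starts after the reschedule date requested before it."""
    by_task = defaultdict(list)
    for _, task_id, start, resched in sorted(rows):  # sorted by id
        by_task[task_id].append((parse(start), parse(resched)))
    for runs in by_task.values():
        for (_, prev_resched), (next_start, _) in zip(runs, runs[1:]):
            if next_start <= prev_resched:
                return False
    return True


print(starts_after_previous_reschedule(ROWS))  # True
```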

The visualization in the Gantt view can be changed. For example, there could be 
just one bar from the first start_date to the last end_date, light green while 
the task is unfinished and dark green or red once it has succeeded or failed. I 
personally like the multiple bars because they show what happened when.

> Explicit re-schedule of sensors
> ---
>
> Key: AIRFLOW-2747
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2747
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core, operators
>Affects Versions: 1.9.0
>Reporter: Stefan Seelmann
>Assignee: Stefan Seelmann
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: Screenshot_2018-07-12_14-10-24.png
>
>





[jira] [Comment Edited] (AIRFLOW-2747) Explicit re-schedule of sensors

2018-07-12 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541539#comment-16541539
 ] 

Stefan Seelmann edited comment on AIRFLOW-2747 at 7/12/18 8:27 PM:
---

Screenshot of the Gantt view for an example DAG run:

  !Screenshot_2018-07-12_14-10-24.png!

And the corresponding rows in the task_reschedule, task_fail, and task_instance 
tables:
{noformat}
$ select * from task_reschedule where execution_date='2018-07-12T12:06:28.988028' order by id;
 id | task_id | dag_id |        execution_date         | try_number |          start_date           |           end_date            | duration |        reschedule_date
----+---------+--------+-------------------------------+------------+-------------------------------+-------------------------------+----------+-------------------------------
 42 | s3      | dummy  | 2018-07-12 12:06:28.988028+00 |          1 | 2018-07-12 12:06:54.430185+00 | 2018-07-12 12:06:59.339554+00 |        5 | 2018-07-12 12:07:14.312456+00
 44 | s2      | dummy  | 2018-07-12 12:06:28.988028+00 |          2 | 2018-07-12 12:07:09.381193+00 | 2018-07-12 12:07:12.480702+00 |        3 | 2018-07-12 12:07:22.467206+00
 45 | s3      | dummy  | 2018-07-12 12:06:28.988028+00 |          1 | 2018-07-12 12:07:17.111816+00 | 2018-07-12 12:07:18.444199+00 |        1 | 2018-07-12 12:07:33.4376+00
 47 | s2      | dummy  | 2018-07-12 12:06:28.988028+00 |          3 | 2018-07-12 12:07:34.499979+00 | 2018-07-12 12:07:35.834609+00 |        1 | 2018-07-12 12:07:45.817533+00
 49 | s2      | dummy  | 2018-07-12 12:06:28.988028+00 |          3 | 2018-07-12 12:07:49.407569+00 | 2018-07-12 12:07:50.843526+00 |        1 | 2018-07-12 12:08:00.834584+00
 51 | s2      | dummy  | 2018-07-12 12:06:28.988028+00 |          4 | 2018-07-12 12:08:14.526+00    | 2018-07-12 12:08:15.768907+00 |        1 | 2018-07-12 12:08:25.762619+00
 53 | s2      | dummy  | 2018-07-12 12:06:28.988028+00 |          4 | 2018-07-12 12:08:29.329766+00 | 2018-07-12 12:08:31.168762+00 |        2 | 2018-07-12 12:08:41.160209+00
{noformat}
{noformat}
$ select * from task_fail where execution_date='2018-07-12T12:06:28.988028' order by id;
 id  | task_id | dag_id |        execution_date         |          start_date           |           end_date            | duration
-----+---------+--------+-------------------------------+-------------------------------+-------------------------------+----------
 173 | t1      | dummy  | 2018-07-12 12:06:28.988028+00 | 2018-07-12 12:06:33.005215+00 | 2018-07-12 12:06:36.503438+00 |        3
 179 | s2      | dummy  | 2018-07-12 12:06:28.988028+00 | 2018-07-12 12:06:54.860487+00 | 2018-07-12 12:06:59.352183+00 |        4
 181 | s2      | dummy  | 2018-07-12 12:06:28.988028+00 | 2018-07-12 12:07:25.124649+00 | 2018-07-12 12:07:26.606175+00 |        1
 182 | s2      | dummy  | 2018-07-12 12:06:28.988028+00 | 2018-07-12 12:08:04.295306+00 | 2018-07-12 12:08:05.610363+00 |        1
{noformat}
{noformat}
$ select task_id,dag_id,execution_date,start_date,end_date,duration,state,try_number from task_instance where dag_id='dummy' and execution_date='2018-07-12T12:06:28.988028';
 task_id | dag_id |        execution_date         |          start_date           |           end_date            | duration |  state  | try_number
---------+--------+-------------------------------+-------------------------------+-------------------------------+----------+---------+------------
 s2      | dummy  | 2018-07-12 12:06:28.988028+00 | 2018-07-12 12:08:44.828189+00 | 2018-07-12 12:08:46.609474+00 | 1.781285 | success |          4
 t2      | dummy  | 2018-07-12 12:06:28.988028+00 | 2018-07-12 12:08:50.711506+00 | 2018-07-12 12:08:54.888104+00 | 4.176598 | success |          1
 b1      | dummy  | 2018-07-12 12:06:28.988028+00 | 2018-07-12 12:08:57.965998+00 | 2018-07-12 12:08:59.547209+00 | 1.581211 | success |          1
 t1      | dummy  | 2018-07-12 12:06:28.988028+00 | 2018-07-12 12:06:44.652687+00 | 2018-07-12 12:06:48.328103+00 | 3.675416 | success |          2
 sub1    | dummy  | 2018-07-12 12:06:28.988028+00 | 2018-07-12 12:09:03.322963+00 | 2018-07-12 12:09:40.248113+00 | 36.92515 | success |          1
 s1      | dummy  | 2018-07-12 12:06:28.988028+00 | 2018-07-12 12:06:54.345113+00 | 2018-07-12 12:06:58.871657+00 | 4.526544 | success |          1
 s3      | dummy  | 2018-07-12 12:06:28.988028+00 | 2018-07-12 12:07:37.190335+00 | 2018-07-12 12:07:38.725783+00 | 1.535448 | success |          1
{noformat}
 

 


was (Author: seelmann):
Screenshot of the Gantt view for an example DAG run:

  !Screenshot_2018-07-12_14-10-24.png!


 And the corresponding rows in task_reschedule table:
{noformat}
$ select * from task_reschedule where 
execution_date='2018-07-12T12:06:28.988028' order by id;
 id | task_id | dag_id |execution_date | try_number |  

[jira] [Updated] (AIRFLOW-2747) Explicit re-schedule of sensors

2018-07-12 Thread Stefan Seelmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Seelmann updated AIRFLOW-2747:
-
Description: 
By default sensors block a worker and just sleep between pokes. This is very 
inefficient, especially when there are many long-running sensors.

There is a hacky workaround: setting a small timeout value and a high retry 
number. But that has drawbacks:
 * Errors raised by sensors are hidden and the sensor retries too often
 * The sensor is retried in a fixed time interval (with optional exponential 
backoff)
 * There are many attempts and many log files are generated
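To make the last two drawbacks concrete, the sketch below counts how many attempts (and therefore log files, one per try) the retry-based workaround produces while waiting for a condition that becomes true after a given time. It assumes a negligible poke duration by default and a simplified backoff that doubles the delay after each retry; Airflow's own retry_exponential_backoff formula differs in detail.

```python
from datetime import timedelta


def retry_attempts(wait: timedelta, retry_delay: timedelta,
                   attempt_duration: timedelta = timedelta(0),
                   exponential: bool = False) -> int:
    """Count attempts until the first one that starts after `wait` succeeds.

    Each earlier attempt pokes once, times out, and is retried after
    `retry_delay` (doubled after every retry if `exponential` is set).
    """
    start = timedelta(0)  # start time of the current attempt
    delay = retry_delay
    attempts = 1
    while start < wait:   # condition not met yet, so this attempt fails
        start += attempt_duration + delay
        attempts += 1
        if exponential:
            delay *= 2
    return attempts


print(retry_attempts(timedelta(hours=1), timedelta(minutes=5)))  # 13
print(retry_attempts(timedelta(hours=1), timedelta(minutes=5),
                     exponential=True))                          # 5
```

With a fixed 5-minute delay, waiting one hour costs 13 attempts. With doubling it costs only 5, but the attempts start at 0, 5, 15, 35 and 75 minutes, so the condition is detected 15 minutes late.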

 I'd like to propose an explicit reschedule mechanism:
 * A new "reschedule" flag for sensors; if set to True, the sensor raises an 
AirflowRescheduleException that causes a reschedule.
 * AirflowRescheduleException contains the (earliest) re-schedule date.
 * Reschedule requests are recorded in a new `task_reschedule` table and 
visualized in the Gantt view.
 * A new TI dependency that checks if a sensor task is ready to be re-scheduled.
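The proposed control flow can be sketched without Airflow itself. In this hypothetical sketch, `RescheduleRequested` stands in for the proposed AirflowRescheduleException, `run_sensor` for a sensor's execute step, and `ready_to_reschedule` for the new TI dependency; none of these names are Airflow APIs, and a plain list plays the role of the `task_reschedule` table.

```python
import time
from datetime import datetime, timedelta
from typing import Callable, List


class RescheduleRequested(Exception):
    """Hypothetical stand-in for the proposed AirflowRescheduleException."""

    def __init__(self, reschedule_date: datetime):
        super().__init__(reschedule_date)
        self.reschedule_date = reschedule_date  # earliest allowed next run


def run_sensor(poke: Callable[[], bool], poke_interval: timedelta,
               now: datetime, reschedule: bool = False) -> None:
    """One sensor execution, in blocking (default) or reschedule mode."""
    if reschedule:
        if not poke():
            # Release the worker slot and ask to be run again later.
            raise RescheduleRequested(now + poke_interval)
        return
    while not poke():  # classic behaviour: hold the worker and sleep
        time.sleep(poke_interval.total_seconds())


def ready_to_reschedule(requested: List[datetime], now: datetime) -> bool:
    """Hypothetical ReadyToRescheduleDep: passes once the latest request is due."""
    return not requested or now >= requested[-1]


# Usage: a sensor whose condition is not met yet.
now = datetime(2018, 7, 12, 12, 7, 0)
requested = []  # plays the role of the task_reschedule rows
try:
    run_sensor(lambda: False, timedelta(seconds=15), now, reschedule=True)
except RescheduleRequested as exc:
    requested.append(exc.reschedule_date)

print(ready_to_reschedule(requested, now))                          # False
print(ready_to_reschedule(requested, now + timedelta(seconds=15)))  # True
```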

Advantages:
 * This change is backward compatible. Existing sensors behave as before, but 
it's possible to set the "reschedule" flag.
 * The poke_interval, timeout, and soft_fail parameters are still respected and 
used to calculate the next schedule time.
 * Custom sensor implementations can even define the next sensible schedule 
date by raising AirflowRescheduleException themselves.
 * Existing TimeSensor and TimeDeltaSensor can also be changed to be 
rescheduled when the time is reached.
 * This mechanism can also be used by non-sensor operators (but then the new 
ReadyToRescheduleDep has to be added to deps or BaseOperator).
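A hypothetical sketch of how poke_interval, timeout and soft_fail could combine into the next schedule time. It assumes the timeout is measured from the first execution of the current try (which the `task_reschedule` records would make available); the function and exception names are illustrative, not Airflow's.

```python
from datetime import datetime, timedelta
from typing import Optional


class SensorTimedOut(Exception):
    """Illustrative: the sensor fails after exceeding its timeout."""


class SensorSkipped(Exception):
    """Illustrative: with soft_fail, the sensor is skipped instead."""


def next_schedule(first_start: Optional[datetime], now: datetime,
                  poke_interval: timedelta, timeout: timedelta,
                  soft_fail: bool = False) -> datetime:
    """Return the earliest time to poke again, or raise on timeout.

    first_start: start of the first execution of this try, e.g. the
    earliest task_reschedule row; None if this is the first execution.
    """
    started = first_start if first_start is not None else now
    if now - started > timeout:
        if soft_fail:
            raise SensorSkipped("timeout exceeded")
        raise SensorTimedOut("timeout exceeded")
    return now + poke_interval


t0 = datetime(2018, 7, 12, 12, 0)
print(next_schedule(None, t0, timedelta(minutes=1), timedelta(hours=1)))
# 2018-07-12 12:01:00
```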

Design decisions and caveats:
 * When handling AirflowRescheduleException, the `try_number` is decremented. 
That means that subsequent runs use the same try number and write to the same 
log file.
 * Sensor TI dependency check now depends on the `task_reschedule` table. However, 
only the BaseSensorOperator includes the new ReadyToRescheduleDep for now.
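The effect of decrementing `try_number` on reschedule can be simulated in a few lines. This is a simplified, hypothetical model: the counter is incremented when an execution starts, a reschedule gives the increment back, a failure (a real retry) keeps it, and each try number maps to one log file under an illustrative path layout.

```python
from typing import List


def log_path(dag_id: str, task_id: str, execution_date: str, try_number: int) -> str:
    # Hypothetical layout: one log file per task instance and try number.
    return f"{dag_id}/{task_id}/{execution_date}/{try_number}.log"


def simulate(outcomes: List[str]) -> List[int]:
    """Return the try number each execution logs under."""
    stored = 0
    used = []
    for outcome in outcomes:
        stored += 1            # execution starts
        used.append(stored)
        if outcome == "reschedule":
            stored -= 1        # give the try number back
    return used


tries = simulate(["reschedule", "reschedule", "failed", "reschedule", "success"])
print(tries)  # [1, 1, 1, 2, 2]
print(len({log_path("dummy", "s2", "2018-07-12T12:06:28", t) for t in tries}))  # 2
```

Five executions write to only two log files, one per real try.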

Open questions and TODOs:
 * Should a dedicated state `UP_FOR_RESCHEDULE` be used instead of setting the 
state back to `NONE`? This would require more changes in scheduler code and 
especially in the UI, but the state of a task would be more explicit and more 
transparent to the user.
 * Add example/test for a non-sensor operator
 * Document the new feature

  was:
By default sensors block a worker and just sleep between pokes. This is very 
inefficient, especially when there are many long-running sensors.

There is a hacky workaround: setting a small timeout value and a high retry 
number. But that has drawbacks:
 * Errors throws by sensors are hidden and the sensor retries too often
 * The sensor is retried in a fixed time interval (with optional exponential 
backoff)
 * There are many attempts and many log files are generated

 I'd like to propose an explicit reschedule mechanism:
 * A new "reschedule" flag for sensors; if set to True, the sensor raises an 
AirflowRescheduleException that causes a reschedule.
 * AirflowRescheduleException contains the (earliest) re-schedule date.
 * Reschedule requests are recorded in new `task_reschedule` table and 
visualized in the Gantt view.
 * A new TI dependency that checks if a sensor task is ready to be re-scheduled.

Advantages:
 * This change is backward compatible. Existing sensors behave as before, but 
it's possible to set the "reschedule" flag.
 * The poke_interval, timeout, and soft_fail parameters are still respected and 
used to calculate the next schedule time.
 * Custom sensor implementations can even define the next sensible schedule 
date by raising AirflowRescheduleException themselves.
 * Existing TimeSensor and TimeDeltaSensor can also be changed to be 
rescheduled when the time is reached.
 * This mechanism can also be used by non-sensor operators (but then the new 
ReadyToRescheduleDep has to be added to deps or BaseOperator).

Design decisions and caveats:
 * When handling AirflowRescheduleException the `try_number` is decremented. 
That means that subsequent runs use the same try number and write to the same 
log file.
 * Sensor TI dependency check now depends on `task_reschedule` table. However 
only the BaseSensorOperator includes the new ReadyToRescheduleDep for now.

Open questions and TODOs:
 * Should a dedicated state `UP_FOR_RESCHEDULE` be used instead of setting the 
state back to `NONE`? This would require more changes in scheduler code and 
especially in the UI, but the state of a task would be more explicit and more 
transparent to the user.
 * Add example/test for a non-sensor operator
 * Document the new feature


> Explicit re-schedule of sensors
> ---
>
> Key: AIRFLOW-2747
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2747
> Project: Apache Airflow
>  

[jira] [Commented] (AIRFLOW-2747) Explicit re-schedule of sensors

2018-07-12 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541546#comment-16541546
 ] 

Stefan Seelmann commented on AIRFLOW-2747:
--

Initial PR: https://github.com/apache/incubator-airflow/pull/3596

> Explicit re-schedule of sensors
> ---
>
> Key: AIRFLOW-2747
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2747
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core, operators
>Affects Versions: 1.9.0
>Reporter: Stefan Seelmann
>Assignee: Stefan Seelmann
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: Screenshot_2018-07-12_14-10-24.png
>
>
> By default sensors block a worker and just sleep between pokes. This is very 
> inefficient, especially when there are many long-running sensors.
> There is a hacky workaround by setting a small timeout value and a high retry 
> number. But that has drawbacks:
>  * Errors thrown by sensors are hidden and the sensor retries too often
>  * The sensor is retried in a fixed time interval (with optional exponential 
> backoff)
>  * There are many attempts and many log files are generated
>  I'd like to propose an explicit reschedule mechanism:
>  * A new "reschedule" flag for sensors, if set to True it will raise an 
> AirflowRescheduleException that causes a reschedule.
>  * AirflowRescheduleException contains the (earliest) re-schedule date.
>  * Reschedule requests are recorded in new `task_reschedule` table and 
> visualized in the Gantt view.
>  * A new TI dependency that checks if a sensor task is ready to be 
> re-scheduled.
> Advantages:
>  * This change is backward compatible. Existing sensors behave as before, 
> but it's possible to set the "reschedule" flag.
>  * The poke_interval, timeout, and soft_fail parameters are still respected 
> and used to calculate the next schedule time.
>  * Custom sensor implementations can even define the next sensible schedule 
> date by raising AirflowRescheduleException themselves.
>  * Existing TimeSensor and TimeDeltaSensor can also be changed to be 
> rescheduled when the time is reached.
>  * This mechanism can also be used by non-sensor operators (but then the new 
> ReadyToRescheduleDep has to be added to deps or BaseOperator).
> Design decisions and caveats:
>  * When handling AirflowRescheduleException the `try_number` is decremented. 
> That means that subsequent runs use the same try number and write to the same 
> log file.
>  * Sensor TI dependency check now depends on `task_reschedule` table. However 
> only the BaseSensorOperator includes the new ReadyToRescheduleDep for now.
> Open questions and TODOs:
>  * Should a dedicated state `UP_FOR_RESCHEDULE` be used instead of setting 
> the state back to `NONE`? This would require more changes in scheduler code 
> and especially in the UI, but the state of a task would be more explicit and 
> more transparent to the user.
>  * Add example/test for a non-sensor operator
>  * Document the new feature



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2747) Explicit re-schedule of sensors

2018-07-12 Thread Stefan Seelmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Seelmann updated AIRFLOW-2747:
-
Attachment: Screenshot_2018-07-12_14-10-24.png

> Explicit re-schedule of sensors
> ---
>
> Key: AIRFLOW-2747
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2747
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core, operators
>Affects Versions: 1.9.0
>Reporter: Stefan Seelmann
>Assignee: Stefan Seelmann
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: Screenshot_2018-07-12_14-10-24.png
>
>
> By default sensors block a worker and just sleep between pokes. This is very 
> inefficient, especially when there are many long-running sensors.
> There is a hacky workaround by setting a small timeout value and a high retry 
> number. But that has drawbacks:
>  * Errors thrown by sensors are hidden and the sensor retries too often
>  * The sensor is retried in a fixed time interval (with optional exponential 
> backoff)
>  * There are many attempts and many log files are generated
>  I'd like to propose an explicit reschedule mechanism:
>  * A new "reschedule" flag for sensors, if set to True it will raise an 
> AirflowRescheduleException that causes a reschedule.
>  * AirflowRescheduleException contains the (earliest) re-schedule date.
>  * Reschedule requests are recorded in new `task_reschedule` table and 
> visualized in the Gantt view.
>  * A new TI dependency that checks if a sensor task is ready to be 
> re-scheduled.
> Advantages:
>  * This change is backward compatible. Existing sensors behave as before, 
> but it's possible to set the "reschedule" flag.
>  * The poke_interval, timeout, and soft_fail parameters are still respected 
> and used to calculate the next schedule time.
>  * Custom sensor implementations can even define the next sensible schedule 
> date by raising AirflowRescheduleException themselves.
>  * Existing TimeSensor and TimeDeltaSensor can also be changed to be 
> rescheduled when the time is reached.
>  * This mechanism can also be used by non-sensor operators (but then the new 
> ReadyToRescheduleDep has to be added to deps or BaseOperator).
> Design decisions and caveats:
>  * When handling AirflowRescheduleException the `try_number` is decremented. 
> That means that subsequent runs use the same try number and write to the same 
> log file.
>  * Sensor TI dependency check now depends on `task_reschedule` table. However 
> only the BaseSensorOperator includes the new ReadyToRescheduleDep for now.
> Open questions and TODOs:
>  * Should a dedicated state `UP_FOR_RESCHEDULE` be used instead of setting 
> the state back to `NONE`? This would require more changes in scheduler code 
> and especially in the UI, but the state of a task would be more explicit and 
> more transparent to the user.
>  * Add example/test for a non-sensor operator
>  * Document the new feature





[jira] [Commented] (AIRFLOW-2747) Explicit re-schedule of sensors

2018-07-12 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541539#comment-16541539
 ] 

Stefan Seelmann commented on AIRFLOW-2747:
--

Screenshot of the Gantt view for an example DAG run:

  !Screenshot_2018-07-12_14-10-24.png!


And the corresponding rows in the task_reschedule table:
{noformat}
$ select * from task_reschedule where execution_date='2018-07-12T12:06:28.988028' order by id;
 id | task_id | dag_id |        execution_date         | try_number |          start_date           |           end_date            | duration |        reschedule_date
----+---------+--------+-------------------------------+------------+-------------------------------+-------------------------------+----------+-------------------------------
 42 | s3      | dummy  | 2018-07-12 12:06:28.988028+00 |          1 | 2018-07-12 12:06:54.430185+00 | 2018-07-12 12:06:59.339554+00 |        5 | 2018-07-12 12:07:14.312456+00
 44 | s2      | dummy  | 2018-07-12 12:06:28.988028+00 |          2 | 2018-07-12 12:07:09.381193+00 | 2018-07-12 12:07:12.480702+00 |        3 | 2018-07-12 12:07:22.467206+00
 45 | s3      | dummy  | 2018-07-12 12:06:28.988028+00 |          1 | 2018-07-12 12:07:17.111816+00 | 2018-07-12 12:07:18.444199+00 |        1 | 2018-07-12 12:07:33.4376+00
 47 | s2      | dummy  | 2018-07-12 12:06:28.988028+00 |          3 | 2018-07-12 12:07:34.499979+00 | 2018-07-12 12:07:35.834609+00 |        1 | 2018-07-12 12:07:45.817533+00
 49 | s2      | dummy  | 2018-07-12 12:06:28.988028+00 |          3 | 2018-07-12 12:07:49.407569+00 | 2018-07-12 12:07:50.843526+00 |        1 | 2018-07-12 12:08:00.834584+00
 51 | s2      | dummy  | 2018-07-12 12:06:28.988028+00 |          4 | 2018-07-12 12:08:14.526+00    | 2018-07-12 12:08:15.768907+00 |        1 | 2018-07-12 12:08:25.762619+00
 53 | s2      | dummy  | 2018-07-12 12:06:28.988028+00 |          4 | 2018-07-12 12:08:29.329766+00 | 2018-07-12 12:08:31.168762+00 |        2 | 2018-07-12 12:08:41.160209+00
{noformat}
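Reading the rows above, the `duration` column appears to be `end_date - start_date` rounded to whole seconds. `reschedule_date` lands roughly 15 s (s3) or 10 s (s2) after `end_date`, which would fit per-sensor poke intervals of about that size. The DAG definition is not shown, so that part is an inference. A small Python check over the same rows:

```python
from datetime import datetime

# (id, task_id, start_date, end_date, duration, reschedule_date), copied from the
# task_reschedule output above; all times are on 2018-07-12 (UTC).
ROWS = [
    (42, "s3", "12:06:54.430185", "12:06:59.339554", 5, "12:07:14.312456"),
    (44, "s2", "12:07:09.381193", "12:07:12.480702", 3, "12:07:22.467206"),
    (45, "s3", "12:07:17.111816", "12:07:18.444199", 1, "12:07:33.4376"),
    (47, "s2", "12:07:34.499979", "12:07:35.834609", 1, "12:07:45.817533"),
    (49, "s2", "12:07:49.407569", "12:07:50.843526", 1, "12:08:00.834584"),
    (51, "s2", "12:08:14.526", "12:08:15.768907", 1, "12:08:25.762619"),
    (53, "s2", "12:08:29.329766", "12:08:31.168762", 2, "12:08:41.160209"),
]


def parse(clock):
    # %f accepts 1 to 6 fractional digits, so "14.526" and "33.4376" both parse.
    return datetime.strptime("2018-07-12 " + clock, "%Y-%m-%d %H:%M:%S.%f")


def summarize(rows):
    """Return (id, task_id, seconds spent poking, seconds until reschedule_date)."""
    out = []
    for row_id, task_id, start, end, _duration, resched in rows:
        busy = (parse(end) - parse(start)).total_seconds()
        wait = (parse(resched) - parse(end)).total_seconds()
        out.append((row_id, task_id, busy, wait))
    return out


for row_id, task_id, busy, wait in summarize(ROWS):
    print(f"{row_id} {task_id}: poked {busy:.2f}s, next try {wait:.2f}s later")
```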

> Explicit re-schedule of sensors
> ---
>
> Key: AIRFLOW-2747
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2747
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core, operators
>Affects Versions: 1.9.0
>Reporter: Stefan Seelmann
>Assignee: Stefan Seelmann
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: Screenshot_2018-07-12_14-10-24.png
>
>
> By default sensors block a worker and just sleep between pokes. This is very 
> inefficient, especially when there are many long-running sensors.
> There is a hacky workaround by setting a small timeout value and a high retry 
> number. But that has drawbacks:
>  * Errors thrown by sensors are hidden and the sensor retries too often
>  * The sensor is retried in a fixed time interval (with optional exponential 
> backoff)
>  * There are many attempts and many log files are generated
>  I'd like to propose an explicit reschedule mechanism:
>  * A new "reschedule" flag for sensors, if set to True it will raise an 
> AirflowRescheduleException that causes a reschedule.
>  * AirflowRescheduleException contains the (earliest) re-schedule date.
>  * Reschedule requests are recorded in new `task_reschedule` table and 
> visualized in the Gantt view.
>  * A new TI dependency that checks if a sensor task is ready to be 
> re-scheduled.
> Advantages:
>  * This change is backward compatible. Existing sensors behave as before, 
> but it's possible to set the "reschedule" flag.
>  * The poke_interval, timeout, and soft_fail parameters are still respected 
> and used to calculate the next schedule time.
>  * Custom sensor implementations can even define the next sensible schedule 
> date by raising AirflowRescheduleException themselves.
>  * Existing TimeSensor and TimeDeltaSensor can also be changed to be 
> rescheduled when the time is reached.
>  * This mechanism can also be used by non-sensor operators (but then the new 
> ReadyToRescheduleDep has to be added to deps or BaseOperator).
> Design decisions and caveats:
>  * When handling AirflowRescheduleException the `try_number` is decremented. 
> That means that subsequent runs use the same try number and write to the same 
> log file.
>  * Sensor TI dependency check now depends on `task_reschedule` table. However 
> only the BaseSensorOperator includes the new ReadyToRescheduleDep for now.
> Open questions and TODOs:
>  * Should a dedicated state `UP_FOR_RESCHEDULE` be used instead of setting 
> the state back to `NONE`? This would require more changes in scheduler code 
> and especially in the UI, but the state of a task would be more explicit and 
> more transparent to the user.
>  * Add example/test for a non-sensor operator
>  * Document the new feature




[jira] [Created] (AIRFLOW-2747) Explicit re-schedule of sensors

2018-07-12 Thread Stefan Seelmann (JIRA)
Stefan Seelmann created AIRFLOW-2747:


 Summary: Explicit re-schedule of sensors
 Key: AIRFLOW-2747
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2747
 Project: Apache Airflow
  Issue Type: Improvement
  Components: core, operators
Affects Versions: 1.9.0
Reporter: Stefan Seelmann
Assignee: Stefan Seelmann
 Fix For: 2.0.0


By default sensors block a worker and just sleep between pokes. This is very 
inefficient, especially when there are many long-running sensors.

There is a hacky workaround by setting a small timeout value and a high retry 
number. But that has drawbacks:
 * Errors thrown by sensors are hidden and the sensor retries too often
 * The sensor is retried in a fixed time interval (with optional exponential 
backoff)
 * There are many attempts and many log files are generated

 I'd like to propose an explicit reschedule mechanism:
 * A new "reschedule" flag for sensors, if set to True it will raise an 
AirflowRescheduleException that causes a reschedule.
 * AirflowRescheduleException contains the (earliest) re-schedule date.
 * Reschedule requests are recorded in new `task_reschedule` table and 
visualized in the Gantt view.
 * A new TI dependency that checks if a sensor task is ready to be re-scheduled.
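The following minimal Python sketch shows the proposed control flow without using Airflow itself. `RescheduleRequested`, `SensorTimeout` and `SketchSensor` are hypothetical stand-ins for the proposed AirflowRescheduleException and sensor behaviour. The `first_start` argument assumes the start time of the first attempt is known, for example by looking it up in `task_reschedule`.

```python
import time
from datetime import datetime, timedelta


class RescheduleRequested(Exception):
    """Stand-in for the proposed AirflowRescheduleException (not imported from Airflow)."""

    def __init__(self, reschedule_date):
        super().__init__(f"reschedule at {reschedule_date}")
        self.reschedule_date = reschedule_date  # earliest time the task may run again


class SensorTimeout(Exception):
    pass


class SketchSensor:
    """Hypothetical sensor illustrating the proposed "reschedule" flag."""

    def __init__(self, poke, poke_interval=60, timeout=3600,
                 reschedule=False, sleep=time.sleep):
        self.poke = poke                    # callable: True once the condition holds
        self.poke_interval = poke_interval  # seconds between pokes
        self.timeout = timeout              # seconds since the first attempt started
        self.reschedule = reschedule
        self.sleep = sleep                  # injectable so the blocking mode is testable

    def execute(self, first_start, now):
        # In reschedule mode every poke is a separate run, so first_start has to be
        # persisted between runs for the timeout to keep working.
        while not self.poke():
            if (now - first_start).total_seconds() > self.timeout:
                raise SensorTimeout("sensor timed out")
            if self.reschedule:
                # Release the worker slot and ask to be run again after poke_interval.
                raise RescheduleRequested(now + timedelta(seconds=self.poke_interval))
            # Default mode: keep holding the worker slot and sleep between pokes.
            self.sleep(self.poke_interval)
            now += timedelta(seconds=self.poke_interval)
        return True


t0 = datetime(2018, 7, 12, 12, 0, 0)
sensor = SketchSensor(poke=lambda: False, poke_interval=10, reschedule=True)
try:
    sensor.execute(first_start=t0, now=t0)
except RescheduleRequested as exc:
    print("rescheduled for", exc.reschedule_date)  # 2018-07-12 12:00:10
```

In this sketch the only difference between the two modes is whether the loop raises or sleeps. That matches the claim that existing sensors keep their behaviour unless the flag is set.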

Advantages:
 * This change is backward compatible. Existing sensors behave as before, but 
it's possible to set the "reschedule" flag.
 * The poke_interval, timeout, and soft_fail parameters are still respected and 
used to calculate the next schedule time.
 * Custom sensor implementations can even define the next sensible schedule 
date by raising AirflowRescheduleException themselves.
 * Existing TimeSensor and TimeDeltaSensor can also be changed to be 
rescheduled when the time is reached.
 * This mechanism can also be used by non-sensor operators (but then the new 
ReadyToRescheduleDep has to be added to deps or BaseOperator).
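A custom sensor can pick a better date than `now + poke_interval`. The sketch below assumes naive datetimes and uses a hypothetical `time_sensor_poke` helper, loosely modeled on what a TimeSensor does. Instead of polling, it asks to be rescheduled exactly at the target time of day.

```python
from datetime import datetime, time


class RescheduleRequested(Exception):
    """Stand-in for the proposed AirflowRescheduleException (illustrative only)."""

    def __init__(self, reschedule_date):
        super().__init__(f"reschedule at {reschedule_date}")
        self.reschedule_date = reschedule_date


def time_sensor_poke(target_time, now):
    """Succeed once target_time is reached on now's date; otherwise request one
    reschedule at exactly that moment instead of waking up every poke_interval."""
    target = datetime.combine(now.date(), target_time)
    if now >= target:
        return True
    raise RescheduleRequested(target)


try:
    time_sensor_poke(time(12, 30), datetime(2018, 7, 12, 12, 0))
except RescheduleRequested as exc:
    print("run again at", exc.reschedule_date)  # 2018-07-12 12:30:00
```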

Design decisions and caveats:
 * When handling AirflowRescheduleException the `try_number` is decremented. 
That means that subsequent runs use the same try number and write to the same 
log file.
 * Sensor TI dependency check now depends on `task_reschedule` table. However 
only the BaseSensorOperator includes the new ReadyToRescheduleDep for now.
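The try_number handling and the new dependency check can be illustrated with in-memory data. Everything here is hypothetical: `TaskInstance`, `TaskReschedule`, `start_attempt`, `handle_reschedule` and `ready_to_reschedule`. The actual model classes and ReadyToRescheduleDep may differ. This sketch treats `try_number` simply as a count of started attempts.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TaskInstance:
    task_id: str
    try_number: int = 0   # attempts started so far (simplified sketch semantics)
    state: str = "none"


@dataclass
class TaskReschedule:
    task_id: str
    try_number: int
    start_date: datetime
    end_date: datetime
    reschedule_date: datetime


def start_attempt(ti):
    ti.try_number += 1    # every run bumps the counter ...
    ti.state = "running"


def handle_reschedule(ti, table, start, end, reschedule_date):
    """Record the request, then undo the bump so the next run reuses this try number."""
    table.append(TaskReschedule(ti.task_id, ti.try_number, start, end, reschedule_date))
    ti.try_number -= 1    # ... and a reschedule takes it back (same log file next time)
    ti.state = "none"     # the proposal resets the state to NONE


def ready_to_reschedule(ti, table, now):
    """ReadyToRescheduleDep-style check against the recorded reschedule rows."""
    upcoming = ti.try_number + 1
    rows = [r for r in table if r.task_id == ti.task_id and r.try_number == upcoming]
    if not rows:
        return True       # never rescheduled for this try: nothing to wait for
    return now >= rows[-1].reschedule_date  # the latest request wins


t0 = datetime(2018, 7, 12, 12, 0, 0)
table = []
ti = TaskInstance("s2")
start_attempt(ti)
handle_reschedule(ti, table, t0, t0 + timedelta(seconds=1), t0 + timedelta(seconds=11))
print(ready_to_reschedule(ti, table, t0 + timedelta(seconds=5)))   # False
print(ready_to_reschedule(ti, table, t0 + timedelta(seconds=11)))  # True
```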

Open questions and TODOs:
 * Should a dedicated state `UP_FOR_RESCHEDULE` be used instead of setting the 
state back to `NONE`? This would require more changes in scheduler code and 
especially in the UI, but the state of a task would be more explicit and more 
transparent to the user.
 * Add example/test for a non-sensor operator
 * Document the new feature





[jira] [Commented] (AIRFLOW-2737) Restore original license header

2018-07-10 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538405#comment-16538405
 ] 

Stefan Seelmann commented on AIRFLOW-2737:
--

PR: https://github.com/apache/incubator-airflow/pull/3591

> Restore original license header
> ---
>
> Key: AIRFLOW-2737
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2737
> Project: Apache Airflow
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Stefan Seelmann
>Assignee: Stefan Seelmann
>Priority: Major
> Fix For: 2.0.0
>
>
> The original license header in airflow/api/auth/backend/kerberos_auth.py was 
> replaced with the AL. It should be restored.





[jira] [Created] (AIRFLOW-2737) Restore original license header

2018-07-10 Thread Stefan Seelmann (JIRA)
Stefan Seelmann created AIRFLOW-2737:


 Summary: Restore original license header
 Key: AIRFLOW-2737
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2737
 Project: Apache Airflow
  Issue Type: Improvement
Affects Versions: 1.9.0
Reporter: Stefan Seelmann
Assignee: Stefan Seelmann
 Fix For: 2.0.0


The original license header in airflow/api/auth/backend/kerberos_auth.py was 
replaced with the AL. It should be restored.





[jira] [Commented] (AIRFLOW-2639) Dagrun of subdags is set to RUNNING immediately

2018-06-23 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16521288#comment-16521288
 ] 

Stefan Seelmann commented on AIRFLOW-2639:
--

[~milton0825] I implemented the 2nd option: the parent's DAG run conf is now 
forwarded to the subdag directly within the SubDagOperator. WDYT?

PR: [https://github.com/apache/incubator-airflow/pull/3540]
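Here is a rough sketch of the second option. It assumes simplified in-memory `DagRun` objects and a hypothetical `SketchSubDagOperator`, which is not Airflow's SubDagOperator. The subdag run is created only when the operator actually executes, and it inherits the parent's `run_id` and a copy of its `conf`. A skipped operator therefore never leaves a "RUNNING" subdag run behind.

```python
from dataclasses import dataclass, field


@dataclass
class DagRun:
    dag_id: str
    run_id: str
    conf: dict = field(default_factory=dict)
    state: str = "running"


class SketchSubDagOperator:
    """Hypothetical stand-in: creates the subdag run lazily and forwards the conf."""

    def __init__(self, subdag_id):
        self.subdag_id = subdag_id

    def execute(self, parent_run, dag_runs):
        subdag_run = DagRun(
            dag_id=self.subdag_id,
            run_id=parent_run.run_id,    # same run_id as the parent DAG run
            conf=dict(parent_run.conf),  # shallow copy of the trigger conf
        )
        dag_runs.append(subdag_run)
        return subdag_run


parent = DagRun("parent", "manual__2018-06-23T10:00:00", {"name": "value"})
runs = [parent]
child = SketchSubDagOperator("parent.section_1").execute(parent, runs)
print(child.conf)  # {'name': 'value'}
```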

 

> Dagrun of subdags is set to RUNNING immediately
> ---
>
> Key: AIRFLOW-2639
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2639
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> This change has a side effect. The subdag run and its task instances are 
> eagerly created, and the subdag is immediately set to "RUNNING" state. This 
> means it is immediately visible in the UI (tree view and dagrun view).
> In our case we skip the SubDagOperator based on some conditions. However, the 
> subdag run is then still visible in the UI and in "RUNNING" state, which looks 
> scary, see attached screenshot. Before there was no subdag run visible at all 
> for skipped subdags.
> One option I see is to not set subdags to "RUNNING" state but "NONE". Then it 
> will still be visible in the UI but not as running. Another idea is to try to 
> pass the conf directly in the SubDagOperator.





[jira] [Commented] (AIRFLOW-2355) Airflow trigger tag parameters in subdag

2018-06-18 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16515636#comment-16515636
 ] 

Stefan Seelmann commented on AIRFLOW-2355:
--

This change has a side effect. The subdag run and its task instances are 
eagerly created, and the subdag is immediately set to "RUNNING" state. This 
means it is immediately visible in the UI (tree view and dagrun view). 

In our case we skip the SubDagOperator based on some conditions. However, the 
subdag run is then still visible in the UI and in "RUNNING" state, which looks 
scary, see attached screenshot. Before there was no subdag run visible at all 
for skipped subdags.

One option I see is to not set subdags to "RUNNING" state but "NONE". Then it 
will still be visible in the UI but not as running. Another idea is to try to 
pass the conf directly in the SubDagOperator.

Otherwise, I really like having the conf available in the subdag. I also like 
that the subdag uses the same run_id as the main dag. :)

> Airflow trigger tag parameters in subdag
> 
>
> Key: AIRFLOW-2355
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2355
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: api
>Affects Versions: 1.9.0
>Reporter: Mohammed Tameem
>Assignee: Chao-Han Tsai
>Priority: Blocker
> Fix For: 1.10.0, 2.0.0
>
> Attachments: Screenshot_2018-06-18_13-52-41.png
>
>
> The command `airflow trigger_dag -c "{'name':'value'}"` sends conf parameters 
> only to the parent DAG. I'm using SubDags that depend on these parameters, but 
> no parameters are received by the SubDag.
> From the source code of the SubDag operator, I see that there is no way of 
> passing these trigger parameters to a SubDag.
>  





[jira] [Updated] (AIRFLOW-2355) Airflow trigger tag parameters in subdag

2018-06-18 Thread Stefan Seelmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Seelmann updated AIRFLOW-2355:
-
Attachment: Screenshot_2018-06-18_13-52-41.png

> Airflow trigger tag parameters in subdag
> 
>
> Key: AIRFLOW-2355
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2355
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: api
>Affects Versions: 1.9.0
>Reporter: Mohammed Tameem
>Assignee: Chao-Han Tsai
>Priority: Blocker
> Fix For: 1.10.0, 2.0.0
>
> Attachments: Screenshot_2018-06-18_13-52-41.png
>
>
> The command `airflow trigger_dag -c "{'name':'value'}"` sends conf parameters 
> only to the parent DAG. I'm using SubDags that depend on these parameters, but 
> no parameters are received by the SubDag.
> From the source code of the SubDag operator, I see that there is no way of 
> passing these trigger parameters to a SubDag.
>  





[jira] [Assigned] (AIRFLOW-2604) dag_id, task_id, execution_date in dag_fail should be indexed

2018-06-17 Thread Stefan Seelmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Seelmann reassigned AIRFLOW-2604:


Assignee: Stefan Seelmann

> dag_id, task_id, execution_date in dag_fail should be indexed
> -
>
> Key: AIRFLOW-2604
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2604
> Project: Apache Airflow
>  Issue Type: Improvement
>Affects Versions: 1.10
>Reporter: Joy Gao
>Assignee: Stefan Seelmann
>Priority: Major
>
> As a follow-up to AIRFLOW-2602, we should index dag_id, task_id and 
> execution_date to make sure the /gantt page (and any other future UIs relying 
> on task_fail) can still be rendered quickly as the table grows in size.
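A composite index over those three columns can be tried out with SQLite from the Python standard library. The `task_fail` column set and the index name `idx_task_fail_dag_task_date` are illustrative assumptions, not taken from an actual migration. `EXPLAIN QUERY PLAN` is used only to confirm that a lookup on all three columns goes through the index.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    # Simplified task_fail table; the column set is illustrative.
    conn.execute(
        """CREATE TABLE task_fail (
               id INTEGER PRIMARY KEY,
               task_id VARCHAR(250) NOT NULL,
               dag_id VARCHAR(250) NOT NULL,
               execution_date TIMESTAMP NOT NULL,
               start_date TIMESTAMP,
               end_date TIMESTAMP,
               duration INTEGER)"""
    )
    # Composite index on the columns named in the issue (index name is hypothetical).
    conn.execute(
        "CREATE INDEX idx_task_fail_dag_task_date "
        "ON task_fail (dag_id, task_id, execution_date)"
    )
    return conn


def query_plan(conn, sql, params):
    # The last column of each EXPLAIN QUERY PLAN row holds the readable detail text.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = make_db()
plan = query_plan(
    conn,
    "SELECT * FROM task_fail WHERE dag_id = ? AND task_id = ? AND execution_date = ?",
    ("dummy", "s2", "2018-07-12 12:06:28"),
)
print(plan)
```

Column order matters. A lookup that filters only on dag_id and execution_date can generally use just the dag_id prefix of this index.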





[jira] [Commented] (AIRFLOW-2606) Test needed to ensure database schema always match SQLAlchemy model types

2018-06-17 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16515066#comment-16515066
 ] 

Stefan Seelmann commented on AIRFLOW-2606:
--

PR: https://github.com/apache/incubator-airflow/pull/3516
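Here is one possible shape for such a test, sketched with the standard-library `sqlite3` module. It compares the column types declared by a model against what `PRAGMA table_info` reports for the live table. The `TASK_FAIL_MODEL` mapping and the `schema_mismatches` helper are hypothetical. For real SQLAlchemy metadata, Alembic's autogenerate API (`alembic.autogenerate.compare_metadata`) can produce a similar diff.

```python
import sqlite3

# Hypothetical model declaration: column name -> declared SQL type.
TASK_FAIL_MODEL = {
    "id": "INTEGER",
    "task_id": "VARCHAR(250)",
    "dag_id": "VARCHAR(250)",
    "execution_date": "TIMESTAMP",
    "duration": "INTEGER",
}


def schema_mismatches(conn, table, model):
    """Compare the live table's declared column types with the model's."""
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated, so it must come from trusted code.
    actual = {row[1]: row[2].upper() for row in conn.execute(f"PRAGMA table_info({table})")}
    problems = []
    for name, declared in model.items():
        if name not in actual:
            problems.append(f"{table}.{name}: missing in database")
        elif actual[name] != declared.upper():
            problems.append(f"{table}.{name}: database has {actual[name]}, model says {declared}")
    for name in sorted(actual.keys() - model.keys()):
        problems.append(f"{table}.{name}: not declared in model")
    return problems


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_fail (id INTEGER, task_id VARCHAR(250), dag_id VARCHAR(250), "
    "execution_date TIMESTAMP, duration FLOAT)"
)
print(schema_mismatches(conn, "task_fail", TASK_FAIL_MODEL))
# ['task_fail.duration: database has FLOAT, model says INTEGER']
```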

> Test needed to ensure database schema always match SQLAlchemy model types
> -
>
> Key: AIRFLOW-2606
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2606
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Joy Gao
>Assignee: Stefan Seelmann
>Priority: Major
>
> An issue was discovered by [this 
> PR|https://github.com/apache/incubator-airflow/pull/3492#issuecomment-396815203]
>  where the database schema does not match its corresponding SQLAlchemy model 
> declaration. We should add a generic unit test for this to prevent similar 
> bugs from occurring in the future. (Alternatively, we can add the policing 
> logic to the `airflow upgradedb` command so each migration can do the check.)





[jira] [Assigned] (AIRFLOW-2606) Test needed to ensure database schema always match SQLAlchemy model types

2018-06-17 Thread Stefan Seelmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Seelmann reassigned AIRFLOW-2606:


Assignee: Stefan Seelmann

> Test needed to ensure database schema always match SQLAlchemy model types
> -
>
> Key: AIRFLOW-2606
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2606
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Joy Gao
>Assignee: Stefan Seelmann
>Priority: Major
>
> An issue was discovered by [this 
> PR|https://github.com/apache/incubator-airflow/pull/3492#issuecomment-396815203]
>  where the database schema does not match its corresponding SQLAlchemy model 
> declaration. We should add a generic unit test for this to prevent similar 
> bugs from occurring in the future. (Alternatively, we can add the policing 
> logic to the `airflow upgradedb` command so each migration can do the check.)





[jira] [Commented] (AIRFLOW-2602) Show failed attempts in Gantt view

2018-06-12 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510307#comment-16510307
 ] 

Stefan Seelmann commented on AIRFLOW-2602:
--

Screenshot of Gantt view with failed attempts:

!Screenshot_2018-06-13_00-13-21.png!

> Show failed attempts in Gantt view
> --
>
> Key: AIRFLOW-2602
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2602
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webapp
>Affects Versions: 1.9.0
>Reporter: Stefan Seelmann
>Assignee: Stefan Seelmann
>Priority: Major
> Fix For: Airflow 2.0
>
> Attachments: Screenshot_2018-06-13_00-13-21.png
>
>
> The Gantt view only shows the last attempt (successful or failed). It would 
> be nice to also visualize failed attempts.
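Here is a minimal sketch of how the bars could be assembled. It assumes hypothetical dict rows: one set for the latest task instance attempt, another for earlier failures recorded in task_fail. It also assumes that a failed final attempt may already appear in task_fail, and drops that duplicate by `(task_id, start_date)`.

```python
from datetime import datetime


def gantt_bars(task_instances, task_fails):
    """Merge earlier failed attempts into the per-task bars of the Gantt view.

    task_instances: latest attempt per task (task_id, start_date, end_date, state).
    task_fails: failed attempts (task_id, start_date, end_date).
    """
    bars = []
    seen = set()
    for f in task_fails:
        seen.add((f["task_id"], f["start_date"]))
        bars.append({"task_id": f["task_id"], "start": f["start_date"],
                     "end": f["end_date"], "state": "failed"})
    for ti in task_instances:
        # A failed latest attempt may already be in task_fail; skip the duplicate bar.
        if (ti["task_id"], ti["start_date"]) in seen:
            continue
        bars.append({"task_id": ti["task_id"], "start": ti["start_date"],
                     "end": ti["end_date"], "state": ti["state"]})
    # Group bars per task, earliest attempt first.
    return sorted(bars, key=lambda b: (b["task_id"], b["start"]))


tis = [{"task_id": "a", "start_date": datetime(2018, 6, 12, 22, 5),
        "end_date": datetime(2018, 6, 12, 22, 6), "state": "success"}]
fails = [{"task_id": "a", "start_date": datetime(2018, 6, 12, 22, 1),
          "end_date": datetime(2018, 6, 12, 22, 2)}]
for bar in gantt_bars(tis, fails):
    print(bar["task_id"], bar["state"], bar["start"])
```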





[jira] [Updated] (AIRFLOW-2602) Show failed attempts in Gantt view

2018-06-12 Thread Stefan Seelmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Seelmann updated AIRFLOW-2602:
-
Description: The Gantt view only shows the last attempt (successful or 
failed). It would be nice to also visualize failed attempts.  (was: The Gantt 
view only shows the last attempt (successful or failed). It would be nice to 
also visulize failed attempts.)

> Show failed attempts in Gantt view
> --
>
> Key: AIRFLOW-2602
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2602
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webapp
>Affects Versions: 1.9.0
>Reporter: Stefan Seelmann
>Assignee: Stefan Seelmann
>Priority: Major
> Fix For: Airflow 2.0
>
> Attachments: Screenshot_2018-06-13_00-13-21.png
>
>
> The Gantt view only shows the last attempt (successful or failed). It would 
> be nice to also visualize failed attempts.





[jira] [Updated] (AIRFLOW-2602) Show failed attempts in Gantt view

2018-06-12 Thread Stefan Seelmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Seelmann updated AIRFLOW-2602:
-
Attachment: Screenshot_2018-06-13_00-13-21.png

> Show failed attempts in Gantt view
> --
>
> Key: AIRFLOW-2602
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2602
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webapp
>Affects Versions: 1.9.0
>Reporter: Stefan Seelmann
>Assignee: Stefan Seelmann
>Priority: Major
> Fix For: Airflow 2.0
>
> Attachments: Screenshot_2018-06-13_00-13-21.png
>
>
> The Gantt view only shows the last attempt (successful or failed). It would 
> be nice to also visualize failed attempts.





[jira] [Created] (AIRFLOW-2602) Show failed attempts in Gantt view

2018-06-12 Thread Stefan Seelmann (JIRA)
Stefan Seelmann created AIRFLOW-2602:


 Summary: Show failed attempts in Gantt view
 Key: AIRFLOW-2602
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2602
 Project: Apache Airflow
  Issue Type: Improvement
  Components: webapp
Affects Versions: 1.9.0
Reporter: Stefan Seelmann
Assignee: Stefan Seelmann
 Fix For: Airflow 2.0


The Gantt view only shows the last attempt (successful or failed). It would be 
nice to also visualize failed attempts.





[jira] [Commented] (AIRFLOW-1863) should the gantt chart provide a drop down of the dag run like the grpah view?

2018-06-01 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498570#comment-16498570
 ] 

Stefan Seelmann commented on AIRFLOW-1863:
--

PR: [https://github.com/apache/incubator-airflow/pull/3450]

 

> should the gantt chart provide a drop down of the dag run like the grpah view?
> --
>
> Key: AIRFLOW-1863
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1863
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 1.8.2
>Reporter: Yee Ting Li
>Priority: Major
> Attachments: Screenshot_2018-05-31_16-19-43.png
>
>
> The Gantt chart is a great way of displaying the efficiency of a DAG run. 
> However, the current display of the run 'time' is somewhat arbitrary and does 
> not provide a meaningful way of determining how a specific DAG run performed 
> (e.g. with manually triggered DAGs).
> I think it would be useful to have a drop-down of the DAG runs on the Gantt 
> View, as the Graph View currently has.





[jira] [Commented] (AIRFLOW-1863) should the gantt chart provide a drop down of the dag run like the grpah view?

2018-05-31 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496626#comment-16496626
 ] 

Stefan Seelmann commented on AIRFLOW-1863:
--

Implemented here: 
[https://github.com/seelmann/incubator-airflow/tree/AIRFLOW-1863-gantt-view], 
waiting for AIRFLOW-2529 to be merged.

Screenshot:

!Screenshot_2018-05-31_16-19-43.png!

> should the gantt chart provide a drop down of the dag run like the grpah view?
> --
>
> Key: AIRFLOW-1863
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1863
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 1.8.2
>Reporter: Yee Ting Li
>Priority: Major
> Attachments: Screenshot_2018-05-31_16-19-43.png
>
>
> The Gantt chart is a great way of displaying the efficiency of a DAG run. 
> However, the current display of the run 'time' is somewhat arbitrary and does 
> not provide a meaningful way of determining how a specific DAG run performed 
> (e.g. with manually triggered DAGs).
> I think it would be useful to have a drop-down of the DAG runs on the Gantt 
> View, as the Graph View currently has.





[jira] [Updated] (AIRFLOW-1863) should the gantt chart provide a drop down of the dag run like the grpah view?

2018-05-31 Thread Stefan Seelmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Seelmann updated AIRFLOW-1863:
-
Attachment: Screenshot_2018-05-31_16-19-43.png

> should the gantt chart provide a drop down of the dag run like the grpah view?
> --
>
> Key: AIRFLOW-1863
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1863
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 1.8.2
>Reporter: Yee Ting Li
>Priority: Major
> Attachments: Screenshot_2018-05-31_16-19-43.png
>
>
> The Gantt chart is a great way of displaying the efficiency of a DAG run. 
> However, the current display of the run 'time' is somewhat arbitrary and does 
> not provide a meaningful way of determining how a specific DAG run performed 
> (e.g. with manually triggered DAGs).
> I think it would be useful to have a drop-down of the DAG runs on the Gantt 
> View, as the Graph View currently has.





[jira] [Updated] (AIRFLOW-2529) Improve graph view performance and usability

2018-05-31 Thread Stefan Seelmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Seelmann updated AIRFLOW-2529:
-
Summary: Improve graph view performance and usability  (was: Graph View DAG 
Run dropdown)

> Improve graph view performance and usability
> 
>
> Key: AIRFLOW-2529
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2529
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webapp
>Affects Versions: 1.9.0
>Reporter: Stefan Seelmann
>Assignee: Stefan Seelmann
>Priority: Major
> Attachments: Screenshot_2018-05-28_21-32-38.png
>
>
> The "Graph View" has a dropdown which contains all DAG run IDs. If there are 
> many (thousands of) DAG runs, the page becomes barely usable. It takes multiple 
> seconds to load the page because all DAG runs must be fetched from the DB, 
> processed, and rendered as a long option list in the browser. Such a long list 
> is also not very useful because it is hard to find a particular DAG run in it.
> A simple fix to address the load time would be to limit the number of DAG runs 
> shown, e.g. only the latest N, where N could be the "page_size" setting from 
> airflow.cfg that other views already use. If the DAG run that should be shown 
> (selected via the execution_date or run_id query parameter) is not among the 
> latest N, it can still be added with a second SQL query.
> A more complex change to improve usability would require a different way to 
> select a DAG run, for example a popup to search for DAG runs with pagination. 
> But such functionality already exists in the /dagrun UI.
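A minimal sketch of the simple fix (latest N runs, plus a second query for the selected run), assuming a simplified `dag_run` table with only `dag_id`, `run_id` and `execution_date` columns in an in-memory SQLite database via Python's `sqlite3` module. The function name `dag_runs_for_dropdown` is hypothetical; the Airflow webserver talks to its metadata DB through SQLAlchemy, so this only illustrates the two-query idea, not the code in the pull request.

```python
import sqlite3
from typing import List, Optional


def dag_runs_for_dropdown(
    conn: sqlite3.Connection,
    dag_id: str,
    limit: int,
    selected_run_id: Optional[str] = None,
) -> List[str]:
    """Return the run_ids to list in the dropdown (hypothetical helper).

    Only the latest `limit` runs are fetched; if the requested run is
    older than that, it is appended via a second query.
    """
    # 1st query: fetch only the N most recent runs instead of all of them.
    rows = conn.execute(
        "SELECT run_id FROM dag_run WHERE dag_id = ? "
        "ORDER BY execution_date DESC LIMIT ?",
        (dag_id, limit),
    ).fetchall()
    run_ids = [row[0] for row in rows]

    # 2nd query: keep the run currently being viewed selectable.
    if selected_run_id is not None and selected_run_id not in run_ids:
        row = conn.execute(
            "SELECT run_id FROM dag_run WHERE dag_id = ? AND run_id = ?",
            (dag_id, selected_run_id),
        ).fetchone()
        if row is not None:
            run_ids.append(row[0])
    return run_ids


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in for the metadata table (assumed schema).
    conn.execute("CREATE TABLE dag_run (dag_id TEXT, run_id TEXT, execution_date TEXT)")
    for day in range(1, 6):
        conn.execute(
            "INSERT INTO dag_run VALUES (?, ?, ?)",
            ("example_dag", f"scheduled__2018-05-0{day}", f"2018-05-0{day}T00:00:00"),
        )
    print(dag_runs_for_dropdown(conn, "example_dag", 2, "scheduled__2018-05-01"))
    # → ['scheduled__2018-05-05', 'scheduled__2018-05-04', 'scheduled__2018-05-01']
```

The second query only runs when the selected run falls outside the window, so the common case costs a single bounded query.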





[jira] [Commented] (AIRFLOW-2529) Graph View DAG Run dropdown

2018-05-30 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16495761#comment-16495761
 ] 

Stefan Seelmann commented on AIRFLOW-2529:
--

Pull request: https://github.com/apache/incubator-airflow/pull/3441



[jira] [Work started] (AIRFLOW-2529) Graph View DAG Run dropdown

2018-05-30 Thread Stefan Seelmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-2529 started by Stefan Seelmann.



[jira] [Commented] (AIRFLOW-2529) Graph View DAG Run dropdown

2018-05-28 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16492898#comment-16492898
 ] 

Stefan Seelmann commented on AIRFLOW-2529:
--

Thanks [~TaoFeng] for the pointer, really helpful.

I think it makes sense to add the "Base date" and "Number of runs" widgets, 
which the other views already use, to the graph view as well. This limits the 
number of dag runs in the dropdown while still making it possible to browse 
through all of them.
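A minimal in-memory sketch of that windowing, assuming the widgets keep the semantics they have elsewhere: show the runs at or before the chosen base date, newest first, at most "Number of runs" of them. `DagRunRow` and `runs_in_window` are hypothetical names; a real implementation would push the filter, ordering and limit into the SQL query.

```python
from datetime import datetime
from typing import List, NamedTuple


class DagRunRow(NamedTuple):
    # Simplified stand-in for a DAG run record (hypothetical type).
    run_id: str
    execution_date: datetime


def runs_in_window(runs: List[DagRunRow], base_date: datetime, num_runs: int) -> List[DagRunRow]:
    """Runs at or before base_date, newest first, at most num_runs of them."""
    eligible = [r for r in runs if r.execution_date <= base_date]
    eligible.sort(key=lambda r: r.execution_date, reverse=True)
    return eligible[:num_runs]


if __name__ == "__main__":
    runs = [DagRunRow(f"run_{d}", datetime(2018, 5, d)) for d in range(1, 11)]
    # First window: the 3 newest runs.
    print([r.run_id for r in runs_in_window(runs, datetime(2018, 5, 10), 3)])
    # → ['run_10', 'run_9', 'run_8']
    # Moving the base date back browses older runs.
    print([r.run_id for r in runs_in_window(runs, datetime(2018, 5, 7), 3)])
    # → ['run_7', 'run_6', 'run_5']
```

The dropdown then never holds more than "Number of runs" entries, and any older run stays reachable by choosing an earlier base date.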

I created an initial draft: 
[https://github.com/seelmann/incubator-airflow/tree/AIRFLOW-2529-graph-view-dag-runs]

I still have to add the same functionality to the RBAC interface and check 
whether I can write some tests. But feedback is already welcome.

I also attached a screenshot:

!Screenshot_2018-05-28_21-32-38.png!



[jira] [Updated] (AIRFLOW-2529) Graph View DAG Run dropdown

2018-05-28 Thread Stefan Seelmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Seelmann updated AIRFLOW-2529:
-
Attachment: Screenshot_2018-05-28_21-32-38.png



[jira] [Created] (AIRFLOW-2529) Graph View DAG Run dropdown

2018-05-27 Thread Stefan Seelmann (JIRA)
Stefan Seelmann created AIRFLOW-2529:


 Summary: Graph View DAG Run dropdown
 Key: AIRFLOW-2529
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2529
 Project: Apache Airflow
  Issue Type: Improvement
  Components: webapp
Affects Versions: 1.9.0
Reporter: Stefan Seelmann


The "Graph View" has a dropdown which contains all DAG run IDs. If there are 
many (thousands of) DAG runs, the page becomes barely usable. It takes multiple 
seconds to load the page because all DAG runs must be fetched from the DB, 
processed, and rendered as a long option list in the browser. Such a long list 
is also not very useful because it is hard to find a particular DAG run in it.

A simple fix to address the load time would be to limit the number of DAG runs 
shown, e.g. only the latest N, where N could be the "page_size" setting from 
airflow.cfg that other views already use. If the DAG run that should be shown 
(selected via the execution_date or run_id query parameter) is not among the 
latest N, it can still be added with a second SQL query.
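A sketch of obtaining N from airflow.cfg with Python's standard `configparser`, assuming the option is `page_size` in the `[webserver]` section. Airflow itself reads its configuration through its own `airflow.configuration` module; the helper name `dropdown_limit` and the fallback of 100 are assumptions for illustration only.

```python
import configparser

# Assumed fallback when the option is absent (hypothetical value).
DEFAULT_PAGE_SIZE = 100


def dropdown_limit(cfg_text: str) -> int:
    """Read [webserver] page_size from airflow.cfg-style text (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # fallback is used if either the section or the option is missing.
    return parser.getint("webserver", "page_size", fallback=DEFAULT_PAGE_SIZE)


if __name__ == "__main__":
    print(dropdown_limit("[webserver]\npage_size = 25\n"))
    # → 25
```

Reusing the same setting keeps the dropdown consistent with the page size of the other list views instead of introducing a new option.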

A more complex change to improve usability would require a different way to 
select a DAG run, for example a popup to search for DAG runs with pagination. 
But such functionality already exists in the /dagrun UI.


