Re: [VOTE] Release Airflow 1.8.2 based on Airflow 1.8.2 RC2

2017-06-30 Thread Bolke de Bruin
Max,

I think you can close the vote?

Bolke

> On 27 Jun 2017, at 02:45, Kengo Seki <sek...@apache.org> wrote:
> 
> +1 (non-binding)
> 
> - verified signatures and checksums
> - ran scheduler and webserver, confirmed they worked fine
> - confirmed the latest fix on v1.8 branch (AIRFLOW-809) is included
> 
> Kengo Seki <sek...@apache.org>
> 
> 
> 2017-06-27 8:53 GMT+09:00 Chris Riccomini <criccom...@apache.org>:
>> +1 (binding)
>> 
>> Been running in our dev env, and everything looks good.
>> 
>> On Mon, Jun 26, 2017 at 3:00 PM, Alex Guziel <alex.guz...@airbnb.com.invalid
>>> wrote:
>> 
>>> Yeah that makes sense. It pages by default at 500 so it explains why we saw
>>> it.
>>> 
>>> On Mon, Jun 26, 2017 at 2:53 PM, Chris Riccomini <criccom...@apache.org>
>>> wrote:
>>> 
>>>> In 1.8.1, the "DAGs" page has "Show  entries". In 1.8.2, it has
>>>> "Show <25> entries". So it looks like prior to 1.8.2, the pagination was
>>>> broken in the sense that it defaulted to the whole list. We have 479 DAGs
>>>> in one env, and it shows them all. It looks like someone fixed the entry
>>> to
>>>> default to 25 now, which exposed the problem for our environments.
>>>> 
>>>> On Mon, Jun 26, 2017 at 2:47 PM, Alex Guziel <alex.guz...@airbnb.com.
>>>> invalid
>>>>> wrote:
>>>> 
>>>>> We're running 1.8.0 + some extras, and none of us added pagination
>>>>> recently, and our homepage is paginated. Are you sure it's not the
>>> number
>>>>> of dags crossing the threshold? Maybe it's some Flask version thing?
>>>>> 
>>>>> On Mon, Jun 26, 2017 at 2:45 PM, Chris Riccomini <
>>> criccom...@apache.org>
>>>>> wrote:
>>>>> 
>>>>>> Yes, I did the 1.8.1 release.
>>>>>> 
>>>>>> On Mon, Jun 26, 2017 at 2:44 PM, Alex Guziel <alex.guz...@airbnb.com
>>> .
>>>>>> invalid
>>>>>>> wrote:
>>>>>> 
>>>>>>> There's no pagination in 1.8.1? Are you sure?
>>>>>>> 
>>>>>>> On Mon, Jun 26, 2017 at 2:37 PM, Chris Riccomini <
>>>>> criccom...@apache.org>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> It's not happening on 1.8.1 (since there's no pagination in that
>>>>>>> version),
>>>>>>>> so I'd count this as a regression. I wouldn't say it's blocking,
>>>> but
>>>>>> it's
>>>>>>>> pretty ugly.
>>>>>>>> 
>>>>>>>> On Mon, Jun 26, 2017 at 2:34 PM, Alex Guziel <
>>>> alex.guz...@airbnb.com
>>>>> .
>>>>>>>> invalid
>>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> I'm not so sure this is a new issue. I think we've seen it on
>>> our
>>>>>>>>> production for quite a while.
>>>>>>>>> 
>>>>>>>>> On Mon, Jun 26, 2017 at 2:31 PM, Chris Riccomini <
>>>>>>> criccom...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> I am seeing a strange UI behavior on 1.8.2.RC2. I've opened a
>>>>> JIRA
>>>>>>>> here:
>>>>>>>>>> 
>>>>>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1348
>>>>>>>>>> 
>>>>>>>>>> Has anyone else seen this?
>>>>>>>>>> 
>>>>>>>>>> On Mon, Jun 26, 2017 at 3:27 AM, Sumit Maheshwari <
>>>>>>>>> sumeet.ma...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> +1, binding.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, Jun 26, 2017 at 3:49 PM, Bolke de Bruin <
>>>>>> bdbr...@gmail.com
>>>>>>>> 
>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> We have been running it for the last couple of days. No
>>

Re: Airflow profiling

2017-06-27 Thread Bolke de Bruin
There is also a free version; it might help with more integration testing and benchmarking.

https://stackimpact.com/pricing/

B.

> On 27 Jun 2017, at 22:00, Chris Riccomini <criccom...@apache.org> wrote:
> 
> Seems you have to pay?
> 
> On Tue, Jun 27, 2017 at 12:56 PM, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
>> Just saw this tool on hacker news:
>> 
>> https://github.com/stackimpact/stackimpact-python
>> 
>> Might be interesting for some profiling.
>> 
>> Bolke



Airflow profiling

2017-06-27 Thread Bolke de Bruin
Just saw this tool on hacker news:

https://github.com/stackimpact/stackimpact-python 


Might be interesting for some profiling.

Bolke

Re: [VOTE] Release Airflow 1.8.2 based on Airflow 1.8.2 RC2

2017-06-26 Thread Bolke de Bruin
We have been running it for the last couple of days. No issues and seems more 
responsive.

+1, binding

Bolke

> On 25 Jun 2017, at 01:10, Maxime Beauchemin  
> wrote:
> 
> Dear all,
> 
> 1.8.2 RC2 is baked and available at:
> https://dist.apache.org/repos/dist/dev/incubator/airflow, public keys
> are available
> at https://dist.apache.org/repos/dist/release/incubator/airflow.
> 
> Note that RC1 was the first RC (skipped RC0) and was never announced since
> it had issues coming out of the oven, so RC2 is the first public RC.
> 
> 1.8.2 RC2 is built on top of 1.8.1 with these listed "cherries" on top. I
> added the JIRAs that were identified blockers and targeted 1.8.2. I
> attempted to bring in all of the JIRAs that targeted 1.8.2 but bailed on
> the ones that were generating merge conflicts. I also added all of the
> JIRAs that we've been running in production at Airbnb.
> 
> Issues fixed:
> 9a53e66 [AIRFLOW-809][AIRFLOW-1] Use __eq__ ColumnOperator When Testing
> Booleans
> 333e0b3 [AIRFLOW-1296] Propagate SKIPPED to all downstream tasks
> 93825d5 [AIRFLOW-XXX] Re-enable caching for hadoop components
> 33a9dcb [AIRFLOW-XXX] Pin Hive and Hadoop to a specific version and create
> writable warehouse dir
> 7cff6cd [AIRFLOW-1308] Disable nanny usage for Dask
> 570b2ed [AIRFLOW-1294] Backfills can loose tasks to execute
> 3f48d48 [AIRFLOW-1291] Update NOTICE and LICENSE files to match ASF
> requirements
> 69bd269 [AIRFLOW-1160] Update Spark parameters for Mesos
> 9692510 [AIRFLOW 1149][AIRFLOW-1149] Allow for custom filters in Jinja2
> templates
> 6de5330 [AIRFLOW-1119] Fix unload query so headers are on first row[]
> b4e9eb8 [AIRFLOW-1089] Add Spark application arguments
> a4083f3 [AIRFLOW-1078] Fix latest_runs endpoint for old flask versions
> 7a02841 [AIRFLOW-1074] Don't count queued tasks for concurrency limits
> a2c18a5 [AIRFLOW-1064] Change default sort to job_id for
> TaskInstanceModelView
> d1c64ab [AIRFLOW-1038] Specify celery serialization options explicitly
> b4ee88a [AIRFLOW-1036] Randomize exponential backoff
> 9fca409 [AIRFLOW-993] Update date inference logic
> 272c2f5 [AIRFLOW-1167] Support microseconds in FTPHook modification time
> c7c0b72 [AIRFLOW-1179] Fix Pandas 0.2x breaking Google BigQuery change
> acd0166 [AIRFLOW-1263] Dynamic height for charts
> 7f33f6e [AIRFLOW-1266] Increase width of gantt y axis
> fc33c04 [AIRFLOW-1290] set docs author to 'Apache Airflow'
> 2e9eee3 [AIRFLOW-1282] Fix known event column sorting
> 2389a8a [AIRFLOW-1166] Speed up _change_state_for_tis_without_dagrun
> bf966e6 [AIRFLOW-1192] Some enhancements to qubole_operator
> 57d5bcd [AIRFLOW-1281] Sort variables by key field by default
> 802fc15 [AIRFLOW-1244] Forbid creation of a pool with empty name
> 1232b6a [AIRFLOW-1243] DAGs table has no default entries to show
> b0ba3c9 [AIRFLOW-1227] Remove empty column on the Logs view
> c406652 [AIRFLOW-1226] Remove empty column on the Jobs view
> 51a83cc [AIRFLOW-1199] Fix create modal
> cac7d4c [AIRFLOW-1200] Forbid creation of a variable with an empty key
> 5f3ee52 [AIRFLOW-1186] Sort dag.get_task_instances by execution_date
> f446c08 [AIRFLOW-1145] Fix closest_date_partition function with before set
> to True If we're looking for the closest date before, we should take the
> latest date in the list of date before.
> 93b8e96 [AIRFLOW-1180] Fix flask-wtf version for test_csrf_rejection
> bb56805 [AIRFLOW-1170] DbApiHook insert_rows inserts parameters separately
> 093b2f0 [AIRFLOW-1150] Fix scripts execution in sparksql hook[]
> 777f181 [AIRFLOW-1168] Add closing() to all connections and cursors
> 
> Max



Re: Airflow 1.8.1 scheduler issue

2017-06-23 Thread Bolke de Bruin
This will be fixed in 1.8.2 which will be out shortly (rc2 has the fix).

Bolke

> On 22 Jun 2017, at 20:50, Drew Zoellner  
> wrote:
> 
> Hi airflow dev team,
> 
> We have a subdag which looks like the following...
> 
> 
> 
> This subdag has a concurrency limit of 8. As you can see 8 tasks after our 
> 'select_locations'  task succeed and so do their downstream tasks. The rest 
> of the tasks seem to get forgotten about by the scheduler. We've re-ran this 
> dag a few times, cleaned out the database, ran it again but we still run into 
> the problem.
> 
> After some digging, we found that the scheduler seems to be adding too many 
> tasks to our redis queue. The worker then picks up these tasks from the redis 
> queue, runs 8 of them fine and fails the rest. 
> 
> Here is the error message from the failed ones: Dependencies not met for 
> <TaskInstance: performance_kit_180_v1.calculate_stats.transform_tripadvisor_reviews 
> 2017-06-21 00:00:00 [queued]>, dependency 'Task Instance Slots Available' 
> FAILED: The maximum number of running tasks (8) for this task's DAG 
> 'performance_kit_180_v1.calculate_stats' has been reached.
> 
> Then it prints this message: 
> 
> 
> 
> FIXME: Rescheduling due to concurrency limits reached at task runtime. 
> Attempt 1 of 2. State set to NONE.
> 
> 
> 
> 
> Queuing into pool None
> 
> We noticed in the airflow code that this error message wasn't expected to 
> happen and that it might point to a problem in the scheduler. Here is the 
> comment which led us to believe that : 
> 
> # FIXME: we might have hit concurrency limits, which means we probably
> # have been running prematurely. This should be handled in the
> # scheduling mechanism.
> 
> 
> 
> We have other dags which are limited by the concurrency limit just fine and 
> none of the tasks of those dags print the above error on our worker node.
> 
> Thanks for any insight into this problem!
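For what it's worth, the slot check that produces that error can be modelled in plain Python. The sketch below is illustrative only (task names and counts mirror the report above; this is not Airflow's actual scheduler code):

```python
DAG_CONCURRENCY = 8  # the subdag's concurrency limit from the report above

def dispatch(queued_tasks, concurrency=DAG_CONCURRENCY):
    """Admit tasks up to the DAG's concurrency limit; reject the overflow."""
    running, rejected = [], []
    for task in queued_tasks:
        if len(running) < concurrency:
            running.append(task)   # "Task Instance Slots Available" passes
        else:
            rejected.append(task)  # dependency FAILED: max running tasks reached
    return running, rejected

# The scheduler over-enqueues 12 tasks at once for a DAG with 8 slots:
queue = ["task_%d" % i for i in range(12)]
running, rejected = dispatch(queue)
print(len(running), len(rejected))  # 8 run fine, 4 fail on the worker
```

This matches the symptom described: the first 8 tasks succeed and the rest fail the slots-available dependency check, which is why the FIXME comment points at the scheduler enqueueing prematurely.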



Re: Release Manager for 1.8.2?

2017-06-23 Thread Bolke de Bruin
Hi Max,

Looking on a phone, so I might not have spotted everything.

- Tag looks good
- Your public GPG key has not been added yet to 
https://dist.apache.org/repos/dist/release/incubator/airflow/KEYS
- Please remove the old tarballs from 
https://dist.apache.org/repos/dist/dev/incubator/airflow/ as they cause 
confusion for some reason at the IPMC.

@chris please take a look as well.

Bolke

> On 23 Jun 2017, at 00:34, Maxime Beauchemin <maximebeauche...@gmail.com> 
> wrote:
> 
> Alright, so I cherry-picked that last one, some of the other last
> minute ones had conflicts, and I bailed on all conflictual non-blocking
> issues.
> 
> So here is the new tag, it's on the v1-8-test branch:
> https://github.com/apache/incubator-airflow/releases/tag/1.8.2rc2
> 
> The tarball, hashes and signature are here:
> https://dist.apache.org/repos/dist/dev/incubator/airflow/
> 
> Changelog:
> https://github.com/apache/incubator-airflow/blob/1.8.2rc2/CHANGELOG.txt
> 
> Can someone confirm this looks alright before I send an announcement about
> 1.8.2rc2 to the list?
> 
> Max
> 
> On Wed, Jun 21, 2017 at 10:44 PM, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
>> Please also include https://github.com/apache/incubator-airflow/pull/2022
>> 
>> I have had reports it was also required for MySQL.
>> 
>> Bolke
>> 
>> Sent from my iPhone
>> 
>>> On 22 Jun 2017, at 00:03, Maxime Beauchemin <maximebeauche...@gmail.com>
>> wrote:
>>> 
>>> Ok I'll try to cherry-pick all of these, though if there are any merge
>>> conflicts I'll pass and move their Fix-Version to 1.9.0.
>>> 
>>> Max
>>> 
>>>> On Wed, Jun 21, 2017 at 1:20 AM, Bolke de Bruin <bdbr...@gmail.com>
>> wrote:
>>>> 
>>>> Hi max,
>>>> 
>>>> PR 2365 is merged into master and also into v1-8-test.
>>>> 
>>>> I have added:
>>>> 
>>>> https://issues.apache.org/jira/browse/AIRFLOW-935 <
>>>> https://issues.apache.org/jira/browse/AIRFLOW-935> and
>>>> https://issues.apache.org/jira/browse/AIRFLOW-860 <
>>>> https://issues.apache.org/jira/browse/AIRFLOW-860>
>>>> 
>>>> To the list for 1.8.2. The PR for this has been in master for some time:
>>>> https://github.com/apache/incubator-airflow/pull/2120 <
>>>> https://github.com/apache/incubator-airflow/pull/2120>
>>>> 
>>>> Can you consider it for 1.8.2? Basically I promised to push the matter
>>>> :-).
>>>> 
>>>> No blockers anymore as far as I know for 1.8.2.
>>>> 
>>>> Bolke
>>>> 
>>>> P.S. A friendly request if any of the committers do +1 or a LGTM to a
>> PR,
>>>> please also merge the PR instead of leaving it to others. “airflow-pr
>> merge
>>>> XXX” is there to help.
>>>> 
>>>>> On 16 Jun 2017, at 15:11, Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>>> 
>>>>> It is now pinned. I think Cloudera messed up their release for some
>>>> reason.
>>>>> 
>>>>> Builds are succeeding again.
>>>>> 
>>>>> Please note I have one blocker out for 1.8.2:
>> https://github.com/apache/
>>>> incubator-airflow/pull/2365 <https://github.com/apache/
>>>> incubator-airflow/pull/2365>
>>>>> 
>>>>> Bolke
>>>>> 
>>>>>> On 15 Jun 2017, at 16:43, Maxime Beauchemin <
>> maximebeauche...@gmail.com
>>>> <mailto:maximebeauche...@gmail.com>> wrote:
>>>>>> 
>>>>>> Awesome, thanks for looking into it. Is there a way we can pin the
>>>> Cloudera
>>>>>> tarball to avoid surprises like this?
>>>>>> 
>>>>>> On Thu, Jun 15, 2017 at 7:17 AM, Bolke de Bruin <bdbr...@gmail.com
>>>> <mailto:bdbr...@gmail.com>> wrote:
>>>>>> 
>>>>>>> Found the issue for the permission denied part: beeline is not
>>>> executable
>>>>>>> in the latest tarball from Cloudera. I’ll have a workaround for this
>>>> in a
>>>>>>> few minutes and will push it to Apache right away.
>>>>>>> 
>>>>>>> Bolke.
>>>>>>> 

Re: Role Based Access Control for Airflow UI

2017-06-22 Thread Bolke de Bruin
One downside I see with FAB is that it does not map business roles to FAB 
roles. I would prefer to create groups in IPA/LDAP/AD and have those map to FAB 
roles instead of needing to manage that in FAB.
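The kind of directory-group-to-role mapping meant here can be sketched in a few lines; the group DNs and role names below are made up for illustration, and FAB would still enforce the permissions attached to each role:

```python
# Hypothetical LDAP/AD group -> application-role mapping, resolved at login
# time, so role membership is managed in the directory rather than in FAB.
GROUP_ROLE_MAP = {
    "cn=airflow-admins,ou=groups,dc=example,dc=com": "Admin",
    "cn=data-eng,ou=groups,dc=example,dc=com": "Op",
    "cn=analysts,ou=groups,dc=example,dc=com": "Viewer",
}

def roles_for(ldap_groups):
    """Map a user's directory groups to app roles, ignoring unknown groups."""
    return sorted({GROUP_ROLE_MAP[g] for g in ldap_groups if g in GROUP_ROLE_MAP})

print(roles_for([
    "cn=data-eng,ou=groups,dc=example,dc=com",
    "cn=some-unrelated-group,ou=groups,dc=example,dc=com",
]))  # ['Op']
```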

B.

> On 22 Jun 2017, at 09:36, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
> Hi Guys,
> 
> Thanks for putting the thinking in! It is about time that we get this moving.
> 
> The design looks pretty sound. One can argue about the different roles that 
> are required, but that will be situation dependent I guess.
> 
> Implementation wise I would argue together with Max that FAB is a better or 
> best fit. The ER model that is being described is pretty much a copy of a 
> normal security model. So a reimplementation of that is 1) significant 
> duplication of effort and 2) bound to have bugs that have been solved in the 
> other framework. Moreover, FAB does have integration out of the box with some 
> enterprisey systems like IPA, ActiveDirectory, and LDAP. 
> 
> So while you argue that using FAB would increase the scope of the proposal 
> significantly, I think that is not true. Using FAB would allow you to 
> focus on what kind of out-of-the-box permission sets and roles we would need 
> and maybe address some issues that FAB lacks (maybe how to deal with non web 
> access - ie. in DAGs, maybe Kerberos, probably how to deal with API calls 
> that are not CRUD). Implementation wise it probably simplifies what we need 
> to do. Maybe - using Max’s early POC as an example - we can slowly move over?
> 
> On a side note: I'm planning to hire 2-3 people to work on Airflow in the coming year. 
> Improvement of Security, Enterprise Integration, Revamp UI are on the todo 
> list. However, this is not confirmed yet as business priorities might change.
> 
> Bolke.
> 
> 
>> On 15 Jun 2017, at 21:45, kalpesh dharwadkar <kalpeshdharwad...@gmail.com> 
>> wrote:
>> 
>> @Dan:
>> 
>> Thanks for your feedback. I will remove the REFRESH_DAG permission.
>> 
>> @Max:
>> 
>> Thanks for your response.
>> 
>> The scope of my proposal was just to add RBAC security feature to Airflow
>> without replacing any existing frameworks.
>> 
>> I understand that adopting FAB would serve Airflow better moving forward,
>> however porting Airflow to using FAB significantly increases the scope of
>> the proposal and I don't have the time and expertise to carry out the tasks
>> in the extended scope.
>> 
>> Hence, I'm curious to know if there's a plan for Airflow to migrate to FAB
>> this year?
>> 
>> - Kalpesh
>> 
>> On Mon, Jun 12, 2017 at 6:16 PM, Maxime Beauchemin <
>> maximebeauche...@gmail.com> wrote:
>> 
>>> It would be nice to go with a framework for this. I did some
>>> experimentation using FlaskAppBuilder to go in this direction. It provides
>>> auth on different authentication backends out of the box (oauth, openid,
>>> ldap, registration, ...), generates perms for each view that has an
>>> @has_access decorator, generates a set of perms for each ORM model (show,
>>> edit, delete, add, ...) and enforces it in the CRUD views as well as in the
>>> generated REST api that you get for free as a byproduct of deriving FAB's
>>> models (essentially it's SqlAlchemy with a layer on top).
>>> 
>>> I started a POC on FAB here a while ago:
>>> https://github.com/mistercrunch/airflow_webserver; at the time my main
>>> motivation was the free/instantaneous REST api.
>>> 
>>> I think FAB is a decent fit as the porting should be fairly straightforward
>>> (moving the flask views over and deprecating Flask-Admin in favor of FAB's
>>> crud) though there was a few blockers. From memory I think FAB didn't like
>>> the compound PKs we use in some of the Airflow models. We'd have to either
>>> write a db migration script on the Airflow side, or add support for
>>> compound keys to FAB (I recently became a maintainer of the project, so I
>>> could help with that)
>>> 
>>> The only downside of FAB is that it's not as mature as something like
>>> Django, but porting to Django would surely be much more work.
>>> 
>>> Then there's the flask-security suite, but that looks like a bit of a
>>> patchwork to me, I guess we can pick and choose which we want to use.
>>> 
>>> Max
>>> 
>>> On Mon, Jun 12, 2017 at 12:50 PM, Dan Davydov <
>>> dan.davy...@airbnb.com.invalid> wrote:
>>> 
>>>> Looks good to me in general, thanks for putting this together!
>>>> 
>>>> I think 

Re: Role Based Access Control for Airflow UI

2017-06-22 Thread Bolke de Bruin
Hi Guys,

Thanks for putting the thinking in! It is about time that we get this moving.

The design looks pretty sound. One can argue about the different roles that are 
required, but that will be situation dependent I guess.

Implementation wise I would argue together with Max that FAB is a better or 
best fit. The ER model that is being described is pretty much a copy of a 
normal security model. So a reimplementation of that is 1) significant 
duplication of effort and 2) bound to have bugs that have been solved in the 
other framework. Moreover, FAB does have integration out of the box with some 
enterprisey systems like IPA, ActiveDirectory, and LDAP. 

So while you argue that using FAB would increase the scope of the proposal 
significantly, I think that is not true. Using FAB would allow you to focus 
on what kind of out-of-the-box permission sets and roles we would need and 
maybe address some issues that FAB lacks (maybe how to deal with non web access 
- ie. in DAGs, maybe Kerberos, probably how to deal with API calls that are not 
CRUD). Implementation wise it probably simplifies what we need to do. Maybe - 
using Max’s early POC as an example - we can slowly move over?

On a side note: I'm planning to hire 2-3 people to work on Airflow in the coming year. 
Improvement of Security, Enterprise Integration, Revamp UI are on the todo 
list. However, this is not confirmed yet as business priorities might change.

Bolke.


> On 15 Jun 2017, at 21:45, kalpesh dharwadkar <kalpeshdharwad...@gmail.com> 
> wrote:
> 
> @Dan:
> 
> Thanks for your feedback. I will remove the REFRESH_DAG permission.
> 
> @Max:
> 
> Thanks for your response.
> 
> The scope of my proposal was just to add RBAC security feature to Airflow
> without replacing any existing frameworks.
> 
> I understand that adopting FAB would serve Airflow better moving forward,
> however porting Airflow to using FAB significantly increases the scope of
> the proposal and I don't have the time and expertise to carry out the tasks
> in the extended scope.
> 
> Hence, I'm curious to know if there's a plan for Airflow to migrate to FAB
> this year?
> 
> - Kalpesh
> 
> On Mon, Jun 12, 2017 at 6:16 PM, Maxime Beauchemin <
> maximebeauche...@gmail.com> wrote:
> 
>> It would be nice to go with a framework for this. I did some
>> experimentation using FlaskAppBuilder to go in this direction. It provides
>> auth on different authentication backends out of the box (oauth, openid,
>> ldap, registration, ...), generates perms for each view that has an
>> @has_access decorator, generates a set of perms for each ORM model (show,
>> edit, delete, add, ...) and enforces it in the CRUD views as well as in the
>> generated REST api that you get for free as a byproduct of deriving FAB's
>> models (essentially it's SqlAlchemy with a layer on top).
>> 
>> I started a POC on FAB here a while ago:
>> https://github.com/mistercrunch/airflow_webserver; at the time my main
>> motivation was the free/instantaneous REST api.
>> 
>> I think FAB is a decent fit as the porting should be fairly straightforward
>> (moving the flask views over and deprecating Flask-Admin in favor of FAB's
>> crud) though there was a few blockers. From memory I think FAB didn't like
>> the compound PKs we use in some of the Airflow models. We'd have to either
>> write a db migration script on the Airflow side, or add support for
>> compound keys to FAB (I recently became a maintainer of the project, so I
>> could help with that)
>> 
>> The only downside of FAB is that it's not as mature as something like
>> Django, but porting to Django would surely be much more work.
>> 
>> Then there's the flask-security suite, but that looks like a bit of a
>> patchwork to me, I guess we can pick and choose which we want to use.
>> 
>> Max
>> 
>> On Mon, Jun 12, 2017 at 12:50 PM, Dan Davydov <
>> dan.davy...@airbnb.com.invalid> wrote:
>> 
>>> Looks good to me in general, thanks for putting this together!
>>> 
>>> I think the ability to integrate with external RBAC systems like LDAP is
>>> important (i.e. the Airflow DB should not be decoupled from the RBAC
>>> database wherever possible).
>>> 
>>> I wouldn't be too worried about the permissions about refreshing DAGs, as
>>> far as I know this functionality is no longer required with the new
>>> webservers which reload state periodically, and will certainly be removed
>>> when we have a better DAG consistency story.
>>> 
>>> I think it would also be good to think about this proposal/implementation
>>> and how it applied in the API-driven world (e.g. when webserver h

Re: Airflow Logging Improvements

2017-06-22 Thread Bolke de Bruin
In the light of fixing logging, I would definitely appreciate a written design, 
especially as there have been multiple attempts to fix some issues, but these 
have been more like stop-gap fixes. 

In my opinion Airflow should not stipulate in a hard-coded fashion where and 
how logging takes place. It should behave more like ‘log4j’ configurations. So 
it should not just use “dag_id + task_id + execution_date” and write this to an 
arbitrary location on the filesystem. I could imagine a settings file 
“logging.conf” that sets up something like this:

[logger_scheduler]
level = INFO
handler = stderr
qualname = airflow.scheduler
formatter=scheduler_formatter

In airflow.cfg it should allow setting something like this:

[scheduler]
use_syslog = True
syslog_log_facility = LOG_LOCAL0

To allow logging to syslog so it can be moved to a centralised location if 
required (syslog being a special case afaik).
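A minimal stdlib sketch of that handler/formatter separation, using `logging.config.dictConfig` (the `airflow.scheduler` qualname and the formatter/handler names are illustrative assumptions, not Airflow's actual configuration):

```python
import io
import logging
import logging.config

# Stand-in destination so the example is self-contained; swapping the handler
# class (e.g. to logging.handlers.SysLogHandler with facility LOG_LOCAL0)
# changes where logs go without touching any calling code.
stream = io.StringIO()

LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "scheduler_formatter": {"format": "%(name)s %(levelname)s %(message)s"},
    },
    "handlers": {
        "scheduler_handler": {
            "class": "logging.StreamHandler",
            "formatter": "scheduler_formatter",
            "stream": stream,  # stand-in for stderr or syslog
        },
    },
    "loggers": {
        # log4j-style qualname: one named logger per component
        "airflow.scheduler": {
            "level": "INFO",
            "handlers": ["scheduler_handler"],
            "propagate": False,
        },
    },
}

logging.config.dictConfig(LOGGING_CONFIG)
log = logging.getLogger("airflow.scheduler")
log.info("scheduler heartbeat")
print(stream.getvalue().strip())  # airflow.scheduler INFO scheduler heartbeat
```

The point of the sketch is that the code only ever asks for a named logger; destinations and formats live entirely in configuration, as with log4j.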

Elasticsearch and any other backend can then just be a handler and we can 
remove the custom stuff that is proposed in PR 
https://github.com/apache/incubator-airflow/pull/2380 by 
https://github.com/cmanaha/python-elasticsearch-logger, for example.

I can then be convinced to add something like “attempt”, but probably there are 
more friendly ways to solve it at that time. In addition ‘attempts' should then 
imho not be managed by the task or cli, but rather by the executor as that is 
the process which “attempts” a task. 

Bolke.


> On 22 Jun 2017, at 01:21, Dan Davydov  wrote:
> 
> Responding to some of Bolke's concerns in the github PR for this change:
> 
> > Mmm still not convinced. Especially on elastic search it is just easier to 
> > use the start_date to shard on.
> sharding on start_date isn't great because there is still some risk of 
> collisions and it means that we are coupling the primary key with start_date 
> unnecessarily (e.g. hypothetically you could allow two tasks to run at the 
> same in Airflow and in this case start_date would no longer be a valid 
> primary key), using monotonically increasing IDs for DB entries like this is 
> pretty standard practice.
> 
> > In addition I'm very against the managing of log files this way. Log files 
> > are already a mess and should be refactored to be consistent and to be 
> > managed from one place.
> I agree about the logging mess, and there seem to have been efforts 
> attempting to fix this but they have all been abandoned so we decided to move 
> ahead with this change. I need to take a look at the PR first, but this 
> change should actually make logging less messy, since it should add an 
> abstraction for logging modules, and because you know exactly which try 
> numbers (and how many) ran on which workers from the file path. The log 
> folder structure already kind of mimicked the primary key of the 
> task_instance table (dag_id + task_id + execution_date), but really 
> try_number logically belongs in this key as well (at least for the key for 
> log files). 
> 
> > The docker packagers can already not package airflow correctly without 
> > jumping through hoops. Arbitrarily naming it certainly does not help here.
> If this is referring to the // in the path, I don't think this is 
> arbitrarily naming it. A log "unit" really should be a single task run (not 
> an arbitrary grouping of a variable number of multiple runs), and each unit 
> should have a unique key or location. One of the reasons we are working on 
> this effort is to actually make Airflow play nicer with Kubernetes/Docker 
> (since airflow workers should ideally be ephemeral), and allowing a separate 
> service to read and ship the logs is necessary in this case since the logs 
> will be destroyed along with the worker instance. I think in the future we 
> should also allow custom logging modules (e.g. directly writing logs to some 
> service).
> 
> 
> On Wed, Jun 21, 2017 at 3:11 PM, Allison Wang  > wrote:
> Hi,
> 
> I am in the process of making airflow logging backed by Elasticsearch (more 
> detail please check AIRFLOW-1325 
> ). Here are several more 
> logging improvements we are considering:
> 
> 1. Log streaming. Auto-refresh the logs if tasks are running.
> 
> 2. Separate logs by attempts.
> 
> Instead of logging everything into one file, logs can be separated by attempt 
> number and displayed using tabs. Attempt number here is a monotonically 
> increasing number that represents each task instance run (unlike try_number, 
> clear task instance won't reset attempt number).
> try_number: n^th retry by the task instance. try_number should not be greater 
> than retries. Clear task will set try_number to 0.
> attempt: number of times current task instance got executed. 
> 
> 3. Collapsible logs. Collapse logs that are 
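The try_number/attempt semantics proposed above can be modelled in a few lines of plain Python (a toy illustration of the proposal, not Airflow code):

```python
class TaskRunCounters:
    """Toy model of the two counters for one task instance.

    try_number: n-th try within the current run; reset when the task is cleared.
    attempt:    total number of executions; monotonically increasing and never
                reset, so it can serve as a unique per-run key for log files.
    """

    def __init__(self):
        self.try_number = 0
        self.attempt = 0

    def run(self):
        self.try_number += 1
        self.attempt += 1

    def clear(self):
        self.try_number = 0  # attempt deliberately keeps counting


ti = TaskRunCounters()
ti.run(); ti.run()   # two tries in the first run
ti.clear()           # user clears the task instance
ti.run()             # one try after the clear
print(ti.try_number, ti.attempt)  # 1 3
```

Because attempt is 3 while try_number is back to 1, each execution gets a distinct log key even across clears, which is what makes per-attempt log tabs possible.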

Re: Release Manager for 1.8.2?

2017-06-21 Thread Bolke de Bruin
Please also include https://github.com/apache/incubator-airflow/pull/2022

I have had reports it was also required for MySQL. 

Bolke

Sent from my iPhone

> On 22 Jun 2017, at 00:03, Maxime Beauchemin <maximebeauche...@gmail.com> 
> wrote:
> 
> Ok I'll try to cherry-pick all of these, though if there are any merge
> conflicts I'll pass and move their Fix-Version to 1.9.0.
> 
> Max
> 
>> On Wed, Jun 21, 2017 at 1:20 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:
>> 
>> Hi max,
>> 
>> PR 2365 is merged into master and also into v1-8-test.
>> 
>> I have added:
>> 
>> https://issues.apache.org/jira/browse/AIRFLOW-935 <
>> https://issues.apache.org/jira/browse/AIRFLOW-935> and
>> https://issues.apache.org/jira/browse/AIRFLOW-860 <
>> https://issues.apache.org/jira/browse/AIRFLOW-860>
>> 
>> To the list for 1.8.2. The PR for this has been in master for some time:
>> https://github.com/apache/incubator-airflow/pull/2120 <
>> https://github.com/apache/incubator-airflow/pull/2120>
>> 
>> Can you consider it for 1.8.2? Basically I promised to push the matter
>> :-).
>> 
>> No blockers anymore as far as I know for 1.8.2.
>> 
>> Bolke
>> 
>> P.S. A friendly request if any of the committers do +1 or a LGTM to a PR,
>> please also merge the PR instead of leaving it to others. “airflow-pr merge
>> XXX” is there to help.
>> 
>>> On 16 Jun 2017, at 15:11, Bolke de Bruin <bdbr...@gmail.com> wrote:
>>> 
>>> It is now pinned. I think Cloudera messed up their release for some
>> reason.
>>> 
>>> Builds are succeeding again.
>>> 
>>> Please note I have one blocker out for 1.8.2: https://github.com/apache/
>> incubator-airflow/pull/2365 <https://github.com/apache/
>> incubator-airflow/pull/2365>
>>> 
>>> Bolke
>>> 
>>>> On 15 Jun 2017, at 16:43, Maxime Beauchemin <maximebeauche...@gmail.com
>> <mailto:maximebeauche...@gmail.com>> wrote:
>>>> 
>>>> Awesome, thanks for looking into it. Is there a way we can pin the
>> Cloudera
>>>> tarball to avoid surprises like this?
>>>> 
>>>> On Thu, Jun 15, 2017 at 7:17 AM, Bolke de Bruin <bdbr...@gmail.com
>> <mailto:bdbr...@gmail.com>> wrote:
>>>> 
>>>>> Found the issue for the permission denied part: beeline is not
>> executable
>>>>> in the latest tarball from Cloudera. I’ll have a workaround for this
>> in a
>>>>> few minutes and will push it to Apache right away.
>>>>> 
>>>>> Bolke.
>>>>> 
>>>>>> On 14 Jun 2017, at 18:15, Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>>>> 
>>>>>> Hi Max,
>>>>>> 
>>>>>> I have created https://github.com/apache/incubator-airflow/pull/2365
>> <https://github.com/apache/incubator-airflow/pull/2365> <
>>>>> https://github.com/apache/incubator-airflow/pull/2365 <
>> https://github.com/apache/incubator-airflow/pull/2365>> for AIRFLOW-1296,
>>>>> which I think should be treated as a blocker.
>>>>>> 
>>>>>> I think Travis’ is failing due to a dependency upgrade of Dask.
>>>>>> 
>>>>>> The permission denied error seems to come from a new Travis config
>> that
>>>>> does not allow the creation of “/user/hive/warehouse” by the normal
>> user.
>>>>> Probably a “sudo mkdir “ and a “sudo chown” will help here. Let me
>> check.
>>>>>> 
>>>>>> Bolke
>>>>>> 
>>>>>>> On 14 Jun 2017, at 00:26, Maxime Beauchemin <
>> maximebeauche...@gmail.com <mailto:maximebeauche...@gmail.com>
>>>>> <mailto:maximebeauche...@gmail.com <mailto:maximebeauche...@gmail.com>>>
>> wrote:
>>>>>>> 
>>>>>>> A quick update on my progress.
>>>>>>> 
>>>>>>> I ran through the "Releasing Airflow" wiki playbook only to realize I
>>>>> had
>>>>>>> published a version that failed the Travis build
>>>>>>> <https://travis-ci.org/mistercrunch/incubator-airflow/jobs/242532965>

Re: airflow with MSSQL db

2017-06-21 Thread Bolke de Bruin
It is not officially supported, but I know of some who run Airflow on top of 
MSSQL. We do integrate fixes for it and did so in the past. 

The SQL we run is not overly complex, so in general you should be fine. I would 
stress test it with a couple of examples and run the tests against it. 
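For anyone trying this, the metadata DB is selected purely via the SqlAlchemy connection string in airflow.cfg. A hypothetical MSSQL setup might look like the following (host, credentials, and database name are placeholders, and a driver such as pymssql or pyodbc must be installed separately):

```ini
[core]
# Hypothetical MSSQL metadata DB; the scheme picks the SqlAlchemy dialect
# and driver, e.g. mssql+pymssql or mssql+pyodbc.
sql_alchemy_conn = mssql+pymssql://airflow_user:airflow_pw@mssql-host:1433/airflow
```

With that in place, `airflow initdb` and the test suite can be pointed at the MSSQL instance for the stress testing suggested above.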

Bolke

Sent from my iPhone

> On 21 Jun 2017, at 15:51, Cieplucha, Michal  
> wrote:
> 
> Hello,
> 
> Is MSSQL server supported as an Airflow metadata DB host? Has anybody tried to 
> set it up? Any concerns?
> 
> Thanks
> mC
> 
> 


Re: Release Manager for 1.8.2?

2017-06-21 Thread Bolke de Bruin
Hi Max,

PR 2365 is merged into master and also into v1-8-test.

I have added:

https://issues.apache.org/jira/browse/AIRFLOW-935 and
https://issues.apache.org/jira/browse/AIRFLOW-860

to the list for 1.8.2. The PR for this has been in master for some time: 
https://github.com/apache/incubator-airflow/pull/2120

Can you consider it for 1.8.2? Basically I promised to push the matter :-).

No blockers anymore as far as I know for 1.8.2.

Bolke

P.S. A friendly request: if any of the committers +1 or LGTM a PR, please also 
merge the PR instead of leaving it to others. “airflow-pr merge XXX” is there 
to help.

> On 16 Jun 2017, at 15:11, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
> It is now pinned. I think Cloudera messed up their release for some reason.
> 
> Builds are succeeding again.
> 
> Please note I have one blocker out for 1.8.2: 
> https://github.com/apache/incubator-airflow/pull/2365
> 
> Bolke
> 
>> On 15 Jun 2017, at 16:43, Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
>> 
>> Awesome, thanks for looking into it. Is there a way we can pin the Cloudera
>> tarball to avoid surprises like this?
>> 
>> On Thu, Jun 15, 2017 at 7:17 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:
>> 
>>> Found the issue for the permission denied part: beeline is not executable
>>> in the latest tarball from Cloudera. I’ll have a workaround for this in a
>>> few minutes and will push it to Apache right away.
>>> 
>>> Bolke.
>>> 
>>>> On 14 Jun 2017, at 18:15, Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>> 
>>>> Hi Max,
>>>> 
>>>> I have created https://github.com/apache/incubator-airflow/pull/2365 for
>>>> AIRFLOW-1296, which I think should be treated as a blocker.
>>>> 
>>>> I think Travis is failing due to a dependency upgrade of Dask.
>>>> 
>>>> The permission denied error seems to come from a new Travis config that
>>> does not allow the creation of “/user/hive/warehouse” by the normal user.
>>> Probably a “sudo mkdir” and a “sudo chown” will help here. Let me check.
>>>> 
>>>> Bolke
>>>> 
>>>>> On 14 Jun 2017, at 00:26, Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
>>>>> 
>>>>> A quick update on my progress.
>>>>> 
>>>>> I ran through the "Releasing Airflow" wiki playbook only to realize I
>>> had
>>>>> published a version that failed the Travis build
>>>>> <https://travis-ci.org/mistercrunch/incubator-airflow/jobs/242532965>
>>>>> afterwards. I've
>>>>> been updating the wiki as I go and it seems like it will be better the
>>> next
>>>>> time around. I'm getting "Permission denied" on the Hive hook
>>> subprocess if
>>>>> anyone can shed some light on that.
>>>>> 
>>>>> So I need to fix the unit tests at this point, and probably go straight
>>> to
>>>>> `rc2` to avoid changing the files out there for `rc1`.
>>>>> 
>>>>> Max
>>>>> 
>>>>> On Thu, Jun 8, 2017 at 4:54 PM, Ruslan Dautkhanov <dautkha...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> It would be great if somebody would have a look at following 3 jiras
>>>>>> 
>>>>>> I've flagged
>>>&

Re: Release Manager for 1.8.2?

2017-06-15 Thread Bolke de Bruin
Found the issue for the permission denied part: beeline is not executable in 
the latest tarball from Cloudera. I’ll have a workaround for this in a few 
minutes and will push it to Apache right away.

Bolke.

> On 14 Jun 2017, at 18:15, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
> Hi Max,
> 
> I have created https://github.com/apache/incubator-airflow/pull/2365 for AIRFLOW-1296, 
> which I think should be treated as a blocker.
> 
> I think Travis is failing due to a dependency upgrade of Dask.
> 
> The permission denied error seems to come from a new Travis config that does 
> not allow the creation of “/user/hive/warehouse” by the normal user. Probably 
> a “sudo mkdir” and a “sudo chown” will help here. Let me check.
> 
> Bolke
> 
>> On 14 Jun 2017, at 00:26, Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
>> 
>> A quick update on my progress.
>> 
>> I ran through the "Releasing Airflow" wiki playbook only to realize I had
>> published a version that failed the Travis build
>> <https://travis-ci.org/mistercrunch/incubator-airflow/jobs/242532965>
>> afterwards. I've
>> been updating the wiki as I go and it seems like it will be better the next
>> time around. I'm getting "Permission denied" on the Hive hook subprocess if
>> anyone can shed some light on that.
>> 
>> So I need to fix the unit tests at this point, and probably go straight to
>> `rc2` to avoid changing the files out there for `rc1`.
>> 
>> Max
>> 
>> On Thu, Jun 8, 2017 at 4:54 PM, Ruslan Dautkhanov <dautkha...@gmail.com>
>> wrote:
>> 
>>> It would be great if somebody could have a look at the following 3 JIRAs
>>> I've flagged:
>>> https://issues.apache.org/jira/browse/AIRFLOW-1013
>>> https://issues.apache.org/jira/browse/AIRFLOW-1178
>>> https://issues.apache.org/jira/browse/AIRFLOW-1055
>>> 
>>> Two of them were blockers for one of the previous 1.8 releases but were
>>> de-escalated because there were no resources to fix them.
>>> 
>>> I'm currently Assignee on AIRFLOW-1055 - can't unassign myself - feel free
>>> to scratch my name there.
>>> 
>>> 
>>> 
>>> 
>>> --
>>> Ruslan Dautkhanov
>>> 
>>> On Fri, Jun 9, 2017 at 12:23 AM, Maxime Beauchemin <
>>> maximebeauche...@gmail.com> wrote:
>>> 
>>>> Cool. I'll let everyone know when RC1 is out.
>>>> 
>>>> @community, if you think anything else should be a blocker for 1.8.2
>>> please
>>>> flag it in Jira very soon or it will miss the release train!
>>>> 
>>>> Max
>>>> 
>>>> On Thu, Jun 8, 2017 at 9:06 AM, Chris Riccomini <criccom...@apache.org>
>>>> wrote:
>>>> 
>>>>> Thanks for doing all this, Max! When you're ready, I can deploy to some
>>>> of
>>>>> our environments to verify. Just let me know.
>>>>> 
>>>>> On Thu, Jun 8, 2017 at 8:42 AM, Maxime Beauchemin <
>>>>> maximebeauche...@gmail.com> wrote:
>>>>> 
>>>>>> I branched off `v1-8-test` so all should be good. I just didn't know
>>>> if I
>>>>>> could move ahead with that branch just yet so I branched off. I just
>>>> got
>>>>>> back on `v1-8-test` and pushed what I have to Apache [somehow I had
>>> to
>>>>>> rebase meaning someone added something over the past 24 hours].
>>>>>> 
>>>>>> I just set `Fix Version` of AIRFLOW-1294 to 1.8.2 and set it to
>>>> blocker.
>>>>>> 
>>>>>> I cherry-picked all the commits targeting 1.8.2 I could without
>>> getting
>>>>>> into conflicts. I'd only work at resolving conflicts on blockers. My
>>>> plan
>>>>>> is to add only the 2 rows in red for RC1.
>>>>>> 
>>>>>> Here's the current output of `airflow-jira 1.8.2`:
>>>>>> 
>>>>>> ISSUE ID | TYPE | PRIORITY | STATUS | DESCRIPTION | MERGED | PR | COMMIT
>>>>>> AIRFLOW-1294

Re: Airflow DAG deadlock, "SKIPPED" state not cascading

2017-06-14 Thread Bolke de Bruin
I have created PR https://github.com/apache/incubator-airflow/pull/2365 for this issue.

Bolke

> On 14 Jun 2017, at 16:26, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
> Sorry missed your comment on the dag. Will have a look.
> 
>> On 14 Jun 2017, at 13:42, Daniel Huang <dxhu...@gmail.com> wrote:
>> 
>> I think this is the same issue I've been hitting with ShortCircuitOperator
>> and LatestOnlyOperator. I filed
>> https://issues.apache.org/jira/browse/AIRFLOW-1296 a few days ago. It
>> includes a DAG I can consistently reproduce this with on 1.8.1 and master.
>> I get the "This should not happen" log message as well and the DAG fails.
>> 
>> On Wed, Jun 14, 2017 at 3:27 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:
>> 
>>> Please provide the full logs (you are cutting out too much info), dag
>>> definition (sanitized), airflow version.
>>> 
>>> Bolke
>>> 
>>> Sent from my iPhone
>>> 
>>>> On 13 Jun 2017, at 23:51, Rajesh Chamarthi <rajesh.chamar...@gmail.com>
>>> wrote:
>>>> 
>>>> I currently have a dag which follows the following pattern
>>>> 
>>>> short_circuit_operator -> s3_sensor -> downstream_task_1 ->
>>>> Downstream_task_2
>>>> 
>>>> When short circuit evaluates to false, s3_sensor is skipped, other
>>>> downstream task states remains at None and DAG Run fails.
>>>> 
>>>> couple of questions :
>>>> 
>>>> 1) Which part/component of the application (scheduler/operator/?) takes
>>>> care of cascading the skipped status to downstream jobs? Short Circuit
>>>> operator only seems to update the immediate downstream jobs
>>>> 
>>>> 2) Using CeleryExecutor seems to cause this. Are there any other logs or
>>>> processes I can run to figure out the root of the problem?
>>>> 
>>>> More details below
>>>> 
>>>> * ShortCircuitOperator Log: (The first downstream task is set to skipped,
>>>> although log shows a warning)
>>>> 
>>>> ```
>>>> [2017-06-12 09:00:24,552] {base_task_runner.py:95} INFO - Subtask:
>>>> [2017-06-12 09:00:24,552] {python_operator.py:177} INFO - Skipping task:
>>>> on_s3_xyz
>>>> [2017-06-12 09:00:24,553] {base_task_runner.py:95} INFO - Subtask:
>>>> [2017-06-12 09:00:24,553] {python_operator.py:188} WARNING - Task
>>>> <Task(S3KeySensor): on_s3_xyz> was not part of a dag run. This should not
>>>> happen.
>>>> ```
>>>> 
>>>> * Scheduler log (marks the Dag Run as failed)
>>>> 
>>>> [2017-06-13 17:57:20,983] {models.py:4184} DagFileProcessor43 INFO -
>>>> Deadlock; marking run <DagRun ... scheduled__2017-06-05T09:00:00, externally triggered: False> failed
>>>> 
>>>> When I check the dag run and run through the code, it looks like trigger
>>>> rule evaluates to false because upstream is "skipped"
>>>> 
>>>> ```
>>>> Previous Dagrun State True The task did not have depends_on_past set.
>>>> Not In Retry Period True The task instance was not marked for retrying.
>>>> Trigger Rule False Task's trigger rule 'all_success' requires all
>>> upstream
>>>> tasks to have succeeded, but found 1 non-success(es).
>>>> upstream_tasks_state={'failed': 0, 'successes': 0, 'skipped': 1,
>>> 'done': 1,
>>>> 'upstream_failed': 0}, upstream_task_ids=['on_s3_xyz']
>>>> ```
>>> 
> 
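
The trigger-rule behaviour quoted above can be illustrated with a small, self-contained sketch. This is not Airflow's actual implementation, just a model of why a "skipped" upstream blocks an 'all_success' downstream task instead of cascading the skip; the function name is made up:

```python
# Simplified model of the 'all_success' trigger rule from the log above.
# A 'skipped' upstream counts as "done" but not as a "success", so the
# downstream dependency check fails and the run is eventually marked failed.
def all_success_met(upstream_states):
    """Return True only if every upstream task instance succeeded."""
    return all(state == "success" for state in upstream_states)

print(all_success_met(["success"]))  # normal case: dependency met
print(all_success_met(["skipped"]))  # the skipped s3_sensor blocks the task
```

This matches the dep-check output in the mail: one non-success upstream (upstream_tasks_state shows 'skipped': 1, 'done': 1) is enough to keep the downstream tasks at None forever.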



Re: Airflow DAG deadlock, "SKIPPED" state not cascading

2017-06-14 Thread Bolke de Bruin
Sorry missed your comment on the dag. Will have a look.

> On 14 Jun 2017, at 13:42, Daniel Huang <dxhu...@gmail.com> wrote:
> 
> I think this is the same issue I've been hitting with ShortCircuitOperator
> and LatestOnlyOperator. I filed
> https://issues.apache.org/jira/browse/AIRFLOW-1296 a few days ago. It
> includes a DAG I can consistently reproduce this with on 1.8.1 and master.
> I get the "This should not happen" log message as well and the DAG fails.
> 
> On Wed, Jun 14, 2017 at 3:27 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
>> Please provide the full logs (you are cutting out too much info), dag
>> definition (sanitized), airflow version.
>> 
>> Bolke
>> 
>> Sent from my iPhone
>> 
>>> On 13 Jun 2017, at 23:51, Rajesh Chamarthi <rajesh.chamar...@gmail.com>
>> wrote:
>>> 
>>> I currently have a dag which follows the following pattern
>>> 
>>> short_circuit_operator -> s3_sensor -> downstream_task_1 ->
>>> Downstream_task_2
>>> 
>>> When short circuit evaluates to false, s3_sensor is skipped, other
>>> downstream task states remains at None and DAG Run fails.
>>> 
>>> couple of questions :
>>> 
>>> 1) Which part/component of the application (scheduler/operator/?) takes
>>> care of cascading the skipped status to downstream jobs? Short Circuit
>>> operator only seems to update the immediate downstream jobs
>>> 
>>> 2) Using CeleryExecutor seems to cause this. Are there any other logs or
>>> processes I can run to figure out the root of the problem?
>>> 
>>> More details below
>>> 
>>> * ShortCircuitOperator Log: (The first downstream task is set to skipped,
>>> although log shows a warning)
>>> 
>>> ```
>>> [2017-06-12 09:00:24,552] {base_task_runner.py:95} INFO - Subtask:
>>> [2017-06-12 09:00:24,552] {python_operator.py:177} INFO - Skipping task:
>>> on_s3_xyz
>>> [2017-06-12 09:00:24,553] {base_task_runner.py:95} INFO - Subtask:
>>> [2017-06-12 09:00:24,553] {python_operator.py:188} WARNING - Task
>>> <Task(S3KeySensor): on_s3_xyz> was not part of a dag run. This should not
>>> happen.
>>> ```
>>> 
>>> * Scheduler log (marks the Dag Run as failed)
>>> 
>>> [2017-06-13 17:57:20,983] {models.py:4184} DagFileProcessor43 INFO -
>>> Deadlock; marking run <DagRun ... scheduled__2017-06-05T09:00:00, externally triggered: False> failed
>>> 
>>> When I check the dag run and run through the code, it looks like trigger
>>> rule evaluates to false because upstream is "skipped"
>>> 
>>> ```
>>> Previous Dagrun State True The task did not have depends_on_past set.
>>> Not In Retry Period True The task instance was not marked for retrying.
>>> Trigger Rule False Task's trigger rule 'all_success' requires all
>> upstream
>>> tasks to have succeeded, but found 1 non-success(es).
>>> upstream_tasks_state={'failed': 0, 'successes': 0, 'skipped': 1,
>> 'done': 1,
>>> 'upstream_failed': 0}, upstream_task_ids=['on_s3_xyz']
>>> ```
>> 



Re: Airflow DAG deadlock, "SKIPPED" state not cascading

2017-06-14 Thread Bolke de Bruin
Do you have a dag definition that exhibits the issue? If I can reproduce it 
I'll do my best to get it into 1.8.2

Sent from my iPhone

> On 14 Jun 2017, at 13:42, Daniel Huang <dxhu...@gmail.com> wrote:
> 
> I think this is the same issue I've been hitting with ShortCircuitOperator
> and LatestOnlyOperator. I filed
> https://issues.apache.org/jira/browse/AIRFLOW-1296 a few days ago. It
> includes a DAG I can consistently reproduce this with on 1.8.1 and master.
> I get the "This should not happen" log message as well and the DAG fails.
> 
>> On Wed, Jun 14, 2017 at 3:27 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:
>> 
>> Please provide the full logs (you are cutting out too much info), dag
>> definition (sanitized), airflow version.
>> 
>> Bolke
>> 
>> Sent from my iPhone
>> 
>>> On 13 Jun 2017, at 23:51, Rajesh Chamarthi <rajesh.chamar...@gmail.com>
>> wrote:
>>> 
>>> I currently have a dag which follows the following pattern
>>> 
>>> short_circuit_operator -> s3_sensor -> downstream_task_1 ->
>>> Downstream_task_2
>>> 
>>> When short circuit evaluates to false, s3_sensor is skipped, other
>>> downstream task states remains at None and DAG Run fails.
>>> 
>>> couple of questions :
>>> 
>>> 1) Which part/component of the application (scheduler/operator/?) takes
>>> care of cascading the skipped status to downstream jobs? Short Circuit
>>> operator only seems to update the immediate downstream jobs
>>> 
>>> 2) Using CeleryExecutor seems to cause this. Are there any other logs or
>>> processes I can run to figure out the root of the problem?
>>> 
>>> More details below
>>> 
>>> * ShortCircuitOperator Log: (The first downstream task is set to skipped,
>>> although log shows a warning)
>>> 
>>> ```
>>> [2017-06-12 09:00:24,552] {base_task_runner.py:95} INFO - Subtask:
>>> [2017-06-12 09:00:24,552] {python_operator.py:177} INFO - Skipping task:
>>> on_s3_xyz
>>> [2017-06-12 09:00:24,553] {base_task_runner.py:95} INFO - Subtask:
>>> [2017-06-12 09:00:24,553] {python_operator.py:188} WARNING - Task
>>> <Task(S3KeySensor): on_s3_xyz> was not part of a dag run. This should not
>>> happen.
>>> ```
>>> 
>>> * Scheduler log (marks the Dag Run as failed)
>>> 
>>> [2017-06-13 17:57:20,983] {models.py:4184} DagFileProcessor43 INFO -
>>> Deadlock; marking run <DagRun ... scheduled__2017-06-05T09:00:00, externally triggered: False> failed
>>> 
>>> When I check the dag run and run through the code, it looks like trigger
>>> rule evaluates to false because upstream is "skipped"
>>> 
>>> ```
>>> Previous Dagrun State True The task did not have depends_on_past set.
>>> Not In Retry Period True The task instance was not marked for retrying.
>>> Trigger Rule False Task's trigger rule 'all_success' requires all
>> upstream
>>> tasks to have succeeded, but found 1 non-success(es).
>>> upstream_tasks_state={'failed': 0, 'successes': 0, 'skipped': 1,
>> 'done': 1,
>>> 'upstream_failed': 0}, upstream_task_ids=['on_s3_xyz']
>>> ```
>> 


Re: Airflow DAG deadlock, "SKIPPED" state not cascading

2017-06-14 Thread Bolke de Bruin
Please provide the full logs (you are cutting out too much info), dag 
definition (sanitized), airflow version. 

Bolke

Sent from my iPhone

> On 13 Jun 2017, at 23:51, Rajesh Chamarthi <rajesh.chamar...@gmail.com> wrote:
> 
> I currently have a dag which follows the following pattern
> 
> short_circuit_operator -> s3_sensor -> downstream_task_1 ->
> Downstream_task_2
> 
> When short circuit evaluates to false, s3_sensor is skipped, other
> downstream task states remains at None and DAG Run fails.
> 
> couple of questions :
> 
> 1) Which part/component of the application (scheduler/operator/?) takes
> care of cascading the skipped status to downstream jobs? Short Circuit
> operator only seems to update the immediate downstream jobs
> 
> 2) Using CeleryExecutor seems to cause this. Are there any other logs or
> processes I can run to figure out the root of the problem?
> 
> More details below
> 
> * ShortCircuitOperator Log: (The first downstream task is set to skipped,
> although log shows a warning)
> 
> ```
> [2017-06-12 09:00:24,552] {base_task_runner.py:95} INFO - Subtask:
> [2017-06-12 09:00:24,552] {python_operator.py:177} INFO - Skipping task:
> on_s3_xyz
> [2017-06-12 09:00:24,553] {base_task_runner.py:95} INFO - Subtask:
> [2017-06-12 09:00:24,553] {python_operator.py:188} WARNING - Task
> <Task(S3KeySensor): on_s3_xyz> was not part of a dag run. This should not
> happen.
> ```
> 
> * Scheduler log (marks the Dag Run as failed)
> 
> [2017-06-13 17:57:20,983] {models.py:4184} DagFileProcessor43 INFO -
> Deadlock; marking run <DagRun ... scheduled__2017-06-05T09:00:00, externally triggered: False> failed
> 
> When I check the dag run and run through the code, it looks like trigger
> rule evaluates to false because upstream is "skipped"
> 
> ```
> Previous Dagrun State True The task did not have depends_on_past set.
> Not In Retry Period True The task instance was not marked for retrying.
> Trigger Rule False Task's trigger rule 'all_success' requires all upstream
> tasks to have succeeded, but found 1 non-success(es).
> upstream_tasks_state={'failed': 0, 'successes': 0, 'skipped': 1, 'done': 1,
> 'upstream_failed': 0}, upstream_task_ids=['on_s3_xyz']
> ```


1.8.2rc1 on pypi?

2017-06-13 Thread Bolke de Bruin
Hi Max,

Did you accidentally push 1.8.2rc1 to PyPI? That’s a little bit at odds with the 
Apache release process (although it is not an official channel).

Bolke

Re: Role Based Access Control for Airflow UI

2017-06-12 Thread Bolke de Bruin
Will respond, but I'm traveling at the moment. Give me a few days. 

Sent from my iPhone

> On 12 Jun 2017, at 13:39, Chris Riccomini <criccom...@apache.org> wrote:
> 
> Hey all,
> 
> Checking in on this. We spent a good chunk of time thinking about this, and
> want to move forward with it, but want to make sure we're all on the same
> page.
> 
> Max? Bolke? Dan? Jeremiah?
> 
> Cheers,
> Chris
> 
> On Thu, Jun 8, 2017 at 1:49 PM, kalpesh dharwadkar <
> kalpeshdharwad...@gmail.com> wrote:
> 
>> Hello everyone,
>> 
>> As you all know, currently Airflow doesn’t have a built-in Role Based
>> Access Control (RBAC) capability. It does offer very limited
>> authorization by providing admin, data_profiler, and user roles.
>> However, associating these roles with authenticated identities is not a
>> simple effort.
>> 
>> To address this issue, I have created a design proposal for building RBAC
>> into Airflow and simplifying user access management via the Airflow UI.
>> 
>> The design proposal is located at
>> https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+RBAC+proposal
>> 
>> Any comments/questions/feedback are much appreciated.
>> 
>> Thanks
>> Kalpesh
>> 


Re: Tasks Queued but never run

2017-06-09 Thread Bolke de Bruin
I have made PR https://github.com/apache/incubator-airflow/pull/2356 for this. 
The issue went a little bit deeper than I expected. 

In the backfills we can lose tasks to execute due to a task
setting its own state to NONE if concurrency limits are reached;
this makes them fall outside of the scope the backfill is
managing, hence they will not be executed.

Several bugs are the cause of this. Firstly, the executor
always reported the task state as success, i.e.
the return code of the task instance was not propagated.
Next to that, if the executor already has a task instance
in its queue it will silently ignore the task instance
being added. The backfills did not guard against this, so
tasks could get lost here as well.

This patch introduces CONCURRENCY_REACHED as an executor
state, which will be set if the task exits with EBUSY (16).
This allows the backfill to properly handle these tasks
and reschedule them. Please note that the CeleryExecutor
does not report back on executor states.
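
The exit-code-to-state mapping described above can be sketched in a few lines. The state names and the EBUSY (16) convention follow the mail; the function itself is illustrative, not the actual patch:

```python
import errno

SUCCESS = "success"
FAILED = "failed"
CONCURRENCY_REACHED = "concurrency_reached"  # the new executor state

def executor_state(return_code):
    """Map a task instance's exit code to an executor state (illustrative)."""
    if return_code == 0:
        return SUCCESS
    if return_code == errno.EBUSY:  # 16: the task hit a concurrency limit
        # Lets the backfill reschedule the task instead of losing it.
        return CONCURRENCY_REACHED
    return FAILED  # propagate real failures instead of always reporting success
```

The key point is the middle branch: without a distinct state, a concurrency-limited task is indistinguishable from a successful one and silently drops out of the backfill's scope.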


Please test the patch and report back whether it does or does not solve the issue.

Bolke.

> On 8 Jun 2017, at 04:23, Russell Pierce <russell.s.pie...@gmail.com> wrote:
> 
> I hadn't thought of it that way. Given that SubDAGs are scheduled as
> backfills, they'd inherit the same problem. So, the issue I had is
> version specific. Thanks for pointing that out Bolke. Do you know the
> relevant JIRA Issue off hand?
> 
> On Wed, Jun 7, 2017, 4:28 PM Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
>> It is 1.8.x specific in this case (for backfills).
>> 
>> Sent from my iPhone
>> 
>>> On 7 Jun 2017, at 21:35, Russell Pierce <russell.s.pie...@gmail.com>
>> wrote:
>>> 
>>> Probably more of a configuration constellation issue than version
>> specific
>>> or even an 'issue' per se. As noted, on restart the scheduler reschedules
>>> everything. I had a heavy SubDAG that when rescheduled could produce many
>>> extra tasks and a small fixed number of Celery workers. So, the scheduled
>>> tasks wouldn't be done by the time of the scheduler restart and then the
>>> scheduler would reschedule the SubDAG... debugging hilarity followed from
>>> there.
>>> 
>>>> On Wed, Jun 7, 2017, 10:57 AM Jason Chen <chingchien.c...@gmail.com>
>> wrote:
>>>> 
>>>> I am using Airflow 1.7.1.3 with CeleryExecutor, but have not run into this
>>>> issue.
>>>> I am wondering if this issue is only for 1.8.x?
>>>> 
>>>> On Wed, Jun 7, 2017 at 8:34 AM, Russell Pierce <
>> russell.s.pie...@gmail.com
>>>>> 
>>>> wrote:
>>>> 
>>>>> Depending on how fast you can clear down your queue, -n can be harmful
>>>> and
>>>>> really stack up your celery queue. Keep an eye on your queue depth if
>> you
>>>>> see a ton of messages about the task already having been run.
>>>>> 
>>>>> On Mon, Jun 5, 2017, 9:18 AM Josef Samanek <josef.sama...@kiwi.com>
>>>> wrote:
>>>>> 
>>>>>> Hey. Thanks for the answer. I previously also tried to run scheduler
>> -n
>>>>>> 10, but it was back when I was still using LocalExecutor. And it did
>>>> not
>>>>>> help. I have not yet tried to do it with CeleryExecutor, so I might.
>>>>>> 
>>>>>> Still, I would prefer to find an actual solution for the underlying
>>>>>> problem, not just a workaround (even though a working workaround is
>> also
>>>>>> appreciated).
>>>>>> 
>>>>>> Best regards,
>>>>>> Joe
>>>>>> 
>>>>>> On 2017-06-02 00:10 (+0200), Alex Guziel <alex.guz...@airbnb.com.
>>>>> INVALID>
>>>>>> wrote:
>>>>>>> We've noticed this with celery, relating to this
>>>>>>> https://github.com/celery/celery/issues/3765
>>>>>>> 
>>>>>>> We also use `-n 5` option on the scheduler so it restarts every 5
>>>> runs,
>>>>>>> which will reset all queued tasks.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Alex
>>>>>>> 
>>>>>>> On Thu, Jun 1, 2017 at 2:18 PM, Josef Samanek <
>>>> josef.sama...@gmail.com
>>>>>> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi!
>>>>>>>> 
>&

Re: Release Manager for 1.8.2?

2017-06-08 Thread Bolke de Bruin
Hi Max,

Sounds good. Couple of things: 

* Can I suggest using the v1-8-test branch as the branch for preparing the RC? 
If we reach RC quality, then move it over to v1-8-stable? v1-8-test already has 
some fixes in it that should land in 1.8.2, and the RC should be tagged in the 
stable branch. That also probably reduces the amount of merge conflicts, as many 
fixes have already been merged. Where did you branch off from? Anyway, see also 
the release management page on the wiki.

Blocker(!)
* In the backfills we can lose tasks to execute due to a task setting its own 
state to NONE if concurrency limits are reached; this makes them fall outside 
of the scope the backfill is managing, hence they will not be executed 
(https://issues.apache.org/jira/browse/AIRFLOW-1294). Setting itself to NONE 
should probably be “CONCURRENCY_REACHED” (new state). I have marked it as a 
blocker as we had multiple people hitting the issue, but I need 1-2 days to get 
a patch. Feel free to downgrade to critical if you like :).

Cheers
Bolke
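
The branch-and-cherry-pick flow discussed in this thread can be sketched with plain git, demonstrated in a throwaway repository; the release branch name matches the thread, while the file and commit message are made up:

```shell
# Illustrative sketch: a fix lands on the main line first, then gets
# cherry-picked onto the v1-8-test release-prep branch.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email dev@example.com
git config user.name dev
echo base > file && git add file && git commit -qm "base"
git branch v1-8-test                       # cut the release branch at the shared base
echo fix > file && git commit -qam "AIRFLOW-1294: fix backfill task loss"
fix_sha=$(git rev-parse HEAD)              # the fix is on the main branch only
git checkout -q v1-8-test
git cherry-pick "$fix_sha" > /dev/null     # bring just that commit over
git log --oneline -1                       # the release branch now carries the fix
```

Cherry-picking commit by commit, as above, is what makes the "resolve conflicts only for blockers" policy from this thread workable: each pick either applies cleanly or can be skipped.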


> On 8 Jun 2017, at 02:35, Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
> 
> What a pleasant, mind-numbing afternoon doing some release management
> 
> Notes:
> * Added a warning that the package name has changed on Pypi
> 
> * Removed references to my name here and merged
> * Addressed John D. Ament's concerns here, please review!
> * "footable" appears to have been removed, not a problem anymore
> * that `airflow-jira` tool is a godsend! Thanks Bolke.
> * reviewed list of Airbnb's production cherries and flagged those as `Fix
> Version == 1.8.2`
> * Started branch v1-8-2.rc1 and started picking cherries using
> `airflow-jira compare 1.8.2`
> 
> I'll finish going through picking everything that targeted 1.8.2 that does
> not create merge conflict.
> 
> If there's anything flagged as "blocker" that generates merge conflict,
> I'll go case by case about it.
> 
> Soon after, I should be able to announce 1.8.2 RC1, hopefully sometime
> tomorrow or Friday.
> 
> Let me know if there's anything else I'm missing that I should consider.
> 
> Cheers!
> 
> Max



Re: Tasks Queued but never run

2017-06-07 Thread Bolke de Bruin
The issue is that you are hitting concurrency limits and the tasks set their own 
state to NONE in that case (which they should not, but that was a discussion 
earlier with Alex and Dan); therefore they fall out of the list of tasks that 
need to be run.

Working on a patch for it.

Bolke

> On 7 Jun 2017, at 12:04, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
> I can confirm the issue (haven't found the cause yet), but this is with 
> BACKFILLS, which function independently from the scheduler. So restarting 
> the scheduler will not help.
> 
> Bolke
> 
>> On 6 Jun 2017, at 19:35, Noah Yetter <n...@craftsy.com> wrote:
>> 
>> I'm experiencing the same issue. I've built a simple DAG with no external
>> dependencies other than bash that illustrates the problem consistently on
>> my machine, find it here:
>> https://gist.github.com/slotrans/b3e475c2b9789c4efc41876567902425
>> 
>> If you run it as e.g. airflow backfill tasks_never_run -s 2017-06-06 -e
>> 2017-06-06 you should see some tasks permanently remain in a state of "no
>> status". Restarting the scheduler will not help. Ctrl-C-ing the backfill
>> command and running it again *may* resolve it. The scheduler will
>> continually log messages like the following:
>> 
>> [2017-06-05 18:42:49,372] {jobs.py:1408} INFO - Heartbeating the process
>> manager
>> [2017-06-05 18:42:49,375] {dag_processing.py:559} INFO - Processor for
>> /Users/noah.yetter/airflow/dags/tasks_never_run.py finished
>> [2017-06-05 18:42:49,428] {jobs.py:1007} INFO - Tasks up for execution:
>> <TaskInstance: ... [scheduled]>   (20 scheduled task instances; identifiers elided in the archive)
>> [2017-06-05 18:42:49,430] {jobs.py:1030} INFO - Figuring out tasks to run
>> in Pool(name=None) with 128 open slots and 20 task instances in queue
>> [2017-06-05 18:42:49,463] {jobs.py:1444} INFO - Heartbeating the executor
>> 
>> 
>> On 2017-06-01 15:18 (-0600), Josef Samanek <josef.sama...@gmail.com> wrote:
>>> Hi!
>>> 
>>> We have a problem with our airflow. Sometimes, several tasks get queued
>> but they never get run and remain in Queued state forever. Other tasks from
>> the same schedule interval run. And the next schedule interval runs normally
>> too. But these several tasks remain queued.
>>> 
>>> We are using Airflow 1.8.1. Currently with CeleryExecutor and redis, but
>> we had the same problem with LocalExecutor as well (actually switching to
>> Celery helped quite a bit, the problem now happens way less often, but
>> still it happens). We have 18 DAGs total, 13 active. Some have just 1-2
>> tasks, but some are more complex, like 8 tasks or so and with upstreams.
>> There are also ExternalTaskSensor tasks used. >
>>> 
>>> I tried playing around with DAG configurations (limiting concurrency,
>> max_active_runs, ...), tried switching off some DAGs completely (not all
>> but most) etc., so far nothing helped. Right now, I am not really sure
>> what else to try to identify and solve the issue.
>>> 
>>> I am getting a bit desperate, so I would really appreciate any help with
>> this. Thank you all in advance!>
>>> 
>>> Joe>
>>> 
> 



Re: Tasks Queued but never run

2017-06-07 Thread Bolke de Bruin
I can confirm the issue (haven't found the cause yet), but this is with 
BACKFILLS, which function independently from the scheduler. So restarting 
the scheduler will not help.

Bolke

> On 6 Jun 2017, at 19:35, Noah Yetter <n...@craftsy.com> wrote:
> 
> I'm experiencing the same issue. I've built a simple DAG with no external
> dependencies other than bash that illustrates the problem consistently on
> my machine, find it here:
> https://gist.github.com/slotrans/b3e475c2b9789c4efc41876567902425
> 
> If you run it as e.g. airflow backfill tasks_never_run -s 2017-06-06 -e
> 2017-06-06 you should see some tasks permanently remain in a state of "no
> status". Restarting the scheduler will not help. Ctrl-C-ing the backfill
> command and running it again *may* resolve it. The scheduler will
> continually log messages like the following:
> 
> [2017-06-05 18:42:49,372] {jobs.py:1408} INFO - Heartbeating the process
> manager
> [2017-06-05 18:42:49,375] {dag_processing.py:559} INFO - Processor for
> /Users/noah.yetter/airflow/dags/tasks_never_run.py finished
> [2017-06-05 18:42:49,428] {jobs.py:1007} INFO - Tasks up for execution:
> <TaskInstance: … [scheduled]> (20 such entries; task identifiers stripped in archiving)
> [2017-06-05 18:42:49,430] {jobs.py:1030} INFO - Figuring out tasks to run
> in Pool(name=None) with 128 open slots and 20 task instances in queue
> [2017-06-05 18:42:49,463] {jobs.py:1444} INFO - Heartbeating the executor
> 
> 
> On 2017-06-01 15:18 (-0600), "Josef" <sa...@gmail.com> wrote:
>> Hi!
>> 
>> We have a problem with our airflow. Sometimes, several tasks get queued
> but they never get run and remain in Queued state forever. Other tasks from
> the same schedule interval run. And the next schedule interval runs normally
> too. But these several tasks remain queued.
>> 
>> We are using Airflow 1.8.1. Currently with CeleryExecutor and redis, but
> we had the same problem with LocalExecutor as well (actually switching to
> Celery helped quite a bit, the problem now happens way less often, but
> still it happens). We have 18 DAGs total, 13 active. Some have just 1-2
> tasks, but some are more complex, like 8 tasks or so and with upstreams.
> There are also ExternalTaskSensor tasks used.
>> 
>> I tried playing around with DAG configurations (limiting concurrency,
> max_active_runs, ...), tried switching off some DAGs completely (not all
> but most) etc., so far nothing helped. Right now, I am not really sure,
> what else to try to identify and solve the issue.
>> 
>> I am getting a bit desperate, so I would really appreciate any help with
> this. Thank you all in advance!
>> 
>> Joe
>> 



Re: task failure propagates correctly in sequential executor but not in celery executor

2017-06-07 Thread Bolke de Bruin
Parallel execution comes to mind. What version of Airflow are you running 
(always report this) and please provide full logs (processor, scheduler, 
worker).

Thanks
Bolke
 
> On 7 Jun 2017, at 00:13, Ali Naqvi  wrote:
> 
> Hi folks,
> 
> So it turns out that in the CeleryExecutor case the DAG deadlocks.
> 
> The last log from the dag run is:
> 
> ```
> Deadlock; marking run <DagRun … @ 2017-06-01 23:24:07.054844: manual__2017-06-01T23:24:07.054844, externally triggered:
> True> failed
> ```
> 
> which is over here:
> 
> https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L4250-L4253
> 
> and ultimately it's the no_dependencies_met variable which is true in this
> case for celery executor.
> 
> I am not clear why this would be an executor specific issue.
> 
> Best Regards,
> Ali
> 
> On Sat, May 6, 2017 at 7:22 PM, Ali Naqvi  wrote:
> 
>> Hi folks,
>> I have a dag that does propagate failures correctly in sequential executor:
>> 
>> https://www.dropbox.com/s/zh0quoj99e44qxh/Screenshot%202017-05-06%2012.54.10.png?dl=0
>> 
>> but does not propagate failures when using celery executor:
>> 
>> https://www.dropbox.com/s/mfxqhawwf0760gm/Screenshot%202017-05-06%2019.14.06.png?dl=0
>> Below is sample dag which I used to recreate the problem. I force the
>> failure in the dataops_weekly_update_reviews task by using a non-existent
>> keyword argument.
>> 
>> ```
>> import airflow
>> import datetime
>> from airflow.operators.python_operator import PythonOperator
>> from airflow.models import DAG
>> 
>> args = {
>>     'owner': 'airflow',
>>     'start_date': datetime.datetime(2017, 5, 5),
>>     'queue': 'development'
>> }
>> 
>> dag = DAG(
>>     dag_id='example_dataops_weekly_reviews', default_args=args,
>>     schedule_interval=None)
>> 
>> 
>> def instantiate_emr_cluster(*args, **kwargs):
>>     return "instantiating emr cluster"
>> 
>> 
>> task_instantiate_emr_cluster = PythonOperator(
>>     task_id="instantiate_emr_cluster",
>>     python_callable=instantiate_emr_cluster,
>>     provide_context=True,
>>     dag=dag)
>> 
>> 
>> def initialize_tables(*args, **kwargs):
>>     return "initializing tables {}".format(kwargs["ds"])
>> 
>> 
>> task_initialize_tables = PythonOperator(
>>     task_id="initialize_tables",
>>     python_callable=initialize_tables,
>>     provide_context=True,
>>     dag=dag)
>> 
>> 
>> def dataops_weekly_update_reviews(*args, **kwargs):
>>     # intentionally fails: "dsasdfdsfa" is not a real context key
>>     return "UPDATING weekly reviews {}".format(kwargs["dsasdfdsfa"])
>> 
>> 
>> task_dataops_weekly_update_reviews = PythonOperator(
>>     task_id="dataops_weekly_update_reviews",
>>     python_callable=dataops_weekly_update_reviews,
>>     provide_context=True,
>>     dag=dag)
>> 
>> 
>> def load_dataops_reviews(*args, **kwargs):
>>     return "loading dataops reviews"
>> 
>> 
>> task_load_dataops_reviews = PythonOperator(
>>     task_id="load_dataops_reviews",
>>     python_callable=load_dataops_reviews,
>>     provide_context=True,
>>     dag=dag)
>> 
>> 
>> def load_dataops_surveys(**kwargs):
>>     return "Print out the running EMR cluster"
>> 
>> 
>> task_load_dataops_surveys = PythonOperator(
>>     task_id="load_dataops_surveys",
>>     provide_context=True,
>>     python_callable=load_dataops_surveys,
>>     dag=dag)
>> 
>> 
>> def load_cs_survey_answers(**kwargs):
>>     return "load cs survey answers"
>> 
>> 
>> task_load_cs_survey_answers = PythonOperator(
>>     task_id="load_cs_survey_answers",
>>     provide_context=True,
>>     python_callable=load_cs_survey_answers,
>>     dag=dag)
>> 
>> 
>> def terminate_emr_cluster(*args, **kwargs):
>>     return "terminate emr cluster"
>> 
>> 
>> task_terminate_emr_cluster = PythonOperator(
>>     task_id="terminate_emr_cluster",
>>     python_callable=terminate_emr_cluster,
>>     provide_context=True,
>>     trigger_rule="all_done",
>>     dag=dag)
>> 
>> 
>> task_initialize_tables.set_upstream(task_instantiate_emr_cluster)
>> task_dataops_weekly_update_reviews.set_upstream(task_initialize_tables)
>> task_load_dataops_reviews.set_upstream(task_dataops_weekly_update_reviews)
>> task_terminate_emr_cluster.set_upstream(task_load_dataops_reviews)
>> task_load_dataops_surveys.set_upstream(task_dataops_weekly_update_reviews)
>> task_terminate_emr_cluster.set_upstream(task_load_dataops_surveys)
>> task_load_cs_survey_answers.set_upstream(task_dataops_weekly_update_reviews)
>> task_terminate_emr_cluster.set_upstream(task_load_cs_survey_answers)
>> 
>> ```
>> 
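The deadlock check Ali points at in models.py boils down to: a run with unfinished tasks, none of which can ever be scheduled, is marked failed. A toy sketch of that predicate (names hypothetical and logic simplified — the real check in `DagRun.update_state` also consults trigger rules, which is exactly where failure propagation matters):

```python
# Toy model of the deadlock predicate: a run is deadlocked when unfinished
# tasks remain but no unfinished task has its upstream dependencies met.
# (Simplified: "dependencies met" here means all upstreams succeeded,
# ignoring trigger rules like all_done.)
def is_deadlocked(tasks, upstream, states):
    unfinished = [t for t in tasks if states[t] not in ("success", "failed")]
    if not unfinished:
        return False  # nothing left to run, so nothing can be stuck

    def deps_met(t):
        return all(states[u] == "success" for u in upstream.get(t, []))

    return not any(deps_met(t) for t in unfinished)

tasks = ["init", "update", "load"]
upstream = {"update": ["init"], "load": ["update"]}
# 'update' failed but the failure was never propagated downstream, so
# 'load' sits in "no status" with an upstream that can never succeed.
states = {"init": "success", "update": "failed", "load": "none"}
print(is_deadlocked(tasks, upstream, states))  # True
```

In the sequential case Airflow propagates the failure downstream (descendants become upstream_failed, leaving nothing unfinished); when that propagation does not happen, the predicate above fires and the run is marked failed as a deadlock.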



Re: Release Manager for 1.8.2?

2017-06-07 Thread Bolke de Bruin
Hi Max,

Are you picking this up? I see some open PRs from you that are not very active. 
It would be nice to have a release in 3-4 weeks that also targets full 
compatibility with Apache requirements so we can graduate to top level. Besides, 
summer break is getting close, and 1.9.0 is scheduled for after the summer.

Cheers
Bolke

> On 18 May 2017, at 20:54, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
> https://cwiki.apache.org/confluence/display/AIRFLOW/Releasing+Airflow 
> <https://cwiki.apache.org/confluence/display/AIRFLOW/Releasing+Airflow>
> 
> (See higher up in the thread)
> 
> Please make sure to address some of the outstanding Apache issues (see also 
> quote below):
> 
> 1. Your name is still mentioned somewhere as author. A patch wasn’t cherry 
> picked earlier for this
> 2. Copyrights 2016-2017 Apache, before Airbnb / you
> 3. License file formatting
> 
> Otherwise it won’t pass the IPMC.
> 
> 
> Quote from the IPMC:
> 
> 
> +1, however there's a few issues with the LICENSE file:
> 
> - Would be good to list out the locations of each file (or path to a group
> of files) (some have this, and others do not, so it's hard to follow)
> - There's errant /* .. */ around each license declaration, which should be
> removed.
> - Missing license bodies for FooTable v2, jQuery Clock Plugin,
> 
> Likewise, your NOTICE has copyright 2011-2017, however Airflow hasn't been
> incubating that long.  If you like, you can give origination notices to the
> original creators here to specify the original copyright dates.
> 
> I would challenge the podling to see if there's a way to simplify their
> LICENSE by instead using npm or some other javascript packaging tool to
> build a distribution, rather than shipping the dependencies in the source
> release, makes it much easier to use.
> 
> As the podling matures, would be good to see information about the author
> switch from an individual to a community (in setup.cfg; it's already in
> setup.py so may have been a miss)
> 
> It would be great to see a binary distribution in the next vote to see how
> that may work; it's not clear how to build it from this.  Likewise, don't
> hesitate to clean up your old release artifacts; I downloaded the wrong
> artifact at first.
> 
> 
> 
> Bolke.
> 
> 
> 
>> On 18 May 2017, at 20:49, Maxime Beauchemin <maximebeauche...@gmail.com 
>> <mailto:maximebeauche...@gmail.com>> wrote:
>> 
>> Chris & Bolke, do you have a TODO list / wiki detailing the step-by-step
>> process?
>> 
>> Max
>> 
>> On Thu, May 18, 2017 at 11:46 AM, Maxime Beauchemin <
>> maximebeauche...@gmail.com <mailto:maximebeauche...@gmail.com>> wrote:
>> 
>>> @Andrewm, we can only assume that the author of each commit in master on
>>> top of 1.8.1 wants their commits into 1.8.2.
>>> 
>>> -
>>> 
>>> Ok cool, I'll take this on then, and I'm asking Arthur to see if he wants
>>> to help / oversee the process.
>>> 
>>> I'm planning to make 1.8.2 essentially same as 1.8.1 plus the set of
>>> "cherries" that we use at Airbnb in production and every bugfix / minor
>>> feature that looks benign to us. Given that, we're committing to try out RC
>>> along with everyone else.
>>> 
>>> What cadence are we aiming at? What should be the target date for the RC?
>>> 
>>> Max
>>> 
>>> On Thu, May 18, 2017 at 11:29 AM, Bolke de Bruin <bdbr...@gmail.com 
>>> <mailto:bdbr...@gmail.com>>
>>> wrote:
>>> 
>>>> Hi Max,
>>>> 
>>>> Sounds reasonable. For the Release Manager it is really mostly a
>>>> management job. Chasing, prioritising etc. While it is nice to have a rm
>>>> also being able to run the RCs themselves I don’t think it is an absolute
>>>> requirement. Especially, as I think we should trust the community to test
>>>> and then vote.
>>>> 
>>>> As mentioned the 1.8.X release series should focus on bug fixes,
>>>> performance issues and minor feature updates (UI fixes, fixes to some
>>>> hooks/operators). 1.9.X is for the larger changes. So indeed please keep
>>>> 1.8.2 simple!
>>>> 
>>>> Fully understand that business priorities can take precedence. I (and I
>>>> guess Chris as well) were just hoping that also some of the other
>>>> committers would chime in.
>>>> 
>>>> Cheers
>>>> Bolke
>>>> 
>>>> 
>>>>> On 18 May 2017, at 20:18, Maxime Bea

Re: Cloud ML Operators

2017-06-01 Thread Bolke de Bruin
Hi Peter,

That sounds great! I think the main criterion for this is: will you maintain the 
code afterwards? The contrib section is slowly but steadily growing, and with 
operators/hooks we are particularly dependent on the community, as not all of 
the committers use these themselves (in some cases, none do).

In any case test coverage is required, but that is a given I think.

Kind regards,
Bolke

> On 31 May 2017, at 21:10, Peter Dolan  wrote:
> 
> Hello developers,
> 
> I work with Google Cloud ML, and my team and I are interested in contributing 
> a set of Operators to support working with the Cloud ML platform. The 
> platform supports using the TensorFlow deep neural network framework as a 
> managed system.
> 
> In particular, we would like to contribute
>  * CloudMLTrainingOperator, which would launch and monitor a Cloud ML 
> Training Job (https://cloud.google.com/ml-engine/docs/how-tos/training-jobs),
>  * CloudMLBatchPredictionOperator, which would launch and monitor a Cloud ML 
> Batch Prediction Job 
> (https://cloud.google.com/ml-engine/docs/how-tos/batch-predict), and
>  * CloudMLVersionOperator, which can create, update, and delete TensorFlow 
> model versions 
> (https://cloud.google.com/ml-engine/docs/how-tos/managing-models-jobs)
> 
> I'm eager to hear if the Airflow project is open to these contributions, and 
> if any changes are suggested. We have working prototype versions of all of 
> them.
> 
> Thanks in advance,
> Peter



Re: Airflow HA @ING

2017-06-01 Thread Bolke de Bruin
Thanks. Will do.

Bolke

> On 31 May 2017, at 18:40, Hitesh Shah <hit...@apache.org> wrote:
> 
> Hello Bolke 
> 
> It is great that folks are writing about Airflow. Could you please follow up 
> with the author (on behalf of the PPMC) to fix the article to ensure that the 
> first reference to Airflow is called out as "Apache Airflow". See [1] for 
> guidelines. I only see one reference to it near the bottom of the article as 
> part of the install steps. 
> 
> Likewise for references to other Apache projects such as Hadoop.
> 
> thanks
> -- Hitesh
> 
> [1] https://www.apache.org/foundation/marks/#books 
> <https://www.apache.org/foundation/marks/#books>   
> 
> On Tue, May 30, 2017 at 12:32 PM, Bolke de Bruin <bdbr...@gmail.com 
> <mailto:bdbr...@gmail.com>> wrote:
> Hi,
> 
> Just wanted to let you know that one of my team members, Johan Witman, has 
> been writing up on how we are configuring Airflow in HA. We aren’t done yet 
> and some patches will need to land in Airflow to make everything work, but it 
> might inspire others to try it out so we can gather experiences.
> 
> https://medium.com/@ingwbaa/airflow-ha-environment-c60ddca825a9 
> <https://medium.com/@ingwbaa/airflow-ha-environment-c60ddca825a9>
> 
> Cheers
> Bolke
> 
> 



Airflow HA @ING

2017-05-30 Thread Bolke de Bruin
Hi,

Just wanted to let you know that one of my team members, Johan Witman, has been 
writing up on how we are configuring Airflow in HA. We aren’t done yet and some 
patches will need to land in Airflow to make everything work, but it might 
inspire others to try it out so we can gather experiences.

https://medium.com/@ingwbaa/airflow-ha-environment-c60ddca825a9

Cheers
Bolke



Re: Concurrent schedulers

2017-05-23 Thread Bolke de Bruin
Hi Max,

We seem to be in quite good order already. We are testing with multi master 
mysql and will also test multi master Postgres. As we are doing dagrun level 
locking already it does not seem to be required to do DAG-level locking. Also 
tasks are being locked so if multiple schedulers are running everything seems 
to be quite fine. If one of the schedulers restarts it starts checking for 
orphaned tasks by checking the executor queue which is unique for every 
scheduler. This will result in some tasks being dequeued and then requeued. So 
airflow is robust enough to stay alive then (with my patch for deadlocks 
applied), but some things are a bit sub-optimal.

As mentioned we are still stress testing this setup and we might find more.

Bolke
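The dagrun-level locking described above is typically done with `SELECT ... FOR UPDATE` row locks. A sibling, database-only pattern for a scheduler lock with expiration is an atomic conditional UPDATE, where at most one claimant's statement matches the row. A minimal sketch — the `scheduler_lock` table is hypothetical, and sqlite stands in for Postgres/MySQL:

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scheduler_lock (id INTEGER PRIMARY KEY, owner TEXT, expires TEXT)"
)
conn.execute("INSERT INTO scheduler_lock VALUES (1, NULL, NULL)")

def try_acquire(conn, owner, now, ttl=timedelta(minutes=5)):
    """Atomically claim the lock if it is free or its lease has expired."""
    cur = conn.execute(
        "UPDATE scheduler_lock SET owner = ?, expires = ? "
        "WHERE id = 1 AND (owner IS NULL OR expires < ?)",
        (owner, (now + ttl).isoformat(), now.isoformat()),
    )
    return cur.rowcount == 1  # exactly one scheduler wins the race

now = datetime(2017, 5, 23, 12, 0)
print(try_acquire(conn, "scheduler-a", now))                       # True: lock was free
print(try_acquire(conn, "scheduler-b", now))                       # False: lease still held
print(try_acquire(conn, "scheduler-b", now + timedelta(hours=1)))  # True: lease expired
```

Because the claim and the expiry check happen in one UPDATE, two schedulers racing on the same row cannot both observe a rowcount of 1.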

> On 22 May 2017, at 18:19, Maxime Beauchemin <maximebeauche...@gmail.com> 
> wrote:
> 
> Things that might be needed for a correct multi-schedulers setup:
> * DAG-level lock while being evaluated
> * DAG-level lock expiration to recover from potential situation where the
> lock wasn't released
> * Accumulation of the list of task instances to run into the database (as
> opposed to cross process communication to master process)
> * Define a clear master cycle that would read the list of accumulated task
> instances from the DB, dedup, prioritize and schedule. That master cycle
> should have a lock (and lock expiration) as well.
> 
> Max
> 
> On Mon, May 22, 2017 at 12:27 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
>> Hi Stephen,
>> 
>> We are currently stress testing Airflow for use in a multi-master setup.
>> One of my team members is doing a write up that should show up online
>> shortly. TL;DR; in its current state Airflow will need some patches in
>> order to run concurrently. One issue is that Airflow can have a database
>> deadlock which will stop the scheduler from running. I have a patch for
>> that out here (https://github.com/apache/incubator-airflow/pull/2267 <
>> https://github.com/apache/incubator-airflow/pull/2267>) that works fine
>> on Postgres/MySql (tests don’t pass on sqlite yet due to limitations of
>> sqlite).
>> 
>> Your global scheduler lock (eg. by an active passive configuration) might
>> make most sense for now.
>> 
>> Bolke
>> 
>>> On 22 May 2017, at 07:52, Stephen Rigney <sjrig...@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> We're running airflow in production, but for reliability (n.b. not
>>> performance) we'd like to confirm if it is safe to spawn multiple
>> instances
>>> of the scheduler overlapping in time (otherwise we may need to put more
>>> effort into assuring two copies aren't ever spawned at once in our
>>> environment).
>>> 
>>> 
>>> It seems this officially wasn't a supported configuration back in 2015 (
>>> https://groups.google.com/d/msg/airbnb_airflow/-1wKa3OcwME/uATa8y3YDAAJ
>> ),
>>> but has sufficient intra-airflow locking been added that it is now safe
>> to
>>> start up two temporally overlapping instances of the scheduler for the
>> same
>>> airflow system?
>>> 
>>> 
>>> Or should we hack in a "global scheduler lock" - we're not looking for
>>> increased performance by scheduler parallelism, just that if we ever fire
>>> up two instances of the scheduler nothing terrible happens?
>>> 
>>> 
>>> Stephen
>> 
>> 



Re: Concurrent webservers

2017-05-22 Thread Bolke de Bruin
You should be absolutely fine. Please note that you need to keep your DAG dirs 
in sync across nodes (more or less, for the webserver).

> On 22 May 2017, at 07:53, Stephen Rigney  wrote:
> 
> Hi,
> 
> We're running airflow in production, but we'd like to confirm if it is safe
> to spawn multiple instances of the webserver on different nodes with the
> same backing database. As it's basically a flask webapp one might assume it
> should be fine, but is it a supported configuration?
> 
> Stephen



Re: Concurrent schedulers

2017-05-22 Thread Bolke de Bruin
Hi Stephen,

We are currently stress testing Airflow for use in a multi-master setup. One of 
my team members is doing a write up that should show up online shortly. TL;DR; 
in its current state Airflow will need some patches in order to run 
concurrently. One issue is that Airflow can have a database deadlock which will 
stop the scheduler from running. I have a patch for that out here 
(https://github.com/apache/incubator-airflow/pull/2267 
) that works fine on 
Postgres/MySql (tests don’t pass on sqlite yet due to limitations of sqlite). 

Your global scheduler lock (eg. by an active passive configuration) might make 
most sense for now.

Bolke

> On 22 May 2017, at 07:52, Stephen Rigney  wrote:
> 
> Hi,
> 
> We're running airflow in production, but for reliability (n.b. not
> performance) we'd like to confirm if it is safe to spawn multiple instances
> of the scheduler overlapping in time (otherwise we may need to put more
> effort into assuring two copies aren't ever spawned at once in our
> environment).
> 
> 
> It seems this officially wasn't a supported configuration back in 2015 (
> https://groups.google.com/d/msg/airbnb_airflow/-1wKa3OcwME/uATa8y3YDAAJ ),
> but has sufficient intra-airflow locking been added that it is now safe to
> start up two temporally overlapping instances of the scheduler for the same
> airflow system?
> 
> 
> Or should we hack in a "global scheduler lock" - we're not looking for
> increased performance by scheduler parallelism, just that if we ever fire
> up two instances of the scheduler nothing terrible happens?
> 
> 
> Stephen



Re: Release Manager for 1.8.2?

2017-05-18 Thread Bolke de Bruin
Max,

Please have a look at the v1-8-test branch which should become 1.8.2 eventually 
by merging it to v1-8-stable. Some fixes have already gone in. Some have been 
targeted 1.8.2 but haven’t been cherry-picked yet. You can view what was 
targeted and what was merged with the airflow-jira tool in ./dev. As RM it is 
fine if you want to retarget some, but please make sure to update Jira to make 
it easy on the next RM.

Bolke

> On 18 May 2017, at 20:46, Maxime Beauchemin <maximebeauche...@gmail.com> 
> wrote:
> 
> @Andrewm, we can only assume that the author of each commit in master on
> top of 1.8.1 wants their commits into 1.8.2.
> 
> -
> 
> Ok cool, I'll take this on then, and I'm asking Arthur to see if he wants
> to help / oversee the process.
> 
> I'm planning to make 1.8.2 essentially same as 1.8.1 plus the set of
> "cherries" that we use at Airbnb in production and every bugfix / minor
> feature that looks benign to us. Given that, we're committing to try out RC
> along with everyone else.
> 
> What cadence are we aiming at? What should be the target date for the RC?
> 
> Max
> 
> On Thu, May 18, 2017 at 11:29 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
>> Hi Max,
>> 
>> Sounds reasonable. For the Release Manager it is really mostly a
>> management job. Chasing, prioritising etc. While it is nice to have a rm
>> also being able to run the RCs themselves I don’t think it is an absolute
>> requirement. Especially, as I think we should trust the community to test
>> and then vote.
>> 
>> As mentioned the 1.8.X release series should focus on bug fixes,
>> performance issues and minor feature updates (UI fixes, fixes to some
>> hooks/operators). 1.9.X is for the larger changes. So indeed please keep
>> 1.8.2 simple!
>> 
>> Fully understand that business priorities can take precedence. I (and I
>> guess Chris as well) were just hoping that also some of the other
>> committers would chime in.
>> 
>> Cheers
>> Bolke
>> 
>> 
>>> On 18 May 2017, at 20:18, Maxime Beauchemin <maximebeauche...@gmail.com>
>> wrote:
>>> 
>>> Hey,
>>> 
>>> Sorry about the delay answering, I wanted to sync up with the Airflow
>> team
>>> here at Airbnb before I replied here.
>>> 
>>> Quick note to say that the folks at Airbnb are putting a plan together as
>>> to how we can move towards smooth releases with higher confidence in the
>>> future. That plan involves improving the build/test process as well as
>> our
>>> staging infrastructure, possibly enabling progressive rollouts
>> internally.
>>> 
>>> For context, the team that works on Airflow at Airbnb is "Data Platform"
>>> and is also on the hook for big chunks of non-Airflow-related
>>> infrastructure work that hit us recently and accounts for more than the
>>> team's bandwidth at this time. Given that, the team doesn't want to
>> commit
>>> the time/risk to deploy RCs in production in the short term. Clearly
>>> Airflow is still a priority for the team, but on the short term we have
>>> critical things prioritized above that.
>>> 
>>> Part of the solution is for us to hire more engineers, and one of the
>> open
>>> seats is a dedicated role on Airflow tackling things from feature
>> building
>>> to release management. Hopefully we can widen our bandwidth shortly.
>>> 
>>> In the meantime, I can commit the time to handle a release, but this
>>> release won't hit production at Airbnb for a little while, which makes me
>>> wonder whether it's worth committing the time. Maybe there's a
>>> Fedora/RHEL-type scenario here (using a cutting-edge community edition to
>>> stabilize LTS releases), but we know it's not ideal for Airbnb and for
>> the
>>> community. The end goal is clearly to have steady, high-confidence,
>> mostly
>>> automated, regular releases and it feels like time is best spent working
>> in
>>> that direction.
>>> 
>>> Another option is to make [upcoming] 1.8.2 very simple, as 1.8.1 + the
>> few
>>> cherries we run in production already at Airbnb, holding the 50+ extra
>>> commits in master for 1.8.3. This is marginally useful but helps getting
>>> the release mechanics oiled up.
>>> 
>>> I'm trying to be as transparent as I can here, and open to discuss the
>>> different ways we can move forward.
>>> 
>>> Max
>>> 
>>> On Sun, May 14, 2017 at 4:44 AM, Bol

Re: Release Manager for 1.8.2?

2017-05-18 Thread Bolke de Bruin
Hi Max,

Sounds reasonable. For the Release Manager it is really mostly a management 
job: chasing, prioritising, etc. While it is nice to have an RM also able 
to run the RCs themselves, I don’t think it is an absolute requirement, 
especially as I think we should trust the community to test and then vote. 

As mentioned, the 1.8.X release series should focus on bug fixes, performance 
issues and minor feature updates (UI fixes, fixes to some hooks/operators). 
1.9.X is for the larger changes. So indeed please keep 1.8.2 simple!

Fully understand that business priorities can take precedence. I (and I guess 
Chris as well) were just hoping that also some of the other committers would 
chime in.

Cheers
Bolke


> On 18 May 2017, at 20:18, Maxime Beauchemin <maximebeauche...@gmail.com> 
> wrote:
> 
> Hey,
> 
> Sorry about the delay answering, I wanted to sync up with the Airflow team
> here at Airbnb before I replied here.
> 
> Quick note to say that the folks at Airbnb are putting a plan together as
> to how we can move towards smooth releases with higher confidence in the
> future. That plan involves improving the build/test process as well as our
> staging infrastructure, possibly enabling progressive rollouts internally.
> 
> For context, the team that works on Airflow at Airbnb is "Data Platform"
> and is also on the hook for big chunks of non-Airflow-related
> infrastructure work that hit us recently and accounts for more than the
> team's bandwidth at this time. Given that, the team doesn't want to commit
> the time/risk to deploy RCs in production in the short term. Clearly
> Airflow is still a priority for the team, but on the short term we have
> critical things prioritized above that.
> 
> Part of the solution is for us to hire more engineers, and one of the open
> seats is a dedicated role on Airflow tackling things from feature building
> to release management. Hopefully we can widen our bandwidth shortly.
> 
> In the meantime, I can commit the time to handle a release, but this
> release won't hit production at Airbnb for a little while, which makes me
> wonder whether it's worth committing the time. Maybe there's a
> Fedora/RHEL-type scenario here (using a cutting-edge community edition to
> stabilize LTS releases), but we know it's not ideal for Airbnb and for the
> community. The end goal is clearly to have steady, high-confidence, mostly
> automated, regular releases and it feels like time is best spent working in
> that direction.
> 
> Another option is to make [upcoming] 1.8.2 very simple, as 1.8.1 + the few
> cherries we run in production already at Airbnb, holding the 50+ extra
> commits in master for 1.8.3. This is marginally useful but helps getting
> the release mechanics oiled up.
> 
> I'm trying to be as transparent as I can here, and open to discuss the
> different ways we can move forward.
> 
> Max
> 
> On Sun, May 14, 2017 at 4:44 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
>> Hi Folks,
>> 
>> With 1.8.1 we have very much improved the reliability of Airflow, which is
>> great as many new features entered 1.8.0 and the gap from 1.7.1 was huge.
>> What is also great is that we are slowly but surely increasing the test
>> coverage which mitigates some of the risk of regressions going forward. As
>> you know the 1.8.X releases will continue to focus on improved reliability,
>> performance improvements and minor feature updates. The 1.9.X release
>> cycle, which should start around September, will allow for larger feature
>> updates.
>> 
>> I expect 1.8.2 not to have too many PRs, so it will be a relatively simple
>> release process:
>> 
>> 1. Apply bug fixes
>> 2. Add performance fixes
>> 3. Fix some outstanding Apache requirements (Author, Licensing etc)
>> 
>> The process of creating a distribution has been detailed by Chris here:
>> https://cwiki.apache.org/confluence/display/AIRFLOW/Releasing+Airflow <
>> https://cwiki.apache.org/confluence/display/AIRFLOW/Releasing+Airflow>
>> 
>> Now we just need a volunteer (preferably from the committers) to be the
>> Release Manager for 1.8.2 :-).
>> 
>> Who is willing to take this on and make history?
>> 
>> Regards,
>> Bolke
>> 
>> 
>> 



Fwd: Release Manager for 1.8.2?

2017-05-17 Thread Bolke de Bruin
PING.


> Begin forwarded message:
> 
> From: Bolke de Bruin <bdbr...@gmail.com>
> Subject: Release Manager for 1.8.2?
> Date: 14 May 2017 at 13:44:39 GMT+2
> To: dev@airflow.incubator.apache.org
> 
> Hi Folks,
> 
> With 1.8.1 we have very much improved the reliability of Airflow, which is great 
> as many new features entered 1.8.0 and the gap from 1.7.1 was huge. What is 
> also great is that we are slowly but surely increasing the test coverage 
> which mitigates some of the risk of regressions going forward. As you know 
> the 1.8.X releases will continue to focus on improved reliability, 
> performance improvements and minor feature updates. The 1.9.X release cycle, 
> which should start around September, will allow for larger feature updates.
> 
> I expect 1.8.2 not to have too many PRs, so it will be a relatively simple 
> release process:
> 
> 1. Apply bug fixes 
> 2. Add performance fixes
> 3. Fix some outstanding Apache requirements (Author, Licensing etc)
> 
> The process of creating a distribution has been detailed by Chris here: 
> https://cwiki.apache.org/confluence/display/AIRFLOW/Releasing+Airflow 
> <https://cwiki.apache.org/confluence/display/AIRFLOW/Releasing+Airflow>
> 
> Now we just need a volunteer (preferably from the committers) to be the 
> Release Manager for 1.8.2 :-).
> 
> Who is willing to take this on and make history? 
> 
> Regards,
> Bolke
> 
> 



Re: Simple Airflow BashOperators run but can't be scheduled or un-paused

2017-05-17 Thread Bolke de Bruin
We probably need to use .utcnow() in most places. There was a patch for that, 
but we/I held it off due to operational implications that it might bring and it 
is kind of hard to test.

B.
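The `TypeError` in Russell's traceback arises because Python refuses to order offset-naive against offset-aware datetimes. A minimal demonstration, plus the normalization (convert to UTC, then drop `tzinfo`) that makes the comparison legal — a sketch of the idea, not Airflow's actual patch:

```python
from datetime import datetime, timezone

aware = datetime(2017, 5, 16, tzinfo=timezone.utc)  # offset-aware
naive = datetime.utcnow()                           # offset-naive

try:
    naive > aware
except TypeError as e:
    print(e)  # can't compare offset-naive and offset-aware datetimes

# Normalizing both sides to the same flavor (here: convert the aware value
# to UTC, then strip tzinfo) makes the comparison legal again.
normalized = aware.astimezone(timezone.utc).replace(tzinfo=None)
print(naive > normalized)
```

Until every `datetime.now()` in the scheduler is normalized this way (or replaced with `.utcnow()` as suggested above), any tz-aware `start_date` in a DAG can trip the same error.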

> On 17 May 2017, at 02:20, Russell Jurney <russell.jur...@gmail.com> wrote:
> 
> It seems that this entire file needs to be patched so that the
> datetime.now() calls use tz=pytz.utc. At some point Python started
> including timezones in datetime.now() and so this is broken.
> 
> I think I can patch it. But one problem I am having is how do I see the log
> messages of the scheduler when  I use for example:
> self.logger.error(job.latest_heartbeat)
> ?
> 
> Russell Jurney @rjurney <http://twitter.com/rjurney>
> russell.jur...@gmail.com LI <http://linkedin.com/in/russelljurney> FB
> <http://facebook.com/jurney> datasyndrome.com
> 
> On Tue, May 16, 2017 at 3:40 PM, Russell Jurney <russell.jur...@gmail.com>
> wrote:
> 
>> I setup conda to run python 3.4 and I still get this when a job is
>> scheduled and runs:
>> 
>> [2017-05-16 15:37:54,339] {jobs.py:1171} DagFileProcessor1 INFO -
>> Processing agl_p2p_api_worker_dag
>> [2017-05-16 15:37:55,601] {jobs.py:354} DagFileProcessor1 ERROR - Got an
>> exception! Propagating...
>> Traceback (most recent call last):
>>  File "/Users/rjurney/Software/incubator-airflow/airflow/jobs.py", line
>> 346, in helper
>>pickle_dags)
>>  File "/Users/rjurney/Software/incubator-airflow/airflow/utils/db.py",
>> line 48, in wrapper
>>result = func(*args, **kwargs)
>>  File "/Users/rjurney/Software/incubator-airflow/airflow/jobs.py", line
>> 1584, in process_file
>>self._process_dags(dagbag, dags, ti_keys_to_schedule)
>>  File "/Users/rjurney/Software/incubator-airflow/airflow/jobs.py", line
>> 1173, in _process_dags
>>dag_run = self.create_dag_run(dag)
>>  File "/Users/rjurney/Software/incubator-airflow/airflow/utils/db.py",
>> line 48, in wrapper
>>result = func(*args, **kwargs)
>>  File "/Users/rjurney/Software/incubator-airflow/airflow/jobs.py", line
>> 815, in create_dag_run
>>if next_run_date > datetime.now():
>> TypeError: can't compare offset-naive and offset-aware datetimes
>> 
>> 
>> What do I do?
>> 
>> Russell Jurney @rjurney <http://twitter.com/rjurney>
>> russell.jur...@gmail.com LI <http://linkedin.com/in/russelljurney> FB
>> <http://facebook.com/jurney> datasyndrome.com
>> 
>> On Tue, May 16, 2017 at 2:18 PM, Russell Jurney <russell.jur...@gmail.com>
>> wrote:
>> 
>>> Thanks, we're trying that now!
>>> 
>>> Russell Jurney @rjurney <http://twitter.com/rjurney>
>>> russell.jur...@gmail.com LI <http://linkedin.com/in/russelljurney> FB
>>> <http://facebook.com/jurney> datasyndrome.com
>>> 
>>> On Tue, May 16, 2017 at 2:02 PM, Bolke de Bruin <bdbr...@gmail.com>
>>> wrote:
>>> 
>>>> Did you try to run this on Py 2.7 / 3.4 as well? I notice you are
>>>> running on 3.6, which we are not testing against at the moment.
>>>> 
>>>> Bolke.
>>>> 
>>>>> On 16 May 2017, at 22:46, Russell Jurney <russell.jur...@gmail.com>
>>>> wrote:
>>>>> 
>>>>> We have tasks that run, but we can't get them to run as scheduled or to
>>>>> un-pause.
>>>>> 
>>>>> The code for the task looks like this:
>>>>> 
>>>>> # Run the API worker every 5 minutes
>>>>> api_worker_dag = DAG(
>>>>>   'agl_p2p_api_worker_dag',
>>>>>   default_args=default_args,
>>>>>   schedule_interval=timedelta(minutes=5)
>>>>> )
>>>>> 
>>>>> # Run the API worker
>>>>> api_worker_task = BashOperator(
>>>>>   task_id="api_worker_task",
>>>>>   bash_command="""python {{ params.base_path
>>>>> }}/agl-p2p-api-worker/site/worker.py {{ ds }}""",
>>>>>   params={
>>>>>   "base_path": project_home
>>>>>   },
>>>>>   dag=api_worker_dag
>>>>> )
>>>>> 
>>>>> We run this command: airflow unpause agl_p2p_api_worker_dag
>>>>> 
>>>>> And we see this error:
>>>>> 
>>>>> [2017-05-16 20:26:48,722] {jobs.py:1408} INFO - Heartbeating the
>>>> process
>>
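The TypeError in the traceback above comes from comparing an offset-aware datetime with a naive one. A minimal sketch of the failure and one way to patch the comparison — the helper name and the coerce-to-UTC policy are illustrative, not Airflow's actual fix:

```python
from datetime import datetime, timezone

def safe_compare(next_run_date, now):
    """Compare two datetimes, coercing a naive side to UTC first.

    Mirrors the suggested "force everything to UTC" patch; whether
    that policy is right for a given install is a judgment call
    (this helper is ours, not part of Airflow).
    """
    if next_run_date.tzinfo is None:
        next_run_date = next_run_date.replace(tzinfo=timezone.utc)
    if now.tzinfo is None:
        now = now.replace(tzinfo=timezone.utc)
    return next_run_date > now

# Comparing aware and naive directly raises the error from the traceback:
aware = datetime(2017, 5, 16, tzinfo=timezone.utc)
naive = datetime(2017, 5, 16)
try:
    aware > naive
except TypeError as e:
    print(e)  # -> can't compare offset-naive and offset-aware datetimes

print(safe_compare(aware, naive))  # -> False (same instant once both are UTC)
```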

Re: Simple Airflow BashOperators run but can't be scheduled or un-paused

2017-05-16 Thread Bolke de Bruin
Did you try to run this on Py 2.7 / 3.4 as well? I notice you are running on 
3.6, which we are not testing against at the moment.

Bolke.

> On 16 May 2017, at 22:46, Russell Jurney  wrote:
> 
> We have tasks that run, but we can't get them to run as scheduled or to
> un-pause.
> 
> The code for the task looks like this:
> 
> # Run the API worker every 5 minutes
> api_worker_dag = DAG(
>'agl_p2p_api_worker_dag',
>default_args=default_args,
>schedule_interval=timedelta(minutes=5)
> )
> 
> # Run the API worker
> api_worker_task = BashOperator(
>task_id="api_worker_task",
>bash_command="""python {{ params.base_path
> }}/agl-p2p-api-worker/site/worker.py {{ ds }}""",
>params={
>"base_path": project_home
>},
>dag=api_worker_dag
> )
> 
> We run this command: airflow unpause agl_p2p_api_worker_dag
> 
> And we see this error:
> 
> [2017-05-16 20:26:48,722] {jobs.py:1408} INFO - Heartbeating the process
> manager
> [2017-05-16 20:26:48,723] {dag_processing.py:559} INFO - Processor for
> /root/airflow/dags/setup.py finished
> [2017-05-16 20:26:48,723] {dag_processing.py:578} WARNING - Processor for
> /root/airflow/dags/setup.py exited with return code 1. See
> /root/airflow/logs/scheduler/2017-05-16/setup.py.log for details.
> [2017-05-16 20:26:48,726] {dag_processing.py:627} INFO - Started a process
> (PID: 110) to generate tasks for /root/airflow/dags/setup.py - logging into
> /root/airflow/logs/scheduler/2017-05-16/setup.py.log
> [2017-05-16 20:26:48,727] {jobs.py:1444} INFO - Heartbeating the executor
> [2017-05-16 20:26:48,727] {jobs.py:1454} INFO - Heartbeating the scheduler
> Process DagFileProcessor19-Process:
> Traceback (most recent call last):
>  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 249, in
> _bootstrap
>self.run()
>  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 93, in
> run
>self._target(*self._args, **self._kwargs)
>  File "/opt/conda/lib/python3.6/site-packages/airflow/jobs.py", line 346,
> in helper
>pickle_dags)
>  File "/opt/conda/lib/python3.6/site-packages/airflow/utils/db.py", line
> 53, in wrapper
>result = func(*args, **kwargs)
>  File "/opt/conda/lib/python3.6/site-packages/airflow/jobs.py", line 1585,
> in process_file
>self._process_dags(dagbag, dags, ti_keys_to_schedule)
>  File "/opt/conda/lib/python3.6/site-packages/airflow/jobs.py", line 1174,
> in _process_dags
>dag_run = self.create_dag_run(dag)
>  File "/opt/conda/lib/python3.6/site-packages/airflow/utils/db.py", line
> 53, in wrapper
>result = func(*args, **kwargs)
>  File "/opt/conda/lib/python3.6/site-packages/airflow/jobs.py", line 807,
> in create_dag_run
>else max(next_run_date, dag.start_date))
> TypeError: can't compare offset-naive and offset-aware datetimes
> 
> 
> Note that we can run/schedule the example DAGs from the CLI, and our DAGs
> are very closely derived from the examples, so we don't know what to do!
> 
> For instance we can run:
> 
> docker@airflow:/mnt/airflow$ airflow run agl_p2p_api_worker_dag
> api_worker_task 2017-01-01
> [2017-05-16 20:44:41,046] {__init__.py:57} INFO - Using executor
> SequentialExecutor
> Sending to executor.
> [2017-05-16 20:44:43,274] {__init__.py:57} INFO - Using executor
> SequentialExecutor
> Logging into:
> /root/airflow/logs/agl_p2p_api_worker_dag/api_worker_task/2017-01-01T00:00:00
> 
> 
> And there are no errors. We are sure the scheduler is running and the
> webserver is running. The airflow run/test commands work, but we get this
> same error above when we click the activate button on the web app or
> unpause the DAG from the CLI. We have wondered if our start date has to be
> in the future maybe? We don't know.
> 
> Any help would be appreciated. Thanks!
> 
> Russell Jurney @rjurney 
> russell.jur...@gmail.com LI  FB
>  datasyndrome.com



Re: Discussion on Airflow 1.8.1 RC2

2017-05-04 Thread Bolke de Bruin
Gotcha. I can relate to that. With hindsight, the backfill change should not 
have made it into the RC cycle. Anyway, I hope you guys can jump on the wagon 
pretty soon again: I do think 1.8.1 is in a lot better shape than 1.8.0. It 
would be nice if you guys could take on 1.8.2.

And, as mentioned, if it has to do with “backfill”, for the moment I feel my name 
is tagged on that.

- Bolke

> On 4 May 2017, at 21:27, Dan Davydov <dan.davy...@airbnb.com> wrote:
> 
> Thinking back it may have been 1.8.0rc5-> 1.8.0 regressions. I am still 
> worried about the large number of PRs in 1.8.1 even if they are all bug fixes 
> though (known issues that we already have patches for vs unknown new issues 
> introduced with the 1.8.1 patches) , but I agree with your sentiment that 
> these PRs should most likely make things more stable.
> 
> On Thu, May 4, 2017 at 10:55 AM, Alex Guziel <alex.guz...@airbnb.com 
> <mailto:alex.guz...@airbnb.com>> wrote:
> I don't think any of the fixes I did were regressions.
> 
> On Thu, May 4, 2017 at 8:11 AM, Bolke de Bruin <bdbr...@gmail.com 
> <mailto:bdbr...@gmail.com>> wrote:
> I know of one that Alex wanted to get in, but wasn’t targeted for 1.8.1 in 
> Jira and thus didn’t make the cut at RC time. There is another one out 
> that seems to have stalled a bit 
> (https://github.com/apache/incubator-airflow/pull/2205 
> <https://github.com/apache/incubator-airflow/pull/2205>).
> 
> Reading the changelog of 1.8.1 I see bug fixes, apache requirements and one 
> “new” feature (UI lightning bolt). Regressions could have happened but we 
> have been quite vigilant on the fact that these bug fixes needed proper 
> tests, so I am very interested in 1.8.0 -> 1.8.1 regressions. If it is a 
> pre-backfill-change 1.8.0 to 1.8.1 regression then I would also like to know, 
> because I made that change and feel responsible for it.
> 
> Cheers
> Bolke
> 
> 
>> On 3 May 2017, at 22:13, Dan Davydov <dan.davy...@airbnb.com 
>> <mailto:dan.davy...@airbnb.com>> wrote:
>> 
>> cc Alex and Rui who were working on fixes, I'm not sure if their commits got 
>> in before 1.8.1.
>> 
>> On Wed, May 3, 2017 at 1:09 PM, Bolke de Bruin <bdbr...@gmail.com 
>> <mailto:bdbr...@gmail.com>> wrote:
>> Hi Dan,
>> 
>> (Thread renamed to make sure it does not clash, dev@ now added)
>> 
>> It surprises me that you found regression from 1.8.0 to 1.8.1 as 1.8.1 is 
>> very much focused on bug fixes. Were the regressions shared yet? 
>> 
>> The whole 1.8.X release will be bug fix focused (per release management) and 
>> minor feature updates. The 1.9.0 release will be the first release with 
>> major feature updates. So what you want, more robustness and focus on 
>> stability, is now underway. I agree with beefing up tests and including the 
>> major operators in this. Executors should also be on this list btw. Turning 
>> on coverage reporting might be a first step in helping this (it isn’t the 
>> solution obviously).
>> 
>> Cheers
>> Bolke
>> 
>> 
>>> On 3 May 2017, at 20:28, Dan Davydov <dan.davy...@airbnb.com 
>>> <mailto:dan.davy...@airbnb.com>> wrote:
>>> 
>>> We saw several regressions moving from 1.8.0 to 1.8.1 the first time we 
>>> tried, and while I think we merged all our fixes to master (not sure if 
>>> they all made it into 1.8.1 however), we have put releasing on hold due to 
>>> stability issues from the last couple of releases. It's either the case 
>>> that:
>>> A) Airbnb requires more robustness from new releases.
>>> or
>>> B) Most companies using Airflow require more robustness and we should halt 
>>> on feature work until we are more confident in our testing
>>> 
>>> I think the biggest problem currently is the lack of unit testing coverage, 
>>> e.g. when the backfill framework was refactored (which was the right 
>>> long-term fix), it caused a lot of breakages that weren't caught by tests. 
>>> I think we need to audit the major operators/classes and beef up the unit 
>>> testing coverage. The coverage metric does not necessarily cover these 
>>> cases (e.g. cyclomatic complexity). Writing regression tests is good but we 
>>> shouldn't have so many new blocker issues in our releases.
>>> 
>>> We are fighting some fires internally at the moment (not Airflow related), 
>>> but Alex and I have been working on some stuff that we will push to the 
>>> community once we are done. Alex is working on a good solution for python 
>>> package isolation, and 

Re: Discussion on Airflow 1.8.1 RC2

2017-05-04 Thread Bolke de Bruin
I know of one that Alex wanted to get in, but wasn’t targeted for 1.8.1 in Jira 
and thus didn’t make the cut at RC time. There is another one out that seems 
to have stalled a bit (https://github.com/apache/incubator-airflow/pull/2205 
<https://github.com/apache/incubator-airflow/pull/2205>).

Reading the changelog of 1.8.1 I see bug fixes, apache requirements and one 
“new” feature (UI lightning bolt). Regressions could have happened but we have 
been quite vigilant on the fact that these bug fixes needed proper tests, so I 
am very interested in 1.8.0 -> 1.8.1 regressions. If it is a 
pre-backfill-change 1.8.0 to 1.8.1 regression then I would also like to know, 
because I made that change and feel responsible for it. 

Cheers
Bolke


> On 3 May 2017, at 22:13, Dan Davydov <dan.davy...@airbnb.com> wrote:
> 
> cc Alex and Rui who were working on fixes, I'm not sure if their commits got 
> in before 1.8.1.
> 
> On Wed, May 3, 2017 at 1:09 PM, Bolke de Bruin <bdbr...@gmail.com 
> <mailto:bdbr...@gmail.com>> wrote:
> Hi Dan,
> 
> (Thread renamed to make sure it does not clash, dev@ now added)
> 
> It surprises me that you found regression from 1.8.0 to 1.8.1 as 1.8.1 is 
> very much focused on bug fixes. Were the regressions shared yet? 
> 
> The whole 1.8.X release will be bug fix focused (per release management) and 
> minor feature updates. The 1.9.0 release will be the first release with major 
> feature updates. So what you want, more robustness and focus on stability, is 
> now underway. I agree with beefing up tests and including the major operators 
> in this. Executors should also be on this list btw. Turning on coverage 
> reporting might be a first step in helping this (it isn’t the solution 
> obviously).
> 
> Cheers
> Bolke
> 
> 
>> On 3 May 2017, at 20:28, Dan Davydov <dan.davy...@airbnb.com 
>> <mailto:dan.davy...@airbnb.com>> wrote:
>> 
>> We saw several regressions moving from 1.8.0 to 1.8.1 the first time we 
>> tried, and while I think we merged all our fixes to master (not sure if they 
>> all made it into 1.8.1 however), we have put releasing on hold due to 
>> stability issues from the last couple of releases. It's either the case that:
>> A) Airbnb requires more robustness from new releases.
>> or
>> B) Most companies using Airflow require more robustness and we should halt 
>> on feature work until we are more confident in our testing
>> 
>> I think the biggest problem currently is the lack of unit testing coverage, 
>> e.g. when the backfill framework was refactored (which was the right 
>> long-term fix), it caused a lot of breakages that weren't caught by tests. I 
>> think we need to audit the major operators/classes and beef up the unit 
>> testing coverage. The coverage metric does not necessarily cover these cases 
>> (e.g. cyclomatic complexity). Writing regression tests is good but we 
>> shouldn't have so many new blocker issues in our releases.
>> 
>> We are fighting some fires internally at the moment (not Airflow related), 
>> but Alex and I have been working on some stuff that we will push to the 
>> community once we are done. Alex is working on a good solution for python 
>> package isolation, and I'm working on integration with Kubernetes at the 
>> executor level.
>> 
>> Feel free to forward any of my messages to the dev mailing list.
>> 
>> On Wed, May 3, 2017 at 11:18 AM, Bolke de Bruin <bdbr...@gmail.com 
>> <mailto:bdbr...@gmail.com>> wrote:
>> Grrr, I seriously dislike the send button on the touch bar… here goes again.
>> 
>> Hi Dan,
>> 
>> (Please note I would like to forward the next message to dev@, but let me 
>> know if you don’t find it comfortable)
>> 
>> I understand your point. The gap between 1.7.1 was large in terms of 
>> functionality changes etc. It was going to be a (bit?) rough and as you guys 
>> are using many of the edge cases you probably found more issues than any of 
>> us. Still, between 1.8.0 and 1.8.1 we have added many tests (coverage 
>> increased from 67% to close to 69%, which is a lot as you know). It would be 
>> nice if you can share where your areas of concern are so we can address 
>> those and a suggestion on how to proceed with integration tests is also 
>> welcome. 
>> 
>> You guys (=Airbnb) have been a bit quiet over the past couple of days, so I 
>> am getting a bit worried in terms of engagement. Is that warranted?
>> 
>> Cheers
>> Bolke
>> 
>> 
>>> On 3 May 2017, at 20:13, Bolke de Bruin <bdbr...@gmail.com 
>>> <mailto:bdbr...@gmail.com>> wrote:
>

Discussion on Airflow 1.8.1 RC2

2017-05-03 Thread Bolke de Bruin
Hi Dan,

(Thread renamed to make sure it does not clash, dev@ now added)

It surprises me that you found regression from 1.8.0 to 1.8.1 as 1.8.1 is very 
much focused on bug fixes. Were the regressions shared yet? 

The whole 1.8.X release will be bug fix focused (per release management) and 
minor feature updates. The 1.9.0 release will be the first release with major 
feature updates. So what you want, more robustness and focus on stability, is 
now underway. I agree with beefing up tests and including the major operators 
in this. Executors should also be on this list btw. Turning on coverage 
reporting might be a first step in helping this (it isn’t the solution 
obviously).

Cheers
Bolke


> On 3 May 2017, at 20:28, Dan Davydov <dan.davy...@airbnb.com> wrote:
> 
> We saw several regressions moving from 1.8.0 to 1.8.1 the first time we 
> tried, and while I think we merged all our fixes to master (not sure if they 
> all made it into 1.8.1 however), we have put releasing on hold due to 
> stability issues from the last couple of releases. It's either the case that:
> A) Airbnb requires more robustness from new releases.
> or
> B) Most companies using Airflow require more robustness and we should halt on 
> feature work until we are more confident in our testing
> 
> I think the biggest problem currently is the lack of unit testing coverage, 
> e.g. when the backfill framework was refactored (which was the right 
> long-term fix), it caused a lot of breakages that weren't caught by tests. I 
> think we need to audit the major operators/classes and beef up the unit 
> testing coverage. The coverage metric does not necessarily cover these cases 
> (e.g. cyclomatic complexity). Writing regression tests is good but we 
> shouldn't have so many new blocker issues in our releases.
> 
> We are fighting some fires internally at the moment (not Airflow related), 
> but Alex and I have been working on some stuff that we will push to the 
> community once we are done. Alex is working on a good solution for python 
> package isolation, and I'm working on integration with Kubernetes at the 
> executor level.
> 
> Feel free to forward any of my messages to the dev mailing list.
> 
> On Wed, May 3, 2017 at 11:18 AM, Bolke de Bruin <bdbr...@gmail.com 
> <mailto:bdbr...@gmail.com>> wrote:
> Grrr, I seriously dislike the send button on the touch bar… here goes again.
> 
> Hi Dan,
> 
> (Please note I would like to forward the next message to dev@, but let me 
> know if you don’t find it comfortable)
> 
> I understand your point. The gap between 1.7.1 was large in terms of 
> functionality changes etc. It was going to be a (bit?) rough and as you guys 
> are using many of the edge cases you probably found more issues than any of 
> us. Still, between 1.8.0 and 1.8.1 we have added many tests (coverage 
> increased from 67% to close to 69%, which is a lot as you know). It would be 
> nice if you can share where your areas of concern are so we can address those 
> and a suggestion on how to proceed with integration tests is also welcome. 
> 
> You guys (=Airbnb) have been a bit quiet over the past couple of days, so I 
> am getting a bit worried in terms of engagement. Is that warranted?
> 
> Cheers
> Bolke
> 
> 
>> On 3 May 2017, at 20:13, Bolke de Bruin <bdbr...@gmail.com 
>> <mailto:bdbr...@gmail.com>> wrote:
>> 
>> Hi Dan,
>> 
>> (Please note I would like to forward the next message to dev@, but let me 
>> know if you don’t find it comfortable)
>> 
>> I understand your point. The gap between 1.7.1 was large in terms of 
>> functionality changes etc. It was going to be a (bit?) rough and as you guys 
>> are using many of the edge cases you probably found more issues than any of 
>> us. Still, between 1.8.0 and 1.8.1 we have added many tests (coverage 
>> increased from 67
>>> On 3 May 2017, at 19:41, Arthur Wiedmer <arthur.wied...@airbnb.com 
>>> <mailto:arthur.wied...@airbnb.com>> wrote:
>>> 
>>> As a counterpoint,
>>> 
>>> I am comfortable voting +1 on this release in the sense that it fixes some 
>>> of the issues with 1.8.0. It is unfortunate that we cannot test it on the 
>>> Airbnb production for now and we should definitely invest in increasing 
>>> testing coverage, but some of the fixes are needed for ease of use/adoption 
>>> (See for instance AIRFLOW-832), and this release is a step in the right 
>>> direction.
>>> 
>>> Best,
>>> Arthur
>>> 
>>> On Wed, May 3, 2017 at 10:30 AM, Dan Davydov <dan.davy...@airbnb.com 
>>> <mailto:dan.davy...@airbnb.com>> wrote:
>>> I'm not comf

Re: last task in the dag is not running

2017-05-03 Thread Bolke de Bruin
Hi Dmitry,

Please provide more information, such as logs and the DAG definition itself. 
This is very little to go on unfortunately.

Bolke

> On 3 May 2017, at 10:22, Dmitry Smirnov  wrote:
> 
> Hi everyone,
> 
> I'm using Airflow version 1.8.0, just upgraded from 1.7.1.3. The issue that
> I'm going to describe started already in 1.7.1.3; I upgraded hoping it
> might help resolve it.
> 
> I have several DAGs for which the *last* task is not moving from queued to
> running.
> These DAGs used to run fine some time ago, but then we had issues with the
> RabbitMQ cluster we use, and after setting it back up, the problem emerged.
> I'm pretty sure the queue is working fine, since all the tasks except the
> very last one are queued automatically and run fine.
> For the sake of testing, I added a copy of the last task to the DAG, and
> interestingly, the task that used to be the last and did not run, now
> started to run normally, but the new last task is stuck.
> I checked logs at the DEBUG level and I could see that scheduler queues the
> tasks, but those tasks don't show up in the Celery/Flower dashboard in the
> corresponding queue.
> When I run the task that is stuck from the webserver interface, they show
> up in the queue in Flower dashboard and run successfully.
> So, overall, it seems that the issue is present with the scheduler but not
> with webserver, and that this issue is only related to the very last task
> in the DAG.
> I'm really stuck now, I would welcome any suggestions / ideas on what can
> be done.
> 
> Thank you in advance!
> BR, Dima
> 
> -- 
> 
> Dmitry Smirnov (MSc.)
> Data Engineer @ Yousician
> mobile: +358 50 3015072



Re: Force DAGs run up to the last task

2017-04-28 Thread Bolke de Bruin
Or use depends on past?

Sent from my iPhone
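The two suggestions in this thread correspond to DAG-level settings (1.8-era parameter names; verify against your version's docs). A rough, illustrative model of the depends_on_past gate — not Airflow's real dependency checker:

```python
# Knobs mentioned in this thread (Airflow 1.8-era parameter names):
#   DAG(..., max_active_runs=1)  -> at most one running DagRun per DAG
#   DAG(..., concurrency=1)      -> at most one running task instance per DAG
#   default_args={"depends_on_past": True}  -> see the gate sketched below

def can_run(depends_on_past, prev_state):
    """Rough model of depends_on_past: with it enabled, a task instance
    waits until the same task's previous scheduled run succeeded.
    (Simplified; Airflow's real dep-checker also handles the first run,
    skipped states, etc.)"""
    if not depends_on_past:
        return True
    return prev_state == "success"
```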

> On 28 Apr 2017, at 12:50, Jeremiah Lowin  wrote:
> 
> Hi David -- you'll want to set the concurrency parameter of your DAG to 1.
> 
> J
> 
>> On Fri, Apr 28, 2017 at 4:12 AM David Batista  wrote:
>> 
>> Hello everyone,
>> 
>> is there a simple way to tell Airflow to only start running another DAG
>> when all the tasks for the current running DAG are completed? i.e., when a
>> DAG is triggered Airflow first runs all the tasks for that DAG, and only
>> then picks up another DAG to run.
>> 
>> 
>> --
>> *David Batista* *Data Engineer**, HelloFresh Global*
>> Saarbrücker Str. 37a | 10405 Berlin
>> d...@hellofresh.com 
>> 


Re: dag file processing times

2017-04-25 Thread Bolke de Bruin
We could of course write a module loader that takes care of the caching and 
maybe even the manifest. This would help with versioning and could look a bit 
like the java class loader (by separating the imported modules or making sure 
we always load the modules when loading dags). Didn’t think about repercussions 
so there might be severe cons. Please note that I don’t think the multiprocess 
processor solves the sys.modules issue entirely: cached modules in the parent 
will still be there, so any dependencies the airflow scheduler itself brings in 
will be in the processor. It is probably enough in 99% of the circumstances 
though.

On the config issue I don’t entirely agree. If you have a config that is 
available outside your dag, this will still be loaded if you do not use 
serialisation. Strengthening my case for just sending DAGs (and if needed 
dependencies) around and not use pickling/serialization (btw the on the wire 
format of marshmallow is json).

Bolke.


> On 25 Apr 2017, at 01:09, Maxime Beauchemin <maximebeauche...@gmail.com> 
> wrote:
> 
> With configuration as code, you can't really know whether the DAG
> definition has changed based on whether the module was altered. This python
> module could be importing other modules that have been changed, could have
> read a config file somewhere on the drive that might have changed, or read
> from a DB that is constantly getting mutated.
> 
> There are also issues around the fact that Python caches modules in
> `sys.modules`, so even though the crawler is re-interpreting modules,
> imported modules wouldn't get re-interpreted [as our DAG authors expected]
> 
> For these reasons [and others I won't get into here], we decided that the
> scheduler would use a subprocess pool and re-interpret the DAGs from
> scratch at every cycle, insulating the different DAGs and guaranteeing no
> interpreter caching.
> 
> Side note: yaml parsing is much more expensive than other markup languages
> and would recommend working around it to store DAG configuration. Our
> longest-to-parse DAGs at Airbnb were reading yaml to build a DAG, and
> I believe someone wrote custom logic to avoid reparsing the yaml at every
> cycle. Parsing equivalent json or hocon was an order of magnitude faster.
> 
> Max
> 
> On Mon, Apr 24, 2017 at 2:55 PM, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
>> Inotify can work without a daemon. Just fire a call to the API when a file
>> changes. Just a few lines in bash.
>> 
>> If you bundle your dependencies in a zip you should be fine with the above.
>> Or if we start using manifests that list the files that are needed in a
>> dag...
>> 
>> 
>> Sent from my iPhone
>> 
>>> On 24 Apr 2017, at 22:46, Dan Davydov <dan.davy...@airbnb.com.INVALID>
>> wrote:
>>> 
>>> One idea to solve this is to use a daemon that uses inotify to watch for
>>> changes in files and then reprocesses just those files. The hard part is
>>> without any kind of dependency/build system for DAGs it can be hard to
>> tell
>>> which DAGs depend on which files.
>>> 
>>> On Mon, Apr 24, 2017 at 1:21 PM, Gerard Toonstra <gtoons...@gmail.com>
>>> wrote:
>>> 
>>>> Hey,
>>>> 
>>>> I've seen some people complain about DAG file processing times. An issue
>>>> was raised about this today:
>>>> 
>>>> https://issues.apache.org/jira/browse/AIRFLOW-1139
>>>> 
>>>> I attempted to provide a good explanation what's going on. Feel free to
>>>> validate and comment.
>>>> 
>>>> 
>>>> I'm noticing that the file processor is a bit naive in the way it
>>>> reprocesses DAGs. It doesn't look at the DAG interval for example, so it
>>>> looks like it reprocesses all files continuously in one big batch, even
>> if
>>>> we can determine that the next "schedule"  for all its dags are in the
>>>> future?
>>>> 
>>>> 
>>>> Wondering if a change in the DagFileProcessingManager could optimize
>> things
>>>> a bit here.
>>>> 
>>>> In the part where it gets the simple_dags from a file it's currently
>>>> processing:
>>>> 
>>>>   for simple_dag in processor.result:
>>>>   simple_dags.append(simple_dag)
>>>> 
>>>> the file_path is in the context and the simple_dags should be able to
>>>> provide the next interval date for each dag in the file.
>>>> 
>>>> The idea is to add files to a sorted deque by "next_schedule_datetime"
>> (the
>>>> minimum next interval date), so that when we build the list
>>>> "files_paths_to_queue", it can remove files that have dags that we know
>>>> won't have a new dagrun for a while.
>>>> 
>>>> One gotcha to resolve after that is dealing with files getting updated
>>>> with new DAGs, changed DAG definitions, renames, and different interval
>>>> schedules.
>>>> 
>>>> Worth a PR to glance over?
>>>> 
>>>> Rgds,
>>>> 
>>>> Gerard
>>>> 
>> 
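Gerard's sorted-deque idea can be sketched with a heap keyed by each file's earliest next-schedule time, so only files whose next run is due get re-queued. The mapping and timestamp representation here are illustrative:

```python
import heapq

def files_due(next_run_by_file, now):
    """Return DAG file paths whose earliest next-schedule time is not in
    the future, soonest-first. A real implementation would keep the heap
    between scheduler cycles and push files back with updated next-run
    times (and handle the gotchas above: new DAGs, renames, etc.)."""
    heap = [(ts, path) for path, ts in next_run_by_file.items()]
    heapq.heapify(heap)
    due = []
    while heap and heap[0][0] <= now:
        due.append(heapq.heappop(heap)[1])
    return due
```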



Re: dag file processing times

2017-04-24 Thread Bolke de Bruin
That would be close to serialization, which you could do with marshmallow (it 
works better than pickle). 

B. 

Sent from my iPhone
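Whatever schema library is used, the idea is to reduce a DAG to a flat record of the fields the scheduler needs and ship it as JSON (which, as noted, is marshmallow's on-the-wire format). A stdlib-only sketch — the field names are illustrative, not Airflow's actual SimpleDag schema:

```python
import json
from datetime import timedelta

def simple_dag_dict(dag_id, schedule_interval, task_ids):
    """Reduce a DAG to the plain fields a scheduler process needs —
    the kind of lossy record a marshmallow schema would emit as JSON.
    Field choice here is illustrative only."""
    return {
        "dag_id": dag_id,
        "schedule_interval_s": schedule_interval.total_seconds(),
        "task_ids": list(task_ids),
    }

# Round-trip over the wire as JSON:
wire = json.dumps(simple_dag_dict("agl_p2p_api_worker_dag",
                                  timedelta(minutes=5),
                                  ["api_worker_task"]))
```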

> On 25 Apr 2017, at 00:07, Alex Guziel <alex.guz...@airbnb.com.INVALID> wrote:
> 
> You can also use reflection in Python to read the modules all the way down.
> 
> On Mon, Apr 24, 2017 at 3:05 PM, Dan Davydov <dan.davy...@airbnb.com.invalid
>> wrote:
> 
>> Was talking with Alex about the DB case offline, for those we could support
>> a force refresh arg with an interval param.
>> 
>> Manifests would need to be hierarchical, but I feel like it would spin out
>> into a full blown build system inevitably.
>> 
>> On Mon, Apr 24, 2017 at 3:02 PM, Arthur Wiedmer <arthur.wied...@gmail.com>
>> wrote:
>> 
>>> What if the DAG actually depends on configuration that only exists in a
>>> database and is retrieved by the Python code generating the DAG?
>>> 
>>> Just asking because we have this case in production here. It is slowly
>>> changing, so still fits within the Airflow framework, but you cannot just
>>> watch a file...
>>> 
>>> Best,
>>> Arthur
>>> 
>>> On Mon, Apr 24, 2017 at 2:55 PM, Bolke de Bruin <bdbr...@gmail.com>
>> wrote:
>>> 
>>>> Inotify can work without a daemon. Just fire a call to the API when a
>>> file
>>>> changes. Just a few lines in bash.
>>>> 
>>>> If you bundle your dependencies in a zip you should be fine with the
>>> above.
>>>> Or if we start using manifests that list the files that are needed in a
>>>> dag...
>>>> 
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>>> On 24 Apr 2017, at 22:46, Dan Davydov <dan.davy...@airbnb.com.
>> INVALID>
>>>> wrote:
>>>>> 
>>>>> One idea to solve this is to use a daemon that uses inotify to watch
>>> for
>>>>> changes in files and then reprocesses just those files. The hard part
>>> is
>>>>> without any kind of dependency/build system for DAGs it can be hard
>> to
>>>> tell
>>>>> which DAGs depend on which files.
>>>>> 
>>>>> On Mon, Apr 24, 2017 at 1:21 PM, Gerard Toonstra <
>> gtoons...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Hey,
>>>>>> 
>>>>>> I've seen some people complain about DAG file processing times. An
>>> issue
>>>>>> was raised about this today:
>>>>>> 
>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1139
>>>>>> 
>>>>>> I attempted to provide a good explanation what's going on. Feel free
>>> to
>>>>>> validate and comment.
>>>>>> 
>>>>>> 
>>>>>> I'm noticing that the file processor is a bit naive in the way it
>>>>>> reprocesses DAGs. It doesn't look at the DAG interval for example,
>> so
>>> it
>>>>>> looks like it reprocesses all files continuously in one big batch,
>>> even
>>>> if
>>>>>> we can determine that the next "schedule"  for all its dags are in
>> the
>>>>>> future?
>>>>>> 
>>>>>> 
>>>>>> Wondering if a change in the DagFileProcessingManager could optimize
>>>> things
>>>>>> a bit here.
>>>>>> 
>>>>>> In the part where it gets the simple_dags from a file it's currently
>>>>>> processing:
>>>>>> 
>>>>>>   for simple_dag in processor.result:
>>>>>>   simple_dags.append(simple_dag)
>>>>>> 
>>>>>> the file_path is in the context and the simple_dags should be able
>> to
>>>>>> provide the next interval date for each dag in the file.
>>>>>> 
>>>>>> The idea is to add files to a sorted deque by
>> "next_schedule_datetime"
>>>> (the
>>>>>> minimum next interval date), so that when we build the list
>>>>>> "files_paths_to_queue", it can remove files that have dags that we
>>> know
>>>>>> won't have a new dagrun for a while.
>>>>>> 
>>>>>> One gotcha to resolve after that is to deal with files getting
>> updated
>>>> with
>>>>>> new dags or changed dag definitions and renames and different
>> interval
>>>>>> schedules.
>>>>>> 
>>>>>> Worth a PR to glance over?
>>>>>> 
>>>>>> Rgds,
>>>>>> 
>>>>>> Gerard
>>>>>> 
>>>> 
>>> 
>> 


Re: dag file processing times

2017-04-24 Thread Bolke de Bruin
Inotify can work without a daemon. Just fire a call to the API when a file 
changes. Just a few lines in bash.

If you bundle your dependencies in a zip you should be fine with the above. Or 
if we start using manifests that list the files that are needed in a dag... 


Sent from my iPhone
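As an illustration of the watch-and-notify idea above — not code from the thread — here is a minimal Python stand-in. Real inotify needs inotify-tools or a third-party binding, so this sketch polls file mtimes instead; the refresh endpoint URL and the DAGs folder path are assumptions:

```python
import os
import time
import urllib.request

DAGS_FOLDER = "/path/to/dags"                   # assumption: wherever DAG files live
REFRESH_URL = "http://localhost:8080/refresh"   # hypothetical API endpoint

def snapshot(folder):
    """Map each .py file under folder to its last-modified time."""
    mtimes = {}
    for root, _, files in os.walk(folder):
        for name in files:
            if name.endswith(".py"):
                path = os.path.join(root, name)
                mtimes[path] = os.path.getmtime(path)
    return mtimes

def watch(folder, interval=5):
    """Poll for changed files and notify the (hypothetical) refresh endpoint."""
    seen = snapshot(folder)
    while True:
        time.sleep(interval)
        current = snapshot(folder)
        changed = [p for p, m in current.items() if seen.get(p) != m]
        for path in changed:
            urllib.request.urlopen(REFRESH_URL + "?file=" + path)
        seen = current
```

With inotify-tools installed, the same effect is a small loop around `inotifywait -m -e modify`, which is presumably the "few lines in bash" meant above.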

> On 24 Apr 2017, at 22:46, Dan Davydov  wrote:
> 
> One idea to solve this is to use a daemon that uses inotify to watch for
> changes in files and then reprocesses just those files. The hard part is
> without any kind of dependency/build system for DAGs it can be hard to tell
> which DAGs depend on which files.
> 
> On Mon, Apr 24, 2017 at 1:21 PM, Gerard Toonstra 
> wrote:
> 
>> Hey,
>> 
>> I've seen some people complain about DAG file processing times. An issue
>> was raised about this today:
>> 
>> https://issues.apache.org/jira/browse/AIRFLOW-1139
>> 
>> I attempted to provide a good explanation of what's going on. Feel free to
>> validate and comment.
>> 
>> 
>> I'm noticing that the file processor is a bit naive in the way it
>> reprocesses DAGs. It doesn't look at the DAG interval for example, so it
>> looks like it reprocesses all files continuously in one big batch, even if
>> we can determine that the next "schedule" for all its dags is in the
>> future?
>> 
>> 
>> Wondering if a change in the DagFileProcessingManager could optimize things
>> a bit here.
>> 
>> In the part where it gets the simple_dags from a file it's currently
>> processing:
>> 
>>    for simple_dag in processor.result:
>>        simple_dags.append(simple_dag)
>> 
>> the file_path is in the context and the simple_dags should be able to
>> provide the next interval date for each dag in the file.
>> 
>> The idea is to add files to a sorted deque by "next_schedule_datetime" (the
>> minimum next interval date), so that when we build the list
>> "files_paths_to_queue", it can remove files that have dags that we know
>> won't have a new dagrun for a while.
>> 
>> One gotcha to resolve after that is to deal with files getting updated with
>> new dags or changed dag definitions and renames and different interval
>> schedules.
>> 
>> Worth a PR to glance over?
>> 
>> Rgds,
>> 
>> Gerard
>> 
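To make the sorted-queue idea in this thread concrete, here is a rough sketch — not Airflow's actual implementation. The `next_schedule()` method on the simple-DAG objects is an assumption; the point is only to keep a min-heap of file paths keyed by the earliest upcoming schedule among a file's DAGs, so files whose DAGs won't run for a while can be skipped when building `files_paths_to_queue`:

```python
import heapq
from datetime import datetime, timedelta

class FileScheduleIndex:
    """Track, per DAG file, the earliest datetime any of its DAGs is next due."""

    def __init__(self):
        self._heap = []  # entries: (next_schedule_datetime, file_path)

    def record(self, file_path, simple_dags):
        # Assumption: each simple_dag exposes a next_schedule() datetime.
        next_due = min(d.next_schedule() for d in simple_dags)
        heapq.heappush(self._heap, (next_due, file_path))

    def files_to_queue(self, now=None):
        """Pop and return every file whose earliest next schedule has passed."""
        now = now or datetime.utcnow()
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[1])
        return due
```

The gotcha raised in the thread still applies: an edit to a file must invalidate its heap entry, since the file may now contain new DAGs, renames, or different schedules.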


Re: [VOTE] Release Airflow 1.8.1 based on Airflow 1.8.1 RC0

2017-04-24 Thread Bolke de Bruin

> On 23 Apr 2017, at 09:17, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
> 
> 
> Sent from my iPhone
> 
>> On 23 Apr 2017, at 03:46, Hitesh Shah <hit...@apache.org> wrote:
>> 
>> On Fri, Apr 21, 2017 at 8:19 AM, Chris Riccomini <criccom...@apache.org>
>> wrote:
>> 
>>> 
>>>> Version in pkg-info has an rc0 notation. It should just be
>>> 1.8.1-incubating.
>>> 
>>> This is a bit tricky to do with Python builds. I don't really want to keep
>>> building RCs with the exact same version number. We bake these RCs in real
>>> environments, so we need to version them with something that distinguishes
>>> one from another. Once we set the version, that propagates into the
>>> pkg-info. The plan is to rebuild the final RC that passes without the rc
>>> notation, so the release doesn't contain it.
>>> 
>>> 
>> I understand the rationale but this means that there is a potential
>> difference in what is being voted upon and what is eventually being
>> published as a release.
>> 
>> thanks
>> -- Hitesh
> 
> Hi Hitesh,
> 
> This is a chicken-and-egg problem. If we put a non-release 1.8.1 online, users 
> will download and install it. An update to this package will not trigger an 
> upgrade on the user's side and it is hard to recognize (one would need to 
> compare signatures). It puts them at risk. 
> 
> Do you know how other python projects solved this? I will reach out to the 
> libcloud guys and ask them how they did it (also python). 
> 
> Bolke

On further research I see that Apache Beam also does an update to the release 
package after voting:

See:
https://github.com/apache/beam/compare/v0.6.0-RC2...release-0.6.0

Please note that there is NO VOTE on 0.6.0, but there is on RC2:
https://lists.apache.org/list.html?d...@beam.apache.org:lte=1y:vote

Libcloud votes twice, although they seem to have a different understanding of a 
RC:
https://lists.apache.org/thread.html/fb9ea1b77daf2abf34e386ede5f897d0d1b1a78921cb99717cef41a9@%3Cdev.libcloud.apache.org%3E
https://lists.apache.org/thread.html/b6eb8199dca753b0020b0427c248c0494a4d45611bbab9885658f7d9@%3Cdev.libcloud.apache.org%3E

Bolke.
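For reference, the rc-suffix scheme discussed here is what PEP 440 pre-release ordering supports: `1.8.1rc1` sorts strictly before `1.8.1`, and pip does not install pre-releases unless explicitly asked. A quick check, assuming the `packaging` library is installed:

```python
from packaging.version import Version

# PEP 440: an rc is a pre-release of the final version, so it sorts
# strictly before it and is skipped by default dependency resolution.
rc = Version("1.8.1rc1")
final = Version("1.8.1")

assert rc < final
assert rc.is_prerelease and not final.is_prerelease
assert rc.release == final.release  # both belong to release 1.8.1
```

This is why rebuilding the passing RC without the rc suffix yields a strictly "newer" artifact so upgrades trigger correctly — at the cost Hitesh notes, that the published bits differ from the voted-on bits.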


Re: [VOTE] Release Airflow 1.8.1 based on Airflow 1.8.1 RC0

2017-04-23 Thread Bolke de Bruin


Sent from my iPhone

> On 23 Apr 2017, at 03:46, Hitesh Shah  wrote:
> 
> On Fri, Apr 21, 2017 at 8:19 AM, Chris Riccomini 
> wrote:
> 
>> 
>>> Version in pkg-info has an rc0 notation. It should just be
>> 1.8.1-incubating.
>> 
>> This is a bit tricky to do with Python builds. I don't really want to keep
>> building RCs with the exact same version number. We bake these RCs in real
>> environments, so we need to version them with something that distinguishes
>> one from another. Once we set the version, that propagates into the
>> pkg-info. The plan is to rebuild the final RC that passes without the rc
>> notation, so the release doesn't contain it.
>> 
>> 
> I understand the rationale but this means that there is a potential
> difference in what is being voted upon and what is eventually being
> published as a release.
> 
> thanks
> -- Hitesh

Hi Hitesh,

This is a chicken-and-egg problem. If we put a non-release 1.8.1 online, users 
will download and install it. An update to this package will not trigger an 
upgrade on the user's side and it is hard to recognize (one would need to 
compare signatures). It puts them at risk. 

Do you know how other python projects solved this? I will reach out to the 
libcloud guys and ask them how they did it (also python). 

Bolke

Re: issue fetching master repo

2017-04-20 Thread Bolke de Bruin
Hi Boris,

To be honest this is not an airflow question, but a git question.

If you haven't made any changes to the code, why don't you delete the test 
folder and clone again?

B.

> On 20 Apr 2017, at 13:42, Boris Tyukin  wrote:
> 
> I just did this
> 
> $ git clone g...@github.com:apache/incubator-airflow.git test
> $ cd test
> $ git status
> 
> and getting this right away -
> # On branch master
> # Changed but not updated:
> #   (use "git add ..." to update what will be committed)
> #   (use "git checkout -- ..." to discard changes in working
> directory)
> #
> # modified:   airflow/www/static/nv.d3.js
> 
> but I did not touch that file. I cannot do rebase or commit:
> 
> cannot rebase: you have unstaged changes
> D airflow/www/static/nv.d3.js
> 
> 
> This is really weird, please help
> 
> 
> 
> 
> On Wed, Apr 19, 2017 at 11:19 PM, Boris Tyukin 
> wrote:
> 
>> hey guys,
>> 
>> want to submit my first tiny PR and once I fork airflow and clone my repo
>> get this message below:
>> 
>> I cannot commit / rebase and I cannot find a way to remove this file. Is
>> it only my who has this issue?
>> 
>> git status
>> # On branch master
>> # Changed but not updated:
>> #   (use "git add ..." to update what will be committed)
>> #   (use "git checkout -- ..." to discard changes in working
>> directory)
>> #
>> # modified:   airflow/www/static/nv.d3.js
>> 
>> 



Re: [VOTE] Release Airflow 1.8.1 based on Airflow 1.8.1 RC0

2017-04-18 Thread Bolke de Bruin
Hey Alex,

I agree with you that they are nice to have, but as you mentioned they are not 
blockers. As we are moving towards time based releases I suggest marking them 
for 1.8.2 and cherry-picking them in your production. 

- Bolke.

> On 18 Apr 2017, at 00:02, Alex Guziel  wrote:
> 
> Sorry about that. FWIW, these were recent and I don't think they were
> blockers but are nice to fix. Particularly, the tree one was forgotten
> about. I remember seeing it at the Airflow hackathon but I guess I forgot
> to correct it.
> 
> On Mon, Apr 17, 2017 at 12:17 PM, Chris Riccomini 
> wrote:
> 
>> :(:(:( Why was this not included in 1.8.1 JIRA? I've been emailing the list
>> all last week
>> 
>> On Mon, Apr 17, 2017 at 11:28 AM, Alex Guziel <
>> alex.guz...@airbnb.com.invalid> wrote:
>> 
>>> I would say to include [1074] (
>>> https://github.com/apache/incubator-airflow/pull/2221) so we don't have
>> a
>>> regression in the release after. I would also say
>>> https://github.com/apache/incubator-airflow/pull/2241 is semi important
>>> but
>>> less so.
>>> 
>>> On Mon, Apr 17, 2017 at 11:24 AM, Chris Riccomini >> 
>>> wrote:
>>> 
 Dear All,
 
 I have been able to make the Airflow 1.8.1 RC0 available at:
 https://dist.apache.org/repos/dist/dev/incubator/airflow, public keys
>>> are
 available at https://dist.apache.org/repos/
>>> dist/release/incubator/airflow.
 
 Issues fixed:
 
 [AIRFLOW-1062] DagRun#find returns wrong result if external_trigg
 [AIRFLOW-1054] Fix broken import on test_dag
 [AIRFLOW-1050] Retries ignored - regression
 [AIRFLOW-1033] TypeError: can't compare datetime.datetime to None
 [AIRFLOW-1030] HttpHook error when creating HttpSensor
 [AIRFLOW-1017] get_task_instance should return None instead of th
 [AIRFLOW-1011] Fix bug in BackfillJob._execute() for SubDAGs
 [AIRFLOW-1001] Landing Time shows "unsupported operand type(s) fo
 [AIRFLOW-1000] Rebrand to Apache Airflow instead of Airflow
 [AIRFLOW-989] Clear Task Regression
 [AIRFLOW-974] airflow.util.file mkdir has a race condition
 [AIRFLOW-906] Update Code icon from lightning bolt to file
 [AIRFLOW-858] Configurable database name for DB operators
 [AIRFLOW-853] ssh_execute_operator.py stdout decode default to A
 [AIRFLOW-832] Fix debug server
 [AIRFLOW-817] Trigger dag fails when using CLI + API
 [AIRFLOW-816] Make sure to pull nvd3 from local resources
 [AIRFLOW-815] Add previous/next execution dates to available def
 [AIRFLOW-813] Fix unterminated unit tests in tests.job (tests/jo
 [AIRFLOW-812] Scheduler job terminates when there is no dag file
 [AIRFLOW-806] UI should properly ignore DAG doc when it is None
 [AIRFLOW-794] Consistent access to DAGS_FOLDER and SQL_ALCHEMY_C
 [AIRFLOW-785] ImportError if cgroupspy is not installed
 [AIRFLOW-784] Cannot install with funcsigs > 1.0.0
 [AIRFLOW-780] The UI no longer shows broken DAGs
 [AIRFLOW-777] dag_is_running is initlialized to True instead of
 [AIRFLOW-719] Skipped operations make DAG finish prematurely
 [AIRFLOW-694] Empty env vars do not overwrite non-empty config v
 [AIRFLOW-139] Executing VACUUM with PostgresOperator
 [AIRFLOW-111] DAG concurrency is not honored
 [AIRFLOW-88] Improve clarity Travis CI reports
 
 I would like to raise a VOTE for releasing 1.8.1 based on release
>>> candidate
 0, i.e. just renaming release candidate 0 to 1.8.1 release.
 
 Please respond to this email by:
 
 +1,0,-1 with *binding* if you are a PMC member or *non-binding* if you
>>> are
 not.
 
 Vote will run for 72 hours (ends this Thursday).
 
 Thanks!
 Chris
 
 My VOTE: +1 (binding)
 
>>> 
>> 



Re: 1.8.1 release update

2017-04-07 Thread Bolke de Bruin
Agree. 

AIRFLOW-1000 can be merged. I think you want to put a big notice somewhere. 

Also some jiras still need to be cherry-picked into 1.8.1 and some will create 
conflicts. 

Bolke

Sent from my iPhone

> On 7 Apr 2017, at 20:16, Chris Riccomini  wrote:
> 
> Hey all,
> 
> I'm still targeting to cut 1.8.1 RC0 for April 17. Doing a checkup to see
> where we stand right now. There's about one week left.
> 
> We look in pretty good shape. There are two blocker bugs left:
> 
> AIRFLOW-1055
> https://issues.apache.org/jira/browse/AIRFLOW-1055
> (no PR)
> I'm going to drop this to critical instead of blocker.
> 
> AIRFLOW-1000
> https://issues.apache.org/jira/browse/AIRFLOW-1000
> https://github.com/apache/incubator-airflow/pull/2172
> @bolke, Can we merge AIRFLOW-1000? Dan gave a LGTM on it.
> 
> Cheers,
> Chris
> 
> On Mon, Apr 3, 2017 at 3:12 PM, Chris Riccomini 
> wrote:
> 
>> Another blocker was added:
>> 
>>  AIRFLOW-1011 https://github.com/apache/incubator-airflow/pull/2179
>> 
>> On Mon, Apr 3, 2017 at 1:29 PM, Chris Riccomini 
>> wrote:
>> 
>>> Hey all,
>>> 
>>> Here's the status of the 1.8.1 release:
>>> 
>>> * Targeting April 17 RC1 vote
>>> * There are five blocker bugs
>>> * We will probably remove AIRFLOW-1055 and AIRFLOW-1019 from the blocker
>>> list for 1.8.1
>>> 
>>> The list of open 1.8.1 issues is here:
>>> 
>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%
>>> 20AIRFLOW%20AND%20status%20in%20(Open%2C%20%22In%20Progress%
>>> 22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.8.1
>>> 
>>> The following blocker bugs have PRs up, and need reviewing:
>>> 
>>>  AIRFLOW-1013 https://github.com/apache/incubator-airflow/pull/2203
>>>  AIRFLOW-1000 https://github.com/apache/incubator-airflow/pull/2172
>>>  AIRFLOW-1001 https://github.com/apache/incubator-airflow/pull/2213
>>> 
>>> PLEASE have a look at this, so we can get them merged.
>>> 
>>> Cheers,
>>> Chris
>>> 
>> 
>> 


Re: PTAL: Airflow 2017 April Podling Report

2017-04-05 Thread Bolke de Bruin
Lgtm2 :-)

Sent from my iPhone

> On 5 Apr 2017, at 21:38, Chris Riccomini  wrote:
> 
> LGTM! Thanks!
> 
> On Wed, Apr 5, 2017 at 11:57 AM, Gurer Kiratli <
> gurer.kira...@airbnb.com.invalid> wrote:
> 
>> Hi folks,
>> 
>> Here is the draft of the podling report. Please take a look and comment. If
>> it looks good one of the committers have to post this on this on the wiki
>> today!
>> 
>> Cheers,
>> 
>> Gurer
>> 
>> 
>> 
 
>> 
>> Airflow
>> 
>> Airflow is a workflow automation and scheduling system that can be used to
>> author and manage data pipelines.
>> 
>> Airflow has been incubating since 2016-03-31.
>> 
>> Three most important issues to address in the move towards graduation:
>> 
>>  1. We will have the 1.8.1 release soon, then we are looking to graduate.
>>  2.
>>  3.
>> 
>> Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
>> aware of?
>> 
>>  None
>> 
>> How has the community developed since the last report?
>> 
>>  1. We had our first official release. 1.8.0 on March 19th 2017.
>>  2. We elected 1 new PPMC Member/Committer: Alex Guziel (a.k.a saguziel)
>>  3. Since our last podling report 3 months ago (i.e. between Jan 1 and Mar
>> 31, inclusive), we grew our contributors from 224 to 256
>>  4. Since our last podling report 3 months ago (i.e. between Jan 1 and Mar
>> 31, inclusive), we resolved 216 pull requests (currently at 1479
>> closed
>> PRs)
>>  5. Two meet-ups, one in New York, NY hosted by Blue Apron and one in San
>> Jose, CA hosted by PayPal were held by the community.
>>  6. Since being accepted into the incubator, the number of companies
>> officially using Apache Airflow has risen from 30 to 83.
>> 
>> How has the project developed since the last report?
>> 
>>  See above
>> 
>> Date of last release:
>> 
>>  March 19th, 2017
>> 
>> When were the last committers or PPMC members elected?
>> 
>>  As mentioned on
>> 
>> https://cwiki.apache.org/confluence/display/AIRFLOW/
>> Announcements#Announcements-Mar14,2017
>>  Alex Guziel joined the Apache Airflow PPMC/Committer group.
>> 
>> Signed-off-by:
>> 
>>  [](airflow) Chris Nauroth
>>  [](airflow) Hitesh Shah
>>  [ ](airflow) Jakob Homan
>> 


Re: Podling Report Reminder - April 2017

2017-04-05 Thread Bolke de Bruin
:-)

If you can please include our intention to graduate after the 1.8.1 release (no 
more reports ;-).

B.

> On 5 Apr 2017, at 19:32, Gurer Kiratli  
> wrote:
> 
> I will do it. I will work on it today as it seems like today is the last
> day to do this.
> 
> On Wed, Apr 5, 2017 at 10:29 AM, Chris Riccomini 
> wrote:
> 
>> Can someone please volunteer to update? Should take ~5 minutes.
>> 
>> On Tue, Apr 4, 2017 at 8:16 PM,  wrote:
>> 
>>> Dear podling,
>>> 
>>> This email was sent by an automated system on behalf of the Apache
>>> Incubator PMC. It is an initial reminder to give you plenty of time to
>>> prepare your quarterly board report.
>>> 
>>> The board meeting is scheduled for Wed, 19 April 2017, 10:30 am PDT.
>>> The report for your podling will form a part of the Incubator PMC
>>> report. The Incubator PMC requires your report to be submitted 2 weeks
>>> before the board meeting, to allow sufficient time for review and
>>> submission (Wed, April 05).
>>> 
>>> Please submit your report with sufficient time to allow the Incubator
>>> PMC, and subsequently board members to review and digest. Again, the
>>> very latest you should submit your report is 2 weeks prior to the board
>>> meeting.
>>> 
>>> Thanks,
>>> 
>>> The Apache Incubator PMC
>>> 
>>> Submitting your Report
>>> 
>>> --
>>> 
>>> Your report should contain the following:
>>> 
>>> *   Your project name
>>> *   A brief description of your project, which assumes no knowledge of
>>>the project or necessarily of its field
>>> *   A list of the three most important issues to address in the move
>>>towards graduation.
>>> *   Any issues that the Incubator PMC or ASF Board might wish/need to be
>>>aware of
>>> *   How has the community developed since the last report
>>> *   How has the project developed since the last report.
>>> *   How does the podling rate their own maturity.
>>> 
>>> This should be appended to the Incubator Wiki page at:
>>> 
>>> https://wiki.apache.org/incubator/April2017
>>> 
>>> Note: This is manually populated. You may need to wait a little before
>>> this page is created from a template.
>>> 
>>> Mentors
>>> ---
>>> 
>>> Mentors should review reports for their project(s) and sign them off on
>>> the Incubator wiki page. Signing off reports shows that you are
>>> following the project - projects that are not signed may raise alarms
>>> for the Incubator PMC.
>>> 
>>> Incubator PMC
>>> 
>> 



Re: Google Summer of Code in Apache Airflow

2017-04-04 Thread Bolke de Bruin
Hey Jakub,

Did you make any progress on this? Do you need any help/advice/assistance?

- Bolke

> On 8 Mar 2017, at 20:09, Jakub Powierza  wrote:
> 
> Hi Gerard,
> Thanks for your reply! I was thinking about a contribution connected with the 
> new DAGs UI or preparing a REST API for the Command Line Interface. These topics 
> seem to be reasonable for GSoC (probably not 100% of these tasks can be 
> completed in three/four months, but maybe some parts/tasks?).
> 
> I'll ask for more information on Gitter or in another email :)
> 
> Thanks,
> Jakub Powierza
> 
>> On 7 Mar 2017, at 14:19, Gerard Toonstra  wrote:
>> 
>> Hi Jakub,
>> 
>> Thanks for considering airflow as a GSoc project. You mentioned you looked
>> at the links on the JIRA, so you would also have seen the roadmap that was
>> established
>> for new features.  It is available here:
>> https://cwiki.apache.org/confluence/display/AIRFLOW/2017+Roadmap+Items
>> 
>> At the moment people are really busy trying to get 1.8 onto the road, which
>> would be a really cool milestone for the project.
>> 
>> Beyond the wiki and this mailing list, there's also a gitter.im chat
>> channel where you can meet some people using or developing on airflow
>> itself and ask
>> other questions.
>> 
>> One way to get started and streamline future contributions is to look at
>> coding standards and how PR's are usually resolved.
>> You can also start looking at the JIRA to look into current issues:
>> 
>> https://issues.apache.org/jira/browse/AIRFLOW-948?jql=project%20%3D%20AIRFLOW
>> 
>> What I can suggest is that you try to develop a proposal (based on the
>> roadmap?) to improve airflow and then find a champion/mentor
>> to help you get there, ideally one of the committers.
>> 
>> Best regards,
>> 
>> Gerard
>> 
>> 
>> On Thu, Mar 2, 2017 at 8:21 AM, Jakub Powierza 
>> wrote:
>> 
>>> Hi all,
>>> I am 3rd year bachelor student from Gdańsk, Poland and I would like to
>>> participate in Google Summer of Code. That would be the first time ever for
>>> me to take a part in this program :)
>>> 
>>> I am currently working at Intel Technology Poland as Graphics Software
>>> Engineer Intern. I have worked here for about 1.5 year. My primary
>>> languages that I’m using (as a day to day coder) are Python and JavaScript.
>>> I’ve been using several frameworks and technologies such as Flask,
>>> SQLAlchemy, RabbitMQ, Redis, AngularJS and many more. Recently, I have been
>>> using Apache Airflow as a primary framework and I find it very useful and
>>> promising for the future :)
>>> 
>>> Every day is another adventure, so I’m currently focusing on many other
>>> technologies that can broaden my horizons on IT. That’s why I’m taking a
>>> part in one of the biggest competitions on Kaggle’s platform - Data Science
>>> Bowl 2017. I was asked by my professor to join the team of other passionate
>>> people and together with VoiceLab (local company specialised in machine
>>> learning) try to improve lung cancer detection. That’s my first touch with
>>> machine learning and especially neural networks in Keras/TensorFlow. It
>>> changed my perspectives for the directions in which computer science heads
>>> towards :)
>>> 
>>> Each new technology/project/task gives me new knowledge and skills that (I
>>> hope) will make me a better developer in the future!
>>> 
>>> I have seen that Apache have proposed a few projects ideas for this year.
>>> But I have got a question to you. Is there a way to join you and help with
>>> this great framework? I’ve checked your Jira board and there are many ideas
>>> for features/improvements that can be a great GSoC topic :)
>>> 
>>> I hope to hear from you soon!
>>> 
>>> Thanks,
>>> Jakub Powierza
> 



Re: 1.8.1 release

2017-04-03 Thread Bolke de Bruin
Done.


> On 3 Apr 2017, at 22:22, Chris Riccomini <criccom...@apache.org> wrote:
> 
> Hey Bolke,
> 
> > Furthermore, do you have any timelines for getting to the release
> 
> Let's target April 17 to begin a vote for RC.
> 
> > Can I help in any way
> 
> I need people to review things. The more reviewers, the better.
> 
> Also, can you please reach a conclusion on whether the issues you raised 
> (AIRFLOW-1019 and AIRFLOW-1013) are blockers or not (or just change the 
> status unilaterally if people aren't responding)?
> 
> Cheers,
> Chris
> 
> On Mon, Apr 3, 2017 at 1:06 PM, Bolke de Bruin <bdbr...@gmail.com 
> <mailto:bdbr...@gmail.com>> wrote:
> I have a PR out for 
> 
> * AIRFLOW-1001: https://github.com/apache/incubator-airflow/pull/2213 
> <https://github.com/apache/incubator-airflow/pull/2213>
> * AIRFLOW-1018: https://github.com/apache/incubator-airflow/pull/2212 
> <https://github.com/apache/incubator-airflow/pull/2212>
> 
> Furthermore, Sid, I checked AIRFLOW-1053 and while it is annoying I don’t 
> think it is a blocker: it happens only with @once dags that have a SLA, 
> hardly very common. Nevertheless a fix would be nice obviously.
> 
> Bolke
> 
>> On 3 Apr 2017, at 11:05, Bolke de Bruin <bdbr...@gmail.com 
>> <mailto:bdbr...@gmail.com>> wrote:
>> 
>> Hey Chris,
>> 
>> AIRFLOW-1000 has a PR out. You might want to discuss it on the list what the 
>> impact is though.
>> AIRFLOW-1018 I’ll have a PR out for shortly that allows “stdout” for the 
>> scheduler log files. I did downgrade from blocker to critical.
>> AIRFLOW-719 Has a PR, including much needed increased test coverage, that I 
>> am pretty sure is working, but needs verification (plz @Alex)
>> 
>> I would downgrade AIRFLOW-1019 to critical - especially as Dan is not around 
>> at the moment.
>> 
>> Furthermore, do you have any timelines for getting to the release? Can I 
>> help in any way? You might want to chase a couple of times ;-).
>> 
>> Bolke
>> 
>>> On 30 Mar 2017, at 19:48, siddharth anand <san...@apache.org 
>>> <mailto:san...@apache.org>> wrote:
>>> 
>>> Chris,
>>> I've submitted PRs for :
>>> 
>>>  - PR [AIRFLOW-1013] :
>>>  https://github.com/apache/incubator-airflow/pull/2203 
>>> <https://github.com/apache/incubator-airflow/pull/2203>
>>>  - PR [AIRFLOW-1054]:
>>>  https://github.com/apache/incubator-airflow/pull/2201 
>>> <https://github.com/apache/incubator-airflow/pull/2201>
>>> 
>>> And filed a blocker for a new issue. Essentially, @once DAGs cannot be
>>> created if catchup=False :
>>> https://issues.apache.org/jira/browse/AIRFLOW-1055 
>>> <https://issues.apache.org/jira/browse/AIRFLOW-1055>
>>> 
>>> I have a PR that works for this, but will need to add unit tests for it as
>>> well as for AIRFLOW-1013.
>>> 
>>> -s
>>> 
>>> On Wed, Mar 29, 2017 at 3:24 PM, siddharth anand <san...@apache.org 
>>> <mailto:san...@apache.org>> wrote:
>>> 
>>>> Didn't realize https://issues.apache.org/jira/browse/AIRFLOW-1013 
>>>> <https://issues.apache.org/jira/browse/AIRFLOW-1013> was a
>>>> blocker. I will have a PR shortly.
>>>> -s
>>>> 
>>>> On Wed, Mar 29, 2017 at 2:07 PM, Chris Riccomini <criccom...@apache.org 
>>>> <mailto:criccom...@apache.org>>
>>>> wrote:
>>>> 
>>>>> The following three JIRAs were not merged into the v1-8-test branch, but
>>>>> are listed as part of the 1.8.1 release:
>>>>> 
>>>>> AIRFLOW-1017 b2b9587cca9195229ab107394ad94b7702c70e37
>>>>> AIRFLOW-906 bc47200711be4d2c0b36b772651dae4f5e01a204
>>>>> AIRFLOW-858 94dc7fb0a6bb3c563d9df6566cd52a59bd0c4629
>>>>> AIRFLOW-832 b0ae70d3a8e935dc9266b6853683ae5375a7390b
>>>>> 
>>>>> I'm going to merge them in now.
>>>>> 
>>>>> On Wed, Mar 29, 2017 at 1:53 PM, Chris Riccomini <criccom...@apache.org 
>>>>> <mailto:criccom...@apache.org>>
>>>>> wrote:
>>>>> 
>>>>>> Hey Bolke,
>>>>>> 
>>>>>> Great. Assuming your PR is committed, that leaves five blockers:
>>>>>> 
>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1000 
>>>>>> <https://issues.apache.org/jira/browse/AIRFLOW-1

Re: 1.8.1 release

2017-04-03 Thread Bolke de Bruin
I have a PR out for 

* AIRFLOW-1001: https://github.com/apache/incubator-airflow/pull/2213 
<https://github.com/apache/incubator-airflow/pull/2213>
* AIRFLOW-1018: https://github.com/apache/incubator-airflow/pull/2212 
<https://github.com/apache/incubator-airflow/pull/2212>

Furthermore, Sid, I checked AIRFLOW-1053 and while it is annoying I don’t think 
it is a blocker: it happens only with @once dags that have a SLA, hardly very 
common. Nevertheless a fix would be nice obviously.

Bolke

> On 3 Apr 2017, at 11:05, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
> Hey Chris,
> 
> AIRFLOW-1000 has a PR out. You might want to discuss it on the list what the 
> impact is though.
> AIRFLOW-1018 I’ll have a PR out for shortly that allows “stdout” for the 
> scheduler log files. I did downgrade from blocker to critical.
> AIRFLOW-719 Has a PR, including much needed increased test coverage, that I 
> am pretty sure is working, but needs verification (plz @Alex)
> 
> I would downgrade AIRFLOW-1019 to critical - especially as Dan is not around 
> at the moment.
> 
> Furthermore, do you have any timelines for getting to the release? Can I help 
> in any way? You might want to chase a couple of times ;-).
> 
> Bolke
> 
>> On 30 Mar 2017, at 19:48, siddharth anand <san...@apache.org> wrote:
>> 
>> Chris,
>> I've submitted PRs for :
>> 
>>  - PR [AIRFLOW-1013] :
>>  https://github.com/apache/incubator-airflow/pull/2203
>>  - PR [AIRFLOW-1054]:
>>  https://github.com/apache/incubator-airflow/pull/2201
>> 
>> And filed a blocker for a new issue. Essentially, @once DAGs cannot be
>> created if catchup=False :
>> https://issues.apache.org/jira/browse/AIRFLOW-1055
>> 
>> I have a PR that works for this, but will need to add unit tests for it as
>> well as for AIRFLOW-1013.
>> 
>> -s
>> 
>> On Wed, Mar 29, 2017 at 3:24 PM, siddharth anand <san...@apache.org> wrote:
>> 
>>> Didn't realize https://issues.apache.org/jira/browse/AIRFLOW-1013 was a
>>> blocker. I will have a PR shortly.
>>> -s
>>> 
>>> On Wed, Mar 29, 2017 at 2:07 PM, Chris Riccomini <criccom...@apache.org>
>>> wrote:
>>> 
>>>> The following three JIRAs were not merged into the v1-8-test branch, but
>>>> are listed as part of the 1.8.1 release:
>>>> 
>>>> AIRFLOW-1017 b2b9587cca9195229ab107394ad94b7702c70e37
>>>> AIRFLOW-906 bc47200711be4d2c0b36b772651dae4f5e01a204
>>>> AIRFLOW-858 94dc7fb0a6bb3c563d9df6566cd52a59bd0c4629
>>>> AIRFLOW-832 b0ae70d3a8e935dc9266b6853683ae5375a7390b
>>>> 
>>>> I'm going to merge them in now.
>>>> 
>>>> On Wed, Mar 29, 2017 at 1:53 PM, Chris Riccomini <criccom...@apache.org>
>>>> wrote:
>>>> 
>>>>> Hey Bolke,
>>>>> 
>>>>> Great. Assuming your PR is committed, that leaves five blockers:
>>>>> 
>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1000
>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1001
>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1013
>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1018
>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1019
>>>>> 
>>>>> I've also got a list of all open 1.8.1 JIRAs [1].
>>>>> 
>>>>> Cheers,
>>>>> Chris
>>>>> 
>>>>> [1] https://issues.apache.org/jira/issues/?jql=project%20%
>>>>> 3D%20AIRFLOW%20AND%20status%20in%20(Open%2C%20%22In%
>>>>> 20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.8.1
>>>>> 
>>>>> On Mon, Mar 27, 2017 at 8:59 PM, Bolke de Bruin <bdbr...@gmail.com>
>>>> wrote:
>>>>> 
>>>>>> Hi Chris,
>>>>>> 
>>>>>> I have a PR out for
>>>>>> 
>>>>>> * Revert of 719, which makes 982 obsolete and removes 983 from the
>>>>>> blockers list and just a new feature.
>>>>>> 
>>>>>> See: https://github.com/apache/incubator-airflow/pull/2195 <
>>>>>> https://github.com/apache/incubator-airflow/pull/2195>
>>>>>> 
>>>>>> Cc: @alexvanboxel
>>>>>> 
>>>>>> Bolke
>>>>>> 
>>>>>>> On 24 Mar 2017, at 10:21, Chris Riccomini <criccom...@apache.org>
>>>>>> wrote:
>>>>>>> 

Re: 1.8.1 release

2017-04-03 Thread Bolke de Bruin
Hey Chris,

AIRFLOW-1000 has a PR out. You might want to discuss it on the list what the 
impact is though.
AIRFLOW-1018 I’ll have a PR out for shortly that allows “stdout” for the 
scheduler log files. I did downgrade from blocker to critical.
AIRFLOW-719 Has a PR, including much needed increased test coverage, that I am 
pretty sure is working, but needs verification (plz @Alex)

I would downgrade AIRFLOW-1019 to critical - especially as Dan is not around at 
the moment.

Furthermore, do you have any timelines for getting to the release? Can I help 
in any way? You might want to chase a couple of times ;-).

Bolke

> On 30 Mar 2017, at 19:48, siddharth anand <san...@apache.org> wrote:
> 
> Chris,
> I've submitted PRs for :
> 
>   - PR [AIRFLOW-1013] :
>   https://github.com/apache/incubator-airflow/pull/2203
>   - PR [AIRFLOW-1054]:
>   https://github.com/apache/incubator-airflow/pull/2201
> 
> And filed a blocker for a new issue. Essentially, @once DAGs cannot be
> created if catchup=False :
> https://issues.apache.org/jira/browse/AIRFLOW-1055
> 
> I have a PR that works for this, but will need to add unit tests for it as
> well as for AIRFLOW-1013.
> 
> -s
> 
> On Wed, Mar 29, 2017 at 3:24 PM, siddharth anand <san...@apache.org> wrote:
> 
>> Didn't realize https://issues.apache.org/jira/browse/AIRFLOW-1013 was a
>> blocker. I will have a PR shortly.
>> -s
>> 
>> On Wed, Mar 29, 2017 at 2:07 PM, Chris Riccomini <criccom...@apache.org>
>> wrote:
>> 
>>> The following three JIRAs were not merged into the v1-8-test branch, but
>>> are listed as part of the 1.8.1 release:
>>> 
>>> AIRFLOW-1017 b2b9587cca9195229ab107394ad94b7702c70e37
>>> AIRFLOW-906 bc47200711be4d2c0b36b772651dae4f5e01a204
>>> AIRFLOW-858 94dc7fb0a6bb3c563d9df6566cd52a59bd0c4629
>>> AIRFLOW-832 b0ae70d3a8e935dc9266b6853683ae5375a7390b
>>> 
>>> I'm going to merge them in now.
>>> 
>>> On Wed, Mar 29, 2017 at 1:53 PM, Chris Riccomini <criccom...@apache.org>
>>> wrote:
>>> 
>>>> Hey Bolke,
>>>> 
>>>> Great. Assuming your PR is committed, that leaves five blockers:
>>>> 
>>>> https://issues.apache.org/jira/browse/AIRFLOW-1000
>>>> https://issues.apache.org/jira/browse/AIRFLOW-1001
>>>> https://issues.apache.org/jira/browse/AIRFLOW-1013
>>>> https://issues.apache.org/jira/browse/AIRFLOW-1018
>>>> https://issues.apache.org/jira/browse/AIRFLOW-1019
>>>> 
>>>> I've also got a list of all open 1.8.1 JIRAs [1].
>>>> 
>>>> Cheers,
>>>> Chris
>>>> 
>>>> [1] https://issues.apache.org/jira/issues/?jql=project%20%
>>>> 3D%20AIRFLOW%20AND%20status%20in%20(Open%2C%20%22In%
>>>> 20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.8.1
>>>> 
>>>> On Mon, Mar 27, 2017 at 8:59 PM, Bolke de Bruin <bdbr...@gmail.com>
>>> wrote:
>>>> 
>>>>> Hi Chris,
>>>>> 
>>>>> I have a PR out for
>>>>> 
>>>>> * Revert of 719, which makes 982 obsolete and removes 983 from the
>>>>> blockers list, leaving it as just a new feature.
>>>>> 
>>>>> See: https://github.com/apache/incubator-airflow/pull/2195 <
>>>>> https://github.com/apache/incubator-airflow/pull/2195>
>>>>> 
>>>>> Cc: @alexvanboxel
>>>>> 
>>>>> Bolke
>>>>> 
>>>>>> On 24 Mar 2017, at 10:21, Chris Riccomini <criccom...@apache.org>
>>>>> wrote:
>>>>>> 
>>>>>> Hey all,
>>>>>> 
>>>>>> I've let this thread sit for a while. Here are a list of the issues
>>> that
>>>>>> were raised:
>>>>>> 
>>>>>> BLOCKERS:
>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-982
>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-983
>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1019
>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1017
>>>>>> 
>>>>>> NICE TO HAVE:
>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1015
>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1013
>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1004
>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1003
>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1001
>

Re: Scheduler silently dies

2017-03-27 Thread Bolke de Bruin
 {driver.py:120} INFO - Generating grammar tables
> from /usr/lib/python2.7/lib2to3/Grammar.txt
> [2017-03-25 02:32:54,773] {driver.py:120} INFO - Generating grammar tables
> from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
> Logging into:
> /opt/airflow/logs/AUTOMATOR-sensor-v2/SENSOR--task-2/2017-03-25T02:25:00
> Logging into:
> /opt/airflow/logs/AUTOMATOR-sensor-v2/SENSOR--task-3/2017-03-25T02:25:00
> Logging into:
> /opt/airflow/logs/AUTOMATOR-sensor-v2/SENSOR--task-4/2017-03-25T02:25:00
> Logging into:
> /opt/airflow/logs/AUTOMATOR-sensor-v2/SENSOR--task-5/2017-03-25T02:25:00
> Logging into:
> /opt/airflow/logs/AUTOMATOR-sensor-v2/SENSOR--task-6/2017-03-25T02:25:00
> Logging into:
> /opt/airflow/logs/AUTOMATOR-sensor-v2/SENSOR--task-1/2017-03-25T02:25:00
> 
> And that's the last log I have before restarting it.
> 
> Not sure if this is at all helpful,
> -N
> nik.hodgkin...@collectivehealth.com
> 
> 
> On Mon, Mar 27, 2017 at 12:40 PM, Gerard Toonstra <gtoons...@gmail.com>
> wrote:
> 
>> Any more info from grepping that log file?
>> 
>> G>
>> 
>> On Mon, Mar 27, 2017 at 9:26 PM, Nicholas Hodgkinson <
>> nik.hodgkin...@collectivehealth.com> wrote:
>> 
>>> from airflow.cfg:
>>> 
>>> [core]
>>> ...
>>> executor = LocalExecutor
>>> parallelism = 32
>>> dag_concurrency = 16
>>> dags_are_paused_at_creation = True
>>> non_pooled_task_slot_count = 128
>>> max_active_runs_per_dag = 16
>>> ...
>>> 
>>> Pretty much the defaults; I've never tweaked these values.
>>> 
>>> 
>>> 
>>> -N
>>> nik.hodgkin...@collectivehealth.com
>>> 
>>> On Mon, Mar 27, 2017 at 12:12 PM, Gerard Toonstra <gtoons...@gmail.com>
>>> wrote:
>>> 
>>>> So it looks like the LocalWorkers are dying. Airflow does not recover from
>>>> that.
>>>> 
>>>> 
>>>> In SchedulerJob (jobs.py), you can see the "_execute_helper"  function.
>>>> This calls "executor.start()", which is implemented
>>>> in local_executor.py in your case.
>>>> 
>>>> The LocalExecutor is thus an object owned by the SchedulerJob. This
>>>> executor creates x (parallelism) LocalWorkers,
>>>> which derive from a multiprocessing.Process class. So the processes you
>>> see
>>>> "extra" on the scheduler are those LocalWorkers
>>>> as child processes. The LocalWorkers create additional processes
>> through
>>> a
>>>> shell ("subprocess.check_call" with (shell=True)),
>>>> which are the things doing the actual work.
>>>> 
>>>> 
>>>> Before that, on my 'master' here, the LocalWorker issues a
>>>> self.logger.info("{} running {}"), which you can find in the general
>>>> output of the scheduler log file. When starting the scheduler with
>>> "airflow
>>>> scheduler", it's what gets printed on the console and starts
>>>> with "Starting the scheduler". That is the file you want to
>> investigate.
>>>> 
>>>> If anything bad happens with general processing, then it prints a:
>>>> 
>>>>self.logger.error("failed to execute task
>>>> {}:".format(str(e)))
>>>> 
>>>> in the exception handler. I'd grep for that "failed to execute task" in
>>> the
>>>> scheduler log file I mentioned.
>>>> 
>>>> 
>>>> I'm not sure where stdout/stderr go for these workers. If the call
>>>> basically succeeded, but there were issues with the queue handling,
>>>> then I'd expect this to go to stderr instead. I'm not 100% sure if that
>>>> gets sent to the same scheduler log file or whether that goes nowhere
>>>> because of it being a child process (they're probably inherited?).
>>>> 
>>>> 
>>>> One further question: what's your parallelism set to?  I see 22
>> zombies
>>>> left behind. Is that your setting?
>>>> 
>>>> Let us know!
>>>> 
>>>> Rgds,
>>>> 
>>>> Gerard
>>>> 
>>>> 
>>>> 
>>>> On Mon, Mar 27, 2017 at 8:13 PM, harish singh <
>> harish.sing...@gmail.com>
>>>> wrote:
>>>> 
>>>>> 1.8:  increasing DA

Re: Scheduler silently dies

2017-03-27 Thread Bolke de Bruin
Defunct children mean we are not reaping them. So the "recovering" thing might 
be partially right; we probably need to build some monitoring mechanism into 
the local executor. 

B. 

Sent from my iPhone

> On 27 Mar 2017, at 12:40, Gerard Toonstra <gtoons...@gmail.com> wrote:
> 
> Any more info from grepping that log file?
> 
> G>
> 
> On Mon, Mar 27, 2017 at 9:26 PM, Nicholas Hodgkinson <
> nik.hodgkin...@collectivehealth.com> wrote:
> 
>> from airflow.cfg:
>> 
>> [core]
>> ...
>> executor = LocalExecutor
>> parallelism = 32
>> dag_concurrency = 16
>> dags_are_paused_at_creation = True
>> non_pooled_task_slot_count = 128
>> max_active_runs_per_dag = 16
>> ...
>> 
>> Pretty much the defaults; I've never tweaked these values.
>> 
>> 
>> 
>> -N
>> nik.hodgkin...@collectivehealth.com
>> 
>> On Mon, Mar 27, 2017 at 12:12 PM, Gerard Toonstra <gtoons...@gmail.com>
>> wrote:
>> 
>>> So it looks like the LocalWorkers are dying. Airflow does not recover from
>>> that.
>>> 
>>> 
>>> In SchedulerJob (jobs.py), you can see the "_execute_helper"  function.
>>> This calls "executor.start()", which is implemented
>>> in local_executor.py in your case.
>>> 
>>> The LocalExecutor is thus an object owned by the SchedulerJob. This
>>> executor creates x (parallelism) LocalWorkers,
>>> which derive from a multiprocessing.Process class. So the processes you
>> see
>>> "extra" on the scheduler are those LocalWorkers
>>> as child processes. The LocalWorkers create additional processes through
>> a
>>> shell ("subprocess.check_call" with (shell=True)),
>>> which are the things doing the actual work.
>>> 
>>> 
>>> Before that, on my 'master' here, the LocalWorker issues a
>>> self.logger.info("{} running {}"), which you can find in the general
>>> output of the scheduler log file. When starting the scheduler with
>> "airflow
>>> scheduler", it's what gets printed on the console and starts
>>> with "Starting the scheduler". That is the file you want to investigate.
>>> 
>>> If anything bad happens with general processing, then it prints a:
>>> 
>>>self.logger.error("failed to execute task
>>> {}:".format(str(e)))
>>> 
>>> in the exception handler. I'd grep for that "failed to execute task" in
>> the
>>> scheduler log file I mentioned.
>>> 
>>> 
>>> I'm not sure where stdout/stderr go for these workers. If the call
>>> basically succeeded, but there were issues with the queue handling,
>>> then I'd expect this to go to stderr instead. I'm not 100% sure if that
>>> gets sent to the same scheduler log file or whether that goes nowhere
>>> because of it being a child process (they're probably inherited?).
>>> 
>>> 
>>> One further question: what's your parallelism set to?  I see 22 zombies
>>> left behind. Is that your setting?
>>> 
>>> Let us know!
>>> 
>>> Rgds,
>>> 
>>> Gerard
>>> 
>>> 
>>> 
>>> On Mon, Mar 27, 2017 at 8:13 PM, harish singh <harish.sing...@gmail.com>
>>> wrote:
>>> 
>>>> 1.8:  increasing DAGBAG_IMPORT_TIMEOUT helps. I don't see the issue
>>>> (although not sure why tasks progress has become slow? But thats not
>> the
>>>> issue we are discussing here. So I am ignoring that here)
>>>> 
>>>> 1.7:  our prod is running 1.7 and we havent seen the "defunct process"
>>>> issue for more than a week now. But we saw something very close to what
>>>> Nicholas provided (localexecutor, we do not use --num-runs)
>>>> Not sure if cpu/memory limit may lead to this issue. Often when we hit
>>> this
>>>> issue (which stalled the pipeline), we either increased the memory
>> and/or
>>>> moved airflow to a bulkier (cpu) instance.
>>>> 
>>>> Sorry for a late reply. Was out of town over the weekend.
>>>> 
>>>> 
>>>> 
>>>> On Mon, Mar 27, 2017 at 10:47 AM, Nicholas Hodgkinson <
>>>> nik.hodgkin...@collectivehealth.com> wrote:
>>>> 
>>>>> 1.7.1.3, however it seems this is still an issue in 1.8 according to
>>>>

Re: Scheduler silently dies

2017-03-25 Thread Bolke de Bruin
In case you *think* you have encountered a scheduler *hang*, please provide a 
strace on the parent process, provide process list output that shows defunct 
scheduler processes, and provide *all* logging (main logs, scheduler processing 
log, task logs), preferably in debug mode (settings.py). Also show memory 
limits, CPU count and airflow.cfg.
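
For example, the defunct-children part of that report could be gathered with a 
small helper like the one below (a sketch; process names, output handling, and 
the strace invocation itself, e.g. `strace -f -p <scheduler pid>`, depend on 
your setup):

```python
import subprocess

def defunct_processes():
    """Return `ps -ef` lines for defunct (zombie) processes, [] if ps is absent."""
    try:
        out = subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout
    except FileNotFoundError:
        return []
    # Zombie processes show up with "<defunct>" in the command column.
    return [line for line in out.splitlines() if "defunct" in line]

zombies = defunct_processes()
print(len(zombies), "defunct processes found")
```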

Thanks
Bolke


> On 25 Mar 2017, at 18:16, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
> Please specify what “stop doing its job” means. It doesn’t log anything 
> anymore? If it does, the scheduler hasn’t died and hasn’t stopped.
> 
> B.
> 
> 
>> On 24 Mar 2017, at 18:20, Gael Magnan <gaelmag...@gmail.com> wrote:
>> 
>> We encountered the same kind of problem with the scheduler that stopped
>> doing its job even after rebooting. I thought changing the start date or
>> the state of a task instance might be to blame but I've never been able to
>> pinpoint the problem either.
>> 
>> We are using celery and docker if it helps.
>> 
>> Le sam. 25 mars 2017 à 01:53, Bolke de Bruin <bdbr...@gmail.com> a écrit :
>> 
>>> We are running *without* num runs for over a year (and never have). It is
>>> a very elusive issue which has not been reproducible.
>>> 
>>> I like more info on this but it needs to be very elaborate even to the
>>> point of access to the system exposing the behavior.
>>> 
>>> Bolke
>>> 
>>> Sent from my iPhone
>>> 
>>>> On 24 Mar 2017, at 16:04, Vijay Ramesh <vi...@change.org> wrote:
>>>> 
>>>> We literally have a cron job that restarts the scheduler every 30 min.
>>> Num
>>>> runs didn't work consistently in rc4, sometimes it would restart itself
>>> and
>>>> sometimes we'd end up with a few zombie scheduler processes and things
>>>> would get stuck. Also running locally, without celery.
>>>> 
>>>>> On Mar 24, 2017 16:02, <lro...@quartethealth.com> wrote:
>>>>> 
>>>>> We have max runs set and still hit this. Our solution is dumber:
>>>>> monitoring log output, and kill the scheduler if it stops emitting.
>>> Works
>>>>> like a charm.
>>>>> 
>>>>>> On Mar 24, 2017, at 5:50 PM, F. Hakan Koklu <fhakan.ko...@gmail.com>
>>>>> wrote:
>>>>>> 
>>>>>> Some solutions to this problem is restarting the scheduler frequently
>>> or
>>>>>> some sort of monitoring on the scheduler. We have set up a dag that
>>> pings
>>>>>> cronitor <https://cronitor.io/> (a dead man's snitch type of service)
>>>>> every
>>>>>> 10 minutes and the snitch pages you when the scheduler dies and does
>>> not
>>>>>> send a ping to it.
>>>>>> 
>>>>>> On Fri, Mar 24, 2017 at 1:49 PM, Andrew Phillips <
>>> aphill...@qrmedia.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> We use celery and run into it from time to time.
>>>>>>>> 
>>>>>>> 
>>>>>>> Bang goes my theory ;-) At least, assuming it's the same underlying
>>>>>>> cause...
>>>>>>> 
>>>>>>> Regards
>>>>>>> 
>>>>>>> ap
>>>>>>> 
>>>>> 
>>> 
> 



Re: Scheduler silently dies

2017-03-25 Thread Bolke de Bruin
Please specify what “stop doing its job” means. It doesn’t log anything 
anymore? If it does, the scheduler hasn’t died and hasn’t stopped.

B.


> On 24 Mar 2017, at 18:20, Gael Magnan <gaelmag...@gmail.com> wrote:
> 
> We encountered the same kind of problem with the scheduler that stopped
> doing its job even after rebooting. I thought changing the start date or
> the state of a task instance might be to blame but I've never been able to
> pinpoint the problem either.
> 
> We are using celery and docker if it helps.
> 
> Le sam. 25 mars 2017 à 01:53, Bolke de Bruin <bdbr...@gmail.com> a écrit :
> 
>> We are running *without* num runs for over a year (and never have). It is
>> a very elusive issue which has not been reproducible.
>> 
>> I like more info on this but it needs to be very elaborate even to the
>> point of access to the system exposing the behavior.
>> 
>> Bolke
>> 
>> Sent from my iPhone
>> 
>>> On 24 Mar 2017, at 16:04, Vijay Ramesh <vi...@change.org> wrote:
>>> 
>>> We literally have a cron job that restarts the scheduler every 30 min.
>> Num
>>> runs didn't work consistently in rc4, sometimes it would restart itself
>> and
>>> sometimes we'd end up with a few zombie scheduler processes and things
>>> would get stuck. Also running locally, without celery.
>>> 
>>>> On Mar 24, 2017 16:02, <lro...@quartethealth.com> wrote:
>>>> 
>>>> We have max runs set and still hit this. Our solution is dumber:
>>>> monitoring log output, and kill the scheduler if it stops emitting.
>> Works
>>>> like a charm.
>>>> 
>>>>> On Mar 24, 2017, at 5:50 PM, F. Hakan Koklu <fhakan.ko...@gmail.com>
>>>> wrote:
>>>>> 
>>>>> Some solutions to this problem is restarting the scheduler frequently
>> or
>>>>> some sort of monitoring on the scheduler. We have set up a dag that
>> pings
>>>>> cronitor <https://cronitor.io/> (a dead man's snitch type of service)
>>>> every
>>>>> 10 minutes and the snitch pages you when the scheduler dies and does
>> not
>>>>> send a ping to it.
>>>>> 
>>>>> On Fri, Mar 24, 2017 at 1:49 PM, Andrew Phillips <
>> aphill...@qrmedia.com>
>>>>> wrote:
>>>>> 
>>>>>> We use celery and run into it from time to time.
>>>>>>> 
>>>>>> 
>>>>>> Bang goes my theory ;-) At least, assuming it's the same underlying
>>>>>> cause...
>>>>>> 
>>>>>> Regards
>>>>>> 
>>>>>> ap
>>>>>> 
>>>> 
>> 



Re: Scheduler silently dies

2017-03-25 Thread Bolke de Bruin
Hi Harish,

The below does *not* indicate a scheduler hang; it is a valid exception, as 
mentioned earlier.

Bolke.

> On 24 Mar 2017, at 19:07, harish singh <harish.sing...@gmail.com> wrote:
> 
> We have been using (1.7) over a year and never faced this issue.
> The moment we switched to 1.8, I think we have hit this issue.
> The reason why I say "I think" is because I am not sure if it is the same
> issue. But whenever I restart, my pipeline proceeds.
> 
> 
> 
> *Airflow 1.7:* Having said that, in 1.7 I did face a similar issue (less
> than 5 times over a year):
> I saw that there were a lot of processes marked "<defunct>" with the parent
> process being "scheduler".
> 
> Somebody mentioned it in this jira ->
> https://issues.apache.org/jira/browse/AIRFLOW-401
> Workaround: restart the scheduler.
> 
> 
> 
> 
> *Airflow 1.8:* Now the issue in 1.8 may be different than the issue in
> 1.7. But again the issue gets solved and the pipeline progresses on a
> SCHEDULER RESTART. If it may help, this is the trace in 1.8:
> [2017-03-22 19:35:16,332] {models.py:167} INFO - Filling up the DagBag from /usr/local/airflow/pipeline/pipeline.py
> [2017-03-22 19:35:22,451] {airflow_configuration.py:40} INFO - loading setup.cfg file
> [2017-03-22 19:35:51,041] {timeout.py:37} ERROR - Process timed out
> [2017-03-22 19:35:51,041] {models.py:266} ERROR - Failed to import: /usr/local/airflow/pipeline/pipeline.py
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 263, in process_file
>     m = imp.load_source(mod_name, filepath)
>   File "/usr/local/airflow/pipeline/pipeline.py", line 167, in <module>
>     create_tasks(dbguid, version, dag, override_start_date)
>   File "/usr/local/airflow/pipeline/pipeline.py", line 104, in create_tasks
>     t = create_task(dbguid, dag, taskInfo, version, override_date)
>   File "/usr/local/airflow/pipeline/pipeline.py", line 85, in create_task
>     retries, 1, depends_on_past, version, override_dag_date)
>   File "/usr/local/airflow/pipeline/dags/base_pipeline.py", line 90, in create_python_operator
>     depends_on_past=depends_on_past)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/utils/decorators.py", line 86, in wrapper
>     result = func(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/operators/python_operator.py", line 65, in __init__
>     super(PythonOperator, self).__init__(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/utils/decorators.py", line 70, in wrapper
>     sig = signature(func)
>   File "/usr/local/lib/python2.7/dist-packages/funcsigs/__init__.py", line 105, in signature
>     return Signature.from_function(obj)
>   File "/usr/local/lib/python2.7/dist-packages/funcsigs/__init__.py", line 594, in from_function
>     __validate_parameters__=False)
>   File "/usr/local/lib/python2.7/dist-packages/funcsigs/__init__.py", line 518, in __init__
>     for param in parameters))
>   File "/usr/lib/python2.7/collections.py", line 52, in __init__
>     self.__update(*args, **kwds)
>   File "/usr/lib/python2.7/_abcoll.py", line 548, in update
>     self[key] = value
>   File "/usr/lib/python2.7/collections.py", line 61, in __setitem__
>     last[1] = root[0] = self.__map[key] = [last, root, key]
>   File "/usr/local/lib/python2.7/dist-packages/airflow/utils/timeout.py", line 38, in handle_timeout
>     raise AirflowTaskTimeout(self.error_message)
> AirflowTaskTimeout: Timeout
> 
> 
> 
> 
> On Fri, Mar 24, 2017 at 5:45 PM, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
>> We are running *without* num runs for over a year (and never have). It is
>> a very elusive issue which has not been reproducible.
>> 
>> I like more info on this but it needs to be very elaborate even to the
>> point of access to the system exposing the behavior.
>> 
>> Bolke
>> 
>> Sent from my iPhone
>> 
>>> On 24 Mar 2017, at 16:04, Vijay Ramesh <vi...@change.org> wrote:
>>> 
>>> We literally have a cron job that restarts the scheduler every 30 min.
>> Num
>>> runs didn't work consistently in rc4, sometimes it would restart itself
>> and
>>> sometimes we'd end up with a few zombie scheduler processes and things
>>> would get stuck. Also running locally, without celery.
>>> 
>>>> On Mar 24, 2017 16:02, <lro...@quartethealth.com> wrote:
>>>> 
>>>> We have max runs set and still hit this. Our solution is dumber:
>>>> monitoring log output, and kill the scheduler if it 

Re: 1.8.1 release

2017-03-25 Thread Bolke de Bruin
I have set it to blocker.

> On 25 Mar 2017, at 17:56, Vincent Poulain <vincent.poul...@tinyclues.com> 
> wrote:
> 
> Hello,
> 
> For some people who are running airflow on prod with docker, this one is
> quite important : https://issues.apache.org/jira/browse/AIRFLOW-1018. I
> don't have log anymore :/
> 
> cheers,
> 
> On Fri, Mar 24, 2017 at 6:59 PM, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
>> Hi Chris
>> 
>> I think some jira are missing from the blocker list, I'll supply them
>> soon. Also some fixes are already in the v1-8-test branch, that are not
>> part of your list yet and some need to be (check jira on fixes for 1.8.1).
>> 
>> 982 and 983 might be fixed by reverting a change that we did as part of
>> 1.8.0 and including the "wait for all tasks" patch, that is already in
>> master. Let me pick this up.
>> 
>> To help you out I already did some work on the jira classifications (e.g.
>> try filtering on blocking issues) which should make it easier to find out
>> what needs to go into 1.8.1.
>> 
>> Bolke
>> 
>> Sent from my iPhone
>> 
>>> On 24 Mar 2017, at 10:21, Chris Riccomini <criccom...@apache.org> wrote:
>>> 
>>> Hey all,
>>> 
>>> I've let this thread sit for a while. Here are a list of the issues that
>>> were raised:
>>> 
>>> BLOCKERS:
>>> https://issues.apache.org/jira/browse/AIRFLOW-982
>>> https://issues.apache.org/jira/browse/AIRFLOW-983
>>> https://issues.apache.org/jira/browse/AIRFLOW-1019
>>> https://issues.apache.org/jira/browse/AIRFLOW-1017
>>> 
>>> NICE TO HAVE:
>>> https://issues.apache.org/jira/browse/AIRFLOW-1015
>>> https://issues.apache.org/jira/browse/AIRFLOW-1013
>>> https://issues.apache.org/jira/browse/AIRFLOW-1004
>>> https://issues.apache.org/jira/browse/AIRFLOW-1003
>>> https://issues.apache.org/jira/browse/AIRFLOW-1001
>>> 
>>> It looks like AIRFLOW-1017 is done, though the JIRA is not closed.
>>> 
>>> The rest remain open. I will wait on the release until the remaining
>>> blockers are finished. Dan/Daniel, can you comment on status?
>>> 
>>> Ruslan, if you want to work on your nice to haves, and submit patches,
>>> that's great, otherwise I don't believe they'll get fixed as part of
>> 1.8.1.
>>> 
>>> Cheers,
>>> Chris
>>> 
>>> On Wed, Mar 22, 2017 at 9:19 AM, Ruslan Dautkhanov <dautkha...@gmail.com
>>> 
>>> wrote:
>>> 
>>>> Thank you Sid!
>>>> 
>>>> 
>>>> Best regards,
>>>> Ruslan
>>>> 
>>>> On Wed, Mar 22, 2017 at 12:01 AM, siddharth anand <san...@apache.org>
>>>> wrote:
>>>> 
>>>>> Ruslan,
>>>>> Thanks for sharing this list. I can pick a few up. I agree we should
>> aim
>>>> to
>>>>> get some of them into 1.8.1.
>>>>> 
>>>>> -s
>>>>> 
>>>>> On Tue, Mar 21, 2017 at 2:29 PM, Ruslan Dautkhanov <
>> dautkha...@gmail.com
>>>>> 
>>>>> wrote:
>>>>> 
>>>>>> Some of the issues I ran into while testing 1.8rc5 :
>>>>>> 
>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1015
>>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1013
>>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1004
>>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1003
>>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1001
>>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1015
>>>>>> 
>>>>>> 
>>>>>> It would be great to have at least some of them fixed in 1.8.1.
>>>>>> 
>>>>>> Thank you.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Ruslan Dautkhanov
>>>>>> 
>>>>>> On Tue, Mar 21, 2017 at 3:02 PM, Dan Davydov <dan.davy...@airbnb.com.
>>>>>> invalid
>>>>>>> wrote:
>>>>>> 
>>>>>>> Here is my list for targeted 1.8.1 fixes:
>>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-982
>>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-983
>>>>>>> https://issues.apache.org/jira/

Re: Scheduler silently dies

2017-03-24 Thread Bolke de Bruin
For 1.8 and the issue you are seeing, you might want to try increasing:

DAGBAG_IMPORT_TIMEOUT under [core], which defaults to 30 (seconds). 

This reminds me that timeouts implemented this way cannot be used in child 
processes, which might explain the defunct processes, so please test whether 
that works. 
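
As background on that limitation: the timeout helper is SIGALRM-based, and 
Python only allows installing signal handlers from a process's main thread, as 
this small sketch (independent of Airflow, POSIX only) shows:

```python
import signal
import threading

def install_handler():
    # signal.signal() may only be called from the main thread;
    # anywhere else it raises ValueError.
    try:
        signal.signal(signal.SIGALRM, lambda signum, frame: None)
        return True
    except ValueError:
        return False

main_ok = install_handler()  # the main thread can install the handler

results = []
worker = threading.Thread(target=lambda: results.append(install_handler()))
worker.start()
worker.join()
thread_ok = results[0]  # a non-main thread cannot
print(main_ok, thread_ok)
```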

Bolke 

Sent from my iPhone

> On 24 Mar 2017, at 19:07, harish singh <harish.sing...@gmail.com> wrote:
> 
> We have been using (1.7) over a year and never faced this issue.
> The moment we switched to 1.8, I think we have hit this issue.
> The reason why I say "I think" is because I am not sure if it is the same
> issue. But whenever I restart, my pipeline proceeds.
> 
> 
> 
> *Airflow 1.7:* Having said that, in 1.7 I did face a similar issue (less
> than 5 times over a year):
> I saw that there were a lot of processes marked "<defunct>" with the parent
> process being "scheduler".
> 
> Somebody mentioned it in this jira ->
> https://issues.apache.org/jira/browse/AIRFLOW-401
> Workaround: restart the scheduler.
> 
> 
> 
> 
> *Airflow 1.8:* Now the issue in 1.8 may be different than the issue in
> 1.7. But again the issue gets solved and the pipeline progresses on a
> SCHEDULER RESTART. If it may help, this is the trace in 1.8:
> [2017-03-22 19:35:16,332] {models.py:167} INFO - Filling up the DagBag from /usr/local/airflow/pipeline/pipeline.py
> [2017-03-22 19:35:22,451] {airflow_configuration.py:40} INFO - loading setup.cfg file
> [2017-03-22 19:35:51,041] {timeout.py:37} ERROR - Process timed out
> [2017-03-22 19:35:51,041] {models.py:266} ERROR - Failed to import: /usr/local/airflow/pipeline/pipeline.py
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 263, in process_file
>     m = imp.load_source(mod_name, filepath)
>   File "/usr/local/airflow/pipeline/pipeline.py", line 167, in <module>
>     create_tasks(dbguid, version, dag, override_start_date)
>   File "/usr/local/airflow/pipeline/pipeline.py", line 104, in create_tasks
>     t = create_task(dbguid, dag, taskInfo, version, override_date)
>   File "/usr/local/airflow/pipeline/pipeline.py", line 85, in create_task
>     retries, 1, depends_on_past, version, override_dag_date)
>   File "/usr/local/airflow/pipeline/dags/base_pipeline.py", line 90, in create_python_operator
>     depends_on_past=depends_on_past)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/utils/decorators.py", line 86, in wrapper
>     result = func(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/operators/python_operator.py", line 65, in __init__
>     super(PythonOperator, self).__init__(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/utils/decorators.py", line 70, in wrapper
>     sig = signature(func)
>   File "/usr/local/lib/python2.7/dist-packages/funcsigs/__init__.py", line 105, in signature
>     return Signature.from_function(obj)
>   File "/usr/local/lib/python2.7/dist-packages/funcsigs/__init__.py", line 594, in from_function
>     __validate_parameters__=False)
>   File "/usr/local/lib/python2.7/dist-packages/funcsigs/__init__.py", line 518, in __init__
>     for param in parameters))
>   File "/usr/lib/python2.7/collections.py", line 52, in __init__
>     self.__update(*args, **kwds)
>   File "/usr/lib/python2.7/_abcoll.py", line 548, in update
>     self[key] = value
>   File "/usr/lib/python2.7/collections.py", line 61, in __setitem__
>     last[1] = root[0] = self.__map[key] = [last, root, key]
>   File "/usr/local/lib/python2.7/dist-packages/airflow/utils/timeout.py", line 38, in handle_timeout
>     raise AirflowTaskTimeout(self.error_message)
> AirflowTaskTimeout: Timeout
> 
> 
> 
> 
>> On Fri, Mar 24, 2017 at 5:45 PM, Bolke de Bruin <bdbr...@gmail.com> wrote:
>> 
>> We are running *without* num runs for over a year (and never have). It is
>> a very elusive issue which has not been reproducible.
>> 
>> I like more info on this but it needs to be very elaborate even to the
>> point of access to the system exposing the behavior.
>> 
>> Bolke
>> 
>> Sent from my iPhone
>> 
>>> On 24 Mar 2017, at 16:04, Vijay Ramesh <vi...@change.org> wrote:
>>> 
>>> We literally have a cron job that restarts the scheduler every 30 min.
>> Num
>>> runs didn't work consistently in rc4, sometimes it would restart itself
>> and
>>> sometimes we'd end up with a few zombie scheduler processes and things
>>> would get stuck. Also running locally, without celery.
>>> 
>>>> On Mar 24, 2017 16:02, <lro

Re: Scheduler silently dies

2017-03-24 Thread Bolke de Bruin
We have been running *without* num runs for over a year (and never have used 
it). It is a very elusive issue which has not been reproducible. 

I'd like more info on this, but it needs to be very elaborate, even to the 
point of access to the system exposing the behavior. 

Bolke

Sent from my iPhone

> On 24 Mar 2017, at 16:04, Vijay Ramesh  wrote:
> 
> We literally have a cron job that restarts the scheduler every 30 min. Num
> runs didn't work consistently in rc4, sometimes it would restart itself and
> sometimes we'd end up with a few zombie scheduler processes and things
> would get stuck. Also running locally, without celery.
> 
>> On Mar 24, 2017 16:02,  wrote:
>> 
>> We have max runs set and still hit this. Our solution is dumber:
>> monitoring log output, and kill the scheduler if it stops emitting. Works
>> like a charm.
>> 
>>> On Mar 24, 2017, at 5:50 PM, F. Hakan Koklu 
>> wrote:
>>> 
>>> Some solutions to this problem is restarting the scheduler frequently or
>>> some sort of monitoring on the scheduler. We have set up a dag that pings
>>> cronitor <https://cronitor.io/> (a dead man's snitch type of service)
>> every
>>> 10 minutes and the snitch pages you when the scheduler dies and does not
>>> send a ping to it.
>>> 
>>> On Fri, Mar 24, 2017 at 1:49 PM, Andrew Phillips 
>>> wrote:
>>> 
>>>> We use celery and run into it from time to time.
>>>>> 
>>>> 
>>>> Bang goes my theory ;-) At least, assuming it's the same underlying
>>>> cause...
>>>> 
>>>> Regards
>>>> 
>>>> ap
>>>> 
>> 


Re: 1.8.1 release

2017-03-24 Thread Bolke de Bruin
Hi Chris

I think some JIRAs are missing from the blocker list; I'll supply them soon. 
Also, some fixes are already in the v1-8-test branch that are not part of your 
list yet, and some need to be (check JIRA for fixes targeting 1.8.1). 

982 and 983 might be fixed by reverting a change that we did as part of 1.8.0 
and including the "wait for all tasks" patch, that is already in master. Let me 
pick this up. 

To help you out I already did some work on the jira classifications (e.g. try 
filtering on blocking issues) which should make it easier to find out what 
needs to go into 1.8.1. 

Bolke

Sent from my iPhone

> On 24 Mar 2017, at 10:21, Chris Riccomini <criccom...@apache.org> wrote:
> 
> Hey all,
> 
> I've let this thread sit for a while. Here are a list of the issues that
> were raised:
> 
> BLOCKERS:
> https://issues.apache.org/jira/browse/AIRFLOW-982
> https://issues.apache.org/jira/browse/AIRFLOW-983
> https://issues.apache.org/jira/browse/AIRFLOW-1019
> https://issues.apache.org/jira/browse/AIRFLOW-1017
> 
> NICE TO HAVE:
> https://issues.apache.org/jira/browse/AIRFLOW-1015
> https://issues.apache.org/jira/browse/AIRFLOW-1013
> https://issues.apache.org/jira/browse/AIRFLOW-1004
> https://issues.apache.org/jira/browse/AIRFLOW-1003
> https://issues.apache.org/jira/browse/AIRFLOW-1001
> 
> It looks like AIRFLOW-1017 is done, though the JIRA is not closed.
> 
> The rest remain open. I will wait on the release until the remaining
> blockers are finished. Dan/Daniel, can you comment on status?
> 
> Ruslan, if you want to work on your nice to haves, and submit patches,
> that's great, otherwise I don't believe they'll get fixed as part of 1.8.1.
> 
> Cheers,
> Chris
> 
> On Wed, Mar 22, 2017 at 9:19 AM, Ruslan Dautkhanov <dautkha...@gmail.com>
> wrote:
> 
>> Thank you Sid!
>> 
>> 
>> Best regards,
>> Ruslan
>> 
>> On Wed, Mar 22, 2017 at 12:01 AM, siddharth anand <san...@apache.org>
>> wrote:
>> 
>>> Ruslan,
>>> Thanks for sharing this list. I can pick a few up. I agree we should aim
>> to
>>> get some of them into 1.8.1.
>>> 
>>> -s
>>> 
>>> On Tue, Mar 21, 2017 at 2:29 PM, Ruslan Dautkhanov <dautkha...@gmail.com
>>> 
>>> wrote:
>>> 
>>>> Some of the issues I ran into while testing 1.8rc5 :
>>>> 
>>>> https://issues.apache.org/jira/browse/AIRFLOW-1015
>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1013
>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1004
>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1003
>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1001
>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1015
>>>> 
>>>> 
>>>> It would be great to have at least some of them fixed in 1.8.1.
>>>> 
>>>> Thank you.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Ruslan Dautkhanov
>>>> 
>>>> On Tue, Mar 21, 2017 at 3:02 PM, Dan Davydov <dan.davy...@airbnb.com.
>>>> invalid
>>>>> wrote:
>>>> 
>>>>> Here is my list for targeted 1.8.1 fixes:
>>>>> https://issues.apache.org/jira/browse/AIRFLOW-982
>>>>> https://issues.apache.org/jira/browse/AIRFLOW-983
>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1019 (and in general
>> the
>>>>> slow
>>>>> startup time from this new logic of orphaned/reset task)
>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1017 (which I will
>>>> hopefully
>>>>> have a fix out for soon just finishing up tests)
>>>>> 
>>>>> We are also hitting a new issue with subdags with rc5 that we weren't
>>>>> hitting with rc4 where subdags will occasionally just hang (had to
>> roll
>>>>> back from rc5 to rc4), I'll try to spin up a JIRA for it soon which
>>>> should
>>>>> be on the list too.
>>>>> 
>>>>> 
>>>>> On Tue, Mar 21, 2017 at 1:54 PM, Chris Riccomini <
>>> criccom...@apache.org>
>>>>> wrote:
>>>>> 
>>>>>> Agreed. I'm looking for a list of checksums/JIRAs that we want in
>> the
>>>>>> bugfix release.
>>>>>> 
>>>>>> On Tue, Mar 21, 2017 at 12:54 PM, Bolke de Bruin <
>> bdbr...@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> 
>>>

Re: 1.8.1 release

2017-03-21 Thread Bolke de Bruin
@dan

I'm obviously interested in the subdag issue as it is executed by the backfill 
logic. Do you have anything to reproduce it with? Can also talk about it 
tomorrow. 

Secondly, did you verify the all success / skipped 'fix' against 'wait for all 
tasks to finish'?

@chris I also suggest using/embracing jira more (as you are doing), as it helps 
with cleaner changelogs, tracking and targeting releases. 

Also note that I already included some fixes in v1-8-test. 

Bolke

Sent from my iPhone

> On 21 Mar 2017, at 14:29, Ruslan Dautkhanov <dautkha...@gmail.com> wrote:
> 
> Some of the issues I ran into while testing 1.8rc5 :
> 
> https://issues.apache.org/jira/browse/AIRFLOW-1015
>> https://issues.apache.org/jira/browse/AIRFLOW-1013
>> https://issues.apache.org/jira/browse/AIRFLOW-1004
>> https://issues.apache.org/jira/browse/AIRFLOW-1003
>> https://issues.apache.org/jira/browse/AIRFLOW-1001
>> https://issues.apache.org/jira/browse/AIRFLOW-1015
> 
> 
> It would be great to have at least some of them fixed in 1.8.1.
> 
> Thank you.
> 
> 
> 
> 
> -- 
> Ruslan Dautkhanov
> 
> On Tue, Mar 21, 2017 at 3:02 PM, Dan Davydov <dan.davy...@airbnb.com.invalid
>> wrote:
> 
>> Here is my list for targeted 1.8.1 fixes:
>> https://issues.apache.org/jira/browse/AIRFLOW-982
>> https://issues.apache.org/jira/browse/AIRFLOW-983
>> https://issues.apache.org/jira/browse/AIRFLOW-1019 (and in general the
>> slow
>> startup time from this new logic of orphaned/reset task)
>> https://issues.apache.org/jira/browse/AIRFLOW-1017 (which I will hopefully
>> have a fix out for soon just finishing up tests)
>> 
>> We are also hitting a new issue with subdags with rc5 that we weren't
>> hitting with rc4 where subdags will occasionally just hang (had to roll
>> back from rc5 to rc4), I'll try to spin up a JIRA for it soon which should
>> be on the list too.
>> 
>> 
>> On Tue, Mar 21, 2017 at 1:54 PM, Chris Riccomini <criccom...@apache.org>
>> wrote:
>> 
>>> Agreed. I'm looking for a list of checksums/JIRAs that we want in the
>>> bugfix release.
>>> 
>>> On Tue, Mar 21, 2017 at 12:54 PM, Bolke de Bruin <bdbr...@gmail.com>
>>> wrote:
>>> 
>>>> 
>>>> 
>>>>> On 21 Mar 2017, at 12:51, Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>>> 
>>>>> My suggestion, as we are using semantic versioning is:
>>>>> 
>>>>> 1) no new features in the 1.8 branch
>>>>> 2) only bug fixes in the 1.8 branch
>>>>> 3) new features to land in 1.9
>>>>> 
>>>>> This allows companies to
>>>> 
>>>> Have a "known" version and can move to the new branch when they want to
>>>> get new features. Obviously we only support N-1, so when 1.10 comes out
>>> we
>>>> stop supporting 1.8.X.
>>>> 
>>>>> 
>>>>> Sent from my iPhone
>>>>> 
>>>>>> On 21 Mar 2017, at 11:22, Chris Riccomini <criccom...@apache.org>
>>>> wrote:
>>>>>> 
>>>>>> Hey all,
>>>>>> 
>>>>>> I suggest that we start a 1.8.1 Airflow release now. The goal would
>>> be:
>>>>>> 
>>>>>> 1) get a second release under our belt
>>>>>> 2) patch known issues with the 1.8.0 release
>>>>>> 
>>>>>> I'm happy to run it, but I saw Maxime mentioning that Airbnb might
>>> want
>>>> to.
>>>>>> @Max et al, can you comment?
>>>>>> 
>>>>>> Also, can folks supply JIRAs for stuff that they think needs to be in the
>>>> 1.8.1
>>>>>> bugfix release?
>>>>>> 
>>>>>> Cheers,
>>>>>> Chris
>>>> 
>>> 
>> 


Re: 1.8.1 release

2017-03-21 Thread Bolke de Bruin


> On 21 Mar 2017, at 12:51, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
> My suggestion, as we are using semantic versioning is:
> 
> 1) no new features in the 1.8 branch
> 2) only bug fixes in the 1.8 branch
> 3) new features to land in 1.9
> 
> This allows companies to 

Have a "known" version and can move to the new branch when they want to get new 
features. Obviously we only support N-1, so when 1.10 comes out we stop 
supporting 1.8.X. 

> 
> Sent from my iPhone
> 
>> On 21 Mar 2017, at 11:22, Chris Riccomini <criccom...@apache.org> wrote:
>> 
>> Hey all,
>> 
>> I suggest that we start a 1.8.1 Airflow release now. The goal would be:
>> 
>> 1) get a second release under our belt
>> 2) patch known issues with the 1.8.0 release
>> 
>> I'm happy to run it, but I saw Maxime mentioning that Airbnb might want to.
>> @Max et al, can you comment?
>> 
> Also, can folks supply JIRAs for stuff that they think needs to be in the 1.8.1
>> bugfix release?
>> 
>> Cheers,
>> Chris


Re: 1.8.1 release

2017-03-21 Thread Bolke de Bruin
My suggestion, as we are using semantic versioning is:

1) no new features in the 1.8 branch
2) only bug fixes in the 1.8 branch
3) new features to land in 1.9

This allows companies to have a "known" version and move to the new branch when 
they want to get new features.

Sent from my iPhone

> On 21 Mar 2017, at 11:22, Chris Riccomini  wrote:
> 
> Hey all,
> 
> I suggest that we start a 1.8.1 Airflow release now. The goal would be:
> 
> 1) get a second release under our belt
> 2) patch known issues with the 1.8.0 release
> 
> I'm happy to run it, but I saw Maxime mentioning that Airbnb might want to.
> @Max et al, can you comment?
> 
> Also, can folks supply JIRAs for stuff that they think needs to be in the 1.8.1
> bugfix release?
> 
> Cheers,
> Chris


Re: [ANNOUNCE] Apache Airflow 1.8.0-incubating Released

2017-03-20 Thread Bolke de Bruin
They are one and the same. However, the official name is "Apache Airflow 1.8.0 
incubating" which is per Apache requirements. 

Bolke

Sent from my iPhone

> On 20 Mar 2017, at 11:15, Michael Gong <go...@hotmail.com> wrote:
> 
> what's the difference between the Airflow 1.8.0-incubating release and the 
> Airflow 1.8.0 release ?
> 
> 
> 
> ________
> From: Bolke de Bruin <bdbr...@gmail.com>
> Sent: Monday, March 20, 2017 5:30 PM
> To: dev@airflow.incubator.apache.org; annou...@apache.org
> Subject: [ANNOUNCE] Apache Airflow 1.8.0-incubating Released
> 
> The Apache Airflow (incubating) Team is proud to announce the release of 
> Apache Airflow 1.8.0-incubating.
> 
> This is a source code only release.
> 
> ABOUT AIRFLOW
> Airflow is a platform to programmatically author, schedule and monitor 
> workflows. Use airflow to author workflows as directed acyclic graphs (DAGs) 
> of tasks. The airflow scheduler executes your tasks on an array of workers 
> while following the specified dependencies. Rich command line utilities make 
> performing complex surgeries on DAGs a snap. The rich user interface makes it 
> easy to visualize pipelines running in production, monitor progress, and 
> troubleshoot issues when needed. When workflows are defined as code, they 
> become more maintainable, versionable, testable, and collaborative.
> 
> FEATURES AND ENHANCEMENTS IN THIS RELEASE
> Over 350 commits have been made since the last (non-Apache) release. 
> Therefore, in order to see all changes please check the changelog included 
> with the distribution.
> 
> Highlights:
> * Multi-processing and robust parsing of DAGs
> * Experimental API
> * Improved Kerberos Integration
> * Many new and improved operators particularly for cloud providers
> * Many UI improvements
> 
> RELEASE ARTIFACTS ARE AVAILABLE AT
> http://apache.org/dyn/closer.cgi/incubator/airflow/1.8.0-incubating
> 
> SHA256 & MD5 SIGNATURES (verify your downloads
> <https://www.apache.org/dyn/closer.cgi#verify>):
> 
> 
> https://dist.apache.org/repos/dist/release/incubator/airflow/1.8.0-incubating
> 
> PGP KEYS
> https://dist.apache.org/repos/dist/release/incubator/airflow/KEYS
> 
> PYPI
> For your convenience pypi packages will also be provided, however it is not 
> an official release channel.
> https://pypi.python.org/pypi/airflow/
> 
> Please note that a subsequent release will be named “Apache Airflow” which 
> will require manual upgrading.
> 
> Kind regards,
> Apache Airflow (incubating) Team


[ANNOUNCE] Apache Airflow 1.8.0-incubating Released

2017-03-20 Thread Bolke de Bruin
The Apache Airflow (incubating) Team is proud to announce the release of Apache 
Airflow 1.8.0-incubating.

This is a source code only release.

ABOUT AIRFLOW
Airflow is a platform to programmatically author, schedule and monitor 
workflows. Use airflow to author workflows as directed acyclic graphs (DAGs) of 
tasks. The airflow scheduler executes your tasks on an array of workers while 
following the specified dependencies. Rich command line utilities make 
performing complex surgeries on DAGs a snap. The rich user interface makes it 
easy to visualize pipelines running in production, monitor progress, and 
troubleshoot issues when needed. When workflows are defined as code, they 
become more maintainable, versionable, testable, and collaborative.

FEATURES AND ENHANCEMENTS IN THIS RELEASE
Over 350 commits have been made since the last (non-Apache) release. Therefore, 
in order to see all changes please check the changelog included with the 
distribution.

Highlights:
* Multi-processing and robust parsing of DAGs
* Experimental API
* Improved Kerberos Integration
* Many new and improved operators particularly for cloud providers
* Many UI improvements

RELEASE ARTIFACTS ARE AVAILABLE AT
http://apache.org/dyn/closer.cgi/incubator/airflow/1.8.0-incubating

SHA256 & MD5 SIGNATURES (verify your downloads):
https://dist.apache.org/repos/dist/release/incubator/airflow/1.8.0-incubating

PGP KEYS
https://dist.apache.org/repos/dist/release/incubator/airflow/KEYS

PYPI
For your convenience pypi packages will also be provided, however it is not an 
official release channel.
https://pypi.python.org/pypi/airflow/

Please note that a subsequent release will be named “Apache Airflow” which will 
require manual upgrading.

Kind regards,
Apache Airflow (incubating) Team
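
For readers following the "verify your downloads" advice above, the checksum half of
the process can be sketched with a local stand-in file (the file names below are
illustrative; the real `.sha` and `.asc` companions ship alongside the tarball on the
mirrors):

```shell
# Stand-in for the downloaded release tarball; real verification would use
# airflow-1.8.0-incubating.tar.gz plus its .sha/.asc companion files.
printf 'demo artifact\n' > artifact.tar.gz

# The release manager publishes a checksum file in "sha256sum" format.
sha256sum artifact.tar.gz > artifact.tar.gz.sha

# Downloaders recompute and compare; a mismatch makes this exit non-zero.
sha256sum -c artifact.tar.gz.sha

# The signature check (not runnable here without the real KEYS file) would be:
#   gpg --import KEYS
#   gpg --verify airflow-1.8.0-incubating.tar.gz.asc airflow-1.8.0-incubating.tar.gz
```

The same recomputation-and-compare step is what the "+1" voters in these threads
report as "verified signatures and checksums".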

Re: [RESULT][VOTE]Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-19 Thread Bolke de Bruin
I’m doing the announcement on the IPMC in a few (need to grab breakfast first 
;-) ). It can be done any time after that.

I need to bump the version number so I will need to re-sign and create a new 
tar ball. I hope they won’t mind that, as it is a bit of a chicken and egg 
problem.

Bolke.

> On 19 Mar 2017, at 09:01, Maxime Beauchemin <maximebeauche...@gmail.com> 
> wrote:
> 
> @Bolke I can take care of regenerating the docs + pypi upload, just let me
> know when
> 
> Max
> 
> On Fri, Mar 17, 2017 at 5:20 PM, Dan Davydov <dan.davy...@airbnb.com.invalid
>> wrote:
> 
>> That's reasonable (treating it a bug instead of a change in behavior). Full
>> speed ahead!
>> 
>> On Thu, Mar 16, 2017 at 9:01 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:
>> 
>>> Hello,
>>> 
>>> Apache Airflow (incubating) 1.8.0 (RC5) has been accepted.
>>> 
>>> 9 “+1” votes received:
>>> 
>>> - Maxime Beauchemin (binding)
>>> - Chris Riccomini (binding)
>>> - Arthur Wiedmer (binding)
>>> - Jeremiah Lowin (binding)
>>> - Siddharth Anand (binding)
>>> - Alex van Boxel (binding)
>>> - Bolke de Bruin (binding)
>>> 
>>> - Daniel Huang (non-binding)
>>> 
>>> Vote thread (start):
>>> http://mail-archives.apache.org/mod_mbox/incubator-
>>> airflow-dev/201703.mbox/%3cB1833A3A-05FB-4112-B395-
>>> 135caf930...@gmail.com%3e
>>> 
>>> Next steps:
>>> 1) will start the voting process at the IPMC mailinglist. I don’t expect
>>> changes.
>>> 2) Only after the positive voting on the IPMC and finalisation I will
>>> rebrand the RC to Release.
>>> 3) I will upload it to the incubator release page, then the tar ball
>> needs
>>> to propagate to the mirrors.
>>> 4) Update the website (can someone volunteer please?)
>>> 5) Finally I will ask Maxime to upload it to pypi. It seems we can keep
>>> the apache branding as lib cloud is doing this as well (
>>> https://libcloud.apache.org/downloads.html#pypi-package).
>>> 
>>> Cheers,
>>> 
>>> Bolke
>> 



Re: SparkOperator - tips and feedback?

2017-03-18 Thread Bolke de Bruin
A spark operator exists as of 1.8.0 (which will be released tomorrow), you 
might want to take a look at that. I know that an update is coming to that 
operator that improves communication with Yarn.

Bolke

> On 18 Mar 2017, at 18:43, Russell Jurney  wrote:
> 
> Ruslan, thanks for your feedback.
> 
> You mean the spark-submit context? Or like the SparkContext and
> SparkSession? I don't think we could keep that alive, because it wouldn't
> work out with multiple calls to spark-submit. I do feel your pain, though.
> Maybe someone else can see how this might be done?
> 
> If SparkContext was able to open the spark/pyspark console, then multiple
> job submissions would be possible. I didn't have this in mind but an
> InteractiveSparkContext or SparkConsoleContext might be able to do this?
> 
> Russell Jurney @rjurney 
> russell.jur...@gmail.com LI  FB
>  datasyndrome.com
> 
> On Sat, Mar 18, 2017 at 3:02 PM, Ruslan Dautkhanov 
> wrote:
> 
>> +1 Great idea.
>> 
>> my two cents - it would be nice (as an option) if SparkOperator would be
>> able to keep context open between different calls,
>> as it takes 30+ seconds to create a new context (on our cluster). Not sure
>> how well it fits Airflow architecture.
>> 
>> 
>> 
>> --
>> Ruslan Dautkhanov
>> 
>> On Sat, Mar 18, 2017 at 3:45 PM, Russell Jurney 
>> wrote:
>> 
>>> What do people think about creating a SparkOperator that uses
>> spark-submit
>>> to submit jobs? Would work for Scala/Java Spark and PySpark. The patterns
>>> outlined in my presentation on Airflow and PySpark
>>>  would fit well inside an Operator, I
>>> think.
>>> BashOperator works, but why not tailor something to spark-submit?
>>> 
>>> I'm open to doing the work, but I wanted to see what people though about
>> it
>>> and get feedback about things they would like to see in SparkOperator and
>>> get any pointers people had to doing the implementation.
>>> 
>>> Russell Jurney @rjurney 
>>> russell.jur...@gmail.com LI  FB
>>>  datasyndrome.com
>>> 
>> 

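A minimal sketch of the spark-submit wrapper idea discussed in this thread
(hypothetical class and parameter names — this is not the Spark operator that
shipped with 1.8.0, just an illustration of building a `spark-submit` command
line inside an operator-shaped object):

```python
import subprocess


class SparkSubmitSketch:
    """Builds and runs a spark-submit command line; illustrative only."""

    def __init__(self, application, master="yarn", conf=None, app_args=None):
        self.application = application   # path to the .py/.jar to submit
        self.master = master             # e.g. "yarn", "local[*]"
        self.conf = conf or {}           # extra --conf key=value pairs
        self.app_args = app_args or []   # arguments passed through to the job

    def build_command(self):
        cmd = ["spark-submit", "--master", self.master]
        for key, value in sorted(self.conf.items()):
            cmd += ["--conf", "%s=%s" % (key, value)]
        return cmd + [self.application] + list(self.app_args)

    def execute(self):
        # In a real Airflow operator this body would live in execute(context).
        subprocess.check_call(self.build_command())


print(SparkSubmitSketch("job.py", conf={"spark.executor.memory": "2g"}).build_command())
```

Because `spark-submit` starts a fresh driver per invocation, this design cannot keep
a context alive between calls — which is the limitation Ruslan and Russell discuss
above.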


Re: `airflow webserver -D` runs in foreground

2017-03-17 Thread Bolke de Bruin
This is a (known) bug, since the introduction of the rolling restarts.

Bolke.

> On 17 Mar 2017, at 09:48, Ruslan Dautkhanov  wrote:
> 
> $ pip freeze
> airflow==1.8.0rc5+apache.incubating
> 
> airflow webserver doesn't want to daemonize
> 
> 
> $ airflow webserver --daemon
> [2017-03-17 00:06:37,553] {__init__.py:57} INFO - Using executor
> LocalExecutor
> 
> .. skip ..
> Running the Gunicorn Server with:
> Workers: 4 sync
> Host: 0.0.0.0:18111
> Timeout: 120
> Logfiles: - -
> =
> [2017-03-17 00:06:39,744] {__init__.py:57} INFO - Using executor
> LocalExecutor
> 
> 
> It keeps running in foreground.
> I am probably missing something simple?
> 
> ps. Good project name - airflow is the 1st one in `pip freeze` output :)
> 
> 
> Thanks,
> Ruslan Dautkhanov

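While the `--daemon` flag is broken, a common workaround is to background the
process manually. In the sketch below `sleep 5` stands in for the real
`airflow webserver -p 18111` invocation so it can run anywhere:

```shell
# Background the server with nohup instead of the broken --daemon flag.
# Replace `sleep 5` with the real `airflow webserver` command.
nohup sleep 5 > webserver.log 2>&1 &
echo $! > webserver.pid                 # keep the pid around for stop/status

# Basic liveness check: kill -0 probes the process without signalling it.
kill -0 "$(cat webserver.pid)" && echo "webserver running"

# Stop it again when done.
kill "$(cat webserver.pid)"
```

This mimics what the daemon flag should do (detach, redirect output, write a pid
file) without relying on it.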


Re: Airflow Committers: Landscape checks doing more harm than good?

2017-03-16 Thread Bolke de Bruin
We can do it in Travis afaik. We should replace it.

So +1.

B.

> On 16 Mar 2017, at 16:48, Jeremiah Lowin  wrote:
> 
> This may be an unpopular opinion, but most Airflow PRs have a little red
> "x" next to them not because they have failing unit tests, but because the
> Landscape check has decided they introduce bad code.
> 
> Unfortunately Landscape is often wrong -- here it is telling me my latest
> PR introduced no less than 30 errors... in files I didn't touch!
> https://github.com/apache/incubator-airflow/pull/2157 (however, it gives me
> credit for fixing 23 errors in those same files, so I've got that going for
> me... which is nice.)
> 
> The upshot is that Github's "health" indicator can be swayed by minor or
> erroneous issues, and therefore it serves little purpose other than making
> it look like every PR is bad. This creates committer fatigue, since every
> PR needs to be parsed to see if it actually is OK or not.
> 
> Don't get me wrong, I'm all for proper style and on occasion Landscape has
> pointed out problems that I've gone and fixed. But on the whole, I believe
> that having it as part of our red / green PR evaluation -- equal to and
> often superseding unit tests -- is harmful. I'd much rather be able to scan
> the PR list and know unequivocally that "green" indicates ready to merge.
> 
> J



Re: [VOTE] Release Apache Airflow 1.8.0 (incubating)

2017-03-16 Thread Bolke de Bruin
Oops, wrong mailing list ;-).


> On 16 Mar 2017, at 09:28, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
> Hello Incubator PMC’ers,
> 
> The Apache Airflow community has voted and approved the proposal to release 
> Apache Airflow 1.8.0 (incubating) based on 1.8.0 Release Candidate 5. We now 
> kindly request the Incubator PMC members to review and vote on this incubator 
> release. If the vote is successful we will rename release candidate 4 to 
> final.
> 
> Airflow is a platform to programmatically author, schedule and monitor 
> workflows. Use airflow to author workflows as directed acyclic graphs (DAGs) 
> of tasks. The airflow scheduler executes your tasks on an array of workers 
> while following the specified dependencies. Rich command line utilities make 
> performing complex surgeries on DAGs a snap. The rich user interface makes it 
> easy to visualize pipelines running in production, monitor progress, and 
> troubleshoot issues when needed. When workflows are defined as code, they 
> become more maintainable, versionable, testable, and collaborative.
> 
> The Apache Airflow-1.8.0-incubating release candidate is now available with 
> the following artefacts for a project vote:
> 
> * [VOTE] Thread:*
> http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201703.mbox/%3cb1833a3a-05fb-4112-b395-135caf930...@gmail.com%3e
>  
> <http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201703.mbox/%3cb1833a3a-05fb-4112-b395-135caf930...@gmail.com%3E>
> 
> *[RESULT][VOTE] Thread:*
> http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201703.mbox/%3c59bc8c2b-12e2-4de3-9555-b2273660a...@gmail.com%3e
>  
> <http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201703.mbox/%3c59bc8c2b-12e2-4de3-9555-b2273660a...@gmail.com%3e>
> 
> *The release candidate(s) to be voted on is available at:*
> https://dist.apache.org/repos/dist/dev/incubator/airflow/ 
> <https://dist.apache.org/repos/dist/dev/incubator/airflow/> or 
> https://dist.apache.org/repos/dist/dev/incubator/airflow/airflow-1.8.0rc5+apache.incubating.tar.gz
>  
> <https://dist.apache.org/repos/dist/dev/incubator/airflow/airflow-1.8.0rc5+apache.incubating.tar.gz>
> 
> *Git branch*
> https://github.com/apache/incubator-airflow/tree/v1-8-stable 
> <https://github.com/apache/incubator-airflow/tree/v1-8-stable> or 
> https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;a=tree;h=refs/heads/v1-8-stable;hb=refs/heads/v1-8-stable
>  
> <https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;a=tree;h=refs/heads/v1-8-stable;hb=refs/heads/v1-8-stable>
> 
> *Git tag*
> https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;a=shortlog;h=f4760c320a29be62469799355e76efa42d0b6bb2
>  
> <https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;a=shortlog;h=f4760c320a29be62469799355e76efa42d0b6bb2>
> 
> PGP signature
> https://dist.apache.org/repos/dist/dev/incubator/airflow/airflow-1.8.0rc5+apache.incubating.tar.gz.asc
>  
> <https://dist.apache.org/repos/dist/dev/incubator/airflow/airflow-1.8.0rc5+apache.incubating.tar.gz.asc>
> 
> MD5/SHA Hashes:
> https://dist.apache.org/repos/dist/dev/incubator/airflow/airflow-1.8.0rc5+apache.incubating.tar.gz.md5
>  
> <https://dist.apache.org/repos/dist/dev/incubator/airflow/airflow-1.8.0rc5+apache.incubating.tar.gz.md5>
> https://dist.apache.org/repos/dist/dev/incubator/airflow/airflow-1.8.0rc5+apache.incubating.tar.gz.sha
>  
> <https://dist.apache.org/repos/dist/dev/incubator/airflow/airflow-1.8.0rc5+apache.incubating.tar.gz.sha>
> 
> *Keys to verify the signature of the release artifacts are available at:*
> https://dist.apache.org/repos/dist/release/incubator/airflow/ 
> <https://dist.apache.org/repos/dist/release/incubator/airflow/>
> 
> * RAT License checks*
> 
> RAT is executed as part of the CI process (e.g. 
> https://travis-ci.org/apache/incubator-airflow/builds/203106493 
> <https://travis-ci.org/apache/incubator-airflow/builds/203106493>) but can 
> also be run manually by issuing “sh scripts/ci/check-license.sh” from the top 
> level.
> 
> Source code is always included, i.e. there is no binary release. Compilation 
> and installation will happen by standard Python practices, e.g. pip install 
> <> or python setup.py install.
> 
> The vote will be open for at least 72 hours or until necessary number of
> votes are reached.
> 
> Members please be sure to indicate "(Binding)" with your vote which will
> help in tallying the vote(s).
> 
> [ ] +1  approve
> 
> [ ] +0  no opinion
> 
> [ ] -1  disapprove (and reason why)
> 
> 
> *Here is my +1 (non-binding)*
> 
> Cheers,
> Bolke



[VOTE] Release Apache Airflow 1.8.0 (incubating)

2017-03-16 Thread Bolke de Bruin
Hello Incubator PMC’ers,

The Apache Airflow community has voted and approved the proposal to release 
Apache Airflow 1.8.0 (incubating) based on 1.8.0 Release Candidate 5. We now 
kindly request the Incubator PMC members to review and vote on this incubator 
release. If the vote is successful we will rename release candidate 5 to final.

Airflow is a platform to programmatically author, schedule and monitor 
workflows. Use airflow to author workflows as directed acyclic graphs (DAGs) of 
tasks. The airflow scheduler executes your tasks on an array of workers while 
following the specified dependencies. Rich command line utilities make 
performing complex surgeries on DAGs a snap. The rich user interface makes it 
easy to visualize pipelines running in production, monitor progress, and 
troubleshoot issues when needed. When workflows are defined as code, they 
become more maintainable, versionable, testable, and collaborative.

The Apache Airflow-1.8.0-incubating release candidate is now available with the 
following artefacts for a project vote:

* [VOTE] Thread:*
http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201703.mbox/%3cb1833a3a-05fb-4112-b395-135caf930...@gmail.com%3e
 


*[RESULT][VOTE] Thread:*
http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201703.mbox/%3c59bc8c2b-12e2-4de3-9555-b2273660a...@gmail.com%3e

*The release candidate(s) to be voted on is available at:*
https://dist.apache.org/repos/dist/dev/incubator/airflow/ or 
https://dist.apache.org/repos/dist/dev/incubator/airflow/airflow-1.8.0rc5+apache.incubating.tar.gz
 


*Git branch*
https://github.com/apache/incubator-airflow/tree/v1-8-stable or 
https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;a=tree;h=refs/heads/v1-8-stable;hb=refs/heads/v1-8-stable
 


*Git tag*
https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;a=shortlog;h=f4760c320a29be62469799355e76efa42d0b6bb2

PGP signature
https://dist.apache.org/repos/dist/dev/incubator/airflow/airflow-1.8.0rc5+apache.incubating.tar.gz.asc
 


MD5/SHA Hashes:
https://dist.apache.org/repos/dist/dev/incubator/airflow/airflow-1.8.0rc5+apache.incubating.tar.gz.md5
 

https://dist.apache.org/repos/dist/dev/incubator/airflow/airflow-1.8.0rc5+apache.incubating.tar.gz.sha
 


*Keys to verify the signature of the release artifacts are available at:*
https://dist.apache.org/repos/dist/release/incubator/airflow/ 


* RAT License checks*

RAT is executed as part of the CI process (e.g. 
https://travis-ci.org/apache/incubator-airflow/builds/203106493 
) but can also 
be run manually by issuing “sh scripts/ci/check-license.sh” from the top level.

Source code is always included, i.e. there is no binary release. Compilation 
and installation will happen by standard Python practices, e.g. pip install <> 
or python setup.py install.

The vote will be open for at least 72 hours or until necessary number of
votes are reached.

Members please be sure to indicate "(Binding)" with your vote which will
help in tallying the vote(s).

[ ] +1  approve

[ ] +0  no opinion

[ ] -1  disapprove (and reason why)


*Here is my +1 (non-binding)*

Cheers,
Bolke

[RESULT][VOTE]Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-16 Thread Bolke de Bruin
Hello,

Apache Airflow (incubating) 1.8.0 (RC5) has been accepted.

9 “+1” votes received:

- Maxime Beauchemin (binding)
- Chris Riccomini (binding)
- Arthur Wiedmer (binding)
- Jeremiah Lowin (binding)
- Siddharth Anand (binding)
- Alex van Boxel (binding)
- Bolke de Bruin (binding)

- Daniel Huang (non-binding)

Vote thread (start):
http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201703.mbox/%3cb1833a3a-05fb-4112-b395-135caf930...@gmail.com%3e

Next steps:
1) will start the voting process at the IPMC mailinglist. I don’t expect 
changes.
2) Only after the positive voting on the IPMC and finalisation I will rebrand 
the RC to Release.
3) I will upload it to the incubator release page, then the tar ball needs to 
propagate to the mirrors.
4) Update the website (can someone volunteer please?)
5) Finally I will ask Maxime to upload it to pypi. It seems we can keep the 
apache branding as lib cloud is doing this as well 
(https://libcloud.apache.org/downloads.html#pypi-package).

Cheers,

Bolke

Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-16 Thread Bolke de Bruin
I agree that it is not nice. I suggest that we revisit our fix and see if we 
can do better there (rather than adding new complexity). And get this into 
1.8.1.

Nevertheless, I consider the vote passed.

Bolke

> On 15 Mar 2017, at 19:12, Dan Davydov <dan.davy...@airbnb.com.INVALID> wrote:
> 
> The only thing is that this is a change in semantics and changing semantics
> (breaking some DAGs) and then changing them back (and breaking things
> again) isn't great.
> 
> On Wed, Mar 15, 2017 at 7:02 PM, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
>> Indeed that could be the case. Let's get 1.8.0 out the door so we can
>> focus on these bug fixes for 1.8.1.
>> 
>> Bolke
>> 
>> Sent from my iPhone
>> 
>>> On 15 Mar 2017, at 18:25, Dan Davydov <dan.davy...@airbnb.com.INVALID>
>> wrote:
>>> 
>>> Another issue we are seeing is
>>> https://issues.apache.org/jira/browse/AIRFLOW-992 - tasks that have both
>>> skipped children and successful children are run instead of skipped. Not
>>> blocking the release on this just letting you guys know for the release
>> bug
>>> notes. We will be cherrypicking a fix for this onto our production when
>> we
>>> release 1.8 once we come up with one.
>>> 
>>> It's possibly, though not necessarily, related to an incomplete/incorrect
>>> fix of https://issues.apache.org/jira/browse/AIRFLOW-719 .
>>> 
>>>> On Wed, Mar 15, 2017 at 4:53 PM, siddharth anand <san...@apache.org>
>> wrote:
>>>> 
>>>> Confirmed that Bolke's PR above fixes the issue.
>>>> 
>>>> Also, I agree this is not a blocker for the current airflow release, so
>> my
>>>> +1 (binding) stands.
>>>> -s
>>>> 
>>>>> On Wed, Mar 15, 2017 at 3:11 PM, Bolke de Bruin <bdbr...@gmail.com>
>> wrote:
>>>>> 
>>>>> PR is available: https://github.com/apache/incubator-airflow/pull/2154
>>>>> 
>>>>> But marked for 1.8.1.
>>>>> 
>>>>> - Bolke
>>>>> 
>>>>>> On 15 Mar 2017, at 14:37, Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>>>> 
>>>>>> On second thought I do consider it a bug and can have a fix out pretty
>>>>> quickly, but I don’t consider it a blocker.
>>>>>> 
>>>>>> - B.
>>>>>> 
>>>>>>> On 15 Mar 2017, at 14:21, Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>>>>> 
>>>>>>> Just to be clear: Also in 1.7.1 the DagRun was marked successful, but
>>>>> its tasks continued to be scheduled. So one could also consider 1.7.1
>>>>> behaviour a bug. I am not sure here, but I think it kind of makes sense
>>>> to
>>>>> consider the behaviour of 1.7.1 a bug. It has been present throughout
>> all
>>>>> the 1.8 rc/beta/alpha series.
>>>>>>> 
>>>>>>> So yes it is a change in behaviour whether it is a regression or an
>>>>> integrity improvement is up for discussion. Either way I don’t consider
>>>> it
>>>>> a blocker.
>>>>>>> 
>>>>>>> Bolke.
>>>>>>> 
>>>>>>>> On 15 Mar 2017, at 14:06, siddharth anand <san...@apache.org>
>> wrote:
>>>>>>>> 
>>>>>>>> Here's the JIRA :
>>>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-989
>>>>>>>> 
>>>>>>>> I confirmed it is a regression from 1.7.1.3, which I installed via
>>>> pip
>>>>> and
>>>>>>>> tested against the same DAG in the JIRA.
>>>>>>>> 
>>>>>>>> The issue occurs if a leaf / last / terminal downstream task is not
>>>>>>>> cleared. You won't see this issue if you clear the entire DAG Run or
>>>>> clear
>>>>>>>> a task and all of its downstream tasks. If you truly want to only
>>>>> clear and
>>>>>>>> rerun a task, but not its downstream tasks, you can use the CLI to
>>>>> execute
>>>>>>>> a specific task (e.g. via `airflow run`).
>>>>>>>> 
>>>>>>>> This is a change in behavior -- if we do go ahead with the release,
>>>>> then
>>>>>>>> this JIRA should be in a list o

Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-15 Thread Bolke de Bruin
PR is available: https://github.com/apache/incubator-airflow/pull/2154

But marked for 1.8.1.

- Bolke

> On 15 Mar 2017, at 14:37, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
> On second thought I do consider it a bug and can have a fix out pretty 
> quickly, but I don’t consider it a blocker.
> 
> - B.
> 
>> On 15 Mar 2017, at 14:21, Bolke de Bruin <bdbr...@gmail.com> wrote:
>> 
>> Just to be clear: Also in 1.7.1 the DagRun was marked successful, but its 
>> tasks continued to be scheduled. So one could also consider 1.7.1 behaviour 
>> a bug. I am not sure here, but I think it kind of makes sense to consider 
>> the behaviour of 1.7.1 a bug. It has been present throughout all the 1.8 
>> rc/beta/alpha series.
>> 
>> So yes it is a change in behaviour whether it is a regression or an 
>> integrity improvement is up for discussion. Either way I don’t consider it a 
>> blocker.
>> 
>> Bolke.
>> 
>>> On 15 Mar 2017, at 14:06, siddharth anand <san...@apache.org> wrote:
>>> 
>>> Here's the JIRA :
>>> https://issues.apache.org/jira/browse/AIRFLOW-989
>>> 
>>> I confirmed it is a regression from 1.7.1.3, which I installed via pip and
>>> tested against the same DAG in the JIRA.
>>> 
>>> The issue occurs if a leaf / last / terminal downstream task is not
>>> cleared. You won't see this issue if you clear the entire DAG Run or clear
>>> a task and all of its downstream tasks. If you truly want to only clear and
>>> rerun a task, but not its downstream tasks, you can use the CLI to execute
>>> a specific task (e.g. via `airflow run`).
>>> 
>>> This is a change in behavior -- if we do go ahead with the release, then
>>> this JIRA should be in a list of JIRAs of known issues related to the new
>>> version.
>>> -s
>>> 
>>> On Wed, Mar 15, 2017 at 9:17 AM, Chris Riccomini <criccom...@apache.org>
>>> wrote:
>>> 
>>>> @Sid, does this happen if you clear downstream as well?
>>>> 
>>>> On Wed, Mar 15, 2017 at 9:04 AM, Chris Riccomini <criccom...@apache.org>
>>>> wrote:
>>>> 
>>>>> Has anyone been able to reproduce Sid's issue?
>>>>> 
>>>>> On Tue, Mar 14, 2017 at 11:17 PM, Bolke de Bruin <bdbr...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> That is not an airflow error, but a Kerberos error. Try executing the
>>>>>> kinit command on the command line by yourself.
>>>>>> 
>>>>>> Bolke
>>>>>> 
>>>>>> Sent from my iPhone
>>>>>> 
>>>>>>> On 14 Mar 2017, at 23:11, Ruslan Dautkhanov <dautkha...@gmail.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>> `airflow kerberos` is broken in 1.8-rc5
>>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-987
>>>>>>> Hopefully fix can be part of the 1.8 release.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Ruslan Dautkhanov
>>>>>>> 
>>>>>>>> On Tue, Mar 14, 2017 at 6:19 PM, siddharth anand <san...@apache.org>
>>>>>> wrote:
>>>>>>>> 
>>>>>>>> FYI,
>>>>>>>> I've just hit a major bug in the release candidate related to "clear
>>>>>> task"
>>>>>>>> behavior.
>>>>>>>> 
>>>>>>>> I've been running airflow in both stage and prod since yesterday on
>>>>>> rc5 and
>>>>>>>> have reproduced this in both environments. I will file a JIRA for
>>>> this
>>>>>>>> tonight, but wanted to send a note over email as well.
>>>>>>>> 
>>>>>>>> In my example, I have a 2 task DAG. For a given DAG run that has
>>>>>> completed
>>>>>>>> successfully, if I
>>>>>>>> 1) clear task2 (leaf task in this case), the previously-successful
>>>> DAG
>>>>>> Run
>>>>>>>> goes back to Running, requeues, and executes the task successfully.
>>>>>> The DAG
>>>>>>>> Run then returns from Running to Success.
>>>>>>>> 2) clear task1 (root task in this case), the previou

Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-15 Thread Bolke de Bruin
On second thought I do consider it a bug and can have a fix out pretty quickly, 
but I don’t consider it a blocker.

- B.

> On 15 Mar 2017, at 14:21, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
> Just to be clear: Also in 1.7.1 the DagRun was marked successful, but its 
> tasks continued to be scheduled. So one could also consider 1.7.1 behaviour a 
> bug. I am not sure here, but I think it kind of makes sense to consider the 
> behaviour of 1.7.1 a bug. It has been present throughout all the 1.8 
> rc/beta/alpha series.
> 
> So yes it is a change in behaviour whether it is a regression or an integrity 
> improvement is up for discussion. Either way I don’t consider it a blocker.
> 
> Bolke.
> 
>> On 15 Mar 2017, at 14:06, siddharth anand <san...@apache.org> wrote:
>> 
>> Here's the JIRA :
>> https://issues.apache.org/jira/browse/AIRFLOW-989
>> 
>> I confirmed it is a regression from 1.7.1.3, which I installed via pip and
>> tested against the same DAG in the JIRA.
>> 
>> The issue occurs if a leaf / last / terminal downstream task is not
>> cleared. You won't see this issue if you clear the entire DAG Run or clear
>> a task and all of its downstream tasks. If you truly want to only clear and
>> rerun a task, but not its downstream tasks, you can use the CLI to execute
>> a specific task (e.g. via airflow run).
>> 
>> This is a change in behavior -- if we do go ahead with the release, then
>> this JIRA should be in a list of JIRAs of known issues related to the new
>> version.
>> -s
>> 
>> On Wed, Mar 15, 2017 at 9:17 AM, Chris Riccomini <criccom...@apache.org>
>> wrote:
>> 
>>> @Sid, does this happen if you clear downstream as well?
>>> 
>>> On Wed, Mar 15, 2017 at 9:04 AM, Chris Riccomini <criccom...@apache.org>
>>> wrote:
>>> 
>>>> Has anyone been able to reproduce Sid's issue?
>>>> 
>>>> On Tue, Mar 14, 2017 at 11:17 PM, Bolke de Bruin <bdbr...@gmail.com>
>>>> wrote:
>>>> 
>>>>> That is not an airflow error, but a Kerberos error. Try executing the
>>>>> kinit command on the command line by yourself.
>>>>> 
>>>>> Bolke
>>>>> 
>>>>> Sent from my iPhone
>>>>> 
>>>>>> On 14 Mar 2017, at 23:11, Ruslan Dautkhanov <dautkha...@gmail.com>
>>>>> wrote:
>>>>>> 
>>>>>> `airflow kerberos` is broken in 1.8-rc5
>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-987
>>>>>> Hopefully fix can be part of the 1.8 release.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Ruslan Dautkhanov
>>>>>> 
>>>>>>> On Tue, Mar 14, 2017 at 6:19 PM, siddharth anand <san...@apache.org>
>>>>> wrote:
>>>>>>> 
>>>>>>> FYI,
>>>>>>> I've just hit a major bug in the release candidate related to "clear
>>>>> task"
>>>>>>> behavior.
>>>>>>> 
>>>>>>> I've been running airflow in both stage and prod since yesterday on
>>>>> rc5 and
>>>>>>> have reproduced this in both environments. I will file a JIRA for
>>> this
>>>>>>> tonight, but wanted to send a note over email as well.
>>>>>>> 
>>>>>>> In my example, I have a 2 task DAG. For a given DAG run that has
>>>>> completed
>>>>>>> successfully, if I
>>>>>>> 1) clear task2 (leaf task in this case), the previously-successful
>>> DAG
>>>>> Run
>>>>>>> goes back to Running, requeues, and executes the task successfully.
>>>>> The DAG
>>>>>>> Run then returns from Running to Success.
>>>>>>> 2) clear task1 (root task in this case), the previously-successful
>>> DAG
>>>>> Run
>>>>>>> goes back to Running, DOES NOT requeue or execute the task at all.
>>> The
>>>>> DAG
>>>>>>> Run then returns from Running to Success though it never ran the task.
>>>>>>> 
>>>>>>> 1) is expected and previous behavior. 2) is a regression.
>>>>>>> 
>>>>>>> The only workaround is to use the CLI to run the task cleared. Here
>>> are
>>>>>>> some images :
>&g

Re: Make Scheduler More Centralized

2017-03-15 Thread Bolke de Bruin
Hi Rui,

We have been discussing this during the hackathon at Airbnb as well. Besides 
the reservations Gerard is documenting, I am also not enthusiastic about this 
design. Currently, the scheduler is our main issue in scaling. Scheduler runs 
will take longer and longer with more DAGs and more complex DAGs (i.e. more 
tasks per DAG). Moving more things into the scheduler makes it harder to move 
them out again, which is required when we want to move to an event-driven / 
snowballing scheduler.

I would suggest documenting and enforcing the contracts between the scheduler, 
executor, and task instance. We are lax in that respect, and this is where a lot 
of issues stem from. The executor is also the weak point here, as it handles task 
states without acting on them. The points Gerard makes are very valid, and we 
should improve our assumptions about the underlying bus.

Cheers
Bolke

> On 14 Mar 2017, at 15:08, Rui Wang  wrote:
> 
> Hi,
> The design doc below I created is trying to make airflow scheduler more
> centralized. Briefly speaking, I propose moving state change of
> TaskInstance to scheduler. You can see the reasons for this change below.
> 
> 
> Could you take a look and comment if you see anything does not make sense?
> 
> -Rui
> 
> --
> Current
> The state of TaskInstance is changed by both the scheduler and the worker.
> On the worker side, the worker monitors the TaskInstance and changes its state
> to RUNNING, to SUCCESS if the task succeeds, or to UP_FOR_RETRY or FAILED if it
> fails. The worker also handles the failure email logic and failure callback logic.
> Proposal
> The general idea is to make a centralized scheduler and make workers dumb. A
> worker should not change the state of a TaskInstance, but just execute what it
> is assigned and report the result of the task. Instead, the scheduler should
> make the decision on TaskInstance state changes. Ideally, workers should not
> even handle the failure emails and callbacks unless the scheduler asks them to.
> Why
> The worker does not have as much information as the scheduler. Bugs have been
> observed where a worker gets into trouble but cannot decide to change the task
> state due to lack of information. Although there is the airflow metadata DB, it
> is still not easy to share all the information the scheduler has with workers.
> 
> We can also ensure a consistent environment. There are slight differences
> in the chef recipes for the different workers which can cause strange
> issues when DAGs parse on one but not the other.
> 
> In the meantime, moving state changes to the scheduler can reduce the
> complexity of airflow. It especially helps when airflow needs to move to
> distributed schedulers. In that case state change everywhere by both
> schedulers and workers are harder to maintain.
> How to change
> After lots of discussion, the following steps will be done:
> 
> 1. Add a new column to TaskInstance table. Worker will fill this column
> with the task process exit code.
> 
> 2. Worker will only set TaskInstance state to RUNNING when it is ready to
> run task. There was debate on moving RUNNING to scheduler as well. If
> moving RUNNING to scheduler, either scheduler marks TaskInstance RUNNING
> before it gets into queue, or scheduler checks the status code in column
> above, which is updated by the worker when it is ready to run the task. In the
> former case, from the user's perspective, it is bad to mark a TaskInstance as
> RUNNING when the worker is not ready to run it; the user could be confused. In
> the latter case, the scheduler could mark the task as RUNNING late due to the
> scheduling interval, which is still not a good user experience. Since only the
> worker knows when it is ready to run the task, the worker should still deliver
> this message to the user by setting the RUNNING state.
> 
> 3. In any other cases, worker should not change state of TaskInstance, but
> save defined status code into column above.
> 
> 4. Worker still handles failure emails and callbacks because there were
> concerns that the scheduler could use too many resources to run failure callbacks
> given unpredictable callback sizes. ( I think ideally scheduler should
> treat failure callbacks and emails as tasks, and assign such tasks to
> workers after TaskInstance state changes correspondingly). Eventually this
> logic will be moved to the workers once there is support for multiple
> distributed schedulers.
> 
> 5. In scheduler's loop, scheduler should check TaskInstance status code,
> then change state and retry/fail TaskInstance correspondingly.
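
Steps 1, 3, and 5 above can be sketched as a small model. This is illustrative only: the column name, exit codes, and state names here are assumptions, not the actual Airflow schema.

```python
# Sketch of the proposal: the worker records only an exit code, and the
# scheduler loop translates exit codes into task states on its next pass.

EXIT_OK = 0

def scheduler_pass(task_instances, max_tries=3):
    """Scheduler-side state transitions driven by the worker's exit code."""
    for ti in task_instances:
        code = ti["exit_code"]
        if code is None:              # worker has not reported yet
            continue
        if code == EXIT_OK:
            ti["state"] = "success"
        elif ti["try_number"] < max_tries:
            ti["state"] = "up_for_retry"
            ti["try_number"] += 1
        else:
            ti["state"] = "failed"
    return task_instances

run = [
    {"task_id": "a", "exit_code": 0,    "try_number": 1, "state": "running"},
    {"task_id": "b", "exit_code": 1,    "try_number": 3, "state": "running"},
    {"task_id": "c", "exit_code": None, "try_number": 1, "state": "queued"},
]
scheduler_pass(run)
print([ti["state"] for ti in run])  # -> ['success', 'failed', 'queued']
```

The point of the sketch is that the worker never touches `state` directly; only the scheduler interprets the recorded exit code.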



Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-15 Thread Bolke de Bruin
FYI: When all root tasks (i.e. the last tasks to run) have succeeded the DagRun 
is considered successful and the scheduler will not consider any other tasks in 
the dag run. The code is here: 
https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L4095 
for version 1.8, and hasn’t changed significantly since 1.7: 
https://github.com/apache/incubator-airflow/blob/airbnb_rb1.7.1_4/airflow/models.py#L2678
 . As 1.7 is more task based and 1.8 more dag run based the behaviour between 
1.7 and 1.8 might be different (Sid is investigating).
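
The root-task rule described above can be sketched as follows. This is an illustrative model, not Airflow's actual code; the function and state names are assumptions.

```python
# Hypothetical sketch of the DagRun evaluation described above: the run's
# final state is decided solely from its root (terminal) tasks, so a cleared
# upstream task is never reconsidered once the roots report success.

def evaluate_dagrun_state(task_states, root_task_ids):
    """Return 'success', 'failed', or 'running' from root-task states only."""
    roots = [task_states[t] for t in root_task_ids]
    if all(s == "success" for s in roots):
        return "success"      # non-root task states are never inspected
    if any(s == "failed" for s in roots):
        return "failed"
    return "running"

# Clearing only the upstream task1 (its state becomes None) leaves the root
# task2 at 'success', so the run flips straight back to 'success' without
# ever rerunning task1 -- the behaviour reported in AIRFLOW-989.
states = {"task1": None, "task2": "success"}
print(evaluate_dagrun_state(states, ["task2"]))  # -> success
```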

Thus, to make sure the tasks run, you will need to clear the root task; downstream 
clearing will definitely work. I’m not sure this is a change in behaviour, as 
explained above. As most use cases will clear downstream (it’s the default) and a 
workaround is available, I don’t consider it a blocker.

- Bolke.

> On 15 Mar 2017, at 09:17, Chris Riccomini <criccom...@apache.org> wrote:
> 
> @Sid, does this happen if you clear downstream as well?
> 
> On Wed, Mar 15, 2017 at 9:04 AM, Chris Riccomini <criccom...@apache.org>
> wrote:
> 
>> Has anyone been able to reproduce Sid's issue?
>> 
>> On Tue, Mar 14, 2017 at 11:17 PM, Bolke de Bruin <bdbr...@gmail.com>
>> wrote:
>> 
>>> That is not an airflow error, but a Kerberos error. Try executing the
>>> kinit command on the command line by yourself.
>>> 
>>> Bolke
>>> 
>>> Sent from my iPhone
>>> 
>>>> On 14 Mar 2017, at 23:11, Ruslan Dautkhanov <dautkha...@gmail.com>
>>> wrote:
>>>> 
>>>> `airflow kerberos` is broken in 1.8-rc5
>>>> https://issues.apache.org/jira/browse/AIRFLOW-987
>>>> Hopefully fix can be part of the 1.8 release.
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Ruslan Dautkhanov
>>>> 
>>>>> On Tue, Mar 14, 2017 at 6:19 PM, siddharth anand <san...@apache.org>
>>> wrote:
>>>>> 
>>>>> FYI,
>>>>> I've just hit a major bug in the release candidate related to "clear
>>> task"
>>>>> behavior.
>>>>> 
>>>>> I've been running airflow in both stage and prod since yesterday on
>>> rc5 and
>>>>> have reproduced this in both environments. I will file a JIRA for this
>>>>> tonight, but wanted to send a note over email as well.
>>>>> 
>>>>> In my example, I have a 2 task DAG. For a given DAG run that has
>>> completed
>>>>> successfully, if I
>>>>> 1) clear task2 (leaf task in this case), the previously-successful DAG
>>> Run
>>>>> goes back to Running, requeues, and executes the task successfully.
>>> The DAG
>>>>> Run then returns from Running to Success.
>>>>> 2) clear task1 (root task in this case), the previously-successful DAG
>>> Run
>>>>> goes back to Running, DOES NOT requeue or execute the task at all. The
>>> DAG
>>>>> Run then returns from Running to Success though it never ran the task.
>>>>> 
>>>>> 1) is expected and previous behavior. 2) is a regression.
>>>>> 
>>>>> The only workaround is to use the CLI to run the task cleared. Here are
>>>>> some images :
>>>>> *After Clearing the Tasks*
>>>>> https://www.dropbox.com/s/wmuxt0krwx6wurr/Screenshot%
>>>>> 202017-03-14%2014.09.34.png?dl=0
>>>>> 
>>>>> *After DAG Runs return to Success*
>>>>> https://www.dropbox.com/s/qop933rzgdzchpd/Screenshot%
>>>>> 202017-03-14%2014.09.49.png?dl=0
>>>>> 
>>>>> This is a major regression because it will force everyone to use the
>>> CLI
>>>>> for things that they would normally use the UI for.
>>>>> 
>>>>> -s
>>>>> 
>>>>> 
>>>>> -s
>>>>> 
>>>>> 
>>>>>> On Tue, Mar 14, 2017 at 1:32 PM, Daniel Huang <dxhu...@gmail.com>
>>> wrote:
>>>>>> 
>>>>>> +1 (non-binding)!
>>>>>> 
>>>>>> On Tue, Mar 14, 2017 at 11:35 AM, siddharth anand <san...@apache.org>
>>>>>> wrote:
>>>>>> 
>>>>>>> +1 (binding)
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Mar 14, 2017 at 8:42 AM, Maxime Beauchemin <
>>>>>>> maximebeauche...@gmail.com> wrote:
>>&

Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-15 Thread Bolke de Bruin
I have asked Sid to create a JIRA and to make the issue reproducible. Nevertheless, 
I do not consider it a blocker, as a workaround exists and it is relatively 
small in scope (though slightly annoying, I understand).

Let’s get 1.8 out and do bug fixes in 1.8.1. More bugs will inevitably pop up 
:).

- Bolke

> On 15 Mar 2017, at 09:04, Chris Riccomini <criccom...@apache.org> wrote:
> 
> Has anyone been able to reproduce Sid's issue?
> 
> On Tue, Mar 14, 2017 at 11:17 PM, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
>> That is not an airflow error, but a Kerberos error. Try executing the
>> kinit command on the command line by yourself.
>> 
>> Bolke
>> 
>> Sent from my iPhone
>> 
>>> On 14 Mar 2017, at 23:11, Ruslan Dautkhanov <dautkha...@gmail.com>
>> wrote:
>>> 
>>> `airflow kerberos` is broken in 1.8-rc5
>>> https://issues.apache.org/jira/browse/AIRFLOW-987
>>> Hopefully fix can be part of the 1.8 release.
>>> 
>>> 
>>> 
>>> --
>>> Ruslan Dautkhanov
>>> 
>>>> On Tue, Mar 14, 2017 at 6:19 PM, siddharth anand <san...@apache.org>
>> wrote:
>>>> 
>>>> FYI,
>>>> I've just hit a major bug in the release candidate related to "clear
>> task"
>>>> behavior.
>>>> 
>>>> I've been running airflow in both stage and prod since yesterday on rc5
>> and
>>>> have reproduced this in both environments. I will file a JIRA for this
>>>> tonight, but wanted to send a note over email as well.
>>>> 
>>>> In my example, I have a 2 task DAG. For a given DAG run that has
>> completed
>>>> successfully, if I
>>>> 1) clear task2 (leaf task in this case), the previously-successful DAG
>> Run
>>>> goes back to Running, requeues, and executes the task successfully. The
>> DAG
>>>> Run then returns from Running to Success.
>>>> 2) clear task1 (root task in this case), the previously-successful DAG
>> Run
>>>> goes back to Running, DOES NOT requeue or execute the task at all. The
>> DAG
>>>> Run then returns from Running to Success though it never ran the task.
>>>> 
>>>> 1) is expected and previous behavior. 2) is a regression.
>>>> 
>>>> The only workaround is to use the CLI to run the task cleared. Here are
>>>> some images :
>>>> *After Clearing the Tasks*
>>>> https://www.dropbox.com/s/wmuxt0krwx6wurr/Screenshot%
>>>> 202017-03-14%2014.09.34.png?dl=0
>>>> 
>>>> *After DAG Runs return to Success*
>>>> https://www.dropbox.com/s/qop933rzgdzchpd/Screenshot%
>>>> 202017-03-14%2014.09.49.png?dl=0
>>>> 
>>>> This is a major regression because it will force everyone to use the CLI
>>>> for things that they would normally use the UI for.
>>>> 
>>>> -s
>>>> 
>>>> 
>>>> -s
>>>> 
>>>> 
>>>>> On Tue, Mar 14, 2017 at 1:32 PM, Daniel Huang <dxhu...@gmail.com>
>> wrote:
>>>>> 
>>>>> +1 (non-binding)!
>>>>> 
>>>>> On Tue, Mar 14, 2017 at 11:35 AM, siddharth anand <san...@apache.org>
>>>>> wrote:
>>>>> 
>>>>>> +1 (binding)
>>>>>> 
>>>>>> 
>>>>>> On Tue, Mar 14, 2017 at 8:42 AM, Maxime Beauchemin <
>>>>>> maximebeauche...@gmail.com> wrote:
>>>>>> 
>>>>>>> +1 (binding)
>>>>>>> 
>>>>>>> On Tue, Mar 14, 2017 at 3:59 AM, Alex Van Boxel <a...@vanboxel.be>
>>>>>> wrote:
>>>>>>> 
>>>>>>>> +1 (binding)
>>>>>>>> 
>>>>>>>> Note: we had to revert all our ONE_SUCCESS with ALL_SUCCESS trigger
>>>>>> rules
>>>>>>>> where the parent nodes were joining with a SKIP. But I kind of should
>>>>>>>> have known this was coming. Apart from that I had a successful run last
>>>>>>>> night.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, Mar 14, 2017 at 1:37 AM siddharth anand <san...@apache.org
>>>>> 
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> I'm going to deploy this to staging now. Fab work Bolk

Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-15 Thread Bolke de Bruin
That is not an airflow error, but a Kerberos error. Try executing the kinit 
command on the command line by yourself. 

Bolke

Sent from my iPhone

> On 14 Mar 2017, at 23:11, Ruslan Dautkhanov <dautkha...@gmail.com> wrote:
> 
> `airflow kerberos` is broken in 1.8-rc5
> https://issues.apache.org/jira/browse/AIRFLOW-987
> Hopefully fix can be part of the 1.8 release.
> 
> 
> 
> -- 
> Ruslan Dautkhanov
> 
>> On Tue, Mar 14, 2017 at 6:19 PM, siddharth anand <san...@apache.org> wrote:
>> 
>> FYI,
>> I've just hit a major bug in the release candidate related to "clear task"
>> behavior.
>> 
>> I've been running airflow in both stage and prod since yesterday on rc5 and
>> have reproduced this in both environments. I will file a JIRA for this
>> tonight, but wanted to send a note over email as well.
>> 
>> In my example, I have a 2 task DAG. For a given DAG run that has completed
>> successfully, if I
>> 1) clear task2 (leaf task in this case), the previously-successful DAG Run
>> goes back to Running, requeues, and executes the task successfully. The DAG
>> Run then returns from Running to Success.
>> 2) clear task1 (root task in this case), the previously-successful DAG Run
>> goes back to Running, DOES NOT requeue or execute the task at all. The DAG
>> Run then returns from Running to Success though it never ran the task.
>> 
>> 1) is expected and previous behavior. 2) is a regression.
>> 
>> The only workaround is to use the CLI to run the task cleared. Here are
>> some images :
>> *After Clearing the Tasks*
>> https://www.dropbox.com/s/wmuxt0krwx6wurr/Screenshot%
>> 202017-03-14%2014.09.34.png?dl=0
>> 
>> *After DAG Runs return to Success*
>> https://www.dropbox.com/s/qop933rzgdzchpd/Screenshot%
>> 202017-03-14%2014.09.49.png?dl=0
>> 
>> This is a major regression because it will force everyone to use the CLI
>> for things that they would normally use the UI for.
>> 
>> -s
>> 
>> 
>> -s
>> 
>> 
>>> On Tue, Mar 14, 2017 at 1:32 PM, Daniel Huang <dxhu...@gmail.com> wrote:
>>> 
>>> +1 (non-binding)!
>>> 
>>> On Tue, Mar 14, 2017 at 11:35 AM, siddharth anand <san...@apache.org>
>>> wrote:
>>> 
>>>> +1 (binding)
>>>> 
>>>> 
>>>> On Tue, Mar 14, 2017 at 8:42 AM, Maxime Beauchemin <
>>>> maximebeauche...@gmail.com> wrote:
>>>> 
>>>>> +1 (binding)
>>>>> 
>>>>> On Tue, Mar 14, 2017 at 3:59 AM, Alex Van Boxel <a...@vanboxel.be>
>>>> wrote:
>>>>> 
>>>>>> +1 (binding)
>>>>>> 
>>>>>> Note: we had to revert all our ONE_SUCCESS with ALL_SUCCESS trigger
>>>> rules
>>>>>> where the parent nodes were joining with a SKIP. But I kind of should
>>>>>> have known this was coming. Apart from that I had a successful run last
>>>>>> night.
>>>>>> 
>>>>>> 
>>>>>> On Tue, Mar 14, 2017 at 1:37 AM siddharth anand <san...@apache.org
>>> 
>>>>> wrote:
>>>>>> 
>>>>>> I'm going to deploy this to staging now. Fab work Bolke!
>>>>>> -s
>>>>>> 
>>>>>> On Mon, Mar 13, 2017 at 2:16 PM, Dan Davydov <
>> dan.davy...@airbnb.com
>>> .
>>>>>> invalid
>>>>>>> wrote:
>>>>>> 
>>>>>>> I'll test this on staging as soon as I get a chance (the testing
>> is
>>>>>>> non-blocking on the rc5). Bolke very much in particular :).
>>>>>>> 
>>>>>>> On Mon, Mar 13, 2017 at 10:46 AM, Jeremiah Lowin <
>>> jlo...@apache.org>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> +1 (binding) extremely impressed by the work and diligence all
>>>>>>> contributors
>>>>>>>> have put in to getting these blockers fixed, Bolke in
>> particular.
>>>>>>>> 
>>>>>>>> On Mon, Mar 13, 2017 at 1:07 AM Arthur Wiedmer <
>>> art...@apache.org>
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> +1 (binding)
>>>>>>>>> 
>>>>>>>>> Thanks again for steering us through Bolke.
>>>>>>>>> 
>>>>&g

[VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-12 Thread Bolke de Bruin
Dear All,

Finally, I have been able to make the FIFTH RELEASE CANDIDATE of Airflow 1.8.0 
available at https://dist.apache.org/repos/dist/dev/incubator/airflow/ ; public 
keys are available at https://dist.apache.org/repos/dist/release/incubator/airflow/ . 
It is tagged with a local version “apache.incubating” so it allows upgrading from 
earlier releases.

Issues fixed since rc4:

[AIRFLOW-900] Double trigger should not kill original task instance
[AIRFLOW-900] Fixes bugs in LocalTaskJob for double run protection
[AIRFLOW-932] Do not mark tasks removed when backfilling
[AIRFLOW-961] run onkill when SIGTERMed
[AIRFLOW-910] Use parallel task execution for backfills
[AIRFLOW-967] Wrap strings in native for py2 ldap compatibility
[AIRFLOW-941] Use defined parameters for psycopg2
[AIRFLOW-719] Prevent DAGs from ending prematurely
[AIRFLOW-938] Use test for True in task_stats queries
[AIRFLOW-937] Improve performance of task_stats
[AIRFLOW-933] Use ast.literal_eval rather than eval because ast.literal_eval does 
not execute input.
[AIRFLOW-919] Running tasks with no start date shouldn't break a DAGs UI
[AIRFLOW-897] Prevent dagruns from failing with unfinished tasks
[AIRFLOW-861] make pickle_info endpoint be login_required
[AIRFLOW-853] use utf8 encoding for stdout line decode
[AIRFLOW-856] Make sure execution date is set for local client
[AIRFLOW-830][AIRFLOW-829][AIRFLOW-88] Reduce Travis log verbosity
[AIRFLOW-794] Access DAGS_FOLDER and SQL_ALCHEMY_CONN exclusively from settings
[AIRFLOW-694] Fix config behaviour for empty envvar
[AIRFLOW-365] Set dag.fileloc explicitly and use for Code view
[AIRFLOW-931] Do not set QUEUED in TaskInstances
[AIRFLOW-899] Tasks in SCHEDULED state should be white in the UI instead of 
black
[AIRFLOW-895] Address Apache release incompliancies
[AIRFLOW-893][AIRFLOW-510] Fix crashing webservers when a dagrun has no start 
date
[AIRFLOW-793] Enable compressed loading in S3ToHiveTransfer
[AIRFLOW-863] Example DAGs should have recent start dates
[AIRFLOW-869] Refactor mark success functionality
[AIRFLOW-856] Make sure execution date is set for local client
[AIRFLOW-814] Fix Presto*CheckOperator.__init__
[AIRFLOW-844] Fix cgroups directory creation
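
As a side note on the AIRFLOW-933 item above, the point of ast.literal_eval is that it parses literals without executing code. A minimal standalone illustration (not Airflow code):

```python
# ast.literal_eval parses Python literals but refuses to execute expressions,
# which is why it is the safe replacement for eval noted in AIRFLOW-933.
import ast

print(ast.literal_eval("{'pool': 'default', 'retries': 2}"))
# -> {'pool': 'default', 'retries': 2}

# eval() would happily run this; literal_eval raises ValueError instead.
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    print("rejected")  # -> rejected
```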

No known issues anymore.

I would also like to raise a VOTE for releasing 1.8.0 based on release 
candidate 5, i.e. just renaming release candidate 5 to 1.8.0 release. 

Please respond to this email by:

+1,0,-1 with *binding* if you are a PMC member or *non-binding* if you are not.

Thanks!
Bolke

My VOTE: +1 (binding)

Re: Proposal to simplify start/end dates

2017-03-07 Thread Bolke de Bruin
Ok, sounds good. What do you do with a dag that gets predated and has an existing 
dag run? What happens if the interval changes, i.e. with non-cron syntax?

(Just thinking out loud)

B. 

Sent from my iPhone

> On 7 Mar 2017, at 22:27, Dan Davydov <dan.davy...@airbnb.com.INVALID> wrote:
> 
> Sure thing.
> 
> Current Behavior:
> - User creates DAG with default_args start date to 2015
> - dagrun gets kicked off for 2015
> - User changes default_args start date to 2016
> - dagruns continue running for 2015
> 
> New Behavior:
> - User creates DAG with default_args start date to 2015
> - dagrun gets kicked off for 2015
> - User changes default_args start date to 2016
> - *dagruns start running for the 2016 start date instead of 2015*
> 
>> On Tue, Mar 7, 2017 at 11:49 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:
>> 
>> Hey Dan,
>> 
>> I'm not sure if I am seeing a difference for #1 vs now, except you are
>> excluding backfills now from the calculation? Can you provide an example?
>> 
>> Bolke
>> 
>>> On 7 Mar 2017, at 20:38, Dan Davydov <dan.davy...@airbnb.com.INVALID>
>> wrote:
>>> 
>>> A very common source of confusion for our users is when they specify
>>> start_date in default_args but not in their DAG arguments and then try to
>>> change this start_date to move the execution of their DAG forward (e.g.
>>> from 2015 to 2016). This doesn't work because the logic that is used to
>>> calculate the "initial" start date of a dag differs from the logic to
>>> calculate subsequent dagrun start dates.
>>> 
>>> Current Airflow Logic:
>>> DS to schedule initial dagrun: dag.start_date if it exists, else
>> min(start
>>> date of tasks_of_dag)
>>> DS to schedule subsequent dagruns: last_dagrun + scheduled_interval
>>> 
>>> There are a couple ways of addressing this:
>>> 1. Change the definition of start date for subsequent dagruns to match
>> the
>>> "initial" dagrun start date (calculated from the minimum of task start
>>> dates)
>>> 2. Force explicit dag start dates
>>> 
>>> I personally like 1.
>>> 
>>> I also propose that we throw errors for DAGs that have tasks that depend
>> on
>>> other tasks with start dates that occur after theirs (otherwise there
>> could
>>> be deadlocks).
>>> 
>>> What do people think?
>> 
>> 


Re: Proposal to simplify start/end dates

2017-03-07 Thread Bolke de Bruin
Hey Dan,

I'm not sure if I am seeing a difference for #1 vs now, except you are excluding 
backfills now from the calculation? Can you provide an example?

Bolke

> On 7 Mar 2017, at 20:38, Dan Davydov  wrote:
> 
> A very common source of confusion for our users is when they specify
> start_date in default_args but not in their DAG arguments and then try to
> change this start_date to move the execution of their DAG forward (e.g.
> from 2015 to 2016). This doesn't work because the logic that is used to
> calculate the "initial" start date of a dag differs from the logic to
> calculate subsequent dagrun start dates.
> 
> Current Airflow Logic:
> DS to schedule initial dagrun: dag.start_date if it exists, else min(start
> date of tasks_of_dag)
> DS to schedule subsequent dagruns: last_dagrun + scheduled_interval
> 
> There are a couple ways of addressing this:
> 1. Change the definition of start date for subsequent dagruns to match the
> "initial" dagrun start date (calculated from the minimum of task start
> dates)
> 2. Force explicit dag start dates
> 
> I personally like 1.
> 
> I also propose that we throw errors for DAGs that have tasks that depend on
> other tasks with start dates that occur after theirs (otherwise there could
> be deadlocks).
> 
> What do people think?
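
The two rules quoted above can be made concrete with a small model. Dates and function names are illustrative; this is not Airflow's actual implementation.

```python
# Minimal model of the two scheduling rules in the proposal: the initial
# dagrun consults start dates, while every later dagrun only consults the
# previous run -- which is why editing default_args start_date has no effect.
from datetime import datetime, timedelta

def initial_run_date(dag_start, task_start_dates):
    # initial dagrun: dag.start_date if set, else min(task start dates)
    return dag_start if dag_start is not None else min(task_start_dates)

def next_run_date(last_dagrun, schedule_interval):
    # subsequent dagruns: last run plus the interval; start dates ignored
    return last_dagrun + schedule_interval

first = initial_run_date(None, [datetime(2015, 1, 1)])
# The user later moves the default_args start_date to 2016: no effect,
# because the next run is derived only from the previous one.
print(next_run_date(first, timedelta(days=1)))  # -> 2015-01-02 00:00:00
```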



Re: Help needed: Travis builds failing - psycopg2 and ldap3

2017-03-07 Thread Bolke de Bruin
More info: 

If you do a

from future.utils import native

and use native(dn), it will get you further (it will get a connection). But the 
tests fail a bit further down, where I assume we see more py2/py3 
conversion problems. The issue is that we get a “newstr” in place of bytes for the 
dn, and probably other variables suffer from the same issue.
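
The text-type coercion being discussed can be illustrated with a standalone sketch. No LDAP is involved here; the DN value and helper name are made up.

```python
# Models the py2/py3 text-type coercion workaround discussed above:
# force DN-like values down to the plain text type before handing them
# to a library that checks for the native str.

def to_native_text(value):
    """Coerce bytes (or str subclasses such as future's newstr) to plain str."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return str(value)  # str() collapses str subclasses to the native type

dn = b"uid=airflow,ou=people,dc=example,dc=com"
print(to_native_text(dn))  # -> uid=airflow,ou=people,dc=example,dc=com
```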

Cheers
Bolke

> On 6 Mar 2017, at 22:43, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
> 
> Hint: It might be required to do a 'decode(“utf-8”)’ to make it pass.
> 
> See the discussion on: https://github.com/cannatag/ldap3/issues/305
> 
> Bolke
> 
> 
>> On 6 Mar 2017, at 08:00, Bolke de Bruin <bdbr...@gmail.com> wrote:
>> 
>> Thanks! Much appreciated.
>> 
>>> On 5 Mar 2017, at 22:23, Jayesh Senjaliya <jhsonl...@gmail.com> wrote:
>>> 
>>> Hi Bolke,
>>> 
>>> I can help with ldap issue.
>>> 
>>> I have faced same issue while trying to integrate with ldap earlier, and I
>>> tracked down to the same get_connection function you mention in the ticket.
>>> I think it's really due to the difference between the implementations of the
>>> python 2 vs python 3 lib.
>>> 
>>> anyway, I will look into it (from this Tuesday) to see what kind of fix we
>>> can put on the Airflow side.
>>> 
>>> Thanks
>>> Jayesh
>>> 
>>> 
>>> On Sun, Mar 5, 2017 at 11:24 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:
>>> 
>>>> Hi Folks,
>>>> 
>>>> Our Travis builds have been failing for quite some time. This is due to 2
>>>> issues:
>>>> 
>>>> 1. Psycopg2 2.7.0 has a small regression that treats “None” as a string. I
>>>> have a PR out that works around the issue: https://github.com/apache/
>>>> incubator-airflow/pull/2126 . This needs a review.
>>>> 
>>>> 2. The python 2 builds are failing with ldap errors, while the same config
>>>> passes on python 3. Honestly I am a bit at a loss here and I raised the
>>>> issue with Ldap3 itself (https://github.com/cannatag/ldap3/issues/305),
>>>> but it might not even be their issue. Trouble is I cannot reproduce it
>>>> locally in a Jupyter notebook.
>>>> 
>>>> I really need some help with #2, because debugging has been a challenge
>>>> and for every change we try to make to get us to release I have to verify
>>>> the output of the builds to see if the ‘right’  tests failed.
>>>> 
>>>> Can someone please take a look?
>>>> 
>>>> Cheers
>>>> Bolke
>>>> 
>>>> 
>>>> 
>> 
> 



Re: Help needed: Travis builds failing - psycopg2 and ldap3

2017-03-06 Thread Bolke de Bruin

Hint: It might be required to do a 'decode(“utf-8”)’ to make it pass.

See the discussion on: https://github.com/cannatag/ldap3/issues/305

Bolke


> On 6 Mar 2017, at 08:00, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
> Thanks! Much appreciated.
> 
>> On 5 Mar 2017, at 22:23, Jayesh Senjaliya <jhsonl...@gmail.com> wrote:
>> 
>> Hi Bolke,
>> 
>> I can help with ldap issue.
>> 
>> I have faced same issue while trying to integrate with ldap earlier, and I
>> tracked down to the same get_connection function you mention in the ticket.
>> I think it's really due to the difference between the implementations of the
>> python 2 vs python 3 lib.
>> 
>> anyway, I will look into it (from this Tuesday) to see what kind of fix we
>> can put on the Airflow side.
>> 
>> Thanks
>> Jayesh
>> 
>> 
>> On Sun, Mar 5, 2017 at 11:24 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:
>> 
>>> Hi Folks,
>>> 
>>> Our Travis builds have been failing for quite some time. This is due to 2
>>> issues:
>>> 
>>> 1. Psycopg2 2.7.0 has a small regression that treats “None” as a string. I
>>> have a PR out that works around the issue: https://github.com/apache/
>>> incubator-airflow/pull/2126 . This needs a review.
>>> 
>>> 2. The python 2 builds are failing with ldap errors, while the same config
>>> passes on python 3. Honestly I am a bit at a loss here and I raised the
>>> issue with Ldap3 itself (https://github.com/cannatag/ldap3/issues/305),
>>> but it might not even be their issue. Trouble is I cannot reproduce it
>>> locally in a Jupyter notebook.
>>> 
>>> I really need some help with #2, because debugging has been a challenge
>>> and for every change we try to make to get us to release I have to verify
>>> the output of the builds to see if the ‘right’  tests failed.
>>> 
>>> Can someone please take a look?
>>> 
>>> Cheers
>>> Bolke
>>> 
>>> 
>>> 
> 



Update status on getting to RC5 - reviewers wanted

2017-03-06 Thread Bolke de Bruin
Hi,

Just wanted to do a short update on RC5 status. We had 8 (!) blockers, but the 
good news is they are either fixed or a patch is available. The patches are in 
need of a review, so it would be appreciated if some of the committers can make 
some time available to do so. Please note that while reviewing, build errors 
are expected due to ldap3 and psycopg2 lib issues. In case no build is 
available due to throttling on Travis, please check on the user’s 
GitHub/Travis. Usually, it is available there.

To be reviewed:

[AIRFLOW-931] Do not set QUEUED in TaskInstances 
 
(https://github.com/apache/incubator-airflow/pull/2127). This fixes tasks 
getting stuck when concurrency has been reached. Might also solve sporadic 
double triggering of tasks.
[AIRFLOW-941] Use defined parameters for psycopg2 
 
(https://github.com/apache/incubator-airflow/pull/2126). Work around regression 
in psycopg2 2.7.0
[AIRFLOW-932] Do not mark tasks removed when backfilling 
 
(https://github.com/apache/incubator-airflow/pull/2122). Make sure not to 
remove tasks when only seeing a subset of the tasks of a DAG.
[AIRFLOW-910] Use parallel task execution in backfills 
 
(https://github.com/apache/incubator-airflow/pull/2107). Make task execution 
concurrent in backfills and make runs deterministic, removing sporadic deadlocks.
[AIRFLOW-900] Double triggered task should not kill original task 
 
(https://github.com/apache/incubator-airflow/pull/2102). Kill the right task 
when we are double triggering.
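
The psycopg2 item in the list above amounts to passing only the parameters that are actually defined, so the 2.7.0 regression never sees a None treated as the literal string "None". A minimal sketch of that idea (function and parameter names here are illustrative, not the actual PR code):

```python
# Sketch of the "use defined parameters" workaround: instead of passing every
# connection argument (where a None can end up handled as the string "None"
# by psycopg2 2.7.0), build the kwargs dict only from parameters that are set.
# build_connect_kwargs is a made-up helper for this example.

def build_connect_kwargs(host=None, port=None, user=None,
                         password=None, dbname=None):
    """Keep only explicitly defined parameters for psycopg2.connect()."""
    candidates = {
        "host": host, "port": port, "user": user,
        "password": password, "dbname": dbname,
    }
    return {k: v for k, v in candidates.items() if v is not None}

kwargs = build_connect_kwargs(host="localhost", dbname="airflow")
assert kwargs == {"host": "localhost", "dbname": "airflow"}
# psycopg2.connect(**kwargs) would then never receive a None value.
```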

Fixed:
* Skipped operations make DAG finish prematurely
* Setting a task to running manually breaks a DAGs UI
* (Named)HivePartitionSensor broken if hook attr not set
* Can't mark non-existent tasks as successful from graph view


Cheers!
Bolke

Re: Help needed: Travis builds failing - psycopg2 and ldap3

2017-03-05 Thread Bolke de Bruin
Thanks! Much appreciated.

> On 5 Mar 2017, at 22:23, Jayesh Senjaliya <jhsonl...@gmail.com> wrote:
> 
> Hi Bolke,
> 
> I can help with ldap issue.
> 
> I have faced same issue while trying to integrate with ldap earlier, and I
> tracked down to the same get_connection function you mention in the ticket.
> I think it's really due to the difference between the Python 2 and
> Python 3 implementations of the lib.
> 
> anyway, I will look into it ( from this Tuesday) to see what kind of fix we
> can put on Airflow side.
> 
> Thanks
> Jayesh
> 
> 
> On Sun, Mar 5, 2017 at 11:24 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
>> Hi Folks,
>> 
>> Our Travis builds have been failing for quite some time. This is due to 2
>> issues:
>> 
>> 1. Psycopg2 2.7.0 has a small regression that treats "None" as a string. I
>> have a PR out that works around the issue: https://github.com/apache/
>> incubator-airflow/pull/2126 . This needs a review.
>> 
>> 2. The python 2 builds are failing with ldap errors, while the same config
>> passes on python 3. Honestly I am a bit at a loss here and I raised the
>> issue with Ldap3 itself (https://github.com/cannatag/ldap3/issues/305),
>> but it might not even be their issue. Trouble is I cannot reproduce it
>> locally in a Jupyter notebook.
>> 
>> I really need some help with #2, because debugging has been a challenge
>> and for every change we try to make to get us to release I have to verify
>> the output of the builds to see if the ‘right’  tests failed.
>> 
>> Can someone please take a look?
>> 
>> Cheers
>> Bolke
>> 
>> 
>> 



Re: Airflow running different with different user id ?

2017-03-03 Thread Bolke de Bruin
Nice management of expectations ;-). 

Sent from my iPhone

> On 3 Mar 2017, at 21:44, Dan Davydov  wrote:
> 
> Within a couple of weeks.
> 
>> On Fri, Mar 3, 2017 at 12:34 PM, Michael Gong  wrote:
>> 
>> When approximately will it be released?
>> 
>> Sent from my PP•KING™ smartphone
>> 
>> On Mar 3, 2017 1:42 PM, Dan Davydov 
>> wrote:
>> Yes, it is starting with 1.8.0 which will be released soon, you can look in
>> the documentation/grep for "run_as".
>> 
>>> On Mar 3, 2017 8:50 AM, "Michael Gong"  wrote:
>>> 
>>> Hi,
>>> 
>>> 
>>> Suppose I have 1 airflow instance running 2 different DAGs, is it
>> possible
>>> to specify the 2 DAGs running under 2 different ids ?
>>> 
>>> 
>>> Any advice is welcome.
>>> 
>>> 
>>> Thanks.
>>> 
>>> Michael
>>> 
>>> 
>>> 
>>> 
>>> 
>> 


Re: Getting to RC5: Update

2017-03-01 Thread Bolke de Bruin
Please create a Jira and provide context when this happens. A task instance 
marked “REMOVED” means it no longer has a task equivalent in the DAG (or 
so it should :)).
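
A toy illustration of what the REMOVED state expresses (not Airflow's actual implementation): compare the task ids recorded for a dag run against the tasks currently defined in the DAG, and flag the ones with no equivalent.

```python
# Toy model only: a task instance stored in the database whose task_id no
# longer exists in the current DAG definition is the kind of instance that
# gets marked REMOVED. find_removed is a made-up helper for this example.

def find_removed(task_instance_ids, dag_task_ids):
    """Return task instance ids that have no task equivalent in the DAG."""
    return sorted(set(task_instance_ids) - set(dag_task_ids))

db_task_instances = ["extract", "transform", "load", "old_cleanup"]
current_dag_tasks = ["extract", "transform", "load"]
assert find_removed(db_task_instances, current_dag_tasks) == ["old_cleanup"]
```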

Bolke

> On 1 Mar 2017, at 19:55, Dan Davydov <dan.davy...@airbnb.com.INVALID> wrote:
> 
> We are seeing another major issue with backfills where task instances are
> being deleted and marked as "removed", I am still investigating. Let's keep
> discussion about these in https://issues.apache.org/jira/browse/AIRFLOW-921
> and the subtask comments to have it one place. I will look at the other
> points you cc'd me on too. Thanks for continuing to drive this forward!
> 
> On Wed, Mar 1, 2017 at 8:22 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
>> Hi,
>> 
>> Just wanted to give an update about the progress getting to RC5. As
>> reported we have 6 blockers listed.
>> 
>> 1. Double run job should not terminate the existing running job. -> Patch
>> Available
>> 2. Parallelize dag runs in backfills -> Patch Available, Tests need to be
>> updated, see below
>> 3. Setting a task to running manually breaks a DAGs UI -> Patch merged
>> 4. Can't mark non-existent tasks as successful from graph view ->
>> Workaround available (t.b.c.), Patch Available unit tests to be added
>> 5. (Named)HivePartitionSensor broken if hook attr not set -> Patch merged
>> 6. Skipped tasks potentially cause a dagrun to be marked as
>> failure/success prematurely -> see below
>> 
>> On 2 I would like to have some more discussion on whether this would be acceptable
>> (https://github.com/apache/incubator-airflow/pull/2107). I have written
>> the patch for this, however we are not large backfill users. So I need
>> feedback specifically on ripping out the “executor” part: @dan, @max.
>> 
>> On 6 Alex has reported this earlier and written a PR for this (
>> https://github.com/apache/incubator-airflow/pull/1961). Maxime had some
>> thoughts about this, which are currently blocking the integration. However,
>> in testing it seems to solve the issue. Can we finalise the discussion
>> please @max @dan @alex?
>> 
>> Cheers
>> Bolke



Re: [RESULT] [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4

2017-02-28 Thread Bolke de Bruin
Gotcha:

It works, but slightly differently. 

If you added the subdag, do not zoom in, but click on the subdag in the main 
dag. Use mark success there. It will then allow you to mark all tasks 
successful that are part of the subdag. 

Do we still consider this a blocker? Imho, no, as a workaround seems to exist. 

- Bolke

> On 27 Feb 2017, at 23:19, Dan Davydov <dan.davy...@airbnb.com.INVALID> wrote:
> 
> rc + your patch (and a couple of our own custom ones)
> 
> On Mon, Feb 27, 2017 at 2:11 PM, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
>> Dan
>> 
>> Btw are you running with my patch for this? Or still plain rc?
>> 
>> Cheers
>> Bolke
>> 
>> Sent from my iPhone
>> 
>>> On 27 Feb 2017, at 22:46, Bolke de Bruin <bdbr...@gmail.com> wrote:
>>> 
>>> I'll have a look. I verified and the code is there to take care of this.
>>> 
>>> B.
>>> 
>>> Sent from my iPhone
>>> 
>>>> On 27 Feb 2017, at 22:34, Dan Davydov <dan.davy...@airbnb.com.INVALID>
>> wrote:
>>>> 
>>>> Repro steps:
>>>> - Create a DAG with a dummy task
>>>> - Let this DAG run for one dagrun
>>>> - Add a new subdag operator that contains a dummy operator to this DAG
>> that
>>>> has depends_on_past set to true
>>>> - click on the white square for the new subdag operator in the DAGs
>> first
>>>> dagrun
>>>> - Click "Zoom into subdag" (takes you to the graph view for the subdag)
>>>> - Click the dummy task in the graph view and click "Mark Success"
>>>> - Observe that the list of tasks to mark as success is empty (it should
>>>> contain the dummy task)
>>>> 
>>>>> On Mon, Feb 27, 2017 at 1:03 PM, Bolke de Bruin <bdbr...@gmail.com>
>> wrote:
>>>>> 
>>>>> Dan
>>>>> 
>>>>> Can you elaborate on 2, cause I thought I specifically took care of
>> that.
>>>>> 
>>>>> Cheers
>>>>> Bolke
>>>>> 
>>>>> Sent from my iPhone
>>>>> 
>>>>>> On 27 Feb 2017, at 20:27, Dan Davydov <dan.davy...@airbnb.com.
>> INVALID>
>>>>> wrote:
>>>>>> 
>>>>>> I created https://issues.apache.org/jira/browse/AIRFLOW-921 to track
>> the
>>>>>> pending issues.
>>>>>> 
>>>>>> There are two more issues we found which I included there:
>>>>>> 1. Task instances that have their state manually set to running make
>> the
>>>>> UI
>>>>>> for their DAG unable to parse
>>>>>> 2. Mark success doesn't work for non existent task instances/dagruns
>>>>> which
>>>>>> breaks the subdag use case (setting tasks as successful via the graph
>>>>> view)
>>>>>> 
>>>>>>> On Mon, Feb 27, 2017 at 11:06 AM, Bolke de Bruin <bdbr...@gmail.com>
>>>>> wrote:
>>>>>>> 
>>>>>>> Hey Max
>>>>>>> 
>>>>>>> It is massive for sure. Sorry about that ;-). However it is not as
>>>>> massive
>>>>>>> as you might deduct from a first view. 0) run tasks concurrently
>> across
>>>>> dag
>>>>>>> runs 1) ordering of the tasks was added to the loop. 2) calculating
>> of
>>>>>>> deadlocks, running tasks, tasks to run was corrected, 3) relying on
>> the
>>>>>>> executor for status updates was replaced, 4) (tbd) executor failure
>>>>> check
>>>>> to protect against endless loops.
>>>>>>> 
>>>>>>> 0+1 seem bigger than they are due to the amount of lines changed. 2
>> is a
>>>>>>> subtle change, that touches a couple of lines to pop/push properly.
>> 3)
>>>>> is
>>>>>>> bigger, as I didn't like the reliance on the executor. 4) is old code
>>>>> that
>>>>>>> needs to be added again.
>>>>>>> 
>>>>>>> I probably can leave out 3 which makes 4 moot. The change would be
>>>>>>> smaller. Maybe I could even completely remove 3 and just add 4. What
>> are
>>>>>>> your thoughts?
>>>>>>> 
>>>>>>> The random failures we were seeing were the "implicit"

Re: [RESULT] [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4

2017-02-27 Thread Bolke de Bruin
Dan

Btw are you running with my patch for this? Or still plain rc?

Cheers
Bolke

Sent from my iPhone

> On 27 Feb 2017, at 22:46, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
> I'll have a look. I verified and the code is there to take care of this. 
> 
> B. 
> 
> Sent from my iPhone
> 
>> On 27 Feb 2017, at 22:34, Dan Davydov <dan.davy...@airbnb.com.INVALID> wrote:
>> 
>> Repro steps:
>> - Create a DAG with a dummy task
>> - Let this DAG run for one dagrun
>> - Add a new subdag operator that contains a dummy operator to this DAG that
>> has depends_on_past set to true
>> - click on the white square for the new subdag operator in the DAGs first
>> dagrun
>> - Click "Zoom into subdag" (takes you to the graph view for the subdag)
>> - Click the dummy task in the graph view and click "Mark Success"
>> - Observe that the list of tasks to mark as success is empty (it should
>> contain the dummy task)
>> 
>>> On Mon, Feb 27, 2017 at 1:03 PM, Bolke de Bruin <bdbr...@gmail.com> wrote:
>>> 
>>> Dan
>>> 
>>> Can you elaborate on 2, cause I thought I specifically took care of that.
>>> 
>>> Cheers
>>> Bolke
>>> 
>>> Sent from my iPhone
>>> 
>>>> On 27 Feb 2017, at 20:27, Dan Davydov <dan.davy...@airbnb.com.INVALID>
>>> wrote:
>>>> 
>>>> I created https://issues.apache.org/jira/browse/AIRFLOW-921 to track the
>>>> pending issues.
>>>> 
>>>> There are two more issues we found which I included there:
>>>> 1. Task instances that have their state manually set to running make the
>>> UI
>>>> for their DAG unable to parse
>>>> 2. Mark success doesn't work for non existent task instances/dagruns
>>> which
>>>> breaks the subdag use case (setting tasks as successful via the graph
>>> view)
>>>> 
>>>>> On Mon, Feb 27, 2017 at 11:06 AM, Bolke de Bruin <bdbr...@gmail.com>
>>> wrote:
>>>>> 
>>>>> Hey Max
>>>>> 
>>>>> It is massive for sure. Sorry about that ;-). However it is not as
>>> massive
>>>>> as you might deduct from a first view. 0) run tasks concurrently across
>>> dag
>>>>> runs 1) ordering of the tasks was added to the loop. 2) calculating of
>>>>> deadlocks, running tasks, tasks to run was corrected, 3) relying on the
>>>>> executor for status updates was replaced, 4) (tbd) executor failure
>>> check
>>>>> to protect against endless loops.
>>>>> 
>>>>> 0+1 seem bigger than they are due to the amount of lines changed. 2 is a
>>>>> subtle change, that touches a couple of lines to pop/push properly. 3)
>>> is
>>>>> bigger, as I didn't like the reliance on the executor. 4) is old code
>>> that
>>>>> needs to be added again.
>>>>> 
>>>>> I probably can leave out 3 which makes 4 moot. The change would be
>>>>> smaller. Maybe I could even completely remove 3 and just add 4. What are
>>>>> your thoughts?
>>>>> 
>>>>> The random failures we were seeing were the "implicit" test of tasks
>>>>> not executing in the right order and then deadlocking. But no explicit tests
>>>>> exist. Help would definitely be appreciated.
>>>>> 
>>>>> Yes I thought about using the scheduler and/or reusing logic from the
>>>>> scheduler. I even experimented a little with it but it didn't allow me
>>> to
>>>>> pass the tests effectively.
>>>>> 
>>>>> What I am planning to do is split the function and make it unit testable
>>>>> if you agree with the current approach.
>>>>> 
>>>>> Bolke
>>>>> 
>>>>> Sent from my iPhone
>>>>> 
>>>>>> On 27 Feb 2017, at 18:35, Maxime Beauchemin <
>>> maximebeauche...@gmail.com>
>>>>> wrote:
>>>>>> 
>>>>>> This PR is pretty massive and complex! It looks like solid work but
>>> let's
>>>>>> be really careful around testing and rolling this out.
>>>>>> 
>>>>>> This may be out of scope for this PR, but wanted to discuss the idea of
>>>>>> using the scheduler's logic to perform backfills. It'd be nice to have
>>>>> that

Re: [RESULT] [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4

2017-02-27 Thread Bolke de Bruin
Dan

Can you elaborate on 2, cause I thought I specifically took care of that. 

Cheers
Bolke

Sent from my iPhone

> On 27 Feb 2017, at 20:27, Dan Davydov <dan.davy...@airbnb.com.INVALID> wrote:
> 
> I created https://issues.apache.org/jira/browse/AIRFLOW-921 to track the
> pending issues.
> 
> There are two more issues we found which I included there:
> 1. Task instances that have their state manually set to running make the UI
> for their DAG unable to parse
> 2. Mark success doesn't work for non existent task instances/dagruns which
> breaks the subdag use case (setting tasks as successful via the graph view)
> 
>> On Mon, Feb 27, 2017 at 11:06 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:
>> 
>> Hey Max
>> 
>> It is massive for sure. Sorry about that ;-). However it is not as massive
>> as you might deduct from a first view. 0) run tasks concurrently across dag
>> runs 1) ordering of the tasks was added to the loop. 2) calculating of
>> deadlocks, running tasks, tasks to run was corrected, 3) relying on the
>> executor for status updates was replaced, 4) (tbd) executor failure check
>> to protect against endless loops.
>> 
>> 0+1 seem bigger than they are due to the amount of lines changed. 2 is a
>> subtle change, that touches a couple of lines to pop/push properly. 3) is
>> bigger, as I didn't like the reliance on the executor. 4) is old code that
>> needs to be added again.
>> 
>> I probably can leave out 3 which makes 4 moot. The change would be
>> smaller. Maybe I could even completely remove 3 and just add 4. What are
>> your thoughts?
>> 
>> The random failures we were seeing were the "implicit" test of tasks not
>> executing in the right order and then deadlocking. But no explicit tests
>> exist. Help would definitely be appreciated.
>> 
>> Yes I thought about using the scheduler and/or reusing logic from the
>> scheduler. I even experimented a little with it but it didn't allow me to
>> pass the tests effectively.
>> 
>> What I am planning to do is split the function and make it unit testable
>> if you agree with the current approach.
>> 
>> Bolke
>> 
>> Sent from my iPhone
>> 
>>> On 27 Feb 2017, at 18:35, Maxime Beauchemin <maximebeauche...@gmail.com>
>> wrote:
>>> 
>>> This PR is pretty massive and complex! It looks like solid work but let's
>>> be really careful around testing and rolling this out.
>>> 
>>> This may be out of scope for this PR, but wanted to discuss the idea of
>>> using the scheduler's logic to perform backfills. It'd be nice to have
>> that
>>> logic in one place, though I lost grasp on the details around feasibility
>>> around this approach. I'm sure you looked into this option before issuing
>>> this PR and I'm curious to hear your thoughts on blockers/challenges
>> around
>>> this alternate approach.
>>> 
>>> Also I'm wondering whether we have any sort of mechanisms in our
>>> integration test to validate that task dependencies are respected and run
>>> in the right order. If not I was thinking we could build some abstraction
>>> to make it easy to write this type of tests in an expressive way.
>>> 
>>> ```
>>> #[some code to run a backfill, or a scheduler session]
>>> it = IntegrationTestResults(dag_id='example1')
>>> assert it.ran_before('task1', 'task_2')
>>> assert it.overlapped('task1', 'task_3') # confirms 2 tasks ran in
>> parallel
>>> assert it.none_failed()
>>> assert it.ran_last('root')
>>> assert it.max_concurrency_reached() == POOL_LIMIT
>>> ```
>>> 
>>> Max
>>> 
>>>> On Mon, Feb 27, 2017 at 5:41 AM, Bolke de Bruin <bdbr...@gmail.com>
>> wrote:
>>>> 
>>>> I have worked on the Backfill issue also in collaboration with Jeremiah.
>>>> 
>>>> The refactor to use dag runs in backfills caused a regression
>>>> in task execution performance as dag runs were executed
>>>> sequentially. Next to that, the backfills were non deterministic
>>>> due to the random execution of tasks, causing root tasks
>>>> being added to the non ready list too soon.
>>>> 
>>>> This updates the backfill logic as follows:
>>>> 
>>>>   • Parallelize execution of tasks
>>>>   • Use a leaves-first execution model; Breadth-first algorithm by
>>>> Jeremiah
>>>>   • Replace state updates from the executor by task based only
>>

Re: [RESULT] [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4

2017-02-27 Thread Bolke de Bruin
Hey Max

It is massive for sure. Sorry about that ;-). However it is not as massive as 
you might deduce at first view. 0) run tasks concurrently across dag runs 
1) ordering of the tasks was added to the loop. 2) the calculation of deadlocks, 
running tasks, and tasks to run was corrected, 3) relying on the executor for 
status updates was replaced, 4) (tbd) executor failure check to protect against 
endless loops. 

0+1 seem bigger than they are due to the amount of lines changed. 2 is a subtle 
change, that touches a couple of lines to pop/push properly. 3) is bigger, as I 
didn't like the reliance on the executor. 4) is old code that needs to be added 
again. 

I probably can leave out 3 which makes 4 moot. The change would be smaller. 
Maybe I could even completely remove 3 and just add 4. What are your thoughts?

The random failures we were seeing were the "implicit" test of tasks not executing 
in the right order and then deadlocking. But no explicit tests exist. Help 
would definitely be appreciated. 

Yes I thought about using the scheduler and/or reusing logic from the 
scheduler. I even experimented a little with it but it didn't allow me to pass 
the tests effectively. 

What I am planning to do is split the function and make it unit testable if you 
agree with the current approach. 

Bolke

Sent from my iPhone

> On 27 Feb 2017, at 18:35, Maxime Beauchemin <maximebeauche...@gmail.com> 
> wrote:
> 
> This PR is pretty massive and complex! It looks like solid work but let's
> be really careful around testing and rolling this out.
> 
> This may be out of scope for this PR, but wanted to discuss the idea of
> using the scheduler's logic to perform backfills. It'd be nice to have that
> logic in one place, though I lost grasp on the details around feasibility
> around this approach. I'm sure you looked into this option before issuing
> this PR and I'm curious to hear your thoughts on blockers/challenges around
> this alternate approach.
> 
> Also I'm wondering whether we have any sort of mechanisms in our
> integration test to validate that task dependencies are respected and run
> in the right order. If not I was thinking we could build some abstraction
> to make it easy to write this type of tests in an expressive way.
> 
> ```
> #[some code to run a backfill, or a scheduler session]
> it = IntegrationTestResults(dag_id='example1')
> assert it.ran_before('task1', 'task_2')
> assert it.overlapped('task1', 'task_3') # confirms 2 tasks ran in parallel
> assert it.none_failed()
> assert it.ran_last('root')
> assert it.max_concurrency_reached() == POOL_LIMIT
> ```
> 
> Max
> 
>> On Mon, Feb 27, 2017 at 5:41 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:
>> 
>> I have worked on the Backfill issue also in collaboration with Jeremiah.
>> 
>> The refactor to use dag runs in backfills caused a regression
>> in task execution performance as dag runs were executed
>> sequentially. Next to that, the backfills were non deterministic
>> due to the random execution of tasks, causing root tasks
>> being added to the non ready list too soon.
>> 
>> This updates the backfill logic as follows:
>> 
>>• Parallelize execution of tasks
>>• Use a leaves-first execution model; Breadth-first algorithm by
>> Jeremiah
>>• Replace state updates from the executor by task based only
>> updates
>> 
>> https://github.com/apache/incubator-airflow/pull/2107
>> 
>> Please review and test properly.
>> 
>> What has been left out for the moment is checking the executor itself
>> for multiple failures of a task run, where the task itself was never able
>> to execute. Let me know if this is a real-world scenario (maybe a disk
>> space issue?). I will add it back in.
>> 
>> - Bolke
>> 
>> 
>>> On 25 Feb 2017, at 09:07, Bolke de Bruin <bdbr...@gmail.com> wrote:
>>> 
>>> Hi Dan,
>>> 
>>> - Backfill indeed runs only one dagrun at the time, see line 1755 of
>> jobs.py. I’ll think about how to fix this over the weekend (I think it was
>> my change that introduced this). Suggestions always welcome. Depending the
>> impact it is a blocker or not. We don’t often use backfills and definitely
>> not at your size, so that is why it didn’t pop up with us. I’m assuming
>> blocker for now, btw.
>>> - Speculation on the High DB Load. I’m not sure what your benchmark is
>> here (1.7.1 + multi processor dags?), but as you mentioned in the code
>> dependencies are checked a couple of times for one run and even task
>> instance. Dependency checking requires aggregation on the DB, which is a
>> performance killer. Annoying but not 

Re: [RESULT] [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4

2017-02-27 Thread Bolke de Bruin
I have worked on the Backfill issue also in collaboration with Jeremiah.

The refactor to use dag runs in backfills caused a regression
in task execution performance as dag runs were executed
sequentially. Next to that, the backfills were non deterministic
due to the random execution of tasks, causing root tasks
being added to the non ready list too soon.

This updates the backfill logic as follows:

• Parallelize execution of tasks
• Use a leaves-first execution model; Breadth-first algorithm by 
Jeremiah
• Replace state updates from the executor by task based only updates

https://github.com/apache/incubator-airflow/pull/2107
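
The deterministic ordering described above can be sketched as a simple breadth-first traversal of the task graph: a task becomes runnable only once all of its upstream tasks have completed, instead of tasks being picked in random order. This is a toy model under that assumption, not the PR's actual code.

```python
# Toy breadth-first ordering of a DAG's tasks. upstream maps each task_id to
# the set of task_ids it depends on; tasks with no unmet upstream deps are
# ready, and completing a task can release its downstream tasks. The tiny DAG
# below is illustrative only.
from collections import deque

def breadth_first_order(upstream):
    """Return a deterministic run order respecting upstream dependencies."""
    remaining = {t: set(ups) for t, ups in upstream.items()}
    ready = deque(sorted(t for t, ups in remaining.items() if not ups))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        # Releasing a completed task may make downstream tasks ready.
        for t, ups in remaining.items():
            if task in ups:
                ups.discard(task)
                if not ups and t not in order and t not in ready:
                    ready.append(t)
    return order

dag = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
assert breadth_first_order(dag) == ["extract", "transform", "load"]
```

With this model a root task can never be considered "not ready" prematurely, which is the sporadic-deadlock symptom the email describes.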

Please review and test properly.

What has been left out for the moment is checking the executor itself for 
multiple failures of a task run, where the task itself was never able to 
execute. Let me know if this is a real-world scenario (maybe a disk space 
issue?). I will add it back in.

- Bolke


> On 25 Feb 2017, at 09:07, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
> Hi Dan,
> 
> - Backfill indeed runs only one dagrun at the time, see line 1755 of jobs.py. 
> I’ll think about how to fix this over the weekend (I think it was my change 
> that introduced this). Suggestions always welcome. Depending on the impact it is 
> a blocker or not. We don’t often use backfills and definitely not at your 
> size, so that is why it didn’t pop up with us. I’m assuming blocker for now, 
> btw.
> - Speculation on the High DB Load. I’m not sure what your benchmark is here 
> (1.7.1 + multi processor dags?), but as you mentioned in the code 
> dependencies are checked a couple of times for one run and even task 
> instance. Dependency checking requires aggregation on the DB, which is a 
> performance killer. Annoying but not a blocker.
> - Skipped tasks potentially cause a dagrun to be marked failure/success 
> prematurely. BranchOperators are widely used; if it affects these operators, 
> then it is a blocker.
> 
> - Bolke
> 
>> On 25 Feb 2017, at 02:04, Dan Davydov <dan.davy...@airbnb.com.INVALID> wrote:
>> 
>> Update on old pending issues:
>> - Black Squares in UI: Fix merged
>> - Double Trigger Issue That Alex G Mentioned: Alex has a PR in flight
>> 
>> New Issues:
>> - Backfill seems to be having issues (only running one dagrun at a time),
>> we are still investigating - might be a blocker
>> - High DB Load (~8x more than 1.7) - We are still investigating but it's
>> probably not a blocker for the release
>> - Skipped tasks potentially cause a dagrun to be marked as failure/success
>> prematurely - not sure whether or not to classify this as a blocker (only
>> really an issue for users who use the BranchingPythonOperator, which AirBnB
>> does)
>> 
>> On Thu, Feb 23, 2017 at 5:59 PM, siddharth anand <san...@apache.org> wrote:
>> 
>>> IMHO, a DAG run without a start date is nonsensical, but this is not enforced.
>>> That said, our UI allows for the manual creation of DAG Runs without a
>>> start date as shown in the images below:
>>> 
>>> 
>>>  - https://www.dropbox.com/s/3sxcqh04eztpl7p/Screenshot%
>>>  202017-02-22%2016.00.40.png?dl=0
>>>  <https://www.dropbox.com/s/3sxcqh04eztpl7p/Screenshot%
>>> 202017-02-22%2016.00.40.png?dl=0>
>>>  - https://www.dropbox.com/s/4q6rr9dwghag1yy/Screenshot%
>>>  202017-02-22%2016.02.22.png?dl=0
>>>  <https://www.dropbox.com/s/4q6rr9dwghag1yy/Screenshot%
>>> 202017-02-22%2016.02.22.png?dl=0>
>>> 
>>> 
>>> On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin <
>>> maximebeauche...@gmail.com> wrote:
>>> 
>>>> Our database may have edge cases that could be associated with running
>>> any
>>>> previous version that may or may not have been part of an official
>>> release.
>>>> 
>>>> Let's see if anyone else reports the issue. If no one does, one option is
>>>> to release 1.8.0 as is with a comment in the release notes, and have a
>>>> future official minor apache release 1.8.1 that would fix these minor
>>>> issues that are not deal breaker.
>>>> 
>>>> @bolke, I'm curious, how long does it take you to go through one release
>>>> cycle? Oh, and do you have a documented step by step process for
>>> releasing?
>>>> I'd like to add the Pypi part to this doc and add committers that are
>>>> interested to have rights on the project on Pypi.
>>>> 
>>>> Max
>>>> 
>>>> On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin <bdbr...@gmail.com>
>>> wrote:

Cutting down on testing time - updated

2017-02-25 Thread Bolke de Bruin
Hi All,

(Welcome to the new MacBook Pro that has a send “button” on the touch bar)

Jeremiah and I have been looking into optimising the time that is spent on 
tests. The reason for this was that Travis runs are taking more and more time 
and we are being throttled by Travis. As part of that we enabled color coding 
of test outcomes and timing of tests. The results were kind of surprising.

This is the top 20 tests where we spend the most time. MySQL (remember 
concurrent access enabled) - 
https://s3.amazonaws.com/archive.travis-ci.org/jobs/205277617/log.txt: 


tests.BackfillJobTest.test_backfill_examples: 287.9209s
tests.BackfillJobTest.test_backfill_multi_dates: 53.5198s
tests.SchedulerJobTest.test_scheduler_start_date: 36.4935s
tests.CoreTest.test_scheduler_job: 35.5852s
tests.CliTests.test_backfill: 29.7484s
tests.SchedulerJobTest.test_scheduler_multiprocessing: 26.1573s
tests.DaskExecutorTest.test_backfill_integration: 24.5456s
tests.CoreTest.test_schedule_dag_no_end_date_up_to_today_only: 17.3278s
tests.SubDagOperatorTests.test_subdag_deadlock: 16.1957s
tests.SensorTimeoutTest.test_timeout: 15.1000s
tests.SchedulerJobTest.test_dagrun_deadlock_ignore_depends_on_past: 13.8812s
tests.BackfillJobTest.test_cli_backfill_depends_on_past: 12.9539s
tests.SchedulerJobTest.test_dagrun_deadlock_ignore_depends_on_past_advance_ex_date:
 12.8779s
tests.SchedulerJobTest.test_dagrun_success: 12.8177s
tests.SchedulerJobTest.test_dagrun_root_fail: 10.3953s
tests.SchedulerJobTest.test_dag_with_system_exit: 10.1132s
tests.TransferTests.test_mysql_to_hive: 8.5939s
tests.SchedulerJobTest.test_retry_still_in_executor: 8.1739s
tests.SchedulerJobTest.test_dagrun_fail: 7.9855s
tests.ImpersonationTest.test_default_impersonation: 7.4993s

Yes we spend a whopping 5 minutes on executing all examples. Another 
interesting one is “tests.CoreTest.test_scheduler_job”. This test just checks 
whether certain directories are created as part of logging. This could have 
been covered by a real unit test just covering the functionality of the 
function that creates the files - now it takes 35s. 
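
For illustration, the narrow unit test suggested here might look like the sketch below: exercise only the helper that creates the log directories, with no scheduler run at all. Note that make_log_dir is a hypothetical stand-in for the real log-path logic, not Airflow's actual function.

```python
# Sketch of a narrow unit test replacing a 35s scheduler run: only the
# directory-creating behaviour is exercised. make_log_dir is a made-up
# stand-in for the real function that creates per-DAG log directories.
import os
import shutil
import tempfile
import unittest

def make_log_dir(base, dag_id):
    """Create (and return) the log directory for a DAG under base."""
    path = os.path.join(base, dag_id)
    os.makedirs(path)
    return path

class LogDirTest(unittest.TestCase):
    def test_creates_directory(self):
        base = tempfile.mkdtemp()
        try:
            path = make_log_dir(base, "example_dag")
            self.assertTrue(os.path.isdir(path))
        finally:
            shutil.rmtree(base)

# Running this via unittest.main() takes a fraction of a second.
```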

We discussed several strategies for reducing time apart from rewriting some of 
the tests (that would be a herculean job!). The most optimal approach seems to be:

1. Run the scheduler tests apart from all other tests. 
2. Run “operator” integration tests in their own unit.
3. Run UI tests separately
4. Run API tests separately

This creates the following build matrix (warning ASCII art):

-------------------------------------------------------------
|           | Scheduler | Operators |    UI     |    API    |
-------------------------------------------------------------
| Python 2  |     x     |     x     |     x     |     x     |
| Python 3  |     x     |     x     |     x     |     x     |
| Kerberos  |           |           |     x     |     x     |
| Ldap      |           |           |     x     |           |
| Hive      |           |     x     |     x     |     x     |
| SSH       |           |     x     |           |           |
| Postgres  |     x     |     x     |     x     |     x     |
| MySQL     |     x     |     x     |     x     |     x     |
| SQLite    |     x     |     x     |     x     |     x     |
-------------------------------------------------------------


So from this build matrix one can deduce that Postgres and MySQL are generic 
services that will be present in every build. In addition all builds will use 
Python 2 and Python 3. And I propose using Python 3.4 and Python 3.5. The 
matrix can be expressed by environment variables. See .travis.yml for the 
current build matrix.

Furthermore, I would like us to label our tests correctly, e.g. unit test or 
integration test. This can be done by a comment or by introducing decorators 
@unittest and @integrationtest. This is to help reviewers and maintainers to 
find out whether new functionality is correctly covered. At a minimum a unit 
test is required for new functionality.
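
One possible shape for such labels, sketched here as simple attribute-setting decorators so runners and reviewers can filter on the category. The proposal's names are @unittest and @integrationtest; the decorators below use different names only to avoid shadowing the stdlib unittest module, and the mechanism is an assumption, not an agreed design.

```python
# Sketch of test-category labels as decorators that tag a test function with
# an attribute. Named *_label here to avoid shadowing the stdlib "unittest"
# module; the real project decision might use a nose attrib plugin instead.

def unittest_label(func):
    func.test_category = "unit"
    return func

def integrationtest_label(func):
    func.test_category = "integration"
    return func

@unittest_label
def test_log_path_format():
    pass  # narrow, fast check of a single function

@integrationtest_label
def test_backfill_end_to_end():
    pass  # slow check spanning scheduler/executor/DB

assert test_log_path_format.test_category == "unit"
assert test_backfill_end_to_end.test_category == "integration"
```

A test runner could then select only functions whose `test_category` matches the build's column in the matrix above.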

What is a unit test (thanks stack overflow): A unit test is a test written by 
the programmer to verify that a relatively small piece of code is doing what it 
is intended to do. They are narrow in scope, they should be easy 

Cutting down on testing time

2017-02-25 Thread Bolke de Bruin
Hi All,

Jeremiah and I have been looking into optimising the time that is spent on 
tests. The reason for this was that Travis runs are taking more and more time 
and we are being throttled by Travis. As part of that we enabled color coding 
of test outcomes and timing of tests. The results were kind of surprising.

This is the top 20 tests where we spend the most time. MySQL (remember 
concurrent access enabled) - 
https://s3.amazonaws.com/archive.travis-ci.org/jobs/205277617/log.txt:

tests.BackfillJobTest.test_backfill_examples: 287.9209s
tests.BackfillJobTest.test_backfill_multi_dates: 53.5198s
tests.SchedulerJobTest.test_scheduler_start_date: 36.4935s
tests.CoreTest.test_scheduler_job: 35.5852s
tests.CliTests.test_backfill: 29.7484s
tests.SchedulerJobTest.test_scheduler_multiprocessing: 26.1573s
tests.DaskExecutorTest.test_backfill_integration: 24.5456s
tests.CoreTest.test_schedule_dag_no_end_date_up_to_today_only: 17.3278s
tests.SubDagOperatorTests.test_subdag_deadlock: 16.1957s
tests.SensorTimeoutTest.test_timeout: 15.1000s
tests.SchedulerJobTest.test_dagrun_deadlock_ignore_depends_on_past: 13.8812s
tests.BackfillJobTest.test_cli_backfill_depends_on_past: 12.9539s
tests.SchedulerJobTest.test_dagrun_deadlock_ignore_depends_on_past_advance_ex_date: 12.8779s
tests.SchedulerJobTest.test_dagrun_success: 12.8177s
tests.SchedulerJobTest.test_dagrun_root_fail: 10.3953s
tests.SchedulerJobTest.test_dag_with_system_exit: 10.1132s
tests.TransferTests.test_mysql_to_hive: 8.5939s
tests.SchedulerJobTest.test_retry_still_in_executor: 8.1739s
tests.SchedulerJobTest.test_dagrun_fail: 7.9855s
tests.ImpersonationTest.test_default_impersonation: 7.4993s

Yes, we spend a whopping 5 minutes on executing all the examples. Another 
interesting one is “tests.CoreTest.test_scheduler_job”. This test just checks 
whether certain directories are created as part of logging. This could have 
been covered by a real unit test covering only the functionality of the 
function that creates the files; as it stands, it takes 35s.
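To illustrate, a real unit test of that behaviour could look roughly like this. Here ensure_log_directory is a hypothetical stand-in for the actual directory-creation logic; the point is testing the function directly rather than through a 35-second scheduler run.

```python
import os
import tempfile
import unittest

def ensure_log_directory(base, dag_id):
    """Hypothetical stand-in for the log-directory-creation logic."""
    path = os.path.join(base, dag_id)
    if not os.path.isdir(path):
        os.makedirs(path)
    return path

class EnsureLogDirectoryTest(unittest.TestCase):
    # Exercises only the directory-creation logic, in milliseconds,
    # instead of running a full scheduler job.
    def test_creates_directory(self):
        base = tempfile.mkdtemp()
        path = ensure_log_directory(base, "example_dag")
        self.assertTrue(os.path.isdir(path))

    def test_is_idempotent(self):
        base = tempfile.mkdtemp()
        first = ensure_log_directory(base, "example_dag")
        second = ensure_log_directory(base, "example_dag")
        self.assertEqual(first, second)
```

Tests shaped like this run in milliseconds and need no database or scheduler, which is exactly the kind of rewrite that would cut the Travis bill.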

We discussed several strategies for reducing this time, apart from rewriting 
some of the tests (that would be a herculean job!). The most workable approach 
seems to be:

1. Run the scheduler tests apart from all other tests. 
2. Run “operator” integration tests in their own unit.
3. Run UI tests separately.
4. Run API tests separately.

This creates the following build matrix (warning ASCII art):

-----------------------------------------------------
|           | Scheduler | Operators |  UI   |  API  |
-----------------------------------------------------
| Python 2  |     x     |     x     |   x   |   x   |
| Python 3  |     x     |     x     |   x   |   x   |
| Kerberos  |           |           |   x   |   x   |
| Ldap      |           |           |   x   |       |
| Hive      |           |     x     |   x   |   x   |
| SSH       |           |     x     |       |       |
| Postgres  |     x     |     x     |   x   |   x   |
| MySQL     |     x     |     x     |   x   |   x   |
| SQLite    |     x     |     x     |   x   |   x   |
-----------------------------------------------------


So from this build matrix one can deduce that Postgres and MySQL are generic 
services that will be present in every build. In addition, all builds will use 
Python 2 and Python 3; I propose using Python 3.4 and Python 3.5.


Furthermore, I would like us to label our tests correctly, e.g. as unit tests 
or integration tests. 

Re: [RESULT] [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4

2017-02-25 Thread Bolke de Bruin
Not trying to muddy the waters, but Jeremiah’s observation (non-deterministic 
outcomes) might have something to do with #3. I haven’t dived in deeper yet.

======================================================================
ERROR: test_backfill_examples (tests.BackfillJobTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/apache/incubator-airflow/tests/jobs.py", line 164, 
in test_backfill_examples
job.run()
  File "/home/travis/build/apache/incubator-airflow/airflow/jobs.py", line 200, 
in run
self._execute()
  File "/home/travis/build/apache/incubator-airflow/airflow/jobs.py", line 
1999, in _execute
raise AirflowException(err)
AirflowException: ---
Some task instances failed:
set([('example_short_circuit_operator', 'condition_is_True', 
datetime.datetime(2016, 1, 1, 0, 0))])
https://s3.amazonaws.com/archive.travis-ci.org/jobs/204780706/log.txt

Bolke

> On 25 Feb 2017, at 09:07, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
> Hi Dan,
> 
> - Backfill indeed runs only one dagrun at a time; see line 1755 of jobs.py. 
> I’ll think about how to fix this over the weekend (I think it was my change 
> that introduced this). Suggestions always welcome. Depending on the impact, it 
> is a blocker or not. We don’t often use backfills, and definitely not at your 
> size, so that is why it didn’t pop up with us. I’m assuming blocker for now, 
> btw.
> - Speculation on the high DB load: I’m not sure what your benchmark is here 
> (1.7.1 + multi-processor DAGs?), but as you mentioned, in the code, 
> dependencies are checked a couple of times per run and even per task 
> instance. Dependency checking requires aggregation on the DB, which is a 
> performance killer. Annoying, but not a blocker.
> - Skipped tasks potentially cause a dagrun to be marked failure/success 
> prematurely. BranchOperators are widely used; if this affects those 
> operators, then it is a blocker.
> 
> - Bolke
> 
>> On 25 Feb 2017, at 02:04, Dan Davydov <dan.davy...@airbnb.com.INVALID> wrote:
>> 
>> Update on old pending issues:
>> - Black Squares in UI: Fix merged
>> - Double Trigger Issue That Alex G Mentioned: Alex has a PR in flight
>> 
>> New Issues:
>> - Backfill seems to be having issues (only running one dagrun at a time),
>> we are still investigating - might be a blocker
>> - High DB Load (~8x more than 1.7) - We are still investigating but it's
>> probably not a blocker for the release
>> - Skipped tasks potentially cause a dagrun to be marked as failure/success
>> prematurely - not sure whether or not to classify this as a blocker (only
>> really an issue for users who use the BranchingPythonOperator, which AirBnB
>> does)
>> 
>> On Thu, Feb 23, 2017 at 5:59 PM, siddharth anand <san...@apache.org> wrote:
>> 
>>> IMHO, a DAG run without a start date is nonsensical, but this is not
>>> enforced. That said, our UI allows the manual creation of DAG runs without
>>> a start date, as shown in the images below:
>>> 
>>> 
>>>  - https://www.dropbox.com/s/3sxcqh04eztpl7p/Screenshot%202017-02-22%2016.00.40.png?dl=0
>>>  - https://www.dropbox.com/s/4q6rr9dwghag1yy/Screenshot%202017-02-22%2016.02.22.png?dl=0
>>> 
>>> 
>>> On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin <
>>> maximebeauche...@gmail.com> wrote:
>>> 
>>>> Our database may have edge cases that could be associated with running
>>> any
>>>> previous version that may or may not have been part of an official
>>> release.
>>>> 
>>>> Let's see if anyone else reports the issue. If no one does, one option is
>>>> to release 1.8.0 as is, with a comment in the release notes, and have a
>>>> future official minor Apache release, 1.8.1, that would fix these minor
>>>> issues that are not deal breakers.
>>>> 
>>>> @bolke, I'm curious, how long does it take you to go through one release
>>>> cycle? Oh, and do you have a documented step by step process for
>>> releasing?
>>>> I'd like to add the Pypi part to this doc and add committers that are
>>>> interested to have rights on the project on Pypi.
>>
