Re: Using large numbers of sensors, resource consumption

2018-07-15 Thread Maxime Beauchemin
There have been conversations in the past around the idea of adding an
`evaluation_method` argument in BaseSensor that would allow for different
options:
1. the current approach which is taking up a slot and poking periodically
(heavy on slot usage)
2. one approach closer to fail/retry approach, likely introducing a new
state representing that it's waiting for the next sensing event (heavy on
overhead, MQ traffic, ...)
3. one where the scheduler itself runs the "poke" method inline. In many
cases it represents very little overhead for the scheduler to run that
task, and the scheduler is already insulated (the DAG is parsed in a
subprocess), though this is heavy on the scheduler machine. I think it's
reasonable to do this even without a distributed scheduler, especially for
cheap-to-check sensors.
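
To make the slot-usage trade-off between options 1 and 2 concrete, here is a rough back-of-the-envelope sketch. It is not Airflow code: `SensorRun` and the helper functions are hypothetical names, and the model assumes the sensed condition becomes true after `wait_seconds` and that pokes happen at t = 0, `poke_interval`, 2 * `poke_interval`, and so on. It ignores the per-reschedule overhead (MQ traffic, scheduler work) that option 2 adds.

```python
from dataclasses import dataclass


@dataclass
class SensorRun:
    """Hypothetical description of one sensor's wait (illustration only)."""
    wait_seconds: int       # time until the sensed condition becomes true
    poke_interval: int      # seconds between pokes
    poke_cost_seconds: int  # how long a single poke keeps a worker busy


def pokes_needed(run: SensorRun) -> int:
    """Number of pokes until the condition is first observed as true."""
    # Ceiling division gives the pokes after t = 0; add the poke at t = 0.
    return -(-run.wait_seconds // run.poke_interval) + 1


def slot_seconds_blocking(run: SensorRun) -> int:
    """Option 1: the slot is held from the first poke to the successful one."""
    return (pokes_needed(run) - 1) * run.poke_interval + run.poke_cost_seconds


def slot_seconds_rescheduled(run: SensorRun) -> int:
    """Option 2: the slot is held only while a poke is actually running."""
    return pokes_needed(run) * run.poke_cost_seconds


if __name__ == "__main__":
    # A sensor waiting six days, poking every five minutes.
    six_days = SensorRun(wait_seconds=6 * 24 * 3600,
                         poke_interval=300,
                         poke_cost_seconds=1)
    print("blocking slot-seconds:   ", slot_seconds_blocking(six_days))
    print("rescheduled slot-seconds:", slot_seconds_rescheduled(six_days))
```

Under these assumptions the blocking sensor occupies its slot for essentially the whole wait, while the rescheduled one only pays for the pokes themselves. Option 3 would move that same poke cost onto the scheduler instead of a worker slot.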

Another way to mitigate resource usage, which we used at Airbnb, is to have a
dedicated sensor queue with machines that are provisioned more aggressively
(say 16 or 32 slots per CPU core), and route the cheap sensing tasks to
those machines.

Max

On Thu, Jul 12, 2018 at 11:51 AM Pedro Machado wrote:

> Thanks, Ash, Alexander, and Stefan for your replies.
>
> I am relatively new to airflow and not familiar with the code base. I like
> the idea of having a more efficient sensor.
>
> The async approach makes sense, but I don't know how well it would fit
> within the existing architecture.
>
> I like that Stefan's "reschedule" approach can fit the current architecture
> and could be implemented sooner. From the user point of view, my only
> feedback is that the UI should not show sensors that are still running as
> failed or up for retry as that would draw attention to things that are
> running as expected. I'll add this comment to the JIRA issue.
>
> Thanks!
>
> Pedro
>
>
> On Tue, Jul 10, 2018 at 9:44 AM Stefan Seelmann wrote:
>
> > I also have that requirement and I'm working on a proposal for
> > rescheduling tasks. My current PoC can be found at [1]; it uses the
> > up_for_retry state, which has some problems. I have started to make some
> > changes and hope I can make a first proposal this week.
> >
> > The basic idea is:
> > * A new "reschedule" flag for sensors, if set to True it will raise an
> > AirflowRescheduleException (with the new schedule date) that causes a
> > reschedule
> > * Reschedule requests are recorded in new `task_reschedule` table and
> > visualized in the Gantt view.
> > * A new TI dependency that checks if a task is ready to be re-scheduled
> >
> > Advantages:
> > * This change is backward compatible. Existing sensors behave like
> > before. But it's possible to set the "reschedule" flag.
> > * The timeout and poke_interval are still respected and used to
> > calculate the next schedule time
> > * Custom sensor implementations can even define the next sensible
> > schedule date.
> > * This mechanism can also be used by non-sensor operators
> >
> > Kind Regards,
> > Stefan
> >
> > [1] https://github.com/seelmann/incubator-airflow/tree/reschedule-sensor-3
> >
> > On 07/10/2018 04:05 PM, Pedro Machado wrote:
> > > I have a few DAGs that use time sensors to wait until data is ready,
> > > which can be several days.
> > >
> > > I have one daily DAG where, for each execution date, I have to repull the
> > > data for the next 7 days to capture changes (late arriving revenue data).
> > > This DAG currently starts 7 TimeDeltaSensors for each execution day, with
> > > delays that range from 0 to 6 days.
> > >
> > > I was wondering what the recommendation is for cases like this where a
> > > large number of sensors is needed.
> > >
> > > Are there ways to reduce the footprint of these sensors so that they use
> > > less CPU and memory?
> > >
> > > I noticed that in one of the DAGs that Germain Tanguy had in the
> > > presentation he shared today, a sensor was set to time out every 30
> > > seconds but had a large retry count, so instead of running constantly,
> > > it runs every 15 minutes for 30 seconds and then dies.
> > >
> > > Are other people using this pattern? Do you have other suggestions?
> > >
> > > Thanks,
> > >
> > > Pedro
> > >
> >
> >
>
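
The "reschedule" idea quoted above (poke once, and if the condition is not met raise an exception carrying the next schedule date, while still respecting `timeout` and `poke_interval`) can be sketched roughly as follows. This is a standalone illustration, not the code from the PoC branch: `RescheduleRequest` is a hypothetical stand-in for the proposed `AirflowRescheduleException`, and `SensorTimeout`, `execute_sensor` and `ready_to_run` are made-up names.

```python
from datetime import datetime, timedelta


class RescheduleRequest(Exception):
    """Hypothetical stand-in for the proposed AirflowRescheduleException."""

    def __init__(self, reschedule_date):
        super().__init__("reschedule at " + reschedule_date.isoformat())
        self.reschedule_date = reschedule_date


class SensorTimeout(Exception):
    """Raised when the sensor has waited at least as long as its timeout."""


def execute_sensor(poke, first_try, now, poke_interval, timeout):
    """Run a single poke, then finish, time out, or ask to be rescheduled.

    A worker slot is only needed for the duration of this call.
    """
    if poke():
        return True
    if now - first_try >= timeout:
        raise SensorTimeout("sensor timed out after " + str(now - first_try))
    # Not done yet: free the slot and ask to be run again later.
    raise RescheduleRequest(now + poke_interval)


def ready_to_run(now, last_request):
    """Sketch of the dependency check: is the task due to be scheduled again?"""
    return last_request is None or now >= last_request.reschedule_date


if __name__ == "__main__":
    start = datetime(2018, 7, 10, 0, 0)
    try:
        execute_sensor(lambda: False, start, start + timedelta(minutes=10),
                       poke_interval=timedelta(minutes=5),
                       timeout=timedelta(hours=1))
    except RescheduleRequest as request:
        print("next poke at", request.reschedule_date)
```

A custom sensor could pass a different date to `RescheduleRequest` instead of `now + poke_interval`, which corresponds to the "define the next sensible schedule date" point in the quoted list.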


Re: [DISCUSS] AIP - Time for Airflow Improvement Proposals?

2018-07-15 Thread Arthur Wiedmer
+1

On Sun, Jul 15, 2018, 20:12 Maxime Beauchemin wrote:

> +1
>
> On Tue, Jul 10, 2018 at 1:09 PM Sid Anand wrote:
>
> > +1
> >
> > On Tue, Jul 10, 2018 at 1:02 PM George Leslie-Waksman wrote:
> >
> > > +1
> > >
> > > On Tue, Jul 10, 2018 at 11:50 AM Jakob Homan wrote:
> > >
> > > > Lots of Apache projects use ?IPs - Whatever Improvement Proposal - to
> > > > document and gather consensus on large changes to the code base. Some
> > > > examples:
> > > >   * Kafka Improvement Proposals (KIP) -
> > > >     https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> > > >   * Flink Improvement Proposal (FLIP) -
> > > >     https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> > > >   * Spark Improvement Proposal (SPIP) -
> > > >     https://spark.apache.org/improvement-proposals.html
> > > >
> > > > We've got a few changes that have been discussed, either on the
> > > > list/JIRA (good) or in private (bad -
> > > > https://incubator.apache.org/guides/committer.html#mailing_lists) that
> > > > are of a magnitude that they may benefit from some version of this
> > > > process. Examples:
> > > >   * The in-progress plan to refactor out connectors and hooks
> > > >     (AIRFLOW-2732)
> > > >   * K8S deployment operator proposal
> > > >   * Initial Design for Supporting fine-grained Connection encryption
> > > >
> > > > The benefit of this approach is that the design is hosted somewhere
> > > > less ephemeral and more editable than email. It also provides a
> > > > framework for documenting and confirming consensus through the whole
> > > > community.
> > > >
> > > > What do y'all think?
> > > >
> > > > -Jakob
> > > >
> > >
> >
>


Re: [DISCUSS] AIP - Time for Airflow Improvement Proposals?

2018-07-15 Thread Maxime Beauchemin
+1



[VOTE] Airflow 1.10.0rc2

2018-07-15 Thread Bolke de Bruin
Hey all,

I have cut Airflow 1.10.0 RC2. This email is calling a vote on the release,
which will last for 72 hours. Consider this my (binding) +1.

Airflow 1.10.0 RC 2 is available at:

https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc2/ 


apache-airflow-1.10.0rc2+incubating-source.tar.gz is a source release that
comes with INSTALL instructions.
apache-airflow-1.10.0rc2+incubating-bin.tar.gz is the binary Python "sdist"
release.

Public keys are available at:

https://dist.apache.org/repos/dist/release/incubator/airflow/ 


The number of JIRAs fixed is over 700. Please have a look at the changelog.
Since RC1 the following has been fixed:

* [AIRFLOW-1729][AIRFLOW-2797][AIRFLOW-2729] Ignore whole directories in
  .airflowignore
* [AIRFLOW-2739] Always read default configuration files as utf-8
* [AIRFLOW-2752] Log using logging instead of stdout
* [AIRFLOW-1729][AIRFLOW-XXX] Remove extra debug log at info level

Please note that the version number excludes the `rcX` string as well
as the "+incubating" string, so it's now simply 1.10.0. This will allow us
to rename the artifact without modifying the artifact checksums when we
actually release.
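
The point about renaming without changing checksums rests on the fact that a checksum covers only the file's bytes, not its name. A minimal sketch to illustrate this follows; the final artifact name, the payload, the helper names and the choice of SHA-512 are all assumptions made for the example, not details of the actual release process.

```python
import hashlib
import os
import tempfile


def sha512_of(path):
    """Return the hex SHA-512 digest of a file's contents."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_survives_rename(rc_name, final_name, payload):
    """Write payload under rc_name, rename it, and compare the digests."""
    with tempfile.TemporaryDirectory() as workdir:
        rc_path = os.path.join(workdir, rc_name)
        final_path = os.path.join(workdir, final_name)
        with open(rc_path, "wb") as handle:
            handle.write(payload)
        before = sha512_of(rc_path)
        os.rename(rc_path, final_path)
        after = sha512_of(final_path)
    return before == after


if __name__ == "__main__":
    # The final file name below is a guess used only for illustration.
    print(checksum_survives_rename(
        "apache-airflow-1.10.0rc2+incubating-source.tar.gz",
        "apache-airflow-1.10.0+incubating-source.tar.gz",
        b"placeholder bytes standing in for the tarball",
    ))
```

If the version string inside the archive still contained `rc2`, the bytes would have to change for the final release and the checksums (and the vote on them) would no longer match.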


Cheers,
Bolke

Re: Airflow's JS code (and dependencies) manageable via npm and webpack

2018-07-15 Thread Maxime Beauchemin
Glad to see this is happening!

Max

On Mon, Jul 9, 2018 at 6:37 AM Ash Berlin-Taylor <
ash_airflowl...@firemirror.com> wrote:

> Great! Thanks for doing this. I've left some review comments on your PR.
>
> -ash
>
> > On 9 Jul 2018, at 11:45, Verdan Mahmood wrote:
> >
> > Hey Guys,
> >
> > In an effort to simplify the JS dependencies of Airflow, I've introduced
> > npm and webpack for the package management. For now, it only implements
> > this in the www_rbac version of the web server.
> >
> > Pull Request: https://github.com/apache/incubator-airflow/pull/3572
> >
> > The problem with the existing frontend (JS) code of Airflow is that most
> > of the custom JS is written within the html files, using Flask's
> > (Jinja) variables in that JS. The next step of this effort would be to
> > extract that custom JS code into separate JS files, use the dependencies
> > in those files using require or import, and eventually introduce an
> > automated JS test suite. (At the moment, I'm simply using the
> > CopyWebPackPlugin to copy the required dependencies for use.)
> >
> > There are also some dependencies which are directly modified in the
> > codebase or are outdated. I couldn't find the correct npm versions of
> > those libraries (dagre-d3.js and gantt-chart-d3v2.js). Apparently the
> > dagre-d3.js that we are using is from a gist or is a very old version
> > not supported by webpack 4, while gantt-chart-d3v2 has been modified
> > according to Airflow's requirements, I believe. I used the existing
> > libraries for now.
> >
> > I am currently working in a separate branch to upgrade the DagreD3
> > library, and updating the custom JS related to DagreD3 accordingly.
> >
> > This PR also introduces the pypi_push.sh
> > <https://github.com/apache/incubator-airflow/pull/3572/files#diff-8fae684cdcc8cc8df2232c8df16f64cb>
> > script that will generate all the JS statics before creating and
> > uploading the package.
> >
> > Please let me know if you have any questions or suggestions; I'd
> > be happy to answer them.
> >
> > Best,
> > *Verdan Mahmood*
> > (+31) 655 576 560
>
>


Re: [VOTE] Airflow 1.10.0rc1

2018-07-15 Thread Bolke de Bruin
Thanks Fokko, will include your PR.

B.



Re: [Proposal] Explicit re-schedule of sensors

2018-07-15 Thread Driesprong, Fokko
Thanks Stefan for picking this up. The sensors are in desperate need of
some redesign for the aforementioned reasons. Please note this ticket:
https://issues.apache.org/jira/browse/AIRFLOW-2001 It addresses the same
issue.

Regarding the open question: I would be reluctant to introduce new
states. Adding new states also involves changing/adding logic in the
scheduler, and the scheduler is already far too complex right now. Maybe we
can also do something with the priority of the sensor, for example lower
the priority once it has been polled and has come back with a negative
result. With such a strategy the other tasks will get priority over the
sensors.
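
A toy sketch of the priority idea, assuming a simple in-memory queue where a higher weight runs first. This is not how the Airflow scheduler or executors order work; `TaskQueue`, `run_queue` and the `penalty` parameter are hypothetical names used only to show the effect of demoting a sensor after a negative poke.

```python
import heapq
import itertools


class TaskQueue:
    """Max-priority queue: higher weight is popped first, FIFO among ties."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def push(self, name, weight):
        # heapq is a min-heap, so store the negated weight.
        heapq.heappush(self._heap, (-weight, next(self._counter), name))

    def pop(self):
        neg_weight, _, name = heapq.heappop(self._heap)
        return name, -neg_weight

    def __len__(self):
        return len(self._heap)


def run_queue(queue, sensor_results, penalty=1):
    """Drain the queue; sensors that poke negative are re-queued with less weight.

    sensor_results maps a sensor name to an iterator of poke results. Names
    missing from the mapping are treated as regular tasks that run once.
    """
    order = []
    while queue:
        name, weight = queue.pop()
        order.append(name)
        results = sensor_results.get(name)
        # An exhausted iterator is treated as a successful poke.
        if results is not None and not next(results, True):
            queue.push(name, weight - penalty)
    return order


if __name__ == "__main__":
    queue = TaskQueue()
    for task_name in ("sensor", "task_a", "task_b"):
        queue.push(task_name, 5)
    print(run_queue(queue, {"sensor": iter([False, True])}))
```

In this sketch the sensor is tried once, fails its poke, and is then only picked up again after the regular tasks of the same original weight have run. No new task state is needed, but the sensor still has to be queued and dequeued for every poke.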

Cheers, Fokko

2018-07-12 14:53 GMT+02:00 Stefan Seelmann:

> Hi all,
>
> I'd like to discuss a proposal to enable explicit re-scheduling of
> sensors. I think there is demand for such a thing, in the last weeks
> multiple people asked for it or mentioned workarounds.
>
> I created a Jira [1] that describes the proposal and an initial PR [2].
>
> Feedback welcomed :-)
>
> Kind Regards,
> Stefan
>
> [1] https://issues.apache.org/jira/browse/AIRFLOW-2747
> [2] https://github.com/apache/incubator-airflow/pull/3596
>


Re: [VOTE] Airflow 1.10.0rc1

2018-07-15 Thread Driesprong, Fokko
Hi all,

I did some more tests and it looks good. I was under the assumption that
the sequential executor runs within the webserver, but that assumption was
wrong. The behaviour is still the same as in 1.9. I ran some tests on
Python 2.7, 3.5 and 3.6 and it looks good. Python 3.7 does
not work yet, but it isn't supported anyway.

I would like to have https://github.com/apache/incubator-airflow/pull/3604 in
RC2. It isn't a critical bug, but it looks messy.

Cheers, Fokko

2018-07-14 0:06 GMT+02:00 Driesprong, Fokko:

> Thanks Bolke for all the effort.
>
> I think I've miscommunicated the issue. It doesn't schedule the runs, and
> when I explicitly kick off a run, it also isn't being picked up.
>
> Currently I'm doing a git bisect to check when this bug was
> introduced. To be continued.
>
> Cheers, Fokko
>
> 2018-07-13 23:28 GMT+02:00 Bolke de Bruin:
>
>> Hi Fokko,
>>
>> Please confirm this, because I tried sequential with a clean install. The
>> only thing I found was that example DAGs were not picked up.
>>
>> I’m rolling rc2 anyway though so it would be good to get it fixed.
>>
>> B.
>>
>> Sent from my iPad
>>
>> > On 13 Jul 2018, at 22:23, Driesprong, Fokko wrote:
>> >
>> > Ok, I did some testing.
>> >
>> > 1.10 works fine with the LocalExecutor. With the SequentialExecutor it
>> > does not pick up any task, even with a database other than sqlite. Found
>> > this one along the way:
>> > https://github.com/apache/incubator-airflow/pull/3604
>> >
>> > There are no recent changes to the SequentialExecutor, so I'm still
>> > looking into how this bug found its way into the source. For me this is
>> > a -1; right now it is not possible to just give Airflow a try using a
>> > basic setup with a SequentialExecutor.
>> >
>> > Along the way this also makes me reconsider the tests. Like with the
>> > Kubernetes test we just run a task, and then assert if it ran properly.
>> > This might also be an idea for the sequential executor.
>> >
>> > Cheers, Fokko
>> >
>> >
>> > 2018-07-13 20:15 GMT+02:00 Jakob Homan:
>> >
>> >> @Bolke - I didn't raise the concern, so I can't speak to whether or
>> >> not Sebb will be ok with that. He tends to be pretty fastidious on
>> >> this stuff and 'but some other TLP does it' hasn't gone over well
>> >> before (trust me... I've tried). Totally up to you if you'd rather
>> >> discuss it as part of the IPMC vote or just fix it to avoid
>> >> discussion.
>> >>
>> >> -jakob
>> >>
>> >> On 13 July 2018 at 09:48, Ash Berlin-Taylor wrote:
>> >>> Could that be related to my ignorefile change? `airflow list_dags`
>> >>> still shows the example dags - the output is the same for that
>> >>> command as on v1-9-stable.
>> >>>
>> >>> Though I just noticed I'd left `self.log.info()` in there. That's
>> >>> going to be noisy.
>> >>> https://github.com/apache/incubator-airflow/pull/3603
>> >>>
>> >>> -ash
>> >>>
>> >>>> On 13 Jul 2018, at 17:36, Bolke de Bruin wrote:
>> >>>>
>> >>>> Example dags are not picked up. If you put a dag in the normal dag
>> >>>> folder it works fine.
>> >>>>
>> >>>> Please create a jira for this @fokko. A pr would be appreciated.
>> >>>>
>> >>>> B.
>> >>>>
>> >>>> Sent from my iPhone
>> >>>>
>> >>>>> On 13 Jul 2018, at 15:46, Driesprong, Fokko wrote:
>> >>>>>
>> >>>>> With the SequentialExecutor the webserver also acts as the scheduler
>> >>>>> (without parallelism)
>> >>>>>
>> >>>>> 2018-07-13 15:43 GMT+02:00 Carl Johan Gustavsson <carl.jo...@tictail.com>:
>> >>>>>
>> >>>
>> >>
>>
>
>