Re: Manual validation operator

2018-10-05 Thread Maxime Beauchemin
It's a bit of a hack, but to save slots you could just have an
instantly-failing PythonOperator (just raise an exception in the callable)
that would go into a failed state. Marking it as "success" when the
conditions are met would act as a trigger.
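
A minimal sketch of that hack, assuming 1.10-era import paths and an
existing `dag` object:

    from airflow.operators.python_operator import PythonOperator

    def _await_manual_validation():
        # Always fail so the task shows up red and a human has to act;
        # marking it "success" in the UI is what lets the DAG proceed.
        raise ValueError("Awaiting manual validation - mark success to continue")

    wait_for_approval = PythonOperator(
        task_id="wait_for_approval",
        python_callable=_await_manual_validation,
        retries=0,  # fail immediately instead of retrying
        dag=dag,
    )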

On Fri, Oct 5, 2018 at 9:07 AM Brian Greene 
wrote:

> My first thought was this, but my understanding is that if you had a
> large number of dags “waiting”, the sensor would consume all the concurrency.
>
> And what if the user doesn’t approve?
>
> How about: the dag you have writes the status to an api/db as its last
> step.
>
> Then 2 other dags (or one with a branch) can each have a sensor that’s
> watching for approved/unapproved values.  When it finds one (or a batch,
> depending on how you write it), it triggers the “next” dag.
>
> This leaves only 1-2 sensors running and would enable your process without
> anyone using the airflow UI (assuming they have some other way to mark
> “approval”).  This avoids the “process by error and recover” logic it seems
> like you’d like to get out of.  (Which makes sense to me)
>
> B
>
> Sent from a device with less than stellar autocorrect
>
> > On Oct 4, 2018, at 10:17 AM, Alek Storm  wrote:
> >
> > Hi Björn,
> >
> > We also sometimes require manual validation, and though we haven't yet
> > implemented this, I imagine you could store the approved/unapproved
> status
> > of the job in a database, expose it via an API, and write an Airflow
> sensor
> > that continuously polls that API until the status becomes "approved", at
> > which point the DAG execution will continue.
> >
> > Best,
> > Alek Storm
> >
> > On Thu, Oct 4, 2018 at 10:05 AM Björn Pollex
> >  wrote:
> >
> >> Hi all,
> >>
> >> In some of our workflows we require a manual validation step, where some
> >> generated data has to be reviewed by an authorised person before the
> >> workflow can continue. We currently model this by using a custom dummy
> >> operator that always fails. After the review, we manually mark it as
> >> success and clear the downstream tasks. This works, but it would be
> >> nice to have a better representation of this in the UI. The
> >> customisation points for plugins don’t seem to offer any way of
> >> customising UI for specific operators.
> >>
> >> Does anyone else have similar use cases? How are you handling this?
> >>
> >> Cheers,
> >>
> >>Björn Pollex
> >>
> >>
>


Re: Pinning dependencies for Apache Airflow

2018-10-05 Thread Jarek Potiuk
Hello Erik,

I understand your concern. It's a hard one to solve in general (i.e.
dependency-hell). It looks like in this case you treat Airflow as a
'library', where for some other people it might be more like an 'end product'.
If you look at the "pinning" philosophy - the "pin everything" is good for
end products, but not good for libraries. In the case you have, Airflow is
treated as a bit of both. And it's a perfectly valid case at that (with
custom Python DAGs being a central concept for Airflow).
However, I think it's not as bad as you think when it comes to exact
pinning.

I believe - a bit counter-intuitively - that tools like pip-tools/poetry
with exact pinning result in having your dependencies upgraded more often,
rather than less - especially in complex systems where dependency-hell
creeps in. If you look at Airflow's setup.py now - it's a bit scary to make
any change to it. There is a chance it will blow up in your face if you
change it. You never know why there is 0.3 < ver < 1.0 - and if you change
it, whether it will cause a chain reaction of conflicts that will ruin your
work day.

On the contrary - if you change it to exact pinning in
a .lock/requirements.txt file (poetry/pip-tools) and have much simpler (and
commented) exclusion/avoidance rules in your .in/.toml file, the whole setup
might be much easier to maintain and upgrade. Every time you prepare for a
release (or even once in a while for master), one person might consciously
attempt to upgrade all dependencies to the latest ones. It should be almost
as easy as letting poetry/pip-tools figure out the latest set of
dependencies that will work without conflicts. It should be rather
straightforward (I've done it in the past for fairly complex systems). What
those tools enable is doing a single-shot upgrade of all dependencies.
After doing it you can make sure that all tests work fine (and fix any
problems that result from it). And then you test it thoroughly before you
make the final release. You can do it in a separate PR - with automated
testing in Travis - which means that you are not disturbing the work of
others (compilation/building + unit tests are guaranteed to work before you
merge it). It's all conscious rather than accidental. A nice side
effect of that is that with every release you can actually "catch-up" with
latest stable versions of many libraries in one go. It's better than
waiting until someone deliberately upgrades to a newer version (and the rest
remain terribly outdated, as is the case for Airflow now).
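
For illustration, the hand-edited file could then be as short as this (the
package names and reasons below are made up for the example):

    # requirements.in - the only file edited by hand, short and commented
    apache-airflow
    werkzeug<0.15     # example exclusion: newer werkzeug breaks our flask pin
    jinja2<2.9.0      # example: templates not yet migrated to 2.9 semantics

Running `pip-compile --upgrade` against it regenerates requirements.txt with
every dependency (transitive ones included) pinned to an exact == version.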

So a bit counterintuitively I think tools like pip-tools/poetry help you to
catch up faster in many cases. That is at least my experience so far.

Additionally, Airflow is an open system - if you have very specific needs
for requirements, you might actually - in the very same way with
pip-tools/poetry - upgrade all your dependencies in your local fork of
Airflow before someone else does it in master/release. Those tools kind of
democratise dependency management. It should be as easy as `pip-compile
--upgrade` or `poetry update` and you will get all the "non-conflicting"
latest dependencies in your local fork (and poetry especially seems to do
all the heavy lifting of figuring out which versions will work). You should
be able to test and publish it locally as your private package for local
installations. You can even pin a specific dependency you want to a
specific version and let pip-tools/poetry figure out the exact versions of
the other requirements. You can even make a PR with such an upgrade
eventually to get it into master faster. You can downgrade in a similar
way in case a newer dependency causes problems for you. Guided by the
tools, it's much faster than figuring the versions out by yourself.

As long as we have a simple way of managing it and document how to
upgrade/downgrade dependencies in your own fork, and mention how to locally
release Airflow as a package, I think your case could be covered even
better than now. What do you think?

J.

On Fri, Oct 5, 2018 at 2:34 PM EKC (Erik Cederstrand)
 wrote:

> For us, exact pinning of versions would be problematic. We have DAG code
> that shares direct and indirect dependencies with Airflow, e.g. lxml,
> requests, pyhive, future, thrift, tzlocal, psycopg2 and ldap3. If our DAG
> code for some reason needs a newer point release due to a bug that's fixed,
> then we can't cleanly build a virtual environment containing the fixed
> version. For us, it's already a problem that Airflow has quite strict (and
> sometimes old) requirements in setup.py.
>
> Erik
> 
> From: Jarek Potiuk 
> Sent: Friday, October 5, 2018 2:01:15 PM
> To: dev@airflow.incubator.apache.org
> Subject: Re: Pinning dependencies for Apache Airflow
>
> I think one solution to the release approach is to check as part of automated
> Travis build if all requirements are pinned with == (even the deep ones)
> and fail the build in case they are not for ALL versions (including
> dev). And 

Re: Flask-AppBuilder has pinned versions of Click & Flask-Login in 1.10.0

2018-10-05 Thread Kyle Hamlin
Thanks for this, Ash. Pipenv works very well in 99% of cases and is vastly
better than managing requirements files. Also, PyPA (the Python Packaging
Authority) officially recommends Pipenv.

I gave poetry a try and it seems like it has a lot of potential. I did run
into two errors: a max recursion depth when installing Airflow with a lot
of extras, and an issue installing lxml when installing Airflow without any
extras.

On Fri, Oct 5, 2018 at 4:29 AM Ash Berlin-Taylor  wrote:

> Oh I meant FAB 1.11.1.
>
> And it looks like the Jinja issue is a bug in pip-tools, where it treats a
> dep of "jina" as actually being "jinja>=CURRENT"
> https://github.com/pypa/pipenv/issues/2596 <
> https://github.com/pypa/pipenv/issues/2596>
>
> In short: pipenv isn't ready for real-world use yet? (I'm guessing and
> extrapolating, but I haven't used it myself so don't trust my word on this)
>
> -ash
> > On 4 Oct 2018, at 16:38, Kyle Hamlin  wrote:
> >
> > If I remove the Flask-AppBuilder pinning to 1.11.0 then it uncovers a
> Jinja2
> > conflict which is baffling because I don't see anywhere in the graph that
> > jinja2 >=2.10 is required.
> >
> > Could not find a version that matches
> > jinja2<2.9.0,>=2.10,>=2.4,>=2.5,>=2.7.3,>=2.8
> > Tried: 2.0, 2.1, 2.1.1, 2.2, 2.2.1, 2.3, 2.3.1, 2.4, 2.4.1, 2.5, 2.5.1,
> > 2.5.2, 2.5.3, 2.5.4, 2.5.5, 2.6, 2.7, 2.7.1, 2.7.2, 2.7.3, 2.8, 2.8,
> 2.8.1,
> > 2.8.1, 2.9, 2.9, 2.9.1, 2.9.1, 2.9.2, 2.9.2, 2.9.3, 2.9.3, 2.9.4, 2.9.4,
> > 2.9.5, 2.9.5, 2.9.6, 2.9.6, 2.10, 2.10
> >
> > I highlighted why the dep fails: there is one dep that requires Jinja2 <
> > 2.9.0, but I still have no idea where the 2.10.0 comes from.
> >
> > apache-airflow==2.0.0.dev0+incubating
> >  - alembic [required: >=0.9,<1.0, installed: 0.9.10]
> >- Mako [required: Any, installed: 1.0.7]
> >  - MarkupSafe [required: >=0.9.2, installed: 1.0]
> >- python-dateutil [required: Any, installed: 2.7.3]
> >  - six [required: >=1.5, installed: 1.11.0]
> >- python-editor [required: >=0.3, installed: 1.0.3]
> >- SQLAlchemy [required: >=0.7.6, installed: 1.1.18]
> >  - bleach [required: ~=2.1.3, installed: 2.1.4]
> >  - html5lib [required: >=0.pre,!=1.0b8,!=1.0b7,!=1.0b6,!=1.0b5,!=1.0b4,!=1.0b3,!=1.0b2,!=1.0b1, installed: 1.0.1]
> >  - six [required: >=1.9, installed: 1.11.0]
> >  - webencodings [required: Any, installed: 0.5.1]
> >- six [required: Any, installed: 1.11.0]
> >  - configparser [required: >=3.5.0,<3.6.0, installed: 3.5.0]
> >  - croniter [required: >=0.3.17,<0.4, installed: 0.3.25]
> >- python-dateutil [required: Any, installed: 2.7.3]
> >  - six [required: >=1.5, installed: 1.11.0]
> >  - dill [required: >=0.2.2,<0.3, installed: 0.2.8.2]
> >  - flask [required: >=0.12.4,<0.13, installed: 0.12.4]
> >- click [required: >=2.0, installed: 7.0]
> >- itsdangerous [required: >=0.21, installed: 0.24]
> >- Jinja2 [required: >=2.4, installed: 2.8.1]
> >  - MarkupSafe [required: Any, installed: 1.0]
> >- Werkzeug [required: >=0.7, installed: 0.14.1]
> >  - flask-admin [required: ==1.4.1, installed: 1.4.1]
> >- Flask [required: >=0.7, installed: 0.12.4]
> >  - click [required: >=2.0, installed: 7.0]
> >  - itsdangerous [required: >=0.21, installed: 0.24]
> >  - Jinja2 [required: >=2.4, installed: 2.8.1]
> >- MarkupSafe [required: Any, installed: 1.0]
> >  - Werkzeug [required: >=0.7, installed: 0.14.1]
> >- wtforms [required: Any, installed: 2.2.1]
> >  - flask-appbuilder [required: >=1.12,<2.0.0, installed: 1.12.0]
> >- click [required: ==6.7, installed: 7.0]
> >- colorama [required: ==0.3.9, installed: 0.3.9]
> >- Flask [required: >=0.10.0,<0.12.99, installed: 0.12.4]
> >  - click [required: >=2.0, installed: 7.0]
> >  - itsdangerous [required: >=0.21, installed: 0.24]
> >  - Jinja2 [required: >=2.4, installed: 2.8.1]
> >- MarkupSafe [required: Any, installed: 1.0]
> >  - Werkzeug [required: >=0.7, installed: 0.14.1]
> >- Flask-Babel [required: ==0.11.1, installed: 0.11.1]
> >  - Babel [required: >=2.3, installed: 2.6.0]
> >- pytz [required: >=0a, installed: 2018.5]
> >  - Flask [required: Any, installed: 0.12.4]
> >- click [required: >=2.0, installed: 7.0]
> >- itsdangerous [required: >=0.21, installed: 0.24]
> >- Jinja2 [required: >=2.4, installed: 2.8.1]
> >  - MarkupSafe [required: Any, installed: 1.0]
> >- Werkzeug [required: >=0.7, installed: 0.14.1]
> >  - Jinja2 [required: >=2.5, installed: 2.8.1]
> >- MarkupSafe [required: Any, installed: 1.0]
> >- Flask-Login [required: >=0.3,<0.5, installed: 0.4.1]
> >  - Flask [required: Any, installed: 0.12.4]
> >- click [required: >=2.0, installed: 7.0]
> >- itsdangerous [required: >=0.21, installed: 0.24]
> >- Jinja2 [required: >=2.4, installed: 2.8.1]
> >  - MarkupSafe [required: Any, installed: 1.0]
> >- Werkzeug 

Re: Airflow Docs - RTD vs Apache Site

2018-10-05 Thread Maxime Beauchemin
A few thoughts:
* we absolutely have to serve a project site off of `airflow.apache.org`,
that's an ASF requirement
* maybe `airflow.apache.org` could be set up as a proxy to
readthedocs-latest (?) [I'm on vacation and have very slow internet, so
didn't research whether that's a documented use-case, we could also ask
Apache-INFRA about it]
* we could (and really should) split the project site and the documentation
into two different sites; that assumes we'd have someone drive creating a
proper, professional-looking project site that would link out to the docs
on "Read the Docs". Creating a project site is not that much work and could
be a rewarding project for someone in the community. Many static site
builder frameworks work off of the "markdown" format, and it's possible to
auto-convert RST (the format we use) to markdown. It'd be nice
to take fresh screenshots of the UI while at it!

Max

On Wed, Oct 3, 2018 at 6:13 AM Kaxil Naik  wrote:

> Hi all,
>
> Continuing the discussion from Slack: many users have had the problem of
> looking at a wrong version of the documentation. Currently, our docs on
> airflow.apache.org don't properly state the version. Although we have
> specified this info on our Github readme and confluence, there has still
> been lots of confusion among new users who google for the docs and are
> pointed to the airflow.apache.org site, which doesn't have version info.
>
> The problem currently with the a.a.o site is that it needs to be manually
> built and only has stable-version docs. We can do 2 things if we don't want
> to redirect a.a.o to RTD: (1) maintain version history on our static a.a.o
> site, or (2) point the a.a.o domain at the RTD docs, i.e. add the domain to
> the RTD site.
>
> Ash has also suggested another approach:
>
> > Apache Infra run a jenkins instance (or other build bot type things) that
> > we might be able to use for autobuilding docs if we want?
>
>
>
> Let's discuss this and decide on a single approach that is user-friendly.
>
> NB: I will be busy for a month, hence won't be able to actively help with
> this, so please feel free to contribute/commit after an approach is
> finalized.
>
> Regards,
> Kaxil
>


Re: Manual validation operator

2018-10-05 Thread Brian Greene
My first thought was this, but my understanding is that if you had a large
number of dags “waiting”, the sensor would consume all the concurrency.

And what if the user doesn’t approve?

How about: the dag you have writes the status to an api/db as its last step.

Then 2 other dags (or one with a branch) can each have a sensor that’s watching
for approved/unapproved values.  When it finds one (or a batch, depending on
how you write it), it triggers the “next” dag (see the sketch below).

This leaves only 1-2 sensors running and would enable your process without 
anyone using the airflow UI (assuming they have some other way to mark 
“approval”).  This avoids the “process by error and recover” logic it seems 
like you’d like to get out of.  (Which makes sense to me)
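
A rough sketch of such a sensor (1.10-era imports; get_approval_status() is
a hypothetical helper that queries your api/db):

    from airflow.sensors.base_sensor_operator import BaseSensorOperator
    from airflow.utils.decorators import apply_defaults

    class ApprovalSensor(BaseSensorOperator):
        @apply_defaults
        def __init__(self, run_id, *args, **kwargs):
            super(ApprovalSensor, self).__init__(*args, **kwargs)
            self.run_id = run_id

        def poke(self, context):
            # Re-run by the scheduler every poke_interval until it returns
            # True; get_approval_status() stands in for your api/db lookup.
            return get_approval_status(self.run_id) == "approved"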

B

Sent from a device with less than stellar autocorrect

> On Oct 4, 2018, at 10:17 AM, Alek Storm  wrote:
> 
> Hi Björn,
> 
> We also sometimes require manual validation, and though we haven't yet
> implemented this, I imagine you could store the approved/unapproved status
> of the job in a database, expose it via an API, and write an Airflow sensor
> that continuously polls that API until the status becomes "approved", at
> which point the DAG execution will continue.
> 
> Best,
> Alek Storm
> 
> On Thu, Oct 4, 2018 at 10:05 AM Björn Pollex
>  wrote:
> 
>> Hi all,
>> 
>> In some of our workflows we require a manual validation step, where some
>> generated data has to be reviewed by an authorised person before the
>> workflow can continue. We currently model this by using a custom dummy
>> operator that always fails. After the review, we manually mark it as
>> success and clear the downstream tasks. This works, but it would be nice to
>> have a better representation of this in the UI. The customisation points for
>> plugins don’t seem to offer any way of customising UI for specific
>> operators.
>> 
>> Does anyone else have similar use cases? How are you handling this?
>> 
>> Cheers,
>> 
>>Björn Pollex
>> 
>> 


Re: Pinning dependencies for Apache Airflow

2018-10-05 Thread EKC (Erik Cederstrand)
For us, exact pinning of versions would be problematic. We have DAG code that 
shares direct and indirect dependencies with Airflow, e.g. lxml, requests, 
pyhive, future, thrift, tzlocal, psycopg2 and ldap3. If our DAG code for some 
reason needs a newer point release due to a bug that's fixed, then we can't 
cleanly build a virtual environment containing the fixed version. For us, it's 
already a problem that Airflow has quite strict (and sometimes old) 
requirements in setup.py.

Erik

From: Jarek Potiuk 
Sent: Friday, October 5, 2018 2:01:15 PM
To: dev@airflow.incubator.apache.org
Subject: Re: Pinning dependencies for Apache Airflow

I think one solution to the release approach is to check as part of automated
Travis build if all requirements are pinned with == (even the deep ones)
and fail the build in case they are not for ALL versions (including
dev). And of course we should document the approach of releases/upgrades
etc. If we do it all the time for development versions (which seems quite
doable), then transitively all the releases will also have pinned versions
and they will never try to upgrade any of the dependencies. In poetry
(similarly in pip-tools with .in file) it is done by having a .lock file
that specifies exact versions of each package so it can be rather easy to
manage (so it's worth trying it out I think  :D  - seems a bit more
friendly than pip-tools).

There is a drawback - of course - with manually updating the module that
you want, but I really see that as an advantage rather than a drawback,
especially for users. This way you maintain the property that it will
always install and work the same way no matter if you installed it today or
two months ago. I think the biggest drawback for maintainers is that you
need some kind of monitoring of security vulnerabilities and cannot rely on
automated security upgrades. With >= requirements those security updates
might happen automatically without anyone noticing, but to be honest I
don't think such upgrades are guaranteed even in the current setup for all
security issues for all libraries anyway.

Finding the need to upgrade because of security issues can be quite
automated. Even now I noticed Github started to inform owners about
potential security vulnerabilities in used libraries for their project.
Those notifications can be sent to devlist and turned into JIRA issues
followed by minor security-related releases (with only a few library
dependencies upgraded).

I think it's even easier to automate it if you have pinned dependencies -
because it's generally easy to find applicable vulnerabilities for specific
versions of libraries by static analysers - when you have >=, you never
know which version will be used until you actually perform the
installation.

There is one big advantage for maintainers in the "pinned" case. Your users
always have the same dependencies - so when an issue is raised, you can
reproduce it more easily. It's hard to know which version the user has (as
the user could have installed it a month ago or yesterday) and even if you
find out by
asking the user, you might not be able to reproduce the set of requirements
easily (simply because there are already newer versions of the libraries
released and they are used automatically). You can ask the user to run pip
install --upgrade, but that's dangerous and pretty lame ("check the latest
version - maybe it fixes your problem?") and sometimes not possible (e.g.
someone has a pre-built docker image with dependencies from a few months
ago and cannot
rebuild the image easily).

J.

On Fri, Oct 5, 2018 at 12:35 PM Ash Berlin-Taylor  wrote:

> One thing to point out here.
>
> Right now if you `pip install apache-airflow==1.10.0` in a clean
> environment it will fail.
>
> This is because we pin flask-login to 0.2.1 but flask-appbuilder is >=
> 1.11.1, so that pulls in 1.12.0 which requires flask-login >= 0.3.
>
> So I do think there is maybe something to be said about pinning for
> releases. The downside to that is that if there are updates to a module
> that we want, then we have to make a point release to let people get them.
>
> Both methods have drawbacks.
>
> -ash
>
> > On 4 Oct 2018, at 17:13, Arthur Wiedmer 
> wrote:
> >
> > Hi Jarek,
> >
> > I will +1 the discussion Dan is referring to and George's advice.
> >
> > I just want to double check we are talking about pinning in
> > requirements.txt only.
> >
> > This offers the ability to
> > pip install -r requirements.txt
> > pip install --no-deps airflow
> > For a guaranteed install which works.
> >
> > Several different requirement files can be provided for specific use
> cases,
> > like a stable dev one for instance for people wanting to work on
> operators
> > and non-core functions.
> >
> > However, I think we should proactively test in CI against unpinned
> > dependencies (though it might be a separate case in the matrix), so that
> > we get advance warning if possible that things will break.
> > CI downtime is not a bad thing 

Re: Pinning dependencies for Apache Airflow

2018-10-05 Thread Jarek Potiuk
I think one solution to the release approach is to check as part of automated
Travis build if all requirements are pinned with == (even the deep ones)
and fail the build in case they are not for ALL versions (including
dev). And of course we should document the approach of releases/upgrades
etc. If we do it all the time for development versions (which seems quite
doable), then transitively all the releases will also have pinned versions
and they will never try to upgrade any of the dependencies. In poetry
(similarly in pip-tools with .in file) it is done by having a .lock file
that specifies exact versions of each package so it can be rather easy to
manage (so it's worth trying it out I think  :D  - seems a bit more
friendly than pip-tools).
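
A minimal sketch of what such a build check could look like (the file name
and pattern below are illustrative, not an agreed convention):

    import re
    import sys

    # An exact pin looks like "pkg==1.2.3" (extras like pkg[gcp] allowed).
    PINNED = re.compile(r"^[A-Za-z0-9._\[\],-]+==[^<>=!~]+$")

    def unpinned(path="requirements.txt"):
        bad = []
        with open(path) as f:
            for line in f:
                spec = line.split("#")[0].strip()  # ignore comments/blanks
                if spec and not PINNED.match(spec):
                    bad.append(spec)
        return bad

    if __name__ == "__main__":
        offenders = unpinned()
        if offenders:
            print("Not pinned with ==:\n  " + "\n  ".join(offenders))
            sys.exit(1)  # fail the Travis build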

There is a drawback - of course - with manually updating the module that
you want, but I really see that as an advantage rather than a drawback,
especially for users. This way you maintain the property that it will
always install and work the same way no matter if you installed it today or
two months ago. I think the biggest drawback for maintainers is that you
need some kind of monitoring of security vulnerabilities and cannot rely on
automated security upgrades. With >= requirements those security updates
might happen automatically without anyone noticing, but to be honest I
don't think such upgrades are guaranteed even in the current setup for all
security issues for all libraries anyway.

Finding the need to upgrade because of security issues can be quite
automated. Even now I noticed Github started to inform owners about
potential security vulnerabilities in used libraries for their project.
Those notifications can be sent to devlist and turned into JIRA issues
followed by minor security-related releases (with only a few library
dependencies upgraded).

I think it's even easier to automate it if you have pinned dependencies -
because it's generally easy to find applicable vulnerabilities for specific
versions of libraries by static analysers - when you have >=, you never
know which version will be used until you actually perform the
installation.

There is one big advantage for maintainers in the "pinned" case. Your users
always have the same dependencies - so when an issue is raised, you can
reproduce it more easily. It's hard to know which version the user has (as
the user could have installed it a month ago or yesterday) and even if you
find out by
asking the user, you might not be able to reproduce the set of requirements
easily (simply because there are already newer versions of the libraries
released and they are used automatically). You can ask the user to run pip
install --upgrade, but that's dangerous and pretty lame ("check the latest
version - maybe it fixes your problem?") and sometimes not possible (e.g.
someone has a pre-built docker image with dependencies from a few months
ago and cannot
rebuild the image easily).

J.

On Fri, Oct 5, 2018 at 12:35 PM Ash Berlin-Taylor  wrote:

> One thing to point out here.
>
> Right now if you `pip install apache-airflow==1.10.0` in a clean
> environment it will fail.
>
> This is because we pin flask-login to 0.2.1 but flask-appbuilder is >=
> 1.11.1, so that pulls in 1.12.0 which requires flask-login >= 0.3.
>
> So I do think there is maybe something to be said about pinning for
> releases. The downside to that is that if there are updates to a module
> that we want, then we have to make a point release to let people get them.
>
> Both methods have drawbacks.
>
> -ash
>
> > On 4 Oct 2018, at 17:13, Arthur Wiedmer 
> wrote:
> >
> > Hi Jarek,
> >
> > I will +1 the discussion Dan is referring to and George's advice.
> >
> > I just want to double check we are talking about pinning in
> > requirements.txt only.
> >
> > This offers the ability to
> > pip install -r requirements.txt
> > pip install --no-deps airflow
> > For a guaranteed install which works.
> >
> > Several different requirement files can be provided for specific use
> cases,
> > like a stable dev one for instance for people wanting to work on
> operators
> > and non-core functions.
> >
> > However, I think we should proactively test in CI against unpinned
> > dependencies (though it might be a separate case in the matrix), so that
> > we get advance warning if possible that things will break.
> > CI downtime is not a bad thing here, it actually caught a problem :)
> >
> > We should unpin as much as possible in setup.py to only maintain minimum required
> > compatibility. The process of pinning in setup.py is extremely
> detrimental
> > when you have a large number of python libraries installed with different
> > pinned versions.
> >
> > Best,
> > Arthur
> >
> > On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov 
> > wrote:
> >
> >> Relevant discussion about this:
> >>
> >>
> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
> >>
> >> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk 
> >> wrote:
> >>
> >>> TL;DR; A change is coming in the way how dependencies/requirements are
> >>> 

Re: PR for refactoring Airflow SLAs

2018-10-05 Thread Colin Nattrass
Hello all, 

Is there any update on the status of this PR?

I discovered this following a request for help on StackOverflow (on creating 
SLAs on task duration 
https://stackoverflow.com/questions/52645422/sla-on-task-duration-airflow). If 
this is unlikely to be implemented in the short term, is there a known workaround?

Colin N


On 2018/07/17 18:56:35, Maxime Beauchemin  wrote: 
> I did a quick scan and it looks like great work, thanks for your
> contribution! I'm guessing the committers are all either busy or
> vacationing at this moment. Let's make sure this gets properly reviewed
> and merged.
> 
> Related to this is the thought of having a formal flow for improvement
> proposals so that we can do some design review upfront, and couple
> contributors with committers early to make sure the process goes through
> smoothly. It hurts to have quality contributions ignored. Clearly we need
> to onboard more committers to ensure quality work gets merged while also
> providing steady, high quality releases.
> 
> In the meantime I'd advise you to ping regularly to make sure this PR
> gets the attention it deserves and prevent it from getting buried in the
> pile.
> 
> Max
> 
> On Tue, Jul 17, 2018 at 5:27 AM James Meickle
>  wrote:
> 
> > Hi all,
> >
> > I'd still love to get some eyes on this one if anyone has time.
> > Definitely needs some direction as to what is required before merging,
> > since this is a higher-level API change...
> >
> > -James M.
> >
> > On Mon, Jul 9, 2018 at 11:58 AM, James Meickle 
> > wrote:
> >
> > > Hi folks,
> > >
> > > Based on my earlier email to the list, I have submitted a PR that
> > > splits `sla=` into three independent SLA parameters, as well as
> > > heavily restructuring other parts of the SLA feature:
> > >
> > > https://github.com/apache/incubator-airflow/pull/3584
> > >
> > > This is my first Airflow PR and I'm still learning the codebase, so
> > > there's likely to be flaws with it. But I'm most interested in the
> > > general compatibility of this feature with the rest of Airflow. We
> > > want this for our purposes at Quantopian, but we'd really prefer to
> > > get it into Airflow core rather than running a fork forever!
> > >
> > > Let me know your thoughts,
> > >
> > > -James M.
> > >
> >
> 

Re: Pinning dependencies for Apache Airflow

2018-10-05 Thread Ash Berlin-Taylor
One thing to point out here.

Right now if you `pip install apache-airflow==1.10.0` in a clean environment it
will fail.

This is because we pin flask-login to 0.2.1 but flask-appbuilder is >= 1.11.1, 
so that pulls in 1.12.0 which requires flask-login >= 0.3.
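
In setup.py terms the clash looks roughly like this (an illustrative
reconstruction, not the exact lines from the 1.10.0 release):

    install_requires = [
        # (other requirements omitted)
        'flask-login==0.2.1',                # hard pin
        'flask-appbuilder>=1.11.1, <2.0.0',  # resolves to 1.12.0 today,
                                             # which needs flask-login>=0.3
    ]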

So I do think there is maybe something to be said about pinning for releases. 
The downside to that is that if there are updates to a module that we want, 
then we have to make a point release to let people get them.

Both methods have drawbacks.

-ash

> On 4 Oct 2018, at 17:13, Arthur Wiedmer  wrote:
> 
> Hi Jarek,
> 
> I will +1 the discussion Dan is referring to and George's advice.
> 
> I just want to double check we are talking about pinning in
> requirements.txt only.
> 
> This offers the ability to
> pip install -r requirements.txt
> pip install --no-deps airflow
> For a guaranteed install which works.
> 
> Several different requirement files can be provided for specific use cases,
> like a stable dev one for instance for people wanting to work on operators
> and non-core functions.
> 
> However, I think we should proactively test in CI against unpinned
> dependencies (though it might be a separate case in the matrix), so that
> we get advance warning if possible that things will break.
> CI downtime is not a bad thing here, it actually caught a problem :)
> 
> We should unpin as much as possible in setup.py to only maintain minimum required
> compatibility. The process of pinning in setup.py is extremely detrimental
> when you have a large number of python libraries installed with different
> pinned versions.
> 
> Best,
> Arthur
> 
> On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov 
> wrote:
> 
>> Relevant discussion about this:
>> 
>> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
>> 
>> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk 
>> wrote:
>> 
>>> TL;DR; A change is coming in the way how dependencies/requirements are
>>> specified for Apache Airflow - they will be fixed rather than flexible
>>> (== rather than >=).
>>> 
>>> This is a follow up after the Slack discussion we had with Ash and Kaxil -
>>> summarising what we propose we'll do.
>>> 
>>> *Problem:*
>>> During the last few weeks we experienced quite a few downtimes of
>>> TravisCI builds (for all PRs/branches including master) as some of the
>>> transitive dependencies were automatically upgraded. This is because in
>>> a number of dependencies we have >= rather than == specifiers.
>>> 
>>> Whenever there is a new release of such a dependency, it might cause a
>>> chain reaction with upgrades of transitive dependencies which might get
>>> into conflict.
>>> 
>>> An example was Flask-AppBuilder vs flask-login transitive dependency with
>>> click. They started to conflict once AppBuilder released version
>>> 1.12.0.
>>> 
>>> *Diagnosis:*
>>> Transitive dependencies with "flexible" versions (where >= is used
>> instead
>>> of ==) is a reason for "dependency hell". We will sooner or later hit
>> other
>>> cases where not fixed dependencies cause similar problems with other
>>> transitive dependencies. We need to fix-pin them. This causes problems
>> for
>>> both - released versions (cause they stop to work!) and for development
>>> (cause they break master builds in TravisCI and prevent people from
>>> installing development environment from the scratch.
>>> 
>>> *Solution:*
>>> 
>>>   - Following the old-but-good post
>>>   https://nvie.com/posts/pin-your-packages/ we are going to fix the
>>>   pinned dependencies to specific versions (so basically all
>>>   dependencies are "fixed").
>>>   - We will introduce a mechanism to be able to upgrade dependencies
>>>   with pip-tools (https://github.com/jazzband/pip-tools). We might also
>>>   take a look at pipenv: https://pipenv.readthedocs.io/en/latest/
>>>   - People who would like to upgrade some dependencies for their PRs
>>>   will still be able to do it - but such upgrades will be in their PR,
>>>   thus they will go through TravisCI tests, and they will also have to
>>>   be specified with pinned fixed versions (==). This should be part of
>>>   the review process, to make sure new/changed requirements are pinned.
>>>   - In the release process there will be a point where an upgrade will
>>>   be attempted for all requirements (using pip-tools) so that we are
>>>   not stuck with older releases. This will be in a controlled PR
>>>   environment where there will be time to fix all dependencies without
>>>   impacting others, and likely enough time to "vet" such changes (this
>>>   can be done for alpha/beta releases for example).
>>>   - As a side effect, the dependencies specification will become far
>>>   simpler and more straightforward.
>>> 
>>> Happy to hear community comments on the proposal. I am happy to take
>>> the lead on that, open a JIRA issue, and implement it if this is
>>> something the community is happy with.
>>> 
>>> J.
>>> 
>>> --
>>> 
>>> *Jarek Potiuk, Principal Software Engineer*
>>> Mobile: +48 660 

Re: Pinning dependencies for Apache Airflow

2018-10-05 Thread Jarek Potiuk
Never tried poetry before, but it looks really good (it also passes my
aesthetic filter, with the slick design of the webpage). A quick look shows
it passes a lot of the criteria I have in mind:

   - works on all platforms
   - easily installable with pip
   - uses standard PyPI repositories by default (but you can switch to
   private)
   - .lock file paradigm (similar to other pinning solutions - such as yarn
   and gradle)
   - automated virtualenv creation
   - has support for python 2.7 and 3.4
   - is pretty active and seems to have a decent (not very big, but not
   small either) number of contributors (https://github.com/sdispater/poetry)

The one thing about pip-tools which I do not like is that it uses
requirements.in -> requirements.txt generation, and some people might not
realise that you should not modify the requirements.txt by hand (who reads
the comments anyway!). There is no such issue with poetry, it seems, but it
might be that some IDE support will be lost as well - for example the
excellent IntelliJ support (something I'd like to try).

I am tempted to try it and report how it works for Airflow. It's a question
to the community whether they will be happy to accept such a relatively new
tool in the standard toolchain. It's quite a change, and it seems a bit more
than just a package manager - with the automated virtualenv integration (on
the other hand, it's kind of nice that by default it forces you to work in a
virtualenv).

J.

On Fri, Oct 5, 2018 at 9:04 AM Björn Pollex
 wrote:

> Hi all,
>
> Have you considered looking into poetry[1]? I’ve had really good
> experiences with it, we specifically introduced it into our project because
> we were getting version conflicts, and it resolved them just fine. It
> properly supports semantic versioning, so package versions have upper
> bounds. It also has a full dependency resolver, so even when package
> upgrades are available, it will only upgrade if the version constraints
> allow it. It does have some issues though, most notably that it depends on
> package metadata being correct to properly resolve dependencies, and that’s
> not always the case.
>
> Cheers,
>
> Björn
>
> [1]: https://poetry.eustace.io/
>
> > On 5. Oct 2018, at 03:58, James Meickle 
> wrote:
> >
> > I suggest not adopting pipenv. It has a nice "first five minutes" demo
> but
> > it's simply not baked enough to depend on as a swap-in pip replacement.
> We
> > are in the process of removing it after finding several serious bugs in
> our
> > POC of it.
> >
> > On Thu, Oct 4, 2018, 20:30 Alex Guziel 
> > wrote:
> >
> >> FWIW, there's some value in using virtualenv with Docker to isolate
> >> yourself from your system's Python.
> >>
> >> It's worth noting that requirements files can link other requirements
> >> files, so that would make groups easier, but note that pip in one run
> has no
> >> guarantee of transitive dependencies not conflicting or overriding. You
> >> need pip check for that or use --no-deps.
> >>
> >> On Thu, Oct 4, 2018 at 5:19 PM Driesprong, Fokko 
> >> wrote:
> >>
> >>> Hi Jarek,
> >>>
> >>> Thanks for bringing this up. I missed the discussion on Slack since I'm
> >> on
> >>> holiday, but I saw the thread and it was way too interesting, and
> >> therefore
> >>> this email :)
> >>>
> >>> This is actually something that we need to address asap. Like you
> >> mention,
> >>> we saw it earlier that specific transitive dependencies are not
> compatible
> >>> and then we end up with a breaking CI, or even worse, a broken release.
> >>> Earlier we had in the setup.py the fixed versions (==) and in a
> separate
> >>> requirements.txt the requirements for the CI. This was also far from
> >>> optimal since we had two versions of the requirements.
> >>>
> >>> I like the idea that you are proposing. Maybe we can do an experiment
> >> with
> >>> it, because of the nature of Airflow (orchestrating different systems),
> >> we
> >>> have a huge list of dependencies. To not install everything, we've
> >> created
> >>> groups. For example specific libraries when you're using the Google
> >> Cloud,
> >>> Elastic, Druid, etc. So I'm curious how it will work with the `
> >>> extras_require` of Airflow
> >>>
> >>> Regarding the pipenv. I don't use any pipenv/virtualenv anymore. For me
> >>> Docker is much easier to work with. I'm also working on a PR to get rid
> >> of
> >>> tox for the testing, and move to a more Docker idiomatic test pipeline.
> >>> Curious what your thoughts are on that.
> >>>
> >>> Cheers, Fokko
> >>>
> >>> Op do 4 okt. 2018 om 15:39 schreef Arthur Wiedmer <
> >>> arthur.wied...@gmail.com
>  :
> >>>
>  Thanks Jakob!
> 
>  I think that this is a huge risk of Slack.
>  I am not against Slack as a support channel, but it is a slippery
> slope
> >>> to
>  have more and more decisions/conversations happening there, contrary
> to
>  what we hope to achieve with the ASF.
> 
>  When we are starting to discuss issues of development, extensions and
>  

Re: Pinning dependencies for Apache Airflow

2018-10-05 Thread Björn Pollex
Hi all,

Have you considered looking into poetry[1]? I’ve had really good experiences 
with it, we specifically introduced it into our project because we were getting 
version conflicts, and it resolved them just fine. It properly supports 
semantic versioning, so package versions have upper bounds. It also has a full 
dependency resolver, so even when package upgrades are available, it will only 
upgrade if the version constraints allow it. It does have some issues though, 
most notably that it depends on package metadata being correct to properly 
resolve dependencies, and that’s not always the case. 

Cheers,

Björn

[1]: https://poetry.eustace.io/

> On 5. Oct 2018, at 03:58, James Meickle  
> wrote:
> 
> I suggest not adopting pipenv. It has a nice "first five minutes" demo but
> it's simply not baked enough to depend on as a swap-in pip replacement. We
> are in the process of removing it after finding several serious bugs in our
> POC of it.
> 
> On Thu, Oct 4, 2018, 20:30 Alex Guziel 
> wrote:
> 
>> FWIW, there's some value in using virtualenv with Docker to isolate
>> yourself from your system's Python.
>> 
>> It's worth noting that requirements files can link other requirements
>> files, so that would make groups easier, but note that pip in one run has no
>> guarantee of transitive dependencies not conflicting or overriding. You
>> need pip check for that or use --no-deps.
>> 
>> On Thu, Oct 4, 2018 at 5:19 PM Driesprong, Fokko 
>> wrote:
>> 
>>> Hi Jarek,
>>> 
>>> Thanks for bringing this up. I missed the discussion on Slack since I'm
>> on
>>> holiday, but I saw the thread and it was way too interesting, and
>> therefore
>>> this email :)
>>> 
>>> This is actually something that we need to address asap. Like you
>> mention,
>>> we saw it earlier that specific transitive dependencies are not compatible
>>> and then we end up with a breaking CI, or even worse, a broken release.
>>> Earlier we had in the setup.py the fixed versions (==) and in a separate
>>> requirements.txt the requirements for the CI. This was also far from
>>> optimal since we had two versions of the requirements.
>>> 
>>> I like the idea that you are proposing. Maybe we can do an experiment
>> with
>>> it, because of the nature of Airflow (orchestrating different systems),
>> we
>>> have a huge list of dependencies. To not install everything, we've
>> created
>>> groups. For example specific libraries when you're using the Google
>> Cloud,
>>> Elastic, Druid, etc. So I'm curious how it will work with the `
>>> extras_require` of Airflow
>>> 
>>> Regarding the pipenv. I don't use any pipenv/virtualenv anymore. For me
>>> Docker is much easier to work with. I'm also working on a PR to get rid
>> of
>>> tox for the testing, and move to a more Docker idiomatic test pipeline.
>>> Curious what your thoughts are on that.
>>> 
>>> Cheers, Fokko
>>> 
>>> Op do 4 okt. 2018 om 15:39 schreef Arthur Wiedmer <
>>> arthur.wied...@gmail.com>:
>>> 
 Thanks Jakob!
 
 I think that this is a huge risk of Slack.
 I am not against Slack as a support channel, but it is a slippery slope to
 have more and more decisions/conversations happening there, contrary to
 what we hope to achieve with the ASF.
 
 When we are starting to discuss issues of development, extensions and
 improvements, it is important for the discussion to happen in the mailing
 list.
 
 Jarek, I wouldn't worry too much, we are still in the process of learning
 as a community. Welcome and thank you for your contribution!
 
 Best,
 Arthur.
 
 On Thu, Oct 4, 2018 at 1:42 PM Jarek Potiuk 
 wrote:
 
> Thanks for pointing it out, Jakob.
> 
> I am still very fresh in the ASF community and learning the ropes and
> etiquette and code of conduct. Apologies for my ignorance.
> I re-read the conduct and FAQ now again - with more understanding - and
> will pay more attention to wording in the future. As you mentioned, it's
> more the wording than the intentions, but since it was in the TL;DR it has
> stronger consequences.
> 
> BTW. Thanks for actually following the code of conduct and pointing it out
> in a respectful manner. I really appreciate it.
> 
> J.
> 
> Principal Software Engineer
> Phone: +48660796129
> 
> On Thu, 4 Oct 2018, 20:41 Jakob Homan,  wrote:
> 
>>> TL;DR; A change is coming in the way how dependencies/requirements are
>>> specified for Apache Airflow - they will be fixed rather than flexible
>>> (== rather than >=).
>>
>>> This is a follow up after the Slack discussion we had with Ash and Kaxil -
>>> summarising what we propose we'll do.
>>
>> Hey all.  It's great that we're moving this discussion back from Slack
>> to the mailing list.  But I've gotta point out that the wording needs
>> a small but critical fix up:
>>