Re: Manual validation operator
It's a bit of a hack, but to save up slots you could just have an instantly-failing PythonOperator (just raise an exception in the callable) that would go in a failed state. Marking it as "success" when the conditions are met would act as a trigger. On Fri, Oct 5, 2018 at 9:07 AM Brian Greene wrote: > My first thought was this, but my understanding is That if you had a > large number of dags “waiting” the sensor would consume all the concurrency. > > And what if the user doesn’t approve? > > How about the dag you have as it’s last step writes to an api/db the > status. > > Then 2 other dags (or one with a branch) can each have a sensor that’s > watching for approved/unapproved values. When it finds one (or a batch > depending on how you write it), trigger the “next” dag. > > This leaves only 1-2 sensors running and would enable your process without > anyone using the airflow UI (assuming they have some other way to mark > “approval”). This avoids the “process by error and recover” logic it seems > like you’d like to get out of. (Which makes sense to me) > > B > > Sent from a device with less than stellar autocorrect > > > On Oct 4, 2018, at 10:17 AM, Alek Storm wrote: > > > > Hi Björn, > > > > We also sometimes require manual validation, and though we haven't yet > > implemented this, I imagine you could store the approved/unapproved > status > > of the job in a database, expose it via an API, and write an Airflow > sensor > > that continuously polls that API until the status becomes "approved", at > > which point the DAG execution will continue. > > > > Best, > > Alek Storm > > > > On Thu, Oct 4, 2018 at 10:05 AM Björn Pollex > > wrote: > > > >> Hi all, > >> > >> In some of our workflows we require a manual validation step, where some > >> generated data has to be reviewed by an authorised person before the > >> workflow can continue. We currently model this by using a custom dummy > >> operator that always fails. 
After the review, we manually mark it as > >> success and clear the downstream tasks. This works, but it would be > nice to > >> have better representation of this in the UI. The customisation points > for > >> plugins don’t seem to offer any way of customising UI for specific > >> operators. > >> > >> Does anyone else have similar use cases? How are you handling this? > >> > >> Cheers, > >> > >>Björn Pollex > >> > >> >
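The instantly-failing-operator hack could look roughly like the sketch below (not the project's code - the task id, message, and DAG wiring are illustrative, and the Airflow-specific parts are left in comments since they depend on your setup):

```python
def await_manual_approval(**context):
    """Fail immediately; a reviewer later marks this task 'success'
    in the UI, which acts as the manual-approval trigger and
    un-blocks the downstream tasks."""
    raise RuntimeError("Blocked pending manual approval - mark this "
                       "task as success after reviewing the data")

# Hypothetical wiring inside a DAG (requires Airflow, hence commented):
# from airflow.operators.python_operator import PythonOperator
# approval_gate = PythonOperator(
#     task_id="approval_gate",
#     python_callable=await_manual_approval,
#     dag=dag,
# )
# generate_data >> approval_gate >> publish_data
```

The task consumes a worker slot only for the instant it takes to raise, which is the point of the hack compared to a long-running sensor.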
Re: Pinning dependencies for Apache Airflow
Hello Erik, I understand your concern. It's a hard one to solve in general (i.e. dependency hell). It looks like in this case you treat Airflow as a 'library', where for some other people it might be more like an 'end product'. If you look at the "pinning" philosophy - "pin everything" is good for end products, but not good for libraries. In your case Airflow is treated as a bit of both, and that's a perfectly valid case (with custom Python DAGs being a central concept for Airflow). However, I think it's not as bad as you think when it comes to exact pinning. I believe - a bit counter-intuitively - that tools like pip-tools/poetry with exact pinning result in having your dependencies upgraded more often, rather than less - especially in complex systems where dependency hell creeps in. If you look at Airflow's setup.py now, it's a bit scary to make any change to it. There is a chance it will blow up in your face if you change it. You never know why there is 0.3 < ver < 1.0 - and if you change it, whether it will cause a chain reaction of conflicts that will ruin your work day. On the contrary - if you change to exact pinning in a .lock/requirements.txt file (poetry/pip-tools) and keep much simpler (and commented) exclusion/avoidance rules in your .in/.toml file, the whole setup might be much easier to maintain and upgrade. Every time you prepare for a release (or even once in a while for master) one person might consciously attempt to upgrade all dependencies to the latest ones. It should be almost as easy as letting poetry/pip-tools figure out the latest set of dependencies that will work together without conflicts. It should be rather straightforward (I've done it in the past for fairly complex systems). What those tools enable is doing a single-shot upgrade of all dependencies. After doing it you can make sure that all tests work fine (and fix any problems that result from it). And then you test it thoroughly before you make the final release. 
You can do it in a separate PR - with automated testing in Travis - which means that you are not disturbing the work of others while doing it (compilation/building + unit tests are guaranteed to work before you merge). It's all conscious rather than accidental. A nice side effect is that with every release you can actually "catch up" with the latest stable versions of many libraries in one go. It's better than waiting until someone deliberately upgrades to a newer version (while the rest remain terribly outdated, as is the case for Airflow now). So, a bit counter-intuitively, I think tools like pip-tools/poetry help you catch up faster in many cases. That is at least my experience so far. Additionally, Airflow is an open system - if you have very specific needs for requirements, you might actually - in the very same way, with pip-tools/poetry - upgrade all your dependencies in your local fork of Airflow before someone else does it in master/release. Those tools kind of democratise dependency management. It should be as easy as `pip-compile --upgrade` or `poetry update`, and you will get all the "non-conflicting" latest dependencies in your local fork (poetry especially seems to do all the heavy lifting of figuring out which versions will work). You should be able to test and publish it locally as your private package for local installations. You can also pin a specific dependency to the exact version you want and let pip-tools/poetry figure out the exact versions of the other requirements. You can even make a PR with such an upgrade eventually to get it into master faster. You can downgrade in a similar way in case a newer dependency causes problems for you. Guided by the tools, it's much faster than figuring the versions out by yourself. As long as we have a simple way of managing it, document how to upgrade/downgrade dependencies in your own fork, and mention how to locally release Airflow as a package, I think your case could be covered even better than it is now. 
What do you think ? J. On Fri, Oct 5, 2018 at 2:34 PM EKC (Erik Cederstrand) wrote: > For us, exact pinning of versions would be problematic. We have DAG code > that shares direct and indirect dependencies with Airflow, e.g. lxml, > requests, pyhive, future, thrift, tzlocal, psycopg2 and ldap3. If our DAG > code for some reason needs a newer point release due to a bug that's fixed, > then we can't cleanly build a virtual environment containing the fixed > version. For us, it's already a problem that Airflow has quite strict (and > sometimes old) requirements in setup.py. > > Erik > > From: Jarek Potiuk > Sent: Friday, October 5, 2018 2:01:15 PM > To: dev@airflow.incubator.apache.org > Subject: Re: Pinning dependencies for Apache Airflow > > I think one solution to release approach is to check as part of automated > Travis build if all requirements are pinned with == (even the deep ones) > and fail the build in case they are not for ALL versions (including > dev). And
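The single-shot upgrade flow described in this thread could look like the following with pip-tools (file names are illustrative - loose, commented constraints live in `requirements.in`, and the generated lock file is `requirements.txt`):

```shell
pip install pip-tools

# Re-resolve everything: upgrade all pins to the latest set of
# versions that still satisfies the loose constraints in requirements.in
pip-compile --upgrade requirements.in

# Make the current virtualenv match the lock file exactly
pip-sync requirements.txt

# The poetry equivalent of the re-resolve step:
# poetry update
```

After the re-resolve, the changed `requirements.txt` goes into a PR so Travis can verify the whole upgraded set before it is merged.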
Re: Flask-AppBuilder has pinned versions of Click & Flask-Login in 1.10.0
Thanks for this, Ash. Pipenv works very well in 99% of cases and is vastly better than managing requirements files. Also, PyPA (the Python Packaging Authority) officially recommends Pipenv. I gave poetry a try and it seems like it has a lot of potential. I did run into two errors, though: a max recursion depth when installing Airflow with a lot of extras, and an issue installing lxml when installing Airflow without any extras. On Fri, Oct 5, 2018 at 4:29 AM Ash Berlin-Taylor wrote: > Oh I meant FAB 1.11.1. > > And it looks like the Jinja issue is a bug in pip-tools, where it treats a > dep of "jina" as actually being "jinja>=CURRENT" > https://github.com/pypa/pipenv/issues/2596 < > https://github.com/pypa/pipenv/issues/2596> > > In short: pip-env isn't ready for real-world use yet? (I'm guessing and > extrapolating, but I haven't used it myself so don't trust my word on this) > > -ash > > On 4 Oct 2018, at 16:38, Kyle Hamlin wrote: > > > > If I remove the Flask-AppBuild pinning to 1.11.0 then it uncovers a > Jinja2 > > conflict which is baffling because I don't see anywhere in the graph that > > jinja2 >=2.10 is required. > > > > Could not find a version that matches > > jinja2<2.9.0,>=2.10,>=2.4,>=2.5,>=2.7.3,>=2.8 > > Tried: 2.0, 2.1, 2.1.1, 2.2, 2.2.1, 2.3, 2.3.1, 2.4, 2.4.1, 2.5, 2.5.1, > > 2.5.2, 2.5.3, 2.5.4, 2.5.5, 2.6, 2.7, 2.7.1, 2.7.2, 2.7.3, 2.8, 2.8, > 2.8.1, > > 2.8.1, 2.9, 2.9, 2.9.1, 2.9.1, 2.9.2, 2.9.2, 2.9.3, 2.9.3, 2.9.4, 2.9.4, > > 2.9.5, 2.9.5, 2.9.6, 2.9.6, 2.10, 2.10 > > > > I highlighted why the dep fails: there is one dep that requires Jinja2 < > 2.9.0 > > but I still have no idea where the 2.10.0 comes from. 
> > > > apache-airflow==2.0.0.dev0+incubating > > - alembic [required: >=0.9,<1.0, installed: 0.9.10] > >- Mako [required: Any, installed: 1.0.7] > > - MarkupSafe [required: >=0.9.2, installed: 1.0] > >- python-dateutil [required: Any, installed: 2.7.3] > > - six [required: >=1.5, installed: 1.11.0] > >- python-editor [required: >=0.3, installed: 1.0.3] > >- SQLAlchemy [required: >=0.7.6, installed: 1.1.18] > > - bleach [required: ~=2.1.3, installed: 2.1.4] > >- html5lib [required: > >> > =0.pre,!=1.0b8,!=1.0b7,!=1.0b6,!=1.0b5,!=1.0b4,!=1.0b3,!=1.0b2,!=1.0b1, > > installed: 1.0.1] > > - six [required: >=1.9, installed: 1.11.0] > > - webencodings [required: Any, installed: 0.5.1] > >- six [required: Any, installed: 1.11.0] > > - configparser [required: >=3.5.0,<3.6.0, installed: 3.5.0] > > - croniter [required: >=0.3.17,<0.4, installed: 0.3.25] > >- python-dateutil [required: Any, installed: 2.7.3] > > - six [required: >=1.5, installed: 1.11.0] > > - dill [required: >=0.2.2,<0.3, installed: 0.2.8.2] > > - flask [required: >=0.12.4,<0.13, installed: 0.12.4] > >- click [required: >=2.0, installed: 7.0] > >- itsdangerous [required: >=0.21, installed: 0.24] > >- Jinja2 [required: >=2.4, installed: 2.8.1] > > - MarkupSafe [required: Any, installed: 1.0] > >- Werkzeug [required: >=0.7, installed: 0.14.1] > > - flask-admin [required: ==1.4.1, installed: 1.4.1] > >- Flask [required: >=0.7, installed: 0.12.4] > > - click [required: >=2.0, installed: 7.0] > > - itsdangerous [required: >=0.21, installed: 0.24] > > - Jinja2 [required: >=2.4, installed: 2.8.1] > >- MarkupSafe [required: Any, installed: 1.0] > > - Werkzeug [required: >=0.7, installed: 0.14.1] > >- wtforms [required: Any, installed: 2.2.1] > > - flask-appbuilder [required: >=1.12,<2.0.0, installed: 1.12.0] > >- click [required: ==6.7, installed: 7.0] > >- colorama [required: ==0.3.9, installed: 0.3.9] > >- Flask [required: >=0.10.0,<0.12.99, installed: 0.12.4] > > - click [required: >=2.0, installed: 7.0] > > - 
itsdangerous [required: >=0.21, installed: 0.24] > > - Jinja2 [required: >=2.4, installed: 2.8.1] > >- MarkupSafe [required: Any, installed: 1.0] > > - Werkzeug [required: >=0.7, installed: 0.14.1] > >- Flask-Babel [required: ==0.11.1, installed: 0.11.1] > > - Babel [required: >=2.3, installed: 2.6.0] > >- pytz [required: >=0a, installed: 2018.5] > > - Flask [required: Any, installed: 0.12.4] > >- click [required: >=2.0, installed: 7.0] > >- itsdangerous [required: >=0.21, installed: 0.24] > >- Jinja2 [required: >=2.4, installed: 2.8.1] > > - MarkupSafe [required: Any, installed: 1.0] > >- Werkzeug [required: >=0.7, installed: 0.14.1] > > - Jinja2 [required: >=2.5, installed: 2.8.1] > >- MarkupSafe [required: Any, installed: 1.0] > >- Flask-Login [required: >=0.3,<0.5, installed: 0.4.1] > > - Flask [required: Any, installed: 0.12.4] > >- click [required: >=2.0, installed: 7.0] > >- itsdangerous [required: >=0.21, installed: 0.24] > >- Jinja2 [required: >=2.4, installed: 2.8.1] > > - MarkupSafe [required: Any, installed: 1.0] > >- Werkzeug
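The indented tree quoted above looks like `pipdeptree` output; for reference, one way to reproduce such a dump (assuming an environment with Airflow already installed):

```shell
pip install pipdeptree

# Show the dependency tree rooted at the airflow package,
# with each node's required vs installed versions
pipdeptree --packages apache-airflow
```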
Re: Airflow Docs - RTD vs Apache Site
A few thoughts: * we absolutely have to serve a project site off of `airflow.apache.org`, that's an ASF requirement * maybe `airflow.apache.org` could be set up as a proxy to readthedocs-latest (?) [I'm on vacation and have very slow internet, so didn't research whether that's a documented use-case; we could also ask Apache-INFRA about it] * we could (and really should) split the project site and the documentation into two different sites; that assumes we'd have someone drive creating a proper, professional-looking project site that would link out to the docs on "Read the Docs". Creating a project site is not that much work and could be a rewarding project for someone in the community. Many static site builder frameworks work off of the "markdown" format, and it's possible to auto-convert RST (the format we use) to markdown. It'd be nice to take fresh screenshots of the UI while at it! Max On Wed, Oct 3, 2018 at 6:13 AM Kaxil Naik wrote: > Hi all, > > Continuing discussion from Slack, many users have had the problem with > looking at a wrong version of the documentation. Currently, our docs on > apache.airflow.org don't properly state version. Although we have > specified > this info on our Github readme and confluence, there has still been lots of > confusion among the new users who try to google for the docs and are > pointed to airflow.apache.org site which doesn't have version info. > > The problem currently with a.a.o site is it needs to be manually built and > only has stable version docs. We can do 2 things if we don't want to > redirect a.a.o with RTD: (1) Maintain History on our static a.a.o site (2) > Point a.a.o site to RTD docs, so a.a.o would point to RTD docs i.e. add the > domain to RTD site > > Ash has also suggested another approach: > > > Apache Infra run a jenkins instance (or other build bot type things) that > > we might be able to use for autobuilding docs if we want? 
> > > > Let's discuss this and decide on a single-approach that is user-friendly. > > NB: I will be busy for a month, hence won't be able to actively help with > this, so please feel free to contribute/commit after an approach is > finalized. > > Regards, > Kaxil >
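The RST-to-markdown auto-conversion mentioned above could be done with a tool like pandoc (an assumption - the project doesn't mandate any particular converter; the paths are illustrative):

```shell
# Convert one RST doc to GitHub-flavoured markdown
pandoc --from rst --to gfm docs/index.rst -o site/index.md

# Or batch-convert a whole docs tree
for f in docs/*.rst; do
  pandoc --from rst --to gfm "$f" -o "site/$(basename "${f%.rst}").md"
done
```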
Re: Manual validation operator
My first thought was this, but my understanding is that if you had a large number of dags “waiting”, the sensor would consume all the concurrency. And what if the user doesn’t approve? How about: the dag you have, as its last step, writes the status to an api/db. Then 2 other dags (or one with a branch) can each have a sensor that’s watching for approved/unapproved values. When it finds one (or a batch, depending on how you write it), trigger the “next” dag. This leaves only 1-2 sensors running and would enable your process without anyone using the airflow UI (assuming they have some other way to mark “approval”). This avoids the “process by error and recover” logic it seems like you’d like to get out of. (Which makes sense to me) B Sent from a device with less than stellar autocorrect > On Oct 4, 2018, at 10:17 AM, Alek Storm wrote: > > Hi Björn, > > We also sometimes require manual validation, and though we haven't yet > implemented this, I imagine you could store the approved/unapproved status > of the job in a database, expose it via an API, and write an Airflow sensor > that continuously polls that API until the status becomes "approved", at > which point the DAG execution will continue. > > Best, > Alek Storm > > On Thu, Oct 4, 2018 at 10:05 AM Björn Pollex > wrote: > >> Hi all, >> >> In some of our workflows we require a manual validation step, where some >> generated data has to be reviewed by an authorised person before the >> workflow can continue. We currently model this by using a custom dummy >> operator that always fails. After the review, we manually mark it as >> success and clear the downstream tasks. This works, but it would be nice to >> have better representation of this in the UI. The customisation points for >> plugins don’t seem to offer any way of customising UI for specific >> operators. >> >> Does anyone else have similar use cases? How are you handling this? >> >> Cheers, >> >>Björn Pollex >> >>
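Alek's polling idea could be sketched as a custom sensor whose `poke` method checks the approval API. The JSON shape and the "approved" value below are assumptions; only the status-checking logic is shown inline so it stays self-contained, with the Airflow wiring in comments:

```python
import json

def is_approved(payload: str) -> bool:
    """Return True once the review API's JSON payload (assumed shape:
    {"status": "approved" | "pending" | ...}) reports approval."""
    return json.loads(payload).get("status") == "approved"

# Hypothetical sensor using it (requires Airflow, hence commented):
# from airflow.sensors.base_sensor_operator import BaseSensorOperator
# class ApprovalSensor(BaseSensorOperator):
#     def poke(self, context):
#         payload = requests.get(APPROVAL_API_URL).text
#         return is_approved(payload)
```

With Brian's batching variant, one such sensor could watch for any newly approved record rather than a single job, keeping only 1-2 sensors alive.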
Re: Pinning dependencies for Apache Airflow
For us, exact pinning of versions would be problematic. We have DAG code that shares direct and indirect dependencies with Airflow, e.g. lxml, requests, pyhive, future, thrift, tzlocal, psycopg2 and ldap3. If our DAG code for some reason needs a newer point release due to a bug that's fixed, then we can't cleanly build a virtual environment containing the fixed version. For us, it's already a problem that Airflow has quite strict (and sometimes old) requirements in setup.py. Erik From: Jarek Potiuk Sent: Friday, October 5, 2018 2:01:15 PM To: dev@airflow.incubator.apache.org Subject: Re: Pinning dependencies for Apache Airflow I think one solution to release approach is to check as part of automated Travis build if all requirements are pinned with == (even the deep ones) and fail the build in case they are not for ALL versions (including dev). And of course we should document the approach of releases/upgrades etc. If we do it all the time for development versions (which seems quite doable), then transitively all the releases will also have pinned versions and they will never try to upgrade any of the dependencies. In poetry (similarly in pip-tools with .in file) it is done by having a .lock file that specifies exact versions of each package so it can be rather easy to manage (so it's worth trying it out I think :D - seems a bit more friendly than pip-tools). There is a drawback - of course - with manually updating the module that you want, but I really see that as an advantage rather than drawback especially for users. This way you maintain the property that it will always install and work the same way no matter if you installed it today or two months ago. I think the biggest drawback for maintainers is that you need some kind of monitoring of security vulnerabilities and cannot rely on automated security upgrades. 
With >= requirements those security updates might happen automatically without anyone noticing, but to be honest I don't think such upgrades are guaranteed even in current setup for all security issues for all libraries anyway. Finding the need to upgrade because of security issues can be quite automated. Even now I noticed Github started to inform owners about potential security vulnerabilities in used libraries for their project. Those notifications can be sent to devlist and turned into JIRA issues followed bvy minor security-related releases (with only few library dependencies upgraded). I think it's even easier to automate it if you have pinned dependencies - because it's generally easy to find applicable vulnerabilities for specific versions of libraries by static analysers - when you have >=, you never know which version will be used until you actually perform the installation. There is one big advantage for maintainers for "pinned" case. Your users always have the same dependencies - so when issue is raised, you can reproduce it more easily. It's hard to know which version user has (as the user could install it month ago or yesterday) and even if you find out by asking the user, you might not be able to reproduce the set of requirements easily (simply because there are already newer versions of the libraries released and they are used automatically). You can ask the user to run pip --upgrade but that's dangerous and pretty lame ("check the latest version - maybe it fixes your problem ? ") and sometimes not possible (e.g. someone has pre-built docker image with dependencies from few months ago and cannot rebuild the image easily). J. On Fri, Oct 5, 2018 at 12:35 PM Ash Berlin-Taylor wrote: > One thing to point out here. > > Right now if you `pip install apache-airflow=1.10.0` in a clean > environment it will fail. > > This is because we pin flask-login to 0.2.1 but flask-appbuilder is >= > 1.11.1, so that pulls in 1.12.0 which requires flask-login >= 0.3. 
> > So I do think there is maybe something to be said about pinning for > releases. The down side to that is that if there are updates to a module > that we want then we have to make a point release to let people get it > > Both methods have draw-backs > > -ash > > > On 4 Oct 2018, at 17:13, Arthur Wiedmer > wrote: > > > > Hi Jarek, > > > > I will +1 the discussion Dan is referring to and George's advice. > > > > I just want to double check we are talking about pinning in > > requirements.txt only. > > > > This offers the ability to > > pip install -r requirements.txt > > pip install --no-deps airflow > > For a guaranteed install which works. > > > > Several different requirement files can be provided for specific use > cases, > > like a stable dev one for instance for people wanting to work on > operators > > and non-core functions. > > > > However, I think we should proactively test in CI against unpinned > > dependencies (though it might be a separate case in the matrix) , so that > > we get advance warning if possible that things will break. > > CI downtime is not a bad thing
Re: Pinning dependencies for Apache Airflow
I think one solution for the release approach is to check, as part of the automated Travis build, if all requirements are pinned with == (even the deep ones) and fail the build in case they are not, for ALL versions (including dev). And of course we should document the approach to releases/upgrades etc. If we do it all the time for development versions (which seems quite doable), then transitively all the releases will also have pinned versions and they will never try to upgrade any of the dependencies. In poetry (similarly in pip-tools with a .in file) it is done by having a .lock file that specifies the exact version of each package, so it can be rather easy to manage (so it's worth trying it out I think :D - seems a bit more friendly than pip-tools). There is a drawback - of course - with manually updating the module that you want, but I really see that as an advantage rather than a drawback, especially for users. This way you maintain the property that it will always install and work the same way no matter if you installed it today or two months ago. I think the biggest drawback for maintainers is that you need some kind of monitoring of security vulnerabilities and cannot rely on automated security upgrades. With >= requirements those security updates might happen automatically without anyone noticing, but to be honest I don't think such upgrades are guaranteed even in the current setup for all security issues for all libraries anyway. Finding the need to upgrade because of security issues can be quite automated. Even now I noticed Github started to inform owners about potential security vulnerabilities in used libraries for their project. Those notifications can be sent to the devlist and turned into JIRA issues followed by minor security-related releases (with only a few library dependencies upgraded). I think it's even easier to automate it if you have pinned dependencies - because it's generally easy to find applicable vulnerabilities for specific versions of libraries with static analysers - when you have >=, you never know which version will be used until you actually perform the installation. There is one big advantage for maintainers in the "pinned" case. Your users always have the same dependencies - so when an issue is raised, you can reproduce it more easily. It's hard to know which version the user has (as the user could have installed it a month ago or yesterday) and even if you find out by asking the user, you might not be able to reproduce the set of requirements easily (simply because there are already newer versions of the libraries released and they are used automatically). You can ask the user to run pip --upgrade but that's dangerous and pretty lame ("check the latest version - maybe it fixes your problem?") and sometimes not possible (e.g. someone has a pre-built docker image with dependencies from a few months ago and cannot rebuild the image easily). J. On Fri, Oct 5, 2018 at 12:35 PM Ash Berlin-Taylor wrote: > One thing to point out here. > > Right now if you `pip install apache-airflow=1.10.0` in a clean > environment it will fail. > > This is because we pin flask-login to 0.2.1 but flask-appbuilder is >= > 1.11.1, so that pulls in 1.12.0 which requires flask-login >= 0.3. > > So I do think there is maybe something to be said about pinning for > releases. The down side to that is that if there are updates to a module > that we want then we have to make a point release to let people get it > > Both methods have draw-backs > > -ash > > > On 4 Oct 2018, at 17:13, Arthur Wiedmer > wrote: > > > > Hi Jarek, > > > > I will +1 the discussion Dan is referring to and George's advice. > > > > I just want to double check we are talking about pinning in > > requirements.txt only. 
> > > > This offers the ability to > > pip install -r requirements.txt > > pip install --no-deps airflow > > For a guaranteed install which works. > > > > Several different requirement files can be provided for specific use > cases, > > like a stable dev one for instance for people wanting to work on > operators > > and non-core functions. > > > > However, I think we should proactively test in CI against unpinned > > dependencies (though it might be a separate case in the matrix) , so that > > we get advance warning if possible that things will break. > > CI downtime is not a bad thing here, it actually caught a problem :) > > > > We should unpin as possible in setup.py to only maintain minimum required > > compatibility. The process of pinning in setup.py is extremely > detrimental > > when you have a large number of python libraries installed with different > > pinned versions. > > > > Best, > > Arthur > > > > On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov > > > wrote: > > > >> Relevant discussion about this: > >> > >> > https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174 > >> > >> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk > >> wrote: > >> > >>> TL;DR; A change is coming in the way how dependencies/requirements are > >>>
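The Travis check proposed at the top of this thread - fail the build unless every requirement is pinned with == - could be sketched as a small helper (hypothetical, not the project's code; a real CI step would run it over the generated requirements file):

```python
import re

# Matches exactly-pinned lines like "flask==0.12.4" or
# "apache-airflow[s3]==1.10.0" (a simplification of PEP 508 syntax)
_PINNED = re.compile(r"^[A-Za-z0-9._\[\],-]+==\S+$")

def unpinned_requirements(text: str) -> list:
    """Return requirement lines that are not exactly pinned with '=='."""
    bad = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if line and not _PINNED.match(line):
            bad.append(line)
    return bad

# A CI step would fail when unpinned_requirements(...) is non-empty.
```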
Re: PR for refactoring Airflow SLAs
Hello all, Is there any update on the status of this PR? I discovered this following a request for help on StackOverflow (on creating SLAs on task duration https://stackoverflow.com/questions/52645422/sla-on-task-duration-airflow). If this is unlikely to be implemented in the short term, is there a known workaround? Colin N On 2018/07/17 18:56:35, Maxime Beauchemin wrote: > I did a quick scan and it looks like great work, thanks for your > contribution! I'm guessing the committers are all either busy or > vacationing at this moment. Let's make sure this gets properly reviewed and > merged. > > Related to this is the thought of having a formal flow for improvement > proposals so that we can do some design review upfront, and couple > contributors with committers early to make sure the process goes through > smoothly. It hurts to have quality contributions ignored. Clearly > we need to onboard more committers to ensure quality work gets merged while > also providing steady, high-quality releases. > > In the meantime I'd advise you to ping regularly to make sure this PR gets > the attention it deserves and prevent it from getting buried in the pile. > > Max > > On Tue, Jul 17, 2018 at 5:27 AM James Meickle wrote: > > Hi all, > > > > I'd still love to get some eyes on this one if anyone has time. Definitely > > needs some direction as to what is required before merging, since this is a > > higher-level API change... > > > > -James M. > > > > On Mon, Jul 9, 2018 at 11:58 AM, James Meickle wrote: > > > Hi folks, > > > > > > Based on my earlier email to the list, I have submitted a PR that splits > > > `sla=` into three independent SLA parameters, as well as heavily > > > restructuring other parts of the SLA feature: > > > > > > https://github.com/apache/incubator-airflow/pull/3584 > > > > > > This is my first Airflow PR and I'm still learning the codebase, so > > > there's likely to be flaws with it. But I'm most interested in the general > > > compatibility of this feature with the rest of Airflow. We want this for > > > our purposes at Quantopian, but we'd really prefer to get it into Airflow > > > core rather than running a fork forever! > > > > > > Let me know your thoughts, > > > > > > -James M.
Re: Pinning dependencies for Apache Airflow
One thing to point out here. Right now if you `pip install apache-airflow==1.10.0` in a clean environment it will fail. This is because we pin flask-login to 0.2.1 but flask-appbuilder is >= 1.11.1, so that pulls in 1.12.0 which requires flask-login >= 0.3. So I do think there is maybe something to be said for pinning for releases. The downside is that if there are updates to a module that we want, then we have to make a point release to let people get it. Both methods have drawbacks. -ash > On 4 Oct 2018, at 17:13, Arthur Wiedmer wrote: > > Hi Jarek, > > I will +1 the discussion Dan is referring to and George's advice. > > I just want to double check we are talking about pinning in > requirements.txt only. > > This offers the ability to > pip install -r requirements.txt > pip install --no-deps airflow > For a guaranteed install which works. > > Several different requirement files can be provided for specific use cases, > like a stable dev one for instance for people wanting to work on operators > and non-core functions. > > However, I think we should proactively test in CI against unpinned > dependencies (though it might be a separate case in the matrix) , so that > we get advance warning if possible that things will break. > CI downtime is not a bad thing here, it actually caught a problem :) > > We should unpin as possible in setup.py to only maintain minimum required > compatibility. The process of pinning in setup.py is extremely detrimental > when you have a large number of python libraries installed with different > pinned versions. 
> > Best, > Arthur > > On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov > wrote: > >> Relevant discussion about this: >> >> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174 >> >> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk >> wrote: >> >>> TL;DR; A change is coming in the way how dependencies/requirements are >>> specified for Apache Airflow - they will be fixed rather than flexible >> (== >>> rather than >=). >>> >>> This is follow up after Slack discussion we had with Ash and Kaxil - >>> summarising what we propose we'll do. >>> >>> *Problem:* >>> During last few weeks we experienced quite a few downtimes of TravisCI >>> builds (for all PRs/branches including master) as some of the transitive >>> dependencies were automatically upgraded. This because in a number of >>> dependencies we have >= rather than == dependencies. >>> >>> Whenever there is a new release of such dependency, it might cause chain >>> reaction with upgrade of transitive dependencies which might get into >>> conflict. >>> >>> An example was Flask-AppBuilder vs flask-login transitive dependency with >>> click. They started to conflict once AppBuilder has released version >>> 1.12.0. >>> >>> *Diagnosis:* >>> Transitive dependencies with "flexible" versions (where >= is used >> instead >>> of ==) is a reason for "dependency hell". We will sooner or later hit >> other >>> cases where not fixed dependencies cause similar problems with other >>> transitive dependencies. We need to fix-pin them. This causes problems >> for >>> both - released versions (cause they stop to work!) and for development >>> (cause they break master builds in TravisCI and prevent people from >>> installing development environment from the scratch. >>> >>> *Solution:* >>> >>> - Following the old-but-good post >>> https://nvie.com/posts/pin-your-packages/ we are going to fix the >>> pinned >>> dependencies to specific versions (so basically all dependencies are >>> "fixed"). 
>>> - We will introduce a mechanism to upgrade dependencies with >>> pip-tools (https://github.com/jazzband/pip-tools). We might also >> take a >>> look at pipenv: https://pipenv.readthedocs.io/en/latest/ >>> - People who would like to upgrade some dependencies for their PRs >> will >>> still be able to do it - but such upgrades will be in their PR, thus >> they >>> will go through TravisCI tests, and they will also have to be specified >>> with >>> pinned versions (==). This should be part of the review process, to >>> make >>> sure new/changed requirements are pinned. >>> - In the release process there will be a point where an upgrade will be >>> attempted for all requirements (using pip-tools) so that we are not >>> stuck >>> with older releases. This will be in a controlled PR environment where >>> there >>> will be time to fix all dependencies without impacting others and >> likely >>> enough time to "vet" such changes (this can be done for alpha/beta >>> releases >>> for example). >>> - As a side effect, the dependency specification will become far simpler >>> and more straightforward. >>> >>> Happy to hear community comments on the proposal. I am happy to take the >> lead >>> on that, open a JIRA issue and implement it if this is something the community is >>> happy with. >>> >>> J. >>> >>> -- >>> >>> *Jarek Potiuk, Principal Software Engineer* >>> Mobile: +48 660
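The review step Jarek proposes — making sure new or changed requirements are pinned — is easy to automate with a small lint. Here is a hedged sketch of what such a check might look like (the function name and example lines are illustrative, not part of the actual proposal or Airflow's codebase):

```python
import re

def unpinned(requirement_lines):
    """Return requirement lines that are not exact '==' pins."""
    bad = []
    for line in requirement_lines:
        spec = line.split("#")[0].strip()  # drop comments and blank lines
        if not spec:
            continue
        # Expect: package name, optional [extras], then an exact '==' pin.
        if not re.match(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?==\S+$", spec):
            bad.append(spec)
    return bad

example = [
    "flask-appbuilder==1.12.0",
    "flask-login>=0.3  # flexible - should be flagged",
    "# a comment",
]
print(unpinned(example))  # only the '>=' line is reported
```

Run against the generated requirements file in CI, this would catch a flexible constraint sneaking into a PR before it can break master.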
Re: Pinning dependencies for Apache Airflow
Never tried poetry before, but it looks really good (it also passes my aesthetic filter for the slick design of its webpage). A quick look shows it meets a lot of the criteria I have in mind: - works on all platforms - easily installable with pip - uses standard PyPI repositories by default (but you can switch to private ones) - .lock file paradigm (similar to other pinning solutions - such as yarn and gradle) - automated virtualenv creation - has support for Python 2.7 and 3.4 - is pretty active and has a moderate number of contributors (https://github.com/sdispater/poetry) The one thing about pip-tools I do not like is that it generates requirements.txt from requirements.in, and some people might not realise that you should not modify requirements.txt by hand (who reads the comments anyway!). Poetry seems to avoid this, but it might be that some IDE support will be lost as well - for example the excellent IntelliJ support (something I'd like to try). I am tempted to try it and report how it works for Airflow. It's a question to the community whether they will be happy to accept such a relatively new tool in the standard toolchain. It's quite a change, and it seems a bit more than just a package manager - with the automated virtualenv integration (on the other hand it's kind of nice that by default it forces you to work in a virtualenv). J. On Fri, Oct 5, 2018 at 9:04 AM Björn Pollex wrote: > Hi all, > > Have you considered looking into poetry[1]? I’ve had really good > experiences with it, we specifically introduced it into our project because > we were getting version conflicts, and it resolved them just fine. It > properly supports semantic versioning, so package versions have upper > bounds. It also has a full dependency resolver, so even when package > upgrades are available, it will only upgrade if the version constraints > allow it.
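The "upper bounds" point in Björn's quoted message is the key difference from plain `>=` constraints: a semver-style caret constraint such as `^1.11` means `>=1.11.0,<2.0.0`, so a resolver will take newer patch/minor releases but never silently jump a major version. A toy illustration of that behaviour (simplified stand-in code, not poetry's actual resolver, and it glosses over the special semver rules for 0.x versions):

```python
def caret_bounds(spec):
    """Translate a caret constraint like '^1.11' into a half-open range."""
    parts = [int(p) for p in spec.lstrip("^").split(".")]
    major, minor = (parts + [0, 0])[:2]
    low = (major, minor, 0)
    # For major >= 1, '^' allows everything below the next major version.
    high = (major + 1, 0, 0) if major else (major, minor + 1, 0)
    return low, high

def newest_allowed(spec, releases):
    """Pick the newest release that satisfies the caret constraint."""
    low, high = caret_bounds(spec)
    keys = {r: tuple(int(p) for p in r.split(".")) for r in releases}
    allowed = [r for r in releases if low <= keys[r] < high]
    return max(allowed, key=keys.get)

releases = ["1.11.1", "1.12.0", "2.0.0"]
print(newest_allowed("^1.11", releases))  # 2.0.0 is excluded by the bound
```

Under a bare `>=1.11.1` constraint, the same release list would resolve to 2.0.0 — exactly the kind of surprise upgrade that broke the Airflow CI builds.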
It does have some issues though, most notably that it depends on > package metadata being correct to properly resolve dependencies, and that’s > not always the case. > > Cheers, > > Björn > > [1]: https://poetry.eustace.io/ > > > On 5. Oct 2018, at 03:58, James Meickle > wrote: > > > > I suggest not adopting pipenv. It has a nice "first five minutes" demo > but > > it's simply not baked enough to depend on as a swap-in pip replacement. > We > > are in the process of removing it after finding several serious bugs in > our > > POC of it. > > > > On Thu, Oct 4, 2018, 20:30 Alex Guziel > > wrote: > > > >> FWIW, there's some value in using virtualenv with Docker to isolate > >> yourself from your system's Python. > >> > >> It's worth noting that requirements files can link other requirements > >> files, so that would make groups easier, but note that pip in one run > has no > >> guarantee of transitive dependencies not conflicting or overriding. You > >> need pip check for that or use --no-deps. > >> > >> On Thu, Oct 4, 2018 at 5:19 PM Driesprong, Fokko > >> wrote: > >> > >>> Hi Jarek, > >>> > >>> Thanks for bringing this up. I missed the discussion on Slack since I'm > >> on > >>> holiday, but I saw the thread and it was way too interesting, and > >> therefore > >>> this email :) > >>> > >>> This is actually something that we need to address ASAP. Like you > >> mention, > >>> we saw it earlier that specific transitive dependencies are not > compatible > >>> and then we end up with a breaking CI, or even worse, a broken release. > >>> Earlier we had in the setup.py the fixed versions (==) and in a > separate > >>> requirements.txt the requirements for the CI. This was also far from > >>> optimal since we had two versions of the requirements. > >>> > >>> I like the idea that you are proposing. Maybe we can do an experiment > >> with > >>> it, because of the nature of Airflow (orchestrating different systems), > >> we > >>> have a huge list of dependencies.
To not install everything, we've > >> created > >>> groups. For example, specific libraries when you're using the Google > >> Cloud, > >>> Elastic, Druid, etc. So I'm curious how it will work with the ` > >>> extras_require` of Airflow. > >>> > >>> Regarding pipenv: I don't use any pipenv/virtualenv anymore. For me > >>> Docker is much easier to work with. I'm also working on a PR to get rid > >> of > >>> tox for the testing, and move to a more Docker-idiomatic test pipeline. > >>> Curious what your thoughts are on that. > >>> > >>> Cheers, Fokko > >>> > >>> On Thu, 4 Oct 2018 at 15:39, Arthur Wiedmer wrote: < > >>> arthur.wied...@gmail.com > : > >>> > Thanks Jakob! > > I think that this is a huge risk of Slack. > I am not against Slack as a support channel, but it is a slippery > slope > >>> to > have more and more decisions/conversations happening there, contrary > to > what we hope to achieve with the ASF. > > When we are starting to discuss issues of development, extensions and >
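The dependency groups Fokko mentions map onto setuptools' `extras_require`: the core `install_requires` stays small and each integration declares its own group, which users opt into with `pip install package[group]`. A minimal sketch of that structure (group names and versions are illustrative, not Airflow's actual extras):

```python
# Illustrative extras grouping; in a real setup.py this dict would be
# passed to setuptools.setup(extras_require=...).
extras_require = {
    "druid": ["pydruid==0.4.1"],
    "elasticsearch": ["elasticsearch==5.5.3"],
    "gcp": ["google-api-python-client==1.7.4"],
}

# A conventional "all" extra is just the union of every group:
extras_require["all"] = sorted(
    dep for group in list(extras_require.values()) for dep in group
)
print(extras_require["all"])
```

The open question in the thread — whether poetry or pip-tools can pin each group's transitive dependencies consistently — is exactly where a huge extras list makes resolution hard, since every group's pins have to coexist in one environment.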
Re: Pinning dependencies for Apache Airflow
Hi all, Have you considered looking into poetry[1]? I’ve had really good experiences with it, we specifically introduced it into our project because we were getting version conflicts, and it resolved them just fine. It properly supports semantic versioning, so package versions have upper bounds. It also has a full dependency resolver, so even when package upgrades are available, it will only upgrade if the version constraints allow it. It does have some issues though, most notably that it depends on package metadata being correct to properly resolve dependencies, and that’s not always the case. Cheers, Björn [1]: https://poetry.eustace.io/ > On 5. Oct 2018, at 03:58, James Meickle > wrote: > > I suggest not adopting pipenv. It has a nice "first five minutes" demo but > it's simply not baked enough to depend on as a swap-in pip replacement. We > are in the process of removing it after finding several serious bugs in our > POC of it. > > On Thu, Oct 4, 2018, 20:30 Alex Guziel > wrote: > >> FWIW, there's some value in using virtualenv with Docker to isolate >> yourself from your system's Python. >> >> It's worth noting that requirements files can link other requirements >> files, so that would make groups easier, but note that pip in one run has no >> guarantee of transitive dependencies not conflicting or overriding. You >> need pip check for that or use --no-deps. >> >> On Thu, Oct 4, 2018 at 5:19 PM Driesprong, Fokko >> wrote: >> >>> Hi Jarek, >>> >>> Thanks for bringing this up. I missed the discussion on Slack since I'm >> on >>> holiday, but I saw the thread and it was way too interesting, and >> therefore >>> this email :) >>> >>> This is actually something that we need to address ASAP. Like you >> mention, >>> we saw it earlier that specific transitive dependencies are not compatible >>> and then we end up with a breaking CI, or even worse, a broken release.
>>> Earlier we had in the setup.py the fixed versions (==) and in a separate >>> requirements.txt the requirements for the CI. This was also far from >>> optimal since we had two versions of the requirements. >>> >>> I like the idea that you are proposing. Maybe we can do an experiment >> with >>> it, because of the nature of Airflow (orchestrating different systems), >> we >>> have a huge list of dependencies. To not install everything, we've >> created >>> groups. For example, specific libraries when you're using the Google >> Cloud, >>> Elastic, Druid, etc. So I'm curious how it will work with the ` >>> extras_require` of Airflow. >>> >>> Regarding pipenv: I don't use any pipenv/virtualenv anymore. For me >>> Docker is much easier to work with. I'm also working on a PR to get rid >> of >>> tox for the testing, and move to a more Docker-idiomatic test pipeline. >>> Curious what your thoughts are on that. >>> >>> Cheers, Fokko >>> >>> On Thu, 4 Oct 2018 at 15:39, Arthur Wiedmer wrote: < >>> arthur.wied...@gmail.com : >>> Thanks Jakob! I think that this is a huge risk of Slack. I am not against Slack as a support channel, but it is a slippery slope >>> to have more and more decisions/conversations happening there, contrary to what we hope to achieve with the ASF. When we are starting to discuss issues of development, extensions and improvements, it is important for the discussion to happen on the >> mailing list. Jarek, I wouldn't worry too much, we are still in the process of >> learning as a community. Welcome and thank you for your contribution! Best, Arthur. On Thu, Oct 4, 2018 at 1:42 PM Jarek Potiuk wrote: > Thanks for pointing it out Jakob. > > I am still very fresh in the ASF community and learning the ropes and > etiquette and code of conduct. Apologies for my ignorance. > I re-read the code of conduct and FAQ again - with more understanding - and will > pay more attention to wording in the future.
As you mentioned, it's >> more the > wording than the intentions, but since it was in the TL;DR it has stronger > consequences. > > BTW. Thanks for actually following the code of conduct and pointing >> it out > in a respectful manner. I really appreciate it. > > J. > > Principal Software Engineer > Phone: +48660796129 > > On Thu, 4 Oct 2018, 20:41 Jakob Homan, wrote: > >>> TL;DR; A change is coming in the way how >> dependencies/requirements are >>> specified for Apache Airflow - they will be fixed rather than flexible >> (== >>> rather than >=). >> >>> This is follow up after Slack discussion we had with Ash and >> Kaxil >>> - >>> summarising what we propose we'll do. >> >> Hey all. It's great that we're moving this discussion back from >>> Slack >> to the mailing list. But I've gotta point out that the wording >> needs >> a small but critical fix-up: >>