Re: Podling Report Reminder - October 2018
Justin,
Please advise if we need to do the Oct report as well.

Cheers!
-s
Re: Podling Report Reminder - October 2018
Hi,

The board meeting was only a few hours ago so September wouldn't show up
yet on that link.

Thanks,
Justin

On Thu., 20 Sep. 2018, 10:43 am Sid Anand, wrote:

> Hi Justin!
> Well, it was signed off by Jakob (Mentor):
> https://wiki.apache.org/incubator/September2018
>
> Why was this missed in the board minutes?
>
> This is super annoying. I keep filling these out and either they are not
> signed off or they get missed in other processes.
>
> In August, I filled it out but no mentors signed off:
> https://wiki.apache.org/incubator/August2018
> -s
>
> On Wed, Sep 19, 2018 at 4:42 PM Justin Mclean wrote:
>
>> I don't see a report for September here
>>
>> https://whimsy.apache.org/board/minutes/Airflow.html
Fwd: Auto-cleaning up Stale PRs
Re-opening https://issues.apache.org/jira/browse/INFRA-17005

-s

---------- Forwarded message ---------
From: Sid Anand
Date: Wed, Sep 19, 2018 at 5:47 PM
Subject: Re: Auto-cleaning up Stale PRs
To:

Ismael,
Thanks for this pointer. I've re-opened my INFRA ticket and referenced your
Apache Beam one. Super helpful.. if we get it enabled, please collect a beer
from anyone in the Apache Airflow community!

-s

On Wed, Sep 19, 2018 at 7:39 AM Ismaël Mejía wrote:

> While I agree that autoclosing PRs can be unwelcoming, I don't clearly
> see the argument of INFRA in the ticket.
>
> > The policy of no-write-access for bots is a requirement by the
> > foundation legal team. We cannot allow write access to repos without an
> > ICLA.
>
> Labeling and closing the PR in github does not imply write-access from
> the bot into the 'real' gitbox repository, so I don't see how this can
> be an issue, or are we in a gray area (in case bot automation of
> metadata can have legal issues which I doubt since this is not part of
> the source distribution).
>
> As a precedent we had Probot/Stale enabled for Apache Beam so I
> suppose that this should be possible for Airflow too.
> https://issues.apache.org/jira/browse/INFRA-16589
>
> On Thu, Sep 13, 2018 at 5:55 PM Sid Anand wrote:
> >
> > Apache Airflow has, at any point, >200 PRs open. During the slower summer
> > months, we've been merging 100-200 PRs a month. We have been growing the
> > community -- we have <600 contributors, ~200 companies using it, and 20+
> > committers. A person is promoted to "Committer" in recognition for work
> > he/she has done without an expectation of future work in maintaining the
> > code base. Hence, minting new committers doesn't always translate into
> > greater bench strength where merging PRs is concerned. That said, we are
> > actively adding new committers. The last 4-5 committers we added have
> > been super active maintainers, so the coverage on PRs and questions has
> > been getting better.
> >
> > There are many causes of Cold-case PRs:
> >
> >    1. Submitter is not actively responding
> >       1. One example is that we requested tests and they were never
> >          written
> >       2. Discussion ensued on the PR and the submitter did not accept the
> >          community's feedback
> >    2. Committers didn't get to it in a timely manner and after a while
> >       the engagement fell
> >
> > We are in a better position now to handle (2) -- this was not the case a
> > year ago. We're at least able to keep up with our in-flow of PRs
> > week-to-week, but are still having challenges with the
> > previously-established backlog. But, (1) is also a contributor to stale
> > PRs.
> >
> > We do have a lot of stale PRs to manually handle -- I spent all of Summer
> > 2017 pinging submitters of old PRs and I find myself in the same position
> > now.
> >
> > Probot/stale is a useful tool. It has legitimate use-cases. A policy
> > reflects the health/mentality/approaches of the community. A tool like
> > this enforces the policy. Let's not overlook adoption of what would be a
> > very useful tool to the community due to a meta conversation about
> > policy. I think everyone on this list cares about growing a healthy and
> > vibrant community. We also care about being efficient with our spare
> > time. This tool can help us manage both.
> >
> > Also, I am not suggesting that we close JIRA, just stale PRs. JIRAs need
> > to be kept open so we don't lose visibility of bugs/features/etc... This
> > tool doesn't handle JIRA closing anyway.
> >
> > -s
> >
> > On Thu, Sep 13, 2018 at 1:37 AM Mark Thomas wrote:
> >
> > > On 12/09/18 19:16, Sid Anand wrote:
> > > > A stale PR is defined by a policy -- for example, 60 days without any
> > > > movement on the PR.
> > >
> > > Automatically closing such issues is not going to do anything to aid
> > > community building and is likely to actively damage such efforts.
> > >
> > > > Stale PRs would be bad experiences in general for community members,
> > > > but after no movement for 60 days, this is just about cleaning up PRs
> > > > that are not getting feedback from the committers or PR submitters.
> > >
> > > That is the wrong solution to the problem.
> > >
> > > If reporters of issues are not responding to questions and there is
> > > genuinely nothing the community can do to progress the issue without
> > > their input then closing the issue is fair enough. But that should very
> > > much be the exception rather than the rule. In projects I am involved
> > > in I probably do that a handful of times a year. However, even in a good
> > > chunk of those cases, the main reason for the lack of response from the
> > > OP is that the community did not respond to the original report for an
> > > excessively long time.
> > >
> > > If the committers are not responding to issues in a timely manner then
> > > the solution is to start looking for more committers.
> > >
> > > Reporting an issue is often the
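For readers following the policy debate, the staleness rule quoted in this thread ("60 days without any movement on the PR") can be written down as a small check. This is only a sketch of the policy as worded in the thread, not how Probot/stale is implemented; the function name, the constant, and the example dates are assumptions made for illustration.

```python
from datetime import date, timedelta

# Threshold taken from the example policy in the thread: 60 days without movement.
STALE_AFTER = timedelta(days=60)


def is_stale(last_activity: date, today: date,
             threshold: timedelta = STALE_AFTER) -> bool:
    """Return True when a PR has had no activity for at least `threshold`.

    `last_activity` is the date of the most recent comment, commit, or
    review on the PR (how that date is obtained is out of scope here).
    """
    return today - last_activity >= threshold


# Illustrative dates only.
print(is_stale(date(2018, 7, 1), date(2018, 9, 13)))   # True  (74 days idle)
print(is_stale(date(2018, 8, 20), date(2018, 9, 13)))  # False (24 days idle)
```

Whether a stale PR is then labeled, pinged, or closed is the policy question the thread is actually arguing about; the check above only identifies candidates.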
Re: Connection Management in Multi-tenancy Scenario
Given the REST API is upcoming, and the DAG-level access control is in progress as well, maybe let’s revisit this Connection Management topic later when these “infrastructure” is fully ready.

XD
Re: Podling Report Reminder - October 2018
Hi Jim,
Apache Airflow just submitted this in September and it was signed off by
1 mentor. Why is another one needed so soon given that we have been
incubating for more than 2 years? My understanding is that reporting needs
to be quarterly.

-s
Podling Report Reminder - October 2018
Dear podling,

This email was sent by an automated system on behalf of the Apache
Incubator PMC. It is an initial reminder to give you plenty of time to
prepare your quarterly board report.

The board meeting is scheduled for Wed, 17 October 2018, 10:30 am PDT.
The report for your podling will form a part of the Incubator PMC
report. The Incubator PMC requires your report to be submitted 2 weeks
before the board meeting, to allow sufficient time for review and
submission (Wed, October 03).

Please submit your report with sufficient time to allow the Incubator
PMC, and subsequently board members to review and digest. Again, the
very latest you should submit your report is 2 weeks prior to the board
meeting.

Candidate names should not be made public before people are actually
elected, so please do not include the names of potential committers or
PPMC members in your report.

Thanks,

The Apache Incubator PMC

Submitting your Report
----------------------

Your report should contain the following:

*   Your project name
*   A brief description of your project, which assumes no knowledge of
    the project or necessarily of its field
*   A list of the three most important issues to address in the move
    towards graduation.
*   Any issues that the Incubator PMC or ASF Board might wish/need to be
    aware of
*   How has the community developed since the last report
*   How has the project developed since the last report.
*   How does the podling rate their own maturity.

This should be appended to the Incubator Wiki page at:

https://wiki.apache.org/incubator/October2018

Note: This is manually populated. You may need to wait a little before
this page is created from a template.

Mentors
-------

Mentors should review reports for their project(s) and sign them off on
the Incubator wiki page. Signing off reports shows that you are
following the project - projects that are not signed may raise alarms
for the Incubator PMC.

Incubator PMC
Re: Connection Management in Multi-tenancy Scenario
Another clear solution is for connection management to go through the
[upcoming] REST API we've been talking about. Then of course we'll need one
permission per connection and a "all_connections" perm that can be added to
roles (much like DAGs but for connections).

Max
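To make the "one permission per connection plus an all_connections perm" idea concrete, here is a minimal sketch of how such a check could look. It is plain Python and does not use the Flask App Builder or Airflow APIs; the permission-name format, the role names, and the helper functions are all hypothetical, and only "all_connections" and the example connection IDs come from the thread.

```python
from typing import Dict, Set

# Name of the catch-all permission suggested in the message above.
ALL_CONNECTIONS = "all_connections"


def connection_perm(conn_id: str) -> str:
    """Build a per-connection permission name (hypothetical naming scheme)."""
    return f"can_use_connection:{conn_id}"


def role_can_use(role_perms: Set[str], conn_id: str) -> bool:
    """A role may use a connection if it holds the catch-all permission
    or the permission for that specific connection."""
    return ALL_CONNECTIONS in role_perms or connection_perm(conn_id) in role_perms


# Hypothetical roles; the connection IDs are the ones used as examples in the thread.
roles: Dict[str, Set[str]] = {
    "team_a": {connection_perm("mysql_conn_a")},
    "team_b": {connection_perm("mysql_conn_b")},
    "admin": {ALL_CONNECTIONS},
}

print(role_can_use(roles["team_a"], "mysql_conn_a"))  # True
print(role_can_use(roles["team_a"], "mysql_conn_b"))  # False
print(role_can_use(roles["admin"], "mysql_conn_b"))   # True
```

As noted elsewhere in the thread, a check like this is only cooperative while tasks can still reach the metadata DB directly, so it would limit accidental access rather than stop a determined user.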
Re: Connection Management in Multi-tenancy Scenario
You are correct that currently all DAGs can access all connections and
variables.

The other thing to bear in mind: currently PythonOperators have an active
connection to the metadata DB where connections are stored, so at best this
is "co-operative" security, to prevent one team from accessing another
team's connections, and not a hard barrier against an even mildly
determined attacker.

As for the implementation of it: it would be worth looking to see if we
can use the Permissions model built in to FAB (Flask App Builder) that we
are using in the RBAC-based UI. This would allow for much more granular
permissions, and provides a pre-existing management UI for it too.

I don't know if this would make the work dependent on the (in progress?)
DAG-level access controls.

-ash

> On 19 Sep 2018, at 15:00, Deng Xiaodong wrote:
>
> Hi folks,
>
> Thinking of a scenario: I may have multiple users in the same Airflow
> instance. I can use filter_by_owner feature so that each user can only see
> their own DAGs. But what if their DAGs are using different data sources,
> say owner A is using mysql_conn_a, and owner B is using mysql_conn_b, and
> we don't want to allow them to access each other's database?
>
> Seems like all DAG (no matter who is the owner) can access all defined
> connections? or have I missed something?
>
> If my suspicion is making sense, I think it would be necessary to have
> values "*if_protect*" and "*owner*" for each connection. When "if_protect"
> == True, only DAGs whose owner == "owner" would be able to use this
> connection. I would like to take this up to prepare a PR.
>
> Thanks.
>
> XD
Connection Management in Multi-tenancy Scenario
Hi folks,

Thinking of a scenario: I may have multiple users in the same Airflow
instance. I can use the filter_by_owner feature so that each user can only
see their own DAGs. But what if their DAGs are using different data sources,
say owner A is using mysql_conn_a, and owner B is using mysql_conn_b, and
we don't want to allow them to access each other's database?

It seems that all DAGs (no matter who the owner is) can access all defined
connections. Or have I missed something?

If my suspicion makes sense, I think it would be necessary to have the
values "*if_protect*" and "*owner*" for each connection. When "if_protect"
== True, only DAGs whose owner == "owner" would be able to use this
connection. I would like to take this up and prepare a PR.

Thanks.

XD
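The proposed rule can be sketched in a few lines. This is an illustration of the proposal only, not existing Airflow behavior: `ProtectedConnection` and `can_use` are hypothetical names, and the class is a stand-in rather than Airflow's own connection model. Only the `if_protect` and `owner` fields and the example connection IDs come from the message above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProtectedConnection:
    """Hypothetical stand-in for a connection record (not Airflow's model)."""
    conn_id: str
    if_protect: bool = False      # proposed flag: restrict use to the owner
    owner: Optional[str] = None   # proposed field: who owns this connection


def can_use(conn: ProtectedConnection, dag_owner: str) -> bool:
    """Apply the proposed rule: a protected connection is usable only by
    DAGs whose owner matches the connection's owner."""
    if not conn.if_protect:
        return True  # unprotected connections stay open to every DAG
    return conn.owner == dag_owner


conn_a = ProtectedConnection("mysql_conn_a", if_protect=True, owner="A")
shared = ProtectedConnection("shared_conn")  # hypothetical unprotected connection

print(can_use(conn_a, "A"))  # True
print(can_use(conn_a, "B"))  # False
print(can_use(shared, "B"))  # True
```

Defaulting `if_protect` to False keeps every existing connection usable as before, so the restriction would be opt-in per connection.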
Re: Guidelines on Contrib vs Non-contrib
I am working on adding GCP Cloud Functions operator -
https://issues.apache.org/jira/browse/AIRFLOW-2912 (and soon more
GCP-related ones like GCE and CloudSQL). For now I am adding those operators
in contrib (soon I will prepare a PR).

I think it would indeed make sense to have those operators separated into
their own projects - it would make merging/rebasing etc. a bit easier at the
expense of explicit management of dependencies (i.e. I imagine those
external modules will have a dependency on some versions of Airflow - maybe
"> x.y.z" version, as it's important for the operators to use some of the
objects/classes provided by Airflow, for example LoggingMixin).

I would be happy (sooner or later) to make such a move, but I am pretty
fresh in Airflow and I am afraid I would not understand all the consequences
just yet. I already have a few things that came to my mind, though. Maybe
someone can shed some light on them:

* the dependency management I mentioned above (not sure what versioning
  scheme is used by Airflow)
* whether those new projects/repos will also be part of Apache Incubator, or
  whether they should be completely independent, managed by the
  organisations/individuals that create them. How should we deal with
  responsibilities (discussed also in the separate thread linked earlier, on
  JIRA vs. GH issues:
  https://lists.apache.org/thread.html/10be0c50a4aecdde66b1593cc30f0b0246035eb0b3281ee92744f783@%3Cdev.airflow.apache.org%3E)?
  Currently all the code in airflow-incubator is community-owned (as
  discussed in the JIRA thread). Not sure what it would look like for
  separate projects, how to manage contributors there etc. I think it should
  be super-easy to create/maintain such a repo - following a simple guide -
  with very limited extra overhead, otherwise it will be a pain for the
  contributors to create and maintain separate repos.
* should JIRA issues in Airflow JIRA also relate to the new projects? (I am
  in favour of sticking to JIRA BTW. It's much more powerful than GH issues
  and as long as there are some rules everyone follows and integration
  plugins are configured, it can be much better - but some governance is
  indeed needed - I agree JIRA issues in Airflow are not really
  managed/manageable currently - especially by new contributors)

J

On 2018/09/18 18:01:55, James Meickle wrote:
> So in favor of just using Python modules for operators. I initially wrote
> mine as Airflow plugin compatible, and eventually had to un-write them that
> way, so it's really a new-user trap.
>
> I've had at least a half dozen times installing/testing/operating Airflow
> where we had some issue based on an integration for a service we've never
> even used (like Hive). I would love to see all of that go away. However, we
> should make sure that it's not too onerous to get a fairly fully featured
> Airflow install, such as having a way for external repos/packages to even
> be discoverable.
>
> On Tue, Sep 18, 2018 at 1:28 PM Driesprong, Fokko wrote:
>
> > I fully agree with using plain Python modules :)
> >
> > I don't think a lot of hooks/operators graduate to core since it will
> > break the import. A few of them, for example Databricks and the Google
> > hooks are mature enough. For me the main point is having test coverage
> > and a stable API.
> >
> > Cheers, Fokko
> >
> > On Tue 18 Sep. 2018 at 18:30, Victor Noagbodji <
> > vnoagbo...@amplify-analytics.com> wrote:
> >
> > > yes, please!
> > >
> > > > On Sep 18, 2018, at 12:23 PM, Maxime Beauchemin <
> > > > maximebeauche...@gmail.com> wrote:
> > > >
> > > > +1 for deprecating operators/hooks as plugins, let's use Python's good
> > > > old python packages and maybe python "entry points" if we want to
> > > > inject them in "airflow.operators"/"airflow.hooks" (which is probably
> > > > not necessary)
> > > >
> > > > On Tue, Sep 18, 2018 at 2:12 AM Ash Berlin-Taylor wrote:
> > > >
> > > >> Operators and hooks don't need any special plugin system - simply
> > > >> having them as separate Python modules which are imported using
> > > >> normal python semantics is enough.
> > > >>
> > > >> In fact now that I think about it: I want to deprecate the plugins
> > > >> registering hooks/operators etc and limit it to only bits which a
> > > >> simple python import can't manage - which I think is only anything
> > > >> that needs to be registered with another system, such as custom
> > > >> routes in the web UI.
> > > >>
> > > >> I'll draft an AIP for this soon.
> > > >>
> > > >> -ash
> > > >>
> > > >>> On 18 Sep 2018, at 00:50, George Leslie-Waksman wrote:
> > > >>>
> > > >>> Given we have a plugin system, could we alternatively move away from
> > > >>> keeping non-core supported code outside of the core project/repo?
> > > >>>
> > > >>> It would hugely decrease the surface area of the main repository and
> > > >>> testing infrastructure to get most of the contrib code out to its own
> > > >>> place.
> > > >>>
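On the dependency question raised at the top of this message (an external operator package requiring Airflow "> x.y.z"), a guard of the following shape is one way such a package could check the installed version. It is a sketch under stated assumptions: the helper names are hypothetical, the version strings are illustrative, and it only handles plain dotted numeric versions (no pre-release or local suffixes). In practice the constraint would more likely be declared in the package's install requirements.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as '1.10.0' into (1, 10, 0).

    Assumes purely numeric components; suffixes like 'rc1' are not handled.
    """
    return tuple(int(part) for part in version.split("."))


def is_newer_than(installed: str, minimum: str) -> bool:
    """Return True when `installed` is strictly greater than `minimum`,
    matching the '> x.y.z' style of constraint mentioned in the message."""
    return parse_version(installed) > parse_version(minimum)


# Illustrative version strings only.
print(is_newer_than("1.10.0", "1.9.0"))  # True
print(is_newer_than("1.9.0", "1.9.0"))   # False
```

Comparing integer tuples rather than raw strings matters here: as strings, "1.10.0" sorts before "1.9.0", which would give the wrong answer.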