Lots of good discussion here. We should create separate threads for the
questions about (1) whether to keep or drop mysql / mssql / sqlite /
mongodb / just-pipe-to-/dev/null/ and (2) to UUID or not to UUID and (3)
database agnosticism a.k.a. an interface.
But some responses...
Using UUIDS was
> and
> > > > > suggest users that want to use them to make their analytics queries
> > > > > elsewhere. I'd very much prefer that it's slow "by design" for
> > everyone
> > > > > rather than add option for the user to speed them up where we decided
> > not to do it ourselves because we know the consequences.
> >
> > I think the root cause of the problem is `US` adding new filtering
> options
> > to public API without thinking about and documenting consequen
So I think the notion that *all possibly expensive queries* should have an
index to support them is not a tenable one. E.g. there are something like
5 params on TI list endpoint that don't have an index.
In contrast with queries from airflow itself, the API queries are more
arbitrary -- user can
Historically we have added indexes as needed for the performance of airflow
itself and not for the REST API.
Lately we've observed more usage of task instances list endpoint and
specifically filtering on end_date and / or start_date and / or
execution_date.
One line of argument goes that every
As Jed said, many good contenders. Here's one vote for Kaxil's scarf PR
https://github.com/apache/airflow/pull/39510. It will be really cool to
have some more data on what users are doing.
On Tue, May 28, 2024 at 11:10 AM Kaxil Naik wrote:
> #39336 -- pain since a long long long time!
>
> On
re
As tasks require connection access, I assume connection data will somehow
> be passed as part of the
> metadata to task execution - whether it's part of the executor protocol or
> in some other way (I'm
> not an expert on that part of Airflow). Then, provided it's accessible as
> part of some
Thanks for engaging.
I don't think I need to go to a lazy consensus vote so I won't unless
someone thinks necessary.
The PR is now ready for review if anyone is interested:
https://github.com/apache/airflow/pull/39336
It was made more tricky by the fact that "backfill" is literally a second
But you could run them in a thread or subprocess.
Another option would be to just take all of the timed events and make them
all asyncio and then run them all via asyncio in one continually running
thread. That would be a bite size step towards AIP-70. Though, it might
be a large bite :)
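
To make the "one continually running thread" idea concrete, here is a minimal sketch using only the standard library. The class name `TimedEventThread` and the `(interval, callback)` shape are assumptions for illustration, not anything that exists in Airflow today. Note the callbacks here are still plain sync functions, so a slow one would stall the others.

```python
import asyncio
import threading
import time


class TimedEventThread:
    """Hypothetical helper: run periodic callbacks on one asyncio loop in one thread."""

    def __init__(self, events):
        self._events = list(events)  # iterable of (interval_seconds, callback)
        self._loop = asyncio.new_event_loop()
        self._stop = None
        self._ready = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        asyncio.set_event_loop(self._loop)
        self._loop.run_until_complete(self._main())
        self._loop.close()

    async def _main(self):
        # Create the asyncio.Event inside the loop that will use it.
        self._stop = asyncio.Event()
        self._ready.set()
        await asyncio.gather(*(self._run_every(i, cb) for i, cb in self._events))

    async def _run_every(self, interval, callback):
        while not self._stop.is_set():
            callback()
            try:
                # Sleep for the interval, but wake early when asked to stop.
                await asyncio.wait_for(self._stop.wait(), timeout=interval)
            except asyncio.TimeoutError:
                pass

    def start(self):
        self._thread.start()
        self._ready.wait()

    def stop(self):
        self._loop.call_soon_threadsafe(self._stop.set)
        self._thread.join()


ticks = []
runner = TimedEventThread([(0.01, lambda: ticks.append(time.monotonic()))])
runner.start()
time.sleep(0.1)
runner.stop()
```

Converting the callbacks themselves to coroutines would be the next step, and that is probably where the bite gets large.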
On
TLDR
* changing handling of try_number in
https://github.com/apache/airflow/pull/39336
* no more private attr
* no more getter that changes value based on state of task
* no more decrementing
* try number now only handled by scheduler
* hope that sounds good to all of you
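
To illustrate the kind of getter the bullets above are talking about, here is a hypothetical sketch. It is not the Airflow code; the class names and the exact rule inside the getter are assumptions made only to show why a value that changes with task state is confusing, and what "only handled by scheduler" could look like.

```python
class StateDependentTry:
    """Hypothetical 'before': private attr plus a getter that depends on state."""

    def __init__(self):
        self._try_number = 0
        self.state = None

    @property
    def try_number(self):
        # Assumed rule for illustration: while running, report the stored
        # value; otherwise report the number the *next* attempt would get.
        if self.state == "running":
            return self._try_number
        return self._try_number + 1


class PlainTry:
    """Hypothetical 'after': a plain attribute, no getter tricks."""

    def __init__(self):
        self.try_number = 0
        self.state = None


def start_attempt(ti):
    """Stand-in for the scheduler: the only place that increments."""
    ti.try_number += 1
    ti.state = "running"


old = StateDependentTry()
old._try_number = 1
old.state = "running"
while_running = old.try_number   # same attempt, read while running
old.state = "success"
after_finish = old.try_number    # same attempt, read after it finished

new = PlainTry()
start_attempt(new)
new_while_running = new.try_number
new.state = "success"
new_after_finish = new.try_number
```

In the "before" sketch the same attempt reads as two different numbers depending on when you ask; in the "after" sketch it does not.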
For more detail read
>
> It doesn’t affect my vote on the API, but I am very strongly against this
> one part of the AIP:
> > … dag_id are namespaced with `:` prefix.
> This specific part is getting an implementation/code veto from me. We made
> the mistake of overloading one column to store multiple things in Airflow
One thought that occurred to me... In order to really take advantage of
async python, it may essentially require a scheduler rewrite. Certainly
there will be a lot of refactoring required. And I have not looked closely
at this. But I think chances are good that there will be a lot of things
that
> really possible back then - but now, by getting rid of Mssql and if we have
> the right drivers for mysql, it should be possible - I guess.
>
> On Mon, Apr 8, 2024 at 8:17 PM Daniel Standish
> wrote:
>
> > I wholeheartedly agree with Ash that it should be all or nothing. And
>
If nothing else, write an ugly adapter using sync_to_async?
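
Roughly what I mean by an ugly adapter, as a sketch. This uses the stdlib `run_in_executor` as a stand-in for asgiref's `sync_to_async`; the names `to_async` and `fetch_one` are made up for the example.

```python
import asyncio
import functools
import sqlite3


def to_async(func):
    """Wrap a blocking function so it can be awaited (runs in a worker thread)."""

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        loop = asyncio.get_running_loop()
        call = functools.partial(func, *args, **kwargs)
        return await loop.run_in_executor(None, call)

    return wrapper


def fetch_one(db_path, sql):
    # Open the connection inside the call so it is created and used
    # in the same worker thread.
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(sql).fetchone()
    finally:
        conn.close()


fetch_one_async = to_async(fetch_one)
result = asyncio.run(fetch_one_async(":memory:", "SELECT 1 + 1"))
```

It keeps the event loop free while the query blocks a thread, which is all an adapter like this can promise; it is not real async I/O.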
On Mon, Apr 8, 2024 at 1:06 PM Daniel Standish <
daniel.stand...@astronomer.io> wrote:
> https://github.com/omnilib/aiosqlite maybe?
>
> On Mon, Apr 8, 2024 at 1:03 PM Scheffler Jens (XC-AS/EAE-ADA-T)
> wrote:
I wholeheartedly agree with Ash that it should be all or nothing. And
*all* sounds
better to me :)
On Mon, Apr 8, 2024 at 10:54 AM Ash Berlin-Taylor wrote:
> I’m all in favour of async SQLAlchemy. We’ve built two products
> exclusively at @ Astronomer that use sqlalchemy+psycopg3+async and
One wrinkle to the "have cake and eat it too" approach is deferrable
operators. It doesn't seem it would be very practical to resume back into
the operator that is nested inside a taskflow function. One solution would
be to run the trigger in process like we currently do with `dag.test()`.
That
Another thing to consider is whether TI deps is the right place for this
kind of thing. The two examples were pretty infra related. Maybe it makes
sense in those cases for the TI's scheduling deps to be cleared but then
the executor needs to manage scaling up the infra for the task or
> solving
> > > > > > >> > > > >something that needs to be solved.
> > > > > > >> > > > >
> > > > > > >> > > > >I think 1) 2) 3) are real problems that it addresses.
> > > > >
Well let me add some more thoughts.
I like the idea in general, the principle of trying to somehow acknowledge
the comments and suggestions that have been made.
But it may have some unintended and perhaps undesirable consequences.
E.g. when you "resolve" a conversation, then you make it less
+1
On Tue, Dec 19, 2023 at 9:36 AM Pierre Jeambrun
wrote:
> This is something I already try to apply on my own PRs, never merge before
> explicitly solving all conversations.
>
> Also for a reviewer, I feel like this gives more confidence to the fact
> that the PR is ready, and indeed we are
>
> Tried to get others some opportunity to comment, but I see it's mostly me
> <> Bolke.
It might help to start off this separate discussion thread with a simple /
concise problem statement. (that might sound snarky but it ain't :) )
And I say *might help* cus you still might get crickets
nds that the Connection class' get_uri is meant to be general,
> > supporting the use case of serializing the Airflow connection on basis of
> > host, port, schema, password, type etc.
> >
> > Thanks & Regards,
> > Amogh Desai
> >
> > On Mon, Nov 20, 2023
Thanks Jarek
On Tue, Nov 21, 2023, 4:34 PM Jarek Potiuk wrote:
> I think we miss important insight - straight from the source.
>
> I believe it's time to be candid and simply ask questions for the future of
> Pendulum directly where we should - ie. we should just ask maintainers.
>
> I've just
The thing that makes *me* hesitant to deprecate is the sheer magnitude of
breaking it would bring (even though we're only talking about a
hypothetical 3.0 release), balanced against the actual pain it causes.
I.e. it's confusing to use, and takes up space in docs (when, if removed,
we could just
I would also like to see it deprecated.
That said, I am not convinced there is anything that we cannot encode using
URI though. I think the problem is just when one tries to use the same URI
to mean two different things, e.g. both airflow connection URI and
sqlalchemy URI. They are different.
+1 binding
Verified licence signature checksum, installed and ran a dag
On Mon, Oct 2, 2023 at 8:00 AM Hussein Awala wrote:
> Thank you for the clarification!
>
> > However, I just had an idea which we can try out for future releases
>
> I wonder if we can exclude the file from rat check and
I don't think of it as really a question about accurate record keeping but
more a question of what an SLA is, i.e. when do you want the warning, or
what do you want the warning based on. I think that the idea has been that
it really means, "if task not done by X time each day then warn". And the
Yeah I think your proposal seems reasonable. Airflow conn URI is not same
thing as sqlalchemy URI. That worked for some simple circumstances but it
is definitely not true in general. Hooks that need it should generally
implement it.
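
A small sketch of why the two are not the same thing, and what "hooks should implement it" might look like. The mapping and function names below are illustrative assumptions, not the hook API; the point is only that the scheme in an Airflow conn URI is a conn type, while SQLAlchemy expects a dialect and driver there.

```python
from urllib.parse import quote, unquote, urlsplit

# Illustrative mapping only: conn types and SQLAlchemy dialect+driver
# names are different vocabularies, so someone has to translate.
DIALECT_FOR_CONN_TYPE = {
    "postgres": "postgresql+psycopg2",
    "mysql": "mysql+mysqldb",
}


def parse_conn_uri(uri):
    """Read an Airflow-style URI: the scheme is the conn type, the path is the schema."""
    parts = urlsplit(uri)
    return {
        "conn_type": parts.scheme,
        "login": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "host": parts.hostname,
        "port": parts.port,
        "schema": parts.path.lstrip("/") or None,
    }


def build_sqlalchemy_uri(conn):
    """Build a SQLAlchemy-style URL from the parsed fields."""
    dialect = DIALECT_FOR_CONN_TYPE[conn["conn_type"]]
    auth = ""
    if conn["login"]:
        auth = quote(conn["login"], safe="")
        if conn["password"]:
            auth += ":" + quote(conn["password"], safe="")
        auth += "@"
    port = f":{conn['port']}" if conn["port"] else ""
    return f"{dialect}://{auth}{conn['host']}{port}/{conn['schema'] or ''}"


conn = parse_conn_uri("postgres://user:p%40ss@db.example.com:5432/airflow")
sqlalchemy_uri = build_sqlalchemy_uri(conn)
```

Passing the first string straight to SQLAlchemy only works when the conn type happens to match a dialect name, which is the "simple circumstances" case.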
On Tue, Sep 19, 2023 at 10:51 AM Andrey Anshin
wrote:
>
I was able to chat with a couple folks about this. Small sample, but the
sentiment was, "this is just a timeout". In other words, if we're going to
call this SLA, we really ought to evaluate against the "this thing should
have run by" time and not the actual start time. And, ideally, we should
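
To spell out the difference between the two readings, a minimal sketch. The function names and the example times are made up; nothing here is Airflow's SLA code.

```python
from datetime import datetime, timedelta


def missed_vs_expected_start(expected_start, sla, now, finished_at=None):
    """SLA reading: the clock starts when the thing *should* have run."""
    deadline = expected_start + sla
    return (finished_at or now) > deadline


def missed_vs_actual_start(actual_start, sla, now, finished_at=None):
    """Timeout reading: the clock starts only when the thing actually starts."""
    if actual_start is None:
        return False  # never started, so the clock never began
    deadline = actual_start + sla
    return (finished_at or now) > deadline


expected = datetime(2023, 9, 1, 6, 0)    # should have started at 06:00
actual = datetime(2023, 9, 1, 9, 0)      # really started at 09:00
finished = datetime(2023, 9, 1, 9, 30)   # took 30 minutes
sla = timedelta(hours=1)

late_by_sla = missed_vs_expected_start(expected, sla, finished, finished)
late_by_timeout = missed_vs_actual_start(actual, sla, finished, finished)
```

A run that starts three hours late but finishes quickly is a miss under the first reading and fine under the second, which is why the second reads like a timeout.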
I share Jens's concerns about complexity.
OK so one difference here is, you're adding a new DAG SLA concept. Which
is useful. One subtle difference from what I think is the existing
"concept" of SLA is that you are evaluating it against when it started, as
opposed to when it should have started, and evaluating it only in the
course of
First of all, thanks for being so charitable in engaging in this dialogue,
I appreciate it.
Yeah I think that the notion that maybe Airflow is making really
impractical promises with SLA, well that could be true.
One question for you, as I continue to let this percolate.
Can you help me
not resolved immediately. So to me, as
> long as there are existing ways of detecting these more critical
> infrastructure issues (which there are), I am not too concerned that my SLA
> measuring might be impacted by a late scheduled DAG Run.
>
> On the idea of killing the running sla tr
Some questions for you Sung.
I tried looking to understand why we needed to remove behavior 3 discussed
in AIP:
*[remove]* Task-level SLA measured from DAG-run scheduled start time
I'm just a little concerned that removing this would be a mistake because,
in my mind, part of the essence of
>
> would definitely be in favour of that approach and using it more
> liberally. I think SemVer does not say anything about this case - "the
> software still supports it but you need to flip a flag" sounds like a nice
> way of introducing behavioural changes without breaking changes.
This
> * Also I agree "backwards compatible" does not mean "has this feature
> enabled in new release". Take the infamous "SubDAG" as an example: If we
> find a way to decouple it, I would be all for having a flag that ENABLES it
> if set - but disabled by default.
>
otentially a better airflow. Personally, putting on my user hat, that
feels like a worthy trade.
On Wed, Aug 30, 2023 at 12:01 PM Daniel Standish <
daniel.stand...@astronomer.io> wrote:
> Yeah I agree completely with more liberal use of something like more
> liberal use of "expe
ir installation. This is 100% against the spirit and idea of
> the regulations. The regulations aim to force those who produce software to
> make it easy and possible for the users to upgrade immediately after
> security fixes are released.
>
> In a way - using SemVer and being able to t
e anything
more than that.
CalVer <https://calver.org/> may be a good option.
On Sat, Aug 26, 2023 at 11:22 PM Daniel Standish <
daniel.stand...@astronomer.io> wrote:
> For whatever reason, I was reminded recently of snowflake's "behavior
> change" policy
>
> See here:
For whatever reason, I was reminded recently of snowflake's "behavior
change" policy
See here: https://docs.snowflake.com/en/release-notes/behavior-change-policy
I think semver is problematic for core because basically you cannot
implement behavior changes until the "mythical" major release.
The vote has passed.
On Thu, Aug 3, 2023 at 6:30 PM Daniel Standish wrote:
> Calling for a vote by lazy consensus to accept the changes to AIP-52 Setup
> and Teardown Tasks.
>
> *Discussion thread:*
> https://lists.apache.org/thread/c4s0541nrjbjm1or8tpl08y4qtmjj4gd
>
Calling for a vote by lazy consensus to accept the changes to AIP-52 Setup
and Teardown Tasks.
*Discussion thread:*
https://lists.apache.org/thread/c4s0541nrjbjm1or8tpl08y4qtmjj4gd
*Docs:*
Yeah it ain't the easiest of decisions.
@niko, when I think about the staged approach, it feels like it is maybe
more disruptive than doing it all at once.
stage 1: force everyone to install k8s and celery libs -- that's one
disruption / risk
stage 2: remove pre-install -- another disruption
it
* *provider* extras or provider optional features
Yeah I think this is an important point:
>
> Core Airflow so far hasn't had the dependencies
> required to make those executors functional either - users either had to
> use the extra or install the provider directly. So that doesn't really
> change if we choose not to preinstall the providers.
>
> I guess the question is not "can we do it" but more "should we do it" :D
right :)
Just on the question of semver, I am not convinced that semver prohibits
this.
As a user, your concerns and expectations regarding semver are about
essentially how the code works, e.g. are you going to have to refactor all
of your 500 dags. In other words the API. But to me this is a lower
You're not too late, and thanks for engaging with the issue.
I don't think anyone would dispute that users will sometimes want a setup
without a teardown. But the question is should we require that users
explicitly make the scope of a setup well-defined. Like Jarek, I have some
ambivalence on
Ok want to pick back up setup / teardown discussion as we get closer to 2.7.
For personal reasons I had to take some time off work just as we were
wrapping up work on 2.6, and at least partly due to that, we punted setup /
teardown to 2.7.
But we've picked it back up and continued along the path
Don't think fixture would break that. It would just be test code not in the
dag. It would just ensure that the triggerer is running before the tests
that use the triggerer need it. But doing it in breeze makes more sense for
sure. Although I suppose a combination approach could be considered EG,
I just took a look and it turns out that DebugExecutor works fine with
triggerer; you just need to have one running.
You could run one in a subprocess. I experimented with refactoring the
subprocess hook for this purpose (so you can start the subprocess
asynchronously) and then ran this dag with
you can have any author you like, as long as it is potiuk
https://github.com/apache/airflow/pull/27264
On Fri, May 26, 2023 at 2:09 PM Hussein Awala wrote:
> I join Bas and I vote for #27264.
>
> On Sat 27 May 2023 at 00:03, Bas Harenslak
> wrote:
>
> > My vote goes to
Congrats
On Mon, May 22, 2023, 3:54 PM Jarek Potiuk wrote:
> Hello everyone,
>
> I am glad to announce that I just merged
> https://github.com/apache/airflow/pull/27264 that implements Python 3.11
> support for Airflow.
>
> Python 3.11 brings a number of speed improvements for single-threaded
nice
Thanks for the heads up about the timing re 2.7 @Jarek.
I too am eager for the walrus operator.
Feels very appropriate
On Fri, Apr 28, 2023 at 1:33 PM Michael Robinson
wrote:
> Hello all,
>
> Thanks to all who participated in this month’s unusually competitive vote
> for PR of the Month. By my count, we have a three-way tie. The winners:
>
> #30705 by @potiuk: “Optimize parallel test
This is a tough one. Many PRs and contributors deserving of recognition
here. And cool to see so much engagement in the voting.
But my vote goes to #30375... It may seem like a small fix but I think it
(at least hopefully) should alleviate a lot of frustration and it's an
example of someone,
It seems reasonable to me.
On Mon, Mar 27, 2023 at 12:02 AM Jarek Potiuk wrote:
> Hello Everyone,
>
> TL;DR; I wanted to raise a discussion and make a proposal about option
> to skip some niche providers of ours from releasing if they are holding
> us back, regarding the dependencies
>
> We are
Happy to see the engagement on this one. Thanks to everyone for thinking
it through and contributing their thoughts.
re niko
> - Context managers:
> I found most of the context manager syntax proposals a little hard to
> understand, but some better than others. Ultimately if I put my DAG
tually we are discussing it now, so I think it is cool.
>
> J.
>
> On Mon, Mar 27, 2023 at 8:43 AM Ash Berlin-Taylor wrote:
> >
> > If the set-up ran then the tear down _must_ run. No question.
> >
> > Nothing should be able to change this fact. If you can, then they don't
>
he code of
> >teardown and setup decorators.
> >
> >This means that users of ShortCircuitOperator will not even know they need
> >to take action (until it won't work as expected) and they will probably
> >start asking questions.
> >
> >I'm not saying this
Thanks Elad for the feedback.
re 1. i don't really see a problem with the trigger rule being public. The
way I see it, it's another trigger rule like any other trigger rule. Every
trigger rule behaves differently, that's true here too. This one happens to
be relied upon for teardown tasks.
Surprised to hear that it doesn't work with celery. Is that right? I
assumed that this was the main target.
If it's really only a benefit in dag processor, it's surprising that it
provides much benefit because it should be one call per var-file-parse; in
worker it will be once per ti and I
DAG code.
> ***
>
> Crucially I want us to not let perfect be the enemy of good, and all this
> confusion and discussion is exactly why I had originally placed "multiple
> setup/teardown" in future work. Having a single setup function and a single
> task group gives our use
It would not help with kubernetes executor. Did you try with local
executor?
Another option to consider is to implement this specifically on the AWS
secrets manager secrets backend itself.
re
> 2. `task1 >> task2 >> teardown_task` to me falsely implies that teardown
> depends on task2, But it doesn't. It only depends on the "scope being
> exited".
So that's not quite the case. With the proposed implementation, there's no
such scope concept. They're just normal tasks, with
Hi, would like to clarify, in this thread we're specifically hoping to get
community feedback on the proposal to drop the "implicit" logic.
In the original AIP, if you instantiate a setup task in a group, in effect
it's made the setup task for all tasks in the group. And the proposal up
for
I’m part of a group working on the implementation of AIP-52. We would like
to update the community on some changes to the implementation approach, the
planned roadmap, and give an opportunity to provide feedback.
First though, let’s recap briefly what are the main benefits of adding
setup and
Thinking about this some more.
As an airflow developer, a lot of our backcompat concerns (which take up a
substantial portion of our energy) are about the possibility that we might
break something for someone who has extended either built-in classes or
provider classes. Maybe we are too permissive
okie doke, took a crack at it https://github.com/apache/airflow/pull/29200
e important points in the conversation, or
> even misunderstood some points. But just summarizing what I have understood
> as well as what I would prefer.
>
>
> Regards,
> XD
>
> On Thu, Jan 26, 2023 at 9:45 AM Daniel Standish
> wrote:
>
>> I understand it's "
ackages as
> contributed by people external to those projects.
> This data can e.g. be used for static analysis, type checking or type
> inference.
> """
>
> This way a user who wishes to extend airflow could simply install the
> `types-apache-airflow==2.6.0` a
Following up here... that PR has been merged. At some point we should
probably have a vote on that, if it's meant to be actual binding policy.
Maybe we're still feeling it out? But "what is public" is a pretty
fundamental concern to the project. Maybe such a policy itself should be
an AIP?
> currently IMHO is not an option) my vote goes to 2.
>
> We can also drop MySQL support :D
>
> J.
>
>
>
>
> On Thu, Jan 12, 2023 at 9:56 AM Ash Berlin-Taylor wrote:
>
>> +1 to what Daniel said
>>
>> On 12 January 2023 08:32:29 GMT, Daniel Standish
>>
issues/23020>
>
> *Abdul Hadi Shakir*
>
>
> On Thu, Jan 12, 2023 at 1:19 PM Daniel Standish
> wrote:
>
>> Hi,
>>
>> Is it not possible to just have unicode dag_id with no distinct "name"?
>> If you explored this route and encountered problems
Hi,
Is it not possible to just have unicode dag_id with no distinct "name"? If
you explored this route and encountered problems which caused you to
abandon, can you share what were the problems?
I think having just one ID for a dag is a nice thing, if we can keep it.
On Wed, Jan 11, 2023 at
Verified signatures, licenses, checksums.
+1 (binding)
do want to mention though that it looks like yandex and jdbc release notes
still both mention 4.0 (which was an accident related to recent change in
policy re major bump for min airflow version bump) but i defer to your
judgment whether
I think it makes sense and I'm a +1.
For the convenience of other readers I'll paste your rationale here:
The rationale i have - that from the point of view of provider, it's just a
> dependency change (which we generally considered non-breaking) and it does
> not break people's workflows in
Ok -- just merged normalization for core. Now we're fully normalized.
Expect conflicts in your branches (if they touch core airflow). Apologies
in advance.
>
> 3. I am also getting a true sense of just how overwhelming the influx of
> Issues, PRs and Discussions is. I have come across several folks who
> submitted PRs and never got feedback and then left the community. Losing
> these folks is a bad experience for them but also for us because we lost
OK -- the non-providers/non-core (a.k.a. "other") group of string
normalization has just been merged. So, if you already rebased, unlucky
you -- you need to do it again.
But fear not, you won't need to do it yet another time until closer to the
release of airflow 2.5, when we'll apply string
Thanks Jarek
There will be a couple more notices like this as we apply to the rest of
the codebase. Non-providers / non-core is coming soon, and core will be
applied closer to 2.5.
And if you don't want to squash to one commit, I think you can also just do
`pre-commit run black --all-files`,
As of airflow 2.3, airflow hooks no longer need to name extra fields with
the `extra__conn_type__field_name` convention (PR
https://github.com/apache/airflow/pull/22607).
I believe using the short name is much more intuitive and user-friendly.
So I am currently working on updating our hooks so
+1 (binding)
checked signatures checksums and licenses
On Wed, Oct 5, 2022 at 3:31 PM Jed Cunningham
wrote:
> +1 (binding)
>
> Checked signatures, checksums, and licences.
>
Nice, congrats
On Mon, Oct 3, 2022 at 3:12 PM Jarek Potiuk wrote:
> Hello Airflow community,
>
> I have just merged the last PR from the "Breeze rewrite to Python"
> project, completing the project we started with Bowrna and Edith as
> Outreachy interns and Elad co-mentoring it with me.
+1 (binding) verified signatures and licenses, checksum and ran sample dag
On Fri, Sep 30, 2022 at 11:21 AM Jed Cunningham
wrote:
> Kind bump, I need 1 more vote on this release.
>
One vote for https://github.com/apache/airflow/pull/26400 (improved test
command)
On Tue, Sep 27, 2022 at 10:50 AM Jed Cunningham
wrote:
> My write-in is ExternalPythonOperator:
> https://github.com/apache/airflow/pull/25780
>
Proposal: remove param --skip-string-normalization on black pre-commit hook
Discussion thread:
https://lists.apache.org/thread/0g887n3gyr611k908zsz27ccshb85z2n
After 72 hours, unless there is an objection, this proposal will be adopted.
What is lazy consensus? Read here
>
> Opinion: Adding a lazy consensus thread will not hurt and I think it
> should happen
Sure, can do
>
OK seems like all are in favor. Do we even need a vote? I guess lazy
consensus still applies even when you don't formally call for a lazy
consensus vote. So I reckon I won't bother.
But on the implementation, to @Jed Cunningham's point,
we can wait until closer to 2.5. And I'll connect with
>
> I'll weigh in on this most important of decisions
:)
OK but I must clear up one thing ... if we turn on string normalization, we
do not get to choose single vs double -- with black, there is only one way,
and it is double.
Personally I have always liked single but yeah, I am in favor of
Black, our python formatter, can "normalize" strings to prefer double
quotes, and we disable this feature.
I have always been a single quotes person unless using f-string and
supported disabling normalization when we introduced black.
But lately, black’s string normalization has seemed more
+1 (non-binding)
On Sun, Sep 18, 2022 at 6:57 PM Phani Kumar
wrote:
> + non binding. Tested the Dataset functionality and working fine
>
> On Sun, Sep 18, 2022 at 11:09 PM Ephraim Anierobi <
> ephraimanier...@gmail.com> wrote:
>
>> +1 (binding)
>>
>> On Sun, 18 Sep 2022 at 16:57, Jeambrun
te:
> One question: are we deprecating the existing parameters straight away or
> leaving it without a warning for one release? (I think someone talked about
> that at one point)
>
> On 6 August 2022 19:46:02 BST, Daniel Standish
> wrote:
>>
>> Well, thanks for forcing us
Proposal:
- add new DAG param `schedule` that accepts everything *currently* supported
by `schedule_interval`, `timetable`, and `schedule_on`
- (note schedule_on, added for AIP-48, has not been in any release
yet; only present in main)
- remove param schedule_on immediately
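
A rough sketch of what "one param that accepts everything" could mean in practice: dispatch on the type of what was passed. `Timetable` and `Dataset` below are empty stand-ins, and the assumption that `schedule_on` takes a list of datasets is mine; the function name is hypothetical too.

```python
from datetime import timedelta


class Timetable:
    """Stand-in for a timetable object (what `timetable` takes today)."""


class Dataset:
    """Stand-in for an AIP-48 dataset (assumed to be what `schedule_on` takes)."""


def classify_schedule(schedule):
    """Decide how to treat whatever was passed as `schedule`."""
    if schedule is None:
        return "none"
    if isinstance(schedule, Timetable):
        return "timetable"
    if isinstance(schedule, (str, timedelta)):
        # cron expression, preset string, or fixed interval
        return "interval"
    if (
        isinstance(schedule, (list, tuple))
        and schedule
        and all(isinstance(d, Dataset) for d in schedule)
    ):
        return "datasets"
    raise TypeError(f"unsupported schedule: {schedule!r}")


kinds = [
    classify_schedule(None),
    classify_schedule("0 0 * * *"),
    classify_schedule(timedelta(hours=1)),
    classify_schedule(Timetable()),
    classify_schedule([Dataset(), Dataset()]),
]
```

Since the three existing params accept different types, the user never has to say which kind they mean; the type says it.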
a timetable? Is it a "cron TAB" ? Or what?
>>
>> For me even if we wanted to have different names, the current set of
>> names for those params is just terribly ambiguous.
>>
>> Making it "schedule" and acting depending on what you pass to it is
y
>>> recent and have not seen as much usage for this invocation yet. I am
>>> definitely curious about other perspectives on this as well.
>>>
>>> I have a hard time with option (1) of deprecating "schedule interval",
>>> because of the historical n
Oh nice, that makes it easy.
Did the rename.
On Thu, Aug 4, 2022 at 11:34 AM Jarek Potiuk wrote:
> No problem - we can rename it - whoever gets to the old name will get
> information that the page is gone and they will see the link to the new
> page.
>
> On Thu, Aug 4, 2022 at
I noticed that our AIP page says "Airflow *Improvements* Proposals" at the
header at the top.
But I think it should be "Airflow *Improvement* Proposals"
See:
https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals
Problem is our docs point to this link, which
>> Hi Daniel,
>>
>> +1 for 'schedule' and thanks for bringing this up.
>>
>> I agree with Vikram, we should be very careful about deprecating existing
>> params even if we have warnings around it. Not sure if this is a general
>> case, but I notice that pe