Re: [DISCUSS] consolidate dag scheduling params

2022-08-03 Thread Daniel Standish
In case you forgot about us @Vikram, gentle nudge On Mon, Aug 1, 2022 at 8:40 AM Daniel Standish < daniel.stand...@astronomer.io> wrote: > Well, this is a discussion thread, so let's discuss! > > Vikram, what kind of implications are you thinking of? Maybe you could > pro

Re: [DISCUSS] consolidate dag scheduling params

2022-08-01 Thread Daniel Standish
, I don't think > we should be doing this yet until we understand the implications > completely. > I am really not in favor of deprecation of the existing params, unless > there is really no alternative. > > > On Fri, Jul 29, 2022 at 2:37 PM Daniel Standish > wrote: >

Re: [DISCUSS] consolidate dag scheduling params

2022-07-29 Thread Daniel Standish
So far, seems all in favor. I will just highlight, in case it's not clear when we release this (presumably 2.4), basically every single dag in existence will start emitting deprecation warnings, and prior to 3.0, basically every dag in existence will need to be updated. Thankfully, for most

Re: [LAZY CONSENSUS] deprecate "dependency detector" pluggability

2022-07-27 Thread Daniel Standish
This vote has officially passed. On Thu, Jul 21, 2022 at 3:59 PM Jarek Potiuk wrote: > Was it really *puggable* ? I would have never guessed. Not so lazy > consensus here > > On Thu, Jul 21, 2022 at 11:44 PM Daniel Standish > wrote: > > > > Sorry --- sunday july 24 >

Re: Remove from email list

2022-07-27 Thread Daniel Standish
1. here's how to do 2. The preferred channel for discussion is using the official Apache Airflow mailing lists. To subscribe, send an email to: - Users list (archive ): users-subscr...@airflow.apache.org

[DISCUSS] consolidate dag scheduling params

2022-07-27 Thread Daniel Standish
As of airflow 2.3, we have two mutually-exclusive scheduling params, `schedule_interval` and `timetable`. As it stands now, AIP-48 will be adding a *third* such param, `schedule_on`. I think it would probably be best to consolidate these into one single scheduling param. What do you think?

Re: [VOTE] July 2022 PR of the Month

2022-07-22 Thread Daniel Standish
my write-in vote: Implement expand_kwargs() ( #24989 ) (uranusjr) honorable

Re: [LAZY CONSENSUS] deprecate "dependency detector" pluggability

2022-07-21 Thread Daniel Standish
Sorry --- sunday july *24*

[LAZY CONSENSUS] deprecate "dependency detector" pluggability

2022-07-21 Thread Daniel Standish
Hi y'all I am calling for a vote by lazy consensus. Proposal below. Relevant PR: https://github.com/apache/airflow/pull/25175 If there are no objections by sunday july 22 at 2:40pm America/Los_Angeles, then the proposal will be adopted. Thanks for your consideration. *Background:* Did you

Re: [DISCUSS] Deprecating SLA feature?

2022-07-13 Thread Daniel Standish
I don't think it makes sense to deprecate it at this time just to re-add it. It's not necessarily backward incompatibility if you are fixing something that is not functioning as intended / desired. And I'm not sure that we'll really have to break backcompat even if you don't want to interpret it

Re: [DISCUSS] Deprecating SLA feature?

2022-07-12 Thread Daniel Standish
I also think SLAs make sense in airflow and think we should fix the feature.

Re: [PROPOSAL] Remove "Label when reviewed" workflow

2022-06-25 Thread Daniel Standish
oes it based on the heuristics of what files were modified in the PR. > > > > I believe (and maybe that's the reason you have not seen it) - we've > > all developed a sort of blind spot for it. > > Those messages are anyhow useful (or supposed to be useful) for > >

Re: [PROPOSAL] Remove "Label when reviewed" workflow

2022-06-25 Thread Daniel Standish
I never noticed this one but on the topic of useless CI checks Up-to-date checker seems like a strong candidate for removal for similar reasons. On Sat, Jun 25, 2022 at 2:26 AM Jarek Potiuk wrote: > Hey all, > > I think this workflow we have in CI is rather useless now. > It has been

Re: [DISCUSS] - Policy for removing deprecated code from providers

2022-05-20 Thread Daniel Standish
> It means, in a 3 months period, a developer needs to [do lots of things...] When removal is released (say after a min of 3 months since deprecation), as a user nothing forces you to upgrade to the latest major *immediately*.

Re: [DISCUSS] Deprecate core airflow k8s settings in KubernetesPodOperator

2022-04-25 Thread Daniel Standish
Appreciate it, Jarek Re your last point 5) Or MAYBE we should simply incorporate cncf.kubernetes provider > entirely in the core of Airflow? Maybe there should be NO > "cncf.kubernetes" provider? My Answer: This is the point which is the real reason for me being > reluctant here. I see it as

Re: [DISCUSS] Deprecate core airflow k8s settings in KubernetesPodOperator

2022-04-15 Thread Daniel Standish
Thanks Jarek I think the settings intermingling is independent from the problem you're concerned with. I can see how it would be desirable to define the executor interface more robustly, and to allow core to not care about k8s version (so that provider can use whatever k8s version it likes).

Re: Apache Airflow 2.3.0beta1 available for testing

2022-04-14 Thread Daniel Standish
awesome

Re: [ANNOUNCEMENT] Feel the (new) breeze

2022-04-11 Thread Daniel Standish
very cool 

Re: [DISCUSS] Deprecate core airflow k8s settings in KubernetesPodOperator

2022-04-04 Thread Daniel Standish
Thanks Jarek I register a bit of a mixed message. Re this Maybe I am wrong but I think the separation should be solved together, > there is a good reason why some k8s settings are in core. > Are you saying I should not proceed with this plan *(add warnings in KPO that push users to start using

Re: [DISCUSS] Deprecate core airflow k8s settings in KubernetesPodOperator

2022-04-01 Thread Daniel Standish
Small clarification, KPO doesn't currently take into account the default namespace specified in kube settings. AFAIK it considers the following:
- in_cluster
- cluster_context
- config_file
- enable_tcp_keepalive
- verify_ssl
And so my proposal would be that, after a deprecation

[DISCUSS] Deprecate core airflow k8s settings in KubernetesPodOperator

2022-04-01 Thread Daniel Standish
Currently KPO uses "core" kubernetes utils for client generation. *(I'm working on migrating it to use the hook. The effort was waiting on a few other changes and i'm picking it back up now.)* In addition to using the core *utils*, it also uses the "core" airflow kubernetes *settings* from

Re: [VOTE] Release Airflow 2.2.5 from 2.2.5rc3

2022-04-01 Thread Daniel Standish
+1 (non-binding) installed in virtualenv and ran a deferrable task and a KPO task

Re: [ANNOUNCE] Breeze caching improvements

2022-03-31 Thread Daniel Standish
awesome 

Re: Make first dag run optional when catchup is False

2022-03-22 Thread Daniel Standish
> > In other words, if catchup=False and a daily DAG is paused for a week, > then when it is unpaused it would only run the most recently missed DAG run > and not run the others from earlier. Right -- that's the behavior of catchup=False; but the same would be true of the hypothetical third

Re: Make first dag run optional when catchup is False

2022-03-22 Thread Daniel Standish
One concern I have with the idea of catchup="none" vs catchup="last" is, it really only controls what is the first run for the dag; other than that there is not one bit of difference. E.g. if you have some runs on a daily dag, and you pause it for a week, when you turn it back on, they both do

Re: Make first dag run optional when catchup is False

2022-03-22 Thread Daniel Standish
No I don't mean start date is dynamic. I mean that if you don't provide a start date, I mean in effect we use the dag creation date as the start date.

Re: Make first dag run optional when catchup is False

2022-03-22 Thread Daniel Standish
There's some wiggliness here because of Airflow's behavior of actually *running* the dag at the end of the interval rather than the start. So if we have start_date=None, then we default the start date to *now,* then maybe to be consistent, the first run needs to be not 00:00 tomorrow but 00:00

Re: Make first dag run optional when catchup is False

2022-03-22 Thread Daniel Standish
What this is really about is, "what should be the first run for a dag". I think one way we might be able to solve this is to make `start_date` optional. And if you don't provide a start date, then your first run can be the next "interval end". *(if we add support for more cron-like scheduling,

Re: Make first dag run optional when catchup is False

2022-03-22 Thread Daniel Standish
So the cost with this configurable with something like this ... Catchup=True : run all intervals Catchup=False: do not run any past interval Catchup="Last Interval" (or any better name :)) ... is that it adds complexity to the interface. Not necessarily saying it's not worth the cost, but it

Re: [ANNOUNCEMENT] Codespaces support in Airflow

2022-03-09 Thread Daniel Standish
very cool, Jarek  

Re: [LAZY CONSENSUS] Deprecate non-JSON extra, and require a dict

2022-03-08 Thread Daniel Standish
For the record, the vote [lazily] passed, and the PR was merged. As Ash has noted, this will in effect make conn.extra and conn.extra_dejson redundant (since, by requiring extra to be json, there's no reason we can't store it as a dict in the first

Re: "Release Notes" - rethinking our CHANGELOG and UPDATING docs

2022-03-08 Thread Daniel Standish
+1 looks like a great change

Re: Make first dag run optional when catchup is False

2022-03-04 Thread Daniel Standish
You are saying, when you turn on for the first time a dag with e.g. @daily schedule, and catchup = False, if start date is 2010-01-01, then it would run first the 2010-01-01 run, then the current run (whatever yesterday is)? That sounds familiar. Yeah I don't like that behavior. I agree that,

Re: [LAZY CONSENSUS] Deprecate non-JSON extra, and require a dict

2022-02-25 Thread Daniel Standish
> What does this mean for the extra and extra_dejson attrs that exist on Connection right now? It's a good question. I think ideally we should deprecate extra_dejson, and `extra` should be dict (require json-encodable), with a db type of JSON where supported. But the path to get there seems a

[LAZY CONSENSUS] Deprecate non-JSON extra, and require a dict

2022-02-24 Thread Daniel Standish
It's generally assumed that the `extra` field in airflow's Connection model is JSON string. However, it's not, strictly speaking, *required* to be so. I believe we should require it to be JSON. But I also think we should nudge this a tiny bit further. A python string value such as '"hi"'
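The distinction drawn in this proposal (valid JSON versus JSON that decodes to a dict) can be illustrated with a minimal sketch. The helper name `parse_extra` is hypothetical and is not Airflow's API; it only shows the kind of check the proposal implies.

```python
import json


def parse_extra(extra: str) -> dict:
    """Hypothetical helper: accept `extra` only if it decodes to a JSON object."""
    try:
        value = json.loads(extra)
    except json.JSONDecodeError as exc:
        # Not JSON at all, e.g. a bare unquoted word.
        raise ValueError("extra is not valid JSON") from exc
    if not isinstance(value, dict):
        # Valid JSON, but a string, number, or list rather than an object.
        raise ValueError(f"extra must decode to a dict, got {type(value).__name__}")
    return value
```

Under this sketch, `'{"a": 1}'` is accepted, while `'"hi"'` is rejected even though it is valid JSON, because it decodes to a string.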

Re: Changes to AIP-42 (Task Mapping) DAG author interface

2022-02-24 Thread Daniel Standish
Following up on this, we've adjusted this and replaced `apply` with `expand`. Example:

    @task
    def directories():
        return ["~", "/etc"]

    def create_ls_command(directory):
        return ["ls", directory]

    # The following creates two tasks executing "ls ~" and "ls /etc".
    DockerOperator.partial(

Re: New Commiter: Malthe Borch

2022-02-19 Thread Daniel Standish
Congrats Malthe!

Re: New committer: Josh Fell

2022-02-19 Thread Daniel Standish
Congrats! On Sat, Feb 19, 2022 at 5:11 AM Kaxil Naik wrote: > Congratulations Josh, welcome aboard > > On Sat, 19 Feb 2022 at 07:38, Ephraim Anierobi > wrote: > >> Congratulations Josh! >> >> On Sat, 19 Feb 2022 at 08:37, Tomasz Urbaszek >> wrote: >> >>> Congrats Josh! >>> >>

Re: [DISCUSSION] Specify tasks to skip when triggering DAG

2022-02-15 Thread Daniel Standish
Thinking more, equivalently, when raising AirflowSkipException, we could optionally include a payload, which would determine the "final" task state. I suppose neither of those would really make Jarek happier because for both of these the final state doesn't convey whether it did work or not.

Re: [DISCUSSION] Specify tasks to skip when triggering DAG

2022-02-15 Thread Daniel Standish
One thing you could do is add an `AirflowSuccessException`. Similar to AirflowSkipException, if raised task would immediately stop and be marked as successful. Then in your implementation of the pre-execute callable, you can choose whether you want it to be "skip" or "skip success".

Re: [VOTE] deprecate days_ago helper function

2022-02-10 Thread Daniel Standish
The vote has passed with 6 binding +1 votes and no -1 votes. I will proceed with a PR to implement the proposal. Votes:
Jarek Potiuk +1 (binding)
Arthur Wiedmer +1 (binding)
Tomasz Urbaszek +1 (binding)
Daniel Standish +1 (binding)
Dennis Akpenyi +1 (non-binding)
Drew Hubl +1 (non-binding)
Josh

Re: [PROPOSAL] New operator for "watcher" scenario

2022-02-10 Thread Daniel Standish
The other thing that comes to mind is you can add your "normal" tasks to a task group and then do `my_group >> watcher` Also I noticed that dag can take success / failure callbacks. Maybe the "watcher" task makes sense as a callback.

Re: [DISCUSSION] Specify tasks to skip when triggering DAG

2022-02-05 Thread Daniel Standish
> > [Cons] > 1) Not scalable / Inconvenient > To make a task skippable, one needs to modify existing DAG (to set > `pre_execute`). It seems not difficult, but when your Airflow hosts a > thousand DAGs owned by different teams/users, it can be challenging. You can use task_policy to apply (or chain,

Re: [VOTE] deprecate days_ago helper function

2022-02-04 Thread Daniel Standish
> >> On Thu, Feb 3, 2022, 17:03 Daniel Standish >> wrote: >> >>> Hi >>> >>> I would like to call a vote on the following proposal: >>> >>> Helper function `days_ago` is to be deprecated (warning of pending >>> removal with e

Re: [DISCUSSION] Specify tasks to skip when triggering DAG

2022-02-03 Thread Daniel Standish
That skip func had a typo (conf where it should have been context)... this is more likely to work:

    def skip_if_specified(context):
        if not context:
            return
        dr = context.get('dag_run')
        ti = context.get('task_instance')
        if not (dr and ti):
            return
        conf = dr.conf
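Since the snippet above is cut off, here is a self-contained sketch of how such a callable might be completed. Everything after `conf = dr.conf` is an assumption pieced together from the earlier message in this thread (the `skip_list` key), and `SkipTask` is a stand-in for `AirflowSkipException` so the example runs without Airflow installed.

```python
from types import SimpleNamespace


class SkipTask(Exception):
    """Stand-in for Airflow's AirflowSkipException (assumption, for a runnable sketch)."""


def skip_if_specified(context):
    # Guard clauses: do nothing unless we have a dag run, a task instance, and conf.
    if not context:
        return
    dr = context.get("dag_run")
    ti = context.get("task_instance")
    if not (dr and ti):
        return
    conf = dr.conf
    if not conf:
        return
    skip_list = conf.get("skip_list")
    if not skip_list:
        return
    # Assumption: the task is skipped when its task_id appears in conf["skip_list"].
    if ti.task_id in skip_list:
        raise SkipTask(f"{ti.task_id} listed in dag run conf skip_list")


# Usage with fake objects standing in for the real DagRun and TaskInstance:
context = {
    "dag_run": SimpleNamespace(conf={"skip_list": ["load"]}),
    "task_instance": SimpleNamespace(task_id="extract"),
}
skip_if_specified(context)  # "extract" is not listed, so nothing is raised
```

In a real deployment a callable like this would be passed as `pre_execute`, either per task, through `default_args`, or through a cluster policy as discussed in the thread.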

Re: [DISCUSSION] Specify tasks to skip when triggering DAG

2022-02-03 Thread Daniel Standish
Ok and here's an example you could try: The callable:

    def skip_if_specified(context):
        dr = context.get('dag_run')
        if not dr:
            return
        conf = dr.conf
        if not conf:
            return
        skip_list = conf.get('skip_list')
        if not skip_list:
            return
        ti =

Re: [DISCUSSION] Specify tasks to skip when triggering DAG

2022-02-03 Thread Daniel Standish
And if not with cluster policy, then you could pass such a "conditionally skip" callable to all tasks in a dag with default_args On Thu, Feb 3, 2022, 10:16 PM Daniel Standish wrote: > It wouldn't necessarily need to involve the scheduler or the executor. > You could add logi

Re: [DISCUSSION] Specify tasks to skip when triggering DAG

2022-02-03 Thread Daniel Standish
It wouldn't necessarily need to involve the scheduler or the executor. You could add logic to pre_execute to read dag run conf and skip itself if specified in the conf. In fact you could probably implement this currently with this approach using cluster policy, since we can supply pre_execute

[VOTE] deprecate days_ago helper function

2022-02-03 Thread Daniel Standish
Hi I would like to call a vote on the following proposal: Helper function `days_ago` is to be deprecated (warning of pending removal with each call) with removal targeted for 3.0. The deprecation warning should guide user and perhaps point to some documentation concerning how to maintain the

[DISCUSS] how to name classes with abbreviations?

2022-02-03 Thread Daniel Standish
How should we name, for example, an operator such as `BranchSQLOperator`? This operator used to be called `BranchSqlOperator` but at some point in the past was renamed. Meanwhile we have SparkSqlOperator which uses the other convention. And we have `EmrBaseSensor`, and `EMRContainerOperator`

Re: New PMC Member: Jed Cunningham

2022-01-04 Thread Daniel Standish
congrats jed! On Tue, Jan 4, 2022 at 9:14 AM Kaxil Naik wrote: > Hello Airflow Community, > > The Project Management Committee (PMC) for Apache Airflow > has invited *Jed Cunningham* to become a part of PMC Member and we are > pleased > to announce that he has kindly accepted it. > > Being a

Re: [DISCUSS] deprecate `days_ago` dates utility

2021-12-28 Thread Daniel Standish
ust the same as utcnow minus N days, it is always > "truncated" to the start of the day, so it's closer to > "utcnow().replace(hour=0, minute=0, second=0) - timedelta(n)” > > > On 28 December 2021 00:08:53 GMT, Daniel Standish > wrote: >> >> I recall some
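The expression quoted in this reply can be turned into a small stdlib-only sketch. The function name `days_ago_sketch` and the `now` parameter are illustrative assumptions, and zeroing `microsecond` goes slightly beyond the quoted expression; this is not the Airflow helper itself.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def days_ago_sketch(n: int, now: Optional[datetime] = None) -> datetime:
    """Midnight UTC of the current day, minus n days (illustrative only)."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Truncate to the start of the day first, then step back n whole days.
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=n)


# The `now` argument exists only to make the example deterministic.
fixed = datetime(2021, 12, 28, 15, 30, 45, tzinfo=timezone.utc)
print(days_ago_sketch(2, now=fixed))
```

The truncation is the point of the correction in the reply: the result always lands on a day boundary rather than on "now minus N days" exactly.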

[DISCUSS] deprecate `days_ago` dates utility

2021-12-27 Thread Daniel Standish
I recall some time ago we removed `days_ago` from all example dags. Not sure why we didn't also deprecate it. For reference, `days_ago(N)` returns utcnow minus N days. There's a PR to make it return a value in the default timezone, so that when you use it in an expression for dag `start_date`,

Re: [DISCUSS] Connection extra field widgets: long vs short name convention

2021-11-21 Thread Daniel Standish
I wasn't exactly proposing updating the front end to handle short names. It sounds like a reasonable idea but I am not trying to take that on right now. What I was trying to ask was, given the present form behavior, what should the *hook* behavior be, by convention? Specifically, should we write

Re: [VOTE] AIP-42 Dynamic Task Mapping

2021-11-18 Thread Daniel Standish
+1 (binding)

Re: [VOTE] Release Airflow 2.2.2 from 2.2.2rc2

2021-11-14 Thread Daniel Standish
+1 (non-binding) verified signatures, licenses and checksums installed in virtualenv and ran some sample dags

[LAZY CONSENSUS] Deprecate non-json-serializable params

2021-11-12 Thread Daniel Standish
Good day, everyone. Following the discussion thread here, this message calls for a vote, by lazy consensus, on the following proposal: - In the next release, deprecate

Re: [DISCUSS] non-json-serializable params

2021-11-11 Thread Daniel Standish
izable objects to make it fully featured like overriding it via CLI, > API and Webserver. > > > > Regards, > > Kaxil > > > > On Thu, Nov 11, 2021 at 6:51 PM Daniel Standish > wrote: > >> > >> Yeah I agree with you. > >> > >> The o

Re: [DISCUSS] non-json-serializable params

2021-11-11 Thread Daniel Standish
Yeah I agree with you. The one other thing I'll mention is the other use case that was raised in an issue was `datetime` which like set is also not json-serializable, but unlike set would probably not be yaml-serializable. But yeah let's see if others can help establish a consensus. Small note:

Re: [DISCUSS] non-json-serializable params

2021-11-11 Thread Daniel Standish
es should be considered as > "accidental" (it was never an intention) and not really part of the > public API, so we can simply "fix it" in 2.2.3 by requiring the > parameters to be YAML-serializable. > > J. > > > > On Wed, Nov 10, 2021 at 11:11 PM D

[DISCUSS] Connection extra field widgets: long vs short name convention

2021-11-10 Thread Daniel Standish
For some connection types, UI customizations have been added so that you have forms for the extra components. E.g. with GCP we have extra__google_cloud_platform__project etc. There's a PR for salesforce hook to make it so you that, when using secrets backend you could use either the short or

Re: [DISCUSS] non-json-serializable params

2021-11-10 Thread Daniel Standish
> should support it, but there should be an asterisk (*) that if you > want advanced features like YAML, you could use YAML. Then we would > not have to deal with deprecation which is problematic as we could > only remove this feature in Airflow 3. > > J. > > On Wed, Nov 10, 2021 at 7

[DISCUSS] non-json-serializable params

2021-11-10 Thread Daniel Standish
Prior to 2.2.0, you could use non-json-serializable params in a dag. Here's an example with `set`:

    @dag.task(params={"a": {1, 2, 3}, "b": [3, 4, 5]})
    def set_param(intersection):
        print(intersection)

    set_param("{{ params.a.intersection(params.b).pop() }}")

In 2.2.0 this was broken, and in
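To make the underlying constraint concrete, the following stdlib-only sketch shows why a `set` param is a problem for JSON serialization and how the same intersection can be computed from list values. The helper `is_json_serializable` is a hypothetical name introduced here, not part of Airflow.

```python
import json


def is_json_serializable(value) -> bool:
    """Return True if json.dumps accepts the value (illustrative helper)."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        return False
    return True


params = {"a": {1, 2, 3}, "b": [3, 4, 5]}

# Only the set-valued param fails; the list is fine.
offending = [key for key, value in params.items() if not is_json_serializable(value)]
print(offending)

# One possible workaround: store lists and build the set at the point of use.
json_safe = {"a": [1, 2, 3], "b": [3, 4, 5]}
common = set(json_safe["a"]).intersection(json_safe["b"])
print(common)
```

The same check also rejects `datetime` values, which is the other use case raised in the discussion thread.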

Re: [VOTE] Release Apache Airflow Helm Chart 1.3.0 based on 1.3.0rc1

2021-11-08 Thread Daniel Standish
+1 (non-binding) tested signatures, licenses, checksums and installed On Mon, Nov 8, 2021 at 11:46 AM Ephraim Anierobi wrote: > +1 binding > > On Mon, Nov 8, 2021 at 6:51 PM Kaxil Naik wrote: > >> +1 binding >> >> On Sun, Nov 7, 2021 at 4:42 PM Jarek Potiuk wrote: >> >>> +1 (binding):

Re: Task-level scheduling

2021-11-07 Thread Daniel Standish
typo: > > I'm a bit skeptical that there would be a good interface for like a > task-level-timetable-override if that's where you're going -- like a way > for each schedule *task *to have its own timetable override that's > processed within the dag run. > On Sun, Nov 7, 2021

Re: Task-level scheduling

2021-11-07 Thread Daniel Standish
A bit confused about what you're proposing Malthe. The thread subject is "task-level scheduling" but it says "there is no such interface to control task-level scheduling – or more specifically, the ability to control which DAG runs to skip." This makes it sound like you're talking about being

Re: SSIS

2021-06-29 Thread Daniel Standish
Here:

    from contextlib import closing

    from airflow.models.baseoperator import BaseOperator
    from airflow.providers.odbc.hooks.odbc import OdbcHook


    class SqlAgentOperator(BaseOperator):
        def __init__(self, job_name: str, **kwargs):
            super().__init__(**kwargs)
            self.job_name =

Re: SSIS

2021-06-29 Thread Daniel Standish
As others have suggested, using airflow to orchestrate stored procs directly (or to build sql statements and execute them) is a nice pattern that you could use to ultimately get rid of SSIS. However if you have legacy jobs that need to stay running as is, and you just want to orchestrate them

Re: [VOTE] Release Airflow 2.1.1 from RC1

2021-06-28 Thread Daniel Standish
I am not sure if this would block release, but I think CeleryKubernetesExecutor remains broken in this release if this issue is correct: https://github.com/apache/airflow/issues/16326#issuecomment-869597105 It was already broken in 2.1.0 so maybe that renders it a non-blocker? On Mon, Jun 28,

Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

2021-06-14 Thread Daniel Standish
Is it at all feasible to deprecate connection UI customization? Then everything can just use `extra` json where the other params fall short. Seems like an area where the benefit does not outweigh the complexity. We could also take the opportunity to deprecate the long `extra` key names like

Re: Time between tasks - 4 minutes

2021-04-28 Thread Daniel Standish
> Regarding worker concurrency, according to the data scientist, the task that is kicked off is very intensive and spawns heavy duty C++ processes limiting our worker setting. This sounds like a good use case for kubernetes pod operator or kubernetes executor or ECS operator (if you are on aws)

Re: Time between tasks - 4 minutes

2021-04-27 Thread Daniel Standish
> > Loading the DagBag takes around 40 seconds because of the number of tasks this is suspicious. it's not a given that a dag will take 40 seconds to parse due to 1000 or 2000 tasks. do you perhaps have network calls in your dag that are slowing things down? i would try to identify exactly

Re: [DISCUSS] Guidelines for Releasing Providers packages

2021-04-14 Thread Daniel Standish
The proposal to bump major for all providers with every core minor version seems like a reasonable solution to me, and it sounds like there may be consensus on that? Though eventually the version numbers may get pretty large :) This discussion came to my attention from engagement with the vault

Re: New Committers: Qian Yu & Xinbin Huang

2021-04-06 Thread Daniel Standish
Congrats! 

Re: Airflow install

2021-03-17 Thread Daniel Standish
Hi Sudhir It looks like you sent to user@ but it's user*s*@. Now to your question... Have you considered running airflow in docker? https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html If you're able to run in docker, that could be a good solution for you. Then airflow is

Re: [DISCUSS][AIP-39] Richer (and pluggable) schedule_interval on DAGs

2021-03-11 Thread Daniel Standish
Just to explore whether this would really produce confusion for users... There already are a number of Schedule* classes.
- ScheduleIntervalSchema
- SchedulerInfoSchema
- SchedulerJob
- SchedulerMetricsJob
(excludes test classes) But there isn't already a Scheduler class.

Re: New Committers: James Timmins, Elad Kalif & Daniel Standish

2021-03-01 Thread Daniel Standish
Thanks y'all and congrats to Elad and James  Also, thank you to all the committers who have been so helpful and welcoming along the way 

Re: [DISCUSS][AIP-39] Richer (and pluggable) schedule_interval on DAGs

2021-02-26 Thread Daniel Standish
Very excited to see this proposal come through and love the direction this has gone. Couple comments... *Tree view / Data completeness view* When you design your tasks with the canonical idempotence pattern, the tree view shows you both data completeness and task execution history (success /

Re: New Airflow Committer: Ephraim Anierobi

2021-02-23 Thread Daniel Standish
congrats  On Tue, Feb 23, 2021 at 10:33 AM Karolina wrote: > Ww! Congratulations Ephraim Anierobi, well-deserved. > > Kind regards, > Karolina Rosół > > > > > On Tue, Feb 23, 2021 at 7:26 PM Kaxil Naik wrote: > >> Hello Airflow Community, >> >> The Project Management Committee (PMC) for

Re: [VOTE]: Release Apache Airflow 2.0.0 form 2.0.0rc2

2020-12-13 Thread Daniel Standish
In testing I found that installation of simplejson breaks webserver. Issue documented here . On Sun, Dec 13, 2020 at 5:41 AM Jarek Potiuk wrote: > > Also - for convenience - airflow 2.0.0rc2 images are available in > DockerHub: > >- docker

Re: [VOTE] Release Airflow 2.0.0 from 2.0.0rc1

2020-12-10 Thread Daniel Standish
> Also inexplicitly got error `scheduler error (sqlite3.OperationalError) database is locked`. sorry: inexplicably

Re: [VOTE] Release Airflow 2.0.0 from 2.0.0rc1

2020-12-10 Thread Daniel Standish
Also inexplicitly got error `scheduler error (sqlite3.OperationalError) database is locked`. Resulting from query DELETE FROM dag_code WHERE dag_code.fileloc_hash NOT IN (?) AND dag_code.fileloc NOT IN (?) Filed here https://github.com/apache/airflow/issues/12999 Possibly related to

Re: [VOTE] Release Airflow 2.0.0 from 2.0.0rc1

2020-12-10 Thread Daniel Standish
After letting a test dag run for a while, after a few successful dag runs, a task run failed with message `sudo: a password is required` Error documented in issue https://github.com/apache/airflow/issues/12997

Re: [DISCUSS] Custom XCom backends in core or not

2020-12-02 Thread Daniel Standish
Casas Saez > Twitter | Cortex | @casassaez <http://twitter.com/casassaez> > > > On Wed, Dec 2, 2020 at 10:25 AM Tomasz Urbaszek > wrote: > >> > Then you could have XComBackendSerializationBackend >> >> That's definitely something we should av

Re: [DISCUSS] Custom XCom backends in core or not

2020-12-02 Thread Daniel Standish
You could add xcom serialization utils in airflow.utils Then you could have XComBackendSerializationBackend ;)

Re: [DISCUSS] Custom XCom backends in core or not

2020-12-02 Thread Daniel Standish
Shouldn't serialization be left to each custom backend? On Wed, Dec 2, 2020, 8:11 AM Tomasz Urbaszek wrote: > Thanks Ry! > > > This will allow us to put scone forward as a strong feature rather than > how it has been historically portrayed as flawed/limited. > > This is a good point and I agree

Re: [LAZY CONSENSUS] Commit policy clarification

2020-10-07 Thread Daniel Standish
If I may propose a few [hopefully] clarifying amendments: 1. strike the "furthermore" 2. strike "based on internal PMC discussion". It's unclear to me exactly what this means (which is one problem) but assuming it means "this policy was initially proposed in PMC deliberations", while this may be

Re: Defining Airflow idempotence

2020-07-09 Thread Daniel Standish
We should be careful not to treat every line in the docs as "constitution" -- i.e. as a commandment. And in the docs, I think we would be better off if we more clearly distinguished (1) the description of what *is* from (2) opinion about what *should be.* *This line should be chopped* Case in

Re: [DISCUSS] Enable 'Black' for Auto Code Formatting

2020-07-01 Thread Daniel Standish
+1 no string normalization On Wed, Jul 1, 2020 at 3:20 PM Kamil Breguła wrote: > Hello, > > I would also prefer to make this change after the release of Airflow 2.0. > > I would like to suggest that we use black without normalizing strings. In > my opinion, using two apostrophes in one file

Re: [DISCUSS] Naming of the transfer operators/Hooks

2020-05-29 Thread Daniel Standish
I also vote [1]: S3ToHiveOperator Transfer is redundant. And why not 3 --- as of now there is no such distinction and I am not convinced it is justified. As of now, this is just another operator. Probably most operators do some kind of "transferring" of data and trying to decide what is transfer

Re: Airflow Dev on Windows using WSL2

2020-05-28 Thread Daniel Standish
ories. Assuming WSL 2 becomes > > popular it seems likely that more dev tools will support this way of > > working. I had a quick test of pycharm by pointing it to the automatic > > WSL2 network share but it seems that it doesn't support all pycharm > > features yet. >

Re: Airflow Dev on Windows using WSL2

2020-05-28 Thread Daniel Standish
I too was looking forward to trying WSL 2 because we have some windows users. One thing to keep in mind with WSL 2 is that it does not like to share files with windows. Here is a doc on the differences: https://docs.microsoft.com/en-us/windows/wsl/compare-versions And a note specifically on

Re: Setting to add choice of schedule at end or schedule at start of interval

2020-05-06 Thread Daniel Standish
Inspired by James, I tried this out... For others interested, here is a sample dag to test it out:

    class MyDAG(DAG):
        def following_schedule(self, dttm):
            pen_dt = pendulum.instance(dttm).replace(second=0, microsecond=0)
            minutes = pen_dt.minute
            minutes_mod = minutes % 10

Re: [VOTE] Make conn_id unique in Airflow (a.k.a. Remove connection balancing HA )

2020-04-24 Thread Daniel Standish
+1 (non-binding) :) On Fri, Apr 24, 2020 at 10:00 AM Xinbin Huang wrote: > +1 non-binding > > Best, > Bin > > On Fri, Apr 24, 2020 at 9:20 AM Tomasz Urbaszek < > tomasz.urbas...@polidea.com> > wrote: > > > +1 binding > > > > Tomek > > > > > > On Fri, Apr 24, 2020 at 6:15 PM QP Hou wrote: > >

Re: POC - Kubernetes and Airflow

2020-03-22 Thread Daniel Standish
I think it is good to start simple. I would start out using a single machine, use local executor, running on docker, using docker compose, with the puckel image: https://github.com/puckel/docker-airflow (or your own customization thereof). I would use a cloud database running postgres, e.g. on

Re: [DISCUSS] AIP-33 Secrets Backend

2020-03-13 Thread Daniel Standish
PR has been updated with simplified config as suggested by Kaxil:

    [secrets]
    backend_name =
    backend_kwargs =

Now we can only specify one secrets backend. If an alternative backend is specified, search path is alternative > env vars > metastore. Otherwise it is env vars > metastore. No other
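The search order described in this message can be sketched as a simple first-match chain. The function names and the dict-based stand-ins for the environment and metastore are assumptions made for illustration; only the ordering (alternative, then env vars, then metastore) and the `AIRFLOW_CONN_<CONN_ID>` environment variable convention come from Airflow.

```python
from typing import Callable, List, Mapping, Optional

Lookup = Callable[[str], Optional[str]]


def build_search_path(
    env: Mapping[str, str],
    metastore: Mapping[str, str],
    alternative: Optional[Lookup] = None,
) -> List[Lookup]:
    """Order the lookups: alternative backend (if any), env vars, metastore."""
    path: List[Lookup] = []
    if alternative is not None:
        path.append(alternative)
    path.append(lambda conn_id: env.get(f"AIRFLOW_CONN_{conn_id.upper()}"))
    path.append(metastore.get)
    return path


def first_match(conn_id: str, path: List[Lookup]) -> Optional[str]:
    """Return the first non-None URI found along the search path."""
    for lookup in path:
        uri = lookup(conn_id)
        if uri is not None:
            return uri
    return None


# Stand-in data: dicts play the role of the environment, metastore, and backend.
env = {"AIRFLOW_CONN_MY_DB": "postgres://from-env"}
metastore = {"my_db": "postgres://from-metastore", "other": "mysql://from-metastore"}
alternative = {"my_db": "postgres://from-alternative"}.get

print(first_match("my_db", build_search_path(env, metastore, alternative)))
print(first_match("my_db", build_search_path(env, metastore)))
```

A single ordered list keeps the "one alternative backend at most" rule easy to see: the optional backend is simply prepended when configured.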

Re: [DISCUSS] AIP-33 Secrets Backend

2020-03-10 Thread Daniel Standish
nment Variable. >Of course (2) can solve this, for example, we can pass >*GOOGLE_APPLICATION_CREDENTIAL* JSON file path to > *creds_backend_kwargs *which >would then authenticate with GCP >Or do we want to get that value from* Connection object stored in DB*? > >

Re: [DISCUSS] AIP-33 Creds Backend

2020-03-07 Thread Daniel Standish
different times > than when connection is retrieved. Often secrets are short-living and they > should be retrieved "just" before they are used - limiting it only to when > the Hook is created and only to URI is quite a limitation I think. > > Again - feel free to disregard that commen

[DISCUSS] AIP-32 Creds Backend

2020-03-07 Thread Daniel Standish
We can currently retrieve connections from environment variables or the metastore database. AIP-33 provides a way to retrieve them from other sources, for example AWS SSM parameter store. There are many instances in

Re: Stateful Tasks (was Poke Reschedule)

2020-01-15 Thread Daniel Standish
> > from my perspective real world systems are not stateless (or without > > side > > > effect) and modeling that state is sometimes the only viable solution. > > The > > > motivation of this thread is that some users are saying they need > > state, > > I
