Re: Catchup By default = False vs LatestOnlyOperator
As the author of catch-up, the idea is that in many cases your data doesn't "window" nicely and you want instead to just run as if it were a brilliant Cron...

Ben

> On Jul 20, 2018, at 11:39 PM, Shah Altaf wrote:
>
> Hi my understanding is: if you use the LatestOnlyOperator then when you run
> the DAG for the first time you'll see a whole bunch of DAG runs queued up,
> and in each run the LatestOnlyOperator will cause the rest of the DAG run
> to be skipped. Only the latest DAG run will run in 'full'.
>
> With catchup = False, you should get just the latest DAG run.
>
>
> On Fri, Jul 20, 2018 at 10:58 PM Shubham Gupta wrote:
>
>> -- Forwarded message -
>> From: Shubham Gupta
>> Date: Fri, Jul 20, 2018 at 2:38 PM
>> Subject: Catchup By default = False vs LatestOnlyOperator
>> To:
>>
>>
>> Hi!
>>
>> Can someone please explain the difference b/w catchup by default = False
>> and LatestOnlyOperator?
>>
>> Regards,
>> Shubham Gupta
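The difference described in this thread can be sketched with a small pure-Python model. This is only an illustration of the two behaviours as Shah describes them, not Airflow's scheduler code; the function names and the dates are made up for the example, and a real scheduler has more rules than this.

```python
from datetime import datetime, timedelta


def schedule_points(start, interval, now):
    """Return the start of every schedule interval that has fully elapsed by `now`."""
    points = []
    current = start
    while current + interval <= now:
        points.append(current)
        current += interval
    return points


def runs_with_catchup_false(start, interval, now):
    """Simplified model of catchup=False: only the most recent interval gets a run."""
    return schedule_points(start, interval, now)[-1:]


def runs_with_latest_only(start, interval, now):
    """Simplified model of LatestOnlyOperator: every interval gets a run,
    but the downstream work is skipped in all runs except the latest."""
    points = schedule_points(start, interval, now)
    return [(p, "full" if p == points[-1] else "skipped") for p in points]


# Hypothetical dates: a daily DAG that starts four days before it is first switched on.
start = datetime(2018, 7, 16)
now = datetime(2018, 7, 20, 12)
daily = timedelta(days=1)

catchup_false_runs = runs_with_catchup_false(start, daily, now)
latest_only_runs = runs_with_latest_only(start, daily, now)

print(catchup_false_runs)
for run_date, outcome in latest_only_runs:
    print(run_date.date(), outcome)
```

In this model both approaches do the real work once, but the LatestOnlyOperator variant still leaves one (mostly skipped) DAG run per missed interval, while the catchup = False variant creates no runs for the missed intervals at all.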
Re: Managed Apache Airflow Service on Google Cloud Platform
Although I'm no longer a Googler, I was excited about this while I was there!!!

Thanks,
Ben

--
Ben Tallman - 503.680.5709

On Tue, May 1, 2018 at 10:31 AM, Maxime Beauchemin <maximebeauche...@gmail.com> wrote:

> I'm sure the community agrees when I say that we're happy and honored to
> have Googlers on board.
>
> Congrats on the launch!
>
> Max
>
> On Tue, May 1, 2018 at 9:58 AM, Feng Lu <fen...@google.com.invalid> wrote:
>
> > Hello everyone,
> >
> > I want to let everyone know that today Google Cloud launched a new
> > managed service based on Apache Airflow - Cloud Composer[1]. Now that we
> > have launched into public beta, I wanted to connect with the community to
> > share why we chose Airflow and our plans for Composer and involvement
> > with the Airflow community.
> >
> > A year ago we set out to build a workflow orchestration product for
> > Google Cloud. We strongly believe that such a system should be based on
> > open source - it’s described as a core value on our public landing
> > page[2]. We chose Airflow for many reasons, including the awesome
> > community, its approachability for developers, and its core concepts. We
> > built Cloud Composer because we wanted to make Airflow accessible to all
> > Google Cloud customers. We’re also encouraging these customers to use
> > Airflow outside of Google Cloud - whether it be another Cloud or
> > on-premise. When we started building Cloud Composer we got involved in
> > the Airflow community. You have probably seen a few Googlers submitting
> > pull requests, including myself. We do not plan on forking Airflow with
> > the release of Cloud Composer and it’s our commitment to remain involved
> > in the Airflow community as we grow Composer. We will continue to
> > actively contribute to Airflow and look forward to partnering with the
> > community. You should expect to see myself and other Googlers involved in
> > Airflow in the future.
> >
> > Best,
> > Feng
> >
> > [1] https://cloud.google.com/composer
> > [2] https://cloud.google.com/
Re: Issue with airflow upgradedb...
Yes, once Sumit asked that question, it made me dig a bit, and ARG. :)

Thanks,
Ben

--
Ben Tallman - 503.680.5709

On Mon, Nov 14, 2016 at 11:40 AM, siddharth anand <san...@apache.org> wrote:

> Ben,
> I ran into issues while maintaining my company's airflow fork and
> cherry-picking my changes into the fork, especially when my changes
> included db changes.
>
> I had to play with the alembic_version in the db and do some other magic
> that escapes me now. My best guidance for the future is to cherry pick ALL
> DB-related changes from both master and your own btallman github fork into
> your apigee fork. That way, the db migration lineage in your apigee fork
> matches what is in master.
>
> -s
>
> On Fri, Nov 11, 2016 at 4:49 AM, Sumit Maheshwari <sumeet.ma...@gmail.com>
> wrote:
>
>> Ben,
>>
>> Can u see whats current version using "alembic current".. afaik
>> version f2ca10b85618
>> is the latest migration in master and I had no issue migrating to it..
>>
>> Also did your CPs contain any custom migrations?
>>
>>
>>
>> On Fri, Nov 11, 2016 at 5:04 AM, Ben Tallman <btall...@gmail.com> wrote:
>>
>> > We are running master with a few cherry picked features... Did we squash
>> > commits that Alembic is expecting? Did I?
>> >
>> > Basically, there are revisions that are no longer in master??
>> > Specifically at least:
>> >
>> > Can't locate revision identified by 'f2ca10b85618'
>> >
>> > ===
>> >
>> > *airflow upgradedb*
>> > [2016-11-10 15:31:04,156] {__init__.py:36} INFO - Using executor
>> > CeleryExecutor
>> > DB: postgresql://airflow_qa:***@
>> > nucleus.c7b2twrxxjtc.us-west-2.rds.amazonaws.com/nucleus
>> > [2016-11-10 15:31:05,707] {utils.py:288} INFO - Creating tables
>> > INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
>> > INFO [alembic.runtime.migration] Will assume transactional DDL.
>> > Traceback (most recent call last): >> > File "/usr/local/bin/airflow", line 15, in >> > args.func(args) >> > File "/Library/Python/2.7/site-packages/airflow/bin/cli.py", line >> 459, >> > in >> > upgradedb >> > utils.upgradedb() >> > File "/Library/Python/2.7/site-packages/airflow/utils.py", line 295, >> in >> > upgradedb >> > command.upgrade(config, 'heads') >> > File "/Library/Python/2.7/site-packages/alembic/command.py", line >> 174, >> > in >> > upgrade >> > script.run_env() >> > File "/Library/Python/2.7/site-packages/alembic/script/base.py", line >> > 397, in run_env >> > util.load_python_file(self.dir, 'env.py') >> > File "/Library/Python/2.7/site-packages/alembic/util/pyfiles.py", >> line >> > 81, in load_python_file >> > module = load_module_py(module_id, path) >> > File "/Library/Python/2.7/site-packages/alembic/util/compat.py", line >> > 79, >> > in load_module_py >> > mod = imp.load_source(module_id, path, fp) >> > File "/Library/Python/2.7/site-packages/airflow/migrations/env.py", >> line >> > 74, in >> > run_migrations_online() >> > File "/Library/Python/2.7/site-packages/airflow/migrations/env.py", >> line >> > 69, in run_migrations_online >> > context.run_migrations() >> > File "", line 8, in run_migrations >> > File "/Library/Python/2.7/site-packages/alembic/runtime/environme >> nt.py", >> > line 797, in run_migrations >> > self.get_context().run_migrations(**kw) >> > File "/Library/Python/2.7/site-packages/alembic/runtime/migration >> .py", >> > line 303, in run_migrations >> > for step in self._migrations_fn(heads, self): >> > File "/Library/Python/2.7/site-packages/alembic/command.py", line >> 163, >> > in >> > upgrade >> > return script._upgrade_revs(revision, rev) >> > File "/Library/Python/2.7/site-packages/alembic/script/base.py", line >> > 314, in _upgrade_revs >> > for script in reversed(list(revs)) >> > File >> > "/System/Library/Frameworks/Python.framework/Versions/2.7/ >> > lib/python2.7/contextlib.py", >> > line 35, in __exit__ >> 
> self.gen.throw(type, value, traceback) >> > File "/Librar
Tasks getting Queued when Pool is full sometimes never get run
We are seeing an issue when running master where tasks sometimes never run. It seems that once a task is marked "Dependencies not met" because the pool is full, that dependency is never re-evaluated.

Is anyone else seeing this?

https://issues.apache.org/jira/browse/AIRFLOW-627

Thanks,
Ben

--
Ben Tallman - 503.680.5709
Issue with airflow upgradedb...
We are running master with a few cherry picked features... Did we squash commits that Alembic is expecting? Did I?

Basically, there are revisions that are no longer in master?? Specifically at least:

Can't locate revision identified by 'f2ca10b85618'

===

airflow upgradedb
[2016-11-10 15:31:04,156] {__init__.py:36} INFO - Using executor CeleryExecutor
DB: postgresql://airflow_qa:***@nucleus.c7b2twrxxjtc.us-west-2.rds.amazonaws.com/nucleus
[2016-11-10 15:31:05,707] {utils.py:288} INFO - Creating tables
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
Traceback (most recent call last):
  File "/usr/local/bin/airflow", line 15, in <module>
    args.func(args)
  File "/Library/Python/2.7/site-packages/airflow/bin/cli.py", line 459, in upgradedb
    utils.upgradedb()
  File "/Library/Python/2.7/site-packages/airflow/utils.py", line 295, in upgradedb
    command.upgrade(config, 'heads')
  File "/Library/Python/2.7/site-packages/alembic/command.py", line 174, in upgrade
    script.run_env()
  File "/Library/Python/2.7/site-packages/alembic/script/base.py", line 397, in run_env
    util.load_python_file(self.dir, 'env.py')
  File "/Library/Python/2.7/site-packages/alembic/util/pyfiles.py", line 81, in load_python_file
    module = load_module_py(module_id, path)
  File "/Library/Python/2.7/site-packages/alembic/util/compat.py", line 79, in load_module_py
    mod = imp.load_source(module_id, path, fp)
  File "/Library/Python/2.7/site-packages/airflow/migrations/env.py", line 74, in <module>
    run_migrations_online()
  File "/Library/Python/2.7/site-packages/airflow/migrations/env.py", line 69, in run_migrations_online
    context.run_migrations()
  File "<string>", line 8, in run_migrations
  File "/Library/Python/2.7/site-packages/alembic/runtime/environment.py", line 797, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/Library/Python/2.7/site-packages/alembic/runtime/migration.py", line 303, in run_migrations
    for step in self._migrations_fn(heads, self):
  File "/Library/Python/2.7/site-packages/alembic/command.py", line 163, in upgrade
    return script._upgrade_revs(revision, rev)
  File "/Library/Python/2.7/site-packages/alembic/script/base.py", line 314, in _upgrade_revs
    for script in reversed(list(revs))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Library/Python/2.7/site-packages/alembic/script/base.py", line 160, in _catch_revision_errors
    compat.raise_from_cause(util.CommandError(resolution))
  File "/Library/Python/2.7/site-packages/alembic/util/compat.py", line 132, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/Library/Python/2.7/site-packages/alembic/script/base.py", line 129, in _catch_revision_errors
    yield
  File "/Library/Python/2.7/site-packages/alembic/script/base.py", line 310, in _upgrade_revs
    revs = list(revs)
  File "/Library/Python/2.7/site-packages/alembic/script/revision.py", line 610, in _iterate_revisions
    requested_lowers = self.get_revisions(lower)
  File "/Library/Python/2.7/site-packages/alembic/script/revision.py", line 299, in get_revisions
    return sum([self.get_revisions(id_elem) for id_elem in id_], ())
  File "/Library/Python/2.7/site-packages/alembic/script/revision.py", line 304, in get_revisions
    for rev_id in resolved_id)
  File "/Library/Python/2.7/site-packages/alembic/script/revision.py", line 304, in <genexpr>
    for rev_id in resolved_id)
  File "/Library/Python/2.7/site-packages/alembic/script/revision.py", line 359, in _revision_for_ident
    resolved_id)
alembic.util.CommandError: Can't locate revision identified by 'f2ca10b85618'

Thanks,
Ben

--
Ben Tallman - 503.680.5709
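One reading of this error, consistent with the replies in the thread, is that the database's recorded revision points at a migration script that the installed code does not contain. The sketch below is a toy model of that lookup, not Alembic's implementation: the `scripts` mapping, the `rev_a`/`rev_b`/`rev_c` identifiers, and the `RevisionNotFound` class are all hypothetical; only `f2ca10b85618` comes from the message above.

```python
class RevisionNotFound(Exception):
    """Raised when the database points at a revision no script defines."""


def upgrade_path(scripts, current, head):
    """Return the revisions to apply, oldest first, to move from `current` to `head`.

    `scripts` maps each revision id to its parent (down) revision, or None
    for the very first migration.
    """
    if current is not None and current not in scripts:
        raise RevisionNotFound(
            "Can't locate revision identified by %r" % current)
    path = []
    rev = head
    while rev != current:
        if rev is None:
            # Walked past the first migration without meeting `current`.
            raise RevisionNotFound("%r is not an ancestor of %r" % (current, head))
        path.append(rev)
        rev = scripts[rev]
    path.reverse()
    return path


# Hypothetical migration lineage shipped with the installed code.
scripts = {"rev_a": None, "rev_b": "rev_a", "rev_c": "rev_b"}

# A database stamped with a revision the code knows about upgrades normally.
normal_path = upgrade_path(scripts, "rev_a", "rev_c")
print(normal_path)

# A database stamped by a different lineage cannot be placed in the chain.
try:
    upgrade_path(scripts, "f2ca10b85618", "rev_c")
    lookup_failed = False
except RevisionNotFound as exc:
    lookup_failed = True
    print(exc)
```

If that reading is right, the practical check is the one suggested in the replies: compare the revision stored in the database with the revisions present in the installed migration scripts.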
Re: Killing DAGs
So to kill a running DAG (and keep it killed), we need to clear the state of each task instance? Do we then pause the DAG? Or do that in advance?

Thanks,
Ben

--
Ben Tallman - 503.680.5709

On Tue, Nov 1, 2016 at 9:11 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:

> Clearing the state of the task, kills it. So the feature is already in,
> but maybe not so clearly.
>
> > On 1 Nov 2016, at 17:09, Ben Tallman <btall...@gmail.com> wrote:
> >
> > I vote for this feature! Preferably a polite and a NOW option.
> >
> > Thanks,
> > Ben
> >
> > --
> > Ben Tallman - 503.680.5709
> >
> > On Tue, Nov 1, 2016 at 9:08 AM, Vishal Doshi <vis...@celect.com> wrote:
> >
> >> I haven’t been able to find anything on this in the code / docs. Is
> >> there a supported way to kill a DAG (and its still running tasks)?
Re: Killing DAGs
I vote for this feature! Preferably a polite and a NOW option.

Thanks,
Ben

--
Ben Tallman - 503.680.5709

On Tue, Nov 1, 2016 at 9:08 AM, Vishal Doshi <vis...@celect.com> wrote:

> I haven’t been able to find anything on this in the code / docs. Is there
> a supported way to kill a DAG (and its still running tasks)?
Re: A question/poll on the TaskInstance data model...
That is part of it. In this case, we aren't planning to store the contents of the DagBag as it was when the DagRun was created (that was the pickling stuff that is deprecated), but it solves HALF of the problem: it allows us to at least begin drawing the graph as it was when it was run. Storing the DagBag DAG would begin to solve your problem as well.

I would dearly love to have tasks generated at schedule time (not during the run), rather than every time the DAG file is evaluated (every 3 minutes or so). There is disagreement as to the best way to handle this. Based on conversations that I've heard and participated in, the current preferred solution is to head down the path of a "git time machine". However, that doesn't actually solve the problem that we see.

Basically, we want the evaluation of the DAG Python file to interrogate outside systems to generate the tasks and have them run. The problem with the git time machine solution is that those outside systems are not static. They change over time.

In the past, an effort was made to pickle the DAG and run from that, but pickling has its own issues. To be clear, at the time, I think the goal of the pickling was to distribute the DAG to distributed workers, not to freeze it in time.

I think that storing the pickled DAG in the DagRun could probably solve this, but it is a major issue/change. It is one that I am beginning to work on for us, though.

Thanks,
Ben

--
ben tallman | apigee <http://www.apigee.com/> | m: +1.503.680.5709 | o: +1.503.608.7552 | twitter @anonymousmanage <http://twitter.com/anonymousmanage> @apigee <https://twitter.com/apigee> <http://adapt.apigee.com/>

On Sat, Oct 15, 2016 at 11:35 AM, Boris Tyukin <bo...@boristyukin.com> wrote:

> Hi Ben,
>
> is it to address the issue I just described yesterday "Issue with
> Dynamically created tasks in a DAG"?
>
> I was hoping someone can confirm this as a bug and if there is a JIRA to
> address that - otherwise I would be happy to open one. To me it is a pretty
> major issue and a very misleading one especially because Airflow's key
> feature is to generate/update DAGs programmatically
A question/poll on the TaskInstance data model...
I (and Apigee) would like to have the DAG Graph paint old DagRuns based on the tasks (and ids) that ran, and not based off of the current DAG from the DagBag.

In order to do that, I need to be able to map a DagRun, and one way is from the TaskInstance table. However, that doesn't actually contain links between tasks, but it could.

Does anyone feel strongly against storing upstream and downstream task_ids in the taskinstance table as a first step? Our goal is NOT to be able to rerun the past, but to be able to see the past (and provide links to the taskinstance details and logs).

Thanks,
Ben
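To make the proposal concrete, here is a minimal sketch of the idea using the standard library's sqlite3 module. It is not Airflow's schema: the `task_instance_snapshot` table, its columns, and the helper functions are hypothetical stand-ins, and the upstream ids are stored as a JSON list purely for brevity.

```python
import json
import sqlite3


def create_table(conn):
    # Hypothetical table: one row per task instance, plus the ids of the
    # tasks that were directly upstream when the run was created.
    conn.execute(
        """CREATE TABLE task_instance_snapshot (
               dag_id TEXT,
               execution_date TEXT,
               task_id TEXT,
               state TEXT,
               upstream_task_ids TEXT
           )"""
    )


def record_task(conn, dag_id, execution_date, task_id, state, upstream_task_ids):
    conn.execute(
        "INSERT INTO task_instance_snapshot VALUES (?, ?, ?, ?, ?)",
        (dag_id, execution_date, task_id, state,
         json.dumps(sorted(upstream_task_ids))),
    )


def edges_for_run(conn, dag_id, execution_date):
    """Rebuild the (upstream, downstream) edges of one past run from its rows."""
    rows = conn.execute(
        "SELECT task_id, upstream_task_ids FROM task_instance_snapshot "
        "WHERE dag_id = ? AND execution_date = ?",
        (dag_id, execution_date),
    )
    edges = []
    for task_id, upstream_json in rows:
        for upstream in json.loads(upstream_json):
            edges.append((upstream, task_id))
    return sorted(edges)


conn = sqlite3.connect(":memory:")
create_table(conn)

# Older run: the DAG had two tasks.
record_task(conn, "etl", "2016-10-14", "extract", "success", [])
record_task(conn, "etl", "2016-10-14", "load", "success", ["extract"])

# Newer run: a task was added in between.
record_task(conn, "etl", "2016-10-15", "extract", "success", [])
record_task(conn, "etl", "2016-10-15", "transform", "success", ["extract"])
record_task(conn, "etl", "2016-10-15", "load", "failed", ["transform"])

old_edges = edges_for_run(conn, "etl", "2016-10-14")
new_edges = edges_for_run(conn, "etl", "2016-10-15")
print(old_edges)
print(new_edges)
```

Storing only the upstream side is enough to redraw the graph, since the downstream links are the same edges read in the other direction; storing both, as proposed, trades some redundancy for simpler queries.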
New Scheduler process seems incompatible with sqlite...
When testing the scheduler locally, on master, using sqlite and the sequential executor, dag.sync_to_db never returns when querying the db...

    orm_dag = session.query(
        DagModel).filter(DagModel.dag_id == dag.dag_id).first()

Any insights? My guess is that this is due to the new multi-process nature of DAG processing conflicting with sqlite?

Thanks,
Ben
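The guess above is not confirmed anywhere in this thread, but the general SQLite behaviour it relies on is easy to demonstrate: while one connection holds an exclusive lock, a query from a second connection waits, and gives up with an error once its busy timeout expires. The sketch below uses only the standard library; the `dag` table, the file name, and the function name are made up for the example, and it uses two connections in one process rather than two processes.

```python
import os
import sqlite3
import tempfile


def second_connection_is_blocked(db_path, timeout_seconds=0.1):
    """Return True if a read from a second connection fails while the first
    connection holds an exclusive lock on the same database file."""
    # isolation_level=None puts the connection in autocommit mode so that
    # the explicit BEGIN EXCLUSIVE below is the only open transaction.
    writer = sqlite3.connect(db_path, isolation_level=None)
    writer.execute("CREATE TABLE IF NOT EXISTS dag (dag_id TEXT)")
    writer.execute("BEGIN EXCLUSIVE")
    writer.execute("INSERT INTO dag VALUES ('example')")

    # A short busy timeout turns "waits for the lock" into a quick error.
    reader = sqlite3.connect(db_path, timeout=timeout_seconds)
    try:
        reader.execute("SELECT dag_id FROM dag").fetchall()
        return False
    except sqlite3.OperationalError:
        return True
    finally:
        reader.close()
        writer.execute("ROLLBACK")
        writer.close()


with tempfile.TemporaryDirectory() as tmp_dir:
    blocked = second_connection_is_blocked(os.path.join(tmp_dir, "example.db"))

print(blocked)
```

With a longer busy timeout the second query would simply appear to hang for that long, which resembles the symptom described, though the actual cause in the scheduler may be different.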
Re: Invitation: Airflow Contributors & Roadmapping Meeting @ Thu Oct 6, 2016 10am - 12pm (PDT) (gurer.kira...@airbnb.com)
I will be there...

Thanks,
Ben

On Fri, Sep 16, 2016 at 3:53 PM, <gurer.kira...@airbnb.com.invalid> wrote:

> Airflow Contributors & Roadmapping Meeting
>
> Hi all,
>
> Again it has been a while after our last meeting. Let's have another
> meeting to sync up!
>
> We are super happy to host all you folks at Airbnb (888 Brannan St 94103)
> on October 7th at 10:00am. Also we will have a webex session at
> https://airbnb.webex.com/meet/gurer.kiratli.
>
> I will send this out as a Google Calendar but due to the fact that it goes
> thru the mail group I don't see your responses. If you are planning to come
> on please respond back to me with your first name, last name. And please
> try to arrive by 9:30 so we can check you and head to the meeting room. : )
>
> Here is the proposed agenda:
>
> 10:00am - 10:45am PDT
>   Contributors sync-up: progress and plan
>   Release Schedule, Management
> 10:45am - 11:00am PDT
>   Coffee Break
> 11:00am - 12:00pm PDT
>   Roadmap discussion
> 12:00pm - 1:00pm PDT
>   Lunch @ Airbnb
>
> Cheers,
>
> Gurer
>
> When
>   Thu Oct 6, 2016 10am - 12pm Pacific Time
>
> Where
>   Airbnb HQ, 888 Brannan St, San Francisco, CA 94103, USA
>
> Calendar
>   gurer.kira...@airbnb.com
>
> Who
>   - gurer.kira...@airbnb.com - organizer
>   - dev@airflow.incubator.apache.org
Travis CI...
I noticed on a pull request (and on my fork) that Travis CI doesn't seem very reliable. I get a fail on at least one of the 6 builds almost every time, and even if I change nothing, it seems to succeed/fail on another of the 6 each time, unrelated to my code...

Has anyone else seen this?

Thanks,
Ben
Re: Dynamically defining tasks in a DAG -- HOW?
We have done this a lot, and the one issue is that every time the DAG is evaluated (even during a run), the SQL will be re-run, and tasks can vary. In fact, we had a select statement that actually marked items as in process during select, and THAT was bad.

We have moved to a fixed number of tasks (x), each of which grabs a line from the DB; anywhere from 0 to n of them get skipped if they don't get a line from the DB.

To be clear, we would really like the DAG's tasks to be frozen at time of schedule, but that has not been our experience, and I believe it will take a fairly major refactor. Furthermore, although DAG stands for Directed Acyclic Graph, I believe that in practice Airflow's DAGs behave dynamically: the definition is re-evaluated during runtime and the path is not determined ahead of time.

Thanks,
Ben

On Thu, Sep 8, 2016 at 1:50 PM, J C Lawrence <c...@kanga.nu> wrote:

> I have a few hundred thousand files arriving from an external service
> each day and would like to ETL their contents into my store with
> Airflow. As the files are large and numerous and slow to process, I'd
> also like to process them in parallel...so I thought something like
> this:
>
>     def sub_dag (
>         parent_dag_name,
>         child_dag_name,
>         start_date,
>         schedule_interval):
>       dag = DAG(
>         "%s.%s" % (parent_dag_name, child_dag_name),
>         schedule_interval = schedule_interval,
>         start_date = start_date,
>       )
>       fan_out = operators.DummyOperator(
>         task_id = "fan_out",
>         dag = dag,
>       )
>       fan_in = operators.DummyOperator(
>         task_id = "fan_in",
>         dag = dag,
>       )
>       cur = hooks.PostgresHook ("MY_DB").get_cursor ()
>       cur.execute ("""SELECT file_id
>                       FROM some_table
>                       WHERE something;""".format (foo = func(start_date)))
>       for rec in cur:
>         fid = rec[0]
>         o = operators.PythonOperator (
>           task_id = "ImportThing__%s" % fid,
>           provide_context = True,
>           python_callable = import_func,
>           params = {"file_id": fid,},
>           dag = dag)
>         o.set_upstream (fan_out)
>         o.set_downstream (fan_in)
>       cur.close ()
>       return dag
>
> The idea being that the number and identity of the tasks in the sub-DAG
> would vary dynamically depending on what day it was running for (i.e.
> which rows come back from the query for that day). But...no, this
> doesn't seem to work.
>
> Any recommendations for how to approach this?
>
> -- JCL
Re: Speaking at AWS Loft on Wednesday...
Sorry, no video. I will also be speaking to the Data Warehouse team at Optimizely on the 21st about our process, and really digging in to the Airflow stuff.

Thanks,
Ben

On Tue, Sep 6, 2016 at 9:13 PM, siddharth anand <san...@apache.org> wrote:

> Ben,
> Did you get video for this talk?
> -s
>
> On Thu, Sep 1, 2016 at 10:04 AM, siddharth anand <san...@apache.org>
> wrote:
>
> > Nice. Please feel free to add a link to our Links wiki if you have slides
> > or Video, we'd be happy to retweet as well. We will do the same when you
> > give the meetup talk as well.
> > -s
> >
> > On Mon, Aug 29, 2016 at 2:09 PM, Ben Tallman <b...@apigee.com> wrote:
> >
> >> I thought I would let people know that I'm speaking at the AWS Popup
> >> loft on Wednesday at 2:00 discussing how RDS powers (and empowers) our
> >> stack... Planning on mentioning and discussing Apache Airflow in that
> >> stack.
> >>
> >> https://aws.amazon.com/start-ups/loft/sf-loft/ - August 31st.
> >>
> >> Customer Reference: Ben Tallman, an Apigeek at Apigee, will discuss
> >> Apigee's Data Warehouse and Business Performance Management,
> >> specifically how leveraging Apigee Edge, RDS Postgres, Apache Airflow
> >> and Periscope Data has allowed them to "free the data" and become a more
> >> agile and data driven company.
> >>
> >> About the Speaker: Ben Tallman is both a Developer and an IT
> >> Professional, a Serial Entrepreneur, and of course, a Change Agent. When
> >> he isn't chasing his kids, cooking food or solving business problems, he
> >> is thinking about business, data, systems and reduced complexity.
> >>
> >> Thanks,
> >> Ben
Spam in the Confluence wiki...
Is there a limit on who gets to edit/add pages? Seems like someone is putting TV listings into the pages...

Thanks,
Ben
Re: Meetup in SF: 2016-09-21
Happy to take the extra time to facilitate a talk about "features" and "issues"... On Tue, Aug 16, 2016 at 1:28 PM -0700, "siddharth anand" wrote: Great! I just tweeted it via our ApacheAirflow twitter account! -s On Tue, Aug 16, 2016 at 11:26 AM, Jeff Balogh wrote: Stripe is hosting the next Airflow Meetup on Wednesday, September 21st in San Francisco. You can sign up here: http://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/events/233316814/ We're still looking for one more speaker to give a 15 minute talk. Let me know if you'd like to volunteer. Cheers, jeff
Re: Airflow Meetup at Stripe in September?
Happy to give a talk about Apigee's use case as well... Thanks, Ben *--* *ben tallman* | *apigee* | m: +1.503.680.5709 | o: +1.503.608.7552 | twitter @anonymousmanage @apigee On Wed, Aug 10, 2016 at 11:20 PM, siddharth anand <san...@apache.org> wrote: > Jeff, > Any updates on this? > > Sumit, > You are also welcome to host a meet-up in your neck of the woods and some > of us can present remotely. Presenting this option as not an "either-or", > but an "and" - you are welcome to present remotely here and host in your > home country. > > -s > > On Thu, Aug 4, 2016 at 10:56 PM, Sumit Maheshwari <sumeet.ma...@gmail.com> > wrote: > > > Hi Jeff, > > > > If possible I would also love to give a small talk remotely. > > > > > > Thanks, > > Sumit > > > > On Fri, Aug 5, 2016 at 10:39 AM, siddharth anand <san...@apache.org> > > wrote: > > > > > You can also ask on this list if there are folks interested in > speaking. > > > Also, should you folks ever want to speak in other meet-ups too, it's > > fully > > > welcome. > > > > > > -s > > > > > > On Thu, Aug 4, 2016 at 7:39 PM, Siddharth Anand > <san...@agari.com.invalid > > > > > > wrote: > > > > > > > Jeff, > > > > That sounds great. We encourage any and all in the community to host > > > > Airflow meet-ups. 
> > > > > > > > We recommend the following meet-up format: > > > > https://cwiki.apache.org/confluence/display/AIRFLOW/Meetups > > > > > > > > Though the format is a suggestion, we do prefer you shoot for 1 host > > > > speaker and 2 guest speakers. > > > > > > > > When you are ready (i.e. confirmed with our organization of the date, > > > time, > > > > venue, and 3 speakers), shoot us a note on this list and one of the > > > > committers will set up the Meet-up and tweet/announce the meet-up > using > > > the > > > > Airflow twitter channel. We also suggest, as mentioned on the page, > > that > > > > you stream and record the video for posterity. > > > > > > > > -s > > > > > > > > On Thu, Aug 4, 2016 at 6:40 PM, Jeff Balogh <jbal...@stripe.com> > > wrote: > > > > > > > > > Hey y'all, we're heavy users of Airflow at Stripe and we'd be happy > > to > > > > > host a meetup in September. Our office is in San Francisco down by > > > > > AT&T park. How do we make that happen? > > > > > > > > > > > > > > >
Celery Executor Question...
The doc section on Celery essentially points to the Celery site for config; however, the celery_result_backend setting seems to be skipped as a result... Any insights? Thanks, Ben *--* *ben tallman* | *apigee* | m: +1.503.680.5709 | o: +1.503.608.7552 | twitter @anonymousmanage @apigee