Re: Catchup By default = False vs LatestOnlyOperator

2018-07-21 Thread Ben Tallman
As the author of catchup, I can say the idea is that in many cases your data
doesn't "window" nicely, and you instead want to just run as if Airflow were
a brilliant cron...

Ben

> On Jul 20, 2018, at 11:39 PM, Shah Altaf  wrote:
> 
> Hi, my understanding is: if you use the LatestOnlyOperator, then when you run
> the DAG for the first time you'll see a whole bunch of DAG runs queued up,
> and in each run the LatestOnlyOperator will cause the rest of the DAG run
> to be skipped. Only the latest DAG run will run in full.
> 
> With catchup = False, you should get just the latest DAG run.
> 
> 
> On Fri, Jul 20, 2018 at 10:58 PM Shubham Gupta 
> wrote:
> 
>> -- Forwarded message -
>> From: Shubham Gupta 
>> Date: Fri, Jul 20, 2018 at 2:38 PM
>> Subject: Catchup By default = False vs LatestOnlyOperator
>> To: 
>> 
>> 
>> Hi!
>> 
>> Can someone please explain the difference between catchup by default = False
>> and LatestOnlyOperator?
>> 
>> Regards,
>> Shubham Gupta
>> 
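Shah's description can be made concrete with a small, self-contained model. This is a simplified sketch, not Airflow's scheduler code. It assumes that a run for the interval starting at `ed` is created once that interval has closed, that `catchup=False` keeps only the most recent closed interval, and that the LatestOnlyOperator lets downstream tasks execute only when the current time falls in the window right after the run's own interval. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def scheduled_runs(start_date, now, interval, catchup):
    """Execution dates created under a simplified scheduler model.

    A run for the interval [ed, ed + interval) is created once that interval
    has closed (ed + interval <= now).
    """
    runs = []
    ed = start_date
    while ed + interval <= now:
        runs.append(ed)
        ed += interval
    # catchup=False: only the most recent closed interval gets a run.
    return runs if catchup else runs[-1:]


def latest_only_allows(ed, now, interval):
    """Simplified LatestOnlyOperator check: downstream tasks of the run for `ed`
    execute only if `now` falls in the window right after that run's interval."""
    left_window = ed + interval
    right_window = left_window + interval
    return left_window < now <= right_window


if __name__ == "__main__":
    start, now, day = datetime(2018, 7, 1), datetime(2018, 7, 5, 12), timedelta(days=1)
    with_catchup = scheduled_runs(start, now, day, catchup=True)
    print("catchup=True runs:", [d.date().isoformat() for d in with_catchup])
    print("  of which run in full:",
          [d.date().isoformat() for d in with_catchup if latest_only_allows(d, now, day)])
    print("catchup=False runs:",
          [d.date().isoformat() for d in scheduled_runs(start, now, day, catchup=False)])
```

With a daily schedule starting 2018-07-01 and "now" at 2018-07-05 12:00, catchup creates four runs (Jul 1 through Jul 4), of which only Jul 4 passes the latest-only check; with catchup=False, only the Jul 4 run is created at all.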


Re: Managed Apache Airflow Service on Google Cloud Platform

2018-05-01 Thread Ben Tallman
While I'm no longer a Googler, I was excited about this when I was there!!!

Thanks,
Ben

--
Ben Tallman - 503.680.5709

On Tue, May 1, 2018 at 10:31 AM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> I'm sure the community agrees when I say that we're happy and honored to
> have Googlers on board.
>
> Congrats on the launch!
>
> Max
>
> On Tue, May 1, 2018 at 9:58 AM, Feng Lu <fen...@google.com.invalid> wrote:
>
> > Hello everyone,
> >
> > I want to let everyone know that today Google Cloud launched a new managed
> > service based on Apache Airflow - Cloud Composer[1]. Now that we have
> > launched into public beta, I wanted to connect with the community to share
> > why we chose Airflow and our plans for Composer and involvement with the
> > Airflow community.
> >
> > A year ago we set out to build a workflow orchestration product for Google
> > Cloud. We strongly believe that such a system should be based on open
> > source - it’s described as a core value on our public landing page[2]. We
> > chose Airflow for many reasons, including the awesome community, its
> > approachability for developers, and its core concepts. We built Cloud
> > Composer because we wanted to make Airflow accessible to all Google Cloud
> > customers. We’re also encouraging these customers to use Airflow outside
> > of Google Cloud - whether it be another Cloud or on-premise.
> >
> > When we started building Cloud Composer we got involved in the Airflow
> > community. You have probably seen a few Googlers submitting pull requests,
> > including myself. We do not plan on forking Airflow with the release of
> > Cloud Composer and it’s our commitment to remain involved in the Airflow
> > community as we grow Composer. We will continue to actively contribute to
> > Airflow and look forward to partnering with the community. You should
> > expect to see myself and other Googlers involved in Airflow in the future.
> >
> > Best,
> > Feng
> >
> > [1] https://cloud.google.com/composer
> > [2] https://cloud.google.com/
> >
>


Re: Issue with airflow upgradedb...

2016-11-14 Thread Ben Tallman
Yes, once Sumit asked that question, it made me dig a bit, and ARG.

:)



Thanks,
Ben

On Mon, Nov 14, 2016 at 11:40 AM, siddharth anand <san...@apache.org> wrote:

> Ben,
> I ran into issues while maintaining my company's airflow fork and
> cherry-picking my changes into the fork, especially when my changes
> included db changes.
>
> I had to play with the alembic_version in the db and do some other magic
> that escapes me now. My best guidance for the future is to cherry pick ALL
> DB-related changes from both master and your own btallman github fork into
> your apigee fork. That way, the db migration lineage in your apigee fork
> matches what is in master.
>
> -s
>
> On Fri, Nov 11, 2016 at 4:49 AM, Sumit Maheshwari <sumeet.ma...@gmail.com>
> wrote:
>
>> Ben,
>>
>> Can you check what the current version is using "alembic current"? AFAIK,
>> version f2ca10b85618 is the latest migration in master, and I had no issue
>> migrating to it.
>>
>> Also, did your cherry-picks contain any custom migrations?
>>
>>
>>
>> On Fri, Nov 11, 2016 at 5:04 AM, Ben Tallman <btall...@gmail.com> wrote:
>>
>> > We are running master with a few cherry picked features... Did we squash
>> > commits that Alembic is expecting? Did I?
>> >
>> > Basically, there are revisions that are no longer in master?? Specifically,
>> > at least:
>> >
>> > Can't locate revision identified by 'f2ca10b85618'

Tasks getting Queued when Pool is full sometimes never get run

2016-11-14 Thread Ben Tallman
We are seeing an issue when running master where tasks sometimes never run.
It seems that once a task is marked as "Dependencies not met" because its
pool is full, that state is never re-evaluated. Is anyone else seeing this?

https://issues.apache.org/jira/browse/AIRFLOW-627
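A toy model of the expected behavior, not Airflow's scheduler code: tasks share a pool with a fixed number of slots, and on each scheduler "tick" finished tasks free their slots and every waiting task is checked again. The `recheck_blocked=False` branch mimics the reported symptom, where a task set aside because the pool was full is never evaluated again. Function and parameter names are hypothetical.

```python
from collections import deque


def run_pool(task_ids, slots, recheck_blocked=True, duration=1, max_ticks=100):
    """Toy model of tasks sharing a pool with `slots` slots.

    Returns (started, stuck): the order in which tasks got a slot, and the
    tasks that found the pool full and were never looked at again.
    """
    waiting = deque(task_ids)
    running = {}  # task_id -> ticks of work remaining
    started, stuck = [], []
    for _ in range(max_ticks):
        # Finished tasks release their pool slots.
        for tid in [t for t, left in running.items() if left == 0]:
            del running[tid]
        still_waiting = deque()
        while waiting:
            tid = waiting.popleft()
            if len(running) < slots:
                running[tid] = duration
                started.append(tid)
            elif recheck_blocked:
                still_waiting.append(tid)  # pool full: try again next tick
            else:
                stuck.append(tid)  # pool full: never re-evaluated (the reported symptom)
        waiting = still_waiting
        for tid in running:
            running[tid] -= 1  # one tick of work
        if not waiting and not running:
            break
    return started, stuck
```

With five tasks and two slots, re-checking lets all five run eventually; without it, only the first two ever start and the other three sit forever.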

Thanks,
Ben


Issue with airflow upgradedb...

2016-11-10 Thread Ben Tallman
We are running master with a few cherry-picked features... Did we squash
commits that Alembic is expecting? Did I?

Basically, there are revisions that are no longer in master?? Specifically,
at least:

Can't locate revision identified by 'f2ca10b85618'

===

*airflow upgradedb*
[2016-11-10 15:31:04,156] {__init__.py:36} INFO - Using executor CeleryExecutor
DB: postgresql://airflow_qa:***@nucleus.c7b2twrxxjtc.us-west-2.rds.amazonaws.com/nucleus
[2016-11-10 15:31:05,707] {utils.py:288} INFO - Creating tables
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
Traceback (most recent call last):
  File "/usr/local/bin/airflow", line 15, in
    args.func(args)
  File "/Library/Python/2.7/site-packages/airflow/bin/cli.py", line 459, in upgradedb
    utils.upgradedb()
  File "/Library/Python/2.7/site-packages/airflow/utils.py", line 295, in upgradedb
    command.upgrade(config, 'heads')
  File "/Library/Python/2.7/site-packages/alembic/command.py", line 174, in upgrade
    script.run_env()
  File "/Library/Python/2.7/site-packages/alembic/script/base.py", line 397, in run_env
    util.load_python_file(self.dir, 'env.py')
  File "/Library/Python/2.7/site-packages/alembic/util/pyfiles.py", line 81, in load_python_file
    module = load_module_py(module_id, path)
  File "/Library/Python/2.7/site-packages/alembic/util/compat.py", line 79, in load_module_py
    mod = imp.load_source(module_id, path, fp)
  File "/Library/Python/2.7/site-packages/airflow/migrations/env.py", line 74, in
    run_migrations_online()
  File "/Library/Python/2.7/site-packages/airflow/migrations/env.py", line 69, in run_migrations_online
    context.run_migrations()
  File "", line 8, in run_migrations
  File "/Library/Python/2.7/site-packages/alembic/runtime/environment.py", line 797, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/Library/Python/2.7/site-packages/alembic/runtime/migration.py", line 303, in run_migrations
    for step in self._migrations_fn(heads, self):
  File "/Library/Python/2.7/site-packages/alembic/command.py", line 163, in upgrade
    return script._upgrade_revs(revision, rev)
  File "/Library/Python/2.7/site-packages/alembic/script/base.py", line 314, in _upgrade_revs
    for script in reversed(list(revs))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Library/Python/2.7/site-packages/alembic/script/base.py", line 160, in _catch_revision_errors
    compat.raise_from_cause(util.CommandError(resolution))
  File "/Library/Python/2.7/site-packages/alembic/util/compat.py", line 132, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/Library/Python/2.7/site-packages/alembic/script/base.py", line 129, in _catch_revision_errors
    yield
  File "/Library/Python/2.7/site-packages/alembic/script/base.py", line 310, in _upgrade_revs
    revs = list(revs)
  File "/Library/Python/2.7/site-packages/alembic/script/revision.py", line 610, in _iterate_revisions
    requested_lowers = self.get_revisions(lower)
  File "/Library/Python/2.7/site-packages/alembic/script/revision.py", line 299, in get_revisions
    return sum([self.get_revisions(id_elem) for id_elem in id_], ())
  File "/Library/Python/2.7/site-packages/alembic/script/revision.py", line 304, in get_revisions
    for rev_id in resolved_id)
  File "/Library/Python/2.7/site-packages/alembic/script/revision.py", line 304, in
    for rev_id in resolved_id)
  File "/Library/Python/2.7/site-packages/alembic/script/revision.py", line 359, in _revision_for_ident
    resolved_id)
alembic.util.CommandError: Can't locate revision identified by 'f2ca10b85618'
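This error means the revision id stored in the database is not one of the revisions defined by the installed migration scripts. Alembic records the database's current revision in the `alembic_version` table (column `version_num`), which is what `alembic current` reports. A minimal check, sketched with SQLite as a stand-in for the Postgres metadata DB; the function name and the set of "known" revision ids are hypothetical, and in practice you would collect the known ids from the migration scripts (for example, from `alembic history`).

```python
import sqlite3


def unknown_db_revisions(conn, known_revisions):
    """Return revision ids recorded in the database's alembic_version table
    that the local migration scripts do not define."""
    rows = conn.execute("SELECT version_num FROM alembic_version").fetchall()
    return [rev for (rev,) in rows if rev not in known_revisions]


if __name__ == "__main__":
    # Stand-in DB whose recorded revision is missing from the local scripts.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE alembic_version (version_num VARCHAR(32) NOT NULL)")
    conn.execute("INSERT INTO alembic_version VALUES ('f2ca10b85618')")
    print(unknown_db_revisions(conn, {"hypothetical_rev_1", "hypothetical_rev_2"}))
```

A non-empty result points to the situation siddharth describes in the reply thread: the fork's migration lineage lacks a revision that the database has already been upgraded to.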

Thanks,
Ben


Re: Killing DAGs

2016-11-01 Thread Ben Tallman
So to kill a running DAG (and keep it killed), we need to clear the state
of each task instance? Do we then pause the DAG? Or do that in advance?

Thanks,
Ben

On Tue, Nov 1, 2016 at 9:11 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:

> Clearing the state of the task kills it. So the feature is already in,
> but perhaps not obviously.
>
> > On 1 Nov 2016, at 17:09, Ben Tallman <btall...@gmail.com> wrote:
> >
> > I vote for this feature! Preferably with both a polite option and a NOW option.
> >
> > Thanks,
> > Ben
> >
> > On Tue, Nov 1, 2016 at 9:08 AM, Vishal Doshi <vis...@celect.com> wrote:
> >
> >> I haven’t been able to find anything on this in the code/docs. Is there
> >> a supported way to kill a DAG (and its still-running tasks)?
> >>
>
>


Re: Killing DAGs

2016-11-01 Thread Ben Tallman
I vote for this feature! Preferably with both a polite option and a NOW option.

Thanks,
Ben

On Tue, Nov 1, 2016 at 9:08 AM, Vishal Doshi <vis...@celect.com> wrote:

> I haven’t been able to find anything on this in the code/docs. Is there
> a supported way to kill a DAG (and its still-running tasks)?
>


Re: A question/poll on the TaskInstance data model...

2016-10-15 Thread Ben Tallman
That is part of it. In this case, we aren't planning to store the contents
of the DagBag as it was when the DagRun was created (that was the pickling
stuff, which is deprecated), but it solves HALF of the problem: it at least
lets us begin drawing the graph as it was when it was run. Storing the
DagBag's DAG would begin to solve your problem as well.

I would dearly love to have tasks generated at schedule time (not during
the run), rather than every time the DAG file is evaluated (every 3 minutes
or so).

There is disagreement as to the best way to handle this. However, based on
conversations that I've heard and participated in, the current preferred
solution is to head down the path of a "git time machine". That doesn't
actually solve the problem that we see, though. Basically, we want the
evaluation of the DAG Python file to interrogate outside systems to
generate the tasks and have them run. The problem with the git time machine
solution is that those outside systems are not static; they change over
time. In the past, an effort was made to pickle the DAG and run from that,
but pickling has its own issues.

To be clear, I think the goal of the pickling at the time was to ship the
DAG to distributed workers, not to freeze it in time. I think that storing
the pickled DAG in the DagRun could probably solve this, but it is a major
change. It is one that I am beginning to work on for us, though.
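A minimal sketch of the "freeze at schedule time" idea, under stated assumptions: the task list comes from a hypothetical outside system, and a hypothetical store pickles that list when the DagRun is created. It pickles a plain list of task ids rather than a DAG object, which sidesteps the DAG-pickling issues mentioned above; it only illustrates why a stored snapshot is immune to later changes in the outside system.

```python
import pickle


def generate_task_ids(external_items):
    """Hypothetical stand-in for DAG-file code that asks an outside system
    which tasks should exist."""
    return ["import_%s" % item for item in external_items]


class DagRunSnapshots:
    """Hypothetical store holding one pickled task list per DagRun."""

    def __init__(self):
        self._blobs = {}

    def create_run(self, run_id, external_items):
        # Freeze the generated task list when the run is created (schedule time).
        self._blobs[run_id] = pickle.dumps(generate_task_ids(external_items))

    def tasks_for_run(self, run_id):
        # Later evaluations read the frozen list instead of re-querying.
        return pickle.loads(self._blobs[run_id])
```

If the outside system later reports `["a", "c", "d"]` instead of `["a", "b"]`, a run created earlier still resolves to `["import_a", "import_b"]`, whereas re-evaluating the DAG file would produce a different task list.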


Thanks,
Ben

*--*
*ben tallman* | *apigee
<http://t.sidekickopen06.com/e1t/c/5/f18dQhb0S7lC8dDMPbW2n0x6l2B9nMJW7t5XZs4WJfgqW4WJj7n3MP7VWW3LqXLC56dWRRf2H8CkP02?t=http%3A%2F%2Fwww.apigee.com%2F=5141814536306688=e558dca3-da0a-4d9f-c1b3-6cb9174fcb5e>*
 | m: +1.503.680.5709 | o: +1.503.608.7552 | twitter @anonymousmanage
<http://t.sidekickopen06.com/e1t/c/5/f18dQhb0S7lC8dDMPbW2n0x6l2B9nMJW7t5XZs4WJfgqW4WJj7n3MP7VWW3LqXLC56dWRRf2H8CkP02?t=http%3A%2F%2Ftwitter.com%2Fanonymousmanage=5141814536306688=e558dca3-da0a-4d9f-c1b3-6cb9174fcb5e>
 @apigee
<http://t.sidekickopen06.com/e1t/c/5/f18dQhb0S7lC8dDMPbW2n0x6l2B9nMJW7t5XZs4WJfgqW4WJj7n3MP7VWW3LqXLC56dWRRf2H8CkP02?t=https%3A%2F%2Ftwitter.com%2Fapigee=5141814536306688=e558dca3-da0a-4d9f-c1b3-6cb9174fcb5e>
<http://t.sidekickopen06.com/e1t/c/5/f18dQhb0S7lC8dDMPbW2n0x6l2B9nMJW7t5XZs4WJfgqW4WJj7n3MP7VWW3LqXLC56dWRRf2H8CkP02?t=http%3A%2F%2Fadapt.apigee.com%2F=5141814536306688=e558dca3-da0a-4d9f-c1b3-6cb9174fcb5e>

On Sat, Oct 15, 2016 at 11:35 AM, Boris Tyukin <bo...@boristyukin.com>
wrote:

> Hi Ben,
>
> Is this meant to address the issue I described yesterday, "Issue with
> Dynamically created tasks in a DAG"?
>
> I was hoping someone could confirm this as a bug and say whether there is a
> JIRA to address it; otherwise I would be happy to open one. To me it is a
> pretty major issue and a very misleading one, especially because a key
> Airflow feature is generating/updating DAGs programmatically.
>


A question/poll on the TaskInstance data model...

2016-10-13 Thread Ben Tallman
I (and Apigee) would like to have the DAG Graph view paint old DagRuns based
on the tasks (and ids) that actually ran, not on the current DAG from the
DagBag. To do that, I need to be able to map a DagRun, and one way is from
the TaskInstance table. However, that table doesn't currently contain links
between tasks, though it could.

Does anyone feel strongly against storing upstream and downstream task_ids
in the taskinstance table as a first step? Our goal is NOT to be able to
rerun the past, but to be able to see the past (and provide links to the
taskinstance details and logs).
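A sketch of what the proposed columns would enable, assuming a hypothetical row type that records `upstream_task_ids` and `downstream_task_ids` per task instance (these are not existing Airflow TaskInstance columns). Given only the stored rows for one DagRun, the graph edges can be redrawn even if the DAG in the DagBag has changed since.

```python
from collections import namedtuple

# Hypothetical row shape: a TaskInstance row plus the task's neighbors at run time.
TIRow = namedtuple(
    "TIRow",
    ["dag_id", "execution_date", "task_id", "upstream_task_ids", "downstream_task_ids"],
)


def snapshot_rows(dag_id, execution_date, upstream_map):
    """Build one row per task from {task_id: [upstream task_ids]}, as the DAG
    looked when the run happened. Assumes every task appears as a key."""
    downstream = {task_id: [] for task_id in upstream_map}
    for task_id, ups in upstream_map.items():
        for up in ups:
            downstream[up].append(task_id)
    return [
        TIRow(dag_id, execution_date, task_id, sorted(ups), sorted(downstream[task_id]))
        for task_id, ups in sorted(upstream_map.items())
    ]


def edges_for_run(rows):
    """Rebuild (upstream, downstream) edges for one DagRun from its stored rows only."""
    return sorted((up, row.task_id) for row in rows for up in row.upstream_task_ids)
```

For a run whose DAG was `extract -> transform -> load`, `edges_for_run` returns `[("extract", "transform"), ("transform", "load")]` from the stored rows, regardless of what the current DAG file defines.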

Thanks,
Ben


New Scheduler process seems incompatible with sqlite...

2016-10-04 Thread Ben Tallman
When testing the scheduler locally on master, using sqlite and the
SequentialExecutor, dag.sync_to_db never returns when querying the db:

    orm_dag = session.query(
        DagModel).filter(DagModel.dag_id == dag.dag_id).first()

Any insights? My guess is that the new multi-process nature of DAG
processing conflicts with sqlite.
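SQLite allows only one writer at a time across connections, which is why Airflow's documentation pairs it with the SequentialExecutor. The sketch below (hypothetical function name, a temporary file as the database) does not reproduce the sync_to_db hang; it only shows that a second connection, standing in for a second process, cannot write while another connection holds the write lock, which makes the multi-process guess plausible.

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked(timeout=0.1):
    """Return True if a second connection cannot write while the first one
    holds SQLite's write lock."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        # Connection 1 (think: one DAG-processing process) opens a write transaction.
        writer = sqlite3.connect(path, isolation_level=None)
        writer.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY)")
        writer.execute("BEGIN IMMEDIATE")  # takes the write (RESERVED) lock
        writer.execute("INSERT INTO dag VALUES ('example_dag')")

        # Connection 2 (think: another process) waits up to `timeout` seconds to write.
        other = sqlite3.connect(path, timeout=timeout)
        try:
            other.execute("INSERT INTO dag VALUES ('other_dag')")
            other.commit()
            blocked = False
        except sqlite3.OperationalError as exc:
            blocked = "locked" in str(exc)  # "database is locked"
        finally:
            other.close()

        writer.execute("COMMIT")
        writer.close()
        return blocked
    finally:
        os.remove(path)
```

With a longer `timeout`, the second connection simply waits, which is one way a query in one process can appear to stall behind work in another.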

Thanks,
Ben


Re: Invitation: Airflow Contributors & Roadmapping Meeting @ Thu Oct 6, 2016 10am - 12pm (PDT) (gurer.kira...@airbnb.com)

2016-09-27 Thread Ben Tallman
I will be there...


Thanks,
Ben

On Fri, Sep 16, 2016 at 3:53 PM, <gurer.kira...@airbnb.com.invalid> wrote:

> Airflow Contributors & Roadmapping Meeting
> Hi all,
>
> Again, it has been a while since our last meeting. Let's have another
> meeting to sync up!
>
> We are super happy to host all you folks at Airbnb (888 Brannan St, 94103)
> on October 6th at 10:00am. We will also have a WebEx session at
> https://airbnb.webex.com/meet/gurer.kiratli.
>
> I will send this out as a Google Calendar invite, but because it goes
> through the mail group I can't see your responses. If you are planning to
> come, please reply to me with your first and last name. And please try to
> arrive by 9:30 so we can check you in and head to the meeting room. : )
>
> Here is the proposed agenda:
> 10:00am - 10:45am PDT
>   Contributors sync-up: progress and plan
>   Release Schedule, Management
> 10:45am - 11:00am PDT
>   Coffee Break
> 11:00am - 12:00pm PDT
>   Roadmap discussion
> 12:00pm - 1:00pm PDT
>   Lunch @ Airbnb
>
> Cheers,
>
> Gurer
>
> *When*
> Thu Oct 6, 2016 10am – 12pm Pacific Time
>
> *Where*
> Airbnb HQ, 888 Brannan St, San Francisco, CA 94103, USA
>
> *Calendar*
> gurer.kira...@airbnb.com
>
> *Who*
> • gurer.kira...@airbnb.com - organizer
> • dev@airflow.incubator.apache.org
>

Travis CI...

2016-09-14 Thread Ben Tallman
I noticed on a pull request (and on my fork) that Travis CI doesn't seem
very reliable. At least one of the 6 builds fails almost every time, and
even if I change nothing, a different one of the 6 fails each time,
unrelated to my code...

Has anyone else seen this?

Thanks,
Ben


Re: Dynamically defining tasks in a DAG -- HOW?

2016-09-08 Thread Ben Tallman
We have done this a lot, and the one issue is that every time the DAG file
is evaluated (even during a run), the SQL is re-run, so the set of tasks can
vary. In fact, we had a SELECT statement that marked items as in-process
during the select, and THAT was bad.

We have moved to a fixed number (x) of tasks; each one grabs a row from the
DB, and 0 to n of them can get skipped if they don't get a row from the DB.

To be clear, we would really like the DAG's tasks to be frozen at schedule
time, but that has not been our experience, and I believe getting there
will take a fairly major refactor. Furthermore, I believe the point of a
dynamically generated DAG (Directed Acyclic Graph) is that it is
re-evaluated at runtime and that its path is not determined until runtime.
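The fixed-number-of-tasks pattern above can be sketched with an in-memory SQLite table. The table name, columns, and slot loop are hypothetical; the sketch assumes each task claims its row when it executes, not when the DAG file is parsed, and does the claim in a single UPDATE so two tasks cannot take the same row. In a real task, the "skipped" branch would raise Airflow's `AirflowSkipException`; on Postgres, `SELECT ... FOR UPDATE SKIP LOCKED` (9.5+) is a common way to make concurrent claims safe.

```python
import sqlite3


def make_queue(file_ids):
    """Hypothetical work table with one row per file to import."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
    conn.execute("CREATE TABLE work (file_id TEXT PRIMARY KEY, claimed_by INTEGER)")
    conn.executemany("INSERT INTO work (file_id) VALUES (?)", [(f,) for f in file_ids])
    return conn


def claim_one(conn, slot):
    """Claim one unclaimed row for `slot` in a single statement; None means nothing left."""
    conn.execute(
        "UPDATE work SET claimed_by = ? WHERE file_id = "
        "(SELECT file_id FROM work WHERE claimed_by IS NULL ORDER BY file_id LIMIT 1)",
        (slot,),
    )
    row = conn.execute("SELECT file_id FROM work WHERE claimed_by = ?", (slot,)).fetchone()
    return row[0] if row else None


def run_slots(conn, n_slots):
    """Simulate the DAG's fixed set of n_slots tasks running one after another."""
    results = {}
    for slot in range(n_slots):
        file_id = claim_one(conn, slot)
        results[slot] = file_id if file_id is not None else "skipped"
    return results
```

With three files and four slots, slots 0 to 2 each import one file and slot 3 is skipped; the DAG's shape stays the same on every parse, and only the work each task picks up varies.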


Thanks,
Ben

On Thu, Sep 8, 2016 at 1:50 PM, J C Lawrence <c...@kanga.nu> wrote:

> I have a few hundred thousand files arriving from an external service
> each day and would like to ETL their contents into my store with
> Airflow.  As the files are large and numerous and slow to process, I'd
> also like to process them in parallel...so I thought something like
> this:
>
> from airflow import DAG, hooks, operators
>
> def sub_dag (
>     parent_dag_name,
>     child_dag_name,
>     start_date,
>     schedule_interval):
>   dag = DAG(
>     "%s.%s" % (parent_dag_name, child_dag_name),
>     schedule_interval = schedule_interval,
>     start_date = start_date,
>   )
>   fan_out = operators.DummyOperator(
>     task_id = "fan_out",
>     dag = dag,
>   )
>   fan_in = operators.DummyOperator(
>     task_id = "fan_in",
>     dag = dag,
>   )
>   # This query runs every time the DAG file is parsed, not once per scheduled run.
>   cur = hooks.PostgresHook ("MY_DB").get_cursor ()
>   cur.execute ("""SELECT file_id
>                   FROM some_table
>                   WHERE something;""".format (foo = func(start_date)))
>   for rec in cur:
>     fid = rec[0]
>     o = operators.PythonOperator (
>       task_id = "ImportThing__%s" % fid,
>       provide_context = True,
>       python_callable = import_func,
>       params = {"file_id": fid,},
>       dag = dag)
>     o.set_upstream (fan_out)
>     o.set_downstream (fan_in)
>   cur.close ()
>   return dag
>
> The idea being that the number and identity of the tasks in the sub-DAG
> would vary dynamically depending on what day it was running for (i.e.,
> which rows come back from the query for that day). But...no, this
> doesn't seem to work.
>
> Any recommendations for how to approach this?
>
> -- JCL
>


Re: Speaking at AWS Loft on Wednesday...

2016-09-06 Thread Ben Tallman
Sorry, no video. I will also be speaking to the Data Warehouse team at
Optimizely on the 21st about our process, and really digging into the
Airflow stuff.


Thanks,
Ben

On Tue, Sep 6, 2016 at 9:13 PM, siddharth anand <san...@apache.org> wrote:

> Ben,
> Did you get video for this talk?
> -s
>
> On Thu, Sep 1, 2016 at 10:04 AM, siddharth anand <san...@apache.org>
> wrote:
>
> > Nice. Please feel free to add a link to our Links wiki if you have slides
> > or video; we'd be happy to retweet as well. We will do the same when you
> > give the meetup talk.
> > -s
> >
> > On Mon, Aug 29, 2016 at 2:09 PM, Ben Tallman <b...@apigee.com> wrote:
> >
> >> I thought I would let people know that I'm speaking at the AWS Pop-up
> >> Loft on Wednesday at 2:00, discussing how RDS powers (and empowers) our
> >> stack... I'm planning on mentioning and discussing Apache Airflow in that
> >> stack.
> >>
> >> https://aws.amazon.com/start-ups/loft/sf-loft/ - August 31st.
> >>
> >> Customer Reference: Ben Tallman, an Apigeek at Apigee, will discuss
> >> Apigee's Data Warehouse and Business Performance Management, specifically
> >> how leveraging Apigee Edge, RDS Postgres, Apache Airflow and Periscope Data
> >> has allowed them to "free the data" and become a more agile and
> >> data-driven company.
> >>
> >> About the Speaker: Ben Tallman is both a Developer and an IT Professional,
> >> a Serial Entrepreneur, and of course, a Change Agent. When he isn't chasing
> >> his kids, cooking food or solving business problems, he is thinking about
> >> business, data, systems and reduced complexity.
> >>
> >> Thanks,
> >> Ben
> >>
> >
> >
>


Spam in the Confluence wiki...

2016-08-29 Thread Ben Tallman
Is there a limit on who gets to edit/add pages? It seems like someone is
putting TV listings into the pages...

Thanks,
Ben



Re: Meetup in SF: 2016-09-21

2016-08-16 Thread Ben Tallman
Happy to take the extra time to facilitate a talk about "features" and 
"issues"...

On Tue, Aug 16, 2016 at 1:28 PM -0700, "siddharth anand" wrote:

Great!
I just tweeted it via our ApacheAirflow Twitter account!

-s
On Tue, Aug 16, 2016 at 11:26 AM, Jeff Balogh wrote:
Stripe is hosting the next Airflow Meetup on Wednesday, September 21st in
San Francisco. You can sign up here:

http://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/events/233316814/

We're still looking for one more speaker to give a 15-minute talk. Let me
know if you'd like to volunteer.

Cheers,
jeff


Re: Airflow Meetup at Stripe in September?

2016-08-11 Thread Ben Tallman
Happy to give a talk about Apigee's use case as well...


Thanks,
Ben



On Wed, Aug 10, 2016 at 11:20 PM, siddharth anand <san...@apache.org> wrote:

> Jeff,
> Any updates on this?
>
> Sumit,
> You are also welcome to host a meet-up in your neck of the woods, and some
> of us can present remotely. I'm presenting this option as an "and", not an
> "either-or": you are welcome to present remotely here and host in your
> home country.
>
> -s
>
> On Thu, Aug 4, 2016 at 10:56 PM, Sumit Maheshwari <sumeet.ma...@gmail.com>
> wrote:
>
> > Hi Jeff,
> >
> > If possible I would also love to give a small talk remotely.
> >
> >
> > Thanks,
> > Sumit
> >
> > On Fri, Aug 5, 2016 at 10:39 AM, siddharth anand <san...@apache.org>
> > wrote:
> >
> > > You can also ask on this list if there are folks interested in
> > > speaking. Also, should you folks ever want to speak at other meet-ups,
> > > you're more than welcome.
> > >
> > > -s
> > >
> > > On Thu, Aug 4, 2016 at 7:39 PM, Siddharth Anand <san...@agari.com.invalid>
> > > wrote:
> > >
> > > > Jeff,
> > > > That sounds great.  We encourage any and all in the community to host
> > > > Airflow meet-ups.
> > > >
> > > > We recommend the following meet-up format:
> > > > https://cwiki.apache.org/confluence/display/AIRFLOW/Meetups
> > > >
> > > > Though the format is a suggestion, we do prefer you shoot for 1 host
> > > > speaker and 2 guest speakers.
> > > >
> > > > When you are ready (i.e., confirmed with our organization of the date,
> > > > time, venue, and 3 speakers), shoot us a note on this list and one of
> > > > the committers will set up the meet-up and tweet/announce it using the
> > > > Airflow Twitter channel. We also suggest, as mentioned on the page,
> > > > that you stream and record the video for posterity.
> > > >
> > > > -s
> > > >
> > > > On Thu, Aug 4, 2016 at 6:40 PM, Jeff Balogh <jbal...@stripe.com>
> > > > wrote:
> > > >
> > > > > Hey y'all, we're heavy users of Airflow at Stripe and we'd be happy
> > > > > to host a meetup in September. Our office is in San Francisco down by
> > > > > AT&T Park. How do we make that happen?
> > > > >
> > > >
> > >
> >
>


Celery Executor Question...

2016-07-19 Thread Ben Tallman
The Celery section of the docs essentially points to the Celery site for
configuration; as a result, the celery_result_backend setting seems to have
been skipped...

Any insights?

Thanks,
Ben
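For context on the setting in question: with the CeleryExecutor, Airflow 1.x reads the Celery result backend from `celery_result_backend` in the `[celery]` section of `airflow.cfg`, next to `broker_url` (later releases are understood to have renamed it `result_backend`; check the naming for your version). What follows is a minimal, hypothetical check using Python's standard `configparser`. The sample config text, the Redis broker URL, and the `db+postgresql://...` backend URL (Celery's database-backend URL style) are placeholder values, not recommended settings.

```python
import configparser
from typing import Optional

# Hypothetical airflow.cfg fragment; all values are placeholders.
# Airflow 1.x names the option celery_result_backend; newer releases
# (assumed 1.10+) call it result_backend.
SAMPLE_CFG = """
[core]
executor = CeleryExecutor

[celery]
broker_url = redis://localhost:6379/0
celery_result_backend = db+postgresql://airflow:airflow@localhost:5432/airflow
"""


def celery_result_backend(cfg_text: str) -> Optional[str]:
    """Return the configured Celery result backend URL, or None if unset."""
    # interpolation=None so '%' characters in URLs are taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    # Check the older option name first, then the newer one.
    for option in ("celery_result_backend", "result_backend"):
        if parser.has_option("celery", option):
            return parser.get("celery", option)
    return None


if __name__ == "__main__":
    # Prints the backend URL from the sample fragment.
    print(celery_result_backend(SAMPLE_CFG))
```

A `None` result means neither option name is present under `[celery]`, so the workers would be running without an explicitly configured result backend.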
