Re: 1.9.0alpha1 published

2017-10-13 Thread Alex Guziel
AIRFLOW-976 should be marked resolved. It is fixed by
https://github.com/apache/incubator-airflow/commit/b2e1753f5b74ad1b6e0889f7b784ce69623c95ce
(pardon my commit message), which is in v1.9.

On Fri, Oct 13, 2017 at 11:52 AM, Chris Riccomini 
wrote:

> Hey all,
>
> I have cut a 1.9.0alpha1 release of Airflow. You can download it here:
>
>   https://dist.apache.org/repos/dist/dev/incubator/airflow/1.9.0alpha1/
>
> The bin tarball can be installed with pip:
>
>   pip install apache-airflow-1.9.0alpha1+incubating-bin.tar.gz
>
> The goal is to have the community install and run this to expose any bugs
> before we move on to official release candidates.
>
> Here are the outstanding blocker bugs for 1.9.0:
>
>   AIRFLOW-1525 |Improvement |Fix minor LICENSE & NOTICE issue
>   AIRFLOW-1258 |Bug |TaskInstances within SubDagOperator are marked
> as
>   AIRFLOW-1055 |Bug |airflow/jobs.py:create_dag_run() exception
> for
> @on
>   AIRFLOW-1018 |Bug |Scheduler DAG processes can not log to stdout
>   AIRFLOW-1013 |Bug |airflow/jobs.py:manage_slas() exception for
> @once
>   AIRFLOW-976  |Bug |Mark success running task causes it to fail
>
> Note: it appears that none of the blocker bugs have been closed since
> alpha1. Are these truly blocker bugs, or should they be bumped to 1.10.0?
>
> Cheers,
> Chris
>


Re: Meetup Interest?

2017-10-13 Thread Daniel Imberman (BLOOMBERG/ SAN FRAN)
+1

We're getting really close on the Kubernetes Executor PR. Would love to discuss 
final features/architecture to make sure we cover our bases before we try to 
roll out alpha.


From: mw...@newrelic.com 
Subject: Re: Meetup Interest?

+1 for this meetup idea! We don't use Kube+Airflow, but I'd love to see talks 
on scaling it out team-wise and some design patterns people have come up with.

--
Marc Weil | Lead Engineer | Growth Automation, Marketing, and Engagement | New 
Relic 
On Fri, Oct 13, 2017 at 1:03 PM, Christopher Bockman  
wrote:

+1 as a vote.

We're very actively working on Kube+Airflow, so would be particularly
interested on discussions there.

On Fri, Oct 13, 2017 at 12:59 PM, Joy Gao  wrote:

> Hi Dan,
>
> I'd be happy to give an update on progress of the new RBAC UI we've been
> working on here at WePay.
>
> Cheers,
> Joy
>
> On Fri, Oct 13, 2017 at 12:10 PM, Dan Davydov <
> dan.davy...@airbnb.com.invalid> wrote:
>
> > Is there interest in doing an Airflow meet-up? Airbnb can host one in San
> > Francisco.
> >
> > Some talk ideas can include the progress on Kubernetes integration and
> > Scaling & Operations with Airflow. If you want to see other topics
> covered,
> > feel free to suggest them!
> >
>




Re: Meetup Interest?

2017-10-13 Thread Andy Hadjigeorgiou
+1 for meetup, although I'm in nyc... would love to hear about best
practices (file structure, development & deployment practices).

On Fri, Oct 13, 2017 at 4:26 PM, Marc Weil  wrote:

> +1 for this meetup idea! We don't use Kube+Airflow, but I'd love to see
> talks on scaling it out team-wise and some design patterns people have come
> up with.
>
> --
> Marc Weil | Lead Engineer | Growth Automation, Marketing, and Engagement |
> New Relic
>
> On Fri, Oct 13, 2017 at 1:03 PM, Christopher Bockman <
> ch...@fathomhealth.co>
> wrote:
>
> > +1 as a vote.
> >
> > We're very actively working on Kube+Airflow, so would be particularly
> > interested on discussions there.
> >
> > On Fri, Oct 13, 2017 at 12:59 PM, Joy Gao  wrote:
> >
> > > Hi Dan,
> > >
> > > I'd be happy to give an update on progress of the new RBAC UI we've
> been
> > > working on here at WePay.
> > >
> > > Cheers,
> > > Joy
> > >
> > > On Fri, Oct 13, 2017 at 12:10 PM, Dan Davydov <
> > > dan.davy...@airbnb.com.invalid> wrote:
> > >
> > > > Is there interest in doing an Airflow meet-up? Airbnb can host one in
> > San
> > > > Francisco.
> > > >
> > > > Some talk ideas can include the progress on Kubernetes integration
> and
> > > > Scaling & Operations with Airflow. If you want to see other topics
> > > covered,
> > > > feel free to suggest them!
> > > >
> > >
> >
>


Re: Meetup Interest?

2017-10-13 Thread Marc Weil
+1 for this meetup idea! We don't use Kube+Airflow, but I'd love to see
talks on scaling it out team-wise and some design patterns people have come
up with.

--
Marc Weil | Lead Engineer | Growth Automation, Marketing, and Engagement |
New Relic

On Fri, Oct 13, 2017 at 1:03 PM, Christopher Bockman 
wrote:

> +1 as a vote.
>
> We're very actively working on Kube+Airflow, so would be particularly
> interested on discussions there.
>
> On Fri, Oct 13, 2017 at 12:59 PM, Joy Gao  wrote:
>
> > Hi Dan,
> >
> > I'd be happy to give an update on progress of the new RBAC UI we've been
> > working on here at WePay.
> >
> > Cheers,
> > Joy
> >
> > On Fri, Oct 13, 2017 at 12:10 PM, Dan Davydov <
> > dan.davy...@airbnb.com.invalid> wrote:
> >
> > > Is there interest in doing an Airflow meet-up? Airbnb can host one in
> San
> > > Francisco.
> > >
> > > Some talk ideas can include the progress on Kubernetes integration and
> > > Scaling & Operations with Airflow. If you want to see other topics
> > covered,
> > > feel free to suggest them!
> > >
> >
>


Re: Question about skipping, state propagation, and trigger rules.

2017-10-13 Thread Alek Storm
Hi Daniel,

If you don’t find it too unwieldy, the following should satisfy your use
case. It basically converts what I assume was your use of
ShortCircuitOperator into a BranchPythonOperator with a dummy task. Try
running it with the environment variable FOO_SKIP=true to see the other
possible path. Let me know if you have any questions.

from datetime import datetimeimport os
from airflow.models import DAGfrom airflow.operators.bash_operator
import BashOperatorfrom airflow.operators.python_operator import
BranchPythonOperatorfrom airflow.operators.dummy_operator import
DummyOperator

default_args = {
'owner': 'airflow',
'start_date': datetime(2017, 1, 1),
'concurrency': 16,
}
with DAG('foo', default_args=default_args, schedule_interval='@once') as dag:
sensor_dataA = BranchPythonOperator(
task_id='sensor_dataA',
python_callable=lambda: 'skipA' if os.environ.get('FOO_SKIP',
'false') == 'true' else 'preprocess_and_stage_dataA')
sensor_dataB = BranchPythonOperator(
task_id='sensor_dataB',
python_callable=lambda: 'skipB' if os.environ.get('FOO_SKIP',
'false') == 'true' else 'preprocess_and_stage_dataB')
sensor_dataC = BranchPythonOperator(
task_id='sensor_dataC',
python_callable=lambda: 'skipC' if os.environ.get('FOO_SKIP',
'false') == 'true' else 'preprocess_and_stage_dataC')

preprocess_and_stage_dataA = BashOperator(
task_id='preprocess_and_stage_dataA',
bash_command='echo {{ti.task_id}}')
preprocess_and_stage_dataB = BashOperator(
task_id='preprocess_and_stage_dataB',
bash_command='echo {{ti.task_id}}')
preprocess_and_stage_dataC = BashOperator(
task_id='preprocess_and_stage_dataC',
bash_command='echo {{ti.task_id}}')

skipA = DummyOperator(
task_id='skipA')
skipB = DummyOperator(
task_id='skipB')
skipC = DummyOperator(
task_id='skipC')

joinA = DummyOperator(
task_id='joinA',
trigger_rule='one_success')
joinB = DummyOperator(
task_id='joinB',
trigger_rule='one_success')
joinC = DummyOperator(
task_id='joinC',
trigger_rule='one_success')

process_stagesABC = BashOperator(
task_id='process_stagesABC',
bash_command='echo {{ti.task_id}}')

cleanup = BashOperator(
task_id='cleanup',
bash_command='echo {{ti.task_id}}')

sensor_dataA >> preprocess_and_stage_dataA
sensor_dataA >> skipA
sensor_dataB >> preprocess_and_stage_dataB
sensor_dataB >> skipB
sensor_dataC >> preprocess_and_stage_dataC
sensor_dataC >> skipC

preprocess_and_stage_dataA >> joinA
skipA >> joinA
preprocess_and_stage_dataB >> joinB
skipB >> joinB
preprocess_and_stage_dataC >> joinC
skipC >> joinC

joinA >> process_stagesABC
joinB >> process_stagesABC
joinC >> process_stagesABC

process_stagesABC >> cleanup

Best,
Alek
​

On Thu, Oct 12, 2017 at 1:43 AM, Daniel Lamblin [Data Science & Platform
Center]  wrote:

> I hope this is an alright place to ask the following:
> In a case where some inputs will irregularly be missing, but where it's
> okay, I was reading
> https://airflow.incubator.apache.org/concepts.html#trigger-rules
> and I thought I needed `all_done` for a final task, but a skip is not a
> done state, nor does it (seem to) propagate.
> Is there a way to trigger something after all upstreams are either
> successful or skipped?
>
> My case looks a little like:
> sensor_dataA >> preprocess_and_stage_dataA >> process_stagesABC >> clean_up
> sensor_dataB >> preprocess_and_stage_dataB >> process_stagesABC
> sensor_dataC >> preprocess_and_stage_dataC >> process_stagesABC
>
> I don't want the preprocess to fail because the data isn't there and there
> will be side-effects, but if the sensor skips its associated
> preprocess_and_stage is not queued. The task doesn't seem to have any state
> (like `upstream_skipped`?) so process_stagesABC won't be triggered by
> `all_done`. `one_success` seems like it would be prefect except that it
> would start before all preprocess tasks have been either run or skipped.
>
> Am I missing a way that this can be done? Is there some general guide to
> changing the DAG structure that would handle completing the process? Am I
> supposed to be using XCOM here?
>
> If all these answers are "no/maybe" then is there some opportunity to
> introduce an `upstream_skipped` state or a different `trigger_rule`... a
> kludgy `SkipAheadOperator`, or something?
>
> Thanks,
> -Daniel Lamblin
>


Re: Meetup Interest?

2017-10-13 Thread Christopher Bockman
+1 as a vote.

We're very actively working on Kube+Airflow, so would be particularly
interested on discussions there.

On Fri, Oct 13, 2017 at 12:59 PM, Joy Gao  wrote:

> Hi Dan,
>
> I'd be happy to give an update on progress of the new RBAC UI we've been
> working on here at WePay.
>
> Cheers,
> Joy
>
> On Fri, Oct 13, 2017 at 12:10 PM, Dan Davydov <
> dan.davy...@airbnb.com.invalid> wrote:
>
> > Is there interest in doing an Airflow meet-up? Airbnb can host one in San
> > Francisco.
> >
> > Some talk ideas can include the progress on Kubernetes integration and
> > Scaling & Operations with Airflow. If you want to see other topics
> covered,
> > feel free to suggest them!
> >
>


Re: Meetup Interest?

2017-10-13 Thread Joy Gao
Hi Dan,

I'd be happy to give an update on progress of the new RBAC UI we've been
working on here at WePay.

Cheers,
Joy

On Fri, Oct 13, 2017 at 12:10 PM, Dan Davydov <
dan.davy...@airbnb.com.invalid> wrote:

> Is there interest in doing an Airflow meet-up? Airbnb can host one in San
> Francisco.
>
> Some talk ideas can include the progress on Kubernetes integration and
> Scaling & Operations with Airflow. If you want to see other topics covered,
> feel free to suggest them!
>


Meetup Interest?

2017-10-13 Thread Dan Davydov
Is there interest in doing an Airflow meet-up? Airbnb can host one in San
Francisco.

Some talk ideas can include the progress on Kubernetes integration and
Scaling & Operations with Airflow. If you want to see other topics covered,
feel free to suggest them!


1.9.0alpha1 published

2017-10-13 Thread Chris Riccomini
Hey all,

I have cut a 1.9.0alpha1 release of Airflow. You can download it here:

  https://dist.apache.org/repos/dist/dev/incubator/airflow/1.9.0alpha1/

The bin tarball can be installed with pip:

  pip install apache-airflow-1.9.0alpha1+incubating-bin.tar.gz

The goal is to have the community install and run this to expose any bugs
before we move on to official release candidates.

Here are the outstanding blocker bugs for 1.9.0:

  AIRFLOW-1525 |Improvement |Fix minor LICENSE & NOTICE issue
  AIRFLOW-1258 |Bug |TaskInstances within SubDagOperator are marked
as
  AIRFLOW-1055 |Bug |airflow/jobs.py:create_dag_run() exception for
@on
  AIRFLOW-1018 |Bug |Scheduler DAG processes can not log to stdout
  AIRFLOW-1013 |Bug |airflow/jobs.py:manage_slas() exception for
@once
  AIRFLOW-976  |Bug |Mark success running task causes it to fail

Note: it appears that none of the blocker bugs have been closed since
alpha1. Are these truly blocker bugs, or should they be bumped to 1.10.0?

Cheers,
Chris


Return results optionally from spark_sql_hook

2017-10-13 Thread Boris
hi guys,

I opened JIRA on this and will be working on PR
https://issues.apache.org/jira/browse/AIRFLOW-1713

any objections/suggestions conceptually?

Fokko, I see you have been actively contributing to spark hooks and
operators so I could use your opinion before I implement this.

Boris


Indefinitely Queued Tasks

2017-10-13 Thread Daniel Lamblin [Data Science & Platform Center]
Let me know if you really do need emails entirely in plaintext.

It seems there was a fix AIRFLOW-1074 which prevents tasks which a worker
rejected from getting orphaned and queued indefinitely. Here's a link to
the commit

I could find.
That fix reads to me as though it was meant to address the FIXME in models

and
possibly also the FIXME in jobs

.

Those messages weren't removed and I'm investigating one case of a job that
logged these in my cluster, though we're running 1.8.2.

Do I misunderstand the fix, was it considered partial or incomplete? Is
there an archive of the discussion around it (I only found one email about
someone being told his issue would be fixed in 1.8.2) what would have to be
done further to fix and remove the FIXME?

Thanks!
-- 
-Daniel Lamblin