Airflow talk at Munich Data Engineering Meetup

2018-07-23 Thread Stefan Seelmann
Hi all,

I'll give a talk about Airflow at the next Data Engineering Meetup in
Munich (Germany) next Thursday, the 26th. Maybe some folks from the
Munich area are interested. Details at [1].

Kind Regards,
Stefan

[1] https://www.meetup.com/data-engineering-munich/events/252170998/


Re: Sep Airflow Bay Area Meetup @ Google

2018-07-23 Thread Chris Riccomini
@Feng Lu: apparently you're listed as an EVENT
ORGANIZER on the group. I believe that should allow you to create meetups.
If not, can you let me know?

On Sun, Jul 22, 2018 at 12:36 PM Ben Gregory  wrote:

> Will do Feng!
>
> Also - is there an approximate date we'll know if the hackathon is going to
> happen? Want to make sure we can get a good attendance internally.
>
> Looking forward to it!
>
> On Sat, Jul 21, 2018 at 10:44 AM Feng Lu  wrote:
>
> > Sounds great, thank you Ben.
> > When you get a chance, could you please send me your talk
> > title/abstract/session type (regular or lightning)?
> >
> > On Fri, Jul 20, 2018 at 2:10 PM Ben Gregory  wrote:
> >
> >> Hey Feng!
> >>
> >> Awesome to hear that you're hosting the next meetup! We'd love to give a
> >> talk (and potentially a lightning session if available) -- we have a
> number
> >> of topics we could speak on but off the top of our heads we're thinking
> >> "Running Cloud Native Airflow", tying in some of our work on the
> Kubernetes
> >> Executor. How does that sound?
> >>
> >> Also, if there ends up being an Airflow hackathon, you can absolutely
> >> count us in. Let us know how we can help coordinate if the need presents
> >> itself!
> >>
> >> -Ben
> >>
> >> On Thu, Jul 19, 2018 at 3:26 PM Feng Lu 
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>> Hope you are enjoying your summer!
> >>>
> >>> This is Feng Lu from Google and we'll host the next Airflow meetup at
> >>> our Sunnyvale campus. We plan to add a *lightning session* this time for
> >>> people to share their Airflow ideas, work in progress, pain points, etc.
> >>> Here's the meetup date and schedule:
> >>>
> >>> -- Sep 24 (Monday)  --
> >>> 6:00PM meetup starts
> >>> 6:00 - 8:00PM light dinner /mix-n-mingle
> >>> 8:00PM - 9:40PM: 5 sessions (20 minutes each)
> >>> 9:40PM to 10:10PM: 6 lightning sessions (5 minutes each)
> >>> 10:10PM to 11:00PM: drinks and social hour
> >>>
> >>> I've seen a lot of interesting discussions in the dev mailing-list on
> >>> security, scalability, event interactions, future directions, hosting
> >>> platform and others. Please feel free to send your talk proposal to us
> >>> by replying to this email.
> >>>
> >>> The Cloud Composer team is also going to share their experience running
> >>> Apache Airflow as a managed solution and service roadmap.
> >>>
> >>> Thank you and looking forward to hearing from y'all soon!
> >>>
> >>> p.s., if folks are interested, we can also add a one-day Airflow
> >>> hackathon
> >>> prior to the meet-up on the same day, please let us know.
> >>>
> >>> Feng
> >>>
> >>
> >>
> >> --
> >>
> >> [image: Astronomer Logo] 
> >>
> >> *Ben Gregory*
> >> Data Engineer
> >>
> >> Mobile: +1-615-483-3653 • Online: astronomer.io
> >> 
> >>
> >> Download our new ebook: From Volume to Value - A Guide to Data Engineering.
> >>
> >
>


Re: Catchup By default = False vs LatestOnlyOperator

2018-07-23 Thread George Leslie-Waksman
Ok, not so fringe; I'm glad it's working well for your use case, James.

I retract my suggestion of deprecation.

On Mon, Jul 23, 2018 at 12:58 PM James Meickle
 wrote:

> We use LatestOnlyOperator in production. Generally our data is available on
> a regular schedule, and we update production services with it as soon as it
> is available; we might occasionally want to re-run historical days, in
> which case we want to run the same DAG but without interacting with live
> production services at all.
>
> On Mon, Jul 23, 2018 at 2:18 PM, George Leslie-Waksman 
> wrote:
>
> > As the author of LatestOnlyOperator, I intended it as a stopgap until
> > catchup=False landed.
> >
> > There are some (very) fringe use cases where you might still want
> > LatestOnlyOperator but in almost all cases what you want is probably
> > catchup=False.
> >
> > The situations where LatestOnlyOperator is still useful are where you
> want
> > to run most of your DAG for every schedule interval but you want some of
> > the tasks to run only on the latest run (not catching up, not
> backfilling).
> >
> > It may be best to deprecate LatestOnlyOperator at this point to avoid
> > confusion.
> >
> > --George
> >
> > On Sat, Jul 21, 2018 at 7:34 PM Ben Tallman  wrote:
> >
> > > As the author of catch-up, the idea is that in many cases your data
> > > doesn't "window" nicely and you want instead to just run as if it were
> a
> > > brilliant Cron...
> > >
> > > Ben
> > >
> > > Sent from my iPhone
> > >
> > > > On Jul 20, 2018, at 11:39 PM, Shah Altaf  wrote:
> > > >
> > > > Hi my understanding is: if you use the LatestOnlyOperator then when
> you
> > > run
> > > > the DAG for the first time you'll see a whole bunch of DAG runs
> queued
> > > up,
> > > > and in each run the LatestOnlyOperator will cause the rest of the DAG
> > run
> > > > to be skipped.  Only the latest DAG will run in 'full'.
> > > >
> > > > With catchup = False, you should get just the latest DAG run.
> > > >
> > > >
> > > > On Fri, Jul 20, 2018 at 10:58 PM Shubham Gupta <
> > > shubham180695...@gmail.com>
> > > > wrote:
> > > >
> > > >> -- Forwarded message -
> > > >> From: Shubham Gupta 
> > > >> Date: Fri, Jul 20, 2018 at 2:38 PM
> > > >> Subject: Catchup By default = False vs LatestOnlyOperator
> > > >> To: 
> > > >>
> > > >>
> > > >> Hi!
> > > >>
> > > >> Can someone please explain the difference between catchup_by_default
> > > >> = False and LatestOnlyOperator?
> > > >>
> > > >> Regards,
> > > >> Shubham Gupta
> > > >>
> > >
> >
>


Re: Catchup By default = False vs LatestOnlyOperator

2018-07-23 Thread James Meickle
We use LatestOnlyOperator in production. Generally our data is available on
a regular schedule, and we update production services with it as soon as it
is available; we might occasionally want to re-run historical days, in
which case we want to run the same DAG but without interacting with live
production services at all.

On Mon, Jul 23, 2018 at 2:18 PM, George Leslie-Waksman 
wrote:

> As the author of LatestOnlyOperator, I intended it as a stopgap until
> catchup=False landed.
>
> There are some (very) fringe use cases where you might still want
> LatestOnlyOperator but in almost all cases what you want is probably
> catchup=False.
>
> The situations where LatestOnlyOperator is still useful are where you want
> to run most of your DAG for every schedule interval but you want some of
> the tasks to run only on the latest run (not catching up, not backfilling).
>
> It may be best to deprecate LatestOnlyOperator at this point to avoid
> confusion.
>
> --George
>
> On Sat, Jul 21, 2018 at 7:34 PM Ben Tallman  wrote:
>
> > As the author of catch-up, the idea is that in many cases your data
> > doesn't "window" nicely and you want instead to just run as if it were a
> > brilliant Cron...
> >
> > Ben
> >
> > Sent from my iPhone
> >
> > > On Jul 20, 2018, at 11:39 PM, Shah Altaf  wrote:
> > >
> > > Hi my understanding is: if you use the LatestOnlyOperator then when you
> > run
> > > the DAG for the first time you'll see a whole bunch of DAG runs queued
> > up,
> > > and in each run the LatestOnlyOperator will cause the rest of the DAG
> run
> > > to be skipped.  Only the latest DAG will run in 'full'.
> > >
> > > With catchup = False, you should get just the latest DAG run.
> > >
> > >
> > > On Fri, Jul 20, 2018 at 10:58 PM Shubham Gupta <
> > shubham180695...@gmail.com>
> > > wrote:
> > >
> > >> -- Forwarded message -
> > >> From: Shubham Gupta 
> > >> Date: Fri, Jul 20, 2018 at 2:38 PM
> > >> Subject: Catchup By default = False vs LatestOnlyOperator
> > >> To: 
> > >>
> > >>
> > >> Hi!
> > >>
> > >> Can someone please explain the difference between catchup_by_default =
> > >> False and LatestOnlyOperator?
> > >>
> > >> Regards,
> > >> Shubham Gupta
> > >>
> >
>


Re: Simple DAG Structure

2018-07-23 Thread srinivas . ramabhadran
Andrew - 

   I guess I am not sure how the CheckOperator is implemented, but wouldn't it 
amount to the same thing, i.e. unnecessary polling? I imagine some process is 
kicked off somewhere and repeatedly polls to check if A and B are both done 
writing their outcome. I do not want to convert what is essentially a time 
dependency (and what I consider to be in the purview of the scheduler) into 
some sort of polling solution. 

   I am looking for a solution that respects the time dependencies of A and B 
and only runs them at their specified time. C being a child of A and B will run 
only on successful completion of the two. No task (sensor, check or any other 
poller) ever runs outside of this schedule. The scheduler itself might poll but 
we are not launching new processes that mostly just sleep.

Ram.

On 2018/07/23 17:58:56, Andrew Maguire  wrote: 
> Maybe you could have A and B report their outcome somewhere and then use
> that output, read back in from somewhere, as a check operator in C.
> 
> This is kinda reinventing the wheel a little bit though as ideally would be
> a way to keep all that state inside airflow.
> 
> I think what I suggest would work, but maybe a little hackish.
> 
> On Mon, 23 Jul 2018, 14:33 srinivas.ramabhad...@gmail.com, <
> srinivas.ramabhad...@gmail.com> wrote:
> 
> > Carl -
> >
> >Thanks, that definitely works, but it's non-ideal. If I had 100s of
> > jobs running throughout the day, a TimeSensor task (process) gets created
> > for each task at midnight even though a task may not be required to run for
> > a very long time (e.g. a whole bunch of tasks need to run @ 20:00. All of
> > their time sensors are kicked off at 00:00). Worse still, if I used a
> > LocalExecutor with a pool size of 10, some jobs that need to run early may
> > not even get scheduled in favor of time sensors for tasks later in the day
> > which only perform a sleep operation.
> >
> >Is there another way to do this? If not, is there at least another way
> > around the LocalExecutor problem?
> >
> > Ram.
> >
> >
> > On 2018/07/23 08:23:45, Carl Johan Gustavsson 
> > wrote:
> > > Hi Ram,
> > >
> > > You can have a single DAG scheduled to 10am, which starts A and then use
> > a TimeSensor set to 11 am that B depends on, and then have C depend on A
> > and B.
> > >
> > > Something like:
> > >
> > > a = BashOperator('a', …)
> > >
> > > delay_b = TimeSensor('delay_b', target_time=time(11, 0, 0), …)
> > > b = BashOperator('b', …)
> > > b.set_upstream(delay_b)
> > >
> > > c = BashOperator('c', …)
> > > c.set_upstream(a)
> > > c.set_upstream(b)
> > >
> > >
> > > / Carl Johan
> > > On 23 July 2018 at 02:18:00, srinivas.ramabhad...@gmail.com (
> > srinivas.ramabhad...@gmail.com) wrote:
> > >
> > > Hi -
> > >
> > > I have recently started using Airflow version 1.9.0 and am having some
> > difficulty setting up a very simple DAG. I have three tasks A, B and C. I'd
> > like A to run every day at 10am and B at 11am. C depends on BOTH A and B
> > running successfully.
> > >
> > > Initially, I decided to create one DAG, add all three tasks to it and
> > set C as downstream to A and B. I then set the schedule_interval of the DAG
> > to @daily. But this meant I couldn't run A and B at 10am and 11am
> > respectively since they are PythonOperators and tasks don't support
> > schedule_interval (or, at least, it's deprecated syntax and gets ignored).
> > >
> > > I scratched that idea and then created A and B as DAGs, specified the
> > schedule interval as per the cron syntax: '00 10 * * *' for A and '00 11 *
> > * *' for B. But now when I set C as a downstream of A and B, it complains
> > that C can't belong to two different dags.
> > >
> > > How do I accomplish such a simple dependency structure?
> > >
> > > Ram.
> > >
> >
> 


Re: Catchup By default = False vs LatestOnlyOperator

2018-07-23 Thread George Leslie-Waksman
As the author of LatestOnlyOperator, I intended it as a stopgap until
catchup=False landed.

There are some (very) fringe use cases where you might still want
LatestOnlyOperator but in almost all cases what you want is probably
catchup=False.

The situations where LatestOnlyOperator is still useful are where you want
to run most of your DAG for every schedule interval but you want some of
the tasks to run only on the latest run (not catching up, not backfilling).

It may be best to deprecate LatestOnlyOperator at this point to avoid
confusion.

--George

On Sat, Jul 21, 2018 at 7:34 PM Ben Tallman  wrote:

> As the author of catch-up, the idea is that in many cases your data
> doesn't "window" nicely and you want instead to just run as if it were a
> brilliant Cron...
>
> Ben
>
> Sent from my iPhone
>
> > On Jul 20, 2018, at 11:39 PM, Shah Altaf  wrote:
> >
> > Hi my understanding is: if you use the LatestOnlyOperator then when you
> run
> > the DAG for the first time you'll see a whole bunch of DAG runs queued
> up,
> > and in each run the LatestOnlyOperator will cause the rest of the DAG run
> > to be skipped.  Only the latest DAG will run in 'full'.
> >
> > With catchup = False, you should get just the latest DAG run.
> >
> >
> > On Fri, Jul 20, 2018 at 10:58 PM Shubham Gupta <
> shubham180695...@gmail.com>
> > wrote:
> >
> >> -- Forwarded message -
> >> From: Shubham Gupta 
> >> Date: Fri, Jul 20, 2018 at 2:38 PM
> >> Subject: Catchup By default = False vs LatestOnlyOperator
> >> To: 
> >>
> >>
> >> Hi!
> >>
> >> Can someone please explain the difference between catchup_by_default =
> >> False and LatestOnlyOperator?
> >>
> >> Regards,
> >> Shubham Gupta
> >>
>
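To make the distinction discussed in this thread concrete, here is a small pure-Python simulation of the two behaviors. This is illustrative only — not Airflow's actual scheduler code — and the helper names are made up:

```python
from datetime import date, timedelta

def dag_runs(start, today, catchup):
    """Which daily execution dates get a DAG run when the DAG is unpaused.

    With catchup=True the scheduler backfills every missed interval;
    with catchup=False it creates only the most recent one.
    """
    runs = []
    d = start
    while d < today:  # a run's execution_date lags one interval behind
        runs.append(d)
        d += timedelta(days=1)
    return runs if catchup else runs[-1:]

def latest_only(run_dates):
    """Mimic LatestOnlyOperator: every run still exists, but downstream
    tasks execute only for the newest run; the rest are skipped."""
    return {d: d == max(run_dates) for d in run_dates}

start, today = date(2018, 7, 16), date(2018, 7, 21)
# catchup=True queues a whole bunch of runs; catchup=False just the latest.
assert len(dag_runs(start, today, catchup=True)) == 5
assert dag_runs(start, today, catchup=False) == [date(2018, 7, 20)]
# With LatestOnlyOperator all five runs appear, but only one does real work.
gated = latest_only(dag_runs(start, today, catchup=True))
assert sum(gated.values()) == 1
```

This matches Shah's description in the thread: catchup=False skips creating the old runs entirely, while LatestOnlyOperator creates them and then short-circuits all but the latest.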


Re: Simple DAG Structure

2018-07-23 Thread Andrew Maguire
Maybe you could have A and B report their outcome somewhere and then use
that output, read back in from somewhere, as a check operator in C.

This is kinda reinventing the wheel a little bit though as ideally would be
a way to keep all that state inside airflow.

I think what I suggest would work, but maybe a little hackish.
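The report-and-check idea above can be sketched like this — a hedged illustration with a made-up file layout and helper names, not an actual Airflow CheckOperator:

```python
import json
import os
import tempfile

# A and B each write a small status file when they finish; C's check step
# reads the files back and refuses to proceed until both outcomes are good.
state_dir = tempfile.mkdtemp()

def report(task, ok):
    """Called at the end of task A or B to record its outcome."""
    with open(os.path.join(state_dir, f"{task}.json"), "w") as f:
        json.dump({"ok": ok}, f)

def upstream_ok(tasks):
    """C's check: True only if every upstream reported a good outcome."""
    try:
        return all(
            json.load(open(os.path.join(state_dir, f"{t}.json")))["ok"]
            for t in tasks
        )
    except FileNotFoundError:
        return False

report("a", True)
assert not upstream_ok(["a", "b"])  # B has not reported yet
report("b", True)
assert upstream_ok(["a", "b"])      # both outcomes present and good
```

As noted above, this keeps state outside Airflow, which is the hackish part — the scheduler no longer knows why C is blocked.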

On Mon, 23 Jul 2018, 14:33 srinivas.ramabhad...@gmail.com, <
srinivas.ramabhad...@gmail.com> wrote:

> Carl -
>
>Thanks, that definitely works, but it's non-ideal. If I had 100s of
> jobs running throughout the day, a TimeSensor task (process) gets created
> for each task at midnight even though a task may not be required to run for
> a very long time (e.g. a whole bunch of tasks need to run @ 20:00. All of
> their time sensors are kicked off at 00:00). Worse still, if I used a
> LocalExecutor with a pool size of 10, some jobs that need to run early may
> not even get scheduled in favor of time sensors for tasks later in the day
> which only perform a sleep operation.
>
>Is there another way to do this? If not, is there at least another way
> around the LocalExecutor problem?
>
> Ram.
>
>
> On 2018/07/23 08:23:45, Carl Johan Gustavsson 
> wrote:
> > Hi Ram,
> >
> > You can have a single DAG scheduled to 10am, which starts A and then use
> a TimeSensor set to 11 am that B depends on, and then have C depend on A
> and B.
> >
> > Something like:
> >
> > a = BashOperator('a', …)
> >
> > delay_b = TimeSensor('delay_b', target_time=time(11, 0, 0), …)
> > b = BashOperator('b', …)
> > b.set_upstream(delay_b)
> >
> > c = BashOperator('c', …)
> > c.set_upstream(a)
> > c.set_upstream(b)
> >
> >
> > / Carl Johan
> > On 23 July 2018 at 02:18:00, srinivas.ramabhad...@gmail.com (
> srinivas.ramabhad...@gmail.com) wrote:
> >
> > Hi -
> >
> > I have recently started using Airflow version 1.9.0 and am having some
> difficulty setting up a very simple DAG. I have three tasks A, B and C. I'd
> like A to run every day at 10am and B at 11am. C depends on BOTH A and B
> running successfully.
> >
> > Initially, I decided to create one DAG, add all three tasks to it and
> set C as downstream to A and B. I then set the schedule_interval of the DAG
> to @daily. But this meant I couldn't run A and B at 10am and 11am
> respectively since they are PythonOperators and tasks don't support
> schedule_interval (or, at least, it's deprecated syntax and gets ignored).
> >
> > I scratched that idea and then created A and B as DAGs, specified the
> schedule interval as per the cron syntax: '00 10 * * *' for A and '00 11 *
> * *' for B. But now when I set C as a downstream of A and B, it complains
> that C can't belong to two different dags.
> >
> > How do I accomplish such a simple dependency structure?
> >
> > Ram.
> >
>


Broken pipe 32

2018-07-23 Thread srinivas . ramabhadran
Hi all - 

   I am using Airflow version 1.9.0 and am seeing these random errors in my 
logs. This causes Airflow to think that the task has failed, even though I can 
find the process with a ps aux. Any ideas what is causing this? Don't know if 
this is relevant, but this happens with processes that are fairly long-lived 
(they poll every minute during the day until it's time for them to run).

[2018-07-23 16:31:53,891] {logging_mixin.py:84} WARNING - Traceback (most 
recent call last):

[2018-07-23 16:31:53,892] {logging_mixin.py:84} WARNING -   File 
"/var/software/anaconda3/envs/statar_20180624/lib/python3.6/logging/__init__.py",
 line 996, in emit
self.flush()

[2018-07-23 16:31:53,892] {logging_mixin.py:84} WARNING -   File 
"/var/software/anaconda3/envs/statar_20180624/lib/python3.6/logging/__init__.py",
 line 976, in flush
self.stream.flush()

[2018-07-23 16:31:53,892] {logging_mixin.py:84} WARNING - BrokenPipeError: 
[Errno 32] Broken pipe

  

Ram.
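For context, Errno 32 is what Python raises when a process writes to a pipe whose reading end has gone away — here the logging handler's stream has lost its reader. A minimal reproduction (not Airflow code):

```python
import errno
import os

# Create a pipe and close the read end, then write: the kernel has no
# reader, so the write fails with EPIPE (Python surfaces this as
# BrokenPipeError, since the interpreter ignores SIGPIPE by default).
read_fd, write_fd = os.pipe()
os.close(read_fd)

caught = None
try:
    os.write(write_fd, b"log line\n")
except OSError as exc:
    caught = exc.errno
finally:
    os.close(write_fd)

assert caught == errno.EPIPE  # Errno 32: Broken pipe
```

In the logs above the writer is the logging_mixin stream handler, so a long-lived task whose parent stops consuming its output can hit exactly this.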


Re: Simple DAG Structure

2018-07-23 Thread srinivas . ramabhadran
Carl - 

   Thanks, that definitely works, but it's non-ideal. If I had 100s of jobs 
running throughout the day, a TimeSensor task (process) gets created for each 
task at midnight even though a task may not be required to run for a very long 
time (e.g. a whole bunch of tasks need to run @ 20:00. All of their time 
sensors are kicked off at 00:00). Worse still, if I used a LocalExecutor with 
a pool size of 10, some jobs that need to run early may not even get scheduled 
in favor of time sensors for tasks later in the day which only perform a sleep 
operation.

   Is there another way to do this? If not, is there at least another way 
around the LocalExecutor problem?

Ram.
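Ram's pool-starvation concern can be made concrete with a toy model (illustrative numbers only, not Airflow's executor code):

```python
from datetime import time

# Ten gated tasks: eight run at 20:00, two early ones at 09:00 and 10:00.
# Under the TimeSensor pattern, one sensor per task starts at midnight and
# holds an executor slot until its target time passes.
targets = [time(20, 0)] * 8 + [time(9, 0), time(10, 0)]

def sensors_holding_slots(now, targets):
    """How many sensors are still occupying executor slots at `now`."""
    return sum(1 for t in targets if now < t)

# Just after midnight every sensor holds a slot, so a LocalExecutor with a
# pool size of 10 has no capacity left for anything else.
assert sensors_holding_slots(time(0, 5), targets) == 10
# Even at 10:30, the eight 20:00 sensors are still sleeping in their slots.
assert sensors_holding_slots(time(10, 30), targets) == 8
```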
   

On 2018/07/23 08:23:45, Carl Johan Gustavsson  
wrote: 
> Hi Ram,
> 
> You can have a single DAG scheduled to 10am, which starts A and then use a 
> TimeSensor set to 11 am that B depends on, and then have C depend on A and B.
> 
> Something like:
> 
> a = BashOperator('a', …)
> 
> delay_b = TimeSensor('delay_b', target_time=time(11, 0, 0), …)
> b = BashOperator('b', …)
> b.set_upstream(delay_b)
> 
> c = BashOperator('c', …)
> c.set_upstream(a)
> c.set_upstream(b)
> 
> 
> / Carl Johan
> On 23 July 2018 at 02:18:00, srinivas.ramabhad...@gmail.com 
> (srinivas.ramabhad...@gmail.com) wrote:
> 
> Hi -  
> 
> I have recently started using Airflow version 1.9.0 and am having some 
> difficulty setting up a very simple DAG. I have three tasks A, B and C. I'd 
> like A to run every day at 10am and B at 11am. C depends on BOTH A and B 
> running successfully.  
> 
> Initially, I decided to create one DAG, add all three tasks to it and set C 
> as downstream to A and B. I then set the schedule_interval of the DAG to 
> @daily. But this meant I couldn't run A and B at 10am and 11am respectively 
> since the they are PythonOperators and tasks dont support schedule_interval 
> (or, at least, it's deprecated syntax and gets ignored).  
> 
> I scratched that idea and then created A and B as DAGs, specified the 
> schedule interval as per the cron syntax: '00 10 * * *' for A and '00 11 * * 
> *' for B. But now when I set C as a downstream of A and B, it complains that 
> C can't belong to two different dags.  
> 
> How do I accomplish such a simple dependency structure?  
> 
> Ram.  
> 


Re: Airflow's JS code (and dependencies) manageable via npm and webpack

2018-07-23 Thread Bolke de Bruin
I think it should be removed now. 1.10.X should be the last release series that 
supports the old www. Do we need to vote on this?

Great work Verdan!

Sent from my iPad

> On 23 Jul 2018 at 10:23, Driesprong, Fokko  wrote:
> 
> ​Nice work Verdan.
> 
> The frontend really needed some love, thank you for picking this up. Maybe
> we should also think about deprecating the old www. Keeping both of the UIs
> is something that takes a lot of time. Maybe after the release of 1.10 we
> can think of moving to Airflow 2.0, and removing the old UI.
> 
> 
> Cheers, Fokko​
> 
> 2018-07-23 10:02 GMT+02:00 Naik Kaxil :
> 
>> Awesome. Thanks @Verdan
>> 
>> On 23/07/2018, 07:58, "Verdan Mahmood"  wrote:
>> 
>>Heads-up!! This frontend change has been merged in master branch
>> recently.
>>This will impact the users working on Airflow RBAC UI only. That means:
>> 
>>*If you are a contributor/developer of Apache Airflow:*
>>You'll need to install and build the frontend packages if you want to
>> run
>>the web UI.
>>Please make sure to read the new section, "Setting up the node / npm
>>javascript environment"
>>    <CONTRIBUTING.md#setting-up-the-node--npm-javascript-environment-only-for-www_rbac>
>> 
>>in CONTRIBUTING.md
>> 
>>*If you are using Apache Airflow in your production environment:*
>>Nothing will impact you, as every new build of Apache Airflow will
>> come up
>>with pre-built dependencies.
>> 
>>Please let me know if you have any questions. Thank you
>> 
>>Best,
>>*Verdan Mahmood*
>> 
>> 
>>On Sun, Jul 15, 2018 at 6:52 PM Maxime Beauchemin <
>>maximebeauche...@gmail.com> wrote:
>> 
>>> Glad to see this is happening!
>>> 
>>> Max
>>> 
>>> On Mon, Jul 9, 2018 at 6:37 AM Ash Berlin-Taylor <
>>> ash_airflowl...@firemirror.com> wrote:
>>> 
 Great! Thanks for doing this. I've left some review comments on
>> your PR.
 
 -ash
 
> On 9 Jul 2018, at 11:45, Verdan Mahmood <
>> verdan.mahm...@gmail.com>
 wrote:
> 
> Hey Guys,
>
> In an effort to simplify the JS dependencies of Airflow, I've introduced
> npm and webpack for package management. For now, it only implements this
> in the www_rbac version of the web server.
>
> Pull Request: https://github.com/apache/incubator-airflow/pull/3572
>
> The problem with the existing frontend (JS) code of Airflow is that most
> of the custom JS is written within the html files, using Flask's (Jinja)
> variables in that JS. The next step of this effort would be to extract
> that custom JS code into separate JS files, use the dependencies in those
> files via require or import, and eventually introduce a JS automated test
> suite. (At the moment, I'm simply using the CopyWebpackPlugin to copy the
> required dependencies.)
>
> There are also some dependencies which are directly modified in the
> codebase or are outdated. I couldn't find the correct npm versions of
> those libraries (dagre-d3.js and gantt-chart-d3v2.js). Apparently the
> dagre-d3.js we are using comes from a gist or is a very old version not
> supported by webpack 4, while gantt-chart-d3v2 has been modified to
> Airflow's requirements, I believe. I've kept the existing libraries for
> now.
>
> I am currently working in a separate branch to upgrade the DagreD3
> library, and updating the custom JS related to DagreD3 accordingly.
>
> This PR also introduces the pypi_push.sh script
> <https://github.com/apache/incubator-airflow/pull/3572/files#diff-8fae684cdcc8cc8df2232c8df16f64cb>
> that will generate all the JS statics before creating and uploading the
> package.
>
> Please let me know if you guys have any questions or suggestions and I'd
> be happy to answer them.
>
> Best,
> *Verdan Mahmood*
> (+31) 655 576 560
 
 
>>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> Kaxil Naik
>> 
>> Data Reply
>> 2nd Floor, Nova South
>> 160 Victoria Street, Westminster
>> London SW1E 5LB - UK
>> phone: +44 (0)20 7730 6000
>> k.n...@reply.com
>> www.reply.com
>> 


Re: Simple DAG Structure

2018-07-23 Thread Carl Johan Gustavsson
Hi Ram,

You can have a single DAG scheduled to 10am, which starts A and then use a 
TimeSensor set to 11 am that B depends on, and then have C depend on A and B.

Something like:

a = BashOperator('a', …)

delay_b = TimeSensor('delay_b', target_time=time(11, 0, 0), …)
b = BashOperator('b', …)
b.set_upstream(delay_b)

c = BashOperator('c', …)
c.set_upstream(a)
c.set_upstream(b)


/ Carl Johan
On 23 July 2018 at 02:18:00, srinivas.ramabhad...@gmail.com 
(srinivas.ramabhad...@gmail.com) wrote:

Hi -  

I have recently started using Airflow version 1.9.0 and am having some 
difficulty setting up a very simple DAG. I have three tasks A, B and C. I'd 
like A to run every day at 10am and B at 11am. C depends on BOTH A and B 
running successfully.  

Initially, I decided to create one DAG, add all three tasks to it and set C as 
downstream to A and B. I then set the schedule_interval of the DAG to @daily. 
But this meant I couldn't run A and B at 10am and 11am respectively since 
they are PythonOperators and tasks don't support schedule_interval (or, at 
least, it's deprecated syntax and gets ignored).  

I scratched that idea and then created A and B as DAGs, specified the schedule 
interval as per the cron syntax: '00 10 * * *' for A and '00 11 * * *' for B. 
But now when I set C as a downstream of A and B, it complains that C can't 
belong to two different dags.  

How do I accomplish such a simple dependency structure?  

Ram.  
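The gating logic of Carl Johan's layout can be traced with a small simulation (again illustrative only, not Airflow internals — the dependency table mirrors the snippet above):

```python
from datetime import time

UPSTREAM = {"a": [], "delay_b": [], "b": ["delay_b"], "c": ["a", "b"]}

def sensor_fired(now, target=time(11, 0)):
    """Mimic TimeSensor.poke: succeeds once the wall clock reaches target."""
    return now >= target

def runnable(task, done, now):
    """A task may start once all of its upstreams have succeeded; the
    sensor's 'success' is simply the clock passing its target time."""
    if task == "delay_b":
        return sensor_fired(now)
    return all(u in done for u in UPSTREAM[task])

# 10:00 - A starts immediately; the sensor gating B has not fired yet.
assert runnable("a", set(), time(10, 0))
assert not runnable("delay_b", set(), time(10, 0))
# 11:00 - the sensor fires, releasing B; C still waits for both A and B.
assert runnable("delay_b", set(), time(11, 0))
assert not runnable("c", {"a"}, time(11, 0))
assert runnable("c", {"a", "b", "delay_b"}, time(11, 30))
```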


Re: Airflow's JS code (and dependencies) manageable via npm and webpack

2018-07-23 Thread Driesprong, Fokko
​Nice work Verdan.

The frontend really needed some love, thank you for picking this up. Maybe
we should also think about deprecating the old www. Keeping both of the UIs
is something that takes a lot of time. Maybe after the release of 1.10 we can
think of moving to Airflow 2.0, and removing the old UI.


Cheers, Fokko​

2018-07-23 10:02 GMT+02:00 Naik Kaxil :

> Awesome. Thanks @Verdan
>
> On 23/07/2018, 07:58, "Verdan Mahmood"  wrote:
>
> Heads-up!! This frontend change has been merged in master branch
> recently.
> This will impact the users working on Airflow RBAC UI only. That means:
>
> *If you are a contributor/developer of Apache Airflow:*
> You'll need to install and build the frontend packages if you want to
> run
> the web UI.
> Please make sure to read the new section, "Setting up the node / npm
> javascript environment"
> <CONTRIBUTING.md#setting-up-the-node--npm-javascript-environment-only-for-www_rbac>
>
> in CONTRIBUTING.md
>
> *If you are using Apache Airflow in your production environment:*
> Nothing will impact you, as every new build of Apache Airflow will
> come up
> with pre-built dependencies.
>
> Please let me know if you have any questions. Thank you
>
> Best,
> *Verdan Mahmood*
>
>
> On Sun, Jul 15, 2018 at 6:52 PM Maxime Beauchemin <
> maximebeauche...@gmail.com> wrote:
>
> > Glad to see this is happening!
> >
> > Max
> >
> > On Mon, Jul 9, 2018 at 6:37 AM Ash Berlin-Taylor <
> > ash_airflowl...@firemirror.com> wrote:
> >
> > > Great! Thanks for doing this. I've left some review comments on
> your PR.
> > >
> > > -ash
> > >
> > > > On 9 Jul 2018, at 11:45, Verdan Mahmood <
> verdan.mahm...@gmail.com>
> > > wrote:
> > > >
> > > > Hey Guys,
> > > >
> > > > In an effort to simplify the JS dependencies of Airflow, I've
> > > > introduced npm and webpack for package management. For now, it only
> > > > implements this in the www_rbac version of the web server.
> > > >
> > > > Pull Request: https://github.com/apache/incubator-airflow/pull/3572
> > > >
> > > > The problem with the existing frontend (JS) code of Airflow is that
> > > > most of the custom JS is written within the html files, using Flask's
> > > > (Jinja) variables in that JS. The next step of this effort would be to
> > > > extract that custom JS code into separate JS files, use the
> > > > dependencies in those files via require or import, and eventually
> > > > introduce a JS automated test suite. (At the moment, I'm simply using
> > > > the CopyWebpackPlugin to copy the required dependencies.)
> > > >
> > > > There are also some dependencies which are directly modified in the
> > > > codebase or are outdated. I couldn't find the correct npm versions of
> > > > those libraries (dagre-d3.js and gantt-chart-d3v2.js). Apparently the
> > > > dagre-d3.js we are using comes from a gist or is a very old version
> > > > not supported by webpack 4, while gantt-chart-d3v2 has been modified
> > > > to Airflow's requirements, I believe. I've kept the existing libraries
> > > > for now.
> > > >
> > > > I am currently working in a separate branch to upgrade the DagreD3
> > > > library, and updating the custom JS related to DagreD3 accordingly.
> > > >
> > > > This PR also introduces the pypi_push.sh script
> > > > <https://github.com/apache/incubator-airflow/pull/3572/files#diff-8fae684cdcc8cc8df2232c8df16f64cb>
> > > > that will generate all the JS statics before creating and uploading
> > > > the package.
> > > >
> > > > Please let me know if you guys have any questions or suggestions and
> > > > I'd be happy to answer them.
> > > >
> > > > Best,
> > > > *Verdan Mahmood*
> > > > (+31) 655 576 560
> > >
> > >
> >
>
>
>
>
>
>
> Kaxil Naik
>
> Data Reply
> 2nd Floor, Nova South
> 160 Victoria Street, Westminster
> London SW1E 5LB - UK
> phone: +44 (0)20 7730 6000
> k.n...@reply.com
> www.reply.com
>


Re: Airflow's JS code (and dependencies) manageable via npm and webpack

2018-07-23 Thread Naik Kaxil
Awesome. Thanks @Verdan

On 23/07/2018, 07:58, "Verdan Mahmood"  wrote:

Heads-up!! This frontend change has been merged in master branch recently.
This will impact the users working on Airflow RBAC UI only. That means:

*If you are a contributor/developer of Apache Airflow:*
You'll need to install and build the frontend packages if you want to run
the web UI.
Please make sure to read the new section, "Setting up the node / npm
javascript environment"



in CONTRIBUTING.md

*If you are using Apache Airflow in your production environment:*
Nothing will impact you, as every new build of Apache Airflow will come up
with pre-built dependencies.

Please let me know if you have any questions. Thank you

Best,
*Verdan Mahmood*


On Sun, Jul 15, 2018 at 6:52 PM Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> Glad to see this is happening!
>
> Max
>
> On Mon, Jul 9, 2018 at 6:37 AM Ash Berlin-Taylor <
> ash_airflowl...@firemirror.com> wrote:
>
> > Great! Thanks for doing this. I've left some review comments on your PR.
> >
> > -ash
> >
> > > On 9 Jul 2018, at 11:45, Verdan Mahmood 
> > wrote:
> > >
> > > ​Hey Guys, ​
> > >
> > > In an effort to simplify the JS dependencies of Airflow
> > > ​​
> > > ,
> > > ​I've
> > > introduce
> > > ​d​
> > > npm and webpack for the package management. For now, it only 
implements
> > > this in the www_rbac version of the web server.
> > > ​
> > >
> > > Pull Request: https://github.com/apache/incubator-airflow/pull/3572
> > >
> > > The problem with the existing frontend (JS) code of Airflow is that
> > > most of the custom JS is written inside the HTML files, using Flask's
> > > (Jinja) variables in that JS. The next step of this effort would be to
> > > extract that custom JS code into separate JS files, pull in the
> > > dependencies in those files using require or import, and eventually
> > > introduce an automated JS test suite. (At the moment, I'm simply using
> > > CopyWebpackPlugin to copy the required dependencies into place.)
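A common way to realize that extraction, i.e. moving values out of inline,
Jinja-templated scripts, is to pass them through data attributes and read
them from a standalone module. A minimal sketch; the element id and field
names are invented, not Airflow's actual markup:

```javascript
// The template would render something like:
//   <div id="dag-meta" data-dag-id="{{ dag.dag_id }}"></div>
// and a standalone JS module reads the value, instead of Jinja
// interpolating directly into an inline <script> block.
function readDagMeta(doc) {
  const el = doc.getElementById('dag-meta');
  // the browser maps data-dag-id onto dataset.dagId
  return el ? { dagId: el.dataset.dagId } : null;
}
module.exports = { readDagMeta };
```

The module depends only on the DOM interface it is handed, so it can be
unit-tested without a browser, which is what makes the "automated JS test
suite" step tractable.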
> > >
> > > There are also some dependencies that were modified directly in the
> > > codebase or are outdated, and I couldn't find the correct npm versions
> > > of those libraries (dagre-d3.js and gantt-chart-d3v2.js). Apparently
> > > the dagre-d3.js we are using comes from a gist or is a very old
> > > version not supported by webpack 4, while gantt-chart-d3v2 has, I
> > > believe, been modified to fit Airflow's requirements. I've kept the
> > > existing copies of those libraries for now.
> > >
> > > I am currently working in a separate branch to upgrade the DagreD3
> > > library and update the custom JS related to DagreD3 accordingly.
> > >
> > > This PR also introduces the pypi_push.sh
> > > <https://github.com/apache/incubator-airflow/pull/3572/files#diff-8fae684cdcc8cc8df2232c8df16f64cb>
> > > script, which will generate all the JS static assets before creating
> > > and uploading the package.
> > >
> > > Please let me know if you have any questions or suggestions, and I'd
> > > be happy to answer them.
> > >
> > > Best,
> > > *Verdan Mahmood*
> > > (+31) 655 576 560
> >
> >
>
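For reference, the CopyWebpackPlugin approach mentioned in the quoted message
might look roughly like the following webpack.config.js fragment (webpack
4-era array-pattern API; all paths and entry names are illustrative
assumptions, not Airflow's actual config):

```javascript
// Illustrative webpack.config.js sketch, not the real Airflow config.
const path = require('path');
const CopyWebpackPlugin = require('copy-webpack-plugin');

module.exports = {
  entry: './index.js',
  output: {
    path: path.resolve(__dirname, 'static/dist'),
    filename: '[name].js',
  },
  plugins: [
    // Copy vendored dependencies (e.g. the patched dagre-d3 build)
    // straight into the static output instead of bundling them.
    new CopyWebpackPlugin([
      { from: 'node_modules/dagre-d3/dist/dagre-d3.min.js' },
    ]),
  ],
};
```

Copying keeps such patched or webpack-incompatible libraries loadable as
plain script tags until they can be replaced with proper npm versions.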






Kaxil Naik 

Data Reply
2nd Floor, Nova South
160 Victoria Street, Westminster
London SW1E 5LB - UK 
phone: +44 (0)20 7730 6000
k.n...@reply.com
www.reply.com


Re: Airflow's JS code (and dependencies) manageable via npm and webpack

2018-07-23 Thread Verdan Mahmood
Heads-up!! This frontend change has been merged in master branch recently.
This will impact the users working on Airflow RBAC UI only. That means:

*If you are a contributor/developer of Apache Airflow:*
You'll need to install and build the frontend packages if you want to run
the web UI.
Please make sure to read the new section, "Setting up the node / npm
javascript environment"


in CONTRIBUTING.md

*If you are using Apache Airflow in your production environment:*
Nothing will impact you, as every new build of Apache Airflow will come up
with pre-built dependencies.

Please let me know if you have any questions. Thank you

Best,
*Verdan Mahmood*


On Sun, Jul 15, 2018 at 6:52 PM Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> Glad to see this is happening!
>
> Max
>
> On Mon, Jul 9, 2018 at 6:37 AM Ash Berlin-Taylor <
> ash_airflowl...@firemirror.com> wrote:
>
> > Great! Thanks for doing this. I've left some review comments on your PR.
> >
> > -ash
> >
> > > On 9 Jul 2018, at 11:45, Verdan Mahmood 
> > wrote:
> > >
> > > ​Hey Guys, ​
> > >
> > > In an effort to simplify the JS dependencies of Airflow
> > > ​​
> > > ,
> > > ​I've
> > > introduce
> > > ​d​
> > > npm and webpack for the package management. For now, it only implements
> > > this in the www_rbac version of the web server.
> > > ​
> > >
> > > Pull Request: https://github.com/apache/incubator-airflow/pull/3572
> > >
> > > The problem with the
> > > ​existing ​
> > > frontend (
> > > ​JS
> > > ) code of Airflow is that most of the custom JS is written
> > > ​with​
> > > in the html files, using the Flask's (Jinja) variables in that JS. The
> > next
> > > step of this effort would be to extract that custom
> > > ​JS
> > > code in separate JS files
> > > ​,​
> > > use the dependencies in those files using require or import
> > > ​ and introduce the JS automated test suite eventually. ​
> > > (At the moment, I'm simply using the CopyWebPackPlugin to copy the
> > required
> > > dependencies for use)
> > > ​.
> > >
> > > There are also some dependencies which are directly modified in the
> > codebase
> > > ​ or are outdated​
> > > . I couldn't found the
> > > ​ correct​
> > > npm versions of those libraries. (dagre-d3.js and gantt-chart-d3v2.js).
> > > Apparently dagre-d3.js that we are using is one of the gist or is very
> > old
> > > version
> > > ​ not supported with webpack 4​
> > > , while the gantt-chart-d3v2 has been modified according to Airflow's
> > > requirements
> > > ​ I believe​
> > > .
> > > ​ Used the existing libraries for now. ​
> > >
> > > ​I am currently working in a separate branch to upgrade the DagreD3
> > > library, and updating the custom JS related to DagreD3 accordingly. ​
> > >
> > > This PR also introduces the pypi_push.sh
> > > <
> >
> https://github.com/apache/incubator-airflow/pull/3572/files#diff-8fae684cdcc8cc8df2232c8df16f64cb
> > >
> > > script that will generate all the JS statics before creating and
> > uploading
> > > the package.
> > > ​
> > > ​Please let me know if you guys have any questions or suggestions and
> I'd
> > > be happy to answer that. ​
> > >
> > > Best,
> > > *Verdan Mahmood*
> > > (+31) 655 576 560
> >
> >
>