Re: Configuring/sharing Airflow github repo security alerts

2018-12-18 Thread Feng Lu
Cool, thank you Ash. Kindly let us know when you have opened the INFRA jira
ticket.

On Tue, Dec 18, 2018 at 2:21 AM Ash Berlin-Taylor 
wrote:

> We're not admins of the repo - only the ASF Infra team are, so we'll
> have to open a ticket against the INFRA queue in Jira asking for this
>
> (I haven't done this. Not on large device right now)
>
> -a
>
> Feng Lu wrote on 18/12/2018 08:01:
> > Hi all,
> >
> > Looks like GitHub now adds a new "Security Alert" feature
> > <https://help.github.com/articles/viewing-and-updating-vulnerable-dependencies-in-your-repository/>
> > for tracking dependency CVEs, unfortunately I couldn't find it in Airflow
> > repo. <https://github.com/apache/incubator-airflow/pulse> So if it makes
> > sense to the community, could Airflow repo admin (assume it means PMC
> > members ;p) help to enable the alert feature and make it publicly
> > available?
> >
> > Happy to take a stab myself if I have the access permission.
> > Thanks.
> >
> > Feng
> >
>
>


Configuring/sharing Airflow github repo security alerts

2018-12-18 Thread Feng Lu
Hi all,

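Looks like GitHub has added a new "Security Alert" feature
<https://help.github.com/articles/viewing-and-updating-vulnerable-dependencies-in-your-repository/>
for tracking dependency CVEs. Unfortunately, I couldn't find it in the Airflow
repo <https://github.com/apache/incubator-airflow/pulse>. So if it makes
sense to the community, could an Airflow repo admin (I assume that means PMC
members ;p) help enable the alert feature and make it publicly
available?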

Happy to take a stab myself if I have the access permission.
Thanks.

Feng


Re: explicit_defaults_for_timestamp for mysql

2018-10-29 Thread Feng Lu
I haven't tested the part where database tables are created with one flag
but accessed under a different flag, but the changes have been working for us
so far.

On Tue, Oct 23, 2018 at 10:09 PM Bolke de Bruin  wrote:

> We only need it at table creation time or alter table time during which an
> alembic script would fail if MySQL restarts I assume?
>
> I'm not sure if the PR in this way is required (but if it works and works
> well it's okay to me too just like consistency across DBS and no surprises
> with MySQL )
>
> Sent from my iPhone
>
> On 24 Oct 2018, at 05:18, Feng Lu  wrote:
>
> Sorry for the late reply.
> GCP (CloudSQL) does support setting this parameter at session level but
> the VM used to host the mysqld might be restarted at any time, so it can't
> be done reliably.
>
> Haotian (cc-ed) in my team has looked into the needed schema changes to
> make Airflow 1.10 timestamp support work with MySQL without setting
> the explicit_defaults_for_timestamp flag in MySQL, details below:
>
> @@ -40,10 +40,6 @@
>      conn = op.get_bind()
>      if conn.dialect.name == 'mysql':
>          conn.execute("SET time_zone = '+00:00'")
> -        cur = conn.execute("SELECT @@explicit_defaults_for_timestamp")
> -        res = cur.fetchall()
> -        if res[0][0] == 0:
> -            raise Exception("Global variable explicit_defaults_for_timestamp needs to be on (1) for mysql")
>          op.alter_column(table_name='chart', column_name='last_modified', type_=mysql.TIMESTAMP(fsp=6))
> @@ -69,20 +65,28 @@
>          op.alter_column(table_name='log', column_name='dttm', type_=mysql.TIMESTAMP(fsp=6))
>          op.alter_column(table_name='log', column_name='execution_date', type_=mysql.TIMESTAMP(fsp=6))
> -        op.alter_column(table_name='sla_miss', column_name='execution_date', type_=mysql.TIMESTAMP(fsp=6), nullable=False)
> +        op.alter_column(table_name='sla_miss', column_name='execution_date', type_=mysql.TIMESTAMP(fsp=6), \
> +                nullable=False, server_default=sa.text('CURRENT_TIMESTAMP(6)'))
>          op.alter_column(table_name='sla_miss', column_name='timestamp', type_=mysql.TIMESTAMP(fsp=6))
> -        op.alter_column(table_name='task_fail', column_name='execution_date', type_=mysql.TIMESTAMP(fsp=6))
> +        op.alter_column(table_name='task_fail', column_name='execution_date', type_=mysql.TIMESTAMP(fsp=6), \
> +                nullable=False, server_default=sa.text('CURRENT_TIMESTAMP(6)'))
>          op.alter_column(table_name='task_fail', column_name='start_date', type_=mysql.TIMESTAMP(fsp=6))
>          op.alter_column(table_name='task_fail', column_name='end_date', type_=mysql.TIMESTAMP(fsp=6))
> -        op.alter_column(table_name='task_instance', column_name='execution_date', type_=mysql.TIMESTAMP(fsp=6), nullable=False)
> +        op.alter_column(table_name='task_instance', column_name='execution_date', type_=mysql.TIMESTAMP(fsp=6), \
> +                nullable=False, server_default=sa.text('CURRENT_TIMESTAMP(6)'))
>          op.alter_column(table_name='task_instance', column_name='start_date', type_=mysql.TIMESTAMP(fsp=6))
>          op.alter_column(table_name='task_instance', column_name='end_date', type_=mysql.TIMESTAMP(fsp=6))
>          op.alter_column(table_name='task_instance', column_name='queued_dttm', type_=mysql.TIMESTAMP(fsp=6))
> -        op.alter_column(table_name='xcom', column_name='timestamp', type_=mysql.TIMESTAMP(fsp=6))
> -        op.alter_column(table_name='xcom', column_name='execution_date', type_=mysql.TIMESTAMP(fsp=6))
> +        op.alter_column(table_name='xcom', column_name='timestamp', type_=mysql.TIMESTAMP(fsp=6), \
> +                nullable=False, server_default=sa.text('CURRENT_TIMESTAMP(6)'))
> +        op.alter_column(table_name='xcom', column_name='execution_date', type_=mysql.TIMESTAMP(fsp=6), \
> +                nullable=False, server_default=sa.text('CURRENT_TIMESTAMP(6)'))
> +        conn.execute("alter table task_instance alter column execution_date drop default")
> +        conn.execute("alter table sla_miss alter column execution_date drop default")
> +        conn.execute("alter table task_fail alter column execution_date drop default")
>      else:
>          # sqlite datetime is fine as is not converting
>          if conn.dialect.name == 'sqlite':
> --- migrations/versions/f23433877c24_fix_mysql_not_null_constraint.py
> +++ migrations/versions/f23433877c24_fix_mysql_not_null_constraint.py
> @@ -39,10 +39,15 @@
>      conn = op.get_bind()
>      if conn.dialect.name == 'mysql':
>          conn.execute("SET time_zone = '+00:00'")
> -        op.alter_column('task_fail', 'execution_date', existing_type=mysql.TIMESTAMP(fsp=6), nullable=Fa
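
For readers following the patch above: it appears to give each NOT NULL
TIMESTAMP column an explicit server default, so that MySQL does not fall back
on its implicit defaults when explicit_defaults_for_timestamp is off, and then
drops that default again for some tables. A minimal sketch of the equivalent
raw MySQL statements is below. The helper name not_null_timestamp_ddl is
hypothetical, and the real migration goes through alembic's op.alter_column
rather than hand-written strings.

```python
def not_null_timestamp_ddl(table, column, drop_default=True):
    """Return MySQL statements that make `column` a NOT NULL TIMESTAMP(6).

    Hypothetical helper for illustration only; it just builds strings and
    never talks to a database.
    """
    statements = [
        # An explicit server default avoids the implicit defaults MySQL
        # would otherwise assign when explicit_defaults_for_timestamp is off.
        "ALTER TABLE {t} MODIFY {c} TIMESTAMP(6) NOT NULL "
        "DEFAULT CURRENT_TIMESTAMP(6)".format(t=table, c=column),
    ]
    if drop_default:
        # Remove the default again so that inserts must supply a value.
        statements.append(
            "ALTER TABLE {t} ALTER COLUMN {c} DROP DEFAULT".format(t=table, c=column)
        )
    return statements


if __name__ == "__main__":
    for stmt in not_null_timestamp_ddl("task_fail", "execution_date"):
        print(stmt)
```

In the patch, the default is dropped for task_instance, sla_miss and
task_fail but kept for xcom, which corresponds to drop_default=False here.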

Re: explicit_defaults_for_timestamp for mysql

2018-10-23 Thread Feng Lu
xplicit_defaults_for_timestamp
>
> indicates that it can be set on the session level as well. So we could
> just change the alembic scripts to try it. However
> MariaDB does not support it in a session so we always need to check the
> variable. We will also need to set it at *every*
> alembic script that deals with datetimes in the future. Nevertheless this
> might be the easiest solution.
>
> Does GCP’s MySQL also allow this setting in the session scope?
>
> B.
>
> On 19 Oct 2018, at 18:48, Deng Xiaodong  wrote:
>
> I'm ok to test this.
>
> @ash, may you kindly give some examples of what exact behaviour the testers
> should pay attention to? Since people like me may not know the full
> background of having introduced this restriction & check, or what issue it
> was trying to address.
>
> @Feng Lu, may you please advise if you are still interested to prepare this
> PR?
>
> Thanks!
>
>
> XD
>
> On Sat, Oct 20, 2018 at 12:38 AM Ash Berlin-Taylor  wrote:
>
> This sounds sensible and would mean we could also run on GCP's MySQL
> offering too.
>
> This would need someone to try out and check that timezones behave
> sensibly with this change made.
>
> Any volunteers?
>
> -ash
>
> On 19 Oct 2018, at 17:32, Deng Xiaodong  wrote:
>
> Wondering if there are any further thoughts about this proposal kindly
> raised by Feng Lu earlier?
>
> If we can skip this check & allow explicit_defaults_for_timestamp to be
> 0, it would be helpful, especially for enterprise users in whose
> environments it’s really hard to ask for a database global variable change
> (like myself…).
>
>
>
> XD
>
> On 2018/08/28 15:23:10, Feng Lu  wrote:
>
> Bolke, a gentle ping..
> Thank you.
>
> On Thu, Aug 23, 2018, 23:01 Feng Lu wrote:
>
> Hi all,
>
> After reading the MySQL documentation on
> explicit_defaults_for_timestamp, it appears that we can skip the check
> on explicit_defaults_for_timestamp = 1
> <https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/0e2a74e0fc9f_add_time_zone_awareness.py#L43>
> by setting the column to accept NULL explicitly. For example:
>
> op.alter_column(table_name='chart', column_name='last_modified',
> type_=mysql.TIMESTAMP(fsp=6)) -->
> op.alter_column(table_name='chart', column_name='last_modified',
> type_=mysql.TIMESTAMP(fsp=6), nullable=True)
>
> Here's why:
> From MySQL doc (when explicit_defaults_for_timestamp is set to True):
> "TIMESTAMP columns not explicitly declared with the NOT NULL attribute are
> automatically declared with the NULL attribute and permit NULL values.
> Assigning such a column a value of NULL sets it to NULL, not the current
> timestamp."
>
> Thanks and happy to shoot a PR if it makes sense.
>
> Feng
>
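
The session-level idea discussed in this thread (try to set the variable for
the current session, then still check it because MariaDB may reject the SET)
can be sketched roughly as follows. This is only an illustration under
assumptions: ensure_explicit_defaults is a hypothetical name, and `execute`
stands in for something like conn.execute(...).fetchall() in an alembic
script.

```python
def ensure_explicit_defaults(execute):
    """Try to enable explicit_defaults_for_timestamp for this session.

    `execute` is any callable that runs one SQL statement and returns the
    fetched rows (hypothetical stand-in for a real connection).
    Returns True when the variable ends up enabled.
    """
    try:
        execute("SET SESSION explicit_defaults_for_timestamp = 1")
    except Exception:
        # Servers that reject the session-level SET (MariaDB, per the
        # thread) fall through to the check below.
        pass
    rows = execute("SELECT @@explicit_defaults_for_timestamp")
    return bool(rows) and rows[0][0] == 1


if __name__ == "__main__":
    # Fake server that accepts the SET and then reports the flag as on.
    def fake_server(sql):
        return [(1,)] if sql.startswith("SELECT") else []

    print(ensure_explicit_defaults(fake_server))
```

As noted in the thread, every migration that touches datetime columns would
need to call something like this, since a session setting does not persist.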


[Reminder] Sep 24 Airflow Bay Area Meetup @ Google

2018-09-15 Thread Feng Lu
Hi all,

The Sep Airflow meetup is only one week away and I am excited to share with
you the detailed agenda! We have a really good mix of Airflow talks that
span testing, deployment, best practices, production, and future
directions. Details can be found on the meetup site.

Looking forward to seeing you all there on Sep 24 at the Google Sunnyvale campus!
Have a great weekend.

Feng


Re: Sep Airflow Bay Area Meetup @ Google

2018-09-15 Thread Feng Lu
Not going to happen this time; we didn't receive enough interest from
the community.


On Wed, Sep 12, 2018 at 7:57 AM Bolke de Bruin  wrote:

> btw how are we doing on the “one day” hackathon?
>
> > On 12 Sep 2018, at 16:49, Bolke de Bruin  wrote:
> >
> > Hi feng,
> >
> > I can do “Elegant pipelining with Airflow” recycle of pydata 2018
> amsterdam (that I did together with Fokko).
> >
> > Cheers
> > Bolke
> >
> >> On 4 Sep 2018, at 22:13, Feng Lu  wrote:
> >>
> >> We are 3 weeks away from the meetup and still have a few lightning talks
> >> open, please take the chance and share your cool ideas/work ;)
> >> Meanwhile, speakers could you please send me and Trishka (
> tris...@google.com)
> >> your slides?
> >>
> >> Thank you.
> >>
> >> Feng
> >>
> >> On Sun, Aug 12, 2018 at 9:46 PM Maxime Beauchemin <
> >> maximebeauche...@gmail.com> wrote:
> >>
> >>> Hey Feng,
> >>>
> >>> Sign me up for a session on "Challenges ahead - taking airflow to the
> next
> >>> level". I'm planning on recycling the content from the talk @Google
> next
> >>> Friday.
> >>>
> >>> Max
> >>>
> >>> On Fri, Aug 10, 2018 at 3:22 PM Feng Lu 
> wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> We still have 1-2 regular sessions and 4-5 lightning sessions available,
> >>>> please send in your talks ;)
> >>>> Here's a quick summary on the talks I have received:
> >>>>
> >>>> Regular sessions:
> >>>> Ben Gregory (Astronomer): Running Cloud Native Airflow.
> >>>> Feng Lu (Google): Managing Airflow As a Service: Best Practices,
> >>> Experience
> >>>> and Roadmap
> >>>> Fokko Driesprong (GoDataDriven): Apache Airflow in the Google Cloud:
> >>>> Backfilling streaming data using Dataflow
> >>>>
> >>>> Lightning Session:
> >>>> Barni Seetharaman (Google): Deploy Airflow on Kubernetes using Airflow
> >>>> Operator
> >>>>
> >>>> Session type TBD:
> >>>> Manish Ranjan (Tile): Functional yet cost-effective Data Engineering
> With
> >>>> Airflow
> >>>>
> >>>> Thanks and looking forward to the meetup(120+ sign-ups to date)!
> >>>>
> >>>>
> >>>> Feng
> >>>>
> >>>> On Thu, Jul 19, 2018 at 2:26 PM Feng Lu  wrote:
> >>>>
> >>>>> Hi all,
> >>>>>
> >>>>> Hope you are enjoying your summer!
> >>>>>
> >>>>> This is Feng Lu from Google and we'll host the next Airflow meetup at
> >>>>> our Sunnyvale campus. We plan to add a *lightning session* this time for
> >>>>> people to share their Airflow ideas, work in progress, pain points, etc.
> >>>>> Here's the meetup date and schedule:
> >>>>>
> >>>>> -- Sep 24 (Monday)  --
> >>>>> 6:00PM meetup starts
> >>>>> 6:00 - 8:00PM light dinner /mix-n-mingle
> >>>>> 8:00PM - 9:40PM: 5 sessions (20 minutes each)
> >>>>> 9:40PM to 10:10PM: 6 lightning sessions (5 minutes each)
> >>>>> 10:10PM to 11:00PM: drinks and social hour
> >>>>>
> >>>>> I've seen a lot of interesting discussions in the dev mailing-list on
> >>>>> security, scalability, event interactions, future directions, hosting
> >>>>> platform and others. Please feel free to send your talk proposal to us by
> >>>>> replying to this email.
> >>>>>
> >>>>> The Cloud Composer team is also going to share their experience
> running
> >>>>> Apache Airflow as a managed solution and service roadmap.
> >>>>>
> >>>>> Thank you and looking forward to hearing from y'all soon!
> >>>>>
> >>>>> p.s., if folks are interested, we can also add a one-day Airflow
> >>>> hackathon
> >>>>> prior to the meet-up on the same day, please let us know.
> >>>>>
> >>>>> Feng
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >
>
>


Re: Airflow Bay Area Meetup

2018-09-04 Thread Feng Lu
+1, the content looks great, please send in your talk title and abstract.
Thanks.

Feng

On Tue, Sep 4, 2018 at 6:54 PM Chandu Kavar  wrote:

> Hey Feng/Trishka,
>
> I am Chandu working at Grab Singapore as a Data Engineer. I am interested
> in giving a lightning talk on Airflow Testing that we recently started in my
> project.
>
> Also, I have written a blog on Airflow testing, you can find it here
> 
>
> Let me know if you think it's good to share.
>
> Thanks,
> Chandu
>


Re: Sep Airflow Bay Area Meetup @ Google

2018-09-04 Thread Feng Lu
We are 3 weeks away from the meetup and still have a few lightning talks
open, please take the chance and share your cool ideas/work ;)
Meanwhile, speakers could you please send me and Trishka (tris...@google.com)
your slides?

Thank you.

Feng

On Sun, Aug 12, 2018 at 9:46 PM Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> Hey Feng,
>
> Sign me up for a session on "Challenges ahead - taking airflow to the next
> level". I'm planning on recycling the content from the talk @Google next
> Friday.
>
> Max
>
> On Fri, Aug 10, 2018 at 3:22 PM Feng Lu  wrote:
>
> > Hi all,
> >
> > We still have 1-2 regular sessions and 4-5 lightning sessions available,
> > please send in your talks ;)
> > Here's a quick summary on the talks I have received:
> >
> > Regular sessions:
> > Ben Gregory (Astronomer): Running Cloud Native Airflow.
> > Feng Lu (Google): Managing Airflow As a Service: Best Practices,
> Experience
> > and Roadmap
> > Fokko Driesprong (GoDataDriven): Apache Airflow in the Google Cloud:
> > Backfilling streaming data using Dataflow
> >
> > Lightning Session:
> > Barni Seetharaman (Google): Deploy Airflow on Kubernetes using Airflow
> > Operator
> >
> > Session type TBD:
> > Manish Ranjan (Tile): Functional yet cost-effective Data Engineering With
> > Airflow
> >
> > Thanks and looking forward to the meetup(120+ sign-ups to date)!
> >
> >
> > Feng
> >
> > On Thu, Jul 19, 2018 at 2:26 PM Feng Lu  wrote:
> >
> > > Hi all,
> > >
> > > Hope you are enjoying your summer!
> > >
> > > This is Feng Lu from Google and we'll host the next Airflow meetup at
> > > our Sunnyvale campus. We plan to add a *lightning session* this time for
> > > people to share their Airflow ideas, work in progress, pain points, etc.
> > > Here's the meetup date and schedule:
> > >
> > > -- Sep 24 (Monday)  --
> > > 6:00PM meetup starts
> > > 6:00 - 8:00PM light dinner /mix-n-mingle
> > > 8:00PM - 9:40PM: 5 sessions (20 minutes each)
> > > 9:40PM to 10:10PM: 6 lightning sessions (5 minutes each)
> > > 10:10PM to 11:00PM: drinks and social hour
> > >
> > > I've seen a lot of interesting discussions in the dev mailing-list on
> > > security, scalability, event interactions, future directions, hosting
> > > platform and others. Please feel free to send your talk proposal to us by
> > > replying to this email.
> > >
> > > The Cloud Composer team is also going to share their experience running
> > > Apache Airflow as a managed solution and service roadmap.
> > >
> > > Thank you and looking forward to hearing from y'all soon!
> > >
> > > p.s., if folks are interested, we can also add a one-day Airflow
> > hackathon
> > > prior to the meet-up on the same day, please let us know.
> > >
> > > Feng
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> >
>


Re: Python 3.6 Support for Airflow 1.10.0

2018-08-28 Thread Feng Lu
+1 for keeping 2.7 as long as we can so people have time to plan and
migrate away from it.

On Tue, Aug 28, 2018, 10:35 Arthur Wiedmer  wrote:

> Given that Python 2.7 EOL is slated for January 1st 2020, we should
> probably ensure that the early releases of 2019 are still 2.7 compatible.
>
> Beyond this, I think we can also be responsible security wise and help
> nudge people towards 3.
>
> Best,
> Arthur
>
> On Tue, Aug 28, 2018 at 10:28 AM Bolke de Bruin  wrote:
>
> > Let’s not drop 2.7 too quickly but maybe mark it deprecated. I’m pretty
> > sure Airbnb still runs on 2.7.
> >
> > Also RedHat does not deliver python 3 in its enterprise edition yet by
> > default so it will put enterprise users in a bit of an awkward spot.
> >
> > B.
> >
> > Sent from my iPad
> >
> > > On 28 Aug 2018, at 19:00, Sid Anand wrote:
> > >
> > > I'm +1 on going to 3.7 -- I'm running 3.6 myself.
> > >
> > > Regarding dropping Python2 support, with almost 200 companies using
> > > Airflow, I'd want to be very careful that we don't put any of them at a
> > > disadvantage. For example, my former employer (a small startup) is running
> > > on Python2 -- after I left, they don't have anyone actively maintaining it
> > > at the company. Easing upgrades for such cases will keep them using Airflow.
> > >
> > > It would be good to hold a survey that we promote beyond daily readers of
> > > this mailing list and raise this as an AIP, since it's a major change.
> > > Let's not rush it.
> > >
> > > -s
> > >
> > >> On Tue, Aug 28, 2018 at 9:24 AM Naik Kaxil  wrote:
> > >>
> > >> We should definitely support 3.7. I left comments on the PR @tedmiston
> > >> regarding the same. Python 2.7 will be dropped in 2020, so I guess we
> > >> should start planning for it. Not really 100% sure though that we should
> > >> drop it in Airflow 2.0
> > >>
> > >> On 28/08/2018, 17:08, "Taylor Edmiston"  wrote:
> > >>
> > >>    I am on board with dropping Python 2.x support. Django officially
> > >>    dropped Python 2.x support with their 2.0 release in December 2017.
> > >>
> > >>*Taylor Edmiston*
> > >>
> > >>    On Tue, Aug 28, 2018 at 12:03 PM Ash Berlin-Taylor wrote:
> > >>
> > >>> Supporting 3.7 is absolutely something we should do - it just got released
> > >>> while we were already mid-way through the release process of 1.10 and
> > >>> didn't want the scope creep.
> > >>>
> > >>> I'm happy to release a 1.10.1 that supports Py 3.7. The only issue I've
> > >>> seen so far is around the use of `async` as a keyword. both in
> > >>>
> > >>> A perhaps bigger question: What are people's thoughts on dropping support
> > >>> for Python2? This wouldn't happen before 2.0 at the earliest if we did it.
> > >>> Probably something to raise an AIP for.
> > >>>
> > >>> -ash
> > >>>
> > 
> > >>
> > >> Kaxil Naik
> > >>
> > >> Data Reply
> > >> 2nd Floor, Nova South
> > >> 160 Victoria Street, Westminster
> > >> London SW1E 5LB - UK
> > >> phone: +44 (0)20 7730 6000
> > >> k.n...@reply.com
> > >> www.reply.com
> > >> On 28 Aug 2018, at 16:56, Taylor Edmiston wrote:
> >
> >  We have also been running on 3.6 for some time.
> >
> >  I put a quick branch together adding / upgrading to 3.6 in all of the
> >  places. CI is still running so I expect there may be some test failures
> >  but hopefully nothing major. I would be happy to merge this into Kaxil's
> >  current #3815 or as a follow-on PR. I'll paste this back onto his PR as
> >  well.
> >
> >  https://github.com/apache/incubator-airflow/pull/3816
> >
> >  I think it's important for the project to officially support Python 3.6
> >  latest, especially since 3.7 is out now. While we're on the topic, does
> >  anyone else have thoughts on supporting 3.7 (perhaps unofficially to
> >  start)? I wouldn't mind starting to get that ball rolling.
> > 
> >  *Taylor Edmiston*
> >
> >  On Tue, Aug 28, 2018 at 9:29 AM Adam Boscarino
> >   wrote:
> > 
> > > fwiw, we run Airflow on Python 3.6.
> > >
> > > On Tue, Aug 28, 2018 at 8:30 AM Naik Kaxil 
> > >> wrote:
> > >
> > >> To provide more context to the issue:
> > >>
> > >>
> > >>
> > >> PyPI shows that Airflow 
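
On the `async` issue mentioned in this thread: Python 3.7 made `async` and
`await` reserved keywords, so code that uses either as an argument, attribute
or module name no longer parses there. A rough way to locate such uses in a
source file is to scan its tokens, as in the sketch below. The function name
find_reserved_names is hypothetical and this is not how the Airflow
maintainers tracked the problem down.

```python
import io
import tokenize

RESERVED_IN_37 = {"async", "await"}
# Depending on the interpreter version, these words may be reported as
# NAME, ASYNC or AWAIT tokens, so accept all three token names.
NAME_LIKE = {"NAME", "ASYNC", "AWAIT"}


def find_reserved_names(source):
    """Return (line, name) pairs where `async`/`await` appear as tokens."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.string in RESERVED_IN_37 and tokenize.tok_name[tok.type] in NAME_LIKE:
            hits.append((tok.start[0], tok.string))
    return hits


if __name__ == "__main__":
    sample = "def run(task, async=False):\n    return task\n"
    print(find_reserved_names(sample))
```

The scan also reports legitimate `async def` / `await` uses, so the hits
still need a manual look; it only narrows down where to check.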

Re: explicit_defaults_for_timestamp for mysql

2018-08-28 Thread Feng Lu
Bolke, a gentle ping..
Thank you.

On Thu, Aug 23, 2018, 23:01 Feng Lu  wrote:

> Hi all,
>
> After reading the MySQL documentation on
> explicit_defaults_for_timestamp, it appears that we can skip the check on
> explicit_defaults_for_timestamp = 1
> <https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/0e2a74e0fc9f_add_time_zone_awareness.py#L43>
> by setting the column to accept NULL explicitly. For example:
>
> op.alter_column(table_name='chart', column_name='last_modified',
> type_=mysql.TIMESTAMP(fsp=6)) -->
> op.alter_column(table_name='chart', column_name='last_modified',
> type_=mysql.TIMESTAMP(fsp=6), nullable=True)
>
> Here's why:
> From MySQL doc (when explicit_defaults_for_timestamp is set to True):
> "TIMESTAMP columns not explicitly declared with the NOT NULL attribute are
> automatically declared with the NULL attribute and permit NULL values.
> Assigning such a column a value of NULL sets it to NULL, not the current
> timestamp."
>
> Thanks and happy to shoot a PR if it makes sense.
>
> Feng
>
>
>


explicit_defaults_for_timestamp for mysql

2018-08-24 Thread Feng Lu
Hi all,

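After reading the MySQL documentation on
explicit_defaults_for_timestamp, it appears that we can skip the check
on explicit_defaults_for_timestamp = 1
<https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/0e2a74e0fc9f_add_time_zone_awareness.py#L43>
by setting the column to accept NULL explicitly. For example: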

op.alter_column(table_name='chart', column_name='last_modified',
type_=mysql.TIMESTAMP(fsp=6)) -->
op.alter_column(table_name='chart', column_name='last_modified',
type_=mysql.TIMESTAMP(fsp=6), nullable=True)

Here's why:
From MySQL doc (when explicit_defaults_for_timestamp is set to True):
"TIMESTAMP columns not explicitly declared with the NOT NULL attribute are
automatically declared with the NULL attribute and permit NULL values.
Assigning such a column a value of NULL sets it to NULL, not the current
timestamp."

Thanks and happy to shoot a PR if it makes sense.

Feng
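
The reasoning above can be written down as a small decision rule: an explicit
NULL or NOT NULL in the column definition always wins, and the server
variable only matters when the DDL says nothing. The sketch below models just
that rule; timestamp_is_nullable is a hypothetical name, and it does not
cover the other side effects of the flag (such as NULL assignments turning
into the current timestamp when it is off).

```python
def timestamp_is_nullable(declared, explicit_defaults):
    """Model whether a MySQL TIMESTAMP column accepts NULL.

    declared: "NULL", "NOT NULL", or None when the DDL says nothing.
    explicit_defaults: value of explicit_defaults_for_timestamp (bool).
    """
    if declared == "NULL":
        return True
    if declared == "NOT NULL":
        return False
    # Nothing declared: with the flag on the column permits NULL,
    # with the flag off MySQL silently makes it NOT NULL.
    return bool(explicit_defaults)


if __name__ == "__main__":
    for flag in (True, False):
        print(flag, timestamp_is_nullable("NULL", flag), timestamp_is_nullable(None, flag))
```

Passing nullable=True in op.alter_column corresponds to the "NULL" case here,
which is why the result no longer depends on the server setting.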


Re: Sep Airflow Bay Area Meetup @ Google

2018-08-10 Thread Feng Lu
Hi all,

We still have 1-2 regular sessions and 4-5 lightning sessions available,
please send in your talks ;)
Here's a quick summary on the talks I have received:

Regular sessions:
Ben Gregory (Astronomer): Running Cloud Native Airflow.
Feng Lu (Google): Managing Airflow As a Service: Best Practices, Experience
and Roadmap
Fokko Driesprong (GoDataDriven): Apache Airflow in the Google Cloud:
Backfilling streaming data using Dataflow

Lightning Session:
Barni Seetharaman (Google): Deploy Airflow on Kubernetes using Airflow
Operator

Session type TBD:
Manish Ranjan (Tile): Functional yet cost-effective Data Engineering With
Airflow

Thanks and looking forward to the meetup(120+ sign-ups to date)!


Feng

On Thu, Jul 19, 2018 at 2:26 PM Feng Lu  wrote:

> Hi all,
>
> Hope you are enjoying your summer!
>
> This is Feng Lu from Google and we'll host the next Airflow meetup at our
> Sunnyvale campus. We plan to add a *lightning session* this time for people
> to share their Airflow ideas, work in progress, pain points, etc.
> Here's the meetup date and schedule:
>
> -- Sep 24 (Monday)  --
> 6:00PM meetup starts
> 6:00 - 8:00PM light dinner /mix-n-mingle
> 8:00PM - 9:40PM: 5 sessions (20 minutes each)
> 9:40PM to 10:10PM: 6 lightning sessions (5 minutes each)
> 10:10PM to 11:00PM: drinks and social hour
>
> I've seen a lot of interesting discussions in the dev mailing-list on
> security, scalability, event interactions, future directions, hosting
> platform and others. Please feel free to send your talk proposal to us by
> replying to this email.
>
> The Cloud Composer team is also going to share their experience running
> Apache Airflow as a managed solution and service roadmap.
>
> Thank you and looking forward to hearing from y'all soon!
>
> p.s., if folks are interested, we can also add a one-day Airflow hackathon
> prior to the meet-up on the same day, please let us know.
>
> Feng
>
>
>
>
>
>
>
>
>


Re: Sep Airflow Bay Area Meetup @ Google

2018-07-24 Thread Feng Lu
The meetup event is now available for people to sign up:
https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/events/253105418/.
yay!

On Mon, Jul 23, 2018 at 3:40 PM Chris Riccomini 
wrote:

> @Feng Lu apparently you're listed as an EVENT
> ORGANIZER on the group. I believe that should allow you to create meetups.
> If not, can you let me know?
>
> On Sun, Jul 22, 2018 at 12:36 PM Ben Gregory  wrote:
>
>> Will do Feng!
>>
>> Also - is there an approximate date we'll know if the hackathon is going to
>> happen? Want to make sure we can get a good attendance internally.
>>
>> Looking forward to it!
>>
>> On Sat, Jul 21, 2018 at 10:44 AM Feng Lu  wrote:
>>
>> > Sounds great, thank you Ben.
>> > When you get a chance, could you please send me your talk
>> > title/abstract/session type (regular or lightning)?
>> >
>> > On Fri, Jul 20, 2018 at 2:10 PM Ben Gregory  wrote:
>> >
>> >> Hey Feng!
>> >>
>> >> Awesome to hear that you're hosting the next meetup! We'd love to give
>> a
>> >> talk (and potentially a lightning session if available) -- we have a
>> number
>> >> of topics we could speak on but off the top of our heads we're thinking
>> >> "Running Cloud Native Airflow", tying in some of our work on the
>> Kubernetes
>> >> Executor. How does that sound?
>> >>
>> >> Also, if there ends up being an Airflow hackathon, you can absolutely
>> >> count us in. Let us know how we can help coordinate if the need
>> presents
>> >> itself!
>> >>
>> >> -Ben
>> >>
>> >> On Thu, Jul 19, 2018 at 3:26 PM Feng Lu 
>> >> wrote:
>> >>
>> >>> Hi all,
>> >>>
>> >>> Hope you are enjoying your summer!
>> >>>
>> >>> This is Feng Lu from Google and we'll host the next Airflow meetup at
>> >>> our Sunnyvale campus (1155 Borregas Ave, Sunnyvale, CA 94089). We plan
>> >>> to add a *lightning session* this time for people to share their
>> >>> Airflow ideas, work in progress, pain points, etc.
>> >>> Here's the meetup date and schedule:
>> >>>
>> >>> -- Sep 24 (Monday)  --
>> >>> 6:00PM meetup starts
>> >>> 6:00 - 8:00PM light dinner /mix-n-mingle
>> >>> 8:00PM - 9:40PM: 5 sessions (20 minutes each)
>> >>> 9:40PM to 10:10PM: 6 lightning sessions (5 minutes each)
>> >>> 10:10PM to 11:00PM: drinks and social hour
>> >>>
>> >>> I've seen a lot of interesting discussions in the dev mailing-list on
>> >>> security, scalability, event interactions, future directions, hosting
>> >>> platform and others. Please feel free to send your talk proposal to us
>> >>> by replying to this email.
>> >>>
>> >>> The Cloud Composer team is also going to share their experience
>> running
>> >>> Apache Airflow as a managed solution and service roadmap.
>> >>>
>> >>> Thank you and looking forward to hearing from y'all soon!
>> >>>
>> >>> p.s., if folks are interested, we can also add a one-day Airflow
>> >>> hackathon
>> >>> prior to the meet-up on the same day, please let us know.
>> >>>
>> >>> Feng
>> >>>
>> >>
>> >>
>> >> --
>> >>
>> >> *Ben Gregory*
>> >> Data Engineer
>> >>
>> >> Mobile: +1-615-483-3653 • Online: astronomer.io
>> >> <https://www.astronomer.io/>
>> >>
>> >> Download our new ebook. <http://marketing.astronomer.io/guide/> From
>> >> Volume to Value - A Guide to Data Engineering.
>> >>
>> >
>>
>> --
>>
>> *Ben Gregory*
>> Data Engineer
>>
>> Mobile: +1-615-483-3653 • Online: astronomer.io <
>> https://www.astronomer.io/>
>>
>> Download our new ebook. <http://marketing.astronomer.io/guide/> From
>> Volume
>> to Value - A Guide to Data Engineering.
>>
>


Re: Sep Airflow Bay Area Meetup @ Google

2018-07-21 Thread Feng Lu
Sounds great, thank you Ben.
When you get a chance, could you please send me your talk
title/abstract/session type (regular or lightning)?

On Fri, Jul 20, 2018 at 2:10 PM Ben Gregory  wrote:

> Hey Feng!
>
> Awesome to hear that you're hosting the next meetup! We'd love to give a
> talk (and potentially a lightning session if available) -- we have a number
> of topics we could speak on but off the top of our heads we're thinking
> "Running Cloud Native Airflow", tying in some of our work on the Kubernetes
> Executor. How does that sound?
>
> Also, if there ends up being an Airflow hackathon, you can absolutely
> count us in. Let us know how we can help coordinate if the need presents
> itself!
>
> -Ben
>
> On Thu, Jul 19, 2018 at 3:26 PM Feng Lu  wrote:
>
>> Hi all,
>>
>> Hope you are enjoying your summer!
>>
>> This is Feng Lu from Google and we'll host the next Airflow meetup at
>> our Sunnyvale campus (1155 Borregas Ave, Sunnyvale, CA 94089). We plan
>> to add a *lightning session* this time for people to share their
>> Airflow ideas, work in progress, pain points, etc.
>> Here's the meetup date and schedule:
>>
>> -- Sep 24 (Monday)  --
>> 6:00PM meetup starts
>> 6:00 - 8:00PM light dinner /mix-n-mingle
>> 8:00PM - 9:40PM: 5 sessions (20 minutes each)
>> 9:40PM to 10:10PM: 6 lightning sessions (5 minutes each)
>> 10:10PM to 11:00PM: drinks and social hour
>>
>> I've seen a lot of interesting discussions in the dev mailing-list on
>> security, scalability, event interactions, future directions, hosting
>> platform and others. Please feel free to send your talk proposal to us by
>> replying to this email.
>>
>> The Cloud Composer team is also going to share their experience running
>> Apache Airflow as a managed solution and service roadmap.
>>
>> Thank you and looking forward to hearing from y'all soon!
>>
>> p.s., if folks are interested, we can also add a one-day Airflow hackathon
>> prior to the meet-up on the same day, please let us know.
>>
>> Feng
>>
>
>
> --
>
> *Ben Gregory*
> Data Engineer
>
> Mobile: +1-615-483-3653 • Online: astronomer.io
> <https://www.astronomer.io/>
>
> Download our new ebook. <http://marketing.astronomer.io/guide/> From
> Volume to Value - A Guide to Data Engineering.
>


Re: Sep Airflow Bay Area Meetup @ Google

2018-07-20 Thread Feng Lu
To Naik: yes, it'll be recorded.
To Manish: yes (via the meetup link which I'll set up and share with
everyone soon) so we can pre-print out guest badges ;)
To George: already reached out to the meetup group organizers yesterday so
I can create a new meetup.

On Fri, Jul 20, 2018 at 10:17 AM George Leslie-Waksman 
wrote:

> Hi Feng,
>
> Thank you for organizing. Would you please add the meetup to the
> meetup.com
> listing: https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/
>
> Also, if you're looking for speakers, I've been digging into all of
> Airflow's different concurrency controls and have a scheduler tuning talk
> simmering in the back of my mind.
>
> Thanks,
> George Leslie-Waksman
>
> On Fri, Jul 20, 2018 at 3:06 AM Naik Kaxil  wrote:
>
> > Hi Feng,
> >
> > Will the session be recorded? Will love if it is. :)
> >
> > On 19/07/2018, 22:26, "Feng Lu"  wrote:
> >
> > Hi all,
> >
> > Hope you are enjoying your summer!
> >
> > This is Feng Lu from Google and we'll host the next Airflow meetup in
> > our Sunnyvale
> > campus <http://1155 Borregas Ave, Sunnyvale, CA 94089>. We plan to
> add
> > a *lightning
> > session* this time for people to share their airflow ideas, work in
> > progress, pain points, etc.
> > Here's the meetup date and schedule:
> >
> > -- Sep 24 (Monday)  --
> > 6:00PM meetup starts
> > 6:00 - 8:00PM light dinner /mix-n-mingle
> > 8:00PM - 9:40PM: 5 sessions (20 minutes each)
> > 9:40PM to 10:10PM: 6 lightning sessions (5 minutes each)
> > 10:10PM to 11:00PM: drinks and social hour
> >
> > I've seen a lot of interesting discussions in the dev mailing-list on
> > security, scalability, event interactions, future directions, hosting
> > platform and others. Please feel free to send your talk proposal to
> us
> > by
> > replying this email.
> >
> > The Cloud Composer team is also going to share their experience
> running
> > Apache Airflow as a managed solution and service roadmap.
> >
> > Thank you and looking forward to hearing from y'all soon!
> >
> > p.s., if folks are interested, we can also add a one-day Airflow
> > hackathon
> > prior to the meet-up on the same day, please let us know.
> >
> > Feng
> >
> >
> >
> >
> >
> >
> > Kaxil Naik
> >
> > Data Reply
> > 2nd Floor, Nova South
> > 160 Victoria Street, Westminster
> > London SW1E 5LB - UK
> > phone: +44 (0)20 7730 6000 <+44%2020%207730%206000>
> > k.n...@reply.com
> > www.reply.com
> >
>


Sep Airflow Bay Area Meetup @ Google

2018-07-19 Thread Feng Lu
Hi all,

Hope you are enjoying your summer!

This is Feng Lu from Google, and we'll host the next Airflow meetup at our
Sunnyvale campus (1155 Borregas Ave, Sunnyvale, CA 94089). We plan to add a
*lightning session* this time for people to share their Airflow ideas, work
in progress, pain points, etc.
Here's the meetup date and schedule:

-- Sep 24 (Monday) --
6:00PM: meetup starts
6:00PM to 8:00PM: light dinner / mix-n-mingle
8:00PM to 9:40PM: 5 sessions (20 minutes each)
9:40PM to 10:10PM: 6 lightning sessions (5 minutes each)
10:10PM to 11:00PM: drinks and social hour

I've seen a lot of interesting discussions in the dev mailing-list on
security, scalability, event interactions, future directions, hosting
platform and others. Please feel free to send your talk proposal to us by
replying to this email.

The Cloud Composer team is also going to share their experience running
Apache Airflow as a managed solution, along with their service roadmap.

Thank you and looking forward to hearing from y'all soon!

p.s. If folks are interested, we can also add a one-day Airflow hackathon
on the same day, prior to the meetup. Please let us know.

Feng


Re: Deprecating Run task from Airflow webUI

2018-07-02 Thread Feng Lu
Thank you Max, kindly see my reply inline:
On Sun, Jul 1, 2018 at 1:01 PM Maxime Beauchemin 
wrote:

> Few thoughts:
> * in our environment at Lyft, cleared tasks do get picked up by the
> scheduler. Is there an issue opened for the bug you are referring to? is
> that on 1.9.0?
> * "clearing" in both the web ui and CLI also flips the DagRun state back
> to running as the intent of clearing is usually to get the scheduler to
> pick it up, there may be caveats when the DagRun doesn't exist, or is a
> DagRun of the "backfill" type. Maybe `clear` should make sure that there's
> a proper DagRun
> * "clear" is confusing as a verb to use when the actual intent is to
> reprocess...
> * another [backward compatible] option is to put the feature behind a
> feature flag and set that flag to False in your environment
>
We reproduced the bug in 1.9 and it appears that this will be an issue for
master as well (will confirm).

>
> Whether the web server and the upcoming REST API should be able to talk to
> an executor is bigger question. We may want Airflow to allow running
> arbitrary DAG (through the upcoming DagFetcher abstraction) on arbitrary
> executors, and the REST API (web server) may need to communicate with the
> executor for that purpose. Though that's far enough ahead that it's mostly
> unrelated to your current concern
>

> I'll go a bit further and say that eventually we should have a way to run
> local, "in-development" DAGs on a remote executor in an adhoc fashion. In
> next-gen Airflow that would go through publishing a DAG to the DagFetcher
> abstraction (say git://{org}/{repo}/{gitref_say_a_branch}/{path/to/dag.py}
> ), and run say `airflow test` or `airflow backfill` through the REST API
> and get that to run remote on k8s (through k8sexecutor) for instance. I
> think this may require the REST api talking to the executor.
>
+1 to the idea of DAG testing/execution in a remote Airflow setup.
To implement remote testing/backfill as a RESTful API, we would need to
define the corresponding resource collection/object, which would live in the
metadata database.
The executor could then start DAG runs based on resource updates in the
metadata database.
It seems that we can decouple the API server and the executor this way.
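
To make the idea concrete, here is a minimal sketch of that decoupling. It
is only an illustration under stated assumptions: the run_request table, its
columns, and both function names are hypothetical and not part of Airflow,
and an in-memory SQLite database stands in for the metadata database. The
API side only writes a resource row; the executor side only polls for rows
it has not started yet, so the two never talk to each other directly.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical resource table; not an actual Airflow table.
    conn.execute(
        "CREATE TABLE run_request ("
        "id INTEGER PRIMARY KEY, "
        "dag_id TEXT NOT NULL, "
        "kind TEXT NOT NULL, "
        "state TEXT NOT NULL DEFAULT 'requested')"
    )


def api_create_run_request(conn, dag_id, kind):
    """API server side: record the request as a resource and return its id."""
    cur = conn.execute(
        "INSERT INTO run_request (dag_id, kind) VALUES (?, ?)", (dag_id, kind)
    )
    conn.commit()
    return cur.lastrowid


def executor_poll_once(conn):
    """Executor side: pick up pending requests and mark them as started."""
    rows = conn.execute(
        "SELECT id, dag_id, kind FROM run_request "
        "WHERE state = 'requested' ORDER BY id"
    ).fetchall()
    for request_id, _dag_id, _kind in rows:
        conn.execute(
            "UPDATE run_request SET state = 'started' WHERE id = ?",
            (request_id,),
        )
    conn.commit()
    return [(dag_id, kind) for _request_id, dag_id, kind in rows]


conn = sqlite3.connect(":memory:")
create_schema(conn)
api_create_run_request(conn, "example_dag", "test")
api_create_run_request(conn, "example_dag", "backfill")

started = executor_poll_once(conn)      # both pending requests are picked up
second_poll = executor_poll_once(conn)  # nothing is left to start
print(started, second_poll)
```

A real implementation would also need locking or row claiming so that two
executors do not start the same request; that part is left out here.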

> Max
>
> On Fri, Jun 29, 2018 at 8:03 PM Feng Lu  wrote:
>
>> Re-attaching the image..
>>
>> On Fri, Jun 29, 2018 at 4:54 PM Feng Lu  wrote:
>>
>>> Hi all,
>>>
>>> Please take a look at our proposal to deprecate Run task in Airflow
>>> webUI.
>>>
>>> *What?*
>>> Deprecate Run task support in the Airflow webUI and make it a no-op for
>>> now.
>>>
>>> [image: xGAcOrcLJE4.png]
>>> ​
>>> *Why?*
>>>
>>>1. It only works with CeleryExecutor
>>>
>>> <https://github.com/apache/incubator-airflow/blob/master/airflow/www/views.py#L1001-L1003>
>>>and renders an inconsistent experience for other types of Executors.
>>>2. It requires Airflow webserver to have direct connection with the
>>>message backend of CeleryExecutor, and opens more vulnerability in the
>>>system. In many cases, users may want to restrict access to the celery
>>>messaging backend as much as possible.
>>>
>>> *Mitigation:*
>>> This Run task feature is mainly for the purpose of re-executing of a
>>> previously running task which got stuck in running and deleted manually.
>>> It's currently a two step process:
>>> 1. Navigate to the task instance view page and delete the running task
>>> that's stuck
>>> 2. Go back to DAG/task view and click "Run"
>>>
>>> We proposed to combine the two steps, after a running task is deleted,
>>> the Airflow scheduler will automatically re-schedule (which it does today)
>>> and re-queue the task (there's a bug that needs to be fixed).
>>>
>>> *Fix:*
>>> The scheduler currently doesn't not automatically re-queue the task
>>> despite the task instance has changed from running to scheduled state. The
>>> heartbeat check incorrectly returns a success in this case. The root cause
>>> is that LocalTaskJob doesn't set the job state to failed (details
>>> <https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L2674-L2681>)
>>> when a running task is externally deleted and confuses the heartbeat
>>> check
>>> <https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L443>
>>> .
>>> Once this is fixed, a killed running task instance will be
>>> auto-scheduled/enqueued for execution, verified locally.
>>>
>>> Thank you.
>>>
>>> Feng
>>>
>>>
>>>


Re: Deprecating Run task from Airflow webUI

2018-06-29 Thread Feng Lu
Re-attaching the image..

On Fri, Jun 29, 2018 at 4:54 PM Feng Lu  wrote:

> Hi all,
>
> Please take a look at our proposal to deprecate Run task in Airflow webUI.
>
> *What?*
> Deprecate Run task support in the Airflow webUI and make it a no-op for
> now.
>
> [image: xGAcOrcLJE4.png]
> ​
> *Why?*
>
>1. It only works with CeleryExecutor
>
> <https://github.com/apache/incubator-airflow/blob/master/airflow/www/views.py#L1001-L1003>
>and renders an inconsistent experience for other types of Executors.
>2. It requires Airflow webserver to have direct connection with the
>message backend of CeleryExecutor, and opens more vulnerability in the
>system. In many cases, users may want to restrict access to the celery
>messaging backend as much as possible.
>
> *Mitigation:*
> This Run task feature is mainly for the purpose of re-executing of a
> previously running task which got stuck in running and deleted manually.
> It's currently a two step process:
> 1. Navigate to the task instance view page and delete the running task
> that's stuck
> 2. Go back to DAG/task view and click "Run"
>
> We proposed to combine the two steps, after a running task is deleted, the
> Airflow scheduler will automatically re-schedule (which it does today) and
> re-queue the task (there's a bug that needs to be fixed).
>
> *Fix:*
> The scheduler currently doesn't not automatically re-queue the task
> despite the task instance has changed from running to scheduled state. The
> heartbeat check incorrectly returns a success in this case. The root cause
> is that LocalTaskJob doesn't set the job state to failed (details
> <https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L2674-L2681>)
> when a running task is externally deleted and confuses the heartbeat check
> <https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L443>
> .
> Once this is fixed, a killed running task instance will be
> auto-scheduled/enqueued for execution, verified locally.
>
> Thank you.
>
> Feng
>
>
>


Deprecating Run task from Airflow webUI

2018-06-29 Thread Feng Lu
Hi all,

Please take a look at our proposal to deprecate Run task in Airflow webUI.

*What?*
Deprecate Run task support in the Airflow webUI and make it a no-op for
now.

[image: xGAcOrcLJE4.png]

*Why?*

   1. It only works with CeleryExecutor
   <https://github.com/apache/incubator-airflow/blob/master/airflow/www/views.py#L1001-L1003>
   and gives an inconsistent experience with other types of executors.
   2. It requires the Airflow webserver to have a direct connection to the
   message backend of CeleryExecutor, which enlarges the attack surface of
   the system. In many cases, users may want to restrict access to the Celery
   messaging backend as much as possible.

*Mitigation:*
The Run task feature is mainly used to re-execute a task that got stuck in
the running state and was then deleted manually.
It's currently a two-step process:
1. Navigate to the task instance view page and delete the running task
that's stuck
2. Go back to DAG/task view and click "Run"

We propose to combine the two steps: after a running task is deleted, the
Airflow scheduler will automatically re-schedule (which it does today) and
re-queue the task (there's a bug that needs to be fixed).

*Fix:*
The scheduler currently does not automatically re-queue the task even though
the task instance has changed from the running to the scheduled state. The
heartbeat check incorrectly returns a success in this case. The root cause
is that LocalTaskJob doesn't set the job state to failed (details
<https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L2674-L2681>)
when a running task is externally deleted, which confuses the heartbeat check
<https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L443>.
Once this is fixed, a killed running task instance will be automatically
scheduled and enqueued for execution; we verified this locally.
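
To illustrate the failure mode, here is a deliberately simplified sketch. It
is not the Airflow code linked above: the Job class, heartbeat_ok,
should_requeue and the 30-second threshold are all hypothetical names and
values chosen for this example. It only shows why a job that still reports
"running" with a fresh heartbeat blocks the re-queue, and how marking the
job failed unblocks it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

RUNNING, SCHEDULED, FAILED = "running", "scheduled", "failed"


@dataclass
class Job:
    """Hypothetical stand-in for the job record behind a running task."""
    state: str
    latest_heartbeat: datetime


def heartbeat_ok(job, now, threshold=timedelta(seconds=30)):
    """Hypothetical check: the job claims to be running and beat recently."""
    return job.state == RUNNING and now - job.latest_heartbeat < threshold


def should_requeue(task_state, job, now):
    """Re-queue a scheduled task only if no live job still claims it."""
    return task_state == SCHEDULED and not heartbeat_ok(job, now)


def on_external_state_change(job):
    """The fix in this sketch: mark the job failed once its task is deleted."""
    job.state = FAILED


now = datetime(2018, 6, 29, 12, 0, 0)
job = Job(state=RUNNING, latest_heartbeat=now - timedelta(seconds=5))

# The task was deleted externally and is back in the scheduled state, but
# the job still looks alive, so the task is not re-queued.
before_fix = should_requeue(SCHEDULED, job, now)

# With the job marked failed, the heartbeat check no longer succeeds.
on_external_state_change(job)
after_fix = should_requeue(SCHEDULED, job, now)

print(before_fix, after_fix)  # False True
```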

Thank you.

Feng


Re: Tasks in remain in queued status

2018-06-01 Thread Feng Lu
I see. There are a couple of possible reasons: for example, you may not have
enough Celery workers, or the workers may be faulty.
You may want to inspect the Celery state using the GUI-based Flower tool.
Alternatively, try deleting all queued tasks; the Airflow scheduler should
re-generate and re-queue these task instances.

On Fri, Jun 1, 2018 at 4:30 PM Pedro Machado  wrote:

> >
> > Using postgres and redis running in their containers. The set up is based
> > on the astronomer open set up:
> >
> https://github.com/astronomerio/astronomer/blob/master/examples/airflow-enterprise/docker-compose.yml
>


Re: Tasks in remain in queued status

2018-06-01 Thread Feng Lu
What message broker and result backend are you using?
Please make sure they are persistent and data can survive between restarts.

On Fri, Jun 1, 2018 at 11:13 AM Pedro Machado  wrote:

> Hi,
>
> I am sure this must have come up before but I a quick search of the list
> archive didn't return any results.
>
> I am running airflow 1.9 on docker using the CeleryExecutor.
>
> I am seeing tasks that stay in queued status for a long time. This morning,
> I restarted the scheduler service and the tasks started right away. I
> already have a few more in this state.
>
> What do you recommend to troubleshoot this problem?
>
> Thanks,
>
> Pedro
>


Re: Apache Airflow welcome new committer/PMC member : Naik Kaxil (a.k.a. kaxil)

2018-05-08 Thread Feng Lu
Welcome Naik!

On Tue, May 8, 2018 at 3:39 PM Tao Feng  wrote:

> Welcome!
>
> On Tue, May 8, 2018 at 1:47 PM, Driesprong, Fokko 
> wrote:
>
> > Hi Airflow'ers,
> >
> > Please join the Apache Airflow PMC in welcoming its newest member and
> > co-committer, Naik Kaxil (a.k.a. kaxil ).
> > Welcome
> > Kaxil, great to have you on board!
> >
> > Cheers, Fokko
> >
>


Re: Managed Apache Airflow Service on Google Cloud Platform

2018-05-01 Thread Feng Lu
Thank you all and looking forward to the many collaborations to come!

On Tue, May 1, 2018, 17:37 Alex Tronchin-James 949-412-7220 <
alex.n.ja...@gmail.com> wrote:

> Bravo!
>
> On Tue, May 1, 2018, 12:57 Driesprong, Fokko <fo...@driesprong.frl> wrote:
>
> > Awesome! Looking forward to give it a spin! Great job guys!
> >
> > Cheers!
> >
> > 2018-05-01 21:26 GMT+02:00 Arthur Wiedmer <arthur.wied...@gmail.com>:
> >
> > > Feng,
> > >
> > > We are really grateful for the work Googlers have put in the in the
> > > project, including improving compatibility with GCP.
> > >
> > > Thanks for your contributions and congratulations on the launch!
> > >
> > > Best regards,
> > > Arthur
> > >
> > >
> > > On Tue, May 1, 2018 at 9:59 AM Feng Lu <fen...@google.com.invalid>
> > wrote:
> > >
> > > > *Hello everyone,I want to let everyone know that today Google Cloud
> > > > launched a new managed service based on Apache Airflow - Cloud
> > > Composer[1].
> > > > Now that we have launched into public beta, I wanted to connect with
> > the
> > > > community to share why we chose Airflow and our plans for Composer
> and
> > > > involvement with the Airflow community.A year ago we set out to
> build a
> > > > workflow orchestration product for Google Cloud. We strongly believe
> > that
> > > > such a system should be based on open source - it’s described as a
> core
> > > > value on our public landing page[2]. We chose Airflow for many
> reasons,
> > > > including the awesome community, its approachability for developers,
> > and
> > > > its core concepts. We built Cloud Composer because we wanted to make
> > > > Airflow accessible to all Google Cloud customers. We’re also
> > encouraging
> > > > these customers to use Airflow outside of Google Cloud - whether it
> be
> > > > another Cloud or on-premise. When we started building Cloud Composer
> we
> > > got
> > > > involved in the Airflow community. You have probably seen a few
> > Googlers
> > > > submitting pull requests, including myself. We do not plan on forking
> > > > Airflow with the release of Cloud Composer and it’s our commitment to
> > > > remain involved in the Airflow community as we grow Composer. We will
> > > > continue to actively contribute to Airflow and look forward to
> > partnering
> > > > with the community. You should expect to see myself and other
> Googlers
> > > > involved in Airflow in the future.Best,Feng[1]
> > > > https://cloud.google.com/composer <https://cloud.google.com/composer
> > >[2]
> > > > https://cloud.google.com/ <https://cloud.google.com/>*
> > > >
> > >
> >
>


Managed Apache Airflow Service on Google Cloud Platform

2018-05-01 Thread Feng Lu
Hello everyone,

I want to let everyone know that today Google Cloud launched a new managed
service based on Apache Airflow - Cloud Composer [1]. Now that we have
launched into public beta, I wanted to connect with the community to share
why we chose Airflow and our plans for Composer and involvement with the
Airflow community.

A year ago we set out to build a workflow orchestration product for Google
Cloud. We strongly believe that such a system should be based on open source
- it’s described as a core value on our public landing page [2]. We chose
Airflow for many reasons, including the awesome community, its
approachability for developers, and its core concepts. We built Cloud
Composer because we wanted to make Airflow accessible to all Google Cloud
customers. We’re also encouraging these customers to use Airflow outside of
Google Cloud - whether it be another Cloud or on-premise. When we started
building Cloud Composer we got involved in the Airflow community. You have
probably seen a few Googlers submitting pull requests, including myself. We
do not plan on forking Airflow with the release of Cloud Composer and it’s
our commitment to remain involved in the Airflow community as we grow
Composer. We will continue to actively contribute to Airflow and look
forward to partnering with the community. You should expect to see myself
and other Googlers involved in Airflow in the future.

Best,
Feng

[1] https://cloud.google.com/composer
[2] https://cloud.google.com/


Re: Airflow 1.10

2018-03-27 Thread Feng Lu
+1 for 1.10, thank you Chris!

On Mon, Mar 26, 2018 at 12:44 PM Milan van der Meer <
milan.vanderm...@riaktr.com> wrote:

> The fix for the on_kill method for all the operators should really be
> included in the 1.10 release.
> https://github.com/apache/incubator-airflow/pull/2975
>
> On Mon, Mar 26, 2018 at 8:52 PM, Chris Riccomini 
> wrote:
>
> > Hey all,
> >
> > Now that the RBAC UI has been merged in, I wanted to revive this thread
> > again. Timezone stuff has been in for a while. I'm not sure on the status
> > of the K8s stuff, but I do feel it's time to start getting the ball
> rolling
> > on a 1.10 release. What do others think? Any volunteers for a release
> > manager? :)
> >
> > Cheers,
> > Chris
> >
> > On Tue, Jan 16, 2018 at 12:42 PM, George Leslie-Waksman <
> > geo...@cloverhealth.com.invalid> wrote:
> >
> > > +1
> > >
> > > On Tue, Jan 16, 2018 at 11:08 AM Joy Gao  wrote:
> > >
> > > > The new FAB UI does not modify the existing API (i.e. www2/api/ will
> > be a
> > > > copy of www/api/), and the endpoints are registered as blueprints to
> > the
> > > > flask app the same way as before, so it is fully backward compatible.
> > > >
> > > > Although FAB offers REST APIs on the models out-of-the-box, it
> > currently
> > > is
> > > > still in BETA and does not support any HTTP authentication scheme,
> > > meaning
> > > > it would require a cookie to mimic a normal user session in order to
> be
> > > > used. In the long run (for Airflow 2.X+), I believe the best approach
> > is
> > > to
> > > > patch FAB with auth backends, so we can apply @has_access annotation
> on
> > > all
> > > > rest endpoints to streamline authentication.  For 1.10, the current
> > plan
> > > is
> > > > to leave the API as is. I wouldn't recommend folks to use the
> FAB-based
> > > > REST APIs yet. I would also like to release the new UI as an alpha
> > > version,
> > > > and wait for 2.0 before promoting it to the default version. This
> will
> > > give
> > > > us some time to address any new UI bugs which I overlooked.
> > > >
> > > > +1 on polishing! (With the exception of "*Rest Api should standardise
> > and
> > > > have proper swagger definitions" and any other bugs that require
> major
> > > > overhaul*, which I think can wait until 2.0)
> > > >
> > > >
> > > > On Sun, Jan 14, 2018 at 12:18 PM, Bolke de Bruin 
> > > > wrote:
> > > >
> > > > > Chris, Joy,
> > > > >
> > > > > Can you shed some light on the backward compatibility of the new
> UI,
> > > > > particularly with regards to the API? The API for example cannot
> use
> > > the
> > > > > login from FAB afaik.
> > > > >
> > > > > As much of the work is already in for 1.10 I think focus should be
> on
> > > > > polishing. There are some minor quirks and slightly annoying bugs.
> > > > >
> > > > > - It seems a dag with schedule “none” can still run when turned on
> > from
> > > > > the UI (unconfirmed)
> > > > > - Exceptions are swallowed when importing a custom logging conf
> > > > > - UI only displays UTC
> > > > > - Logging ends up duplicated (fixed in master)
> > > > > - Tasks instantiated outside airflow do not set default time zone
> > > > (@msumit)
> > > > > - Log file retrieval feels archaic (non local?)
> > > > > - Rest Api should standardise and have proper swagger definitions
> > > > >
> > > > > And probably some others.
> > > > >
> > > > > Bolke
> > > > >
> > > > > > On 14 Jan 2018, at 15:41, Driesprong, Fokko  >
> > > > > wrote:
> > > > > >
> > > > > > I think 1.10 is a good idea. I'm working on this refactoring of
> the
> > > > > sensor
> > > > > > structure: https://github.com/apache/incubator-airflow/pull/2875
> > > > > >
> > > > > > Would be awesome to get this in. At my current project we use
> > sensors
> > > > in
> > > > > a
> > > > > > few places, but still there is some work to be done. For example,
> > > don't
> > > > > > allocate an executor slot to the sensors, but have a more
> > > sophisticated
> > > > > way
> > > > > > of poking.
> > > > > >
> > > > > > Cheers, Fokko
> > > > > >
> > > > > >
> > > > > >
> > > > > > 2018-01-12 21:19 GMT+01:00 Chris Riccomini <
> criccom...@apache.org
> > >:
> > > > > >
> > > > > >> Just the operator (AIRFLOW-1517)
> > > > > >>
> > > > > >> On Fri, Jan 12, 2018 at 11:21 AM, Anirudh Ramanathan <
> > > > > >> ramanath...@google.com.invalid> wrote:
> > > > > >>
> > > > > >>> Sounds awesome. Is k8s support here referring to both the
> > executor
> > > > and
> > > > > >> the
> > > > > >>> operator?
> > > > > >>>
> > > > > >>> Thanks,
> > > > > >>>
> > > > > >>>
> > > > > >>> On Jan 12, 2018 11:18 AM, "Sid Anand" 
> wrote:
> > > > > >>>
> > > > >  +1
> > > > > 
> > > > > 
> > > > >  On Fri, Jan 12, 2018 at 10:56 AM, Chris Riccomini <
> > > > > >> criccom...@apache.org
> > > > > 
> > > > >  wrote:
> > > > > 
> > > > > > Hey all,
> > > > > >
> > > > > > After some past 

Re: Help with podling status report?

2018-02-08 Thread Feng Lu
Thank you Sid!


On Thu, Feb 8, 2018 at 10:41 AM Sid Anand <san...@apache.org> wrote:

> I can take care of it :-)
>
> -s
>
> On Thu, Feb 8, 2018 at 9:13 AM, Chris Riccomini <criccom...@apache.org>
> wrote:
>
> > Hey Feng,
> >
> > You need to sign up for an account if you don't have access to the wiki
> > already. Once you have an account you can edit and update the page that I
> > linked to above.
> >
> > I would more or less copy what's in the October update, but you'll have
> to
> > update the various stats using JIRA, github, etc. Bit of manual work, but
> > hopefully doesn't take too long.
> >
> > Cheers,
> > Chris
> >
> > On Wed, Feb 7, 2018 at 3:55 PM, Feng Lu <fen...@google.com.invalid>
> wrote:
> >
> > > Chris, I am happy to help if you (or anyone in the mailing group) could
> > let
> > > me know how to update the podling report.
> > >
> > >
> > > On Wed, Feb 7, 2018 at 8:25 AM Chris Riccomini <criccom...@apache.org>
> > > wrote:
> > >
> > > > Hey all,
> > > >
> > > > If someone has an extra 15m today, could you please fill out:
> > > >
> > > > https://wiki.apache.org/incubator/February2018
> > > >
> > > > Anyone is welcome to fill it out (via
> > > > https://incubator.apache.org/guides/ppmc.html#podling_status_reports
> ):
> > > >
> > > > > The PPMC does not have to fill out the report itself; the PPMC is
> > just
> > > > responsible for making sure that it gets filled out
> > > >
> > > > This is a nice light weight way that someone can contribute a little
> > bit
> > > of
> > > > help to the project.
> > > >
> > > > An example of a prior report is here:
> > > >
> > > > https://wiki.apache.org/incubator/October2017
> > > >
> > > > Cheers,
> > > > Chris
> > > >
> > >
> >
>


Re: Airflow 1.9.0 is released

2018-01-03 Thread Feng Lu
+1, thanks a lot Chris!

On Wed, Jan 3, 2018 at 10:33 AM, Driesprong, Fokko 
wrote:

> Awesome work!
>
> Cheers, Fokko
>
> Op wo 3 jan. 2018 om 18:50 schreef Chris Riccomini 
>
> > Hey all,
> >
> > I have updated the docs as well:
> >
> > https://airflow.incubator.apache.org/
> >
> > Cheers,
> > Chris
> >
> > On Wed, Jan 3, 2018 at 9:30 AM, Sid Anand  wrote:
> >
> > > Grazzi!
> > > -s
> > >
> > > On Wed, Jan 3, 2018 at 9:03 AM, Chris Riccomini  >
> > > wrote:
> > >
> > > > Fixed!
> > > >
> > > > On Tue, Jan 2, 2018 at 9:23 PM, Niranda Perera <
> > niranda...@cse.mrt.ac.lk
> > > >
> > > > wrote:
> > > >
> > > > > Hi sid,
> > > > >
> > > > > in here,
> > > > >   - Announcements : https://cwiki.apache.org/
> > > confluence/display/AIRFLOW/
> > > > > Announcements#Announcements-Jan2,2018
> > > > >
> > > > > Source & Binary "Sdist" release link is broken! there's a space in
> > the
> > > > > middle ;-)
> > > > >
> > > > > Best regards
> > > > >
> > > > > Niranda Perera
> > > > > Research Assistant
> > > > > Dept of CSE, University of Moratuwa
> > > > > niranda...@cse.mrt.ac.lk
> > > > > +94 71 554 8430
> > > > > https://lk.linkedin.com/in/niranda
> > > > >
> > > > > On Wed, Jan 3, 2018 at 5:14 AM, Sid Anand 
> wrote:
> > > > >
> > > > > > Woohoo!!! Thanks Chris & Bolke! Supreme accomplishment!
> > > > > >
> > > > > > I've updated:
> > > > > >
> > > > > >- Announcements : https://cwiki.apache.org/
> > > > > confluence/display/AIRFLOW/
> > > > > >Announcements#Announcements-Jan2,2018
> > > > > > > > > > > Announcements#Announcements-Jan2,2018>
> > > > > >- Twitter : https://twitter.com/ApacheAirflow/status/
> > > > > 948337902001405952
> > > > > >- Updated our podling report with the new release :
> > > > > >https://wiki.apache.org/incubator/January2018
> > > > > >
> > > > > > Again, great work getting this out!
> > > > > > -s
> > > > > >
> > > > > > On Tue, Jan 2, 2018 at 2:57 PM, Marc Bollinger <
> m...@lumoslabs.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Phew! Great job, all involved! 
> > > > > > >
> > > > > > > On Tue, Jan 2, 2018 at 2:52 PM, Andy Loughran 
> > > wrote:
> > > > > > >
> > > > > > > > Congratulations guys - lots and lots of voting and you got it
> > > over
> > > > > the
> > > > > > > > line.
> > > > > > > >
> > > > > > > > happy new year!
> > > > > > > >
> > > > > > > > Andy
> > > > > > > >
> > > > > > > > On 2 January 2018 at 22:45, Arthur Wiedmer <
> > > > arthur.wied...@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Woohoo! 
> > > > > > > > >
> > > > > > > > > Thanks Chris!
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Jan 2, 2018 at 2:40 PM, Chris Riccomini <
> > > > > > criccom...@apache.org
> > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Dear Airflow community,
> > > > > > > > > >
> > > > > > > > > > Airflow 1.9.0 was just released.
> > > > > > > > > >
> > > > > > > > > > The source release as well as the binary "sdist" release
> > are
> > > > > > > available
> > > > > > > > > > here:
> > > > > > > > > >
> > > > > > > > > > https://dist.apache.org/repos/dist/release/incubator/
> > > > > > > > > > airflow/1.9.0-incubating/
> > > > > > > > > >
> > > > > > > > > > We also made this version available on PyPi for
> convenience
> > > > (`pip
> > > > > > > > install
> > > > > > > > > > apache-airflow`):
> > > > > > > > > >
> > > > > > > > > > https://pypi.python.org/pypi/apache-airflow
> > > > > > > > > >
> > > > > > > > > > Find the CHANGELOG here for more details:
> > > > > > > > > >
> > > > > > > > > >
> > https://github.com/apache/incubator-airflow/blob/master/CHAN
> > > > > > > GELOG.txt
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Chris
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: How to bes use Google Cloud Storage for logging?

2017-12-18 Thread Feng Lu
Hi Kevin,

Kindly see my reply inline:

On Mon, Dec 18, 2017 at 3:28 PM, Kevin Lam  wrote:

> Hi,
>
> I'm trying to get airflow to use GCS for logging purposes and had a few
> questions.
>
> We're currently using Airflow 1.9rc2, running in a Kubernetes Airflow
> deployment (similar to https://github.com/mumoshu/kube-airflow)
>
> 1/ Seems like the logging code has been going through some changes in the
> recent versions. What's the correct way to set up GCS for logging? Is it by
> just specifying remote_base_log_folder and remote_log_conn_id in
> airflow.cfg? Or by following this guide:
> http://airflow.readthedocs.io/en/latest/integration.html#gcp, using the
> python based logging config? Is there an Airflow version that we should use
> to be most stable?
>
The Python-based logging config is the right place to make changes. In our
test setup, we override airflow_local_settings.py, similar to the link you
pasted.
You may also want to set: [core] task_log_reader = gcs.task
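
For what it's worth, a rough sketch of such an override might look like the
following. It is not our exact setup: the bucket name, log folder and
filename template are placeholders, and the handler class path is as I
recall it from the 1.9-era docs, so please verify it against your installed
version before relying on it.

```python
import os

# Placeholder values, assumptions for illustration only.
BASE_LOG_FOLDER = os.path.expanduser("~/airflow/logs")
GCS_LOG_FOLDER = "gs://my-example-bucket/airflow-logs/"  # hypothetical bucket
FILENAME_TEMPLATE = "{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log"


def build_logging_config():
    """Return a dictConfig-style mapping that sends task logs to 'gcs.task'."""
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "airflow.task": {
                "format": "[%(asctime)s] {%(filename)s:%(lineno)d} "
                          "%(levelname)s - %(message)s",
            },
        },
        "handlers": {
            "gcs.task": {
                # Class path recalled from the 1.9-era docs; verify it.
                "class": "airflow.utils.log.gcs_task_handler.GCSTaskHandler",
                "formatter": "airflow.task",
                "base_log_folder": BASE_LOG_FOLDER,
                "gcs_log_folder": GCS_LOG_FOLDER,
                "filename_template": FILENAME_TEMPLATE,
            },
        },
        "loggers": {
            # Task logs go to the GCS handler instead of the file handler.
            "airflow.task": {
                "handlers": ["gcs.task"],
                "level": "INFO",
                "propagate": False,
            },
        },
    }


LOGGING_CONFIG = build_logging_config()
```

The handler name used here ("gcs.task") is the same name that
task_log_reader should point at, so the web UI reads logs back through the
same handler that wrote them.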


>
> 2/ Is there a way to encode the connection for GCS in a file so that one
> doesn't have to open the webserver and create it from the admin panel? It'd
> be nice if the GCS connection would be automatically created.
>
Unfortunately, a GCS connection is tied to a specific GCP project and is
impossible to pre-populate.
Airflow 1.9 should fix the GCP connection type issue (
https://github.com/apache/incubator-airflow/commit/2f107d8a30910fd025774004d5c4c95407ed55c5),
so you can use the airflow connections CLI directly.


>
> Thanks in advance for your help!
>


Re: [VOTE] Airflow 1.9.0rc8

2017-12-15 Thread Feng Lu
+0.5 (non-binding)

Looks like the version (1.9.0) and the tag (1.9.0rc8) are mismatched, which
will cause the installation (pip install or python setup) to error out and
fail.
nit: mind also updating the release log
https://github.com/apache/incubator-airflow/blob/1.9.0rc8/CHANGELOG.txt ?


On Fri, Dec 15, 2017 at 3:21 PM, Driesprong, Fokko 
wrote:

> +1 binding
>
> Op vr 15 dec. 2017 om 23:39 schreef Bolke de Bruin 
>
> > +1, binding
> >
> > Checked sigs, version, source is there (did not check build), bin is
> there.
> >
> > Bolke
> >
> > Verstuurd vanaf mijn iPad
> >
> > > Op 15 dec. 2017 om 23:31 heeft Joy Gao  het volgende
> > geschreven:
> > >
> > > +1, binding
> > >
> > > Thank you Chris!
> > >
> > > On Fri, Dec 15, 2017 at 2:30 PM, Chris Riccomini <
> criccom...@apache.org>
> > > wrote:
> > >
> > >> Hey all,
> > >>
> > >> (Last time, I hope)^2
> > >>
> > >> I have cut Airflow 1.9.0 RC8. This email is calling a vote on the
> > release,
> > >> which will last for 72 hours. Consider this my (binding) +1.
> > >>
> > >> Airflow 1.9.0 RC8 is available at:
> > >>
> > >> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.9.0rc8/
> > >>
> > >> apache-airflow-1.9.0rc8+incubating-source.tar.gz is a source release
> > that
> > >> comes with INSTALL instructions.
> > >> apache-airflow-1.9.0rc8+incubating-bin.tar.gz is the binary Python
> > "sdist"
> > >> release.
> > >>
> > >> Public keys are available at:
> > >>
> > >> https://dist.apache.org/repos/dist/release/incubator/airflow/
> > >>
> > >> The release contains no new JIRAs. Just a version fix.
> > >>
> > >> I also had to change the version number to exclude the `rc6` string as
> > well
> > >> as the "+incubating" string, so it's now simply 1.9.0. This will allow
> > us
> > >> to rename the artifact without modifying the artifact checksums when
> we
> > >> actually release.
> > >>
> > >> See JIRAs that were in 1.9.0RC7 and before (see previous VOTE email
> for
> > >> full list).
> > >>
> > >> Cheers,
> > >> Chris
> > >>
> >
>


Re: [VOTE] Airflow 1.9.0rc6

2017-12-13 Thread Feng Lu
+1 (non-binding)

On Tue, Dec 12, 2017 at 10:31 AM, Driesprong, Fokko 
wrote:

> +1 from my side
>
> Cheers, Fokko
>
> Op di 12 dec. 2017 om 17:28 schreef Ash Berlin-Taylor <
> ash_airflowl...@firemirror.com>
>
> > +0.5 from me.
> >
> > Our big test will come on Thursday morning, but looking good so far for
> > the small daily dags we've got are running okay, logs are showing up, and
> > making their way to S3.
> >
> > -ash
> >
> > > On 11 Dec 2017, at 18:50, Chris Riccomini 
> wrote:
> > >
> > > Hey all,
> > >
> > > I have cut Airflow 1.9.0 RC6. This email is calling a vote on the
> > release,
> > > which will last for 72 hours. Consider this my (binding) +1.
> > >
> > > Airflow 1.9.0 RC6 is available at:
> > >
> > > https://dist.apache.org/repos/dist/dev/incubator/airflow/1.9.0rc6/
> > >
> > > apache-airflow-1.9.0rc6+incubating-source.tar.gz is a source release
> that
> > > comes with INSTALL instructions.
> > > apache-airflow-1.9.0rc6+incubating-bin.tar.gz is the binary Python
> > "sdist"
> > > release.
> > >
> > > Public keys are available at:
> > >
> > > https://dist.apache.org/repos/dist/release/incubator/airflow/
> > >
> > > The release contains the following JIRAs:
> > >
> > > https://issues.apache.org/jira/browse/AIRFLOW-1897
> > > https://issues.apache.org/jira/browse/AIRFLOW-1873
> > > https://issues.apache.org/jira/browse/AIRFLOW-1896
> > >
> > > Along with all JIRAs that were in 1.9.0RC5 (see previous VOTE email for
> > > full list).
> > >
> > > Cheers,
> > > Chris
> >
> >
>


Re: PR Review Request

2017-12-08 Thread Feng Lu
Daniel, thanks for sharing this.

Indeed so, we thought about something very similar to what Sumit proposed
for viewing GCP job details.
The tricky part is how we can pass in and manage authorization tokens so users
don't have to log in again (e.g., the job is submitted by the GCP service
account specified in the gcp connection id, while the Airflow web user may not
even have the permission needed to see the page).

Will take a closer look at the PR.

On Fri, Dec 8, 2017 at 7:05 AM, Daniel Imberman 
wrote:

> Cc: @fenglu. Seems like this could be useful for the GCP operators.
>
> On Fri, Dec 8, 2017 at 1:00 AM Sumit Maheshwari 
> wrote:
>
>> I think we need some more eyes on the PR. As of now, it got stuck between
>> Bolke and me :).
>>
>> I am not able to convince Bolke, that to pre-generate all links on UI is a
>> time & cpu consuming task, as web server has to prepare all such links
>> before handing the rendering work to UI.
>>
>> While he is not able to convince me that UI processes 1 task node at a
>> time, so there would be no extra load on the web server.
>>
>>
>>
>> On Thu, Nov 23, 2017 at 7:12 PM, Sumit Maheshwari
>> wrote:
>>
>> > Ping!
>> >
>> > folks, please review :)
>> >
>> >
>> > On Mon, Nov 6, 2017 at 12:55 PM, Driesprong, Fokko
>> > wrote:
>> >
>> >> Hi Sumit,
>> >>
>> >> Thanks for the PR. I think this is a nice addition. This would also be
>> >> applicable for the Google Cloud and Databricks operators.
>> >>
>> >> I've had two remarks on the code. I still have to fire up Airflow to
>> see
>> >> how this would work in the UI.
>> >>
>> >> Cheers, Fokko
>> >>
>> >> 2017-11-06 8:07 GMT+01:00 Sumit Maheshwari :
>> >>
>> >> > Hi All,
>> >> >
>> >> > As of now TI model view in Airflow is very static and each operator
>> has
>> >> to
>> >> > make use of given options only. I have opened a PR to add support for
>> >> more
>> >> > links (buttons) on model view, which can redirect users to the
>> outside
>> >> of
>> >> > Airflow programmatically.
>> >> >
>> >> > Some simple use cases of this feature could be:
>> >> >- Redirecting users to Hadoop RM page
>> >> >- Adding quick links to operators documentation
>> >> >- Better integration with third-party operators
>> >> >
>> >> > Please review following PR (
>> >> > https://github.com/apache/incubator-airflow/pull/2657) and give your
>> >> > thoughts, +1s or -1s.
>> >> >
>> >> >
>> >> > Thanks,
>> >> > Sumit
>> >> >
>> >>
>> >
>> >
>>
>


Dynamic Airflow config reloads

2017-12-06 Thread Feng Lu
Hi,

It's probably well known that Airflow only loads its config file (i.e.,
airflow.cfg) at instance creation time. If one needs to change the config
file, all Airflow instances have to be restarted (I understand that the Airflow
worker does restart itself for each task execution and therefore picks up
the latest config updates).

Are people from this mailing group interested in adding dynamic config
loading support inside AirflowConfigParser (
https://github.com/apache/incubator-airflow/blob/master/airflow/configuration.py#L110
)?

Initial implementation ideas:
- introduce threading.RLock to AirflowConfigParser and guard all method
access
- add a periodic timer task that re-reads the config file (it needs to
acquire the lock beforehand, of course).

Since config data is always accessed via the AirflowConfigParser object,
this essentially gives us dynamic config update without restarting Airflow
scheduler/webserver.
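
A minimal sketch of the two ideas above, using a hypothetical stand-in class
(ReloadingConfig, its method names and the 30-second default are assumptions
for illustration; this is not the actual AirflowConfigParser code):

```python
import configparser
import threading


class ReloadingConfig:
    """Hypothetical stand-in for a config parser that reloads periodically."""

    def __init__(self, path, interval_seconds=30.0):
        self._path = path
        self._interval = interval_seconds
        self._lock = threading.RLock()
        self._parser = configparser.ConfigParser()
        self._timer = None
        self._running = False
        self.reload()

    def reload(self):
        # Parse into a fresh parser first, then swap it in under the lock,
        # so readers never observe a half-loaded file.
        fresh = configparser.ConfigParser()
        fresh.read(self._path)
        with self._lock:
            self._parser = fresh

    def get(self, section, key):
        # Every read goes through the same lock as the reload.
        with self._lock:
            return self._parser.get(section, key)

    def _tick(self):
        self.reload()
        with self._lock:
            if self._running:
                self._schedule()

    def _schedule(self):
        # threading.Timer fires once, so it is re-armed after each reload.
        self._timer = threading.Timer(self._interval, self._tick)
        self._timer.daemon = True
        self._timer.start()

    def start(self):
        with self._lock:
            if not self._running:
                self._running = True
                self._schedule()

    def stop(self):
        with self._lock:
            self._running = False
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```

Swapping in a freshly parsed object keeps the critical section short; guarding
every method of the existing parser with the same RLock, as proposed, would
work as well.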

Thoughts?
Thank you.

Feng


Re: ImportError: No module named sendgrid

2017-10-25 Thread Feng Lu
SGTM, thanks for the prompt reply!

On Wed, Oct 25, 2017 at 7:38 PM, Sumit Maheshwari <sumeet.ma...@gmail.com>
wrote:

> Feng,
>
> IMO the correct place could be under *airflow/contrib/plugins* or at
> *airflow/contrib/utils/sendgrid.py*.
> Also instead of keeping sendgrid api key in the config, it should be
> kept as a connection.
>
>
>
> On Thu, Oct 26, 2017 at 4:30 AM, Feng Lu <fen...@google.com> wrote:
>
>> Understand, thanks for the explanation ;)
>>
>> More than happy to make a PR to revert and move it to plugins.
>> Does this look like the right place?
>> https://github.com/apache/incubator-airflow/tree/master/airflow/contrib/plugins
>> Or should we keep it to ourselves based on
>> https://airflow.apache.org/plugins.html?
>>
>> On Wed, Oct 25, 2017 at 11:02 AM, Sumit Maheshwari <
>> sumeet.ma...@gmail.com> wrote:
>>
>>> Feng, genuinely I am not against Sendgrid or any other third party
>>> integration with Airflow. Infact we ourselves use Sendgrid for all mailing
>>> purposes. But as being part of an open source community, we must try to
>>> remain as neutral as possible.
>>>
>>> I still remember of opposing integration of one particular SAML provider
>>> long long back, as there are 100s of such SAML providers out there, and we
>>> just can't let the config file filled with empty configurations settings.
>>>
>>> Anyway I am eagerly waiting to use sendgrid email backend :)
>>>
>>>
>>> On Wed, Oct 25, 2017 at 10:17 PM, Chris Riccomini <criccom...@apache.org
>>> > wrote:
>>>
>>>> Let's do it as a plugin.
>>>>
>>>> I merged it.
>>>>
>>>> On Wed, Oct 25, 2017 at 9:20 AM, Feng Lu <fen...@google.com.invalid>
>>>> wrote:
>>>>
>>>>> Sorry for the glitch.
>>>>>
>>>>> I am the author of this commit, I agree that if sendgrid is not used,
>>>>> it
>>>>> doesn't need to be installed.
>>>>> That being said, I can submit a quick fix to dynamically load this
>>>>> module.
>>>>>
>>>>> Re: core vs plugin, I feel it really depends on whether the sendgrid
>>>>> email
>>>>> integration is needed by
>>>>> users other than us. As I mentioned in the corresponding JIRA issue,
>>>>> the
>>>>> other email alternative
>>>>> in Airflow is SMTP, which requires username + password in plaintext
>>>>> (and
>>>>> these are not easily revokable).
>>>>>
>>>>> On the contrast, the current sendgrid integration in Airflow only
>>>>> needs an
>>>>> API key, which supports
>>>>> fine-grained permission control and can be easily revoked.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Oct 25, 2017 at 4:27 AM, Bolke de Bruin <bdbr...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> > Code gets reviewed by a committer other than the author of the code.
>>>>> > Merging can happen by either.
>>>>> >
>>>>> > I also agree that the sendgrid dependency should be reverted and
>>>>> made into
>>>>> > a plugin instead.
>>>>> >
>>>>> > Bolke
>>>>> >
>>>>> > > On 25 Oct 2017, at 11:07, Ash Berlin-Taylor
>>>>> > > <ash_airflowlist@firemirror.com> wrote:
>>>>> > >
>>>>> > > 1. PRs generally aren't "approved" in Github if a core contributor
>>>>> > > looks at it and is happy
>>>>> > > 2. It was merged by criccomini, not the opener
>>>>> > > https://github.com/apache/incubator-airflow/commit/7cb818bbacb2a2695282471591a9e323d8efbf5c
>>>>> > > 3, 4, 5. I agree, and I made the same point
>>>>> > > https://issues.apache.org/jira/browse/AIRFLOW-1723
>>>>> > >
>>>>> > > Either way this is a bug that it always tries to import
>>>>> > > unconditionally and fails when the sendgrid mailer isn't used.

Re: ImportError: No module named sendgrid

2017-10-25 Thread Feng Lu
Understood, thanks for the explanation ;)

More than happy to make a PR to revert and move it to plugins.
Does this look like the right place?
https://github.com/apache/incubator-airflow/tree/master/airflow/contrib/plugins

Or should we keep it to ourselves based on
https://airflow.apache.org/plugins.html?

On Wed, Oct 25, 2017 at 11:02 AM, Sumit Maheshwari <sumeet.ma...@gmail.com>
wrote:

> Feng, genuinely I am not against Sendgrid or any other third party
> integration with Airflow. Infact we ourselves use Sendgrid for all mailing
> purposes. But as being part of an open source community, we must try to
> remain as neutral as possible.
>
> I still remember of opposing integration of one particular SAML provider
> long long back, as there are 100s of such SAML providers out there, and we
> just can't let the config file filled with empty configurations settings.
>
> Anyway I am eagerly waiting to use sendgrid email backend :)
>
>
> On Wed, Oct 25, 2017 at 10:17 PM, Chris Riccomini <criccom...@apache.org>
> wrote:
>
>> Let's do it as a plugin.
>>
>> I merged it.
>>
>> On Wed, Oct 25, 2017 at 9:20 AM, Feng Lu <fen...@google.com.invalid>
>> wrote:
>>
>>> Sorry for the glitch.
>>>
>>> I am the author of this commit, I agree that if sendgrid is not used, it
>>> doesn't need to be installed.
>>> That being said, I can submit a quick fix to dynamically load this
>>> module.
>>>
>>> Re: core vs plugin, I feel it really depends on whether the sendgrid
>>> email
>>> integration is needed by
>>> users other than us. As I mentioned in the corresponding JIRA issue, the
>>> other email alternative
>>> in Airflow is SMTP, which requires username + password in plaintext (and
>>> these are not easily revokable).
>>>
>>> On the contrast, the current sendgrid integration in Airflow only needs
>>> an
>>> API key, which supports
>>> fine-grained permission control and can be easily revoked.
>>>
>>>
>>>
>>> On Wed, Oct 25, 2017 at 4:27 AM, Bolke de Bruin <bdbr...@gmail.com>
>>> wrote:
>>>
>>> > Code gets reviewed by a committer other than the author of the code.
>>> > Merging can happen by either.
>>> >
>>> > I also agree that the sendgrid dependency should be reverted and made
>>> into
>>> > a plugin instead.
>>> >
>>> > Bolke
>>> >
>>> > > On 25 Oct 2017, at 11:07, Ash Berlin-Taylor
>>> > > <ash_airflowlist@firemirror.com> wrote:
>>> > >
>>> > > 1. PRs generally aren't "approved" in Github if a core contributor
>>> > > looks at it and is happy
>>> > > 2. It was merged by criccomini, not the opener
>>> > > https://github.com/apache/incubator-airflow/commit/7cb818bbacb2a2695282471591a9e323d8efbf5c
>>> > > 3, 4, 5. I agree, and I made the same point
>>> > > https://issues.apache.org/jira/browse/AIRFLOW-1723
>>> > >
>>> > > Either way this is a bug that it always tries to import
>>> > > unconditionally and fails when the sendgrid mailer isn't used.
>>> > >
>>> > >> On 25 Oct 2017, at 09:48, Sumit Maheshwari <sumeet.ma...@gmail.com>
>>> > wrote:
>>> > >>
>>> > >> Sorry, mistakenly sent halfbaked:
>>> > >>
>>> > >> So, the concerns are:
>>> > >>
>>> > >> 1. Don't see any approver on the PR, neither +1s.
>>> > >> 2. The PR author and PR merging guy seem to be same, which I think
>>> is
>>> > >> against the general understanding.
>>> > >> 3. Why sendgrid got special privileges, and why not 100s of other
>>> mail
>>> > >> services? The Same concern was raised by Ash in JIRA as well.
>>> > >> 4. Why it was not coded as a plugin.
>>> > >> 5. Why there is a hard dependency to install Sendgrid module if I
>>> am not
>>> > >> using it.
>>> > >>
>>> > >> I think this commit needs to be reverted and a new PR should be
>>> raised,
>>> > >> which adds its as a plugin instead of a core feature.
>>> > >>
>>> > >>
>>> > >>
>>> > >> On Wed, Oct 25, 2017 at 2:12 PM, Sumit Maheshwari <
>>> > sumeet.ma...@gmail.com>
>>> > >> wrote:
>>> > >>
>>> > >>> When I fetched the latest master, installed it and ran webserver,
>>> it
>>> > >>> failed with this error:
>>> > >>>
>>> > >>> *ImportError: No module named sendgrid*
>>> > >>>
>>> > >>> On further investigation, I found that it has been introduced as
>>> part
>>> > of
>>> > >>> this PR <https://github.com/apache/incubator-airflow/pull/2695>
>>> and
>>> > this
>>> > >>> JIRA <https://issues.apache.org/jira/browse/AIRFLOW-1723>.
>>> > >>> Now couple of doubts:
>>> > >>>
>>> > >>> 1.
>>> > >>>
>>> > >>>
>>> > >>>
>>> > >>>
>>> > >>> Thanks,
>>> > >>> Sumit Maheshwari
>>> > >>> cell. 9632202950
>>> > >>>
>>> > >>>
>>> > >
>>> >
>>> >
>>>
>>
>>
>


Re: ImportError: No module named sendgrid

2017-10-25 Thread Feng Lu
Sorry for the glitch.

I am the author of this commit. I agree that if sendgrid is not used, it
doesn't need to be installed.
That being said, I can submit a quick fix to dynamically load this module.
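
A rough sketch of what such dynamic loading could look like, assuming the
backend is named by a dotted path (the helper names load_email_backend and
send_email are hypothetical, not the actual fix):

```python
import importlib


def load_email_backend(dotted_path):
    """Resolve 'package.module.callable' to the callable, importing on demand."""
    module_path, _, attr_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError("expected 'module.callable', got %r" % dotted_path)
    # The import happens only here, so a missing optional dependency
    # surfaces when the backend is used rather than at start-up.
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


def send_email(backend_path, to, subject, body):
    # Hypothetical entry point: look up the configured backend at call time.
    backend = load_email_backend(backend_path)
    return backend(to, subject, body)
```

With this shape, an installation that never selects the sendgrid backend never
imports the sendgrid package, so the webserver starts without it.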

Re: core vs plugin, I feel it really depends on whether the sendgrid email
integration is needed by users other than us. As I mentioned in the
corresponding JIRA issue, the other email alternative in Airflow is SMTP,
which requires a username + password in plaintext (and these are not easily
revocable).

In contrast, the current sendgrid integration in Airflow only needs an API
key, which supports fine-grained permission control and can be easily revoked.



On Wed, Oct 25, 2017 at 4:27 AM, Bolke de Bruin  wrote:

> Code gets reviewed by a committer other than the author of the code.
> Merging can happen by either.
>
> I also agree that the sendgrid dependency should be reverted and made into
> a plugin instead.
>
> Bolke
>
> > On 25 Oct 2017, at 11:07, Ash Berlin-Taylor wrote:
> >
> > 1. PRs generally aren't "approved" in Github if a core contributor looks
> > at it and is happy
> > 2. It was merged by criccomini, not the opener
> > https://github.com/apache/incubator-airflow/commit/7cb818bbacb2a2695282471591a9e323d8efbf5c
> > 3, 4, 5. I agree, and I made the same point
> > https://issues.apache.org/jira/browse/AIRFLOW-1723
> >
> > Either way this is a bug that it always tries to import unconditionally
> > and fails when the sendgrid mailer isn't used.
> >
> >> On 25 Oct 2017, at 09:48, Sumit Maheshwari 
> wrote:
> >>
> >> Sorry, mistakenly sent halfbaked:
> >>
> >> So, the concerns are:
> >>
> >> 1. Don't see any approver on the PR, neither +1s.
> >> 2. The PR author and PR merging guy seem to be same, which I think is
> >> against the general understanding.
> >> 3. Why sendgrid got special privileges, and why not 100s of other mail
> >> services? The Same concern was raised by Ash in JIRA as well.
> >> 4. Why it was not coded as a plugin.
> >> 5. Why there is a hard dependency to install Sendgrid module if I am not
> >> using it.
> >>
> >> I think this commit needs to be reverted and a new PR should be raised,
> >> which adds its as a plugin instead of a core feature.
> >>
> >>
> >>
> >> On Wed, Oct 25, 2017 at 2:12 PM, Sumit Maheshwari <
> sumeet.ma...@gmail.com>
> >> wrote:
> >>
> >>> When I fetched the latest master, installed it and ran webserver, it
> >>> failed with this error:
> >>>
> >>> *ImportError: No module named sendgrid*
> >>>
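> >>> On further investigation, I found that it has been introduced as part
> >>> of this PR <https://github.com/apache/incubator-airflow/pull/2695> and
> >>> this JIRA <https://issues.apache.org/jira/browse/AIRFLOW-1723>.
> >>> Now couple of doubts: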
> >>>
> >>> 1.
> >>>
> >>>
> >>>
> >>>
> >>> Thanks,
> >>> Sumit Maheshwari
> >>> cell. 9632202950
> >>>
> >>>
> >
>
>


Re: Meetup Interest?

2017-10-15 Thread Feng Lu
+1

We can give an update on task secret management in K8SExecutor and also
want to share our thoughts and get feedback on Airflow CI/CD with the set
of GCP operators/hooks as an example.

On Sat, Oct 14, 2017 at 7:06 PM, Marc Bollinger  wrote:

> +1
>
> We'd definitely be in. Would love to chat more about K8s/Airflow--Data Eng
> has been a little twitchy about being the guinea pigs in our org, but the
> production app is now serving all traffic from it, so we're planning out
> our strategy.
>
> On Fri, Oct 13, 2017 at 1:29 PM, Daniel Imberman (BLOOMBERG/ SAN FRAN) <
> dimber...@bloomberg.net> wrote:
>
> > +1
> >
> > We're getting really close on the Kubernetes Executor PR. Would love to
> > discuss final features/architecture to make sure we cover our bases
> before
> > we try to roll out alpha.
> >
> >
> > From: mw...@newrelic.com
> > Subject: Re: Meetup Interest?
> >
> > +1 for this meetup idea! We don't use Kube+Airflow, but I'd love to see
> > talks on scaling it out team-wise and some design patterns people have
> come
> > up with.
> >
> > --
> > Marc Weil | Lead Engineer | Growth Automation, Marketing, and Engagement
> |
> > New Relic
> > On Fri, Oct 13, 2017 at 1:03 PM, Christopher Bockman <
> > ch...@fathomhealth.co> wrote:
> >
> > +1 as a vote.
> >
> > We're very actively working on Kube+Airflow, so would be particularly
> > interested on discussions there.
> >
> > On Fri, Oct 13, 2017 at 12:59 PM, Joy Gao  wrote:
> >
> > > Hi Dan,
> > >
> > > I'd be happy to give an update on progress of the new RBAC UI we've
> been
> > > working on here at WePay.
> > >
> > > Cheers,
> > > Joy
> > >
> > > On Fri, Oct 13, 2017 at 12:10 PM, Dan Davydov <
> > > dan.davy...@airbnb.com.invalid> wrote:
> > >
> > > > Is there interest in doing an Airflow meet-up? Airbnb can host one in
> > San
> > > > Francisco.
> > > >
> > > > Some talk ideas can include the progress on Kubernetes integration
> and
> > > > Scaling & Operations with Airflow. If you want to see other topics
> > > covered,
> > > > feel free to suggest them!
> > > >
> > >
> >
> >
> >
>


Re: Airflow 1.9.0 status

2017-10-04 Thread Feng Lu
Thank you so much Chris!

On Tue, Oct 3, 2017 at 10:30 PM, Chris Riccomini <criccom...@apache.org>
wrote:

> I've added AIRFLOW-1635 to the v1-9-test branch. It's not in alpha0, but
> will be included in alpha1.
>
> On Tue, Oct 3, 2017 at 4:13 PM, Feng Lu <fen...@google.com.invalid> wrote:
>
> > Hi Chris,
> >
> > I know it's annoying to have a last-minute commit come in, but this is a
> > highly desirable feature for folks using GCP operators. Is it possible to
> > include AIRFLOW-1635
> > <https://github.com/apache/incubator-airflow/commit/b3e985a3146272ecfd3ceaaa0d8567e4e9e117d4>
> > in?
> > More than happy to offer help if there's something I can do.
> > Thanks a lot.
> >
> > Feng
> >
> > On Mon, Oct 2, 2017 at 3:24 PM, Chris Riccomini <criccom...@apache.org>
> > wrote:
> >
> > > Hey all,
> > >
> > > I have cut a 1.9.0alpha0 release of Airflow. You can download it here:
> > >
> > >   https://dist.apache.org/repos/dist/dev/incubator/airflow/1.9.0alpha0/
> > >
> > > The bin tarball can be installed with pip:
> > >
> > >   pip install apache-airflow-1.9.0alpha0+incubating-bin.tar.gz
> > >
> > > The goal is to have the community install and run this to expose any
> bugs
> > > before we move on to official release candidates.
> > >
> > > Here are the outstanding blocker bugs for 1.9.0:
> > >
> > >   AIRFLOW-1525 | Improvement | Fix minor LICENSE & NOTICE issue
> > >   AIRFLOW-1258 | Bug         | TaskInstances within SubDagOperator are marked as
> > >   AIRFLOW-1055 | Bug         | airflow/jobs.py:create_dag_run() exception for @on
> > >   AIRFLOW-1018 | Bug         | Scheduler DAG processes can not log to stdout
> > >   AIRFLOW-1013 | Bug         | airflow/jobs.py:manage_slas() exception for @once
> > >   AIRFLOW-976  | Bug         | Mark success running task causes it to fail
> > > Cheers,
> > > Chris
> > >
> > > On Fri, Sep 29, 2017 at 3:54 PM, Chris Riccomini <
> criccom...@apache.org>
> > > wrote:
> > >
> > > > Welp. Work got in the way, so I'll cut the beta on Monday. :)
> > > >
> > > > On Thu, Sep 28, 2017 at 1:32 PM, Chris Riccomini <
> > criccom...@apache.org>
> > > > wrote:
> > > >
> > > >> Works for me. Will try and cut a beta before end of week.
> > > >>
> > > >> Blockers for 1.9.0 are:
> > > >>
> > > > >> AIRFLOW-1611 | Bug         | Customize logging in Airflow
> > > > >> AIRFLOW-1525 | Improvement | Fix minor LICENSE & NOTICE issue
> > > > >> AIRFLOW-1258 | Bug         | TaskInstances within SubDagOperator are marked as
> > > > >> AIRFLOW-976  | Bug         | Mark success running task causes it to fail
> > > >>
> > > >>
> > > >> On Thu, Sep 28, 2017 at 1:09 PM, Bolke de Bruin <bdbr...@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> Hi Chris
> > > >>>
> > > >>> Can I suggest releasing a beta? The stable branch is only cut at RC
> > > >>> time. Betas allow us a broader exposure. It also gives us a point
> of
> > > >>> reference.
> > > >>>
> > > >>> In addition the list below are mostly longer standing issues that
> are
> > > >>> also part of the 1.8.x branch. Maybe only consider 1611, 1525,
> 1258,
> > > and
> > > >>> 976 as blocker?
> > > >>>
> > > >>> Cheers
> > > >>> Bolke
> > > >>>
> > > > >>> Sent from my iPad
> > > > >>>
> > > > >>> > On 28 Sep 2017, at 19:49, Chris Riccomini
> > > > >>> > <criccom...@apache.org> wrote:
> > > >>> >
> > > >>> > Hey all,
> > > >>> >
> > > >>> > I was planning to cut a 1.9.0 stable branch and 1.9.0 beta
> release,
> > > but
> > > >>> > seeing as there are several outstanding bugs, I'm going to delay.
> > > Here
> > > >>> are
> > > >>> > the bugs that I'm tracking:
> > > >>> >
> > > >>> &

Re: Airflow 1.9.0 status

2017-10-03 Thread Feng Lu
Hi Chris,

I know it's annoying to have a last-minute commit come in, but this is a
highly desirable feature for folks using GCP operators. Is it possible to
include AIRFLOW-1635
<https://github.com/apache/incubator-airflow/commit/b3e985a3146272ecfd3ceaaa0d8567e4e9e117d4>
in?
More than happy to offer help if there's something I can do.
Thanks a lot.

Feng

On Mon, Oct 2, 2017 at 3:24 PM, Chris Riccomini 
wrote:

> Hey all,
>
> I have cut a 1.9.0alpha0 release of Airflow. You can download it here:
>
>   https://dist.apache.org/repos/dist/dev/incubator/airflow/1.9.0alpha0/
>
> The bin tarball can be installed with pip:
>
>   pip install apache-airflow-1.9.0alpha0+incubating-bin.tar.gz
>
> The goal is to have the community install and run this to expose any bugs
> before we move on to official release candidates.
>
> Here are the outstanding blocker bugs for 1.9.0:
>
>   AIRFLOW-1525 | Improvement | Fix minor LICENSE & NOTICE issue
>   AIRFLOW-1258 | Bug         | TaskInstances within SubDagOperator are marked as
>   AIRFLOW-1055 | Bug         | airflow/jobs.py:create_dag_run() exception for @on
>   AIRFLOW-1018 | Bug         | Scheduler DAG processes can not log to stdout
>   AIRFLOW-1013 | Bug         | airflow/jobs.py:manage_slas() exception for @once
>   AIRFLOW-976  | Bug         | Mark success running task causes it to fail
>
> Cheers,
> Chris
>
> On Fri, Sep 29, 2017 at 3:54 PM, Chris Riccomini 
> wrote:
>
> > Welp. Work got in the way, so I'll cut the beta on Monday. :)
> >
> > On Thu, Sep 28, 2017 at 1:32 PM, Chris Riccomini 
> > wrote:
> >
> >> Works for me. Will try and cut a beta before end of week.
> >>
> >> Blockers for 1.9.0 are:
> >>
> >> AIRFLOW-1611 | Bug         | Customize logging in Airflow
> >> AIRFLOW-1525 | Improvement | Fix minor LICENSE & NOTICE issue
> >> AIRFLOW-1258 | Bug         | TaskInstances within SubDagOperator are marked as
> >> AIRFLOW-976  | Bug         | Mark success running task causes it to fail
> >>
> >>
> >> On Thu, Sep 28, 2017 at 1:09 PM, Bolke de Bruin 
> >> wrote:
> >>
> >>> Hi Chris
> >>>
> >>> Can I suggest releasing a beta? The stable branch is only cut at RC
> >>> time. Betas allow us a broader exposure. It also gives us a point of
> >>> reference.
> >>>
> >>> In addition the list below are mostly longer standing issues that are
> >>> also part of the 1.8.x branch. Maybe only consider 1611, 1525, 1258,
> and
> >>> 976 as blocker?
> >>>
> >>> Cheers
> >>> Bolke
> >>>
> >>> Sent from my iPad
> >>>
> >>> > On 28 Sep 2017, at 19:49, Chris Riccomini
> >>> > <criccom...@apache.org> wrote:
> >>> >
> >>> > Hey all,
> >>> >
> >>> > I was planning to cut a 1.9.0 stable branch and 1.9.0 beta release,
> but
> >>> > seeing as there are several outstanding bugs, I'm going to delay.
> Here
> >>> are
> >>> > the bugs that I'm tracking:
> >>> >
> >>> > AIRFLOW-1611 | Bug         | Customize logging in Airflow
> >>> > AIRFLOW-1525 | Improvement | Fix minor LICENSE & NOTICE issue
> >>> > AIRFLOW-1258 | Bug         | TaskInstances within SubDagOperator are marked as
> >>> > AIRFLOW-1055 | Bug         | airflow/jobs.py:create_dag_run() exception for @on
> >>> > AIRFLOW-1018 | Bug         | Scheduler DAG processes can not log to stdout
> >>> > AIRFLOW-1013 | Bug         | airflow/jobs.py:manage_slas() exception for @once
> >>> > AIRFLOW-988  | Bug         | SLA Miss Callbacks Are Repeated if Email is Not be
> >>> > AIRFLOW-976  | Bug         | Mark success running task causes it to fail
> >>> >
> >>> > These are the priority issues. Once they're merged, I'll cut the
> >>> > v1-9-stable and beta release.
> >>> >
> >>> > If you can help clean this up, that would be really appreciated.
> >>> >
> >>> > Cheers,
> >>> > Chris
> >>> >
> >>> > On Thu, Sep 28, 2017 at 10:06 AM, Chris Riccomini <
> >>> criccom...@apache.org>
> >>> > wrote:
> >>> >
> >>> >> Marked it for 1.9.0.
> >>> >>
> >>> >>> On Thu, Sep 28, 2017 at 9:56 AM, Charlie Jones 
> >>> wrote:
> >>> >>>
> >>> >>> Is there any chance we could include AIRFLOW-988 in 1.9.0? SLA
> >>> callbacks
> >>> >>> are not working correctly without emails... Its not a major bug,
> but
> >>> it
> >>> >>> does cause us some annoyance in our current deployment.
> >>> >>>
> >>> >>> Link to Jira:
> >>> >>> https://issues.apache.org/jira/browse/AIRFLOW-988
> >>> >>>
> >>> >>> Link to PR:
> >>> >>> https://github.com/apache/incubator-airflow/pull/2415
> >>> >>>
> >>> >>> Thanks!
> >>> >>> Charlie Jones
> >>> >>>
> >>> >>> CHARLIE JONES
> >>> >>> Data Engineer
> >>> >>> cjo...@simpli.fi  |  M: 972.821.7631
> >>> >>> __
> >>> >>>
> >>> >>>
> >>> >>> Programmatic Performance.* Localized.*
> >>> >>> __
> >>> >>>
> >>> >>> 1407 Texas Street  |  Suite 202  |  Fort Worth, TX 76102
> >>> >>> 

Re: RFC: Managing task credentials inside KubernetesExecutor

2017-09-12 Thread Feng Lu
Thank you Maxime for the confirmation, good suggestion on the use of policy
function!
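
For illustration, a policy function that forces credential settings could look
roughly like the sketch below. FakeTask, the "account" key and both account
names are hypothetical placeholders; only the executor_config attribute and
the policy-function hook come from the message quoted below.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTask:
    """Hypothetical stand-in for an operator that carries executor_config."""
    task_id: str
    executor_config: dict = field(default_factory=dict)


# Placeholder account names, assumed for this sketch only.
DEFAULT_ACCOUNT = "default-task-account"
PROD_ACCOUNT = "prod-task-account"


def policy(task):
    # Fill in a default when the DAG author specified nothing.
    task.executor_config.setdefault("account", DEFAULT_ACCOUNT)
    # Force a value in certain contexts, regardless of what the DAG asked for.
    if task.task_id.startswith("prod_"):
        task.executor_config["account"] = PROD_ACCOUNT
```

The cluster admin keeps control of which account a task may run as, while DAG
authors only name an account where they are allowed to choose one.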

On Mon, Sep 11, 2017 at 9:16 AM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> Hi,
>
> The proposal seems rational to me. `BaseOperator.executor_config` seems
> like a good [new] place to put this. I'd assume that in some environments
> there would be rules in the policy function
> <https://airflow.incubator.apache.org/concepts.html#cluster-policy> to
> force values in certain/all contexts.
>
> Max
>
> On Thu, Aug 31, 2017 at 10:17 PM, Feng Lu <fen...@google.com.invalid>
> wrote:
>
> > Sounds great, thanks a lot for setting up the meeting and will be there.
> >
> > On Thu, Aug 31, 2017 at 4:10 PM, Daniel Imberman <
> > daniel.imber...@gmail.com>
> > wrote:
> >
> > > Thank you for posting this to the wiki Feng Lu :).
> > >
> > > I'm going to propose an overall "airflow + kubernetes update" meeting in
> > > a separate email to discuss with the community at large. Would love it if
> > > you could discuss this further at that meeting!
> > >
> > > Daniel
> > >
> > > On Wed, Aug 30, 2017 at 10:38 PM Feng Lu <fen...@google.com.invalid>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > *TL;DR*
> > > > Airflow doesn't have adequate built-in support for managing per-task
> > > > credentials; the concept of a connection helps to a certain extent but
> > > > is not very satisfactory. The current Airflow KubernetesExecutor work
> > > > opens up the possibility to handle task credentials at the framework
> > > > level and separate workflow business logic from credential/account
> > > > management by leveraging the Kubernetes initializer mechanism. At the
> > > > end of the day, a task/dag only needs to specify an account name and
> > > > everything else is taken care of by the Airflow framework in a secure
> > > > fashion.
> > > >
> > > > Detailed design:
> > > >
> > > > https://cwiki.apache.org/confluence/display/AIRFLOW/Managing+Per-task+Credentials+in+KubernetesExecutor
> > > >
> > > > Critiques and comments are welcome :-)
> > > > Thank you.
> > > >
> > > > Feng
> > > >
> > >
> >
>


Re: Airflow + Kubernetes update meeting

2017-09-05 Thread Feng Lu
+1, either way works for me.

On Tue, Sep 5, 2017 at 10:10 AM, Chris Riccomini 
wrote:

> Works for me.
>
> On Tue, Sep 5, 2017 at 7:44 AM, Grant Nicholas wrote:
>
>> +1 for me if it works with others.
>>
>> On Mon, Sep 4, 2017 at 11:02 PM, Anirudh Ramanathan <
>> ramanath...@google.com> wrote:
>>
>>> Date/time work for me if we get quorum from this group.
>>>
>>> On Thu, Aug 31, 2017 at 7:54 PM, Christopher Bockman <
>>> ch...@fathomhealth.co> wrote:
>>>
>>>> Hi Daniel, would this be remote or in person?
>>>>
>>>>
>>>> On Aug 31, 2017 4:16 PM, "Daniel Imberman"
>>>> wrote:
>>>>
>>>> Hey guys!
>>>>
>>>> So I wanted to set up a meeting to discuss some of the updates/current
>>>> work that is going on with both the kubernetes operator and kubernetes
>>>> executor efforts. There has been some really cool updates/proposals on
>>>> the design of these two features and I would love to get some community
>>>> feedback to make sure that we are taking this in a direction that
>>>> benefits everyone.
>>>>
>>>> I am thinking of having this meeting at 10:00AM on Thursday, September
>>>> 7th PST. Would this time/place work?
>>>>
>>>> Thanks!
>>>>
>>>> Daniel
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Anirudh Ramanathan
>>>
>>
>>
>


Re: User delegation does not work on current GoogleCloudBaseHook

2017-09-01 Thread Feng Lu
That looks right to me.

Unfortunately the Python client lib, unlike the Java client lib, doesn't
support generating GoogleCredentials while impersonating another
user/service account.
Otherwise, the code could be much simplified and we would only need to deal
with GoogleCredentials.

Happy to take a look at your PR too, just @fenglu-g.
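
For reference, the delegation step proposed in the quoted message below can be
sketched as follows. FakeServiceAccountCredentials is a hypothetical stand-in
so the snippet runs on its own; the real hook would call create_delegated on
the credentials object it already builds, and get_credentials is an assumed
helper name.

```python
class FakeServiceAccountCredentials:
    """Hypothetical stand-in for a credentials object with create_delegated."""

    def __init__(self, subject=None):
        self.subject = subject

    def create_delegated(self, sub):
        # Return a new credentials object acting on behalf of `sub`,
        # leaving the original untouched.
        return FakeServiceAccountCredentials(subject=sub)


def get_credentials(base_credentials, delegate_to=None):
    # Only switch to delegated credentials when a delegate is configured.
    credentials = base_credentials
    if delegate_to:
        credentials = credentials.create_delegated(delegate_to)
    return credentials
```

When delegate_to is unset the original credentials are returned unchanged, so
hooks that never used delegation keep their current behaviour.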

On Thu, Aug 31, 2017 at 6:03 PM, Pras Srinivasan <
pras.sriniva...@glassdoor.com> wrote:

> I'm upgrading from airflow 1.7 to 1.8.2rc4. I noticed that the user
> delegation feature does not work for service accounts when inheriting from
> GoogleCloudBaseHook anymore.
>
> Older versions of this hook used to support delegation when
> SignedJwtAssertionCredentials was being used. Actually, the current code in
> master still has some code left over from when
> SignedJwtAssertionCredentials was being used. Specifically these lines
> (#68-#70) in gcp_api_base_hook.py:
>
> kwargs = {}
> if self.delegate_to:
>     kwargs['sub'] = self.delegate_to
>
> However, this information is not used anywhere and the _authorize method
> simply returns a HTTP object without allowing for delegation.
>
> I think the changes that need to be made are:
> 1) Remove lines 68-70
> 2) Add a couple of lines after line #83 that enable returning a delegated
> credential object:
> if self.delegate_to:
>     credentials = credentials.create_delegated(self.delegate_to)
>
> Can another dev please review/confirm that my understanding is correct? I'm
> happy to open a JIRA on Apache, as well as submit the fix.
>
> Thanks much!
> Pras
>


Re: RFC: Managing task credentials inside KubernetesExecutor

2017-08-31 Thread Feng Lu
Sounds great, thanks a lot for setting up the meeting and will be there.

On Thu, Aug 31, 2017 at 4:10 PM, Daniel Imberman <daniel.imber...@gmail.com>
wrote:

> Thank you for posting this to the wiki Feng Lu :).
>
> I'm going to propose an overall "airflow + kubernetes update" meeting in a
> separate email to discuss with the community at large. Would love it if you
> could discuss this further at that meeting!
>
> Daniel
>
> On Wed, Aug 30, 2017 at 10:38 PM Feng Lu <fen...@google.com.invalid>
> wrote:
>
> > Hi all,
> >
> > *TL;DR*
> > Airflow doesn't have adequate built-in support for managing per-task
> > credentials; the concept of a connection helps to a certain extent but is
> > not very satisfactory. The current Airflow KubernetesExecutor work opens
> > up the possibility to handle task credentials at the framework level and
> > separate workflow business logic from credential/account management by
> > leveraging the Kubernetes initializer mechanism. At the end of the day, a
> > task/dag only needs to specify an account name and everything else is
> > taken care of by the Airflow framework in a secure fashion.
> >
> > Detailed design:
> >
> > https://cwiki.apache.org/confluence/display/AIRFLOW/Managing+Per-task+Credentials+in+KubernetesExecutor
> >
> > Critiques and comments are welcome :-)
> > Thank you.
> >
> > Feng
> >
>


RFC: Managing task credentials inside KubernetesExecutor

2017-08-30 Thread Feng Lu
Hi all,

*TL;DR*
Airflow doesn't have adequate built-in support for managing per-task
credentials; the concept of a connection helps to a certain extent but is
not very satisfactory. The current Airflow KubernetesExecutor work opens up
the possibility to handle task credentials at the framework level and
separate workflow business logic from credential/account management by
leveraging the Kubernetes initializer mechanism. At the end of the day, a
task/dag only needs to specify an account name and everything else is taken
care of by the Airflow framework in a secure fashion.

Detailed design:
https://cwiki.apache.org/confluence/display/AIRFLOW/Managing+Per-task+Credentials+in+KubernetesExecutor

Critiques and comments are welcome :-)
Thank you.

Feng


Re: add a page to Airflow wiki site

2017-08-30 Thread Feng Lu
Thank you Siddharth!

On Wed, Aug 30, 2017 at 6:29 PM, siddharth anand <san...@apache.org> wrote:

> I keep forgetting the images are stripped... I'm guessing you are *fenglu-g*
> from the list shown in the image link.
>
> https://www.dropbox.com/s/1bvh7pd1revb3x5/Screenshot%202017-08-30%2018.26.53.png?dl=0
>
> On Wed, Aug 30, 2017 at 6:28 PM, siddharth anand <san...@apache.org>
> wrote:
>
>> I've granted you permissions. I'm guessing you are *fenglu-g*
>>
>>
>> [image: Inline image 1]
>>
>> On Wed, Aug 30, 2017 at 5:42 PM, Feng Lu <fen...@google.com.invalid>
>> wrote:
>>
>>> Hi,
>>>
>>> We would like to share a design proposal on the wiki page
>>> https://cwiki.apache.org/confluence/display/AIRFLOW. Unfortunately, it
>>> doesn't look like I have permission to do so; could someone (the
>>> committers?) kindly grant me edit access?
>>> Thank you.
>>>
>>> Feng
>>>
>>
>>
>


Adding airflow top-level label(s)

2017-06-06 Thread Feng Lu
Hi,

Resource labels/tags have gained increasing popularity; for example, both
GCP and AWS allow labels/tags to be attached to the cloud resources they
offer. I am wondering whether the Airflow community is interested in
adding some default top-level Airflow labels to all cloud resources spawned
from Airflow that support labeling. In other words, all operators
creating/using these resources would include the Airflow labels (e.g.,
"airflow-version" : "1.8.1").

The benefits are twofold:
- enable measuring the impact of apache airflow quantitatively at different
places.
- allow users to better organize airflow triggered cloud resources.

These top-level labels will then be merged with user-provided labels
specified in the various operators (e.g.
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/dataproc_operator.py#L55
).
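
A rough sketch of how such a merge could look. The names merge_labels
and AIRFLOW_DEFAULT_LABELS are hypothetical, as is the "team" label; the
proposal does not say which side wins when a key appears in both, so
letting the user-provided value take precedence is an assumption here.

```python
def merge_labels(default_labels, user_labels=None):
    """Combine framework-level labels with labels set on an operator.

    Assumption: on a key collision the user-provided value wins.
    """
    merged = dict(default_labels)      # copy so the defaults stay untouched
    merged.update(user_labels or {})   # user labels override defaults
    return merged


# Hypothetical top-level labels, using the example value from the proposal.
AIRFLOW_DEFAULT_LABELS = {"airflow-version": "1.8.1"}

labels = merge_labels(AIRFLOW_DEFAULT_LABELS, {"team": "data-eng"})
overridden = merge_labels(AIRFLOW_DEFAULT_LABELS, {"airflow-version": "custom"})
```

Some providers may restrict which characters are allowed in label values,
so a value such as "1.8.1" might need normalizing before it is attached.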

Thanks.

Feng


Re: Airflow 1.8.0 Release Candidate 1

2017-02-07 Thread Feng Lu
When num_runs is not explicitly specified, the default is set to -1 to
match the expectation of SchedulerJob.

Doing so also matches the type of num_runs ('int' in this case).
As a result, the scheduler will run non-stop regardless of whether dag files
are present (since the num_runs default is now -1: unlimited).
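
To make the "-1 means unlimited" convention concrete, here is a small
standalone sketch using plain argparse rather than Airflow's own cli.py.
The parser name and the should_continue helper are hypothetical; only
the -n/--num_runs flag, its int type and the -1 default are taken from
this thread.

```python
import argparse

parser = argparse.ArgumentParser(prog="scheduler-sketch")
parser.add_argument(
    "-n", "--num_runs",
    default=-1, type=int,
    help="Set the number of runs to execute before exiting")


def should_continue(completed_runs, num_runs):
    """Hypothetical loop guard: a negative num_runs means run forever."""
    return num_runs < 0 or completed_runs < num_runs


args_default = parser.parse_args([])          # flag omitted
args_three = parser.parse_args(["-n", "3"])   # explicit limit
```

With an int default the guard never has to compare a number against
None, which is the type-consistency point made above.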

Based on what Alex described, the import error doesn't look directly
related to this change.
Maybe this one?
https://github.com/apache/incubator-airflow/commit/67cbb966410226c1489bb730af3af45330fc51b9

I am still in the middle of running some quick tests using the Celery
executor, will update the thread once it's done.


On Tue, Feb 7, 2017 at 6:56 AM, Bolke de Bruin  wrote:

> Hey Alex,
>
> Thanks for tracking it down. Can you elaborate on what went wrong with
> celery? The lines below do not particularly relate to Celery directly, so I
> wonder why we are not seeing it with LocalExecutor?
>
> Cheers
> Bolke
>
> > On 7 Feb 2017, at 15:51, Alex Van Boxel  wrote:
> >
> > I have to give the RC1 a *-1*. I spent hours, or better days, getting the
> > RC running with Celery on our test environment, till I finally found the
> > commit that killed it:
> >
> > e7f6212cae82c3a3a0bc17bbcbc70646f67d02eb
> > [AIRFLOW-813] Fix unterminated unit tests in SchedulerJobTest
> > Closes #2032 from fenglu-g/master
> >
> > I was always looking at the wrong thing, because the commit only changes a
> > single default parameter from *None to -1*
> >
> > I do have the impression I'm the only one running with Celery. Are other
> > people running with it?
> >
> > *I propose* *reverting the commit*. Feng, can you elaborate on this
> > change?
> >
> > Changing the default back to *None* in cli.py finally got it working:
> >
> > 'num_runs': Arg(
> >     ("-n", "--num_runs"),
> >     default=None, type=int,
> >     help="Set the number of runs to execute before exiting"),
> >
> > Thanks.
> >
> > On Tue, Feb 7, 2017 at 3:49 AM siddharth anand 
> wrote:
> >
> > I did get 1.8.0 installed and running at Agari.
> >
> > I did run into 2 problems.
> > 1. Most of our DAGs broke due to the way Operators are now imported.
> > https://github.com/apache/incubator-airflow/blob/master/UPDATING.md#deprecated-features
> >
> > According to the documentation, these deprecations would only cause an
> > issue in 2.0. However, I needed to fix them now.
> >
> > So, I needed to change "from airflow.operators import PythonOperator" to
> > "from airflow.operators.python_operator import PythonOperator". Am I
> > missing something?
> >
> > 2. I ran into a migration problem that seems to have cleared itself up. I
> > did notice that some dags do not have the data in their "DAG Runs" column
> > on the overview page computed. I am looking into that issue presently.
> > https://www.dropbox.com/s/cn058mtu3vcv8sq/Screenshot%202017-02-06%2018.45.07.png?dl=0
> >
> > -s
> >
> > On Mon, Feb 6, 2017 at 4:30 PM, Dan Davydov wrote:
> >
> >> Bolke, attached is the patch for the cgroups fix. Let me know which
> >> branches you would like me to merge it to. If anyone has complaints
> >> about the patch let me know (but it does not touch the core of airflow,
> >> only the new cgroups task runner).
> >>
> >> On Mon, Feb 6, 2017 at 4:24 PM, siddharth anand 
> wrote:
> >>
> >>> Actually, I see the error is further down..
> >>>
> >>>   File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 469, in do_execute
> >>>     cursor.execute(statement, parameters)
> >>>
> >>> sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) null value in
> >>> column "dag_id" violates not-null constraint
> >>>
> >>> DETAIL:  Failing row contains (null, running, 1, f).
> >>>
> >>> [SQL: 'INSERT INTO dag_stats (state, count, dirty) VALUES (%(state)s,
> >>> %(count)s, %(dirty)s)'] [parameters: {'count': 1L, 'state': u'running',
> >>> 'dirty': False}]
> >>>
> >>> It looks like an autoincrement is missing for this table.
> >>>
> >>>
> >>> I'm running `SQLAlchemy==1.1.4` - I see our setup.py specifies any
> >>> version greater than 0.9.8
> >>>
> >>> -s
> >>>
> >>>
> >>>
> >>> On Mon, Feb 6, 2017 at 4:11 PM, siddharth anand 
> >>> wrote:
> >>>
> >>>> I tried upgrading to 1.8.0rc1 from 1.7.1.3 via pip install
> >>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/airflow-1.8.0rc1+apache.incubating.tar.gz
> >>>> and then running airflow upgradedb, which didn't quite work. First, I
> >>>> thought it completed successfully, then saw errors: some tables were
> >>>> indeed missing. I ran it again and encountered the following exception:
> >>>>
> >>>> DB: postgresql://app_coust...@db-cousteau.ep.stage.agari.com:5432/airflow
> >>>>
> >>>> [2017-02-07 00:03:20,309] {db.py:284} INFO - Creating tables
> >>>>
> >>>> INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
> >>>>
>  INFO  

Re: Airflow 1.8.0 Release Candidate 1

2017-02-07 Thread Feng Lu
Hi Alex-

Please see the attached screenshots of my local testing using
CeleryExecutor (on k8s as well).
Everything looks good and the workflow completed successfully.

Curious: did you also update the worker image?
Sorry for the confusion, happy to debug more if you could share your k8s
setup with me.

Feng

On Tue, Feb 7, 2017 at 8:37 AM, Feng Lu <fen...@google.com> wrote:

> When num_runs is not explicitly specified, the default is set to -1 to
> match the expectation of SchedulerJob.
>
> Doing so also matches the type of num_runs ('int' in this case).
> As a result, the scheduler will run non-stop regardless of whether dag files
> are present (since the num_runs default is now -1: unlimited).
>
> Based on what Alex described, the import error doesn't look directly
> related to this change.
> Maybe this one?
> https://github.com/apache/incubator-airflow/commit/67cbb966410226c1489bb730af3af45330fc51b9
>
> I am still in the middle of running some quick tests using the Celery
> executor, will update the thread once it's done.
>
>
> On Tue, Feb 7, 2017 at 6:56 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:
>
>> Hey Alex,
>>
>> Thanks for tracking it down. Can you elaborate on what went wrong with
>> celery? The lines below do not particularly relate to Celery directly, so I
>> wonder why we are not seeing it with LocalExecutor?
>>
>> Cheers
>> Bolke
>>
>> > On 7 Feb 2017, at 15:51, Alex Van Boxel <a...@vanboxel.be> wrote:
>> >
>> > I have to give the RC1 a *-1*. I spent hours, or better days, getting the
>> > RC running with Celery on our test environment, till I finally found the
>> > commit that killed it:
>> >
>> > e7f6212cae82c3a3a0bc17bbcbc70646f67d02eb
>> > [AIRFLOW-813] Fix unterminated unit tests in SchedulerJobTest
>> > Closes #2032 from fenglu-g/master
>> >
>> > I was always looking at the wrong thing, because the commit only changes
>> > a single default parameter from *None to -1*
>> >
>> > I do have the impression I'm the only one running with Celery. Are other
>> > people running with it?
>> >
>> > *I propose* *reverting the commit*. Feng, can you elaborate on this
>> > change?
>> >
>> > Changing the default back to *None* in cli.py finally got it working:
>> >
>> > 'num_runs': Arg(
>> >     ("-n", "--num_runs"),
>> >     default=None, type=int,
>> >     help="Set the number of runs to execute before exiting"),
>> >
>> > Thanks.
>> >
>> > On Tue, Feb 7, 2017 at 3:49 AM siddharth anand <san...@apache.org>
>> wrote:
>> >
>> > I did get 1.8.0 installed and running at Agari.
>> >
>> > I did run into 2 problems.
>> > 1. Most of our DAGs broke due to the way Operators are now imported.
>> > https://github.com/apache/incubator-airflow/blob/master/UPDATING.md#deprecated-features
>> >
>> > According to the documentation, these deprecations would only cause an
>> > issue in 2.0. However, I needed to fix them now.
>> >
>> > So, I needed to change "from airflow.operators import PythonOperator" to
>> > "from airflow.operators.python_operator import PythonOperator". Am I
>> > missing something?
>> >
>> > 2. I ran into a migration problem that seems to have cleared itself up.
>> > I did notice that some dags do not have the data in their "DAG Runs"
>> > column on the overview page computed. I am looking into that issue
>> > presently.
>> > https://www.dropbox.com/s/cn058mtu3vcv8sq/Screenshot%202017-02-06%2018.45.07.png?dl=0
>> >
>> > -s
>> >
>> > On Mon, Feb 6, 2017 at 4:30 PM, Dan Davydov <dan.davy...@airbnb.com.invalid>
>> > wrote:
>> >
>> >> Bolke, attached is the patch for the cgroups fix. Let me know which
>> >> branches you would like me to merge it to. If anyone has complaints
>> >> about the patch let me know (but it does not touch the core of airflow,
>> >> only the new cgroups task runner).
>> >>
>> >> On Mon, Feb 6, 2017 at 4:24 PM, siddharth anand <san...@apache.org>
>> wrote:
>> >>
>> >>> Actually, I see the error is further down..
>> >>>
>> >>>  File
>> >>> "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/
>> default.py",
>> >>> line
>> >>> 469, in do_e