Re: Airflow Meetup at Stripe in September?

2016-08-04 Thread siddharth anand
You can also ask on this list whether folks are interested in speaking.
Also, if you folks ever want to speak at other meet-ups, you're more than
welcome to.

-s

On Thu, Aug 4, 2016 at 7:39 PM, Siddharth Anand 
wrote:

> Jeff,
> That sounds great.  We encourage any and all in the community to host
> Airflow meet-ups.
>
> We recommend the following meet-up format:
> https://cwiki.apache.org/confluence/display/AIRFLOW/Meetups
>
> Though the format is a suggestion, we do prefer you shoot for 1 host
> speaker and 2 guest speakers.
>
> When you are ready (i.e. confirmed with our organization of the date, time,
> venue, and 3 speakers), shoot us a note on this list and one of the
> committers will set up the Meet-up and tweet/announce the meet-up using the
> Airflow twitter channel. We also suggest, as mentioned on the page, that
> you stream and record the video for posterity.
>
> -s
>
> On Thu, Aug 4, 2016 at 6:40 PM, Jeff Balogh  wrote:
>
> > Hey y'all, we're heavy users of Airflow at Stripe and we'd be happy to
> > host a meetup in September. Our office is in San Francisco down by
> > AT&T Park. How do we make that happen?
> >
>


Re: Airflow Meetup at Stripe in September?

2016-08-04 Thread Siddharth Anand
Jeff,
That sounds great.  We encourage any and all in the community to host
Airflow meet-ups.

We recommend the following meet-up format:
https://cwiki.apache.org/confluence/display/AIRFLOW/Meetups

Though the format is a suggestion, we do prefer you shoot for 1 host
speaker and 2 guest speakers.

When you are ready (i.e. confirmed with our organization of the date, time,
venue, and 3 speakers), shoot us a note on this list and one of the
committers will set up the Meet-up and tweet/announce the meet-up using the
Airflow twitter channel. We also suggest, as mentioned on the page, that
you stream and record the video for posterity.

-s

On Thu, Aug 4, 2016 at 6:40 PM, Jeff Balogh  wrote:

> Hey y'all, we're heavy users of Airflow at Stripe and we'd be happy to
> host a meetup in September. Our office is in San Francisco down by
> AT&T Park. How do we make that happen?
>


Airflow Meetup at Stripe in September?

2016-08-04 Thread Jeff Balogh
Hey y'all, we're heavy users of Airflow at Stripe and we'd be happy to
host a meetup in September. Our office is in San Francisco down by
AT&T Park. How do we make that happen?


Airflow Developers Meeting - 08/03 Notes

2016-08-04 Thread Gurer Kiratli
Agenda


   - Committers sync-up: progress and plans
      - Max to do a recap since the last release
      - Airbnb
      - Anyone else?
   - Cooperation Best Practices
   - Solicit feedback on the Impersonation Design Review
   - Release Schedule, Management
   - Roadmap discussion


Attendees: Jeremiah Lowin, Chris Riccomini, Andrew Phillips(?), Maxime
Beauchemin, Paul Yang, Dan Davydov, Xuanji Li, George Ke, Arthur Wiedmer,
Gurer Kiratli

Notes


   - Recap for the overall project
      - [Max]
        https://gist.github.com/mistercrunch/5460483ec764e2a1cb816c6b1d6ad5a3
   - Airbnb Recap
      - [Paul] Airbnb is continuing work on features related to migrating
        more jobs onto Airflow: DB connection scalability, impersonation,
        refreshing web UI, DAG git versioning, task resource isolation.
   - Other Recaps
      - [Chris] Improvements to Import functionality. Joy did some work
        around CLIs and mark success in collaboration with Bolke.
      - [Jeremiah] Came up with the Merge tool. Focusing on the work around
        configuration.
   - Meeting cadence
      - [Max] We can have monthly meetings. Let’s start with monthly; we can
        change the cadence later. We should invite all contributors and
        specify in the invite that everyone is welcome. Currently this is
        posted on the mailing list and wiki.
      - *Action Item* Gurer to schedule monthly meetings; the next one is at
        Airbnb. Airbnb welcomes folks to come onsite.
   - Blog/Documentation
      - [Max, Chris, Jeremiah] Having a blog would be useful; it would help us
        structure our thoughts.
      - [Andrew] A blog needs well-written posts with screenshots and so on.
        This raises the effort bar and might mean that nobody really does it.
      - [Max] Maybe use Medium? We can try it out.
      - [Max] I haven’t looked at the documentation in a long time, so I don’t
        really know about its quality.
      - [Arthur] Judging by the responses on the email list, the documentation
        seems to be missing certain elements.
      - [Max] Maybe we can interview people who just started and identify
        gaps.
      - *Action Item* [Jeremiah] We can also ask the mailing list about this.
        Jeremiah will send this out.
   - Cooperation guidelines
      - [Max, Arthur]
        https://cwiki.apache.org/confluence/display/AIRFLOW/Community+Guidelines
      - [Chris] Let’s link the guidelines from the contributor md so they are
        discoverable.
   - Release
      - [Chris, Max] Bolke might have a branch with the Highcharts and
        licensing changes cherry-picked. We can use it as a dry run to test
        the process.
      - [Max] The big release is probably around September.
      - [Andrew] If we want this to be a true practice run, we need to follow
        the Apache guidelines diligently.
      - [Max] We might want to revive the email thread about how we should do
        this.
      - [Chris] It would be great for someone who hasn’t done this before to
        do it.
      - [Andrew] Building a script for this is smart; otherwise, from
        experience, the manual work is painful.
        https://cwiki.apache.org/confluence/display/JCLOUDS/Releasing+jclouds
        https://cwiki.apache.org/confluence/display/JCLOUDS/Validate+a+Release
      - *Action Item* [Max] Max will own this. This will happen by 08/19.
      - [Arthur] The main release is around mid-September. We need to confirm
        this date.
   - Roadmap
      - [Max] Maybe each company puts their own section? Or should we have a
        more cohesive roadmap?
      - *Action Item* Gurer to schedule a roadmap discussion.
   - Misc
      - [Max] For the executor, thinking about sticking with Celery and maybe
        implementing Docker on top of it.
      - [Max] We might want to explore Sphinx for generating documentation
        from code.
      - [Max] Please review the impersonation design. It is pretty simple and
        reversible.


Re: DAG status still running when all its tasks are complete

2016-08-04 Thread Nadeem Ahmed Nazeer
Hi Siddharth,

AIRFLOW-396 has been assigned to you with the requested information.
Thanks for your help.

Please let me know if any further information is required.

Thanks,
Nadeem

On Tue, Aug 2, 2016 at 10:27 PM, siddharth anand  wrote:

> Hi Nadeem,
> Can you open a JIRA, attach a DAG which I can run to reproduce your issue,
> and assign the JIRA to me?
> -s
>
> On Tue, Aug 2, 2016 at 8:40 PM, Nadeem Ahmed Nazeer 
> wrote:
>
> > Could someone please shed some light on this DAG status?
> >
> > My Airflow version is 1.7.0. This is the only version whose scheduler
> > works for me. With any later version, the scheduler gets stuck without a
> > trace and doesn't schedule anything.
> >
> > Thanks,
> > Nadeem
> >
> > On Mon, Aug 1, 2016 at 2:29 PM, Nadeem Ahmed Nazeer
> > wrote:
> >
> > > Hello,
> > >
> > > I am facing a situation where Airflow doesn't flag DAGs as success
> > > even though all of the tasks in the DAG are complete.
> > >
> > > I have a BranchPythonOperator which forks into either running all
> > > downstream tasks or just a single task (a dummy operator as an
> > > endpoint), depending on whether there are files to be processed for
> > > that cycle.
> > >
> > > For the DAG runs that go to the dummy operator, the status of the DAG
> > > always shows running even though it is complete. I can't figure out
> > > what is stopping the scheduler from marking these runs as success.
> > > Since they stay in the running state, the scheduler keeps checking
> > > their status every time, which is unnecessary.
> > >
> > > Please advise.
> > >
> > > Thanks,
> > > Nadeem
> > >
> > >
> >
>
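A minimal sketch of the setup Nadeem describes, written as plain Python so it runs without Airflow installed. `choose_branch` is the kind of callable a BranchPythonOperator takes: it returns the task_id to follow, and with Airflow it would be passed as `python_callable`, with `input_dir` supplied through `op_kwargs`. `expected_run_state` is a toy model of the outcome he expects, not Airflow's scheduler logic. Once the untaken branch is skipped and every other task has succeeded, the run should count as success. The task ids `process_files` and `no_files` and the "at least one file in a directory" check are hypothetical.

```python
import os

# Hypothetical task ids; substitute the ids used in the real DAG.
PROCESS_TASK_ID = "process_files"  # first task of the "files present" path
NO_FILES_TASK_ID = "no_files"      # the DummyOperator endpoint

# Toy state sets; this models the expected outcome, it is not Airflow code.
FINISHED_OK = {"success", "skipped"}
FAILED = {"failed", "upstream_failed"}


def choose_branch(input_dir):
    """Return the task_id to follow, as a BranchPythonOperator callable does.

    Assumption: "files to process" means at least one regular file in input_dir.
    """
    if os.path.isdir(input_dir):
        for name in os.listdir(input_dir):
            if os.path.isfile(os.path.join(input_dir, name)):
                return PROCESS_TASK_ID
    return NO_FILES_TASK_ID


def expected_run_state(task_states):
    """Map {task_id: state} to the run state the thread expects to see."""
    states = list(task_states.values())
    if any(state in FAILED for state in states):
        return "failed"
    if all(state in FINISHED_OK for state in states):
        return "success"
    # Anything queued, running, or without a state keeps the run open.
    return "running"


if __name__ == "__main__":
    # A cycle with no input files: the branch picks the dummy endpoint and
    # the task on the other path is skipped, so every task is finished.
    states = {
        "branch": "success",
        NO_FILES_TASK_ID: "success",
        PROCESS_TASK_ID: "skipped",
    }
    print(expected_run_state(states))  # prints: success
```

A DAG like this, attached to the JIRA, is what lets someone reproduce the case where a run stays "running" although the model above says it should be finished.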


Broken unit tests -- request for help

2016-08-04 Thread Jeremiah Lowin
We have a few non-deterministic unit test failures that are affecting many
-- but not all -- PRs. I believe they are being ignored as "unrelated" but
they have the potential to mask real issues and should be addressed.
Unfortunately they're out of my expertise so I'm going to list the ones
I've identified and hope someone smarter than me can see if they can help!

In particular, we have a number of simple PRs that should obviously have
no problems (typos, readme edits, etc.) that are nonetheless failing tests,
causing frustration for all. Here is one from just this morning:
https://github.com/apache/incubator-airflow/pull/1705/files

Thanks in advance!

1. Python 3 Mysql (this one is pretty common), due to not being able to
find "beeline" which I believe is related to Hive. This is the error:

======================================================================
ERROR: test_mysql_to_hive_partition (tests.TransferTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/apache/incubator-airflow/tests/operators/operators.py", line 208, in test_mysql_to_hive_partition
    t.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE, force=True)
  File "/home/travis/build/apache/incubator-airflow/airflow/models.py", line 2350, in run
    force=force,)
  File "/home/travis/build/apache/incubator-airflow/airflow/utils/db.py", line 54, in wrapper
    result = func(*args, **kwargs)
  File "/home/travis/build/apache/incubator-airflow/airflow/models.py", line 1388, in run
    result = task_copy.execute(context=context)
  File "/home/travis/build/apache/incubator-airflow/airflow/operators/mysql_to_hive.py", line 131, in execute
    recreate=self.recreate)
  File "/home/travis/build/apache/incubator-airflow/airflow/hooks/hive_hooks.py", line 322, in load_file
    self.run_cli(hql)
  File "/home/travis/build/apache/incubator-airflow/airflow/hooks/hive_hooks.py", line 212, in run_cli
    cwd=tmp_dir)
  File "/opt/python/3.4.2/lib/python3.4/subprocess.py", line 858, in __init__
    restore_signals, start_new_session)
  File "/opt/python/3.4.2/lib/python3.4/subprocess.py", line 1456, in _execute_child
    raise child_exception_type(errno_num, err_msg)
nose.proxy.FileNotFoundError: [Errno 2] No such file or directory: 'beeline'
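The traceback shows subprocess failing to find `beeline`, the Hive command-line client, on the Travis worker. One option is to skip Hive-dependent tests when the binary is not on PATH, so unrelated PRs see a skip instead of an error. Below is a minimal sketch using only the standard library. The class is a hypothetical stand-in, not the real `tests.TransferTests`.

```python
import os
import shutil
import unittest

# Resolve the Hive CLI once; shutil.which returns None if it is not on PATH.
BEELINE = shutil.which("beeline")


class MySqlToHiveGuardExample(unittest.TestCase):  # hypothetical stand-in
    @unittest.skipIf(BEELINE is None, "beeline (Hive CLI) not found on PATH")
    def test_mysql_to_hive_partition(self):
        # Reached only when beeline is installed. The real test would run the
        # MySQL-to-Hive transfer task here; this placeholder only checks that
        # the resolved binary is executable.
        self.assertTrue(os.access(BEELINE, os.X_OK))


if __name__ == "__main__":
    unittest.main()
```

Skipping trades coverage for a stable signal. Installing the Hive client in the CI environment is the alternative that keeps the test meaningful.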


2. Python 3 Postgres (this one is really infrequent):

======================================================================
FAIL: Test that ignore_first_depends_on_past doesn't affect results
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/apache/incubator-airflow/tests/jobs.py", line 349, in test_dagrun_deadlock_ignore_depends_on_past
    run_kwargs=dict(ignore_first_depends_on_past=True))
  File "/home/travis/build/apache/incubator-airflow/airflow/utils/db.py", line 54, in wrapper
    result = func(*args, **kwargs)
  File "/home/travis/build/apache/incubator-airflow/tests/jobs.py", line 221, in evaluate_dagrun
    self.assertEqual(ti.state, expected_state)
nose.proxy.AssertionError: None != 'success'
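Because this failure is rare, reproducing it usually means running the one test many times against the same backend. Below is a minimal sketch of a rerun helper built only on `unittest`. The name `rerun_test` is hypothetical and not part of Airflow's test tooling.

```python
import unittest


def rerun_test(test_case_class, method_name, runs=50):
    """Run one test method `runs` times and return how many runs failed.

    Each run gets a fresh TestCase instance, so setUp/tearDown execute every
    time; wrapping it in a TestSuite also runs setUpClass/tearDownClass.
    """
    bad_runs = 0
    for _ in range(runs):
        result = unittest.TestResult()
        unittest.TestSuite([test_case_class(method_name)]).run(result)
        if not result.wasSuccessful():
            bad_runs += 1
    return bad_runs
```

For example, `rerun_test(SomeJobTest, "test_dagrun_deadlock_ignore_depends_on_past", runs=200)` could be run with the Postgres backend configured. Here `SomeJobTest` is a placeholder for whichever class in tests/jobs.py defines that method. A nonzero count confirms the flakiness locally without waiting on Travis.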

3. Mysql (py2 and py3, infrequent). This appears to happen when the
SLA code is called with MySQL. Bizarrely, this doesn't appear to
actually raise an error in the test -- it just prints a logging error.
It must be trapped somewhere.

ERROR [airflow.jobs.SchedulerJob] Boolean value of this clause is not defined
Traceback (most recent call last):
  File "/home/travis/build/apache/incubator-airflow/airflow/jobs.py", line 667, in _do_dags
    self.manage_slas(dag)
  File "/home/travis/build/apache/incubator-airflow/airflow/utils/db.py", line 53, in wrapper
    result = func(*args, **kwargs)
  File "/home/travis/build/apache/incubator-airflow/airflow/jobs.py", line 301, in manage_slas
    .all()
  File "/home/travis/build/apache/incubator-airflow/.tox/py34-cdh-airflow_backend_mysql/lib/python3.4/site-packages/sqlalchemy/sql/elements.py", line 2760, in __bool__
    raise TypeError("Boolean value of this clause is not defined")
TypeError: Boolean value of this clause is not defined
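Some context on the message itself. Most SQLAlchemy clause objects raise this TypeError when Python asks for their truth value. The error therefore typically means a clause went through Python's `and`, `or`, `not`, or an `if` while the query was being built, where SQLAlchemy's `and_()`, `or_()`, `not_()` (or the overloaded `&`, `|`, `~`) were needed. This thread does not establish which expression in `manage_slas` triggers it. The toy `Clause` class below is a standard-library stand-in that reproduces the mechanism; it is not SQLAlchemy.

```python
class Clause:
    """Toy stand-in for a SQL expression object; this is not SQLAlchemy."""

    def __init__(self, sql):
        self.sql = sql

    def __and__(self, other):
        # Operator overloading builds a bigger expression instead of evaluating.
        return Clause("({} AND {})".format(self.sql, other.sql))

    def __or__(self, other):
        return Clause("({} OR {})".format(self.sql, other.sql))

    def __bool__(self):
        # A SQL condition has no Python truth value until the database runs it.
        raise TypeError("Boolean value of this clause is not defined")


def and_(*clauses):
    """Combine clauses without ever asking Python for their truth value."""
    combined = clauses[0]
    for clause in clauses[1:]:
        combined = combined & clause
    return combined


if __name__ == "__main__":
    # Hypothetical column names, for illustration only.
    late = Clause("end_date > deadline")
    unsent = Clause("notified = false")
    print(and_(late, unsent).sql)  # prints: (end_date > deadline AND notified = false)
    try:
        late and unsent  # Python's `and` calls late.__bool__() first
    except TypeError as exc:
        print(exc)  # prints: Boolean value of this clause is not defined
```

Grepping the SLA query for bare `and`/`or`/`not` applied to column expressions is one way to narrow this down.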