Task moving from RUNNING to SUCCEEDED to REMOVED

2018-05-16 Thread Nasron Cheong
Hi,

I'm trying to debug an issue where a task in my DAG is being set to the REMOVED
state, while the tasks after it stay in the running state, waiting for the
preceding tasks to succeed or fail.

If I delete the removed tasks from the meta database, the tasks will start
running, move to succeeded state, but then promptly move to REMOVED state
again.

A change that may have caused this was to change the dag from:

latest_only >> cleanup_hdfs >> run_mr
run_mr >> refresh_tables >> refresh_done_file
run_mr >> refresh_global_table >> refresh_done_file
refresh_done_file >> reap_old

to:

latest_only >> cleanup_hdfs >> run_mr
run_mr >> refresh_tables >> *compute_stats_orgs* >> refresh_done_file
run_mr >> refresh_global_table >> *compute_stats_global* >> refresh_done_file
refresh_done_file >> reap_old

Where the highlighted tasks were added.

Any idea what would cause this?

I've tried clearing all task instances for the DAG and restarting all
services, with no luck.

I'm on 1.9.0

Thanks!

- Nasron
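
For readers following the thread, the updated wiring described above can be modeled without Airflow. This is only a sketch: the `Task` class below is a hypothetical stand-in that records edges and is not Airflow's operator API. It shows what each task ends up waiting on after the two compute_stats tasks are inserted.

```python
class Task:
    """Hypothetical stand-in for an Airflow operator; it only records edges."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()
        self.downstream = set()

    def __rshift__(self, other):
        # a >> b means "b runs after a"; returning b lets chains read left to right.
        self.downstream.add(other.task_id)
        other.upstream.add(self.task_id)
        return other


task_ids = [
    "latest_only", "cleanup_hdfs", "run_mr", "refresh_tables",
    "refresh_global_table", "compute_stats_orgs", "compute_stats_global",
    "refresh_done_file", "reap_old",
]
t = {task_id: Task(task_id) for task_id in task_ids}

# The updated wiring from the message, including the two added tasks.
t["latest_only"] >> t["cleanup_hdfs"] >> t["run_mr"]
t["run_mr"] >> t["refresh_tables"] >> t["compute_stats_orgs"] >> t["refresh_done_file"]
t["run_mr"] >> t["refresh_global_table"] >> t["compute_stats_global"] >> t["refresh_done_file"]
t["refresh_done_file"] >> t["reap_old"]

print(sorted(t["refresh_done_file"].upstream))
```

In this model refresh_done_file waits only on the two new compute_stats tasks, so it cannot proceed while either of them is stuck in a non-terminal state.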


Re: tabulate 0.82

2018-05-16 Thread Ruslan Dautkhanov
https://issues.apache.org/jira/browse/AIRFLOW-2476
https://github.com/apache/incubator-airflow/pull/3366/

Thanks!



-- 
Ruslan Dautkhanov

On Tue, Jan 16, 2018 at 11:51 PM, Bolke de Bruin  wrote:

> It is a protection against major upgrades that are not backwards
> compatible. Please test and provide a PR if it is ok.
>
> Cheers
> Bolke
>
> Sent from my iPad
>
> > On 16 Jan 2018 at 22:36, Ruslan Dautkhanov wrote:
> >
> > We have tabulate 0.8.2, but the requirements demand tabulate<0.8.0,>=0.7.5
> >
> > Are there known issues with tabulate versions higher than 0.8.0?
> >
> >
> > $ airflow kerberos -D
> >> Traceback (most recent call last):
> >>  File "/opt/cloudera/parcels/Anaconda/bin/airflow", line 4, in
> >>    __import__('pkg_resources').require('apache-airflow==1.10.0.dev0+incubating')
> >>  File "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 2985, in
> >>  File "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 2971, in _call_aside
> >>  File "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 2998, in _initialize_master_working_set
> >>  File "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 662, in _build_master
> >>  File "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 675, in _build_from_requirements
> >>  File "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 854, in resolve
> >> pkg_resources.DistributionNotFound: The 'tabulate<0.8.0,>=0.7.5' distribution was not found and is required by apache-airflow
> >
> >
> >
> > --
> > Ruslan Dautkhanov
>
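
To see why the installed tabulate 0.8.2 trips the pin quoted above, here is a minimal sketch of the range check. It is a deliberate simplification: it handles only plain dotted numeric versions and is not the PEP 440 logic that pkg_resources actually applies. The helper names and the 0.7.7 sample value are made up for illustration.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '0.8.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, lower_inclusive, upper_exclusive):
    """Check lower_inclusive <= version < upper_exclusive on numeric tuples."""
    v = parse_version(version)
    return parse_version(lower_inclusive) <= v < parse_version(upper_exclusive)


# The pin from the traceback: tabulate<0.8.0,>=0.7.5
installed_ok = satisfies("0.8.2", "0.7.5", "0.8.0")  # the version from the thread
older_ok = satisfies("0.7.7", "0.7.5", "0.8.0")      # an arbitrary in-range value

print(installed_ok, older_ok)
```

The first check fails because (0, 8, 2) is not below (0, 8, 0), which matches the DistributionNotFound error in the quoted traceback.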


Re: Python3 and sensors module

2018-05-16 Thread Joy Gao
Not a full patch to replace snakebite with libhdfs, but should temporarily
unblock ci/cd: https://github.com/apache/incubator-airflow/pull/3365

On Wed, May 16, 2018 at 10:00 AM, Bolke de Bruin  wrote:

> Snakebite is not py3 compatible. We should move to libhdfs. Happy to take
> patches!
>
> B.
>
> Sent from my iPad
>
> > On 16 May 2018 at 18:57, Cindy Rottinghuis <cindyrottingh...@gmail.com> wrote:
> >
> > Here is a sample of what I am seeing.
> > I am running on Ubuntu with Python 3.5. This is a subset of the full message.
> > Once it went into the snakebite package, I checked, and the library wasn’t
> > supported in Python 3.
> > ….
> > File "/usr/local/airflow/dags/dim_date.py", line 5, in
> >    from airflow.operators.sensors import S3KeySensor
> >  File "/usr/local/lib/python3.5/dist-packages/airflow/operators/sensors.py", line 34, in
> >    from airflow.hooks.hdfs_hook import HDFSHook
> >  File "/usr/local/lib/python3.5/dist-packages/airflow/hooks/hdfs_hook.py", line 20, in
> >    from snakebite.client import Client, HAClient, Namenode, AutoConfigClient
> >  File "/usr/local/lib/python3.5/dist-packages/snakebite/client.py", line 1473
> >    baseTime = min(time * (1L << retries), cap);
> >    ^
> >
> >
> >> On May 16, 2018, at 9:24 AM, Cindy Rottinghuis <cindyrottingh...@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> Yes, it looks like all of the other sensors will work under Python 3, but not
> >> HDFS. I am planning to use the S3 sensor, which is wrapped up in the
> >> sensors.py module. My issue is that when I test my DAG or Airflow
> >> installation under Python 3, I get errors about the hdfs_hook, which I’m not
> >> using. Other than creating my own version of the sensors.py file and
> >> removing the HDFS-related functions/libraries, is there anything else I can
> >> do to work around this?
> >>
> >>
> >>
> >>> On May 16, 2018, at 1:33 AM, Driesprong, Fokko wrote:
> >>>
> >>> Hi Cindy,
> >>>
> >>> The other sensors should work under Python3. We try to support Python3 as
> >>> much as possible, but sometimes libraries are used that are not compatible.
> >>> Could you describe what you are running into?
> >>>
> >>> Cheers, Fokko
> >>>
> >>> 2018-05-16 5:36 GMT+02:00 Cindy Rottinghuis <cindyrottingh...@gmail.com>:
> >>>
> >>>> Hi,
> >>>>
> >>>> Are there any plans to update the HDFS_hook.py script to remove the
> >>>> reference to the snakebite python library? I’d like to run airflow on
> >>>> python3, and this is causing some issues.   The hdfs_hook script is
> >>>> referenced in the sensors module.
> >>>>
> >>>> Any suggestions?
> >>>>
> >>>> Thanks,
> >>>> Cindy
> >>
> >
>
>
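
On the question of working around an import that is not needed: one general pattern is to treat the dependency as optional and only fail when it is actually used. The sketch below illustrates that pattern in isolation. It is not what the linked pull request does, and `optional_import` and `require` are hypothetical helper names, not part of Airflow.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it cannot be loaded.

    SyntaxError is caught as well because a Python 2 only package fails
    that way when Python 3 tries to compile it.
    """
    try:
        return importlib.import_module(module_name)
    except (ImportError, SyntaxError):
        return None


def require(module, name):
    """Raise a clear error at the point of use instead of at import time."""
    if module is None:
        raise RuntimeError("%s is not available on this interpreter" % name)
    return module


json_mod = optional_import("json")                               # ships with Python
missing_mod = optional_import("hypothetical_missing_module_xyz")  # made-up name

print(json_mod is not None, missing_mod is None)
```

With this shape, code that never touches the optional module keeps importing cleanly, and only callers that really need it see an error.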


Re: Python3 and sensors module

2018-05-16 Thread Bolke de Bruin
Snakebite is not py3 compatible. We should move to libhdfs. Happy to take 
patches!

B.

Sent from my iPad

> On 16 May 2018 at 18:57, Cindy Rottinghuis wrote:
> 
> Here is a sample of what I am seeing.
> I am running on Ubuntu with Python 3.5. This is a subset of the full message.
> Once it went into the snakebite package, I checked, and the library wasn’t
> supported in Python 3.
> ….
> File "/usr/local/airflow/dags/dim_date.py", line 5, in
>    from airflow.operators.sensors import S3KeySensor
>  File "/usr/local/lib/python3.5/dist-packages/airflow/operators/sensors.py", line 34, in
>    from airflow.hooks.hdfs_hook import HDFSHook
>  File "/usr/local/lib/python3.5/dist-packages/airflow/hooks/hdfs_hook.py", line 20, in
>    from snakebite.client import Client, HAClient, Namenode, AutoConfigClient
>  File "/usr/local/lib/python3.5/dist-packages/snakebite/client.py", line 1473
>    baseTime = min(time * (1L << retries), cap);
>    ^
> 
> 
>> On May 16, 2018, at 9:24 AM, Cindy Rottinghuis wrote:
>>
>> Hi,
>>
>> Yes, it looks like all of the other sensors will work under Python 3, but not
>> HDFS. I am planning to use the S3 sensor, which is wrapped up in the
>> sensors.py module. My issue is that when I test my DAG or Airflow
>> installation under Python 3, I get errors about the hdfs_hook, which I’m not
>> using. Other than creating my own version of the sensors.py file and
>> removing the HDFS-related functions/libraries, is there anything else I can
>> do to work around this?
>> 
>> 
>> 
>>> On May 16, 2018, at 1:33 AM, Driesprong, Fokko  wrote:
>>> 
>>> Hi Cindy,
>>> 
>>> The other sensors should work under Python3. We try to support Python3 as
>>> much as possible, but sometimes libraries are used that are not compatible.
>>> Could you describe what you are running into?
>>> 
>>> Cheers, Fokko
>>> 
>>> 2018-05-16 5:36 GMT+02:00 Cindy Rottinghuis :
>>> 
>>>> Hi,
>>>>
>>>> Are there any plans to update the HDFS_hook.py script to remove the
>>>> reference to the snakebite python library? I’d like to run airflow on
>>>> python3, and this is causing some issues.   The hdfs_hook script is
>>>> referenced in the sensors module.
>>>>
>>>> Any suggestions?
>>>>
>>>> Thanks,
>>>> Cindy
>> 
> 


Re: Python3 and sensors module

2018-05-16 Thread Cindy Rottinghuis
Here is a sample of what I am seeing.
I am running on Ubuntu with Python 3.5. This is a subset of the full message.
Once it went into the snakebite package, I checked, and the library wasn’t
supported in Python 3.
….
 File "/usr/local/airflow/dags/dim_date.py", line 5, in
    from airflow.operators.sensors import S3KeySensor
  File "/usr/local/lib/python3.5/dist-packages/airflow/operators/sensors.py", line 34, in
    from airflow.hooks.hdfs_hook import HDFSHook
  File "/usr/local/lib/python3.5/dist-packages/airflow/hooks/hdfs_hook.py", line 20, in
    from snakebite.client import Client, HAClient, Namenode, AutoConfigClient
  File "/usr/local/lib/python3.5/dist-packages/snakebite/client.py", line 1473
    baseTime = min(time * (1L << retries), cap);
    ^


> On May 16, 2018, at 9:24 AM, Cindy Rottinghuis wrote:
>
> Hi,
>
> Yes, it looks like all of the other sensors will work under Python 3, but not
> HDFS. I am planning to use the S3 sensor, which is wrapped up in the
> sensors.py module. My issue is that when I test my DAG or Airflow
> installation under Python 3, I get errors about the hdfs_hook, which I’m not
> using. Other than creating my own version of the sensors.py file and
> removing the HDFS-related functions/libraries, is there anything else I can
> do to work around this?
> 
> 
> 
>> On May 16, 2018, at 1:33 AM, Driesprong, Fokko  wrote:
>> 
>> Hi Cindy,
>> 
>> The other sensors should work under Python3. We try to support Python3 as
>> much as possible, but sometimes libraries are used that are not compatible.
>> Could you describe what you are running into?
>> 
>> Cheers, Fokko
>> 
>> 2018-05-16 5:36 GMT+02:00 Cindy Rottinghuis :
>> 
>>> Hi,
>>> 
>>> Are there any plans to update the HDFS_hook.py script to remove the
>>> reference to the snakebite python library? I’d like to run airflow on
>>> python3, and this is causing some issues.   The hdfs_hook script is
>>> referenced in the sensors module.
>>> 
>>> Any suggestions?
>>> 
>>> Thanks,
>>> Cindy
> 
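
The caret in the traceback above points at `1L`, a Python 2 long-integer literal that Python 3 no longer accepts, so the module fails to compile before any of its code runs. The sketch below reproduces that on Python 3 and shows an equivalent spelling. The reading of the line as a capped exponential backoff is an assumption based on the variable names, and `backoff` is a hypothetical helper, not snakebite's API.

```python
def backoff(base_time, retries, cap):
    """Python 3 spelling of `min(time * (1L << retries), cap)`.

    Python 3 ints are arbitrary precision, so the L suffix is simply dropped.
    """
    return min(base_time * (1 << retries), cap)


# Compiling the original line under Python 3 fails before anything executes.
try:
    compile("baseTime = min(time * (1L << retries), cap)", "<snippet>", "exec")
    long_literal_ok = True
except SyntaxError:
    long_literal_ok = False

print(long_literal_ok, backoff(100, 3, 10000), backoff(100, 10, 10000))
```

Because the failure happens at compile time, merely importing a module that imports snakebite is enough to trigger it, even if no HDFS code is ever called.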



Airflow dev mailing list DMARC settings

2018-05-16 Thread James Meickle
Hi folks,

I got an email from our email administrator that:

"However, it looks like the AirFlow mailing list isn't rewriting email
headers in messages sent to the list, such that all messages sent to the
list from domains that use DMARC are non-compliant.

At some point we're going to have to flip the bit on DMARC rejection for
our domain, even if the maintainers of this list don't fix it to be
compliant, at which point at least some recipients of the list will stop
receiving your emails to the list because their mail servers will reject
them as non-compliant."

The only ASF info on this I could find was here:
https://blogs.apache.org/infra/entry/dmarc_filtering_on_lists_that

I don't know if that blog post is still up to date, but it implies that a
project member would need to file a JIRA issue requesting a change.

-James M.


Re: Airflow with Celery

2018-05-16 Thread Driesprong, Fokko
I had similar issues with Airflow running the Celery executor.

The celery_result_backend should be a persistent database like Postgres or
MySQL. Which broker are you using? I would recommend using Redis or
RabbitMQ, depending on which you prefer.

Cheers, Fokko

2018-05-15 21:12 GMT+02:00 David Capwell :

> What I find is that we hit this when Celery rejects. We don't do
> work on the hosts, so we solve it by over-provisioning tasks in Celery.
>
> On Tue, May 15, 2018, 6:30 AM Andy Cooper wrote:
>
>> I have had very similar issues when there was a problem with the connection
>> string pointing to the message broker. Triple-check those connection
>> strings and attempt to connect outside of Airflow.
>>
>> On Tue, May 15, 2018 at 9:27 AM Goutham Pratapa wrote:
>>
>> > Hi all,
>> >
>> > I have been using airflow with Celery executor in the background
>> >
>> > https://hastebin.com/sipecovomi.ini --> airflow.cfg
>> >
>> > https://hastebin.com/urutokuvoq.py   --> The dag I have been using
>> >
>> >
>> >
>> > This shows that the dag is always in running state.
>> >
>> >
>> >
>> >
>> > Airflow flower shows nothing in the tasks or in the broker.
>> >
>> >
>> > Did I miss anything? Can anyone help me in this regard?
>> >
>> >
>> > --
>> > Cheers !!!
>> > Goutham Pratapa
>> >
>>
>
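
The suggestion in the thread to check the broker connection string outside of Airflow can be done with the standard library alone. This is a rough sketch under stated assumptions: the URLs are placeholders rather than values from the linked airflow.cfg, the helper names are made up, and a successful TCP connect only shows that the host and port are reachable, not that credentials or the virtual host are right.

```python
import socket
from urllib.parse import urlparse


def broker_endpoint(broker_url, default_port):
    """Extract (host, port) from a broker URL, falling back to default_port."""
    parsed = urlparse(broker_url)
    return parsed.hostname, parsed.port or default_port


def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Placeholder URLs; substitute the broker_url from your own airflow.cfg.
redis_host, redis_port = broker_endpoint("redis://localhost:6379/0", 6379)
amqp_host, amqp_port = broker_endpoint("amqp://guest:guest@rabbit.example.com//", 5672)

print(redis_host, redis_port, amqp_host, amqp_port)
```

Calling `can_connect(redis_host, redis_port)` from the scheduler and worker hosts is a quick way to rule out a wrong hostname, port, or firewall rule before looking at Celery itself.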


Re: Python3 and sensors module

2018-05-16 Thread Driesprong, Fokko
Hi Cindy,

The other sensors should work under Python3. We try to support Python3 as
much as possible, but sometimes libraries are used that are not compatible.
Could you describe what you are running into?

Cheers, Fokko

2018-05-16 5:36 GMT+02:00 Cindy Rottinghuis :

> Hi,
>
> Are there any plans to update the HDFS_hook.py script to remove the
> reference to the snakebite python library? I’d like to run airflow on
> python3, and this is causing some issues.   The hdfs_hook script is
> referenced in the sensors module.
>
> Any suggestions?
>
> Thanks,
> Cindy