[jira] [Commented] (AIRFLOW-2363) S3 remote logging appending tuple instead of str

2018-04-23 Thread Kevin Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449156#comment-16449156
 ] 

Kevin Yang commented on AIRFLOW-2363:
-

Hi [~hamlinkn], I've created a PR to fix it, but I don't think I tagged you 
correctly on GitHub: [https://github.com/apache/incubator-airflow/pull/3259]

Would it be possible for you to test it end to end in your infrastructure? It 
might take some extra work to set up S3 on my side, and I think you might 
benefit from a faster merge.

Thank you!
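
For readers following along, here is a minimal sketch of the failure mode and 
one possible shape of the fix (illustrative only; names like handler_read are 
placeholders, not the exact Airflow code or the contents of PR 3259): once the 
log-reading method starts returning a (log, metadata) tuple, any caller that 
still concatenates the return value onto a string hits the TypeError quoted 
below.

    # Sketch: why "logs[i] += log" breaks when the reader returns a tuple.
    def append_log(logs, metadatas, i, handler_read):
        result = handler_read()
        if isinstance(result, tuple):   # new style: (log, metadata)
            log, metadata = result
            metadatas.append(metadata)
        else:                           # old style: plain string
            log = result
        logs[i] += log                  # safe: log is always a str here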

> S3 remote logging appending tuple instead of str
> 
>
> Key: AIRFLOW-2363
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2363
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Reporter: Kyle Hamlin
>Assignee: Kevin Yang
>Priority: Major
> Fix For: 1.10.0
>
>
> A recent merge into master that added support for Elasticsearch logging seems 
> to have broken S3 logging by returning a tuple instead of a string.
> [https://github.com/apache/incubator-airflow/commit/ec38ba9594395de04ec932481212a86fbe9ae107#diff-0442332ecbe42ebbf426911c68d8cd4aR128]
>  
> The following errors are thrown:
>  
> *Session NoneType error*
>  Traceback (most recent call last):
>    File "/usr/local/lib/python3.6/site-packages/airflow/utils/log/s3_task_handler.py", line 171, in s3_write
>      encrypt=configuration.conf.getboolean('core', 'ENCRYPT_S3_LOGS'),
>    File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", line 274, in load_string
>      encrypt=encrypt)
>    File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", line 313, in load_bytes
>      client = self.get_conn()
>    File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", line 34, in get_conn
>      return self.get_client_type('s3')
>    File "/usr/local/lib/python3.6/site-packages/airflow/contrib/hooks/aws_hook.py", line 151, in get_client_type
>      session, endpoint_url = self._get_credentials(region_name)
>    File "/usr/local/lib/python3.6/site-packages/airflow/contrib/hooks/aws_hook.py", line 97, in _get_credentials
>      connection_object = self.get_connection(self.aws_conn_id)
>    File "/usr/local/lib/python3.6/site-packages/airflow/hooks/base_hook.py", line 82, in get_connection
>      conn = random.choice(cls.get_connections(conn_id))
>    File "/usr/local/lib/python3.6/site-packages/airflow/hooks/base_hook.py", line 77, in get_connections
>      conns = cls._get_connections_from_db(conn_id)
>    File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 72, in wrapper
>      with create_session() as session:
>    File "/usr/local/lib/python3.6/contextlib.py", line 81, in __enter__
>      return next(self.gen)
>    File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 41, in create_session
>      session = settings.Session()
>  TypeError: 'NoneType' object is not callable
>  
> *TypeError: must be str, not tuple*
>  [2018-04-16 18:37:28,200] ERROR in app: Exception on /admin/airflow/get_logs_with_metadata [GET]
>  Traceback (most recent call last):
>    File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1982, in wsgi_app
>      response = self.full_dispatch_request()
>    File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1614, in full_dispatch_request
>      rv = self.handle_user_exception(e)
>    File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1517, in handle_user_exception
>      reraise(exc_type, exc_value, tb)
>    File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 33, in reraise
>      raise value
>    File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1612, in full_dispatch_request
>      rv = self.dispatch_request()
>    File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1598, in dispatch_request
>      return self.view_functions[rule.endpoint](**req.view_args)
>    File "/usr/local/lib/python3.6/site-packages/flask_admin/base.py", line 69, in inner
>      return self._run_view(f, *args, **kwargs)
>    File "/usr/local/lib/python3.6/site-packages/flask_admin/base.py", line 368, in _run_view
>      return fn(self, *args, **kwargs)
>    File "/usr/local/lib/python3.6/site-packages/flask_login.py", line 755, in decorated_view
>      return func(*args, **kwargs)
>    File "/usr/local/lib/python3.6/site-packages/airflow/www/utils.py", line 269, in wrapper
>      return f(*args, **kwargs)
>    File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper
>      return func(*args, **kwargs)
>    File "/usr/local/lib/python3.6/site-packages/airflow/www/views.py", line 770, in get_logs_with_metadata
>      logs, metadatas = handler.read(ti, try_number, metadata=metadata)
>    File "/usr/local/lib/python3.6/site-packages/airflow/utils/log/file_task_handler.py", line 165, in read
>      logs[i] += log
>  TypeError: must be str, not tuple

[jira] [Updated] (AIRFLOW-2368) Relative path should be used when enqueuing task instance

2018-04-23 Thread Xiao Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Zhu updated AIRFLOW-2368:
--
Description: 
In our Airflow setup, the scheduler and workers (connected via Celery) run on 
different machines and have different root dirs. Currently, when the scheduler 
enqueues a task instance it uses the full absolute path of the DAG file, while 
the DAG is located at a different location on our workers (because of the 
different root dirs). As a result, workers can't find the DAG file.

When the scheduler enqueues a task instance it should use the DAG file's 
relative path (relative to DAGS_FOLDER).

  was:
In our Airflow setup, scheduler and workers (connected via Celery) runs on 
different machine and have different root dir. Currently in airflow when 
scheduler enqueues a task instance it uses the full absolute path of the DAG 
file, while the DAG is located at a different location on our workers (because 
of different root dir). As a result workers can't find the DAG file.

When scheduler enqueues a task instances it should use its relative path 
(relative to DAGS_FOLDER).


> Relative path should be used when enqueuing task instance
> 
>
> Key: AIRFLOW-2368
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2368
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Xiao Zhu
>Priority: Major
>
> In our Airflow setup, the scheduler and workers (connected via Celery) run on 
> different machines and have different root dirs. Currently, when the scheduler 
> enqueues a task instance it uses the full absolute path of the DAG file, while 
> the DAG is located at a different location on our workers (because of the 
> different root dirs). As a result, workers can't find the DAG file.
> When the scheduler enqueues a task instance it should use the DAG file's 
> relative path (relative to DAGS_FOLDER).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2340) SQLAlchemy pessimistic connection handling not working

2018-04-23 Thread Xiao Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449053#comment-16449053
 ] 

Xiao Zhu commented on AIRFLOW-2340:
---

We see that with our MySQL DB too.

> SQLAlchemy pessimistic connection handling not working
> --
>
> Key: AIRFLOW-2340
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2340
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.9.0
>Reporter: John Arnold
>Priority: Critical
> Attachments: airflow_traceback.txt, webserver.txt
>
>
> Our scheduler keeps crashing, about once a day.  It seems to be triggered by 
> a failure to connect to the postgresql database, but then it doesn't recover 
> and crashes the scheduler over and over.
> The scheduler runs in a container in our environment, so after several 
> container restarts, docker gives up and the container stays down.
> Airflow should be able to recover from a connection failure without blowing 
> up the container altogether.  Perhaps some exponential backoff is needed?
>  
> See attached log from the scheduler.
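
For reference, the standard SQLAlchemy recipe for pessimistic disconnect 
handling looks roughly like the sketch below (an illustration, not Airflow's 
actual settings code; pool_pre_ping requires SQLAlchemy 1.2+, and the DSN is 
a placeholder). Each pooled connection is "pinged" on checkout and replaced 
behind the scenes if it has gone stale, instead of surfacing the failure:

    from sqlalchemy import create_engine

    # Test every pooled connection with a lightweight ping when it is
    # checked out, and reconnect transparently if the ping fails.
    engine = create_engine(
        "postgresql://user:pass@dbhost/airflow",  # placeholder DSN
        pool_pre_ping=True,
    )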



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2368) Relative path should be used when enqueuing task instance

2018-04-23 Thread Xiao Zhu (JIRA)
Xiao Zhu created AIRFLOW-2368:
-

 Summary: Relative path should be used when enqueuing task instance
 Key: AIRFLOW-2368
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2368
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Xiao Zhu


In our Airflow setup, the scheduler and workers (connected via Celery) run on 
different machines and have different root dirs. Currently, when the scheduler 
enqueues a task instance it uses the full absolute path of the DAG file, while 
the DAG is located at a different location on our workers (because of the 
different root dirs). As a result, workers can't find the DAG file.

When the scheduler enqueues a task instance it should use the DAG file's 
relative path (relative to DAGS_FOLDER).
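
A minimal sketch of the proposed behavior (function names here are 
illustrative, not the actual Airflow code path):

    import os

    # Scheduler side: enqueue the DAG file path relative to DAGS_FOLDER so
    # that workers with a different root dir can resolve it locally.
    def path_for_queue(dag_full_path, dags_folder):
        return os.path.relpath(dag_full_path, start=dags_folder)

    # Worker side: re-anchor the relative path on the local DAGS_FOLDER.
    def resolve_on_worker(relative_path, local_dags_folder):
        return os.path.join(local_dags_folder, relative_path)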



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-2363) S3 remote logging appending tuple instead of str

2018-04-23 Thread Kevin Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang reassigned AIRFLOW-2363:
---

Assignee: Kevin Yang

> S3 remote logging appending tuple instead of str
> 
>
> Key: AIRFLOW-2363
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2363
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Reporter: Kyle Hamlin
>Assignee: Kevin Yang
>Priority: Major
> Fix For: 1.10.0
>
>
> A recent merge into master that added support for Elasticsearch logging seems 
> to have broken S3 logging by returning a tuple instead of a string.
> [https://github.com/apache/incubator-airflow/commit/ec38ba9594395de04ec932481212a86fbe9ae107#diff-0442332ecbe42ebbf426911c68d8cd4aR128]
>  
> The following errors are thrown:
>  
> *Session NoneType error*
>  Traceback (most recent call last):
>    File "/usr/local/lib/python3.6/site-packages/airflow/utils/log/s3_task_handler.py", line 171, in s3_write
>      encrypt=configuration.conf.getboolean('core', 'ENCRYPT_S3_LOGS'),
>    File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", line 274, in load_string
>      encrypt=encrypt)
>    File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", line 313, in load_bytes
>      client = self.get_conn()
>    File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", line 34, in get_conn
>      return self.get_client_type('s3')
>    File "/usr/local/lib/python3.6/site-packages/airflow/contrib/hooks/aws_hook.py", line 151, in get_client_type
>      session, endpoint_url = self._get_credentials(region_name)
>    File "/usr/local/lib/python3.6/site-packages/airflow/contrib/hooks/aws_hook.py", line 97, in _get_credentials
>      connection_object = self.get_connection(self.aws_conn_id)
>    File "/usr/local/lib/python3.6/site-packages/airflow/hooks/base_hook.py", line 82, in get_connection
>      conn = random.choice(cls.get_connections(conn_id))
>    File "/usr/local/lib/python3.6/site-packages/airflow/hooks/base_hook.py", line 77, in get_connections
>      conns = cls._get_connections_from_db(conn_id)
>    File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 72, in wrapper
>      with create_session() as session:
>    File "/usr/local/lib/python3.6/contextlib.py", line 81, in __enter__
>      return next(self.gen)
>    File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 41, in create_session
>      session = settings.Session()
>  TypeError: 'NoneType' object is not callable
>  
> *TypeError: must be str, not tuple*
>  [2018-04-16 18:37:28,200] ERROR in app: Exception on /admin/airflow/get_logs_with_metadata [GET]
>  Traceback (most recent call last):
>    File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1982, in wsgi_app
>      response = self.full_dispatch_request()
>    File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1614, in full_dispatch_request
>      rv = self.handle_user_exception(e)
>    File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1517, in handle_user_exception
>      reraise(exc_type, exc_value, tb)
>    File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 33, in reraise
>      raise value
>    File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1612, in full_dispatch_request
>      rv = self.dispatch_request()
>    File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1598, in dispatch_request
>      return self.view_functions[rule.endpoint](**req.view_args)
>    File "/usr/local/lib/python3.6/site-packages/flask_admin/base.py", line 69, in inner
>      return self._run_view(f, *args, **kwargs)
>    File "/usr/local/lib/python3.6/site-packages/flask_admin/base.py", line 368, in _run_view
>      return fn(self, *args, **kwargs)
>    File "/usr/local/lib/python3.6/site-packages/flask_login.py", line 755, in decorated_view
>      return func(*args, **kwargs)
>    File "/usr/local/lib/python3.6/site-packages/airflow/www/utils.py", line 269, in wrapper
>      return f(*args, **kwargs)
>    File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper
>      return func(*args, **kwargs)
>    File "/usr/local/lib/python3.6/site-packages/airflow/www/views.py", line 770, in get_logs_with_metadata
>      logs, metadatas = handler.read(ti, try_number, metadata=metadata)
>    File "/usr/local/lib/python3.6/site-packages/airflow/utils/log/file_task_handler.py", line 165, in read
>      logs[i] += log
>  TypeError: must be str, not tuple



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2367) High POSTGRES DB CPU utilization

2018-04-23 Thread John Arnold (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448980#comment-16448980
 ] 

John Arnold commented on AIRFLOW-2367:
--

Manual repro of the top query only takes 250ms...

> High POSTGRES DB CPU utilization
> 
>
> Key: AIRFLOW-2367
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2367
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 2.0, 1.9.0
>Reporter: John Arnold
>Priority: Major
> Attachments: cpu.png, postgres.png
>
>
> We are seeing steady-state 70-90% CPU utilization. It feels like a 
> missing-index kind of problem: our TPS rate is really low, I'm not seeing 
> any long-running queries, connection counts are reasonable (low hundreds), 
> and locks also look reasonable (not many exclusive/write locks).
> We shut down the webserver and it doesn't go away, so it doesn't seem to be 
> in that part of the code. My guess is either the scheduler has an 
> inefficient query, or the (Celery) executor code path does.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-2367) High POSTGRES DB CPU utilization

2018-04-23 Thread John Arnold (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448978#comment-16448978
 ] 

John Arnold edited comment on AIRFLOW-2367 at 4/23/18 10:34 PM:


I believe the "top talker" query is from this query in models.py:

@provide_session
def get_task_instances(self, state=None, session=None):
    """
    Returns the task instances for this dag run
    """
    TI = TaskInstance
    tis = session.query(TI).filter(
        TI.dag_id == self.dag_id,
        TI.execution_date == self.execution_date,
    )


was (Author: johnarnold):
I believe the "top talker" query is from this query in models.py:

@provide_session
def get_task_instances(self, state=None, session=None):
    """
    Returns the task instances for this dag run
    """
    TI = TaskInstance
    tis = session.query(TI).filter(
        TI.dag_id == self.dag_id,
        TI.execution_date == self.execution_date,
    )

> High POSTGRES DB CPU utilization
> 
>
> Key: AIRFLOW-2367
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2367
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 2.0, 1.9.0
>Reporter: John Arnold
>Priority: Major
> Attachments: cpu.png, postgres.png
>
>
> We are seeing steady-state 70-90% CPU utilization. It feels like a 
> missing-index kind of problem: our TPS rate is really low, I'm not seeing 
> any long-running queries, connection counts are reasonable (low hundreds), 
> and locks also look reasonable (not many exclusive/write locks).
> We shut down the webserver and it doesn't go away, so it doesn't seem to be 
> in that part of the code. My guess is either the scheduler has an 
> inefficient query, or the (Celery) executor code path does.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2367) High POSTGRES DB CPU utilization

2018-04-23 Thread John Arnold (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448978#comment-16448978
 ] 

John Arnold commented on AIRFLOW-2367:
--

I believe the "top talker" query is from this query in models.py:

@provide_session
def get_task_instances(self, state=None, session=None):
    """
    Returns the task instances for this dag run
    """
    TI = TaskInstance
    tis = session.query(TI).filter(
        TI.dag_id == self.dag_id,
        TI.execution_date == self.execution_date,
    )

> High POSTGRES DB CPU utilization
> 
>
> Key: AIRFLOW-2367
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2367
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 2.0, 1.9.0
>Reporter: John Arnold
>Priority: Major
> Attachments: cpu.png, postgres.png
>
>
> We are seeing steady-state 70-90% CPU utilization. It feels like a 
> missing-index kind of problem: our TPS rate is really low, I'm not seeing 
> any long-running queries, connection counts are reasonable (low hundreds), 
> and locks also look reasonable (not many exclusive/write locks).
> We shut down the webserver and it doesn't go away, so it doesn't seem to be 
> in that part of the code. My guess is either the scheduler has an 
> inefficient query, or the (Celery) executor code path does.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-2367) High POSTGRES DB CPU utilization

2018-04-23 Thread John Arnold (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448974#comment-16448974
 ] 

John Arnold edited comment on AIRFLOW-2367 at 4/23/18 10:31 PM:


This insert is taking the second-most time (about 15% of total time):

INSERT INTO log (dttm, dag_id, task_id, event, execution_date, owner, extra) 
VALUES (?::timestamptz, ?, ?, ?, ?::timestamptz, ?, ?) RETURNING log.id

I verified there are no indexes, triggers, or weird constraints that would 
make it slow; it just has a high volume.


was (Author: johnarnold):
This insert is taking the second-most time (about 1/3 of the above):

INSERT INTO log (dttm, dag_id, task_id, event, execution_date, owner, extra) 
VALUES (?::timestamptz, ?, ?, ?, ?::timestamptz, ?, ?) RETURNING log.id

I verified there are no indexes, triggers or weird constraints that would make 
it slow, it just has a high volume.

> High POSTGRES DB CPU utilization
> 
>
> Key: AIRFLOW-2367
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2367
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 2.0, 1.9.0
>Reporter: John Arnold
>Priority: Major
> Attachments: cpu.png, postgres.png
>
>
> We are seeing steady-state 70-90% CPU utilization. It feels like a 
> missing-index kind of problem: our TPS rate is really low, I'm not seeing 
> any long-running queries, connection counts are reasonable (low hundreds), 
> and locks also look reasonable (not many exclusive/write locks).
> We shut down the webserver and it doesn't go away, so it doesn't seem to be 
> in that part of the code. My guess is either the scheduler has an 
> inefficient query, or the (Celery) executor code path does.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-2367) High POSTGRES DB CPU utilization

2018-04-23 Thread John Arnold (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448969#comment-16448969
 ] 

John Arnold edited comment on AIRFLOW-2367 at 4/23/18 10:30 PM:


This query is taking the most time (about 50% of total time in the top 10 
queries):

 SELECT task_instance.try_number AS task_instance_try_number, 
task_instance.task_id AS task_instance_task_id, task_instance.dag_id AS 
task_instance_dag_id, task_instance.execution_date AS 
task_instance_execution_date, task_instance.start_date AS 
task_instance_start_date, task_instance.end_date AS task_instance_end_date, 
task_instance.duration AS task_instance_duration, task_instance.state AS 
task_instance_state, task_instance.max_tries AS task_instance_max_tries, 
task_instance.hostname AS task_instance_hostname, task_instance.unixname AS 
task_instance_unixname, task_instance.job_id AS task_instance_job_id, 
task_instance.pool AS task_instance_pool, task_instance.queue AS 
task_instance_queue, task_instance.priority_weight AS 
task_instance_priority_weight, task_instance.operator AS 
task_instance_operator, task_instance.queued_dttm AS task_instance_queued_dttm, 
task_instance.pid AS task_instance_pid, task_instance.uuid AS 
task_instance_uuid 
 FROM task_instance 
 WHERE task_instance.dag_id = ? AND task_instance.execution_date = 
?::timestamptz


was (Author: johnarnold):
This query is taking the most time:
SELECT task_instance.try_number AS task_instance_try_number, 
task_instance.task_id AS task_instance_task_id, task_instance.dag_id AS 
task_instance_dag_id, task_instance.execution_date AS 
task_instance_execution_date, task_instance.start_date AS 
task_instance_start_date, task_instance.end_date AS task_instance_end_date, 
task_instance.duration AS task_instance_duration, task_instance.state AS 
task_instance_state, task_instance.max_tries AS task_instance_max_tries, 
task_instance.hostname AS task_instance_hostname, task_instance.unixname AS 
task_instance_unixname, task_instance.job_id AS task_instance_job_id, 
task_instance.pool AS task_instance_pool, task_instance.queue AS 
task_instance_queue, task_instance.priority_weight AS 
task_instance_priority_weight, task_instance.operator AS 
task_instance_operator, task_instance.queued_dttm AS task_instance_queued_dttm, 
task_instance.pid AS task_instance_pid, task_instance.uuid AS 
task_instance_uuid 
FROM task_instance 
WHERE task_instance.dag_id = ? AND task_instance.execution_date = ?::timestamptz

> High POSTGRES DB CPU utilization
> 
>
> Key: AIRFLOW-2367
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2367
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 2.0, 1.9.0
>Reporter: John Arnold
>Priority: Major
> Attachments: cpu.png, postgres.png
>
>
> We are seeing steady-state 70-90% CPU utilization. It feels like a 
> missing-index kind of problem: our TPS rate is really low, I'm not seeing 
> any long-running queries, connection counts are reasonable (low hundreds), 
> and locks also look reasonable (not many exclusive/write locks).
> We shut down the webserver and it doesn't go away, so it doesn't seem to be 
> in that part of the code. My guess is either the scheduler has an 
> inefficient query, or the (Celery) executor code path does.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2367) High POSTGRES DB CPU utilization

2018-04-23 Thread John Arnold (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448974#comment-16448974
 ] 

John Arnold commented on AIRFLOW-2367:
--

This insert is taking the second-most time (about 1/3 of the above):

INSERT INTO log (dttm, dag_id, task_id, event, execution_date, owner, extra) 
VALUES (?::timestamptz, ?, ?, ?, ?::timestamptz, ?, ?) RETURNING log.id

I verified there are no indexes, triggers, or weird constraints that would 
make it slow; it just has a high volume.

> High POSTGRES DB CPU utilization
> 
>
> Key: AIRFLOW-2367
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2367
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 2.0, 1.9.0
>Reporter: John Arnold
>Priority: Major
> Attachments: cpu.png, postgres.png
>
>
> We are seeing steady-state 70-90% CPU utilization. It feels like a 
> missing-index kind of problem: our TPS rate is really low, I'm not seeing 
> any long-running queries, connection counts are reasonable (low hundreds), 
> and locks also look reasonable (not many exclusive/write locks).
> We shut down the webserver and it doesn't go away, so it doesn't seem to be 
> in that part of the code. My guess is either the scheduler has an 
> inefficient query, or the (Celery) executor code path does.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2367) High POSTGRES DB CPU utilization

2018-04-23 Thread John Arnold (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448969#comment-16448969
 ] 

John Arnold commented on AIRFLOW-2367:
--

This query is taking the most time:
SELECT task_instance.try_number AS task_instance_try_number, 
task_instance.task_id AS task_instance_task_id, task_instance.dag_id AS 
task_instance_dag_id, task_instance.execution_date AS 
task_instance_execution_date, task_instance.start_date AS 
task_instance_start_date, task_instance.end_date AS task_instance_end_date, 
task_instance.duration AS task_instance_duration, task_instance.state AS 
task_instance_state, task_instance.max_tries AS task_instance_max_tries, 
task_instance.hostname AS task_instance_hostname, task_instance.unixname AS 
task_instance_unixname, task_instance.job_id AS task_instance_job_id, 
task_instance.pool AS task_instance_pool, task_instance.queue AS 
task_instance_queue, task_instance.priority_weight AS 
task_instance_priority_weight, task_instance.operator AS 
task_instance_operator, task_instance.queued_dttm AS task_instance_queued_dttm, 
task_instance.pid AS task_instance_pid, task_instance.uuid AS 
task_instance_uuid 
FROM task_instance 
WHERE task_instance.dag_id = ? AND task_instance.execution_date = ?::timestamptz
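
One way to check whether this query is index-covered (a diagnostic sketch 
under stated assumptions: the DSN and bind values are placeholders, and in 
1.9 the task_instance primary key starts with task_id, so a filter on only 
dag_id and execution_date may not be able to use it as a prefix):

    from sqlalchemy import create_engine, text

    engine = create_engine("postgresql://user:pass@dbhost/airflow")  # placeholder
    with engine.connect() as conn:
        rows = conn.execute(
            text("EXPLAIN ANALYZE "
                 "SELECT * FROM task_instance "
                 "WHERE dag_id = :dag_id AND execution_date = :ed"),
            dag_id="example_dag",
            ed="2018-04-23 00:00:00+00",
        )
        for row in rows:
            print(row[0])  # look for Seq Scan vs Index Scan in the plan

If the plan shows a sequential scan, a covering index on (dag_id, 
execution_date) would be the thing to test next.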

> High POSTGRES DB CPU utilization
> 
>
> Key: AIRFLOW-2367
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2367
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 2.0, 1.9.0
>Reporter: John Arnold
>Priority: Major
> Attachments: cpu.png, postgres.png
>
>
> We are seeing steady-state 70-90% CPU utilization. It feels like a 
> missing-index kind of problem: our TPS rate is really low, I'm not seeing 
> any long-running queries, connection counts are reasonable (low hundreds), 
> and locks also look reasonable (not many exclusive/write locks).
> We shut down the webserver and it doesn't go away, so it doesn't seem to be 
> in that part of the code. My guess is either the scheduler has an 
> inefficient query, or the (Celery) executor code path does.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-2367) High POSTGRES DB CPU utilization

2018-04-23 Thread John Arnold (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448953#comment-16448953
 ] 

John Arnold edited comment on AIRFLOW-2367 at 4/23/18 10:18 PM:


[~bolke] Any suggestions on which metrics or configuration options to look at? 
We've been looking over the database (top 10 queries, etc.) and there are no 
surprises that I can see. The top query by far is against the task_instance 
table, and all the conditionals are on indexed columns. I went through 
basically every query in models.py looking for any that use unindexed 
columns, and didn't find any.

I've attached a screenshot of the top 10 queries.

 

We played with our connection pool sizes, thinking that perhaps we were 
hammering the db with connections, but that didn't seem to make any 
difference. We have the scheduler set with a connection pool of 20, two 
instances of the webserver with a connection pool of 5, and all the Celery 
workers with a connection pool of 1.
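
For reference, the pool sizing described above maps to a setting like the 
following in airflow.cfg (a sketch; the key name should be checked against 
the installed version's default_airflow.cfg):

    [core]
    # SQLAlchemy connection pool size used by each Airflow process.
    sql_alchemy_pool_size = 20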


was (Author: johnarnold):
[~bolke]  Any suggestions on what metrics or configuration options?  We've been 
looking over the database (top 10 queries etc) and there are no surprises that 
I can see. The top query by far is for task_instance table and all the 
conditionals are for indexed columns.  I went through basically every query in 
models.py looking for any that are using unindexed columns, and didn't find any.

I've attached a screenshot of the top 10 queries.

 

We played with our connection pool sizes, thinking that perhaps we were 
hammering the db with connections, but that didn't seem to make any difference.

> High POSTGRES DB CPU utilization
> 
>
> Key: AIRFLOW-2367
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2367
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 2.0, 1.9.0
>Reporter: John Arnold
>Priority: Major
> Attachments: cpu.png, postgres.png
>
>
> We are seeing steady-state 70-90% CPU utilization. It feels like a 
> missing-index kind of problem: our TPS rate is really low, I'm not seeing 
> any long-running queries, connection counts are reasonable (low hundreds), 
> and locks also look reasonable (not many exclusive/write locks).
> We shut down the webserver and it doesn't go away, so it doesn't seem to be 
> in that part of the code. My guess is either the scheduler has an 
> inefficient query, or the (Celery) executor code path does.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2367) High POSTGRES DB CPU utilization

2018-04-23 Thread John Arnold (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448953#comment-16448953
 ] 

John Arnold commented on AIRFLOW-2367:
--

[~bolke] Any suggestions on which metrics or configuration options to look at? 
We've been looking over the database (top 10 queries, etc.) and there are no 
surprises that I can see. The top query by far is against the task_instance 
table, and all the conditionals are on indexed columns. I went through 
basically every query in models.py looking for any that use unindexed 
columns, and didn't find any.

I've attached a screenshot of the top 10 queries.

 

We played with our connection pool sizes, thinking that perhaps we were 
hammering the db with connections, but that didn't seem to make any difference.

> High POSTGRES DB CPU utilization
> 
>
> Key: AIRFLOW-2367
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2367
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 2.0, 1.9.0
>Reporter: John Arnold
>Priority: Major
> Attachments: cpu.png, postgres.png
>
>
> We are seeing steady-state 70-90% CPU utilization. It feels like a 
> missing-index kind of problem: our TPS rate is really low, I'm not seeing 
> any long-running queries, connection counts are reasonable (low hundreds), 
> and locks also look reasonable (not many exclusive/write locks).
> We shut down the webserver and it doesn't go away, so it doesn't seem to be 
> in that part of the code. My guess is either the scheduler has an 
> inefficient query, or the (Celery) executor code path does.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2367) High POSTGRES DB CPU utilization

2018-04-23 Thread John Arnold (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Arnold updated AIRFLOW-2367:
-
Attachment: postgres.png

> High POSTGRES DB CPU utilization
> 
>
> Key: AIRFLOW-2367
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2367
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 2.0, 1.9.0
>Reporter: John Arnold
>Priority: Major
> Attachments: cpu.png, postgres.png
>
>
> We are seeing steady-state 70-90% CPU utilization. It feels like a 
> missing-index kind of problem: our TPS rate is really low, I'm not seeing 
> any long-running queries, connection counts are reasonable (low hundreds), 
> and locks also look reasonable (not many exclusive/write locks).
> We shut down the webserver and it doesn't go away, so it doesn't seem to be 
> in that part of the code. My guess is either the scheduler has an 
> inefficient query, or the (Celery) executor code path does.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2367) High POSTGRES DB CPU utilization

2018-04-23 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448812#comment-16448812
 ] 

Bolke de Bruin commented on AIRFLOW-2367:
-

You really need to provide more metrics and configuration options. The 
scheduler can be really busy when you have a lot of DAGs, and worker memory 
also affects your performance. In other words, let a DBA have a look at what 
you are doing and report back.

> High POSTGRES DB CPU utilization
> 
>
> Key: AIRFLOW-2367
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2367
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 2.0, 1.9.0
>Reporter: John Arnold
>Priority: Major
> Attachments: cpu.png
>
>
> We are seeing steady-state 70-90% CPU utilization. It feels like a 
> missing-index kind of problem: our TPS rate is really low, I'm not seeing 
> any long-running queries, connection counts are reasonable (low hundreds), 
> and locks also look reasonable (not many exclusive/write locks).
> We shut down the webserver and it doesn't go away, so it doesn't seem to be 
> in that part of the code. My guess is either the scheduler has an 
> inefficient query, or the (Celery) executor code path does.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2068) Mesos Executor should allow specifying optional Docker image before running command

2018-04-23 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2068.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3008
[https://github.com/apache/incubator-airflow/pull/3008]

> Mesos Executor should allow specifying optional Docker image before running 
> command
> ---
>
> Key: AIRFLOW-2068
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2068
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: executor
>Affects Versions: Airflow 1.8
>Reporter: Agraj Mangal
>Assignee: Agraj Mangal
>Priority: Major
> Fix For: 1.10.0
>
>
> In its current form, MesosExecutor schedules tasks on mesos slaves that just 
> contain airflow commands, assuming that the mesos slaves already have airflow 
> installed and configured on them. This assumption goes against the Mesos 
> philosophy of having a heterogeneous cluster.
> Since Mesos provides an option to pull a Docker image before running the 
> actual task/command, this improvement changes mesos_executor.py to specify an 
> optional Docker image containing airflow, which can be pulled on slaves 
> before running the actual airflow command.
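
Concretely, the knob this change adds is a single optional key in the [mesos] 
section of airflow.cfg (mirroring the default config shown in the merged diff 
later in this digest; the image name is only an example):

    [mesos]
    # Optional Docker image to run on the slave before executing the command.
    # The slave must be able to pull this image.
    docker_image_slave = puckel/docker-airflow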



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2068) Mesos Executor should allow specifying optional Docker image before running command

2018-04-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448806#comment-16448806
 ] 

ASF subversion and git services commented on AIRFLOW-2068:
--

Commit d96534c268a13f8e6a0e1e288821a436322926c4 in incubator-airflow's branch 
refs/heads/v1-10-test from [~amangal]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=d96534c ]

[AIRFLOW-2068] MesosExecutor allows optional Docker image

In its current form, MesosExecutor schedules tasks on mesos slaves which just
contain airflow commands assuming that the mesos slaves already have airflow
installed and configured on them. This assumption goes against the Mesos
philosophy of having a heterogeneous cluster.

Since Mesos provides an option to pull a Docker image before actually running
the actual task/command so this improvement changes the mesos_executor.py to
specify an optional docker image containing airflow which can be pulled on
slaves before running the actual airflow command. This also opens the door
for an optimization of resources in a future PR, by allowing the
specification of CPU and memory needed for each airflow task.

Closes #3008 from agrajm/AIRFLOW-2068

(cherry picked from commit 1f86299cf998729ea93c666481390451c9724ebc)
Signed-off-by: Bolke de Bruin 


> Mesos Executor should allow specifying optional Docker image before running 
> command
> ---
>
> Key: AIRFLOW-2068
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2068
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: executor
>Affects Versions: Airflow 1.8
>Reporter: Agraj Mangal
>Assignee: Agraj Mangal
>Priority: Major
> Fix For: 1.10.0
>
>
> In its current form, MesosExecutor schedules tasks on mesos slaves that just 
> contain airflow commands, assuming that the mesos slaves already have airflow 
> installed and configured on them. This assumption goes against the Mesos 
> philosophy of having a heterogeneous cluster.
> Since Mesos provides an option to pull a Docker image before running the 
> actual task/command, this improvement changes mesos_executor.py to specify an 
> optional Docker image containing airflow, which can be pulled on slaves 
> before running the actual airflow command.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2068) Mesos Executor should allow specifying optional Docker image before running command

2018-04-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448805#comment-16448805
 ] 

ASF subversion and git services commented on AIRFLOW-2068:
--

Commit d96534c268a13f8e6a0e1e288821a436322926c4 in incubator-airflow's branch 
refs/heads/v1-10-test from [~amangal]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=d96534c ]

[AIRFLOW-2068] MesosExecutor allows optional Docker image

In its current form, MesosExecutor schedules tasks on mesos slaves which just
contain airflow commands assuming that the mesos slaves already have airflow
installed and configured on them. This assumption goes against the Mesos
philosophy of having a heterogeneous cluster.

Since Mesos provides an option to pull a Docker image before actually running
the actual task/command so this improvement changes the mesos_executor.py to
specify an optional docker image containing airflow which can be pulled on
slaves before running the actual airflow command. This also opens the door
for an optimization of resources in a future PR, by allowing the
specification of CPU and memory needed for each airflow task.

Closes #3008 from agrajm/AIRFLOW-2068

(cherry picked from commit 1f86299cf998729ea93c666481390451c9724ebc)
Signed-off-by: Bolke de Bruin 


> Mesos Executor should allow specifying optional Docker image before running 
> command
> ---
>
> Key: AIRFLOW-2068
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2068
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: executor
>Affects Versions: Airflow 1.8
>Reporter: Agraj Mangal
>Assignee: Agraj Mangal
>Priority: Major
> Fix For: 1.10.0
>
>
> In its current form, MesosExecutor schedules tasks on mesos slaves that just 
> contain airflow commands, assuming that the mesos slaves already have airflow 
> installed and configured on them. This assumption goes against the Mesos 
> philosophy of having a heterogeneous cluster.
> Since Mesos provides an option to pull a Docker image before running the 
> actual task/command, this improvement changes mesos_executor.py to specify an 
> optional Docker image containing airflow, which can be pulled on slaves 
> before running the actual airflow command.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2068) Mesos Executor should allow specifying optional Docker image before running command

2018-04-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448804#comment-16448804
 ] 

ASF subversion and git services commented on AIRFLOW-2068:
--

Commit 1f86299cf998729ea93c666481390451c9724ebc in incubator-airflow's branch 
refs/heads/master from [~amangal]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=1f86299 ]

[AIRFLOW-2068] MesosExecutor allows optional Docker image

In its current form, MesosExecutor schedules tasks on mesos slaves which just
contain airflow commands assuming that the mesos slaves already have airflow
installed and configured on them. This assumption goes against the Mesos
philosophy of having a heterogeneous cluster.

Since Mesos provides an option to pull a Docker image before actually running
the actual task/command so this improvement changes the mesos_executor.py to
specify an optional docker image containing airflow which can be pulled on
slaves before running the actual airflow command. This also opens the door
for an optimization of resources in a future PR, by allowing the
specification of CPU and memory needed for each airflow task.

Closes #3008 from agrajm/AIRFLOW-2068


> Mesos Executor should allow specifying optional Docker image before running 
> command
> ---
>
> Key: AIRFLOW-2068
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2068
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: executor
>Affects Versions: Airflow 1.8
>Reporter: Agraj Mangal
>Assignee: Agraj Mangal
>Priority: Major
>
> In its current form, MesosExecutor schedules tasks on mesos slaves that just 
> contain airflow commands, assuming that the mesos slaves already have airflow 
> installed and configured on them. This assumption goes against the Mesos 
> philosophy of having a heterogeneous cluster.
> Since Mesos provides an option to pull a Docker image before running the 
> actual task/command, this improvement changes mesos_executor.py to specify an 
> optional Docker image containing airflow, which can be pulled on slaves 
> before running the actual airflow command.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


incubator-airflow git commit: [AIRFLOW-2068] MesosExecutor allows optional Docker image

2018-04-23 Thread bolke
Repository: incubator-airflow
Updated Branches:
  refs/heads/v1-10-test 64900e2b0 -> d96534c26


[AIRFLOW-2068] MesosExecutor allows optional Docker image

In its current form, MesosExecutor schedules tasks on mesos slaves which just
contain airflow commands assuming that the mesos slaves already have airflow
installed and configured on them. This assumption goes against the Mesos
philosophy of having a heterogeneous cluster.

Since Mesos provides an option to pull a Docker image before actually running
the actual task/command so this improvement changes the mesos_executor.py to
specify an optional docker image containing airflow which can be pulled on
slaves before running the actual airflow command. This also opens the door
for an optimization of resources in a future PR, by allowing the
specification of CPU and memory needed for each airflow task.

Closes #3008 from agrajm/AIRFLOW-2068

(cherry picked from commit 1f86299cf998729ea93c666481390451c9724ebc)
Signed-off-by: Bolke de Bruin 


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/d96534c2
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/d96534c2
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/d96534c2

Branch: refs/heads/v1-10-test
Commit: d96534c268a13f8e6a0e1e288821a436322926c4
Parents: 64900e2
Author: Agraj Mangal 
Authored: Mon Apr 23 22:22:29 2018 +0200
Committer: Bolke de Bruin 
Committed: Mon Apr 23 22:23:06 2018 +0200

--
 airflow/config_templates/default_airflow.cfg   |   4 +
 airflow/config_templates/default_test.cfg  |   9 ++
 airflow/contrib/executors/__init__.py  |   6 +-
 airflow/contrib/executors/mesos_executor.py|  19 
 docs/configuration.rst |  39 +++-
 tests/contrib/executors/__init__.py|  23 +++--
 tests/contrib/executors/test_mesos_executor.py | 105 
 7 files changed, 188 insertions(+), 17 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/d96534c2/airflow/config_templates/default_airflow.cfg
--
diff --git a/airflow/config_templates/default_airflow.cfg 
b/airflow/config_templates/default_airflow.cfg
index 845342c..fa5eea0 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -479,6 +479,10 @@ authenticate = False
 # default_principal = admin
 # default_secret = admin
 
+# Optional Docker Image to run on slave before running the command
+# This image should be accessible from mesos slave i.e mesos slave
+# should be able to pull this docker image before executing the command.
+# docker_image_slave = puckel/docker-airflow
 
 [kerberos]
 ccache = /tmp/airflow_krb5_ccache

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/d96534c2/airflow/config_templates/default_test.cfg
--
diff --git a/airflow/config_templates/default_test.cfg 
b/airflow/config_templates/default_test.cfg
index 4145dfd..7c569cd 100644
--- a/airflow/config_templates/default_test.cfg
+++ b/airflow/config_templates/default_test.cfg
@@ -91,6 +91,15 @@ flower_host = 0.0.0.0
 flower_port = 
 default_queue = default
 
+[mesos]
+master = localhost:5050
+framework_name = Airflow
+task_cpu = 1
+task_memory = 256
+checkpoint = False
+authenticate = False
+docker_image_slave = test/docker-airflow
+
 [scheduler]
 job_heartbeat_sec = 1
 scheduler_heartbeat_sec = 5

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/d96534c2/airflow/contrib/executors/__init__.py
--
diff --git a/airflow/contrib/executors/__init__.py 
b/airflow/contrib/executors/__init__.py
index f0f8b68..b7f8352 100644
--- a/airflow/contrib/executors/__init__.py
+++ b/airflow/contrib/executors/__init__.py
@@ -7,13 +7,13 @@
 # to you under the Apache License, Version 2.0 (the
 # "License"); you may not use this file except in compliance
 # with the License.  You may obtain a copy of the License at
-# 
+#
 #   http://www.apache.org/licenses/LICENSE-2.0
-# 
+#
 # Unless required by applicable law or agreed to in writing,
 # software distributed under the License is distributed on an
 # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-
+#

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/d96534c2/airflow/contrib/executors/mesos_executor.py
--
diff --git 

[jira] [Commented] (AIRFLOW-2068) Mesos Executor should allow specifying optional Docker image before running command

2018-04-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448803#comment-16448803
 ] 

ASF subversion and git services commented on AIRFLOW-2068:
--

Commit 1f86299cf998729ea93c666481390451c9724ebc in incubator-airflow's branch 
refs/heads/master from [~amangal]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=1f86299 ]

[AIRFLOW-2068] MesosExecutor allows optional Docker image

In its current form, MesosExecutor schedules tasks on mesos slaves which just
contain airflow commands assuming that the mesos slaves already have airflow
installed and configured on them. This assumption goes against the Mesos
philosophy of having a heterogeneous cluster.

Since Mesos provides an option to pull a Docker image before actually running
the actual task/command so this improvement changes the mesos_executor.py to
specify an optional docker image containing airflow which can be pulled on
slaves before running the actual airflow command. This also opens the door
for an optimization of resources in a future PR, by allowing the
specification of CPU and memory needed for each airflow task.

Closes #3008 from agrajm/AIRFLOW-2068


> Mesos Executor should allow specifying optional Docker image before running 
> command
> ---
>
> Key: AIRFLOW-2068
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2068
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: executor
>Affects Versions: Airflow 1.8
>Reporter: Agraj Mangal
>Assignee: Agraj Mangal
>Priority: Major
>
> In its current form, MesosExecutor schedules tasks on mesos slaves that just 
> contain airflow commands, assuming that the mesos slaves already have airflow 
> installed and configured on them. This assumption goes against the Mesos 
> philosophy of having a heterogeneous cluster.
> Since Mesos provides an option to pull a Docker image before running the 
> actual task/command, this improvement changes mesos_executor.py to specify an 
> optional Docker image containing airflow, which can be pulled on slaves 
> before running the actual airflow command.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2366) New Webserver UI with Role-Based Access Control not working

2018-04-23 Thread Ramki Subramanian (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448696#comment-16448696
 ] 

Ramki Subramanian commented on AIRFLOW-2366:


[~johnarnold] - I am not able to view any attachment.

> New Webserver UI with Role-Based Access Control not working
> ---
>
> Key: AIRFLOW-2366
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2366
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication
>Affects Versions: 1.9.0
>Reporter: Ramki Subramanian
>Priority: Major
>
> I tried setting up the Role-Based Access Control as given in 
> [https://github.com/apache/incubator-airflow/blob/master/UPDATING.md]
> I changed my airflow.cfg file as given below:
> {quote}To turn on this feature, in your airflow.cfg file, under [webserver], 
> set configuration variable {{rbac = True}}{quote}
> But I am not sure which airflow command to use to generate webserver_config.py:
> {quote}and then run the {{airflow}} command, which will generate the 
> {{webserver_config.py}} file in your $AIRFLOW_HOME.{quote}
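
For what it's worth, the setting itself is just (per UPDATING.md):

    [webserver]
    rbac = True

As for which command generates the file: based on 1.10's behavior (an 
assumption worth verifying, since UPDATING.md is vague here), starting the 
webserver once with rbac = True, i.e. running airflow webserver, should 
create webserver_config.py in $AIRFLOW_HOME.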



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2366) New Webserver UI with Role-Based Access Control not working

2018-04-23 Thread Ramki Subramanian (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramki Subramanian updated AIRFLOW-2366:
---
Description: 
I tried setting up the Role-Based Access Control as given in 

[https://github.com/apache/incubator-airflow/blob/master/UPDATING.md]

I changed my airflow.cfg file as given below:
{quote}To turn on this feature, in your airflow.cfg file, under [webserver], 
set configuration variable {{rbac = True}}{quote}
But I am not sure which airflow command to use to generate webserver_config.py:
{quote}and then run the {{airflow}} command, which will generate the 
{{webserver_config.py}} file in your $AIRFLOW_HOME.{quote}

  was:
I tried setting up the Role-Based Access Control as given in 

[https://github.com/apache/incubator-airflow/blob/master/UPDATING.md]

I had changed my airflow.cfg file as given below

```To turn on this feature, in your airflow.cfg file, under [webserver], set 
configuration variable {{rbac = True}},```

But am not sure which airflow command to use to generate webserver_config.py

```and then run {{airflow}} command, which will generate the 
{{webserver_config.py}} file in your $AIRFLOW_HOME.```


> New Webserver UI with Role-Based Access Control not working
> ---
>
> Key: AIRFLOW-2366
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2366
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication
>Affects Versions: 1.9.0
>Reporter: Ramki Subramanian
>Priority: Major
>
> I tried setting up the Role-Based Access Control as given in 
> [https://github.com/apache/incubator-airflow/blob/master/UPDATING.md]
> I changed my airflow.cfg file as given below:
> {quote}To turn on this feature, in your airflow.cfg file, under [webserver], 
> set configuration variable {{rbac = True}}{quote}
> But I am not sure which airflow command to use to generate webserver_config.py:
> {quote}and then run the {{airflow}} command, which will generate the 
> {{webserver_config.py}} file in your $AIRFLOW_HOME.{quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2367) High POSTGRES DB CPU utilization

2018-04-23 Thread John Arnold (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Arnold updated AIRFLOW-2367:
-
Attachment: cpu.png

> High POSTGRES DB CPU utilization
> 
>
> Key: AIRFLOW-2367
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2367
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 2.0, 1.9.0
>Reporter: John Arnold
>Priority: Major
> Attachments: cpu.png
>
>
> We are seeing steady-state 70-90% CPU utilization. It feels like a 
> missing-index kind of problem: our TPS rate is really low, I'm not seeing 
> any long-running queries, connection counts are reasonable (low hundreds), 
> and locks also look reasonable (not many exclusive/write locks).
> We shut down the webserver and it doesn't go away, so it doesn't seem to be 
> in that part of the code. My guess is either the scheduler has an 
> inefficient query, or the (Celery) executor code path does.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2366) New Webserver UI with Role-Based Access Control not working

2018-04-23 Thread John Arnold (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Arnold updated AIRFLOW-2366:
-
Attachment: (was: cpu.png)

> New Webserver UI with Role-Based Access Control not working
> ---
>
> Key: AIRFLOW-2366
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2366
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication
>Affects Versions: 1.9.0
>Reporter: Ramki Subramanian
>Priority: Major
>
> I tried setting up the Role-Based Access Control as given in 
> [https://github.com/apache/incubator-airflow/blob/master/UPDATING.md]
> I changed my airflow.cfg file as given below:
> ```To turn on this feature, in your airflow.cfg file, under [webserver], set 
> configuration variable {{rbac = True}},```
> But I am not sure which airflow command to use to generate webserver_config.py:
> ```and then run {{airflow}} command, which will generate the 
> {{webserver_config.py}} file in your $AIRFLOW_HOME.```



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2366) New Webserver UI with Role-Based Access Control not working

2018-04-23 Thread John Arnold (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Arnold updated AIRFLOW-2366:
-
Attachment: cpu.png

> New Webserver UI with Role-Based Access Control not working
> ---
>
> Key: AIRFLOW-2366
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2366
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication
>Affects Versions: 1.9.0
>Reporter: Ramki Subramanian
>Priority: Major
> Attachments: cpu.png
>
>
> I tried setting up the Role-Based Access Control as given in 
> [https://github.com/apache/incubator-airflow/blob/master/UPDATING.md]
> I changed my airflow.cfg file as given below:
> {quote}To turn on this feature, in your airflow.cfg file, under [webserver], 
> set configuration variable {{rbac = True}}.{quote}
> But I am not sure which airflow command to use to generate webserver_config.py:
> {quote}and then run the {{airflow}} command, which will generate the 
> {{webserver_config.py}} file in your $AIRFLOW_HOME.{quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2367) High POSTGRES DB CPU utilization

2018-04-23 Thread John Arnold (JIRA)
John Arnold created AIRFLOW-2367:


 Summary: High POSTGRES DB CPU utilization
 Key: AIRFLOW-2367
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2367
 Project: Apache Airflow
  Issue Type: Bug
  Components: scheduler
Affects Versions: Airflow 2.0, 1.9.0
Reporter: John Arnold


We are seeing steady-state 70-90% CPU utilization. It feels like a 
missing-index kind of problem: our TPS rate is really low, I'm not seeing any 
long-running queries, connection counts are reasonable (low hundreds), and 
locks also look reasonable (not many exclusive/write locks).

We shut down the webserver and it doesn't go away, so it doesn't seem to be in 
that part of the code. My guess is that either the scheduler or the (Celery) 
executor code path has an inefficient query.
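
One way to test the missing-index hypothesis described above is to rank queries 
by total time. A hedged diagnostic sketch, not part of the report: it assumes the 
pg_stat_statements extension is enabled, psycopg2 is installed, and the DSN is a 
placeholder.

{code}
# Hedged sketch: list the top queries by cumulative execution time.
import psycopg2

conn = psycopg2.connect('dbname=airflow user=airflow host=localhost')  # placeholder DSN
cur = conn.cursor()
cur.execute("""
    SELECT query, calls, total_time
    FROM pg_stat_statements
    ORDER BY total_time DESC
    LIMIT 10
""")
for query, calls, total_time in cur.fetchall():
    print(total_time, calls, query[:80])
cur.close()
conn.close()
{code}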



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-2365) Fix autocommit test issue with SQLite

2018-04-23 Thread Arthur Wiedmer (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arthur Wiedmer reassigned AIRFLOW-2365:
---

Assignee: Arthur Wiedmer

> Fix autocommit test issue with SQLite
> -
>
> Key: AIRFLOW-2365
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2365
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Arthur Wiedmer
>Assignee: Arthur Wiedmer
>Priority: Major
>
> In a previous PR, I added a check for an autocommit attribute, which fails 
> for SQLite. Correcting the tests now.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2365) Fix autocommit test issue with SQLite

2018-04-23 Thread Arthur Wiedmer (JIRA)
Arthur Wiedmer created AIRFLOW-2365:
---

 Summary: Fix autocommit test issue with SQLite
 Key: AIRFLOW-2365
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2365
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Arthur Wiedmer


In a previous PR, I added a check for an autocommit attribute, which fails for 
SQLite. Correcting the tests now.
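
For background, a hedged illustration of why a generic autocommit-attribute 
check trips over SQLite (standard library sqlite3; behavior shown is for 
Python versions before 3.12, where autocommit is expressed via isolation_level 
rather than a writable autocommit flag):

{code}
# Hedged sketch: sqlite3 models autocommit through isolation_level.
import sqlite3

conn = sqlite3.connect(':memory:')
print(conn.isolation_level)   # '' -> implicit transactions (the default)
conn.isolation_level = None   # None -> autocommit mode
print(conn.isolation_level)   # None
# There is no writable conn.autocommit attribute here (before Python 3.12),
# so code that assumes one fails for SQLite-backed tests.
{code}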



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2366) New Webserver UI with Role-Based Access Control not working

2018-04-23 Thread Ramki Subramanian (JIRA)
Ramki Subramanian created AIRFLOW-2366:
--

 Summary: New Webserver UI with Role-Based Access Control not 
working
 Key: AIRFLOW-2366
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2366
 Project: Apache Airflow
  Issue Type: Bug
  Components: authentication
Affects Versions: 1.9.0
Reporter: Ramki Subramanian


I tried setting up the Role-Based Access Control as given in 

[https://github.com/apache/incubator-airflow/blob/master/UPDATING.md]

I changed my airflow.cfg file as given below:

{quote}To turn on this feature, in your airflow.cfg file, under [webserver], set 
configuration variable {{rbac = True}}.{quote}

But I am not sure which airflow command to use to generate webserver_config.py:

{quote}and then run the {{airflow}} command, which will generate the 
{{webserver_config.py}} file in your $AIRFLOW_HOME.{quote}
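
For readers hitting the same question, a minimal sketch of the flow being 
described. The exact command is the part the reporter is asking about, so treat 
it as an assumption: in 1.10.0 the file appears to be generated when the RBAC 
webserver first starts, i.e. after running {{airflow webserver}} with 
{{rbac = True}} set under [webserver].

{code}
# Hedged sketch, not authoritative: check for the file the UPDATING.md quote
# refers to. Assumes AIRFLOW_HOME is set and that `airflow webserver` has been
# run once with rbac = True in airflow.cfg.
import os

airflow_home = os.environ.get('AIRFLOW_HOME', os.path.expanduser('~/airflow'))
config_path = os.path.join(airflow_home, 'webserver_config.py')
print('webserver_config.py present:', os.path.exists(config_path))
{code}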



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


svn commit: r26473 - /dev/incubator/airflow/1.10.0beta1/

2018-04-23 Thread bolke
Author: bolke
Date: Mon Apr 23 17:59:26 2018
New Revision: 26473

Log:
Add 1.10.0 beta 1

Added:
dev/incubator/airflow/1.10.0beta1/
dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz 
  (with props)

dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz.asc

dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz.md5

dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz.sha512

dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0beta1+incubating-source.tar.gz
   (with props)

dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0beta1+incubating-source.tar.gz.asc

dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0beta1+incubating-source.tar.gz.md5

dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0beta1+incubating-source.tar.gz.sha512

Added: 
dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz
==
Binary file - no diff available.

Propchange: 
dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz
--
svn:mime-type = application/octet-stream

Added: 
dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz.asc
==
--- 
dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz.asc 
(added)
+++ 
dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz.asc 
Mon Apr 23 17:59:26 2018
@@ -0,0 +1,16 @@
+-BEGIN PGP SIGNATURE-
+
+iQIzBAABCgAdFiEEB1hcT6CMC2tdrJKNNRkLg9kFoLoFAlreHa0ACgkQNRkLg9kF
+oLpfdBAAoT9YbPWnSz9PamAL+fo/no2tR/tu8NOeM/5em5LruhSCNQuHK90R3kYt
+jhl0s1AKzgyy4d5yELn9DReEM/VUFbx2ROrHjM9HPud4gd30MzHsiAoMIheIaakO
+dDPA62PrK9+Qa3CDWOYVseqkMdmrvo5GrmJDf7M4QU+HqD0mBvf1JKD/Dc779u7k
+5yqi2FgufHRlj8yqwGaWLtiZnprHXmQcoSictG/2PEIW4HHOkU4/L1/QKt6y7yfF
+DZy+/rnGNTzncFBZd7cnrysjDYY45mUddkrSnxl/4Td0KiiLAITmbhI+RANpvHq7
+5iyOAQ8VqDLgWkYlMRRhyH+Aik+jQNz2+iDRWt0K1w3rAcN9Ta5hPk0tF8wIh3I7
+Yb82+uZKExyV2XhMQvtCrDWpE7tsiXApDSyGCnIghXmYWjAB9yH1LyyjLhjyZFGU
+lPhLpH2ycxcZxmX6n6E3BCccr+moHAZeoKOOg89qLqOWCEmZg/9vC54GgX5gmSba
+RvCAP8FO12RIDQrlRXrDkVkSWslYz5UnnsRqyZuf+KKvd9WsE52ifZvGg92KO0ns
+z/LhJIdT+0KtM4WdOCxo3TjOa/g3RYc2kQ7zMl6llpykB4G6dkm03YBpPAQJbDrM
+4jhrQGg61snq7YtwdWMY3lzSA71MI12Kmd1ncdhYrTiHuupvsLM=
+=w0bp
+-END PGP SIGNATURE-

Added: 
dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz.md5
==
--- 
dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz.md5 
(added)
+++ 
dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz.md5 
Mon Apr 23 17:59:26 2018
@@ -0,0 +1,2 @@
+apache-airflow-1.10.0b1+incubating.tar.gz: 
+B1 1D A8 1F D3 A8 1D 47  21 EE F9 FD 72 15 50 07

Added: 
dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz.sha512
==
--- 
dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz.sha512
 (added)
+++ 
dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz.sha512
 Mon Apr 23 17:59:26 2018
@@ -0,0 +1,3 @@
+apache-airflow-1.10.0b1+incubating.tar.gz: 
+08FE7003 8C3CEBD8 907FB279 2FEFE4C1 698FC44A 1CCF26CE 9204DD4F 35A1AABA 
51819B4C
+ 50294324 70425C99 71881343 B39F34DC A405AB67 1A9F8527 45992C2F

Added: 
dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0beta1+incubating-source.tar.gz
==
Binary file - no diff available.

Propchange: 
dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0beta1+incubating-source.tar.gz
--
svn:mime-type = application/octet-stream

Added: 
dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0beta1+incubating-source.tar.gz.asc
==
--- 
dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0beta1+incubating-source.tar.gz.asc
 (added)
+++ 
dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0beta1+incubating-source.tar.gz.asc
 Mon Apr 23 17:59:26 2018
@@ -0,0 +1,16 @@
+-BEGIN PGP SIGNATURE-
+
+iQIzBAABCgAdFiEEB1hcT6CMC2tdrJKNNRkLg9kFoLoFAlreHc0ACgkQNRkLg9kF
+oLpxKA//WW3OaL2pfmB62HxSQ2/FC1kauhQzm3Jlzpfx2SHAq32P2GI0hKS8fHxF
+YwOoAsRgGl5MSzbyj2rUsZn1c/IS3P2PaN8tfq911VR2wLl1NzEmYNGWn3N1pbms
+IqabAQSYxjIl+GXF6UpdAxiwCz5WGR9AxCnesINw3H6JFMOv4MsrBPL+f5Tbg5mG
+E3daxm3XOX56BTN9ZtkzZPB07EP//ELpFzYmNI9HD8GK7YqdJxSOrri3PfSt6i0V
+RADqJwaa7LxRi2g6X5GAaYsnEEwO9xvO+ZhD54wesodFBr5XfOZi23Rx7334MJKG
+t2mGZ7aDn68FNb+JcBLqef4yFfrlfuytq+8WbQZfMIjVUi3vsGzvhJJMCpZjO+Ke

incubator-airflow git commit: Add incubating

2018-04-23 Thread bolke
Repository: incubator-airflow
Updated Branches:
  refs/heads/v1-10-test 2a1676e29 -> 64900e2b0


Add incubating


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/64900e2b
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/64900e2b
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/64900e2b

Branch: refs/heads/v1-10-test
Commit: 64900e2b0753f03fcc9d5fcee1c671f8440109a9
Parents: 2a1676e
Author: Bolke de Bruin 
Authored: Mon Apr 23 19:35:55 2018 +0200
Committer: Bolke de Bruin 
Committed: Mon Apr 23 19:35:55 2018 +0200

--
 airflow/version.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/64900e2b/airflow/version.py
--
diff --git a/airflow/version.py b/airflow/version.py
index 6ead469..3da4fc3 100644
--- a/airflow/version.py
+++ b/airflow/version.py
@@ -18,4 +18,4 @@
 # under the License.
 #
 
-version = '1.10.0beta1'
+version = '1.10.0beta1+incubating'



incubator-airflow git commit: Add incubating

2018-04-23 Thread bolke
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 305a787e3 -> a30acafc8


Add incubating


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/a30acafc
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/a30acafc
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/a30acafc

Branch: refs/heads/master
Commit: a30acafc866b24880225495dd51ed71344758d54
Parents: 305a787
Author: Bolke de Bruin 
Authored: Mon Apr 23 19:34:07 2018 +0200
Committer: Bolke de Bruin 
Committed: Mon Apr 23 19:34:07 2018 +0200

--
 airflow/version.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/a30acafc/airflow/version.py
--
diff --git a/airflow/version.py b/airflow/version.py
index 42f1dea..750da36 100644
--- a/airflow/version.py
+++ b/airflow/version.py
@@ -18,4 +18,4 @@
 # under the License.
 #
 
-version = '2.0.0dev0'
+version = '2.0.0dev0+incubating'



incubator-airflow git commit: Bump version

2018-04-23 Thread bolke
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 2b030699d -> 305a787e3


Bump version


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/305a787e
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/305a787e
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/305a787e

Branch: refs/heads/master
Commit: 305a787e33b086fad605379853e5428397d2ba26
Parents: 2b03069
Author: Bolke de Bruin 
Authored: Mon Apr 23 19:26:40 2018 +0200
Committer: Bolke de Bruin 
Committed: Mon Apr 23 19:26:40 2018 +0200

--
 airflow/version.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/305a787e/airflow/version.py
--
diff --git a/airflow/version.py b/airflow/version.py
index 08abc30..42f1dea 100644
--- a/airflow/version.py
+++ b/airflow/version.py
@@ -18,4 +18,4 @@
 # under the License.
 #
 
-version = '1.10.0dev0+incubating'
+version = '2.0.0dev0'



incubator-airflow git commit: Bump version

2018-04-23 Thread bolke
Repository: incubator-airflow
Updated Branches:
  refs/heads/v1-10-test 2b030699d -> 2a1676e29


Bump version


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/2a1676e2
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/2a1676e2
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/2a1676e2

Branch: refs/heads/v1-10-test
Commit: 2a1676e2908514308221e146525a16565e5ea40d
Parents: 2b03069
Author: Bolke de Bruin 
Authored: Mon Apr 23 19:25:49 2018 +0200
Committer: Bolke de Bruin 
Committed: Mon Apr 23 19:25:49 2018 +0200

--
 airflow/version.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/2a1676e2/airflow/version.py
--
diff --git a/airflow/version.py b/airflow/version.py
index 08abc30..6ead469 100644
--- a/airflow/version.py
+++ b/airflow/version.py
@@ -18,4 +18,4 @@
 # under the License.
 #
 
-version = '1.10.0dev0+incubating'
+version = '1.10.0beta1'



[jira] [Commented] (AIRFLOW-1652) Push DatabricksRunSubmitOperator metadata into XCOM

2018-04-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448471#comment-16448471
 ] 

ASF subversion and git services commented on AIRFLOW-1652:
--

Commit 2b030699de351c557a646e8e620c1b96a0397070 in incubator-airflow's branch 
refs/heads/master from [~andrewachen]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=2b03069 ]

[AIRFLOW-1652] Push DatabricksRunSubmitOperator metadata into XCOM

[AIRFLOW-1652] Push DatabricksRunSubmitOperator
metadata into XCOM

Push run_id and run_page_url into xcom so
callbacks and other
tasks can reference this information

address comments

Closes #2641 from andrewmchen/databricks-xcom
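
For context, a hedged sketch of how a downstream task could consume these 
values. The XCom keys run_id and run_page_url come from the commit message; the 
task id and callback wiring are assumptions, not from the patch.

{code}
# Hedged usage sketch: pull the metadata pushed by the Databricks operator.
def report_databricks_run(**context):
    ti = context['ti']
    run_id = ti.xcom_pull(task_ids='databricks_task', key='run_id')
    run_page_url = ti.xcom_pull(task_ids='databricks_task', key='run_page_url')
    print('Databricks run %s: %s' % (run_id, run_page_url))
{code}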


> Push DatabricksRunSubmitOperator metadata into XCOM
> ---
>
> Key: AIRFLOW-1652
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1652
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Andrew Chen
>Priority: Major
> Fix For: 1.10.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-1652) Push DatabricksRunSubmitOperator metadata into XCOM

2018-04-23 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-1652.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #2641
[https://github.com/apache/incubator-airflow/pull/2641]

> Push DatabricksRunSubmitOperator metadata into XCOM
> ---
>
> Key: AIRFLOW-1652
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1652
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Andrew Chen
>Priority: Major
> Fix For: 1.10.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1652) Push DatabricksRunSubmitOperator metadata into XCOM

2018-04-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448472#comment-16448472
 ] 

ASF subversion and git services commented on AIRFLOW-1652:
--

Commit 2b030699de351c557a646e8e620c1b96a0397070 in incubator-airflow's branch 
refs/heads/master from [~andrewachen]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=2b03069 ]

[AIRFLOW-1652] Push DatabricksRunSubmitOperator metadata into XCOM

[AIRFLOW-1652] Push DatabricksRunSubmitOperator
metadata into XCOM

Push run_id and run_page_url into xcom so
callbacks and other
tasks can reference this information

address comments

Closes #2641 from andrewmchen/databricks-xcom


> Push DatabricksRunSubmitOperator metadata into XCOM
> ---
>
> Key: AIRFLOW-1652
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1652
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Andrew Chen
>Priority: Major
> Fix For: 1.10.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-775) AutoCommit in jdbc hook seems not to turn off if set to false

2018-04-23 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-775.

   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3257
[https://github.com/apache/incubator-airflow/pull/3257]

> AutoCommit in jdbc hook seems not to turn off if set to false
> -
>
> Key: AIRFLOW-775
> URL: https://issues.apache.org/jira/browse/AIRFLOW-775
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db, hooks
>Reporter: Daniel Lamblin
>Priority: Major
> Fix For: 1.10.0
>
>
> If I use JdbcHook and run with autocommit=false, I still get exceptions when 
> the commit is made, because autocommit mode is on by default and apparently 
> was not set to off.
> This can be worked around by setting the connection host with 
> ;autocommit=false; however, that doesn't seem like the intended behavior when 
> passing autocommit=False to the hook's methods.
> The JdbcHook does not seem to have a constructor that could take the JDBC 
> driver, location, host, schema, port, username, and password and work without 
> a set connection id, so working around this in code isn't straightforward 
> either.
> [2017-01-19 19:03:22,728] {models.py:1286} ERROR - 
> org.netezza.error.NzSQLException: The connection object is in auto-commit mode
> Traceback (most recent call last):
>   File 
> "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/airflow/models.py",
>  line 1242, in run
> result = task_copy.execute(context=context)
>   File 
> "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/airflow/operators/python_operator.py",
>  line 66, in execute
> return_value = self.python_callable(*self.op_args, **self.op_kwargs)
>   File 
> "/Users/daniellamblin/airflow/dags/dpds/dpds_go_pda_dwd_sku_and_dwd_hist_up_sku_grade.py",
>  line 356, in stage_to_update_tables
> hook.run(sql=sql, autocommit=False)
>   File 
> "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/airflow/hooks/dbapi_hook.py",
>  line 134, in run
> conn.commit()
>   File 
> "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/jaydebeapi/__init__.py",
>  line 391, in commit
> _handle_sql_exception()
>   File 
> "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/jaydebeapi/__init__.py",
>  line 148, in _handle_sql_exception_jpype
> reraise(exc_type, exc_info[1], exc_info[2])
>   File 
> "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/jaydebeapi/__init__.py",
>  line 389, in commit
> self.jconn.commit()
> DatabaseError: org.netezza.error.NzSQLException: The connection object is in 
> auto-commit mode
> [2017-01-19 19:03:22,730] {models.py:1306} INFO - Marking task as FAILED.
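
A hedged sketch of how the problem surfaces (the connection id and SQL are 
placeholders, not from the report):

{code}
# Hedged sketch: before the fix, the autocommit flag never reached the driver.
from airflow.hooks.jdbc_hook import JdbcHook

hook = JdbcHook(jdbc_conn_id='netezza_default')  # placeholder connection id
# Intended behavior: turn autocommit off so hook.run() can commit explicitly.
hook.run(sql='INSERT INTO t2 SELECT * FROM t1', autocommit=False)
# Previously set_autocommit assigned conn.jconn.autocommit = autocommit, which
# apparently had no effect on the JDBC driver, so the connection stayed in
# auto-commit mode and conn.commit() raised NzSQLException; the fix calls
# conn.jconn.setAutoCommit(autocommit) instead (see the jdbc_hook.py diff
# later in this digest).
{code}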



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-766) Skip conn.commit() when in Auto-commit

2018-04-23 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-766.

   Resolution: Fixed
Fix Version/s: (was: Airflow 2.0)
   1.10.0

Issue resolved by pull request #3257
[https://github.com/apache/incubator-airflow/pull/3257]

> Skip conn.commit() when in Auto-commit
> --
>
> Key: AIRFLOW-766
> URL: https://issues.apache.org/jira/browse/AIRFLOW-766
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db
>Affects Versions: Airflow 2.0
> Environment: Airflow 2.0, IBM Netezza
>Reporter: Pfubar.k
>Assignee: Pfubar.k
>Priority: Major
>  Labels: easyfix
> Fix For: 1.10.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Some JDBC drivers fail when using DbApiHook.run() or DbApiHook.insert_rows().
> I'm using IBM Netezza. When auto-commit mode is on, I get this error message:
> {code}
> NzSQLException: The connection object is in auto-commit mode
> {code}
> conn.commit() needs to be called only when auto-commit mode is off.
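
A minimal sketch of the guard the issue asks for (illustrative plain Python, 
not the committed patch; see the dbapi_hook.py diff later in this digest for 
the actual change):

{code}
# Hedged sketch: only commit when the driver is not already in auto-commit mode.
def run_statements(conn, statements):
    cur = conn.cursor()
    for s in statements:
        cur.execute(s)
    cur.close()
    # Some JDBC drivers (e.g. Netezza) raise NzSQLException if commit() is
    # called while the connection is in auto-commit mode, so guard the call.
    if not getattr(conn, 'autocommit', False):
        conn.commit()
{code}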



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2364) The autocommit flag can be set on a connection which does not support it.

2018-04-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448459#comment-16448459
 ] 

ASF subversion and git services commented on AIRFLOW-2364:
--

Commit a33b29c8512c599b97e53808ebf796039a89c15c in incubator-airflow's branch 
refs/heads/master from [~artwr]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=a33b29c ]

[AIRFLOW-2364] Warn when setting autocommit on a connection which does not 
support it


> The autocommit flag can be set on a connection which does not support it.
> -
>
> Key: AIRFLOW-2364
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2364
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Arthur Wiedmer
>Assignee: Arthur Wiedmer
>Priority: Minor
> Fix For: 1.10.0
>
>
> We could just add a logging warning when the method is invoked.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-775) AutoCommit in jdbc hook seems not to turn off if set to false

2018-04-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448460#comment-16448460
 ] 

ASF subversion and git services commented on AIRFLOW-775:
-

Commit 97954e21229982f580f18732c5018714f7e51087 in incubator-airflow's branch 
refs/heads/master from [~artwr]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=97954e2 ]

[AIRFLOW-775] Fix autocommit settings with Jdbc hook


> AutoCommit in jdbc hook seems not to turn off if set to false
> -
>
> Key: AIRFLOW-775
> URL: https://issues.apache.org/jira/browse/AIRFLOW-775
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db, hooks
>Reporter: Daniel Lamblin
>Priority: Major
>
> If I use JdbcHook and run with autocommit=false, I still get exceptions when 
> the commit is made, because autocommit mode is on by default and apparently 
> was not set to off.
> This can be worked around by setting the connection host with 
> ;autocommit=false; however, that doesn't seem like the intended behavior when 
> passing autocommit=False to the hook's methods.
> The JdbcHook does not seem to have a constructor that could take the JDBC 
> driver, location, host, schema, port, username, and password and work without 
> a set connection id, so working around this in code isn't straightforward 
> either.
> [2017-01-19 19:03:22,728] {models.py:1286} ERROR - 
> org.netezza.error.NzSQLException: The connection object is in auto-commit mode
> Traceback (most recent call last):
>   File 
> "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/airflow/models.py",
>  line 1242, in run
> result = task_copy.execute(context=context)
>   File 
> "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/airflow/operators/python_operator.py",
>  line 66, in execute
> return_value = self.python_callable(*self.op_args, **self.op_kwargs)
>   File 
> "/Users/daniellamblin/airflow/dags/dpds/dpds_go_pda_dwd_sku_and_dwd_hist_up_sku_grade.py",
>  line 356, in stage_to_update_tables
> hook.run(sql=sql, autocommit=False)
>   File 
> "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/airflow/hooks/dbapi_hook.py",
>  line 134, in run
> conn.commit()
>   File 
> "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/jaydebeapi/__init__.py",
>  line 391, in commit
> _handle_sql_exception()
>   File 
> "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/jaydebeapi/__init__.py",
>  line 148, in _handle_sql_exception_jpype
> reraise(exc_type, exc_info[1], exc_info[2])
>   File 
> "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/jaydebeapi/__init__.py",
>  line 389, in commit
> self.jconn.commit()
> DatabaseError: org.netezza.error.NzSQLException: The connection object is in 
> auto-commit mode
> [2017-01-19 19:03:22,730] {models.py:1306} INFO - Marking task as FAILED.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-766) Skip conn.commit() when in Auto-commit

2018-04-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448458#comment-16448458
 ] 

ASF subversion and git services commented on AIRFLOW-766:
-

Commit 3450f526cec894cc7beceeae18f9d54cb2ae7520 in incubator-airflow's branch 
refs/heads/master from [~artwr]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=3450f52 ]

[AIRFLOW-766] Skip conn.commit() when in Auto-commit


> Skip conn.commit() when in Auto-commit
> --
>
> Key: AIRFLOW-766
> URL: https://issues.apache.org/jira/browse/AIRFLOW-766
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db
>Affects Versions: Airflow 2.0
> Environment: Airflow 2.0, IBM Netezza
>Reporter: Pfubar.k
>Assignee: Pfubar.k
>Priority: Major
>  Labels: easyfix
> Fix For: Airflow 2.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Some JDBC drivers fail when using DbApiHook.run() or DbApiHook.insert_rows().
> I'm using IBM Netezza. When auto-commit mode is on, I get this error message:
> {code}
> NzSQLException: The connection object is in auto-commit mode
> {code}
> conn.commit() needs to be called only when auto-commit mode is off.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2364) The autocommit flag can be set on a connection which does not support it.

2018-04-23 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2364.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3257
[https://github.com/apache/incubator-airflow/pull/3257]

> The autocommit flag can be set on a connection which does not support it.
> -
>
> Key: AIRFLOW-2364
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2364
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Arthur Wiedmer
>Assignee: Arthur Wiedmer
>Priority: Minor
> Fix For: 1.10.0
>
>
> We could just add a logging warning when the method is invoked.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[2/4] incubator-airflow git commit: [AIRFLOW-2364] Warn when setting autocommit on a connection which does not support it

2018-04-23 Thread bolke
[AIRFLOW-2364] Warn when setting autocommit on a connection which does not 
support it


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/a33b29c8
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/a33b29c8
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/a33b29c8

Branch: refs/heads/master
Commit: a33b29c8512c599b97e53808ebf796039a89c15c
Parents: 3450f52
Author: Arthur Wiedmer 
Authored: Mon Apr 23 09:42:13 2018 -0700
Committer: Arthur Wiedmer 
Committed: Mon Apr 23 09:52:08 2018 -0700

--
 airflow/hooks/dbapi_hook.py | 8 
 1 file changed, 8 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/a33b29c8/airflow/hooks/dbapi_hook.py
--
diff --git a/airflow/hooks/dbapi_hook.py b/airflow/hooks/dbapi_hook.py
index b6cdff6..25b1588 100644
--- a/airflow/hooks/dbapi_hook.py
+++ b/airflow/hooks/dbapi_hook.py
@@ -173,6 +173,14 @@ class DbApiHook(BaseHook):
             conn.commit()
 
     def set_autocommit(self, conn, autocommit):
+        """
+        Sets the autocommit flag on the connection
+        """
+        if not self.supports_autocommit and autocommit:
+            self.log.warn(
+                ("%s connection doesn't support "
+                 "autocommit but autocommit activated."),
+                getattr(self, self.conn_name_attr))
         conn.autocommit = autocommit
 
     def get_cursor(self):



[4/4] incubator-airflow git commit: Merge pull request #3257 from artwr/awiedmer-fix-issue-with-jdbc-autocommit

2018-04-23 Thread bolke
Merge pull request #3257 from artwr/awiedmer-fix-issue-with-jdbc-autocommit


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/1e82e11a
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/1e82e11a
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/1e82e11a

Branch: refs/heads/master
Commit: 1e82e11aed7b28c1c0622e8f830624346ee7d55d
Parents: 65b6cea 97954e2
Author: Bolke de Bruin 
Authored: Mon Apr 23 19:08:24 2018 +0200
Committer: Bolke de Bruin 
Committed: Mon Apr 23 19:08:24 2018 +0200

--
 airflow/hooks/dbapi_hook.py | 15 ---
 airflow/hooks/jdbc_hook.py  |  2 +-
 2 files changed, 13 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/1e82e11a/airflow/hooks/dbapi_hook.py
--



[1/4] incubator-airflow git commit: [AIRFLOW-766] Skip conn.commit() when in Auto-commit

2018-04-23 Thread bolke
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 65b6ceae7 -> 1e82e11ae


[AIRFLOW-766] Skip conn.commit() when in Auto-commit


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/3450f526
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/3450f526
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/3450f526

Branch: refs/heads/master
Commit: 3450f526cec894cc7beceeae18f9d54cb2ae7520
Parents: 1d3bb54
Author: Arthur Wiedmer 
Authored: Mon Apr 23 09:41:56 2018 -0700
Committer: Arthur Wiedmer 
Committed: Mon Apr 23 09:42:10 2018 -0700

--
 airflow/hooks/dbapi_hook.py | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3450f526/airflow/hooks/dbapi_hook.py
--
diff --git a/airflow/hooks/dbapi_hook.py b/airflow/hooks/dbapi_hook.py
index a9f4e43..b6cdff6 100644
--- a/airflow/hooks/dbapi_hook.py
+++ b/airflow/hooks/dbapi_hook.py
@@ -7,9 +7,9 @@
 # to you under the Apache License, Version 2.0 (the
 # "License"); you may not use this file except in compliance
 # with the License.  You may obtain a copy of the License at
-# 
+#
 #   http://www.apache.org/licenses/LICENSE-2.0
-# 
+#
 # Unless required by applicable law or agreed to in writing,
 # software distributed under the License is distributed on an
 # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -169,7 +169,8 @@ class DbApiHook(BaseHook):
                 else:
                     cur.execute(s)
 
-            conn.commit()
+            if not conn.autocommit:
+                conn.commit()
 
     def set_autocommit(self, conn, autocommit):
         conn.autocommit = autocommit



[3/4] incubator-airflow git commit: [AIRFLOW-775] Fix autocommit settings with Jdbc hook

2018-04-23 Thread bolke
[AIRFLOW-775] Fix autocommit settings with Jdbc hook


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/97954e21
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/97954e21
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/97954e21

Branch: refs/heads/master
Commit: 97954e21229982f580f18732c5018714f7e51087
Parents: a33b29c
Author: Arthur Wiedmer 
Authored: Mon Apr 23 09:42:21 2018 -0700
Committer: Arthur Wiedmer 
Committed: Mon Apr 23 09:52:08 2018 -0700

--
 airflow/hooks/jdbc_hook.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/97954e21/airflow/hooks/jdbc_hook.py
--
diff --git a/airflow/hooks/jdbc_hook.py b/airflow/hooks/jdbc_hook.py
index b76e80d..8f0cd67 100644
--- a/airflow/hooks/jdbc_hook.py
+++ b/airflow/hooks/jdbc_hook.py
@@ -58,4 +58,4 @@ class JdbcHook(DbApiHook):
         :param conn: The connection
         :return:
         """
-        conn.jconn.autocommit = autocommit
+        conn.jconn.setAutoCommit(autocommit)



[jira] [Commented] (AIRFLOW-2234) Enable insert_rows for PrestoHook

2018-04-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448446#comment-16448446
 ] 

ASF subversion and git services commented on AIRFLOW-2234:
--

Commit 65b6ceae74c166efe95113ad5aa55004e2ad25c5 in incubator-airflow's branch 
refs/heads/master from [~sekikn]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=65b6cea ]

[AIRFLOW-2234] Enable insert_rows for PrestoHook

PrestoHook.insert_rows() raises
NotImplementedError for now.
But Presto 0.126+ allows specifying column names
in INSERT queries,
so we can leverage DbApiHook.insert_rows() almost
as is.
This PR enables this function.

Closes #3146 from sekikn/AIRFLOW-2234


> Enable insert_rows for PrestoHook
> -
>
> Key: AIRFLOW-2234
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2234
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Major
> Fix For: 1.10.0
>
>
> PrestoHook.insert_rows() raises NotImplementedError for now.
> But [Presto 0.126+ allows specifying column names in INSERT 
> queries|https://prestodb.io/docs/current/release/release-0.126.html], so we 
> can leverage DbApiHook.insert_rows() almost as is.
> I think there is no reason to keep it disabled.
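
A hedged usage sketch of the newly enabled method (connection id, table, and 
column names are placeholders):

{code}
# Hedged sketch: insert rows through PrestoHook once this change is in place.
from airflow.hooks.presto_hook import PrestoHook

hook = PrestoHook(presto_conn_id='presto_default')  # placeholder connection id
rows = [('alice', 1), ('bob', 2)]
# Delegates to DbApiHook.insert_rows with commit_every=0, since PyHive 0.5.1
# does not support transactions yet (per the TODO in the patch).
hook.insert_rows(table='my_schema.my_table', rows=rows,
                 target_fields=('name', 'value'))
{code}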



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


incubator-airflow git commit: [AIRFLOW-2234] Enable insert_rows for PrestoHook

2018-04-23 Thread bolke
Repository: incubator-airflow
Updated Branches:
  refs/heads/master ed9329017 -> 65b6ceae7


[AIRFLOW-2234] Enable insert_rows for PrestoHook

PrestoHook.insert_rows() raises
NotImplementedError for now.
But Presto 0.126+ allows specifying column names
in INSERT queries,
so we can leverage DbApiHook.insert_rows() almost
as is.
This PR enables this function.

Closes #3146 from sekikn/AIRFLOW-2234


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/65b6ceae
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/65b6ceae
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/65b6ceae

Branch: refs/heads/master
Commit: 65b6ceae74c166efe95113ad5aa55004e2ad25c5
Parents: ed93290
Author: Kengo Seki 
Authored: Mon Apr 23 19:01:38 2018 +0200
Committer: Bolke de Bruin 
Committed: Mon Apr 23 19:01:38 2018 +0200

--
 airflow/hooks/dbapi_hook.py |  4 +--
 airflow/hooks/presto_hook.py| 17 --
 tests/hooks/test_dbapi_hook.py  | 64 +---
 tests/hooks/test_presto_hook.py | 49 +++
 4 files changed, 125 insertions(+), 9 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/65b6ceae/airflow/hooks/dbapi_hook.py
--
diff --git a/airflow/hooks/dbapi_hook.py b/airflow/hooks/dbapi_hook.py
index a9f4e43..de0a3a3 100644
--- a/airflow/hooks/dbapi_hook.py
+++ b/airflow/hooks/dbapi_hook.py
@@ -213,8 +213,8 @@ class DbApiHook(BaseHook):
                 for cell in row:
                     l.append(self._serialize_cell(cell, conn))
                 values = tuple(l)
-                placeholders = ["%s",]*len(values)
-                sql = "INSERT INTO {0} {1} VALUES ({2});".format(
+                placeholders = ["%s", ] * len(values)
+                sql = "INSERT INTO {0} {1} VALUES ({2})".format(
                     table,
                     target_fields,
                     ",".join(placeholders))

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/65b6ceae/airflow/hooks/presto_hook.py
--
diff --git a/airflow/hooks/presto_hook.py b/airflow/hooks/presto_hook.py
index 935a9a5..8920448 100644
--- a/airflow/hooks/presto_hook.py
+++ b/airflow/hooks/presto_hook.py
@@ -114,5 +114,18 @@ class PrestoHook(DbApiHook):
         """
         return super(PrestoHook, self).run(self._strip_sql(hql), parameters)
 
-    def insert_rows(self):
-        raise NotImplementedError()
+    # TODO Enable commit_every once PyHive supports transaction.
+    # Unfortunately, PyHive 0.5.1 doesn't support transaction for now,
+    # whereas Presto 0.132+ does.
+    def insert_rows(self, table, rows, target_fields=None):
+        """
+        A generic way to insert a set of tuples into a table.
+
+        :param table: Name of the target table
+        :type table: str
+        :param rows: The rows to insert into the table
+        :type rows: iterable of tuples
+        :param target_fields: The names of the columns to fill in the table
+        :type target_fields: iterable of strings
+        """
+        super(PrestoHook, self).insert_rows(table, rows, target_fields, 0)

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/65b6ceae/tests/hooks/test_dbapi_hook.py
--
diff --git a/tests/hooks/test_dbapi_hook.py b/tests/hooks/test_dbapi_hook.py
index 9fcc970..c3ae187 100644
--- a/tests/hooks/test_dbapi_hook.py
+++ b/tests/hooks/test_dbapi_hook.py
@@ -28,17 +28,18 @@ class TestDbApiHook(unittest.TestCase):
 
     def setUp(self):
         super(TestDbApiHook, self).setUp()
-
+
         self.cur = mock.MagicMock()
-        self.conn = conn = mock.MagicMock()
+        self.conn = mock.MagicMock()
         self.conn.cursor.return_value = self.cur
-
+        conn = self.conn
+
         class TestDBApiHook(DbApiHook):
             conn_name_attr = 'test_conn_id'
-
+
             def get_conn(self):
                 return conn
-
+
         self.db_hook = TestDBApiHook()
 
     def test_get_records(self):
@@ -78,3 +79,56 @@ class TestDbApiHook(unittest.TestCase):
         self.conn.close.assert_called_once()
         self.cur.close.assert_called_once()
         self.cur.execute.assert_called_once_with(statement)
+
+    def test_insert_rows(self):
+        table = "table"
+        rows = [("hello",),
+                ("world",)]
+
+        self.db_hook.insert_rows(table, rows)
+
+        self.conn.close.assert_called_once()
+        self.cur.close.assert_called_once()
+

[jira] [Commented] (AIRFLOW-2234) Enable insert_rows for PrestoHook

2018-04-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448447#comment-16448447
 ] 

ASF subversion and git services commented on AIRFLOW-2234:
--

Commit 65b6ceae74c166efe95113ad5aa55004e2ad25c5 in incubator-airflow's branch 
refs/heads/master from [~sekikn]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=65b6cea ]

[AIRFLOW-2234] Enable insert_rows for PrestoHook

PrestoHook.insert_rows() raises
NotImplementedError for now.
But Presto 0.126+ allows specifying column names
in INSERT queries,
so we can leverage DbApiHook.insert_rows() almost
as is.
This PR enables this function.

Closes #3146 from sekikn/AIRFLOW-2234


> Enable insert_rows for PrestoHook
> -
>
> Key: AIRFLOW-2234
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2234
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Major
> Fix For: 1.10.0
>
>
> PrestoHook.insert_rows() raises NotImplementedError for now.
> But [Presto 0.126+ allows specifying column names in INSERT 
> queries|https://prestodb.io/docs/current/release/release-0.126.html], so we 
> can leverage DbApiHook.insert_rows() almost as is.
> I think there is no reason to keep it disabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2234) Enable insert_rows for PrestoHook

2018-04-23 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2234.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3146
[https://github.com/apache/incubator-airflow/pull/3146]

> Enable insert_rows for PrestoHook
> -
>
> Key: AIRFLOW-2234
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2234
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Major
> Fix For: 1.10.0
>
>
> PrestoHook.insert_rows() raises NotImplementedError for now.
> But [Presto 0.126+ allows specifying column names in INSERT 
> queries|https://prestodb.io/docs/current/release/release-0.126.html], so we 
> can leverage DbApiHook.insert_rows() almost as is.
> I think there is no reason to keep it disabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2208) Keep execution_date context when switching from task_instance_details

2018-04-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448444#comment-16448444
 ] 

ASF subversion and git services commented on AIRFLOW-2208:
--

Commit ed932901757a9e38049ef77e8b02c8bb8a9452c7 in incubator-airflow's branch 
refs/heads/master from [~iansuvak]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=ed93290 ]

[AIRFLOW-2208][Airflow-22208] Link to same DagRun graph from TaskInstance view

Allow graph view to accept blank execution_date
and pass it
through when it's available.

Closes #3132 from iansuvak/persistent_graph


> Keep execution_date context when switching from task_instance_details
> -
>
> Key: AIRFLOW-2208
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2208
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ui
>Reporter: Ian Suvak
>Priority: Minor
> Fix For: 1.10.0
>
>
> With alerts that give direct links to Task Instance logs, it's difficult to 
> go back to the Graph view of that Dag Run, since clicking the Graph link goes 
> to the most recent Dag Run. When within a view that's specific to a DagRun, 
> such as viewing a task_instance log, clicking the Graph link should go to the 
> DagRun that the Task Instance belongs to.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


incubator-airflow git commit: [AIRFLOW-2208][Airflow-22208] Link to same DagRun graph from TaskInstance view

2018-04-23 Thread bolke
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 09bbe2477 -> ed9329017


[AIRFLOW-2208][Airflow-22208] Link to same DagRun graph from TaskInstance view

Allow graph view to accept blank execution_date
and pass it
through when it's available.

Closes #3132 from iansuvak/persistent_graph


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/ed932901
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/ed932901
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/ed932901

Branch: refs/heads/master
Commit: ed932901757a9e38049ef77e8b02c8bb8a9452c7
Parents: 09bbe24
Author: Ian Suvak 
Authored: Mon Apr 23 18:59:58 2018 +0200
Committer: Bolke de Bruin 
Committed: Mon Apr 23 18:59:58 2018 +0200

--
 airflow/www/templates/airflow/dag.html |  2 +-
 airflow/www/utils.py   |  2 +-
 tests/core.py  | 40 +
 3 files changed, 25 insertions(+), 19 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/ed932901/airflow/www/templates/airflow/dag.html
--
diff --git a/airflow/www/templates/airflow/dag.html 
b/airflow/www/templates/airflow/dag.html
index d5a145d..ed84f27 100644
--- a/airflow/www/templates/airflow/dag.html
+++ b/airflow/www/templates/airflow/dag.html
@@ -54,7 +54,7 @@
   Back to {{ dag.parent_dag.dag_id }}
   
   {% endif %}
-  
+  
   
 Graph View
   

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/ed932901/airflow/www/utils.py
--
diff --git a/airflow/www/utils.py b/airflow/www/utils.py
index ada8ce2..4a3ac2e 100644
--- a/airflow/www/utils.py
+++ b/airflow/www/utils.py
@@ -259,7 +259,7 @@ def action_logging(f):
             task_id=request.args.get('task_id'),
             dag_id=request.args.get('dag_id'))
 
-        if 'execution_date' in request.args:
+        if request.args.get('execution_date'):
             log.execution_date = timezone.parse(
                 request.args.get('execution_date'))
 
         with create_session() as session:
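
The small-looking change above matters because a link can carry a blank 
execution_date; a plain-Python illustration (not from the patch):

{code}
# Hedged illustration of membership test vs. truthiness for blank query params.
args = {'execution_date': ''}            # graph link followed with a blank value
print('execution_date' in args)          # True  -> old code tried to parse ''
print(bool(args.get('execution_date')))  # False -> new code skips blank values
{code}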

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/ed932901/tests/core.py
--
diff --git a/tests/core.py b/tests/core.py
index 308804d..9cbae9d 100644
--- a/tests/core.py
+++ b/tests/core.py
@@ -1596,12 +1596,12 @@ class WebUiTests(unittest.TestCase):
 
         self.dagbag = models.DagBag(include_examples=True)
         self.dag_bash = self.dagbag.dags['example_bash_operator']
-        self.dag_bash2 = self.dagbag.dags['test_example_bash_operator']
+        self.dag_python = self.dagbag.dags['example_python_operator']
         self.sub_dag = self.dagbag.dags['example_subdag_operator']
         self.runme_0 = self.dag_bash.get_task('runme_0')
         self.example_xcom = self.dagbag.dags['example_xcom']
 
-        self.dagrun_bash2 = self.dag_bash2.create_dagrun(
+        self.dagrun_python = self.dag_python.create_dagrun(
             run_id="test_{}".format(models.DagRun.id_for_date(timezone.utcnow())),
             execution_date=DEFAULT_DATE,
             start_date=timezone.utcnow(),
@@ -1628,14 +1628,16 @@ class WebUiTests(unittest.TestCase):
         self.assertIn("DAGs", resp_html)
         self.assertIn("example_bash_operator", resp_html)
 
-        # The HTML should contain data for the last-run. A link to the specific run, and the text of
-        # the date.
+        # The HTML should contain data for the last-run. A link to the specific run,
+        #  and the text of the date.
         url = "/admin/airflow/graph?" + urlencode({
-            "dag_id": self.dag_bash2.dag_id,
-            "execution_date": self.dagrun_bash2.execution_date,
+            "dag_id": self.dag_python.dag_id,
+            "execution_date": self.dagrun_python.execution_date,
         }).replace("&", "&amp;")
         self.assertIn(url, resp_html)
-        self.assertIn(self.dagrun_bash2.execution_date.strftime("%Y-%m-%d %H:%M"), resp_html)
+        self.assertIn(
+            self.dagrun_python.execution_date.strftime("%Y-%m-%d %H:%M"),
+            resp_html)
 
     def test_query(self):
         response = self.app.get('/admin/queryview/')
@@ -1662,6 +1664,10 @@ class WebUiTests(unittest.TestCase):
         response = self.app.get(
             '/admin/airflow/graph?dag_id=example_bash_operator')
         self.assertIn("runme_0", response.data.decode('utf-8'))
+        # confirm that the graph page loads when execution_date is blank
+        response = self.app.get(
+

[jira] [Resolved] (AIRFLOW-1153) params in HiveOperator constructor can't be passed into Hive execution context

2018-04-23 Thread Fokko Driesprong (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong resolved AIRFLOW-1153.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request #3136
[https://github.com/apache/incubator-airflow/pull/3136]

> params in HiveOperator constructor can't be passed into Hive execution context
> --
>
> Key: AIRFLOW-1153
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1153
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hive_hooks, operators
>Affects Versions: Airflow 2.0, Airflow 1.8
>Reporter: Xianping lin
>Assignee: Fokko Driesprong
>Priority: Critical
>  Labels: easyfix, newbie
> Fix For: 2.0.0
>
>
> The params parameter in HiveOperator can't be imported into the Hive 
> execution context, so the following statement won't work, because 'mynumber' 
> isn't available to the SQL statement:
> {code}
> test_hiveoperator = HiveOperator(
>     task_id='hive_test',
>     hiveconf_jinja_translate=True,
>     hql=''' use myDB;
>         INSERT OVERWRITE TABLE t2
>         select * from t1 where t1.x > ' ${hiveconf:mynumber}'
>     ''',
>     params={'mynumber': 2},
>     dag=dag
> )
> {code}
> This modification passes the 'params' from the HiveOperator constructor to 
> the Hive SQL execution context, so the variable definition reaches the Hive 
> SQL.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-1153) params in HiveOperator constructor can't be passed into Hive execution context

2018-04-23 Thread Fokko Driesprong (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong reassigned AIRFLOW-1153:
-

Assignee: Fokko Driesprong

> params in HiveOperator constructor can't be passed into Hive execution context
> --
>
> Key: AIRFLOW-1153
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1153
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hive_hooks, operators
>Affects Versions: Airflow 2.0, Airflow 1.8
>Reporter: Xianping lin
>Assignee: Fokko Driesprong
>Priority: Critical
>  Labels: easyfix, newbie
> Fix For: 2.0.0
>
>
> The params parameter in HiveOperator can't be imported into the Hive 
> execution context, so the following statement won't work, because 'mynumber' 
> isn't available to the SQL statement:
> {code}
> test_hiveoperator = HiveOperator(
>     task_id='hive_test',
>     hiveconf_jinja_translate=True,
>     hql=''' use myDB;
>         INSERT OVERWRITE TABLE t2
>         select * from t1 where t1.x > ' ${hiveconf:mynumber}'
>     ''',
>     params={'mynumber': 2},
>     dag=dag
> )
> {code}
> This modification passes the 'params' from the HiveOperator constructor to 
> the Hive SQL execution context, so the variable definition reaches the Hive 
> SQL.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1153) params in HiveOperator constructor can't be passed into Hive execution context

2018-04-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448437#comment-16448437
 ] 

ASF subversion and git services commented on AIRFLOW-1153:
--

Commit 09bbe247728993867c716635951219cc49f65dd1 in incubator-airflow's branch 
refs/heads/master from Alan Ma
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=09bbe24 ]

[AIRFLOW-1153] Allow HiveOperators to take hiveconfs

HiveOperator can only replace variables via jinja
and the replacements
are global to the dag through the context and
user_defined_macros.
It would be much more flexible to open up
hive_conf to the HiveOperator
level so hive scripts can be recycled at the task
level, leveraging
HiveHook already existing hive_conf param and
_prepare_hiveconf
function.

Closes #3136 from wolfier/AIRFLOW-1153
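
A hedged usage sketch of the new parameter (task id, DAG object, and values 
are illustrative; the -hiveconf mapping is from the docstring in the diff 
later in this digest):

{code}
# Hedged example: pass per-task hiveconf values via the new hiveconfs argument.
from airflow.operators.hive_operator import HiveOperator

hive_task = HiveOperator(
    task_id='hive_with_conf',
    hql="INSERT OVERWRITE TABLE t2 "
        "SELECT * FROM t1 WHERE t1.x > '${hiveconf:mynumber}'",
    hiveconfs={'mynumber': '2'},  # rendered as -hiveconf "mynumber"="2"
    dag=dag,  # assumes a DAG object named `dag` exists
)
{code}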


> params in HiveOperator constructor can't be passed into Hive execution context
> --
>
> Key: AIRFLOW-1153
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1153
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hive_hooks, operators
>Affects Versions: Airflow 2.0, Airflow 1.8
>Reporter: Xianping lin
>Priority: Critical
>  Labels: easyfix, newbie
>
> The params parameter in HiveOperator can't be imported into the Hive 
> execution context, so the following statement won't work, because 'mynumber' 
> isn't available to the SQL statement:
> {code}
> test_hiveoperator = HiveOperator(
>     task_id='hive_test',
>     hiveconf_jinja_translate=True,
>     hql=''' use myDB;
>         INSERT OVERWRITE TABLE t2
>         select * from t1 where t1.x > ' ${hiveconf:mynumber}'
>     ''',
>     params={'mynumber': 2},
>     dag=dag
> )
> {code}
> This modification passes the 'params' from the HiveOperator constructor to 
> the Hive SQL execution context, so the variable definition reaches the Hive 
> SQL.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1153) params in HiveOperator constructor can't be passed into Hive execution context

2018-04-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448436#comment-16448436
 ] 

ASF subversion and git services commented on AIRFLOW-1153:
--

Commit 09bbe247728993867c716635951219cc49f65dd1 in incubator-airflow's branch 
refs/heads/master from Alan Ma
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=09bbe24 ]

[AIRFLOW-1153] Allow HiveOperators to take hiveconfs

HiveOperator can only replace variables via jinja
and the replacements
are global to the dag through the context and
user_defined_macros.
It would be much more flexible to open up
hive_conf to the HiveOperator
level so hive scripts can be recycled at the task
level, leveraging
HiveHook already existing hive_conf param and
_prepare_hiveconf
function.

Closes #3136 from wolfier/AIRFLOW-1153


> params in HiveOperator constructor can't be passed into Hive execution context
> --
>
> Key: AIRFLOW-1153
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1153
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hive_hooks, operators
>Affects Versions: Airflow 2.0, Airflow 1.8
>Reporter: Xianping lin
>Priority: Critical
>  Labels: easyfix, newbie
>
> The params parameter in HiveOperator can't be imported into the Hive 
> execution context, so the following statement won't work, because 'mynumber' 
> isn't available to the SQL statement:
> {code}
> test_hiveoperator = HiveOperator(
>     task_id='hive_test',
>     hiveconf_jinja_translate=True,
>     hql=''' use myDB;
>         INSERT OVERWRITE TABLE t2
>         select * from t1 where t1.x > ' ${hiveconf:mynumber}'
>     ''',
>     params={'mynumber': 2},
>     dag=dag
> )
> {code}
> This modification passes the 'params' from the HiveOperator constructor to 
> the Hive SQL execution context, so the variable definition reaches the Hive 
> SQL.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


incubator-airflow git commit: [AIRFLOW-1153] Allow HiveOperators to take hiveconfs

2018-04-23 Thread fokko
Repository: incubator-airflow
Updated Branches:
  refs/heads/master e30a1f451 -> 09bbe2477


[AIRFLOW-1153] Allow HiveOperators to take hiveconfs

HiveOperator can only replace variables via jinja
and the replacements
are global to the dag through the context and
user_defined_macros.
It would be much more flexible to open up
hive_conf to the HiveOperator
level so hive scripts can be recycled at the task
level, leveraging
HiveHook already existing hive_conf param and
_prepare_hiveconf
function.

Closes #3136 from wolfier/AIRFLOW-1153


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/09bbe247
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/09bbe247
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/09bbe247

Branch: refs/heads/master
Commit: 09bbe247728993867c716635951219cc49f65dd1
Parents: e30a1f4
Author: Alan Ma 
Authored: Mon Apr 23 18:56:29 2018 +0200
Committer: Fokko Driesprong 
Committed: Mon Apr 23 18:56:29 2018 +0200

--
 airflow/operators/hive_operator.py | 21 +++--
 tests/operators/hive_operator.py   | 10 ++
 2 files changed, 25 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/09bbe247/airflow/operators/hive_operator.py
--
diff --git a/airflow/operators/hive_operator.py 
b/airflow/operators/hive_operator.py
index 64ea61d..b62744b 100644
--- a/airflow/operators/hive_operator.py
+++ b/airflow/operators/hive_operator.py
@@ -35,6 +35,9 @@ class HiveOperator(BaseOperator):
     :type hql: string
     :param hive_cli_conn_id: reference to the Hive database
     :type hive_cli_conn_id: string
+    :param hiveconfs: if defined, these key value pairs will be passed
+        to hive as ``-hiveconf "key"="value"``
+    :type hiveconfs: dict
     :param hiveconf_jinja_translate: when True, hiveconf-type templating
         ${var} gets translated into jinja-type templating {{ var }} and
         ${hiveconf:var} gets translated into jinja-type templating {{ var }}.
@@ -56,7 +59,7 @@ class HiveOperator(BaseOperator):
     """
 
     template_fields = ('hql', 'schema', 'hive_cli_conn_id', 'mapred_queue',
-                       'mapred_job_name', 'mapred_queue_priority')
+                       'hiveconfs', 'mapred_job_name', 'mapred_queue_priority')
     template_ext = ('.hql', '.sql',)
     ui_color = '#f0e4ec'
 
@@ -65,6 +68,7 @@ class HiveOperator(BaseOperator):
             self, hql,
             hive_cli_conn_id='hive_cli_default',
             schema='default',
+            hiveconfs=None,
             hiveconf_jinja_translate=False,
             script_begin_tag=None,
             run_as_owner=False,
@@ -74,15 +78,15 @@ class HiveOperator(BaseOperator):
             *args, **kwargs):
 
         super(HiveOperator, self).__init__(*args, **kwargs)
-        self.hiveconf_jinja_translate = hiveconf_jinja_translate
         self.hql = hql
-        self.schema = schema
         self.hive_cli_conn_id = hive_cli_conn_id
+        self.schema = schema
+        self.hiveconfs = hiveconfs or {}
+        self.hiveconf_jinja_translate = hiveconf_jinja_translate
         self.script_begin_tag = script_begin_tag
         self.run_as = None
         if run_as_owner:
             self.run_as = self.dag.owner
-
         self.mapred_queue = mapred_queue
         self.mapred_queue_priority = mapred_queue_priority
         self.mapred_job_name = mapred_job_name
@@ -119,8 +123,13 @@ class HiveOperator(BaseOperator):
             .format(ti.hostname.split('.')[0], ti.dag_id, ti.task_id,
                     ti.execution_date.isoformat())
 
-        self.hook.run_cli(hql=self.hql, schema=self.schema,
-                          hive_conf=context_to_airflow_vars(context))
+        if self.hiveconf_jinja_translate:
+            self.hiveconfs = context_to_airflow_vars(context)
+        else:
+            self.hiveconfs.update(context_to_airflow_vars(context))
+
+        self.log.info('Passing HiveConf: %s', self.hiveconfs)
+        self.hook.run_cli(hql=self.hql, schema=self.schema,
+                          hive_conf=self.hiveconfs)
 
     def dry_run(self):
         self.hook = self.get_hook()

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/09bbe247/tests/operators/hive_operator.py
--
diff --git a/tests/operators/hive_operator.py b/tests/operators/hive_operator.py
index 9cc0f1a..0914ba9 100644
--- a/tests/operators/hive_operator.py
+++ b/tests/operators/hive_operator.py
@@ -98,6 +98,16 @@ class HiveOperatorTest(HiveEnvironmentTest):
 t.prepare_template()
 self.assertEqual(t.hql, "SELECT {{ num_col 

[jira] [Commented] (AIRFLOW-2357) Use a persistent volume for the Kubernetes logs

2018-04-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448411#comment-16448411
 ] 

ASF subversion and git services commented on AIRFLOW-2357:
--

Commit e30a1f451aa5ec5aca4c886067ba8946a3d33395 in incubator-airflow's branch 
refs/heads/master from [~Fokko]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=e30a1f4 ]

[AIRFLOW-2357] Add persistent volume for the logs

The logs are kept inside of the worker pod. By attaching a persistent disk
we keep the logs and make them available for the webserver.

- Remove the requirements.txt since we don't want to maintain another
  dependency file
- Fix some small casing stuff
- Removed some unused code
- Add missing shebang lines
- Started on some docs
- Fixed the logging

Closes #3252 from Fokko/airflow-2357-pd-for-logs


> Use a persistent volume for the Kubernetes logs
> --
>
> Key: AIRFLOW-2357
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2357
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Fokko Driesprong
>Priority: Major
> Fix For: 1.10.0
>
>
> Right now, when a pod exits, the log is lost forever because it is still in 
> the container. By mounting a persistent volume we can easily fix this and 
> this allows us to have logs on our local machines (minikube). In production 
> you might want to write your logs to GCS/S3/etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


incubator-airflow git commit: [AIRFLOW-2357] Add persistent volume for the logs

2018-04-23 Thread bolke
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 1d3bb5470 -> e30a1f451


[AIRFLOW-2357] Add persistent volume for the logs

The logs are kept inside of the worker pod. By attaching a persistent disk
we keep the logs and make them available for the webserver.

- Remove the requirements.txt since we don't want to maintain another
  dependency file
- Fix some small casing stuff
- Removed some unused code
- Add missing shebang lines
- Started on some docs
- Fixed the logging

Closes #3252 from Fokko/airflow-2357-pd-for-logs


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/e30a1f45
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/e30a1f45
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/e30a1f45

Branch: refs/heads/master
Commit: e30a1f451aa5ec5aca4c886067ba8946a3d33395
Parents: 1d3bb54
Author: Fokko Driesprong 
Authored: Mon Apr 23 18:43:24 2018 +0200
Committer: Bolke de Bruin 
Committed: Mon Apr 23 18:43:24 2018 +0200

--
 airflow/config_templates/default_airflow.cfg|   3 +
 .../contrib/executors/kubernetes_executor.py| 216 +
 airflow/contrib/kubernetes/pod.py   |   8 +-
 airflow/contrib/kubernetes/pod_generator.py | 150 +---
 airflow/contrib/kubernetes/pod_launcher.py  |  22 +-
 .../contrib/kubernetes/worker_configuration.py  | 116 +
 .../operators/kubernetes_pod_operator.py|  21 +-
 airflow/utils/cli.py|   6 +-
 docs/kubernetes.rst |  12 +-
 scripts/ci/kubernetes/docker/Dockerfile |  15 +-
 scripts/ci/kubernetes/docker/Dockerfile_zip |  20 --
 scripts/ci/kubernetes/docker/bootstrap.sh   |   3 +-
 scripts/ci/kubernetes/docker/build.sh   |   8 +-
 scripts/ci/kubernetes/docker/requirements.txt   | 103 
 scripts/ci/kubernetes/kube/airflow.yaml | 221 +
 .../ci/kubernetes/kube/airflow.yaml.template| 240 ---
 scripts/ci/kubernetes/kube/deploy.sh|   9 +-
 scripts/ci/kubernetes/kube/postgres.yaml|  28 +--
 scripts/ci/kubernetes/kube/volumes.yaml |  64 +
 19 files changed, 531 insertions(+), 734 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/e30a1f45/airflow/config_templates/default_airflow.cfg
--
diff --git a/airflow/config_templates/default_airflow.cfg 
b/airflow/config_templates/default_airflow.cfg
index 6a966cb..845342c 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -518,6 +518,9 @@ dags_volume_subpath =
 # For DAGs mounted via a volume claim (mutually exclusive with volume claim)
 dags_volume_claim =
 
+# A shared volume claim for the logs
+logs_volume_claim =
+
 # Git credentials and repository for DAGs mounted via Git (mutually exclusive 
with volume claim)
 git_repo =
 git_branch =

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/e30a1f45/airflow/contrib/executors/kubernetes_executor.py
--
diff --git a/airflow/contrib/executors/kubernetes_executor.py 
b/airflow/contrib/executors/kubernetes_executor.py
index 1a50d85..cdce95f 100644
--- a/airflow/contrib/executors/kubernetes_executor.py
+++ b/airflow/contrib/executors/kubernetes_executor.py
@@ -47,16 +47,10 @@ class KubernetesExecutorConfig:
 
 def __repr__(self):
 return "{}(image={}, request_memory={} ,request_cpu={}, 
limit_memory={}, " \
-"limit_cpu={}, gcp_service_account_key={})"\
-.format(
-KubernetesExecutorConfig.__name__,
-self.image,
-self.request_memory,
-self.request_cpu,
-self.limit_memory,
-self.limit_cpu,
-self.gcp_service_account_key
-)
+   "limit_cpu={}, gcp_service_account_key={})" \
+.format(KubernetesExecutorConfig.__name__, self.image, 
self.request_memory,
+self.request_cpu, self.limit_memory, self.limit_cpu,
+self.gcp_service_account_key)
 
 @staticmethod
 def from_dict(obj):
@@ -65,33 +59,33 @@ class KubernetesExecutorConfig:
 
 if not isinstance(obj, dict):
 raise TypeError(
-"Cannot convert a non-dictionary object into a 
KubernetesExecutorConfig")
+'Cannot convert a non-dictionary object into a 
KubernetesExecutorConfig')
 
 namespaced = obj.get(Executors.KubernetesExecutor, {})
 
 return KubernetesExecutorConfig(
-

[jira] [Resolved] (AIRFLOW-2357) Use a persistent volume for the Kubernetes logs

2018-04-23 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2357.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3252
[https://github.com/apache/incubator-airflow/pull/3252]

> Use a persistent volume for the Kubernetes logs
> --
>
> Key: AIRFLOW-2357
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2357
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Fokko Driesprong
>Priority: Major
> Fix For: 1.10.0
>
>
> Right now, when a pod exits, the log is lost forever because it is still in 
> the container. By mounting a persistent volume we can easily fix this and 
> this allows us to have logs on our local machines (minikube). In production 
> you might want to write your logs to GCS/S3/etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2364) The autocommit flag can be set on a connection which does not support it.

2018-04-23 Thread Arthur Wiedmer (JIRA)
Arthur Wiedmer created AIRFLOW-2364:
---

 Summary: The autocommit flag can be set on a connection which does 
not support it.
 Key: AIRFLOW-2364
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2364
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Arthur Wiedmer
Assignee: Arthur Wiedmer


We could just add a logging warning when the method is invoked.
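
For instance, a minimal sketch of such a warning; the class and attribute
names below are assumptions for illustration, not code from this issue:

{code}
import logging

class ExampleDbHook(object):
    supports_autocommit = False
    log = logging.getLogger(__name__)

    def set_autocommit(self, conn, autocommit):
        # Proposed behaviour: warn instead of silently setting the flag.
        if autocommit and not self.supports_autocommit:
            self.log.warning('This connection does not support autocommit; '
                             'setting the flag may have no effect.')
        conn.autocommit = autocommit
{code}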



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2363) S3 remote logging appending tuple instead of str

2018-04-23 Thread Kyle Hamlin (JIRA)
Kyle Hamlin created AIRFLOW-2363:


 Summary: S3 remote logging appending tuple instead of str
 Key: AIRFLOW-2363
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2363
 Project: Apache Airflow
  Issue Type: Bug
  Components: logging
Reporter: Kyle Hamlin
 Fix For: 1.10.0


A recent merge into master that added support for Elasticsearch logging seems 
to have broken S3 logging by returning a tuple instead of a string.

[https://github.com/apache/incubator-airflow/commit/ec38ba9594395de04ec932481212a86fbe9ae107#diff-0442332ecbe42ebbf426911c68d8cd4aR128]

 

The following errors are thrown:

 
*Session NoneType error*
 Traceback (most recent call last):
   File 
"/usr/local/lib/python3.6/site-packages/airflow/utils/log/s3_task_handler.py", 
line 171, in s3_write
     encrypt=configuration.conf.getboolean('core', 'ENCRYPT_S3_LOGS'),
   File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", line 
274, in load_string
     encrypt=encrypt)
   File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", line 
313, in load_bytes
     client = self.get_conn()
   File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", line 
34, in get_conn
     return self.get_client_type('s3')
   File 
"/usr/local/lib/python3.6/site-packages/airflow/contrib/hooks/aws_hook.py", 
line 151, in get_client_type
     session, endpoint_url = self._get_credentials(region_name)
   File 
"/usr/local/lib/python3.6/site-packages/airflow/contrib/hooks/aws_hook.py", 
line 97, in _get_credentials
     connection_object = self.get_connection(self.aws_conn_id)
   File "/usr/local/lib/python3.6/site-packages/airflow/hooks/base_hook.py", 
line 82, in get_connection
     conn = random.choice(cls.get_connections(conn_id))
   File "/usr/local/lib/python3.6/site-packages/airflow/hooks/base_hook.py", 
line 77, in get_connections
     conns = cls._get_connections_from_db(conn_id)
   File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 72, 
in wrapper
     with create_session() as session:
   File "/usr/local/lib/python3.6/contextlib.py", line 81, in __enter__
     return next(self.gen)
   File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 41, 
in create_session
     session = settings.Session()
 TypeError: 'NoneType' object is not callable
 
*TypeError must be str not tuple*
 [2018-04-16 18:37:28,200] ERROR in app: Exception on 
/admin/airflow/get_logs_with_metadata [GET]
 Traceback (most recent call last):
   File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1982, in 
wsgi_app
     response = self.full_dispatch_request()
   File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1614, in 
full_dispatch_request
     rv = self.handle_user_exception(e)
   File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1517, in 
handle_user_exception
     reraise(exc_type, exc_value, tb)
   File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 33, in 
reraise
     raise value
   File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1612, in 
full_dispatch_request
     rv = self.dispatch_request()
   File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1598, in 
dispatch_request
     return self.view_functions[rule.endpoint](**req.view_args)
   File "/usr/local/lib/python3.6/site-packages/flask_admin/base.py", line 69, 
in inner
     return self._run_view(f, *args, **kwargs)
   File "/usr/local/lib/python3.6/site-packages/flask_admin/base.py", line 368, 
in _run_view
     return fn(self, *args, **kwargs)
   File "/usr/local/lib/python3.6/site-packages/flask_login.py", line 755, in 
decorated_view
     return func(*args, **kwargs)
   File "/usr/local/lib/python3.6/site-packages/airflow/www/utils.py", line 
269, in wrapper
     return f(*args, **kwargs)
   File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 74, 
in wrapper
     return func(*args, **kwargs)
   File "/usr/local/lib/python3.6/site-packages/airflow/www/views.py", line 
770, in get_logs_with_metadata
     logs, metadatas = handler.read(ti, try_number, metadata=metadata)
   File 
"/usr/local/lib/python3.6/site-packages/airflow/utils/log/file_task_handler.py",
 line 165, in read
     logs[i] += log
 TypeError: must be str, not tuple
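
A condensed picture of the failure as a sketch (values invented for
illustration):

{code}
logs = ['']
# After the Elasticsearch change, the handler returns (log, metadata) tuples.
log = ('remote log text', {'end_of_log': True})
logs[0] += log  # TypeError: must be str, not tuple
{code}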



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2362) DockerOperator imports Client should import APIClient

2018-04-23 Thread Kyle Hamlin (JIRA)
Kyle Hamlin created AIRFLOW-2362:


 Summary: DockerOperator imports Client should import APIClient
 Key: AIRFLOW-2362
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2362
 Project: Apache Airflow
  Issue Type: Bug
  Components: operators
Reporter: Kyle Hamlin
 Fix For: 1.10.0


The docker-py package changed its API over a year ago, renaming Client to
APIClient:

 

https://github.com/docker/docker-py/commit/f5ac10c469fca252e69ae780749f4ec6fe369789



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2358) make kubernetes examples installed optionally

2018-04-23 Thread Taylor Edmiston (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448241#comment-16448241
 ] 

Taylor Edmiston commented on AIRFLOW-2358:
--

Would a good solution here be that the k8s DAGs simply do not load, and perhaps
log at info level that this is due to the kubernetes install option not being set?
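
One possible shape for that, as a sketch rather than an agreed fix:

{code}
import logging

try:
    from airflow.contrib.operators.kubernetes_pod_operator import \
        KubernetesPodOperator
except ImportError:
    KubernetesPodOperator = None
    logging.info('kubernetes python client not installed; '
                 'skipping example_kubernetes_operator DAG.')

if KubernetesPodOperator is not None:
    pass  # define the example DAG here
{code}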

> make kubernetes examples installed optionally
> -
>
> Key: AIRFLOW-2358
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2358
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: examples
>Affects Versions: Airflow 2.0, 1.10.0
>Reporter: Ruslan Dautkhanov
>Priority: Major
>  Labels: kubernetes
>
> Is it possible to make the kubernetes examples optional at install time?
>  
> We don't use Kubernetes and a bare Airflow install fills the logs with the following:
>  
> {quote}2018-04-22 19:49:04,718 ERROR - Failed to import: 
> /opt/airflow/airflow-20180420/src/apache-airflow/airflow/example_dags/example_kubernetes_operator.py
> Traceback (most recent call last):
>   File "/opt/airflow/airflow-20180420/src/apache-airflow/airflow/models.py", 
> line 300, in process_file
>     m = imp.load_source(mod_name, filepath)
>   File 
> "/opt/airflow/airflow-20180420/src/apache-airflow/airflow/example_dags/example_kubernetes_operator.py",
>  line 19, in 
>     from airflow.contrib.operators.kubernetes_pod_operator import 
> KubernetesPodOperator
>   File 
> "/opt/airflow/airflow-20180420/src/apache-airflow/airflow/contrib/operators/kubernetes_pod_operator.py",
>  line 21, in 
>     from airflow.contrib.kubernetes import kube_client, pod_generator, 
> pod_launcher
>   File 
> "/opt/airflow/airflow-20180420/src/apache-airflow/airflow/contrib/kubernetes/pod_launcher.py",
>  line 25, in 
>     from kubernetes import watch
> {color:#f6c342}ImportError: No module named kubernetes{color}{quote}
>  
> It would be great to make loading of the examples conditional on which 
> modules are installed when they have external dependencies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2351) timezone fix for @once DAGs

2018-04-23 Thread Fokko Driesprong (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong resolved AIRFLOW-2351.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request #3256
[https://github.com/apache/incubator-airflow/pull/3256]

> timezone fix for @once DAGs
> ---
>
> Key: AIRFLOW-2351
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2351
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core, scheduler
>Affects Versions: Airflow 2.0, 1.10.0, 1.9.1, 2.0.0
>Reporter: Ruslan Dautkhanov
>Assignee: Ruslan Dautkhanov
>Priority: Major
> Fix For: 2.0.0
>
>
> As discussed on the dev list: 
>  
> {quote}
> Upgraded Airflow... getting the following error [1] when processing a DAG.
> We have 'start_date': None set in default_args, but this used to work in 
> previous airflow versions.
> This is a '@once' DAG, so we don't need a start_date (no backfill).
> {quote}
>  
> {quote}[2018-01-16 16:05:25,283] \{models.py:293} ERROR - Failed to import: 
> /home/rdautkha/airflow/dags/discover/discover-ora-load-2.py
> Traceback (most recent call last):
>   File "/opt/airflow/airflow-20180116/src/apache-airflow/airflow/models.py", 
> line 290, in process_file
>     m = imp.load_source(mod_name, filepath)
>   File "/home/rdautkha/airflow/dags/discover/discover-ora-load-2.py", line 
> 66, in 
>     orientation                    = 'TB',                          # default 
> graph view
>   File "/opt/airflow/airflow-20180116/src/apache-airflow/airflow/models.py", 
> line 2951, in __init__
>     self.timezone = self.default_args['start_date'].tzinfo
> AttributeError: 'NoneType' object has no attribute 'tzinfo'{quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2351) timezone fix for @once DAGs

2018-04-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448235#comment-16448235
 ] 

ASF subversion and git services commented on AIRFLOW-2351:
--

Commit 1d3bb5470711a935b36b6a0ab4c7ec414d460d75 in incubator-airflow's branch 
refs/heads/master from [~bolke]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=1d3bb54 ]

[AIRFLOW-2351] Check for valid default_args start_date

A bug existed when default_args contained start_date but it was set to
None, causing DAG instantiation to fail.

Closes #3256 from bolkedebruin/AIRFLOW-2351
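
A minimal repro sketch: before this fix, constructing the DAG raised
AttributeError because DAG.__init__ read .tzinfo off the None start_date;
afterwards the DAG falls back to the configured default timezone.

{code}
from airflow.models import DAG

dag = DAG('once_dag', schedule_interval='@once',
          default_args={'start_date': None})
{code}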


> timezone fix for @once DAGs
> ---
>
> Key: AIRFLOW-2351
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2351
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core, scheduler
>Affects Versions: Airflow 2.0, 1.10.0, 1.9.1, 2.0.0
>Reporter: Ruslan Dautkhanov
>Assignee: Ruslan Dautkhanov
>Priority: Major
> Fix For: 2.0.0
>
>
> As discussed on the dev list: 
>  
> {quote}
> Upgraded Airflow... getting the following error [1] when processing a DAG.
> We have 'start_date': None set in default_args, but this used to work in 
> previous airflow versions.
> This is a '@once' DAG, so we don't need a start_date (no backfill).
> {quote}
>  
> {quote}[2018-01-16 16:05:25,283] \{models.py:293} ERROR - Failed to import: 
> /home/rdautkha/airflow/dags/discover/discover-ora-load-2.py
> Traceback (most recent call last):
>   File "/opt/airflow/airflow-20180116/src/apache-airflow/airflow/models.py", 
> line 290, in process_file
>     m = imp.load_source(mod_name, filepath)
>   File "/home/rdautkha/airflow/dags/discover/discover-ora-load-2.py", line 
> 66, in 
>     orientation                    = 'TB',                          # default 
> graph view
>   File "/opt/airflow/airflow-20180116/src/apache-airflow/airflow/models.py", 
> line 2951, in __init__
>     self.timezone = self.default_args['start_date'].tzinfo
> AttributeError: 'NoneType' object has no attribute 'tzinfo'{quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2351) timezone fix for @once DAGs

2018-04-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448236#comment-16448236
 ] 

ASF subversion and git services commented on AIRFLOW-2351:
--

Commit 1d3bb5470711a935b36b6a0ab4c7ec414d460d75 in incubator-airflow's branch 
refs/heads/master from [~bolke]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=1d3bb54 ]

[AIRFLOW-2351] Check for valid default_args start_date

A bug existed when default_args contained start_date but it was set to
None, causing DAG instantiation to fail.

Closes #3256 from bolkedebruin/AIRFLOW-2351


> timezone fix for @once DAGs
> ---
>
> Key: AIRFLOW-2351
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2351
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core, scheduler
>Affects Versions: Airflow 2.0, 1.10.0, 1.9.1, 2.0.0
>Reporter: Ruslan Dautkhanov
>Assignee: Ruslan Dautkhanov
>Priority: Major
> Fix For: 2.0.0
>
>
> As discussed on the dev list: 
>  
> {quote}
> Upgraded Airflow... getting the following error [1] when processing a DAG.
> We have 'start_date': None set in default_args, but this used to work in 
> previous airflow versions.
> This is a '@once' DAG, so we don't need a start_date (no backfill).
> {quote}
>  
> {quote}[2018-01-16 16:05:25,283] \{models.py:293} ERROR - Failed to import: 
> /home/rdautkha/airflow/dags/discover/discover-ora-load-2.py
> Traceback (most recent call last):
>   File "/opt/airflow/airflow-20180116/src/apache-airflow/airflow/models.py", 
> line 290, in process_file
>     m = imp.load_source(mod_name, filepath)
>   File "/home/rdautkha/airflow/dags/discover/discover-ora-load-2.py", line 
> 66, in 
>     orientation                    = 'TB',                          # default 
> graph view
>   File "/opt/airflow/airflow-20180116/src/apache-airflow/airflow/models.py", 
> line 2951, in __init__
>     self.timezone = self.default_args['start_date'].tzinfo
> AttributeError: 'NoneType' object has no attribute 'tzinfo'{quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


incubator-airflow git commit: [AIRFLOW-2351] Check for valid default_args start_date

2018-04-23 Thread fokko
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 6da88bb42 -> 1d3bb5470


[AIRFLOW-2351] Check for valid default_args start_date

A bug existed when default_args contained start_date but it was set to
None, causing DAG instantiation to fail.

Closes #3256 from bolkedebruin/AIRFLOW-2351


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/1d3bb547
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/1d3bb547
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/1d3bb547

Branch: refs/heads/master
Commit: 1d3bb5470711a935b36b6a0ab4c7ec414d460d75
Parents: 6da88bb
Author: Bolke de Bruin 
Authored: Mon Apr 23 16:37:55 2018 +0200
Committer: Fokko Driesprong 
Committed: Mon Apr 23 16:37:55 2018 +0200

--
 airflow/models.py |  6 +++---
 tests/models.py   | 12 ++--
 2 files changed, 13 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/1d3bb547/airflow/models.py
--
diff --git a/airflow/models.py b/airflow/models.py
index 18e9e26..c8a737c 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -7,9 +7,9 @@
 # to you under the Apache License, Version 2.0 (the
 # "License"); you may not use this file except in compliance
 # with the License.  You may obtain a copy of the License at
-# 
+#
 #   http://www.apache.org/licenses/LICENSE-2.0
-# 
+#
 # Unless required by applicable law or agreed to in writing,
 # software distributed under the License is distributed on an
 # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -3094,7 +3094,7 @@ class DAG(BaseDag, LoggingMixin):
 # set timezone
 if start_date and start_date.tzinfo:
 self.timezone = start_date.tzinfo
-elif 'start_date' in self.default_args:
+elif 'start_date' in self.default_args and 
self.default_args['start_date']:
 if isinstance(self.default_args['start_date'], six.string_types):
 self.default_args['start_date'] = (
 timezone.parse(self.default_args['start_date'])

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/1d3bb547/tests/models.py
--
diff --git a/tests/models.py b/tests/models.py
index 2dbc143..c2e54e5 100644
--- a/tests/models.py
+++ b/tests/models.py
@@ -7,9 +7,9 @@
 # to you under the Apache License, Version 2.0 (the
 # "License"); you may not use this file except in compliance
 # with the License.  You may obtain a copy of the License at
-# 
+#
 #   http://www.apache.org/licenses/LICENSE-2.0
-# 
+#
 # Unless required by applicable law or agreed to in writing,
 # software distributed under the License is distributed on an
 # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -212,6 +212,14 @@ class DagTest(unittest.TestCase):
 
 self.assertEquals(tuple(), dag.topological_sort())
 
+def test_dag_none_default_args_start_date(self):
+"""
+Tests if a start_date of None in default_args
+works.
+"""
+dag = DAG('DAG', default_args={'start_date': None})
+self.assertEqual(dag.timezone, settings.TIMEZONE)
+
 def test_dag_task_priority_weight_total(self):
 width = 5
 depth = 5



[jira] [Commented] (AIRFLOW-1433) Convert Airflow to Use FAB Framework

2018-04-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448062#comment-16448062
 ] 

ASF subversion and git services commented on AIRFLOW-1433:
--

Commit 6da88bb4206fbb52b2bcb592a1826faf8860738d in incubator-airflow's branch 
refs/heads/master from [~wrp]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=6da88bb ]

[AIRFLOW-1433] Set default rbac to initdb

05e1861e24de42f9a2c649cd93041c5c744504e1 breaks the API for any program
directly importing airflow.utils. This sets a reasonable default.

Closes #3240 from wrp/initdb


> Convert Airflow to Use FAB Framework
> 
>
> Key: AIRFLOW-1433
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1433
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Joy Gao
>Assignee: Joy Gao
>Priority: Major
>
> The authentication capabilities in the RBAC design proposal introduces a 
> significant amount of work that is otherwise already built-in in existing 
> frameworks. 
> Per community discussion, Flask-AppBuilder (FAB) is the best fit for Airflow 
> as a foundation to implementing RBAC. This will support integration with 
> different authentication backends out-of-the-box, and generate permissions 
> for views and ORM models that will simplify view-level and dag-level access 
> control.
> This implies modifying the current flask views, and deprecating the current 
> Flask-Admin in favor of FAB's crud.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


incubator-airflow git commit: [AIRFLOW-1433] Set default rbac to initdb

2018-04-23 Thread bolke
Repository: incubator-airflow
Updated Branches:
  refs/heads/master a704b541f -> 6da88bb42


[AIRFLOW-1433] Set default rbac to initdb

05e1861e24de42f9a2c649cd93041c5c744504e1 breaks the API for any program
directly importing airflow.utils. This sets a reasonable default.

Closes #3240 from wrp/initdb


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/6da88bb4
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/6da88bb4
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/6da88bb4

Branch: refs/heads/master
Commit: 6da88bb4206fbb52b2bcb592a1826faf8860738d
Parents: a704b54
Author: William Pursell 
Authored: Mon Apr 23 14:35:43 2018 +0200
Committer: Bolke de Bruin 
Committed: Mon Apr 23 14:35:43 2018 +0200

--
 airflow/utils/db.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/6da88bb4/airflow/utils/db.py
--
diff --git a/airflow/utils/db.py b/airflow/utils/db.py
index c2c3d17..3ec6069 100644
--- a/airflow/utils/db.py
+++ b/airflow/utils/db.py
@@ -85,7 +85,7 @@ def merge_conn(conn, session=None):
 session.commit()
 
 
-def initdb(rbac):
+def initdb(rbac=False):
 session = settings.Session()
 
 from airflow import models



[jira] [Created] (AIRFLOW-2361) Allow GoogleCloudStorageToGoogleCloudStorageOperator to store list of copied files to XCom

2018-04-23 Thread Berislav Lopac (JIRA)
Berislav Lopac created AIRFLOW-2361:
---

 Summary: Allow GoogleCloudStorageToGoogleCloudStorageOperator to 
store list of copied files to XCom
 Key: AIRFLOW-2361
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2361
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Berislav Lopac
Assignee: Berislav Lopac


When {{GoogleCloudStorageToGoogleCloudStorageOperator}} is used with a 
wildcard, it can copy more than one file. It would be useful to have a 
mechanism that stores the list of copied files in XCom so it can be used by 
other tasks downstream.

Proposed solution: Add an {{xcom_push}} flag argument to the constructor; if 
{{True}}, the {{execute}} method returns a report on the copied files (a 
sketch follows the structure below). The report is a dict of the following 
structure:
{code:java}
{
"source_bucket": "source-bucket-name",
"destination_bucket": "destination-bucket-name",
"copied_files": [
["original/file/path.ext", "target/prefix/original/file/path.ext"],
...
]
}{code}
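
A sketch of the proposed flag; helper names such as _list_matching_objects
and _copy_object are placeholders, not existing methods:

{code}
def execute(self, context):
    copied_files = []
    for source_object in self._list_matching_objects():
        destination_object = self.destination_object + source_object
        self._copy_object(source_object, destination_object)
        copied_files.append([source_object, destination_object])
    if self.xcom_push_flag:
        # A value returned from execute() is pushed to XCom as 'return_value'.
        return {
            'source_bucket': self.source_bucket,
            'destination_bucket': self.destination_bucket,
            'copied_files': copied_files,
        }
{code}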



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-843) Store task exceptions in context

2018-04-23 Thread Andrew Jones (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447904#comment-16447904
 ] 

Andrew Jones commented on AIRFLOW-843:
--

It would be great to see this merged and released 

> Store task exceptions in context
> 
>
> Key: AIRFLOW-843
> URL: https://issues.apache.org/jira/browse/AIRFLOW-843
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Scott Kruger
>Priority: Minor
>
> If a task encounters an exception during execution, it should store the 
> exception on the execution context so that other methods (namely 
> `on_failure_callback`) can access it. This would help with custom error 
> integrations, e.g. Sentry.
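
If implemented, usage could look like the following sketch; the 'exception'
context key is an assumption, since this is not merged yet:

{code}
import logging

def report_failure(context):
    exc = context.get('exception')
    logging.error('Task failed: %r', exc)  # e.g. forward to Sentry here

default_args = {'on_failure_callback': report_failure}
{code}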



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2270) Subdag backfill spins on removed tasks

2018-04-23 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2270.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

> Subdag backfill spins on removed tasks
> --
>
> Key: AIRFLOW-2270
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2270
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Winston Huang
>Priority: Major
> Fix For: 1.10.0
>
>
> My understanding is that subdag operators execute via a backfill job which 
> runs in a loop, maintaining the state of the associated tasks and breaking 
> only once all pending tasks have been exhausted: 
> [https://github.com/apache/incubator-airflow/blob/64206615a790c90893d5836da8d2f7159bda23ac/airflow/jobs.py#L2159]
>  
> The issue is that this task instance status is initialized by this method 
> [https://github.com/apache/incubator-airflow/blob/64206615a790c90893d5836da8d2f7159bda23ac/airflow/jobs.py#L2075,]
>  which may include tasks with {{state = State.REMOVED}}, i.e. tasks that were 
> previously instantiated in the database but removed from the dag definition. 
> Hence, the task will be missing from this list 
> [https://github.com/apache/incubator-airflow/blob/64206615a790c90893d5836da8d2f7159bda23ac/airflow/jobs.py#L2168]
>  but will exist in {{ti_status.to_run}}. This causes the backfill job to loop 
> indefinitely, since it considers those removed tasks to be pending but 
> doesn't attempt to run them.
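
A condensed sketch of the fix, simplified from jobs.py:

{code}
from airflow.utils.state import State

def collect_tasks_to_run(task_instances):
    tasks_to_run = {}
    for ti in task_instances:
        if ti.state != State.REMOVED:  # previously these looped forever
            tasks_to_run[ti.key] = ti
    return tasks_to_run
{code}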



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2270) Subdag backfill spins on removed tasks

2018-04-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447899#comment-16447899
 ] 

ASF subversion and git services commented on AIRFLOW-2270:
--

Commit a704b541fea6343e5d0a17828c5746287a8dd316 in incubator-airflow's branch 
refs/heads/master from [~ji-han]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=a704b54 ]

[AIRFLOW-2270] Handle removed tasks in backfill

Fix an issue with backfill jobs of dags where tasks in the removed state
are not run but are still considered pending, causing an indefinite loop.

Closes #3176 from ji-han/AIRFLOW-2270_dag_backfill_removed_tasks


> Subdag backfill spins on removed tasks
> --
>
> Key: AIRFLOW-2270
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2270
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Winston Huang
>Priority: Major
>
> My understanding is that subdag operators execute via a backfill job which 
> runs in a loop, maintaining the state of the associated tasks and breaking 
> only once all pending tasks have been exhausted: 
> [https://github.com/apache/incubator-airflow/blob/64206615a790c90893d5836da8d2f7159bda23ac/airflow/jobs.py#L2159]
>  
> The issue is that this task instance status is initialized by this method 
> [https://github.com/apache/incubator-airflow/blob/64206615a790c90893d5836da8d2f7159bda23ac/airflow/jobs.py#L2075,]
>  which may include tasks with {{state = State.REMOVED}}, i.e. tasks that were 
> previously instantiated in the database but removed from the dag definition. 
> Hence, the task will be missing from this list 
> [https://github.com/apache/incubator-airflow/blob/64206615a790c90893d5836da8d2f7159bda23ac/airflow/jobs.py#L2168]
>  but will exist in {{ti_status.to_run}}. This causes the backfill job to loop 
> indefinitely, since it considers those removed tasks to be pending but 
> doesn't attempt to run them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


incubator-airflow git commit: [AIRFLOW-2270] Handle removed tasks in backfill

2018-04-23 Thread bolke
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 0d199e5f3 -> a704b541f


[AIRFLOW-2270] Handle removed tasks in backfill

Fix an issue with backfill jobs of dags where tasks in the removed state
are not run but are still considered pending, causing an indefinite loop.

Closes #3176 from ji-han/AIRFLOW-2270_dag_backfill_removed_tasks


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/a704b541
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/a704b541
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/a704b541

Branch: refs/heads/master
Commit: a704b541fea6343e5d0a17828c5746287a8dd316
Parents: 0d199e5
Author: Winston Huang 
Authored: Mon Apr 23 12:14:52 2018 +0200
Committer: Bolke de Bruin 
Committed: Mon Apr 23 12:14:52 2018 +0200

--
 airflow/jobs.py |  3 ++-
 tests/jobs.py   | 44 
 2 files changed, 46 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/a704b541/airflow/jobs.py
--
diff --git a/airflow/jobs.py b/airflow/jobs.py
index fe76fbd..ecbfef8 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -2121,7 +2121,8 @@ class BackfillJob(BaseJob):
 # all tasks part of the backfill are scheduled to run
 if ti.state == State.NONE:
 ti.set_state(State.SCHEDULED, session=session)
-tasks_to_run[ti.key] = ti
+if ti.state != State.REMOVED:
+tasks_to_run[ti.key] = ti
 
 return tasks_to_run
 

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/a704b541/tests/jobs.py
--
diff --git a/tests/jobs.py b/tests/jobs.py
index 9eb166b..9e07645 100644
--- a/tests/jobs.py
+++ b/tests/jobs.py
@@ -714,6 +714,50 @@ class BackfillJobTest(unittest.TestCase):
 subdag.clear()
 dag.clear()
 
+
+def test_backfill_execute_subdag_with_removed_task(self):
+"""
+Ensure that subdag operators execute properly in the case where
+an associated task of the subdag has been removed from the dag
+definition, but has instances in the database from previous runs.
+"""
+dag = self.dagbag.get_dag('example_subdag_operator')
+subdag = dag.get_task('section-1').subdag
+
+executor = TestExecutor(do_update=True)
+job = BackfillJob(dag=subdag,
+  start_date=DEFAULT_DATE,
+  end_date=DEFAULT_DATE,
+  executor=executor,
+  donot_pickle=True)
+
+removed_task_ti = TI(
+task=DummyOperator(task_id='removed_task'),
+execution_date=DEFAULT_DATE,
+state=State.REMOVED)
+removed_task_ti.dag_id = subdag.dag_id
+
+session = settings.Session()
+session.merge(removed_task_ti)
+
+with timeout(seconds=30):
+job.run()
+
+for task in subdag.tasks:
+instance = session.query(TI).filter(
+TI.dag_id == subdag.dag_id,
+TI.task_id == task.task_id,
+TI.execution_date == DEFAULT_DATE).first()
+
+self.assertIsNotNone(instance)
+self.assertEqual(instance.state, State.SUCCESS)
+
+removed_task_ti.refresh_from_db()
+self.assertEqual(removed_task_ti.state, State.REMOVED)
+
+subdag.clear()
+dag.clear()
+
 def test_update_counters(self):
 dag = DAG(
 dag_id='test_manage_executor_state',



[jira] [Commented] (AIRFLOW-2344) Fix `airflow connections -l` to work with pipe and redirect

2018-04-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447808#comment-16447808
 ] 

ASF subversion and git services commented on AIRFLOW-2344:
--

Commit 0d199e5f37f076e5ff0a8e20140e35a125eedcfc in incubator-airflow's branch 
refs/heads/master from [~sekikn]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=0d199e5 ]

[AIRFLOW-2344] Fix `connections -l` to work with pipe/redirect

`airflow connections -l` uses the 'tabulate' package with the fancy_grid
format, which outputs box-drawing characters. This can raise a
UnicodeEncodeError when output is piped or redirected, since the default
encoding for Python 2.x is ascii. This PR fixes it and contains some
flake8-related fixes.

Closes #3244 from sekikn/AIRFLOW-2344


> Fix `airflow connections -l` to work with pipe and redirect
> ---
>
> Key: AIRFLOW-2344
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2344
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli
>Affects Versions: 1.9.0
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Minor
>
> {{airflow connections -l}} fails with pipe or redirect e.g.:
> {code}
> $ airflow connections -l > foo
> Traceback (most recent call last):
>   File "/home/sekikn/.virtualenvs/a/bin/airflow", line 6, in 
> exec(compile(open(__file__).read(), __file__, 'exec'))
>   File "/home/sekikn/dev/incubator-airflow/airflow/bin/airflow", line 32, in 
> 
> args.func(args)
>   File "/home/sekikn/dev/incubator-airflow/airflow/utils/cli.py", line 77, in 
> wrapper
> raise e
> UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-141: 
> ordinal not in range(128)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-2344) Fix `airflow connections -l` to work with pipe and redirect

2018-04-23 Thread Fokko Driesprong (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong closed AIRFLOW-2344.
-
Resolution: Fixed

> Fix `airflow connections -l` to work with pipe and redirect
> ---
>
> Key: AIRFLOW-2344
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2344
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli
>Affects Versions: 1.9.0
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Minor
>
> {{airflow connections -l}} fails with pipe or redirect e.g.:
> {code}
> $ airflow connections -l > foo
> Traceback (most recent call last):
>   File "/home/sekikn/.virtualenvs/a/bin/airflow", line 6, in 
> exec(compile(open(__file__).read(), __file__, 'exec'))
>   File "/home/sekikn/dev/incubator-airflow/airflow/bin/airflow", line 32, in 
> 
> args.func(args)
>   File "/home/sekikn/dev/incubator-airflow/airflow/utils/cli.py", line 77, in 
> wrapper
> raise e
> UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-141: 
> ordinal not in range(128)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


incubator-airflow git commit: [AIRFLOW-2344] Fix `connections -l` to work with pipe/redirect

2018-04-23 Thread fokko
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 49826af10 -> 0d199e5f3


[AIRFLOW-2344] Fix `connections -l` to work with pipe/redirect

`airflow connections -l` uses the 'tabulate' package with the fancy_grid
format, which outputs box-drawing characters. This can raise a
UnicodeEncodeError when output is piped or redirected, since the default
encoding for Python 2.x is ascii. This PR fixes it and contains some
flake8-related fixes.

Closes #3244 from sekikn/AIRFLOW-2344


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/0d199e5f
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/0d199e5f
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/0d199e5f

Branch: refs/heads/master
Commit: 0d199e5f37f076e5ff0a8e20140e35a125eedcfc
Parents: 49826af
Author: Kengo Seki 
Authored: Mon Apr 23 11:12:52 2018 +0200
Committer: Fokko Driesprong 
Committed: Mon Apr 23 11:12:52 2018 +0200

--
 airflow/bin/cli.py |  7 +--
 tests/core.py  | 17 +++--
 2 files changed, 16 insertions(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/0d199e5f/airflow/bin/cli.py
--
diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py
index a5936a9..975f481 100644
--- a/airflow/bin/cli.py
+++ b/airflow/bin/cli.py
@@ -1025,9 +1025,12 @@ def connections(args):
   Connection.is_extra_encrypted,
   Connection.extra).all()
 conns = [map(reprlib.repr, conn) for conn in conns]
-print(tabulate(conns, ['Conn Id', 'Conn Type', 'Host', 'Port',
+msg = tabulate(conns, ['Conn Id', 'Conn Type', 'Host', 'Port',
'Is Encrypted', 'Is Extra Encrypted', 'Extra'],
-   tablefmt="fancy_grid"))
+   tablefmt="fancy_grid")
+if sys.version_info[0] < 3:
+msg = msg.encode('utf-8')
+print(msg)
 return
 
 if args.delete:

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/0d199e5f/tests/core.py
--
diff --git a/tests/core.py b/tests/core.py
index 24c24fe..308804d 100644
--- a/tests/core.py
+++ b/tests/core.py
@@ -30,6 +30,7 @@ import os
 import re
 import signal
 import sqlalchemy
+import subprocess
 import tempfile
 import warnings
 from datetime import timedelta
@@ -1037,9 +1038,18 @@ class CliTests(unittest.TestCase):
 lines = [l for l in stdout.split('\n') if len(l) > 0]
 self.assertListEqual(lines, [
 ("\tThe following args are not compatible with the " +
- "--list flag: ['conn_id', 'conn_uri', 'conn_extra', 'conn_type', 
'conn_host', 'conn_login', 'conn_password', 'conn_schema', 'conn_port']"),
+ "--list flag: ['conn_id', 'conn_uri', 'conn_extra', " +
+ "'conn_type', 'conn_host', 'conn_login', " +
+ "'conn_password', 'conn_schema', 'conn_port']"),
 ])
 
+def test_cli_connections_list_redirect(self):
+cmd = ['airflow', 'connections', '--list']
+with tempfile.TemporaryFile() as fp:
+p = subprocess.Popen(cmd, stdout=fp)
+p.wait()
+self.assertEqual(0, p.returncode)
+
 def test_cli_connections_add_delete(self):
 # Add connections:
 uri = 'postgresql://airflow:airflow@host:5432/airflow'
@@ -1415,8 +1425,6 @@ class CliTests(unittest.TestCase):
 sleep(1)
 
 def test_cli_webserver_foreground(self):
-import subprocess
-
 # Confirm that webserver hasn't been launched.
 # pgrep returns exit status 1 if no process matched.
 self.assertEqual(1, subprocess.Popen(["pgrep", "-c", 
"airflow"]).wait())
@@ -1434,8 +1442,6 @@ class CliTests(unittest.TestCase):
 @unittest.skipIf("TRAVIS" in os.environ and bool(os.environ["TRAVIS"]),
  "Skipping test due to lack of required file permission")
 def test_cli_webserver_foreground_with_pid(self):
-import subprocess
-
 # Run webserver in foreground with --pid option
 pidfile = tempfile.mkstemp()[1]
 p = subprocess.Popen(["airflow", "webserver", "--pid", pidfile])
@@ -1450,7 +1456,6 @@ class CliTests(unittest.TestCase):
 @unittest.skipIf("TRAVIS" in os.environ and bool(os.environ["TRAVIS"]),
  "Skipping test due to lack of required file permission")
 def test_cli_webserver_background(self):
-import subprocess
 import psutil
 
 # Confirm that webserver hasn't been launched.



[jira] [Commented] (AIRFLOW-2344) Fix `airflow connections -l` to work with pipe and redirect

2018-04-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447806#comment-16447806
 ] 

ASF subversion and git services commented on AIRFLOW-2344:
--

Commit 0d199e5f37f076e5ff0a8e20140e35a125eedcfc in incubator-airflow's branch 
refs/heads/master from [~sekikn]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=0d199e5 ]

[AIRFLOW-2344] Fix `connections -l` to work with pipe/redirect

`airflow connections -l` uses the 'tabulate' package with the fancy_grid
format, which outputs box-drawing characters. This can raise a
UnicodeEncodeError when output is piped or redirected, since the default
encoding for Python 2.x is ascii. This PR fixes it and contains some
flake8-related fixes.

Closes #3244 from sekikn/AIRFLOW-2344


> Fix `airflow connections -l` to work with pipe and redirect
> ---
>
> Key: AIRFLOW-2344
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2344
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli
>Affects Versions: 1.9.0
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Minor
>
> {{airflow connections -l}} fails with pipe or redirect e.g.:
> {code}
> $ airflow connections -l > foo
> Traceback (most recent call last):
>   File "/home/sekikn/.virtualenvs/a/bin/airflow", line 6, in 
> exec(compile(open(__file__).read(), __file__, 'exec'))
>   File "/home/sekikn/dev/incubator-airflow/airflow/bin/airflow", line 32, in 
> 
> args.func(args)
>   File "/home/sekikn/dev/incubator-airflow/airflow/utils/cli.py", line 77, in 
> wrapper
> raise e
> UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-141: 
> ordinal not in range(128)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2360) Create user in UI broken with latest SQL Alchemy

2018-04-23 Thread James Gregory (JIRA)
James Gregory created AIRFLOW-2360:
--

 Summary: Create user in UI broken with latest SQL Alchemy
 Key: AIRFLOW-2360
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2360
 Project: Apache Airflow
  Issue Type: Bug
  Components: webserver
Affects Versions: 1.9.0
Reporter: James Gregory


Installing airflow via pip and then clicking "Create user" in the web server UI 
displays a Python error traceback.

This is because the pip install of Airflow 1.9.0 pins flask-admin to 1.4.1, but 
sqlalchemy 1.1.17 and 1.1.18 are incompatible with that version of flask-admin. 
This is fixed in flask-admin 1.5.1, or alternatively by limiting the version of 
SQLAlchemy.

I.e. in the Airflow setup.py, either sqlalchemy could become:

'sqlalchemy>=1.1.15, <=1.1.16',

Or flask-admin could become:

'flask-admin==1.5.1'

Perhaps flask-admin has been pinned to 1.4.1 to prevent new versions breaking 
things, in which case the change to the sqlalchemy requirement would be safer.

The flask github issue is:

https://github.com/flask-admin/flask-admin/issues/1588



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-1819) Fix slack operator unittest bug

2018-04-23 Thread Kevin Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang resolved AIRFLOW-1819.
-
Resolution: Fixed

> Fix slack operator unittest bug
> ---
>
> Key: AIRFLOW-1819
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1819
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
>
> The slack_operator.py unittest is failing and does not cover the code paths 
> for passing in api_params.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-1805) Allow to supply Slack token through connection

2018-04-23 Thread Kevin Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang resolved AIRFLOW-1805.
-
Resolution: Fixed

> Allow to supply Slack token through connection
> --
>
> Key: AIRFLOW-1805
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1805
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
>
> To prevent passing in the Slack token directly in plain text, it is safer to 
> pass in the token as the 'password' of a connection.
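
For example, a sketch under assumptions; the conn_id 'slack_default' and the
token value are placeholders:

{code}
from airflow import settings
from airflow.models import Connection

conn = Connection(conn_id='slack_default', conn_type='http',
                  password='xoxb-not-a-real-token')
session = settings.Session()
session.add(conn)
session.commit()
{code}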



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-1787) Fix batch clear RUNNING task instance and inconsistent timestamp format bugs

2018-04-23 Thread Kevin Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang resolved AIRFLOW-1787.
-
Resolution: Fixed

> Fix batch clear RUNNING task instance and inconsistent timestamp format bugs
> 
>
> Key: AIRFLOW-1787
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1787
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
> Fix For: 1.10.0
>
>
> * Batch clear in CRUD is not working for task instances in the RUNNING 
> state and needs to be fixed.
> * Batch clear and set status are not working for manually triggered task 
> instances because manually triggered task instances have a different 
> execution date format.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2359) Add set failed for DagRun and TaskInstance in tree view

2018-04-23 Thread Kevin Yang (JIRA)
Kevin Yang created AIRFLOW-2359:
---

 Summary: Add set failed for DagRun and TaskInstance in tree view
 Key: AIRFLOW-2359
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2359
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Kevin Yang
Assignee: Kevin Yang


Users have been requesting the ability to set a DagRun or TaskInstance to 
failed from the tree view.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2300) Add S3 Select functionality to S3ToHiveTransfer

2018-04-23 Thread Fokko Driesprong (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong resolved AIRFLOW-2300.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request #3243
[https://github.com/apache/incubator-airflow/pull/3243]

> Add S3 Select functionality to S3ToHiveTransfer
> ---
>
> Key: AIRFLOW-2300
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2300
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, operators
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Major
> Fix For: 2.0.0
>
>
> For the same reason as AIRFLOW-2299, S3ToHiveTransfer should leverage S3 
> Select.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2300) Add S3 Select functionality to S3ToHiveTransfer

2018-04-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447644#comment-16447644
 ] 

ASF subversion and git services commented on AIRFLOW-2300:
--

Commit 49826af108d2e245ca921944296f24cc73120461 in incubator-airflow's branch 
refs/heads/master from [~sekikn]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=49826af ]

[AIRFLOW-2300] Add S3 Select functionality to S3ToHiveTransfer

To improve efficiency and usability, this PR adds S3 Select functionality
to S3ToHiveTransfer. It also contains some minor fixes for documentation
and comments.

Closes #3243 from sekikn/AIRFLOW-2300


> Add S3 Select functionality to S3ToHiveTransfer
> ---
>
> Key: AIRFLOW-2300
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2300
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, operators
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Major
> Fix For: 2.0.0
>
>
> For the same reason as AIRFLOW-2299, S3ToHiveTransfer should leverage S3 
> Select.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2358) make kubernetes examples installed optionally

2018-04-23 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created AIRFLOW-2358:
--

 Summary: make kubernetes examples installed optionally
 Key: AIRFLOW-2358
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2358
 Project: Apache Airflow
  Issue Type: Improvement
  Components: examples
Affects Versions: Airflow 2.0, 1.10.0
Reporter: Ruslan Dautkhanov


Is it possible to make the kubernetes examples optional at install time?
 
We don't use Kubernetes and a bare Airflow install fills the logs with the following:
 
{quote}2018-04-22 19:49:04,718 ERROR - Failed to import: 
/opt/airflow/airflow-20180420/src/apache-airflow/airflow/example_dags/example_kubernetes_operator.py
Traceback (most recent call last):
  File "/opt/airflow/airflow-20180420/src/apache-airflow/airflow/models.py", 
line 300, in process_file
    m = imp.load_source(mod_name, filepath)
  File 
"/opt/airflow/airflow-20180420/src/apache-airflow/airflow/example_dags/example_kubernetes_operator.py",
 line 19, in 
    from airflow.contrib.operators.kubernetes_pod_operator import 
KubernetesPodOperator
  File 
"/opt/airflow/airflow-20180420/src/apache-airflow/airflow/contrib/operators/kubernetes_pod_operator.py",
 line 21, in 
    from airflow.contrib.kubernetes import kube_client, pod_generator, 
pod_launcher
  File 
"/opt/airflow/airflow-20180420/src/apache-airflow/airflow/contrib/kubernetes/pod_launcher.py",
 line 25, in 
    from kubernetes import watch
{color:#f6c342}ImportError: No module named kubernetes{color}{quote}
 
It would be great to make loading of the examples conditional on which 
modules are installed when they have external dependencies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2300) Add S3 Select functionality to S3ToHiveTransfer

2018-04-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447645#comment-16447645
 ] 

ASF subversion and git services commented on AIRFLOW-2300:
--

Commit 49826af108d2e245ca921944296f24cc73120461 in incubator-airflow's branch 
refs/heads/master from [~sekikn]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=49826af ]

[AIRFLOW-2300] Add S3 Select functionality to S3ToHiveTransfer

To improve efficiency and usability, this PR adds S3 Select functionality
to S3ToHiveTransfer. It also contains some minor fixes for documentation
and comments.

Closes #3243 from sekikn/AIRFLOW-2300


> Add S3 Select functionality to S3ToHiveTransfer
> ---
>
> Key: AIRFLOW-2300
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2300
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, operators
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Major
> Fix For: 2.0.0
>
>
> For the same reason as AIRFLOW-2299, S3ToHiveTransfer should leverage S3 
> Select.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


incubator-airflow git commit: [AIRFLOW-2300] Add S3 Select functionality to S3ToHiveTransfer

2018-04-23 Thread fokko
Repository: incubator-airflow
Updated Branches:
  refs/heads/master a15b7c5b7 -> 49826af10


[AIRFLOW-2300] Add S3 Select functionality to S3ToHiveTransfer

To improve efficiency and usability, this PR adds S3 Select functionality
to S3ToHiveTransfer. It also contains some minor fixes for documentation
and comments.

Closes #3243 from sekikn/AIRFLOW-2300
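
A hypothetical call sketch of the hook's select_key method, whose docstring
this diff amends; the bucket, key and expression below are example values:

{code}
from airflow.hooks.S3_hook import S3Hook

hook = S3Hook(aws_conn_id='aws_default')
subset = hook.select_key(
    key='my-key.csv',
    bucket_name='my-bucket',
    expression='SELECT * FROM S3Object s',
    input_serialization={'CSV': {'FieldDelimiter': ','}},
    output_serialization={'CSV': {}})
print(subset)  # the retrieved subset, returned as a str
{code}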


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/49826af1
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/49826af1
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/49826af1

Branch: refs/heads/master
Commit: 49826af108d2e245ca921944296f24cc73120461
Parents: a15b7c5
Author: Kengo Seki 
Authored: Mon Apr 23 08:57:23 2018 +0200
Committer: Fokko Driesprong 
Committed: Mon Apr 23 08:57:23 2018 +0200

--
 airflow/hooks/S3_hook.py |  6 ++-
 airflow/operators/s3_to_hive_operator.py | 34 ++-
 tests/operators/s3_to_hive_operator.py   | 61 ---
 3 files changed, 90 insertions(+), 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/49826af1/airflow/hooks/S3_hook.py
--
diff --git a/airflow/hooks/S3_hook.py b/airflow/hooks/S3_hook.py
index 7a4b8b0..edde6ea 100644
--- a/airflow/hooks/S3_hook.py
+++ b/airflow/hooks/S3_hook.py
@@ -194,9 +194,11 @@ class S3Hook(AwsHook):
 :param expression_type: S3 Select expression type
 :type expression_type: str
 :param input_serialization: S3 Select input data serialization format
-:type input_serialization: str
+:type input_serialization: dict
 :param output_serialization: S3 Select output data serialization format
-:type output_serialization: str
+:type output_serialization: dict
+:return: retrieved subset of original data by S3 Select
+:rtype: str
 
 .. seealso::
 For more details about S3 Select parameters:

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/49826af1/airflow/operators/s3_to_hive_operator.py
--
diff --git a/airflow/operators/s3_to_hive_operator.py 
b/airflow/operators/s3_to_hive_operator.py
index 129dd92..e9a979d 100644
--- a/airflow/operators/s3_to_hive_operator.py
+++ b/airflow/operators/s3_to_hive_operator.py
@@ -85,6 +85,8 @@ class S3ToHiveTransfer(BaseOperator):
 :type input_compressed: bool
 :param tblproperties: TBLPROPERTIES of the hive table being created
 :type tblproperties: dict
+:param select_expression: S3 Select expression
+:type select_expression: str
 """
 
 template_fields = ('s3_key', 'partition', 'hive_table')
@@ -108,6 +110,7 @@ class S3ToHiveTransfer(BaseOperator):
 hive_cli_conn_id='hive_cli_default',
 input_compressed=False,
 tblproperties=None,
+select_expression=None,
 *args, **kwargs):
 super(S3ToHiveTransfer, self).__init__(*args, **kwargs)
 self.s3_key = s3_key
@@ -124,6 +127,7 @@ class S3ToHiveTransfer(BaseOperator):
 self.aws_conn_id = aws_conn_id
 self.input_compressed = input_compressed
 self.tblproperties = tblproperties
+self.select_expression = select_expression
 
 if (self.check_headers and
 not (self.field_dict is not None and self.headers)):
@@ -146,16 +150,42 @@ class S3ToHiveTransfer(BaseOperator):
 raise AirflowException(
 "The key {0} does not exists".format(self.s3_key))
 s3_key_object = self.s3.get_key(self.s3_key)
+
 root, file_ext = os.path.splitext(s3_key_object.key)
+if (self.select_expression and self.input_compressed and
+file_ext != '.gz'):
+raise AirflowException("GZIP is the only compression " +
+   "format Amazon S3 Select supports")
+
 with TemporaryDirectory(prefix='tmps32hive_') as tmp_dir,\
 NamedTemporaryFile(mode="wb",
dir=tmp_dir,
suffix=file_ext) as f:
 self.log.info("Dumping S3 key {0} contents to local file {1}"
   .format(s3_key_object.key, f.name))
-s3_key_object.download_fileobj(f)
+if self.select_expression:
+option = {}
+if self.headers:
+option['FileHeaderInfo'] = 'USE'
+if self.delimiter:
+option['FieldDelimiter'] = self.delimiter
+
+input_serialization = {'CSV': option}
+